블로그 이미지
보미아빠

카테고리

보미아빠, 석이 (525)
밥벌이 (16)
싸이클 (1)
일상 (1)
Total
Today
Yesterday

달력

« » 2025.8
1 2
3 4 5 6 7 8 9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30
31

공지사항

최근에 올라온 글

sqler 에서 알게된 형주 형님이 와인을 선물해 주셨다.

울 둘째 여름이(첫째 이름은 보미 입니다.) 최종 빌드 완료되면 둘이 개봉샷 하겠습니다.
감사합니다. 이얏호~ 근데 4시즌 할 수 있을지는 미정 입니다.

와인에 와짜도 모르는데 요즘 고급 와인만 맛들이고 있습니다. 큰일 입니다.
베리짜노 호스텐....아 ~ 행복합니다.

자 오픈샷 입니다. 저렇게 고급 와인은 첨봐용~ 디캔팅도 해야 한답니다.


Posted by 보미아빠
, |

page

카테고리 없음 / 2011. 6. 24. 18:42

Next up in the Inside the Storage Engine series is a discussion of page structure. Pages exist to store records. A database page is an 8192-byte (8KB) chunk of a database data file. They are aligned on 8KB boundaries within the data files, starting at byte-offset 0 in the file. Here's a picture of the basic structure:

page.gif

Header

The page header is 96 bytes long. What I'd like to do in this section is take an example page header dump from DBCC PAGE and explain what all the fields mean. I'm using the database from the page split post and I've snipped off the rest of the DBCC PAGE output.

DBCC

TRACEON (3604)

DBCC

PAGE ('pagesplittest', 1, 143, 1);

GO

m_pageId = (1:143) m_headerVersion = 1 m_type = 1
m_typeFlagBits = 0x4 m_level = 0 m_flagBits = 0x200
m_objId (AllocUnitId.idObj) = 68 m_indexId (AllocUnitId.idInd) = 256
Metadata: AllocUnitId = 72057594042384384
Metadata: PartitionId = 72057594038386688 Metadata: IndexId = 1
Metadata: ObjectId = 2073058421 m_prevPage = (0:0) m_nextPage = (1:154)
pminlen = 8 m_slotCnt = 4 m_freeCnt = 4420
m_freeData = 4681 m_reservedCnt = 0 m_lsn = (18:116:25)
m_xactReserved = 0 m_xdesId = (0:0) m_ghostRecCnt = 0
m_tornBits = 1333613242

Here's what all the fields mean (note that the fields aren't quite stored in this order on the page):

  • m_pageId
    • This identifies the file number the page is part of and the position within the file. In this example, (1:143) means page 143 in file 1.
  • m_headerVersion
    • This is the page header version. Since version 7.0 this value has always been 1.
  • m_type
    • This is the page type. The values you're likely to see are:
      • 1 - data page. This holds data records in a heap or clustered index leaf-level.
      • 2 - index page. This holds index records in the upper levels of a clustered index and all levels of non-clustered indexes.
      • 3 - text mix page. A text page that holds small chunks of LOB values plus internal parts of text tree. These can be shared between LOB values in the same partition of an index or heap.
      • 4 - text tree page. A text page that holds large chunks of LOB values from a single column value.
      • 7 - sort page. A page that stores intermediate results during a sort operation.
      • 8 - GAM page. Holds global allocation information about extents in a GAM interval (every data file is split into 4GB chunks - the number of extents that can be represented in a bitmap on a single database page). Basically whether an extent is allocated or not. GAM = Global Allocation Map. The first one is page 2 in each file. More on these in a later post.
      • 9 - SGAM page. Holds global allocation information about extents in a GAM interval. Basically whether an extent is available for allocating mixed-pages. SGAM = Shared GAM. the first one is page 3 in each file. More on these in a later post.
      • 10 - IAM page. Holds allocation information about which extents within a GAM interval are allocated to an index or allocation unit, in SQL Server 2000 and 2005 respectively. IAM = Index Allocation Map. More on these in a later post.
      • 11 - PFS page. Holds allocation and free space information about pages within a PFS interval (every data file is also split into approx 64MB chunks - the number of pages that can be represented in a byte-map on a single database page. PFS = Page Free Space. The first one is page 1 in each file. More on these in a later post.
      • 13 - boot page. Holds information about the database. There's only one of these in the database. It's page 9 in file 1.
      • 15 - file header page. Holds information about the file. There's one per file and it's page 0 in the file.
      • 16 - diff map page. Holds information about which extents in a GAM interval have changed since the last full or differential backup. The first one is page 6 in each file.
      • 17 - ML map page. Holds information about which extents in a GAM interval have changed while in bulk-logged mode since the last backup. This is what allows you to switch to bulk-logged mode for bulk-loads and index rebuilds without worrying about breaking a backup chain. The first one is page 7 in each file.
  • m_typeFlagBits
    • This is mostly unused. For data and index pages it will always be 4. For all other pages it will always be 0 - except PFS pages. If a PFS page has m_typeFlagBits of 1, that means that at least one of the pages in the PFS interval mapped by the PFS page has at least one ghost record.
  • m_level
    • This is the level that the page is part of in the b-tree.
    • Levels are numbered from 0 at the leaf-level and increase to the single-page root level (i.e. the top of the b-tree).
    • In SQL Server 2000, the leaf level of a clustered index (with data pages) was level 0, and the next level up (with index pages) was also level 0. The level then increased to the root. So to determine whether a page was truly at the leaf level in SQL Server 2000, you need to look at the m_type as well as the m_level.
    • For all page types apart from index pages, the level is always 0.
  • m_flagBits
    • This stores a number of different flags that describe the page. For example, 0x200 means that the page has a page checksum on it (as our example page does) and 0x100 means the page has torn-page protection on it.
    • Some bits are no longer used in SQL Server 2005.
  • m_objId
  • m_indexId
    • In SQL Server 2000, these identified the actual relational object and index IDs to which the page is allocated. In SQL Server 2005 this is no longer the case. The allocation metadata totally changed so these instead identify what's called the allocation unit that the page belongs to (I'll do another post that describes these later today).
  • m_prevPage
  • m_nextPage
    • These are pointers to the previous and next pages at this level of the b-tree and store 6-byte page IDs.
    • The pages in each level of an index are joined in a doubly-linked list according to the logical order (as defined by the index keys) of the index. The pointers do not necessarily point to the immediately adjacent physical pages in the file (because of fragmentation).
    • The pages on the left-hand side of a b-tree level will have the m_prevPage pointer be NULL, and those on the right-hand side will have the m_nextPage be NULL.
    • In a heap, or if an index only has a single page, these pointers will both be NULL for all pages.
  • pminlen
    • This is the size of the fixed-length portion of the records on the page.
  • m_slotCnt
    • This is the count of records on the page.
  • m_freeCnt
    • This is the number of bytes of free space in the page.
  • m_freeData
    • This is the offset from the start of the page to the first byte after the end of the last record on the page. It doesn't matter if there is free space nearer to the start of the page.
  • m_reservedCnt
    • This is the number of bytes of free space that has been reserved by active transactions that freed up space on the page. It prevents the free space from being used up and allows the transactions to roll-back correctly. There's a very complicated algorithm for changing this value.
  • m_lsn
    • This is the Log Sequence Number of the last log record that changed the page.
  • m_xactReserved
    • This is the amount that was last added to the m_reservedCnt field.
  • m_xdesId
    • This is the internal ID of the most recent transaction that added to the m_reservedCnt field.
  • m_ghostRecCnt
    • The is the count of ghost records on the page.
  • m_tornBits
    • This holds either the page checksum or the bits that were displaced by the torn-page protection bits - depending on what form of page protection is turnde on for the database.

Note that I didn't include the fields starting with Metadata:. That's because they're not part of a page header. During SQL Server 2005 development I did some major work rewriting the guts of DBCC PAGE and to save everyone using it from having to do all the system table lookups to determine what the actual object and index IDs are, I changed DBCC PAGE to do them internally and output the results.

Records

See this blog post for details.

Slot Array

It's a very common misconception that records within a page are always stored in logical order. This is not true. There is another misconception that all the free-space in a page is always maintained in one contiguous chunk. This also is not true. (Yes, the image above shows the free space in one chunk and that very often is the case for pages that are being filled gradually.)

If a record is deleted from a page, everything remaining on the page is not suddenly compacted - inserters pay the cost of compaction when its necessary, not deleters.

Consider a completely full page - this means that record deletions cause free space holes within the page. If a new record needs to be inserted onto the page, and one of the holes is big enough to squeeze the record into, why go to the bother of comapcting it? Just stick the record in and carry on. What if the record should logically have come at the end of all other records on the page, but we've just inserted it in the middle - doesn't that screw things up somewhat?

No, because the slot array is ordered and gets reshuffled as records are inserted and deleted from pages. As long as the first slot array entry points to the logically first record on the page, everything's fine. Each slot entry is just a two-byte pointer into the page - so its far more efficient to manipulate the slot array than it is to manipulate a bunch of records on the page. Only when we know there's enough free space contained within the page to fit in a record, but its spread about the page do we compact the records on the page to make the free space into a contiguous chunk.

One interesting fact is that the slot array grows backwards from the end of the page, so the free space is squeezed from the top by new rows, and from the bottom by the slot array.

Posted by 보미아빠
, |


1. 두 테이블에 인덱스가 없으면, hash 조인으로만 푼다.
   아니다 오히려 hash 로 못 푸는 경우가 발생 할 수도 있다.

if object_id('tblx') is not null
drop table tblx
go

if object_id('tbly') is not null
drop table tbly
go


create table tblx
(idx int
,c1 int
)
go

create table tbly
(idx int
,c1 int
)
go

insert into tblx values (1,1)
insert into tblx values (1,2)
insert into tbly values (1,1)
insert into tbly values (2,null)
go

set statistics profile on

if object_id('tblx') is not null
drop table tblx
go

if object_id('tbly') is not null
drop table tbly
go


create table tblx
(idx int
,c1 int
)
go

create table tbly
(idx int
,c1 int
)
go

insert into tblx values (1,1)
insert into tblx values (1,2)
insert into tbly values (1,1)
insert into tbly values (2,null)
go

set statistics profile on

select *
  from tblx a
  join tbly b
    on a.idx = b.idx
 where a.idx > 1
option (hash join)

select *    from tblx a    join tbly b      on a.idx = b.idx    where a.idx > 1   option (hash join)
  |--Hash Match(Inner Join, HASH:([a].[idx])=([b].[idx]), RESIDUAL:([a].[dbo].[tbly].[idx] as [b].[idx]=[a].[dbo].[tblx].[idx] as [a].[idx]))
       |--Table Scan(OBJECT:([a].[dbo].[tblx] AS [a]), WHERE:([a].[dbo].[tblx].[idx] as [a].[idx]>(1)))
       |--Table Scan(OBJECT:([a].[dbo].[tbly] AS [b]), WHERE:([a].[dbo].[tbly].[idx] as [b].[idx]>(1)))
      
      
select *
  from tblx a
  join tbly b
    on a.idx = b.idx
 where a.idx = 1
option (hash join)

메시지 8622, 수준 16, 상태 1, 줄 1
이 쿼리에 정의된 힌트로 인해 쿼리 프로세서에서 쿼리 계획을 생성할 수 없습니다. 힌트를 지정하거나 SET FORCEPLAN을 사용하지 않고 쿼리를 다시 전송하십시오.


select *
  from tblx a
  join tbly b
    on a.idx = b.idx
 where a.idx = 1

select *    from tblx a    join tbly b      on a.idx = b.idx    where a.idx = 1
  |--Nested Loops(Inner Join) -- 뒤에 어떠한 join predicate 가 없다.
       |--Table Scan(OBJECT:([a].[dbo].[tbly] AS [b]), WHERE:([a].[dbo].[tbly].[idx] as [b].[idx]=(1)))
       |--Table Scan(OBJECT:([a].[dbo].[tblx] AS [a]), WHERE:([a].[dbo].[tblx].[idx] as [a].[idx]=(1)))

hash join 은 적어도 하나의 equijoin predicate 가 있어야만 풀 수 있기 때문이다. merge join 도 마찬가지 이다.

이것은 프로시저로 작성할 때 변수에 의해서 쿼리가 실행되고 안되고 하는 샘플을 만들 수 있고 변수에 따라서 쿼리가 동작 할 수 있고 없는 조건이 발생 하기도 한다.

a.idx between 1 and 100 이 입력되는 경우도 있고, 1 and 1 이라고 입력 된다고 생각해 보라. 이 또한 이행적 폐쇄(transitive closure) 법칙이 적용될 수 있고, 이것은 sql server 에서는 힌트로 막을 방법이 없다. 이러한 변경이 일어나는 경우 hash join, merge join 의 기본 원칙인 equijoin 이 predicate 로 반드시 들어가야 하는 기본 원칙을 위배 하므로 실행 될 수 없다.

(oracle 의 경우 query_rewrite_enabled 파라미터를 이용해 막을 수 있다.)


select *
  from tblx a
  join tbly b
    on a.idx = b.idx
 where a.idx = 1
option (merge join)

메시지 8622, 수준 16, 상태 1, 줄 1
이 쿼리에 정의된 힌트로 인해 쿼리 프로세서에서 쿼리 계획을 생성할 수 없습니다. 힌트를 지정하거나 SET FORCEPLAN을 사용하지 않고 쿼리를 다시 전송하십시오.


select *
  from tblx a
  join tbly b
    on a.idx = b.idx


select *    from tblx a    join tbly b      on a.idx = b.idx
  |--Hash Match(Inner Join, HASH:([b].[idx])=([a].[idx]), RESIDUAL:([pc].[dbo].[tbly].[idx] as [b].[idx]=[pc].[dbo].[tblx].[idx] as [a].[idx]))
       |--Table Scan(OBJECT:([pc].[dbo].[tbly] AS [b]))
       |--Table Scan(OBJECT:([pc].[dbo].[tblx] AS [a]))


select *
  from tblx a
  inner loop join tbly b
    on a.idx = b.idx


select *    from tblx a    inner loop join tbly b      on a.idx = b.idx
  |--Nested Loops(Inner Join, WHERE:([pc].[dbo].[tbly].[idx] as [b].[idx]=[pc].[dbo].[tblx].[idx] as [a].[idx]))
       |--Table Scan(OBJECT:([pc].[dbo].[tblx] AS [a]))
       |--Table Scan(OBJECT:([pc].[dbo].[tbly] AS [b]))


select *
  from tblx a
  inner merge join tbly b
    on a.idx = b.idx


select *    from tblx a    inner merge join tbly b      on a.idx = b.idx
  |--Merge Join(Inner Join, MANY-TO-MANY MERGE:([a].[idx])=([b].[idx]), RESIDUAL:([pc].[dbo].[tbly].[idx] as [b].[idx]=[pc].[dbo].[tblx].[idx] as [a].[idx]))
       |--Sort(ORDER BY:([a].[idx] ASC))
       |    |--Table Scan(OBJECT:([pc].[dbo].[tblx] AS [a]))
       |--Sort(ORDER BY:([b].[idx] ASC))
            |--Table Scan(OBJECT:([pc].[dbo].[tbly] AS [b]))

또하나의 간단한 이야기를 해보자.

if object_id('tblx') is not null
drop table tblx
go

if object_id('tbly') is not null
drop table tbly
go


create table tblx
(idx int
,c1 int
)
go

create table tbly
(idx int
,c1 int
)
go

insert into tblx values (1,1)
insert into tblx values (2,2)
insert into tbly values (1,1)

select *
  from tblx a
  join tbly b
    on a.idx = 1
option (hash join)
go

select *    from tblx a    join tbly b      on a.idx = 1
  |--Nested Loops(Inner Join) -- 여기에 아무것도 없다.
       |--Table Scan(OBJECT:([a].[dbo].[tblx] AS [a]), WHERE:([a].[dbo].[tblx].[idx] as [a].[idx]=(1)))
       |--Table Scan(OBJECT:([a].[dbo].[tbly] AS [b]))

해당 쿼리는 a.idx = 1 인것을 찾고 그냥 cross join 을 수행한다.
그러므로 hash function 을 수행 할 equijoin predicate 가 역시 없다. 그러므로 hash join 으로 풀 수 없다.
아래 not exists 에서 hash join 으로 못 푸는 것 역시 이렇게 해석 할 수 있다.

그러나 full outer merge join 의 경우 non equijoin 도 가능하기 때문에 다음 쿼리가 가능하다.

select *
  from tblx a
  full outer join tbly b
    on a.idx = 1
option (merge join)


필터 조건이 없으면 hash 만 되는가?
아니다 다음 쿼리의 예제를 보자


select *
  from tblx a
  join tbly b
    on a.idx > b.idx


select *    from tblx a    join tbly b      on a.idx > b.idx
  |--Nested Loops(Inner Join, WHERE:([pc].[dbo].[tblx].[idx] as [a].[idx]>[pc].[dbo].[tbly].[idx] as [b].[idx]))
       |--Table Scan(OBJECT:([pc].[dbo].[tbly] AS [b]))
       |--Table Scan(OBJECT:([pc].[dbo].[tblx] AS [a]))


힌트주면 되지요? 
아니다. 해쉬펑션을 통과한 값의 크다 작다를 비교 할 수 있으면 넌 신이다. 난 신 많이 봤다. ;)


select *
  from tblx a
  inner hash join tbly b
    on a.idx > b.idx


메시지 8622, 수준 16, 상태 1, 줄 1
이 쿼리에 정의된 힌트로 인해 쿼리 프로세서에서 쿼리 계획을 생성할 수 없습니다. 힌트를 지정하거나 SET FORCEPLAN을 사용하지 않고 쿼리를 다시 전송하십시오.


2. in 과 not in 은 반대 인가요?
   아니다


다음을 살펴보자.



if object_id('tblx') is not null
drop table tblx
go

if object_id('tbly') is not null
drop table tbly
go


create table tblx
(idx int
,c1 int
)
go

create table tbly
(idx int
,c1 int
)
go

insert into tblx values (1,1)
insert into tblx values (2,2)
insert into tbly values (1,1)
insert into tbly values (2,null)
go



select *
  from tblx
 where c1 in (select c1 from tbly)


idx         c1
----------- -----------
1           1

(1개 행이 영향을 받음)



select *
  from tblx
 where c1 not in (select c1 from tbly)


idx         c1
----------- -----------

(0개 행이 영향을 받음)


2 는 어디로 갔나?
왜 그런가? null 이있다는 생각을 하자 null 의 오묘함에 빠져보면 재미있을 것이다.



3. table 의 check 조건 써야 하나요?
   그렇다. 꼭 써 주는게 좋습니다. 이러한 체크조건이 있냐 없냐에 따라 join 방법도 틀리게 할 수 있다.
   써주면 optimizer 가 매우 효율적으로 동작 할 수 있는 실마리를 준다.


if object_id('tblx') is not null
drop table tblx
go

if object_id('tbly') is not null
drop table tbly
go


create table tblx
(idx int
,c1 int
)
go

create table tbly
(idx int
,c1 int
)
go

insert into tblx values (1,1)
insert into tblx values (2,2)
insert into tbly values (1,1)

go

select *
  from tblx
 where c1 not in (select c1 from tbly)


select *     from tblx    where c1 not in (select c1 from tbly)
  |--Nested Loops(Left Anti Semi Join, WHERE:([pc].[dbo].[tblx].[c1] IS NULL OR [pc].[dbo].[tbly].[c1] IS NULL OR [pc].[dbo].[tblx].[c1]=[pc].[dbo].[tbly].[c1]))
       |--Table Scan(OBJECT:([pc].[dbo].[tblx]))
       |--Table Scan(OBJECT:([pc].[dbo].[tbly]))


hash join 으로 풀 수 있는가?

select *
  from tblx
 where c1 not in (select c1 from tbly)
option (hash join)

메시지 8622, 수준 16, 상태 1, 줄 2
이 쿼리에 정의된 힌트로 인해 쿼리 프로세서에서 쿼리 계획을 생성할 수 없습니다. 힌트를 지정하거나 SET FORCEPLAN을 사용하지 않고 쿼리를 다시 전송하십시오.


테이블에 not null 체크 조건이 있으면 어떻게 되는가?
잘푼다.
if object_id('tblx') is not null
drop table tblx
go

if object_id('tbly') is not null
drop table tbly
go


create table tblx
(idx int
,c1 int not null
)
go

create table tbly
(idx int
,c1 int not null
)
go

insert into tblx values (1,1)
insert into tblx values (2,2)
insert into tbly values (1,1)
go

-- 아까 위헤서 아래 퀴리는 위에서 실행계획을 생성 할 수 없습니다. 그 쿼리이다.


select *
  from tblx
 where c1 not in (select c1 from tbly)
option (hash join)


select *     from tblx    where c1 not in (select c1 from tbly)  option (hash join)
  |--Hash Match(Right Anti Semi Join, HASH:([pc].[dbo].[tbly].[c1])=([pc].[dbo].[tblx].[c1]))
       |--Table Scan(OBJECT:([pc].[dbo].[tbly]))
       |--Table Scan(OBJECT:([pc].[dbo].[tblx]))


테이블 제약 조건 주지말고 hash join 을 하고 싶은가?


select *
  from tblx
 where c1 not in (select c1 from tbly where c1 is not null)
   and c1 is not null
option (hash join)


select *     from tblx    where c1 not in (select c1 from tbly where c1 is not null)     and c1 is not null  option (hash join)
  |--Hash Match(Right Anti Semi Join, HASH:([pc].[dbo].[tbly].[c1])=([pc].[dbo].[tblx].[c1]), RESIDUAL:([pc].[dbo].[tblx].[c1]=[pc].[dbo].[tbly].[c1]))
       |--Table Scan(OBJECT:([pc].[dbo].[tbly]), WHERE:([pc].[dbo].[tbly].[c1] IS NOT NULL))
       |--Table Scan(OBJECT:([pc].[dbo].[tblx]), WHERE:([pc].[dbo].[tblx].[c1] IS NOT NULL))


풀리긴 풀리는데 residual 이나 where 필터가 형성된다. 


체크조건이 있으면 residual 등이 없고 깔끔하지 않은가? 어떤가? check 조건 꼭 기술하는게 좋을것 같지 않은가?
이 외 다양한 이슈가 있다. 재미 있는가? 나머지 이슈는 나중에 책을 통해서 만나보자! (sqltag 에서는 sql 책을 준비하고 있어요 ;) 교성아 join 다 적어가나?????


if object_id('tblx') is not null
drop table tblx
go

if object_id('tbly') is not null
drop table tbly
go


create table tblx
(idx int
,c1 char(8000) not null
)
go

create table tbly
(idx int
,c1 char(8000) not null
)
go

insert into tblx values (1,1)
insert into tblx values (2,2)
insert into tbly values (1,1)
go

select *
  from tblx a
  join tbly b
    on a.idx = b.idx
 order by a.idx    
go
 

select *
  from tblx a
 inner hash join tbly b
    on a.idx = b.idx
 order by a.idx    
option(hash join)        

메시지 8618, 수준 16, 상태 2, 줄 1
작업 테이블이 필요하며 최소 행 크기가 허용되는 최대 크기 8060바이트를 초과하므로 쿼리 프로세서에서 쿼리 계획을 생성할 수 없습니다. 작업 테이블이 필요한 일반적인 이유는 쿼리의 GROUP BY 또는 ORDER BY 절 때문입니다. 쿼리에 GROUP BY 또는 ORDER BY 절이 있을 경우 이 절의 필드 수 및/또는 크기를 줄이십시오. 또한 필드의 접두사(LEFT()) 또는 해시(CHECKSUM())를 사용하여 그룹화하거나 접두사를 사용하여 정렬하십시오. 그러나 이 작업을 수행하면 쿼리의 동작이 변경됩니다.

다음은 full outer join 의 실행 예이다.



재미 있지 않은가? join 된 결과를 넣어두고 sort 해야 하는데 두 컬럼의 합이 8060을 넘으니 hash join 한 후 work table 을 만들수 없어 일어나는 에러이다.  

교성이의 join advanced 챕터로 쓸 예정 입니다.


Posted by 보미아빠
, |

최근에 달린 댓글

최근에 받은 트랙백

글 보관함