'분류 전체보기' 카테고리의 글 목록 (43 Page)

« » 2025.11
일	월	화	수	목	금	토
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30

'보미아빠, 석이'에 해당되는 글 532건

2011.08.26 dbcc memorystatus 의 분석 2
2011.08.24 쿼리 분리가 필요한 경우?
2011.08.20 [NF 2011 Denali] Contained Database 2
2011.08.15 바다는 한방울 빗물을 마다하지 않는다.
2011.08.11 Aggregation WITH ROLLUP
2011.08.10 listening 연습은 이걸로 -_- reading 하고 있었는데 좋은걸~ ^^
2011.08.10 oracle linked server 5
2011.08.09 Hash Aggregate
2011.08.08 Table Value Function 의 종류
2011.08.08 Stream Aggregate

dbcc memorystatus 의 분석

카테고리 없음 / 2011. 8. 26. 17:50

메모리 상황을 보기위해 현재 가장 편한것은 dbcc memorystatus 이다. 이것을 이용해 어떤 메모리 영역에 문제가 있을 수 있는지 한번에 알아볼 수 있는 좋은 방법을 소개한다.

DMV 는 Session 당 할당 메모리를 보거나 뭐 더 좋은 많은 결과를 볼 수 있는 쿼리가 존재 한다. 그러나 DMV 는 모든 정보를 제공하지 않고 오직 dbcc memorystatus 만 모든 정보를 제공한다.

select granted_memory_kb, used_memory_kb, max_used_memory_kb
from sys.dm_exec_query_memory_grants
where session_id = 55

과거 버전의 서버에서 토탈 어떻게 사용하고 있는지 편리하게 제공하는 방법이 없다. 그래서 set nocount on 과 dbcc memorystatus 결과를 c:\dbcc.txt 라는 파일로 만들어 sql 로 파싱해서 편리하게 분석해 보자
눈으로 살펴보다가 내 눈이 심히 피곤했고, 이 결과를 파일로 만들어 넣은것은 헤더 정보를 같이 볼 방법은 이 방법밖에 없었다. 2005 이상에서는 편리하게 쿼리 할 수 있지만 2000 일 수 도 있어 쿼리를 과거 버전과 호환성 있게 만드느라 이렇게 만들었다.

동작 방법

xp_cmdshell 활성화가 필요하다.
강제로 설정하고 예전 설정으로 돌리면 되지만 작업 할려니 귀찮다.

모니터링 방법

use master
go
if object_id ('dbcc_memorystatus') is not null
drop proc dbcc_memorystatus
go

create proc dbcc_memorystatus
as
set nocount on
set transaction isolation level read uncommitted
-- script by minsouk kim
-- sqlsql.tistory.com
-- sqltag.org
-- 2011.08.30
-- xp_cmdshell 활성화 필요
declare @srvname varchar(1000)
   , @sql varchar(8000)
   , @ins_filename varchar(1000)

select @srvname = srvname from sysservers where srvid = 0
select @ins_filename = replace(@srvname,'\','_')
set @sql = 'sqlcmd -S '+@srvname+' -E -q "set nocount on; dbcc memorystatus;" -o c:\'+@ins_filename+'_dbcc_memorystatus.txt'
EXEC master..xp_cmdshell @sql ,no_output
if object_id ('tempdb..##dbcc_raw') is not null
drop table ##dbcc_raw
if object_id ('tempdb..##dbcc_memorystatus') is not null
drop table ##dbcc_memorystatus
if object_id ('tempdb..##dbcc_memorystatus_header') is not null
drop table ##dbcc_memorystatus_header
if object_id ('tempdb..##memorystatus') is not null
drop table ##memorystatus
create table ##dbcc_raw
(value varchar(1000))
set @sql = '
BULK INSERT ##dbcc_raw
FROM ''\\127.0.0.1\c$\'+@ins_filename+'_dbcc_memorystatus.txt''
WITH
(
TABLOCK
)'
exec (@sql)
create table ##dbcc_memorystatus
(idx int identity(1,1), value varchar(1000))
insert into ##dbcc_memorystatus
select * from ##dbcc_raw
create table ##dbcc_memorystatus_header
(idx int identity(1,1), header_idx int)
insert ##dbcc_memorystatus_header
select idx
from ##dbcc_memorystatus
where case when value like '%-%' then 1 else 0 end = 1
select * into ##memorystatus
from (
select h.description type, v.description, v.value, h.scale
from (select b.header_idx - 1 header_idx, b.header_idx+1 st_idx, a.header_idx-3 ed_idx
    from ##dbcc_memorystatus_header a
    join ##dbcc_memorystatus_header b
   on a.idx = b.idx + 1) d
join (select idx
    , substring(value,1, len(value) - charindex(' ',reverse(rtrim(value)))) description
    , substring(value,len(value) - charindex(' ',reverse(rtrim(value)))+1,100) scale
    from ##dbcc_memorystatus ) h
    on d.header_idx = h.idx
join (select idx
    , substring(value,1, len(value) - charindex(' ',reverse(rtrim(value)))) description
    , substring(value,len(value) - charindex(' ',reverse(rtrim(value)))+1,100) value
    from ##dbcc_memorystatus ) v
    on v.idx between d.st_idx and ed_idx
-- where h.description+v.description like '%sqlcp%'
--order by cast(v.value as bigint) desc
) a
PRINT 'select * from ##memorystatus where type+description like ''%sqlcp%'''
PRINT 'select * from ##memorystatus where type+description like ''%GLOBAL%'' ORDER BY CAST(VALUE AS BIGINT) DESC'
PRINT 'select * from ##memorystatus order by cast(value as bigint) desc '
-- select * from ##memorystatus
go

exec dbcc_memorystatus

결과 예제

SQL Server 2000 DBCC MEMORYSTATUS 에 대한 기술 정보
http://support.microsoft.com/?id=271624
SQL Server 2005 DBCC MEMORYSTATUS 에 대한 기술 정보
http://support.microsoft.com/?id=907877

저작자표시 비영리 변경금지 (새창열림)

Posted by 보미아빠

, |

쿼리 분리가 필요한 경우?

카테고리 없음 / 2011. 8. 24. 03:07

나는 이것을 프로시저와 메모리 그랜트 이슈라고 부르기 보다. "쿼리를 여러개로 분리하면 좋은 경우다" 라고 말하고 싶다.
실제 많은 이와 같은 프로시저는 경우에 따라 여러 sub procedure 를 콜 하도록 구성한다. steatement level recompile 도 용서 할 수 없는 경우가 있기 때문이다.

이 문서에는 recompile 하는 것이 좋겠다고 하지만, 우리는 이런경우 여러개로 분리해 쓴다.
아래 이유가 잘 설명되어 있지만. 쿼리플랜이 만들어질때 메모리 할당량이 설정되기 때문에 sort warning 이 생길수 있다는 관점에서 recompile 을 해 적당한 메모리를 할당 받으라는 것이다.

그러나, 실제 운영을 해보면, 이런 이슈보다 기간에 따라 쿼리 플랜이 달라져야 하고 인덱스를 다르게 써야 하는경우가 더 많았다. 그걸 수용하기 위해서는 statement level 의 recompile 이나, 분리된 다른 sub procedure 를 힌트로 박아 운영하는 것이 더 보편적인 이슈 해결 방법 이였다.

다음 사이트를 참고해서 학습하면 좋을듯 하다.
http://www.sqlworkshops.com/plancachingandquerymemory.htm

참고로 이분이 소개한 대부분의 내용이 옵티마이저를 소개한 블로그에서 먼저 소개되어 있다. 하지만, 현실적으로 어떻게 이러한 문제를 해결하는게 좋을지에 대한 해답은 여기가 가장 좋은듯 하다.

배껴서 강의하면 재미 날듯 해 한국어로 완벽하게 이해하는 강의를 해봐야지....SQLTAG 에서도 TeamView + 게임톡 강의로 많은 사람들과 이런 이슈를 하나 하나 나누어 보아야겠다. 언어의 장벽을 SQL에는 없도록 해봐야지.

비슷한 주제로 김정선 강사님이 강의 했다는데 들어 두었으면 좋았을걸..하는 아쉬움이 남는다. 아쉽게 스터디 날짜랑 같아 모두 한배타고 아무도 못 들었다. -_- 제가 책임지고 같은 주제로 더 심도깊게 강의 해 드리겠습니다. 운영의 경험 + 수많은 KB 조사까지 넣어 (알지 ?).

모두들 즐거운 SQL Time 되세요~

이번것은 넘 쉬어서 변역 안할란다.

Let’s create a stored procedure that sorts customers by name within certain date range.

To observe Sort Warnings, enable 'Sort Warnings' in SQL Profiler under Events 'Errors and Warnings'.

--Example provided by www.sqlworkshops.com

create proc CustomersByCreationDate @CreationDateFrom datetime, @CreationDateTo datetime as

begin

declare @CustomerID int, @CustomerName varchar(48), @CreationDate datetime

select @CustomerName = c.CustomerName, @CreationDate = c.CreationDate from Customers c

where c.CreationDate between @CreationDateFrom and @CreationDateTo

order by c.CustomerName

option (maxdop 1)

end

Let’s execute the stored procedure initially with 1 month date range.

set statistics time on

--Example provided by www.sqlworkshops.com

exec CustomersByCreationDate '2001-01-01', '2001-01-31'

The stored procedure took 48 ms to complete.

The stored procedure was granted 6656 KB based on 43199.9 rows being estimated.

The estimated number of rows, 43199.9 is similar to actual number of rows 43200 and hence the memory estimation should be ok.

There was no Sort Warnings in SQL Profiler. To observe Sort Warnings, enable 'Sort Warnings' in SQL Profiler under Events 'Errors and Warnings'.

Now let’s execute the stored procedure with 6 month date range.

--Example provided by www.sqlworkshops.com

exec CustomersByCreationDate '2001-01-01', '2001-06-30'

The stored procedure took 679 ms to complete.

The stored procedure was granted 6656 KB based on 43199.9 rows being estimated.

The estimated number of rows, 43199.9 is way different from the actual number of rows 259200 because the estimation is based on the first set of parameter value supplied to the stored procedure which is 1 month in our case. This underestimation will lead to sort spill over tempdb, resulting in poor performance.

There was Sort Warnings in SQL Profiler. To observe Sort Warnings, enable 'Sort Warnings' in SQL Profiler under Events 'Errors and Warnings'.

To monitor the amount of data written and read from tempdb, one can execute select num_of_bytes_written, num_of_bytes_read from sys.dm_io_virtual_file_stats(2, NULL) before and after the stored procedure execution, for additional information refer to the webcast: www.sqlworkshops.com/webcasts.

Let’s recompile the stored procedure and then let’s first execute the stored procedure with 6 month date range.

In a production instance it is not advisable to use sp_recompile instead one should use DBCC FREEPROCCACHE (plan_handle). This is due to locking issues involved with sp_recompile, refer to our webcasts for further details.

exec sp_recompile CustomersByCreationDate

--Example provided by www.sqlworkshops.com

exec CustomersByCreationDate '2001-01-01', '2001-06-30'

Now the stored procedure took only 294 ms instead of 679 ms.

The stored procedure was granted 26832 KB of memory.

The estimated number of rows, 259200 is similar to actual number of rows of 259200. Better performance of this stored procedure is due to better estimation of memory and avoiding sort spill over tempdb.

There was no Sort Warnings in SQL Profiler.

Now let’s execute the stored procedure with 1 month date range.

--Example provided by www.sqlworkshops.com

exec CustomersByCreationDate '2001-01-01', '2001-01-31'

The stored procedure took 49 ms to complete, similar to our very first stored procedure execution.

This stored procedure was granted more memory (26832 KB) than necessary memory (6656 KB) based on 6 months of data estimation (259200 rows) instead of 1 month of data estimation (43199.9 rows). This is because the estimation is based on the first set of parameter value supplied to the stored procedure which is 6 months in this case. This overestimation did not affect performance, but it might affect performance of other concurrent queries requiring memory and hence overestimation is not recommended. This overestimation might affect performance Hash Match operations, refer to article Plan Caching and Query Memory Part II for further details.

Let’s recompile the stored procedure and then let’s first execute the stored procedure with 2 day date range.

exec sp_recompile CustomersByCreationDate

--Example provided by www.sqlworkshops.com

exec CustomersByCreationDate '2001-01-01', '2001-01-02'

The stored procedure took 1 ms.

The stored procedure was granted 1024 KB based on 1440 rows being estimated.

There was no Sort Warnings in SQL Profiler.

Now let’s execute the stored procedure with 6 month date range.

--Example provided by www.sqlworkshops.com

exec CustomersByCreationDate '2001-01-01', '2001-06-30'

The stored procedure took 955 ms to complete, way higher than 679 ms or 294ms we noticed before.

The stored procedure was granted 1024 KB based on 1440 rows being estimated. But we noticed in the past this stored procedure with 6 month date range needed 26832 KB of memory to execute optimally without spill over tempdb. This is clear underestimation of memory and the reason for the very poor performance.

There was Sort Warnings in SQL Profiler. Unlike before this was a Multiple pass sort instead of Single pass sort. This occurs when granted memory is too low.

Intermediate Summary: This issue can be avoided by not caching the plan for memory allocating queries. Other possibility is to use recompile hint or optimize for hint to allocate memory for predefined date range.

Let’s recreate the stored procedure with recompile hint.

--Example provided by www.sqlworkshops.com

drop proc CustomersByCreationDate

create proc CustomersByCreationDate @CreationDateFrom datetime, @CreationDateTo datetime as

begin

declare @CustomerID int, @CustomerName varchar(48), @CreationDate datetime

select @CustomerName = c.CustomerName, @CreationDate = c.CreationDate from Customers c

where c.CreationDate between @CreationDateFrom and @CreationDateTo

order by c.CustomerName

option (maxdop 1, recompile)

end

Let’s execute the stored procedure initially with 1 month date range and then with 6 month date range.

--Example provided by www.sqlworkshops.com

exec CustomersByCreationDate '2001-01-01', '2001-01-30'

exec CustomersByCreationDate '2001-01-01', '2001-06-30'

The stored procedure took 48ms and 291 ms in line with previous optimal execution times.

The stored procedure with 1 month date range has good estimation like before.

The stored procedure with 6 month date range also has good estimation and memory grant like before because the query was recompiled with current set of parameter values.

The compilation time and compilation CPU of 1 ms is not expensive in this case compared to the performance benefit.

Let’s recreate the stored procedure with optimize for hint of 6 month date range.

--Example provided by www.sqlworkshops.com

drop proc CustomersByCreationDate

create proc CustomersByCreationDate @CreationDateFrom datetime, @CreationDateTo datetime as

begin

declare @CustomerID int, @CustomerName varchar(48), @CreationDate datetime

select @CustomerName = c.CustomerName, @CreationDate = c.CreationDate from Customers c

where c.CreationDate between @CreationDateFrom and @CreationDateTo

order by c.CustomerName

option (maxdop 1, optimize for (@CreationDateFrom = '2001-01-01', @CreationDateTo ='2001-06-30'))

end

Let’s execute the stored procedure initially with 1 month date range and then with 6 month date range.

--Example provided by www.sqlworkshops.com

exec CustomersByCreationDate '2001-01-01', '2001-01-30'

exec CustomersByCreationDate '2001-01-01', '2001-06-30'

The stored procedure took 48ms and 291 ms in line with previous optimal execution times.

The stored procedure with 1 month date range has overestimation of rows and memory. This is because we provided hint to optimize for 6 months of data.

The stored procedure with 6 month date range has good estimation and memory grant because we provided hint to optimize for 6 months of data.

Let’s execute the stored procedure with 12 month date range using the currently cashed plan for 6 month date range.

--Example provided by www.sqlworkshops.com

exec CustomersByCreationDate '2001-01-01', '2001-12-31'

The stored procedure took 1138 ms to complete.

2592000 rows were estimated based on optimize for hint value for 6 month date range. Actual number of rows is 524160 due to 12 month date range.

The stored procedure was granted enough memory to sort 6 month date range and not 12 month date range, so there will be spill over tempdb.

There was Sort Warnings in SQL Profiler.

As we see above, optimize for hint cannot guarantee enough memory and optimal performance compared to recompile hint.

This article covers underestimation / overestimation of memory for Sort. Plan Caching and Query Memory Part II covers underestimation / overestimation for Hash Match operation. It is important to note that underestimation of memory for Sort and Hash Match operations lead to spill over tempdb and hence negatively impact performance. Overestimation of memory affects the memory needs of other concurrently executing queries. In addition, it is important to note, with Hash Match operations, overestimation of memory can actually lead to poor performance.

Summary: Cached plan might lead to underestimation or overestimation of memory because the memory is estimated based on first set of execution parameters. It is recommended not to cache the plan if the amount of memory required to execute the stored procedure has a wide range of possibilities. One can mitigate this by using recompile hint, but that will lead to compilation overhead. However, in most cases it might be ok to pay for compilation rather than spilling sort over tempdb which could be very expensive compared to compilation cost. The other possibility is to use optimize for hint, but in case one sorts more data than hinted by optimize for hint, this will still lead to spill. On the other side there is also the possibility of overestimation leading to unnecessary memory issues for other concurrently executing queries. In case of Hash Match operations, this overestimation of memory might lead to poor performance. When the values used in optimize for hint are archived from the database, the estimation will be wrong leading to worst performance, so one has to exercise caution before using optimize for hint, recompile hint is better in this case.

저작자표시 비영리 변경금지 (새창열림)

Posted by 보미아빠

, |

[NF 2011 Denali] Contained Database

카테고리 없음 / 2011. 8. 20. 00:23

해당 기능은 데이터베이스 이동을 쉽게 해주고 독립적인 데이터베이스 사용을 가능하게 해주는 것을 목표로 하고 있다. 그러나 현재까지는 부분적인 Contained Database 모드만 지원하며, 아직은 실험적이며, Replication, CDC, Database mail, Service Broker 등 주요한 기능을 지원하지 않고 Job Agent 도 지원하지 않아 2011 정식 릴리즈가 나올때도 이 상황이라면 실제 사용에는 무리가 있어 보인다. 하지만, 향 후 메이저 릴리즈를 몇번 거치면 분명 훌륭한 솔루션으로 완성될 것이다. (개인적으로는 소규모 데이터베이스를 위한 기능 인듯 한데... 이런데 좀 그만 투자하고 힌트 좀 더 만들고 좀 더 가볍게 돌고, TCO 관점에서 메리트 있는 프러덕트를 만들었으면 한다. SSMS 도 좀 고치고......)

원본 아티클

http://sqlblog.com/blogs/aaron_bertrand/archive/2010/11/16/sql-server-v-next-denali-contained-databases.aspx

번역본 (by 김민석)

Aaron Bertrand

Aaron 은 SQL Sentry 의 시니어 컨설턴트이다. SQL Server, Analysis Services 와 Windows 의 성능 모니터링 과 이벤트 관리 소프트웨어 개발자 이다. 관리, 성능, 신 기능에 관심을 두고 2006년 부터 sqlblog.com 에서 블로깅 하고 있다. 1997년 부터 SQL Server MVP 이다. 트위터 아이디는 @AaronBertrand 이며, SQL Saturday 와 유저 그룹에서 종종 스피커로 활동한다.

SQL Server V.Next (Denali) : Contained Databases

SQL Server 데이터베이스가 포터블하지 않은 것은 오랫동안 골치꺼리 였다. 문론 백업 리스토어나 디테치 어테치 등의 방법으로 다른 위치에 이동은 가능하다. 그러나 그러한 작업을 할 때, 어플리케이션의 한 부분인 데이터베이스의 많은 부분이 누락된다. 그리고 많은 부분들은 어플리케이션 파트가 아니라 관리영역이다. 그리고 데이터베이스 밖에 있는 부분들에 대해 동기화 하지 못한다. 예를 들자면, Security, roles, linked servers, CLR, Database mail, service broker, replication, Agent job. 또한 대상 서버가 다른 정렬을 사용하면, 임시테이블을 사용하거나 다른 데이터베이스과 join 할 때 문제가 생긴다. 초기 데이터베이스 이관 후, 호환되지 않는 많은 오류 이슈를 알고 있을것이라 생각된다.

"Contained Databases" 속으로~

Denali 에서는 몇 몇 이슈가 Contained database 로 소개 되었다. 정식 버전이 나올때 이것이 마켓팅 용어가 될지 확신하지 못하겠지만, 한번 살펴보자. (DMF) 동적 관리 프레임워크에서 (PBM) 정책 기반 관리로 바꿀 때 처럼 DDL 이 내장된 이후는 그 이름을 더 고치기 힘들다. 기본적으로 Contained database 는 해당 데이터베이스를 좀 더 black box 화 하고, 서버나 인스턴스 수준의 관리에서 어플리케이션과 특정 데이터베이스로 기능으로 분리시킨다.

Denali 첫번째 버전에서 Contained database 기능은 다음과 같은 솔루션을 제공한다.

* login 없이 특정 데이터베이스 user를 만들 수 있다. (그리고 다른 데이터베이스를 위해 같은 이름으로 이런 다중 user 를 만들수 있다)
* 정렬이 다른 데이터베이스 환경에서 tempdb 호환성(완벽하지 않음)을 유지할 수 있다. 왜냐하면, 임시테이블은 데이터베이스 context 정렬로 만들어 지기 때문이다.
* DMV 를 통해 Contained database (Containment) 에 위협을 가하는 (호환되지 않는) 모든 개체와 코드를 볼 수 있다.
* XEvents 로 동작중 일때만 알 수 있는것도 인식 할 수 있다. (김민석 추가)

개체와 엔티티들은 두개의 다른 카테고리로 나눌 수 있다.
Contained 와 UnContained 로 나눌 수 있다. Containted 엔티티는 외부 의존이 없는 것 또는 적어도 서버나 인스턴스간 의존성이 없는 것이다. Contained 개체의 예는 데이터베이스 내에서 외부 참조가 없는 것이다. 전체 리스트는 BOL의 다음 토픽을 참고 한다.

Features Within the Application Model

Uncontained 오브젝트는 명시적으로 외부 의존성이 있거나 (Three Four part names) 나 판단할 수 없는것 (Dynamic SQL) 이다. 명시적으로 uncontained 로 고려되는 것은 다음 bol 토픽을 참고한다.

Features Outside of the Application Model

몇몇 것들은 빠져 있다. 예를 들자면 HOST_NAME 이 목록에 없다. (이것은 Connect 에 리포트 되었다) 마지막으로, containment 를 수용하기 위해 변경된 것은 bol 의 다음 토픽을 참고한다.

Modified Features (Contained Databases)

가장 흥미로운것은 CREATE / ALTER DATABASE 변경이다. 이들은 Contained Database 에서는 다르게 동작한다. 새로운 옵션인 CURRENT 가 포함되었다. 만약 해당 데이터베이스를 새로운 인스턴스로 옮기거나 데이터베이스 이름을 바꿀때 이 명령이 사용된다. 명령어 설명 문서에 contained database 에서는 반드시 ALTER DATABASE CURRENT 를 사용하고 uncontainted database 에서는 ALTER DATABASE <database name> 을 사용하라고 되어있다. 하지만, 현재 CTP 판에서는 적용되어 있지 않다. 다른 흥미있는 토픽은 Collation Type (정렬), 임시 테이블의 제한사항, user option 이 있다.

Contained database 로 변경하기 (Turning containment on)

Denali 에서는 부분적 Containment 만 지원한다. 이말은 uncontained 엔티티를 만들 수 있다는 것이다. 그리고 이것을 DMV 를 통해 구분 할 수 있다. 하지만 엔진은 이러한 것들을 막지 못한다. 다음 버전의 SQL Server 에서는 완전한 Contained database 를 지원할 것이다. 이말은 containment 를 강제화 할 수 없다는 말이다.

정렬이 다른 여러 데이터베이스를 만들고 이런 기능을 사용해 보자. 주의할 것은 서버레벨 Configuration 옵션을 변경해야 이러한 Contanment DDL 을 사용할 수 있다.

USE [master];

CREATE DATABASE [ContainedDB1] COLLATE SQL_Latin1_General_CP1_CI_AS;

CREATE DATABASE [ContainedDB2] COLLATE Finnish_Swedish_100_CI_AI_SC;

EXEC sp_configure 'show advanced options', 1;

RECONFIGURE WITH OVERRIDE;

EXEC sp_configure 'contained database authentication', 1;

EXEC sp_configure 'show advanced options', 0;

RECONFIGURE WITH OVERRIDE;

ALTER DATABASE [ContainedDB1] SET CONTAINMENT = PARTIAL;

ALTER DATABASE [ContainedDB2] SET CONTAINMENT = PARTIAL;

로그인 없이 유저 만들기

이제 로그인에 묶이지 않은 유저를 만들 수 있다. 문법은 CREATE LOGIN 과 유사하다. password 호환성 default_schema default_language 등 대부분의 기능을 사용할 수 있다. 여기 각각의 데이터베이스별 유저를 만든다. 이름은 동일하게 MyUser 이고 암호는 다르다.

USE [ContainedDB1];

CREATE USER [MyUser] WITH PASSWORD = 'DB1';

USE [ContainedDB2];

CREATE USER [MyUser] WITH PASSWORD = 'DB2';

uncontainted database 에서 위 명령을 실행하면 다음과 같은 에러가 발생한다.

Msg 33233, Level 16, State 1, Line 1
You can only create a user with a password in a contained database.

자신에게 물어봐라, 두명의 유저가 같은 이름이라면 SQL Server 가 어떻게 니가 원하는 유저를 구분하겠는가? 유저는 이제 database 레벨에서 먼저 인증된다. (접속 문자열에 데이터베이스 이름이 명시되어 있다.) 그리고 만약 해당 database 에 유저를 찾지 못하면, 같은 이름의 로그인을 찾는다. 이는 성능 관점에서 고려되었으며, 반대로는 하지 않는다 -> 로그인을 찾고, 실패하면 각각의 모든 데이터베이스에서 같은 이름의 유저가 있는지 찾는것. 데이터베이스가 명시되어 있지 않으면 인증은 바로 실패한다. 그래서 만약 정확한 contained database name 유저와 database context 를 기술하지 않으면, 암호가 일치해도 인증은 실패한다. 이말은 default database 에 의존하지 말라는 것이다. (이러한 것은 Contained database 에는 없다.) 접속 문자열에 "Initial Catalog" 속성을 명시적으로 적어야 한다.

Contained database 에 "SQL user with password" 라는 새로운 속성 페이지를 볼 수 있다.

큰 차이점은 이제 이러한 데이터베이스를 다른 서버로 옮길때 단지 접속 문자열만 다시 가르키면 된다. 로그인 이나 SID 매칭 작업은 필요 없다. 그리고 Windows principals 도 가능하다. 그러나 이것은 별로 흥미롭지 않다. 왜냐하면, 이들 데이터베이스는 로그인에 종속적이지 않기 때문이다.

USE [ContainedDB1];

CREATE USER [Domain\User]; -- note user@domain is not supported

하지만, contained database 에서 주의할 것은 Windows principals 는 자동으로 엔진과 현재 데이터베이스에 인증된다. 반면 uncontained database 에서는 관련된 로그인이나 Windows group 을 통해 인증 된다. 더 많은 정보는 업데이트된 CREATE USER 구문과, BOL 토픽을 살펴봐라. 그리고 새로운 시스템 저장 프로시저인 sys.sp_migrate_user_to_contained 는 기존의 SQL 인증 유저를 암호와 함께 contained 데이터베이스 유저로의 마이그레이션을 돕는다.

다음을 주의해라. 데이터베이스를 암호 복잡성 룰이 다른 새 서버로 옮길때 로그인을 새로 만들거나 암호를 고치지 않는한 검사하지 않는다. 그래서 복잡한 암호 사용이 강제화된 서버에서 약한 암호가 사용될 수 있다.

보안은 큰 이슈이고, Contained 데이터베이스에서는 로그인과 유저가 어떻게 동시에 존재할 수 있고 없음을 대충 훓터보았다. 흥미로운 시나리오가 있으면 알려달라. 그리고 다음 BOL 토픽을 살펴 봐라. "Threats Against Contained Databases." 또 하나의 흥미로운 토픽은 AUTO_CLOSE 이다. contained databases 와 AUTO_CLOSE 는 서비스 중단 공격이 되기도 한다. 왜냐하면, Contained database 유저 인증에 더많은 리소스가 필요하기 때문이다.

Tempdb 정렬 이슈 해결 (Resolving tempdb collation issues)

이 코드는 Denali 이전 버전이나 uncontained 데이터베이스에서는 흥미로운 점이 없다. (서버는 Finnish_Swedish_100_CI_AI 과 다른 정렬을 사용한다.)

USE [master];

CREATE DATABASE [test] COLLATE Finnish_Swedish_100_CI_AI;

USE [test];

CREATE TABLE dbo.foo

(

bar NVARCHAR(32)

);

CREATE TABLE #foo

(

bar NVARCHAR(32)

);

SELECT *

FROM #foo AS a

INNER JOIN dbo.foo AS b

ON a.bar = b.bar;

DROP TABLE #foo;

USE [master];

DROP DATABASE [test];

이 조인 조건은 다음과 같은 에러를 발생시킨다. 왜냐하면, 임시테이블은 데이터베이스 기본 정렬이 아니라, 서버 기본 정렬로 생성되기 때문이다.

Msg 468, Level 16, State 9, Line 3
Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "Finnish_Swedish_100_CI_AI" in the equal to operation.

Denali Contained database 에서는 임시 오브젝트가 해당 데이터베이스 정렬로 만들어진다. 그래서 Contained 데이터베이스를 만들고 위 코드를 다시 실행하면 잘 동작한다.

문론 이것이 모든 이런 이슈를 해결하지는 않는다. 여전히 다른 정렬에서는 에러가 발생할 수 있다. 그리고 일반적인 정렬이슈를 말하는 것이 아니다. 단지 많은 문제에 노출되지 않는다는 것을 말한다. 그래서 정렬이 민감한 환경에서는 문제 해결에 더 많은 시간을 보내야 한다. 하지만 사용하는 T-SQL 쿼리가 완전히 Contained 되고 정렬이슈 영향이 적다면 더 쉽게 데이터베이스를 옮길 수 있다.

데이터베이스 이식성 조사 (Discovering threats to database portability)

심지어 containemt 기능을 사용하지 않더라도, 이식 대상이 되는 데이터베이스에서 어떤 부분이 이식성에 문제가 있는지 알아내는 것은 중요하다. (contained 로 알려진) 그래서, sys.dm_db_uncontained_entities 라는 DMV 가 이것을 도와준다. objects 뿐만 아니라 프로시저 평선 뷰 트리거도 구분할 수 있다. 같은 모듈에서 여러 위반사항을 확인 할 수 있다. Contained 데이터베이스에서 아래 여러 오브젝트를 생성하고 containment 에 어떤 영향을 주는지 살펴보자.

USE [master];

-- create a login:

CREATE LOGIN MyLogin WITH PASSWORD = 'DB1';

USE [ContainedDB1];

-- create a user bound to the above login:

CREATE USER [MyLogin] FROM LOGIN [MyLogin];

-- create a procedure with dynamic SQL + deferred name resolution:

CREATE PROCEDURE dbo.foo

BEGIN

EXEC('SELECT * FROM table1');

EXEC('SELECT * FROM table2');

SELECT * FROM dbo.object_does_not_exist;

END

-- create a synonym to an external, uncontained object:

CREATE SYNONYM dbo.bar FOR [master].sys.backup_devices;

-- create a procedure that references the synonym:

CREATE PROCEDURE dbo.getbar

BEGIN

SELECT * FROM dbo.bar;

END

-- create a procedure that calls xp_cmdshell:

CREATE PROCEDURE dbo.use_xp_cmdshell

BEGIN

EXEC xp_cmdshell 'dir C:\';

END

-- create a procedure that relies on database mail:

CREATE PROCEDURE dbo.use_dbmail

BEGIN

EXEC msdb.dbo.sp_send_dbmail;

END

-- create a silly function that generates a random object_id:

CREATE FUNCTION dbo.GenRandNumber()

RETURNS BIGINT

BEGIN

RETURN

(

SELECT TOP 1 [object_id]

FROM msdb.sys.objects;

);

END

-- create a table with a default constraint that references the function:

CREATE TABLE dbo.nonsense

(

id INT NOT NULL

DEFAULT dbo.GenRandNumber()

);

나는 해당 DMV 를 통해 문제를 어떻게 해결하지는 보여주기 위해 아래 쿼리를 실행했다.

SELECT

e.feature_name,

[object] = COALESCE(

QUOTENAME(SCHEMA_NAME(o.[schema_id])) + '.' + QUOTENAME(o.[name]),

QUOTENAME(SCHEMA_NAME(s.[schema_id])) + '.' + QUOTENAME(s.[name])

[line] = COALESCE(e.statement_line_number, 0),

[statement / synonym target / route / user/login] = COALESCE(

s.[base_object_name],

SUBSTRING(

m.[definition],

e.statement_offset_begin / 2,

e.statement_offset_end / 2 - e.statement_offset_begin / 2

) COLLATE CATALOG_DEFAULT,

r.[name],

'User : ' + p.[name] + ' / Login : ' + sp.[name]

)

FROM

sys.dm_db_uncontained_entities AS e

LEFT OUTER JOIN

sys.objects AS o

ON e.major_id = o.[object_id]

AND e.class = 1

LEFT OUTER JOIN

sys.sql_modules AS m

ON e.major_id = m.[object_id]

AND e.class = 1

LEFT OUTER JOIN

sys.synonyms AS s

ON e.major_id = s.[object_id]

AND e.class = 1

LEFT OUTER JOIN

sys.routes AS r

ON e.major_id = r.[route_id]

AND e.class = 19

LEFT OUTER JOIN

sys.database_principals AS p

ON e.major_id = p.principal_id

AND e.class = 4

LEFT OUTER JOIN

sys.server_principals AS sp

ON p.[sid] = sp.[sid];

결과

dbo.foo 저장프로시저가 DMV 에 의해서 3번 호출된것을 볼 수 있다. 2번은 동적 SQL 에 의해서, 또 한번은 존재하지 않는 개체 때문이다. (이것이 갑자기 만들어 졌을때 uncontained 개체가 될 수 도 있기 때문에 출력되었다.) 라인 넘버도 매우 유용하다. 그래서 해당 개체 정의를 확인 할 때, 검색하지 않고 바로 그 라인으로 가 볼 수 있다.

이 DMV 가 SYNONYM 을 잡아내지 못했음에 주의 해라. 해당 SYNONYM 은 3부분으로 외부 참조를 한다. 그러므로 SYNONYM 을 모두 찾아 조사해야 한다. 이 건은 Connect #622368 로 보고 되었으며 향 후 고쳐지길 바란다.

그리고 table 이 function 을 사용할 때 function 내부에서 uncontained 참조가 있어도 잡아내지 못한다. 이 건은 Connect 에 올리지 않았다. 이건 잘 일어나지 않는 이슈이기 때문이다. 그래도 알고 있어야 한다.

마지막으로, AutoCreatedLocal 이 뭔지 잘 모르겠다.
(Finally, I am not sure exactly what you can do about the AutoCreatedLocal route; this is the first time I've ever noticed this entity)

해당 기능은 Service Broker 기능으로 잘 모를수도 ...(김민석)

DMV 에서 Assembly , Type, Full-text Index, Database DDL Trigger 와 Audit Specification 등 다른 엔티티도 다루는데 여기서 다루지 않았다. 등이다. 하지만 만약 이러한 아이템을 사용한다면, 같이 살펴봐야 한다.

결론

버전을 올리면서 분명 다른 많은 방면에서 작업 해 연결서버나 데이터베이스 레벨의 에이전트 작업 같은것들도 되게 할것이며 이는 매우 흥미로울 것이다. 나는 이들의 연구 수준을 느낄 수 있었고, 첫번째 증인이 될것이다. 앞으로의 메이저 버전이 나올때 마다 해당기능의 발전을 살펴볼 것이다.

For more information on contained databases, there is a great Books Online topic in the works:

저작자표시 비영리 변경금지 (새창열림)

Posted by 보미아빠

, |

바다는 한방울 빗물을 마다하지 않는다.

카테고리 없음 / 2011. 8. 15. 04:32

나 스스로를 낮추고, 넓은 마음을 가지자.......
살며, 사랑하며, 배우며....

저작자표시 비영리 변경금지 (새창열림)

Posted by 보미아빠

, |

Aggregation WITH ROLLUP

카테고리 없음 / 2011. 8. 11. 00:22

WITH ROLLUP 에 해당하는 부분을 SQL 2005 에서는 구분할 방법이 없었으며(2008 에서 고쳐짐), 본래의 Aggregate 는 병렬이나 HASH, STREAM 으로 풀 수 있으나, ROLLUP 에 해당하는 aggregation 부분은 병렬화 해서 풀 수 없고, STREAM Aggreation 만 할 수 있다.

이 포스트에서, WITH ROLLUP 이 어떻게 aggregation 과 함께 어떻게 동작 하는지 다룬다. WITH ROLLUP 절은 하나의 문장으로 다중 레벨 aggregation 을 가능하게 해준다. 예를 들자면, 아래과 같은 가상 판매 데이터가 있다고 가정해보자. (이건 PIVOT 연산자 시리즈에서 다룬 데이터와 같은 데이터이다.)

CREATE TABLE Sales (EmpId INT, Yr INT, Sales MONEY)
INSERT Sales VALUES(1, 2005, 12000)
INSERT Sales VALUES(1, 2006, 18000)
INSERT Sales VALUES(1, 2007, 25000)
INSERT Sales VALUES(2, 2005, 15000)
INSERT Sales VALUES(2, 2006, 6000)
INSERT Sales VALUES(3, 2006, 20000)
INSERT Sales VALUES(3, 2007, 24000)

년별 총 판매량을 계산하기 위해 다음과 같은 간단한 aggregation 쿼리를 작성할 수 있다.

SELECT Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY Yr

예상된 바와 같이 , 이 쿼리의 결과는 다음과 같이 년별로 한개씩 3개의 행을 출력한다.

Yr          Sales
----------- ---------------------
2005        27000.00
2006        44000.00
2007        49000.00

쿼리 계획은 간단한 stream aggregate 이다.

|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))
       |--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1010]=COUNT_BIG([Sales].[Sales]), [Expr1011]=SUM([Sales].[Sales])))
            |--Sort(ORDER BY:([Sales].[Yr] ASC))
                 |--Table Scan(OBJECT:([Sales]))

년별 값 뿐만 아니라 전체 판매량도 구하고자 한다면, UNION ALL 쿼리를 쓸 수 있다.

SELECT Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY Yr
UNION ALL
SELECT NULL, SUM(Sales) AS Sales
FROM Sales

이 쿼리는 동작하고 올바른 쿼리 결과를 보인다.

Yr          Sales
----------- ---------------------
2005        27000.00
2006        44000.00
2007        49000.00
NULL        120000.00

하지만, 쿼리 계획은 두개의 scan 쿼리과 두개의 aggregations 로 동작한다. (하나는 년별 판매이고 하나는 전체 판매량이다.)

|--Concatenation
       |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1023]=(0) THEN NULL ELSE [Expr1024] END))
       |    |--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1023]=COUNT_BIG([Sales].[Sales]), [Expr1024]=SUM([Sales].[Sales])))
       |         |--Sort(ORDER BY:([Sales].[Yr] ASC))
       |              |--Table Scan(OBJECT:([Sales]))
       |--Compute Scalar(DEFINE:([Expr1010]=NULL))
            |--Compute Scalar(DEFINE:([Expr1009]=CASE WHEN [Expr1025]=(0) THEN NULL ELSE [Expr1026] END))
                 |--Stream Aggregate(DEFINE:([Expr1025]=COUNT_BIG([Sales].[Sales]), [Expr1026]=SUM([Sales].[Sales])))
                      |--Table Scan(OBJECT:([Sales]))

우리는 본래 쿼리에서 WITH ROLLUP 을 추가함으로 더 좋은 플랜을 만들수 있다.

SELECT Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY Yr WITH ROLLUP

이쿼리는 더 간단하게 쓸 수 있고, 단지 한번의 scan 으로 더 효율적인 쿼리플랜이다.

|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1005]=(0) THEN NULL ELSE [Expr1006] END))
       |--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1005]=SUM([Expr1007]), [Expr1006]=SUM([Expr1008])))
            |--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1007]=COUNT_BIG([Sales].[Sales]), [Expr1008]=SUM([Sales].[Sales])))
                 |--Sort(ORDER BY:([Sales].[Yr] ASC))
                      |--Table Scan(OBJECT:([Sales]))

이 쿼리의 최하단의 stream aggregate 는 원래의 ROLLUP 이 없을때의 stream aggregate 이다. 이건은 정상적인 aggregation 이다. 그리고 이것은 이 예제에서와 같이 stream aggregate 나 option (HASH GROUP) 을 추가함으로 hash aggregate 로도 풀 수 있다. 그리고 병렬 쿼리로도 풀 수 있다.

최상단의 stream aggregate 는 ROLLUP 을 위한 특별한 aggregate 이다. (불행히, SQL Server 2005 에서는 ROLLUP 을 수행하는 aggregation 을 구분할(분별할) 방법은 없다.) 이 이슈는 SQL Server 2008 의 그래픽 과 XML 실행계획에서 고쳐질 것이다. ROLLUP aggregate 는 항상 stream aggregate 이고 병렬화 할 수 없다. 간단한 예제에서, ROLLUP stream aggregate 는 판매 컬럼의 누적값을 유지해 단지 각각의 이전 aggregate 한 값을 돌려준다. 마지막 입력행을 계산한 후에, aggregate 는 마지막 sum 값을 계산해 한행 추가한다. All 값에 대한 컨셉이 SQL 에 없기 때문에, 마지막 Yr 컬럼은 NULL 값으로 셋팅 한다. GROUPING(Yr) 을 추가해 ROLLUP 행을 식별할 수 있다.

SELECT
      CASE WHEN GROUPING(Yr) = 0
            THEN CAST (Yr AS CHAR(5))
            ELSE 'ALL'
      END AS Yr,
      SUM(Sales) AS Sales
FROM Sales
GROUP BY Yr WITH ROLLUP

Yr Sales
----- ---------------------
2005 27000.00
2006 44000.00
2007 49000.00
ALL 120000.00

또 다중 ROLLUP 레벨을 하나의 쿼리로 계산할 수 있다. 예들들면, 직원별로 먼저 판매량을 계산하고 년별 직원의 판매량을 계산할 수 있다.

SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY EmpId, Yr WITH ROLLUP

EmpId       Yr          Sales
----------- ----------- ---------------------
1           2005        12000.00
1           2006        18000.00
1           2007        25000.00
1           NULL        55000.00
2           2005        15000.00
2           2006        6000.00
2           NULL        21000.00
3           2006        20000.00
3           2007        24000.00
3           NULL        44000.00
NULL        NULL        120000.00

이 쿼리 계획은, Yr(원글에서는 EmpId 로 써 있는데 오타인듯 함) 컬럼으로 그룹하지 않고 EmpId 와 Yr 컬럼으로 그룹한것을 제외하고는 이전 계획과 동일하다. 이전의 쿼리계획과 비슷하게, 이 쿼리 계획은 두개의 stream aggregate 를 포함한다. 맨 하단의 stream aggregate 는 정상적인 것이고 최상위 것은 ROLLUP 을 계산하기 위한 것이다. 이 ROLLUP 은 두개의 aggregate 누적값을 구한다. 하나는 한명의 직원의 전체 년도에 대한 값이고, 다른 하나는 전체 직원은 전체 년도의 전체 값이다. 이 테이블은 ROLLUP 어떻게 계산되는지 보여준다.

다음 포스트에서 WITH CUBE 절을 살펴보겠다. ROLLUP 과 비교해 기능과 구현의 차이점을 다룬다.

by Craig Freedman

저작자표시 비영리 변경금지 (새창열림)

Posted by 보미아빠

, |

listening 연습은 이걸로 -_- reading 하고 있었는데 좋은걸~ ^^

카테고리 없음 / 2011. 8. 10. 19:14

http://technet.microsoft.com/en-us/sqlserver/ff977043.aspx

만철군이 오랜만에 좋은 정보를 물어줬다....나의 절대 관심사...영어... ㅠ.ㅠ

저작자표시 비영리 변경금지 (새창열림)

Posted by 보미아빠

, |

oracle linked server

카테고리 없음 / 2011. 8. 10. 16:11

1. http://blog.daum.net/partsofmemory/5 (참고로, Tistory의 저 아저씨는 SQL Server 기술지원 엔지니어 였었다. )
위 아티크를 참고로 sqlclient 를 설치한다.
sqlplus 도 같이 설치한다.

2. ACL 을 확인한다.
- Network ACL
- SQL ACL

3. sqlplus without tnsname 으로 connection TEST 를 진행한다.
   sqlplus 니가받은아이디/니가받은패스워드@//대상서버아이피:포트/서비스네임
   예)sqlplus id/pass@//10.0.0.1:1234/a.minsouk.com
   이렇게 해서 안되면 다른 사용중인 클라이언트에서 id pass 가 정확한지 확인한다.

4. tnsname.ora 를 셋팅해서 좀더 편하게 접근하기
   대충 다음과 비슷한 폴더를 찾는다.
   C:\Oracle\xxxxx\product\11.2.0\client_1\network\admin
   여기서 admin 폴더가 없으면 만들어서 tnsname.ora 를 만들어 넣는다.

5. tnsname.ora 의 예제
aliasname =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.0.0.1)(PORT = 1234))
)
(CONNECT_DATA =
(SERVICE_NAME = a.minsouk.com)
)
)

6. 나머지는 열심히 구글링한다.
http://support.microsoft.com/kb/280106/ko

7. 나는 서버 담당자가 service_name 을 이상하게 알려줘서 고생했다. (a.minsouk.com 에서 a 만 알려줌 -_- 뒤는 다 같으니 생략하고 알려줌)

agent 시작계정이 nt service 에 sqlserveragent 뭐 이런식으로 되어 있으면

공급자에 in processes 설정을 해주던지 localsystem 으로 계정을 바꾸어 주어야 한다.

dllHost.exe 의 실행권한이 없어 에러가 나는듯

http://blogs.msdn.com/b/dataaccesstechnologies/archive/2010/08/19/permissions-needed-to-set-up-linked-server-with-out-of-process-provider.aspx

저작자표시 비영리 변경금지 (새창열림)

Posted by 보미아빠

, |

Hash Aggregate

카테고리 없음 / 2011. 8. 9. 02:32

Hash Aggregate

나의 이전 두 글에서, stream aggregate 연산자를 다루었다. Stream aggregate 는 scalar aggregtates 와 group by 컬럼에 인덱스가 제공되거나 정렬이 필요한 aggregations 에 좋다. (예를들어, order by 절이 있는경우)

다른 또 하나의 aggregation 연산자는 hash aggregate 이다. 이것은 hash join과 비슷하다. 이것은 정렬순서를 유지할 필요가 없고, 메모리가 필요하며, 블럭된다 (즉, 이것은 모든 입력행을 다 처리하기 전까지 어떠한 결과도 만들지 않는다.). Hash aggregate 는 매우 큰 입력셋에서 효과적이다.

Here is pseudo-code for the hash aggregate algorithm:(Hash aggregate 알고리즘을 위한 슈도 코드)

for each input row
begin
    calculate hash value on group by column(s)
    check for a matching row in the hash table
    if we do not find a match
      insert a new row into the hash table
    else
      update the matching row with the input row
end
output all rows in the hash table

stream aggregate 가 한번에 한개의 group 값을 계산하는것에 반해 hash aggregate 는 전체 그룹을 동시에 계산한다. 이러한 그룹을 저장하기위해 hash table 을 사용한다. 각각의 새로운 입력행에 대해 hash table 을 체크하고, 입력된 행이 기존 그룹이 있는지 체크한다. 존재하면 해당 그룹을 간단히 업데이트 한다. 존재하지 않으면 새로운 그룹을 만든다. 입력되는 행이 정렬되어 있지 않기 때문에 입력되는 모든 행은 어떠한 그룹에도 속할 수 있다. 그러므로, 입력된 모든 행의 처리가 끝나기 전에 어떠한 결과값도 출력할 수 없다.

Memory and spilling (메모리와 넘침)

hash join 처럼 hash aggregate 는 메모리를 필요로 한다. hash aggregate 를 실행하기 전에, SQL Server 는 cadinality 예측에 기반해 얼마나 많은 메모리가 쿼리 실행에 필요한지 예측하게 된다. hash join 에서 각각의 빌드 입력을 저장한다. 그러므로, 전체 필요한 메모리는 빌드 입력의 행수와 사이즈에 비례한다. join 의 출력 cadinality 와 join 되는 행수는 join 의 메모리 필요량과 상관 없다.

hash aggregate 에서, 각각의 그룹에 하나의 행을 입력한다. 그러므로, 전체 메모리 요구량은 출력 그룹의 수나 행수에 비례한다. 만약 unique 값이 몇개 없고 group 이 몇개 되지 않는다면 메모리를 작게쓰고, 많은 unique 값이 있고, 그룹 컬럼의 개수가 많으면, 많은 메모리를 필요로 한다.

그래서, 메모리가 부족하면 어떤일이 일어날까? hash join 에서 메모리가 부족하면, tempdb 를 사용하게 된다. 부분적으로 aggregated 된 결과를 포함해 하나 이상의 bucket 이나 파티션을 디스크로 넘긴다. 이러한 행에 대해 aggregate 를 다시 시도하지 않지만, 이것들을 다시 hash 해 여러 bucket 이나 파티션으로 나누어야 한다. 모든 입력 그룹에 대해 처리가 끝나면, 메모리 내 그룹의 결과를 출력한다. 그리고, 메모리에서 넘친 부분을 다시 읽어 알고리즘을 반복 실행한다. 메모리에서 넘친 행이 여러 파티션으로 나누어지면, 각각의 파티션 수를 작게해 이러한 알고리즘이 여러번 반복되는 비효율을 줄인다.

주의할 점은, 중복 행이 hash join 에서는 큰 문제이다. 이것은 hash bucket 의 사이즈를 다르게 해 데이터 스큐(skew)를 유발하고, 크기가 고정된 작은 부분으로 나누기 어렵게 한다. 그러나 이러한 중복 행이 hash aggregation 에서는 유리하다. 왜냐하면, 하나의 그룹으로 축약되기 때문이다.

Example (예제)

옵티마이저는 테이블이 많은행과 많은 그룹을 가지면 hash aggregation 을 선호한다. 예를들어 100개의 행과 10개 그룹이 있으면 stream aggregate 를 한다.

create table t (a int, b int, c int)

set nocount on
declare @i int
set @i = 0
while @i < 100
begin
insert t values (@i % 10, @i, @i * 3)
set @i = @i + 1
end

select sum(b) from t group by a
|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))
       |--Stream Aggregate(GROUP BY:([t].[a]) DEFINE:([Expr1010]=COUNT_BIG([t].[b]), [Expr1011]=SUM([t].[b])))
            |--Sort(ORDER BY:([t].[a] ASC))
                 |--Table Scan(OBJECT:([t]))

그러나, 1000개의 행과 100개의 그룹이 있으면 hash aggregate 를 한다.

truncate table t

declare @i int
set @i = 100
while @i < 1000
begin
insert t values (@i % 100, @i, @i * 3)
set @i = @i + 1
end

select sum(b) from t group by a
|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))
|--Hash Match(Aggregate, HASH:([t].[a]), RESIDUAL:([t].[a] = [t].[a]) DEFINE:([Expr1010]=COUNT_BIG([t].[b]), [Expr1011]=SUM([t].[b])))
|--Table Scan(OBJECT:([t]))

group by 컬럼을 hash 했다. 나머지 비교 연산자(residual predicate)는 hash aggreate 가 hash 값 충돌이 발생 할 때 입력행을 hash table 에서 비교 할 때 사용된다.

hash aggregate 는 정렬이 필요없는 것도 볼 수 있다. 정렬은 hash aggregate 보다 더 많은 메모리를 필요로 한다. 왜냐하면,정렬은 1000개의 행을 정렬해야 하고 hash aggregate 는 100 개의 그룹값만 가지면 되기 때문이다. 하지만, 쿼리에서 order by 를 명시적으로 요구하면, 다시 stream aggregate 를 하는 것을 볼 수 있다.

select sum(b) from t group by a order by a
|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))
       |--Stream Aggregate(GROUP BY:([t].[a]) DEFINE:([Expr1010]=COUNT_BIG([t].[b]), [Expr1011]=SUM([t].[b])))
            |--Sort(ORDER BY:([t].[a] ASC))
                 |--Table Scan(OBJECT:([t]))

만약, 테이블이 충분히 크고, 그룹의 수가 작으면, 옵티마이저는 hash aggregate 를 수행하고 결과를 다시 정렬하는 것이 비용 효율적이다 라고 판단한다. 예를 들자면, 10000 개의 행을 가지고 100개의 그룹을 가지면, 10000개의 행을 정렬하기보다 옵티마이저는 100개의 그룹을 hash 하고 정렬하게 된다.

truncate table t

set nocount on
declare @i int
set @i = 0
while @i < 10000
begin
    insert t values (@i % 100, @i, @i * 3)
    set @i = @i + 1
end

select sum(b) from t group by a order by a
|--Sort(ORDER BY:([t].[a] ASC))
       |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))
            |--Hash Match(Aggregate, HASH:([t].[a]), RESIDUAL:([t].[a] = [t].[a]) DEFINE:([Expr1010]=COUNT_BIG([t].[b]), [Expr1011]=SUM([t].[b])))
                 |--Table Scan(OBJECT:([t]))

Distinct

stream aggregate 와 유사하게 hash aggregate 가 distinct 연산에도 사용될 수 있다.

select distinct a from t
|--Hash Match(Aggregate, HASH:([t].[a]), RESIDUAL:([t].[a] = [t].[a]))
|--Table Scan(OBJECT:([t]))

좀 더 흥미있는 것은

select sum(distinct b), sum(distinct c) from t group by a
|--Hash Match(Inner Join, HASH:([t].[a])=([t].[a]), RESIDUAL:([t].[a] = [t].[a]))
       |--Compute Scalar(DEFINE:([t].[a]=[t].[a]))
       |    |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1018]=(0) THEN NULL ELSE [Expr1019] END))
       |         |--Hash Match(Aggregate, HASH:([t].[a]), RESIDUAL:([t].[a] = [t].[a]) DEFINE:([Expr1018]=COUNT_BIG([t].[b]), [Expr1019]=SUM([t].[b])))
       |              |--Hash Match(Aggregate, HASH:([t].[a], [t].[b]), RESIDUAL:([t].[a] = [t].[a] AND [t].[b] = [t].[b]))
       |                   |--Table Scan(OBJECT:([t]))
       |--Compute Scalar(DEFINE:([t].[a]=[t].[a]))
            |--Compute Scalar(DEFINE:([Expr1005]=CASE WHEN [Expr1020]=(0) THEN NULL ELSE [Expr1021] END))
                 |--Hash Match(Aggregate, HASH:([t].[a]), RESIDUAL:([t].[a] = [t].[a]) DEFINE:([Expr1020]=COUNT_BIG([t].[c]), [Expr1021]=SUM([t].[c])))
                      |--Hash Match(Aggregate, HASH:([t].[a], [t].[c]), RESIDUAL:([t].[a] = [t].[a] AND [t].[c] = [t].[c]))
                           |--Table Scan(OBJECT:([t]))

이 실행계획은 논리적으로 저번 글에서 본 stream aggregate 와 같다. 하지만 정렬과, stream aggregate 와 merge join 한 대신에 hash aggregate 와 hash join 을 사용했다. 두개의 hash aggregate 가 중복을 제거하고 (하나는 distinct b 다른 하나는 distinct c) 두개의 hash aggregate 를 사용해 두개의 sum 값을 구했다. hash join 은 두 결과를 붙여서 마지막 결과를 만들었다.

Hints

"order group" 과 "hash group" 쿼리 힌트를 기술해 stream aggregate 와 hash aggregate 를 강제화 할 수 있다. 이러한 힌트는 전체 쿼리에서 모든 aggregation 연산자에 영향을 준다.

select sum(b) from t group by a option(order group)
select sum(b) from t group by a option(hash group)

Sql profiler

SQL 프로파일러를 사용해 hash join 이나 hash aggregate 메모리 초과를 검출 할 수 있다. ("Error and Warnings" 이벤트 클레서에서) "Hash Warning" 이벤트. 메모리 초과는 I/O 를 유발하고 성능에 좋지않는 영향을 준다. BOL 에서 이 이벤트에 대한 더 많은 정보를 얻을 수 있다.

by Craig Freedman

저작자표시 비영리 변경금지 (새창열림)

Posted by 보미아빠

, |

Table Value Function 의 종류

카테고리 없음 / 2011. 8. 8. 17:51

TVF 는 2가지가 있습니다.

inline TVF (Table Value Function) 과 Multi-statement TVF 로 나뉘며,

1. inlene TVF 는 parameterized view 와 동일하며,

2. multi-statement TVF 는 Table 변수를 지정하고, 여러 문장으로 나눈 while 이라던지 필요한 연산을 하고 결과값을 table 변수에 넣어서 출력 합니다. 그럼 성능이 좋을까요? 성능이 좋을리 없겠죠 table 변수는 통계정보가 없기 때문에, SQL 엔진은 cadinality 정보를 얻어 optimize 과정에서 효율화를 못하게 됩니다. 상대적으로 비효율적인 플랜은 나쁜 성능으로 이어지기도 합니다. 고의로 이러한 과정이 필요한 경우도 있을 것이고 그것이 성능이 좋아지는 경우도 있겠지만 이 두 차이를 이해하신다면, 경우에 따라서 어떤것을 써야 할 지 선택 할 수 있으리라 봅니다.

TVF 가 주요하게 성능을 향상시키는 경우는 배열 형태의 변수를 다른 프로시저로 전달해야 하는데, 하나하나 procedure 를 호출하는 것 보다 한번에 다 담아서 한번에 Call 하게 된다면 엄청난 성능향상이 있겠죠 이런 여러 차이점이 있으니 이것이 이것보다 좋아요 라고 말하기 보다는 It depends 라고 말하게 되지요...

TVF 가 특별하게 resultset cache 기능이 없는한 TVF 로 만드는 것은 의미가 없어 보입니다. 프로시저로 통일하고 적절하게 table 형태의 변수가 필요한 경우에만 사용

저작자표시 비영리 변경금지 (새창열림)

Posted by 보미아빠

, |

Stream Aggregate

카테고리 없음 / 2011. 8. 8. 12:19

쿼리의 실행 방법을 이해하고 필요없이 사용된 many-to-many join 을 query 재작성을 통해 one-to-many 조인을 이용해 값을 구하는 방법을 보여준다. one-to-many 조인은 temp 를 사용하지 않고, many-to-many join 은 temp 를 사용하기에 결과 양이 많아지면 i/o 성능에 문제가 생기기도 한다. 그러나, 실행계획을 이해하라고 글을 번역해 놓은것이지 성능을 높이기 위해 이러한 방법을 쓰라는 것은 아니다. 성능 튜닝은 다른 방법이 더욱 효율적일 수 있다.

Stream Aggregate

GROUP BY 절이 있을때, 보통 aggregate 를 수행하기 위해 SQL Server 는 두개의 물리 연산자를 가지고 있다. 하나는 저번주

에 보았던 stream aggregate 이다. 나머지 하나는 hash aggregate 이다. 이번 포스트에서는 stream aggregate 의 동작을 자

세히 살펴 보겠다.

The algorithm (알고리즘) -_- 이런건 번역하는게 이상하지만....

Stream aggregate 는 group by 컬럼(들)의 정렬된(sort) 데이터에 의존한다. 만약 하나 이상의 컬럼들이 그룹되면, 해당 모

든 컬럼들의 어떤 정렬 순서를 선택한다. 예를들면, 컬럼 a b 를 그룹하면, (a, b) 나 (b, a) 로 정렬할 수 있다. 이 정렬은

group by 컬럼을 위한 같은값을 가지는 행의 집합이 서로 인접하게 만들어 준다.

이것은 stream aggregate 알고리즘을 위한 슈도코드이다.

clear the current aggregate results
clear the current group by columns
for each input row
begin
    if the input row does not match the current group by columns
      begin
        output the aggregate results
        clear the current aggregate results
        set the current group by columns to the input row
      end
    update the aggregate results with the input row
end

예를들어 sum 값을 계산한다고 가정하면, 각각의 입력행에 대해 입력된 행이 현재 그룹에 속하면, (즉, 입력행의 group by

컬럼(들)이 이전행의 group by 결과와 같으면) 현재 계산중인 total 값에 적당한 값을 sum 해 업데이트 한다. 입력값이 새

로운 그룹에 속하면 (즉, 입력행의 group by 컬럼(들)이 이전행의 group by 결과와 같지 않으면) 현재의 sum 값을 출력하고

sum 값을 0 으로 초기화 하고 새로운 그룹값 계산을 시작한다.

Simple examples (간단한 예제)

create table t (a int, b int, c int)

select sum(c) from t group by a, b
|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))
|--Stream Aggregate(GROUP BY:([t].[b], [t].[a]) DEFINE:([Expr1010]=COUNT_BIG([t].[c]), [Expr1011]=SUM([t].

[c])))
|--Sort(ORDER BY:([t].[b] ASC, [t].[a] ASC))
|--Table Scan(OBJECT:([t]))

이것은 aggregate 하기전에 sort 가 필요하다는 점을 제외하면, 우리가 예전에 봤던 scalar aggregate SUM 과 기본적으로 같

은 플랜이다. (scalar aggregate 를 전체 행에 대해 하나의 큰 그룹으로 생각 할 수 있다. 그러므로 하나의 scalar

aggregate 는 행을 다른 그룹으로 넣을 필요가 없으므로, sort 가 필요 없다.)

Stream aggregate 는 입력된 행의 정렬순서를 유지한다. 그래서, 만약 group by 컬럼으로 정렬을 요청하거나 group by 컬럼

의 일부로 정렬을 요구하면, 다시 정렬할 필요가 없다.

select sum(c) from t group by a, b order by a

|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))
|--Stream Aggregate(GROUP BY:([t].[a], [t].[b]) DEFINE:([Expr1010]=COUNT_BIG([t].[c]), [Expr1011]=SUM([t].

[c])))
|--Sort(ORDER BY:([t].[a] ASC, [t].[b] ASC))
|--Table Scan(OBJECT:([t]))

위 실행계획은 바로위 실행계획에 비해 정렬컬럼만 반대로 바뀌었다. 이전 쿼리는 "(a,b)" 나 "(b,a)"로 정렬을 기술하지 않

았다. 이번에는 쿼리에 order by 절이 있고 컬럼 a 로 정렬을 요구했으므로 쿼리에서 컬럼 a 로 정렬을 먼저했다.

만약 우리가 적절한 인덱스가 있다면, 정렬이 필요없다.

create clustered index tab on t(a,b)

select sum(c) from t group by a, b
|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))
|--Stream Aggregate(GROUP BY:([t].[a], [t].[b]) DEFINE:([Expr1010]=COUNT_BIG([t].[c]), [Expr1011]=SUM([t].

[c])))
|--Clustered Index Scan(OBJECT:([t].[tab]), ORDERED FORWARD)

select distinct

다음 쿼리를 생각해 보자

select sum(distinct b) from t group by a
각각의 group 에서 컬럼 b 의 중복값을 제거가 필요하다. 이전 포스트에서, 이러한 방법중 하나인 sort distinct 를 이용하는 방법을 보았다. 하지만 적절한 인덱스가 있으면, stream aggregate 를 이용해서 중복값을 제거 할 수 있다.

|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))
       |--Stream Aggregate(GROUP BY:([t].[a]) DEFINE:([Expr1010]=COUNT_BIG([t].[b]), [Expr1011]=SUM([t].[b])))
            |--Stream Aggregate(GROUP BY:([t].[a], [t].[b]))
                 |--Clustered Index Scan(OBJECT:([t].[tab]), ORDERED FORWARD)

최상위의 aggregate 를 하는동안 최하위의 stream aggregate 는 중복값을 제거한다.

Multiple distincts (다중 distinct)

마지막으로 다음 쿼리를 생각해 보자

select sum(distinct b), sum(distinct c) from t group by a

|--Merge Join(Inner Join, MANY-TO-MANY MERGE:([t].[a])=([t].[a]), RESIDUAL:([t].[a] = [t].[a]))
       |--Compute Scalar(DEFINE:([t].[a]=[t].[a]))
       |    |--Compute Scalar(DEFINE:([Expr1005]=CASE WHEN [Expr1018]=(0) THEN NULL ELSE [Expr1019] END))
       |         |--Stream Aggregate(GROUP BY:([t].[a]) DEFINE:([Expr1018]=COUNT_BIG([t].[c]), [Expr1019]=SUM([t].[c])))
       |              |--Sort(DISTINCT ORDER BY:([t].[a] ASC, [t].[c] ASC))
       |                   |--Clustered Index Scan(OBJECT:([t].[tab]))
       |--Compute Scalar(DEFINE:([t].[a]=[t].[a]))
            |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1020]=(0) THEN NULL ELSE [Expr1021] END))
                 |--Stream Aggregate(GROUP BY:([t].[a]) DEFINE:([Expr1020]=COUNT_BIG([t].[b]), [Expr1021]=SUM([t].[b])))
                      |--Stream Aggregate(GROUP BY:([t].[a], [t].[b]))
                           |--Clustered Index Scan(OBJECT:([t].[tab]), ORDERED FORWARD)

이전 글에서 살펴본 다중 scalar distinct 예제에서 살펴본 것과 같이, 이 쿼를를 두개의 부분으로 나누어 볼 수 있다. 각 distinct set 에서 하나는, distinct 한 c 값을 구하기 위해 sum(distinct c) 는 정렬이 필요하고, 반면 sum(distinct b) 값을 구할때는 clustered index 와 stream aggregate 를 이용해 정렬없이 구했다. 그리고 마지막 결과를 구하기 위해 group by 컬럼 a 에 대한 각각의 sum 값을 merge join 했다. merge join 을 사용한 것은 각각의 두 입력이 이미 group by 컬럼에 대해 이미 정렬되어 있기 때문이다. (compute scalar 연산인 [t].[a] = [t].[a]”는 내부 목적으로 필요하므로 무시하면 된다.)

aggregate 값은 unique를 보장하기 때문에 one-to-many merge join 이 사용되어야 하고, many-to-many 방법을 사용할 필요가 없다. 이것은 약간의 성능 문제이지 정확도 문제는 아니다. 쿼리를 명시적 join 으로 다시 작성하면 one-to-many join 을 얻을 수 있다.

select sum_b, sum_c
from
(select a, sum(distinct b) as sum_b from t group by a) r
join
(select a, sum(distinct c) as sum_c from t group by a) s
on r.a = s.a

|--Merge Join(Inner Join, MERGE:([t].[a])=([t].[a]), RESIDUAL:([t].[a]=[t].[a]))
       |--Compute Scalar(DEFINE:([Expr1009]=CASE WHEN [Expr1020]=(0) THEN NULL ELSE [Expr1021] END))
       |    |--Stream Aggregate(GROUP BY:([t].[a]) DEFINE:([Expr1020]=COUNT_BIG([t].[c]), [Expr1021]=SUM([t].[c])))
       |         |--Sort(DISTINCT ORDER BY:([t].[a] ASC, [t].[c] ASC))
       |              |--Clustered Index Scan(OBJECT:([t].[tab]))
       |--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1022]=(0) THEN NULL ELSE [Expr1023] END))
            |--Stream Aggregate(GROUP BY:([t].[a]) DEFINE:([Expr1022]=COUNT_BIG([t].[b]), [Expr1023]=SUM([t].[b])))
                 |--Stream Aggregate(GROUP BY:([t].[a], [t].[b]))
                      |--Clustered Index Scan(OBJECT:([t].[tab]), ORDERED FORWARD)

Next ...
다음번 포스트에는 다른 aggregation 연산자에 대해서 살펴보겠다. (hash aggregate)

by Craig Freedman

저작자표시 (새창열림)

Posted by 보미아빠

, |

«이전 1 ··· 40 41 42 43 44 45 46 ··· 54 다음»

보미네

카테고리

달력

공지사항

태그목록

최근에 올라온 글

'보미아빠, 석이'에 해당되는 글 532건

dbcc memorystatus 의 분석

쿼리 분리가 필요한 경우?

[NF 2011 Denali] Contained Database

바다는 한방울 빗물을 마다하지 않는다.

Aggregation WITH ROLLUP

listening 연습은 이걸로 -_- reading 하고 있었는데 좋은걸~ ^^

oracle linked server

Hash Aggregate

Table Value Function 의 종류

Stream Aggregate

최근에 달린 댓글

최근에 받은 트랙백

글 보관함

링크

티스토리툴바