보미네 :: 보미네

« » 2025.11
일	월	화	수	목	금	토
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30

Dynamic Seeks and Hidden Implicit Conversions

카테고리 없음 / 2025. 9. 22. 11:39

https://www.sql.kiwi/2012/01/dynamic-seeks-and-hidden-implicit-conversions/

Dynamic Seeks and Hidden Implicit Conversions

SQL Server can generate dynamic seek plans, where a seek is driven by Constant Scan and Merge Interval operators. The plans use internal functions LikeRangeStart, LikeRangeEnd, and LikeRangeInfo. Watch out for hidden conversions using GetRangeThroughConver

www.sql.kiwi

Introduction

A LIKE predicate with only a trailing wildcard can usually use an index seek, as the following AdventureWorks sample database query shows:

SELECT 
    P.[Name]
FROM Production.Product AS P
WHERE 
    P.[Name] LIKE N'D%';

SQL Server determines a covering range, which depends on the collation. It seeks the string index using the range as the start and end points of a partial scan, and applies the original LIKE condition as a residual predicate. More specifically, the Storage Engine seeks the index to locate rows in the covering range, and the Query Processor applies the residual predicate (the LIKE) to the rows it receives back.

Dynamic Seeks

What if the LIKE search term is in a variable?

DECLARE 
    @Like nvarchar(50) = N'D%';

SELECT 
    P.[Name]
FROM Production.Product AS P
WHERE 
    p.[Name] LIKE @Like;

SQL Server can still perform a seek here, but it needs to determine the covering seek range for the search term at execution time, not at compilation time:

The plan now contains an extra Constant Scan, a Compute Scalar and a Nested Loops Join. These operators are interesting because they have zero cost estimates: No CPU cost, no I/O cost, nothing. These new operators are purely architectural in nature. They are a workaround for the fact that SQL Server cannot currently perform a dynamic seek entirely within the Index Seek operator itself.

The optimizer’s output is:

*** Output Tree: (trivial plan) ***
    PhyOp_Filter
        PhyOp_Range TBL: Production.Product
            ScaOp_Intrinsic like
                ScaOp_Identifier QCOL: [P].Name
                ScaOp_Identifier COL: @Like 
                ScaOp_Const TI(nvarchar ML=2)
                ScaOp_Identifier COL: ConstExpr1002 
                ScaOp_Identifier COL: ConstExpr1003 
                ScaOp_Identifier COL: ConstExpr1004

The additional operators needed for dynamic seek execution are added after query optimization is complete, so they cannot affect plan choices. Since there is no more cost-based analysis to do, the extra machinery is given costs of zero.

Execution plan

Looking at the finished execution plan:

The Constant Scan produces a single in-memory row with no columns.
The Compute Scalar defines expressions to describe the covering seek range (using the runtime value of the @Like variable).
The Nested Loops Join drives the seek using the computed range information (provided as correlated values).

The upper tooltip shows that the Compute Scalar uses three internal functions, LikeRangeStart, LikeRangeEnd, and LikeRangeInfo.

The first two functions describe the range as an open interval. The third function returns a set of flags encoded in an integer. These are used internally to define certain seek properties for the Storage Engine.

The lower tooltip shows the seek on the open interval described by the result of LikeRangeStart and LikeRangeEnd, together with the application of the residual predicate LIKE @Like.

More Dynamic Seeks

Something very similar occurs in plans that use IN or OR with variables:

DECLARE 
    @1 integer = 320,
    @2 integer = 325,
    @3 integer = 330;

SELECT
    P.[Name]
FROM Production.Product AS P
WHERE 
    P.ProductID IN (@1,@2,@3);

Now we have three ranges: One for each of the variables in the original query. The Compute Scalar operators again define three columns containing the start and end of the range, and the associated informational flags (previously seen as a result of the LikeRangeInfo function). This time, we see the decimal representation of these flags, which happens to be 62 for an equality comparison.

The IN expands to ProductID = @1 OR ProductID = @2 OR ProductID = @3, so each of the ‘ranges’ here is in fact a single value. For this example, the start and end range values are therefore the same in each Compute Scalar. The three dynamic ranges are concatenated, and sorted so any overlapping ranges appear next to each other.

The Merge Interval collapses these intervals into one or more disjoint (non-overlapping) ranges. This is important, because the three variables might, for example, all contain the same value, and it would be incorrect to return that value three times.

For each disjoint range produced, the Nested Loops Join drives a new seek of the Clustered Index. The overall effect is that an arbitrary number of possibly overlapping ranges are computed, merged, and then used to drive one or more seek operations.

The final result of the query will be the combination of all the seek results.

Hidden Conversions

The following example contains a table with datetime2 values, and a query with a expression that at first sight seems unlikely to be able to seek on an index. The variable is typed as date, and there is a CONVERT function applied to the datetime2 column:

DECLARE @Example table
    (date_time datetime2 PRIMARY KEY);

INSERT @Example 
    (date_time) 
VALUES
    ('20110101 12:34:56');

DECLARE
    @date date = '2011-01-01';

SELECT * 
FROM @Example AS E
WHERE
    @date = CONVERT(date, E.date_time);

Nevertheless, an execution plan that uses a seek can be produced:

In this case, neither SSMS nor Plan Explorer will show the contents of the Compute Scalar. We have to open the XML representation of the execution plan to see the three familiar expressions, wrapped in a ValueVector (just a fancy container for multiple expressions).

Another internal function, GetRangeThroughConvert, is responsible for determining the range of datetime2 values covered by the date variable @date, and the informational flags needed. In the same way the engine works out covering ranges for some LIKE predicates, this function determines ranges where certain problematic type conversions are required.

Otherwise, the machinery is the same: A range description is defined by the Compute Scalar, and the Nested Loops Join driving a seek using those values.

More Hidden Conversions

There is another related internal function used when the Query Processor needs to determine a range for a comparison between different data types.

This example returns rows based on a greater-than-or-equal comparison between date column values and the datetime return value of the GETDATE() intrinsic function:

DECLARE @Example table
    (col1 date PRIMARY KEY);

SELECT * 
FROM @Example AS E
WHERE 
    E.col1 >= DATEADD(DAY, -7, GETDATE());

Again, the graphical plan cannot display the contents of the ValueVector, so we have to dig into the XML.

The function:

Evaluates the DATEADD(GETDATE()) expression
Computes the open-interval start point of a date range accounting for the conversion from datetime to date
Specifies NULL as the end of the range (since this is a >= comparison, there is no end value).

The flags value in this case is 22 (representing a >= seek operation).

Everything All At Once

This last example features all sorts of data type sloppiness, resulting in an execution plan that uses:

GetRangeThroughConvert on the string expression
GetRangeThroughConvert on the result of
GetRangeWithMismatchedTypes applied to the result of the GETDATE function.

The whole thing is then wrapped in a dynamic seek with the Merge Interval enforcing the (annoying) BETWEEN requirement that the first parameter must be less than or equal to the second.

See if you can work out all the conversions necessary for this query, using the rules of data type precedence.

It is really quite impressive that this example of lazy T-SQL coding still results in an index seek, don’t you think?

DECLARE @Example table
    (col1 datetime PRIMARY KEY);

SELECT * 
FROM @Example AS E
WHERE 
    CONVERT(date, E.col1) BETWEEN '20000101' AND GETDATE();

Conclusion

SQL Server works quite hard sometimes to produce index seeks where they might seem unlikely. This is a good thing, and it would be great to see this capability extended further in future. The downside is that this extra effort means you are less likely to see an Index Scan when you have done something daft with data types.

Why is this a bad thing if you get a seek anyway?

The problem is that these hidden implicit conversions can result in inaccurate cardinality and distribution estimates at any stage of the plan. So, even if you get a seek, the plan might be way off overall.

If that isn’t persuasive enough, consider this: Will having hidden nested range calculations improve your chances of getting a good query plan? Probably not, no.

Be very aware of data types in general, and in particular of the data types returned by system functions and T-SQL expressions. If in doubt, use SELECT INTO to materialize the result of an expression or query, and check the types of the columns produced.

Note: If you have any scripts that trawl the plan cache looking for implicit conversions (CONVERT_IMPLICIT), you might want to look into updating them to check for these conversions too. Remember that not all conversions are bad though.

Implicit Conversions by Craig Freedman.
More on Implicit Conversions also by Craig Freedman.
Join Performance, Implicit Conversions, and Residuals by me.
Trigram Wildcard Search in SQL Server also by me.

--*****************************************************************
-- 아래 이슈는 변수로 들어오는 인자를 테이블 컬럼타입과 일치시켜 varchar(10) 으로 변환해주면 효율적으로 처리 가능합니다. 
--*****************************************************************

-- 테이블
CREATE TABLE dbo.RandomData
(
    Id INT IDENTITY(1,1) PRIMARY KEY,
    RandomString VARCHAR(10) NOT NULL
);
GO

-- 데이터 삽입
INSERT INTO dbo.RandomData(RandomString)
SELECT LEFT(REPLACE(CONVERT(VARCHAR(36), NEWID()),'-','') 
            + REPLACE(CONVERT(VARCHAR(36), NEWID()),'-',''), 10)
FROM (
    SELECT TOP (100000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects a
    CROSS JOIN sys.all_objects b
) AS x;
GO 3

-- 인덱스 
create nonclustered index nc_RandomData_01 on RandomData (RandomString)
go

-- i/o 통계
set statistics io , time on 
go


-- 전체를 count 시 
select count(*) from RandomData
-----------
300000

(1개 행 적용됨)
--*****************************************************************
테이블 'RandomData'. 스캔 수 1, 논리적 읽기 1494, 실제 읽기 0, 페이지 서버 읽기 0, 미리 읽기 읽기 0, 페이지 서버 미리 읽기 읽기 0, lob 논리적 읽기 0, lob 실제 읽기 0, lob 페이지 서버 읽기 0, lob 미리 읽기 읽기 0, lob 페이지 서버 미리 읽기 읽기 0.
--*****************************************************************

-- 잘동작하는 경우
select count(*) from RandomData where RandomString = N'z'
select count(*) from RandomData where RandomString = N'aaaaaaaaaa'
select count(*) from RandomData where RandomString = N'zzzzzzzzzz'
select count(*) from RandomData where RandomString = N'          '

-----------
0

(1개 행 적용됨)
--*****************************************************************
테이블 'RandomData'. 스캔 수 1, 논리적 읽기 3, 실제 읽기 0, 페이지 서버 읽기 0, 미리 읽기 읽기 0, 페이지 서버 미리 읽기 읽기 0, lob 논리적 읽기 0, lob 실제 읽기 0, lob 페이지 서버 읽기 0, lob 미리 읽기 읽기 0, lob 페이지 서버 미리 읽기 읽기 0.
--*****************************************************************

-- GetRangeThroughConvert 함수값이 비효율 적인 경우
select count(*) from RandomData where RandomString = N'0'
select count(*) from RandomData where RandomString = N'a'


-----------
0

(1개 행 적용됨)
--*****************************************************************
테이블 'RandomData'. 스캔 수 1, 논리적 읽기 93, 실제 읽기 0, 페이지 서버 읽기 0, 미리 읽기 읽기 0, 페이지 서버 미리 읽기 읽기 0, lob 논리적 읽기 0, lob 실제 읽기 0, lob 페이지 서버 읽기 0, lob 미리 읽기 읽기 0, lob 페이지 서버 미리 읽기 읽기 0.
--*****************************************************************

-- GetRangeThroughConvert 함수값이 매우 비효율 적인 경우
select count(*) from RandomData where RandomString = N''
select count(*) from RandomData where RandomString = N'0123456789012'

   CPU 시간 = 0ms, 경과 시간 = 0ms.

-----------
0

(1개 행 적용됨)
--*****************************************************************
테이블 'RandomData'. 스캔 수 1, 논리적 읽기 1494, 실제 읽기 0, 페이지 서버 읽기 0, 미리 읽기 읽기 0, 페이지 서버 미리 읽기 읽기 0, lob 논리적 읽기 0, lob 실제 읽기 0, lob 페이지 서버 읽기 0, lob 미리 읽기 읽기 0, lob 페이지 서버 미리 읽기 읽기 0.
--*****************************************************************

Posted by 보미아빠

, |

SQL Server Mirroring Vs. AG 스레드 사용 방식 차이

카테고리 없음 / 2025. 8. 11. 17:14

스레드 구조 차이
Database Mirroring: 데이터베이스마다 독립적인 전용 워커 스레드를 사용하여 데이터 전송, 로그 적용, 세션 관리 등을 수행한다.
AlwaysOn AG: 모든 데이터베이스가 HADR Thread Pool(hadr_thread_pool_worker)을 공유하여 스레드를 사용한다. 따라서 데이터베이스 개수가 증가하더라도 스레드 수가 1:1로 비례 증가하지 않는다.

DB 수 증가에 따른 영향
Mirroring은 데이터베이스 수가 늘어날수록 필요한 워커 스레드 수도 선형적으로 증가한다. 스레드 수가 많아지면 SQL Server의 max worker threads 한계에 영향을 줄 수 있으며, CPU 스케줄링과 병렬 처리에도 부하가 커진다.

메모리 사용량
SQL Server에서 워커 스레드 하나당 약 2MB의 스택 메모리가 예약된다. 이 메모리는 max server memory 설정에 포함되므로 buffer pool 이나 plan cache 등 메모리 용량 산정 시 고려해야 한다. 1000개의 max worker thread를 설정하면 max server mermory 가 4GB 이라면 2GB 가 thread 메모리로 사용될 수 있다. (최대값으로 가정 하는게 좋고 실제 동작을 보면, worker thread 개수는 자동으로 올라가고 내려간다.)

결론 및 설계 포인트
Mirroring 환경에서는 DB 개수가 많을수록 워커 스레드 증가 → 메모리 사용량 증가 → max server memory를 설정할 때 이 부분을 감안해야 한다. AG 환경은 Thread Pool을 사용하므로 DB 개수 증가의 영향이 상대적으로 적다. 따라서 이중화 솔루션 설계 시 스레드 구조와 메모리 소비 패턴을 반영한 용량 계획이 필요하다.

추가 고려사항

* 데이터베이스 미러링(DM)은 DB당 단일 스레드만 쓰지 않음. 일반적으로 DB당 약 2개 이상의 스레드를 소비하고, 미러(secondary) 쪽에서는 (Enterprise의 경우) 병렬 REDO(roll‑forward) 스레드가 추가로 할당될 수 있음
* 스레드 1개당 스택 메모리(64비트 환경) 는 대략 2 MB 이므로(스택/MTL 관련), 스레드 증가가 메모리(및 max server memory 계산)에 직접 영향을 줌
* DM 미러에서의 병렬 REDO(Enterprise) 는 “CPU 4개당 1개의 REDO 스레드(대략)” 로 동작하는 식으로 최적화 됨

스크린샷1 (142개 미러링 데이터베이스가 있을때 Primary)

스크린샷2 (142개 미러링 데이터베이스가 있을때 Secondary)

결론?

* DM에서 미러링 대상서버의 max workerthread 숫자 조절 필요

* 제발 데이터베이스를 너무 많이 만들지 말자 (살려주세요 이건 아니잖아요....)

추가 전문 아티클

https://www.dbi-services.com/blog/sql-server-alwayson-number-of-databases-and-the-hadr-worker-pool/

SQL Server AlwaysOn: number of databases and the hadr worker pool

www.dbi-services.com

Often, when I’m in charge to implement an SQL Server AlwaysOn infrastructure with availability groups, customers ask me if it exists some best practices and recommendations about the maximum number of databases in an availability group planning. In fact, we have to keep in mind the relation between databases and the SQL Server worker pool in this architecture. Remember that the SQL Server AlwaysOn and availability group feature is an extension of the SQL Server mirroring and we retrieve the same concept of worker thread usage with a different architecture: Mirroring feature dedicates worker threads for each database compared to SQL Server AlwaysOn that uses requests queue and worker pool (aka HADR pool). Let’s see how the HADR worker pool behaves and its tight relationship with the databases.

First of all, let’s take a look at the Microsoft documentation here – “Prerequisites, Restrictions, and Recommendations for AlwaysOn Availability Groups – and let’s have a focus on the “Thread Usage by Availability Groups” section. After reading the content we can notice that the number of databases is an important part of the minimum hadr pool size calculation among others factors. Others factors are the following:

Number of replicas
Log capture thread
Log send queue thread
Message handler thread

Another point to keep in mind is that the size of the HADR worker pool is capped by the “max worker thread” parameter as shown below:

Hadr pool size max = max worker threads – 40

I also encourage you to read the excellent article of Bob Dorr that provides a formula to estimate the minimum size of the HADR worker pool. Once again we can notice that the number of databases is still relevant here. As explained by Bob Dorr, the max databases parameter concerns only the “active” databases where “active”.

Let’s perform a quick calculation of the min pool size in the worst case (all databases are active) required on the primary replica with several configurations based on the Bob Dorr’s formula:

Min Pool Size = Max Databases * (Log Capture Thread + (Log Send Thread * Nb secondary replicas)) + Message Handler Thread

	One secondary replica	Two secondary replicas
100 databases	100 * 2 + 1 = 201	100 * 3 + 1 = 301
500 databases	500 * 2 + 1 = 1001	500 * 3 + 1 = 1501
1000 databases	1000 * 2 + 1 = 2001	1000 * 3 + 1 = 3001

Then let me show you a more real picture with my lab that includes a SQL Server AlwaysOn 2014 (64 bits) with one availability group and two replicas (basically one primary and one secondary). Each replica is configured with 4 VCPU and no affinity mask, so all the CPU are visible from each replica.

Next, according to the Microsoft documentation here, we will be capped by the “max worker threads” parameters equal to 512 in my case. Thus, the maximum size of the HADR worker pool size will be 512 – 40 = 472.

Now let’s play with different scenarios:

1- First scenario (no availability groups)

The first scenario includes an environment with no availability groups. The global number of worker threads is as following:

select

scheduler_id,current_tasks_count, current_workers_count,active_workers_count,work_queue_count

from sys.dm_os_schedulers

where status = 'VISIBLE ONLINE'

This view is not perfect in our case because it includes all the worker threads of the SQL Server instance (hadr worker threads are included to this number). But we will use it as a starting point because there is no activity on my lab and we can admit that the active_workers_count column value will be relatively close than the number of HADR worker threads.

2 – Second scenario (availability group with 100 idle databases)

The second scenario consists in adding 100 databases to my newly availability group but there is no activity. Let’s have a look at the global number of worker threads:

The number of worker threads has increased but this is not a big deal here because the availability databases are not very active. At this point I want to introduce another way to get the number of hadr worker threads by using the extended events and the hadr_thread_pool_worker_start event:

The extended event session I used during my test …

create event session HadrThreadPoolWorkerStart

on server

add event sqlserver.hadr_thread_pool_worker_start

add target package0.event_file

(

set filename = N'E:\SQLSERVER\SQL14\backup\HadrThreadPoolWorkerStart.xel'

)

with

(

max_memory = 4096 KB,

event_retention_mode = allow_single_event_loss,

max_dispatch_latency = 30 seconds,

max_event_size = 0 KB,

memory_partition_mode = none,

track_causality = off,

startup_state = on

);

… the data extraction script:

declare @top_count int;

set @top_count = 100;

;with xe_cte as

(

select

object_name,

cast(event_data as xml) as event_data

from sys.fn_xe_file_target_read_file ( 'E:\SQLSERVER\SQL14\backup\HadrThreadPoolWorkerStart*.xel', null, null, null)

)

select top (@top_count)

DATEADD(hh, DATEDIFF(hh, GETUTCDATE(), CURRENT_TIMESTAMP), event_data.value('(/event/@timestamp)[1]', 'datetime2')) AS [timestamp],

event_data.value('(/event/data/value)[3]', 'int') as active_workers,

event_data.value('(/event/data/value)[2]', 'int') as idle_workers,

event_data.value('(/event/data/value)[1]', 'int') as worker_limit,

event_data.value('(/event/data/value)[4]', 'varchar(5)') as worker_start_success

from xe_cte

order by [timestamp] desc;

… and the result:

3- Third scenario (availability group with 200 idle databases)

Let’s add 100 additional databases and let’s have a look at the sys.dm_os_schedulers DMV:

Here, the result from the extended event session:

As the previous scenario, the number of active worker threads is still low because there is no activity on the availability databases.

4- Fourth scenario (availability group with 300 “active” databases)

In this scenario, let’s add 100 additional databases (for a total of 300 databases) but this time we will simulate a workload that will perform randomly on different availability databases.

Let’s take a look at the extended event session to see the number of active workers:

As you can notice, the story is not the same. A big increase of the number of HADR active workers (roughly 258) can be noticed here. However, if I refer to Bob Dorr’s formula in his article, we may be disappointed by the number of threads that we get from the extended event session that is theoretically lower than we can expect for a total number of 300 databases (300 *2 + 1 = 601). At this point, my guess is that my workload and my lab configuration didn’t allow to reach out the HADR worker thread starvation scenario but in fact, this is not so important here because the goal was simply to highlight the difference between the number of active worker threads between an idle and active situation.

5- Fifth scenario : case of a worker threads exhaustion scenario

In the previous scenarios, we saw that increasing the number of databases can have an impact on an availability group. From my point of view, facing this situation is probably the worst scenario but we have to take into account in your design regarding your context.

In this scenario, I will voluntary increase the number of databases up to 500 to reach out more quickly the number of allowed active worker threads.

Here’s a picture of such scenario from the extended event session:

We are in a situation of HADR worker pool starvation (active workers = work limit = 472). Moreover, the work_start_success column value (= false) indicates the SQL Server’s inability to start a new HADR worker.

After a big coffee break, here what I saw from the SQL Server error log. This message indicates an excessive worker thread contention after 15 minutes.

If you see this error message in your availability groups environment, it’s time to check the number of availability databases. You may face this situation more often than you may think. I can see a lot of customer situations where availability groups are tied to consolidated databases environments. In such situation, adding a database can become a semi-automatic process and the number of HADR worker threads can be out quickly of the control.

What is the solution in this case? Add more worker threads? More processors? Reduce the number of databases? I guess you know what I will answer: it depends on your context.

So, in this blog post we’ve seen that the number of databases in an availability group may be a problem regarding your infrastructure and the number of databases hosted in your availability group. Therefore, this topic can be probably one of your concern if you plan to consolidate a lot of databases in your availability groups. In this case, I would probably recommend to my customers to include the hadr worker pool in his monitoring tool.

Hope this helps.

By David Barbarin

Posted by 보미아빠

, |

메시지 3201, 수준 16, 상태 2, 줄 1

카테고리 없음 / 2025. 8. 8. 13:09

메시지 3201, 수준 16, 상태 2, 줄 1
Cannot open backup device '\\10.0.1.11\backup\a.bak'. Operating system error 1272(You can't access this shared folder because your organization's security policies block unauthenticated guest access. These policies help protect your PC from unsafe or malicious devices on the network.).
메시지 3013, 수준 16, 상태 1, 줄 1
RESTORE HEADERONLY is terminating abnormally.

Windows 2019 이상에서는 아래 옵션이 적용되어 있어야 한다.

Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters" -Name AllowInsecureGuestAuth -Value 1 -Type DWord
gpupdate /force

Posted by 보미아빠

, |

«이전 1 2 3 4 5 ··· 178 다음»

보미네

카테고리

달력

공지사항

태그목록

최근에 올라온 글

Dynamic Seeks and Hidden Implicit Conversions

Introduction

Dynamic Seeks

Execution plan

More Dynamic Seeks

Hidden Conversions

More Hidden Conversions

Everything All At Once

Conclusion

SQL Server Mirroring Vs. AG 스레드 사용 방식 차이

메시지 3201, 수준 16, 상태 2, 줄 1

최근에 달린 댓글

최근에 받은 트랙백

글 보관함

링크

티스토리툴바

보미네

카테고리

달력

공지사항

태그목록

최근에 올라온 글

Dynamic Seeks and Hidden Implicit Conversions

Introduction

Dynamic Seeks

Execution plan

More Dynamic Seeks

Hidden Conversions

More Hidden Conversions

Everything All At Once

Conclusion

Related Reading

SQL Server Mirroring Vs. AG 스레드 사용 방식 차이

메시지 3201, 수준 16, 상태 2, 줄 1

최근에 달린 댓글

최근에 받은 트랙백

글 보관함

링크

티스토리툴바