
Large Data Operations in SQL Server

(http://www.sql-server-performance.com/jc_large_data_operations.asp)
By Joe Chang
18 February 2003

The articles in the Quantitative Performance Analysis series examined the internal cost formulas used by the SQL Server query optimizer in determining the execution plan, and the actual query costs for in-memory configurations. The previous articles restricted the actual query cost analysis to in-memory cases where all necessary data fit in memory, so that performance was not limited by the disk configuration. This article provides a first look at queries involving data sets that are large compared to available memory, and at queries involving write operations, where performance is limited by the disk configuration.

A case is presented in this article showing that the SQL Server internal cost formulas for I/O operations are based on disk access times. The I/O cost values were fixed several years ago. They do not accurately reflect current generation disk drive performance, nor do they reflect the actual cost of in-memory operations. In addition, the SQL Server execution cost model does not properly account for I/O costs on Insert, Update, and Delete (IUD) operations. For small row count operations, the discrepancy between the internal SQL Server cost model and the actual query costs does not significantly affect the efficiency of the execution plan. On large row count operations, the default plan can be substantially slower than an execution plan that accurately accounts for the number of I/O operations involved.

Execution Plan Cost Formula Review
First, a review of the SQL Server execution plan cost formulas discussed in the Quantitative Performance Analysis articles is helpful. The Index Seek operation cost formula is shown below. The I/O cost depends on the amount of physical memory.

I/O Cost = 0.006328500 + 0.000740741 per additional page (up to 1GB)
         = 0.003203425 + 0.000740741 per additional page (>1GB)

CPU Cost = 0.000079600 + 0.000001100 per additional row
The cost formula for multiple Bookmark Lookup operations follows the formula below.

I/O Cost = multiple of 0.006250000 (up to 1GB)
         = multiple of 0.003124925 (>1GB)
CPU Cost = 0.0000011 per row
The Bookmark Lookup multiple is not the exact multiple of the estimated row count, but a fraction thereof, typically >95% of the estimated rows for large row counts. For both the Index Seek and Bookmark Lookup operations, the I/O base cost depends on the system memory configuration. For up to and including 1GB of memory, the Index Seek I/O base cost is 0.0063285. Above 1GB, the I/O base cost is 0.003203425. All other component operation I/O costs do not depend on system memory (unless an index seek is implicitly involved). The Table Scan cost formula below also applies to clustered index scans and nonclustered index scans.

I/O Cost = 0.0375785 + 0.000740741 per additional page
CPU Cost = 0.0000785 + 0.0000011 per row
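
To make the arithmetic concrete, the formulas can be evaluated directly. The sketch below reproduces the Table Scan estimate for the 101,012 page, 10M row table used later in this article; it assumes the "additional" page and row counts are one less than the totals:

DECLARE @pages int, @rows int
SELECT @pages = 101012, @rows = 10000000

SELECT 0.0375785 + 0.000740741 * (@pages - 1) AS [Table Scan I/O],
       0.0000785 + 0.0000011 * (@rows - 1) AS [Table Scan CPU]
-- returns approximately 74.86 and 11.00, matching the
-- 85.86 combined cost discussed later in this article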
For all table modification operations, Insert, Update and Delete (IUD), the I/O cost approximately follows the formulas below. The CPU cost is exact. The total cost of Update and Delete operations includes the cost of either an index seek or table scan, though this may not be reflected in the displayed I/O or CPU cost.

IUD I/O Cost ~ 0.01002 – 0.01010 (>100 rows)
IUD CPU Cost = 0.000001 per row
An interesting point is that the IUD I/O cost is fixed, regardless of the number of rows and pages involved. Contrast this to the Index Seek, Bookmark Lookup and Table Scan I/O costs, which depend on the number of pages involved.

There appears to be no documentation from Microsoft on the units of measure for the SQL Server execution plan cost values. The unit of measure could either be some measure of time or CPU utilization. The article Parallel Execution Plans demonstrated that the unit of measure is most probably time rather than CPU utilization. The primary argument for this deduction is that an execution plan with parallel component operations has a lower cost than the equivalent plan with non-parallel operations. A logical interpretation is that a parallel operation on two processors should complete the step in approximately one-half the time, plus some overhead for combining the results from each processor. If the unit of measure for execution plan cost value were CPU utilization, the parallel operation should show a higher cost on the assumption that splitting a task between two processors does not reduce the sum total CPU-cycles expended and some extra overhead is required to combine the results.

Following the assumption that the SQL Server execution plan cost values are in units of time, the next probable deduction is that the time unit is seconds. The cost values do not depend on the system processor type, and the cost values have not changed from SQL Server 7.0 to 2000. It can be assumed that the cost values were calibrated on some system that was available while SQL Server 7.0 was in development, sometime between 1995 and 1997. A reasonable assumption for that period is that the reference environment on which the SQL Server cost values were calibrated was a database with data size much larger than the available system memory. In that case, SQL operations would frequently involve disk I/O.

On a current generation 15K disk drive, the Index Seek I/O cost of 0.0063285 seconds (or 6.33ms) is a reasonable random disk access time. However, this is unrealistically low for the 7200 RPM disk drives available in the mid-1990's. It is possible that a 50% buffer cache hit ratio was factored into the Index Seek and Bookmark Lookup I/O costs for the 1GB configuration, and 75% for the >1GB configuration. There is no meaningful liability in using only two discrete I/O cost values as opposed to a continuous range. The additional I/O cost per page of 0.0007407 can be interpreted as the sequential I/O time (0.74ms), implying a disk drive sequential transfer rate of 1350 IO/sec at 8KB each, or 169 IO/sec at 64KB.

However, absolute system memory is not a reliable indicator of buffer cache hit ratio. A better strategy for adjusting I/O cost might be to use the recent buffer cache hit ratio history. For large row count operations, it is also worthwhile to determine how much of the required data is already in memory.
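
As a sketch of what such a strategy could draw on, the cumulative buffer cache hit ratio is already exposed through the sysperfinfo virtual table in SQL Server 2000. The counter must be divided by its base counter; this query is illustrative, not from the original tests:

SELECT 100.0 * r.cntr_value / b.cntr_value AS [Buffer cache hit ratio %]
FROM master.dbo.sysperfinfo r, master.dbo.sysperfinfo b
WHERE r.counter_name = 'Buffer cache hit ratio'
  AND b.counter_name = 'Buffer cache hit ratio base'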

Test Environment
The test environment server is a 2-way Xeon 2.4GHz/512K cache, 533MHz system bus, and 2GB memory on a motherboard with the ServerWorks GC-LE chipset. The test database data file resides on two striped 15K 18GB SCSI U160 drives (OS), all log files on one 10K SCSI drive, and the tempdb data file on an IDE/ATA drive. The database recovery model is set to SIMPLE, and 4GB of space is pre-allocated for both data and log. In addition, the max degree of parallelism was restricted to one CPU for most tests. The results are single query time measurements. There is some variation between measurements since there is no easy method to determine the contents of the buffer cache; an attempt was made to obtain reasonably repeatable measurements.
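
The original article does not list the exact configuration commands; the settings described above would be applied with sp_configure along these lines (option names as in SQL Server 2000):

EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'max degree of parallelism', 1
EXEC sp_configure 'max server memory (MB)', 256 -- later raised to 1154
RECONFIGURE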

The following script creates a test table and populates it with test data. As defined, this table has 8 integer columns of 4 bytes each, a char column of 10 bytes, a decimal column of 5 bytes, and three other columns of 8 bytes each, for a total of 71 bytes. Along with the metadata overhead, this works out to approximately 81 bytes per row, allowing 99 rows to fit in each 8KB page at 99% fill factor, with 77 bytes free.

CREATE TABLE M3C_00 (
     ID int NOT NULL,
     ID2 int NOT NULL,
     ID3 int NOT NULL,
     ID4 int NOT NULL,
     ID5 int NOT NULL,
     ID6 int NOT NULL,
     SeqID int NOT NULL,
     DistID int NOT NULL,
     Value char(10) NOT NULL,
     randDecimal decimal (9,4) NOT NULL,
     randMoney money NOT NULL,
     randDate datetime NOT NULL,
     seqDate datetime NOT NULL )

The script below populates the above table with 10,000,000 rows. The ID column is a sequence from 1 to 10M, but is not explicitly defined as an identity column. The columns ID2 to ID6 are not important to this series of tests, and any values can be used. The SeqID values are sequential and the DistID values are distributed. In this case, the table has 10M rows. The first 100,000 rows have SeqID value 1, the next 100K rows have SeqID value 2, and so on. The first row has DistID value 1, the next row with DistID value 1 is row 101, and so on. All rows with a specific SeqID value are in adjacent rows. All rows with a specific DistID value are 100 rows apart, which causes each row having a common DistID value to be in a different 8KB data page. This table has no clustered index, so there is no guarantee on row placement, but it does appear that the rows are inserted in sequence.

BEGIN TRANSACTION

DECLARE @I int, @rowCnt int, @p int, @sc1 int, @dv1 int

SELECT @I = 1, @rowCnt = 10000000, @p = 100, @sc1 = 100000

SELECT @dv1 = @rowCnt/100000

WHILE @I <= @rowCnt BEGIN

INSERT M3C_00 (ID, ID2, ID3, ID4, ID5, ID6, SeqID, DistID,
     Value, randDecimal, randMoney, randDate, seqDate)
VALUES ( @I, @I, @I/2, @I/4, @I/10, @I/20,
     (@I-1)/@sc1 + 1, -- Sequential values
     (@I-1)%(@dv1) + 1, -- Distributed values
     CHAR(65 + 26*rand()) + CHAR(65 + 26*rand()) + CHAR(65 + 26*rand())
       + CONVERT(char(6), CONVERT(int, 100000*(9.0*rand()+1.0)))
       + CHAR(65 + 26*rand()),
     10000*rand(), 10000*rand(),
     DATEADD(hour, 100000*rand(), '1990-01-01'),
     DATEADD(hour, @I/5, '1990-01-01') )

SET @I = @I + 1

END

COMMIT TRANSACTION

CHECKPOINT
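
As a quick sanity check (not part of the original test sequence), the generated distribution can be verified; each query below should return 100 groups of exactly 100,000 rows:

SELECT SeqID, COUNT(*) AS cnt FROM M3C_00 GROUP BY SeqID ORDER BY SeqID
SELECT DistID, COUNT(*) AS cnt FROM M3C_00 GROUP BY DistID ORDER BY DistID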

The population script takes 17min 45sec to run, averaging 9,380 row inserts per second. The M3C_00 table ends up at 105,264 pages at approximately 95% page density. Instead of regenerating the table each time a fresh data set is needed, the original data set is preserved in the above table and a copy is generated with the SELECT INTO statement below.

SELECT ID, ID2, ID3, ID4, ID5, ID6, SeqID, DistID,
     Value, randDecimal, randMoney, randDate, seqDate
INTO M3C_01
FROM M3C_00

CHECKPOINT

This table has 101,012 pages, averaging 99 rows per page at 99% fillfactor for a table size of just over 789MB. The SELECT INTO takes approximately 28 seconds to run, averaging 357,000 rows inserted per second. The following two indexes are created on SeqID and DistID columns. Each index occupies 18,554 pages or 145MB.

CREATE INDEX IX_M3C_01_Seq
ON M3C_01 (SeqID) WITH SORT_IN_TEMPDB

CHECKPOINT

CREATE INDEX IX_M3C_01_Dist
ON M3C_01 (DistID) WITH SORT_IN_TEMPDB

CHECKPOINT

The first index takes approximately 30 seconds to create with 256M memory, and the second takes 60 seconds.
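
The page counts and fill percentages quoted in this section can be verified with DBCC SHOWCONTIG and sp_spaceused; for example:

DBCC SHOWCONTIG ('M3C_01')
-- reports Pages Scanned and Avg. Page Density (full)
EXEC sp_spaceused 'M3C_01'
-- reports reserved, data and index sizes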

Test Queries
The first queries tested are the Select statements shown below. The first pair of queries is on the SeqID column; the second pair is on the DistID column. The first query in each pair is a basic Select statement without index hints; the second query in each pair specifies an index hint.

-- Sequential rows, table scan

SELECT AVG(randMoney) FROM M3C_01 WHERE SeqID = 91

-- Sequential rows, index seek and bookmark lookup

SELECT AVG(randMoney) FROM M3C_01 WITH(INDEX(IX_M3C_01_Seq)) WHERE SeqID = 91

-- Distributed rows, table scan

SELECT AVG(randMoney) FROM M3C_01 WHERE DistID = 91

-- Distributed rows, index seek and bookmark lookup

SELECT AVG(randMoney) FROM M3C_01 WITH(INDEX(IX_M3C_01_Dist)) WHERE DistID = 91

The estimated and actual row count for both search arguments (SARGs) is 100,000 rows. The default execution plan for the query without hints is a table scan for either SARG. The hint forces the execution plan to use the index on the SARG, which in turn requires a bookmark lookup to retrieve the randMoney column value. SQL Server statistics only keep distribution information on the range of column values, not the location of the rows, so SQL Server cannot know how many distinct pages are actually required. Since both SARG cases have the same estimated (and actual) row count, the execution plan and estimated cost are identical for either SARG as long as the hints are also equivalent. The execution plans for the first pair of queries are shown below.

[Figure: execution plans for the first pair of queries]
The Table Scan involves 101,012 pages and the Bookmark Lookup involves 100,000 rows, so the actual query time is essentially a comparison of the cost per page of a table scan relative to the bookmark lookup cost per row. The Table Scan cost detail is shown below.

[Figure: Table Scan cost detail]
The Estimated I/O cost is displayed as 37.4 and the Estimated CPU cost as 5.50, but the Estimated cost for the entire operation is 85.86. Calculating the I/O and CPU costs from the formulas in the previous section yields 74.86 for I/O and 11.00 for CPU, which works out to exactly the 85.86 shown for the combined I/O and CPU cost. In some cases, the displayed I/O and CPU costs are exactly one-half the expected values, but the combined cost is exactly the expected cost.
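
These displayed values can also be read in text form. With SHOWPLAN_ALL enabled, Query Analyzer returns the EstimateIO, EstimateCPU and TotalSubtreeCost columns for each operator instead of executing the statement; a minimal example:

SET SHOWPLAN_ALL ON
GO
SELECT AVG(randMoney) FROM M3C_01 WHERE SeqID = 91
GO
SET SHOWPLAN_ALL OFF
GO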

The index seek cost detail is shown below. The I/O cost of 0.140240 works out to the base cost from the previous section plus the cost for 185 additional leaf-level pages, meaning that each index leaf-level page holds approximately 541 rows. The CPU cost of 0.110000 corresponds to 100,000 rows, for a total I/O and CPU cost of 0.250619.

[Figure: Index Seek cost detail]
The bookmark lookup cost detail is shown below. The I/O cost of 311.86 works out to 99.8% of 100,000 times the single-row Bookmark Lookup I/O cost of 0.0031249.

[Figure: Bookmark Lookup cost detail]
Rather than change system memory or the size of test data, various settings for max server memory were used. The run times measured from Profiler at 256M and 1,154M max server memory are shown next. Query Analyzer STATISTICS IO and the Perfmon disk counters confirm that the entire table is read from disk for the 256M server memory configuration, with the exception of the Index Seek and Bookmark Lookup plan on SeqID, which required very few disk reads. The 1,154M test runs are measured when both data and indexes are in memory.

SELECT query - 100K rows

256M server mem        Sequential rows   Sequential rows   Distributed rows   Distributed rows
                       Index + BL        Table Scan        Index + BL         Table Scan
Query time (sec)       0.3               10.5              167                10.5
Rows or Pages/sec      333,333 (R)       9,620 (P)         599 (R)            9,620 (P)
Disk IO/sec            Low               ~1,200            ~600               ~1,200
Avg. Bytes/Read        -                 64K               8K                 64K

1154MB server mem
Query time (sec)       0.266             1.076             0.373              1.090
Rows or Pages/sec      376,000           93,877            268,000            92,672
Both Table Scans ran in approximately 10.5 seconds, so the location of the required rows does not affect the table scan time. The disk I/O rate for each physical disk was approximately 600/sec at 64KB/Read for a transfer rate of 37.5MB/sec, about the maximum throughput for the Seagate ST318451 disk (current generation 15K drives can sustain >50MB/sec sequential transfer capability).

The index seek plan for sequential rows exhibited some initial disk activity (less than 200 disk I/Os) at the 256M server memory configuration, but thereafter ran in memory. The index seek plan for distributed rows was forced to retrieve each 8KB page of the 101,012 page table from disk when max server memory was restricted to 256M. Each disk averaged 300 IO/sec at 8KB/Read. It might seem that 300 IO/sec is somewhat high for a 15K disk. However, the average queue depth was 40 per disk, and the data was distributed over 0.8GB of the 32GB total combined disk space. The high queue depth allows the disk drive to re-sequence the I/O order, and the small space usage reduces the average seek travel distance, both of which improve I/O performance.

At the 1154MB server memory configuration, with sufficient memory to hold the entire table and both indexes, there is no disk activity for any of the four queries once the data is loaded into memory. The in-memory table scan rate of ~93,000 pages per second is the expected performance for a single 2.4GHz Xeon processor (with max degree of parallelism 1), where the cost per page of a table scan is approximately 25K CPU-cycles. Both the Index Seek and Bookmark Lookup plans are much less expensive than the table scan when all data is in memory. The Bookmark Lookup for sequential rows is about four times faster than the table scan, and the distributed rows case is just under three times faster.

The execution plan cost formula (>1GB) rates the index seek followed by a bookmark lookup as 3.6 times more expensive (in execution time) than a table scan. The execution plan cost does not take into account whether the data is currently in memory, or whether the data can fit in memory, even though these factors have an enormous impact on the cost of the query. It is reasonable for SQL Server not to take into account the location distribution of data, because the SQL Server statistics data does not keep this information.

When the entire data set must be read from disk, the actual table scan, employing sequential disk reads, is 16 times faster than the index seek and bookmark lookup plan, which generates effectively random disk I/O. Compare this to the 3.6X advantage predicted by the execution plan cost formula. The 15K RPM disk used here probably has 2 times the random I/O capability of a 7200RPM drive available 8 years ago, but in the range of 8 times the sequential throughput of the older drives. It is quite possible that an older 7200 RPM disk would have produced exactly the result predicted by the execution plan.

If the number of disk drives for the data partition were increased, both sequential and random I/O performance would initially increase by approximately the same degree. When there are sufficient drives to reach a sequential transfer rate of 800MB/sec (10X higher than the current transfer rate), the table scan performance should become limited by the processor speed. However, there is good reason to believe that more disk drives would continue to increase random I/O performance to well over 10,000 I/O per sec.
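
The disk activity figures above came from STATISTICS IO and the Perfmon disk counters. As a minimal way to reproduce the logical versus physical read breakdown for any of the four queries (the exact session settings used in the original tests are not listed), the following suffices:

SET STATISTICS IO ON
SELECT AVG(randMoney) FROM M3C_01 WITH(INDEX(IX_M3C_01_Dist)) WHERE DistID = 91
SET STATISTICS IO OFF
-- the Messages tab reports scan count, logical reads, physical reads
-- and read-ahead reads for M3C_01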

Update Queries
The same four types of queries are tested for the Update statement. The first pair of queries updates a sequential block of 100,000 rows. The second pair updates 100,000 rows where each row is in a different 8KB page. A checkpoint is executed immediately after the Update statement. The combined Update and Checkpoint time determines the actual time required to write changes to disk. Even though a Query Analyzer session may show the statement as completed, the combined time is a more accurate assessment of the true cost.
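
The combined time can be captured inside the batch with a simple GETDATE harness; this is a sketch for the first test statement, not necessarily how the original measurements were taken (those used Profiler):

DECLARE @t0 datetime, @t1 datetime
SET @t0 = GETDATE()
UPDATE m SET randMoney = 1.0 FROM M3C_01 m WHERE SeqID = 91
SET @t1 = GETDATE()
CHECKPOINT
-- elapsed milliseconds for the update and for the checkpoint
SELECT DATEDIFF(ms, @t0, @t1) AS [update ms],
       DATEDIFF(ms, @t1, GETDATE()) AS [checkpoint ms]

The four test statement pairs follow.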

-- Sequential rows, index seek

UPDATE m SET randMoney = 1.0 FROM M3C_01 m WHERE SeqID = 91

GO

CHECKPOINT

GO

-- Sequential rows, table scan

UPDATE m SET randMoney = 1.0 FROM M3C_01 m WITH(INDEX(0)) WHERE SeqID = 91

GO

CHECKPOINT

GO

-- Distributed rows, index seek

UPDATE M3C_01 SET randMoney = 5000.0 WHERE DistID = 91

GO

CHECKPOINT

GO

-- Distributed rows, table scan

UPDATE m SET randMoney = 1.0 FROM M3C_01 m WITH(INDEX(0)) WHERE DistID = 5

GO

CHECKPOINT

GO

The execution plans for the first pair of Update statements are shown below. Notice that the plan without hints uses an Index Seek to find the rows to be updated. The index can only identify the rows that need to be updated; it does not actually load the data pages into memory. There is no indication that the first plan takes into account the cost of retrieving the actual data pages that need to be modified. Before SQL Server can write to a page, the page first has to be loaded into the buffer cache. The hint INDEX(0) in the second plan is a directive to use a table scan.

[Figure: execution plans for the first pair of Update statements]
The Index Seek has the same cost as in the Select query. The Top and Compute Scalar operations each add 0.010000 CPU cost. The Table Update operation has an I/O cost of 0.010068 and a CPU cost of 0.100000, corresponding to a cost of 0.000001 per row. As discussed earlier, the execution plan shows the same Update I/O cost regardless of the number of rows or pages involved. The Index Seek cannot guarantee that the rows updated are in order.

[Figures: cost details for the default Update plan]
The detail boxes below are for the Update statement using a hint to force a Table Scan. The Table Scan cost detail is the same as in the Select statement. In this case, the Top operation adds a cost of 4.809997 even though its I/O and CPU costs are the same as in the previous Update plan. The Table Update operator has the same cost structure in both cases.

[Figures: cost details for the Update plan forced to use a Table Scan]
The net result of the execution plan cost formulas is that the default plan using an Index Seek is less expensive than the Table Scan plan by a factor of 238. This is entirely due to the difference in cost between an Index Seek for the 100,000 rows in a high-density index and a Table Scan of 101,012 pages and 10M rows. The table below shows the actual Update and the subsequent checkpoint time for 256M and 1154M configurations.

UPDATE query - 100K rows

256M server mem        Sequential rows   Sequential rows   Distributed rows   Distributed rows
                       Index             Table Scan        Index              Table Scan
Query time (sec)       1.3               12.6              476.6              28
Checkpoint time (sec)  0.4               0.6               14.5               8
Rows/sec               57,471            7,576             203                2,778

1154MB server mem
Query time (sec)       0.8               1.3               0.9                1.5
Checkpoint time (sec)  0.2               0.1               23                 23
Rows/sec               100,000           71,429            4,184              4,082
The index plan is indeed faster when all modified rows are in adjacent pages, but is much slower if the modified rows are distributed across many pages. Since SQL Server statistics do not capture the distribution of the affected pages, the execution plan cost formula should not assume that rows are in adjacent pages. There is no expectation that rows sharing a common nonclustered index value are stored in adjacent rows. SQL Server, however, always chooses the index plan if an index on the SARG exists, because the execution plan does not take into account the cost of retrieving the data pages in a manner similar to the Bookmark Lookup operation. The measured results make clear that the true cost does include retrieving the actual data rows. This discrepancy can lead to a plan far slower than the table scan plan that would be used if no index were present.

A simple change to the UPDATE statement's SET clause, to randMoney = randMoney + 1.0, changes the plan. Now the default plan uses a table scan, and forcing the index on the SARG produces a plan with the bookmark lookup operation, as shown below.
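
For reference, the modified statements would look like the following; the forced-index variant is a sketch (the hint names the index created earlier), since only the resulting plans are shown here.

-- default plan: table scan
UPDATE M3C_01 SET randMoney = randMoney + 1.0 WHERE SeqID = 91

-- forcing the index on the SARG yields a bookmark lookup plan
UPDATE m SET randMoney = randMoney + 1.0
FROM M3C_01 m WITH(INDEX(IX_M3C_01_Seq)) WHERE SeqID = 91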

[Figure: execution plan for the modified Update statement]

Delete Queries
The delete queries are shown below. The first two queries involve rows in sequential pages, and the second pair involves rows distributed across all pages as in the other examples.

-- Sequential rows, index seek

DELETE M3C_01 WHERE SeqID = 91

GO

CHECKPOINT

GO

-- Sequential rows, table scan

DELETE m FROM M3C_01 m WITH(INDEX(0)) WHERE SeqID = 71

GO

CHECKPOINT

GO

-- Distributed rows, index seek

DELETE M3C_01 WHERE DistID = 91

GO

CHECKPOINT

GO

-- Distributed rows, table scan

DELETE m FROM M3C_01 m WITH(INDEX(0)) WHERE DistID = 27

GO

CHECKPOINT

GO

Without hints, the execution plan uses an Index Seek. The I/O cost of reading the actual data pages and rows is not taken into account by the execution plan, similar to the Update statement. Forcing a table scan shows a plan that is much more expensive. After the Table Delete, there is a Table Spool operation followed by Index Delete operations for the two indexes on this table.

[Figure: Delete execution plans, with and without the table scan hint]
The Index Seek and Table Scan operations are the same as in the previous Select and Update queries. The Top operation in each of the two Delete statements above is also the same as in the corresponding Update statements. Unlike the Update, the Delete has no Compute Scalar operation. However, the Index Seek has a cost of 0.250619, the Top adds 0.010000 for a total of 0.260619, and the Table Delete adds 0.110068, yet the Table Delete subtree cost is 0.380687, so an additional 0.010000 crept into the Table Delete subtree cost. The Table Delete operation has a fixed I/O cost of 0.010068 irrespective of the number of rows and pages involved.

[Figure: Table Delete cost detail]
The Table Spool operation costs are more difficult to interpret. Both show I/O cost of 0.931 and CPU cost of 0.0188, but the top Spool shows a total cost of 0.768492 while the lower Spool shows a total cost of 1.149179. Both Spool operations show a total subtree cost of 1.149179.

[Figure: upper Spool for the Delete plan using an Index Seek]
[Figure: lower Spool for the Delete plan using an Index Seek]
The subtree cost of the Table Delete and the top Table Spool does add up to the subtree cost of 1.149179, but there is no apparent explanation for how the lower Table Spool I/O and CPU costs add up to the same subtree cost.

Each Index Delete shows an I/O cost of 0.0100125 and a CPU cost of 0.100000, slightly lower than the Table Delete I/O cost and exactly equal in CPU cost. Both Index Delete operations show a subtree cost of 1.2591919.

[Figure: upper Index Delete cost detail]
[Figure: lower Index Delete cost detail]
The last SQL operation is Sequence, which shows zero I/O cost and a CPU cost of 0.200000. Since the total subtree cost is 2.7183836, the prior operations contributed 2.5183836 of this cost, which works out to the 1.2591919 for each of the Index Delete subtrees.

[Figure: Sequence operation cost detail]
It is not clear, but it may be that the costs of the operations prior to the Table Spools are divided equally into the subtree cost of each Table Spool operation. The cost numbers do not add up exactly, but are reasonably close.

The table below shows the measured Delete times for both SARGs and both execution plans. The execution plan using the index does not take into account the cost of retrieving the data rows and pages, but the measured costs clearly show that this is part of the actual work performed. It is not clear why a Table Scan that deletes sequential rows takes longer than a Table Scan that deletes distributed rows. When all data is in memory, the index plan is slightly faster than the table scan. The checkpoint operation is very fast when few data pages are modified, and takes longer when all pages are modified, as would be expected.

Delete query - 100K rows

256M server mem        Sequential rows   Sequential rows   Distributed rows   Distributed rows
                       Index             Table Scan        Index              Table Scan
Query time (sec)       4.8               88.52             282                41
Checkpoint time (sec)  8.4               4.52              8.4                14
Rows/sec               7,576             1,075             340                1,800

1154MB server mem
Query time (sec)       4.1               6.4               4.2                5.3
Checkpoint time (sec)  3.7               3.9               28.6               28.6
Rows/sec               12,821            9,708             3,048              2,949
The Delete plan for the same table without indexes is shown below. The component operation costs are the same as the corresponding operations in the plan discussed earlier.

[Figure: Delete execution plan for the table without indexes]
The measured query and checkpoint times for a 100,000 row Delete from a 10M row, 101,012 page table are shown below.
Delete query, no index - 100K rows

256M server mem        Sequential rows   Distributed rows
                       Table Scan        Table Scan
Query time (sec)       11.5              26
Checkpoint time (sec)  0.1               4
Rows/sec               8,621             3,300

1154MB server mem
Query time (sec)       1.9               1.5
Checkpoint time (sec)  0.2               22
Rows/sec               47,619            4,255
As expected, the Delete operation on a table without indexes (which must use a Table Scan) is faster than a Delete using the Table Scan plan that must also delete from indexes. However, the difference in cost between the two, accounting for the cost of the index delete, is not large, except for the unusual behavior in deleting from sequential rows involving a table scan from disk.

Many people have noticed that it can be faster to update or delete a large number of rows by first dropping existing indexes, updating or deleting from the unindexed table, and then recreating the indexes, than by updating or deleting from a table with indexes in place. It is now clear that the reason for this is that the execution plan does not account for the cost of retrieving the data pages, and incorrectly uses an Index Seek plan when a Table Scan plan is much more efficient. The index build time for the two indexes used in the examples above is around 90 seconds total (30 sec for SeqID and 60 sec for DistID). An Update or Delete with indexes in place, using a hint to force a Table Scan, will in most cases be faster than dropping the indexes before the Update/Delete and recreating them afterwards.
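
In other words, of the two alternatives sketched below, the first is usually preferable. The statements are assembled from the examples above; treat them as a sketch rather than a prescription, since the better choice depends on the row counts involved.

-- Alternative 1: keep the indexes in place and force the table scan
DELETE m FROM M3C_01 m WITH(INDEX(0)) WHERE DistID = 27

-- Alternative 2: drop the indexes, delete, then recreate
-- (pays roughly 90 seconds of index build time on this data set)
DROP INDEX M3C_01.IX_M3C_01_Seq
DROP INDEX M3C_01.IX_M3C_01_Dist

DELETE M3C_01 WHERE DistID = 27

CREATE INDEX IX_M3C_01_Seq ON M3C_01 (SeqID) WITH SORT_IN_TEMPDB
CREATE INDEX IX_M3C_01_Dist ON M3C_01 (DistID) WITH SORT_IN_TEMPDB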

At 256M server memory, the Delete query time is slightly longer than the Select query for sequential rows, and approximately 2.5 times longer for distributed rows. The checkpoint is negligible for sequential rows, where relatively few pages are involved (1,010 data pages and 185 index pages), but more substantial if many pages must be written.


Summary
Several important observations can be made from the examination of the SQL Server execution plans, and the actual query costs for both in-memory and disk intensive configurations.

There is good reason to believe that the SQL Server execution plan cost values have units of time in seconds and are based on at least partially disk-bound operations. The cost values were fixed in the SQL Server 7.0 period and have not been updated. The SQL Server optimizer underestimates current generation sequential disk transfer rates relative to random disk I/O times. When data resides in memory, the Bookmark Lookup cost per row is less expensive than the Table Scan cost per page. Query time depends significantly on whether the affected data currently resides in memory or disk access is required. For disk intensive operations, the characteristics of the disk's sequential and random I/O performance are also important. The media technology of current disk drives allows a 7200RPM drive to have sequential transfer performance similar to a high-end 15K drive, but only one-half the random I/O performance.

In principle, a much better execution plan for the specific circumstance can be derived if it can be determined whether the data is currently in memory. Even a less accurate estimate using recent Buffer Cache Hit Ratio and an estimate of the size of the data relative to the Target Server Memory should produce a better plan than rigidly applying fixed estimates based on system memory alone.

The most important observation is that the SQL Server Insert, Update and Delete operations may not properly account for the full I/O costs involved. This can cause Update and Delete statements to use an Index Seek plan that is substantially less efficient than a Table Scan plan. It is not necessary to drop indexes prior to large deletes and pay the cost of recreating the indexes afterwards; a simple index hint to force a Table Scan is the best solution in many cases.

Published with the express written permission of the author. Copyright 2004 Joe Chang. All Rights Reserved.