Performance: Demonstration of Outdated Statistics
Statistics are created when Query Optimizer generates a query plan for getting a better performance. Query Optimizer uses Statistics to estimate how many rows will be returned from a query plan. You can access the statistics under the indexes folder in SSMS.
You can create statistics in two ways. One of them is CREATE STATISTICS command and the other one is CREATE INDEX command. When you create a database, normally Auto Update Statistics, which allows to automatically update the statistics when they are deemed outdated, and Auto Create Statistics, which allows to automatically create stats for a single non-indexed column if they are missing when it is used in a where or join condition, options are set to ON by default. However, we will set them to OFF during the demonstration below. So, lets start with how outdated statistics can affect performance.
Firstly, we need to create a demonstration database with these two options are set to ON.
|
USE [master] GO CREATE DATABASE [StatisticsDemoDB] GO ALTER DATABASE [StatisticsDemoDB] SET AUTO_CREATE_STATISTICS ON WITH NO_WAIT GO ALTER DATABASE [StatisticsDemoDB] SET AUTO_UPDATE_STATISTICS ON WITH NO_WAIT GO ALTER DATABASE [StatisticsDemoDB] SET AUTO_UPDATE_STATISTICS_ASYNC ON WITH NO_WAIT GO |
On the next step, we create and populate a test table (OrdersTest) with 1 million records. This type of CTE based tally table referred by Itzik Ben-Gan. It’s really fast!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
|
USE [StatisticsDemoDB] GO CREATE TABLE OrdersTest (OrderRowID BIGINT, OrderDate DATETIME, OrderName VARCHAR(250), TrackingCode VARCHAR(250)) GO ;WITH Pass0 AS (SELECT 1 AS C UNION ALL SELECT 1), Pass1 AS (SELECT 1 AS C FROM Pass0 AS A, Pass0 AS B), Pass2 AS (SELECT 1 AS C FROM Pass1 AS A, Pass1 AS B), Pass3 AS (SELECT 1 AS C FROM Pass2 AS A, Pass2 AS B), Pass4 AS (SELECT 1 AS C FROM Pass3 AS A, Pass3 AS B), Pass5 AS (SELECT 1 AS C FROM Pass4 AS A, Pass4 AS B), tally AS (SELECT row_number() OVER (Order BY C) AS N FROM Pass5) INSERT OrdersTest(OrderRowID, OrderDate, OrderName, TrackingCode) SELECT t.N, DATEADD(ss, t.N, DATEADD(dd, -20, CAST(CAST(SYSDATETIME() AS DATE) AS DATETIME2(0)))) , CASE WHEN t.N % 3 = 0 THEN 'Book' WHEN t.N % 4 = 0 THEN 'DVD' ELSE 'Camera' END , CASE WHEN t.N % 3 = 0 THEN 'Book' + CONVERT(VARCHAR, CONVERT(INTEGER, ABS(CHECKSUM(NEWID()))/ 100.0)) + CONVERT(VARCHAR, N) WHEN t.N % 4 = 0 THEN 'DVD' + CONVERT(VARCHAR, CONVERT(INTEGER, ABS(CHECKSUM(NEWID()))/ 100.0)) + CONVERT(VARCHAR, N) ELSE 'Camera' + CONVERT(VARCHAR, CONVERT(INTEGER, ABS(CHECKSUM(NEWID()))/ 100.0)) + CONVERT(VARCHAR, N) END FROM tally AS t WHERE N < = 1000000; |
Now, demonstration environment is ready to test! Don’t forget, as a first step, Auto Create Statistics and Auto Update Statistics options are enabled. Now, we have to disable both of them and modify lots of records while these auto statistic options are disabled.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
|
USE [master] GO ALTER DATABASE [StatisticsDemoDB] SET AUTO_CREATE_STATISTICS OFF WITH NO_WAIT GO ALTER DATABASE [StatisticsDemoDB] SET AUTO_UPDATE_STATISTICS OFF WITH NO_WAIT GO ALTER DATABASE [StatisticsDemoDB] SET AUTO_UPDATE_STATISTICS_ASYNC OFF WITH NO_WAIT GO USE [StatisticsDemoDB] GO UPDATE OrdersTest SET OrderName = 'HDD' WHERE OrderName = 'DVD' GO DELETE FROM OrdersTest WHERE TrackingCode LIKE 'Camera21%' GO INSERT OrdersTest(OrderDate, OrderName, TrackingCode) SELECT TOP(250000) DATEADD(ss, (CONVERT(INTEGER, ABS(CHECKSUM(NEWID()))/ 100.0)), DATEADD(dd, -1000, CAST(CAST(SYSDATETIME() AS DATE) AS DATETIME2(0)))), OrderName, TrackingCode + CONVERT(VARCHAR, CONVERT(INTEGER, ABS(CHECKSUM(NEWID()))/ 1000000)) FROM OrdersTest GO |
In order to compare Actual Execution Plan and Estimated Execution Plan, drop clean buffers and send our query.
|
DBCC DROPCLEANBUFFERS GO USE [StatisticsDemoDB] GO SELECT * FROM OrdersTest WHERE OrderRowID < 900000 GO |
Updating statistics ensures that queries compile with up-to-date statistics. However, updating statistics causes queries to recompile. We recommend not updating statistics too frequently because there is a performance tradeoff between improving query plans and the time it takes to recompile queries. The specific tradeoffs depend on your application. UPDATE STATISTICS can use tempdb to sort the sample of rows for building statistics. <TechNet>
606.655 number of rows are listed totaly different on Actual and Estimated Execution Plans as a result of comparision. In that case, update the statistics and check the results again. We expect that Estimated Rows and Actual Rows will be much more close with updated statistics. The question is when to update statistics? There are three cases; it is highly recommended to update the statistics after the maintanence operations or when you apply bulk insert operations to modify larger amount of data or if your queries are executing slower then before.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
|
USE [master] GO ALTER DATABASE [StatisticsDemoDB] SET AUTO_CREATE_STATISTICS ON WITH NO_WAIT GO ALTER DATABASE [StatisticsDemoDB] SET AUTO_UPDATE_STATISTICS ON WITH NO_WAIT GO USE [StatisticsDemoDB] GO UPDATE STATISTICS [OrdersTest] GO DBCC DROPCLEANBUFFERS GO SELECT * FROM OrdersTest WHERE OrderRowID < 900000 GO |
This is much more close than the value we got with the outdated statistics! By the way, you can find the outdated statistics list with Transact-SQL shown as below.
|
SELECT st.name AS [Table] , si.name AS [Statistic], STATS_DATE(id, indid) AS [Last Update Date/Time] , rowmodctr AS [Rows Modified Since Last Update] FROM sys.sysindexes AS si INNER JOIN sys.tables AS st ON si.[id] = st.[object_id] INNER JOIN sys.schemas AS ss ON st.[schema_id] = [ss].[schema_id] WHERE STATS_DATE(id, indid) <= DATEADD(DAY, -1, GETDATE()) AND rowmodctr > 10 ORDER BY rowmodctr DESC |
Post Details