Update SQL Statistics on things

Update SQL Statistics on things

Background

As our customer base has grown, so has the size of those customers and, with them, the amount of data we hold within our databases. We have a solution that keeps an eye on the performance of the database servers, and we monitor it closely to ensure we can see potential problems before they cause downtime or poor performance in our applications.

A few months ago, we noticed that one of our SQL servers was beginning to show signs of high CPU usage. It was regularly hitting over 80% CPU usage. We like to keep ‘some in reserve’ on our systems as the traffic through them can vary daily, with weekdays particularly busy from 10am to 5pm. By ensuring our systems never reach 100%, we can cater for the surges in traffic when they happen.

Once we saw the higher-than-usual CPU activity, we did what you would expect and ran several queries to see if we had introduced a problem. Our monitoring solution allowed us to look back on the last few months, and sure enough, there had been a slow but steady increase in the business going through the application. This was in line with what the business was expecting. So there was nothing wrong; we and our systems were working well.

We checked our housekeeping routines were working correctly – they were. We checked that our maintenance jobs were running as they should, too. All was well. So, we planned to upgrade the SQL Server. We liaised with the customers to agree when this would be done and waited until the agreed time.

All of our applications run in Azure. So, we had to shut the server down, change the server model for the next family member, and bring it back online. So simple!

Typically, going to the next server in the same family in Azure means you double the CPU cores, memory and IOPS. It also doubles the cost.

Everything went as it should; the server was updated, it came back online, and the sites were up and running within about 15 minutes.

However, we did not see the performance improvement we expected. It wasn’t any worse, but it was no better.

What was wrong?

We spent quite a while looking to see what the problem was. We ran queries, rebuilt indexes and even changed some parallelism settings. The latter helped a little, but it still ran at 80% CPU usage despite having twice as many cores.

Then I had an idea – I updated the statistics on the table that seemed to be causing the high CPU usage.

Boom – that was it – CPU utilisation dropped to about 50%

What are SQL Statistics?

SQL Server statistics stores the distribution of the column data in a histogram and stores unique values ratio in the density vector. These two meta-data are used by the query optimizer to calculate how many rows will return a query.

Outdated statistics can lead to inaccurate query plans, which can, in turn, lead to inefficient queries. Therefore, it’s essential to ensure your statistics are up to date.

Statistics in SQL Server are binary large objects (BLOBs) stored in an internal structure called a statistics object. These objects contain information about the distribution of values in one or more columns of a table or indexed view. Here, the distribution of values refers to the pattern of values.

When you execute a query, SQL Server uses statistics to estimate the number of rows the query will return. Then, the SQL Server Query Optimizer uses this number to determine the most efficient execution plan for that query. Note that statistics aren’t used for all types of queries. For example, statistics aren’t used for queries that involve common table expressions (CTEs), table variables, or dynamic SQL queries.

If a query selects all orders with an OrderTotal value greater than £100, the SQL Server uses the column’s statistics to determine the best execution plan. Based on the distribution of the order totals, the SQL Server may choose an index scan or index seek method to execute this query.

When SQL Server estimates the query will return most rows in the table, the SQL Server Query Optimizer may use an index scan. The SQL Server reads all the data pages in the index and returns the rows that satisfy the query. In this example, these are the rows whose OrderTotal value exceeds £100.

When few orders exceed £100, the SQL Server may choose an index seek method to locate the specific rows matching the search criteria. This approach is more efficient than scanning the entire table.

How to rebuild the Statistics

The first rebuild command I issued erred on the side of caution concerning performance and its impact on the running system. I ran this: –

UPDATE STATISTICS TABLENAME WITH SAMPLE

This had no effect on CPU usage – so I ran this: –

UPDATE STATISTICS TABLENAME WITH FULLSCAN

The command ran quickly, considering the size of the table, and the impact on the CPU usage was immediate – nice, low CPU usage.

We then scheduled in this command below and run this every week now: –

EXEC sp_MSforeachtable 'UPDATE STATISTICS ? WITH FULLSCAN'
Stephen

Hi, my name is Stephen Finchett. I have been a software engineer for over 30 years and worked on complex, business critical, multi-user systems for all of my career. For the last 15 years, I have been concentrating on web based solutions using the Microsoft Stack including ASP.Net, C#, TypeScript, SQL Server and running everything at scale within Kubernetes.