| Good day. |
There is an MS SQL Server 2016 SP2 instance with several databases where index defragmentation has not been done for a long time.
Fragmentation in the main tables reaches 45-50%. I configured a maintenance plan in SSMS for the indexes like this:
index rebuild (free space unchanged, indexes kept online where possible, MAXDOP unchanged, low priority, fast scan, applied at 30%+ fragmentation and more than 1000 pages), then index reorganize (compact large objects, fast scan, 10%+ fragmentation and 1000 pages), and as the last step, update statistics.
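In raw T-SQL, a maintenance plan like the one described corresponds roughly to the following sketch. The table and index names (`dbo.Orders`, `IX_Orders_Date`) are hypothetical placeholders; the thresholds are the ones from the post:

```sql
-- Sketch of the maintenance plan logic for one hypothetical index.
DECLARE @frag float;

SELECT @frag = avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Orders'),
                                    NULL, NULL, 'LIMITED')  -- "fast scan"
WHERE index_id = 1 AND page_count > 1000;

IF @frag >= 30
    ALTER INDEX IX_Orders_Date ON dbo.Orders
        REBUILD WITH (ONLINE = ON);        -- online where the edition allows it
ELSE IF @frag >= 10
    ALTER INDEX IX_Orders_Date ON dbo.Orders
        REORGANIZE WITH (LOB_COMPACTION = ON);  -- "compact large objects"

UPDATE STATISTICS dbo.Orders;              -- final step of the plan
```

A real plan iterates over all tables and indexes; this shows only the decision logic for a single index.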
The job ran once, over the weekend. On Monday, complaints about query times started piling up: some developers' queries took 5-10 times longer to complete. The main visible symptom was CPU load steadily climbing to 100%.
The database is analytical, data loss was acceptable, so it was restored to a point in time before the weekend. Queries then ran normally again.
I most likely won't get full certainty about the cause: we had to restore operation quickly, so there was no opportunity to dig into monitoring counters or re-run the problematic queries. Query Store was enabled on several databases, and the regressed queries showed a sharp jump in execution time with the same plan.
Now I'm wondering whether index defragmentation can really cause such a problem, what the causes might be, and how to avoid it.
So far I have only seen mentions of slowdowns lasting until the first statistics update. Statistics were updated, and it did not help.
The other suggestion was the "change free space per page" and MAXDOP options, but that doesn't apply in my case either.
| Now explain this to me, please: |
reorg does not compress anything.
Compression is done by DATA_COMPRESSION, and that is rebuild, not reorg,
and most importantly, BLOBs simply do not shrink that way.
In general, if you compressed the data via rebuild with DATA_COMPRESSION,
it's no surprise that queries now burn through CPU.
| Yasha123, |
"Compression of large objects" is the Russian label in the maintenance plan designer.
So it's actually "compact large objects". That option is in reorganize, not rebuild. It is on by default, and I haven't seen any recommendations to disable it.
Attached file Size - 54Kb
Your problems come from changed execution plans, I'd guess:
indexes rebuilt, statistics rebuilt, and the optimizer decided to scan the tables.
If auto-update / auto-create statistics is off on the database, you can get a similar effect.
| no, compact large objects is something else. |
reorg does it by default,
but it is NOT compression:
nothing shrinks, empty pages are simply released.
It won't hurt subsequent reads; if anything, the opposite.
And anyway, if the optimizer "decided to scan", then IO would jump, not CPU.
Show the result of:
select * from sys.configurations
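Most of that output is irrelevant here; a narrowed-down sketch of the instance-level settings usually checked first when parallelism or CPU suddenly changes (the names are standard `sys.configurations` entries):

```sql
-- Instance-level settings most relevant to sudden parallelism/CPU issues
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name IN ('max degree of parallelism',
               'cost threshold for parallelism',
               'max server memory (MB)');
```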
| Komrad, |
So there was no parallelism before, and after reindexing it appeared? I'd sooner believe it's the statistics recalculation. But that effect should be short-lived, a few hours at most.
You had loop joins because of long-stale statistics, and now hash joins with full scans of large tables. Before, the load fell on merge joins via tempdb; now there aren't enough processor cores.
Check the CPU cooling and the coolers themselves, and make sure the power plan is set to maximum performance instead of balanced.
And start by posting here the number of processors and cores, the clock frequency, and the processor model. Plus what MAXDOP is set to for the whole MSSQL instance.
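The core count and the instance-wide MAXDOP asked for here can be read directly from the server; a sketch:

```sql
-- Logical CPU count and scheduler count visible to SQL Server
SELECT cpu_count, scheduler_count
FROM sys.dm_os_sys_info;

-- Instance-wide MAXDOP (0 = unlimited)
SELECT value_in_use
FROM sys.configurations
WHERE name = 'max degree of parallelism';
```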
| Komrad, |
When the problem occurred, I ran a statistics update on the user databases. It did not get better.
Max Degree of Parallelism is 0, which as I recall means unlimited. But why would it suddenly start producing extra parallelism and loading the CPU?
Attached file Size - 136Kb
| The second part |
Attached file Size - 139Kb
Statistics can be updated in different ways;
the most accurate and the longest is FULLSCAN.
See Andy_OLAP's comments, very likely that's it.
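For reference, the different ways of updating statistics mentioned above, sketched against a hypothetical table `dbo.Orders`:

```sql
-- Default: sampled update, sample size chosen by SQL Server
UPDATE STATISTICS dbo.Orders;

-- Explicit sample rate
UPDATE STATISTICS dbo.Orders WITH SAMPLE 25 PERCENT;

-- Most accurate and slowest: reads every row
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
```

Note that an index rebuild also rebuilds that index's statistics with the equivalent of FULLSCAN, which is one reason plans can change right after a rebuild.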
| Of course, I can't rule out that the developers changed something themselves in the meantime. But I'm interested in possible problems on the DBMS side; I don't want to hit this on the main production server after running my maintenance plan. |
"You had loop joins because of long-stale statistics, and now hash joins with full scans of large tables. Before, the load fell on merge joins via tempdb; now there aren't enough processor cores."
An interesting theory; I need to refresh my knowledge on the topic.
"Check the CPU cooling and the coolers themselves, and make sure the power plan is set to maximum performance instead of balanced."
It's a virtual machine; the host itself is in a data center. It was allocated 15 processors at 2.20 GHz.
Attached file Size - 120Kb
| Danion, |
Also show this:
select * from [DB].sys.database_scoped_configurations
where DB is your database.
| Danion, |
You didn't accidentally drive the fill factor to 100%? If so, that would explain it.
| a_voronin, |
So page splits hammer the CPU, in your view? In my database all the tables have 100% fill, and the load doesn't go above 20% even on a bad day.
You think the topic starter, instead of the default "reorganize pages", chose "change free space per page" as in the picture, and turned it down to 0, getting fill factor 100?
| Note that our comrade is on a virtual machine. And some admin may have already tightened a screw over there. |
| a_voronin, |
I thought about that option; in sys.configurations the fill factor is min 0, max 100%.
But the default "free space per page" option in the maintenance plan seems to keep whatever was specified when the index was created.
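Whether the rebuilds actually changed the fill factor can be checked directly per index (note that 0 in `sys.indexes` means "use the server default", which is effectively 100):

```sql
-- Per-index fill factor; 0 means "use the server default"
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.name                   AS index_name,
       i.fill_factor
FROM sys.indexes AS i
WHERE i.object_id > 100                          -- skip system objects
  AND i.type_desc IN ('CLUSTERED', 'NONCLUSTERED')
ORDER BY table_name, index_name;
```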
Attached file Size - 79Kb
Several servers were handed over to me for support. I did not deploy or initially configure them, and for now I try to change settings as little as possible. It's good that I at least managed to get backups working. I had just started bringing the server into decent shape, and then this surprise. |
Honestly, I don't remember the details on loop joins and hash joins very well. When I get some free time, I'll reread the material.
I have backups from before the index defragmentation and from the date the problems started. If I restore them on a test server, will I be able to reproduce the problem and compare the settings, or is it tied to that particular server?
Take a developer who says "since Monday this select has been running 10 times slower."
Take the backup from before defragmentation and restore it on any test server. Run the select. Record the resulting plan and the execution time. Then run the same defragmentation as on the production server. Wait for it to complete. Run the select again and record the same things. Then, just in case, restore the production backup from the date the problems started. Run the select and record again.
Then, with the information gathered, go to a developer who can read a query execution plan, sit down together and think long and hard, rending your garments and sprinkling ashes on your heads.
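The "run the select and record the plan and time" step of this procedure can be sketched like this; the `SELECT` is a hypothetical stand-in for the developer's slow query:

```sql
-- Capture time, IO, and the actual execution plan for one test query
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
SET STATISTICS XML ON;   -- returns the actual plan as XML alongside the results

SELECT COUNT(*) FROM dbo.Orders;  -- placeholder for the slow select

SET STATISTICS XML OFF;
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The timings and plan appear in the Messages/Results tabs in SSMS; save them after each of the three runs so the before/after comparison is apples to apples.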
| Danion, |
For everything to be kosher, don't load the test server with anything else during the experiments.
| Andy_OLAP, |
Thanks for the link to the article on join.
I will try to test.
If the join type changes, the query plan should change too, right?
Query Store was enabled on that database; I managed to look at it a bit while everyone was running around shouting "fix it faster", but there isn't enough information for a definite answer.
After restoring on the test server, I'll see what it has collected.
The maintenance plan ran from the 7th to the 8th, and according to the regressed query's chart, the query plan did not change.
Attached file Size - 47Kb
The test server should match the production "iron" and the SQL Server version and settings,
otherwise the "fast" plan may not be generated.
If you do get it, try pinning it with Query Store, then rebuild the indexes and check performance both with the forced "fast" plan and with a freshly generated one.
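Pinning a plan via Query Store, as suggested, looks roughly like this. The `query_id`/`plan_id` values below are placeholders; the real ones must be looked up in the Query Store views for the regressed query:

```sql
-- Find the regressed query and its plans in Query Store
SELECT q.query_id, p.plan_id, rs.avg_duration
FROM sys.query_store_query        AS q
JOIN sys.query_store_plan         AS p  ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id;

-- Force the known-good plan (placeholder IDs)
EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 7;

-- Later, to release the forced plan
EXEC sp_query_store_unforce_plan @query_id = 42, @plan_id = 7;
```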
| Thank you all for the answers. |
If Andy_OLAP's theory is correct, then over time the statistics will be updated anyway, per the principle "By default, the query optimizer updates statistics as needed to improve the query plan", and the same thing will happen again?
And if anyone understands errors in the system tables of user databases, please advise in my separate topic.
Thank you in advance.
They will update if auto-update is enabled;
if it's disabled, they won't.
| Komrad, |
Do you mean the Auto Update Statistics item in the database options? It's enabled on all databases; it seems to be on by default.
Well, from here we can't see what you have configured there...
Also check on the database:
select 'Automatic update statistics: ' + (case when DatabasePropertyEx(db_name(), 'IsAutoUpdateStatistics') = 1 then 'ON' else 'OFF' end)
select 'Automatic create statistics: ' + (case when DatabasePropertyEx(db_name(), 'IsAutoCreateStatistics') = 1 then 'ON' else 'OFF' end)
-- stats with recompute OFF
select * from sys.stats where no_recompute = 1
I already answered above what is configured, and where I looked in the SSMS snap-in.
On all bases
Automatic update statistics: ON
Automatic create statistics: ON
select * from sys.stats where no_recompute = 1 returns nothing.
Clearly you need to find the culprit, but doing that on a test database where nothing is running isn't easy. Restore the latest backup from the time of the slowdown and look in Query Store for the queries with the highest total CPU.
And limit MAXDOP; you're unlikely to have an OLAP warehouse there.
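Both suggestions sketched in T-SQL; the MAXDOP value of 4 is an illustrative assumption, the right number depends on core count and NUMA layout:

```sql
-- Top queries by total CPU time from Query Store
SELECT TOP (10) q.query_id,
       SUM(rs.avg_cpu_time * rs.count_executions) AS total_cpu_time
FROM sys.query_store_query         AS q
JOIN sys.query_store_plan          AS p  ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
GROUP BY q.query_id
ORDER BY total_cpu_time DESC;

-- Limit instance-wide MAXDOP (4 is just an example value)
EXEC sp_configure 'show advanced options', 1; RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 4; RECONFIGURE;
```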
| Mind |
I haven't yet managed to restore the database on a test server and look into it.
The query from the chart apparently suffered from CPU load created by other queries rather than causing it itself.
If the problem recurs on the production server, I'll limit MAXDOP until it's clear how much to leave.
I stumbled on an old topic about the same server, where query optimization without globally rewriting the queries was discussed; the use of hints that change the optimizer's logic and the execution plan came up there. But those reportedly get invalidated by a regular statistics update. Whether that happened in this case isn't known yet, but it could also cause plan changes.