In RockSolid we have tried for every feature we have added, to take a step back and re-evaluate what is truly best practice rather than just accepting current industry standards for our processes. One such area is that of database index maintenance, and in this article I will explain our approach to index management.
Typical Database Index Maintenance Process
First let me talk about the most common method of maintaining database indexes for a SQL Server database (true for almost all SQL Server databases). DBAs are aware that index maintenance is an important process for improving performance as it:
- reduces fragmentation which improves scan performance
- as it adds free space (fill) into the index which avoids page splits improving insert/update performance
And the most common method for running index maintenance is to schedule a chunk of time on the weekend and effectively have an outage while the maintenance process runs through all indexes on an instance sequentially carrying out a “rebuild” (which is effectively a recreation of the index). While this is common, but is it best practice? There are a number of issues that start to pop up especially when database maintenance windows are decreasing and the requirement to have the database online and operation at specific times increases. These issues include:
- Scheduling index maintenance for set periods and preventing index maintenance crossing maintenance window boundaries.
- Effective and efficient index maintenance, that is using the allocated maintenance time in a way to get the highest benefit for each database.
- Optimising index structures during rebuild to reduce future fragmentation or to enhance scan performance
- Minimising capacity issues during index maintenance
Issue: Maintenance Boundaries
The first issue is how long should the database maintenance window be for? How long can you tell the application users, the web site manager, the developer and so on that the database will be “busy” for while performing maintenance? What time can they start their batch process, their invoice run, their data entry? Sure a DBA can calculate the time required to carry out maintenance for a few databases, and recalculate these maintenance times routinely as the databases grow, but as many enterprise sites exceed 1000 databases, they cannot practically calculate exact window requirements for all databases organisation wide. So what happens is they allocate a catch all timeframe, half a day, all day Sunday, 10 hours etc which is sufficient enough to cover all requirements. Regardless if a database is being maintained for 10 mins, 30 mins or 2 hours, it is effectively considered under maintenance and unavailable for the duration of the generic window.
Issue: Effective use of Maintenance Time
The second issue is how effective and efficient is the index maintenance process being? Half a day on a Sunday to rebuild all database indexes is useful, but do all the indexes in a database need to be rebuilt weekly? Or, are there indexes that should be rebuilt much more frequently? Again something that can be easily determined for 1, 2 or 10 databases, not so easily accurately determined for hundreds or thousands of databases.
Issue: Effective Index Structuring
The third issue is what options should be applied when rebuilding a given index? What is the optimal fill factor? Should sorting take place in memory or tempdb? Should online or offline rebuilds take place? When we are talking hundreds or thousands of databases, we are usually talking tens of thousands, or hundreds of thousands of indexes. Again, not a practical question for a DBA to answer using traditional methods.
Issue: Maintenance Capacity Management
Rebuilding indexes can use a large amount of transaction log space for each index. DBA's typically schedule transaction log backups are a regular frequency to manage the size of log growth. However in high performing systems with larger databases, many index rebuild operations can occur within the delays between transaction log backups. This can mean that a database transaction log may grow significantly in these periods and a common issue is that transaction log space is often exhausted during maintenance. A resulting side issue is that log disks become highly utilised, even log the logs themselves will eventually be cleared.
RockSolid Index Maintenance Process
With RockSolid you do not need to allocate large random chunks of time for the purpose of rebuilding indexes and adjust all business and application requirements around that maintenance. Instead you tell RockSolid what time you have available for carrying out index maintenance, regardless of how long. Maybe you have 1 hour here on a Tuesday, and 30 minutes here between batch jobs on a Friday, and 15 minutes here between processes on a Sunday. Tell RockSolid what time you have available, and also tell it if that time is considered an outage (so no application processes are expected to run) or if the database should remain online during the window allocation (online maintenance).
Solution: Maintenance Boundaries
The RockSolid maintenance jobs automatically monitor their own performance and "work rate" to make adjustments about what indexes can be rebuilt within the allocated window. RockSolid will not attempt to rebuild indexes that it does not calculate can be completed within the window. Secondly the RockSolid jobs are aware of the maintenance boundaries allocated and will self-terminate if those boundaries are exceed. Finally for each database an external monitoring process also is watching over maintenance processes and these will automatically terminate any maintenance session running outside the allocated boundaries. There is no specific configuration required by the admin team to enable these protections other than defining the maintenance windows themselves in the RockSolid GUI.
Solution: Effective use of Maintenance Time
When RockSolid is given a maintenance window to use, at run time it carries out an analyses of a number of factors to determine which indexes will provide the most return in terms of performance, given the time allocated to the maintenance process. RockSolid evaluates factors such as fragmentation, index usage, index size and of course the time allocated to determine what is the best “bang for your buck”. The time allocated is important because RockSolid will never exceed the boundaries of a maintenance window, for any type of maintenance process that is managed by RockSolid.
RockSolid also provides feedback on the windows you have allocated to the database for maintenance. Are the windows too large/too many? Are their too few maintenance windows? A recommendation may be that you will see an estimated n% improvement in performance by adding a 30 minute daily maintenance window as apposed to a once off weekly window, for example. RockSolid pro-actively provides this analysis and feedback, and is constantly re-evaluating the maintenance requirements for the lifetime of a database.
Solution: Effective Index Structuring
RockSolid looks at all factors necessary to decide the most appropriate use of Fill Factors (index usage, read/write ratios, time to next maintenance window etc) and rebuild options (sort in tempdb, online/offline based on the window type) to ensure every index is rebuilt using an optimal strategy, not only for the rebuild itself but for the performance of that index for it’s intended purpose following the rebuild.
Solution: Maintenance Capacity Management
The RockSolid index maintenance process has logic where it can selectively run the database transaction log backups more frequently during index maintenance. Whenever rebuilding large indexes which will cause a significant increase in log size, RockSolid will kick of extra transaction log backups. These extra log backups proactively remove the committed rebuild operations from the logs allowing new rebuilds to occupy that log space without necessarily requiring the log to grow.
As an extra protection, if disk space is becoming low and log backup space is exhausted then RockSolid will terminate the database index maintenance process. This logic attempts to avoid the worst possible situation, the log backup disk is full and the database transaction log is full with no space available to clear it via backup - this commonly would result in a database outage.
This approach is redefining how database maintenance is performed, resulting in improvements in availability of the database, but also resulting in improved performance of the database through more effective index management.