Improved Access Path Management in DB2
IBM DB2 V12 provides dynamic SQL plan stability
6/28/2017 10:03:40 AM |
By Gareth Z. Jones
This article is the fourth in a series about the enhancements delivered in IBM DB2 12, and focuses on new and improved features that make access path management easier, more flexible and more powerful. Read part one, part two and part three.
Plan stability, the ability to stabilize access paths (i.e., access plans), was delivered for static SQL in DB2 9 and continued to evolve in DB2 10 and DB2 11. DB2 9 allowed customers to revert to a previous access path after a BIND REPLACE or REBIND operation, typically because the new access path performed worse than the old one. DB2 10 complemented this with the ability to ask the DB2 Optimizer to reuse a previous access path when rebuilding a package using the REBIND or BIND ACTION(REPLACE) commands. This meant that DB2 built the package using the old access paths but with new optimized runtime structures. DB2 11 provided usability enhancements and access path reuse capability at the statement level for both static and dynamic SQL. Now, DB2 12 has taken another significant stride forward with several new or improved features:
Dynamic SQL Plan Stability
- Dynamic SQL plan stability
- Incremental enhancements to improve the usability of static plan stability
- Preservation of the local dynamic statement cache (DSC) at rollback
- Integration of RUNSTATS and the optimizer, which will provide automatic updates of statistics profiles
- Improved support for statistics profiles, including an automatic update driven by CREATE/ALTER INDEX DDL and usage of statistics profiles by inline statistics collections
For many customers, unstable performance of repeating dynamic SQL statements has been an ongoing problem. Dynamic SQL is prepared (i.e.,the access path is recalculated) at every Global DSC miss and requires an expensive full prepare. This full prepare can require tens of thousands to hundreds of thousands of CPU instructions. If statements are regularly purged from the Global DSC, they can be reoptimized several times a day. To add to the unwanted CPU cost of full prepares, environmental changes can also result in access path changes when a dynamic SQL is re-prepared, possibly leading to performance regression. This includes changes introduced by:
- RUNSTATS updates to the DB2 catalog, reflecting changes in data volumes and column value distribution
- Software maintenance affecting optimizer choices
- Migration to a new release of DB2
- ZPARM (system parameter) changes
- Database schema changes
Avoiding unwanted access path changes for dynamic SQL can be difficult. Prior to DB2 12, static SQL had several advantages. The access path is established and locked in at BIND time, and, with a few exceptions, isn’t recalculated until there is an explicit BIND REPLACE or REBIND operation. We’ve already discussed static plan management features introduced in DB2 9 and DB2 10, all of which provided advanced plan management functions. DB2 12 makes some of the static SQL advantages available to repeating dynamic SQL workloads by providing the infrastructure to save, reuse, monitor and manage dynamic SQL runtime structures. It stores dynamic SQL statements and their associated Global DSC structures in the DB2 catalog so they can be reloaded into the Global DSC from the catalog on a Global DSC miss. They are then reused on subsequent executions of the statement. This provides access path stability along with consistency and predictability, not just in a single member but across several or all members of a data-sharing group. DB2 12’s new capability to maintain access path stability is also unaffected by events such as DB2 stop and restart, RUNSTATS, DB2 maintenance and release migration.
Control Mechanisms for SQL Plan Stability
To make dynamic SQL plan stability usable, DB2 12 provides a number of management and control mechanisms.
New ZPARM CACHEDYN_STABILIZATION controls whether individual dynamic SQL statements or groups of statements can be captured for stabilization and saved in the DB2 catalog for loading into the Global DSC when there’s a miss in the Global DSC.
New commands can be used to perform a one-time capture of qualifying statements—or logical statement groups—cached in the Global DSC. These commands can also monitor the Global DSC to capture statements that qualify based on user-defined criteria. That is, capture can be performed with or without monitoring. Statements can qualify based on a statement ID, a current SQLID, the number of statement executions or a combination of these. In addition, customers can set a global variable to drive stabilization of dynamic SQL for specific applications. It’s important to note that when an application issues a prepare for a statement, DB2 checks to see if that statement matches with a statement already in the cache. If not, it checks to see if it matches a statement stabilised in the catalog. If it passes either of these matching tests, then the run-time structures can be reused. The matching criteria include the SQL statement text, the AUTHID used, and the APPLCOMPAT (application compatibility) setting.
It’s possible to remove statements from stabilization control with new options on the FREE command (i.e., you can remove them from the DB2 catalog and from the Global DSC). This can be done for an individual stabilized statement or for a stabilized statement group. When this is done, not only is the statement or statement group removed from the DB2 catalog, but it also becomes invalidated in the DSC.
The LASTUSED stabilized dynamic SQL statistic makes it easier to identify statements that should be removed from stabilization control. It can identify statements that haven’t been executed for certain period of time and provides a list of candidate statements for removal from stabilization control.
DB2 12 retains full statement invalidation support. Database schema changes and authorization changes can invalidate stabilized dynamic SQL statements in the same way that such changes can invalidate static SQL packages. To allow you to identify and correct unwanted access path changes caused by statement invalidation, DB2 12 has extended the EXPLAIN statement to support stabilized queries. This provides the capability to explain both the saved current copy and the saved invalid copy. See the IBM Knowledge Center
for the full syntax.
DB2 12 provides instrumentation so that you can identify which statements are using which stabilized runtime structures. The statement hash is a value derived from the SQL statement text itself, and is externalized to new column QUERY_HASH of the DSN_STATEMENT_CACHE table You can use the hash as a stable identifier to monitor statements over time. In addition to the new instrumentation for statement tracking, the DB2 statistics trace is enhanced to provide information about the cache and catalog hit ratios.
There are some key limitations in this initial implementation of dynamic SQL plan stability. Statements prepared with CONCENTRATE STATEMENTS WITH LITERALS, and statements accessing temporal objects are not supported for stabilization. In addition, none of the plan management features such as REBIND SWITCH or APREUSE/APCOMPARE are available yet.
Static Plan Stability: Usability
DB2 9 introduced plan management to retain copies of the original, previous and current versions of a package. Prior to DB2 12, only two options were available when freeing packages; customers could either free all copies of a package or free the inactive copies (original and previous). FREE PACKAGE was enhanced with DB2 12, allowing users to selectively free either the previous or original copies individually, both of the inactive copies or all copies. This feature lets customers easily remove, for example, a stale original copy without removing an up-to-date previous copy. DB2 12 also makes it possible to free only invalid copies after schema changes or release migration invalidates one or more of the copies.
As a usability enhancement, you can now free invalid package copies when an application using the current copy of the packages is running. This is because the exclusive package lock held by the running application no longer extends to the invalid copies.
Earlier, we discussed access path reuse. With DB2 10 came the ability to influence the DB2 optimizer to reuse the previous access path at REBIND time with the APREUSE(ERROR) option. This proved to be very successful, and with customers requesting further enhancements DB2 11 provided customers with even more flexibility, delivering the APREUSE(WARN) option. DB2 12 took this a step further by providing a new option for REBIND PACKAGE called APREUSESOURCE. This allows customers to explicitly specify whether the current, previous or original copy should be used as the access path source for APREUSE.
As well as enhancing usability, this feature addresses situations where invalid package copies accidentally become current. This could occur, for example, in the following scenario:
- A DROP and CREATE of an index causes the current package copy to become invalid
- When the package is automatically rebound by DB2, it picks up a bad access path
- The previous and original versions are also invalid, but at least one of them contains the desired access path
- REBIND SWITCH to one of these results in an invalid package becoming current, with the risk of automatic rebind before REBIND with APREUSE can be executed
REBIND with APREUSESOURCE avoids the two-step process for restoring a previously valid access path and eliminates the risk of an invalid package becoming the current package.
Preserving the Dynamic Statement Cache at Rollback
At some customer sites, application programmers use ROLLBACK instead of COMMIT at the end of a read-only unit of work. This causes the statement to be purged from the local DSC and drives an expensive full prepare or less expensive short prepare. It can also pollute DB2 accounting and statistics metrics. When the application issues a rollback, DB2 12 automatically preserves the access path in the dynamic statement cache instead of invalidating it. This eliminates the unwanted CPU cost of a prepare.
RUNSTATS Enhancements for SQL Performance
DB2 12 makes a number of improvements to RUNSTATS for improved access path selection. It enhances the CLUSTERRATIO formula to better handle the impact that unclustered inserts have on the clustering index along with the impact of the space search algorithm on inserts into table spaces with larger segment sizes. This ensures that CLUSTERRATIO better reflects dynamic prefetch behavior.
Prior to DB2 12, RUNSTATS (and other utilities such as REORG and LOAD configured to collect inline statistics) invalidated dynamic SQL statements in the global dynamic statement cache that were dependent on the target object. For many customers, this could lead to unwanted access path changes, so DB2 12 added an INVALIDATECACHE option to RUNSTATS with a default of NO. This prevents RUNSTATS from invalidating the cache unless the parameter is set to YES. Note that this is a change in the default behavior of RUNSTATS from previous releases, so if you want to ensure that statements are invalidated you should specify INVALIDATECACHE YES.
However, RUNSTATS UPDATE(NONE) REPORT(NO) will continue to invalidate the cache so that existing cache invalidation jobs aren’t impacted. For some utilities, statement invalidation will only occur if the object is in a restricted state before the utility began—such as rebuild pending or reorg pending states. Other utilities (e.g., LOAD, REORG) do not provide an option to control dynamic statement cache invalidation.
In addition, statistics profiles are now supported by inline statistics so that they can be used whenever RUNSTATS are collected via the REORG and LOAD REPLACE utilities. Additionally, index changes via SQL DDL will update the profile and ensure that dropped indexes are removed from the profile and new indexes are added.
DB2 12 provides an automated value for COUNT for FREQVAL used for the collection of statistics on skewed column values. Knowing what number to specify for FREQVAL COUNT n has always been data dependent and therefore difficult, with the number not only varying from table to table but also for a single table over time. In DB2 12, specifying ‘FREQVAL’ without ‘COUNT n’ will result in DB2 automatically determining the appropriate number of frequently occurring column values to collect. The objective is to allow DB2 to collect up to the top 100 most skewed values. If no skew can be detected for the remaining values, DB2 stops collecting frequency statistics for that column.
More broadly, determining which statistics to collect has often been a challenge for customers especially when large numbers of objects have to be managed. To address this, the DB2 11 optimizer externalized information about missing and conflicting statistics into catalog and explain tables. DB2 12 takes this further so that the optimizer automatically updates the statistics profile for an object with RUNSTATS recommendations. Collection of statistics with the USE PROFILE option, be it through the RUNSTATS utility or the REORG and LOAD REPLACE utilities with in-line statistics, will automatically include all statistics recommended by the optimizer.
This new feature is known as enhanced statistics profile management. As shown in Figure 1, when calculating the access path for a given query, DB2 will update the profile in the catalog to ensure that missing statistics are collected when using a statistics profile.
Figure 1: Collecting Statistics
The DB2 optimizer will update the statistics profile to ensure that missing or conflicting statistics are corrected by RUNSTATS. This is controlled by new ZPARM STATFDBK_PROFILE. In addition, CREATE and DROP INDEX will also update the profile.
A final noteworthy point about enhanced statistics profile management is that the DB2-supplied stored procedure DSNACCOX will always recommend RUNSTATS whenever a profile has been updated. To ensure that you collect current statistics recommendations, specify USE PROFILE on RUNSTATS.
Gareth Z. Jones has worked in IT since 1985. Until 2000, he was an IBM client with experience as a systems programmer and DBA. He now works in DB2 for z/OS development as a member of the SWAT Team, which is led by John Campbell. He has worked with many customers around the world to help them succeed in their use of DB2. Gareth has written several technical papers and presented at many conferences and group meetings. He can be contacted via email at email@example.com.