Chapter 11: Performance Tuning

Optimizing Query Processing

The Query Optimizer uses advanced performance optimizing techniques when processing queries, especially when appropriate indexes are available. Most of the query processing algorithms described in this section are automatic; they do not require user intervention. However, understanding how the various processing algorithms use indexes will help you to decide when indexes are needed as well as help you construct more efficient queries. A list of query guidelines is provided at the end of this section.

Indexes and Performance

The XDB Server uses B-tree indexes to speed retrieval. Like a book index, a database index reduces the amount of data that must be searched to find an answer. The XDB Server performs automatic query optimization to select the best index to use for a query.

In general, the fewer records accessed, the faster the retrieval time. The XDB Server uses available indexes whenever they help to speed retrieval. But, while indexes generally speed up retrieval, they can slow down updates, inserts and deletes.

Usually, it is faster to execute a query using one or more indexes than to examine every record in the tables involved in the query. The XDB query optimization algorithms examine the indexes available and pick which indexes, if any, to use for a particular query. The user does not need to specify which indexes to use.

The query optimization algorithms analyze the selectivity of a column to determine how to process a query. Selectivity of a column is determined by calculating how unique the values in the column are. The more unique the values, the more likely the column will be a candidate for index use. Selectivity is determined by the formula, 1/x, where x is the number of distinct values in a column or index. The larger the number of distinct values, the more unique the column is and the lower the selectivity is. This increases the probability that the optimizer will use the index.

If a command does not contain a WHERE, ORDER BY, HAVING, or GROUP BY clause, the server generally will not perform an index search; instead, it processes the table sequentially. Some aggregate functions can cause an index to be used in certain cases. Usually, if a column in the WHERE clause is indexed, the index is a candidate for use. Other factors, such as the number of distinct values in an index, determine final index selection.

Overriding the Optimizer

You can force the use of an index for a particular query, or force the system to not use an index.

To force the use of an index, enter an asterisk (*) in parentheses after the name of the indexed column when you construct the query. To force the system to ignore the index, use enter a hyphen (-) in parentheses after the name of the indexed column when you construct the query.

For example, assuming that an index exists on C_NO:

This query uses the C_NO index.

SELECT c_no, o_no, o_date 
    FROM orders 
    WHERE c_no(*)<20

This query does not use the index.

SELECT c_no, o_no, o_date 
    FROM orders 
    WHERE c_no(-)<20

The (*) and (-) options are valid only for column names that appear in a WHERE clause. You cannot use the (*) and (-) options in GROUP BY, HAVING or ORDER BY clauses.

Retrieval Speed vs. Update Speed

There is a trade off between retrieval speed and update speed when indexes are used, since each time a value in an indexed column is changed, the system must also update the index. Overall database performance can actually deteriorate if indexes are built on columns that are seldom searched but are frequently updated. You might want to build a number of indexes on a read-only table, but limit a frequently updated table to a minimal set of indexes.

When importing data into a table, it is significantly faster to create indexes after the data has been inserted. Prior to importing, you should drop all indexes that are not required to enforce uniqueness. Use a batch SQL file or procedure to re-create indexes after importing.

List Prefetch

List Prefetch is an Query Optimizer routine that extracts the Record IDs (RIDs) that match a specific search criteria from a particular index. These RIDs are sorted in ascending order, thereby enabling sequential table access. By accessing the table data pages in a sequential fashion, disk head seek time can be drastically reduced and data page prefetching performed. The server considers list prefetch when table data page access is required, when a multiple index access condition is detected, or if data needs to be retrieved in physical order.

Note: List prefetch can be affected by the size of the RID Pool buffer. See the section Improving Production Processing for more information.

Aggregate Clause Processing

When all aggregate functions in a query are on the same indexed field, the system will determine the result from the index rather than by using the slower process of accessing the base table. For example, given the following query, the server does not need to look at the table file if an index exists on PAYRATE; the index alone is sufficient to give the result.

SELECT MAX(payrate), MIN(payrate), AVG(payrate) 
    FROM employee

Likewise, any COUNT aggregate using a WHERE clause that can be resolved via an index will not access the base table. For example, if an index exists on columns A and B in TABLE1 and the following query is issued:

SELECT COUNT(*) FROM table1 
    WHERE a=2 AND b=3;

the information will be available via the index. The base table (TABLE1) is not accessed.

Predicates

In the case of a simple predicate involving an indexed column, the XDB Server will generally use the index to speed retrieval. For example, given an index on ZIP, that index will be used to process the following query:

SELECT * FROM employee 
    WHERE zip = "20441"

Expressions

If an indexed column appears in a query only as a part of an expression that alters the column, the index will not be used. For example, given the following query, an index on the PAYRATE column will not be used, even if it exists:

SELECT * FROM employee 
    WHERE payrate * 1.10 > 13.00

Likewise, the index will not be used in the following query:

SELECT * FROM employee 
    WHERE payrate > 13.00/1.10

Usually, it is best to perform the constant calculations prior to executing the query.

LIKE Predicate

When constructing queries containing a LIKE predicate that involves an indexed column, try to avoid pattern matching strings that begin with percent (%) or underbar (_). If the string in a LIKE predicate begins with either of these two characters, it automatically indicates that a sequential search must be performed on the entire index. For example, given an index on LNAME, the query:

SELECT * FROM employee 
    WHERE lname LIKE "_risp%"

might take significantly longer than the query:

SELECT * FROM employee
    WHERE lname LIKE "Krisp%"
      OR lname LIKE "Crisp%"

In the first case, the entire index must be searched for any names that have "risp" as the second through fifth letter. This is virtually the same as having no index at all. In the second case, the search is for two specific starting strings; these are more quickly found because the search does not have to go through the entire index.

Index Usage with AND and OR

In the case of compound predicates connected with AND, the server will use all available indexes to find the intersection. For example, given indexes on columns ZIP and DEPT, both indexes might be used to process the following query:

SELECT * FROM employee 
    WHERE zip = "20441" 
      AND dept = "2020" 
      AND payrate < 12.00

If an index had existed on PAYRATE, it would be used, but its absence does not prevent the use of the other indexes.

In the case of compound predicates connected with OR, the server uses an index, only if the two conditions reference the same column (and that column has an index). For example:

This query uses the index on the DEPT column:

SELECT * FROM employee
    WHERE dept = "2020"
       OR dept = "2021"

This query does not use the index on the DEPT or PAYRATE columns:

SELECT * FROM employee 
    WHERE dept = "2020" 
       OR payrate = 12.00

Nested Queries

While algorithms exist on some systems to convert any nested query into a flat join, the benefit of such algorithms is not always significant. Therefore, the server does not perform the conversion. Instead, the server continually makes use of feedback from intermediate results to avoid unnecessary searches. If the server methodology is not the most efficient for your data, you can restructure the query as a flat join.

When processing a nested query, the server first evaluates the inner query (subquery). Consider the following query:

SELECT * FROM dept 
    WHERE mgr IN 
        SELECT e_no FROM employee 
            WHERE city = "Wild Woods"

The XDB Server first finds all employees who live in Wild Woods, and modifies the nested query to process the outer query. If an index exists on MGR, it might be used after the nested query has been processed.

Correlated Queries

When processing a correlated query, the outer query is processed first. The server compares the previous value to the current value to see if the inner query needs to be executed again. For example, given the following query to find the supplier who supplies the largest quantity of each product:

SELECT sno, pno FROM partsupp p
    WHERE qty = 
        SELECT MAX(quantity) FROM items 
            WHERE pno = p.pnoORDER BY pno

For each record in the PARTSUPP table, the inner query is evaluated to find the maximum quantity. If the next PARTSUPP record is for the same product, the maximum quantity is already known, and therefore the subquery is not performed. If PNO is indexed, then adding the ORDER BY PNO clause makes the query faster because the records with the same PNO values are all grouped together.

A correlated query whose correlated query block always evaluates to false is detected and does not continue to execute. If the correlation condition is not the condition that causes the query block to be false, the query block is always false.

For example, in this query:

SELECT * FROM table1 x 
    WHERE EXISTS 
        SELECT * FROM table2 y,table3 z 
            WHERE y.f1 = z.f1 
              AND x.f2 = z.f2 
              AND y.f3 > 10;

if the inner SELECT does not retrieve any records without evaluating the correlation condition (X.F2=Z.F2), it can be determined that the query will not return any records; no records match the non-correlated criteria.

If only one correlation condition exists, the previous value of the correlation will unconditionally be saved for comparison with the next correlated value. (In previous versions of the XDB Server, this was only saved and compared if the correlated condition was not embedded in a NOT, AND or OR condition.)

Join Optimization

The XDB Server uses a number of optimization techniques for processing joins. When performing a multitable join, performance might improve significantly if you specify all possible join conditions; this allows the system to perform the joins with the optimal selectivity ratio first, thus reducing the number of comparisons required.

For example, imagine three tables: EMPLOYEE, DEPARTMENT and PROJECT. Every employee works in a department. Every project is the responsibility of a department. Every project is managed by an employee from the responsible department.

Thus there are three possible join conditions that can be used to join the three tables:

DEPARTMENT.DEPT_ID=EMPLOYEE.DEPT_ID
DEPARTMENT.DEPT_ID=PROJECT.RESPONSIBLE_DEPT_ID
EMPLOYEE.EMPLOYEE_ID=PROJECT.PROJECT_MANAGER_ID

While it is sufficient to provide any two of these join conditions in your query, you might experience significant performance improvement if you provide all three join conditions.

As a general rule, when joining n tables, you should provide at least n-1 join conditions. When the context of a query permits, optimal performance might be achieved by providing any additional, valid join conditions. However the use of redundant join conditions must be balanced against the readability and parsing cost of the query.

Sorting

When records are to be retrieved in a sorted order, indexes will speed up sorting.

If no appropriate indexes exist, qualifying records are located by performing a sequential search on the table. The resulting subset of records is sorted. The efficiency of the sort operation is affected by the Sort Space Size setting specified in the Configuration Utility.

If an index exists, a sequential search of the index (index walk) will be performed in some cases. For example, if an index exists on the PNO column, an index walk is performed to process the query:

SELECT * FROM part
    ORDER BY pno

In the following query, the index walk algorithm will make use of a compound index on the SNO and PNO columns:

SELECT * FROM partsupp
    ORDER BY sno, pno

When possible, the server uses the compound index if the first element in the compound index is referenced in a predicate or ORDER BY clause, even if the other columns are not referenced. For example, if the ORDER BY clause in the query above included only the SNO column, the compound index would be used. If the ORDER BY clause included only the PNO column, the index would not be used.

In the case of queries involving predicates and ORDER BY clauses, if appropriate indexes exist, the server will use the index to find the qualifying records, and then sort just those records. For example, given the following query, with indexes on PNO and QTY, the server uses the index on QTY to find the records that qualify, and then orders the result. The index on PNO is not currently used.

SELECT * FROM partsupp 
    WHERE qty >= 200 
    ORDER BY pno

IN LIST with a Range Condition

If an IN LIST exists in conjunction with a range condition, a compound index can be used. In the following example, there is an index on columns A, B and C in TABLE1:

SELECT * FROM table1 
    WHERE a = 3 
      AND b IN (6,7) 
      AND c BETWEEN 2 AND 20;

Joins with Non-Join Predicates

Joins will take advantage of additional non-join predicates when compound indexes exist that involve both the join condition and the predicate.

For example, a compound index exists on TABLE2 (A,B,C):

SELECT * FROM table1,table2 
    WHERE table1.a = table2.a 
      AND table2.b = 3 
      AND table2.c BETWEEN 1/1/1992 AND 12/31/1992;

If TABLE2 is an inner table of the join, the additional constraints will be applied when the join condition is applied.

Join Restrictive List Analysis

If a restrictive list exists and has been prefetched on the inner table of a join, it might be discarded if the size of the list is larger than a certain percentage of the table size. If the restrictive list is discarded, the join index will be used (no merging is performed).

Query Performance Summary

The guidelines listed below summarize the ways you can improve query performance.

If nested or correlated queries do not seem to process efficiently, restructure the query as a flat join to see if that method improves processing speed.
Use a correlated query instead of a nested query if the correlated column is indexed and the nested query returns a large amount of data.
Try to index the columns that have the lowest selectivity values (those with the largest number of different values), and construct your queries to include predicates involving these columns.
Try to avoid OR conditions on different columns. Use a union of queries containing AND conditions instead.
When creating compound indexes, try to list columns in the same order they appear in the ORDER BY clauses of your queries to allow the compound index to be used to speed up sorts.
Avoid numeric conversions. Try to make comparisons between columns of the same type.
Avoid the use of LIKE predicates beginning with % or __.
Avoid arithmetic expressions in a predicate, especially those that evaluate to a constant. Perform the calculation beforehand, if possible.
Select only the columns you need. Avoid SELECT * FROM when possible, especially when retrieving data across a network. Unneeded data creates unnecessary network traffic.
Avoid using COUNT(*) to check for existence.
Evaluate the benefits of single vs. compound indexes. Use compound index ordering if suitable. If possible, order the columns in the compound index by highest selectivity (that is, order the columns with the most unique values first).
Provide redundant information when joining tables, so that the system can select the lowest cost solution for you.
If a table is frequently updated or if most searches on the table are sequential searches (every record is accessed), avoid building any indexes on the table except for those used to enforce uniqueness.
Use views only when absolutely necessary. Each view access requires additional I/O and processing costs.

Improving Production Processing

This section describes options that can help improve overall processing speed when using the XDB Server.

Integrity Constraints and Processing Speed

The XDB Server supports schema level domain and referential integrity constraints. While integrity checking is a valuable facility, it can impact database performance considerably, especially on UPDATE, INSERT and DELETE operations.

You must determine whether the advantages of schema level integrity constraints outweigh the additional time required to enforce them. You might want to eliminate some of these checks at the schema level, and instead perform them through the validation facilities of the front-end data entry application. Remember that in a LAN environment use of integrity features creates additional network traffic.

Note that some indexes are automatically created and maintained by the system when integrity constraints are imposed on the database. These indexes can be dropped only by dropping the associated constraints.

For more information about referential constraints, see the following in the SQL Reference:

PRIMARY KEY, FOREIGN KEY and REFERENCES options on the CREATE or ALTER TABLE command.
WITH CHECK OPTION clause on the CREATE VIEW command.

Locking Options

In multiuser systems, the time spent waiting for locks can be considerable. The XDB Server offers three means of adjusting the effect of locks on performance.

Isolation Level.
Adjust the isolation level to increase or decrease the length of time that locks (especially shared or read-only locks) are held. The sooner a lock is released, the sooner the resource can be locked by another transaction. The trade-off for increased concurrency is decreased consistency.

Each user can set a default isolation level in the client configuration. To specify the isolation level during a session, use the SET ISOLATION command.

On a single-workstation XDB Server that will be operating only one client session at a time, using the Exclusive Use (EU) isolation level will eliminate locking overhead.
Lock Granularity.
The time to perform many batch operations that affect many records of a table can often be greatly decreased by acquiring a table lock. Table locks are acquired by issuing the LOCK TABLE command. (See the SQL Reference for more information on the LOCK TABLE command.) Performance improves because there is no need to lock individual records as they are accessed. However, in some cases it will take a considerable amount of time to acquire a table lock. Also, other processes that need to access the locked table must wait until the table lock is released.
Timeout Period.
Users can adjust the amount of time the system tries to acquire a lock. If the time is too short, performance might deteriorate because transactions cannot acquire locks and must be continuously rolled back. On the other hand, if the time is too long, performance might deteriorate because it takes longer to resolve potential deadlocks.

Users can set their own timeout periods by setting the Lock Request Timeout in the client configuration.

Caching

Caching is available for all database objects, including tables, indexes and data dictionary files. The cache size is specified at the server level using the Caching Options screen of the XDB Server Configuration Utility. Cache memory usage and other related information can be monitored on the XDB Cache Statistics screen while the server is running.

The major benefit of using cache is to keep frequently accessed disk blocks in RAM to ensure that subsequent access to these blocks requires no additional disk I/O. The buffer manager uses a searching algorithm to determine if a block exists in cache before deciding to retrieve that block from disk. If a block does not exist in the cache, this searching time can be classified as additional overhead because the search and the disk I/O must be performed.

The buffer manager also has a tossing algorithm. If the cache is full, some current cache blocks must be discarded before new blocks can be added. The determination of which blocks to discard is based upon a modified clock algorithm.

The cache manager can provide a performance boost only if the reduction in I/Os outweighs the CPU cost required to search and toss. The more objects there are to cache and the more random the access, the less useful a cache manager becomes.

For example, consider a database that is significantly larger than available RAM, with a large number of indexes (to optimize query processing) and data access patterns that are mostly random. All those indexes are eligible for caching, but the random access nature of the queries means that there will probably be a high percentage of cache tossing and low percentage of cache hits. This would indicate that each query was accessing either different I/O disk blocks or I/O blocks that had previously been tossed from the cache. In this situation, large cache sizes would have limited use.

If your system shows a high percentage of tossing and low percentage of cache hits, you should choose a cache size that would not assume I/O overlap between the query set, but would allow I/O reduction for an individual query. In this fashion, each individual query could benefit from common disk I/O access, while reducing the amount of CPU time required by the cache manager to search and toss.

It is important to check the cache hits and misses when all or most of your users are actively accessing data. Generally, the cache hit ratio increases as the number of users grows, especially when those users are performing similar tasks.

Special Read-Only Caching

Outside of the normal cache manager, the server provides additional buffer management techniques specific to read-only data. These techniques are available when the user isolation level is set to dirty read (DR), snapshot read (SR) or exclusive use (EU).

In this cases, the server provides buffered I/O at the cursor level.

If sequential access of a database object is detected within an individual cursor, sequential prefetch of data can be triggered, allowing multiple data pages to be read into a buffer controlled by that individual cursor. This method of buffer management enjoys the benefits of reduced I/O without the normal cache manager overhead (search and toss), and eliminates the need for critical sections.

Record ID (RID) Pool Buffer

The Query Optimizer makes use of a Record ID (RID) Pool Buffer when using the list prefetch algorithm for analyzing data access paths. See the section Optimizing Query Processing for more information on the list prefetch algorithm. List prefetch retrieves Record IDs from an index (in response to the constraints of a particular WHERE condition). The RID Pool Buffer is used to store the Record IDs. If required, the records are sorted before being stored in the RID Pool Buffer.

If the number of Record IDs fetched exceeds the available RID Pool size, the list prefetch is aborted and less efficient processing routines are used. For best performance, the size of the RID Pool Buffer should be large enough to contain the number of Record IDs expected from an index search.

The RID Pool size is specified in the XDBRIDPOOL environment variable in the server machine's configuration file. See XDBRIDPOOL in the appendix Environment Variables for more information on using this environment variable, and for a formula for calculating RID Pool Size.

Number of Open Files

The XDB Configuration Utility allows you to specify the Maximum Number of Open Files. This setting, which represents the maximum number of files that can be kept logically open, should include only the required number of files. When a file is opened in response to a user request, the system scans an internal array to see if the file is already open. If it is, then processing time is improved.

Number of Open Queries

The XDB Configuration Utility allows you to specify the Maximum Number of Open Queries. This setting, which represents the maximum number of queries that can be open at a given time, should include only the required number of queries. When an SQL statement is committed or rolled back, the system scans an internal array of open queries to find the one that should be closed. If the number specified for the Number of Open Queries is approximately the same as the number actually open at any given time, then processing time is usually improved. Specifying too large a number can increase processing time.

Governor Settings

Use the Governor to limit user activity on the XDB Server. The settings you enter for the Governor can affect performance. The more frequently the Governor checks a user's activities, the slower the XDB Server will run. There is a trade-off among frequency of checking, performance, and how long the Governor will wait before terminating a user session.

Every n seconds, the Governor checks the status of the various options against actual server activity. Because the monitoring is not constant, the limits you set could be slightly exceeded before the Governor ends the user session. For example, a user might actually make 104 engine calls before the Governor sees that the user was allowed only 100 engine calls. You can make the Governor more accurate by setting the frequency for checking higher (such as every 10 seconds as opposed to every 30 seconds). However, there is a performance penalty for more frequent checking. See the chapter Server Configuration Utility for information about setting the governor cycle time.

Miscellaneous

Some general guidelines that can improve processing speed include the following:

Close queries as soon as possible.
Commit transactions as soon as possible.
Change to dirty read (DR) when doing read-only operations where reading uncommitted changes are acceptable (some report writing, for example).
Balance work between the client CPU and the XDB Server CPU. For example, try to minimize ORDER BYs and aggregates. When writing COBOL programs, take advantage of the COBOL sort facility (which can process on the client's workstation) instead of using ORDER BY (which processes on the XDB Server).
To improve batch processing speed, consider turning off auto commit and forward logging if the process can be easily re-run.
When using embedded SQL in a program (any language), and you are executing many similar queries (same structure, but different values), use a PREPARE statement and multiple EXECUTE statements. For example, specify one PREPARE to set up the structure. Then EXECUTE repeatedly using different host variables for each.

Reducing Disk I/O Contention

The XDB Server allows a single database to span multiple physical devices, as long as no single object of the database spans more than one physical device. Thus, while no database object (table or index) can exceed the size of its physical storage device, objects can be spread among many devices to reduce I/O contention.

For example:

Table Placement.
Ideally, each table should be stored on a different disk drive so queries requiring access to multiple tables will not create I/O contention on one disk. If there are more tables than disk drives, tables can be arranged such that those least likely to be accessed in the same query would be grouped together.
Index Placement.
Generally, indexes should not be placed on the same disk drive as other information (tables or indexes) that will likely be accessed by a query that uses the index. Some indexes, however, can be placed with any other object without creating disk I/O contention.

If an index is used for list prefetch or an index contains coverage for a particular table (all columns referenced for a particular table are within the index), the index will not be accessed at the same time as the base table. See the section Optimizing Query Processing for more information on list prefetch. Thus, all indexes used for list prefetch can be placed on any physical device because these indexes are accessed once, and only once, during the processing of a particular query. Likewise, an index which contains full coverage and is not involved in a join will be the only object accessed in a particular query, and thus could be placed on any physical device as well.
Temporary File Placement.
Temporary files should be written to or read from a disk drive other than the one used to store data and indexes, or a drive containing data that is little used by most queries. Temporary files are created when the SQL engine performs a sort that overflows the configured Sort Space Size (as specified in the Configuration Utility). Operations that perform sorts include CREATE INDEX statements and SELECTS that involve DISTINCT, ORDER BY, UNION or GROUP BY.

The guiding principle in determining object location is to spread the workload evenly among the drive controllers and disk heads. If disk head movement can be kept to a minimum and drive controller usage evenly dispersed, optimal performance can be obtained. Table and index caching can improve performance even more. See the section Improving Production Processing for more information.

Given that the server stores database objects in separate files, tuning of database object locations can be done at any time. By simply copying the database object file to a different physical device and updating the stogroup or environment variable pathname for that particular database object, different space allocation strategies can be tested.

Paths for stogroups are maintained in the SYSIBM.SYSVOLUMES table. If a table space was created without specifying a stogroup, the table space's path will be that of the database (either ..\loc-path\dbname or the path of the stogroup that was used to create the database). For greatest flexibility, each table should be associated with its own stogroup (drive and path) when it is created.

Paths for indexes and other objects can be specified in environment variables in the server machine's configuration file. (See the chapter Environment Variables for more information.)

Guidelines on how to position tables, indexes and temporary files are provided on the following pages.

Positioning Tables

Each database table can be associated with a physical device (drive and path) by using a stogroup and table space concept similar to that of DB2.

A stogroup is defined and associated with a physical device (drive and path). A database is then created using a particular stogroup. By default, any table subsequently created within that database is placed physically on the device (drive and path) specified by the database's stogroup.

To distribute tables to various disk drives, table spaces and stogroups are created. Each table space is associated with a stogroup, which in turn is associated with a drive and path. When a table is created, a table space can be specified, thus physically placing the table on the drive and path that is associated with the named table space.

For example, the following CREATE commands will create two stogroups (named STOR1 and STOR2) on two separate disk volumes (named MINE and YOURS). The UPDATE commands will define the disk drive and directory for each. (The directories must be created ahead of time using the operating system MAKE DIRECTORY (MD) command.)

CREATE STOGROUP stor1 VOLUMES (mine) VCAT xdb;
CREATE STOGROUP stor2 VOLUMES (yours) VCAT xdb;
UPDATE SYSIBM.SYSVOLUMES 
    SET XdbPrimeVolPath = "y" , 
        XdbVolPath = "d:\stor1\" 
    WHERE Volid = "mine";
UPDATE SYSIBM.SYSVOLUMES   
    SET XdbPrimeVolPath = "y" , 
        XdbVolPath = "e:\stor2\" 
    WHERE Volid = "yours";

Note: The fields XdbPrimeVolPath and XdbVolPath are XDB Server extensions to the SYSIBM.SYSVOLUMES table. These columns do not appear in DB2.

To complete the example, you would use CREATE commands such as the following to specify storage locations for objects you create, in this case tables.

CREATE DATABASE xyz STOGROUP stor1;
CREATE TABLESPACE abc in DATABASE xyz 
    USING STOGROUP stor2;
CREATE TABLE firsttable 
    (fld1 SMALLINT, fld2 CHAR 1) 
    IN xyz;
CREATE TABLE secondtable 
    (fld1 SMALLINT, fld2 CHAR 1) 
    IN xyz.abc;

The table FIRSTTABLE will reside on d:\stor1 because it uses the stogroup defined for the database. The table SECONDTABLE will reside on e:\stor2 because it uses the stogroup defined for the table space. If desired, you can create a separate table space for each table, each with its own associated stogroup/path.

To move a table to a different device, copy the *.tab file to the desired drive and path, and update the path name for the associated stogroup in the SYSIBM.SYSVOLUMES table. If the index files (*.idx and *.ptr) are stored with the tables and do not have special paths defined for them in environment variables, be sure to copy them along with the related table. (See the section Positioning Indexes for more information about moving indexes.)

See the SQL Reference for more information about the storage architecture (including examples) and the commands shown here.

Positioning Indexes

An XDB index can be stored on a physical device different from the device where its related tables are stored by using an environment variable and an extension to the VCAT parameter of the CREATE INDEX command.

To store each index on its own path, use the XDB_index-name_ location-name environment variable to specify the path for storing each index. Then create each index using the VCAT NAME option on the CREATE INDEX command (using the index-name specified in the environment variable). If a qualifier is required for the index, you can enter a two-part name such as XDB_eric.orderidx_location1.
To store all indexes for a given location on the same path, use the XDBIDX_location-name environment variable to specify the path for storing all indexes for the location. Then create indexes for the location using the VCAT PATH option on the CREATE INDEX command.

For example:

Enter the environment variable XDB_index-name_location-name in the computer's configuration file to specify a drive and path. You can create one or more index paths, such as:
```
SET XDB_customerindex_location1= d:\indexes\customer
```
```
SET XDB_ordersindex_location1= d:\indexes\orders
```
1. Reboot.
2. Create the indexes with the VCAT NAME parameter, as follows:
```
CREATE INDEX customerindex ON customer(c_no) 
    USING VCAT NAME;
```
```
CREATE INDEX ordersindex ON orders(o_no) 
    USING VCAT NAME;
```
  The VCAT NAME option causes the XDB Server to check the environment variable for the drive and path to use when creating the index. XDB index files are placed on the path associated with the environment variable for that particular index.

To move an index to a different device, simply copy the *.idx and *.ptr files to the desired drive and path, and change the path specification in the XDB_index-name_location-name or the XDBIDX_location-name environment variable. If the index path is not specified in an environment variable, then the index files must be in the same path as the related table.

See the CREATE INDEX command in the SQL Reference for more information about VCAT NAME and VCAT PATH. See the chapter Environment Variables for more information about the index environment variables.

Positioning the Temporary Path

The temporary path, specified in the Configuration Utility, is used to store temporary files such as those created during a sort procedure that overflows the Sort Space Size (specified in the Configuration Utility). Since these temporary files are written and read in a sequential fashion, it is important to reduce I/O contention between a temporary file operation and a related database object operation.

For example, performance can be optimized during CREATE INDEX operations by having the database table read from one disk and its temporary files written and read on another.

It is important to remember that temporary files can grow to be as large as the maximum database object size. Ideally, space for temporary files should be reserved on a disk drive separate from tables and indexes.

Increasing Data Import Speed

The SQLWizard Import Utility provides interactive and batch capabilities for transferring data from various formats into tables, including the DSNTIAUL format that can be generated from mainframe DB2 data. The DSNTIAUL format stores data in its internal DB2 representation, thus requiring little or no conversion before insertion into the database.

One additional feature is available in the Import Utility, to speed the processing of import data even more, called Direct Write. This option bypasses all referential integrity features normally imposed by the SQL engine and eliminates all engine overhead that would be encountered during normal record insertions. Instead, information is written directly to the table file. This method of data loading achieves maximum performance.

See the SQLWizard User's Guide for more about the Import Utility and the Direct Write Import feature.

Improving Performance During Object Creation

XDB bufferpool caching is consolidated into a buffer management system. Instead of specifying separate cache sizes for table, index and dictionary buffers, a single unified cache size is specified for all accessed locations. The server also provides tuning options for improving the speed at which tables and indexes are created, as well as performance improvements that depend on the methods or options used during object creation. These include:

Specifying cache size
Using clustered indexes
Specifying the percentage of free space for indexes (read-only environments)
Specifying the blocking factor for duplicate Record IDs
Specifying sort buffer size

Buffer Management System

The buffer manager is designed to enhance performance. The buffer manager uses sophisticated search and toss algorithms and is tightly integrated with the log manager. You configure buffer manager memory as a single buffer as opposed to specifying memory amounts for each database object type. This single buffer manager handles access to tables, indexes and dictionary information, and simplifies configuration by allowing more intelligent cache use. Memory is allocated by the cache manager at server start up and released at server shut down.

Use the Configuration Utility to specify the cache size at the server level, and to set the checkpoint frequency parameter, which controls periodic purging of logging system cache. See the chapter Server Configuration Utility chapter for information.

Clustered Indexes

In read-only environments, or for any table that does not have frequent inserts, it might be beneficial to define a clustered index. (You can have only one clustered index per table.) Specifying the CLUSTER option on the CREATE INDEX command gives you the option to later reorganize (re-order) the table's records into the same sequence as the related clustered index. For example:

CREATE INDEX pnoindex ON part (pno) CLUSTER

To reorganize the table's records into the same sequence as the clustered index's Record IDs, you must use the SQL COMPACT command with the CLUSTER BY INDEX option. For example:

COMPACT TABLE part CLUSTER BY INDEX pnoindex

Determining how often a table needs to be reorganized depends on how often records are added (inserted). The server leaves records in place when an update is issued (maintaining the order of records), but always inserts new records at the end of a table. A table that has many inserts would therefore need a reorganization more often than one that has few inserts.

Warning: If an update is made to the clustering key, the order of records in the table may be compromised. Use COMPACT to correct this situation.

See the CREATE INDEX and COMPACT command descriptions in the SQL Reference for more information about clustered indexes.

Read-Only Environments

In a read-only environment, you can create all indexes with the parameter PCTFREE 0. This parameter allows all index blocks to be filled to capacity, thus reducing the size and number of index levels. This reduction in size allows faster index creation and searching.

Block Size for Duplicate Record IDs

A blocking factor for duplicate key Record IDs can be specified using the DUPQTY parameter on the CREATE INDEX command. For best performance, the DUPQTY parameter for each index should closely resemble the index's distribution of duplicates. Indexes with high duplicate content should have high DUPQTY blocking factors, while more unique indexes should have low DUPQTY blocking factors.

Sort Space Size

The XDB Server performs sorts in a variety of situations, such as the processing of ORDER BY, DISTINCT, UNION and other clauses that appear in SQL statements. Creating or rebuilding indexes, and re-clustering with the COMPACT command requires a significant amount of sorting.

To enhance sorting operations, you can set the amount of memory that the XDB Server will attempt to acquire for this purpose (use the XDB Configuration Utility). You might want to specify a higher amount when creating indexes or compacting/reclustering tables, and set it back to a lower amount for normal operations. The higher the number, the faster large sorts can be processed. Also, larger amounts of data can be processed at a single pass, thus requiring fewer merges of data. However, if set too high relative to the number of concurrent sorts, users or available RAM, processing of queries can be slowed.

The amount you allocate for sorting is the amount of memory that will be made available for each sort request that the system receives. This is not a common pool for all users, but rather a per-sort amount. If a particular sort request does not require the full amount allocated, the excess is released back to the operating system.

Administrative Performance Tools

The following tools can help you determine how queries are being processed, how memory and cache are being used, and other performance-related information.

Virtual Statistics Tables

Virtual statistics tables allow you to run SQL queries against a snapshot of server usage data. The kinds of information you can extract depends on the query you construct and run against these special tables. For example, you can create such information as:

The number of users currently logged in.
A list of logged in users; most recent first (or last).
A list of users who have more than one query open and in repeatable read isolation level.
Number of users doing joins in a particular location.

The Virtual Statistics Tables contain dynamic information about the state of the server. The tables are populated when they are queried to give you current information. Apart from their dynamic nature, they can be accessed just like normal system tables, using joins, conditional clauses, sorts, aggregate functions, etc. Some of the information in these tables duplicates the data displayed on the various server screens, while some display other kinds of information.

All of the Virtual Statistics Tables have the AuthID XDBSTAT and are created (when queried) in the DSNDB07 database. They have no indexes. The format and content of the Virtual Statistics Tables are defined in detail in the SQL Reference. Examples of queries that can be run against these tables to extract information are presented below and on the next page.

Example: Number of Users Currently Logged In

SELECT COUNT(DISTINCT nodename) 
    FROM xdbstat.userlist;

Example: List of All Users, Most Recent First
```
SELECT * FROM xdbstat.userlist
```
ORDER BY attachts desc;

Example: List of All Users Having Greater than One Query Open in Repeatable Read (RR)

SELECT username||"@"||nodename 
    FROM xdbstat.userlist a, xdbstat.userresources b 
    WHERE a.username = b.username 
      AND a.nodename = b.nodename 
      AND b.numqds >1 
      AND a.ilevel = "RR";

Example: Number of Users Doing Joins in System Location

SELECT COUNT(*) 
  FROM xdbstat.queryinfo 
  WHERE locname = "SYSTEM" 
    AND numtables > 1;