Tuning Databases and Applications
Page 5 of 5

Working with RunTimeStatistics

Cloudscape provides a language-level tool for evaluating the performance and the execution plans of statements, the RUNTIMESTATISTICS() built-in function.

Overview

When a special attribute (RunTimeStatistics) is turned on for a connection, Cloudscape creates an object that implements the COM.cloudscape.types.RunTimeStatistics interface for each statement executed within the Connection until the attribute is turned off.

For the most recently executed query, the object displays information about:

  • the length of the compile time and the execute time

    This can help in benchmarking queries.

  • the statement execution plan

    This is a description of result set nodes, whether an index was used, what the join order was, how many rows qualified at each node, and how much time was spent in each node. This information can help you determine whether you need to add indexes or rewrite queries.

The exact details presented, as well as the format of presentation, may change.

You access the object created with the RUNTIMESTATISTICS() built-in function. You generally do not work with the object directly, but instead call one of its methods directly on the function. See RUNTIMESTATISTICS() in the Cloudscape Reference Manual.

How You Use It

NOTE: Cloudview presents an easy-to-use interface for runtime statistics. See Viewing Runtime Statistics in Cloudview.

These are the basic steps for working with the RUNTIMESTATISTICS() function.

  1. Turn on the RunTimeStatistics attribute with the SET RUNTIMESTATISTICS statement command (see SET RUNTIMESTATISTICS statement in the Cloudscape Reference Manual for more information).

    -- turns on RunTimeStatistics
    SET RUNTIMESTATISTICS ON

  2. Turn on the Statistics Timing attribute with the SET STATISTICS TIMING statement command (see SET STATISTICS TIMING statement of the Cloudscape Reference Manual for more information). If you do not turn on this attribute, you will see the statement execution plan only, and not the timing information.

    SET STATISTICS TIMING ON

  3. If you are working in ij, set the display width to 5000 or another high number.

MaximumDisplayWidth 5000

  1. Execute an SQL-J statement.
  2. Retrieve the values from one or more of the methods of the RunTimeStatistics interface executed against the RUNTIMESTATISTICS() built-in function. For example:

    VALUES RUNTIMESTATISTICS().toString()

    For a complete list of the methods available on the interface, see the javadoc for COM.cloudscape.types.RunTimeStatistics

    In a multi-threaded environment, you must synchronize steps 4 and 5.

  3. Turn off RunTimeStatistics and Statistics Timing.

These steps have shown you how you would work with RunTimeStatistics within ij. The basic steps for working with RunTimeStatistics are the same in a java program. For a complete coding example, see the sample program JBMSTours.RunTimein the JBMSTours application. Run the program for examples of several different kinds of statements and the different output they produce.

NOTE: The exact content and format of the statement execution plan are subject to change.

Analyzing the Information

Statistics Timing

If the Statistics timing attribute is on, the RunTimeStatistics object provides information about how long each stage of the statement took. An SQL-J statement has two basic stages within Cloudscape: compilation and execution. Compilation is the work done while the statement is prepared. Compilation is composed of the following stages: parsing, binding, optimization, and code generation. Execution is the actual evaluation of the statement. (If the statistics timing attribute is off, it shows a zero time for each stage.)

Statement Execution Plan

The RunTimeStatistics object also provides information about the statement execution plan. A statement execution plan is composed of a tree of result set nodes. A result set node represents the evaluation of one portion of the statement; it returns rows to a calling (or parent) node and can receive rows from a child node. A node can have one or more children. Starting from the top, if a node has children, it requests rows from the children. Usually only the execution plans of DML statements (queries, inserts, updates, and deletes, not dictionary object creation) are composed of more than one node.

For example, consider the following query:

SELECT *
FROM Countries

This simple query involves one node only--reading all the data out of the Countries table. It involves a single node with no children. This result set node is called a TableScanResultSet. RunTimeStatistics' text for this node looks something like this:

TableScanResultSet for COUNTRIES at read committed isolation level using share row locking chosen by the optimizer.
numOpens = 1
rowsSeen = 115
rowsFiltered = 0
fetchSize = 16
constructor time (milliseconds) = 0
open time (milliseconds) = 10
next time (milliseconds) = 30
close time (milliseconds) = 20
next time in milliseconds/row = 0
scanInfo:
columnsFetchedBitSet=cols:(all)
numColumnsFetched=3
numPagesVisited=2
numRowsQualified=115
numRowsVisited=115
scanType=heap
startPosition:
null stopPosition:
null qualifiers:
None
optimizer estimated row count: 124.00
optimizer estimated cost: 50.38

Consider this second, more complex query:

SELECT Country
FROM Countries
WHERE Region = 'Central America'

When executed, this query involves two nodes--one to retrieve qualifying rows (the restriction is done at this node) and one to project the requested columns. So, at bottom, there is a TableScanResultSet for scanning the table. The qualifier (Region = `Central America') is evaluated in this node. These data are passed up to the parent node, called a ProjectRestrictResultSet, in which the rows are projected--only the country column is needed (the first column in the table). RunTimeStatistics' text for these two nodes looks something like this:

ProjectRestrictResultSet (1):
numOpens = 1
rowsSeen = 6
rowsFiltered = 0
constructor time (milliseconds) = 0
open time (milliseconds) = 0
next time (milliseconds) = 10
close time (milliseconds) = 0
restriction time (milliseconds) = 0
projection time (milliseconds) = 0
Source result set:
TableScanResultSet for COUNTRIES at read committed isolation level using share row locking chosen by the optimizer.
numOpens = 1
rowsSeen = 6
rowsFiltered = 0
fetchSize = 16
constructor time (milliseconds) = 10
open time (milliseconds) = 0
next time (milliseconds) = 10
close time (milliseconds) = 0
next time in milliseconds/row = 1
scanInfo:
columnsFetchedBitSet=cols:{0, 2}
numColumnsFetched=2
numPagesVisited=2
numRowsQualified=6
numRowsVisited=115
scanType=heap
startPosition:
null stopPosition:
null qualifiers:
Column[0] Id: 2
Operator: =
Ordered nulls: false
Unknown return value: false
Negate comparison result: false
optimizer estimated row count: 11.50
optimizer estimated cost: 49.22

Other, more complex queries such as joins and unions have other types of result set nodes.

For inserts, updates, and deletes, rows flow out of the top, where they are inserted, updated, or deleted. For selects (queries), rows flow out of the top into a result set that is returned to the user.

Table 1-6 in the Cloudscape Reference Manual shows the many possible ResultSet nodes that might appear in an execution plan. Some examples are:

  • ProjectRestrictResultSet
  • SortResultSet
  • TableScanResultSet
  • IndexScanResultSet
  • IndexRowToBaseRowResultSet

In addition, read Chapter 4, "DML Statements and Performance", for more information about some of the ways in which Cloudscape executes statements.

The statement execution plan shows how long each node took to evaluate, how many rows were retrieved, whether an index was used, and so on. If an index was used, it shows the start and stop positions for the matching index scan. Looking at the plan may help you determine whether to add an index or to rewrite the query.

Subset of Statement Execution Plan

You can execute a method called getScanStatisticsText on the RunTimeStatistics interface that returns a string describing only a subset of the full statement execution plan. It provides information only about those nodes that access a table or index. Using this method is the easiest way to find out whether Cloudscape used an index or scanned the entire table and what the join order was.

Optimizer Estimates

Runtime statistics show the optimizer estimates for a particular node. They show the optimizer's estimated row count and the optimizer's "estimated cost."

The estimated row count is the query optimizer's estimate of the number of qualifying rows for the table or index for the entire life of the query. If the table is the inner table of a join, the estimated row count will be for all the scans of the table, not just for a single scan of the table.

NOTE: You can execute a method called getEstimatedRowCount, which returns the optimizer's estimated row count. (For a tip that explains when that might be useful, see Using RunTimeStatistics to Get an Approximate Row Count for a Table in the Cloudscape Developer's Guide.)

The estimated cost consists of a number, which is a relative number; it does not correspond directly to any time estimate. It is not, for example, the number of milliseconds or rows. Instead, the optimizer constructs this number for each possible access path. It compares the numbers and chooses the access path with the smallest number.

Viewing Runtime Statistics in Cloudview

Cloudview makes working with runtime statistics easy. To use it:

  1. Before executing the query you want to analyze, select Use Statistics (the equivalent of issuing the SET RUNTIMESTATISTICS ON and SET STATISTICS TIMING ON commands).

  2. Execute the query.
  3. Select the Statistics tab (equivalent to issuing VALUES RUNTIMESTATISTICS()).
  4. The ResultSet nodes appear in a tree format similar to the following:

Figure 3-5 ResultSet nodes displayed by Cloudview

Figure 3-5 shows the ResultSet nodes for the query used as an example in Join Order Case Study. It allows you to see at a glance that the query was executed as a nested loop join; that FlightAvailability is the outer table in the join (because it is the first ResultSet); and that Cloudscape used an index to look up values on the inner table, Flights (because there is an Index scan on Flights, with the name of the index, Flights_Pk). IndexRowToBaseRowResultSet is the node in which Cloudscape uses the row ID it got from the index key in the index scan to return the matching row from the base table.

  1. Cloudview displays timing statistics for each node. If you select Total, near the top of the screen, Cloudview displays the time spent in the node, including the time for all the node's children. For example, using the example in Figure 3-5, the IndexRowToBaseRowResultSet is part of (a child of) the NestedLoopJoin node. Selecting Total adds the time taken to execute this child node, along with any other, to the timing for the NestedLoopJoin node.