Programming for Performance
Page 4 of 4

Performance, Optimization, and Indexes

In a Cloudscape system, units of storage are called conglomerates; they are either tables or indexes.

An index is a database object that can improve the performance of some statements. You create an index on a column (or columns) in a table to provide a fast lookup of the values stored in that column. In a table, rows are stored in the order in which they are inserted--that is, in no useful order. Deletes, updates, and new inserts can rearrange this order. Tables are sometimes called heaps for this reason. In indexes, Cloudscape stores the values of the indexed column or columns in order in a tree structure called a BTREE. Depending on the amount of data, Cloudscape provides a number of nodes in the tree to provide quick lookup of values.

Figure 7-1 shows the "top" of the Flights table, where the rows appear in no particular order (well, in the order in which they were inserted).

Figure 7-1 Partial view of the Flights table

The orig_airport column is indexed. Its index stores every value in the orig_airport column plus information on how to retrieve the entire corresponding row for each value, as shown in Figure 7-2:

  • For every row in Flights, there is an entry in the index that includes the value of the orig_airport column and the address of the row itself. The entries are stored in ascending order by the orig_airport values. This level is called the leaf level of the index.
  • One or more abstract levels in the BTREE structure have values that point into lower levels of the index, much like tab dividers in a three-ring notebook that help you find the correct section quickly. These levels help Cloudscape determine where to begin an index scan.

Figure 7-2 The index on the orig_airport column helps Cloudscape find the rows for which orig_airport is equal to a specific value.

Types of Queries for Which Indexes Are Useful

Queries that are aided by indexes usually have WHERE clauses. A WHERE clause means that you want to restrict the results returned to specific records. Retrieving only some rows from a table is called restriction. A query without a WHERE clause usually means you want all the records in the table, even if you want only specific columns. Retrieving only some of the columns in a row is called projection.

Indexes are useful in a few cases:

  • restrictions (discussed in this section)

    If you want all the rows in the table, it doesn't make sense to find the locations of specific rows; you want all of them.

  • cases in which the index covers the query (discussed in Covering Indexes)

    If all the data you need is in the index, you never need to go to the base table.

Restricting statements that can make use of an index provide starting and stopping points for an index scan in a WHERE clause. That is, they tell Cloudscape the point to begin its scan of the index and where to end the scan.

The simplest example of such a statement is one in which the WHERE clause specifies a specific value for the indexed column, in which case the stop and start positions are the same. For example:

SELECT * FROM Flights WHERE orig_airport = 'SFO'

Cloudscape can use the lookup nodes in the BTREE index to get to the value SFO very quickly.

More complex comparisons in the WHERE clause also provide start and stop positions. An example is a statement with a WHERE clause looking for rows for which the orig_airport value is less than BBB. This means that Cloudscape must begin the scan at the beginning of the index; it can end the scan at BBB.

An index scan that uses start and stop conditions is called a matching index scan.

As it turns out, the Flights table does indeed have an index on the orig_airport column. In the next few tasks, you will compare the performance of retrieving a subset of rows from the table with and without the index.

Find the Name of the Index on the orig_airport Column

First, let's find out the name of the index on the orig_airport column. The easiest way to do that is with Cloudview.

  1. Start Cloudview as described in Start Cloudview in the Home Directory. Then open a connection to toursDB as described in Open a Connection to toursDB.
  2. Select the toursDB icon in the left-hand hierarchy window. Click the "+" to the left of the icon to display the Table icon.
  3. Click the "+" to the left of the Flights table to display its indexes.
  4. Click the Indexes beneath the Flights table icon.

    Indexes appear in the Name box (on the right). (Indexes that are created to enforce or back up primary, unique, and foreign key constraints are not displayed here.)

  5. ORIGINDEX appears in the Name box; double-click it.

    It is defined on the orig_airport column.

Execute a Query with and without an Index

  1. Click the toursDB icon in the left-hand hierarchy window to display the SQL window.
  2. Execute the following query from the SQL window:

    SELECT * FROM Flights WHERE orig_airport = 'SFO'

    The execute time should appear in the bottom bar below the results; write down this number.

  3. Now delete the index:

    DROP INDEX OrigIndex;

  4. Now re-execute the query (use the back arrow in the SQL window to return to the query, then click Execute).
  5. Note the new execute time.

    In our system, the query execution time using the index was 0.251 seconds, and without the index the execution time was 0.922 second.

  6. Re-create the index:

    CREATE INDEX OrigIndex ON Flights(orig_airport);

Re-creating the index takes longer than the query, because Cloudscape has to rebuild the BTREE structure and order the data.

There is an easier way to prevent Cloudscape from using a useful index: using a PROPERTIES clause to specify a null value for index. When the index value is set to null, Cloudscape must do a full table scan. In the following sections, you will use a PROPERTIES clause to test execution time without an index.

Execute a Query with a More Complex Comparison

  1. Execute the following query from the SQL window:

    SELECT * FROM Flights
    WHERE orig_airport BETWEEN 'SCL' AND 'SFO'

    The execute time should appear in the bottom bar below the results; write down this number.

  2. Now execute the query with a PROPERTIES clause that sets the index value to null:

    SELECT * FROM Flights PROPERTIES index=null
    WHERE orig_airport BETWEEN 'SCL' AND 'SFO'

  3. Note the new execute time.

    In our system, the query execution time using the index was 0.15 seconds, and without the index the execution time was 1.192 seconds.

Execute the Query That Uses the Column in an Expression

In the previous examples, the reference to the indexed column was "simple"; it was not used within an expression. Here are two examples of a reference to a column used within an expression:

orig_airport.toLowerCase()
orig_airport || 'hello, world'

|| is the concatenation operator; it concatenates two strings, or appends one string to another.

  1. Execute the following query from the SQL window:

    SELECT * FROM Flights
    WHERE orig_airport.toLowerCase() = 'sfo'

    The execute time should appear in the bottom bar below the results; write down this number.

  2. Now re-execute the query, forcing the table scan:

    SELECT * FROM Flights PROPERTIES index=null
    WHERE orig_airport.toLowerCase() = 'sfo'

  3. Note the new execute time.

    In our system, the query execution time using the index was 0.06 seconds, and without the index the execution time was 0.19 seconds.

    In this example, there is no substantial difference in execution time. That's because Cloudscape cannot use an index if the WHERE clause uses the column name within an expression.

Covering Indexes

Even when there is no definite starting or stopping point for an index scan, an index may speed up the execution of a query if the index covers the query. An index is said to cover the query if all the columns specified in the query are included in the index. These are the columns that are all columns referenced in the query, not just columns in a WHERE clause. If so, Cloudscape never has to go to the data pages but can retrieve all data from index access alone.

Execute a Query in Which the Index Covers the Query

  1. Execute the following query and write down the execution time.

    SELECT orig_airport FROM Flights

Delete the Index and Re-Execute the Query

  1. Now delete the index:

    DROP INDEX OrigIndex;

  2. Now re-execute the query (use the back arrow in the SQL window to return to the query, then click Execute).
  3. Note the new execute time.

    In our system, the query execution time using the covering index was about twice as fast as the query execution time not using the index.

  4. Re-create the index:

    CREATE INDEX OrigIndex on Flights(orig_airport);

    Cloudscape can get all required data out of the index more quickly than from the table, since it has fewer pages to read.

RunTimeStatistics

Cloudscape provides a quick way to understand how a query was executed. If you turn on RunTimeStatistics and statistics timing attributes before executing a query, Cloudscape creates an object showing (among other things) access path and join order.

Access path means whether Cloudscape used an index or went directly to the table; join order is the order in which it examined the tables in a join.

Cloudview provides an easy-to-use interface for working with runtime statistics.

View the RunTimeStatistics for a Query

  1. Select Use Statistics in the SQL window.
  2. Execute the following query:

    SELECT Country, City
    FROM Countries, Cities
    WHERE Countries.country_ISO_code = Cities.country_ISO_code

  3. Click the Statistics tab.

    Cloudview displays a tree-shaped structure representing the way Cloudscape accessed the table. This structure shows that Cloudscape accessed the Cities table first (table scan), then used the index on Countries to access the corresponding rows in that table.

NOTE: The plan chosen by the optimizer may differ from the plan shown in the picture.

  1. Exit Cloudview.

NOTE: You will find more information on how to work with RunTimeStatistics in Tuning Cloudscape.