Tuning Databases and Applications
Page 2 of 5

Application and Database Design Issues

Things that you can do to improve the performance of Cloudscape applications fall into three categories. In order of importance, these categories are:

  1. Avoiding Table Scans of Large Tables
  2. Avoiding Compiling SQL Statements
  3. Shielding Users from Cloudscape Class-Loading Events

Avoiding Table Scans of Large Tables

Cloudscape is fast and efficient, but when tables are huge, scanning tables may take longer than a user would expect. It's even worse if you then ask Cloudscape to sort this data.

Things that you can do to avoid table scans fall into two categories. These categories are, in order of importance:

  1. Index, Index, Index
  2. Prevent the User from Issuing Expensive Queries
  3. Understand When Statements Go Stale

Index, Index, Index

Have you ever thought what it would be like to look up a phone number in the phone book of a major metropolitan city if the book were not indexed by name? For example, to look up the phone number for John Jones, you couldn't go straight to the J page. You'd have to read the entire book. That's what a table scan is like. Cloudscape has to read the entire table to retrieve what you're looking for unless you create useful indexes on your table.

Create Useful Indexes

Indexes are useful when a query contains a WHERE clause. Without a WHERE clause, Cloudscape is supposed to return all the data in the table, and so a table scan is the correct (if not desirable) behavior. (More about that in Prevent the User from Issuing Expensive Queries.)

Cloudscape creates indexes on tables in the following situations:

  • When you define a primary key, unique, or foreign key constraint on a table. See CONSTRAINT clause of the Cloudscape Reference Manual for more information.
  • When you explicitly create an index on a table with a CREATE INDEX statement.

For an index to be useful for a particular statement, one of the columns in the statement's WHERE clause must be the first column in the index's key.

NOTE: For a complete discussion of how indexes work and when they are useful (including pictures), see What Is an Index? and Index Use and Access Paths.

Indexes provide some other benefits as well:

  • If all the data requested are in the index, Cloudscape doesn't have to go to the table at all. (See Covering Indexes.)
  • For operations that require a sort (ORDER BY), if Cloudscape uses the index to retrieve the data, it doesn't have to perform a separate sorting step for some of these operations in some situations. (See About the Optimizer's Choice of Sort Avoidance.)

NOTE: Don't index on large columns.

Make Sure They Are Being Used, and Rebuild Them

If an index is useful for a query, Cloudscape is probably using it. However, you need to make sure. Analyze the way Cloudscape is executing your application's queries. See Analyzing Statement Execution for information on how to do this.

In addition, over time, index pages fragment. Rebuilding indexes improves performance significantly in these situations. To rebuild an index, drop it and then re-create it.

Think About Join Order

For some queries, join order can make the difference between a table scan (expensive) and an index scan (cheap). Here's an example:

SELECT hotel_name
FROM Hotels, HotelAvailability
WHERE Hotels.hotel_id = HotelAvailability.hotel_id
AND Hotels.city_id = 10

If Cloudscape chooses Hotels as the outer table, it can use the index on Hotels to retrieve qualifying rows. (Given the data in toursDB, it will return three rows; three hotels have a city_id of 10.) Then it need only look up data in HotelAvailability three times, once for each qualifying row. And to retrieve the appropriate rows from HotelAvailability, it can use an index for HotelAvailability's hotel_id column instead of scanning the entire table.

If Cloudscape chooses the other order, with HotelAvailability as the outer table, it will have to probe the Hotels table for every row, not just three rows, because there are no qualifications on the HotelAvailability table.

For more information about join order, see Joins and Performance.

Cloudscape usually chooses a good join order. However, as with index use, you should make sure. Analyze the way Cloudscape is executing your application's queries. See Analyzing Statement Execution for information on how to do this.

Prevent the User from Issuing Expensive Queries

Some applications have complete control over the queries that they issue; the queries are built into the applications. Other applications allow users to construct queries by filling in fields on a form. Any time you let users construct ad-hoc queries, you risk the possibility that the query a user constructs will be one like the following:

SELECT * FROM ExtremelyHugeTable
ORDER BY unIndexedColumn

This statement has no WHERE clause. It will require a full table scan. To make matters worse, Cloudscape will then have to order the data. It's likely that the user doesn't really want to browse through all 100,000 rows, and doesn't really care whether they're all in order.

Do everything you can to avoid table scans and sorting of large results (such as table scans).

Some things you can do to disallow such runaway queries:

  • Use client-side checking to make sure some minimal fields are always filled in. Eliminate or disallow queries that cannot use indexes and are not optimizable. In other words, force an optimizable WHERE clause by making sure that the columns on which an index is built are included in the query's WHERE clause. Reduce or disallow DISTINCT clauses (which often require sorting) on large tables.
  • For queries with large results, don't let the database do the ordering. Retrieve data in chunks (provide a Next button to allow the user to retrieve the next chunk, if desired), and order the data in the application.
  • Don't do SELECT DISTINCT to populate lists; maintain a separate table of the unique items instead.

Understand When Statements Go Stale

Overview

When Cloudscape compiles and optimizes a statement, it creates what is called a statement execution plan. As discussed in Avoiding Compiling SQL Statements, this is a time-consuming process that you typically want to avoid. With stored prepared statements or prepared statements, Cloudscape can execute the same statement multiple times using the same plan and thus very quickly.

However, sometimes a statement's existing plan may no be longer the best plan for executing the statement. These situations are caused by significant changes in the number of rows in a table. In these situations, you should be willing to pay the price of recompilation, because it is cheaper than using the stale execution plan.

For example, if you have an empty table or a table with only a few rows in it, Cloudscape's optimizer will probably decide that it is easier to do a table scan than to access the data through an index. After you have loaded several thousand rows, however, a different statement execution plan is in order.

A statement can only go stale if it remains open or in the statement cache.

Table 3-1, "When Statements Can Go Stale", shows the situations in which it is technically possible for a statement to go stale.

Table 3-1 When Statements Can Go Stale

Type of Statement

When Statements Can Go Stale

Details

An application's JDBC Statement or PreparedStatement

A statement can go stale when the Statement remains open or is retrieved out of the per-connection Statement cache.

A Statement (or Prepared Statement) can only exist within the context of a single Connection. Once the Connection is closed, all statements disappear. The next time the application connects, it must create the statement anew, and so Cloudscape creates a new statement plan.

A stored prepared statement (which is compiled when you create it)

A stored prepared statement can go stale any time after you create it.

A stored prepared statement's plan persists across Connections and Cloudscape sessions.

Automatic Stale Plan Invalidation

To help you avoid stale statements, Cloudscape automatically checks whether a Statement's execution plan is still appropriate after a certain number of executions of that statement (by default, 100 executions) within a single Cloudscape session. Cloudscape determines if a statement's plan is appropriate by comparing the number of rows in the table with the number of rows in the table when the statement was first executed. If there has been a significant change in the number of the rows in the table, Cloudscape assumes the statement plan is no longer appropriate, and it invalidates and thus recompiles the statement.

This situation typically only affects SELECTs, UPDATEs, INSERT SELECTS, and DELETEs with WHERE clauses (statements that benefit from use of an index).

For example, imagine that you create a JDBC Statement (not a stored prepared statement). Its text follows:

SELECT *
FROM Flights
WHERE orig_airport = 'sfo'

Suppose that when your application creates this statement, the table is empty, and so Cloudscape chooses to scan the table instead of using the index on the orig_airport column.

During the course of a single Connection, the table gains a lot of data. The same or another user could insert thousands of rows. Unless Cloudscape notices the significant change in the size of the table, it continues to use the original--and now stale--plan. Given the system defaults, the hundredth time the application executes that statement, Cloudscape checks to see if the size of the table has changed significantly since the Statement was created. If it has, it marks the statement invalid and recompiles it before executing.

You can configure how often Cloudscape checks the appropriateness of a statement's plan with the cloudscape.language.stalePlanCheckInterval property.

Stale Plans and Stored Prepared Statements

Cloudscape tracks a table's size only within a single Cloudscape session. This limitation is not a problem for standard JDBC Statements and PreparedStatements, since they exist only within the scope of a single Connection. However, this may be a problem for stored prepared statements. If you stop and re-start Cloudscape often, Cloudscape may not notice a gradual change to a table that affects a stored prepared statement. In that situation, you may want to force recompilation, or you may want to configure Cloudscape to check whether to invalidate statements much more often (by setting cloudscape.language.stalePlanCheckInterval to a low value). If you do not stop and re-start Cloudscape often (for example, in a server environment), Cloudscape's automatic invalidation and recompilation of stale plans works quite well even with the default value.

When a Change in Table Makes a Plan Stale

Cloudscape uses the following rules to determine whether a change in a table's size is significant enough to force recompilation for a statement:

For large tables, Cloudscape uses a simple percentage test. For smaller tables, Cloudscape uses a more complex, non-linear test. The smaller the table, the more sensitive Cloudscape is to changes.

  • Simple percentage test (tables with 400 rows or more)

    If the number of rows in the table changes by more than ten percent, Cloudscape recompiles the statement.

    For example, if a table originally had 1000 rows when the statement was last compiled, and Cloudscape discovers that the table now has 1200 rows, it recompiles the statement, because the change in size is greater than ten percent. Similarly, if the table originally had 1000 rows, and it now has 800 rows, Cloudscape would recompile the statement. But if the table originally had 1000 rows and now has 1050 rows, the system would not recompile the statement.

  • Complex percentage test (tables with fewer than 400 rows)

    Cloudscape takes the square of the difference in row counts and compares it to four times the original number of rows. If the square of the difference is greater than four times the original number of rows, it recompiles the statement.

NOTE: These rules may change in future releases.

Consider the following example:

If a table originally had 8 rows, and the system discovers that it now has 15 rows, it recompiles the statement, because the square of the difference (49) is greater than four times the original size of the table (32). Similarly, if the table originally had 8 rows, and it now has 2 rows, it recompiles the statement, because the square of the difference (36) is greater than four times the original size of the table (32). On the other hand, if the table originally had 8 rows and now has 10 rows, it would not recompile the statement, because the square of the difference (4) is not greater than four times the original size of the table (32).

Avoiding Compiling SQL Statements

When you submit an SQL-J statement to Cloudscape, Cloudscape compiles and then executes the statement. Compilation is a time-consuming process that involves several steps, including optimization, the stage in which Cloudscape makes its statement execution plan. A statement execution plan includes whether to use an index, the join order, and so on.

Unless there are significant changes in the amount of data in a table or new or deleted indexes, Cloudscape will probably come up with the same statement execution plan for the same statement if you submit it more than once. This means that the same statements should share the same plan, and Cloudscape should not bother to recompile them. Cloudscape allows you to ensure this in the following ways (in order of importance):

  1. Across Connections and Sessions
  2. Within a Single Connection

Across Connections and Sessions

Statements from any connection or session can share the same statement execution plan (and avoid compilation) by using stored prepared statements. Stored prepared statements are database objects that persist within and across sessions. For more information, see Storing Prepared Statements to Improve Performance.

Stored prepared statements are not part of a standard but are unique to Cloudscape. Stored prepared statements are one of the most important features you can use to improve the performance of your application.

Within a Single Connection

As explained above in Across Connections and Sessions, you can create statements that can share the same statement execution plan (and avoid compilation) by using stored prepared statements. Within a single connection, you can also do the following to avoid extra compilation when for some reason it is impractical to use stored prepared statements:

  • Your application can use PreparedStatements instead of Statements.

    PreparedStatements are JDBC objects that you prepare (compile) once and execute multiple times. If your application executes statements that are almost but not exactly alike, use PreparedStatements, which can contain dynamic or IN parameters. Instead of using the literals for changing parameters, use ?s (placeholders) for these two parameters. Provide the values when you execute the statement.

    PreparedStatements, unlike stored prepared statements, are a standard part of the JDBC API. Another way in which they are different from stored prepared statements is that they do not require writing to disk and do not create dictionary objects. For more information about using prepared statements, see JDBC and the java.sql API (the listings for further reading on JDBC) in Using the Cloudscape Documentation.

    For examples, see JBMSTours.inserters.InsertCountries in the JBMSTours sample application and the chapter on programming for performance in Learning Cloudscape: The Tutorial.

Figure 3-1 A connection need only compile a PreparedStatement once. Subsequent executions can use the same statement execution plan even if the parameter values are different. (PreparedStatements are not shared across connections.)

  • Even if your statement uses Statements instead of PreparedStatements, Cloudscape can reuse the statement execution plan for the statements from the statement cache.

    When, in the same connection, an application submits an SQL Statement that has exactly the same text as one already in the cache, Cloudscape grabs the statement from the cache, even if the Statement has already been closed from the application. This is useful if your application allows the user to create ad-hoc queries; if the user happens to run the same query more than once, Cloudscape can grab the statement out of the statement cache.

    You can adjust the size of the statement cache. (See cloudscape.language.statementCacheSize.)

NOTE: The scroll type of the statement (forward-only vs. scrolling insensitive) of a statement must be identical, in addition to the SQL text, for there to be a match.

Figure 3-2 A connection can reuse a Statement object when the SQL text matches a prior statement exactly. (PreparedStatements are much more efficient.)

Shielding Users from Cloudscape Class-Loading Events

JVMs tend to load classes as they are needed, which means the first time you need a class in a piece of software, it takes longer to use.

Cloudscape has three clear cases when a lot of class loading occurs:

  • when the system boots

    The system boots when you load the embedded driver, COM.cloudscape.core.JDBCDriver. In a server framework, the system boots when you start the server framework. Booting Cloudscape loads basic Cloudscape classes.

  • when the first database boots

    Booting the first database loads some more Cloudscape classes. The default behavior is that the first database boots when the first connection is made to it. You can also configure the system to boot databases at startup. Depending on your application, one or the other may be preferable.

  • when you compile the first query

    Compiling the first query loads additional classes. You can configure your system so that these classes are loaded when the database boots; when so configured, Cloudscape executes a built-in query synchronously at startup. See cloudscape.language.preloadClasses.

For any of these events, you can control the impact they have on users by starting them in separate threads while other tasks are occurring.

In addition, if you are using PreparedStatements instead of stored prepared statements, prepare them in a separate thread in the background while other tasks are occurring. Even if you are using stored prepared statements, you must still create a Statement or PreparedStatement that executes the stored prepared statement. Do that in a background thread as well.

Tuning Tips for Multi-User Systems

  • For concurrency, use row-level locking and the READ_COMMITTED isolation level.
  • For read-only applications, use table-level locking and the READ_COMMITTED isolation level.
  • Boot databases at startup to minimize the impact of connecting.
  • Where possible, create stored prepared statements. Query plans for stored prepared statements can be shared across connections.

Tuning Tips for Single-User Systems

  • Cloudscape boots when you first load the embedded JDBC driver (COM.cloudscape.core.JDBCDriver). Load this driver during the least time-sensitive portion of your program, such as when it is booting or when you are waiting for user input. For server frameworks, the driver is loaded automatically when the server boots.
  • Boot the database at connection (the default behavior), not at startup. Connect in a background thread if possible.
  • Turn off row-level locking and use READ_COMMITTED isolation level. For single-connection applications, use SERIALIZABLE isolation level.