Internal Language Transformations
Page 5 of 8

Subquery Processing and Transformations

Subqueries are notoriously expensive to evaluate. This section describes some of the transformations that Cloudscape makes internally to reduce the cost of evaluating them.

Materialization

Materialization means that a subquery is evaluated only once. A subquery can be materialized if it is a noncorrelated expression subquery. A correlated subquery is one that references columns in the outer query, and so has to be evaluated for each row in the outer query.

For example:

SELECT * FROM Hotels WHERE hotel_id = (SELECT MAX(hotel_id) FROM HotelAvailability)

In this statement, the subquery needs to be evaluated only once.

This type of subquery must return only one row. If evaluating the subquery causes a cardinality violation (if it returns more than one row), an exception will be thrown at the beginning of execution.

Subquery materialization is detected prior to optimization, which allows the optimizer to see a materialized subquery as an unknown constant value. The comparison is thus optimizable.

In other words, the original statement is transformed into the following two statements:

constant = SELECT MAX(hotel_id) FROM HotelAvailability

SELECT * FROM hotels
WHERE hotel_id = constant

The second statement is optimizable.

Flattening a Subquery into a Normal Join

Subqueries are allowed to return more than one row when used with IN, EXISTS, and ANY. However, for each row returned in the outer row, Cloudscape evaluates the subquery until it returns one row--it doesn't evaluate the subquery for all rows returned.

For example, given two tables, t1 and t2:

c1

1

2

3

c1

2

2

1

and the following query:

SELECT c1 FROM t1 WHERE c1 IN (SELECT c1 FROM t2)

the results would be

1
2

Simply selecting t1.c1 when simply joining those tables has different results:

SELECT t1.c1 FROM t1, t2 WHERE t1.c1 = t2.c1

1
2
2

Statements that include such subqueries can be flattened into joins only if the subquery does not introduce any duplicates into the result set (in our example, the subquery introduced a duplicate and so cannot simply be flattened into a join). If this requirement and other requirements (listed below) are met, however, the statement is flattened such that the tables in the subquery's FROM list are treated as if they were inner to the tables in the outer FROM list.

For example, our query could have been flattened into a join if c2 in t2 had a unique index on it. It wouldn't have introduced any duplicate values into the result set.

The requirements for flattening into a normal join are:

  • The subquery is not under an OR.
  • The subquery type is EXISTS, IN, or ANY, or it is an expression subquery on the right side of a comparison operator.
  • The subquery is not in the SELECT list of the outer query block.
  • There are no aggregates in the SELECT list of the subquery.
  • The subquery does not have a GROUP BY clause.
  • There is a uniqueness condition that ensures that the subquery does not introduce any duplicates if it is flattened into the outer query block.
  • Each table in the subquery's FROM list (after any view, derived table, or subquery flattening) must be a base table.
  • If there is a WHERE clause in the subquery, there is at least one table in the subquery whose columns are in equality predicates with expressions that do not include any column references from the subquery block. These columns must be a superset of the key columns for any unique index on the table. For all other tables in the subquery, the columns in equality predicates with expressions that do not include columns from the same table are a superset of the unique columns for any unique index on the table.

Flattening into a normal join gives the optimizer more options for choosing the best query plan. For example, if the following statement:

SELECT huge.* FROM huge
WHERE c1 IN (SELECT c1 FROM tiny)

can be flattened into

SELECT huge.* FROM huge, tiny WHERE huge.c1 = tiny.c1

the optimizer can choose a query plan that will scan tiny and do a few probes into the huge table instead of scanning the huge table and doing a large number of probes into the tiny table.

Here is an expansion of the example used earlier in this section. Given

CREATE TABLE t1 (c1 INT)

CREATE TABLE t2 (c1 INT PRIMARY KEY)

CREATE TABLE t3 (c1 INT PRIMARY KEY)

INSERT INTO t1 VALUES (1), (2), (3)

INSERT INTO t2 VALUES (1), (2), (3)

INSERT INTO t3 VALUES (2), (3), (4)

this query

SELECT t1.* FROM t1 WHERE t1.c1 IN
    (SELECT t2.c1 FROM t2, t3 WHERE t2.c1 = t3.c1)

should return the following results:

2
3

The query satisfies all the requirements for flattening into a join, and the statement can be transformed into the following one:

SELECT t1.*
FROM t1, t2, t3
WHERE t1.c1 = t2.c1
AND t2.c1 = t3.c1
AND t1.c1 = t3.c1

The following query:

SELECT t1.*
FROM t1 WHERE EXISTS
(SELECT * FROM t2, t3 WHERE t2.c1 = t3.c1 AND t2.c1 = 3)

can be transformed into

SELECT t1.*
FROM t1, t2, t3
WHERE t1.c1 = t2.c1
AND t2.c1 = t3.c1
AND t1.c1 = t3.c1

Flattening a Subquery into an EXISTS Join

An EXISTS join is a join in which the right side of the join needs to be probed only once for each outer row. Using such a definition, an EXISTS join does not literally use the EXISTS keyword. Cloudscape treats a statement as an EXISTS join when there will be at most one matching row from the right side of the join for a given row in the outer table.

A subquery that cannot be flattened into a normal join because of a uniqueness condition can be flattened into an EXISTS join if it meets all the requirements (see below). Do you remember the first example in the previous section ( Flattening a Subquery into a Normal Join)?

SELECT c1 FROM t1 WHERE c1 IN (SELECT c1 FROM t2)

This query could not be flattened into a normal join because such a join would return the wrong results. However, this query can be flattened into a join recognized internally by the Cloudscape system as an EXISTS join. When processing an EXISTS join, Cloudscape knows to stop processing the right side of the join after a single row is returned. The transformed statement would look something like this:

SELECT c1 FROM t1, t2
WHERE t1.c1 = t2.c1
EXISTS JOIN INTERNAL SYNTAX

Requirements for flattening into an EXISTS join:

  • The subquery is not under an OR.
  • The subquery type is EXISTS, IN, or ANY.
  • The subquery is not in the SELECT list of the outer query block.
  • There are no aggregates in the SELECT list of the subquery.
  • The subquery does not have a GROUP BY clause.
  • The subquery has a single entry in its FROM list that is a base table.
  • None of the predicates in the subquery, including the additional one formed between the left side of the subquery operator and the column in the subquery's SELECT list (for IN or ANY subqueries), include any subqueries, method calls, or field accesses.

When a subquery is flattened into an EXISTS join, the table from the subquery is made join-order-dependent on all the tables with which it is correlated. This means that a table must appear inner to all the tables on which it is join-order-dependent. (In subsequent releases this restrictions may be relaxed.) For example:

SELECT t1.* FROM t1, t2
WHERE EXISTS (SELECT * FROM t3 WHERE t1.c1 = t3.c1)

gets flattened into

SELECT t1.* FROM t1, t2, t3 WHERE t1.c1 = t3.c1

where t3 is join order dependent on t1. This means that the possible join orders are (t1, t2, t3), (t1, t3, t2), and (t2, t1, t3).

Flatting VALUES Subqueries

Cloudscape flattens VALUES subqueries to improve performance.

DISTINCT Elimination in IN, ANY, and EXISTS Subqueries

An IN, ANY, or EXISTS subquery evaluates to true if there is at least one row that causes the subquery to evaluate to true. These semantics make a DISTINCT within an IN, ANY, or EXISTS subquery unnecessary. The following two queries are equivalent and the first is transformed into the second:

SELECT * FROM t1 WHERE c1 IN
    (SELECT DISTINCT c2 FROM t2 WHERE t1.c3 = t2.c4)
SELECT * FROM t1 WHERE c1 IN
    (SELECT c2 FROM t2 WHERE t1.c3 = t2.c4)

IN/ANY Subquery Transformation

An IN or ANY subquery that is guaranteed to return at most one row can be transformed into an equivalent expression subquery (a scalar subquery without the IN or ANY). The subquery must not be correlated. Subqueries guaranteed to return at most one row are:

  • Simple VALUES clauses
  • SELECTs returning a nongrouped aggregate

For example:

WHERE C1 IN (SELECT MIN(c1) FROM T)

can be transformed into

WHERE C1 = (SELECT MIN(c1) FROM T)

This transformation is considered before subquery materialization. If the transformation is performed, the subquery becomes materializable. In the example, if the IN subquery were not transformed, it would be evaluated anew for each row.

The subquery type transformation is shown in Table A-1:

Table A-1 IN or ANY Subquery Transformations for Subqueries Returning a Single Row

Before Transformation

After Transformation

c1 IN (SELECT ...)

c1 = (SELECT ...)

c1 = ANY (SELECT ...)

c1 = (SELECT ...)

c1 <> ANY (SELECT ...)

c1 <> (SELECT ...)

c1 > ANY (SELECT ...)

c1 > (SELECT ...)

c1 >= ANY (SELECT ...)

c1 >= (SELECT ...)

c1 < ANY (SELECT ...)

c1 < (SELECT ...)

c1 <= ANY (SELECT ...)

c1 <= (SELECT ...)