Performance Tuning
Optimizing SQL Statements
This chapter describes how Oracle
optimizes Structured Query Language (SQL) using the cost-based optimizer (CBO).
- Approaches to SQL Statement Tuning
- Tuning Goals
- Best Practices
- SQL Tuning Tips
- Using EXISTS versus IN
- Troubleshooting
- Tuning Distributed Queries
Although some Oracle tools and applications mask the
use of SQL, all database operations are performed using SQL. Any other data
access method circumvents the security built into Oracle and potentially
compromises data security and integrity.
- Restructuring the Indexes
- Restructuring the Statement
- Modifying or Disabling Triggers
- Restructuring the Data
- Keeping Statistics Current and Using Plan
Stability to Preserve Execution Plans
The guidelines described in this section are
oriented to production SQL that will be executed frequently. Most of the
techniques that are discouraged here can legitimately be employed in ad hoc
statements or in applications run infrequently where performance is not
critical.
Restructuring the indexes is a good
starting point, because it has more impact on the application than does
restructuring the statement or the data.
- Remove nonselective
indexes to speed the DML.
- Index
performance-critical access paths.
- Consider
hash clusters, but watch uniqueness.
- Consider
index clusters only if the cluster keys are similarly sized.
Do not use indexes as a panacea.
Application developers sometimes think that performance will improve if they
write more indexes. If a single programmer creates an appropriate index, then
this might indeed improve the application's performance. However, if 50
programmers each create an index, then application performance will probably be
hampered!
After restructuring the indexes, you can
try restructuring the statement. Rewriting an inefficient SQL statement is
often easier than repairing it. If you understand the purpose of a given statement,
then you may be able to quickly and easily write a new statement that meets the
requirement.
Because SQL is a flexible language, more
than one SQL statement may meet the needs of your application. Although two SQL
statements may produce the same result, Oracle may process one faster than the
other. You can use the results of the EXPLAIN PLAN statement
to compare the execution plans and costs of the two statements and determine
which is more efficient.
This example shows the execution plans for
two SQL statements that perform the same function. Both statements return all
the departments in the dept table
that have no employees in the emp table.
Each statement searches the emp table
with a subquery. Assume there is an index, deptno_index,
on the deptno column
of the emp table.
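The two statements can be sketched as follows, using the emp and dept schema described above:

```sql
-- First statement: NOT IN subquery; the subquery has no WHERE clause,
-- so deptno_index cannot be used and emp is scanned in full
SELECT dname
  FROM dept
 WHERE deptno NOT IN (SELECT deptno FROM emp);

-- Second statement: correlated NOT EXISTS; the subquery's WHERE clause
-- references emp.deptno, so deptno_index can be range-scanned
SELECT dname
  FROM dept d
 WHERE NOT EXISTS (SELECT 1 FROM emp e WHERE e.deptno = d.deptno);
```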
Step
3 of the output indicates that Oracle executes this statement by performing a
full table scan of the emp table
despite the index on the deptno column.
This full table scan can be a time-consuming operation. Oracle does not use the
index, because the subquery that searches the emp table
does not have a WHERE clause
that makes the index available.
The WHERE clause
of the subquery refers to the deptno column
of the emp table,
so the index deptno_index is
used. The use of the index is reflected in step 3 of the execution plan. The
index range scan of deptno_index takes
less time than the full scan of the emp table
in the first statement. Furthermore, the first query performs one full scan of
the emp table
for every deptno in
the dept table.
For these reasons, the second SQL statement is faster than the first.
If you have statements in your
applications that use the NOT IN operator,
as the first query in this example does, then you should consider rewriting
them so that they use the NOT EXISTS operator.
This would allow such statements to use an index, if one exists.
Use equijoins whenever possible. Without
exception, statements that perform equijoins on untransformed column values are
the easiest to tune.
Join order can have a significant effect
on performance. The main objective of SQL tuning is to avoid performing
unnecessary work to access rows that do not affect the result. This leads to
three general rules:
- Avoid a
full-table scan if it is more efficient to get the required rows through
an index.
- Avoid
using an index that fetches 10,000 rows from the driving table if you
could instead use another index that fetches 100 rows.
- Choose
the join order so as to join fewer rows to tables later in the join order.
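A query of the shape the discussion assumes might look like this (taba, tabb, tabc and the bind variables follow the names used below):

```sql
-- Three filter conditions (one per table) followed by two join conditions
SELECT *
  FROM taba a, tabb b, tabc c
 WHERE a.acol BETWEEN :alow AND :ahigh
   AND b.bcol BETWEEN :blow AND :bhigh
   AND c.ccol BETWEEN :clow AND :chigh
   AND a.key1 = b.key1
   AND a.key2 = c.key2;
```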
The first three
conditions in the example above are filter conditions applying to only a single
table each. The last two conditions are join conditions.
Filter conditions
dominate the choice of driving table and index. In general, the driving table
should be the one containing the filter condition that eliminates the highest
percentage of the table. Thus, if the range of :alow to :ahigh is
narrow compared with the range of acol,
but the ranges of :b* and :c*
are relatively large, then taba should
be the driving table, all else being equal.
After you know your
driving table, choose the most selective index available to drive into that
table. Alternatively, choose a full table scan if that would be more efficient.
From there, the joins should all happen through the join indexes, the indexes
on the primary or foreign keys used to connect that table to an earlier table
in the join tree. Rarely should you use the indexes on the non-join conditions,
except for the driving table. Thus, after taba is
chosen as the driving table, you should use the indexes on b.key1 and c.key2 to
drive into tabb and tabc,
respectively.
The work of the
following join can be reduced by first joining to the table with the best
still-unused filter. Thus, if "bcol BETWEEN ..."
is more restrictive (rejects a higher percentage of the rows seen) than "ccol BETWEEN ...",
the last join can be made easier (with fewer rows) if tabb is
joined before tabc.
Do not use SQL functions in predicate
clauses or WHERE clauses.
The use of an aggregate function, especially in a subquery, often indicates
that you could have held a derived value on a master record.
Avoid mixed-mode expressions, and beware
of implicit type conversions. When you want to use an index on the VARCHAR2 column charcol,
but the WHERE clause
looks like this:
Where numexpr is
an expression of number type (for example, 1, USERENV('SESSIONID'), numcol, numcol+0,...),
Oracle translates that expression into:
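The predicate in question, and Oracle's implicit rewrite of it, can be sketched as:

```sql
-- Written predicate: intended to use the index on charcol
WHERE charcol = numexpr

-- What Oracle actually evaluates: the implicit function on charcol
-- makes its index unusable
WHERE TO_NUMBER(charcol) = numexpr
```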
- Any
expression using a column, such as a function having the column as its
argument, causes the optimizer to ignore the possibility of using an index
on that column, even a unique index.
- If the
system processes even a single row having charcol as a string of characters that does not translate
to a number, then an error is returned.
In contrast, a predicate of the form numcol = charexpr allows use of an index on numcol, because the default conversion is always character-to-number. This behavior, however, is subject to change. Making the type conversion explicit, as in numcol = TO_NUMBER(charexpr), also makes it clear that charexpr should always translate to a number.
SQL is not a procedural language. Using
one piece of SQL to do many different things is not a good idea: it usually
results in a less-than-optimal result for each task. If
you want SQL to accomplish different things, then write two different
statements rather than writing one statement that will do different things
depending on the parameters you give it.
Optimization (determining the execution
plan) takes place before the database knows what values will be substituted
into the query. An execution plan should not, therefore, depend on what those
values are. For example:
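The kind of statement described, where the same column appears on both sides of the BETWEEN, can be sketched as (the table name is illustrative):

```sql
SELECT *
  FROM sometable
 WHERE somecolumn BETWEEN DECODE(:loval, 'ALL', somecolumn, :loval)
                      AND DECODE(:hival, 'ALL', somecolumn, :hival);
```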
Written as shown, the database cannot use
an index on the somecolumn column,
because the expression involving that column uses the same column on both sides
of the BETWEEN.
This is not a problem if there is some
other highly selective, indexable condition you can use to access the driving
table. Often, however, this is not the case. Frequently, you may want to use an
index on a condition like that shown, but need to know the values of :loval,
and so on, in advance. With this information, you can rule out the ALL case,
which should not use
the index.
If you want to use the index whenever real
values are given for :loval and
:hival (that
is, if you expect narrow ranges, even ranges where :loval often
equals :hival), then you can rewrite
the example in the following logically equivalent form:
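A sketch of the logically equivalent rewrite, continuing the illustrative sometable example:

```sql
SELECT /* index-friendly case: real values for :loval and :hival */ *
  FROM sometable
 WHERE somecolumn BETWEEN :loval AND :hival
   AND :loval != 'ALL' AND :hival != 'ALL'
UNION ALL
SELECT /* the ALL case: no restriction on somecolumn */ *
  FROM sometable
 WHERE :loval = 'ALL' OR :hival = 'ALL';
```

The two branch conditions on the bind variables are logical complements, which is what guarantees the mutual exclusivity discussed below.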
If you run EXPLAIN PLAN on
the new query, then you seem to get both a desirable and an undesirable
execution plan. However, the first condition the database evaluates for either
half of the UNION ALL is
the combined condition on whether :hival and :loval are ALL.
The database evaluates this condition before actually getting any rows from the
execution plan for that part of the query.
When the condition comes back false for
one part of the UNION ALL query,
that part is not evaluated further. Only the part of the execution plan that is
optimum for the values provided is actually carried out. Because the final
conditions on :hival and :loval are
guaranteed to be mutually exclusive, then only one half of the UNION ALL actually
returns rows. (The ALL in UNION ALL is
logically valid because of this exclusivity. It allows the plan to be carried
out without an expensive sort to rule out duplicate rows for the two halves of
the query.)
Use optimizer hints, such as /*+ ORDERED */, to control access paths. This is a better approach than using traditional techniques or "tricks of the trade" such as CUST_NO + 0. For example, use:
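For instance, a query might test the transport mode by listing particular makes (the table and values here are hypothetical):

```sql
SELECT *
  FROM vehicles
 WHERE transport IN ('BMW', 'CITROEN', 'FORD', 'HONDA');
```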
The real objective in the WHERE clause
above is to determine whether the mode of transport is an automobile, and not
to identify a particular make. A reference table should be available in which
transport type = 'AUTOMOBILE'.
Minimize the use of DISTINCT. DISTINCT always
creates a sort; all the data must be instantiated before your results can be
returned.
When appropriate, use INSERT, UPDATE,
or DELETE... RETURNING to
select and modify data with a single call. This technique improves performance
by reducing the number of calls to the database.
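A minimal sketch of the RETURNING technique (table and bind names are illustrative):

```sql
-- One call updates the row and returns the new value,
-- instead of an UPDATE followed by a separate SELECT
UPDATE emp
   SET sal = sal * 1.1
 WHERE empno = :id
RETURNING sal INTO :new_sal;
```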
Be careful when joining views, when
performing outer joins to views, and when you consider recycling views.
The shared SQL area in Oracle reduces the
cost of parsing queries that reference views. In addition, optimizer
improvements make the processing of predicates against views very efficient.
Together, these factors make possible the use of views for ad hoc queries.
Despite this, joins to views are not recommended, particularly joins from one
complex view to another.
The following example shows a query upon a
column which is the result of a GROUP BY.
The entire view is first instantiated, and then the query is run against the
view data.
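A sketch of the pattern described, assuming a hypothetical aggregating view over emp:

```sql
CREATE VIEW dept_salaries AS
  SELECT deptno, AVG(sal) avg_sal
    FROM emp
   GROUP BY deptno;

-- Filtering on the aggregated column: the whole view is instantiated
-- first, and the predicate is applied to the materialized result
SELECT *
  FROM dept_salaries
 WHERE avg_sal > 2000;
```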
Setting the UNNEST_SUBQUERY session
parameter to TRUE enables
subquery unnesting. Subquery unnesting unnests and merges the body of the
subquery into the body of the statement that contains it, allowing the
optimizer to consider them together when evaluating access paths and joins.
This parameter is not cost based, and it is not set by default. UNNEST_SUBQUERY first verifies whether the statement is valid. If the statement is not valid, then subquery unnesting cannot proceed. The statement must then pass a heuristic test.
The UNNEST hint
checks the subquery block for validity only. If it is valid, then subquery
unnesting is enabled without Oracle checking the heuristics. If you enabled
subquery unnesting with the UNNEST_SUBQUERY parameter,
then the NO_UNNEST hint
turns it off for specific subquery blocks.
Because subquery unnesting generates
views, some views will be merged in the main query block with complex view
merging. When the subquery contains an aggregate function, it is a good idea to
have complex view merging enabled. This allows the inline view generated by
unnesting to be merged in the main query block.
An outer join to a multi-table view can be
problematic. For example, you may start with the usual emp and dept tables
with indexes on e.empno, e.deptno,
and d.deptno,
and create the following view:
You may then construct the simplest
possible query to do an outer join into this view on an indexed column (e.deptno)
of a table underlying the view:
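The view and query described might be sketched as follows (the column list is an assumption):

```sql
-- Multi-table view over emp and dept
CREATE VIEW emp_dept AS
  SELECT e.empno, e.ename, e.deptno, d.dname
    FROM emp e, dept d
   WHERE e.deptno = d.deptno;

-- Outer join into the view on e.deptno, a column of an underlying table
SELECT d.deptno, v.ename
  FROM dept d, emp_dept v
 WHERE d.deptno = v.deptno (+);
```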
Until both tables of the view are joined,
the optimizer does not know whether the view will generate a matching row. The
optimizer must therefore generate all the
rows of the view and perform a MERGE JOIN OUTER with
all the rows returned from the rest of the query. This approach would be
extremely inefficient if all you want is a few rows from a multi-table view
with at least one very large table.
Solving the problem in the preceding
example is relatively easy. The second reference to dept is
not needed, so you can do an outer join straight to emp.
In other cases, the join need not be an outer join. You can still use the view
simply by getting rid of the (+) on the join into the view.
Beware of writing a view for one purpose
and then using it for other purposes, to which it may be ill-suited. Consider
this example:
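For illustration, DX might be declared as a join view like this (the definition is an assumption based on the discussion that follows):

```sql
CREATE VIEW DX AS
  SELECT d.deptno, d.dname, e.empno, e.ename
    FROM dept d, emp e
   WHERE d.deptno = e.deptno;

-- Ill-suited reuse: the join to emp is performed even though only
-- dept columns are needed
SELECT DISTINCT deptno, dname FROM DX;
```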
You can obtain dname and deptno directly
from the dept table.
It would be inefficient to obtain this information by querying the DX view
(which was declared earlier in the present example). To answer the query, the
view would perform a join of the dept and emp tables,
even though you do not need any data from the emp table.
Using triggers consumes system resources.
If you use too many triggers, then you may find that performance is adversely
affected and you may need to modify or disable them.
- Introduce
derived values. Avoid GROUP BY in
response-critical code.
- Implement
missing entities and intersection tables.
- Reduce
the network load. Migrate, replicate, partition data.
The overall purpose of any strategy for
data distribution is to locate each data attribute such that its value makes
the minimum number of network journeys. If the current number of journeys is
excessive, then moving (migrating) the data is a natural solution.
Often, however, no single location of the
data reduces the network load (or message transmission delays) to an acceptable
level. In this case, consider either holding multiple copies (replicating the
data) or holding different parts of the data in different places (partitioning
the data).
Where distributed queries are necessary,
it may be effective to code the required joins with procedures either in PL/SQL
within a stored procedure, or within the user interface code.
When considering a cross-network join, you
can either bring the data in from a remote node and perform the join locally,
or you can perform the join remotely. The option you choose should be
determined by the relative volume of data on the different nodes.
After you have tuned your application's
SQL statements, consider maintaining statistics with the useful procedures of
the DBMS_STATS package.
Also consider implementing plan stability to maintain application performance
characteristics despite system changes.
Structured Query Language (SQL) is used to
perform all database operations, although some Oracle tools and applications
simplify or mask its use. This chapter provides an overview of the issues
involved in tuning database operations from the SQL point-of-view.
Always approach the tuning of database
operations from the standpoint of the particular goals of your application. Are
you tuning serial SQL statements or parallel operations? Do you have an online
transaction processing (OLTP) application or a data warehousing application?
- Data
warehousing operations process high volumes of data, and they have a high
correlation with the goals of parallel operations.
- OLTP
applications have a large number of concurrent users, and they correlate
more with serial operations.
As a result, these two divergent types of
applications have contrasting goals for tuning as described in Table 9-1.
The goal of tuning one SQL statement in
isolation is: Minimize resource use by the
operation being performed.
You can experiment with alternative SQL
syntax without actually modifying your application. To do this, use the EXPLAIN PLAN statement
with the alternative statement that you are considering, and compare the
alternative statement's execution plan and cost with that of the existing one.
The cost of a SQL statement appears in the POSITION column
of the first row generated by EXPLAIN PLAN.
You must run the application to see which statement can actually be executed
more quickly.
If you have a powerful system and a
massive, high-priority SQL statement to run, then parallelize the statement so
that it uses all available resources.
- Parallel
query
- Parallel
DML (includes INSERT, UPDATE, DELETE; APPEND hint, parallel index scans)
- Parallel
DDL
- Parallel
recovery
- Parallel
loading
- Parallel
propagation (for replication)
Whenever an operation
you are performing in the database takes a long time, whether it is a query or
a batch job, you may be able to reduce the elapsed time by using parallel
operations.
You can also use parallel execution to
access object types within an Oracle database. For example, you can use
parallel execution to access Large Binary Objects (LOBs).
Parallel execution benefits systems that have all of the following characteristics:
- Symmetric
multi-processors (SMP), clusters, or massively parallel systems.
- Sufficient
I/O bandwidth.
- Under-utilized
or intermittently used CPUs (for example, systems where CPU usage is
typically less than 30%).
- Sufficient
memory to support additional memory-intensive processes, such as sorts,
hashing, and I/O buffers.
If your system lacks any of these
characteristics, then parallel execution may
not significantly
improve performance. In fact, parallel execution can reduce
system performance on over-utilized systems or systems with small I/O
bandwidth.
Parallel execution provides the best
performance improvements in decision support systems (DSS). However, online
transaction processing (OLTP) systems also benefit from parallel execution; for
example, parallel index creation greatly benefits e-commerce businesses where
there is little scheduled downtime.
During the day, most OLTP systems should
probably not use parallel execution. During off-hours, however, parallel
execution can effectively process high-volume batch operations. For example, a
bank might use parallelized batch programs to perform millions of updates to
apply interest to accounts.
Tuning OLTP applications mostly involves
tuning serial SQL statements. You should consider two design issues: use of SQL
and shared PL/SQL, and use of different transaction modes.
To minimize parsing, use bind variables in
SQL statements within OLTP applications. This way, all users can share the same
SQL statements while using fewer resources for parsing.
Sophisticated users can use discrete
transactions if performance is of the utmost importance, and if the users are
willing to design the application accordingly.
Serializable transactions can be used if
the application must be ANSI compatible. Because of the overhead inherent in
serializable transactions, Oracle strongly recommends the use of read-committed
transactions instead.
If excessive use of triggers degrades
system performance, then modify the conditions under which triggers fire by
executing the CREATE TRIGGER or CREATE
OR REPLACE TRIGGER statements.
You can also turn off triggers with the ALTER TRIGGER statement.
Excessive use of triggers for frequent events such
as logons, logoffs, and error events can degrade performance, because these
events affect all users.
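For example, a trigger can be turned off and back on without dropping it (the trigger name is hypothetical):

```sql
-- Disable a trigger that fires on a frequent event
ALTER TRIGGER logon_audit_trg DISABLE;

-- Re-enable it once the load has passed
ALTER TRIGGER logon_audit_trg ENABLE;
```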
This section documents the best practices
for developing and tuning SQL with the cost-based optimizer (CBO). This
includes the following:
- Avoiding Rule-Based Optimizer Techniques
- Index Cost
- Optimizing SQL Statements
- Avoiding Complex Expressions
- Handling Complex Logic in the Application
Because the CBO is cost
based, it is not necessary to force or disable a particular index. The CBO
chooses the access path with the best cost.
The CBO chooses the most
efficient join order based on cost after permuting the possible join graphs.
Hence, there is no need, or benefit, to ordering the FROM clause
under the CBO.
In the following example, the CBO may
choose a full table scan if the index probe on employee_num is
too costly (e.g., the estimated cardinality for employees having employee
numbers beginning with 20 is high).
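The query described might look like this (the table and column names follow the text; the schema is otherwise an assumption):

```sql
-- If many employee numbers begin with '20', the estimated cardinality is
-- high and the CBO may prefer a full table scan over the index probe
SELECT e.full_name
  FROM employees e
 WHERE e.employee_num LIKE '20%';
```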
The following example illustrates the cost
model and selectivity of a query which, under the RBO, used an inefficient
index. The CBO chooses a more efficient plan.
This illustrates that the cost of the RBO plan is significantly higher than that of the default CBO-generated plan.
In order to avoid the full table scan, the
query can be rewritten in order to optimize by using a more selective filter.
In this case, the expenditure group is rather selective, but the NVL()
function prevented an index from being used.
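The rewrite described follows this general pattern (the column names other than the pa_expenditure_types table are assumptions):

```sql
-- Before: NVL() wrapped around the column disables its index
WHERE NVL(expenditure_group, 'MISC') = :group

-- After: the selective filter is applied to the bare column
-- (assuming NULLs need no special handling), so its index can be used
WHERE expenditure_group = :group
```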
Although there is a full table scan on the pa_expenditure_types table, this is only a small lookup
table.
These expressions prevent the optimizer
from assigning valid cardinality or selectivity estimates, and can in turn
affect the overall plan and the join method.
The balloon tactic is when a developer
chooses to write a single complex SQL statement which incorporates complex
application and business logic, as opposed to writing a few simple queries to
achieve the same results. Developing a very large complex SQL statement has
performance implications in terms of sharable memory and optimization. Coding a
few simple queries in place of a single complex query is a better approach,
because the individual SQL statements are easier to optimize and maintain.
Oracle Forms and Reports are powerful
development tools which allow application logic to be coded using PL/SQL
(triggers or program units). This helps reduce the complexity of SQL by
allowing complex logic to be handled in the Forms or Reports. In addition, you
can also invoke a server side PL/SQL package which performs the few SQL
statements in place of a single large complex SQL statement. Because the package
is a server-side unit, there are no issues surrounding client to database
round-trips and network traffic.
Complex logic should be handled in the
application via Oracle Forms triggers, PL/SQL logic, or C-Code.
The following steps were taken to improve
the above query, which accessed a complex view with many outer joins:
- Rewrote
the SQL statement and eliminated 6 table joins.
- Added a
Forms post query trigger to populate address type fields.
- Reduced
the number of rows processed.
Table 9-2 lists
recommended tuning tips you should implement during your SQL statement design
phase:
It is important that you generate and
review execution plans for all your SQL statements to ensure optimal
performance.
Predicate collapsing occurs when a column
predicate involves more than one bind variable. An expression of the form col = DECODE(:b1, '', :b3, col) is an example of predicate collapsing. This implies that if bind variable :b1 is null, then bind variable :b3 should be used; otherwise, the expression collapses to col = col. The DECODE construct prevents the optimizer from using the index on the col column.
The following example demonstrates how
predicate collapsing is used to collapse a name bind variable with the delivery_id bind
variable in a single filter. As can be seen from the EXPLAIN PLAN,
this results in a full table scan on the wsh_deliveries table
because of the NVL()
construct on the delivery_id column,
as well as the DECODE()
construct on the name column.
This query can be rewritten using a UNION to
short-circuit one-side of the UNION based
on the bind variable values. For example, if the delivery_id bind
is supplied, only the first branch of the UNION is
executed.
If a value for the name bind variable is
supplied, then the second branch of the UNION is
executed. In either case, both sides of the UNION use
rather selective indexes on either the delivery_id column
or the name column. This is much more efficient than the original query which
performed a full table scan.
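The short-circuited rewrite follows this general shape (the column list is illustrative):

```sql
-- First branch: executed only when :delivery_id is supplied;
-- drives off the delivery_id index
SELECT delivery_id, name
  FROM wsh_deliveries
 WHERE delivery_id = :delivery_id
   AND :delivery_id IS NOT NULL
UNION
-- Second branch: executed only when a name pattern is supplied;
-- drives off the name index
SELECT delivery_id, name
  FROM wsh_deliveries
 WHERE name LIKE :name
   AND :delivery_id IS NULL;
```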
The following example illustrates how a
query can be optimized for the general case. Specifically, this purchasing
query determines the list of approvers which can approve a purchase order for a
given organizational structure. However, in most cases, the end user provides
the approver name via a name pattern, and, therefore, it is not necessary to
scan all the approvers.
If 'consistent gets' or 'physical reads' are high relative to the amount of data returned, then this is a sign that the query is expensive and needs to be reviewed for optimization.
For example, if you are expecting less
than 1,000 rows back and 'consistent gets' is 1,000,000 and 'physical reads' is
10,000, then this query needs to be further optimized.
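In SQL*Plus, these statistics can be checked with AUTOTRACE; the query shown here is only a placeholder:

```sql
SET AUTOTRACE TRACEONLY STATISTICS

SELECT COUNT(*) FROM wsh_deliveries;
-- The statistics section reports 'consistent gets' and 'physical reads'
-- for the statement just executed
```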
In this example, the use of EXISTS results in a full table scan because there is no selective criteria on the outer query. In this case, an IN operator is more appropriate.
The IN operator enables Oracle
to drive off of the delivery_id index, which is
rather selective.
In this example, the use of EXISTS results
in a full table scan because there is no selective criteria on the outer query.
In this case, an IN operator
is more appropriate. The IN operator
enables Oracle to drive off of the request_id index,
which is rather selective.
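The contrast can be sketched as follows (the table names are assumptions; request_id follows the text):

```sql
-- EXISTS form: with no selective filter on the outer query,
-- the outer table is scanned in full
SELECT r.request_id
  FROM requests r
 WHERE EXISTS (SELECT 1
                 FROM request_lines l
                WHERE l.request_id = r.request_id
                  AND l.status = 'OPEN');

-- IN form: Oracle can drive off the selective request_id index
SELECT r.request_id
  FROM requests r
 WHERE r.request_id IN (SELECT l.request_id
                          FROM request_lines l
                         WHERE l.status = 'OPEN');
```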
This section documents the steps and
procedures involved with diagnosing a CBO execution plan for a given SQL
statement:
Oracle supports transparent distributed
queries to access data from multiple databases. It also provides many other
distributed features, such as transparent distributed transactions and a
transparent, fully automatic two-phase commit. This section explains how the
Oracle8i optimizer decomposes SQL
statements and how this affects the performance of distributed queries. The
section also provides guidelines on how to influence the optimizer and avoid
performance bottlenecks.
- Remote and Distributed Queries
- Distributed Query Restrictions
- Transparent Gateways
- Optimizing Performance of Distributed Queries
If a SQL statement references one or more
remote tables, then the optimizer first determines whether all remote tables
are located at the same site. If all tables are located at the same remote
site, then Oracle sends the entire query to the remote site for execution. The
remote site sends the resulting rows back to the local site. This is called a remote SQL
statement. If the tables are located at more than one site, then the optimizer
decomposes the query into separate SQL statements to access each of the remote
tables. This is called a distributed SQL
statement. The site where the query is executed, called the driving site,
is usually the local site.
- Remote Data Dictionary Information
- Remote SQL Statements
- Distributed SQL Statements
- EXPLAIN PLAN and SQL Decomposition
- Partition Views
If a SQL statement references multiple
tables, then the optimizer must determine which columns belong to which tables
before it can decompose the SQL statement. For example:
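A distributed statement of the kind described might be the following (emp is assumed to be at a remote site reached through the database link remote):

```sql
SELECT dname, ename
  FROM dept d, emp@remote e
 WHERE d.deptno = e.deptno;
```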
The optimizer must first determine that
the dname column
belongs to the dept table
and the ename column
to the emp table.
After the optimizer has the data dictionary information of all remote tables,
it can build the decomposed SQL statements.
Column and table names in decomposed SQL
statements appear between double quotes. You must enclose in double quotes any
column and table names that contain special characters, reserved words, or
spaces.
This mechanism also replaces an asterisk
(*) in the select list with the actual column names. For example:
If the entire SQL statement is sent to the
remote database, then the optimizer uses table aliases A1, A2,
and so on, for all tables and columns in the query, in order to avoid possible
naming conflicts. For example:
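Both behaviors can be sketched with the usual dept table assumed at the remote site:

```sql
-- Local statement:
SELECT * FROM dept@remote;

-- Decomposed statement sent to the remote site: the asterisk is expanded,
-- names are quoted, and tables and columns are aliased A1, A2, ...
SELECT A1."DEPTNO", A1."DNAME", A1."LOC" FROM "DEPT" A1
```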
When a query accesses data on one or more
databases, one site drives the
execution of the query. This is known as the driving site;
it is here that the data is joined, grouped, and ordered. By default, the local
Oracle server is the driving site. A hint called DRIVING_SITE enables
you to manually specify the driving site.
The decomposition of SQL statements is
important, because it determines the number of records or even tables that must
be sent through the network. A knowledge of how the optimizer decomposes SQL
statements can help you achieve optimum performance for distributed queries.
If a SQL statement references one or more
remote tables, then the optimizer must decompose the SQL statement into
separate queries to be executed on the different databases. For example:
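For instance, with dept local and emp remote (a sketch):

```sql
SELECT d.dname, e.ename
  FROM dept d, emp@remote e
 WHERE d.deptno = e.deptno;

-- Decomposed statement generated for the remote database:
SELECT deptno, ename FROM emp
```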
The decomposed query that accesses the remote table is sent to the remote database, and the data from both tables is joined locally. All this is done automatically and transparently for the user or application.
In some cases, however, it might be better
to send the local table to the remote database and join the two tables on the
remote database. This can be achieved either by creating a view or by using the DRIVING_SITE hint.
If you decide to create a view on the remote database, then a database link
from the remote database to the local database is also needed.
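Either approach might be sketched as follows (the view, link, and alias names are assumptions):

```sql
-- Option 1: a view on the remote database joining back to the local site
-- (requires a database link from the remote database to the local one)
CREATE VIEW dept_emp AS
  SELECT d.dname, e.ename
    FROM dept@local d, emp e
   WHERE d.deptno = e.deptno;

-- Option 2: the DRIVING_SITE hint makes the remote site the driving site
SELECT /*+ DRIVING_SITE(e) */ d.dname, e.ename
  FROM dept d, emp@remote e
 WHERE d.deptno = e.deptno;
```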
Now, the local dept table
is sent through the network to the remote database, joined on the remote
database with the emp table,
and the result is sent back to the local database.
The rule-based optimizer does not have
information about indexes for remote tables. It never, therefore, generates a
nested loops join between a local table and a remote table with the local table
as the outer table in the join. It uses either a nested loops join with the
remote table as the outer table or a sort merge join, depending on the indexes
available for the local table.
The cost-based optimizer can consider more
execution plans than the rule-based optimizer. The cost-based optimizer knows
whether indexes on remote tables are available, and in which cases it makes sense
to use them. The cost-based optimizer considers index access of the remote
tables as well as full table scans, whereas the rule-based optimizer considers
only full table scans.
The particular execution plan and table
access that the cost-based optimizer chooses depends on the table and index
statistics. For example:
Here, the optimizer might choose the local dept table
as the driving table, and access the remote emp table
using an index; so the decomposed SQL statement becomes the following:
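As a sketch, with a selective filter on the local dept table:

```sql
SELECT d.dname, e.ename
  FROM dept d, emp@remote e
 WHERE d.deptno = e.deptno
   AND d.dname = 'SALES';

-- With dept as the driving table and an index on the remote emp.deptno,
-- the decomposed statement probes the remote table once per driving row:
SELECT ename FROM emp WHERE deptno = :1
```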
If tables are on more than one remote
site, then it can be more effective to create a view than to use the DRIVING_SITE hint.
If not all tables are on the same remote database, then the optimizer accesses
each remote table separately. For example:
To join the two emp tables
remotely, create a view with the join of the remote tables on the remote
database. For example (on the remote database):
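For example, if emp1 and emp2 both reside on the remote database (the names are illustrative):

```sql
-- On the remote database: a view that performs the join there
CREATE VIEW emp_join AS
  SELECT e1.empno, e1.ename, e2.ename mgr_name
    FROM emp1 e1, emp2 e2
   WHERE e1.mgr = e2.empno;

-- The local query then references a single remote object,
-- so the join is executed entirely at the remote site
SELECT * FROM emp_join@remote WHERE ename = 'SMITH';
```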
In a distributed query, all hints are
supported for local tables. For remote tables, however, you can use only join
order and join operation hints. (Hints for access methods, parallel hints, and
so on, have no effect.) For remote mapped queries, all hints are supported.
EXPLAIN PLAN gives
information not only about the overall execution plan of SQL statements, but
also about the way in which the optimizer decomposes SQL statements. EXPLAIN PLAN stores
information in the PLAN_TABLE table.
If remote tables are used in a SQL statement, then the OPERATION column
contains the value REMOTE to
indicate that a remote table is referenced, and the OTHER column
contains the decomposed SQL statement that will be sent to the remote database.
For example:
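A sketch of inspecting the decomposition:

```sql
EXPLAIN PLAN FOR
  SELECT d.dname, e.ename
    FROM dept d, emp@remote e
   WHERE d.deptno = e.deptno;

-- The REMOTE row's OTHER column holds the SQL sent to the remote database
SELECT id, operation, other
  FROM plan_table
 WHERE operation = 'REMOTE';
```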
Partition views coalesce tables that have
the same structure, but that contain different partitions of data. Partition
views are supported for distributed databases where each partition resides on a
database, and the data in each partition has common geographical properties.
When a query is executed on a partition view, and when the query contains a predicate that constrains the result set to a subset of the view's partitions, the optimizer chooses a plan that skips the partitions that are not needed for the query. This partition elimination takes place at run time, even though the execution plan references all partitions.
Partition views were the only form of
partitioning available in Oracle7 Release 7.3. They are not recommended for new
applications in Oracle8i.
Partition views that were created for Oracle7 databases can be converted to partitioned
tables by using the EXCHANGE PARTITION option
of the ALTER TABLE statement.
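The conversion can be sketched as follows (table and partition names are hypothetical); each branch table of the old partition view is exchanged into one partition of a new partitioned table:

```sql
-- New partitioned table matching the shape of the view's branches.
CREATE TABLE sales_part (
  sale_id  NUMBER,
  sale_qtr NUMBER
)
PARTITION BY RANGE (sale_qtr) (
  PARTITION q1 VALUES LESS THAN (2),
  PARTITION q2 VALUES LESS THAN (3)
);

-- Swap each old branch table into its partition; this exchanges
-- data dictionary pointers rather than moving rows.
ALTER TABLE sales_part EXCHANGE PARTITION q1 WITH TABLE sales_q1;
ALTER TABLE sales_part EXCHANGE PARTITION q2 WITH TABLE sales_q2;
```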
Oracle8i supports partition views only
for distributed queries and for backwards compatibility with Oracle7 Release
7.3. Future releases of Oracle will not support partition views.
There are circumstances under which a UNION ALL view
enables the optimizer to skip partitions. The Oracle server that contains the
partition view must conform to the following rules:
- The PARTITION_VIEW_ENABLED initialization parameter is set to true.
- The
cost-based optimizer is used.
To use the cost-based optimizer, you must analyze
all tables used in the UNION ALL views.
Alternatively, you can use a hint or set the parameter OPTIMIZER_MODE to ALL_ROWS or FIRST_ROWS. To set OPTIMIZER_MODE or PARTITION_VIEW_ENABLED, you can also use the ALTER SESSION statement.
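For example, both parameters can be set at the session level:

```sql
ALTER SESSION SET OPTIMIZER_MODE = ALL_ROWS;
ALTER SESSION SET PARTITION_VIEW_ENABLED = TRUE;
```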
Within a UNION ALL view,
there are multiple select statements, and each of these is called a branch.
A UNION ALL view
is a partition view if each SELECT statement it contains conforms to the
following rules:
- The
branch has exactly one table in the FROM clause.
- The
branch contains a WHERE clause that defines the subset of data from the
partition that is contained in the view.
- None of
the following are used within the branch: WHERE clause with subquery, GROUP BY, aggregate functions, DISTINCT, ROWNUM, or CONNECT BY/START WITH.
- The SELECT list of
each branch is * or an explicit expansion of "*". The FROM clause
should be either the base table or a view of the base table that contains
all the columns in the base table.
- The
column names and column datatypes for all branches in the UNION ALL view are exactly the same.
- All
tables used in the branches must have indexes (if any) on the same columns
and the same number of columns.
Partition elimination is based on column
transitivity with constant predicates. The WHERE clause
used in the query that accesses the partition view is pushed down to the WHERE clause
of each of the branches in the UNION ALL view
definition. For example:
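A sketch of such a view (the link names d10, d20, d30 and the deptno partitioning follow the surrounding discussion; the manual's original listing is not reproduced):

```sql
-- Each branch holds one department's rows on its own database.
CREATE VIEW all_emp AS
  SELECT * FROM emp@d10 WHERE deptno = 10
  UNION ALL
  SELECT * FROM emp@d20 WHERE deptno = 20
  UNION ALL
  SELECT * FROM emp@d30 WHERE deptno = 30;

-- The predicate below is pushed into every branch of the view.
SELECT * FROM all_emp WHERE deptno = 30;
```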
The "WHERE deptno=30"
predicate used in the query is pushed down to the queries in the UNION ALL view.
For a WHERE clause
such as "WHERE deptno=10 and deptno=30",
the optimizer applies transitivity rules to generate an extra predicate of
"10=30". This extra predicate is always false; thus, the table (emp@d10)
need not be accessed.
To confirm that the system recognizes a
partition view, check the EXPLAIN PLAN output. When a query executed on a
partition view is recognized as such, the UNION-ALL operation appears in the
OPERATIONS column of the EXPLAIN PLAN output with PARTITION in its OPTIONS column.
If PARTITION does
not appear in the OPTIONS column of the UNION-ALL operation,
then the partition view was not recognized, and no partitions were eliminated.
Make sure that the UNION ALL view
adheres to the rules defined in "Using
UNION ALL to Skip Partitions".
The following example shows the partition
view customer partitioned
into two partitions: the east database
contains the East Coast customers, and the west database
contains the West Coast customers.
The following partition view is created at
the east database
(you could create a similar view at the west database):
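A sketch of the view (the region column, its values, and the link name west are assumptions, not the manual's original definitions):

```sql
-- At the east database: local East Coast rows plus the West
-- Coast rows reached through the database link "west".
CREATE VIEW customer AS
  SELECT * FROM customer_east WHERE region = 'EAST'
  UNION ALL
  SELECT * FROM customer_west@west WHERE region = 'WEST';
```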
If you execute the following statement,
then notice that the customer_west table
in the west database
is not accessed:
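As an illustration (the region column and its values are assumptions), a query restricted to the eastern partition might look like this:

```sql
-- The predicate matches only the customer_east branch, so the
-- branch over customer_west@west is eliminated at run time.
SELECT * FROM customer WHERE region = 'EAST';
```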
As shown in the EXPLAIN PLAN output,
the optimizer recognizes that the customer_west partition
need not be accessed:
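One way to inspect this is to query PLAN_TABLE after an EXPLAIN PLAN of the statement (a sketch using the standard PLAN_TABLE columns):

```sql
-- PARTITION in the OPTIONS column of the UNION-ALL row indicates
-- partition elimination; OBJECT_NODE names the database link of
-- any remote step.
SELECT id, operation, options, object_node
FROM   plan_table
ORDER  BY id;
```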
- The
cost-based optimizer should be used for distributed queries. The
rule-based optimizer does not generate nested loop joins between remote
and local tables when the tables are joined with equijoins.
- In the
cost-based optimizer, no more than 20 indexes per remote table are
considered when generating query plans. The order of the indexes varies;
if the 20-index limitation is exceeded, then random variation in query
plans may result.
- Reverse
indexes on remote tables are not visible to the optimizer. This can
prevent nested-loop joins from being used for remote tables if there is an
equijoin using a column with only a reverse index.
- The
cost-based optimizer cannot recognize that a remote object is partitioned.
Thus, the optimizer may generate less than optimal plans for remote
partitioned objects, particularly when partition pruning would have been
possible, had the object been local.
- Remote
views are not merged, and the optimizer has no statistics for them. It is
best to replicate all mergeable views at all sites to obtain good query
plans. (See the next restriction.)
- Neither
the cost-based nor the rule-based optimizer can execute joins remotely.
All joins are executed at the driving site. This can affect performance
for CREATE TABLE ... AS SELECT if all
the tables in the select list are remote. In this case, you should create
a view for the SELECT statement
at the remote site.
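A sketch of that workaround (all names hypothetical): wrap the remote-only join in a view at the remote site, then create the local table from that view:

```sql
-- On the remote database: define the join there so it runs remotely.
CREATE VIEW new_emp_v AS
  SELECT e.empno, e.ename, d.dname
  FROM   emp e, dept d
  WHERE  e.deptno = d.deptno;

-- On the local database: rows arrive already joined, instead of
-- both tables being pulled to the driving site.
CREATE TABLE new_emp AS SELECT * FROM new_emp_v@remote;
```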
The Transparent Gateways transparently
access data from a non-Oracle system (relational databases, hierarchical
databases, file systems, and so on), just as if it were another Oracle
database.
When a SQL statement accesses data from
non-Oracle systems, it is said to be a heterogeneous distributed SQL statement.
To optimize heterogeneous distributed SQL statements, follow the same
guidelines as for optimizing distributed SQL statements that access Oracle
databases only. However, you must consider that the non-Oracle system usually
does not support all the functions and operators that Oracle8i supports.
The Transparent Gateways tell Oracle (at
connect time) which functions and operators they do support. If the other data
source does not support a function or operator, then Oracle performs that
function or operator. In this case, Oracle obtains the data from the other data
source and applies the function or operator locally. This affects the way in
which the SQL statements are decomposed and can affect performance, especially
if Oracle is not on the same machine as the other data source.
You can use partition views with Oracle
Transparent Gateways release 8 or higher. Make sure you adhere to the rules
that are defined in "Using
UNION ALL to Skip Partitions". In particular:
- The
cost-based optimizer must be used, by using hints or setting the parameter OPTIMIZER_MODE to ALL_ROWS or FIRST_ROWS.
- Indexes
used for each partition must be the same. Consult your gateway-specific
documentation to find out whether the gateway sends index information of
the non-Oracle system to the Oracle Server. If the gateway sends index
information to the optimizer, then make sure that each partition uses the
same number of indexes, and that you have indexed the same columns. If the
gateway does not send index information, then the Oracle optimizer is not
aware of the indexes on partitions. Indexes are, therefore, considered to
be the same for each partition in the non-Oracle system. If one partition
resides on an Oracle server, then you cannot have an index defined on that
partition.
- The
column names and column datatypes for all branches in the UNION ALL view must be the same. Non-Oracle system
datatypes are mapped onto Oracle datatypes. Make sure that the datatypes
of each partition that reside in the different non-Oracle systems all map
to the same Oracle datatype. To see how datatypes are mapped onto Oracle
datatypes, execute a DESCRIBE statement in SQL*Plus.
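For example (the table and link names are illustrative), in SQL*Plus:

```sql
-- Lists each column with the Oracle datatype the gateway
-- mapped it to in the non-Oracle system.
DESCRIBE customer_west@west_gateway
```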
In many cases, there are
several SQL statements which can achieve the same result. If all tables are on
the same database, then the difference in performance between these SQL
statements might be minimal; but, if the tables are located on different
databases, then the difference in performance might be more significant.
The cost-based optimizer
uses indexes on remote tables, considers more execution plans than the
rule-based optimizer, and generally gives better results. With the cost-based
optimizer, performance of distributed queries is generally satisfactory. Only
on rare occasions is it necessary to change SQL statements, create views, or
use procedural code.
On rare occasions,
it can be more efficient to replace a distributed query with procedural code,
such as a PL/SQL procedure or a precompiler program. This option is mentioned
here only for completeness, not because it is often needed.