Performance tuning is a broad and somewhat complex topic area when it comes to Oracle databases. Two of the biggest questions faced by your average DBA concern where to start and what to do. All you may know is that someone (a user) reports a problem about a slow or poor performing application or query. Where do you even begin to start when faced with this situation?
Oracle's Approach to Tuning
For anyone who has taken the Performance Tuning exam for Oracle8i certification, one of the testable areas dealt with Oracle's Tuning Methodology. Oracle's emphasis on this particular methodology changed when Oracle9i was released. The approach has gone from top-down in 8i to that of following principles in 9i/10g. Neither methodology is absolute as each has its advantages and disadvantages. In Oracle8i, the steps consisted of the following:1. Tuning the Business Rules2. Tuning the Data Design3. Tuning the Application Design4. Tuning the Logical Structure of the Database5. Tuning Database Operations6. Tuning the Access Paths7. Tuning Memory Allocation8. Tuning I/O and Physical Structure9. Tuning Resource Contention10. Tuning the Underlying Platform(s)
With Oracle9i's principle-based approach, the principles, in order of priority, are:
Interestingly, the emphasis on identifying which step an action falls under went away with Oracle9i, and recitation of the principles is not a testable item. The title of documentation even changed between releases one and two, and that should send a clear signal that the art of performance tuning (or, performance and tuning) is still just that – an art. When it comes to instance tuning, the steps are even further reduced in Oracle10g.
The performance tuning guide for Oracle10g (Release 2) identifies the overall process as The Oracle Performance Improvement Method. The steps have been expanded, but overall, remain the same.
- 1. Perform the following initial standard checks:
- a. Get candid feedback from users. Determine the performance project's scope and subsequent performance goals, as well as performance goals for the future. This process is key in future capacity planning.
- b. Get a full set of operating system, database, and application statistics from the system when the performance is both good and bad. If these are not available, then get whatever is available. Missing statistics are analogous to missing evidence at a crime scene: They make detectives work harder and it is more time-consuming.
- c. Sanity-check the operating systems of all machines involved with user performance. By sanity-checking the operating system, you look for hardware or operating system resources that are fully utilized. List any over-used resources as symptoms for analysis later. In addition, check that all hardware shows no errors or diagnostics.
- 2. Check for the top ten most common mistakes with Oracle, and determine if any of these are likely to be the problem. List these as symptoms for later analysis. These are included because they represent the most likely problems. ADDM automatically detects and reports nine of these top ten issues.
- 3. Build a conceptual model of what is happening on the system using the symptoms as clues to understand what caused the performance problems.
- 4. Propose a series of remedy actions and the anticipated behavior to the system, then apply them in the order that can benefit the application the most. ADDM produces recommendations each with an expected benefit. A golden rule in performance work is that you only change one thing at a time and then measure the differences. Unfortunately, system downtime requirements might prohibit such a rigorous investigation method. If multiple changes are applied at the same time, then try to ensure that they are isolated so that the effects of each change can be independently validated.
- 5. Validate that the changes made have had the desired effect, and see if the user's perception of performance has improved. Otherwise, look for more bottlenecks, and continue refining the conceptual model until your understanding of the application becomes more accurate.
- 6. Repeat the last three steps until performance goals are met or become impossible due to other constraints.
The Change is Part of the Problem
The change from a top-down structured approach to a principle-based "make it stop hurting" one is part of the problem. Gathering statistics is obviously important because how else do you know if you have improved (or worsened) the problem? Still, to some degree with either approach, you are left with the original two questions: what do I look for, and how do I make it better? If the structured approach left you scratching your head, the principled approach only adds to the confusion.What would help the novice tuner (disclaimer: I am far from being an expert) is a list of items or areas to evaluate (configure, diagnose, and tune) in each of the following areas:
- Tuning the Buffer Cache
- Tuning the Redo Log Buffer
- Tuning the Shared Pool Memory
- Tuning the Program Global Area
- Optimizing Data Storage
- Optimizing Tablespaces
- Tuning Undo Segments
- Detecting Lock Contention
- Tuning SQL
17 Tips for Avoiding Problematic Queries
The 17 tips are listed below.1. Avoid Cartesian products2. Avoid full table scans on large tables3. Use SQL standards and conventions to reduce parsing4. Lack of indexes on columns contained in the WHERE clause5. Avoid joining too many tables6. Monitor V$SESSION_LONGOPS to detect long running operations7. Use hints as appropriate8. Use the SHARED_CURSOR parameter9. Use the Rule-based optimizer if I is better than the Cost-based optimizer10. Avoid unnecessary sorting11. Monitor index browning (due to deletions; rebuild as necessary)12. Use compound indexes with care (Do not repeat columns)13. Monitor query statistics14. Use different tablespaces for tables and indexes (as a general rule; this is old-school somewhat, but the main point is reduce I/O contention)15. Use table partitioning (and local indexes) when appropriate (partitioning is an extra cost feature)16. Use literals in the WHERE clause (use bind variables)17. Keep statistics up to date
That is quite a list and overall is thorough and accurate. Step 9, referring to the use of the Rule-based optimizer, may cause a reliance or dependency on a feature Oracle has identified as a future item to be deprecated. You are eventually going to have to solve the problem using the CBO, so you may as well start now and forget about the RBO. Step 14 should be changed to something along the lines of "reduce I/O contention" instead of its currently stated "separate index and table tablespaces" guidance.
In Closing
In the next article of this series, we will look at some specific steps of these tips. For example, advice given on many Web sites about how to improve a SQL statement's performance typically includes "use bind variables." Well, I am sure many people have this question: "How, exactly, do I do that?" It is actually pretty simple, as are many of the details of how to use many of these tips.Oracle Performance Tuning - Part 2
As mentioned in Part 1, there are several relatively easy steps you can take to improve performance. From the user's perspective, one of the most frequently used interfaces with a database involves SQL statements, so getting a handle on them is a good place to start in terms of being able to see an immediate improvement.
In the interest of being complete, I will cover some preliminary steps that will be needed in order to view what is taking place. These steps include running the plustrce SQL script, creating an "EXPLAIN_PLAN" table, granting a role, and configuring your SQL*Plus environment to see execution plans.
Preliminary Steps
If the PLUSTRACE role does not exist, create it using the PLUSTRCE SQL script found in ORACLE_HOME\sqlplus\admin. The script is pretty simple:drop role plustrace; create role plustrace; grant select on v_$sesstat to plustrace; grant select on v_$statname to plustrace; grant select on v_$mystat to plustrace; grant plustrace to dba with admin option;Check for the role using:
SQL> select role from dba_roles where role = 'PLUSTRACE'; ROLE ---------------- PLUSTRACEThe user must have (or have access to) a PLAN_TABLE (it can named something else, but for now, the "default" name is fine). This table is created using the UTLXPLAN SQL script found in ORACLE_HOME\rdbms\admin.
SQL> show user USER is "SYSTEM" SQL> @?\rdbms\admin\utlxplan Table created. SQL> create public synonym plan_table for system.plan_table; Synonym created. SQL> grant select, update, insert, delete on plan_table to <your user name>; Grant succeeded. SQL> grant plustrace to <your user name>; Grant succeeded.The user for these examples is HR (found in the sample schemas provided by Oracle).
SQL> conn hr/hr Connected. SQL> set autotrace on SQL> select * from dual; D - XWith autotrace set to on, you can confirm your ability to see an execution plan and some statistics. You should see output similar to the following:
Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=2 Card=1 Bytes=2) 1 0 TABLE ACCESS (FULL) OF 'DUAL' (TABLE) (Cost=2 Card=1 Bytes=2) Statistics ---------------------------------------------------------- 24 recursive calls 0 db block gets 6 consistent gets 1 physical reads 0 redo size 389 bytes sent via SQL*Net to client 508 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processedTo suppress the results of the query, use "traceonly" in the set statement.
Using Bind Variables
Here is a simple way to create and use a bind variable.SQL> variable department_id number SQL> begin 2 :department_id := 80; 3 end; 4 / PL/SQL procedure successfully completed. SQL> print department_id DEPARTMENT_ID ------------- 80Now let's make a comparison between querying for employee ID and name with and without the bind variable (with the output turned off using traceonly).
Now let's use the bind variable.
Okay, so the difference isn't that great (the cost went from 3 to 2), but this was a small example (the table only has 107 rows). Is there much of a difference when working with a larger table? Use the SH schema and its SALES table with its 900,000+ rows.
SQL> select prod_id, count(prod_id) 2 from sales 3 where prod_id > 130 4 group by prod_id;
Same query, but this time using a bind variable.
SQL> variable prod_id number SQL> begin 2 :prod_id := 130; 3 end; 4 / PL/SQL procedure successfully completed. SQL> print prod_id PROD_ID ---------- 130 SQL> select prod_id, count(prod_id) 2 from sales 3 where prod_id > :prod_id 4 group by prod_id;
The cost went from 540 to 33, and that is fairly significant. One of the main benefits is that the query using the bind variable, that is, the work done parsing the query, stays the same each and every time. All you have to do is substitute a new value for the variable.
==========================================================================================
Instance Tuning Using Performance Views
After the initial configuration of a database, monitoring and tuning an instance regularly is important to eliminate any potential performance bottlenecks. This chapter discusses the tuning process using Oracle V$ performance views.1 Instance Tuning Steps
These are the main steps in the Oracle performance method for instance tuning:
- Define the Problem
Get candid feedback from users about the scope of the performance problem. - Examine the Host System and Examine the Oracle Statistics
- After obtaining a full set of operating system, database, and application statistics, examine the data for any evidence of performance problems.
- Consider the list of common performance errors to see whether the data gathered suggests that they are contributing to the problem.
- Build a conceptual model of what is happening on the system using the performance data gathered.
- Implement and Measure Change
Propose changes to be made and the expected result of implementing the changes. Then, implement the changes and measure application performance. - Determine whether the performance objective defined in step 1 has been met. If not, then repeat steps 2 and 3 until the performance goals are met.
If your site does not have the Automatic Workload Repository and Automatic Database Diagnostic Monitor features, then Statspack can be used to gather Oracle instance statistics.
1.1 Define the Problem
It is vital to develop a good understanding of the purpose of the tuning exercise and the nature of the problem before attempting to implement a solution. Without this understanding, it is virtually impossible to implement effective changes. The data gathered during this stage helps determine the next step to take and what evidence to examine.
Gather the following data:
- Identify the performance objective.
What is the measure of acceptable performance? How many transactions an hour, or seconds, response time will meet the required performance level? - Identify the scope of the problem.
What is affected by the slowdown? For example, is the whole instance slow? Is it a particular application, program, specific operation, or a single user? - Identify the time frame when the problem occurs.
Is the problem only evident during peak hours? Does performance deteriorate over the course of the day? Was the slowdown gradual (over the space of months or weeks) or sudden? - Quantify the slowdown.
This helps identify the extent of the problem and also acts as a measure for comparison when deciding whether changes implemented to fix the problem have actually made an improvement. Find a consistently reproducible measure of the response time or job run time. How much worse are the timings than when the program was running well? - Identify any changes.
Identify what has changed since performance was acceptable. This may narrow the potential cause quickly. For example, has the operating system software, hardware, application software, or Oracle release been upgraded? Has more data been loaded into the system, or has the data volume or user population grown?
1.2 Examine the Host System
Look at the load on the database server and the database instance. Consider the operating system, the I/O subsystem, and network statistics, because examining these areas helps determine what might be worth further investigation. In multitier systems, also examine the application server middle-tier hosts.
Examining the host hardware often gives a strong indication of the bottleneck in the system. This determines which Oracle performance data could be useful for cross-reference and further diagnosis.
Data to examine includes the following:
- CPU Usage
- Identifying I/O Problems
- Identifying Network Issues
If there is a significant amount of idle CPU, then there could be an I/O, application, or database bottleneck. Note that wait I/O should be considered as idle CPU.
If there is high CPU usage, then determine whether the CPU is being used effectively. Is the majority of CPU usage attributable to a small number of high-CPU using programs, or is the CPU consumed by an evenly distributed workload?
If the CPU is used by a small number of high-usage programs, then look at the programs to determine the cause. Check whether some processes alone consume the full power of one CPU. Depending on the process, this could be an indication of a CPU or process bound workload which can be tackled by dividing or parallelizing the process activity.
1.2.1.1 Non-Oracle Processes
If the programs are not Oracle programs, then identify whether they are legitimately requiring that amount of CPU. If so, determine whether their execution be delayed to off-peak hours. Identifying these CPU intensive processes can also help narrowing what specific activity, such as I/O, network, and paging, is consuming resources and how can it be related to the Oracle workload.
1.2.1.2 Oracle Processes
If a small number of Oracle processes consumes most of the CPU resources, then use SQL_TRACE and TKPROF to identify the SQL or PL/SQL statements to see if a particular query or PL/SQL program unit can be tuned. For example, a SELECT statement could be CPU-intensive if its execution involves many reads of data in cache (logical reads) that could be avoided with better SQL optimization.
1.2.1.3 Oracle CPU Statistics
Oracle CPU statistics are available in several V$ views:
- V$SYSSTAT shows Oracle CPU usage for all sessions. The CPU used by this session statistic shows the aggregate CPU used by all sessions. The parsetime cpu statistic shows the total CPU time used for parsing.
- V$SESSTAT shows Oracle CPU usage for each session. Use this view to determine which particular session is using the most CPU.
- V$RSRC_CONSUMER_GROUP shows CPU utilization statistics for each consumer group when the Oracle Database Resource Manager is running.
It is important to recognize that CPU time and real time are distinct. With eight CPUs, for any given minute in real time, there are eight minutes of CPU time available. On Windows and UNIX, this can be either user time or system time (privileged mode on Windows). Thus, average CPU time utilized by all processes (threads) on the system could be greater than one minute for every one minute real time interval.
At any given moment, you know how much time Oracle has used on the system. So, if eight minutes are available and Oracle uses four minutes of that time, then you know that 50% of all CPU time is used by Oracle. If your process is not consuming that time, then some other process is. Identify the processes that are using CPU time, figure out why, and then attempt to tune them.
If the CPU usage is evenly distributed over many Oracle server processes, examine the V$SYS_TIME_MODEL view to help get a precise understanding of where most time is spent.
1.2.2 Identifying I/O ProblemsAn overly active I/O system can be evidenced by disk queue lengths greater than two, or disk service times that are over 20-30ms. If the I/O system is overly active, then check for potential hot spots that could benefit from distributing the I/O across more disks. Also identify whether the load can be reduced by lowering the resource requirements of the programs using those resources. If the I/O problems are caused by Oracle Database, then I/O tuning can begin. If Oracle Database is not consuming the available I/O resources, then identify the process that is using up the I/O. Determine why the process is using up the I/O, and then tune this process.
I/O problems can be identified using V$ views in Oracle Database and monitoring tools in the operating system, as described in the following sections:
- Identifying I/O Problems Using V$ Views
- Identifying I/O Problems Using Operating System Monitoring Tools
Check the Oracle wait event data in V$SYSTEM_EVENT to see whether the top wait events are I/O related. I/O related events include db file sequential read,db file scattered read, db file single write, db file parallel write, and log file parallel write. These are all events corresponding to I/Os performed against datafiles and log files. If any of these wait events correspond to high average time, then investigate the I/O contention.
Cross reference the host I/O system data with the I/O sections in the Automatic Repository report to identify hot datafiles and tablespaces. Also compare the I/O times reported by the operating system with the times reported by Oracle Database to see if they are consistent.
An I/O problem can also manifest itself with non-I/O related wait events. For example, the difficulty in finding a free buffer in the buffer cache or high wait times for logs to be flushed to disk can also be symptoms of an I/O problem. Before investigating whether the I/O system should be reconfigured, determine if the load on the I/O system can be reduced.
To reduce I/O load caused by Oracle Database, examine the I/O statistics collected for all I/O calls made by the database using the following views:
- V$IOSTAT_CONSUMER_GROUP
The V$IOSTAT_CONSUMER_GROUP view captures I/O statistics for consumer groups. If Oracle Database Resource Manager is enabled, I/O statistics for all consumer groups that are part of the currently enabled resource plan are captured. - V$IOSTAT_FILE
The V$IOSTAT_FILE view captures I/O statistics of database files that are or have been accessed. The SMALL_SYNC_READ_LATENCY column displays the latency for single block synchronous reads (in milliseconds), which translates directly to the amount of time that clients need to wait before moving onto the next operation. This defines the responsiveness of the storage subsystem based on the current load. If there is a high latency for critical datafiles, you may want to consider relocating these files to improve their service time. To calculate latency statistics, timed_statistics must be set to TRUE. - V$IOSTAT_FUNCTION
The V$IOSTAT_FUNCTION view captures I/O statistics for database functions (such as the LGWR and DBWR).
An I/O can be issued by various Oracle processes with different functionalities. The top database functions are classified in the V$IOSTAT_FUNCTION view. In cases when there is a conflict of I/O functions, the I/O is placed in the bucket with the lower FUNCTION_ID. For example, if XDB issues an I/O from the buffer cache, the I/O would be classified as an XDB I/O because it has a lower FUNCTION_ID value. Any unclassified function is placed in the Others bucket. You can display the FUNCTION_ID hierarchy by querying the V$IOSTAT_FUNCTION view:select FUNCTION_ID, FUNCTION_NAME from v$iostat_function order by FUNCTION_ID; FUNCTION_ID FUNCTION_NAME ----------- ------------------ 0 RMAN 1 DBWR 2 LGWR 3 ARCH 4 XDB 5 Streams AQ 6 Data Pump 7 Recovery 8 Buffer Cache Reads 9 Direct Reads 10 Direct Writes 11 Others
- Identifier
- Total wait time (in milliseconds)
- Number of waits executed (for consumer groups and functions)
- Number of requests for each operation
- Number of single and multi block bytes read
- Number of single and multi block bytes written
See Also:
- "db file scattered read" and "db file sequential read" for the difference between a scattered read and a sequential read, and how this affects I/O
- Get more information about the V$IOSTAT_CONSUMER_GROUP, V$IOSTAT_FUNCTION, V$IOSTAT_FILE, and V$SQLAREA views
Use operating system monitoring tools to determine what processes are running on the system as a whole and to monitor disk access to all files. Remember that disks holding datafiles and redo log files can also hold files that are not related to Oracle Database. Reduce any heavy access to disks that contain database files. Access to non-Oracle files can be monitored only through operating system facilities, rather than through the V$ views.
Utilities, such as sar -d (or iostat) on many UNIX systems and the administrative performance monitoring tool on Windows systems, examine I/O statistics for the entire system.
1.2.3 Identifying Network Issues
Using operating system utilities, look at the network round-trip ping time and the number of collisions. If the network is causing large delays in response time, then investigate possible causes.
To identify network I/O caused by remote access of database files, examine the V$IOSTAT_NETWORK view. This view contains network I/O statistics caused by accessing files on a remote database instance, including:
- Database client initiating the network I/O (such as RMAN and PLSQL)
- Number of read and write operations issued
- Number of kilobytes read and written
- Total wait time in milliseconds for read operations
- Total wait in milliseconds for write operations
1.3 Examine the Oracle Statistics
Oracle statistics should be examined and cross-referenced with operating system statistics to ensure a consistent diagnosis of the problem. operating-system statistics can indicate a good place to begin tuning. However, if the goal is to tune the Oracle instance, then look at the Oracle statistics to identify the resource bottleneck from an Oracle perspective before implementing corrective action.
The following sections discuss the common Oracle data sources used while tuning.
1.3.1 Setting the Level of Statistics Collection
Oracle provides the initialization parameter STATISTICS_LEVEL, which controls all major statistics collections or advisories in the database. This parameter sets the statistics collection level for the database.
Depending on the setting of STATISTICS_LEVEL, certain advisories or statistics are collected, as follows:
- BASIC: No advisories or statistics are collected. Monitoring and many automatic features are disabled. Oracle does not recommend this setting because it disables important Oracle features.
- TYPICAL: This is the default value and ensures collection for all major statistics while providing best overall database performance. This setting should be adequate for most environments.
- ALL: All of the advisories or statistics that are collected with the TYPICAL setting are included, plus timed operating system statistics and row source execution statistics.
This view lists the status of the statistics or advisories controlled by STATISTICS_LEVEL - V$STATISTICS_LEVEL view
1.3.2 Wait Events
Wait events are statistics that are incremented by a server process or thread to indicate that it had to wait for an event to complete before being able to continue processing. Wait event data reveals various symptoms of problems that might be impacting performance, such as latch contention, buffer contention, and I/O contention. Remember that these are only symptoms of problems, not the actual causes.
Wait events are grouped into classes. The wait event classes include: Administrative, Application, Cluster, Commit, Concurrency, Configuration, Idle, Network, Other, Scheduler, System I/O, and User I/O.
A server process can wait for the following:
- A resource to become available, such as a buffer or a latch
- An action to complete, such as an I/O
- More work to do, such as waiting for the client to provide the next SQL statement to execute. Events that identify that a server process is waiting for more work are known as idle events.
To minimize user response time, reduce the time spent by server processes waiting for event completion. Not all wait events have the same wait time. Therefore, it is more important to examine events with the most total time waited rather than wait events with a high number of occurrences. Usually, it is best to set the dynamic parameter TIMED_STATISTICS to true at least while monitoring performance.
1.3.3 Dynamic Performance Views Containing Wait Event Statistics
These dynamic performance views can be queried for wait event statistics:
- V$ACTIVE_SESSION_HISTORY
The V$ACTIVE_SESSION_HISTORY view displays active database session activity, sampled once every second. - V$SESS_TIME_MODEL and V$SYS_TIME_MODEL
The V$SESS_TIME_MODEL and V$SYS_TIME_MODEL views contain time model statistics, including DB time which is the total time spent in database calls - V$SESSION_WAIT
The V$SESSION_WAIT view displays information about the current or last wait for each session (such as wait ID, class, and time). - V$SESSION
The V$SESSION view displays information about each current session and contains the same wait statistics as those found in the V$SESSION_WAIT view. If applicable, this view also contains detailed information about the object that the session is currently waiting for (such as object number, block number, file number, and row number), the blocking session responsible for the current wait (such as the blocking session ID, status, and type), and the amount of time waited. - V$SESSION_EVENT
The V$SESSION_EVENT view provides summary of all the events the session has waited for since it started. - V$SESSION_WAIT_CLASS
The V$SESSION_WAIT_CLASS view provides the number of waits and the time spent in each class of wait events for each session. - V$SESSION_WAIT_HISTORY
TheV$SESSION_WAIT_HISTORY view displays information about the last ten wait events for each active session (such as event type and wait time). - V$SYSTEM_EVENT
The V$SYSTEM_EVENT view provides a summary of all the event waits on the instance since it started. - V$EVENT_HISTOGRAM
The V$EVENT_HISTOGRAM view displays a histogram of the number of waits, the maximum wait, and total wait time on an event basis. - V$FILE_HISTOGRAM
The V$FILE_HISTOGRAM view displays a histogram of times waited during single block reads for each file. - V$SYSTEM_WAIT_CLASS
The V$SYSTEM_WAIT_CLASS view provides the instance wide time totals for the number of waits and the time spent in each class of wait events. - V$TEMP_HISTOGRAM
The V$TEMP_HISTOGRAM view displays a histogram of times waited during single block reads for each temporary file.
1.3.4 System Statistics
System statistics are typically used in conjunction with wait event data to find further evidence of the cause of a performance problem.
For example, if V$SYSTEM_EVENT indicates that the largest wait event (in terms of wait time) is the event buffer busy waits, then look at the specific buffer wait statistics available in the view V$WAITSTAT to see which block type has the highest wait count and the highest wait time.
After the block type has been identified, also look at V$SESSION real-time while the problem is occurring or V$ACTIVE_SESSION_HISTORY andDBA_HIST_ACTIVE_SESS_HISTORY views after the problem has been experienced to identify the contended-for objects using the object number indicated. The combination of this data indicates the appropriate corrective action.
Statistics are available in many V$ views. Some common views include the following:
1.3.4.1 V$ACTIVE_SESSION_HISTORY
This view displays active database session activity, sampled once every second.
1.3.4.2 V$SYSSTAT
This contains overall statistics for many different parts of Oracle, including rollback, logical and physical I/O, and parse data. Data from V$SYSSTAT is used to compute ratios, such as the buffer cache hit ratio.
1.3.4.3 V$FILESTAT
This contains detailed file I/O statistics for each file, including the number of I/Os for each file and the average read time.
1.3.4.4 V$ROLLSTAT
This contains detailed rollback and undo segment statistics for each segment.
1.3.4.5 V$ENQUEUE_STAT
This contains detailed enqueue statistics for each enqueue, including the number of times an enqueue was requested and the number of times an enqueue was waited for, and the wait time.
1.3.4.6 V$LATCH
This contains detailed latch usage statistics for each latch, including the number of times each latch was requested and the number of times the latch was waited for.
1.3.5 Segment-Level Statistics
You can gather segment-level statistics to help you spot performance problems associated with individual segments. Collecting and viewing segment-level statistics is a good way to effectively identify hot tables or indexes in an instance.
After viewing wait events and system statistics to identify the performance problem, you can use segment-level statistics to find specific tables or indexes that are causing the problem. Consider, for example, that V$SYSTEM_EVENT indicates that buffer busy waits cause a fair amount of wait time. You can select fromV$SEGMENT_STATISTICS the top segments that cause the buffer busy waits. Then you can focus your effort on eliminating the problem in those segments.
You can query segment-level statistics through the following dynamic performance views:
- V$SEGSTAT_NAME This view lists the segment statistics being collected and the properties of each statistic (for instance, if it is a sampled statistic).
- V$SEGSTAT This is a highly efficient, real-time monitoring view that shows the statistic value, statistic name, and other basic information.
- V$SEGMENT_STATISTICS This is a user-friendly view of statistic values. In addition to all the columns of V$SEGSTAT, it has information about such things as the segment owner and table space name. It makes the statistics easy to understand, but it is more costly.
Often at the end of a tuning exercise, it is possible to identify two or three changes that could potentially alleviate the problem. To identify which change provides the most benefit, it is recommended that only one change be implemented at a time. The effect of the change should be measured against the baseline data measurements found in the problem definition phase.
Typically, most sites with dire performance problems implement a number of overlapping changes at once, and thus cannot identify which changes provided any benefit. Although this is not immediately an issue, this becomes a significant hindrance if similar problems subsequently appear, because it is not possible to know which of the changes provided the most benefit and which efforts to prioritize.
If it is not possible to implement changes separately, then try to measure the effects of dissimilar changes. For example, measure the effect of making an initialization change to optimize redo generation separately from the effect of creating a new index to improve the performance of a modified query. It is impossible to measure the benefit of performing an operating system upgrade if SQL is tuned, the operating system disk layout is changed, and the initialization parameters are also changed at the same time.
Performance tuning is an iterative process. It is unlikely to find a 'silver bullet' that solves an instance-wide performance problem. In most cases, excellent performance requires iteration through the performance tuning phases, because solving one bottleneck often uncovers another (sometimes worse) problem.
Knowing when to stop tuning is also important. The best measure of performance is user perception, rather than how close the statistic is to an ideal value.
2 Interpreting Oracle Statistics
Gather statistics that cover the time when the instance had the performance problem. If you previously captured baseline data for comparison, then you can compare the current data to the data from the baseline that most represents the problem workload.When comparing two reports, ensure that the two reports are from times where the system was running comparable workloads.
2.1 Examine Load
Usually, wait events are the first data examined. However, if you have a baseline report, then check to see if the load has changed. Regardless of whether you have a baseline, it is useful to see whether the resource usage rates are high.Load-related statistics to examine include redo size, session logical reads, db block changes, physical reads, physical read total bytes, physicalwrites, physical write total bytes, parse count (total), parse count (hard), and user calls. This data is queried from V$SYSSTAT. It is best to normalize this data over seconds and over transactions. It is also useful to examine the total I/O load in MB per second by using the sum of physical write total bytes and physical write total bytes. The combined value includes the I/O's used to buffer cache, redo logs, archive logs, by RMAN backup and recovery and any Oracle background process.
In the Automatic Workload Repository report, look at the Load Profile section. The data has been normalized over transactions and over seconds.
2.1.1 Changing Load
The load profile statistics over seconds show the changes in throughput (that is, whether the instance is performing more work each second). The statistics over transactions identify changes in the application characteristics by comparing these to the corresponding statistics from the baseline report.
2.1.2 High Rates of Activity
Examine the statistics normalized over seconds to identify whether the rates of activity are very high. It is difficult to make blanket recommendations on high values, because the thresholds are different on each site and are contingent on the application characteristics, the number and speed of CPUs, the operating system, the I/O system, and the Oracle release.
The following are some generalized examples (acceptable values vary at each site):
- A hard parse rate of more than 100 a second indicates that there is a very high amount of hard parsing on the system. High hard parse rates cause serious performance issues and must be investigated. Usually, a high hard parse rate is accompanied by latch contention on the shared pool and library cache latches.
- Check whether the sum of the wait times for library cache and shared pool latch events (latch: library cache, latch: library cache pin, latch: library cache lock and latch: shared pool) is significant compared to statistic DB time found in V$SYSSTAT. If so, examine the SQL ordered by Parse Calls section of the Automatic Workload Repository report.
- A high soft parse rate could be in the rate of 300 a second or more. Unnecessary soft parses also limit application scalability. Optimally, a SQL statement should be soft parsed once in each session and executed many times.
Whenever an Oracle process waits for something, it records the wait using one of a set of predefined wait events. These wait events are grouped in wait classes. The Idle wait class groups all events that a process waits for when it does not have work to do and is waiting for more work to perform. Non-idle events indicate nonproductive time spent waiting for a resource or action to complete.
Note:
Not all symptoms can be evidenced by wait events.
The most effective way to use wait event data is to order the events by the wait time. This is only possible if TIMED_STATISTICS is set to true. Otherwise, the wait events can only be ranked by the number of times waited, which is often not the ordering that best represents the problem.
To get an indication of where time is spent, follow these steps:
- Examine the data collection for V$SYSTEM_EVENT. The events of interest should be ranked by wait time.
Identify the wait events that have the most significant percentage of wait time. To determine the percentage of wait time, add the total wait time for all wait events, excluding idle events, such as Null event, SQL*Net message from client, SQL*Net message to client, and SQL*Net more data to client. Calculate the relative percentage of the five most prominent events by dividing each event's wait time by the total time waited for all events.
Alternatively, look at the Top 5 Timed Events section at the beginning of the Automatic Workload Repository report. This section automatically orders the wait events (omitting idle events), and calculates the relative percentage:Top 5 Timed Events [[user:DBASOLUTIONS|1325280042]][[user:DBASOLUTIONS|1325280042]][[user:DBASOLUTIONS|1325280042]][[user:DBASOLUTIONS|1325280042]]~~ % Total Event Waits Time (s) Call Time -------------------------------------- ------------ ----------- --------- CPU time 559 88.80 log file parallel write 2,181 28 4.42 SQL*Net more data from client 516,611 27 4.24 db file parallel write 13,383 13 2.04 db file sequential read 563 2 .27
In some situations, there might be a few events with similar percentages. This can provide extra evidence if all the events are related to the same type of resource request (for example, all I/O related events). - Look at the number of waits for these events, and the average wait time. For example, for I/O related events, the average time might help identify whether the I/O system is slow. The following example of this data is taken from the Wait Event section of the Automatic Workload Repository report:
Avg Total Wait wait Waits Event Waits Timeouts Time (s) (ms) /txn --------------------------- --------- --------- ---------- ------ --------- log file parallel write 2,181 0 28 13 41.2 SQL*Net more data from clie 516,611 0 27 0 9,747.4 db file parallel write 13,383 0 13 1 252.5
- The top wait events identify the next places to investigate. A table of common wait events is listed in below Table It is usually a good idea to also have quick look at high-load SQL.
- Examine the related data indicated by the wait events to see what other information this data provides. Determine whether this information is consistent with the wait event data. In most situations, there is enough data to begin developing a theory about the potential causes of the performance bottleneck.
- To determine whether this theory is valid, cross-check data you have already examined with other statistics available for consistency. The appropriate statistics vary depending on the problem, but usually include load profile-related data in V$SYSSTAT, operating system statistics, and so on. Perform cross-checks with other data to confirm or refute the developing theory.
Table 10-1 links wait events to possible causes and gives an overview of the Oracle data that could be most useful to review next.Table 10-1 Wait Events and Potential Causes
Wait Event | General Area | Possible Causes | Look for / Examine |
---|---|---|---|
buffer busy waits | Buffer cache, DBWR | Depends on buffer type. For example, waits for an index block may be caused by a primary key that is based on an ascending sequence. | Examine V$SESSION while the problem is occurring to determine the type of block in contention. |
free buffer waits | Buffer cache, DBWR, I/O | Slow DBWR (possibly due to I/O?) Cache too small | Examine write time using operating system statistics. Check buffer cache statistics for evidence of too small cache. |
db file scattered read | I/O, SQL statement tuning | Poorly tuned SQL Slow I/O system | Investigate V$SQLAREA to see whether there are SQL statements performing many disk reads. Cross-check I/O system andV$FILESTAT for poor read time. |
db file sequential read | I/O, SQL statement tuning | Poorly tuned SQL Slow I/O system | Investigate V$SQLAREA to see whether there are SQL statements performing many disk reads. Cross-check I/O system andV$FILESTAT for poor read time. |
enqueue waits (waits starting with enq:) | Locks | Depends on type of enqueue | Look at V$ENQUEUE_STAT. |
library cache latch waits:library cache, librarycache pin, and librarycache lock | Latch contention | SQL parsing or sharing | Check V$SQLAREA to see whether there are SQL statements with a relatively high number of parse calls or a high number of child cursors (column VERSION_COUNT). Check parse statistics inV$SYSSTAT and their corresponding rate for each second. |
log buffer space | Log buffer, I/O | Log buffer small Slow I/O system | Check the statistic redo buffer allocation retries in V$SYSSTAT. Check configuring log buffer section in configuring memory chapter. Check the disks that house the online redo logs for resource contention. |
log file sync | I/O, over- committing | Slow disks that store the online logs Un-batched commits | Check the disks that house the online redo logs for resource contention. Check the number of transactions (commits +rollbacks) each second, from V$SYSSTAT. |
- http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_database_id=NOT&p_id=34405.1
- http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_database_id=NOT&p_id=62172.1
http://metalink.oracle.com2.4 Additional Statistics
There are a number of statistics that can indicate performance problems that do not have corresponding wait events.
2.4.1 Redo Log Space Requests Statistic
The V$SYSSTAT statistic redo log space requests indicates how many times a server process had to wait for space in the online redo log, not for space in the redo log buffer. A significant value for this statistic and the wait events should be used as an indication that checkpoints, DBWR, or archiver activity should be tuned, not LGWR. Increasing the size of log buffer does not help.
2.4.2 Read Consistency
Your system might spend excessive time rolling back changes to blocks in order to maintain a consistent view. Consider the following scenarios:
- If there are many small transactions and an active long-running query is running in the background on the same table where the changes are happening, then the query might need to roll back those changes often, in order to obtain a read-consistent image of the table. Compare the following V$SYSSTATstatistics to determine whether this is happening:
- consistent changes statistic indicates the number of times a database block has rollback entries applied to perform a consistent read on the block. Workloads that produce a great deal of consistent changes can consume a great deal of resources.
- consistent gets statistic counts the number of logical reads in consistent mode.
- If there are few very, large rollback segments, then your system could be spending a lot of time rolling back the transaction table during delayed block cleanout in order to find out exactly which SCN a transaction was committed. When Oracle commits a transaction, all modified blocks are not necessarily updated with the commit SCN immediately. In this case, it is done later on demand when the block is read or updated. This is called delayed block cleanout.
The ratio of the following V$SYSSTAT statistics should be close to 1:ratio = transaction tables consistent reads - undo records applied / transaction tables consistent read rollbacks
The recommended solution is to use automatic undo management.- Comparing the number of WAITS to the number of GETS in V$ROLLSTAT; the proportion of WAITS to GETS should be small.
- Examining V$WAITSTAT to see whether there are many WAITS for buffers of CLASS 'undo header'.
You can detect migrated or chained rows by checking the number of table fetch continued row statistic in V$SYSSTAT. A small number of chained rows (less than 1%) is unlikely to impact system performance. However, a large percentage of chained rows can affect performance.
Chaining on rows larger than the block size is inevitable. You might want to consider using tablespace with larger block size for such data.
However, for smaller rows, you can avoid chaining by using sensible space parameters and good application design. For example, do not insert a row with key values filled in and nulls in most other columns, then update that row with the real data, causing the row to grow in size. Rather, insert rows filled with data from the start.
If an UPDATE statement increases the amount of data in a row so that the row no longer fits in its data block, then Oracle tries to find another block with enough free space to hold the entire row. If such a block is available, then Oracle moves the entire row to the new block. This is called migrating a row. If the row is too large to fit into any available block, then Oracle splits the row into multiple pieces and stores each piece in a separate block. This is called chaining a row. Rows can also be chained when they are inserted.
Migration and chaining are especially detrimental to performance with the following:
- UPDATE statements that cause migration and chaining to perform poorly
- Queries that select migrated or chained rows because these must perform additional input and output
Increasing PCTFREE can help to avoid migrated rows. If you leave more free space available in the block, then the row has room to grow. You can also reorganize or re-create tables and indexes that have high deletion rates. If tables frequently have rows deleted, then data blocks can have partially free space in them. If rows are inserted and later expanded, then the inserted rows might land in blocks with deleted rows but still not have enough room to expand. Reorganizing the table ensures that the main free space is totally empty blocks.
Note:
PCTUSED is not the opposite of PCTFREE
2.4.4 Parse-Related Statistics
The more your application parses, the more potential for contention exists, and the more time your system spends waiting. If parse time CPU represents a large percentage of the CPU time, then time is being spent parsing instead of executing statements. If this is the case, then it is likely that the application is using literal SQL and so SQL cannot be shared, or the shared pool is poorly configured.
There are a number of statistics available to identify the extent of time spent parsing by Oracle. Query the parse related statistics from V$SYSSTAT. For example:
SELECT NAME, VALUE FROM V$SYSSTAT WHERE NAME IN ( 'parse time cpu', 'parse time elapsed', 'parse count (hard)', 'CPU used by this session' );There are various ratios that can be computed to assist in determining whether parsing may be a problem:
- parse time CPU / parse time elapsed
This ratio indicates how much of the time spent parsing was due to the parse operation itself, rather than waiting for resources, such as latches. A ratio of one is good, indicating that the elapsed time was not spent waiting for highly contended resources. - parse time CPU / CPU used by this session
This ratio indicates how much of the total CPU used by Oracle server processes was spent on parse-related operations. A ratio closer to zero is good, indicating that the majority of CPU is not spent on parsing.
3 Wait Events Statistics
The V$SESSION, V$SESSION_WAIT, V$SESSION_HISTORY, V$SESSION_EVENT, and V$SYSTEM_EVENT views provide information on what resources were waited for, and, if the configuration parameter TIMED_STATISTICS is set to true, how long each resource was waited for.Investigate wait events and related timing data when performing reactive performance tuning. The events with the most time listed against them are often strong indications of the performance bottleneck.
The following views contain related, but different, views of the same data:
- V$SESSION lists session information for each current session. It lists either the event currently being waited for, or the event last waited for on each session. This view also contains information about blocking sessions, the wait state, and the wait time.
- V$SESSION_WAIT is a current state view. It lists either the event currently being waited for, or the event last waited for on each session, the wait state, and the wait time.
- V$SESSION_WAIT_HISTORY lists the last 10 wait events for each current session and the associated wait time.
- V$SESSION_EVENT lists the cumulative history of events waited for on each session. After a session exits, the wait event statistics for that session are removed from this view.
- V$SYSTEM_EVENT lists the events and times waited for by the whole instance (that is, all session wait events data rolled up) since instance startup.
For example, V$SESSION_EVENT can show that session 124 (SID=124) had many waits on the db file scattered read, but it does not show which file and block number. However, V$SESSION_WAIT shows the file number in P1, the block number read in P2, and the number of blocks read in P3 (P1 and P2 let you determine for which segments the wait event is occurring).
This section concentrates on examples using V$SESSION_WAIT. However, Oracle recommends capturing performance data over an interval and keeping this data for performance and capacity analysis. This form of rollup data is queried from the V$SYSTEM_EVENT view by Automatic Workload Repository.
Most commonly encountered events are described in this chapter, listed in case-sensitive alphabetical order. Other event-related data to examine is also included. The case used for each event name is that which appears in the V$SYSTEM_EVENT view.
3.1 buffer busy waits
This wait indicates that there are some buffers in the buffer cache that multiple processes are attempting to access concurrently. Query V$WAITSTAT for the wait statistics for each class of buffer. Common buffer classes that have buffer busy waits include data block, segment header, undo header, and undo block.
Check the following V$SESSION_WAIT parameter columns:
- P1 - File ID
- P2 - Block ID
- P3 - Class ID
To determine the possible causes, first query V$SESSION to identify the value of ROW_WAIT_OBJ# when the session waits for buffer busy waits. For example:
SELECT row_wait_obj# FROM V$SESSION WHERE EVENT = 'buffer busy waits';To identify the object and object type contended for, query DBA_OBJECTS using the value for ROW_WAIT_OBJ# that is returned from V$SESSION. For example:
SELECT owner, object_name, subobject_name, object_type FROM DBA_OBJECTS WHERE data_object_id = &row_wait_obj3.1.2 Actions
The action required depends on the class of block contended for and the actual segment.
3.1.2.1 segment header
If the contention is on the segment header, then this is most likely free list contention.
Automatic segment-space management in locally managed tablespaces eliminates the need to specify the PCTUSED, FREELISTS, and FREELIST GROUPSparameters. If possible, switch from manual space management to automatic segment-space management (ASSM).
The following information is relevant if you are unable to use automatic segment-space management (for example, because the tablespace uses dictionary space management).
A free list is a list of free data blocks that usually includes blocks existing in a number of different extents within the segment. Free lists are composed of blocks in which free space has not yet reached PCTFREE or used space has shrunk below PCTUSED. Specify the number of process free lists with the FREELISTSparameter. The default value of FREELISTS is one. The maximum value depends on the data block size.
To find the current setting for free lists for that segment, run the following:
SELECT SEGMENT_NAME, FREELISTS FROM DBA_SEGMENTS WHERE SEGMENT_NAME = segment name AND SEGMENT_TYPE = segment type;Set free lists, or increase the number of free lists. If adding more free lists does not alleviate the problem, then use free list groups (even in single instance this can make a difference). If using Oracle Real Application Clusters, then ensure that each instance has its own free list group(s).
3.1.2.2 data block
If the contention is on tables or indexes (not the segment header):
- Check for right-hand indexes. These are indexes that are inserted into at the same point by many processes. For example, those that use sequence number generators for the key values.
- Consider using automatic segment-space management (ASSM), global hash partitioned indexes, or increasing free lists to avoid multiple processes attempting to insert into the same block.
For contention on rollback segment header:
- If you are not using automatic undo management, then add more rollback segments.
For contention on rollback segment block:
- If you are not using automatic undo management, then consider making rollback segment sizes larger.
This event signifies that the user process is reading buffers into the SGA buffer cache and is waiting for a physical I/O call to return. A db file scattered readissues a scattered read to read the data into multiple discontinuous memory locations. A scattered read is usually a multiblock read. It can occur for a fast full scan (of an index) in addition to a full table scan.
The db file scattered read wait event identifies that a full scan is occurring. When performing a full scan into the buffer cache, the blocks read are read into memory locations that are not physically adjacent to each other. Such reads are called scattered read calls, because the blocks are scattered throughout memory. This is why the corresponding wait event is called 'db file scattered read'. multiblock (up to DB_FILE_MULTIBLOCK_READ_COUNT blocks) reads due to full scans into the buffer cache show up as waits for 'db file scattered read'.
Check the following V$SESSION_WAIT parameter columns:
- P1 - The absolute file number
- P2 - The block being read
- P3 - The number of blocks (should be greater than 1)
Other things that could indicate excessive I/O load on the system include the following:
- Poor buffer cache hit ratio
- These wait events accruing most of the wait time for a user experiencing poor response time
There are several ways to handle excessive I/O waits. In the order of effectiveness, these are as follows:
- Reduce the I/O activity by SQL tuning
- Reduce the need to do I/O by managing the workload
- Gather system statistics with DBMS_STATS package, allowing the query optimizer to accurately cost possible access paths that use full scans
- Use Automatic Storage Management
- Add more disks to reduce the number of I/Os for each disk
- Alleviate I/O hot spots by redistributing I/O across existing disks
- Improperly optimized SQL
- Missing indexes
- High degree of parallelism for the table (skewing the optimizer toward scans)
- Lack of accurate statistics for the optimizer
- Setting the value for DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter too high which favors full scans
Besides reducing I/O, also examine the I/O distribution of files across the disks. Is I/O distributed uniformly across the disks, or are there hot spots on some disks? Are the number of disks sufficient to meet the I/O needs of the database?
See the total I/O operations (reads and writes) by the database, and compare those with the number of disks used. Remember to include the I/O activity of LGWR and ARCH processes.
3.2.4 Finding the SQL Statement executed by Sessions Waiting for I/O
Use the following query to determine, at a point in time, which sessions are waiting for I/O:
SELECT SQL_ADDRESS, SQL_HASH_VALUE FROM V$SESSION WHERE EVENT LIKE 'db file%read';3.2.5 Finding the Object Requiring I/O
To determine the possible causes, first query V$SESSION to identify the value of ROW_WAIT_OBJ# when the session waits for db file scattered read. For example:
SELECT row_wait_obj# FROM V$SESSION WHERE EVENT = 'db file scattered read';To identify the object and object type contended for, query DBA_OBJECTS using the value for ROW_WAIT_OBJ# that is returned from V$SESSION. For example:
SELECT owner, object_name, subobject_name, object_type FROM DBA_OBJECTS WHERE data_object_id = &row_wait_obj;3.3db file sequential readThis event signifies that the user process is reading a buffer into the SGA buffer cache and is waiting for a physical I/O call to return. A sequential read is a single-block read.
Single block I/Os are usually the result of using indexes. Rarely, full table scan calls could get truncated to a single block call due to extent boundaries, or buffers already present in the buffer cache. These waits would also show up as 'db file sequential read'.
Check the following V$SESSION_WAIT parameter columns:
- P1 - The absolute file number
- P2 - The block being read
- P3 - The number of blocks (should be 1)
On a healthy system, physical read waits should be the biggest waits after the idle waits. However, also consider whether there are db file sequential readson a large data warehouse that should be seeing mostly full table scans with parallel query.
Figure 10-1 depicts the differences between the following wait events:
- db file sequential read (single block read into one SGA buffer)
- db file scattered read (multiblock read into many discontinuous SGA buffers)
- direct read (single or multiblock read into the PGA, bypassing the SGA)
Description of Figure 10-1 follows |
Description of "Figure 1 Scattered Read, Sequential Read, and Direct Path Read"
3.4 direct path read and direct path read temp
When a session is reading buffers from disk directly into the PGA (opposed to the buffer cache in SGA), it waits on this event. If the I/O subsystem does not support asynchronous I/Os, then each wait corresponds to a physical read request.
If the I/O subsystem supports asynchronous I/O, then the process is able to overlap issuing read requests with processing the blocks already existing in the PGA. When the process attempts to access a block in the PGA that has not yet been read from disk, it then issues a wait call and updates the statistics for this event. Hence, the number of waits is not necessarily the same as the number of read requests (unlike db file scattered read and db file sequential read).
Check the following V$SESSION_WAIT parameter columns:
- P1 - File_id for the read call
- P2 - Start block_id for the read call
- P3 - Number of blocks in the read call
This happens in the following situations:
- The sorts are too large to fit in memory and some of the sort data is written out directly to disk. This data is later read back in, using direct reads.
- Parallel slaves are used for scanning data.
- The server process is processing buffers faster than the I/O system can return the buffers. This can indicate an overloaded I/O system.
The file_id shows if the reads are for an object in TEMP tablespace (sorts to disk) or full table scans by parallel slaves. This is the biggest wait for large data warehouse sites. However, if the workload is not a DSS workload, then examine why this is happening.
3.4.2.1 Sorts to Disk
Examine the SQL statement currently being run by the session experiencing waits to see what is causing the sorts. Query V$TEMPSEG_USAGE to find the SQL statement that is generating the sort. Also query the statistics from V$SESSTAT for the session to determine the size of the sort. See if it is possible to reduce the sorting by tuning the SQL statement. If WORKAREA_SIZE_POLICY is MANUAL, then consider increasing the SORT_AREA_SIZE for the system (if the sorts are not too big) or for individual processes. If WORKAREA_SIZE_POLICY is AUTO, then investigate whether to increase PGA_AGGREGATE_TARGET.
3.4.2.2 Full Table Scans
If tables are defined with a high degree of parallelism, then this could skew the optimizer to use full table scans with parallel slaves. Check the object being read into using the direct path reads. If the full table scans are a valid part of the workload, then ensure that the I/O subsystem is configured adequately for the degree of parallelism. Consider using disk striping if you are not already using it or Automatic Storage Management (ASM).
3.4.2.3 Hash Area Size
For query plans that call for a hash join, excessive I/O could result from having HASH_AREA_SIZE too small. If WORKAREA_SIZE_POLICY is MANUAL, then consider increasing the HASH_AREA_SIZE for the system or for individual processes. If WORKAREA_SIZE_POLICY is AUTO, then investigate whether to increasePGA_AGGREGATE_TARGET.
3.5 direct path write and direct path write temp
When a process is writing buffers directly from PGA (as opposed to the DBWR writing them from the buffer cache), the process waits on this event for the write call to complete. Operations that could perform direct path writes include when a sort goes to disk, during parallel DML operations, direct-path INSERTs, parallel create table as select, and some LOB operations.
Like direct path reads, the number of waits is not the same as number of write calls issued if the I/O subsystem supports asynchronous writes. The session waits if it has processed all buffers in the PGA and is unable to continue work until an I/O request completes.
Check the following V$SESSION_WAIT parameter columns:
- P1 - File_id for the write call
- P2 - Start block_id for the write call
- P3 - Number of blocks in the write call
This happens in the following situations:
- Sorts are too large to fit in memory and are written to disk
- Parallel DML are issued to create/populate objects
- Direct path loads
For parallel DML, check the I/O distribution across disks and make sure that the I/O subsystem is adequately configured for the degree of parallelism.
3.6 enqueue (enq:) waits
Enqueues are locks that coordinate access to database resources. This event indicates that the session is waiting for a lock that is held by another session.
The name of the enqueue is included as part of the wait event name, in the form enq: enqueue_type - related_details. In some cases, the same enqueue type can be held for different purposes, such as the following related TX types:
- enq: TX - allocate ITL entry
- enq: TX - contention
- enq: TX - index contention
- enq: TX - row lock contention
You can check the following V$SESSION_WAIT parameter columns for additional information:
- P1 - Lock TYPE (or name) and MODE
- P2 - Resource identifier ID1 for the lock
- P3 - Resource identifier ID2 for the lock
Query V$LOCK to find the sessions holding the lock. For every session waiting for the event enqueue, there is a row in V$LOCK with REQUEST <> 0. Use one of the following two queries to find the sessions holding the locks and waiting for the locks.
If there are enqueue waits, you can see these using the following statement:
SELECT * FROM V$LOCK WHERE request > 0;To show only holders and waiters for locks being waited on, use the following:
SELECT DECODE(request,0,'Holder: ','Waiter: ') || sid sess, id1, id2, lmode, request, type FROM V$LOCK WHERE (id1, id2, type) IN (SELECT id1, id2, type FROM V$LOCK WHERE request > 0) ORDER BY id1, request;3.6.2 Actions
The appropriate action depends on the type of enqueue.
3.6.2.1 ST enqueue
If the contended-for enqueue is the ST enqueue, then the problem is most likely to be dynamic space allocation. Oracle dynamically allocates an extent to a segment when there is no more free space available in the segment. This enqueue is only used for dictionary managed tablespaces.
To solve contention on this resource:
- Check to see whether the temporary (that is, sort) tablespace uses TEMPFILES. If not, then switch to using TEMPFILES.
- Switch to using locally managed tablespaces if the tablespace that contains segments that are growing dynamically is dictionary managed.
- If it is not possible to switch to locally managed tablespaces, then ST enqueue resource usage can be decreased by changing the next extent sizes of the growing objects to be large enough to avoid constant space allocation. To determine which segments are growing constantly, monitor the EXTENTS column of the DBA_SEGMENTS view for all SEGMENT_NAMEs.
- Preallocate space in the segment, for example, by allocating extents using the ALTER TABLE ALLOCATE EXTENT SQL statement.
The HW enqueue is used to serialize the allocation of space beyond the high water mark of a segment.
- V$SESSION_WAIT.P2 / V$LOCK.ID1 is the tablespace number.
- V$SESSION_WAIT.P3 / V$LOCK.ID2 is the relative dba of segment header of the object for which space is being allocated.
3.6.2.3 TM enqueue
The most common reason for waits on TM locks tend to involve foreign key constraints where the constrained columns are not indexed. Index the foreign key columns to avoid this problem.
3.6.2.4 TX enqueue
These are acquired exclusive when a transaction initiates its first change and held until the transaction does a COMMIT or ROLLBACK.
- Waits for TX in mode 6: occurs when a session is waiting for a row level lock that is already held by another session. This occurs when one user is updating or deleting a row, which another session wants to update or delete. This type of TX enqueue wait corresponds to the wait event enq: TX - rowlock contention.
The solution is to have the first session already holding the lock perform a COMMIT or ROLLBACK. - Waits for TX in mode 4 can occur if the session is waiting for an ITL (interested transaction list) slot in a block. This happens when the session wants to lock a row in the block but one or more other sessions have rows locked in the same block, and there is no free ITL slot in the block. Usually, Oracle dynamically adds another ITL slot. This may not be possible if there is insufficient free space in the block to add an ITL. If so, the session waits for a slot with a TX enqueue in mode 4. This type of TX enqueue wait corresponds to the wait event enq: TX - allocate ITL entry.
The solution is to increase the number of ITLs available, either by changing the INITRANS or MAXTRANS for the table (either by using an ALTER statement, or by re-creating the table with the higher values). - Waits for TX in mode 4 can also occur if a session is waiting due to potential duplicates in UNIQUE index. If two sessions try to insert the same key value the second session has to wait to see if an ORA-0001should be raised or not. This type of TX enqueue wait corresponds to the wait event enq: TX - rowlock contention.
The solution is to have the first session already holding the lock perform a COMMIT or ROLLBACK. - Waits for TX in mode 4 is also possible if the session is waiting due to shared bitmap index fragment. Bitmap indexes index key values and a range of ROWIDs. Each 'entry' in a bitmap index can cover many rows in the actual table. If two sessions want to update rows covered by the same bitmap index fragment, then the second session waits for the first transaction to either COMMIT or ROLLBACK by waiting for the TX lock in mode 4. This type of TX enqueue wait corresponds to the wait event enq: TX - row lock contention.
- Waits for TX in Mode 4 can also occur waiting for a PREPARED transaction.
- Waits for TX in mode 4 also occur when a transaction inserting a row in an index has to wait for the end of an index block split being done by another transaction. This type of TX enqueue wait corresponds to the wait event enq: TX - index contention.
This event belong to Other wait class and typically should not occur on a system. This event is an aggregate of all other events in the Other wait class, such aslatch free, and is used in the V$SESSION_EVENT and V$SERVICE_EVENT views only. In these views, the events in the Other wait class will not be maintained individually in every session. Instead, these events will be rolled up into this single event to reduce the memory used for maintaining statistics on events in the Other wait class.
10.3.8 free buffer waits
This wait event indicates that a server process was unable to find a free buffer and has posted the database writer to make free buffers by writing out dirty buffers. A dirty buffer is a buffer whose contents have been modified. Dirty buffers are freed for reuse when DBWR has written the blocks to disk.
10.3.8.1 Causes
DBWR may not be keeping up with writing dirty buffers in the following situations:
- The I/O system is slow.
- There are resources it is waiting for, such as latches.
- The buffer cache is so small that DBWR spends most of its time cleaning out buffers for server processes.
- The buffer cache is so big that one DBWR process is not enough to free enough buffers in the cache to satisfy requests.
If this event occurs frequently, then examine the session waits for DBWR to see whether there is anything delaying DBWR.
10.3.8.2.1 Writes
If it is waiting for writes, then determine what is delaying the writes and fix it. Check the following:
- Examine V$FILESTAT to see where most of the writes are happening.
- Examine the host operating system statistics for the I/O system. Are the write times acceptable?
- Consider using faster I/O alternatives to speed up write times.
- Spread the I/O activity across large number of spindles (disks) and controllers.
It is possible DBWR is very active because the cache is too small. Investigate whether this is a probable cause by looking to see if the buffer cache hit ratio is low. Also use the V$DB_CACHE_ADVICE view to determine whether a larger cache size would be advantageous.
3.8.2.3 Cache Is Too Big for One DBWR
If the cache size is adequate and the I/O is already evenly spread, then you can potentially modify the behavior of DBWR by using asynchronous I/O or by using multiple database writers.
3.8.3 Consider Multiple Database Writer (DBWR) Processes or I/O Slaves
Configuring multiple database writer processes, or using I/O slaves, is useful when the transaction rates are high or when the buffer cache size is so large that a single DBWn process cannot keep up with the load.
3.8.3.1 DB_WRITER_PROCESSES
The DB_WRITER_PROCESSES initialization parameter lets you configure multiple database writer processes (from DBW0 to DBW9 and from DBWa to DBWj). Configuring multiple DBWR processes distributes the work required to identify buffers to be written, and it also distributes the I/O load over these processes. Multiple db writer processes are highly recommended for systems with multiple CPUs (at least one db writer for every 8 CPUs) or multiple processor groups (at least as many db writers as processor groups).
Based upon the number of CPUs and the number of processor groups, Oracle either selects an appropriate default setting for DB_WRITER_PROCESSES or adjusts a user-specified setting.
3.8.3.2 DBWR_IO_SLAVES
If it is not practical to use multiple DBWR processes, then Oracle provides a facility whereby the I/O load can be distributed over multiple slave processes. The DBWR process is the only process that scans the buffer cache LRU list for blocks to be written out. However, the I/O for those blocks is performed by the I/O slaves. The number of I/O slaves is determined by the parameter DBWR_IO_SLAVES.
DBWR_IO_SLAVES is intended for scenarios where you cannot use multiple DB_WRITER_PROCESSES (for example, where you have a single CPU). I/O slaves are also useful when asynchronous I/O is not available, because the multiple I/O slaves simulate nonblocking, asynchronous requests by freeing DBWR to continue identifying blocks in the cache to be written. Asynchronous I/O at the operating system level, if you have it, is generally preferred.
DBWR I/O slaves are allocated immediately following database open when the first I/O request is made. The DBWR continues to perform all of the DBWR-related work, apart from performing I/O. I/O slaves simply perform the I/O on behalf of DBWR. The writing of the batch is parallelized between the I/O slaves.
Note:
Implementing DBWR_IO_SLAVES requires that extra shared memory be allocated for I/O buffers and request queues. Multiple DBWR processes cannot be used with I/O slaves. Configuring I/O slaves forces only one DBWR process to start.
3.8.3.3 Choosing Between Multiple DBWR Processes and I/O Slaves
Configuring multiple DBWR processes benefits performance when a single DBWR process is unable to keep up with the required workload. However, before configuring multiple DBWR processes, check whether asynchronous I/O is available and configured on the system. If the system supports asynchronous I/O but it is not currently used, then enable asynchronous I/O to see if this alleviates the problem. If the system does not support asynchronous I/O, or if asynchronous I/O is already configured and there is still a DBWR bottleneck, then configure multiple DBWR processes.
Note:
If asynchronous I/O is not available on your platform, then asynchronous I/O can be disabled by setting the DISK_ASYNCH_IOinitialization parameter to FALSE.
Using multiple DBWRs parallelizes the gathering and writing of buffers. Therefore, multiple DBWn processes should deliver more throughput than one DBWR process with the same number of I/O slaves. For this reason, the use of I/O slaves has been deprecated in favor of multiple DBWR processes. I/O slaves should only be used if multiple DBWR processes cannot be configured.
3.9 Idle Wait Events
These events belong to Idle wait class and indicate that the server process is waiting because it has no work. This usually implies that if there is a bottleneck, then the bottleneck is not for database resources. The majority of the idle events should be ignored when tuning, because they do not indicate the nature of the performance bottleneck. Some idle events can be useful in indicating what the bottleneck is not. An example of this type of event is the most commonly encountered idle wait-event SQL Net message from client. This and other idle events (and their categories) are listed in Table 10-2.
Table 10-2 Idle Wait Events
Wait Name | Background Process Idle Event | User Process Idle Event | Parallel Query Idle Event | Shared Server Idle Event | Oracle Real Application Clusters Idle Event |
---|---|---|---|---|---|
dispatcher timer | . | . | . | X | . |
pipe get | . | X | . | . | . |
pmon timer | X | . | . | . | . |
PX Idle Wait | . | . | X | . | . |
PX Deq Credit: need buffer | . | . | X | . | . |
rdbms ipc message | X | . | . | . | . |
smon timer | X | . | . | . | . |
SQL*Net message from client | . | X | . | . | . |
virtual circuit status | . | . | . | X | . |
3.10 latch events
A latch is a low-level internal lock used by Oracle Database to protect memory structures. The latch free event is updated when a server process attempts to get a latch, and the latch is unavailable on the first attempt.
There is a dedicated latch-related wait event for the more popular latches that often generate significant contention. For those events, the name of the latch appears in the name of the wait event, such as latch: library cache or latch: cache buffers chains. This enables you to quickly figure out if a particular type of latch is responsible for most of the latch-related contention. Waits for all other latches are grouped in the generic latch free wait event.
3.10.1 Actions
This event should only be a concern if latch waits are a significant portion of the wait time on the system as a whole, or for individual users experiencing problems.
- Examine the resource usage for related resources. For example, if the library cache latch is heavily contended for, then examine the hard and soft parse rates.
- Examine the SQL statements for the sessions experiencing latch contention to see if there is any commonality.
- P1 - Address of the latch
- P2 - Latch number
- P3 - Number of times process has already slept, waiting for the latch
SELECT EVENT, SUM(P3) SLEEPS, SUM(SECONDS_IN_WAIT) SECONDS_IN_WAIT FROM V$SESSION_WAIT WHERE EVENT LIKE 'latch%' GROUP BY EVENT;A problem with the previous query is that it tells more about session tuning or instant instance tuning than instance or long-duration instance tuning.
The following query provides more information about long duration instance tuning, showing whether the latch waits are significant in the overall database time.
SELECT EVENT, TIME_WAITED_MICRO, ROUND(TIME_WAITED_MICRO*100/S.DBTIME,1) PCT_DB_TIME FROM V$SYSTEM_EVENT, (SELECT VALUE DBTIME FROM V$SYS_TIME_MODEL WHERE STAT_NAME = 'DB time') S WHERE EVENT LIKE 'latch%' ORDER BY PCT_DB_TIME ASC;A more general query that is not specific to latch waits is the following:
SELECT EVENT, WAIT_CLASS, TIME_WAITED_MICRO,ROUND(TIME_WAITED_MICRO*100/S.DBTIME,1) PCT_DB_TIME FROM V$SYSTEM_EVENT E, V$EVENT_NAME N, (SELECT VALUE DBTIME FROM V$SYS_TIME_MODEL WHERE STAT_NAME = 'DB time') S WHERE E.EVENT_ID = N.EVENT_ID AND N.WAIT_CLASS NOT IN ('Idle', 'System I/O') ORDER BY PCT_DB_TIME ASC;Table 10-3 Latch Wait Event
Latch | SGA Area | Possible Causes | Look For: |
---|---|---|---|
Shared pool, library cache | Shared pool | Lack of statement reuse Statements not using bind variables Insufficient size of application cursor cache Cursors closed explicitly after each execution Frequent logon/logoffs Underlying object structure being modified (for example truncate) Shared pool too small | Sessions (in V$SESSTAT) with high:
|
cache buffers lru chain | Buffer cache LRU lists | Excessive buffer cache throughput. For example, inefficient SQL that accesses incorrect indexes iteratively (large index range scans) or many full table scans DBWR not keeping up with the dirty workload; hence, foreground process spends longer holding the latch looking for a free buffer Cache may be too small | Statements with very high logical I/O or physical I/O, using unselective indexes |
cache buffers chains | Buffer cache buffers | Repeated access to a block (or small number of blocks), known as a hot block | Sequence number generation code that updates a row in a table to generate the number, rather than using a sequence number generator Index leaf chasing from very many processes scanning the same unselective index with very similar predicate Identify the segment the hot block belongs to |
row cache objects |
A main cause of shared pool or library cache latch contention is parsing. There are a number of techniques that can be used to identify unnecessary parsing and a number of types of unnecessary parsing:
3.10.3.1 Unshared SQL
This method identifies similar SQL statements that could be shared if literals were replaced with bind variables. The idea is to either:
- Manually inspect SQL statements that have only one execution to see whether they are similar:
SELECT SQL_TEXT FROM V$SQLSTATS WHERE EXECUTIONS < 4 ORDER BY SQL_TEXT;
- Or, automate this process by grouping together what may be similar statements. Do this by estimating the number of bytes of a SQL statement which will likely be the same, and group the SQL statements by that many bytes. For example, the following example groups together statements that differ only after the first 60 bytes.
SELECT SUBSTR(SQL_TEXT, 1, 60), COUNT(*) FROM V$SQLSTATS WHERE EXECUTIONS < 4 GROUP BY SUBSTR(SQL_TEXT, 1, 60) HAVING COUNT(*) > 1;
- Or report distinct SQL statements that have the same execution plan. The following query selects distinct SQL statements that share the same execution plan at least four times. These SQL statements are likely to be using literals instead of bind variables.
SELECT SQL_TEXT FROM V$SQLSTATS WHERE PLAN_HASH_VALUE IN (SELECT PLAN_HASH_VALUE FROM V$SQLSTATS GROUP BY PLAN_HASH_VALUE HAVING COUNT(*) > 4) ORDER BY PLAN_HASH_VALUE;
Check the V$SQLSTATS view. Enter the following query:
SELECT SQL_TEXT, PARSE_CALLS, EXECUTIONS FROM V$SQLSTATS ORDER BY PARSE_CALLS;When the PARSE_CALLS value is close to the EXECUTIONS value for a given statement, you might be continually reparsing that statement. Tune the statements with the higher numbers of parse calls.
3.10.3.3 By Session
Identify unnecessary parse calls by identifying the session in which they occur. It might be that particular batch programs or certain types of applications do most of the reparsing. To do this, run the following query:
SELECT pa.SID, pa.VALUE "Hard Parses", ex.VALUE "Execute Count" FROM V$SESSTAT pa, V$SESSTAT ex WHERE pa.SID = ex.SID AND pa.STATISTIC#=(SELECT STATISTIC# FROM V$STATNAME WHERE NAME = 'parse count (hard)') AND ex.STATISTIC#=(SELECT STATISTIC# FROM V$STATNAME WHERE NAME = 'execute count') AND pa.VALUE > 0;The result is a list of all sessions and the amount of reparsing they do. For each session identifier (SID), go to V$SESSION to find the name of the program that causes the reparsing.
Note:
Because this query counts all parse calls since instance startup, it is best to look for sessions with high rates of parse. For example, a connection which has been up for 50 days might show a high parse figure, but a second connection might have been up for 10 minutes and be parsing at a much faster rate.
The output is similar to the following:
SID Hard Parses Execute Count ------ ----------- ------------- 7 1 20 8 3 12690 6 26 325 11 84 16193.10.3.4 cache buffers lru chain
The cache buffers lru chain latches protect the lists of buffers in the cache. When adding, moving, or removing a buffer from a list, a latch must be obtained.
For symmetric multiprocessor (SMP) systems, Oracle automatically sets the number of LRU latches to a value equal to one half the number of CPUs on the system. For non-SMP systems, one LRU latch is sufficient.
Contention for the LRU latch can impede performance on SMP machines with a large number of CPUs. LRU latch contention is detected by querying V$LATCH,V$SESSION_EVENT, and V$SYSTEM_EVENT. To avoid contention, consider tuning the application, bypassing the buffer cache for DSS jobs, or redesigning the application.
3.10.3.5 cache buffers chains
The cache buffers chains latches are used to protect a buffer list in the buffer cache. These latches are used when searching for, adding, or removing a buffer from the buffer cache. Contention on this latch usually means that there is a block that is greatly contended for (known as a hot block).
To identify the heavily accessed buffer chain, and hence the contended for block, look at latch statistics for the cache buffers chains latches using the viewV$LATCH_CHILDREN. If there is a specific cache buffers chains child latch that has many more GETS, MISSES, and SLEEPS when compared with the other child latches, then this is the contended for child latch.
This latch has a memory address, identified by the ADDR column. Use the value in the ADDR column joined with the X$BH table to identify the blocks protected by this latch. For example, given the address (V$LATCH_CHILDREN.ADDR) of a heavily contended latch, this queries the file and block numbers:
SELECT OBJ data_object_id, FILE#, DBABLK,CLASS, STATE, TCH FROM X$BH WHERE HLADDR = 'address of latch' ORDER BY TCH;X$BH.TCH is a touch count for the buffer. A high value for X$BH.TCH indicates a hot block.
Many blocks are protected by each latch. One of these buffers will probably be the hot block. Any block with a high TCH value is a potential hot block. Perform this query a number of times, and identify the block that consistently appears in the output. After you have identified the hot block, query DBA_EXTENTS using the file number and block number, to identify the segment.
After you have identified the hot block, you can identify the segment it belongs to with the following query:
SELECT OBJECT_NAME, SUBOBJECT_NAME FROM DBA_OBJECTS WHERE DATA_OBJECT_ID = &obj;In the query, &obj is the value of the OBJ column in the previous query on X$BH.
3.10.3.6 row cache objects
The row cache objects latches protect the data dictionary.
3.11 log file parallel write
This event involves writing redo records to the redo log files from the log buffer.
3.12 library cache pin
This event manages library cache concurrency. Pinning an object causes the heaps to be loaded into memory. If a client wants to modify or examine the object, the client must acquire a pin after the lock.
3.13 library cache lock
This event controls the concurrency between clients of the library cache. It acquires a lock on the object handle so that either:
- One client can prevent other clients from accessing the same object
- The client can maintain a dependency for a long time which does not allow another client to change the object
3.14 log buffer space
This event occurs when server processes are waiting for free space in the log buffer, because all the redo is generated faster than LGWR can write it out.
Actions
Modify the redo log buffer size. If the size of the log buffer is already reasonable, then ensure that the disks on which the online redo logs reside do not suffer from I/O contention. The log buffer space wait event could be indicative of either disk I/O contention on the disks where the redo logs reside, or of a too-small log buffer. Check the I/O profile of the disks containing the redo logs to investigate whether the I/O system is the bottleneck. If the I/O system is not a problem, then the redo log buffer could be too small. Increase the size of the redo log buffer until this event is no longer significant.
3.15 log file switch
There are two wait events commonly encountered:
- log file switch (archiving needed)
- log file switch (checkpoint incomplete)
3.15.1 Actions
For the log file switch (archiving needed) event, examine why the archiver is unable to archive the logs in a timely fashion. It could be due to the following:
- Archive destination is running out of free space.
- Archiver is not able to read redo logs fast enough (contention with the LGWR).
- Archiver is not able to write fast enough (contention on the archive destination, or not enough ARCH processes). If you have ruled out other possibilities (such as slow disks or a full archive destination) consider increasing the number of ARCn processes. The default is 2.
- If you have mandatory remote shipped archive logs, check whether this process is slowing down because of network delays or the write is not completing because of errors.
- Check if DBWR is slow, possibly due to an overloaded or slow I/O system. Check the DBWR write times, check the I/O system, and distribute I/O if necessary.
- Check if there are too few, or too small redo logs. If you have a few redo logs or small redo logs (for example two x 100k logs), and your system produces enough redo to cycle through all of the logs before DBWR has been able to complete the checkpoint, then increase the size or number of redo logs.
When a user session commits (or rolls back), the session's redo information must be flushed to the redo logfile by LGWR. The server process performing theCOMMIT or ROLLBACK waits under this event for the write to the redo log to complete.
Actions
If this event's waits constitute a significant wait on the system or a significant amount of time waited by a user experiencing response time issues or on a system, then examine the average time waited.
If the average time waited is low, but the number of waits are high, then the application might be committing after every INSERT, rather than batching COMMITs. Applications can reduce the wait by committing after 50 rows, rather than every row.
If the average time waited is high, then examine the session waits for the log writer and see what it is spending most of its time doing and waiting for. If the waits are because of slow I/O, then try the following:
- Reduce other I/O activity on the disks containing the redo logs, or use dedicated disks.
- Alternate redo logs on different disks to minimize the effect of the archiver on the log writer.
- Move the redo logs to faster disks or a faster I/O subsystem (for example, switch from RAID 5 to RAID 1).
- Consider using raw devices (or simulated raw devices provided by disk vendors) to speed up the writes.
- Depending on the type of application, it might be possible to batch COMMITs by committing every N rows, rather than every row, so that fewer log file syncs are needed.
This event is used to wait for a reply from one of the background processes.
3.18 SQL*Net Events
The following events signify that the database process is waiting for acknowledgment from a database link or a client process:
- SQL*Net break/reset to client
- SQL*Net break/reset to dblink
- SQL*Net message from client
- SQL*Net message from dblink
- SQL*Net message to client
- SQL*Net message to dblink
- SQL*Net more data from client
- SQL*Net more data from dblink
- SQL*Net more data to client
- SQL*Net more data to dblink
Events that are client-related should be diagnosed as described for the event SQL*Net message from client. Events that are dblink-related should be diagnosed as described for the event SQL*Net message from dblink.
3.18.1 SQL*Net message from client
Although this is an idle event, it is important to explain when this event can be used to diagnose what is not the problem. This event indicates that a server process is waiting for work from the client process. However, there are several situations where this event could accrue most of the wait time for a user experiencing poor response time. The cause could be either a network bottleneck or a resource bottleneck on the client process.
3.18.1.1 Network Bottleneck
A network bottleneck can occur if the application causes a lot of traffic between server and client and the network latency (time for a round-trip) is high. Symptoms include the following:
- Large number of waits for this event
- Both the database and client process are idle (waiting for network traffic) most of the time
- Tune the application to reduce round trips.
- Explore options to reduce latency (for example, terrestrial lines opposed to VSAT links).
- Change system configuration to move higher traffic components to lower latency links.
If the client process is using most of the resources, then there is nothing that can be done in the database. Symptoms include the following:
- Number of waits might not be large, but the time waited might be significant
- Client process has a high resource usage
3.18.2 SQL*Net message from dblink
This event signifies that the session has sent a message to the remote node and is waiting for a response from the database link. This time could go up because of the following:
- Network bottleneck
- Time taken to execute the SQL on the remote node
It is useful to see the SQL being run on the remote node. Login to the remote database, find the session created by the database link, and examine the SQL statement being run by it. - Number of round trip messages
Each message between the session and the remote node adds latency time and processing overhead. To reduce the number of messages exchanged, use array fetches and array inserts.
The server process is sending more data or messages to the client. The previous operation to the client was also a send.
4 Real-Time SQL Monitoring
The real-time SQL monitoring feature of Oracle Database enables you to monitor the performance of SQL statements while they are executing. By default, SQL monitoring is automatically started when a SQL statement runs parallel, or when it has consumed at least 5 seconds of CPU or I/O time in a single execution.You can monitor the statistics for SQL statement execution using the V$SQL_MONITOR and V$SQL_PLAN_MONITOR views. These views can be used in conjunction with the following views to get additional information about the execution being monitored:
- V$ACTIVE_SESSION_HISTORY
- V$SESSION
- V$SESSION_LONGOPS
- V$SQL
- V$SQL_PLAN
The V$SQL_MONITOR view contains a subset of the statistics available in V$SQL. However, unlike V$SQL, monitoring statistics are not cumulative over several executions. Instead, one entry in V$SQL_MONITOR is dedicated to a single execution of a SQL statement. If two executions of the same SQL statement are being monitored, each of these executions will have a separate entry in V$SQL_MONITOR.
To uniquely identify two executions of the same SQL statement, a composite key called an execution key is generated. This execution key is composed of three attributes, each corresponding to a column in V$SQL_MONITOR:
- SQL identifier to identify the SQL statement (SQL_ID)
- Start execution timestamp (SQL_EXEC_START)
- An internally generated identifier to ensure that this primary key is truly unique (SQL_EXEC_ID)
- SQL Plan Monitoring
- Parallel Execution Monitoring
- Generating the SQL Monitor Report
- Enabling and Disabling SQL Monitoring
Real-time SQL monitoring also includes monitoring statistics for each operation in the execution plan of the SQL statement being monitored. This data is visible in the V$SQL_PLAN_MONITOR view. Similar to the V$SQL_MONITOR view, statistics in V$SQL_PLAN_MONITOR are updated every second as the SQL statement is being executed. These statistics persist after the execution ends, with the same duration as V$SQL_MONITOR. There will be multiple entries in V$SQL_PLAN_MONITOR for every SQL statement being monitored; each entry will correspond to an operation in the execution plan of the statement.
4.2 Parallel Execution Monitoring
Parallel queries, DML and DDL statements are automatically monitored as soon as execution begins. Monitoring information for each process participating in the parallel execution is recorded as separate entries in the V$SQL_MONITOR and V$SQL_PLAN_MONITOR views. As a result, the V$SQL_MONITOR view will have one entry for the parallel execution coordinator process, and one entry for each parallel execution server process. Each of these entries will have corresponding entries in the V$SQL_PLAN_MONITOR view. Since the processes allocated for the parallel execution of a SQL statement are cooperating for the same execution, these entries share the same execution key (the composite SQL_ID, SQL_EXEC_START and SQL_EXEC_ID). You can therefore aggregate the execution key to determine the overall statistics for a parallel execution.
4.3 Generating the SQL Monitor Report
You can use the SQL monitor report to view SQL monitoring data. The SQL monitor report uses data from several views, including:
- GV$SQL_MONITOR
- GV$SQL_PLAN_MONITOR
- GV$SQL
- GV$SQL_PLAN
- GV$ACTIVE_SESSION_HISTORY
- GV$SESSION_LONGOPS
variable my_rept CLOB; BEGIN :my_rept :=DBMS_SQLTUNE.REPORT_SQL_MONITOR(); END; / print :my_reptThe DBMS_SQLTUNE.REPORT_SQL_MONITOR function accepts several input parameters to specify the execution, the level of detail in the report, and the report type ('TEXT', 'HTML', or 'XML'). By default, a text report is generated for the last execution that was monitored if no parameters are specified as shown in the example.
Example shows the output of the SQL Monitor Report for the last execution of a SQL statement that was monitored.
Example Sample SQL Monitor Report
set long 10000000 set longchunksize 10000000 set linesize 200 select dbms_sqltune.report_sql_monitor from dual; SQL Text ---------------------------------------------------------------------------------------- select * from (select O_ORDERDATE, sum(O_TOTALPRICE) from orders o, lineitem l where l.l_orderkey = o.o_orderkey group by o_orderdate order by o_orderdate) where rownum < 100 ---------------------------------------------------------------------------------------- Global Information Status : EXECUTING Instance ID : 1 Session ID : 980 SQL ID : br4m75c20p97h SQL Execution ID : 16777219 Plan Hash Value : 2992965678 Execution Started : 06/07/2007 08:36:42 First Refresh Time : 06/07/2007 08:36:46 Last Refresh Time : 06/07/2007 08:40:02 ----------------------------------------------------------------------------------- | Elapsed | Cpu | IO | Application | Other | Buffer | Reads | Writes | | Time(s) | Time(s) | Waits(s) | Waits(s) | Waits(s) | Gets | | | ----------------------------------------------------------------------------------- | 198 | 140 | 56 | 0.31 | 1.44 | 1195K | 1264K | 84630 | ----------------------------------------------------------------------------------- SQL Plan Monitoring Details ======================================================================================= | Id | Operation | Name | Rows | Cost | Time | Start | | | | | (Estim) | | Active(s) | Active | ======================================================================================= | 0 | SELECT STATEMENT | | | 125K | | | | 1 | COUNT STOPKEY | | | | | | | 2 | VIEW | | 2406 | 125K | | | | 3 | SORT GROUP BY STOPKEY | | 2406 | 125K | 99 | +101 | | -> 4 | HASH JOIN | | 8984K | 123K | 189 | +12 | | | | | | | | | | 5 | INDEX FAST FULL SCAN | I_L_OKEY | 8984K | 63191 | 82 | +1 | | | | | | | | | | 6 | PARTITION RANGE ALL | | 44913K | 57676 | 94 | +84 | | 7 | PARTITION HASH ALL | | 44913K | 57676 | 94 | +84 | | 8 | TABLE ACCESS FULL | ORDERS | 44913K | 57676 | 95 | +84 | | | | | | | | | | | | | | | | | ======================================================================================= continuation of above table ======================================================================================= Starts | Rows | Memory | Temp | Activity | Activity Detail | Progress | | (Actual) | | | (percent) | (sample #) | | ======================================================================================= 1 | | | | | | | 1 | | | | | | | 1 | | | | | | | 1 | 0 | | | 4.02 | Cpu (8) | | 1 | 28130K | 10000K | 724M | 25.13 | Cpu (48) | 87% | | | | | | direct path read temp (2) | | 1 | 32734K | | | 34.17 | Cpu (58) | 100% | | | | | | direct path read (10) | | 1 | 45000K | | | | | | 84 | 45000K | | | | | | 672 | 45000K | | | 36.68 | Cpu (28) | | | | | | | reliable message (3) | | | | | | | direct path read (42) | | =======================================================================================In the Global Information section of this report, the Status field shows that this statement is still executing. The Time Active(s) column shows how long the operation has been active (the delta in seconds between the first and the last active time). The Start Active column shows, in seconds, when the operation in the execution plan started relative to the SQL statement execution start time. In this report, the fast full scan operation at ID 5 was the first to start (+1s Start Active) and ran for the first 82 seconds.The Starts column shows the number of times the operation in the execution plan was executed. The Rows (Actual) column indicates the number of rows produced, and the Rows (Estim) column shows the estimated cardinality from the optimizer. The Memory and Temp columns indicate the amount of memory and temporary space consumed by each operation of the execution plan.The Activity (percent) and Activity Detail (sample #) columns are derived by joining the V$SQL_PLAN_MONITOR and V$ACTIVE_SESSION_HISTORY views. Activity (percent) shows the percentage of database time consumed by each operation of the execution plan. Activity Detail (sample#) shows the nature of that activity (such as CPU or wait event). In this report, this column shows that most of the database time, 36.68%, is consumed by operation ID 8 (TABLE ACCESS FULL of ORDERS). This activity consists of 73 samples (28+3+42), of which more than half of the activity is attributed to direct path read (42 samples), and a third to CPU (28 samples).The last column, Progress, shows progress monitoring information for the operation from the V$SESSION_LONGOPS view. In this report, it shows that the hash-join operation is 87% complete.
10.4.4 Enabling and Disabling SQL Monitoring
The SQL monitoring feature is enabled by default when the STATISTICS_LEVEL initialization parameter is either set to ALL or TYPICAL (the default value). Additionally, the CONTROL_MANAGEMENT_PACK_ACCESSparameter must be set to DIAGNOSTIC+TUNING (the default value) because SQL monitoring is a feature of the Oracle Database Tuning Pack. SQL monitoring starts automatically for all long running queries.
Two statement-level hints are available to force or prevent a SQL statement from being monitored. To force SQL monitoring, use the MONITOR hint:
select /*+MONITOR*/ from dual;This hint is effective only when the CONTROL_MANAGEMENT_PACK_ACCESS parameter is set to DIAGNOSTIC+TUNING. To prevent the hinted SQL statement from being monitored, use the NO_MONITOR reverse hint.
5 Tuning Instance Recovery Performance: Fast-Start Fault Recovery
This section describes instance recovery, and how Oracle's Fast-Start Fault Recovery improves availability in the event of a crash or instance failure. It also offers guidelines for tuning the time required to perform crash and instance recovery.This section contains the following topics:
- About Instance Recovery
- Configuring the Duration of Cache Recovery: FAST_START_MTTR_TARGET
- Tuning FAST_START_MTTR_TARGET and Using MTTR Advisor
Instance and crash recovery are the automatic application of redo log records to Oracle data blocks after a crash or system failure. During normal operation, if an instance is shut down cleanly (as when using a SHUTDOWN IMMEDIATE statement), rather than terminated abnormally, then the in-memory changes that have not already been written to the datafiles on disk are written to disk as part of the checkpoint performed during shutdown.
However, if a single instance database crashes or if all instances of an Oracle Real Application Cluster configuration crash, then Oracle performs crash recovery at the next startup. If one or more instances of an Oracle Real Application Cluster configuration crash, then a surviving instance performs instance recovery automatically. Instance and crash recovery occur in two steps: cache recovery followed by transaction recovery.
The database can be opened as soon as cache recovery completes, so improving the performance of cache recovery is important for increasing availability.
5.1.1 Cache Recovery (Rolling Forward)
During the cache recovery step, Oracle applies all committed and uncommitted changes in the redo log files to the affected data blocks. The work required for cache recovery processing is proportional to the rate of change to the database (update transactions each second) and the time between checkpoints.
5.1.2 Transaction Recovery (Rolling Back)
To make the database consistent, the changes that were not committed at the time of the crash must be undone (in other words, rolled back). During the transaction recovery step, Oracle applies the rollback segments to undo the uncommitted changes.
5.1.3 Checkpoints and Cache Recovery
Periodically, Oracle records a checkpoint. A checkpoint is the highest system change number (SCN) such that all data blocks less than or equal to that SCN are known to be written out to the data files. If a failure occurs, then only the redo records containing changes at SCNs higher than the checkpoint need to be applied during recovery. The duration of cache recovery processing is determined by two factors: the number of data blocks that have changes at SCNs higher than the SCN of the checkpoint, and the number of log blocks that need to be read to find those changes.
5.1.3.1 How Checkpoints Affect Performance
Frequent checkpointing writes dirty buffers to the datafiles more often than otherwise, and so reduces cache recovery time in the event of an instance failure. If checkpointing is frequent, then applying the redo records in the redo log between the current checkpoint position and the end of the log involves processing relatively few data blocks. This means that the cache recovery phase of recovery is fairly short.
However, in a high-update system, frequent checkpointing can reduce run-time performance, because checkpointing causes DBWn processes to perform writes.
5.1.3.2 Fast Cache Recovery Tradeoffs
To minimize the duration of cache recovery, you must force Oracle to checkpoint often, thus keeping the number of redo log records to be applied during recovery to a minimum. However, in a high-update system, frequent checkpointing increases the overhead for normal database operations.
If daily operational efficiency is more important than minimizing recovery time, then decrease the frequency of writes to data files due to checkpoints. This should improve operational efficiency, but also increase cache recovery time.
5.2 Configuring the Duration of Cache Recovery: FAST_START_MTTR_TARGET
The Fast-Start Fault Recovery feature reduces the time required for cache recovery, and makes the recovery bounded and predictable by limiting the number of dirty buffers and the number of redo records generated between the most recent redo record and the last checkpoint.
The foundation of Fast-Start Fault Recovery is the Fast-Start checkpointing architecture. Instead of conventional event-driven (that is, log switching) checkpointing, which does bulk writes, fast-start checkpointing occurs incrementally. Each DBWn process periodically writes buffers to disk to advance the checkpoint position. The oldest modified blocks are written first to ensure that every write lets the checkpoint advance. Fast-Start checkpointing eliminates bulk writes and the resultant I/O spikes that occur with conventional checkpointing.
With the Fast-Start Fault Recovery feature, the FAST_START_MTTR_TARGET initialization parameter simplifies the configuration of recovery time from instance or system failure. FAST_START_MTTR_TARGET specifies a target for the expected mean time to recover (MTTR), that is, the time (in seconds) that it should take to start up the instance and perform cache recovery. Once FAST_START_MTTR_TARGET is set, the database manages incremental checkpoint writes in an attempt to meet that target. If you have chosen a practical value for FAST_START_MTTR_TARGET, you can expect your database to recover, on average, in approximately the number of seconds you have chosen.
Note:
You must disable or remove the FAST_START_IO_TARGET, LOG_CHECKPOINT_INTERVAL, and LOG_CHECKPOINT_TIMEOUTinitialization parameters when using FAST_START_MTTR_TARGET. Setting these parameters interferes with the mechanisms used to manage cache recovery time to meet FAST_START_MTTR_TARGET.
5.2.1 Practical Values for FAST_START_MTTR_TARGET
The maximum value for FAST_START_MTTR_TARGET is 3600 seconds (one hour). If you set the value to more than 3600, then Oracle rounds it to 3600.
The following example shows how to set the value of FAST_START_MTTR_TARGET:
SQL> ALTER SYSTEM SET FAST_START_MTTR_TARGET=30;In principle, the minimum value for FAST_START_MTTR_TARGET is one second. However, the fact that you can set FAST_START_MTTR_TARGET this low does not mean that that target can be achieved. There are practical limits to the minimum achievable MTTR target, due to such factors as database startup time.
The MTTR target that your database can achieve given the current value of FAST_START_MTTR_TARGET is called the effective MTTR target. You can view your current effective MTTR by viewing the TARGET_MTTRcolumn of the V$INSTANCE_RECOVERY view.
The practical range of MTTR target values for your database is defined to be the range between the lowest achievable effective MTTR target for your database and the longest that startup and cache recovery will take in the worst-case scenario (that is, when the whole buffer cache is dirty). A procedure for determining the range of achievable MTTR target values, one step in the process of tuning your FAST_START_MTTR_TARGETvalue.
Note:
It is usually not useful to set your FAST_START_MTTR_TARGET to a value outside the practical range. If yourFAST_START_MTTR_TARGET value is shorter than the lower limit of the practical range, the effect is as if you set it to the lower limit of the practical range. In such a case, the effective MTTR target will be the best MTTR target the system can achieve, but checkpointing will be at a maximum, which can affect normal database performance. If you set FAST_START_MTTR_TARGET to a time longer than the practical range, the MTTR target will be no better than the worst-case situation.
5.2.2 Reducing Checkpoint Frequency to Optimize Runtime Performance
To reduce the checkpoint frequency and optimize run-time performance, you can do the following:
- Set the value of FAST_START_MTTR_TARGET to 3600. This enables Fast-Start checkpointing and the Fast-Start Fault Recovery feature, but minimizes its effect on run-time performance while avoiding the need for performance tuning of FAST_START_MTTR_TARGET.
- Size your online redo log files according to the amount of redo your system generates. A good rule of thumb is to switch logs at most every twenty minutes. Having your log files too small can increase checkpoint activity and reduce performance. Also note that all redo log files should be the same size.
The V$INSTANCE_RECOVERY view displays the current recovery parameter settings. You can also use statistics from this view to determine which factor has the greatest influence on checkpointing.
The following table lists those columns most useful in monitoring predicted cache recovery performance:
Table 10-4 V$INSTANCE_RECOVERY Columns
Column | Description |
---|---|
TARGET_MTTR | Effective mean time to recover (MTTR) target in seconds. This field is 0 if FAST_START_MTTR_TARGET is not specified. |
ESTIMATED_MTTR | The current estimated mean time to recover (MTTR) in seconds, based on the current number of dirty buffers and log blocks. This field is always calculated, whether or not FAST_START_MTTR_TARGETis specified. |
5.3 Tuning FAST_START_MTTR_TARGET and Using MTTR Advisor
To determine the appropriate value for FAST_START_MTTR_TARGET for your database, use the following four step process:
- Calibrate the FAST_START_MTTR_TARGET
- Determine the Practical Range for FAST_START_MTTR_TARGET
- Evaluate Different Target Values with MTTR Advisor
- Determine Optimal Size for Redo Logs
The FAST_START_MTTR_TARGET initialization parameter causes the database to calculate internal system trigger values, in order to limit the length of the redo log and the number of dirty data buffers in the data cache. This calculation uses estimated time to read a redo block, estimates of the time to read and write a data block and characteristics of typical workload of the system, such as how many dirty buffers corresponds to how many change vectors, and so on.
Initially, internal defaults are used in the calculation. These defaults are replaced over time by data gathered on I/O performance during system operation and actual cache recoveries.
You will have to perform several instance recoveries in order to calibrate your FAST_START_MTTR_TARGET value properly. Before starting calibration, you must decide whether FAST_START_MTTR_TARGET is being calibrated for a database crash or a hardware crash. This is a consideration if your database files are stored in a file system or if your I/O subsystem has a memory cache, because there is a considerable difference in the read and write time to disk depending on whether or not the files are cached. The appropriate value for FAST_START_MTTR_TARGET will depend upon which type of crash is more important to recover from quickly.
To effectively calibrate FAST_START_MTTR_TARGET, make sure that you run the typical workload of the system for long enough, and perform several instance recoveries to ensure that the time to read a redo block and the time to read or write a data block during recovery are recorded accurately.
5.3.2 Determine the Practical Range for FAST_START_MTTR_TARGET
After calibration, you can perform tests to determine the practical range for FAST_START_MTTR_TARGET for your database.
5.3.2.1 Determining Lower Bound for FAST_START_MTTR_TARGET: Scenario
To determine the lower bound of the practical range, set FAST_START_MTTR_TARGET to 1, and start up your database. Then check the value ofV$INSTANCE_RECOVERY.TARGET_MTTR, and use this value as a good lower bound for FAST_START_MTTR_TARGET. Database startup time, rather than cache recovery time, is usually the dominant factor in determining this limit.
For example, set the FAST_START_MTTR_TARGET to 1:
SQL> ALTER SYSTEM SET FAST_START_MTTR_TARGET=1;Then, execute the following query immediately after opening the database:
SQL> SELECT TARGET_MTTR, ESTIMATED_MTTR FROM V$INSTANCE_RECOVERY;Oracle responds with the following:
TARGET_MTTR ESTIMATED_MTTR 18 15The TARGET_MTTR value of 18 seconds is the minimum MTTR target that the system can achieve, that is, the lowest practical value for FAST_START_MTTR_TARGET. This minimum is calculated based on the average database startup time.
The ESTIMATED_MTTR field contains the estimated mean time to recovery based on the current state of the running database. Because the database has just opened, the system contains few dirty buffers, so not much cache recovery would be required if the instance failed at this moment. That is why ESTIMATED_MTTRcan, for the moment, be lower than the minimum possible TARGET_MTTR.
ESTIMATED_MTTR can be affected in the short term by recent database activity. Assume that you query V$INSTANCE_RECOVERY immediately after a period of heavy update activity in the database. Oracle responds with the following:
TARGET_MTTR ESTIMATED_MTTR 18 30Now the effective MTTR target is still 18 seconds, and the estimated MTTR (if a crash happened at that moment) is 30 seconds. This is an acceptable result. This means that some checkpoints writes might not have finished yet, so the buffer cache contains more dirty buffers than targeted.
Now wait for sixty seconds and reissue the query to V$INSTANCE_RECOVERY. Oracle responds with the following:
TARGET_MTTR ESTIMATED_MTTR 18 25The estimated MTTR at this time has dropped to 25 seconds, because some of the dirty buffers have been written out during this period
5.3.2.2 Determining Upper Bound for FAST_START_MTTR_TARGET To determine the upper bound of the practical range, set FAST_START_MTTR_TARGET to 3600, and operate your database under a typical workload for a while. Then check the value of V$INSTANCE_RECOVERY.TARGET_MTTR. This value is a good upper bound for FAST_START_MTTR_TARGET.
The procedure is substantially similar to that in "Determining Lower Bound for FAST_START_MTTR_TARGET: Scenario".
5.3.2.3 Selecting Preliminary Value for FAST_START_MTTR_TARGET
Once you have determined the practical bounds for the FAST_START_MTTR_TARGET parameter, select a preliminary value for the parameter. Choose a higher value within the practical range if your concern is with database performance, and a lower value within the practical range if your priority is shorter recovery times. The narrower the practical range, of course, the easier the choice becomes.
For example, if you discovered that the practical range was between 17 and 19 seconds, it would be quite simple to choose 19, because it makes relatively little difference in recovery time and at the same time minimizes the effect of checkpointing on system performance. However, if you found that the practical range was between 18 and 40 seconds, you might choose a compromise value of 30, and set the parameter accordingly:
SQL> ALTER SYSTEM SET FAST_START_MTTR_TARGET=30;You might then go on to use the MTTR Advisor to determine an optimal value.
5.3.3 Evaluate Different Target Values with MTTR Advisor
Once you have selected a preliminary value for FAST_START_MTTR_TARGET, you can use MTTR Advisor to evaluate the effect of differentFAST_START_MTTR_TARGET settings on system performance, compared to your chosen setting.
5.3.3.1 Enabling MTTR Advisor
To enable MTTR Advisor, set the two initialization parameters STATISTICS_LEVEL and FAST_START_MTTR_TARGET.
STATISTICS_LEVEL governs whether all advisors are enabled and is not specific to MTTR Advisor. Make sure that it is set to TYPICAL or ALL. Then, whenFAST_START_MTTR_TARGET is set to a nonzero value, the MTTR Advisor is enabled.
5.3.3.2 Using MTTR Advisor
After enabling MTTR Advisor, run a typical database workload for a while. When MTTR Advisor is ON, the database simulates checkpoint queue behavior under the current value of FAST_START_MTTR_TARGET, and up to four other different MTTR settings within the range of valid FAST_START_MTTR_TARGET values. (The database will in this case determine the valid range for FAST_START_MTTR_TARGET itself before testing different values in the range.)
5.3.3.3 Viewing MTTR Advisor Results: V$MTTR_TARGET_ADVICE
The dynamic performance view V$MTTR_TARGET_ADVICE lets you view statistics or advisories collected by MTTR Advisor.
The database populates V$MTTR_TARGET_ADVICE with advice about the effects of each of the FAST_START_MTTR_TARGET settings for your database. For each possible value of FAST_START_MTTR_TARGET, the row contains details about how many cache writes would be performed under the workload tested for that value of FAST_START_MTTR_TARGET.
Specifically, each row contains information about cache writes, total physical writes (including direct writes), and total I/O (including reads) for that value ofFAST_START_MTTR_TARGET, expressed both as a total number of operations and a ratio compared to the operations under your chosen FAST_START_MTTR_TARGETvalue. For instance, a ratio of 1.2 indicates 20% more cache writes.
Knowing the effect of different FAST_START_MTTR_TARGET settings on cache write activity and other I/O enables you to decide better whichFAST_START_MTTR_TARGET value best fits your recovery and performance needs.
If MTTR Advisor is currently on, then V$MTTR_TARGET_ADVICE shows the Advisor information collected. If MTTR Advisor is currently OFF, the view shows information collected the last time MTTR Advisor was ON since database startup, if any. If the database has been restarted since the last time the MTTR Advisor was used, or if it has never been used, the view will not show any rows.
5.3.4 Determine Optimal Size for Redo Logs
You can use the V$INSTANCE_RECOVERY view column OPTIMAL_LOGFILE_SIZE to determine the size of your online redo logs. This field shows the redo log file size in megabytes that is considered optimal based on the current setting of FAST_START_MTTR_TARGET. If this field consistently shows a value greater than the size of your smallest online log, then you should configure all your online logs to be at least this size.
Note, however, that the redo log file size affects the MTTR. In some cases, you may be able to refine your choice of the optimal FAST_START_MTTR_TARGET value by re-running the MTTR Advisor with your suggested optimal log file size.
=====================================================================================================================
No comments:
Post a Comment