Oracle Exadata: DMA or DBA
Exadata is a different system for a DBA to administer. Some tasks in this environment, such as running the exachk script, require root O/S privileges. That script can be run by the system administrator, which will be the case if you are managing Exadata strictly as a DBA. However, a new role has emerged relative to Exadata, that of the Database Machine Administrator, or DMA. Let's look at what being a DMA really means.
In addition to the usual DBA skillset, the DMA must also be familiar with, and be able to understand, the following management and monitoring commands on the specified systems. On the compute nodes (database nodes):
Linux: top, mpstat, vmstat, iostat, fdisk, ustat, sar, sysinfo
Exadata: dcli
ASM: asmcmd, asmca
Clusterware: crsctl, srvctl
On the storage servers/cells:
Linux: top, mpstat, vmstat, iostat, fdisk, ustat, sar, sysinfo
Cell management: cellcli, cellsrvstat
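As a quick example, cellsrvstat can sample the cell server statistics at a fixed interval on a storage cell; the interval and count options shown below are the commonly documented ones, but verify the available options on your own cells:
cellsrvstat -interval=5 -count=3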
Being a DMA also includes other areas of responsibility not associated with being a DBA. The following table summarizes the areas of responsibility for a DMA:
DMA Responsibilities
Skill                       Percent
System Administrator        15
Storage Administrator       0
Network Administrator       5
Database Administrator      60
Cell Administrator          20
The "Percent" column indicates the percentage of the overall Exadata system requiring this knowledge; as you can see, if you've been an 11g RAC administrator you already have 60 percent of the skillset required to be a DMA. The remaining skills are not difficult to learn and master, and the Cell Administrator commands you will need (cellcli, dcli) will increase your knowledge to 80 percent of the DMA skillset.
CellCLI is the command-line interface used to monitor and manage the storage cells. There are three supplied logins on each storage cell: 'root', 'cellmonitor' and 'celladmin'. As you can probably guess, 'celladmin' is the most powerful login that isn't 'root' (the superuser in Linux and Unix); you can do almost anything to the storage cells, including startup and shutdown, with 'celladmin'. The 'cellmonitor' user can generate reports and list attributes from the storage cells but has no authority to perform management tasks. The full list of available cellcli commands is shown below:
CellCLI> help
HELP [topic]
Available Topics:
ALTER
ALTER ALERTHISTORY
ALTER CELL
ALTER CELLDISK
ALTER GRIDDISK
ALTER IBPORT
ALTER IORMPLAN
ALTER LUN
ALTER PHYSICALDISK
ALTER QUARANTINE
ALTER THRESHOLD
ASSIGN KEY
CALIBRATE
CREATE
CREATE CELL
CREATE CELLDISK
CREATE FLASHCACHE
CREATE FLASHLOG
CREATE GRIDDISK
CREATE KEY
CREATE QUARANTINE
CREATE THRESHOLD
DESCRIBE
DROP
DROP ALERTHISTORY
DROP CELL
DROP CELLDISK
DROP FLASHCACHE
DROP FLASHLOG
DROP GRIDDISK
DROP QUARANTINE
DROP THRESHOLD
EXPORT CELLDISK
IMPORT CELLDISK
LIST
LIST ACTIVEREQUEST
LIST ALERTDEFINITION
LIST ALERTHISTORY
LIST CELL
LIST CELLDISK
LIST FLASHCACHE
LIST FLASHCACHECONTENT
LIST FLASHLOG
LIST GRIDDISK
LIST IBPORT
LIST IORMPLAN
LIST KEY
LIST LUN
LIST METRICCURRENT
LIST METRICDEFINITION
LIST METRICHISTORY
LIST PHYSICALDISK
LIST QUARANTINE
LIST THRESHOLD
SET
SPOOL
START
CellCLI>
All of the above commands are available to 'celladmin'; only the LIST, DESCRIBE, SET and SPOOL commands are available to 'cellmonitor'.
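The dcli utility, run from a compute node, executes the same command across multiple cells at once. A typical, purely illustrative invocation, assuming SSH user equivalence is configured and a cell_group file lists the storage cell host names, would be:
dcli -g cell_group -l celladmin cellcli -e "list cell attributes name,status"
This returns the name and status of every cell in the group in a single call, which is far quicker than logging into each cell individually.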
Networking commands that you may need are ifconfig, iwconfig, netstat, ping, traceroute, and tracepath. You may, at some time, also need ifup and ifdown to bring up or bring down network interfaces, although using these commands will not be a regular occurrence. The following example shows how to bring up the eth0 interface:
ifup eth0
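Taking the interface down with ifdown works the same way (both commands must be run as root):
ifdown eth0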
Becoming a DMA may seem like a daunting task, but it really isn't that difficult. It does require a slightly different mindset, as you are now looking at, and managing, the entire system rather than just the database. There will still be a need for a dedicated System Administrator and Network Administrator for your Exadata system because, as a DMA, you won't be responsible for configuring these resources, nor will you be responsible for patching and firmware upgrades. The DMA essentially assists these dedicated administrators by assuming the day-to-day tasks they would otherwise provide. Being a DMA is also useful to you and to the enterprise, since the regular tasks for these areas can be performed by the person or persons who do most of the interaction with Exadata on a daily basis. Enterprises vary, however, and it may not be possible to assume the role of DMA where the division of duties is strictly outlined and enforced. It is good to know, though, that such a role exists and may be made available to you at some time in the future.
Exadata Survival Guide
Performance is the reason companies choose Exadata over other platforms, as it can provide avenues of improvement commodity hardware can't match. One of those avenues is the Smart Scan, a mechanism where the database servers and the storage cells join forces to return the requested results. Let's see what triggers a Smart Scan and discover how it reduces the processing time for qualifying queries.
Smart Scans are remarkable; they take a seemingly common query and divide the labor between the database servers and the storage cells to return results in a fraction of the time required on non-Exadata hardware. There are several conditions that must be met before a Smart Scan can be executed; I've mentioned these before, but I will provide them again. To qualify for a Smart Scan, a query must:
Use full table scans and/or full index scans coupled with direct path reads, have at least one predicate, and use one or more of the following simple comparison operators:
=, <, >, >=, <=, BETWEEN, IN, IS NULL, IS NOT NULL
Use parallel query, again with at least one predicate and one or more of the above-mentioned simple comparison operators.
Once the criteria are met, the 'magic' begins. Oracle passes the query to the storage cells in a manner similar to that used by Parallel Query (but remember, Parallel Query does not need to be in use for a Smart Scan to run). Each storage cell then processes the query and its WHERE clause, builds or updates the various storage indexes and returns only the requested data to the database servers, eliminating the need for those servers to scan and read entire data blocks. Since the end result of such pre-processing is much smaller, the database servers can process the data more quickly and return the final results to the user in far less time.
Knowing whether a Smart Scan was used is fairly simple, but it does require a bit of work. Execution plans can show that Smart Scans were executed, but those plans need to be generated by the optimizer at run time. Explain Plan won't be a reliable indicator of Smart Scan usage, as a different mechanism is used to generate the anticipated plan. Note that I said 'anticipated plan', as that is exactly what Explain Plan provides: a guess at what Oracle will actually do to return the data. Valid methods for gathering usable plans are:
· Using autotrace on
· Querying either V$SQL_PLAN or DBA_HIST_SQL_PLAN for the actual plan
· Tracing the session with event 10046 and processing the trace file with tkprof (a brief example appears below)
[There are a number of options to autotrace, including explain, but the explain option is no better than using Explain Plan.]
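For the 10046 approach, a minimal session-level sketch would look like the following; the tracefile_identifier value is arbitrary and is used only to make the resulting trace file easy to find before running it through tkprof:
ALTER SESSION SET tracefile_identifier = 'smart_scan_test';
ALTER SESSION SET events '10046 trace name context forever, level 12';
-- run the query of interest here
ALTER SESSION SET events '10046 trace name context off';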
There are also two counters available in the V$SQL and GV$SQL views, which report on Smart Scan activity; these are IO_CELL_OFFLOAD_ELIGIBLE_BYTES and IO_CELL_OFFLOAD_RETURNED_BYTES. Let's look at a query that uses a Smart Scan and query V$SQL for the resulting I/O savings:
SQL> select *
2 from emp
3 where empid = 7934;
EMPID EMPNAME DEPTNO
---------- ---------------------------------------- ----------
7934 Smorthorper7934 15
Elapsed: 00:00:00.02
Execution Plan
Plan hash value: 3956160932
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 1 | 26 | 1167 (1)| 00:00:01 |
|* 1 | TABLE ACCESS STORAGE FULL| EMP | 1 | 26 | 1167 (1)| 00:00:01 |
Predicate Information (identified by operation id):
1 - storage("EMPID"=7934)
filter("EMPID"=7934)
Statistics
1 recursive calls
1 db block gets
4349 consistent gets
4276 physical reads
0 redo size
680 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
SQL> set autotrace off timing off
SQL>
SQL> select sql_id,
2 io_cell_offload_eligible_bytes qualifying,
3 io_cell_offload_returned_bytes actual,
4 round(((io_cell_offload_eligible_bytes - io_cell_offload_returned_bytes)/io_cell_offload_eligible_bytes)*100, 2) io_saved_pct,
5 sql_text
6 from v$sql
7 where io_cell_offload_returned_bytes > 0
8 and instr(sql_text, 'emp') > 0
9 and parsing_schema_name = 'BING';
SQL_ID QUALIFYING ACTUAL IO_SAVED_PCT SQL_TEXT
------------- ---------- ---------- ------------ --------------------------------------
gfjb8dpxvpuv6 35028992 6872 99.98 select * from emp where empid = 7934
SQL>
SQL> create index empid_idx on emp(empid);
Index created.
SQL>
SQL> set autotrace on timing on
SQL>
SQL> select *
2 from emp
3 where empid = 7934;
EMPID EMPNAME DEPTNO
---------- ---------------------------------------- ----------
7934 Smorthorper7934 15
Elapsed: 00:00:01.00
Execution Plan
Plan hash value: 1109982043
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
| 0 | SELECT STATEMENT | | 1 | 26 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMP | 1 | 26 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | EMPID_IDX | 1 | | 3 (0)| 00:00:01 |
Predicate Information (identified by operation id):
2 - access("EMPID"=7934)
Statistics
1 recursive calls
0 db block gets
5 consistent gets
2 physical reads
148 redo size
684 bytes sent via SQL*Net to client
524 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
SQL> set autotrace off timing off
SQL>
SQL> select sql_id, io_cell_offload_eligible_bytes qualifying, io_cell_offload_returned_bytes actual, round(((io_cell_offload_eligible_bytes - io_cell_offload_returned_bytes)/io_cell_offload_eligible_bytes)*100, 2) io_saved_pct, sql_text from v$sql where io_cell_offload_returned_bytes > 0 and instr(sql_text, 'emp') > 0 and parsing_schema_name = 'BING';
no rows selected
SQL>
The execution plan reports that a Smart Scan was used -- the TABLE ACCESS STORAGE FULL operation and the storage("EMPID"=7934) entry in the predicate information indicate this. To back up that assertion, the query against V$SQL shows that there were qualifying and returned offload bytes, and it computes the percentage of I/O saved by the Smart Scan. Following traditional practice, an index was created to eliminate the full table scan; notice that the presence of the index also disabled the Smart Scan and increased the response time for the query. Granted, had it not been reported by SQL*Plus, the difference would not have been noticeable, but it is interesting to see the elapsed times for both queries.
The I/O savings with Smart Scans can be dramatic; in some cases a Smart Scan can outperform an index scan on the same data (proven by the example shown), which is why it may be beneficial to remove some indexes from tables migrated to Exadata.
Smart Scans may not be the greatest thing since sliced bread, but they do fulfill the promise of improved performance offered by Exadata. The nice thing about them is that you don't have to set any obscure parameter to get them to work, which goes to show that simpler can, indeed, be better.
Monitoring Exadata Storage Servers with the CellCLI
Oracle Exadata Database Machine provides a powerful, all-inclusive solution for highly available databases and extremely fast I/O access to data. It combines Oracle's Grid Infrastructure and Real Application Clusters database software with the Exadata Storage Server technology in a single pre-configured package.
Exadata provides a platform that is a solution for systems ranging from data warehouses doing large scan intensive operations to online transaction systems needing high amounts of concurrency.
The Exadata Storage Servers are based on 64-bit Intel-based Sun Fire servers and are shipped preloaded with the Oracle Enterprise Linux x86_64 operating system, the Exadata Storage Server software and the InfiniBand protocol drivers.
While Exadata consistently provides impressive performance, as with any Oracle system it is important that DBAs are able to monitor the Exadata Storage Servers for both potential performance issues and errors.
Many aspects of the Exadata Storage Servers can be monitored, including current active requests, hardware sensors, disk I/O errors, network errors, free space and the metrics being managed. In this article, we'll focus on viewing and monitoring metrics with the CellCLI command-line tool.
To set up and configure the cells for alerts and notifications you should be logged into the Exadata Storage Server(s) using the celladmin account; the cellmonitor account can view metrics and alerts but, as noted earlier, has no authority to perform management tasks such as the configuration commands shown below.
Overview of Metrics Monitoring
First, let's take a look at how the Exadata storage server monitoring works. The primary process that manages Exadata storage servers is CELLSRV. It will periodically record important metrics on components like the CPUs, cell disks, grid disks, flash cache and IORM (IO Resource Management). These metrics are initially stored in memory. The MS (Management Server) retrieves these metrics from CELLSRV and keeps a subset of the values in memory and once an hour writes a history to an internal disk repository. The retention period for these metrics and alert information defaults to seven days and can be controlled by a specific setting on the storage server called metricHistoryDays. It is changed using an ALTER CELL command in CELLCLI on each storage server.
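For example, to check the current retention and extend it to 14 days (the value 14 here is purely illustrative), you could run the following in CellCLI on each storage cell:
LIST CELL ATTRIBUTES metricHistoryDays
ALTER CELL metricHistoryDays='14'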
Viewing Metric Information
At the center of the monitoring solution are metrics, and each metric has the following significant attributes (a brief usage example follows the list):
· name
· metricObjectName - the specific object being measured, such as a particular cell disk
· objectType - the type of object being measured:
o IORM_CONSUMER_GROUP
o IORM_DATABASE
o IORM_CATEGORY
o CELL
o CELLDISK
o CELL_FILESYSTEM
o GRIDDISK
o HOST_INTERCONNECT
o FLASHCACHE
· unit
o number
o percentage
o F (Fahrenheit)
o C (Celsius)
· metricValue
· metricType
o cumulative (since it was created)
o instantaneous (at the time the metric was collected)
o rate (change over time)
o transition (collected when the metric value changed)
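These attribute names can be used directly in CellCLI to shape the output; for instance, a representative query for the current cell disk metrics might be:
LIST METRICCURRENT WHERE objectType = 'CELLDISK' ATTRIBUTES name, metricObjectName, metricValue, metricType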
There are several naming conventions worth knowing; they help us understand what we are looking at (or for) when managing the Exadata Storage Server metrics.
Metric names are prefixed as follows:
CL_ (cell)
CD_ (cell disk)
GD_ (grid disk)
FC_ (flash cache)
DB_ (database)
CG_ (consumer group)
CT_ (category)
N_ (interconnect network)
I/O-related metrics are further identified by codes that indicate the operation being measured:
IO_RQ (requests)
IO_BY (number of MB)
IO_TM (latency)
IO_WT (wait time)
They might also include a code indicating reads (_R) or writes (_W), followed by an indicator of large (>128K) I/O (_LG) or small (<=128K) I/O (_SM), and possibly a trailing rate indicator such as _RQ (per request) or _SEC (per second). While this may all sound complicated, after working with the names for a while they actually do start to make sense.
For example:
CD_IO_RQ_W_LG would be the number of large write requests on a cell disk.
GD_IO_BY_R_SM_SEC is the number of MB of small block I/O reads per second on a grid disk.
To see the specific details about any of the metrics, use the LIST METRICDEFINITION command. For example, to see the detailed information for all cell disk metrics, enter the following at the CellCLI prompt:
LIST METRICDEFINITION WHERE objectType='CELLDISK' DETAIL
To view the history of any given metric, use the LIST METRICHISTORY command in CellCLI; to see the current value of a metric, use LIST METRICCURRENT. The following command would show the history of flash cache metrics collected after a specific date and time:
LIST METRICHISTORY WHERE name like 'FC_.*' and collectionTime > '2013-01-31T13:15:30-08:00'
Or, to see the current value of metrics for all grid disks:
LIST METRICCURRENT WHERE objectType='GRIDDISK'
Working with Metrics Alerts
As administrators, not only can we view the metrics and the metric history, we are also able to define alert thresholds (both warning and critical) on many of these metrics along with I/O error counts, memory utilization and IORM metrics. Additionally, once an alert has been generated, actions taken to evaluate and resolve the alert can be tracked through the CELLCLI.
Alerts generated by the Exadata Storage Servers have the following attributes:
· alertSource
o BMR
o Metric
o ADR (Automatic Diagnostic Repository)
· severity
o critical
o warning
o info
o clear
· alertType
o stateful
o stateless
· metricObjectName
· examinedBy
· metricName
· name
· description
· alertAction (recommended action to perform)
· alertMessage (brief information)
· failedMail (intended recipient of a failed notification)
· failedSNMP (intended SNMP subscriber of a failed notification)
· beginTime
· endTime
· notificationState
o 0 (never tried)
o 1 (sent successfully)
o 2 (retrying - up to 5 times)
o 3 (five failed retries)
To learn more about the details of the alert definitions, use the LIST ALERTDEFINITION command in CellCLI and indicate which attributes you would like to see.
LIST ALERTDEFINITION ATTRIBUTES name, metricName, description
To see warning level alerts that have been generated, and not yet examined by an administrator:
LIST ALERTHISTORY where examinedBy = '' and severity = 'warning' DETAIL
To mark an alert as examined:
ALTER ALERTHISTORY nnnn examinedBy="Karen" (where nnnn is the alert id #)
To create thresholds on metrics, use the CREATE THRESHOLD command and indicate the name of the metric, the warning and critical levels, the comparison operator, the number of occurrences and the observation time. The observation attribute indicates the number of measurements over which the metric values are averaged.
For example, to create a threshold on waits for small I/O requests for an IORM category called online that would give you a warning at 2500 milliseconds or higher and a critical alert at 4000 milliseconds or higher, you would enter something like:
CREATE THRESHOLD ct_io_wt_sm_rq.online warning=2500, critical=4000, comparison='>', occurrences=2, observation=5
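Once created, the threshold can be reviewed or removed with the corresponding LIST and DROP commands, for example:
LIST THRESHOLD DETAIL
DROP THRESHOLD ct_io_wt_sm_rq.online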
About Alert Email Notifications
In order to actually have the Exadata Storage Servers send notifications via email (or alternately SNMP) each of the servers has to be configured with the appropriate settings. This is done using the ALTER CELL command in CELLCLI.
ALTER CELL smtpServer='mailserver.somewhere.com', -
smtpFromAddr='exadata.cell01@somewhere.com', -
smtpPwd='email_password', -
smtpToAddr='someone@somewhere.com', -
notificationPolicy='critical,warning,clear', -
notificationMethod='mail'
There is also a verification command that can be run to test that the storage server can actually reach the mail server.
ALTER CELL VALIDATE MAIL
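To confirm the notification settings took effect, the cell attributes set by the ALTER CELL command above can also be listed:
LIST CELL ATTRIBUTES smtpServer, smtpToAddr, notificationPolicy, notificationMethod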
Watching for Undelivered Alerts
Once the alerts and notifications have been set up, it is still important to periodically check the storage servers just to make sure any alerts that have been generated have actually been delivered (via email and/or to Grid or Cloud Control).
LIST ALERTHISTORY where notificationState != 1 and examinedBy=''
If there are undelivered alerts, double-check the cell configuration, agent status and network connectivity.
Conclusion
The Oracle Exadata Database Machine is easily one of Oracle's fastest growing product lines, with proven performance and availability. While there is a learning curve to fully manage and monitor these systems, it's easy to see that the Exadata Storage Server software provides a set of very powerful options that allow us to configure, manage and monitor the performance and status of the storage servers in an Oracle Exadata Database Machine.