Oracle Exadata Command Reference
Part 1: Jumpstarting on Exadata
Know your Oracle Exadata Database Machine and understand the building blocks where commands will be applied.
(Note: The purpose of this guide is educational; it is not intended to replace official Oracle-provided manuals or other documentation. The information in this guide is not validated by Oracle, is not supported by Oracle, and should only be used at your own risk.)
Let's begin with a whirlwind tour of the Oracle Exadata Database Machine. It comes in a rack with the components that make up a database infrastructure: disks, servers, networking gear, and so on. Three configuration types are available: full rack (see below), half rack, or quarter rack. The architecture is identical across all three types but the number of components differs.
Figure 1 Exadata Components, high-level view, at time of writing
Now let's dive into each of these components and the role they play. The following list applies to a full rack; you can also view them contextually via a really neat 3D demo.
- Database nodes – The Exadata Database Machine runs Oracle Database 11g Real Application Clusters. The cluster and the database run on servers known as database nodes or compute nodes (or simply "nodes"). A full rack has 8 nodes running Oracle Linux or Oracle Solaris.
- Storage cells – The disks are not attached to the database compute nodes, as is normally the case with direct-attached storage, but to different servers known as storage cells (or just "cells"; there are 14 of them in a full rack). The Oracle Exadata Storage Server software runs in these cells on top of the OS.
- Disks – Each cell has 12 disks. Depending on the configuration, these disks are either 600GB high performance or 2TB high capacity (GB here means 1 billion bytes, not 1024MB). You choose the disk type when making the purchase.
- Flash disks – Each cell also has 384GB of flash disks. These disks can be presented to the compute nodes as storage (to be used by the database) or used as a secondary cache for the database cluster (called smart cache).
- Infiniband circuitry – The cells and nodes are connected through InfiniBand for speed and low latency. There are 3 InfiniBand switches for redundancy and throughput. Note: there are no fiber switches since there is no fiber component.
- Ethernet switch – The outside world can communicate via InfiniBand or via Ethernet. There is a set of Ethernet switches with ports open to the outside. Clients may connect to the nodes using Ethernet; DMAs and others connect to the nodes and cells using Ethernet as well. Backups are preferably done via InfiniBand but can go through the Ethernet network as well.
- KVM switch – A keyboard, video, and mouse switch provides direct physical access to the nodes and cells. It is used during initial setup and when the network to the system is not available. In a normal environment you will not need to go near the rack and access this KVM, not even for powering the cells and nodes on and off. Why not? You'll learn why in the next installment. (Not all models have a KVM switch.)
The nodes run the Oracle Clusterware, the ASM instances, and the database instances. You may decide to create just one cluster or several; similarly, you may create a single database on the cluster or several. If you were to create three databases – dev, int and QA – you would have two choices:
- One cluster – create one cluster and create all three databases on it
- Three clusters – create three different clusters and one database in each of them
The first option allows you to add and remove instances of a database easily. For instance, with 8 nodes in a full rack, you may assign 2 nodes to dev, 2 to int, and 4 to QA. Suppose a full-fledged production stress test is planned and that temporarily needs all 8 nodes in QA to match 8 nodes in production. In this configuration, all you have to do is shut down the dev and int instances and start the other four instances of QA on those nodes. Once the stress test is complete, you can shut down those 4 QA instances and restart the dev and int instances on them.
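To make this concrete, here is a sketch of how the swap might look with SRVCTL (the database names dev, int, and qa, and instance names such as qa5 through qa8, are purely illustrative):
# srvctl stop instance -d dev -i dev1,dev2
# srvctl stop instance -d int -i int1,int2
# srvctl start instance -d qa -i qa5,qa6,qa7,qa8
Once the stress test is over, the reverse sequence (stop the extra QA instances, start dev and int again) restores the original layout.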
If you run multiple production databases on a single rack of Exadata, you can still take advantage of this technique. If a specific database needs additional computing power temporarily to ride out a seasonal high demand, just shut down one instance of a different database and restart the instance of the more demanding one in that node. After the demand has waned, you can reverse the situation. You can also run two instances in the same node but they will compete for the resources – something you may not want. At the I/O level, you can control the resource usage by the instances using the IO Resource Manager (IORM).
On the other hand, with this option, you are still on just one cluster. When you upgrade the cluster, all the databases will need to be upgraded. The second option obviates that; there are individual clusters for each database – a complete separation. You can upgrade them or manipulate them any way you want without affecting the others. However, when one database needs additional computational power, you can't just start up an instance on a node belonging to another cluster. You need to remove a node from that cluster and add it to the cluster where it is needed – an activity far more complex than simply shutting down and starting up instances.
Since the cells have the disks, how do the database compute nodes access them - or more specifically, how do the ASM instances running on the compute nodes access the disks? Well, the disks are presented to cells only, not to the compute nodes. The compute nodes see the disks through the cells. For the lack of a better analogy, this is akin to network-attached storage. (Please note, the cell disks are not presented as NAS; this is just an analogy.)
The flash disks are presented to the cell as storage devices as well, just like the normal disks. As a result they can be added to the pool of ASM disks to be used in the database for ultra-fast access, or they can be used to create the smart flash cache layer, a secondary cache between the database buffer cache and the storage. This layer caches the most used objects but does not follow the same algorithm as the database buffer cache, where every block is cached before being sent to the end user. Smart flash cache caches only those data items which are accessed frequently – hence the term "smart" in the name. A request for data not found in the smart flash cache is routed to the disks automatically.
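As a rough illustration of the two possible uses, both are set up from CellCLI (the cell command-line tool covered in the later installments) on each cell; the commands below are only a sketch, and the actual choice is made during installation:
CellCLI> create flashcache all
CellCLI> create griddisk all flashdisk prefix=FLASH
The first command dedicates all flash-based cell disks to the smart flash cache; the second instead carves them into grid disks (here with the hypothetical prefix FLASH) that can be presented to ASM as regular storage.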
The Secret Sauce: Exadata Storage Server
So, you may be wondering, what’s the “secret sauce” for the Exadata Database Machine’s amazing performance? A suite of software known as Exadata Storage Server, which runs on the storage cells, is the primary reason behind that performance. In this section we will go over the components of the storage server very briefly (not a substitute for documentation!).
Cell Offloading
The storage in the Exadata Database Machine is not just dumb storage. The storage cells are intelligent enough to process some workload inside them, saving the database nodes from that work. This process is referred to as cell offloading. The exact nature of the offloaded activity is discussed in the following section.
Smart Scan
In a traditional Oracle database, when a user selects a row or even a single column in a row, the entire block containing that row is fetched from the disk to the buffer cache, and the selected row (or column, as the case may be) is then extracted from the block and presented to the user’s session. In the Exadata Database Machine, this process holds true for most types of access, except a very important few. Direct path accesses – for instance, full table scans and full index scans – are done differently. The Exadata Database Machine can pull the specific rows (or columns) from the disks directly and send them to the database nodes. This functionality is known as Smart Scan. It results in huge savings in I/O.
For instance, only 1,000 rows out of 1 billion might satisfy your query, but a full table scan in a traditional database retrieves all the blocks and filters the rows from them. Smart Scan, on the other hand, will extract only those 1,000 rows (or even specific columns from those rows, if only those are requested) – potentially cutting I/O by a factor of a million. The cell offloading enables the cells to accomplish this.
Not all queries can take advantage of Smart Scan; only direct path reads can. A full table scan is one example of such an access. An index range scan, on the other hand, reads index blocks first and then the table blocks through the buffer cache – so Smart Scan is not used.
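One simple way to see whether Smart Scan is actually kicking in is to look at the cell-related statistics exposed by the database; for example (a quick, illustrative check from any session):
SQL> select name, value from v$sysstat where name like 'cell physical IO%';
Statistics such as the interconnect bytes returned by smart scan grow only when cell offloading is being used.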
iDB
How can the storage cells know what columns and rows to filter from the data? This is done by another component inherently built into the storage software. The communication between nodes and cells employs a specially developed protocol called iDB (short for Intelligent Database). This protocol not only requests the blocks (as happens in an I/O call in a traditional database) but can optionally send other relevant information. In those cases where Smart Scan is possible, iDB sends the names of the table, the columns, the predicates, and other relevant information about the query. This allows the cell to learn a lot more about the query than just the addresses of the blocks to retrieve. Similarly, the cells can send row and column data back over iDB instead of traditional Oracle blocks.
Storage Indexes
How does Smart Scan achieve sending only those relevant rows and columns instead of blocks? A special data structure built on the pattern of the data within the storage cells enables this. For a specific segment, it stores the minimum, maximum, and whether nulls are present for all the columns of that segment in a specified region of the disk, usually 1MB in size. This data structure is called a storage index. When a cell gets a Smart Scan-enabled query from the database node via iDB, it checks which regions of the storage will not contain the data. For instance if the query predicate states where rating = 3, a region on the disk where the minimum and maximum values of the column RATING are 4 and 10 respectively will definitely not have any row that will match the predicate. Therefore the cell skips reading that portion of the disk. Checking the storage index, the cell excludes a lot of regions that will not contain that value and therefore saves a lot of I/O.
Although it has the word “index” in its name, a storage index is nothing like a normal index. Normal indexes are used to zero in on the locations where the rows are most likely to be found; storage indexes are used just for the opposite reason – where the rows are most likely not to be found. Also, unlike other segments, these are not stored on the disks; they reside in memory.
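The benefit shows up as a dedicated statistic as well; a quick, illustrative check from the database is:
SQL> select value from v$sysstat where name = 'cell physical IO bytes saved by storage index';
A non-zero, growing value indicates that storage indexes are eliminating disk regions for your workload.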
Smart Cache
The database buffer cache is where data blocks land before being shipped to the end user. If the data is found there, a trip to the storage is saved. However, if it is not found – often the case with large databases – the I/O to storage is inevitable. In the Exadata Database Machine, a secondary cache called Smart Cache can sit between the database buffer cache and the storage. The smart cache holds frequently accessed data and may satisfy the request from the database node from this cache instead of going to the disks – improving performance.
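On any cell you can confirm that the smart cache exists and check its size and state from CellCLI; a minimal, illustrative check looks like this:
CellCLI> list flashcache detail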
Infiniband Network
This is the network inside the Exadata Database Machine – the nervous system of the machine through which the different components, such as database nodes and storage cells, communicate. Infiniband is a hardware medium running a protocol called RDS (Reliable Datagram Sockets), which offers high bandwidth and low latency – making the transfer of data extremely fast.
Disk Layout
The disk layout needs some additional explanation because that’s where most of the activities occur. As I mentioned previously, the disks are attached to the storage cells and presented as logical units (LUNs), on which physical volumes are built.
Each cell has 12 physical disks. In a high capacity configuration they are about 2TB and in a high performance configuration, they are about 600GB each. The disks are used for the database storage. Two of the 12 disks are also used for the home directory and other Linux operating system files. These two disks are divided into different partitions as shown in Figure 2 below.
Figure 2 Disk Layout
The physical disks are divided into multiple partitions. Each partition is then presented as a LUN to the cell. Some LUNs are used to create a filesystem for the OS. The others are presented as storage to the cell; these are called cell disks. The cell disks are further divided into grid disks, so named because they are used by the grid infrastructure. These grid disks are used to build ASM diskgroups, so they act as ASM disks. An ASM diskgroup is made up of several ASM disks from multiple storage cells. If the diskgroup is built with normal or high redundancy (the usual case), the failure groups are placed in different cells. As a result, if one cell fails, the data is still available on the other cells. Finally, the database is built on these diskgroups.
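To see this chain for yourself, you can list the cell disks and grid disks from CellCLI and then, from an ASM instance, build a diskgroup on the grid disks. The names below (the DATA prefix and diskgroup name) are examples only, not a prescribed layout:
CellCLI> list celldisk
CellCLI> list griddisk attributes name, size, asmDiskgroupName
SQL> create diskgroup DATA normal redundancy
     disk 'o/*/DATA*'
     attribute 'compatible.asm'='11.2.0.0.0',
               'compatible.rdbms'='11.2.0.0.0',
               'cell.smart_scan_capable'='TRUE';
The 'o/*/DATA*' discovery string picks up every grid disk whose name starts with DATA across all cells, which is how an ASM diskgroup ends up spanning multiple storage cells.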
These diskgroups are created with a set of default attributes at installation time.
Two of the 12 disks also hold the operating system, the Oracle Exadata Storage Server software, and other OS-related filesystems such as /home. This area occupies about 29GB on each of those disks and, for protection, is mirrored as RAID1 across the two disks; the filesystems are mounted on that RAID1 volume.
However, this leaves those two cell disks with less space than the other ten. If we create an ASM diskgroup across all 12 disks, it will be imbalanced on those two disks. Therefore, you (or whoever is doing the installation) should create another diskgroup from a 29GB slice on each of the other 10 cell disks. This keeps the ASM disks used by the other diskgroups the same size. This "compensatory" diskgroup is usually named DBFS_DG. Since it is built on the inner tracks of the disks, its performance is lower than that of the outer tracks. Therefore, instead of placing database files here, you may want to use it for some other purpose, such as ETL files. ETL files need a filesystem, and you can create a database filesystem (DBFS) on this diskgroup – hence the name DBFS_DG. Of course, you can use it for anything you want, even for database files, especially for less frequently accessed objects.
Now that you know the components, look at the next section to get a detailed description of these components.
Detailed Specifications
As of this writing, the current (third) generation of Exadata Database Machine comes in two models (X2-2 and X2-8); various sizes (full rack, half rack, and quarter rack); and three classes of storage (high performance, high capacity SAS, and high capacity SATA). For detailed specifications, please see the configuration specs on the Oracle website: X2-2, X2-8, X2-2 Storage Server.
Summary
In this installment you learned what Exadata is, what different hardware and software components it is made of, what enables its fast performance, and what you should be managing. A summary is provided below. In the next installment, you will learn about command categories and initial commands.
Oracle Exadata Command Reference
Part 2: Command Categories, Configuration, and Basic Commands
Learn different categories of commands and what to do after your new Exadata Database Machine is powered on.
(The purpose of this guide is educational; it is not intended to replace official Oracle-provided manuals or other documentation. The information in this guide is not validated by Oracle, is not supported by Oracle, and should only be used at your own risk.)
In Part 1, you learned about the composition of the Oracle Exadata Database Machine and its various components. Figure 1 shows the different components again and what types of commands are used in each.
Figure 1 Command categories
These are the basic categories of the commands. Of these only CellCLI and DCLI are Exadata specific. The rest, especially SQL, should be very familiar to DBAs.
Now that you know how narrow the scope of the commands is, do you feel a bit more relaxed? In the next sections we will see how these commands are used. (Note: Since CellCLI and DCLI are Exadata-specific commands, most DBAs making the transition to DMA are not expected to know about them. The next installment of the series – Part 3 – focuses on these two command categories exclusively.)
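As a small preview, a single DCLI call can push one CellCLI command to every cell listed in a group file; here the cell_group file (one of the files created during configuration, described later in this installment) is used purely for illustration:
# dcli -g cell_group -l root cellcli -e "list cell attributes name, status"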
Configuration
Let’s start with the most exciting part: Your shiny new Exadata Database Machine is here, uncrated, mounted on the floorboards and connected to power. Now what?
Fortunately, the machine comes pre-imaged with all the necessary OS, software and drivers. There is no reason to tinker with the software installation. In fact, it’s not only unnecessary but dangerous as well, since it may void the warranty. You should not install any software on storage cells at all, and only the following on the database servers themselves:
The following section shows a sample configuration and should be followed as an illustration only. For simplicity, the OS covered here is Oracle Linux.
Configuration Worksheet
Oracle provides a detailed configuration worksheet that allows you to enter the specific details of your implementation and decide on the exact configuration. This worksheet is found on the Exadata storage server in the following directory:
/opt/oracle/cell/doc/doc
The exact file you want to open is e16099.pdf, which contains all the worksheets that guide you through the configuration. Here is an excerpt from the worksheet:
Figure 2 Worksheet excerpt
The configuration worksheet creates the following files in the directory /opt/oracle.SupportTools/onecommand. Here is a listing of that directory:
# ls
all_group cell_group config.dat patches
all_ib_group cell_ib_group dbs_group priv_ib_group
all_nodelist_group checkip.sh dbs_ib_group tmp
These files are very important. Here is a brief description of each file:
Hardware Profile
The next thing to do is to check the hardware profile. Oracle provides a tool for that as well. This is the command you should use:
# /opt/oracle.SupportTools/CheckHWnFWProfile
The output should be:
[SUCCESS] The hardware and firmware profile matches one of the supported profiles
If you see something different here, the message should be self-explanatory. The right thing to do at this point is to call up Exadata installation support since some hardware/software combination is not as expected.
Physical Disks
Next, you should check the disks to make sure they are up and online. Online does not mean they are available to ASM; it simply means the disks are visible to the server. To check the disks are visible and online, use this command:
# /opt/MegaRAID/MegaCli/MegaCli64 Pdlist -aAll |grep "Slot \|Firmware"
Here is truncated output:
Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Online, Spun Up
… Output truncated …
Slot Number: 11
Firmware state: Online, Spun Up
If a disk is not online, you may want to replace it or at least understand the reason.
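For more detail on a suspect disk, CellCLI can filter on the status attribute; an illustrative check is:
CellCLI> list physicaldisk where status != normal detail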
Flash Disks
After checking physical disks you should check flash disks. The Linux command for that is lsscsi, shown below.
# lsscsi |grep -i marvel
[1:0:0:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdm
[1:0:1:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdn
[1:0:2:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdo
[1:0:3:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdp
[2:0:0:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdq
[2:0:1:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdr
[2:0:2:0] disk ATA MARVELL SD88SA02 D20Y /dev/sds
[2:0:3:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdt
[3:0:0:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdu
[3:0:1:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdv
[3:0:2:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdw
[3:0:3:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdx
[4:0:0:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdy
[4:0:1:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdz
[4:0:2:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdaa
[4:0:3:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdab
By the way, you can also check the flash disks from the CellCLI tool. The CellCLI tool is explained in detail in the next installment in this series.
#cellcli
CellCLI: Release 11.2.2.2.0 - Production on Sun Mar 13 12:57:24 EDT 2011
Copyright (c) 2007, 2009, Oracle. All rights reserved.
Cell Efficiency Ratio: 627M
CellCLI> list lun where disktype=flashdisk
1_0 1_0 normal
1_1 1_1 normal
1_2 1_2 normal
1_3 1_3 normal
2_0 2_0 normal
2_1 2_1 normal
2_2 2_2 normal
2_3 2_3 normal
4_0 4_0 normal
4_1 4_1 normal
4_2 4_2 normal
4_3 4_3 normal
5_0 5_0 normal
5_1 5_1 normal
5_2 5_2 normal
5_3 5_3 normal
To make sure the numbering of the flashdisks is correct, use the following command in CellCLI. Note that there is a hyphen (“-“) after the first line, since the command is too long to fit in one line and the “-“ is the continuation character.
CellCLI> list physicaldisk attributes name, id, slotnumber -
> where disktype="flashdisk" and status != "not present"
[1:0:0:0] 5080020000f21a2FMOD0 "PCI Slot: 4; FDOM: 0"
[1:0:1:0] 5080020000f21a2FMOD1 "PCI Slot: 4; FDOM: 1"
[1:0:2:0] 5080020000f21a2FMOD2 "PCI Slot: 4; FDOM: 2"
[1:0:3:0] 5080020000f21a2FMOD3 "PCI Slot: 4; FDOM: 3"
[2:0:0:0] 5080020000f131aFMOD0 "PCI Slot: 1; FDOM: 0"
[2:0:1:0] 5080020000f131aFMOD1 "PCI Slot: 1; FDOM: 1"
[2:0:2:0] 5080020000f131aFMOD2 "PCI Slot: 1; FDOM: 2"
[2:0:3:0] 5080020000f131aFMOD3 "PCI Slot: 1; FDOM: 3"
[3:0:0:0] 5080020000f3ec2FMOD0 "PCI Slot: 5; FDOM: 0"
[3:0:1:0] 5080020000f3ec2FMOD1 "PCI Slot: 5; FDOM: 1"
[3:0:2:0] 5080020000f3ec2FMOD2 "PCI Slot: 5; FDOM: 2"
[3:0:3:0] 5080020000f3ec2FMOD3 "PCI Slot: 5; FDOM: 3"
[4:0:0:0] 5080020000f3e16FMOD0 "PCI Slot: 2; FDOM: 0"
[4:0:1:0] 5080020000f3e16FMOD1 "PCI Slot: 2; FDOM: 1"
[4:0:2:0] 5080020000f3e16FMOD2 "PCI Slot: 2; FDOM: 2"
[4:0:3:0] 5080020000f3e16FMOD3 "PCI Slot: 2; FDOM: 3"
Auto-configuration
While it is possible to configure the Exadata Database Machine manually, you don't need to – and, in fact, you may not want to. Oracle provides three shell scripts for automatic configuration in the directory /opt/oracle.SupportTools/onecommand: checkip.sh, applyconfig.sh, and deploy112.sh. (The exact scripts and steps may change in later versions.)
First, you should check the configuration for validity. To do that execute:
# checkip.sh -m pre_applyconfig
Exadata Database Machine Network Verification version 1.9
Network verification mode pre_applyconfig starting ...
Saving output file from previous run as dbm.out_17739
Using name server xx.xxx.59.21 found in dbm.dat for all DNS lookups
Processing section DOMAIN : SUCCESS
Processing section NAME : SUCCESS
Processing section NTP : SUCCESS
Processing section GATEWAY : SUCCESS
Processing section SCAN : ERROR - see dbm.out for details
Processing section COMPUTE : ERROR - see dbm.out for details
Processing section CELL : ERROR - see dbm.out for details
Processing section ILOM : ERROR - see dbm.out for details
Processing section SWITCH : ERROR - see dbm.out for details
Processing section VIP : ERROR - see dbm.out for details
Processing section SMTP : SMTP "Email Server Settings" standardrelay.acmehotels.com 25:0
SUCCESS
One or more checks report ERROR. Review dbm.out for details
If you check the file dbm.out, you can see the exact error messages.
Running in mode pre_applyconfig
Using name server xx.xxx.59.21 found in dbm.dat for all DNS lookups
Processing section DOMAIN
test.prol
Processing section NAME
GOOD : xx.xxx.59.21 responds to resolve request for proldb01.test.prol
GOOD : xx.xxx.59.22 responds to resolve request for proldb01.test.prol
Processing section NTP
GOOD : xx.xxx.192.1 responds to time server query (/usr/sbin/ntpdate -q)
Processing section GATEWAY
GOOD : xx.xxx.192.1 pings successfully
GOOD : xx.xxx.18.1 pings successfully
Processing section SCAN
GOOD : prol-scan.test.prol resolves to 3 IP addresses
GOOD : prol-scan.test.prol forward resolves to xx.xxx.18.32
GOOD : xx.xxx.18.32 reverse resolves to prol-scan.test.prol.
ERROR : xx.xxx.18.32 pings
GOOD : prol-scan.test.prol forward resolves to xx.xxx.18.33
GOOD : xx.xxx.18.33 reverse resolves to prol-scan.test.prol.
ERROR : xx.xxx.18.33 pings
GOOD : prol-scan.test.prol forward resolves to xx.xxx.18.34
GOOD : xx.xxx.18.34 reverse resolves to prol-scan.test.prol.
ERROR : xx.xxx.18.34 pings
Processing section COMPUTE
GOOD : proldb01.test.prol forward resolves to xx.xxx.192.16
GOOD : xx.xxx.192.16 reverse resolves to proldb01.test.prol.
ERROR : xx.xxx.192.16 pings
GOOD : proldb02.test.prol forward resolves to xx.xxx.192.17
GOOD : xx.xxx.192.17 reverse resolves to proldb02.test.prol.
ERROR : xx.xxx.192.17 pings
GOOD : proldb03.test.prol forward resolves to xx.xxx.192.18
GOOD : xx.xxx.192.18 reverse resolves to proldb03.test.prol.
ERROR : xx.xxx.192.18 pings
… output truncated …
It will report all issues that must be addressed. After addressing all issues, execute the actual configuration:
# applyconfig.sh
After it completes, connect the Exadata Database Machine to your network and check for the validity:
# checkip.sh -m post_applyconfig
It will report the output in the same manner as the pre_applyconfig mode and list any issues that are present. After fixing the issues, run the deployment script. That script actually executes several steps – 29 in all. The most prudent thing to do is to first list all the steps so that you become familiar with them. The option -l (that's the letter "l", not the numeral "1") displays the full list of steps.
# deploy112.sh -l
To run all the steps you should issue
# deploy112.sh -i
If you would prefer, you can run steps one by one, or groups at a time. To run steps 1 through 3, issue:
# deploy112.sh -i -r 1-3
Or, to run only step 1:
# deploy112.sh -i -s 1
The steps are listed here. (Please note: the steps can change without notice. The most up-to-date list will always be found in the release notes that come with an Exadata box.)
Here is the output of the script (amply truncated at places to conserve space):
# ./deploy112.sh -i
Script started, file is /opt/oracle.SupportTools/onecommand/tmp/STEP-0-proldb01-20110331154414.log
Validating first boot...
This step will validate DNS, NTS, params.sh, dbmachine.params, and all the
files generated by the DB Machine Configurator
In Check and Fix Hosts...
INFO: This nslookup could take upto ten seconds to resolve if the host isn't in DNS, please wait..
INFO: Running /usr/bin/nslookup prol-scan...
INFO: Running /usr/bin/nslookup proldb02...
SUCCESS: SCAN and VIP found in DNS...
Looking up nodes in dbmachine.params and dbs_group...
SUCCESS: proldb01 has ip address of xx.xxx.192.16..A_OK
SUCCESS: proldb02 has ip address of xx.xxx.192.17..A_OK
… output truncated …
SUCCESS: proldb08 has ip address of xx.xxx.192.23..A_OK
SUCCESS: prol01-vip has ip address of xx.xxx.18.24..A_OK
SUCCESS: Found IP Address xx.xxx.18.24 for prol01-vip using ping...
SUCCESS: Based on bondeth0:xx.xxx.18.16 and NetMask:255.255.255.0 we picked bondeth0 as the appropriate VIP interface
SUCCESS: prol02-vip has ip address of xx.xxx.18.25..A_OK
SUCCESS: Found IP Address xx.xxx.18.24 for prol01-vip using ping...
SUCCESS: Based on bondeth0:xx.xxx.18.16 and NetMask:255.255.255.0 we picked bondeth0 as the appropriate VIP interface
… output truncated …
SUCCESS: prol08-vip has ip address of xx.xxx.18.31..A_OK
SUCCESS: Found IP Address xx.xxx.18.24 for prol01-vip using ping...
SUCCESS: Based on bondeth0:xx.xxx.18.16 and NetMask:255.255.255.0 we picked bondeth0 as the appropriate VIP interface
Checking blocksizes...
SUCCESS: DB blocksize is 16384 checks out
checking patches
checking patches and version = 11202
SUCCESS: Located patch# 10252487 in /opt/oracle.SupportTools/onecommand/patches...
INFO: Checking zip files
INFO: Validating zip file /opt/oracle.SupportTools/onecommand/p10098816_112020_Linux-x86-64_1of7.zip...
Archive: /opt/oracle.SupportTools/onecommand/p10098816_112020_Linux-x86-64_1of7.zip
Length Date Time Name
-------- ---- ---- ----
0 11-16-10 03:10 database/
0 11-16-10 03:03 database/install/
182 11-16-10 03:03 database/install/detachHome.sh
… output truncated …
41092 11-16-10 03:03 database/doc/install.112/e17212/concepts.htm
1892 11-16-10 03:03 database/doc/install.112/e17212/contents.js
44576 11-16-10 03:03 database/doc/install.112/e17212/crsunix.htm
ERROR: /usr/bin/unzip -l /opt/oracle.SupportTools/onecommand/p10098816_112020_Linux-x86-64_1of7.zip did not complete successfully: Return Status: 80 Step# 1
Exiting...
Time spent in step 1 = 1 seconds
INFO: Going to run /opt/oracle.cellos/ipconf /opt/oracle.SupportTools/onecommand/preconf-11-2-1-2-2.csv -verify -ignoremismatch -verbose to validate first boot...
INFO: Running /opt/oracle.cellos/ipconf -verify -ignoremismatch -verbose on this node...
Verifying of configuration for /opt/oracle.cellos/cell.conf
Config file exists : PASSED
Load configuration : PASSED
Config version defined : PASSED
Config version 11.2.2.1.1 has valid value : PASSED
Nameserver xx.xxx.59.21 has valid IP address syntax : PASSED
Nameserver xx.xxx.59.22 has valid IP address syntax : PASSED
Canonical hostname defined : PASSED
Canonical hostname has valid syntax : PASSED
Node type defined : PASSED
Node type db is valid : PASSED
This node type is db : PASSED
Timezone defined : PASSED
Timezone found in /usr/share/zoneinfo : PASSED
NTP server xx.xxx.192.1 has valid syntax : PASSED
NTP drift file defined : PASSED
Network eth0 interface defined : PASSED
IP address defined for eth0 : PASSED
IP address has valid syntax for eth0 : PASSED
Netmask defined for eth0 : PASSED
Netmask has valid syntax for eth0 : PASSED
Gateway has valid syntax for eth0 : PASSED
Gateway is inside network for eth0 : PASSED
Network type defined for eth0 : PASSED
Network type has proper value for eth0 : PASSED
Hostname defined for eth0 : PASSED
Hostname for eth0 has valid syntax : PASSED
Network bondeth0 interface defined : PASSED
IP address defined for bondeth0 : PASSED
IP address has valid syntax for bondeth0 : PASSED
Netmask defined for bondeth0 : PASSED
Netmask has valid syntax for bondeth0 : PASSED
Gateway has valid syntax for bondeth0 : PASSED
Gateway is inside network for bondeth0 : PASSED
Network type defined for bondeth0 : PASSED
Network type has proper value for bondeth0 : PASSED
Hostname defined for bondeth0 : PASSED
Hostname for bondeth0 has valid syntax : PASSED
Slave interfaces for bondeth0 defined : PASSED
Two slave interfaces for bondeth0 defined : PASSED
Master interface ib0 defined : PASSED
Master interface ib1 defined : PASSED
Network bondib0 interface defined : PASSED
IP address defined for bondib0 : PASSED
IP address has valid syntax for bondib0 : PASSED
Netmask defined for bondib0 : PASSED
Netmask has valid syntax for bondib0 : PASSED
Network type defined for bondib0 : PASSED
Network type has proper value for bondib0 : PASSED
Hostname defined for bondib0 : PASSED
Hostname for bondib0 has valid syntax : PASSED
Slave interfaces for bondib0 defined : PASSED
Two slave interfaces for bondib0 defined : PASSED
At least 1 configured Eth or bond over Eth interface(s) defined : PASSED
2 configured Infiniband interfaces defined : PASSED
1 configured bond over ib interface(s) defined : PASSED
ILOM hostname defined : PASSED
ILOM hostname has valid syntax : PASSED
ILOM short hostname defined : PASSED
ILOM DNS search defined : PASSED
ILOM full hostname matches short hostname and DNS search : PASSED
ILOM IP address defined : PASSED
ILOM IP address has valid syntax : PASSED
ILOM Netmask defined : PASSED
ILOM Netmask has valid syntax : PASSED
ILOM Gateway has valid syntax : PASSED
ILOM Gateway is inside network : PASSED
ILOM nameserver has valid IP address syntax : PASSED
ILOM use NTP servers defined : PASSED
ILOM use NTP has valid syntax : PASSED
ILOM first NTP server has non-empty value : PASSED
ILOM first NTP server has valid syntax : PASSED
ILOM timezone defined : PASSED
Done. Config OK
INFO: Printing group files....
######################################################
This is the list of Database nodes...
proldb01
… output truncated …
proldb08
This is the list of Cell nodes...
prolcel01
… output truncated …
prolcel14
This is the list of Database Private node names...
proldb01-priv
… output truncated …
proldb08-priv
This is the list of Cell Private node names...
prolcel01-priv
… output truncated …
prolcel14-priv
This is the list all node names...
proldb01
… output truncated …
prolcel14
This is the list all private node names...
proldb01-priv
… output truncated …
prolcel14-priv
This is the template /etc/hosts file for private nodes...
### Compute Node Private Interface details
172.32.128.1 proldb01-priv.test.prol proldb01-priv
… output truncated …
172.32.128.8 proldb08-priv.test.prol proldb08-priv
### CELL Node Private Interface details
172.32.128.9 prolcel01-priv.test.prol prolcel01-priv
… output truncated …
172.32.128.22 prolcel14-priv.test.prol prolcel14-priv
### Switch details
# The following 5 IP addresses are for reference only. You may
# not be able to reach these IP addresses from this machine
# xx.xxx.192.60 prolsw-kvm.test.prol prolsw-kvm
# xx.xxx.192.61 prolsw-ip.test.prol prolsw-ip
# xx.xxx.192.62 prolsw-ib1.test.prol prolsw-ib1
# xx.xxx.192.63 prolsw-ib2.test.prol prolsw-ib2
# xx.xxx.192.64 prolsw-ib3.test.prol prolsw-ib3
Creating work directories and validating required files
ERROR: Please review and fix all ERROR's, we appear to have 1 errors...
Exiting...
Time spent in step 0 ValidateThisNodeSetup = 1 seconds
Script done, file is /opt/oracle.SupportTools/onecommand/tmp/STEP-0-proldb01-20110331154414.log
Check post-deployment configuration for IP addresses.
# ./checkip.sh -m post_deploy112
Exadata Database Machine Network Verification version 1.9
Network verification mode post_deploy112 starting ...
Saving output file from previous run as dbm.out_772
Using name server xx.xxx.59.21 found in dbm.dat for all DNS lookups
Processing section DOMAIN : SUCCESS
Processing section NAME : SUCCESS
Processing section NTP : SUCCESS
Processing section GATEWAY : SUCCESS
Processing section SCAN : SUCCESS
Processing section COMPUTE : SUCCESS
Processing section CELL : SUCCESS
Processing section ILOM : SUCCESS
Processing section SWITCH : SUCCESS
Processing section VIP : SUCCESS
Processing section SMTP : SMTP "Email Server Settings" standardrelay.acmehotels.com 25:0
SUCCESS
If everything comes back OK, your installation and configuration were successful.
Basic Commands
Power
Let's start with the very first commands you will need: powering servers on and off. The command for that is IPMITOOL. To power on a cell or a database server, issue this from another server:
# ipmitool -H prolcel01-ilom -U root chassis power on
IPMI – short for Intelligent Platform Management Interface – is a standard that allows remote management of one server from another through a standardized interface. The servers in the Exadata Database Machine support it. It's not an Exadata-specific command but rather a general Linux one. To get all the options available, execute:
# ipmitool -h
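For example, to check the current power state of a cell, or to power it off remotely (the ILOM host name and user shown are illustrative):
# ipmitool -H prolcel01-ilom -U root chassis power status
# ipmitool -H prolcel01-ilom -U root chassis power off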
To stop a server, use the shutdown command. To stop immediately and keep it down, i.e. not reboot, execute:
# shutdown -h -y now
To shut down after 10 minutes (the users will get a warning message)
# shutdown -h -y 10
To reboot the server (the “-r” option is for reboot)
# shutdown -r -y now
Or, a simple:
# reboot
Sometimes you may want to shut down multiple servers; the DCLI command comes in handy at that time. To shut down all the cells, execute the command:
# dcli -l root -g all_cells shutdown -h -y now
The -g option allows you to give a filename containing the names of all the cell servers. For instance, all_cells is a file as shown below:
# cat all_cells
prolcel01
prolcel02
prolcel03
prolcel04
prolcel05
prolcel06
prolcel07
prolcel08
You could use a similar file for all database servers and name it all_nodes. To shut down all database servers:
# dcli -l root -g all_nodes shutdown -h -y now
You will learn the DCLI command in detail in the next installment.
Maintenance
From time to time you will need to maintain the servers. (Remember, you are the DMA now, not the DBA.) One of the most common tasks is installing new software images. Let's see some of the related commands.
To learn what software image is installed, use the following:
# imageinfo
Kernel version: 2.6.18-194.3.1.0.3.el5 #1 SMP Tue Aug 31 22:41:13 EDT 2010 x86_64
Cell version: OSS_11.2.0.3.0_LINUX.X64_101206.2
Cell rpm version: cell-11.2.2.2.0_LINUX.X64_101206.2-1
Active image version: 11.2.2.2.0.101206.2
Active image activated: 2011-01-21 14:09:21 -0800
Active image status: success
Active system partition on device: /dev/md5
Active software partition on device: /dev/md7
In partition rollback: Impossible
Cell boot usb partition: /dev/sdac1
Cell boot usb version: 11.2.2.2.0.101206.2
Inactive image version: undefined
Rollback to the inactive partitions: Impossible
You can glean some important information from the output above. Note the line Active image version: 11.2.2.2.0.101206.2, which indicates the specific Exadata Storage Server version. It also shows the date and time the software image was activated, which can be used to troubleshoot. If you see problems occurring from a specific date and time, you may be able to correlate.
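Since all cells should normally run the same image, a quick way to compare them is to run the check through DCLI using the all_cells group file shown earlier (a sketch; the -ver option prints just the active image version):
# dcli -l root -g all_cells imageinfo -ver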
On the heels of the above, the next logical question could be: if a new image was installed (activated), what was the version before this? To find out the history of all the image changes, you can use the imagehistory command.
# imagehistory
Version : 11.2.2.2.0.101206.2
Image activation date : 2011-01-21 14:09:21 -0800
Imaging mode : fresh
Imaging status : success
This is a fresh install, so you don't see much history.
Managing Infiniband
For the newly minted DMA nothing is as rattling as the networking commands. It’s like being given a stick-shift car when all you have ever driven is an automatic.
As a DBA you probably didn't have to execute anything other than ifconfig and netstat. Well, they still apply, so don't forget them. But let's see how to extend that knowledge to InfiniBand.
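For instance, ifconfig works just as well on the bonded InfiniBand interface of a compute node (the interface name bondib0 matches the configuration output seen earlier):
# ifconfig bondib0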
Status
To get the status of the InfiniBand services, first check the status of the InfiniBand devices using the ibstatus command.
# ibstatus
Infiniband device 'mlx4_0' port 1 status:
default gid: fe80:0000:0000:0000:0021:2800:01a0:fd45
base lid: 0x1a
sm lid: 0xc
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
Infiniband device 'mlx4_0' port 2 status:
default gid: fe80:0000:0000:0000:0021:2800:01a0:fd46
base lid: 0x1c
sm lid: 0xc
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
… output truncated …
If that comes out OK, the next step is to check the status of the InfiniBand links using the iblinkinfo command. Here is a truncated output to save space.
# iblinkinfo
Switch 0x0021286cd6ffa0a0 Sun DCS 36 QDR switch prolsw-ib1.test.prol:
1 1[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
1 2[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
… output truncated …
1 17[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
1 18[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
1 19[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 12 32[ ] "Sun DCS 36 QDR switch localhost" ( )
1 20[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
1 21[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 11 32[ ] "Sun DCS 36 QDR switch prolsw-ib2.test.prol" ( )
… output truncated …
1 36[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
Switch 0x0021286cd6eba0a0 Sun DCS 36 QDR switch localhost:
12 1[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 43 2[ ] "prolcel02 C 172.32.128.10 HCA-1" ( )
… output truncated …
12 11[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
12 12[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 17 2[ ] "proldb04 S 172.32.128.4 HCA-1" ( )
… output truncated …
12 18[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 11 17[ ] "Sun DCS 36 QDR switch prolsw-ib2.test.prol" ( )
12 19[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 20 1[ ] "prolcel13 C 172.32.128.21 HCA-1" ( )
… output truncated …
12 29[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
12 30[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 6 1[ ] "proldb05 S 172.32.128.5 HCA-1" ( )
12 31[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 11 31[ ] "Sun DCS 36 QDR switch prolsw-ib2.test.prol" ( )
12 32[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 1 19[ ] "Sun DCS 36 QDR switch prolsw-ib1.test.prol" ( )
12 33[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
… output truncated …
12 36[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
Switch 0x0021286ccc72a0a0 Sun DCS 36 QDR switch prolsw-ib2.test.prol:
11 1[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 42 1[ ] "prolcel02 C 172.32.128.10 HCA-1" ( )
… output truncated …
11 10[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 14 1[ ] "proldb02 S 172.32.128.2 HCA-1" ( )
11 11[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
… output truncated …
11 28[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 3 2[ ] "proldb07 S 172.32.128.7 HCA-1" ( )
11 29[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
11 30[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 7 2[ ] "proldb05 S 172.32.128.5 HCA-1" ( )
11 31[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 12 31[ ] "Sun DCS 36 QDR switch localhost" ( )
11 32[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 1 21[ ] "Sun DCS 36 QDR switch prolsw-ib1.test.prol" ( )
11 33[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
11 34[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
11 35[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
11 36[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
Topology
To get the topology of the InfiniBand network inside Exadata, use the Oracle-supplied tool verify-topology, available in the directory /opt/oracle.SupportTools/ibdiagtools:
# ./verify-topology
[ DB Machine Infiniband Cabling Topology Verification Tool ]
[Version 11.2.1.3.b]
Looking at 1 rack(s).....
Spine switch check: Are any Exadata nodes connected ..............[SUCCESS]
Spine switch check: Any inter spine switch connections............[SUCCESS]
Spine switch check: Correct number of spine-leaf links............[SUCCESS]
Leaf switch check: Inter-leaf link check..........................[SUCCESS]
Leaf switch check: Correct number of leaf-spine connections.......[SUCCESS]
Check if all hosts have 2 CAs to different switches...............[SUCCESS]
Leaf switch check: cardinality and even distribution..............[SUCCESS]
Cluster Operations
To manage the Oracle Clusterware you use the same commands as you would in a traditional Oracle 11g Release 2 RAC database cluster: CRSCTL and SRVCTL.
CRSCTL is not used much, but you need it on some occasions – mostly to shut down the cluster and to start it up (if it is not started automatically during machine startup). Remember, you have to be root to issue this command. However, the root user may not have the location of this tool in its path, so you should use its fully qualified path while issuing the command. Here is the command to stop the cluster on all nodes:
# <OracleGridInfrastructureHome>/bin/crsctl stop cluster -all
You don't need to shut down the cluster on all nodes; sometimes all you need is to shut down the cluster on only one node. To shut down the cluster on one node alone, use:
# <OracleGridInfrastructureHome>/bin/crsctl stop cluster -n <HostName>
Similarly, to start the cluster on a node where it was previously stopped:
# <OracleGridInfrastructureHome>/bin/crsctl start cluster -n <HostName>
Finally, you may want to make sure all the cluster resources are running. Here is the command for that. The status command does not need to be issued by root.
# <OracleGridInfrastructureHome>/bin/crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DBFS_DG.dg
ONLINE ONLINE proldb01
ONLINE ONLINE proldb02
ONLINE ONLINE proldb03
ONLINE ONLINE proldb04
ONLINE ONLINE proldb05
ONLINE ONLINE proldb06
ONLINE ONLINE proldb07
ONLINE ONLINE proldb08
ora.PRODATA.dg
ONLINE ONLINE proldb01
ONLINE ONLINE proldb02
ONLINE ONLINE proldb03
ONLINE ONLINE proldb04
ONLINE ONLINE proldb05
ONLINE ONLINE proldb06
ONLINE ONLINE proldb07
ONLINE ONLINE proldb08
ora.PRORECO.dg
ONLINE ONLINE proldb01
ONLINE ONLINE proldb02
ONLINE ONLINE proldb03
ONLINE ONLINE proldb04
ONLINE ONLINE proldb05
ONLINE ONLINE proldb06
ONLINE ONLINE proldb07
ONLINE ONLINE proldb08
ora.LISTENER.lsnr
ONLINE ONLINE proldb01
ONLINE ONLINE proldb02
ONLINE ONLINE proldb03
ONLINE ONLINE proldb04
ONLINE ONLINE proldb05
ONLINE ONLINE proldb06
ONLINE ONLINE proldb07
ONLINE ONLINE proldb08
ora.asm
ONLINE ONLINE proldb01 Started
ONLINE ONLINE proldb02 Started
ONLINE ONLINE proldb03 Started
ONLINE ONLINE proldb04 Started
ONLINE ONLINE proldb05 Started
ONLINE ONLINE proldb06 Started
ONLINE ONLINE proldb07 Started
ONLINE ONLINE proldb08
ora.gsd
OFFLINE OFFLINE proldb01
OFFLINE OFFLINE proldb02
OFFLINE OFFLINE proldb03
OFFLINE OFFLINE proldb04
OFFLINE OFFLINE proldb05
OFFLINE OFFLINE proldb06
OFFLINE OFFLINE proldb07
OFFLINE OFFLINE proldb08
ora.net1.network
ONLINE ONLINE proldb01
ONLINE ONLINE proldb02
ONLINE ONLINE proldb03
ONLINE ONLINE proldb04
ONLINE ONLINE proldb05
ONLINE ONLINE proldb06
ONLINE ONLINE proldb07
ONLINE ONLINE proldb08
ora.ons
ONLINE ONLINE proldb01
ONLINE ONLINE proldb02
ONLINE ONLINE proldb03
ONLINE ONLINE proldb04
ONLINE ONLINE proldb05
ONLINE ONLINE proldb06
ONLINE ONLINE proldb07
ONLINE ONLINE proldb08
ora.registry.acfs
ONLINE ONLINE proldb01
ONLINE ONLINE proldb02
ONLINE ONLINE proldb03
ONLINE ONLINE proldb04
ONLINE ONLINE proldb05
ONLINE ONLINE proldb06
ONLINE ONLINE proldb07
ONLINE ONLINE proldb08
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE proldb07
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE proldb02
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE proldb05
ora.cvu
1 ONLINE ONLINE proldb02
ora.proldb01.vip
1 ONLINE ONLINE proldb01
ora.proldb02.vip
1 ONLINE ONLINE proldb02
ora.proldb03.vip
1 ONLINE ONLINE proldb03
ora.proldb04.vip
1 ONLINE ONLINE proldb04
ora.proldb05.vip
1 ONLINE ONLINE proldb05
ora.proldb06.vip
1 ONLINE ONLINE proldb06
ora.proldb07.vip
1 ONLINE ONLINE proldb07
ora.proldb08.vip
1 ONLINE ONLINE proldb08
ora.prolrd.db
1 ONLINE ONLINE proldb01 Open
2 ONLINE ONLINE proldb02 Open
3 ONLINE ONLINE proldb03 Open
4 ONLINE ONLINE proldb04 Open
5 ONLINE ONLINE proldb05 Open
6 ONLINE ONLINE proldb06 Open
7 ONLINE ONLINE proldb07 Open
8 ONLINE ONLINE proldb08 Open
ora.oc4j
1 ONLINE ONLINE proldb01
ora.scan1.vip
1 ONLINE ONLINE proldb07
ora.scan2.vip
1 ONLINE ONLINE proldb02
ora.scan3.vip
1 ONLINE ONLINE proldb05
This output clearly shows the status of the various resources. A complete explanation of all the CRSCTL options is beyond the scope of this article, but an abbreviated list follows. To learn the exact parameters required for each operation, simply call it with the -h option. For instance, to learn about the backup option, execute:
# crsctl backup -h
Usage:
crsctl backup css votedisk
Backup the voting disk.
Here is the list of the options for CRSCTL:
crsctl add - add a resource, type or other entity
crsctl backup - back up voting disk for CSS
crsctl check - check a service, resource or other entity
crsctl config - output autostart configuration
crsctl debug - obtain or modify debug state
crsctl delete - delete a resource, type or other entity
crsctl disable - disable autostart
crsctl discover - discover DHCP server
crsctl enable - enable autostart
crsctl get - get an entity value
crsctl getperm - get entity permissions
crsctl lsmodules - list debug modules
crsctl modify - modify a resource, type or other entity
crsctl query - query service state
crsctl pin - Pin the nodes in the nodelist
crsctl relocate - relocate a resource, server or other entity
crsctl replace - replaces the location of voting files
crsctl release - release a DHCP lease
crsctl request - request a DHCP lease
crsctl setperm - set entity permissions
crsctl set - set an entity value
crsctl start - start a resource, server or other entity
crsctl status - get status of a resource or other entity
crsctl stop - stop a resource, server or other entity
crsctl unpin - unpin the nodes in the nodelist
crsctl unset - unset a entity value, restoring its default
Another command, SRVCTL, performs most of the server-based operations, including resource (such as service) relocation. This is no different from the tool on a traditional Oracle RAC 11g Release 2 cluster. To learn more about the options in this tool, execute this command:
# srvctl -h
Usage: srvctl [-V]
Usage: srvctl add database -d <db_unique_name> -o <oracle_home> [-c {RACONENODE | RAC | SINGLE}
[-e <server_list>] [-i <instname>] [-w <timeout>]] [-m <domain_name>] [-p <spfile>] [-r {PRIMARY | PHYSICAL_STANDBY | LOGICAL_STANDBY | SNAPSHOT_STANDBY}]
[-s <start_options>] [-t <stop_options>] [-n <db_name>] [-y {AUTOMATIC | MANUAL}] [-g "<serverpool_list>"] [-x <node_name>] [-a "<diskgroup_list>"]
[-j "<acfs_path_list>"]
Usage: srvctl config database [-d <db_unique_name> [-a] ] [-v]
Usage: srvctl start database -d <db_unique_name> [-o <start_options>] [-n <node>]
Usage: srvctl stop database -d <db_unique_name> [-o <stop_options>] [-f]
Usage: srvctl status database -d <db_unique_name> [-f] [-v]
… output truncated …
IPMI Tool
Earlier in this article you saw a reference to the IPMI tool. We used it to power the servers on. But that is not the only thing you can do with this tool; there are plenty more options. If you want to find out what options are available, issue the command without any arguments.
# ipmitool
No command provided!
Commands:
raw Send a RAW IPMI request and print response
i2c Send an I2C Master Write-Read command and print response
spd Print SPD info from remote I2C device
lan Configure LAN Channels
chassis Get chassis status and set power state
power Shortcut to chassis power commands
event Send pre-defined events to MC
mc Management Controller status and global enables
sdr Print Sensor Data Repository entries and readings
sensor Print detailed sensor information
fru Print built-in FRU and scan SDR for FRU locators
sel Print System Event Log (SEL)
pef Configure Platform Event Filtering (PEF)
sol Configure and connect IPMIv2.0 Serial-over-LAN
tsol Configure and connect with Tyan IPMIv1.5 Serial-over-LAN
isol Configure IPMIv1.5 Serial-over-LAN
user Configure Management Controller users
channel Configure Management Controller channels
session Print session information
sunoem OEM Commands for Sun servers
kontronoem OEM Commands for Kontron devices
picmg Run a PICMG/ATCA extended cmd
fwum Update IPMC using Kontron OEM Firmware Update Manager
firewall Configure Firmware Firewall
shell Launch interactive IPMI shell
exec Run list of commands from file
set Set runtime variable for shell and exec
hpm Update HPM components using PICMG HPM.1 file
It's not possible to explain each option here, so let's examine one of the most used ones. The sel option shows the System Event Log, one of the key things you will need to check.
# ipmitool sel
SEL Information
Version : 2.0 (v1.5, v2 compliant)
Entries : 96
Free Space : 14634 bytes
Percent Used : 9%
Last Add Time : 02/27/2011 20:23:44
Last Del Time : Not Available
Overflow : false
Supported Cmds : 'Reserve' 'Get Alloc Info'
# of Alloc Units : 909
Alloc Unit Size : 18
# Free Units : 813
Largest Free Blk : 813
Max Record Size : 18
The output is a summary only. To see the details of the event log, you can use an additional parameter: list.
# ipmitool sel list
1 | 01/21/2011 | 07:05:39 | System ACPI Power State #0x26 | S5/G2: soft-off | Asserted
2 | 01/21/2011 | 08:59:43 | System Boot Initiated | System Restart | Asserted
3 | 01/21/2011 | 08:59:44 | Entity Presence #0x54 | Device Present
4 | 01/21/2011 | 08:59:44 | System Boot Initiated | Initiated by hard reset | Asserted
5 | 01/21/2011 | 08:59:44 | System Firmware Progress | Memory initialization | Asserted
6 | 01/21/2011 | 08:59:44 | System Firmware Progress | Primary CPU initialization | Asserted
7 | 01/21/2011 | 08:59:49 | Entity Presence #0x58 | Device Present
8 | 01/21/2011 | 08:59:52 | Entity Presence #0x57 | Device Present
9 | 01/21/2011 | 08:59:53 | System Boot Initiated | Initiated by warm reset | Asserted
a | 01/21/2011 | 08:59:53 | System Firmware Progress | Memory initialization | Asserted
b | 01/21/2011 | 08:59:53 | System Firmware Progress | Primary CPU initialization | Asserted
c | 01/21/2011 | 08:59:54 | System Boot Initiated | Initiated by warm reset | Asserted
d | 01/21/2011 | 08:59:55 | System Firmware Progress | Memory initialization | Asserted
e | 01/21/2011 | 08:59:55 | System Firmware Progress | Primary CPU initialization | Asserted
f | 01/21/2011 | 09:00:01 | Entity Presence #0x55 | Device Present
... truncated ...
The output has been shown partially to conserve space. This is one of the key commands you should be aware of. In a troubleshooting episode, you should check the system event log to make sure no components have failed. If they have, of course, you would have to replace them before going further. If you get a clean bill of health from IPMITOOL, you should go to the next step of making sure there are no issues with the cluster, then no issues with the RAC database, and so on.
Oracle Exadata Command Reference
Part 1: Jumpstarting on Exadata
Know your Oracle Exadata Database Machine and understand the building blocks where commands will be applied.
(Note: The purpose of this guide is educational; it is not intended to replace official Oracle-provided manuals or other documentation. The information in this guide is not validated by Oracle, is not supported by Oracle, and should only be used at your own risk.)
Let's begin with a whirlwind tour of the Oracle Exadata Database Machine. It comes in a rack with the components that make up a database infrastructure: disks, servers, networking gear, and so on. Three configuration types are available: full rack (see below), half rack, or quarter rack. The architecture is identical across all three types but the number of components differs.
Figure 1 Exadata Components, high-level view, at time of writing
Now let's dive into each of these components and the role they play. The following list applies to a full rack; you can also view them contextually via a really neat 3D demo.
- Database Nodes – The Exadata Database Machine runs Oracle Database 11g Real Application Cluster. The cluster and the database run on the servers known as database nodes or compute nodes (or simply “nodes”). A full rack has 8 nodes running Oracle Linux or Oracle Solaris.
- Storage cells - The disks are not attached to the database compute nodes, as is normally the case with the direct attached storage, but rather to a different server known as the storage cell (or just “cell”; there are 14 of them in a full rack). The Oracle Exadata Server Software runs in these cells on top of the OS.
- Disks – each cell has 12 disks. Depending on the configuration, these disks are either 600GB high performance or 2TB high capacity (GB here means 1 billion bytes, not 1024MB). You have a choice in the disk type while making the purchase.
- Flash disks – each cell also has 384GB of flash disks. These disks can be presented to the compute nodes as storage (to be used by the database) or used a secondary cache for the database cluster (called smart cache).
- Infiniband circuitry – the cells and nodes are connected through infiniband for speed and low latency. There are 3 infiniband switches for redundancy and throughput. Note: there are no fiber switches since there is no fiber component.
- Ethernet switch – the outside world can communicate via infiniband, or by Ethernet. There is a set of Ethernet switches with ports open to the outside. The clients may connect to the nodes using Ethernet. DMAs and others connect to the nodes and cells using Ethernet as well. Backups are preferably via infiniband but they can be done through network as well.
- KVM switch – there is a keyboard, video, and mouse switch to get direct access to the nodes and cells physically. This is used initially while setting up and when the network to the system is not available. In a normal environment you will not need to go near the rack and access this KVM, not even for powering on and off the cells and nodes. Why not? You’ll learn why in the next installment. (Not all models have a KVM switch.)
The nodes run the Oracle Clusterware, the ASM instances, and the database instances. You may decide to create just one cluster or multiple ones. Similarly you may decide to create a single database on the cluster or multiple ones. If you were to create three databases – dev, int and QA - you would have two choices:
- One cluster – create one cluster and create the three databases
- Three clusters – create three different clusters and one database in each of them
The first option allows you to add and remove instances of a database easily. For instance, with 8 nodes in a full rack, you may assign 2 nodes to dev, 2 to int, and 4 to QA. Suppose a full-fledged production stress test is planned and that temporarily needs all 8 nodes in QA to match 8 nodes in production. In this configuration, all you have to do is shut down the dev and int instances and start the other four instances of QA on those nodes. Once the stress test is complete, you can shut down those 4 QA instances and restart the dev and int instances on them.
If you run multiple production databases on a single rack of Exadata, you can still take advantage of this technique. If a specific database needs additional computing power temporarily to ride out a seasonal high demand, just shut down one instance of a different database and restart the instance of the more demanding one in that node. After the demand has waned, you can reverse the situation. You can also run two instances in the same node but they will compete for the resources – something you may not want. At the I/O level, you can control the resource usage by the instances using the IO Resource Manager (IORM).
On the other hand, with this option, you are still on just one cluster. When you upgrade the cluster, all the databases will need to be upgraded. The second option obviates that; there are individual clusters for each database – a complete separation. You can upgrade them or manipulate them any way you want without affecting the others. However, when you need additional computational power for other nodes, you can’t just start up an instance. You need to remove a node from that cluster and add the node to the other cluster where it is needed – an activity more complex compared to the simple shutdown and startup of instances.
Since the cells have the disks, how do the database compute nodes access them - or more specifically, how do the ASM instances running on the compute nodes access the disks? Well, the disks are presented to cells only, not to the compute nodes. The compute nodes see the disks through the cells. For the lack of a better analogy, this is akin to network-attached storage. (Please note, the cell disks are not presented as NAS; this is just an analogy.)
The flash disks are presented to the cell as storage devices as well, just like the normal disks. As a result, they can either be added to the pool of ASM disks used by the database for ultra-fast access, or be used to create the Smart Flash Cache layer – a secondary cache between the database buffer cache and the storage. This layer caches the most heavily used data, but it does not follow the same algorithm as the database buffer cache, where everything is cached first before being sent to the end user. The Smart Flash Cache caches only those data items that are accessed frequently – hence the term “smart” in the name. Requests for data not found in the Smart Flash Cache are routed to the disks automatically.
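To make this concrete, here is a minimal, illustrative sketch (not a prescribed procedure) of how the flash cache on a cell can be examined and created from CellCLI; the size shown is just a placeholder value, and in practice the installation scripts normally create the flash cache for you:
CellCLI> list flashcache detail
CellCLI> create flashcache all size=300g
You can also reserve part of the flash for grid disks (to be used as ASM storage) and give only the remainder to the flash cache – a trade-off between permanent ultra-fast storage and caching.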
The Secret Sauce: Exadata Storage Server
So, you may be wondering, what’s the “secret sauce” for the Exadata Database Machine’s amazing performance? A suite of software known as Exadata Storage Server, which runs on the storage cells, is the primary reason behind that performance. In this section we will go over the components of the storage server very briefly (not a substitute for documentation!).
Cell Offloading
The storage in the Exadata Database Machine is not just dumb storage. The storage cells are intelligent enough to process some workload inside them, saving the database nodes from that work. This process is referred to as cell offloading. The exact nature of the offloaded activity is discussed in the following section.
Smart Scan
In a traditional Oracle database, when a user selects a row or even a single column in a row, the entire block containing that row is fetched from the disk to the buffer cache, and the selected row (or column, as the case may be) is then extracted from the block and presented to the user’s session. In the Exadata Database Machine, this process holds true for most types of access, except a very important few. Direct path accesses – for instance, full table scans and full index scans – are done differently. The Exadata Database Machine can pull the specific rows (or columns) from the disks directly and send them to the database nodes. This functionality is known as Smart Scan. It results in huge savings in I/O.
For instance, your query might need only 1,000 rows out of 1 billion, but a full table scan in a traditional database retrieves all the blocks and filters the rows from them. Smart Scan, on the other hand, will extract only those 1,000 rows (or even specific columns from those rows, if only those are requested) – potentially cutting the data sent to the database nodes by a factor of a million. The cell offloading enables the cells to accomplish this.
Not all queries can take advantage of Smart Scan; direct path reads can. An example of such a query is a full table scan. An index scan, on the other hand, looks into the index blocks first and then the table blocks – so Smart Scan is not used.
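If you want to verify whether Smart Scan is actually being used by your workload, one common, illustrative check is to look at the cell-related statistics exposed by the database; the two statistics below exist on Exadata-enabled 11.2 databases and show how much I/O was eligible for offload and how much data came back via Smart Scan:
select name, value
from v$sysstat
where name in ('cell physical IO bytes eligible for predicate offload',
               'cell physical IO interconnect bytes returned by smart scan');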
iDB
How can storage cells know what columns and rows to filter from the data? This is done by another component built into the storage software. The communication between nodes and cells employs a specially developed protocol called iDB (short for Intelligent Database). This protocol not only requests the blocks (as happens in an I/O call in a traditional database) but can optionally send other relevant information. In those cases where Smart Scan is possible, iDB sends the names of the tables, the columns, the predicates, and other relevant information about the query. This allows the cell to learn a lot more about the query instead of just the addresses of the blocks to retrieve. Similarly, the cells can send back the row and column data instead of traditional Oracle blocks using iDB.
Storage Indexes
How does Smart Scan manage to send only the relevant rows and columns instead of whole blocks? A special data structure built on the pattern of the data within the storage cells enables this. For each region of the disk, usually 1MB in size, it stores the minimum value, the maximum value, and whether nulls are present for the columns of the segment stored there. This data structure is called a storage index. When a cell gets a Smart Scan-enabled query from the database node via iDB, it checks which regions of storage cannot contain the data. For instance, if the query predicate states where rating = 3, a region of the disk where the minimum and maximum values of the column RATING are 4 and 10 respectively will definitely not have any row that matches the predicate. Therefore the cell skips reading that portion of the disk. By checking the storage index, the cell excludes a lot of regions that cannot contain the value and therefore saves a lot of I/O.
Although it has the word “index” in its name, a storage index is nothing like a normal index. Normal indexes are used to zero in on the locations where the rows are most likely to be found; storage indexes are used just for the opposite reason – where the rows are most likely not to be found. Also, unlike other segments, these are not stored on the disks; they reside in memory.
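As a quick, hedged illustration, the I/O avoided thanks to storage indexes is visible as a session or system statistic; the query below, run from a session on a database node, reports the bytes your session did not have to read because storage index regions were skipped:
select s.name, m.value
from v$statname s, v$mystat m
where s.statistic# = m.statistic#
and s.name = 'cell physical IO bytes saved by storage index';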
Smart Cache
The database buffer cache is where data blocks land before being shipped to the end user. If the data is found there, a trip to the storage is saved. However, if it is not found – which may well be the case with large databases – the I/O to storage is inevitable. In the Exadata Database Machine, a secondary cache, the Smart Flash Cache, can sit between the database buffer cache and the storage. This smart cache holds frequently accessed data and may satisfy a request from the database node from the cache instead of going to the disks – improving performance.
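If you decide that a particular segment should be kept in the Smart Flash Cache aggressively, the segment storage clause offers a CELL_FLASH_CACHE attribute; the table name below is purely hypothetical:
alter table sales storage (cell_flash_cache keep);
Setting the attribute back to DEFAULT returns the segment to the normal caching behavior.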
Infiniband Network
This is the network inside the Exadata Database Machine – the nervous system of the machine through which the different components, such as the database nodes and storage cells, communicate. InfiniBand is the hardware medium, running a protocol called RDS (Reliable Datagram Sockets) that offers high bandwidth and low latency – making the transfer of data extremely fast.
Disk Layout
The disk layout needs some additional explanation because that’s where most of the activities occur. As I mentioned previously, the disks are attached to the storage cells and presented as logical units (LUNs), on which physical volumes are built.
Each cell has 12 physical disks. In a high capacity configuration they are about 2TB and in a high performance configuration, they are about 600GB each. The disks are used for the database storage. Two of the 12 disks are also used for the home directory and other Linux operating system files. These two disks are divided into different partitions as shown in Figure 2 below.
Figure 2 Disk Layout
The physical disks are divided into multiple partitions. Each partition is then presented as a LUN to the cell. Some LUNs are used to create a filesystem for the OS. The others are presented as storage to the cell; these are called cell disks. The cell disks are further divided into grid disks, named ostensibly after the grid infrastructure in which the disks are used. These grid disks are used to build ASM diskgroups, i.e. they are used as ASM disks. An ASM diskgroup is made up of several ASM disks from multiple storage cells. If the diskgroup is built with normal or high redundancy (which is the usual case), the failure groups are placed in different cells. As a result, if one cell fails, the data is still available on the other cells. Finally, the database is built on these diskgroups.
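A short sketch of how this hierarchy looks from the command line may help; the commands are for illustration only, assuming standard CellCLI attribute names, and the ASM query is run from a database node:
CellCLI> list celldisk attributes name, size
CellCLI> list griddisk attributes name, cellDisk, size
-- on the ASM instance, the grid disks appear as ASM disks
select name, path from v$asm_disk;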
These diskgroups are created with the following attributes by default:
Parameter | Description | Value |
_._DIRVERSION | The minimum allowed version for directories | 11.2.0.2.0 |
COMPATIBLE.ASM | The maximum ASM version whose features can use this diskgroup. For instance ASM Volume Management is available in 11.2 only. If this parameter is set to 11.1, then this diskgroup can’t be used for an ASM volume. | 11.2.0.2.0 |
IDP.TYPE | Intelligent Data Placement, a feature of ASM that allows placing data in such a way that more frequently accessed data is located close to the periphery of the disk where the access is faster. | dynamic |
CELL.SMART_SCAN_CAPABLE | Can this diskgroup be enabled for Exadata Storage Server’s Smart Scan Capability? | TRUE |
COMPATIBLE | The minimum version of the database that can be created on this diskgroup. The farther back you go in version number, the more message passing occurs between the RDBMS and ASM instances, causing performance issues. So, unless you plan to create a pre-11.2 database here (which you most likely do not plan on), leave it as it is. | 11.2.0.2 |
AU Size | The size of an Allocation Unit on this diskgroup. The AU is the smallest addressable unit in the diskgroup. |
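If you want to confirm these attribute values on a running system, a simple illustrative query against the ASM instance lists them per diskgroup:
select g.name diskgroup, a.name attribute, a.value
from v$asm_diskgroup g, v$asm_attribute a
where g.group_number = a.group_number
order by 1, 2;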
On two of the 12 disks, the operating system, the Oracle Exadata Storage Server software, and other OS-related filesystems such as /home are located. This area occupies about 29GB on each disk. For protection, it is mirrored as RAID1 with the matching area on the other disk, and the filesystems are mounted on that RAID1 volume.
However, this leaves those two cell disks with less space than the other ten. If we create an ASM diskgroup across all 12 disks, it will have an imbalance on those two disks. Therefore, you (or whoever is doing the installation) should create another diskgroup from matching 29GB portions of the other 10 cell disks. This leaves same-sized ASM disks for the other diskgroups. This “compensatory” diskgroup is usually named DBFS_DG. Since it is built on the inner tracks of the disks, its performance is lower than that of the outer tracks. Therefore, instead of placing database files here, you may want to use it for some other purpose, such as ETL files. ETL files need a filesystem, and you can create a database filesystem (DBFS) on this diskgroup – hence the name DBFS_DG. Of course, you can use it for anything you want, even database files, especially for less frequently accessed objects.
Now that you know the components, look at the next section to get a detailed description of these components.
Detailed Specifications
As of this writing, the current (third) generation of Exadata Database Machine comes in two models (X2-2 and X2-8); various sizes (full rack, half rack, and quarter rack); and three classes of storage (high performance, high capacity SAS, and high capacity SATA). For detailed specifications, please see the configuration specs on the Oracle website: X2-2, X2-8, X2-2 Storage Server.
Summary
In this installment you learned what Exadata is, what different hardware and software components it is made of, what enables its fast performance, and what you should be managing. A summary is provided below. In the next installment, you will learn about command categories and initial commands.
Term | Description |
Cell Offloading | The ability of the storage cells to execute some part of the processing of a query, and in the process filter the unnecessary data at the storage level. |
Smart Scan | The feature that allows the cells to search only the relevant portions of storage for the requested data – not all of it – and return just the needed rows and columns. |
iDB | Intelligent Database protocol that allows database nodes to pass along information on the query, e.g. the predicate. It enables Smart Scan. |
Node | Also known as Database Node or Compute Node. This is where the database, ASM and clusterware are run. The clients connect to this. Runs Oracle Enterprise Linux. |
Cell | Also known as Storage Cells, which run the Exadata Storage Server software. The disks for database are attached to this. Runs Oracle Enterprise Linux. |
Smart Flash | Flash memory based storage to be used as a disk, or as a secondary cache for frequently accessed segments to reduce disk access. |
Oracle Exadata Command Reference
Part 2: Command Categories, Configuration, and Basic Commands
Learn different categories of commands and what to do after your new Exadata Database Machine is powered on.
(The purpose of this guide is educational; it is not intended to replace official Oracle-provided manuals or other documentation. The information in this guide is not validated by Oracle, is not supported by Oracle, and should only be used at your own risk.)
In Part 1, you learned about the composition of the Oracle Exadata Database Machine and its various components. Figure 1 shows the different components again and what types of commands are used in each.
Figure 1 Command categories
- Linux commands - Let’s start with the lowest-level component – the physical disk. The physical disk, as you learned from the previous installment, is the actual disk drive. It has to be partitioned to be used for ASM and regular filesystem. Normal disk management commands come here, e.g. fdisk. The storage cells are Linux servers; so all the regular Unix server administration tasks – shutdown, ps, etc., are relevant here. (For a refresher on Linux commands, you can check out my five-part series on advanced Linux commands.)
- CellCLI - Let’s move on to the next layer in the software stack: the Exadata Storage Server. To manage this, Oracle provides a command-line tool: CellCLI (Cell Command Line Interpreter). All the cell-related commands are entered through CellCLI.
- DCLI - The scope of a CellCLI command is the cell where it is run, not other cells. Sometimes you may want to execute a command across multiple cells from one command prompt, e.g. shutting down multiple nodes. There is another command-line tool for that: DCLI.
- SQL – Once the cell disks are made available to the database nodes, the rest of the work is similar to what happens in a typical Oracle RAC database, in the language you use every day: SQL. SQL*Plus is an interface many DBAs use. You can also use other interfaces such as Oracle SQL Developer. If you have Grid Control, there are lots of commands you don’t even need to remember; they will be GUI based.
- ASMCMD – this is the command-line interface for managing ASM resources like diskgroups, backups, etc.
- SRVCTL – SRVCTL is a command-line interface to manage Oracle Database 11.2 RAC clusters. At the database level, most of the commands related to the cluster – e.g. starting/stopping cluster resources, checking status, etc. – can be done through this interface.
- CRSCTL – CRSCTL is another tool to manage clusters. As of 11.2, the need to use this tool has dwindled to near zero. But there is at least one command in this category.
These are the basic categories of the commands. Of these only CellCLI and DCLI are Exadata specific. The rest, especially SQL, should be very familiar to DBAs.
Now that you know how narrow the scope of the commands is, do you feel a bit more relaxed? In the next sections we will see how these commands are used. (Note: Since CellCLI and DCLI are Exadata-specific commands, most DBAs making the transition to DMA are not expected to know about them. The next installment of the series – Part 3 – focuses on these two command categories exclusively.)
Configuration
Let’s start with the most exciting part: Your shiny new Exadata Database Machine is here, uncrated, mounted on the floorboards and connected to power. Now what?
Fortunately, the machine comes pre-imaged with all the necessary OS, software and drivers. There is no reason to tinker with the software installation. In fact, it’s not only unnecessary but dangerous as well, since it may void the warranty. You should not install any software on storage cells at all, and only the following on the database servers themselves:
- Grid Control Agent (required for management through Grid Control, explained in Part 4)
- RMAN Media Management Library (to back up to tape)
- Security Agent (if needed)
While the machine arrives with the software stack in place, there are several environment-specific decisions you need to make up front:
- Network – you should decide what names you will use for the servers, decide on IP addresses, have them in DNS, etc.
- SMTP and SNMP information - for sending mails, alerts, etc.
- Storage layout to address your specific requirements – for instance do you want Normal or High Redundancy, how many diskgroups do you want, what do you want to name them, etc.?
With those inputs in hand, the configuration itself proceeds in the following broad steps:
- 1. Storage configuration
- 2. OS configuration
- 3. Creation of userids in Linux or Oracle Solaris
- 4. ASM configuration
- 5. Clusterware installation
- 6. Database creation
The following section shows a sample configuration and should be followed as an illustration only. For simplicity, the OS covered here is Oracle Linux.
Configuration Worksheet
Oracle provides a detailed configuration worksheet that allows you to enter the specific details of your implementation and decide on the exact configuration. This worksheet is found on the Exadata storage server in the following directory:
opt/oracle/cell/doc/doc
The exact file you want to open is e16099.pdf, which has all the worksheets to guide you through the configuration. Here is an excerpt from the worksheet:
Figure 2 Worksheet excerpt
The configuration worksheet creates the following files in the directory /opt/oracle.SupportTools/onecommand. Here is a listing of that directory:
# ls
all_group cell_group config.dat patches
all_ib_group cell_ib_group dbs_group priv_ib_group
all_nodelist_group checkip.sh dbs_ib_group tmp
These files are very important. Here is a brief description of each file:
File Name | Description |
all_group | List of database nodes and storage cells in this Exadata Database Machine. Here is an excerpt: proldb01 proldb02 proldb03 proldb04 These are the database server nodes. |
all_ib_group | All host names of the private interconnects, both of cell servers and database nodes. Here is an excerpt from this file: proldb01-priv proldb02-priv proldb03-priv proldb04-priv proldb05-priv |
all_nodelist_group | All host names – public, hosts, private interconnects – of both storage and database nodes. Here is an excerpt from this file: proldb07 proldb08 prolcel01 prolcel02 prolcel03 |
cell_group | Host names of all cell servers. Here is an excerpt from this file: prolcel01 prolcel02 prolcel03 prolcel04 prolcel05 |
cell_ib_group | Hostnames of private interconnects of all cell servers. Here is an excerpt from this file: prolcel01-priv prolcel02-priv prolcel03-priv prolcel04-priv prolcel05-priv |
config.dat | The data file that is created from the configuration worksheet and is used to create the various scripts. Here is an excerpt from this file: customername=AcmeBank dbmprefix=prol cnbase=db cellbase=cel machinemodel=X2-2 Full rack dbnodecount=8 cellnodecount=14 |
dbs_group | Hostnames of the database nodes, similar to the cell servers. Here is an excerpt from the file: proldb01 proldb02 proldb03 proldb04 |
dbs_ib_group | Hostnames of private interconnects of the database nodes, similar to the cell servers. Here is an excerpt from the file: proldb01-priv proldb02-priv proldb03-priv proldb04-priv |
priv_ib_group | All private interconnect hostnames and their corresponding IP addresses are listed in this file. This is used to populate /etc/hosts file. Here is an excerpt from the file: ### Compute Node Private Interface details 172.32.128.1 proldb01-priv.test.prol proldb01-priv 172.32.128.2 proldb02-priv.test.prol proldb02-priv 172.32.128.3 proldb03-priv.test.prol proldb03-priv 172.32.128.4 proldb04-priv.test.prol proldb04-priv |
checkip.sh | This is a shell script that validates the accuracy of the network configuration; it is one of the most important files. The checkip script is called at multiple stages with different parameters, as you will see, to perform validation. |
Hardware Profile
The next thing to do is to check the hardware profile. Oracle provides a tool for that as well. This is the command you should use:
# /opt/oracle.SupportTools/CheckHWnFWProfile
The output should be:
[SUCCESS] The hardware and firmware profile matches one of the supported profiles
If you see something different here, the message should be self-explanatory. The right thing to do at this point is to call up Exadata installation support since some hardware/software combination is not as expected.
Physical Disks
Next, you should check the disks to make sure they are up and online. Online does not mean they are available to ASM; it simply means the disks are visible to the server. To check the disks are visible and online, use this command:
# /opt/MegaRAID/MegaCli/MegaCli64 Pdlist -aAll |grep "Slot \|Firmware"
Here is truncated output:
Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Online, Spun Up
… Output truncated …
Slot Number: 11
Firmware state: Online, Spun Up
If a disk is not online, you may want to replace it or at least understand the reason.
Flash Disks
After checking physical disks you should check flash disks. The Linux command for that is lsscsi, shown below.
# lsscsi |grep -i marvel
[1:0:0:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdm
[1:0:1:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdn
[1:0:2:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdo
[1:0:3:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdp
[2:0:0:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdq
[2:0:1:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdr
[2:0:2:0] disk ATA MARVELL SD88SA02 D20Y /dev/sds
[2:0:3:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdt
[3:0:0:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdu
[3:0:1:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdv
[3:0:2:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdw
[3:0:3:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdw
[4:0:0:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdy
[4:0:1:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdz
[4:0:2:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdaa
[4:0:3:0] disk ATA MARVELL SD88SA02 D20Y /dev/sdab
By the way, you can also check the flashdisks from the CellCLI tool as well. TheCellCLI tool is explainedin detail in the next installment in this series.
#cellcli
CellCLI: Release 11.2.2.2.0 - Production on Sun Mar 13 12:57:24 EDT 2011
Copyright (c) 2007, 2009, Oracle. All rights reserved.
Cell Efficiency Ratio: 627M
CellCLI> list lun where disktype=flashdisk
1_0 1_0 normal
1_1 1_1 normal
1_2 1_2 normal
1_3 1_3 normal
2_0 2_0 normal
2_1 2_1 normal
2_2 2_2 normal
2_3 2_3 normal
4_0 4_0 normal
4_1 4_1 normal
4_2 4_2 normal
4_3 4_3 normal
5_0 5_0 normal
5_1 5_1 normal
5_2 5_2 normal
5_3 5_3 normal
To make sure the numbering of the flashdisks is correct, use the following command in CellCLI. Note that there is a hyphen (“-“) after the first line, since the command is too long to fit in one line and the “-“ is the continuation character.
CellCLI> list physicaldisk attributes name, id, slotnumber -
> where disktype="flashdisk" and status != "not present"
[1:0:0:0] 5080020000f21a2FMOD0 "PCI Slot: 4; FDOM: 0"
[1:0:1:0] 5080020000f21a2FMOD1 "PCI Slot: 4; FDOM: 1"
[1:0:2:0] 5080020000f21a2FMOD2 "PCI Slot: 4; FDOM: 2"
[1:0:3:0] 5080020000f21a2FMOD3 "PCI Slot: 4; FDOM: 3"
[2:0:0:0] 5080020000f131aFMOD0 "PCI Slot: 1; FDOM: 0"
[2:0:1:0] 5080020000f131aFMOD1 "PCI Slot: 1; FDOM: 1"
[2:0:2:0] 5080020000f131aFMOD2 "PCI Slot: 1; FDOM: 2"
[2:0:3:0] 5080020000f131aFMOD3 "PCI Slot: 1; FDOM: 3"
[3:0:0:0] 5080020000f3ec2FMOD0 "PCI Slot: 5; FDOM: 0"
[3:0:1:0] 5080020000f3ec2FMOD1 "PCI Slot: 5; FDOM: 1"
[3:0:2:0] 5080020000f3ec2FMOD2 "PCI Slot: 5; FDOM: 2"
[3:0:3:0] 5080020000f3ec2FMOD3 "PCI Slot: 5; FDOM: 3"
[4:0:0:0] 5080020000f3e16FMOD0 "PCI Slot: 2; FDOM: 0"
[4:0:1:0] 5080020000f3e16FMOD1 "PCI Slot: 2; FDOM: 1"
[4:0:2:0] 5080020000f3e16FMOD2 "PCI Slot: 2; FDOM: 2"
[4:0:3:0] 5080020000f3e16FMOD3 "PCI Slot: 2; FDOM: 3"
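Along the same lines – a sketch only, assuming standard CellCLI attribute names – the overall health of every disk on a cell (hard disks and flash modules alike) can be checked with a single command:
CellCLI> list physicaldisk attributes name, diskType, status
Any disk with a status other than normal deserves a closer look before you proceed.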
Auto-configuration
While it is possible to configure Exadata Database Machine manually, you don’t need to. In fact, you may not want to. Oracle provides three shell scripts for automatic configuration in the directory /opt/oracle.SupportTools/onecommand (these steps may change in later versions):
- checkip.sh – for checking the configuration at various stages
- applyconfig.sh – to change the configuration
- deploy112.sh – for final deployment
First, you should check the configuration for validity. To do that execute:
# checkip.sh -m pre_applyconfig
Exadata Database Machine Network Verification version 1.9
Network verification mode pre_applyconfig starting ...
Saving output file from previous run as dbm.out_17739
Using name server xx.xxx.59.21 found in dbm.dat for all DNS lookups
Processing section DOMAIN : SUCCESS
Processing section NAME : SUCCESS
Processing section NTP : SUCCESS
Processing section GATEWAY : SUCCESS
Processing section SCAN : ERROR - see dbm.out for details
Processing section COMPUTE : ERROR - see dbm.out for details
Processing section CELL : ERROR - see dbm.out for details
Processing section ILOM : ERROR - see dbm.out for details
Processing section SWITCH : ERROR - see dbm.out for details
Processing section VIP : ERROR - see dbm.out for details
Processing section SMTP : SMTP "Email Server Settings" standardrelay.acmehotels.com 25:0
SUCCESS
One or more checks report ERROR. Review dbm.out for details
If you check the file dbm.out, you can see the exact error messages.
Running in mode pre_applyconfig
Using name server xx.xxx.59.21 found in dbm.dat for all DNS lookups
Processing section DOMAIN
test.prol
Processing section NAME
GOOD : xx.xxx.59.21 responds to resolve request for proldb01.test.prol
GOOD : xx.xxx.59.22 responds to resolve request for proldb01.test.prol
Processing section NTP
GOOD : xx.xxx.192.1 responds to time server query (/usr/sbin/ntpdate -q)
Processing section GATEWAY
GOOD : xx.xxx.192.1 pings successfully
GOOD : xx.xxx.18.1 pings successfully
Processing section SCAN
GOOD : prol-scan.test.prol resolves to 3 IP addresses
GOOD : prol-scan.test.prol forward resolves to xx.xxx.18.32
GOOD : xx.xxx.18.32 reverse resolves to prol-scan.test.prol.
ERROR : xx.xxx.18.32 pings
GOOD : prol-scan.test.prol forward resolves to xx.xxx.18.33
GOOD : xx.xxx.18.33 reverse resolves to prol-scan.test.prol.
ERROR : xx.xxx.18.33 pings
GOOD : prol-scan.test.prol forward resolves to xx.xxx.18.34
GOOD : xx.xxx.18.34 reverse resolves to prol-scan.test.prol.
ERROR : xx.xxx.18.34 pings
Processing section COMPUTE
GOOD : proldb01.test.prol forward resolves to xx.xxx.192.16
GOOD : xx.xxx.192.16 reverse resolves to proldb01.test.prol.
ERROR : xx.xxx.192.16 pings
GOOD : proldb02.test.prol forward resolves to xx.xxx.192.17
GOOD : xx.xxx.192.17 reverse resolves to proldb02.test.prol.
ERROR : xx.xxx.192.17 pings
GOOD : proldb03.test.prol forward resolves to xx.xxx.192.18
GOOD : xx.xxx.192.18 reverse resolves to proldb03.test.prol.
ERROR : xx.xxx.192.18 pings
… output truncated …
It will report all issues that must be addressed. After addressing all issues, execute the actual configuration:
# applyconfig.sh
After it completes, connect the Exadata Database Machine to your network and check for the validity:
# checkip.sh -m post_applyconfig
It will report the output in the same manner as with the pre_applyconfig parameter and will report any issues, if present. After fixing the issues, run the deployment script. That script actually executes a series of steps – numbered 0 through 29 at the time of writing. The most prudent thing to do is to first list all the steps so that you can become familiar with them. The option -l (that’s the letter “l”, not the numeral “1”) displays all the steps in the list.
# deploy112.sh -l
To run all the steps, you should issue:
# deploy112.sh -i
If you prefer, you can run steps one by one, or a few at a time. To run steps 1 through 3, issue:
# deploy112.sh -i -r 1-3
Or, to run only step 1:
# deploy112.sh -i -s 1
The steps are listed here. (Please note: the steps can change without notice. The most up-to-date list will always be found in the release notes that come with an Exadata box.)
Step | Description |
0 | Validate this server setup |
1 | Setup SSH for the root user. |
2 | Validate all nodes. |
3 | Unzip files. |
4 | Update the /etc/hosts file. |
5 | Create the cellip.ora and cellinit.ora files |
6 | Validate the hardware. |
7 | Validate the InfiniBand network. |
8 | Validate the cells. |
9 | Check RDS using the ping command. |
10 | Run the CALIBRATE command. |
11 | Validate the time and date. |
12 | Update the configuration. |
13 | Create the user accounts for celladmin and cellmonitor. |
14 | Set up SSH for the user accounts. |
15 | Create the Oracle home directories. |
16 | Create the grid disks. |
17 | Install the grid software. |
18 | Run the grid root scripts. |
19 | Install the Oracle Database software. |
20 | Create the listener. |
21 | Run Oracle ASM configuration assistant to configure Oracle ASM. |
22 | Unlock the Oracle Grid Infrastructure home directory. |
23 | Relink Reliable Data Socket (RDS) protocol. |
24 | Lock Oracle Grid Infrastructure. |
25 | Set up e-mail alerts for Exadata Cells. |
26 | Run Oracle Database Configuration Assistant. |
27 | Set up Oracle Enterprise Manager Grid Control. |
28 | Apply any security fixes. |
29 | Secure Oracle Exadata Database Machine. |
Here is the output of the script (amply truncated at places to conserve space):
# ./deploy112.sh -i
Script started, file is /opt/oracle.SupportTools/onecommand/tmp/STEP-0-proldb01-20110331154414.log
=
0 ValidateThisNodeSetup Begin ===============Validating first boot...
This step will validate DNS, NTS, params.sh, dbmachine.params, and all the
files generated by the DB Machine Configurator
In Check and Fix Hosts...
INFO: This nslookup could take upto ten seconds to resolve if the host isn't in DNS, please wait..
INFO: Running /usr/bin/nslookup prol-scan...
INFO: Running /usr/bin/nslookup proldb02...
SUCCESS: SCAN and VIP found in DNS...
Looking up nodes in dbmachine.params and dbs_group...
SUCCESS: proldb01 has ip address of xx.xxx.192.16..A_OK
SUCCESS: proldb02 has ip address of xx.xxx.192.17..A_OK
… output truncated …
SUCCESS: proldb08 has ip address of xx.xxx.192.23..A_OK
SUCCESS: prol01-vip has ip address of xx.xxx.18.24..A_OK
SUCCESS: Found IP Address xx.xxx.18.24 for prol01-vip using ping...
SUCCESS: Based on bondeth0:xx.xxx.18.16 and NetMask:255.255.255.0 we picked bondeth0 as the appropriate VIP interface
SUCCESS: prol02-vip has ip address of xx.xxx.18.25..A_OK
SUCCESS: Found IP Address xx.xxx.18.24 for prol01-vip using ping...
SUCCESS: Based on bondeth0:xx.xxx.18.16 and NetMask:255.255.255.0 we picked bondeth0 as the appropriate VIP interface
… output truncated …
SUCCESS: prol08-vip has ip address of xx.xxx.18.31..A_OK
SUCCESS: Found IP Address xx.xxx.18.24 for prol01-vip using ping...
SUCCESS: Based on bondeth0:xx.xxx.18.16 and NetMask:255.255.255.0 we picked bondeth0 as the appropriate VIP interface
Checking blocksizes...
SUCCESS: DB blocksize is 16384 checks out
checking patches
checking patches and version = 11202
SUCCESS: Located patch# 10252487 in /opt/oracle.SupportTools/onecommand/patches...
INFO: Checking zip files
INFO: Validating zip file /opt/oracle.SupportTools/onecommand/p10098816_112020_Linux-x86-64_1of7.zip...
Archive: /opt/oracle.SupportTools/onecommand/p10098816_112020_Linux-x86-64_1of7.zip
Length Date Time Name
-------- ---- ---- ----
0 11-16-10 03:10 database/
0 11-16-10 03:03 database/install/
182 11-16-10 03:03 database/install/detachHome.sh
… output truncated …
41092 11-16-10 03:03 database/doc/install.112/e17212/concepts.htm
1892 11-16-10 03:03 database/doc/install.112/e17212/contents.js
44576 11-16-10 03:03 database/doc/install.112/e17212/crsunix.htm
ERROR: /usr/bin/unzip -l /opt/oracle.SupportTools/onecommand/p10098816_112020_Linux-x86-64_1of7.zip did not complete successfully: Return Status: 80 Step# 1
Exiting...
Time spent in step 1 = 1 seconds
INFO: Going to run /opt/oracle.cellos/ipconf /opt/oracle.SupportTools/onecommand/preconf-11-2-1-2-2.csv -verify -ignoremismatch -verbose to validate first boot...
INFO: Running /opt/oracle.cellos/ipconf -verify -ignoremismatch -verbose on this node...
Verifying of configuration for /opt/oracle.cellos/cell.conf
Config file exists : PASSED
Load configuration : PASSED
Config version defined : PASSED
Config version 11.2.2.1.1 has valid value : PASSED
Nameserver xx.xxx.59.21 has valid IP address syntax : PASSED
Nameserver xx.xxx.59.22 has valid IP address syntax : PASSED
Canonical hostname defined : PASSED
Canonical hostname has valid syntax : PASSED
Node type defined : PASSED
Node type db is valid : PASSED
This node type is db : PASSED
Timezone defined : PASSED
Timezone found in /usr/share/zoneinfo : PASSED
NTP server xx.xxx.192.1 has valid syntax : PASSED
NTP drift file defined : PASSED
Network eth0 interface defined : PASSED
IP address defined for eth0 : PASSED
IP address has valid syntax for eth0 : PASSED
Netmask defined for eth0 : PASSED
Netmask has valid syntax for eth0 : PASSED
Gateway has valid syntax for eth0 : PASSED
Gateway is inside network for eth0 : PASSED
Network type defined for eth0 : PASSED
Network type has proper value for eth0 : PASSED
Hostname defined for eth0 : PASSED
Hostname for eth0 has valid syntax : PASSED
Network bondeth0 interface defined : PASSED
IP address defined for bondeth0 : PASSED
IP address has valid syntax for bondeth0 : PASSED
Netmask defined for bondeth0 : PASSED
Netmask has valid syntax for bondeth0 : PASSED
Gateway has valid syntax for bondeth0 : PASSED
Gateway is inside network for bondeth0 : PASSED
Network type defined for bondeth0 : PASSED
Network type has proper value for bondeth0 : PASSED
Hostname defined for bondeth0 : PASSED
Hostname for bondeth0 has valid syntax : PASSED
Slave interfaces for bondeth0 defined : PASSED
Two slave interfaces for bondeth0 defined : PASSED
Master interface ib0 defined : PASSED
Master interface ib1 defined : PASSED
Network bondib0 interface defined : PASSED
IP address defined for bondib0 : PASSED
IP address has valid syntax for bondib0 : PASSED
Netmask defined for bondib0 : PASSED
Netmask has valid syntax for bondib0 : PASSED
Network type defined for bondib0 : PASSED
Network type has proper value for bondib0 : PASSED
Hostname defined for bondib0 : PASSED
Hostname for bondib0 has valid syntax : PASSED
Slave interfaces for bondib0 defined : PASSED
Two slave interfaces for bondib0 defined : PASSED
At least 1 configured Eth or bond over Eth interface(s) defined : PASSED
2 configured Infiniband interfaces defined : PASSED
1 configured bond over ib interface(s) defined : PASSED
ILOM hostname defined : PASSED
ILOM hostname has valid syntax : PASSED
ILOM short hostname defined : PASSED
ILOM DNS search defined : PASSED
ILOM full hostname matches short hostname and DNS search : PASSED
ILOM IP address defined : PASSED
ILOM IP address has valid syntax : PASSED
ILOM Netmask defined : PASSED
ILOM Netmask has valid syntax : PASSED
ILOM Gateway has valid syntax : PASSED
ILOM Gateway is inside network : PASSED
ILOM nameserver has valid IP address syntax : PASSED
ILOM use NTP servers defined : PASSED
ILOM use NTP has valid syntax : PASSED
ILOM first NTP server has non-empty value : PASSED
ILOM first NTP server has valid syntax : PASSED
ILOM timezone defined : PASSED
Done. Config OK
INFO: Printing group files....
######################################################
This is the list of Database nodes...
proldb01
… output truncated …
proldb08
This is the list of Cell nodes...
prolcel01
… output truncated …
prolcel14
This is the list of Database Private node names...
proldb01-priv
… output truncated …
proldb08-priv
This is the list of Cell Private node names...
prolcel01-priv
… output truncated …
prolcel14-priv
This is the list all node names...
proldb01
… output truncated …
prolcel14
This is the list all private node names...
proldb01-priv
… output truncated …
prolcel14-priv
This is the template /etc/hosts file for private nodes...
### Compute Node Private Interface details
172.32.128.1 proldb01-priv.test.prol proldb01-priv
… output truncated …
172.32.128.8 proldb08-priv.test.prol proldb08-priv
### CELL Node Private Interface details
172.32.128.9 prolcel01-priv.test.prol prolcel01-priv
… output truncated …
172.32.128.22 prolcel14-priv.test.prol prolcel14-priv
### Switch details
# The following 5 IP addresses are for reference only. You may
# not be able to reach these IP addresses from this machine
# xx.xxx.192.60 prolsw-kvm.test.prol prolsw-kvm
# xx.xxx.192.61 prolsw-ip.test.prol prolsw-ip
# xx.xxx.192.62 prolsw-ib1.test.prol prolsw-ib1
# xx.xxx.192.63 prolsw-ib2.test.prol prolsw-ib2
# xx.xxx.192.64 prolsw-ib3.test.prol prolsw-ib3
Creating work directories and validating required files
ERROR: Please review and fix all ERROR's, we appear to have 1 errors...
Exiting...
Time spent in step 0 ValidateThisNodeSetup = 1 seconds
Script done, file is /opt/oracle.SupportTools/onecommand/tmp/STEP-0-proldb01-20110331154414.log
Finally, check the post-deployment configuration for IP addresses:
# ./checkip.sh -m post_deploy112
Exadata Database Machine Network Verification version 1.9
Network verification mode post_deploy112 starting ...
Saving output file from previous run as dbm.out_772
Using name server xx.xxx.59.21 found in dbm.dat for all DNS lookups
Processing section DOMAIN : SUCCESS
Processing section NAME : SUCCESS
Processing section NTP : SUCCESS
Processing section GATEWAY : SUCCESS
Processing section SCAN : SUCCESS
Processing section COMPUTE : SUCCESS
Processing section CELL : SUCCESS
Processing section ILOM : SUCCESS
Processing section SWITCH : SUCCESS
Processing section VIP : SUCCESS
Processing section SMTP : SMTP "Email Server Settings" standardrelay.acmehotels.com 25:0
SUCCESS
If everything comes back OK, your installation and configuration were successful.
Basic Commands
Power
Let’s start by understanding some very first commands you will need: powering on and off. The command for that is IPMITOOL. To power on a cell or database server, issue this from another server:
# ipmitool -H prolcel01-ilom -U root chassis power on
IPMI – short for Intelligent Platform Management Interface – is an interface standard that allows remote management of one server from another over a standardized interface, and the servers in the Exadata Database Machine support it. It’s not an Exadata-specific command but rather a general Linux one. To get all the options available, execute:
# ipmitool -h
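A couple of related, illustrative examples using the same example ILOM hostname as above: checking the current power state and powering the server off remotely:
# ipmitool -H prolcel01-ilom -U root chassis power status
# ipmitool -H prolcel01-ilom -U root chassis power off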
To stop a server, use the shutdown command. To stop immediately and keep it down, i.e. not reboot, execute:
# shutdown -h -y now
To shut down after 10 minutes (the users will get a warning message):
# shutdown -h -y 10
To reboot the server (the “-r” option is for reboot):
# shutdown -r -y now
Or, a simple:
# reboot
Sometimes you may want to shut down multiple servers at once. The DCLI command comes in handy at such times. To shut down all the cells, execute the command:
# dcli -l root -g all_cells shutdown -h -y now
The -g option allows you to supply a file containing the names of all the cell servers. For instance, all_cells is a file as shown below:
# cat all_cells
prolcel01
prolcel02
prolcel03
prolcel04
prolcel05
prolcel06
prolcel07
prolcel08
You could use a similar file for all the database servers and name it all_nodes. To shut down all the database servers:
# dcli -l root -g all_nodes shutdown -h -y now
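Before running anything as drastic as a shutdown across the whole group, note – as a simple, non-destructive sketch – that the same mechanism runs any Linux command on every server listed in the file:
# dcli -l root -g all_cells hostname
# dcli -l root -g all_nodes uptime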
You will learn the DCLI command in detail in the next installment.
Maintenance
From time to time you will need to maintain the servers. (Remember, you are the DMA now, not the DBA.) One of the most common tasks is installing new software images. Let’s see some of the related commands.
To learn what software image is installed, use the following:
# imageinfo
Kernel version: 2.6.18-194.3.1.0.3.el5 #1 SMP Tue Aug 31 22:41:13 EDT 2010 x86_64
Cell version: OSS_11.2.0.3.0_LINUX.X64_101206.2
Cell rpm version: cell-11.2.2.2.0_LINUX.X64_101206.2-1
Active image version: 11.2.2.2.0.101206.2
Active image activated: 2011-01-21 14:09:21 -0800
Active image status: success
Active system partition on device: /dev/md5
Active software partition on device: /dev/md7
In partition rollback: Impossible
Cell boot usb partition: /dev/sdac1
Cell boot usb version: 11.2.2.2.0.101206.2
Inactive image version: undefined
Rollback to the inactive partitions: Impossible
You can glean some important information from the output above. Note the line Active image version: 11.2.2.2.0.101206.2, which indicates the specific Exadata Storage Server version. It also shows the date and time the software image was activated, which can be used to troubleshoot. If you see problems occurring from a specific date and time, you may be able to correlate.
On the heels of the above, the next logical question is: if a new image was installed (activated), what was the version before it? To find out the history of all the image changes, you can use the imagehistory command.
# imagehistory
Version : 11.2.2.2.0.101206.2
Image activation date : 2011-01-21 14:09:21 -0800
Imaging mode : fresh
Imaging status : success
This is a fresh install, so you don’t see much history.
Managing Infiniband
For the newly minted DMA nothing is as rattling as the networking commands. It’s like being given a stick-shift car when all you have ever driven is an automatic.
As DBAs you probably didn’t have to execute anything other than ifconfig and netstat. Well, they still apply; so don’t forget that. But let’s see how to extend that knowledge to infiniband.
Status
To get the status of the InfiniBand services, first check the status of the InfiniBand devices using the ibstatus command.
# ibstatus
Infiniband device 'mlx4_0' port 1 status:
default gid: fe80:0000:0000:0000:0021:2800:01a0:fd45
base lid: 0x1a
sm lid: 0xc
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
Infiniband device 'mlx4_0' port 2 status:
default gid: fe80:0000:0000:0000:0021:2800:01a0:fd46
base lid: 0x1c
sm lid: 0xc
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 40 Gb/sec (4X QDR)
… output truncated …
If that comes out OK, the next step is to check the status of the InfiniBand links using the iblinkinfo command. Here is truncated output to save space.
# iblinkinfo
Switch 0x0021286cd6ffa0a0 Sun DCS 36 QDR switch prolsw-ib1.test.prol:
1 1[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
1 2[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
… output truncated …
1 17[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
1 18[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
1 19[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 12 32[ ] "Sun DCS 36 QDR switch localhost" ( )
1 20[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
1 21[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 11 32[ ] "Sun DCS 36 QDR switch prolsw-ib2.test.prol" ( )
… output truncated …
1 36[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
Switch 0x0021286cd6eba0a0 Sun DCS 36 QDR switch localhost:
12 1[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 43 2[ ] "prolcel02 C 172.32.128.10 HCA-1" ( )
… output truncated …
12 11[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
12 12[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 17 2[ ] "proldb04 S 172.32.128.4 HCA-1" ( )
… output truncated …
12 18[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 11 17[ ] "Sun DCS 36 QDR switch prolsw-ib2.test.prol" ( )
12 19[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 20 1[ ] "prolcel13 C 172.32.128.21 HCA-1" ( )
… output truncated …
12 29[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
12 30[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 6 1[ ] "proldb05 S 172.32.128.5 HCA-1" ( )
12 31[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 11 31[ ] "Sun DCS 36 QDR switch prolsw-ib2.test.prol" ( )
12 32[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 1 19[ ] "Sun DCS 36 QDR switch prolsw-ib1.test.prol" ( )
12 33[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
… output truncated …
12 36[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
Switch 0x0021286ccc72a0a0 Sun DCS 36 QDR switch prolsw-ib2.test.prol:
11 1[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 42 1[ ] "prolcel02 C 172.32.128.10 HCA-1" ( )
… output truncated …
11 10[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 14 1[ ] "proldb02 S 172.32.128.2 HCA-1" ( )
11 11[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
… output truncated …
11 28[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 3 2[ ] "proldb07 S 172.32.128.7 HCA-1" ( )
11 29[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
11 30[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 7 2[ ] "proldb05 S 172.32.128.5 HCA-1" ( )
11 31[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 12 31[ ] "Sun DCS 36 QDR switch localhost" ( )
11 32[ ] ==( 4X xx.0 Gbps Active/ LinkUp)==> 1 21[ ] "Sun DCS 36 QDR switch prolsw-ib1.test.prol" ( )
11 33[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
11 34[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
11 35[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
11 36[ ] ==( 4X 2.5 Gbps Down/Disabled)==> [ ] "" ( )
Topology
To get the topology of the InfiniBand network inside Exadata, use the Oracle-supplied tool verify-topology, available in the directory /opt/oracle.SupportTools/ibdiagtools:
# ./verify-topology
[ DB Machine Infiniband Cabling Topology Verification Tool ]
[Version 11.2.1.3.b]
Looking at 1 rack(s).....
Spine switch check: Are any Exadata nodes connected ..............[SUCCESS]
Spine switch check: Any inter spine switch connections............[SUCCESS]
Spine switch check: Correct number of spine-leaf links............[SUCCESS]
Leaf switch check: Inter-leaf link check..........................[SUCCESS]
Leaf switch check: Correct number of leaf-spine connections.......[SUCCESS]
Check if all hosts have 2 CAs to different switches...............[SUCCESS]
Leaf switch check: cardinality and even distribution..............[SUCCESS]
Cluster Operations
To manage the Oracle Clusterware you use the same commands as you would in a traditional Oracle 11g Release 2 RAC database cluster. The commands are:
- CRSCTL – for a few cluster related commands
- SRVCTL – for most cluster related commands
CRSCTL is not used much, but you need it on some occasions – mostly to shut down the cluster and to start it up (if it is not started automatically during machine startup). Remember, you have to be root to issue this command. However, the root user may not have the location of this tool in its path, so you should use its fully qualified path when issuing the command. Here is the command to stop the cluster on all nodes:
# <OracleGridInfrastructureHome>/bin/crsctl stop cluster -all
You don’t need to shut down the cluster on all nodes; sometimes all you need is to shut it down on a single node. To shut down the cluster on one node alone, use:
# <OracleGridInfrastructureHome>/bin/crsctl stop cluster -n <HostName>
Similarly, to start the cluster on a node where it was previously stopped:
# <OracleGridInfrastructureHome>/bin/crsctl start cluster -n <HostName>
Finally, you may want to make sure all the cluster resources are running. Here is the command for that. The status command does not need to be issued by root.
# <OracleGridInfrastructureHome>/bin/crsctl status resource -t
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DBFS_DG.dg
ONLINE ONLINE proldb01
ONLINE ONLINE proldb02
ONLINE ONLINE proldb03
ONLINE ONLINE proldb04
ONLINE ONLINE proldb05
ONLINE ONLINE proldb06
ONLINE ONLINE proldb07
ONLINE ONLINE proldb08
ora.PRODATA.dg
ONLINE ONLINE proldb01
ONLINE ONLINE proldb02
ONLINE ONLINE proldb03
ONLINE ONLINE proldb04
ONLINE ONLINE proldb05
ONLINE ONLINE proldb06
ONLINE ONLINE proldb07
ONLINE ONLINE proldb08
ora.PRORECO.dg
ONLINE ONLINE proldb01
ONLINE ONLINE proldb02
ONLINE ONLINE proldb03
ONLINE ONLINE proldb04
ONLINE ONLINE proldb05
ONLINE ONLINE proldb06
ONLINE ONLINE proldb07
ONLINE ONLINE proldb08
ora.LISTENER.lsnr
ONLINE ONLINE proldb01
ONLINE ONLINE proldb02
ONLINE ONLINE proldb03
ONLINE ONLINE proldb04
ONLINE ONLINE proldb05
ONLINE ONLINE proldb06
ONLINE ONLINE proldb07
ONLINE ONLINE proldb08
ora.asm
ONLINE ONLINE proldb01 Started
ONLINE ONLINE proldb02 Started
ONLINE ONLINE proldb03 Started
ONLINE ONLINE proldb04 Started
ONLINE ONLINE proldb05 Started
ONLINE ONLINE proldb06 Started
ONLINE ONLINE proldb07 Started
ONLINE ONLINE proldb08
ora.gsd
OFFLINE OFFLINE proldb01
OFFLINE OFFLINE proldb02
OFFLINE OFFLINE proldb03
OFFLINE OFFLINE proldb04
OFFLINE OFFLINE proldb05
OFFLINE OFFLINE proldb06
OFFLINE OFFLINE proldb07
OFFLINE OFFLINE proldb08
ora.net1.network
ONLINE ONLINE proldb01
ONLINE ONLINE proldb02
ONLINE ONLINE proldb03
ONLINE ONLINE proldb04
ONLINE ONLINE proldb05
ONLINE ONLINE proldb06
ONLINE ONLINE proldb07
ONLINE ONLINE proldb08
ora.ons
ONLINE ONLINE proldb01
ONLINE ONLINE proldb02
ONLINE ONLINE proldb03
ONLINE ONLINE proldb04
ONLINE ONLINE proldb05
ONLINE ONLINE proldb06
ONLINE ONLINE proldb07
ONLINE ONLINE proldb08
ora.registry.acfs
ONLINE ONLINE proldb01
ONLINE ONLINE proldb02
ONLINE ONLINE proldb03
ONLINE ONLINE proldb04
ONLINE ONLINE proldb05
ONLINE ONLINE proldb06
ONLINE ONLINE proldb07
ONLINE ONLINE proldb08
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE proldb07
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE proldb02
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE proldb05
ora.cvu
1 ONLINE ONLINE proldb02
ora.proldb01.vip
1 ONLINE ONLINE proldb01
ora.proldb02.vip
1 ONLINE ONLINE proldb02
ora.proldb03.vip
1 ONLINE ONLINE proldb03
ora.proldb04.vip
1 ONLINE ONLINE proldb04
ora.proldb05.vip
1 ONLINE ONLINE proldb05
ora.proldb06.vip
1 ONLINE ONLINE proldb06
ora.proldb07.vip
1 ONLINE ONLINE proldb07
ora.proldb08.vip
1 ONLINE ONLINE proldb08
ora.prolrd.db
1 ONLINE ONLINE proldb01 Open
2 ONLINE ONLINE proldb02 Open
3 ONLINE ONLINE proldb03 Open
4 ONLINE ONLINE proldb04 Open
5 ONLINE ONLINE proldb05 Open
6 ONLINE ONLINE proldb06 Open
7 ONLINE ONLINE proldb07 Open
8 ONLINE ONLINE proldb08 Open
ora.oc4j
1 ONLINE ONLINE proldb01
ora.scan1.vip
1 ONLINE ONLINE proldb07
ora.scan2.vip
1 ONLINE ONLINE proldb02
ora.scan3.vip
1 ONLINE ONLINE proldb05
This output clearly shows the status of the various resources. A complete explanation of all the CRSCTL options is beyond the scope of this article; an abbreviated list of the options appears below. To find the exact parameters required for each command, simply call it with the -h option. For instance, to learn about the backup option, execute:
# crsctl backup -h
Usage:
crsctl backup css votedisk
Backup the voting disk.
Here is the list of the options for CRSCTL:
crsctl add - add a resource, type or other entity
crsctl backup - back up voting disk for CSS
crsctl check - check a service, resource or other entity
crsctl config - output autostart configuration
crsctl debug - obtain or modify debug state
crsctl delete - delete a resource, type or other entity
crsctl disable - disable autostart
crsctl discover - discover DHCP server
crsctl enable - enable autostart
crsctl get - get an entity value
crsctl getperm - get entity permissions
crsctl lsmodules - list debug modules
crsctl modify - modify a resource, type or other entity
crsctl query - query service state
crsctl pin - Pin the nodes in the nodelist
crsctl relocate - relocate a resource, server or other entity
crsctl replace - replaces the location of voting files
crsctl release - release a DHCP lease
crsctl request - request a DHCP lease
crsctl setperm - set entity permissions
crsctl set - set an entity value
crsctl start - start a resource, server or other entity
crsctl status - get status of a resource or other entity
crsctl stop - stop a resource, server or other entity
crsctl unpin - unpin the nodes in the nodelist
crsctl unset - unset a entity value, restoring its default
Another command, SRVCTL, performs most of the server-based operations, including resource (such as service) relocation. It is no different from the tool on a traditional Oracle RAC 11g Release 2 cluster. To learn more about the options of this tool, execute this command:
# srvctl -h
Usage: srvctl [-V]
Usage: srvctl add database -d <db_unique_name> -o <oracle_home> [-c {RACONENODE | RAC | SINGLE}
[-e <server_list>] [-i <instname>] [-w <timeout>]] [-m <domain_name>] [-p <spfile>] [-r {PRIMARY | PHYSICAL_STANDBY | LOGICAL_STANDBY | SNAPSHOT_STANDBY}]
[-s <start_options>] [-t <stop_options>] [-n <db_name>] [-y {AUTOMATIC | MANUAL}] [-g "<serverpool_list>"] [-x <node_name>] [-a "<diskgroup_list>"]
[-j "<acfs_path_list>"]
Usage: srvctl config database [-d <db_unique_name> [-a] ] [-v]
Usage: srvctl start database -d <db_unique_name> [-o <start_options>] [-n <node>]
Usage: srvctl stop database -d <db_unique_name> [-o <stop_options>] [-f]
Usage: srvctl status database -d <db_unique_name> [-f] [-v]
… output truncated …
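A few illustrative examples, using the database unique name visible in the earlier cluster resource output (prolrd) and assuming the instances are named prolrd1 through prolrd8 – adjust to your own environment:
# srvctl status database -d prolrd
# srvctl stop instance -d prolrd -i prolrd3 -o immediate
# srvctl start instance -d prolrd -i prolrd3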
IPMI Tool
Earlier in this article you saw a reference to the IPMI tool. We used it to power the servers on. But that is not the only thing you can do with this tool; there are plenty more options. If you want to find out what options are available, issue the command without any arguments.
# ipmitool
No command provided!
Commands:
raw Send a RAW IPMI request and print response
i2c Send an I2C Master Write-Read command and print response
spd Print SPD info from remote I2C device
lan Configure LAN Channels
chassis Get chassis status and set power state
power Shortcut to chassis power commands
event Send pre-defined events to MC
mc Management Controller status and global enables
sdr Print Sensor Data Repository entries and readings
sensor Print detailed sensor information
fru Print built-in FRU and scan SDR for FRU locators
sel Print System Event Log (SEL)
pef Configure Platform Event Filtering (PEF)
sol Configure and connect IPMIv2.0 Serial-over-LAN
tsol Configure and connect with Tyan IPMIv1.5 Serial-over-LAN
isol Configure IPMIv1.5 Serial-over-LAN
user Configure Management Controller users
channel Configure Management Controller channels
session Print session information
sunoem OEM Commands for Sun servers
kontronoem OEM Commands for Kontron devices
picmg Run a PICMG/ATCA extended cmd
fwum Update IPMC using Kontron OEM Firmware Update Manager
firewall Configure Firmware Firewall
shell Launch interactive IPMI shell
exec Run list of commands from file
set Set runtime variable for shell and exec
hpm Update HPM components using PICMG HPM.1 file
It’s not possible to explain each option here, so let’s examine one of the most used ones: sel, which shows the System Event Log – one of the key commands you will need.
# ipmitool sel
SEL Information
Version : 2.0 (v1.5, v2 compliant)
Entries : 96
Free Space : 14634 bytes
Percent Used : 9%
Last Add Time : 02/27/2011 20:23:44
Last Del Time : Not Available
Overflow : false
Supported Cmds : 'Reserve' 'Get Alloc Info'
# of Alloc Units : 909
Alloc Unit Size : 18
# Free Units : 813
Largest Free Blk : 813
Max Record Size : 18
The output above is a summary only. To see the details of the event log, you can use an additional parameter: list.
# ipmitool sel list
1 | 01/21/2011 | 07:05:39 | System ACPI Power State #0x26 | S5/G2: soft-off | Asserted
2 | 01/21/2011 | 08:59:43 | System Boot Initiated | System Restart | Asserted
3 | 01/21/2011 | 08:59:44 | Entity Presence #0x54 | Device Present
4 | 01/21/2011 | 08:59:44 | System Boot Initiated | Initiated by hard reset | Asserted
5 | 01/21/2011 | 08:59:44 | System Firmware Progress | Memory initialization | Asserted
6 | 01/21/2011 | 08:59:44 | System Firmware Progress | Primary CPU initialization | Asserted
7 | 01/21/2011 | 08:59:49 | Entity Presence #0x58 | Device Present
8 | 01/21/2011 | 08:59:52 | Entity Presence #0x57 | Device Present
9 | 01/21/2011 | 08:59:53 | System Boot Initiated | Initiated by warm reset | Asserted
a | 01/21/2011 | 08:59:53 | System Firmware Progress | Memory initialization | Asserted
b | 01/21/2011 | 08:59:53 | System Firmware Progress | Primary CPU initialization | Asserted
c | 01/21/2011 | 08:59:54 | System Boot Initiated | Initiated by warm reset | Asserted
d | 01/21/2011 | 08:59:55 | System Firmware Progress | Memory initialization | Asserted
e | 01/21/2011 | 08:59:55 | System Firmware Progress | Primary CPU initialization | Asserted
f | 01/21/2011 | 09:00:01 | Entity Presence #0x55 | Device Present
... truncated ...
The output has been shown partially to conserve space. This is one of the key commands you should be aware of. In a troubleshooting episode, you should check the system event log to make sure no components have failed. If any have, you would of course have to replace them before going further. If you get a clean bill of health from IPMITOOL, you should go to the next step of making sure there are no issues with the cluster, then no issues with the RAC database, and so on.
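Two more ipmitool options worth remembering – shown here only as a sketch – are the sensor readings (temperatures, voltages, fan speeds) and clearing the event log once its entries have been reviewed and archived:
# ipmitool sensor
# ipmitool sel clear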