How to Replace a Hard Drive in an Exadata Storage Server (Hard Failure) (Doc ID 1386147.1)

DISPATCH INSTRUCTIONS:

The customer may choose to use the on-site spare disk provided for Exadata Storage Servers and do the replacement themselves. In this case, the spares should be replenished using a parts-only dispatch.

The following information will be required prior to dispatch of a replacement:
  • Type of Exadata (V2 or X2 or X3 or X4) / Exadata Storage Expansion Rack (X2 or X3 or X4)
  • Type of storage cell/Node (V2= x4275 / X2 = x4270m2 / X3 = X3-2L / X4 = X4-2L)
  • Size of failed drive and part number
  • Name/location of storage cell
  • Slot number of failed drive
  • Image Version (output of "/opt/oracle.cellos/imageinfo -all")

This document covers a failed hard drive in an Exadata Storage Server, specifically a hard drive in "critical failure" state, also known as a hard failure. In some situations a drive is first flagged as a predictive failure, which means the disk may still be in use. In such cases, refer to Doc ID 1390836.1 for the replacement steps.

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:
It is expected that the Exadata Machine is up and running and the storage cell containing the failed drive is booted and available.

If there are multiple drives to be replaced within an Exadata machine (or between an Exadata interconnected with another Exadata or Expansion Cabinet), it is critical that only ONE DRIVE BE REPLACED AT A TIME to avoid the risk of data loss. Before replacing another disk in Exadata, ensure the re-balance operation has completed from the first replacement.

Before proceeding, confirm the part number of the part in hand (either from logistics or an on-site spare) matches the part number dispatched for replacement (especially important in cases where the customer has multiple racks of different drive types/sizes).

1. Confirm the drive needing replacement based on the output provided ("name" or "slotNumber" value) and LED status of drive. For a hard failure, the LED for the failed drive should have the "Service Action Required" amber LED illuminated/flashing. It should also have the "OK to Remove" blue LED illuminated/flashing, but may not depending on the nature of the failure mode and when it occurred.

For example, follow Doc ID 1113013.1 to determine the failed drive.
  • CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status=critical DETAIL
  • name: 28:5
  • deviceId: 21
  • diskType: HardDisk
  • enclosureDeviceId: 28
  • errMediaCount: 0
  • errOtherCount: 0
  • foreignState: false
  • luns: 0_5
  • makeModel: "SEAGATE ST360057SSUN600G"
  • physicalFirmware: 0705
  • physicalInterface: sas
  • physicalSerial: E07KZ8
  • physicalSize: 558.9109999993816G
  • slotNumber: 5
  • status: critical

In the output above, both the "name:" value (following the ":") and the "slotNumber" provide the slot of the physical device requiring replacement where the "status" field is "critical" status. In the above example, the slot is determined to be slot 5. (slotNumber: 5 AND name: 28:5)
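As a rough illustration of the slot check described above, the following Python sketch parses pasted CellCLI detail output and cross-checks the slot taken from "name" against "slotNumber". The helper names parse_detail and failed_slot are hypothetical and are not part of CellCLI; the sample text is a subset of the example output above.

```python
def parse_detail(text):
    """Parse 'key: value' lines from pasted CellCLI DETAIL output into a dict."""
    attrs = {}
    for line in text.splitlines():
        # Drop indentation and any leading bullet character from the paste.
        line = line.strip().lstrip("•").strip()
        if ":" not in line or line.startswith("CellCLI>"):
            continue
        key, _, value = line.partition(":")
        attrs[key.strip()] = value.strip().strip('"')
    return attrs


def failed_slot(attrs):
    """Return the slot number, cross-checking 'name' against 'slotNumber'."""
    if attrs.get("status") != "critical":
        raise ValueError("disk is not in critical status")
    slot_from_name = int(attrs["name"].split(":")[1])
    slot = int(attrs["slotNumber"])
    if slot != slot_from_name:
        raise ValueError("name and slotNumber disagree")
    return slot


sample = """
name: 28:5
enclosureDeviceId: 28
slotNumber: 5
status: critical
"""
print(failed_slot(parse_detail(sample)))
```

If the two values ever disagree, the sketch raises an error rather than guessing, which mirrors the intent of confirming the slot before touching hardware.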
2. The Oracle ASM disks associated with the grid disks on the physical disk will be automatically dropped with the FORCE option, and an ASM re-balance will start immediately to restore data redundancy. Because the disk is in "critical" status, there is no need to check that ASM is still re-balancing.


Validate the disk that was marked "critical" is no longer part of the ASM diskgroups:
a. Login to a database node with the username for the owner of Oracle Grid Infrastructure home. Typically this is the 'oracle' user.
  • edx2db01 login: oracle
  • Password:
  • Last login: Thu Jul 12 14:43:10 on ttyS0
  • [oracle@edx2db01 ~]$
b. Select the ASM instance for this DB node and connect to SQL Plus:
  • [oracle@edx2db01 ~]$ . oraenv
  • ORACLE_SID = [oracle] ? +ASM1
  • The Oracle base has been set to /u01/app/oracle
  • [oracle@edx2db01 ~]$ sqlplus ' / as sysasm'

  • SQL*Plus: Release 11.2.0.2.0 Production on Thu Jul 12 14:45:20 2012
  • Copyright (c) 1982, 2010, Oracle. All rights reserved.
  • Connected to:
  • Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
  • With the Real Application Clusters and Automatic Storage Management options
  • SQL>

In the above output the “1” of “+ASM1” refers to the DB node number. For example, for DB node #3 the value would be +ASM3.
c. From the DB node, run the following query, using the name of the celldisk associated with this physical disk, which is given in the Cell alert:
SQL> select group_number,path,header_status,mount_status,mode_status,name from V$ASM_DISK where path like '%CD_05_edx2cel02';
no rows selected.
SQL>
This query should return no rows indicating the disk is no longer in the ASM diskgroup configuration. If this returns any other value, then contact the SR owner for further guidance.

Note: If you are not sure what the celldisk name is, or do not have the alert output available, from the CellCLI interface run "list alerthistory"
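The cell disk name in the example, CD_05_edx2cel02, appears to follow the pattern CD_<two-digit slot>_<cell name>. Assuming that naming holds on the system in question (renamed cell disks would not match, and the name in the cell alert remains authoritative), a minimal Python sketch for building the query in step 2c could look like this. The function names are hypothetical.

```python
def celldisk_name(slot, cell_name):
    """Build a cell disk name, ASSUMING the CD_<nn>_<cell> pattern seen in the example."""
    if not 0 <= slot <= 11:
        raise ValueError("slot must be between 0 and 11")
    return "CD_{:02d}_{}".format(slot, cell_name)


def asm_disk_query(celldisk):
    """Return the V$ASM_DISK query from step 2c for the given cell disk name."""
    return (
        "select group_number,path,header_status,mount_status,mode_status,name "
        "from V$ASM_DISK where path like '%{}';".format(celldisk)
    )


print(asm_disk_query(celldisk_name(5, "edx2cel02")))
```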

3. The Cell Management Server daemon monitors and takes action on replacement disks to automatically bring the new disk into the configuration.
a. Login to the cell server and enter the CellCLI interface
edx2cel01 login: celladmin
Password:
[celladmin@edx2cel01 ~]$ cellcli
CellCLI: Release 11.2.2.4.2 - Production on Mon Jul 23 16:21:17 EDT 2012

Copyright (c) 2007, 2009, Oracle. All rights reserved.
Cell Efficiency Ratio: 1,000

CellCLI>
b. Verify that msStatus is running before replacing the disk:
  • CellCLI> list cell attributes cellsrvStatus,msStatus,rsStatus detail
  • cellsrvStatus: running
  • msStatus: running
  • rsStatus: running

4. If the failed disk is in slot 0 or slot 1, then the disk is a system disk which contains the running OS. Verify the root volume is in 'clean' state before hot replacing a system disk. If it is 'active' and the disk is hot removed, the OS may crash making the recovery more difficult.
a. Login as 'root' on the Storage Cell, and use 'df' to determine the md device name for "/" volume:
[root@dbm1cel1 /]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/md5 10317752 2906660 6886980 30% /
tmpfs 12265720 0 12265720 0% /dev/shm
/dev/md7 2063440 569452 1389172 30% /opt/oracle
/dev/md4 118451 37567 74865 34% /boot
/dev/md11 2395452 74228 2199540 4% /var/log/oracle
b. Use 'mdadm' to determine the volume status:
[root@dbm1cel1 ~]# mdadm -Q --detail /dev/md5
/dev/md5:
Version : 0.90
Creation Time : Wed Apr 11 12:08:33 2012
Raid Level : raid1
Array Size : 10482304 (10.00 GiB 10.73 GB)
Used Dev Size : 10482304 (10.00 GiB 10.73 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 5
Persistence : Superblock is persistent

Update Time : Wed Apr 11 13:35:04 2012
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0

UUID : c93a778e:64f89fb5:2c560736:d50b1c04
Events : 0.838

Number Major Minor RaidDevice State
0 8 5 0 active sync /dev/sda5
1 0 0 1 removed

2 8 21 - faulty spare /dev/sdb5
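The 'clean' versus 'active' check in step 4 can be sketched in Python as follows. This is only an illustration that reads the "State :" line from pasted 'mdadm -Q --detail' output; the function names md_state and safe_to_hot_remove are hypothetical, and the sample is trimmed from the output above.

```python
import re


def md_state(mdadm_output):
    """Return the flags on the 'State :' line of 'mdadm -Q --detail' output as a set."""
    match = re.search(r"^\s*State\s*:\s*(.+)$", mdadm_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'State :' line found")
    return {flag.strip() for flag in match.group(1).split(",")}


def safe_to_hot_remove(mdadm_output):
    """True when the volume reports 'clean' and not 'active', per step 4."""
    flags = md_state(mdadm_output)
    return "clean" in flags and "active" not in flags


sample = """/dev/md5:
Raid Level : raid1
State : clean, degraded
Active Devices : 1
"""
print(safe_to_hot_remove(sample))
```

A state of "clean, degraded" passes the check, since 'degraded' only reflects the already failed mirror member.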

WHAT ACTION DOES THE ENGINEER NEED TO TAKE:
Confirm the drive needing replacement based on the output provided ("name" or "slotNumber" value) and LED status of drive. For a hard failure, the LED for the failed drive should have the "Service Action Required" amber LED illuminated/flashing, and may have the "OK to Remove" blue LED illuminated/flashing depending on the nature of the failure mode and when the failure occurred. The cell server within the rack can usually be determined from the hostname and the known default Exadata server numbering scheme. The server should also have its LOCATE white LED illuminated/flashing.
Perform the physical replacement of the disk following the directions from the service manual of the respective server (see REFERENCE INFORMATION below):

Slot locations for Exadata Storage Servers based on Sun Fire X4275 and Sun Fire X4270M2 servers:

View from the front:
HDD2   HDD5   HDD8   HDD11
HDD1   HDD4   HDD7   HDD10
HDD0   HDD3   HDD6   HDD9
Slot locations for Exadata Storage Servers based on Sun Server X3-2L servers:

View from the front:
HDD8   HDD9   HDD10  HDD11
HDD4   HDD5   HDD6   HDD7
HDD0   HDD1   HDD2   HDD3
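The two front-view layouts above number the slots differently: the X4275 and X4270M2 count upward within each column, while the X3-2L counts left to right within each row. A small Python sketch of that mapping is shown here. The function name and the model strings are hypothetical labels chosen for this example, and only the layouts shown above are covered.

```python
def slot_position(slot, model):
    """Return (row_from_bottom, column_from_left), both zero-based, for a front view."""
    if not 0 <= slot <= 11:
        raise ValueError("slot must be between 0 and 11")
    if model in ("X4275", "X4270M2"):
        # Slots count upward within each column; columns run left to right.
        return slot % 3, slot // 3
    if model == "X3-2L":
        # Slots count left to right within each row; rows run bottom to top.
        return slot // 4, slot % 4
    raise ValueError("layout not covered by this sketch: " + model)


print(slot_position(5, "X4275"))   # (2, 1): top row, second column
print(slot_position(5, "X3-2L"))   # (1, 1): middle row, second column
```

The same slot number therefore sits in a different physical position depending on the server model, which is why the model should be confirmed before pulling a drive.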

1. If it is not already, turn on the service LED for the device with the following command, where <ID> is the "name" value provided in the action plan (such as 28:5 in the example above):
CellCLI> alter physicaldisk <ID> serviceled on
CellCLI> alter physicaldisk 28:5 serviceled on
This will cause the disk's Amber fault LED to blink rapidly as a locate indication.
2. On the drive you plan to remove, push the storage drive release button to open the latch.
3. Grasp the latch and pull the drive out of the drive slot (Caution: The latch is not an ejector. Do not bend it too far to the right. Doing so can damage the latch. Also, whenever you remove a storage drive, you should replace it with another storage drive or a filler panel, otherwise the server might overheat due to improper airflow.)
4. Wait three minutes for the MS daemon to recognize the removal of the old drive.
5. Slide the new drive into the drive slot until it is fully seated.
6. Close the latch to lock the drive in place.
7. Verify the "OK/Activity" Green LED begins to flicker as the system recognizes the new drive. The other two LEDs for the drive should no longer be illuminated.
8. Wait three minutes for the MS daemon to start rebuilding the virtual drives before proceeding. Note: Do not run any controller commands in the service manual when replacing the disk.
9. The server's locate LED and the blinking of the disk's service LED should turn off automatically. If the service LED does not turn off and it was turned on in step 1, it can be turned off manually for the device using the same "<ID>" value:
CellCLI> alter physicaldisk 28:5 serviceled off

OBTAIN CUSTOMER ACCEPTANCE
- WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:

After replacing the physical disk on Exadata Storage Server, wait for three minutes before running any commands to query the device from the server. CellCLI (examples below) should be the principal tool to query the drives. If that is unsuccessful you can use "lsscsi" and "/opt/MegaRAID/MegaCli/MegaCli64 -PDList -a0" to verify the drive from an OS perspective.

1. When you replace a physical disk, the disk must be acknowledged by the RAID controller before the rest of the system can access it. Login to the cell server and enter the CellCLI interface, and run the following command, where <ID> is the "name" value provided in the action plan:
  • CellCLI> LIST PHYSICALDISK <ID> detail
CellCLI> list physicaldisk 28:5 detail
  • name: 28:5
  • deviceId: 11
  • diskType: HardDisk
  • enclosureDeviceId: 28
  • errMediaCount: 0
  • errOtherCount: 0
  • foreignState: false
  • luns: 0_5
  • makeModel: "SEAGATE ST360057SSUN600G"
  • physicalFirmware: 0A25
  • physicalInsertTime: 2012-07-23T20:02:31-04:00
  • physicalInterface: sas
  • physicalSerial: E02LZ1
  • physicalSize: 558.9109999993816G
  • slotNumber: 5
  • status: normal
  • The "status" field should report "normal". Note also that the physicalInsertTime should be the current date and time, and not an earlier time. If either is not the case, then the old disk entries may still be present and the disk replacement did not complete successfully. In that case, refer to the SR owner for further assistance.
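One way to make the "current date and time" check on physicalInsertTime concrete is sketched below in Python. The one-hour tolerance is an arbitrary assumption for illustration, not a value from this document, and the function name inserted_recently is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def inserted_recently(insert_time, now, max_age=timedelta(hours=1)):
    """True when physicalInsertTime is no later than 'now' and within max_age of it."""
    inserted = datetime.fromisoformat(insert_time)
    return timedelta(0) <= now - inserted <= max_age


# Timestamp from the example output; 'now' is fixed here so the result is repeatable.
now = datetime(2012, 7, 23, 20, 10, 0, tzinfo=timezone(timedelta(hours=-4)))
print(inserted_recently("2012-07-23T20:02:31-04:00", now))
```

In practice 'now' would be the current time on the cell, and an insert time from days earlier would suggest the old disk entry is still being reported.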
2. The firmware of the drive will be automatically upgraded to match the other disks in the system when the new drive is inserted, if it is below the supported version of the current image. If it is above the minimum supported version then no action will be taken, and the newer firmware will remain. This can be validated by the following command:
  • CellCLI> alter cell validate configuration
3. After the physical disk is replaced, a lun should be automatically created, and the grid disks and cell disks that existed on the previous disk in that slot are automatically re-created on the new physical disk. If those grid disks were part of an Oracle ASM group, then they will be added back to the disk group and the data will be re-balanced on them, based on the disk group redundancy and asm_power_limit parameter values.

Grid disks and cell disks can be verified with the following CellCLI commands, where the lun name is reported in the physicaldisk output from step 1 above ("0_5" in this example):
  • CellCLI> list lun 0_5 detail
  • name: 0_5
  • cellDisk: CD_05_edx2cel02
  • deviceName: /dev/sdad
  • diskType: HardDisk
  • id: 0_5
  • isSystemLun: FALSE
  • lunAutoCreate: FALSE
  • lunSize: 558.40625G
  • lunUID: 0_5
  • physicalDrives: 28:5
  • raidLevel: 0
  • lunWriteCacheMode: "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"
  • status: normal

  • CellCLI> list celldisk where lun=0_5 detail
  • name: CD_05_edx2cel02
  • comment:
  • creationTime: 2012-07-23T20:03:34-04:00
  • deviceName: /dev/sdad
  • devicePartition: /dev/sdad
  • diskType: HardDisk
  • errorCount: 0
  • freeSpace: 0
  • id: fd22150e-34fd-4958-a955-10174d3225a0
  • interleaving: none
  • lun: 0_5
  • raidLevel: 0
  • size: 558.40625G
  • status: normal

  • CellCLI> list griddisk where celldisk=CD_05_edx2cel02 detail
  • name: DATA_Q1_CD_05_edx2cel02
  • availableTo:
  • cellDisk: CD_05_edx2cel02
  • comment:
  • creationTime: 2012-07-23T20:05:04-04:00
  • diskType: HardDisk
  • errorCount: 0
  • id: 088c80f3-67bd-4e3e-aff2-5cf9eb1885f7
  • offset: 32M
  • size: 423G
  • status: active

  • name: DBFS_DG_CD_05_edx2cel02
  • availableTo:
  • cellDisk: CD_05_edx2cel02
  • comment:
  • creationTime: 2012-07-23T20:05:18-04:00
  • diskType: HardDisk
  • errorCount: 0
  • id: 51607362-fa4e-4b47-bdc6-04e8ae7b9742
  • offset: 528.734375G
  • size: 29.125G
  • status: active

  • name: RECO_Q1_CD_05_edx2cel02
  • availableTo:
  • cellDisk: CD_05_edx2cel02
  • comment:
  • creationTime: 2012-07-23T20:05:55-04:00
  • diskType: HardDisk
  • errorCount: 0
  • id: d27f6c0e-36fa-4c87-96e3-6c5b35d83d86
  • offset: 423.046875G
  • size: 106.234375G
  • status: active
Status should be normal for the cell disks and active for the grid disks. All of the creation times should also match the insertion time of the replacement disk. If they are not, then the old disk entries may still be present and the disk replacement did not complete successfully. If this is the case, refer to the SR owner for further assistance.
Note: The lun name attribute will also be shown in the original alert generated by the storage cell.
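The comparison of creation times against the insertion time of the replacement disk can be sketched in Python as follows. The 30-minute window is an assumed tolerance for illustration only, and the function name stale_objects is hypothetical; the timestamps are taken from the example output above.

```python
from datetime import datetime, timedelta


def stale_objects(insert_time, creation_times, window=timedelta(minutes=30)):
    """Return names created before the insert time or more than 'window' after it."""
    inserted = datetime.fromisoformat(insert_time)
    suspect = []
    for name, created in creation_times.items():
        delta = datetime.fromisoformat(created) - inserted
        if not timedelta(0) <= delta <= window:
            suspect.append(name)
    return suspect


creation_times = {
    "CD_05_edx2cel02": "2012-07-23T20:03:34-04:00",
    "DATA_Q1_CD_05_edx2cel02": "2012-07-23T20:05:04-04:00",
    "DBFS_DG_CD_05_edx2cel02": "2012-07-23T20:05:18-04:00",
    "RECO_Q1_CD_05_edx2cel02": "2012-07-23T20:05:55-04:00",
}
print(stale_objects("2012-07-23T20:02:31-04:00", creation_times))
```

An empty list means every cell disk and grid disk was created shortly after the new drive was inserted; any name returned would be a candidate for a leftover entry from the old disk.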
4. To confirm the status of the re-balance, connect to the ASM instance on a database node, and validate that the disks were added back to the ASM diskgroups and a re-balance is running:
  • SQL> set linesize 132
  • SQL> col path format a50
  • SQL> select group_number,path,header_status,mount_status,name from V$ASM_DISK where path like '%CD_05_edx2cel02';
  • GROUP_NUMBER PATH HEADER_STATU MOUNT_S NAME
  • ------------ -------------------------------------------------- ------------ ------- ------------------------------
  • 1 o/192.168.9.10/DATA_Q1_CD_05_edx2cel02 MEMBER CACHED DATA_Q1_CD_05_edx2CEL02
  • 2 o/192.168.9.10/DBFS_DG_CD_05_edx2cel02 MEMBER CACHED DBFS_DG_CD_05_edx2CEL02
  • 3 o/192.168.9.10/RECO_Q1_CD_05_edx2cel02 MEMBER CACHED RECO_Q1_CD_05_edx2CEL02

  • SQL> select * from gv$asm_operation;
  • INST_ID GROUP_NUMBER OPERA STAT POWER ACTUAL SOFAR EST_WORK EST_RATE
  • ---------- ------------ ----- ---- ---------- ---------- ---------- ---------- ----------
  • EST_MINUTES ERROR_CODE
  • ----------- --------------------------------------------
  • 2 3 REBAL WAIT 10
  • 1 3 REBAL RUN 10 10 1541 2422
  • 7298 0
An active re-balance operation can be identified by STATE=RUN. The columns group_number and inst_id provide the diskgroup number of the diskgroup being re-balanced and the instance number where the operation is running. The re-balance operation is complete when the above query returns "no rows selected".

Validate the expected number of griddisks per failgroup and diskgroup. Normal deployment includes 12 griddisks for data, 12 for reco and 10 for dbfs_dg. (MODE_STATUS = ONLINE or MOUNT_STATUS=CACHED) (via SQL> )
  • SQL> select group_number,failgroup,mode_status,count(*) from v$asm_disk group by group_number,failgroup,mode_status;
The re-balance operation has completed when there are no "group_number" values of 0, and each disk group shows its expected count of disks.
If the new griddisks were not automatically added back into the ASM diskgroup configuration, then locate the disks with group_number=0, and add them back in manually using the "alter diskgroup <name> add disk <path> rebalance power 10;" command:
SQL> select path,header_status from v$asm_disk where group_number=0;
PATH HEADER_STATU
-------------------------------------------------- ------------
o/192.168.9.10/DBFS_DG_CD_05_edx2cel02 FORMER
o/192.168.9.10/DATA_Q1_CD_05_edx2cel02 FORMER
o/192.168.9.10/RECO_Q1_CD_05_edx2cel02 FORMER

SQL> alter diskgroup dbfs_dg add disk 'o/192.168.9.10/DBFS_DG_CD_05_edx2cel02' rebalance power 10;
SQL> alter diskgroup data_q1 add disk 'o/192.168.9.10/DATA_Q1_CD_05_edx2cel02' rebalance power 10;
SQL> alter diskgroup reco_q1 add disk 'o/192.168.9.10/RECO_Q1_CD_05_edx2cel02' rebalance power 10;

Repeat the prior queries to validate the re-balance has started and there are no longer any disks with "group_number" values of 0.
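The manual add-back statements above follow a regular shape, so they can be generated from the paths returned by the group_number=0 query. The Python sketch below assumes the disk group name is the grid disk name prefix before "_CD_", as in this example (DBFS_DG_CD_05_edx2cel02 belongs to dbfs_dg). That is an assumption about naming, not a rule, so the generated statements should be reviewed before being run. The function name is hypothetical.

```python
def add_disk_statement(path, power=10):
    """Build an 'alter diskgroup ... add disk' statement for one grid disk path.

    ASSUMPTION: the disk group name is the grid disk name prefix before '_CD_'.
    """
    griddisk = path.rsplit("/", 1)[-1]
    prefix, sep, _ = griddisk.partition("_CD_")
    if not sep:
        raise ValueError("grid disk name does not contain '_CD_': " + griddisk)
    return "alter diskgroup {} add disk '{}' rebalance power {};".format(
        prefix.lower(), path, power
    )


former_paths = [
    "o/192.168.9.10/DBFS_DG_CD_05_edx2cel02",
    "o/192.168.9.10/DATA_Q1_CD_05_edx2cel02",
    "o/192.168.9.10/RECO_Q1_CD_05_edx2cel02",
]
for path in former_paths:
    print(add_disk_statement(path))
```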
5. If the disk replaced was a system disk in slot 0 or 1, then the status of the OS volume should also be checked. Login as 'root' on the Storage cell and check the status using the same 'df' and 'mdadm' commands listed above:
[root@dbm1cel1 ~]# mdadm -Q --detail /dev/md5
/dev/md5:
Version : 0.90
Creation Time : Thu Mar 17 23:19:42 2011
Raid Level : raid1
Array Size : 10482304 (10.00 GiB 10.73 GB)
Used Dev Size : 10482304 (10.00 GiB 10.73 GB)
Raid Devices : 2
Total Devices : 3
Preferred Minor : 5
Persistence : Superblock is persistent

Update Time : Mon Jul 23 20:11:36 2012
State : active, degraded
Active Devices : 1
Working Devices : 2
Failed Devices : 1
Spare Devices : 1

UUID : e75c1b6a:64cce9e4:924527db:b6e45d21
Events : 0.215

Number Major Minor RaidDevice State
3 65 213 0 spare rebuilding /dev/sdad5
1 8 21 1 active sync /dev/sdb5

2 8 5 - faulty spare

[root@dbm1cel1 ~]#
While the system disk is rebuilding, the state will show as "active, degraded" or "active, degraded, recovering", with one device listed as rebuilding and a third entry listed as the 'faulty' disk. After the rebuild has started, re-running this command will show a "Rebuild Status: X% complete" line in the output. When the system disk sync is complete, the state should return to "clean" with only 2 devices.
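To follow the rebuild without reading the full output each time, the state and the rebuild percentage can be pulled out of the 'mdadm -Q --detail' text. The Python sketch below is illustrative only: the function names are hypothetical, and the sample text with its 42% figure is made up for the example rather than captured from a system.

```python
import re


def rebuild_progress(mdadm_output):
    """Return (state_flags, percent); percent is None if no Rebuild Status line exists."""
    state = re.search(r"^\s*State\s*:\s*(.+)$", mdadm_output, re.MULTILINE)
    if state is None:
        raise ValueError("no 'State :' line found")
    flags = {flag.strip() for flag in state.group(1).split(",")}
    rebuild = re.search(r"Rebuild Status\s*:\s*(\d+)%", mdadm_output)
    percent = int(rebuild.group(1)) if rebuild else None
    return flags, percent


def rebuild_finished(mdadm_output):
    """True once the state is exactly 'clean', as described for a completed sync."""
    flags, _ = rebuild_progress(mdadm_output)
    return flags == {"clean"}


# Hypothetical sample, not captured output.
in_progress = "State : active, degraded, recovering\nRebuild Status : 42% complete\n"
print(rebuild_progress(in_progress))
print(rebuild_finished("State : clean\n"))
```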

If the status of any of the above checks (firmware, grid disk / cell disk creation, re-balance) is not successful, re-engage Oracle Support to get the correct action plan to manually complete the required steps.
