Quantum StorNext M440 Metadata Drive Replacement and Rebuild with NetApp E-Series
Author
Adi
Date Published
If you came to this page from Google, read no further. The command you are looking for is "show storagearray longrunningoperations;" :)
Yesterday one of our clients had an issue during the upgrade from SNFS4 to SNFS5. While troubleshooting it, we had to fail and replace one of the metadata drives in the NetApp E-Series disk array.
[caption id="attachment_32" align="aligncenter" width="600"] M440 metadata storage and controllers[/caption]
Usually we just swap the failed drive for a new one, check that everything looks good in the Quantum StorNext web interface, and don't worry too much about it.
This time was a bit different, since the first issue had brought a lot of visibility to the SAN architecture. We wanted to keep everything stable and avoid impacting the production system with unnecessary latency, lower performance, or user-visible downtime.
So when we decided to replace the metadata drive, we wanted to watch the progress of the replacement to make sure everything went smoothly, and to be able to communicate when the SAN would be completely healthy and back in an optimal state.
Unfortunately the only management options we have are:
- Quantum StorNext web interface
- NetApp E-Series CLI command (SMcli)
Neither tool provides an intuitive way to check how far along the drive rebuild is. The StorNext web UI doesn't expose this level of detail, and it took us more than 1 hour, 4 calls to support and a few Google searches to figure out how to get this information from the CLI (SMcli).
[caption id="attachment_31" align="aligncenter" width="751"] SNFS web interface notification for a drive replacement[/caption]
So if you want to know how your rebuild is progressing, you need to type the command "show storagearray longrunningoperations;":
    [root@node-1 ~]# SMcli Qarray1a -S -c "show storagearray longrunningoperations;"
    Long Lived Operations:

    LOGICAL DEVICES  OPERATION  STATUS         TIME REMAINING
    1                Copyback   86% Completed  17 min
    TRAY_85_VOL_4    Copyback   85% Completed  17 min
    [root@node-1 ~]#
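If you would rather script this check than retype it every few minutes, here is a minimal Python sketch that pulls the percentage and time remaining out of that output. The row layout is assumed from the single sample above, so treat the regular expression as a starting point. `parse_long_running_operations` and `read_operations` are hypothetical helper names (not part of SMcli), and the SMcli invocation simply reuses the arguments shown above.

```python
import re
import subprocess

# Assumed row layout, based only on the sample output above:
#   <device>  <operation>  <NN>% Completed  <time remaining>
ROW = re.compile(
    r"^\s*(?P<device>\S+)\s+(?P<operation>.+?)\s+"
    r"(?P<percent>\d+)% Completed\s+(?P<remaining>.+?)\s*$"
)


def parse_long_running_operations(text):
    """Return one dict per in-progress operation found in the output."""
    operations = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            operations.append({
                "device": match["device"],
                "operation": match["operation"],
                "percent": int(match["percent"]),
                "remaining": match["remaining"],
            })
    return operations


def read_operations(array="Qarray1a"):
    """Hypothetical helper: run the same SMcli command as above and parse it."""
    result = subprocess.run(
        ["SMcli", array, "-S", "-c", "show storagearray longrunningoperations;"],
        capture_output=True, text=True, check=True,
    )
    return parse_long_running_operations(result.stdout)


# Sample copied from the session above, so the parser can be tried offline.
SAMPLE = """\
Long Lived Operations:

LOGICAL DEVICES  OPERATION  STATUS         TIME REMAINING
1                Copyback   86% Completed  17 min
TRAY_85_VOL_4    Copyback   85% Completed  17 min
"""

for op in parse_long_running_operations(SAMPLE):
    print(f"{op['device']}: {op['operation']} {op['percent']}% ({op['remaining']} left)")
```

An empty list means no rows matched, which is what the "No operation is currently in progress." output should produce once the copyback is done.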
Before getting there, we tried the obvious commands to get this information, with no luck.
"show storagearray healthstatus" tells us the array is fixing itself, but gives no information about when it will be done:
    [root@node-1 stornext]# SMcli localhost -c "show storagearray healthstatus;"
    Performing syntax check...
    Syntax check complete.
    Executing script...

    Storage array health status = fixing.
    The following failures have been found:
    Volume - Hot Spare In Use
    Storage array: Qarray1
    Volume group: 1
    Status: Optimal
    RAID level: 1
    Failed drive at: tray 85, slot 6
    Service action (removal) allowed: No
    Service action LED on component: Yes
    Replaced by drive at: tray 85, slot 1
    Volumes: TRAY_85_VOL_3, TRAY_85_VOL_4

    Script execution complete.
    SMcli completed successfully.
    [root@node-1 stornext]#
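Even without progress information, this output is easy to turn into a quick check. Here is a rough Python sketch that extracts the overall status plus the failed drive and the hot spare covering for it. The field labels are taken from the output above; `parse_health` is a hypothetical helper name, and other firmware versions may word these lines differently.

```python
import re


def parse_health(text):
    """Extract the overall status and, if present, the failed and spare drive locations."""
    status = re.search(r"Storage array health status\s*=\s*(\w+)", text)
    failed = re.search(r"Failed drive at:\s*tray (\d+), slot (\d+)", text)
    spare = re.search(r"Replaced by drive at:\s*tray (\d+), slot (\d+)", text)

    def location(match):
        # Convert a (tray, slot) match into a tuple of ints, or None if absent.
        return (int(match.group(1)), int(match.group(2))) if match else None

    return {
        "status": status.group(1).lower() if status else None,
        "failed_drive": location(failed),
        "spare_drive": location(spare),
    }


# Trimmed sample copied from the session above.
HEALTH_SAMPLE = """\
 Storage array health status = fixing.
 The following failures have been found:
 Volume - Hot Spare In Use
 Storage array: Qarray1
 Volume group: 1
 Failed drive at: tray 85, slot 6
 Replaced by drive at: tray 85, slot 1
"""

print(parse_health(HEALTH_SAMPLE))
```

A status other than "optimal" is the cue to go and look at the long-running operations.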
"show volumegroup[$number]" tells us who are the members of the volumegroup, but nothing about what is going on:
    [root@node-1 stornext]# SMcli localhost -c "show volumegroup[1];"
    Performing syntax check...
    Syntax check complete.
    Executing script...

    DETAILS

    Name: 1

    Status: Optimal
    Capacity: 558.410 GB
    Current owner: Controller in slot B

    Quality of Service (QoS) Attributes

    RAID level: 1
    Drive media type: Hard Disk Drive
    Drive interface type: Serial Attached SCSI (SAS)
    Tray loss protection: No
    Data Assurance (DA) capable: Yes
    DA enabled volume present: No

    Total Volumes: 2
    Standard volumes: 2
    Repository volumes: 0
    Free Capacity: 0.000 MB

    Associated drives - present (in piece order)
    Total drives present: 3

    Tray  Slot
    85    5     [mirrored pair with drive at tray 85, slot 1]
    85    6
    85    1     [hot spare drive is sparing for drive at 85, 6]

    Script execution complete.
    SMcli completed successfully.
    [root@node-1 stornext]#
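The drive table at the bottom of that output is the only place where the hot spare shows up, so it is the part worth parsing if you want an automated "is a spare still in use?" check. The Python sketch below assumes the "Tray Slot" table looks exactly like the sample above; `parse_associated_drives` and `hot_spare_in_use` are hypothetical names made up for this example.

```python
import re

# A drive row is "<tray> <slot>" optionally followed by a bracketed note.
DRIVE_ROW = re.compile(r"^\s*(?P<tray>\d+)\s+(?P<slot>\d+)\s*(?:\[(?P<note>.*)\])?\s*$")


def parse_associated_drives(text):
    """Return the rows of the 'Tray Slot' table as a list of dicts."""
    drives = []
    in_table = False
    for line in text.splitlines():
        if line.split() == ["Tray", "Slot"]:
            in_table = True  # header found, rows start on the next line
            continue
        if not in_table:
            continue
        match = DRIVE_ROW.match(line)
        if match:
            drives.append({
                "tray": int(match["tray"]),
                "slot": int(match["slot"]),
                "note": match["note"],
            })
        elif line.strip():
            break  # first non-blank line that is not a drive row ends the table
    return drives


def hot_spare_in_use(drives):
    """True if any row carries a note mentioning a hot spare."""
    return any("hot spare" in (drive["note"] or "") for drive in drives)


# Trimmed sample copied from the session above.
VOLUMEGROUP_SAMPLE = """\
 Associated drives - present (in piece order)
 Total drives present: 3

 Tray Slot
 85 5 [mirrored pair with drive at tray 85, slot 1]
 85 6
 85 1 [hot spare drive is sparing for drive at 85, 6]

 Script execution complete.
"""

drives = parse_associated_drives(VOLUMEGROUP_SAMPLE)
print(len(drives), "drives, hot spare in use:", hot_spare_in_use(drives))
```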
And if we check the drive itself nothing interesting shows up:
    [root@node-1 stornext]# SMcli localhost -c "show drive[85,6];"
    Performing syntax check...
    Syntax check complete.
    Executing script...

    Drive at Tray 85, Slot 6

    Status: Replaced

    Mode: Assigned
    Raw capacity: 558.912 GB
    Usable capacity: 558.412 GB
    World-wide identifier: 50:00:cc:a0:43:32:f6:90:00:00:00:00:00:00:00:00
    Associated volume group: 1

    Port  Channel
    0     2
    1     1

    Media type: Hard Disk Drive
    Interface type: Serial Attached SCSI (SAS)
    Drive path redundancy: OK

    Drive capabilities: Data Assurance (DA), Full Disk Encryption (FDE)

    Security capable: Yes, Full Disk Encryption (FDE)
    Secure: No
    Read/write accessible: Yes
    Drive security key identifier: None

    Data Assurance (DA) capable: Yes

    Speed: 10,020 RPM
    Current data rate: 6 Gbps
    Logical sector size: 512 bytes
    Physical sector size: 512 bytes
    Product ID: HUC109060CSS601
    Drive firmware version: MS04
    Serial number: KSGX0VVR
    Manufacturer: HITACHI
    Date of manufacture: Not Available

    Script execution complete.
    SMcli completed successfully.
    [root@node-1 stornext]#
But I am happy we discovered the command "show storagearray longrunningoperations;". It let us monitor the copyback from the hot spare to the new drive and confirm that everything completed correctly, with the array back in a healthy and optimal state:
    [root@node-1 ~]# SMcli Qarray1a -S -c "show storagearray longrunningoperations;"

    Long Lived Operations:
    No operation is currently in progress.

    [root@node-1 ~]# SMcli Qarray1a -S -c "show volumegroup[1];"
    DETAILS

    Name: 1

    Status: Optimal
    Capacity: 558.410 GB
    Current owner: Controller in slot B

    Quality of Service (QoS) Attributes

    RAID level: 1
    Drive media type: Hard Disk Drive
    Drive interface type: Serial Attached SCSI (SAS)
    Tray loss protection: No
    Data Assurance (DA) capable: Yes
    DA enabled volume present: No

    Total Volumes: 2
    Standard volumes: 2
    Repository volumes: 0
    Free Capacity: 0.000 MB

    Associated drives - present (in piece order)
    Total drives present: 2

    Tray  Slot
    85    5     [mirrored pair with drive at tray 85, slot 6]
    85    6     [mirrored pair with drive at tray 85, slot 5]

    [root@node-1 ~]# SMcli localhost -c "show storagearray healthstatus;"
    Performing syntax check...
    Syntax check complete.
    Executing script...

    Storage array health status = optimal.
    Script execution complete.
    SMcli completed successfully.
    [root@node-1 ~]#
I guess the best option would have been to have:
- connected the NetApp E-Series E2700 to the network
- installed SANtricity software GUI on a Windows host
- checked the progress from SANtricity GUI
A word about disk storage array CLIs. I got spoiled early on by the DDN S2A command line (2004), and later the SFA CLI (2008) disappointed me. Nexsan didn't provide any CLI at all (2012). Today I realized that NetApp / LSI / Engenio (2014) is definitely not better. There is still a lot of progress to be made in the world of block storage appliances (if it still makes sense for the hardware vendors to work on it).
