
Quantum StorNext M440 Metadata Drive Replacement and Rebuild with NetApp E-Series

Author: Adi
If you landed on this page from Google, you can stop reading here. The command you are looking for is "show storagearray longrunningoperations;" :)


Yesterday one of our clients ran into an issue during an upgrade from SNFS4 to SNFS5. While troubleshooting it, we had to fail and replace one of the metadata drives in the NetApp E-Series disk array.

Figure: M440 metadata storage and controllers

Usually we just swap the failed drive for a new one, check that everything looks good in the Quantum StorNext web interface, and don't think much more about it.

This time was a bit different, since the first issue had brought a lot of visibility to the SAN architecture. We wanted to keep everything stable and avoid affecting the production system with unnecessary latency, lower performance, or user-visible downtime.

So when we decided to replace the metadata drive, we wanted to follow the progress of the replacement, both to make sure everything went smoothly and to be able to communicate when the SAN would be completely healthy and back in an optimal state.

Unfortunately, the only management options we had were:

  • Quantum StorNext web interface
  • NetApp E-Series CLI command (SMcli)

Neither of these two tools provides an intuitive way to check how far along the drive rebuild is. The StorNext web UI doesn't expose this level of detail, and it took us more than an hour, four calls to support, and a few Google queries to figure out how to get this information from the CLI (SMcli).

Figure: SNFS web interface notification for a drive replacement

So if you want to know how your rebuild is progressing, you need to type the command "show storagearray longrunningoperations;":

[root@node-1 ~]# SMcli Qarray1a -S -c "show storagearray longrunningoperations;"
Long Lived Operations:

LOGICAL DEVICES  OPERATION  STATUS         TIME REMAINING
1                Copyback   86% Completed  17 min
TRAY_85_VOL_4    Copyback   85% Completed  17 min
[root@node-1 ~]#
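
If you want to follow the rebuild without re-reading that table by hand, the output is easy to reduce to one line per operation. This is only a minimal sketch, not something we ran during the incident: `lro_progress` is a hypothetical helper name, and it assumes every row has a single-word device name and a single-word operation followed by a percentage, as in the output above.

```shell
# Hypothetical helper: read the output of
#   show storagearray longrunningoperations;
# on stdin and print "<device> <operation> <percent>" per running operation.
# Assumption: rows look like "TRAY_85_VOL_4 Copyback 85% Completed 17 min".
lro_progress() {
    awk '$3 ~ /^[0-9]+%$/ { sub(/%$/, "", $3); print $1, $2, $3 }'
}

# Possible use on the metadata node (array name taken from the post):
#   while sleep 60; do
#       SMcli Qarray1a -S -c "show storagearray longrunningoperations;" | lro_progress
#   done
```

Header lines and the "Long Lived Operations:" banner are skipped because their third field is not a percentage.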

Before getting there, we poked around with the more obvious commands to get this information, with no luck.

"show storagearray healthstatus" tells us the array is fixing itself, but gives no information about when it will be done:

[root@node-1 stornext]# SMcli localhost -c "show storagearray healthstatus;"
Performing syntax check...
Syntax check complete.
Executing script...

Storage array health status = fixing.
The following failures have been found:
Volume - Hot Spare In Use
Storage array: Qarray1
Volume group: 1
  Status: Optimal
  RAID level: 1
  Failed drive at: tray 85, slot 6
    Service action (removal) allowed: No
    Service action LED on component: Yes
  Replaced by drive at: tray 85, slot 1
  Volumes: TRAY_85_VOL_3, TRAY_85_VOL_4

Script execution complete.
SMcli completed successfully.
[root@node-1 stornext]#
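
Even though this command says nothing about timing, the status word itself is handy for a monitoring check. The sketch below pulls it out of the output; `health_status` is a hypothetical name, and the only status values we have actually seen are the ones shown in this post.

```shell
# Hypothetical helper: read the output of
#   show storagearray healthstatus;
# on stdin and print the status word (for example "fixing" or "optimal").
# Assumption: the status line has the form
#   "Storage array health status = <status>."
health_status() {
    sed -n 's/^[[:space:]]*Storage array health status = \(.*\)\.[[:space:]]*$/\1/p'
}

# Possible use:
#   SMcli localhost -c "show storagearray healthstatus;" | health_status
```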

"show volumegroup[$number]" tells us which drives are members of the volume group, but nothing about what is going on:

[root@node-1 stornext]# SMcli localhost -c "show volumegroup[1];"
Performing syntax check...
Syntax check complete.
Executing script...

DETAILS


   Name:              1

      Status:         Optimal
      Capacity:       558.410 GB
      Current owner:  Controller in slot B

      Quality of Service (QoS) Attributes

         RAID level:                   1
         Drive media type:             Hard Disk Drive
         Drive interface type:         Serial Attached SCSI (SAS)
         Tray loss protection:         No
         Data Assurance (DA) capable:  Yes
         DA enabled volume present:    No


      Total Volumes:          2
         Standard volumes:    2
         Repository volumes:  0
         Free Capacity:       0.000 MB

      Associated drives - present (in piece order)
      Total drives present: 3

         Tray     Slot
         85       5 [mirrored pair with drive at tray 85, slot 1]
         85       6
         85       1 [hot spare drive is sparing for drive at 85, 6]


Script execution complete.
SMcli completed successfully.
[root@node-1 stornext]#
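
The one useful hint in this output is the "hot spare drive is sparing for" annotation: as long as it is there, the copyback to the new drive has not finished. As a rough sketch (the function name `hot_spares_in_use` is made up, and the pattern is based only on the single line shown above), it can be extracted like this:

```shell
# Hypothetical helper: read "show volumegroup[N];" output on stdin and print
# "<spare tray>,<spare slot> spares <tray>,<slot>" for each hot spare in use.
# Assumption: the annotation reads exactly
#   "[hot spare drive is sparing for drive at <tray>, <slot>]".
hot_spares_in_use() {
    sed -n 's/^[[:space:]]*\([0-9][0-9]*\)[[:space:]][[:space:]]*\([0-9][0-9]*\)[[:space:]]*\[hot spare drive is sparing for drive at \([0-9][0-9]*\),[[:space:]]*\([0-9][0-9]*\)].*$/\1,\2 spares \3,\4/p'
}

# Possible use:
#   SMcli localhost -c "show volumegroup[1];" | hot_spares_in_use
```

An empty result means no drive in the group is currently covered by a hot spare.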

And if we check the drive itself, nothing interesting shows up:

[root@node-1 stornext]# SMcli localhost -c "show drive[85,6];"
Performing syntax check...
Syntax check complete.
Executing script...

Drive at Tray 85, Slot 6


   Status:                   Replaced

   Mode:                     Assigned
   Raw capacity:             558.912 GB
   Usable capacity:          558.412 GB
   World-wide identifier:    50:00:cc:a0:43:32:f6:90:00:00:00:00:00:00:00:00
   Associated volume group:  1


   Port      Channel
   0         2
   1         1


   Media type:                     Hard Disk Drive
   Interface type:                 Serial Attached SCSI (SAS)
   Drive path redundancy:          OK

   Drive capabilities:             Data Assurance (DA), Full Disk Encryption (FDE)

   Security capable:               Yes, Full Disk Encryption (FDE)
   Secure:                         No
   Read/write accessible:          Yes
   Drive security key identifier:  None

   Data Assurance (DA) capable:    Yes

   Speed:                          10,020 RPM
   Current data rate:              6 Gbps
   Logical sector size:            512 bytes
   Physical sector size:           512 bytes
   Product ID:                     HUC109060CSS601
   Drive firmware version:         MS04
   Serial number:                  KSGX0VVR
   Manufacturer:                   HITACHI
   Date of manufacture:            Not Available


Script execution complete.
SMcli completed successfully.
[root@node-1 stornext]#

But I am happy we discovered the command "show storagearray longrunningoperations;". It let us monitor the copyback from the hot spare to the new drive and confirm that everything completed correctly, with the array back in a healthy, optimal state:

[root@node-1 ~]# SMcli Qarray1a -S -c "show storagearray longrunningoperations;"

Long Lived Operations:
   No operation is currently in progress.

[root@node-1 ~]# SMcli Qarray1a -S -c "show volumegroup[1];"
DETAILS


   Name:              1

      Status:         Optimal
      Capacity:       558.410 GB
      Current owner:  Controller in slot B

      Quality of Service (QoS) Attributes

         RAID level:                   1
         Drive media type:             Hard Disk Drive
         Drive interface type:         Serial Attached SCSI (SAS)
         Tray loss protection:         No
         Data Assurance (DA) capable:  Yes
         DA enabled volume present:    No


      Total Volumes:          2
         Standard volumes:    2
         Repository volumes:  0
         Free Capacity:       0.000 MB

      Associated drives - present (in piece order)
      Total drives present: 2

         Tray     Slot
         85       5 [mirrored pair with drive at tray 85, slot 6]
         85       6 [mirrored pair with drive at tray 85, slot 5]

[root@node-1 ~]# SMcli localhost -c "show storagearray healthstatus;"
Performing syntax check...
Syntax check complete.
Executing script...

Storage array health status = optimal.
Script execution complete.
SMcli completed successfully.
[root@node-1 ~]#
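
The "No operation is currently in progress." message also makes a convenient end condition if you would rather be notified than keep polling by hand. Here is a small sketch under stated assumptions: `wait_for_rebuild` and its two numeric arguments are our own invention, and the completion string is simply the one printed above, which may differ on other firmware versions.

```shell
# Hypothetical helper: poll a command until its output reports that no
# long-running operation is left.
#   $1    seconds to sleep between polls
#   $2    maximum number of polls before giving up
#   $3..  command that prints the longrunningoperations output
# Returns 0 once the array reports no operation, 1 if polls run out.
wait_for_rebuild() {
    wfr_interval=$1
    wfr_left=$2
    shift 2
    while [ "$wfr_left" -gt 0 ]; do
        if "$@" | grep -q 'No operation is currently in progress'; then
            return 0
        fi
        wfr_left=$((wfr_left - 1))
        sleep "$wfr_interval"
    done
    return 1
}

# Possible use: poll every minute for up to two hours.
#   wait_for_rebuild 60 120 SMcli Qarray1a -S -c "show storagearray longrunningoperations;" \
#       && echo "copyback finished"
```

The poll limit is there so that an unreachable array does not leave the loop running forever.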


In hindsight, the best approach would probably have been to:

  • connect the NetApp E-Series E2700 to the network
  • install the SANtricity GUI software on a Windows host
  • check the progress from the SANtricity GUI

A word about disk storage array CLIs. I got spoiled early on by the DDN S2A command line (2004); later, the SFA CLI (2008) disappointed me. Nexsan didn't provide any CLI at all (2012). And today I realized that NetApp / LSI / Engenio (2014) is definitely no better. There is still a lot of progress to be made in the world of block storage appliances (if it still makes sense for the hardware vendors to work on it).

