Problem
Need to replace all the disks in your NetApp FAS or IBM N-Series Storage System?
This explains how to deal with a common disk replacement scenario that occurs within SMEs running entry-level dual controller NetApp FAS or IBM N-Series storage systems, where the controllers are running in an active/active configuration with each controller owning one aggregate.
Scenario
The Customer has two NetApp FAS or IBM N-series ONTAP 7-mode 7.x or 8.x Storage Systems on their network. One is used for production and one for backups/archiving.
The Customer wants to replace all of the disks in the production storage system.
The Storage System we are working with is a 2-controller system. In this example, the controllers are called CONTROLLER1 and CONTROLLER2.
CONTROLLER1 owns 12 disks in tray 1 (Internal SATA/SAS Tray) CONTROLLER2 owns 14 disks in tray 2 (External FC/SAS Tray)
The disks we want to replace are the 12 disks in tray1 owned by CONTROLLER1
There are no unallocated or unowned disks in tray 2. All disks are 100% allocated to their controller’s respective Aggregates (aggr0 for CONTROLLER1, aggr1 for CONTROLLER2). Both aggregates are RAID-DP and each has a single online spare.
All production volumes on CONTROLLER1 have already been backed up and migrated elsewhere. All that remains on aggr0 is the root volume for CONTROLLER1, which we call vol0 in this example.
Removing the disks from tray1 will destroy root volume vol0 for CONTROLLER1, rendering the controller unbootable, therefore forcing a time consuming rebuild of the controller’s configuration.
Procedure
This procedure avoids this undesirable consequence and provides for a very fast route to the desired upgrade with a minimum of risk.
The migration takes place in 4 steps:
- Backing up volumes to another NetApp/IBM N series storage system
- Freeing up and reassigning disks
- Copying the root volume and forcing boot from the copy.
- Replacing the disks
- Assigning the new disks
- Copying the root volume to the new disks and forcing boot from the copy
- Tidy-up
Backing up volumes This assumes you have another IBM N-Series or NetApp 7.x or 8.x 7-mode storage system on the network with spare capacity to take a backup of all of the data on the Storage system being upgraded. If you have a D2D or D2T backup system that can take block level backups of NetApp volumes (such as Snapvault) then this solution could be used instead of the method described in this section.
The controller host name of the second storage system we are using for backup purposes is SYSTEM2 in this exercise.
-
On SYSTEM2 create volumes identical in name, size and configuration to those on CONTROLLER2: CONTROLLER2> vol create g CONTROLLER2> snap reserve 0
- Repeat step 1 for all volumes on CONTROLLER2
- Create a user on CONTROLLER2 called ndmpuser
- Create a user on SYSTEM2 called ndmpuser
- On CONTROLLER2 run commands:
CONTROLLER2> ndmpd on
(This turns on “ndmp” volume to volume block copy functionality)
CONTROLLER2> ndmpd password ndmpuser
This will return a response similar to this: password g8x8xxxxxmZ0121
- On SYSTEM2 run commands:
SYSTEM2> ndmpd on
SYSTEM2> ndmpd password ndmpuser
This will return a response similar to this: password 48so8xxxxxxZ00nA
- On CONTROLLER2 run command:
CONTROLLER2> ndmpdcopy –sa ndmpuser: –da ndmpuser: "CONTROLLER2:/vol/" "SYSTEM2: :/vol/"
-
Wait for the NDMPCOPY block level backup to complete
-
Repeat steps 7 and 8 for all volumes on CONTROLLER2
-
The backup of all production volumes on the Storage system is now complete. Your data is now safe
Freeing up and reassigning disks
This is the “clever” bit. We actually have two disks on CONTROLLER2’s aggregate that we can use to temporarily host CONTROLLER1’s root volume vol0. Where? One is the online spare and the other is the second parity disk. We just need to persuade CONTROLLER2 to give them up.
Obviously the production data on CONTROLLER2’s aggregate is at risk in this procedure – one slip of the fingers, or a disk hardware failure while carrying out this procedure could wipe out the whole aggregate and everything on it. This is why we backed up all the volumes first!
- First, we change the aggregate on CONTROLLER2 from RAID-DP to RAID4. This frees up the disk we need.
CONTROLLER2> aggr options raidtype raid4
- Now we take a look at the Disk Ids for all 12 disks on CONTROLLER2
CONTROLLER2> disk show
This returns a list similar to this:
DISK OWNER
0c.00.10 CONTROLLER1(142242188) Pool0 WD-WCAW11111111
0c.00.11 CONTROLLER1(142242188) Pool0 WD-WCAW11111111
0c.00.7 CONTROLLER1(142242188) Pool0 WD-WCAW11111111
0c.00.6 CONTROLLER1(142242188) Pool0 WD-WCAW11111111
0c.00.9 CONTROLLER1(142242188) Pool0 WD-WCAW11111111
0c.00.1 CONTROLLER1(142242188) Pool0 WD-WCAW11111111
0c.00.2 CONTROLLER1(142242188) Pool0 WD-WCAW11111111
0c.00.4 CONTROLLER1(142242188) Pool0 WD-WCAW11111111
0c.00.3 CONTROLLER1(142242188) Pool0 WD-WCAW11111111
0c.00.5 CONTROLLER1(142242188) Pool0 WD-WCAW11111111
0c.00.8 CONTROLLER1(142242188) Pool0 WD-WCAW11111111
0b.16 CONTROLLER2(142242248) Pool0 aaaaaaaa
0b.26 CONTROLLER2(142242248) Pool0 aaaaaaaa
0b.25 CONTROLLER2(142242248) Pool0 aaaaaaaa
0b.27 CONTROLLER2(142242248) Pool0 aaaaaaaa
0b.29 CONTROLLER2(142242248) Pool0 aaaaaaaa
0b.18 CONTROLLER2(142242248) Pool0 aaaaaaaa
0b.17 CONTROLLER2(142242248) Pool0 aaaaaaaa
0b.20 CONTROLLER2(142242248) Pool0 aaaaaaaa
0b.21 CONTROLLER2(142242248) Pool0 aaaaaaaa
0b.23 CONTROLLER2(142242248) Pool0 aaaaaaaa
0b.22 CONTROLLER2(142242248) Pool0 aaaaaaaa
0b.28 CONTROLLER2(142242248) Pool0 aaaaaaaa
0b.19 CONTROLLER2(142242248) Pool0 aaaaaaaa
0b.24 CONTROLLER2(142242248) Pool0 aaaaaaaa
0c.00.0 CONTROLLER1(142242188) Pool0 WD-WCAW11111111
- Find out which two of CONTROLLER2’s disks are the spares (one being the original spare and the second being the parity disk we liberated by converting the aggregate to RAID4 earlier. The spares can be viewed under CONTROLLER2 - Storage - Disks in NetApp System Manager. They will have a “State” of “Spare”.
In this case, let’s say the Spare disks are: 0b.16 0b.29
- First we turn disk Autoassign off: on CONTROLLER2:
CONTROLLER2> options disk.auto_assign off
on CONTROLLER1:
CONTROLLER1> options disk.auto_assign off
- We now need to remove the owner from these two disks: From CONTROLLER2:
``shell CONTROLLER2> disk assign 0b.16 –s unowned –f CONTROLLER2> disk assign 0b.29 –s unowned –f
substituting 0b.16 and 0b.29 with your spare disk Ids
You will get a warning about there not being enough spare disks. Ignore this for now.
6. Next, we assign our two newly liberated disks to CONTROLLER1.
From CONTROLLER1:
```shell
CONTROLLER1> disk show –n
This will show the two unassigned disks something like this:
DISK OWNER POOL SERIAL NUMBER
0b.16 Not Owned NONE AAAAAAAA
0b.29 Not Owned NONE AAAAAAAA
- Now we assign the two disks to CONTROLLER1:
CONTROLLER1> disk assign 0b.16 –o CONTROLLER1
CONTROLLER1> disk assign 0b.29 –o CONTROLLER1
Again, substituting 0b.16 and 0b.29 with the same disk Ids you worked with in step 5
-
If we execute a DISK SHOW command now, we should see that CONTROLLER1 now owns the two liberated disks (0b.16 and 0b.29 in this example)
-
No we can create an aggregate called “aggr_temp” on the two new disks to take the temporary root volume while we swap out CONTROLLER1’s disks:
CONTROLLER1> aggr create aggr_temp -f -t raid4 -d 0b.16 0b.29
again use your own disk IDs per steps 6 & 7
-
Run command AGGR STATUS and watch the status of aggr_temp. As soon as it goes to status “Online” you may proceed to the next step. This process could take several minutes or several hours depending on the speed and size of your disks and other load on the storage system.
-
Now we create a new volume on the new aggregate aggr_temp called vol0_temp of identical size and configuration as CONTROLLER1’s root volume
CONTROLLER1> vol create vol0_temp aggr_temp 12g
CONTROLLER1> snap reserve vol0_temp 0
Copying the root volume and making it bootable
- We are now ready to copy our root volume. We need to elevate our access level for these commands:
CONTROLLER1> priv set diag
- Now we do a block level copy of the root vol:
CONTROLLER1*> ndmpd on
CONTROLLER1*> ndmpcopy "/vol/" /vol/vol0_temp
- Once copy is complete, we tag the copy root volume on the tray 2 disks as the root volume for the controller, then restart the controller.
CONTROLLER1*> vol options vol0_temp root
CONTROLLER1*> reboot –t 0
-
Wait for CONTROLLER1 to reboot, then log back into it.
-
You may now offline and destroy the original root volume and original production aggregate (the one containing the original root volume).
From CONTROLLER1 run commands:
CONTROLLER1*> vol offline
CONTROLLER1*> vol destroy
CONTROLLER1*> aggr offline
CONTROLLER1*> aggr destroy
- We can then remove CONTROLLER1 as the owner of the 12 disks that we are replacing.
Use the DISK SHOW command to get the list of 12 Disk Ids, then run a:
disk remove_ownership
command for each disk
Replacing the disks
The Disks connected to CONTROLLER1 may now be physically replaced with the newer items.
Assigning the new disks
We can now assign the new disks to CONTROLLER1.
- From CONTROLLER1, execute command:
CONTROLLER1> disk show –n
You will see the 12 new disks listed in a manner similar to that shown below:
DISK OWNER POOL SERIAL NUMBER
0c.00.6 Not Owned NONE 11111111111111111111
0c.00.10 Not Owned NONE 11111111111111111111
0c.00.2 Not Owned NONE 11111111111111111111
0c.00.7 Not Owned NONE 11111111111111111111
0c.00.11 Not Owned NONE 11111111111111111111
0c.00.9 Not Owned NONE 11111111111111111111
0c.00.5 Not Owned NONE 11111111111111111111
0c.00.8 Not Owned NONE 11111111111111111111
0c.00.1 Not Owned NONE 11111111111111111111
0c.00.4 Not Owned NONE 11111111111111111111
0c.00.0 Not Owned NONE 11111111111111111111
0c.00.3 Not Owned NONE 11111111111111111111
- Now we assign each of the new disks in turn to CONTROLLER1 using command:
CONTROLLER1> disk assign -o CONTROLLER1
i.e.
disk assign 0c.00.6 -o CONTROLLER1
disk assign 0c.00.10 -o CONTROLLER1
until you have assigned all 12 disks.
- We can now create our new aggregates using the “aggr create” command. The setup of these is obviously up to you. Remember to leave one spare though!
For the purposes of this example, we assume the new aggregate on which the root volume for CONTROLLER1 will be placed is called aggr0.
- Next create your new root volume on the new disks:
CONTROLLER1> vol create vol0 aggr0 12g
Creates a 12GB volume called vol0 on aggregate aggr0. If you prefer another size, use this instead.
- Now we can copy the root volume back to its proper location on the new disks:
CONTROLLER1*> ndmpd on
CONTROLLER1*> ndmpcopy /vol/vol0_temp /vol/vol0
You should substitute vol0 for your chosen volume name if different to the above. But the TEMP vol0 should always go FIRST and the root volume on the new disks SECOND.
- Once copy is complete, we tag the copy root volume on the tray 2 disks as the root volume for the controller, then restart the controller.
CONTROLLER1*> vol options vol0 root
CONTROLLER1*> reboot –t 0
- Wait for CONTROLLER1 to reboot
Tidy-up
- On CONTROLLER1, run these commands to remove the old temp root volume:
CONTROLLER1> Vol offline vol0_temp
CONTROLLER1> Vol destroy vol0_temp
with “Y” to acknowledge
- On CONTROLLER1, run these commands to remove the old temp root aggregate:
CONTROLLER1> Aggr offline aggr_temp
CONTROLLER1> Vol destroy aggr_temp
with “Y” to acknowledge
- On CONTROLLER1, de-assign the two disks we “borrowed” from CONTROLLER2:
CONTROLLER1> options disk.auto_assign off
CONTROLLER1> disk assign –s unowned –f
CONTROLLER1> disk assign –s unowned –f
These will be the same two disk IDs you worked with in “Freeing up disks” step 5
- You are now done with CONTROLLER1. You will switch to CONTROLLER2 for the remainder of the procedure. First, we assign the two “borrowed” disks back to CONTROLLER2
CONTROLLER2> disk assign -o CONTROLLER2
CONTROLLER2> disk assign -o CONTROLLER2
- Then, to wrap up we convert CONTROLLER2’s aggregate back to RAID_DP:
CONTROLLER2> aggr options raidtype raid_dp
All done! Of course, if we want to replace CONTROLLER2’s disks as well, we simply migrate all of CONTROLLER2’s production volumes over to CONTROLLER1 and repeat the procedure above, except that this time, we transpose CONTROLLER1 and CONTROLLER2.