A3CA08733-A305-01
Fujitsu Storage ETERNUS HX2000 and AX2100
systems
Replacing the controller module
You must review the prerequisites for the replacement procedure and select the correct one for your version of
the ETERNUS AX/HX Series operating system.
Before you begin
• All drive shelves must be working properly.
• If your system is in an HA pair, the healthy node must be able to take over the node that is being replaced
(referred to in this procedure as the “impaired node”).
• If your system is in a MetroCluster configuration, you must review the section "Choosing the correct
recovery procedure" in the MetroCluster Management and Disaster Recovery Guide to determine whether
you should use this procedure.
About this task
• This procedure includes steps for automatically or manually reassigning drives to the replacement node,
depending on your system's configuration.
You should perform the drive reassignment as directed in the procedure.
• You must replace the failed component with a replacement FRU component you received from your
provider.
• You must be replacing a controller module with a controller module of the same model type. You cannot
upgrade your system by just replacing the controller module.
• You cannot change any drives or drive shelves as part of this procedure.
• In this procedure, the boot device is moved from the impaired node to the replacement node so that the
replacement node will boot up in the same version of ETERNUS AX/HX Series as the old controller module.
• It is important that you apply the commands in these steps on the correct systems:
  • The impaired node is the node that is being replaced.
  • The replacement node is the new node that is replacing the impaired node.
  • The healthy node is the surviving node.
• You must always capture the node's console output to a text file.
This provides you a record of the procedure so that you can troubleshoot any issues that you might
encounter during the replacement process.
Shutting down the impaired controller
You can shut down or take over the impaired controller using different procedures, depending on the storage
system hardware configuration.
Shutting down the node
To shut down the impaired node, you must determine the status of the node and, if necessary, take over the
node so that the healthy node continues to serve data from the impaired node storage.
Before you begin
• If you have a cluster with more than two nodes, it must be in quorum. If the cluster is not in quorum or a
healthy node shows false for eligibility and health, you must correct the issue before shutting down the
impaired node.
• If you are using Storage Encryption, you must have reset the MSID using the instructions in the “Returning
SEDs to unprotected mode” section of the ONTAP 9 Encryption Power Guide.
• If you have a SAN system, you must have checked event messages (event log show) for impaired
node SCSI blade.
Each SCSI-blade process should be in quorum with the other nodes in the cluster. Any issues must be
resolved before you proceed with the replacement.
• If you have a MetroCluster configuration, you must have confirmed that the MetroCluster Configuration
State is configured and that the nodes are in an enabled and normal state (metrocluster node
show).
Procedure
1. If the impaired node is part of an HA pair, disable automatic giveback from the console of the healthy
node: storage failover modify -node local -auto-giveback false
2. Take the impaired node to the LOADER prompt:
If the impaired node is displaying...      Then...
The LOADER prompt                          Go to the next step.
Waiting for giveback...                    Press Ctrl-C, and then respond y when prompted.
System prompt or password prompt           Take over or halt the impaired node:
(enter system password)                    • For an HA pair, take over the impaired node from the healthy
                                             node: storage failover takeover -ofnode impaired_node_name
                                             When the impaired node shows Waiting for giveback..., press
                                             Ctrl-C, and then respond y.
                                           • For a stand-alone system: system node halt impaired_node_name
3. If the system has only one controller module in the chassis, turn off the power supplies, and then unplug
the impaired node's power cords from the power source.
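The decision table in step 2 can be sketched as a small lookup, which may help when scripting a pre-check or a runbook. This is only an illustration: the function name next_action and the state labels "loader", "waiting_for_giveback", and "system_prompt" are hypothetical, and the returned strings simply restate the actions from the table above.

```python
def next_action(console_state: str, is_ha_pair: bool) -> str:
    """Return the step 2 action for the impaired node's console state.

    The state labels are hypothetical names for the three rows of the table.
    """
    if console_state == "loader":
        return "Go to the next step."
    if console_state == "waiting_for_giveback":
        return "Press Ctrl-C, and then respond y when prompted."
    if console_state == "system_prompt":
        if is_ha_pair:
            # Run from the healthy node, then answer the giveback prompt.
            return ("storage failover takeover -ofnode impaired_node_name; "
                    "at Waiting for giveback..., press Ctrl-C and respond y")
        return "system node halt impaired_node_name"
    raise ValueError(f"unknown console state: {console_state!r}")


if __name__ == "__main__":
    print(next_action("system_prompt", is_ha_pair=True))
```

The sketch only selects the action; the commands themselves still have to be entered on the correct node as described in the procedure.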
Shutting down a node in a two-node MetroCluster configuration
To shut down the impaired node, you must determine the status of the node and, if necessary, switch over the
node so that the healthy node continues to serve data from the impaired node storage.
About this task
You must leave the power supplies turned on at the end of this procedure to provide power to the healthy
node.
Procedure
1. Check the MetroCluster status to determine whether the impaired node has automatically switched over to
the healthy node: metrocluster show
2. Depending on whether an automatic switchover has occurred, proceed according to the following table:
If the impaired node...                      Then...
Has automatically switched over              Proceed to the next step.
Has not automatically switched over          Perform a planned switchover operation from the healthy
                                             node: metrocluster switchover
Has not automatically switched over, you     Review the veto messages and, if possible, resolve the
attempted switchover with the metrocluster   issue and try again. If you are unable to resolve the
switchover command, and the switchover was   issue, contact technical support.
vetoed
3. Resynchronize the data aggregates by running the metrocluster heal -phase aggregates
command from the surviving cluster.
controller_A_1::> metrocluster heal -phase aggregates
[Job 130] Job succeeded: Heal Aggregates is successful.
If the healing is vetoed, you have the option of reissuing the metrocluster heal command with the
-override-vetoes parameter. If you use this optional parameter, the system overrides any soft vetoes that
prevent the healing operation.
4. Verify that the operation has been completed by using the metrocluster operation show
command.
controller_A_1::> metrocluster operation show
Operation: heal-aggregates
State: successful
Start Time: 7/25/2016 18:45:55
End Time: 7/25/2016 18:45:56
Errors: -
5. Check the state of the aggregates by using the storage aggregate show command.
controller_A_1::> storage aggregate show
Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
...
aggr_b2    227.1GB   227.1GB    0% online       0 mcc1-a2          raid_dp, mirrored, normal...
6. Heal the root aggregates by using the metrocluster heal -phase root-aggregates
command.
mcc1A::> metrocluster heal -phase root-aggregates
[Job 137] Job succeeded: Heal Root Aggregates is successful
If the healing is vetoed, you have the option of reissuing the metrocluster heal command with the
-override-vetoes parameter. If you use this optional parameter, the system overrides any soft vetoes that
prevent the healing operation.
7. Verify that the heal operation is complete by using the metrocluster operation show command
on the destination cluster:
mcc1A::> metrocluster operation show
Operation: heal-root-aggregates
State: successful
Start Time: 7/29/2016 20:54:41
End Time: 7/29/2016 20:54:42
Errors: -
8. On the impaired controller module, disconnect the power supplies.
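The verification in steps 4 and 7 amounts to reading the Operation and State fields from the metrocluster operation show output. A minimal sketch of that check is shown here, assuming the console output has been captured as text in the "Field: value" layout of the examples above; the function names parse_operation_show and heal_succeeded are hypothetical.

```python
def parse_operation_show(output: str) -> dict:
    """Parse 'Field: value' lines from captured metrocluster operation show output."""
    fields = {}
    for line in output.splitlines():
        if "::>" in line:
            continue  # skip the prompt line that echoes the command
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def heal_succeeded(output: str, expected_operation: str) -> bool:
    """Return True if the expected heal operation is reported as successful."""
    fields = parse_operation_show(output)
    return (fields.get("Operation") == expected_operation
            and fields.get("State") == "successful")


if __name__ == "__main__":
    sample = """controller_A_1::> metrocluster operation show
Operation: heal-aggregates
State: successful
Start Time: 7/25/2016 18:45:55
End Time: 7/25/2016 18:45:56
Errors: -"""
    print(heal_succeeded(sample, "heal-aggregates"))
```

Splitting on the first colon only keeps time stamps such as 18:45:55 intact in the value.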
Replacing the controller module hardware
To replace the controller module hardware, you must remove the impaired node, move FRU components to the
replacement controller module, install the replacement controller module in the chassis, and then boot the
system to Maintenance mode.
Opening the system
To replace the controller module, you must first remove the old controller module from the chassis.
Procedure
1. If you are not already grounded, properly ground yourself.
2. Loosen the hook and loop strap binding the cables to the cable management device, and then unplug the
system cables and SFPs (if needed) from the controller module, keeping track of where the cables were
connected.
Leave the cables in the cable management device so that when you reinstall the cable management
device, the cables are organized.
3. Remove and set aside the cable management devices from the left and right sides of the controller
module.
4. If you left the SFP modules in the system after removing the cables, move them to the new controller
module.
5. Squeeze the latch on the cam handle until it releases, open the cam handle fully to release the controller
module from the midplane, and then, using two hands, pull the controller module out of the chassis.
6. Turn the controller module over and place it on a flat, stable surface.
7. Open the cover by sliding in the blue tabs to release the cover, and then swing the cover up and open.
Moving the NVMEM battery
To move the NVMEM battery from the old controller module to the new controller module, you must perform a
specific sequence of steps.
Procedure
1. Check the NVMEM LED:
• If your system is in an HA configuration, go to the next step.
• If your system is in a stand-alone configuration, cleanly shut down the controller module, and then
check the NVRAM LED identified by the NV icon.
Attention: The NVRAM LED blinks while destaging contents to the flash memory when you
halt the system. After the destage is complete, the LED turns off.
• If power is lost without a clean shutdown, the NVMEM LED flashes until the destage is
complete, and then the LED turns off.
• If the LED is on and power is on, unwritten data is stored on NVMEM.
This typically occurs during an uncontrolled shutdown after ETERNUS AX/HX Series has
successfully booted.
2. Locate the NVMEM battery in the controller module.
3. Locate the battery plug and squeeze the clip on the face of the battery plug to release the plug from the
socket, and then unplug the battery cable from the socket.
4. Grasp the battery and press the blue locking tab marked PUSH, and then lift the battery out of the holder
and controller module.
5. Move the battery to the replacement controller module.
6. Loop the battery cable around the cable channel on the side of the battery holder.
7. Position the battery pack by aligning the battery holder key ribs to the “V” notches on the sheet metal side
wall.
8. Slide the battery pack down along the sheet metal side wall until the support tabs on the side wall hook
into the slots on the battery pack, and the battery pack latch engages and clicks into the opening on the
side wall.
Moving the boot media
You must locate the boot media and follow the directions to remove it from the old controller module and
insert it in the new controller module.
Procedure
1. Locate the boot media using the following illustration or the FRU map on the controller module:
2. Press the blue button on the boot media housing to release the boot media from its housing, and then
gently pull it straight out of the boot media socket.
Note: Do not twist or pull the boot media straight up, because this could damage the socket or the boot
media.
3. Move the boot media to the new controller module, align the edges of the boot media with the socket
housing, and then gently push it into the socket.
4. Check the boot media to make sure that it is seated squarely and completely in the socket.
If necessary, remove the boot media and reseat it into the socket.
5. Push the boot media down to engage the locking button on the boot media housing.
Moving the DIMMs
To move the DIMMs, you must follow the directions to locate and move them from the old controller module
into the replacement controller module.
Before you begin
You must have the new controller module ready so that you can move the DIMMs directly from the impaired
controller module to the corresponding slots in the replacement controller module.
Procedure
1. Locate the DIMMs on your controller module.
2. Note the orientation of the DIMM in the socket so that you can insert the DIMM in the replacement
controller module in the proper orientation.
3. Eject the DIMM from its slot by slowly pushing apart the two DIMM ejector tabs on either side of the DIMM,
and then slide the DIMM out of the slot.
Attention: Carefully hold the DIMM by the edges to avoid pressure on the components on the
DIMM circuit board.
The number and placement of system DIMMs depends on the model of your system.
The following illustration shows the location of system DIMMs:
4. Repeat these steps to remove additional DIMMs as needed.
5. Verify that the NVMEM battery is not plugged into the new controller module.
6. Locate the slot where you are installing the DIMM.
7. Make sure that the DIMM ejector tabs on the connector are in the open position, and then insert the DIMM
squarely into the slot.
The DIMM fits tightly in the slot, but should go in easily. If not, realign the DIMM with the slot and reinsert
it.
Attention: Visually inspect the DIMM to verify that it is evenly aligned and fully inserted into the
slot.
8. Repeat these steps for the remaining DIMMs.
9. Locate the NVMEM battery plug socket, and then squeeze the clip on the face of the battery cable plug to
insert it into the socket.
Make sure that the plug locks down onto the controller module.
Moving a caching module, if present
If your AX2100 or HX2000 system has a caching module, you need to move the caching module from the old
controller module to the replacement controller module. The caching module is referred to as the “M.2 PCIe
card” on the controller module label.
Before you begin
You must have the new controller module ready so that you can move the caching module directly from the
old controller module to the corresponding slot in the new one. All other components in the storage system
must be functioning properly; if not, you must contact technical support.
Procedure
1. Locate the caching module at the rear of the controller module and remove it.
a) Press the release tab.
b) Remove the heatsink.
2. Gently pull the caching module straight out of the housing.
3. Move the caching module to the new controller module, and then align the edges of the caching module
with the socket housing and gently push it into the socket.
4. Verify that the caching module is seated squarely and completely in the socket.
If necessary, remove the caching module and reseat it into the socket.
5. Reseat and push the heatsink down to engage the locking button on the caching module housing.
6. Close the controller module cover, as needed.
Installing the controller
After you install the components from the old controller module into the new controller module, you must
install the new controller module into the system chassis and boot the operating system.
About this task
For HA pairs with two controller modules in the same chassis, the sequence in which you install the controller
module is especially important because it attempts to reboot as soon as you completely seat it in the chassis.
Note: The system might update system firmware when it boots. Do not abort this process. The procedure
requires you to interrupt the boot process, which you can typically do at any time after you are prompted to do so.
However, if the system updates the system firmware when it boots, you must wait until after the update is
complete before interrupting the boot process.
Procedure
1. If you are not already grounded, properly ground yourself.
2. If you have not already done so, replace the cover on the controller module.
3. Align the end of the controller module with the opening in the chassis, and then gently push the controller
module halfway into the system.
Note: Do not completely insert the controller module in the chassis until instructed to do so.
4. Cable the management and console ports only, so that you can access the system to perform the tasks in
the following sections.
Note: You will connect the rest of the cables to the controller module later in this procedure.
5. Complete the reinstallation of the controller module:
If your system is in...    Then perform these steps...

An HA pair
The controller module begins to boot as soon as it is fully seated in the chassis. Be prepared to
interrupt the boot process.
a. With the cam handle in the open position, firmly push the controller module in until it meets the
midplane and is fully seated, and then close the cam handle to the locked position.
Attention: Do not use excessive force when sliding the controller module into the chassis; you might
damage the connectors.
The controller begins to boot as soon as it is seated in the chassis.
b. If you have not already done so, reinstall the cable management device.
c. Bind the cables to the cable management device with the hook and loop strap.
d. Interrupt the boot process only after determining the correct timing:
You must look for an Automatic firmware update console message. If the update message appears, do not
press Ctrl-C to interrupt the boot process until after you see a message confirming that the update is
complete.
Only press Ctrl-C when you see the message Press Ctrl-C for Boot Menu.
Note: If the firmware update is aborted, the boot process exits to the LOADER prompt. You must run the
update_flash command and then exit LOADER and boot to Maintenance mode by pressing Ctrl-C when you see
Starting AUTOBOOT press Ctrl-C to abort.
If you miss the prompt and the controller module boots to ETERNUS AX/HX Series, enter halt, and then at
the LOADER prompt enter boot_ontap, press Ctrl-C when prompted, and then boot to Maintenance mode.
e. Select the option to boot to Maintenance mode from the displayed menu.

A stand-alone configuration
a. With the cam handle in the open position, firmly push the controller module in until it meets the
midplane and is fully seated, and then close the cam handle to the locked position.
Attention: Do not use excessive force when sliding the controller module into the chassis to avoid
damaging the connectors.
b. If you have not already done so, reinstall the cable management device.
c. Bind the cables to the cable management device with the hook and loop strap.
d. Reconnect the power cables to the power supplies and to the power sources, and then turn on the power
to start the boot process.
e. Interrupt the boot process only after determining the correct timing:
You must look for an Automatic firmware update console message. If the update message appears, do not
press Ctrl-C to interrupt the boot process until after you see a message confirming that the update is
complete.
Only press Ctrl-C after you see the Press Ctrl-C for Boot Menu message.
Note: If the firmware update is aborted, the boot process exits to the LOADER prompt. You must run the
update_flash command and then exit LOADER and boot to Maintenance mode by pressing Ctrl-C when you see
Starting AUTOBOOT press Ctrl-C to abort.
If you miss the prompt and the controller module boots to ETERNUS AX/HX Series, enter halt, and then at
the LOADER prompt enter boot_ontap, press Ctrl-C when prompted, and then boot to Maintenance mode.
f. From the boot menu, select the option for Maintenance mode.
Important: During the boot process, you might see the following prompts:
• A prompt warning of a system ID mismatch and asking to override the system ID.
• A prompt warning that when entering Maintenance mode in an HA configuration you must ensure that
the healthy node remains down.
You can safely respond y to these prompts.
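The timing rule for interrupting the boot process can be sketched as a scan over captured console lines. This is an illustration only: the function name boot_interrupt_state and the three return labels are hypothetical, and the matched strings are the console messages quoted in the procedure above. Real console text may differ in wording or case, so treat the substring checks as assumptions.

```python
def boot_interrupt_state(console_lines):
    """Classify captured boot console lines.

    Returns "press_ctrl_c" once the boot menu prompt has appeared,
    "wait_for_firmware_update" if a firmware update was announced but the
    boot menu prompt has not appeared yet, and "keep_watching" otherwise.
    """
    saw_update = False
    for line in console_lines:
        if "Press Ctrl-C for Boot Menu" in line:
            return "press_ctrl_c"
        if "Automatic firmware update" in line:
            saw_update = True
    return "wait_for_firmware_update" if saw_update else "keep_watching"


if __name__ == "__main__":
    print(boot_interrupt_state(["Automatic firmware update in progress"]))
```

The sketch never recommends Ctrl-C while an update is pending, which mirrors the instruction to wait for the boot menu prompt.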
Restoring and verifying the system configuration
After completing the hardware replacement and booting to Maintenance mode, you verify the low-level
system configuration of the replacement controller and reconfigure system settings as necessary.
Verifying and setting the HA state of the controller module
You must verify the HA state of the controller module and, if necessary, update the state to match your system
configuration.
Procedure
1. In Maintenance mode from the new controller module, verify that all components display the same HA
state: ha-config show
If your system is in...                                     The HA state for all components should be...
An HA pair                                                  ha
A MetroCluster FC configuration with four or more nodes     mcc
A two-node MetroCluster FC configuration                    mcc-2n
A MetroCluster IP configuration                             mccip
A stand-alone configuration                                 non-ha
2. If the displayed system state of the controller module does not match your system configuration, set the
HA state for the controller module: ha-config modify controller ha-state
3. If the displayed system state of the chassis does not match your system configuration, set the HA state for
the chassis: ha-config modify chassis ha-state
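The table and the two corrective steps can be summarized in a short sketch that maps a configuration to its expected HA state and lists which ha-config modify commands would be needed. The configuration keys (for example "ha_pair") and the function name ha_config_fixes are hypothetical labels; the HA state values and command forms are taken from the procedure above.

```python
# Expected HA state per configuration, as listed in the table above.
# The dictionary keys are hypothetical labels for the table rows.
EXPECTED_HA_STATE = {
    "ha_pair": "ha",
    "metrocluster_fc_four_or_more_nodes": "mcc",
    "metrocluster_fc_two_node": "mcc-2n",
    "metrocluster_ip": "mccip",
    "stand_alone": "non-ha",
}


def ha_config_fixes(configuration, controller_state, chassis_state):
    """Return the ha-config modify commands needed to match the configuration."""
    expected = EXPECTED_HA_STATE[configuration]
    commands = []
    if controller_state != expected:
        commands.append(f"ha-config modify controller {expected}")
    if chassis_state != expected:
        commands.append(f"ha-config modify chassis {expected}")
    return commands


if __name__ == "__main__":
    # Controller shows non-ha in an HA pair; chassis is already correct.
    print(ha_config_fixes("ha_pair", "non-ha", "ha"))
```

An empty list means that the states reported by ha-config show already match the configuration.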
Running system-level diagnostics
You should run comprehensive or focused diagnostic tests for specific components and subsystems whenever
you replace the controller.
About this task
All commands in the diagnostic procedures are issued from the node where the component is being replaced.
Procedure
1. If the node to be serviced is not at the LOADER prompt, reboot the node: halt
After you issue the command, you should wait until the system stops at the LOADER prompt.
2. At the LOADER prompt, access the special drivers specifically designed for system-level diagnostics to
function properly: boot_diags
During the boot process, you can safely respond y to the prompts until the Maintenance mode prompt
(*>) appears.
3. Display and note the available devices on the controller module: sldiag device show -dev mb
The controller module devices and ports displayed can be any one or more of the following:
• bootmedia is the system booting device.
• cna is a Converged Network Adapter or interface not connected to a network or storage device.
• fcal is a Fibre Channel-Arbitrated Loop device not connected to a Fibre Channel network.
• env is motherboard environmentals.
• mem is system memory.
• nic is a network interface card.
• nvram is nonvolatile RAM.
• nvmem is a hybrid of NVRAM and system memory.
• sas is a Serial Attached SCSI device not connected to a disk shelf.
4. Run diagnostics as desired.
If you want to run diagnostic tests on...    Then...

Individual components
a. Clear the status logs: sldiag device clearstatus
b. Display the available tests for the selected devices: sldiag device show -dev dev_name
dev_name can be any one of the ports and devices identified in the preceding step.
c. Examine the output and, if applicable, select only the tests that you want to run: sldiag device
modify -dev dev_name -selection only
-selection only disables all other tests that you do not want to run for the device.
d. Run the selected tests: sldiag device run -dev dev_name
After the test is complete, the following message is displayed:
*> <SLDIAG:_ALL_TESTS_COMPLETED>
e. Verify that no tests failed: sldiag device status -dev dev_name -long -state failed
System-level diagnostics returns you to the prompt if there are no test failures, or lists the full
status of failures resulting from testing the component.

Multiple components at the same time
a. Review the enabled and disabled devices in the output from the preceding procedure and determine
which ones you want to run concurrently.
b. List the individual tests for the device: sldiag device show -dev dev_name
c. Examine the output and, if applicable, select only the tests that you want to run: sldiag device
modify -dev dev_name -selection only
-selection only disables all other tests that you do not want to run for the device.
d. Verify that the tests were modified: sldiag device show
e. Repeat these substeps for each device that you want to run concurrently.
f. Run diagnostics on all of the devices: sldiag device run
Attention: Do not add to or modify your entries after you start running diagnostics.
After the test is complete, the following message is displayed:
*> <SLDIAG:_ALL_TESTS_COMPLETED>
g. Verify that there are no hardware problems on the node: sldiag device status -long -state failed
System-level diagnostics returns you to the prompt if there are no test failures, or lists the full
status of failures resulting from testing the component.
5. Proceed based on the result of the preceding step.
If the system-level diagnostics tests...    Then...

Were completed without any failures
a. Clear the status logs: sldiag device clearstatus
b. Verify that the log was cleared: sldiag device status
The following default response is displayed:
SLDIAG: No log messages are present.
c. Exit Maintenance mode: halt
The system displays the LOADER prompt.
You have completed system-level diagnostics.

Resulted in some test failures
Determine the cause of the problem.
a. Exit Maintenance mode: halt
b. Perform a clean shutdown, and then disconnect the power supplies.
c. Verify that you have observed all of the considerations identified for running system-level
diagnostics, that cables are securely connected, and that hardware components are properly installed in
the storage system.
d. Reconnect the power supplies, and then power on the storage system.
e. Rerun the system-level diagnostics test.
Completing system restoration
To complete the replacement procedure and restore your system to full operation, you must recable the
storage, confirm disk reassignment, restore the Storage Encryption configuration (if necessary), and install
licenses for the new controller. You must complete a series of tasks before restoring your system to full
operation.
Recabling the system
After running diagnostics, you must recable the controller module's storage and network connections.
Procedure
1. Recable the system.
2. Verify that the cabling is correct by using Config Advisor.
a) Download and install Config Advisor from the Fujitsu Support Site.
b) Enter the information for the target system, and then click Collect Data.
c) Click the Cabling tab, and then examine the output.
Make sure that all disk shelves are displayed and all disks appear in the output, correcting any cabling
issues you find.
d) Check other cabling by clicking the appropriate tab, and then examining the output from Config
Advisor.
Reassigning disks
If the storage system is in an HA pair, the system ID of the new controller module is automatically assigned to
the disks when the giveback occurs at the end of the procedure. In a stand-alone system, you must manually
reassign the ID to the disks.
About this task
You must use the correct procedure for your configuration:
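Table 1:
Controller redundancy                  Then use this procedure...
HA pair                                Verifying the system ID change on an HA system
Stand-alone                            Manually reassigning the system ID on a stand-alone system in
                                       ETERNUS AX/HX Series
Two-node MetroCluster configuration    Manually reassigning the system ID on systems in a two-node
                                       MetroCluster configuration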
Verifying the system ID change on an HA system
You must confirm the system ID change when you boot the replacement node and then verify that the change
was implemented.
About this task
This procedure applies only to systems running ETERNUS AX/HX Series in an HA pair.
Procedure
1. If the replacement node is in Maintenance mode (showing the *> prompt), exit Maintenance mode and go
to the LOADER prompt: halt
2. From the LOADER prompt on the replacement node, boot the node, entering y if you are prompted to
override the system ID due to a system ID mismatch: boot_ontap
3. Wait until the Waiting for giveback... message is displayed on the replacement node console
and then, from the healthy node, verify that the new partner system ID has been automatically assigned:
storage failover show
In the command output, you should see a message that the system ID has changed on the impaired node,
showing the correct old and new IDs. In the following example, node2 has undergone replacement and
has a new system ID of 151759706.
node1> storage failover show
                              Takeover
Node           Partner        Possible  State Description
------------   ------------   --------  -------------------------------------
node1          node2          false     System ID changed on partner (Old:
                                        151759755, New: 151759706), In takeover
node2          node1          -         Waiting for giveback (HA mailboxes)
4. From the healthy node, verify that any coredumps are saved:
a) Change to the advanced privilege level: set -privilege advanced
You can respond Y when prompted to continue into advanced mode. The advanced mode prompt
appears (*>).
b) Save any coredumps: system node run -node local-node-name partner savecore
c) Wait for the savecore command to complete before issuing the giveback.
You can enter the following command to monitor the progress of the savecore command: system
node run -node local-node-name partner savecore -s
d) Return to the admin privilege level: set -privilege admin
5. Give back the node:
a) From the healthy node, give back the replaced node's storage: storage failover giveback
-ofnode replacement_node_name
The replacement node takes back its storage and completes booting.
If you are prompted to override the system ID due to a system ID mismatch, you should enter y.
Note: If the giveback is vetoed, you can consider overriding the vetoes.
b) After the giveback has been completed, confirm that the HA pair is healthy and that takeover is
possible: storage failover show
The output from the storage failover show command should not include the System ID
changed on partner message.
6. Verify that the disks were assigned correctly: storage disk show -ownership
The disks belonging to the replacement node should show the new system ID. In the following example,
the disks owned by node1 now show the new system ID, 1873775277:
node1> storage disk show -ownership
Disk   Aggregate  Home   Owner  DR Home  Home ID     Owner ID    DR Home ID  Reserver    Pool
-----  ---------  -----  -----  -------  ----------  ----------  ----------  ----------  -----
1.0.0  aggr0_1    node1  node1  -        1873775277  1873775277  -           1873775277  Pool0
1.0.1  aggr0_1    node1  node1           1873775277  1873775277  -           1873775277  Pool0
.
.
.
7. If the system is in a MetroCluster configuration, monitor the status of the node: metrocluster node
show
The MetroCluster configuration takes a few minutes after the replacement to return to a normal state, at
which time each node will show a configured state, with DR Mirroring enabled and a mode of normal. The
metrocluster node show -fields node-systemid command output displays the old system
ID until the MetroCluster configuration returns to a normal state.
8. If the node is in a MetroCluster configuration, depending on the MetroCluster state, verify that the DR
home ID field shows the original owner of the disk if the original owner is a node on the disaster site.
This is required if both of the following are true:
• The MetroCluster configuration is in a switchover state.
• The replacement node is the current owner of the disks on the disaster site.
9. If your system is in a MetroCluster configuration, verify that each node is configured: metrocluster
node show -fields configuration-state
node1_siteA::> metrocluster node show -fields configuration-state
dr-group-id  cluster      node          configuration-state
-----------  -----------  ------------  -------------------
1            node1_siteA  node1mcc-001  configured
1            node1_siteA  node1mcc-002  configured
1            node1_siteB  node1mcc-003  configured
1            node1_siteB  node1mcc-004  configured
4 entries were displayed.
10. Verify that the expected volumes are present for each node: vol show -node node-name
11. If you disabled automatic takeover on reboot, enable it from the healthy node: storage failover
modify -node replacement-node-name -onreboot true
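The check in step 3 (and its counterpart after giveback in step 5) can be sketched as a pattern match on captured storage failover show output. This is an illustration under the assumption that the message has the form shown in the example above, possibly wrapped across lines; the function name find_system_id_change is hypothetical.

```python
import re

# Matches the message form shown in the example output of step 3.
_ID_CHANGE = re.compile(
    r"System ID changed on partner\s*\(Old:\s*(\d+),\s*New:\s*(\d+)\)"
)


def find_system_id_change(output):
    """Return (old_id, new_id) if the system ID change message is present, else None."""
    # Collapse line breaks and column padding so a wrapped message still matches.
    flattened = " ".join(output.split())
    match = _ID_CHANGE.search(flattened)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    sample = """node1  node2  false  System ID changed on partner (Old:
                      151759755, New: 151759706), In takeover
node2  node1  -      Waiting for giveback (HA mailboxes)"""
    print(find_system_id_change(sample))
```

Before giveback, a result other than None confirms that the new ID was picked up; after giveback, None is the expected result because the message should no longer appear.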
Manually reassigning the system ID on a stand-alone system in ETERNUS AX/HX Series
In a stand-alone system, you must manually reassign disks to the new controller's system ID before you return
the system to normal operating condition.
About this task
This procedure applies only to systems that are in a stand-alone configuration.
Procedure
1. If you have not already done so, reboot the replacement node, interrupt the boot process by pressing
Ctrl-C, and then select the option to boot to Maintenance mode from the displayed menu.
You must enter Y when prompted to override the system ID due to a system ID mismatch.
2. View the system IDs: disk show -a
You should make a note of the old system ID, which is displayed as part of the disk owner column.
The following example shows the old system ID of 118073209:
*> disk show -a
Local System ID: 118065481
DISK      OWNER                POOL  SERIAL NUMBER HOME
--------  -------------------- ----- ------------- --------------------
disk_name system-1 (118073209) Pool0 J8XJE9LC      system-1 (118073209)
disk_name system-1 (118073209) Pool0 J8Y478RC      system-1 (118073209)
.
.
.
3. Boot the node: boot_ontap
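Picking the old system ID out of the disk show -a output in step 2 can be sketched in Python as follows. This is a minimal illustration, assuming the console output was captured to text and that owner IDs appear in parentheses as in the example; the function name old_system_ids and the SAMPLE text are hypothetical.

```python
import re
from typing import Set


def old_system_ids(output: str) -> Set[str]:
    """Return every system ID in the output that differs from the Local System ID."""
    local = re.search(r"Local System ID:\s*(\d+)", output)
    if local is None:
        raise ValueError("no 'Local System ID' line found in the output")
    local_id = local.group(1)
    # Owner and home columns show the ID in parentheses, e.g. "system-1 (118073209)".
    ids = set(re.findall(r"\((\d+)\)", output))
    return ids - {local_id}


# Hypothetical capture, modeled on the example in step 2.
SAMPLE = """\
*> disk show -a
Local System ID: 118065481
DISK      OWNER                POOL  SERIAL NUMBER HOME
--------  -------------------- ----- ------------- --------------------
disk_name system-1 (118073209) Pool0 J8XJE9LC      system-1 (118073209)
disk_name system-1 (118073209) Pool0 J8Y478RC      system-1 (118073209)
"""

if __name__ == "__main__":
    print(old_system_ids(SAMPLE))
```

If the function returns more than one ID, the output includes disks owned by more than one other system, and the IDs should be reviewed by hand rather than assumed.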
Manually reassigning the system ID on systems in a two-node MetroCluster configuration
In a two-node MetroCluster configuration running ETERNUS AX/HX Series, you must manually reassign disks to
the new controller's system ID before you return the system to normal operating condition.
About this task
This procedure applies only to systems in a two-node MetroCluster configuration running ETERNUS AX/HX
Series.
You must be sure to issue the commands in this procedure on the correct node:
• The impaired node is the node on which you are performing maintenance.
• The replacement node is the new node that replaced the impaired node as part of this procedure.
• The healthy node is the DR partner of the impaired node.
Procedure
1. If you have not already done so, reboot the replacement node, interrupt the boot process by entering
Ctrl-C, and then select the option to boot to Maintenance mode from the displayed menu.
You must enter Y when prompted to override the system ID due to a system ID mismatch.
2. View the old system IDs from the healthy node: metrocluster node show -fields node-systemid,dr-partner-systemid
In this example, Node_B_1 is the old node, with the old system ID of 118073209:
dr-group-id cluster   node     node-systemid dr-partner-systemid
----------- --------- -------- ------------- -------------------
1           Cluster_A Node_A_1 536872914     118073209
1           Cluster_B Node_B_1 118073209     536872914
2 entries were displayed.
3. View the new system ID at the Maintenance mode prompt on the impaired node: disk show
In this example, the new system ID is 118065481:
Local System ID: 118065481
...
...
4. Reassign disk ownership (for ETERNUS HX systems) or LUN ownership (for FlexArray systems), by using the
system ID information obtained from the disk show command: disk reassign -s old system ID
In the case of the preceding example, the command is: disk reassign -s 118073209
You can respond Y when prompted to continue.
5. Verify that the disks (or FlexArray LUNs) were assigned correctly: disk show -a
Verify that the disks belonging to the replacement node show the new system ID for the replacement
node. In the following example, the disks owned by system-1 now show the new system ID, 118065481:
*> disk show -a
Local System ID: 118065481
DISK      OWNER                POOL  SERIAL NUMBER HOME
-------   -------------------- ----- ------------- --------------------
disk_name system-1 (118065481) Pool0 J8Y0TDZC      system-1 (118065481)
disk_name system-1 (118065481) Pool0 J8Y09DXC      system-1 (118065481)
.
.
.
6. From the healthy node, verify that any coredumps are saved:
a) Change to the advanced privilege level: set -privilege advanced
You can respond Y when prompted to continue into advanced mode. The advanced mode prompt
appears (*>).
b) Verify that the coredumps are saved: system node run -node local-node-name partner savecore
If the command output indicates that savecore is in progress, wait for savecore to complete before
issuing the giveback. You can monitor the progress of the savecore using the system node run -node
local-node-name partner savecore -s command.
c) Return to the admin privilege level: set -privilege admin
7. If the replacement node is in Maintenance mode (showing the *> prompt), exit Maintenance mode and
go to the LOADER prompt: halt
8. Boot the replacement node: boot_ontap
9. After the replacement node has fully booted, perform a switchback: metrocluster switchback
10. Verify the MetroCluster configuration: metrocluster node show -fields configuration-state
node1_siteA::> metrocluster node show -fields configuration-state
dr-group-id cluster     node         configuration-state
----------- ----------- ------------ -------------------
1           node1_siteA node1mcc-001 configured
1           node1_siteA node1mcc-002 configured
1           node1_siteB node1mcc-003 configured
1           node1_siteB node1mcc-004 configured
4 entries were displayed.
11. Verify the operation of the MetroCluster configuration in Data ONTAP:
a) Check for any health alerts on both clusters: system health alert show
b) Confirm that the MetroCluster is configured and in normal mode: metrocluster show
c) Perform a MetroCluster check: metrocluster check run
d) Display the results of the MetroCluster check: metrocluster check show
e) Run Config Advisor. Go to the Config Advisor page on the Fujitsu Support Site at http://www.fujitsu.com/
After running Config Advisor, review the tool's output and follow the recommendations in the output to
address any issues discovered.
12. Simulate a switchover operation:
a) From any node's prompt, change to the advanced privilege level: set -privilege advanced
You need to respond with y when prompted to continue into advanced mode and see the advanced
mode prompt (*>).
b) Perform the switchover operation with the -simulate parameter: metrocluster switchover -simulate
c) Return to the admin privilege level: set -privilege admin
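Steps 2 and 4 of this procedure derive the disk reassign argument from the metrocluster node show output. The lookup can be sketched in Python as follows. This is an illustration under stated assumptions: the output was captured as text with five whitespace-separated columns per row (dr-group-id, cluster, node, node-systemid, dr-partner-systemid), and the function name reassign_command and the SAMPLE text are hypothetical.

```python
def reassign_command(output: str, old_node: str) -> str:
    """Build the 'disk reassign -s' command for the named old node.

    Assumption: each data row has five whitespace-separated fields:
    dr-group-id, cluster, node, node-systemid, dr-partner-systemid.
    """
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a numeric dr-group-id; headers and rules do not.
        if len(fields) == 5 and fields[0].isdigit() and fields[2] == old_node:
            system_id = fields[3]
            if not system_id.isdigit():
                raise ValueError(f"unexpected system ID {system_id!r}")
            return f"disk reassign -s {system_id}"
    raise LookupError(f"node {old_node!r} not found in the output")


# Hypothetical capture, modeled on the example in step 2.
SAMPLE = """\
dr-group-id cluster   node     node-systemid dr-partner-systemid
----------- --------- -------- ------------- -------------------
1           Cluster_A Node_A_1 536872914     118073209
1           Cluster_B Node_B_1 118073209     536872914
2 entries were displayed.
"""

if __name__ == "__main__":
    print(reassign_command(SAMPLE, "Node_B_1"))
```

The sketch only builds the command string; it does not issue it. The command itself must still be entered at the Maintenance mode prompt as described in step 4.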
Installing licenses for the replacement node in ETERNUS AX/HX Series
You must install new licenses for the replacement node if the impaired node was using ETERNUS AX/HX Series
features that require a standard (node-locked) license. For features with standard licenses, each node in the
cluster should have its own key for the feature.
About this task
Until you install license keys, features requiring standard licenses continue to be available to the replacement
node. However, if the impaired node was the only node in the cluster with a license for the feature, no
configuration changes to the feature are allowed. Also, using unlicensed features on the node might put you
out of compliance with your license agreement, so you should install the replacement license key or keys on
the replacement node as soon as possible.
The license keys must be in the 28-character format.
You have a 90-day grace period in which to install the license keys. After the grace period, all old licenses are
invalidated. After a valid license key is installed, you have 24 hours to install all of the keys before the grace
period ends.
Note: If the node is in a MetroCluster configuration and all nodes at a site have been replaced (a single node
in the case of a two-node MetroCluster configuration), license keys must be installed on the replacement node
or nodes prior to switchback.
Procedure
1. If you need new license keys, obtain replacement license keys on the Fujitsu Support Site in the My
Support section under Software licenses.
Note: The new license keys that you require are automatically generated and sent to the email address
on file. If you fail to receive the email with the license keys within 30 days, you should contact technical
support.
2. Install each license key: system license add -license-code license-key, license-key...
3. Remove the old licenses, if desired:
a) Check for unused licenses: license clean-up -unused -simulate
b) If the list looks correct, remove the unused licenses: license clean-up -unused
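The 28-character key format and the comma-separated key list in step 2 can be checked before the command is entered. The following Python sketch is illustrative only. It validates key length alone (the manual states only the length, so no character set is assumed) and joins the keys with commas and no spaces, which is an assumption about how the list should be typed; the function name license_add_command is hypothetical.

```python
from typing import Iterable

KEY_LENGTH = 28  # from the manual: keys must be in the 28-character format


def license_add_command(keys: Iterable[str]) -> str:
    """Validate key lengths and build a 'system license add' command line."""
    cleaned = [key.strip() for key in keys]
    if not cleaned:
        raise ValueError("at least one license key is required")
    for key in cleaned:
        if len(key) != KEY_LENGTH:
            raise ValueError(
                f"license key {key!r} has {len(key)} characters, expected {KEY_LENGTH}"
            )
    # Assumption: keys are joined with commas and no intervening spaces.
    return "system license add -license-code " + ",".join(cleaned)


if __name__ == "__main__":
    # Placeholder keys for illustration only; these are not real license keys.
    placeholder_a = "A" * KEY_LENGTH
    placeholder_b = "B" * KEY_LENGTH
    print(license_add_command([placeholder_a, placeholder_b]))
```

A key that was truncated when copied from the email raises an error here instead of being rejected later at the cluster prompt.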
Restoring Storage and Volume Encryption functionality
After replacing the controller module or NVRAM module for a storage system that you previously configured
to use Storage or Volume Encryption, you must perform additional steps to provide uninterrupted Encryption
functionality. You can skip this task on storage systems that do not have Storage or Volume Encryption
enabled.
Procedure
Restore Storage or Volume Encryption functionality by using the appropriate procedure in the Encryption
Power Guide.
Use one of the following procedures, depending on whether you are using onboard or external key
management:
• “Restoring onboard key management encryption keys”
• “Restoring external key management encryption keys”
Verifying LIFs and registering the serial number
Before returning the replacement node to service, you should verify that the LIFs are on their home ports,
register the serial number of the replacement node if AutoSupport is enabled, and reset automatic giveback.
Procedure
1. Verify that the logical interfaces are reporting to their home server and ports: network interface
show -is-home false
If any LIFs are listed as false, revert them to their home ports: network interface revert *
2. Register the system serial number with Fujitsu Support.
If...                        Then...
AutoSupport is enabled       Send an AutoSupport message to register the serial number.
AutoSupport is not enabled   Call Fujitsu Support to register the serial number.
3. If automatic giveback was disabled, reenable it: storage failover modify -node local -auto-giveback true
Switching back aggregates in a two-node MetroCluster configuration
After you have completed the FRU replacement in a two-node MetroCluster configuration, you can perform
the MetroCluster switchback operation. This returns the configuration to its normal operating state, with the
sync-source storage virtual machines (SVMs) on the formerly impaired site now active and serving data from
the local disk pools.
About this task
This task only applies to two-node MetroCluster configurations.
Procedure
1. Verify that all nodes are in the enabled state: metrocluster node show
cluster_B::> metrocluster node show
DR                           Configuration  DR
Group Cluster Node           State          Mirroring Mode
----- ------- -------------- -------------- --------- --------------------
1     cluster_A
              controller_A_1 configured     enabled   heal roots completed
      cluster_B
              controller_B_1 configured     enabled   waiting for switchback recovery
2 entries were displayed.
2. Verify that resynchronization is complete on all SVMs: metrocluster vserver show
3. Verify that any automatic LIF migrations being performed by the healing operations were completed
successfully: metrocluster check lif show
4. Perform the switchback by using the metrocluster switchback command from any node in the
surviving cluster.
5. Verify that the switchback operation has completed: metrocluster show
The switchback operation is still running when a cluster is in the waiting-for-switchback state:
cluster_B::> metrocluster show
Cluster              Configuration State Mode
-------------------- ------------------- ---------
 Local: cluster_B    configured          switchover
Remote: cluster_A    configured          waiting-for-switchback
The switchback operation is complete when the clusters are in the normal state:
cluster_B::> metrocluster show
Cluster              Configuration State Mode
-------------------- ------------------- ---------
 Local: cluster_B    configured          normal
Remote: cluster_A    configured          normal
If a switchback is taking a long time to finish, you can check on the status of in-progress baselines by using
the metrocluster config-replication resync-status show command.
6. Reestablish any SnapMirror or SnapVault configurations.
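The completion test in step 5 (both clusters in the normal mode) can be expressed as a small check on captured metrocluster show output. The following Python sketch is illustrative only. It assumes each cluster row has the form "Local:" or "Remote:", cluster name, configuration state, and mode, as in the examples above; the function names cluster_modes and switchback_complete and the sample texts are hypothetical.

```python
import re
from typing import Dict

# Assumed row shape: "<Local|Remote>: <cluster> <configuration state> <mode>"
ROW = re.compile(r"\s*(Local|Remote):\s+(\S+)\s+(\S+)\s+(\S+)\s*$")


def cluster_modes(output: str) -> Dict[str, str]:
    """Map each cluster name to the mode reported in the output."""
    modes = {}
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            _role, cluster, _state, mode = match.groups()
            modes[cluster] = mode
    return modes


def switchback_complete(output: str) -> bool:
    """True only if at least one cluster is listed and all are in 'normal' mode."""
    modes = cluster_modes(output)
    return bool(modes) and all(mode == "normal" for mode in modes.values())


# Hypothetical captures, modeled on the two examples in step 5.
RUNNING = """\
Cluster              Configuration State Mode
-------------------- ------------------- ---------
 Local: cluster_B    configured          switchover
Remote: cluster_A    configured          waiting-for-switchback
"""

DONE = """\
Cluster              Configuration State Mode
-------------------- ------------------- ---------
 Local: cluster_B    configured          normal
Remote: cluster_A    configured          normal
"""

if __name__ == "__main__":
    print(switchback_complete(RUNNING), switchback_complete(DONE))
```

Treating empty or unparseable output as "not complete" is a deliberate choice in this sketch, so that a failed capture is never mistaken for a finished switchback.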
Completing the replacement process
After you replace the part, you can return the failed part to Fujitsu, as described in the RMA instructions.
Contact technical support at 00-800-44-638277 (Europe) or +800-800-80-800 (Asia/Pacific) if you need the
RMA number or additional help with the replacement procedure.
Copyright
Copyright 2020 FUJITSU LIMITED. All rights reserved.
No part of this document covered by copyright may be reproduced in any form or by any means - graphic,
electronic, or mechanical, including photocopying, recording, taping, or storage in an electronic retrieval
system - without prior written permission of the copyright owner.
Software derived from copyrighted Fujitsu material is subject to the following license and disclaimer:
THIS SOFTWARE IS PROVIDED BY FUJITSU "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE, WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL FUJITSU BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Fujitsu reserves the right to change any products described herein at any time, and without notice. Fujitsu
assumes no responsibility or liability arising from the use of products described herein, except as expressly
agreed to in writing by Fujitsu. The use or purchase of this product does not convey a license under any patent
rights, trademark rights, or any other intellectual property rights of Fujitsu.
How to send comments about documentation and receive update notifications
The latest version of this document and the latest information related to this device are available at the
following site.
If necessary, refer to the manuals for your model.
Fujitsu Storage ETERNUS HX2000 and AX2100 systems
Replacing the controller module
A3CA08733-A305-01
Date of issuance: April 2020
Issuance responsibility: FUJITSU LIMITED
• The content of this manual is subject to change without notice.
• This manual was prepared with the utmost attention to detail.
However, Fujitsu shall assume no responsibility for any operational problems as the result of errors, omissions, or the
use of information in this manual.
• Fujitsu assumes no liability for damages to third party copyrights or other rights arising from the use of any information
in this manual.
• The content of this manual may not be reproduced or distributed in part or in its entirety without prior permission from
Fujitsu.