T3B and Sun StorEdge 6120 arrays may go down unexpectedly


Problem


Firmware version 2.1.4 (and later) for Sun StorEdge T3B arrays, firmware version 3.0.0 (and later) for the Sun StorEdge 6120, baseline firmware 2.3.2 (and later) for the Sun StorEdge 3910/3960/6910/6960, baseline firmware 1.1 (and later) for the Sun StorEdge 6320, and baseline firmware 2.0.3 (and later) for the Sun StorEdge 6920 are subject to the following issue, which could affect array availability and possibly data integrity:
These arrays may go down unexpectedly and lose host connectivity for several minutes if the array has run continuously for 994 days without a complete power cycle.  Data may be inaccessible, with a possible loss of data integrity.

Contributing Factors


This issue can occur on the following platforms:
·      Sun StorEdge T3B with firmware 2.1.4 or later
·      Sun StorEdge 6120 with firmware 3.0.0 or later
·      Sun StorEdge 3910/3960/6910/6960 with baseline firmware 2.3.2 or later
·      Sun StorEdge 6320 with baseline firmware 1.1 or later
·      Sun StorEdge 6920 with baseline firmware 2.0.3 or later
To determine the firmware revision on one of these systems, run the 'ver' command directly on the T3B or 6120:
6120:/:<1>ver
6120 Release 3.1.6 Thu Feb  3 16:48:03 PST 2005 (10.16.10.131)
Copyright (C) 1997-2003 Sun Microsystems, Inc., All Rights Reserved
On the 3910, 3960, 6910, 6960, 6320 and 6920, open a telnet connection to the internal T3B or 6120 array and run 'ver' there.
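The version check can also be scripted once the 'ver' output has been captured. The following Python sketch is illustrative only: it assumes the banner always has the shape "<model> Release <x.y.z> ..." as in the sample above (the T3B banner format is an assumption, since no T3B sample is shown here), and the names AFFECTED_FROM, parse_ver and is_affected are hypothetical helpers, not part of any Sun tool.

```python
import re

# Lowest affected firmware for the two models on which 'ver' is run directly,
# taken from the platform list above.
AFFECTED_FROM = {"T3B": (2, 1, 4), "6120": (3, 0, 0)}

# Assumed banner shape, based on the sample output: "<model> Release <x.y.z> ..."
BANNER = re.compile(r"^(\S+) Release (\d+(?:\.\d+)*)", re.MULTILINE)


def parse_ver(output):
    """Return (model, version tuple) from captured 'ver' output, or None."""
    match = BANNER.search(output)
    if match is None:
        return None
    version = tuple(int(part) for part in match.group(2).split("."))
    return match.group(1), version


def is_affected(output):
    """True/False for a known model, None if the banner or model is unknown."""
    parsed = parse_ver(output)
    if parsed is None or parsed[0] not in AFFECTED_FROM:
        return None
    model, version = parsed
    # Tuples compare element by element, so (3, 1, 6) >= (3, 0, 0) is True.
    return version >= AFFECTED_FROM[model]


sample = """6120 Release 3.1.6 Thu Feb  3 16:48:03 PST 2005 (10.16.10.131)
Copyright (C) 1997-2003 Sun Microsystems, Inc., All Rights Reserved"""
print(parse_ver(sample), is_affected(sample))
```

A result of None means the script could not recognize the banner, in which case the revision should be read manually.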

Symptoms


If this issue occurs, the array log may show events similar to the following:
22709 Apr 22 19:46:27 array00 ISR1 [1]: W: ISP2200 [1] LOOP DOWN detected.
...
22762 Apr 22 19:51:46 array00 LPCT[2]: N: u2d13 Bypassed on loop 2
22763 Apr 22 19:51:46 array00 LPCT[2]: N: u2d14 Bypassed on loop 2
22764 Apr 22 19:51:51 array00 ROOT[2]: N: Initializing loop 1 ISP2200 ... firmware status = 3
22765 Apr 22 19:51:51 array00 ROOT[2]: N: Detected 15 FC-AL ports on loop 1
22766 Apr 22 19:51:51 array00 ROOT[2]: N: loop 1 TARGET_ID = 0xf (ALPA = 0xce)
22767 Apr 22 19:52:18 array00 ROOT[2]: N: Initializing loop 2 ISP2200 ... firmware status = 3
22768 Apr 22 19:52:18 array00 ROOT[2]: N: Detected 29 FC-AL ports on loop 2
22769 Apr 22 19:52:18 array00 ROOT[2]: N: loop 2 TARGET_ID = 0xf (ALPA = 0xce)
22770 Apr 22 19:53:05 array00 ROOT[2]: N: u2ctr found 28 disks in the system
22771 Apr 22 19:53:24 array00 ROOT[2]: N: 6120 Release 3.2.6 Mon Feb  5 02:26:22 MST 2007 (192.168.0.40)
22772 Apr 22 19:53:24 array00 ROOT[2]: N: u2ctr Reset (3000) lpc_hbt.c line 290, Assert(0) => 0
Note: Although the event "uXctr Reset (3000) lpc_hbt.c line xxx, Assert(0) => 0" is a good indicator of this issue, the complete array logs should be analyzed to confirm it.
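As a first pass before the full log analysis, the indicator event can be searched for in a saved copy of the array log. The Python sketch below is a minimal example under stated assumptions: the pattern is derived only from the sample line above, the function name find_reset_events is hypothetical, and a match is a hint rather than a confirmation.

```python
import re

# Indicator event from the note above. The controller name (u1ctr, u2ctr, ...)
# and the source line number vary; the space in "Assert (0)" is treated as optional.
RESET_EVENT = re.compile(
    r"(u\d+ctr) Reset \(3000\) lpc_hbt\.c line \d+, Assert ?\(0\) => 0"
)


def find_reset_events(log_lines):
    """Return (controller, log line) pairs for every indicator event found."""
    hits = []
    for line in log_lines:
        match = RESET_EVENT.search(line)
        if match:
            hits.append((match.group(1), line.rstrip()))
    return hits


log = [
    "22709 Apr 22 19:46:27 array00 ISR1 [1]: W: ISP2200 [1] LOOP DOWN detected.",
    "22772 Apr 22 19:53:24 array00 ROOT[2]: N: u2ctr Reset (3000) lpc_hbt.c line 290, Assert(0) => 0",
]
for controller, line in find_reset_events(log):
    print(controller, "->", line)
```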

Solution


To avoid this issue, power cycle the array at least once every 994 days (the recommendation is to power cycle the array every 2 years).
Note: Executing the command 'reset' on the array is not enough to remedy this issue; a complete power cycle is required.
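To plan the maintenance window, the two dates can be derived from the date of the last complete power cycle. The following Python sketch is only an illustration: it assumes that date is known from site records, approximates "2 years" as 730 days, and uses hypothetical names (power_cycle_dates, days_remaining).

```python
from datetime import date, timedelta

HARD_LIMIT_DAYS = 994   # limit stated in this document
RECOMMENDED_DAYS = 730  # "every 2 years", approximated as 2 x 365 days (assumption)


def power_cycle_dates(last_power_cycle):
    """Return (recommended date, latest date) for the next complete power cycle."""
    recommended = last_power_cycle + timedelta(days=RECOMMENDED_DAYS)
    latest = last_power_cycle + timedelta(days=HARD_LIMIT_DAYS)
    return recommended, latest


def days_remaining(last_power_cycle, today):
    """Days left before the 994-day limit; zero or negative means it has been reached."""
    return HARD_LIMIT_DAYS - (today - last_power_cycle).days


last_cycle = date(2020, 1, 1)  # example value only
recommended, latest = power_cycle_dates(last_cycle)
print("Recommended by:", recommended.isoformat())
print("No later than: ", latest.isoformat())
print("Days remaining:", days_remaining(last_cycle, date.today()))
```

Because a 'reset' does not restart the 994-day count, the input must be the date of the last complete power cycle, not the last reset.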
Procedure for the T3B and 6120:
1.     Stop the I/O access to the array.
2.     Wait 2 min.
3.     Run 'shutdown' on the array.
4.     Power off the array.
5.     Wait 1 min.
6.     Power on the array.
7.     Resume the I/O access once you confirm that the array is up.
Procedure for the 3910, 3960, 6910 and 6960:
1.     Stop the I/O access to the array.
2.     Follow the documented procedure for this storage system to power off the array.
3.     Follow the documented procedure for this storage system to power on the array.
Procedure for the 6320:
1.     Stop the I/O access to the array.
2.     Follow the documented procedure for this storage system to power off the array.
3.     Follow the documented procedure for this storage system to power on the array.
4.     Resume the I/O access once you confirm that the array is up.
Procedure for the 6920:
1.     Stop the I/O access to the array.
2.     Follow the "Performing a Partial Shutdown" procedure (page 59) for this storage system to power off the 6920.
3.     Pull the power cables from the DSP.
4.     Follow the "Restoring the System after a Partial Shutdown" procedure (page 60) for this storage system to power on the 6920.
5.     Wait 10 min.
6.     Reinsert the power cables into the DSP.
7.     Wait 5 min.
8.     Resume the I/O access to the array.