Sun StorEdge™ 3900 and 6900 Series
Troubleshooting Guide
Sun Microsystems, Inc.
4150 Network Circle
Santa Clara, CA 95054 U.S.A.
650-960-1300
Part No. 816-4290-11
March 2002, Revision A
Send comments about this document to: [email protected]
Contents

1. Introduction

2. General Troubleshooting Procedures
   Troubleshooting Overview Tasks
   Multipathing Options in the Sun StorEdge 6900 Series
      Alternatives to Sun StorEdge Traffic Manager
      ▼ To Quiesce the I/O
      ▼ To Unconfigure the c2 Path
      ▼ To Suspend the I/O
      ▼ To Return the Path to Production
      ▼ To View the VxDisk Properties
      ▼ To Quiesce the I/O on the A3/B3 Link
      ▼ To Suspend the I/O on the A3/B3 Link
      ▼ To Return the Path to Production
   Fibre Channel Links
      Fibre Channel Link Diagrams
      Host Side Troubleshooting
      Storage Service Processor Side Troubleshooting
      Command Line Test Examples
         qlctest(1M)
         switchtest(1M)
   Storage Automated Diagnostic Environment Event Grid

3. Troubleshooting the Fibre Channel Links
   A1/B1 Fibre Channel (FC) Link
      ▼ To Verify the Data Host
      FRU Tests Available for A1/B1 FC Link Segment
      ▼ To Isolate the A1/B1 FC Link
   A2/B2 Fibre Channel (FC) Link
      ▼ To Verify the Host Side
      ▼ To Verify the A2/B2 FC Link
      FRU Tests Available for A2/B2 FC Link Segment
      ▼ To Isolate the A2/B2 FC Link
   A3/B3 Fibre Channel (FC) Link
      ▼ To Verify the Host Side
      ▼ To Verify the Storage Service Processor
      FRU Tests Available for the A3/B3 FC Link Segment
      ▼ To Isolate the A3/B3 FC Link
   A4/B4 Fibre Channel (FC) Link
      ▼ To Verify the Data Host
         Sun StorEdge 3900 Series
         Sun StorEdge 6900 Series
      ▼ To Isolate the A4/B4 FC Link

4. Configuration Settings
   Verifying Configuration Settings

5. Troubleshooting Host Devices
   Host Event Grid
      Using the Host Event Grid
   Replacing the Master, Alternate Master, and Slave Monitoring Host

6. Troubleshooting Sun StorEdge FC Switch-8 and Switch-16 Devices
   Sun StorEdge Network FC Switch-8 and Switch-16 Switch Description
   ▼ To Diagnose and Troubleshoot Switch Hardware
   Switch Event Grid
      Using the Switch Event Grid

7. Troubleshooting Virtualization Engine Devices
   Virtualization Engine Description
   Virtualization Engine Diagnostics
      Service Request Numbers
      Service and Diagnostic Codes
      ▼ To Retrieve Service Information
   CLI Interface
      ▼ To Display Log Files and Retrieve SRNs
      ▼ To Clear the Log
   Virtualization Engine LEDs
      Power LED Codes
      Interpreting LED Service and Diagnostic Codes
   Back Panel Features
      Ethernet Port LEDs
   Fibre Channel Link Error Status Report
      ▼ To Check Fibre Channel Link Error Status Manually
   Translating Host Device Names
      ▼ To Display the VLUN Serial Number
      Devices That Are Not Sun StorEdge Traffic Manager-Enabled
      Sun StorEdge Traffic Manager-Enabled Devices
      ▼ To View the Virtualization Engine Map
   ▼ To Failback the Virtualization Engine
   ▼ To Replace a Failed Virtualization Engine
   ▼ To Manually Clear the SAN Database
   ▼ To Reset the SAN Database on Both Virtualization Engines
   ▼ To Reset the SAN Database on a Single Virtualization Engine

8. Troubleshooting the Sun StorEdge T3+ Array Devices
   Explorer Data Collection Utility
      ▼ To Install Explorer Data Collection Utility on the Storage Service Processor
   Troubleshooting the T1/T2 Data Path
      Notes
      T1/T2 Notification Events
      Sun StorEdge T3+ Array Storage Service Processor Verification
      T1/T2 FRU Tests Available
      Notes
      T1/T2 Isolation Procedures
   Sun StorEdge T3+ Array Event Grid
      Using the Sun StorEdge T3+ Array Event Grid
   Conclusion

9. Troubleshooting Ethernet Hubs

setupswitch Exit Values
List of Figures

FIGURE 2-1  Sun StorEdge 3900 Series Fibre Channel Link Diagram
FIGURE 2-2  Sun StorEdge 6900 Series Fibre Channel Link Diagram
FIGURE 3-1  Data Host Notification of Intermittent Problems
FIGURE 3-2  Data Host Notification of Severe Link Error
FIGURE 3-3  Storage Service Processor Notification
FIGURE 3-4  A2/B2 FC Link Host Side Event
FIGURE 3-5  A2/B2 FC Link Storage Service Processor Side Event
FIGURE 3-6  A3/B3 FC Link Host-Side Event
FIGURE 3-7  A3/B3 FC Link Storage Service Processor-Side Event
FIGURE 3-8  A3/B3 FC Link Storage Service Processor-Side Event
FIGURE 3-9  A4/B4 FC Link Data Host Notification
FIGURE 3-10 Storage Service Processor Notification
Host Event Grid
Switch Event Grid
Virtualization Engine Front Panel LEDs
Sun StorEdge 6900 Series Logical View
Primary Data Paths to the Alternate Master
Primary Data Paths to the Master Sun StorEdge T3+ Array
Path Failure—Before the Second Tier of Switches
Path Failure—I/O Routed through Both HBAs
Virtualization Engine Event Grid
Storage Service Processor Event
Virtualization Engine Alert
Manage Configuration Files Menu
Example Link Test Text Output from the Storage Automated Diagnostic Environment
Sun StorEdge T3+ Array Event Grid
Preface
The Sun StorEdge 3900 and 6900 Series Troubleshooting Guide provides guidelines
for isolating problems in supported configurations of the Sun StorEdge™ 3900 and
6900 series. For detailed configuration information, refer to the Sun StorEdge 3900
and 6900 Series Reference Manual.
The scope of this troubleshooting guide is limited to information pertaining to the
components of the Sun StorEdge 3900 and 6900 series, including the Storage Service
Processor and the virtualization engines in the Sun StorEdge 6900 series. This guide
is written for Sun personnel who have been fully trained on all the components in
the configuration.
How This Book Is Organized
This book contains the following topics:
Chapter 2 offers general troubleshooting guidelines, such as quiescing the I/O, and
tools you can use to isolate and troubleshoot problems.
Chapter 3 provides Fibre Channel link troubleshooting procedures.
Chapter 4 presents information about configuration settings, specific to the Sun
StorEdge 3900 and 6900 series. It also provides a procedure for how to clear the lock
file.
Chapter 5 provides information on host device troubleshooting.
Chapter 6 provides information on Sun StorEdge network FC switch-8 and
switch-16 switch device troubleshooting.
Chapter 7 provides detailed information for troubleshooting the virtualization
engines.
Chapter 8 describes how to troubleshoot the Sun StorEdge T3+ array devices. Also
included in this chapter is information about the Explorer Data Collection Utility.
Chapter 9 discusses Ethernet hub troubleshooting.
Appendix A provides virtualization engine references, including SRN and SNMP
Reference, an SRN/ SNMP single point of failure table, and port communication and
service code tables.
Appendix B provides a list of SUNWsecfg Error Messages and recommendations for
corrective action.
Using UNIX Commands
This document may not contain information on basic UNIX® commands and
procedures such as shutting down the system, booting the system, and configuring
devices.
See one or more of the following for this information:
■ Solaris Handbook for Sun Peripherals
■ AnswerBook2™ online documentation for the Solaris™ operating environment
■ Other software documentation that you received with your system
Typographic Conventions
Typeface    Meaning                               Examples
AaBbCc123   The names of commands, files, and     Edit your .login file.
            directories; on-screen computer       Use ls -a to list all files.
            output                                % You have mail.
AaBbCc123   What you type, when contrasted        % su
            with on-screen computer output        Password:
AaBbCc123   Book titles, new words or terms,      Read Chapter 6 in the User's Guide.
            words to be emphasized                These are called class options.
                                                  You must be superuser to do this.
            Command-line variable; replace        To delete a file, type rm filename.
            with a real name or value
Shell Prompts
Shell                                   Prompt
C shell                                 machine_name%
C shell superuser                       machine_name#
Bourne shell and Korn shell             $
Bourne shell and Korn shell superuser   #
Related Documentation
Late-breaking news:
• Sun StorEdge 3900 and 6900 Series Release Notes (816-3247)

Sun StorEdge 3900 and 6900 series hardware information:
• Sun StorEdge 3900 and 6900 Series Site Preparation Guide (816-3242)
• Sun StorEdge 3900 and 6900 Series Regulatory and Safety Compliance Manual (816-3243)
• Sun StorEdge 3900 and 6900 Series Hardware Installation and Service Manual (816-3244)

Sun StorEdge T3 and T3+ array:
• Sun StorEdge T3 and T3+ Array Start Here (816-0772)
• Sun StorEdge T3 and T3+ Array Installation, Operation, and Service Manual (816-0773)
• Sun StorEdge T3 and T3+ Array Administrator's Guide (816-0776)
• Sun StorEdge T3 and T3+ Array Configuration Guide (816-0777)
• Sun StorEdge T3 and T3+ Array Site Preparation Guide (816-0778)
• Sun StorEdge T3 and T3+ Field Service Manual (816-0779)
• Sun StorEdge T3 and T3+ Array Release Notes (816-0781)

Diagnostics:
• Storage Automated Diagnostics Environment User's Guide (816-3142)

Sun StorEdge network FC switch-8 and switch-16:
• Sun StorEdge Network FC Switch-8 and Switch-16 Release Notes (816-0842)
• Sun StorEdge Network FC Switch-8 and Switch-16 Installation and Configuration Guide (816-0830)
• Sun StorEdge Network FC Switch-8 and Switch-16 Best Practices Manual (816-2688)
• Sun StorEdge Network FC Switch-8 and Switch-16 Operations Guide (816-1986)
• Sun StorEdge Network FC Switch-8 and Switch-16 Field Troubleshooting Guide (816-1701)

SANbox switch management using SANsurfer:
• SANbox 8/16 Segmented Loop Switch Management User's Manual (875-3060)
• SANbox-8 Segmented Loop Fibre Channel Switch Installer's/User's Manual (875-1881)
• SANbox-16 Segmented Loop Fibre Channel Switch Installer's/User's Manual (875-3059)

Expansion cabinet:
• Sun StorEdge Expansion Cabinet Installation and Service Manual (805-3067)

Storage Service Processor:
• Netra X1 Server User's Guide (806-5980)
• Netra X1 Server Hard Disk Drive Installation Guide (806-7670)
Accessing Sun Documentation Online
A broad selection of Sun system documentation is located at:
http://www.sun.com/products-n-solutions/hardware/docs
A complete set of Solaris documentation and many other titles are located at:
http://docs.sun.com
Sun Welcomes Your Comments
Sun is interested in improving its documentation and welcomes your comments and
suggestions. You can email your comments to Sun at: [email protected]

Please include the part number (816-4290-11) of your document in the subject line of
your email.
CHAPTER 1

Introduction
The Sun StorEdge 3900 and 6900 series storage subsystems are complete
preconfigured storage solutions. The configurations for each of the storage
subsystems are shown in TABLE 1-1.
TABLE 1-1

Series                     System                     Sun StorEdge Fibre     Sun StorEdge T3+ Array     Additional Array Partner Groups
                                                      Channel Switch         Partner Groups             Supported with Optional Additional
                                                      Supported              Supported                  Expansion Cabinet
Sun StorEdge 3900 series   Sun StorEdge 3910 system   Two 8-port switches    1 to 4                     Not applicable
                           Sun StorEdge 3960 system   Two 16-port switches   1 to 4                     1 to 5
Sun StorEdge 6900 series   Sun StorEdge 6910 system   Two 8-port switches    1 to 3                     Not applicable
                           Sun StorEdge 6960 system   Two 16-port switches   1 to 3                     1 to 4
Predictive Failure Analysis Capabilities
The Storage Automated Diagnostic Environment software provides the health and
monitoring functions for the Sun StorEdge 3900 and 6900 series systems. This
software provides the following predictive failure analysis (PFA) capabilities.
■ FC links—Fibre Channel links are monitored at all end points using the FC-ELS
link counters. When link errors surpass the threshold values, an alert is sent.
This enables Sun personnel to replace components that are experiencing high
transient fault levels before a hard fault occurs.
■ Enclosure status—Many devices, like the Sun StorEdge network FC switch-8 and
switch-16 switch and the Sun StorEdge T3+ array, cause Storage Automated
Diagnostic Environment alerts to be sent if temperature thresholds are exceeded.
This enables Sun-trained personnel to address the problem before the component
or enclosure fails.
■ SPOF notification—Storage Automated Diagnostic Environment notification for
path failures and failovers (that is, Sun StorEdge Traffic Manager software
failover) can be considered PFA, since Sun-trained personnel are notified and can
repair the primary path. This eliminates the time of exposure to single points of
failure and helps to preserve customer availability during the repair process.
PFA is not always effective in detecting or isolating failures. The remainder of this
document provides guidelines that can be used to troubleshoot problems that occur
in supported components of the Sun StorEdge 3900 and 6900 series.
CHAPTER 2

General Troubleshooting Procedures
This chapter contains the following sections:
■ “Troubleshooting Overview Tasks” on page 3
■ “Multipathing Options in the Sun StorEdge 6900 Series” on page 7
■ “Fibre Channel Links” on page 15
■ “Storage Automated Diagnostic Environment Event Grid” on page 21
Troubleshooting Overview Tasks
This section lists the high-level steps to isolate and troubleshoot problems in the Sun
StorEdge 3900 and 6900 series. It offers a methodical approach and lists the tools and
resources available at each step.
Note – A single problem can cause various errors throughout the SAN. A good
practice is to begin by investigating the devices that have experienced “Loss of
Communication” events in the Storage Automated Diagnostic Environment. These
errors usually indicate more serious problems.
A “Loss of Communication” error on a switch, for example, could cause multiple
ports and HBAs to go offline. Concentrating on the switch and fixing that failure can
help bring the ports and HBAs back online.
1. Discover the error by checking one or more of the following messages or files (a
search sketch follows this list):

■ Storage Automated Diagnostic Environment alerts or email messages
■ /var/adm/messages
■ Sun StorEdge T3+ array syslog file
■ Storage Service Processor messages
■ /var/adm/messages.t3 messages
■ /var/adm/log/SEcfg log file
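The following is a minimal, hedged sketch of one way to scan the data host log
named above for the Fibre Channel events shown throughout this guide. The search
strings are taken from the sample messages in Chapters 2 and 3; adjust them to the
errors you are chasing.

# egrep "Loop OFFLINE|disappeared from fabric|multipath status" /var/adm/messages | tail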
2. Determine the extent of the problem by using one or more of the following
methods:
■ Storage Automated Diagnostic Environment Topology view
■ Storage Automated Diagnostic Environment Revision Checking (manual patch or
package, to check whether the package or patch is installed)
■ Verify the functionality using one of the following:
  ■ checkdefaultconfig(1M)
  ■ checkt3config(1M)
  ■ cfgadm -al output
  ■ luxadm(1M) output
■ Check the multipathing status using the Sun StorEdge Traffic Manager software
or VxDMP.
3. Check the status of a Sun StorEdge T3+ array by using one or more of the
following methods:
■ Storage Automated Diagnostic Environment device monitoring reports
■ Run the SEcfg script, which displays the Sun StorEdge T3+ array configuration
■ Manually open a telnet session to the Sun StorEdge T3+ array
■ luxadm(1M) display output
■ LED status on the Sun StorEdge T3+ array
■ Explorer Data Collection Utility output (located on the Storage Service Processor)
4. Check the status of the Sun StorEdge FC network switch-8 and switch-16 switches
using the following tools:
■ Storage Automated Diagnostic Environment device monitoring reports
■ Run the SEcfg script, which displays the switch configuration
■ LED status (online/offline; POST error codes are listed in the Sun StorEdge
Network FC Switch-8 and Switch-16 Installation and Configuration Guide)
■ Explorer Data Collection Utility output (located on the Storage Service Processor)
■ SANsurfer GUI
Note – To run the SANsurfer GUI from the Storage Service Processor, you must
export the X display.
5. Check the status of the virtualization engine using one or more of the following
methods:
■ Storage Automated Diagnostic Environment device monitoring reports
■ Run the SEcfg script, which displays the virtualization engine configuration
■ Refer to the LED status blink codes in Chapter 7.
6. Quiesce the I/O along the path to be tested, as follows:

■ For installations using VERITAS VxDMP, disable the path with vxdmpadm.
■ For installations using the Sun StorEdge Traffic Manager software, unconfigure
the Fabric device.
■ Refer to "To Quiesce the I/O" on page 8.
■ Halt the application.
7. Test and isolate the FRUs using the following tools:
■ Storage Automated Diagnostic Environment diagnostic tests (this might require
the use of a loopback cable for isolation)
■ Sun StorEdge T3+ array tests, including t3test(1M), t3ofdg(1M), and
t3volverify(1M), which can be found in the Storage Automated Diagnostic
Environment User’s Guide.
Note – These tests isolate the problem to a FRU that must be replaced. Follow the
instructions in the Sun StorEdge 3900 and 6900 Series Reference Manual and the Sun
StorEdge 3900 and 6900 Series Hardware Installation and Service Manual for proper FRU
replacement procedures.
8. Verify the fix using the following tools:
■ Storage Automated Diagnostic Environment GUI Topology View and Diagnostic
Tests
■ /var/adm/messages on the data host
9. Return the path to service by using one of the following methods:
■ Multipathing software
■ Restarting the application
Multipathing Options in the Sun StorEdge 6900 Series
Using the virtualization engines presents several challenges in how multipathing is
handled in the Sun StorEdge 6900 series.
Unlike Sun StorEdge T3+ array and Sun StorEdge network FC switch-8 and switch-
16 switch installations, which present primary and secondary pathing options, the
virtualization engines present only primary pathing options to the data host. The
virtualization engines handle all failover and failback operations and mask those
operations from the multipathing software on the data host.
The following example illustrates a Sun StorEdge Traffic Manager problem on a Sun
StorEdge 6900 series system.
# luxadm display /dev/rdsk/c6t29000060220041F96257354230303052d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c6t29000060220041F96257354230303052d0s2
  Status(Port A):       O.K.
  Status(Port B):       O.K.
  Vendor:               SUN
  Product ID:           SESS01
  WWN(Node):            2a000060220041f4
  WWN(Port A):          2b000060220041f4
  WWN(Port B):          2b000060220041f9
  Revision:             080C
  Serial Num:           Unsupported
  Unformatted capacity: 102400.000 MBytes
  Write Cache:          Enabled
  Read Cache:           Enabled
  Minimum prefetch:     0x0
  Maximum prefetch:     0x0
  Device Type:          Disk device
  Path(s):
  /dev/rdsk/c6t29000060220041F96257354230303052d0s2
  /devices/scsi_vhci/ssd@g29000060220041f96257354230303052:c,raw
   Controller           /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
    Device Address      2b000060220041f4,0
    Class               primary
    State               ONLINE
   Controller           /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
    Device Address      2b000060220041f9,0
    Class               primary
    State               ONLINE
Note that in the Class and State fields, the virtualization engines are presented as
two primary/ONLINE devices. The current Sun StorEdge Traffic Manager design
does not enable you to manually halt the I/O (that is, you cannot perform a failover
to the secondary path) when only primary devices are present.
Alternatives to Sun StorEdge Traffic Manager
As an alternative to using Sun StorEdge Traffic Manager, you can manually halt the
I/O using one of two methods: quiescing the I/O or unconfiguring the c2 path.
These methods are explained below.
▼ To Quiesce the I/O

1. Determine the path you want to disable.

2. Type:
# cfgadm -c unconfigure device
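The command above is generic. On a Sun StorEdge 6900 series system, the device is
typically addressed by the Ap_Id of the virtualization engine port, as in this sketch
(the WWN shown is the example used later in this chapter; take the actual Ap_Id
from cfgadm -al output):

# cfgadm -c unconfigure c2::2b000060220041f4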
▼ To Unconfigure the c2 Path
1. Type:
# cfgadm -al
Ap_Id                  Type        Receptacle  Occupant      Condition
c0                     scsi-bus    connected   configured    unknown
c0::dsk/c0t0d0         disk        connected   configured    unknown
c0::dsk/c0t1d0         disk        connected   configured    unknown
c1                     scsi-bus    connected   configured    unknown
c1::dsk/c1t6d0         CD-ROM      connected   configured    unknown
c2                     fc-fabric   connected   configured    unknown
c2::210100e08b23fa25   unknown     connected   unconfigured  unknown
c2::2b000060220041f4   disk        connected   configured    unknown
c3                     fc-fabric   connected   configured    unknown
c3::210100e08b230926   unknown     connected   unconfigured  unknown
c3::2b000060220041f9   disk        connected   configured    unknown
c4                     fc-private  connected   unconfigured  unknown
c5                     fc          connected   unconfigured  unknown
2. Using the Storage Automated Diagnostic Environment Topology GUI, determine
which virtualization engine is in the path you need to disable.

3. Use the world wide name (WWN) of that virtualization engine in the
unconfigure command, as follows:
# cfgadm -c unconfigure c2::2b000060220041f4
# cfgadm -al
Ap_Id                  Type        Receptacle  Occupant      Condition
c0                     scsi-bus    connected   configured    unknown
c0::dsk/c0t0d0         disk        connected   configured    unknown
c0::dsk/c0t1d0         disk        connected   configured    unknown
c1                     scsi-bus    connected   configured    unknown
c1::dsk/c1t6d0         CD-ROM      connected   configured    unknown
c2                     fc-fabric   connected   unconfigured  unknown
c2::210100e08b23fa25   unknown     connected   unconfigured  unknown
c2::2b000060220041f4   disk        connected   unconfigured  unknown
c3                     fc-fabric   connected   configured    unknown
c3::210100e08b230926   unknown     connected   unconfigured  unknown
c3::2b000060220041f9   disk        connected   configured    unknown
c4                     fc-private  connected   unconfigured  unknown
c5                     fc          connected   unconfigured  unknown
4. Verify that I/O has halted.

This halts the I/O only up to the A3/B3 link (see FIGURE 2-2). I/O continues to move
over the T1 and T2 paths, as well as the A4/B4 links, to the Sun StorEdge T3+ array.
▼ To Suspend the I/O

Use one of the following methods to suspend the I/O while the failover occurs:

1. Stop all customer applications that are accessing the Sun StorEdge T3+ array.

2. Manually pull the link from the Sun StorEdge T3+ array to the switch and wait
for a Sun StorEdge T3+ array LUN failover.

■ After the failover occurs, replace the cable and proceed with testing and FRU
isolation.

■ After testing and any FRU replacement is finished, return the controller state
back to the default by using virtualization engine failback. Refer to
"Virtualization Engine Failback" on page 81.
Note – To confirm that a failover is occurring, open a telnet session to the Sun
StorEdge T3+ array and check the output of port listmap.

Another, but slower, method is to run the runsecfg script and verify the
virtualization engine maps by polling them against a live system.
Caution – During the failover, SCSI errors will occur on the data host and a brief
suspension of I/O will occur.
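To illustrate the port listmap check mentioned in the Note above, here is a hedged
sketch of Sun StorEdge T3+ array output; the column layout varies by firmware
release, and the values are examples only. During a failover, the access column for
the affected LUN changes from primary to failover.

t3b0:/:<1> port listmap
port     targetid  addr_type  lun  volume  owner  access
u1p1     1         hard       0    v0      u1     primary
u2p1     2         hard       1    v1      u2     failover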
▼ To Return the Path to Production

1. Type cfgadm -c configure device:
# cfgadm -c configure c2::2b000060220041f4
2. Verify that I/O has resumed on all paths.
▼ To View the VxDisk Properties
1. Type the following:
# vxdisk list Disk_1
Device:    Disk_1
devicetag: Disk_1
type:      sliced
hostid:    diag.xxxxx.xxx.COM
disk:      name=t3dg02 id=1010283311.1163.diag.xxxxx.xxx.com
group:     name=t3dg id=1010283312.1166.diag.xxxxx.xxx.com
flags:     online ready private autoconfig nohotuse autoimport imported
pubpaths:  block=/dev/vx/dmp/Disk_1s4 char=/dev/vx/rdmp/Disk_1s4
privpaths: block=/dev/vx/dmp/Disk_1s3 char=/dev/vx/rdmp/Disk_1s3
version:   2.2
iosize:    min=512 (bytes) max=2048 (blocks)
public:    slice=4 offset=0 len=209698816
private:   slice=3 offset=1 len=4095
update:    time=1010434311 seqno=0.6
headers:   0 248
configs:   count=1 len=3004
logs:      count=1 len=455
Defined regions:
 config   priv 000017-000247[000231]: copy=01 offset=000000 enabled
 config   priv 000249-003021[002773]: copy=01 offset=000231 enabled
 log      priv 003022-003476[000455]: copy=01 offset=000000 enabled
Multipathing information:
numpaths:   2
c20t2B000060220041F4d0s2   state=enabled
c23t2B000060220041F9d0s2   state=enabled

# vxdmpadm listctlr all
CTLR-NAME   ENCLR-TYPE    STATE     ENCLR-NAME
=====================================================
c0          OTHER_DISKS   ENABLED   OTHER_DISKS
c2          SENA          ENABLED   SENA0
c3          SENA          ENABLED   SENA0
c20         Disk          ENABLED   Disk
c23         Disk          ENABLED   Disk
From the vxdisk output, notice that there are two physical paths to the LUN:

■ c20t2B000060220041F4d0s2
■ c23t2B000060220041F9d0s2

Both of these paths are currently enabled with VxDMP.
2. Use the luxadm(1M) command to display further information about the
underlying LUN.
# luxadm display /dev/rdsk/c20t2B000060220041F4d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c20t2B000060220041F4d0s2
  Status(Port A):       O.K.
  Vendor:               SUN
  Product ID:           SESS01
  WWN(Node):            2a000060220041f4
  WWN(Port A):          2b000060220041f4
  Revision:             080C
  Serial Num:           Unsupported
  Unformatted capacity: 102400.000 MBytes
  Write Cache:          Enabled
  Read Cache:           Enabled
  Minimum prefetch:     0x0
  Maximum prefetch:     0x0
  Device Type:          Disk device
  Path(s):
  /dev/rdsk/c20t2B000060220041F4d0s2
  /devices/pci@a,2000/pci@2/SUNW,qlc@4/fp@0,0/ssd@w2b000060220041f4,0:c,raw

# luxadm display /dev/rdsk/c23t2B000060220041F9d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c23t2B000060220041F9d0s2
  Status(Port A):       O.K.
  Vendor:               SUN
  Product ID:           SESS01
  WWN(Node):            2a000060220041f9
  WWN(Port A):          2b000060220041f9
  Revision:             080C
  Serial Num:           Unsupported
  Unformatted capacity: 102400.000 MBytes
  Write Cache:          Enabled
  Read Cache:           Enabled
  Minimum prefetch:     0x0
  Maximum prefetch:     0x0
  Device Type:          Disk device
  Path(s):
  /dev/rdsk/c23t2B000060220041F9d0s2
  /devices/pci@e,2000/pci@2/SUNW,qlc@4/fp@0,0/ssd@w2b000060220041f9,0:c,raw
▼ To Quiesce the I/O on the A3/B3 Link

1. Determine the path you want to disable.

2. Disable the path by typing the following:

# vxdmpadm disable ctlr=<c#>

3. Verify that the path is disabled:

# vxdmpadm listctlr all

Steps 1 and 2 halt I/O only up to the A3/B3 link. I/O will continue to move over the
T1 and T2 paths, as well as the A4/B4 links, to the Sun StorEdge T3+ array.
▼ To Suspend the I/O on the A3/B3 Link

Use one of the following methods to suspend I/O while the failover occurs:

1. Stop all customer applications that are accessing the Sun StorEdge T3+ array.

2. Manually pull the link from the Sun StorEdge T3+ array to the switch and wait
for a Sun StorEdge T3+ array LUN failover.

a. After the failover occurs, replace the cable and proceed with testing and FRU
isolation.

b. After testing is complete and any FRU replacement is finished, return the
controller state back to the default by using the virtualization engine failback
command.

Caution – This action will cause SCSI errors on the data host and a brief suspension
of I/O while the failover occurs.
Fibre Channel Links
The following sections provide troubleshooting information for the basic
components and Fibre Channel links, listed in TABLE 2-1.
TABLE 2-1

Link       Provides Fibre Channel Link Between These Components
A1 to B1   Data host, sw1a, and sw1b
A2         sw1a and v1a*
B2         sw1b and v1b*
A3         v1a and sw2a*
B3         v1b and sw2b*
A4         Master Sun StorEdge T3+ array and the "A" path switch
B4         AltMaster Sun StorEdge T3+ array and the "B" path switch
T1 to T2   sw2a and sw2b*

* Sun StorEdge 6900 series only
Note – In an actual Sun StorEdge 3900 or 6900 series configuration, there could be
more Sun StorEdge T3+ arrays than are shown in FIGURE 2-1 and FIGURE 2-2.
By using the Storage Automated Diagnostic Environment, you should be able to
isolate the problem to one particular segment of the configuration.
The information found in this section is based on the assumption that the Storage
Automated Diagnostic Environment is running on the data host. If it is not installed
on the data host, there will be areas of limited monitoring, diagnosis, and isolation.
The following diagrams provide troubleshooting information for the basic
components and Fibre Channel links specific to the Sun StorEdge 3900 series, shown
in FIGURE 2-1, and the Sun StorEdge 6900 series, shown in FIGURE 2-2.
Fibre Channel Link Diagrams
FIGURE 2-1 shows the basic components and the Fibre Channel links for a Sun
StorEdge 3900 series system:
■ A1 to B1—HBA to Sun StorEdge network FC switch-8 and switch-16 switch link
■ A4 to B4—Sun StorEdge network FC switch-8 and switch-16 switch to Sun
StorEdge T3+ array link

[Figure: host with HBA-A and HBA-B connected over links A1 and B1 to switches
sw1a and sw1b, which connect over links A4 and B4 to the master and alternate
master Sun StorEdge T3+ arrays]

FIGURE 2-1 Sun StorEdge 3900 Series Fibre Channel Link Diagram
FIGURE 2-2 shows the basic components and the Fibre Channel links for a Sun
StorEdge 6900 series system:
■ A1 to B1—HBA to Sun StorEdge network FC switch-8 and switch-16 switch link
■ A2 to B2—Sun StorEdge network FC switch-8 and switch-16 switch to
virtualization engine link on the host side
■ A3 to B3—Sun StorEdge network FC switch-8 and switch-16 switch to the
virtualization engine link on the device side
■ A4 to B4—Sun StorEdge network FC switch-8 and switch-16 switch to Sun
StorEdge T3+ array link
■ T1 to T2—T Port switch-to-switch link
[Figure: host with HBA-A and HBA-B connected over links A1 and B1 to switches
sw1a and sw1b; links A2 and B2 to virtualization engines v1a and v1b; links A3 and
B3 to second-tier switches sw2a and sw2b (joined by the T1/T2 inter-switch links);
and links A4 and B4 to the master and alternate master Sun StorEdge T3+ arrays]
FIGURE 2-2 Sun StorEdge 6900 Series Fibre Channel Link Diagram
Host Side Troubleshooting
Host-side troubleshooting refers to the messages and errors the data host detects.
Usually, these messages appear in the /var/adm/messages file.
Storage Service Processor Side Troubleshooting
Storage Service Processor-side troubleshooting refers to messages, alerts, and errors
that the Storage Automated Diagnostic Environment, running on the Storage Service
Processor, detects. You can find these messages by monitoring the following Sun
StorEdge 3900 series and the Sun StorEdge 6900 series components:
■ Sun StorEdge network FC switch-8 and switch-16 switches
■ Virtualization engine
■ Sun StorEdge T3+ array
Combining the host side messages and errors and the Storage Service Processor-side
messages, alerts, and errors into a meaningful context is essential for proper
troubleshooting.
Command Line Test Examples
To run a single Sun StorEdge diagnostic test from the command line rather than
through the Storage Automated Diagnostic Environment interface, you must log in
to the appropriate host or slave for testing the components. The following two tests,
qlctest(1M) and switchtest(1M), are provided as examples.
qlctest(1M)
The qlctest(1M) comprises several subtests that test the functions of the Sun
StorEdge PCI dual Fibre Channel (FC) host adapter board. This board is an HBA that
has diagnostic support. This diagnostic test is not scalable.
CODE EXAMPLE 2-1 qlctest(1M)
# /opt/SUNWstade/Diags/bin/qlctest -v -o "dev=
/devices/pci@6,4000/SUNW,qlc@3/fp@0,0:devctl|run_connect
=Yes|mbox=Disable|ilb=Disable|ilb_10=Disable|elb=Enable"
"qlctest: called with options: dev=/devices/pci@6,4000/SUNW,qlc@3/
fp@0,0:devctl|run_connect=Yes|mbox=Disable|ilb=Disable|ilb_10=Disable|el
b=Enable"
"qlctest: Started."
"Program Version is 4.0.1"
"Testing qlc0 device at /devices/pci@6,4000/SUNW,qlc@3/fp@0,0:devctl."
"QLC Adapter Chip Revision = 1, Risc Revision = 3,
Frame Buffer Revision = 1029, Riscrom Revision = 4,
Driver Revision = 5.a-2-1.15 "
"Running ECHO command test with pattern 0x7e7e7e7e"
"Running ECHO command test with pattern 0x1e1e1e1e"
"Running ECHO command test with pattern 0xf1f1f1f1"
<snip>
"Running ECHO command test with pattern 0x4a4a4a4a"
"Running ECHO command test with pattern 0x78787878"
"Running ECHO command test with pattern 0x25252525"
"FCODE revision is ISP2200 FC-AL Host Adapter Driver: 1.12 01/01/16"
"Firmware revision is 2.1.7f"
"Running CHECKSUM check"
"Running diag selftest"
"qlctest: Stopped successfully."
switchtest(1M)
switchtest(1M) is used to diagnose the Sun StorEdge network FC switch-8 and
switch-16 switch devices. The switchtest process also provides command-line
access to switch diagnostics. switchtest supports testing on local and remote
switches.

switchtest runs the port diagnostic on connected switch ports. While
switchtest is running, the port statistics are monitored for errors, and the chassis
status is checked.
CODE EXAMPLE 2-2 switchtest(1M)
# /opt/SUNWstade/Diags/bin/switchtest -v -o "dev=
2:192.168.0.30:0x0|xfersize=200"
"switchtest: called with options: dev=2:192.168.0.30:0x0|xfersize=200"
"switchtest: Started."
"Testing port: 2"
"Using ip_addr: 192.168.0.30, fcaddr: 0x0 to access this port."
"Chassis Status for Device: Switch Power: OK Temp: OK 23.0c Fan 1: OK Fan
2: OK"
"Testing Device: Switch Port: 2 Pattern: 0x7e7e7e7e"
"Testing Device: Switch Port: 2 Pattern: 0x1e1e1e1e"
"Testing Device: Switch Port: 2 Pattern: 0xf1f1f1f1"
"Testing Device: Switch Port: 2 Pattern: 0xb5b5b5b5"
"Testing Device: Switch Port: 2 Pattern: 0x4a4a4a4a"
"Testing Device: Switch Port: 2 Pattern: 0x78787878"
"Testing Device: Switch Port: 2 Pattern: 0xe7e7e7e7"
"Testing Device: Switch Port: 2 Pattern: 0xaa55aa55"
"Testing Device: Switch Port: 2 Pattern: 0x7f7f7f7f"
"Testing Device: Switch Port: 2 Pattern: 0x0f0f0f0f"
"Testing Device: Switch Port: 2 Pattern: 0x00ff00ff"
"Testing Device: Switch Port: 2 Pattern: 0x25252525"
"Port: 2 passed all tests on Switch"
"switchtest: Stopped successfully."
All Storage Automated Diagnostic Environment diagnostics tests are located in
/opt/SUNWstade/Diags/bin. Refer to the Storage Automated Diagnostic
Environment User’s Guide for a complete list of tests, subtests, options, and
restrictions.
Storage Automated Diagnostic
Environment Event Grid
The Storage Automated Diagnostic Environment generates component-specific event
grids that describe the severity of an event, whether action is required, a description
of the event, and the recommended action. Refer to Chapters 5 through 9 of this
troubleshooting guide for component-specific event grids.

1. Click the Event Grid link on the Storage Automated Diagnostic Environment
Help menu.

2. Select the criteria from the Storage Automated Diagnostic Environment event
grid, like the one shown in TABLE 2-2.
TABLE 2-2   Event Grid Sorting Criteria

Category: All (Default), Agent, Host, Message, Sun Switch, Sun StorEdge T3+ array,
Sun StorEdge A3500FC array, Sun StorEdge A5000 array, Tape, Virtualization engine

Component: All (Default), Backplane, Controller, Disk, Interface, LUN, Port, Power

Event Type: Agent Deinstall, Agent Install, Alarm, Alternate Master +, Alternate
Master —, Audit, Backup, CommunicationEstablished, CommunicationLost,
Discovery, Heartbeat, Insert Component, Location Change, Patch Info, Quiesce End,
Quiesce Start, Removal, Remove Component, State Change + (from offline to
online), State Change — (from online to offline), Statistics

Severity: Red—Critical (Error); Yellow—Alert (Warning); Down—System Down

Action: Y—This event is actionable and is sent to RSS/SRS; N—This event is
nonactionable
CHAPTER 3

Troubleshooting the Fibre Channel Links
A1/B1 Fibre Channel (FC) Link

If a problem occurs with the A1/B1 FC link:

■ In a Sun StorEdge 3900 series system, the Sun StorEdge T3+ array will fail over.
■ In a Sun StorEdge 6900 series system, no Sun StorEdge T3+ array will fail over,
but a severe problem can cause a path to go offline.

FIGURE 3-1, FIGURE 3-2, and FIGURE 3-3 are examples of A1/B1 Fibre Channel Link
Notification Events.
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Normal
Category : Message
Key      : message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.LOOP_OFFLINE
EventTime: 01/08/2002 14:34:45
Found 1 ’driver.LOOP_OFFLINE’ error(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=80fee746):
Info: Loop Offline
Jan 8 14:34:25 WWN: Received 2 'Loop Offline' message(s) [threshold is 1
in 5mins] Last-Message: 'diag.xxxxx.xxx.com qlc: [ID 686697 kern.info] NOTICE:
Qlogic qlc(0): Loop OFFLINE '
FIGURE 3-1 Data Host Notification of Intermittent Problems
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Normal
Category : Message
Key      : message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.MPXIO_offline
EventTime: 01/08/2002 14:48:02
Found 2 ’driver.MPXIO_offline’ warning(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=80fee746):
Jan 8 14:47:07 WWN:2b000060220041f9
diag.xxxxx.xxx.com mpxio: [ID
779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303053
(ssd19) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0
(fp1) to target address: 2b000060220041f9,1 is offline
Jan 8 14:47:07 WWN:2b000060220041f9
diag.xxxxx.xxx.com mpxio: [ID
779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303052
(ssd18) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0
(fp1) to target address: 2b000060220041f9,0 is offline
FIGURE 3-2 Data Host Notification of Severe Link Error
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Normal
Category : Switch
Key      : switch:100000c0dd0057bd
EventType: StateChangeEvent.X.port.6
EventTime: 01/08/2002 14:54:20
’port.6’ in SWITCH diag-sw1a (ip=192.168.0.30) is now Unknown (status-
state changed from ’Online’ to ’Admin’):
FIGURE 3-3 Storage Service Processor Notification
Note – An A1/B1 FC link error can cause a port in sw1a or sw1b to change state.
▼ To Verify the Data Host
An error in the A1/B1 FC link can cause a path to go offline in the multipathing
software.
CODE EXAMPLE 3-1 luxadm(1M) Display

# luxadm display /dev/rdsk/c6t29000060220041F96257354230303052d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c6t29000060220041F96257354230303052d0s2
  Status(Port A):       O.K.
  Status(Port B):       O.K.
  Vendor:               SUN
  Product ID:           SESS01
  WWN(Node):            2a000060220041f4
  WWN(Port A):          2b000060220041f4
  WWN(Port B):          2b000060220041f9
  Revision:             080C
  Serial Num:           Unsupported
  Unformatted capacity: 102400.000 MBytes
  Write Cache:          Enabled
  Read Cache:           Enabled
  Minimum prefetch:     0x0
  Maximum prefetch:     0x0
  Device Type:          Disk device
  Path(s):
  /dev/rdsk/c6t29000060220041F96257354230303052d0s2
  /devices/scsi_vhci/ssd@g29000060220041f96257354230303052:c,raw
   Controller           /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
    Device Address      2b000060220041f9,0
    Class               primary
    State               OFFLINE
   Controller           /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
    Device Address      2b000060220041f4,0
    Class               primary
    State               ONLINE
...
An error in the A1/B1 FC link can also cause a device to enter the "unusable" state
in cfgadm. In this case, the output of luxadm -e port will show that a device that
was "connected" changed to an "unconnected" state.
CODE EXAMPLE 3-2 cfgadm -al Display

...
# cfgadm -al
Ap_Id                  Type        Receptacle  Occupant      Condition
c0                     scsi-bus    connected   configured    unknown
c0::dsk/c0t0d0         disk        connected   configured    unknown
c0::dsk/c0t1d0         disk        connected   configured    unknown
c1                     scsi-bus    connected   configured    unknown
c1::dsk/c1t6d0         CD-ROM      connected   configured    unknown
c2                     fc-fabric   connected   configured    unknown
c2::210100e08b23fa25   unknown     connected   unconfigured  unknown
c2::2b000060220041f4   disk        connected   configured    unknown
c3                     fc-fabric   connected   configured    unknown
c3::2b000060220041f9   disk        connected   configured    unusable
c4                     fc-private  connected   unconfigured  unknown
c5                     fc          connected   unconfigured  unknown
FRU Tests Available for A1/B1 FC Link Segment

■ HBA—qlctest(1M)
  ■ Available only if the Storage Automated Diagnostic Environment is installed
    on a data host
  ■ Causes the HBA to go "offline" and "online" during tests
■ Switch—switchtest(1M)
  ■ Can be run while the link is still cabled and online (connected to the HBA)
  ■ You must specify a payload of 200 bytes or less when testing the A1/B1 FC
    link while the link is connected to the HBA (a limitation in the HBA ASIC).
  ■ Can be run only from the Storage Service Processor
  ■ The dev option to switchtest is in the following format:
    Port:IP-Address:FCAddress. The FCAddress can be set to 0x0.
CODE EXAMPLE 3-3 switchtest(1M) called with options
# ./switchtest -v -o "dev=2:192.168.0.30:0"
"switchtest: called with options: dev=2:192.168.0.30:0"
"switchtest: Started."
"Testing port: 2"
"Using ip_addr: 192.168.0.30, fcaddr: 0x0 to access this port."
"Chassis Status for Device: Switch Power: OK Temp: OK 23.0c Fan 1: OK
Fan 2: OK "
02/06/02 15:09:45 diag Storage Automated Diagnostic Environment MSGID 4001
switchtest.WARNING
switch0: "Maximum transfer size for a FABRIC port is 200. Changing
transfer size 2000 to 200"
"Testing Device: Switch Port: 2 Pattern: 0x7e7e7e7e"
"Testing Device: Switch Port: 2 Pattern: 0x1e1e1e1e"
Note – The Storage Automated Diagnostic Environment automatically resets the
transfer size if it notes that it is about to test a switch to HBA connection. This is
done both in the Storage Automated Diagnostic Environment GUI and from the
command-line interface (CLI).
▼ To Isolate the A1/B1 FC Link

1. Quiesce the I/O on the A1/B1 FC link path.

2. Run switchtest or qlctest to test the entire link.

3. Break the connection by uncabling the link.

4. Insert a loopback connector into the switch port.

5. Rerun switchtest.

a. If switchtest fails, replace the GBIC and rerun switchtest.

b. If switchtest fails again, replace the switch.

6. Insert a loopback connector into the HBA.

7. Run qlctest.

■ If the test fails, replace the HBA.
■ If the test passes, replace the cable.

8. Recable the entire link.

9. Run switchtest or qlctest to validate the fix.

10. Return the path to production.
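As a hedged illustration of step 7, qlctest can be run against the HBA device path
using the same option format as CODE EXAMPLE 2-1; the device path and option
values below are examples copied from that listing (with the qlc@2 HBA
substituted), not a prescription for every configuration:

# /opt/SUNWstade/Diags/bin/qlctest -v -o "dev=/devices/pci@6,4000/SUNW,qlc@2/fp@0,0:devctl|run_connect=Yes|mbox=Disable|ilb=Disable|ilb_10=Disable|elb=Enable"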
A2/B2 Fibre Channel (FC) Link

If a problem occurs with the A2/B2 FC link:

■ In a Sun StorEdge 3900 series system, the Sun StorEdge T3+ array will fail over.
■ In a Sun StorEdge 6900 series system, no Sun StorEdge T3+ array will fail over,
but a severe problem can cause a path to go offline.

FIGURE 3-4 and FIGURE 3-5 are examples of A2/B2 FC Link Notification Events.
From root Tue Jan 8 18:39:48 2002
Date: Tue, 8 Jan 2002 18:39:47 -0700 (MST)
Message-Id: <[email protected]>
From: Storage Automated Diagnostic Environment.Agent
Subject: Message from ’diag.xxxxx.xxx.com’ (2.0.B2.002)
Content-Length: 2742
You requested the following events be forwarded to you from
’diag.xxxxx.xxx.com’.
Site     : FSDE LAB Broomfield CO
Source   : diag226.xxxxx.xxx.com
Severity : Normal
Category : Message
Key      : message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.Fabric_Warning
EventTime: 01/08/2002 17:34:47
Found 1 ’driver.Fabric_Warning’ warning(s) in logfile: /var/adm/messages
on diag.xxxxx.xxx.com (id=80fee746):
Info: Fabric warning
Jan 8 17:34:36 WWN:2b000060220041f4
diag.xxxxx.xxx.com fp: [ID 517869
kern.warning] WARNING: fp(0): N_x Port with D_ID=108000,
PWWN=2b000060220041f4 disappeared from fabric
<snip>
multipath status: degraded, path /pci@6,4000/SUNW,qlc@2/fp@0,0 (fp0) to
target address: 2b000060220041f4,1 is offline
Jan 8 17:34:55 WWN:2b000060220041f4
diag.xxxxx.xxx.com
mpxio: [ID 779286 kern.info] /scsi_vhci/
ssd@g29000060220041f96257354230303052 (ssd18)
multipath status: degraded, path /pci@6,4000/SUNW,qlc@2/fp@0,0 (fp0) to
target address: 2b000060220041f4,0 is offline
FIGURE 3-4 A2/ B2 FC Link Host Side Event
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Normal
Category : Switch
Key      : switch:100000c0dd0061bb
EventType: StateChangeEvent.X.port.1
EventTime: 01/08/2002 17:38:32
’port.1’ in SWITCH diag-sw1b (ip=192.168.0.31) is now Unknown (status-
state changed from ’Online’ to ’Admin’):
----------------------------------------------------------------
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Normal
Category : San
Key      : switch:100000c0dd0061bb:1
EventType: LinkEvent.ITW.switch|ve
EventTime: 01/08/2002 17:39:47
ITW-ERROR (765 in 11 mins): Origin: port 1 on switch 'sw1b/192.168.0.31'.
Destination: port 1 on ve 'diag-v1b/29000060220041f4':
Info:
An invalid transmission word (ITW) was detected between two components.
This could indicate a potential problem.
Cause:
Likely Causes are: GBIC, FC Cable and device optical connections.
Action:
To isolate further please run the Storage Automated Diagnostic Environment
tests associated with this link segment.
FIGURE 3-5 A2/ B2 FC Link Storage Service Processor Side Event
▼ To Verify the Host Side
An error in the A2/B2 FC link can result in a device being listed in an "unusable"
state in cfgadm, but no HBAs are listed in the "unconnected" state in luxadm
output. The multipathing software will note an OFFLINE path.
CODE EXAMPLE 3-4 cfgadm -al

# cfgadm -al
Ap_Id                  Type        Receptacle  Occupant      Condition
c0                     scsi-bus    connected   configured    unknown
<snip>
# luxadm -e port
Found path to 2 HBA ports
/devices/pci@6,4000/SUNW,qlc@2/fp@0,0:devctl                CONNECTED
/devices/pci@6,4000/SUNW,qlc@3/fp@0,0:devctl                CONNECTED
# luxadm display /dev/rdsk/c6t29000060220041F96257354230303052d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c6t29000060220041F96257354230303052d0s2
  Status(Port A):       O.K.
  Status(Port B):       O.K.
  Vendor:               SUN
  Product ID:           SESS01
  WWN(Node):            2a000060220041f9
  WWN(Port A):          2b000060220041f9
  WWN(Port B):          2b000060220041f4
  Revision:             080C
  Serial Num:           Unsupported
  Unformatted capacity: 102400.000 MBytes
  Write Cache:          Enabled
  Read Cache:           Enabled
  Minimum prefetch:     0x0
  Maximum prefetch:     0x0
  Device Type:          Disk device
  Path(s):
  /dev/rdsk/c6t29000060220041F96257354230303052d0s2
  /devices/scsi_vhci/ssd@g29000060220041f96257354230303052:c,raw
   Controller           /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
    Device Address      2b000060220041f9,0
    Class               primary
    State               ONLINE
   Controller           /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
    Device Address      2b000060220041f4,0
    Class               primary
    State               OFFLINE
Note – You can find procedures for restoring virtualization engine settings in the
Sun StorEdge 3900 and 6900 Series Reference Manual.
▼ To Verify the A2/B2 FC Link

You can check the A2/B2 FC link using the Storage Automated Diagnostic
Environment, Diagnose—Test from Topology functionality. The Storage Automated
Diagnostic Environment’s implementation of diagnostic tests verifies the operation
of user-selected components. Using the Topology view, you can select specific tests,
subtests, and test options.
Refer to Chapter 5 of the Storage Automated Diagnostic Environment User’s Guide for
more information.
FRU Tests Available for A2/B2 FC Link Segment

■ The linktest is not available.
■ The switch and/or GBIC—switchtest:
  ■ Can be used only in conjunction with the loopback connector.
  ■ Cannot be cabled to the virtualization engine while switchtest runs.
■ No virtualization engine tests are available at this time.
▼ To Isolate the A2/B2 FC Link

1. Quiesce the I/O on the A2/B2 FC link path.

2. Break the connection by uncabling the link.

3. Insert the loopback connector into the switch port.

4. Run switchtest:

a. If the test fails, replace the GBIC and rerun switchtest.

b. If the test fails again, replace the switch.
5. If the switch or the GBIC show no errors, replace the remaining components in
the following order:
a. Replace the virtualization engine-side GBIC, recable the link, and monitor the
link for errors.
b. Replace the cable, recable the link, and monitor the link for errors.
c. Replace the virtualization engine, restore the virtualization engine settings,
recable the link, and monitor the link for errors.
6. Return the path to production.
The procedures for restoring virtualization engine settings are in the Sun StorEdge
3900 and 6900 Series Reference Manual.
A3/B3 Fibre Channel (FC) Link

If a problem occurs with the A3/B3 FC link:

■ In a Sun StorEdge 3900 series system, the Sun StorEdge T3+ array will fail over.
■ In a Sun StorEdge 6900 series system, no Sun StorEdge T3+ array will fail over,
but a severe problem can cause a path to go offline.

FIGURE 3-6, FIGURE 3-7, and FIGURE 3-8 are examples of A3/B3 FC link Notification
Events.
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Normal
Category : Message
Key      : message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.MPXIO_offline
EventTime: 01/08/2002 18:25:18
Found 2 ’driver.MPXIO_offline’ warning(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=80fee746):
Jan 8 18:24:24 WWN:2b000060220041f9
diag.xxxxx.xxx.com mpxio: [ID
779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303053
(ssd19) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0
(fp1) to target address: 2b000060220041f9,1 is offline
Jan 8 18:24:24 WWN:2b000060220041f9
diag.xxxxx.xxx.com mpxio: [ID
779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303052
(ssd18) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0
(fp1) to target address: 2b000060220041f9,0 is offline
----------------------------------------------------------------
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Normal
Category : Message
Key      : message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.Fabric_Warning
EventTime: 01/08/2002 18:25:18
Found 1 ’driver.Fabric_Warning’ warning(s) in logfile: /var/adm/messages
on diag.xxxxx.xxx.com (id=80fee746):
Info:
Fabric warning
Jan 8 18:24:04 WWN:2b000060220041f9
diag.xxxxx.xxx.com fp: [ID 517869
kern.warning] WARNING: fp(1): N_x Port with D_ID=104000,
PWWN=2b000060220041f9 disappeared from fabric
FIGURE 3-6 A3/ B3 FC Link Host-Side Event
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Normal
Category : Switch
Key      : switch:100000c0dd0057bd
EventType: StateChangeEvent.M.port.1
EventTime: 01/08/2002 18:28:38
’port.1’ in SWITCH diag-sw1a (ip=192.168.0.30) is now Not-Available
(status-state changed from ’Online’ to ’Offline’):
Info:
A port on the switch has logged out of the fabric and gone offline
Action:
1. Verify cables, GBICs and connections along Fibre Channel path
2. Check Storage Automated Diagnostic Environment SAN Topology GUI to
identify failing segment of the data path
3. Verify correct FC switch configuration
FIGURE 3-7 A3/ B3 FC Link Storage Service Processor-Side Event
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Normal
Category : Switch
Key      : switch:100000c0dd00cbfe
EventType: StateChangeEvent.M.port.1
EventTime: 01/08/2002 18:28:40
’port.1’ in SWITCH diag-sw2a (ip=192.168.0.32) is now Not-Available
(status-state changed from ’Online’ to ’Offline’):
Info:
A port on the switch has logged out of the fabric and gone offline
Action:
1. Verify cables, GBICs and connections along Fibre Channel path
2. Check Storage Automated Diagnostic Environment SAN Topology GUI to
identify failing segment of the data path
3. Verify correct FC switch configuration
FIGURE 3-8 A3/ B3 FC Link Storage Service Processor-Side Event
▼ To Verify the Host Side
An error in the A3/B3 FC link results in a device being listed in an "unusable" state
in cfgadm, but no HBAs are listed in the "unconnected" state in luxadm output.
The multipathing software will note an "offline" path.
CODE EXAMPLE 3-5 Devices in the "connected" State

# cfgadm -al
Ap_Id                  Type        Receptacle  Occupant      Condition
c0                     scsi-bus    connected   configured    unknown
c0::dsk/c0t0d0         disk        connected   configured    unknown
c0::dsk/c0t1d0         disk        connected   configured    unknown
c1                     scsi-bus    connected   configured    unknown
c1::dsk/c1t6d0         CD-ROM      connected   configured    unknown
c2                     fc-fabric   connected   configured    unknown
c2::210100e08b23fa25   unknown     connected   unconfigured  unknown
c2::2b000060220041f4   disk        connected   configured    unknown
c3                     fc-fabric   connected   configured    unknown
c3::2b000060220041f9   disk        connected   configured    unusable
c3::210100e08b230926   unknown     connected   unconfigured  unknown
c4                     fc-private  connected   unconfigured  unknown
c5                     fc          connected   unconfigured  unknown

# luxadm -e port
Found path to 2 HBA ports
/devices/pci@6,4000/SUNW,qlc@2/fp@0,0:devctl                CONNECTED
/devices/pci@6,4000/SUNW,qlc@3/fp@0,0:devctl                CONNECTED

# luxadm display /dev/rdsk/c6t29000060220041F96257354230303052d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c6t29000060220041F96257354230303052d0s2
<snip>
  /devices/scsi_vhci/ssd@g29000060220041f96257354230303052:c,raw
   Controller           /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
    Device Address      2b000060220041f9,0
    Class               primary
    State               OFFLINE
   Controller           /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
    Device Address      2b000060220041f4,0
    Class               primary
    State               ONLINE
CODE EXAMPLE 3-6 VxDMP Error Message
Jan 8 18:26:38 diag.xxxxx.xxx.com vxdmp: [ID 619769 kern.notice] NOTICE:
vxdmp: Path failure on 118/0x1f8
Jan 8 18:26:38 diag.xxxxx.xxx.com vxdmp: [ID 997040 kern.notice] NOTICE:
vxvm:vxdmp: disabled path 118/0x1f8 belonging to the dmpnode 231/0xd0
▼ To Verify the Storage Service Processor
You can check the A3/B3 FC link using the Storage Automated Diagnostic
Environment Diagnose—Test from Topology functionality. The Storage Automated
Diagnostic Environment's implementation of diagnostic tests verifies the operation
of user-selected components. Using the Topology view, you can select specific tests,
subtests, and test options.
Refer to the Storage Automated Diagnostic Environment User’s Guide for more
information.
FRU Tests Available for the A3/B3 FC Link Segment

■ The linktest is not available.
■ The switch and/or GBIC—switchtest:
  ■ Can be used only in conjunction with the loopback connector.
  ■ Cannot be cabled to the virtualization engine while switchtest runs.
■ No virtualization engine tests are available at this time.
▼ To Isolate the A3/B3 FC Link

1. Quiesce the I/O on the A3/B3 FC link path.

2. Break the connection by uncabling the link.

3. Insert the loopback connector into the switch port.

4. Run switchtest:

a. If the test fails, replace the GBIC and rerun switchtest.

b. If the test fails again, replace the switch.

5. If the switch or the GBIC show no errors, replace the remaining components in
the following order:

a. Replace the virtualization engine-side GBIC, recable the link, and monitor the
link for errors.

b. Replace the cable, recable the link, and monitor the link for errors.

c. Replace the virtualization engine, restore the virtualization engine settings,
recable the link, and monitor the link for errors.

6. Return the path to production.
The procedures for restoring virtualization engine settings are in the Sun StorEdge
3900 and 6900 Series Reference Manual.
A4/B4 Fibre Channel (FC) Link

If a problem occurs with the A4/B4 FC link:

■ In a Sun StorEdge 3900 series system, the Sun StorEdge T3+ array will fail over.
■ In a Sun StorEdge 6900 series system, no Sun StorEdge T3+ array will fail over,
but a severe problem can cause a path to go offline.

FIGURE 3-9 and FIGURE 3-10 are examples of A4/B4 Link Notification Events.
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Warning
Category : Message
DeviceId : message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.MPXIO_offline
EventTime: 01/29/2002 14:28:06
Found 2 ’driver.MPXIO_offline’ warning(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=80e4aa60):
<snip>
----------------------------------------------------------------------
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Warning
Category : Message
DeviceId : message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.Fabric_Warning
EventTime: 01/29/2002 14:28:06
Found 1 ’driver.Fabric_Warning’ warning(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=80e4aa60):
INFORMATION:
Fabric warning
<snip>
status of hba /devices/pci@a,2000/pci@2/SUNW,qlc@5/fp@0,0:devctl on
diag.xxxxx.xxx.com changed from CONNECTED to NOT CONNECTED
INFORMATION:
monitors changes in the output of luxadm -e port
Found path to 20 HBA ports
/devices/sbus@2,0/SUNW,socal@d,10000:0
NOT CONNECTED
FIGURE 3-9 A4/ B4 FC Link Data Host Notification
Site     : FSDE LAB Broomfield CO
Source   : diag
Severity : Warning
Category : Switch
DeviceId : switch:100000c0dd0061bb
EventType: LogEvent.MessageLog
EventTime: 01/29/2002 14:25:05
Change in Port Statistics on switch diag-sw1b (ip=192.168.0.31):
Port-1: Received 16289 ’InvalidTxWds’ in 0 mins (value=365972 )
----------------------------------------------------------------------
Site     : FSDE LAB Broomfield CO
Source   : diag
Severity : Warning
Category : T3message
DeviceId : t3message:83060c0c
EventType: LogEvent.MessageLog
EventTime: 01/29/2002 14:25:06
Warning(s) found in logfile: /var/adm/messages.t3 on diag (id=83060c0c):
Jan 29 14:12:58 t3b0 ISR1[2]: W: u2ctr ISP2100[2] Received LOOP DOWN async
event
Jan 29 14:13:32 t3b0 MNXT[1]: W: u1ctr starting lun 1 failover
---------------------------------------------------------------------
Site     : FSDE LAB Broomfield CO
Source   : diag
Severity : Warning
Category : T3message
DeviceId : t3message:83060c0c
EventType: LogEvent.MessageLog
EventTime: 01/29/2002 14:11:14
Warning(s) found in logfile: /var/adm/messages.t3 on diag (id=83060c0c):
Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d4 SVD_PATH_FAILOVER: path_id = 0
Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d5 SVD_PATH_FAILOVER: path_id = 0
Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d6 SVD_PATH_FAILOVER: path_id = 0
Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d7 SVD_PATH_FAILOVER: path_id = 0
Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d8 SVD_PATH_FAILOVER: path_id = 0
Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d9 SVD_PATH_FAILOVER: path_id = 0
FIGURE 3-10 Storage Service Processor Notification
▼ To Verify the Data Host
A problem in the A4/B4 FC link appears differently on the data host, depending on
whether the system is a Sun StorEdge 3900 series or a Sun StorEdge 6900 series
device.
Sun StorEdge 3900 Series
In a Sun StorEdge 3900 series device, the data host multipathing software is
responsible for initiating the failover and reports it in /var/adm/messages; these
are the same messages reported by the Storage Automated Diagnostic Environment
email notifications.
The luxadm failover command is used to fail the Sun StorEdge T3+ array LUNs
back to the proper configuration after the failing FRU is replaced. This command is
issued from the data host.
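A minimal sketch of the failback invocation (the device path shown is illustrative;
use the path that luxadm display reports as OFFLINE for the affected LUN):

# luxadm failover primary /dev/rdsk/c26t60020F20000064433C3352A60003E82Fd0s2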
Sun StorEdge 6900 Series
In a Sun StorEdge 6900 series device, the virtualization engine pairs handle the
failover and the failover is not noted on the data host. All paths would remain
ONLINE and ACTIVE.
The mpdrive failback command is used instead, and is issued from the Storage
Service Processor.
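For example, a sketch using the controller serial number from the examples in
Chapter 7 (see "To Failback the Virtualization Engine" for the full procedure):

# /opt/svengine/sduc/mpdrive failback -d v1 -j 60020F2000006DFA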
Note – In the event of a complete sw1b or sw2b failure in a Sun StorEdge 6900
series configuration, the virtualization engine pairs handle the failover. In addition,
the multipathing software notes a path failure on the data host, Sun StorEdge Traffic
Manager or VxDMP takes the entire path that was connected to the failed switch
offline, and the ISL ports on the surviving switch go offline as well.
To verify the failover, use luxadm display; the failed path is marked OFFLINE, as
shown in CODE EXAMPLE 3-7.
CODE EXAMPLE 3-7 Failed Path marked OFFLINE
# luxadm display /dev/rdsk/c26t60020F20000064433C3352A60003E82Fd0s2
DEVICE PROPERTIES for disk: /dev/rdsk/
c26t60020F20000064433C3352A60003E82Fd0s2
  Status(Port A):         O.K.
  Status(Port B):         O.K.
  Vendor:                 SUN
  Product ID:             T300
  WWN(Node):              50020f2000006443
  WWN(Port A):            50020f2300006355
  WWN(Port B):            50020f2300006443
  Revision:               0118
  Serial Num:             Unsupported
  Unformatted capacity:   488642.000 MBytes
  Write Cache:            Enabled
  Read Cache:             Enabled
  Minimum prefetch:       0x0
  Maximum prefetch:       0x0
  Device Type:            Disk device
  Path(s):
  /dev/rdsk/c26t60020F20000064433C3352A60003E82Fd0s2
  /devices/scsi_vhci/ssd@g60020f20000064433c3352a60003e82f:c,raw
   Controller           /devices/pci@a,2000/pci@2/SUNW,qlc@5/fp@0,0
    Device Address      50020f2300006355,1
    Class               primary
    State               OFFLINE
   Controller           /devices/pci@e,2000/pci@2/SUNW,qlc@5/fp@0,0
    Device Address      50020f2300006443,1
    State               ONLINE
Note – This type of error may also cause the device to show up "unusable" in
cfgadm, as shown in CODE EXAMPLE 3-8.
CODE EXAMPLE 3-8 Failed Path marked “unusable”
# cfgadm -al
Ap_Id                  Type        Receptacle  Occupant      Condition
ac0:bank0              memory      connected   configured    ok
ac0:bank1              memory      empty       unconfigured  unknown
c1                     scsi-bus    connected   configured    unknown
c1::dsk/c1t6d0         CD-ROM      connected   configured    unknown
c16                    scsi-bus    connected   unconfigured  unknown
c18                    scsi-bus    connected   unconfigured  unknown
c19                    scsi-bus    connected   unconfigured  unknown
c20                    fc-private  connected   unconfigured  unknown
c21                    fc-fabric   connected   configured    unknown
c21::50020f2300006355  disk        connected   configured    unusable
FRU Tests Available for the A4/B4 FC Link
Segment
■ The switchtest can be run only from the Storage Service Processor.
■ The linktest can isolate the switch and the GBIC on the switch. It cannot
isolate the cable or the Sun StorEdge T3+ array controller.
▼ To Isolate the A4/ B4 FC Link
1. Quiesce the I/O on the A4/B4 FC link path.
2. Run linktest from the Storage Automated Diagnostic Environment GUI to
isolate suspected failing components.
Alternatively, follow these steps:
1. Quiesce the I/O on the A4/B4 FC link path.
2. Run switchtest to test the entire link (re-create the problem).
3. Break the connection by uncabling the link.
4. Insert the loopback connector into the switch port.
5. Rerun switchtest.
a. If switchtest fails, replace the GBIC and rerun switchtest.
b. If the test fails again, replace the switch.
6. If switchtest passes, assume that the suspect components are the cable and the
Sun StorEdge T3+ array controller.
a. Replace the cable.
b. Rerun switchtest.
7. If the test fails again, replace the Sun StorEdge T3+ array controller.
8. Return the path to production.
9. Return the Sun StorEdge T3+ array LUNs to the correct controllers if a failover
occurred (use the luxadm failover or mpdrive failback command, as appropriate).
CHAPTER 4
Configuration Settings
This chapter contains the following sections:
■ “Verifying Configuration Settings” on page 47
■ “To Clear the Lock File” on page 50
For a complete listing of SUNWsecfg error messages and recommended actions, refer
to Appendix B.
Verifying Configuration Settings
During the course of troubleshooting, you might need to verify configuration
settings on the various components in the Sun StorEdge 3900 or 6900 series.
▼ To Verify Configuration Settings
● Run one of the following scripts:
■ Use the /opt/SUNWsecfg/runsecfg script and select the various Verify menu
selections.
■ Run the /opt/SUNWsecfg/bin/checkdefaultconfig script to check all
accessible components. The output is shown in CODE EXAMPLE 4-1.
■ Run the checkswitch, checkt3config, checkve, or checkvemap scripts
manually from /opt/SUNWsecfg/bin.
The scripts listed above check the default configuration files in the
/opt/SUNWsecfg/etc directory and compare the current, live settings to the
defaults. Any differences are marked with a FAIL.
Note – For cluster configurations and systems that are attached to Windows NT, the
default configurations may not match the current installed configuration. Be aware
of this when running the verification scripts. Certain items may be flagged as FAIL
in these special circumstances.
CODE EXAMPLE 4-1 /opt/SUNWsecfg/checkdefaultconfig Output
# /opt/SUNWsecfg/checkdefaultconfig
Checking all accessible components.....
Checking switch: sw1a
Switch sw1a - PASSED
Checking switch: sw1b
Switch sw1b - PASSED
Checking switch: sw2a
Switch sw2a - PASSED
Checking switch: sw2b
Switch sw2b - PASSED
Please enter the Sun StorEdge T3+ array password :
Checking T3+: t3b0
Checking : t3b0 Configuration.......
Checking command ver           : PASS
Checking command vol stat      : PASS
Checking command port list     : PASS
Checking command port listmap  : PASS
Checking command sys list      : FAIL <-- Failure Noted
Checking T3+: t3b2
Checking : t3b2 Configuration.......
Checking command ver           : PASS
Checking command vol stat      : PASS
Checking command port list     : PASS
Checking command port listmap  : PASS
Checking command sys list      : PASS
<snip>
Checking Virtualization Engine Pair Parameters: v1a
v1a configuration check passed
Checking Virtualization Engine Pair Parameters: v1b
v1b configuration check passed
Checking Virtualization Engine Pair Configuration: v1
checkvemap: virtualization engine map v1 verification complete: PASS.
2. If anything is marked FAIL, check the /var/adm/log/SEcfg.log file for the
details of the failure.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : ----------
-SAVED CONFIGURATION--------------.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : blocksize : 16k.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : cache : auto.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : mirror : auto.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : mp_support : rw.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : rd_ahead : off.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : recon_rate : med.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : sys memsize : 32
MBytes.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : cache memsize :
256 MBytes.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : .
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : ----------
-CURRENT CONFIGURATION------------.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : blocksize : 16k.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : cache : auto.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : mirror : off.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : mp_support : rw.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : rd_ahead : off.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : recon_rate : med.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : sys memsize : 32
MBytes.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : cache memsize :
256 MBytes.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : .
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : ----------
In this example, the mirror setting in the Sun StorEdge T3+ array system settings is
“off.” The SAVED CONFIGURATION setting for this parameter, which is the default
setting, should be “auto.”
3. Fix the FAIL condition, and then verify the settings again.
# /opt/SUNWsecfg/bin/checkt3config -n t3b0
Checking : t3b0 Configuration.......
Checking command ver           : PASS
Checking command vol stat      : PASS
Checking command port list     : PASS
Checking command port listmap  : PASS
Checking command sys list      : PASS
If you interrupt any of the SUNWsecfg scripts (by typing Control-C, for example),
a lock file might remain in the /opt/SUNWsecfg/etc directory, causing subsequent
commands to fail. Use the following procedure to clear the lock file.
▼ To Clear the Lock File
1. Type the following command:
# /opt/SUNWsecfg/bin/removelocks
usage : removelocks [-t|-s|-v]
where:
-t - remove all T3+ related lock files.
-s - remove all switch related lock files.
-v - remove all virtualization engine related lock files.
# /opt/SUNWsecfg/bin/removelocks -v
Note – After any virtualization engine configuration change, the script saves a new
copy of the virtualization engine map. This may take a minimum of two minutes,
during which time no additional virtualization engine changes are accepted.
2. Monitor the /var/adm/log/SEcfg.log file to see when the savevemap process
successfully exits.
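One way to watch the log in real time is with tail -f (a sketch; SEcfg.log is the
log file named in Step 2):

# tail -f /var/adm/log/SEcfg.log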
CODE EXAMPLE 4-2 savevemap output
Tue Jan 29 16:12:34 MST 2002 savevemap: v1 ENTER.
Tue Jan 29 16:12:34 MST 2002 checkslicd: v1 ENTER.
Tue Jan 29 16:12:42 MST 2002 checkslicd: v1 EXIT.
Tue Jan 29 16:14:01 MST 2002 savevemap: v1 EXIT.
When savevemap: <ve-pair> EXIT is displayed, the savevemap process has
successfully exited.
CHAPTER 5
Troubleshooting Host Devices
This chapter describes how to troubleshoot host components associated with a Sun
StorEdge 3900 or 6900 series system. It contains the following sections:
■ “Using the Host Event Grid” on page 53
■ “To Replace the Master Host” on page 57
■ “To Replace the Alternate Master or Slave Monitoring Host” on page 58
Host Event Grid
The Storage Automated Diagnostic Environment Event Grid enables you to sort host
events by component, category, or event type. The Storage Automated Diagnostic
Environment GUI displays an event grid that describes the severity of the event,
whether action is required, a description of the event, and the recommended action.
Refer to the Storage Automated Diagnostic Environment User’s Guide for more
information.
▼ Using the Host Event Grid
1. From the Storage Automated Diagnostic Environment Help menu, click the Event
Grid link.
2. Select the criteria from the Storage Automated Diagnostic Environment event
grid, like the one shown in FIGURE 5-1.
TABLE 5-1 lists all the host events in the Storage Automated Diagnostic Environment.

TABLE 5-1   Storage Automated Diagnostic Environment Event Grid for the Host

Category: host   Component: hba   EventType: Alarm+   Severity: Yellow
Description: [ Info ] status of hba
/devices/sbus@9,0/SUNW,qlc@0,30000/fp@0,0:devctl on diag.xxxxx.xxx.com
changed from NOT CONNECTED to CONNECTED
Information: Monitors changes in the output of luxadm -e port.

Category: host   Component: hba   EventType: Alarm-   Severity: Red   Action: Y
Description: [ Info ] status of hba
/devices/sbus@9,0/SUNW,qlc@0,30000/fp@0,0:devctl on diag.xxxxx.xxx.com
changed from CONNECTED to NOT CONNECTED
Information:
• Monitors changes in the output of luxadm -e port.
• Found path to 20 HBA ports.

Category: host   Component: lun.t300   EventType: Alarm-   Severity: Red   Action: Y
Description: [ Info ] The state of lun.T300.c14t50020F2300003EE5d0s2.statusA
on diag.xxxxx.xxx.com changed from OK to ERROR
Information: luxadm display reported a change in the port status of one of its
paths. The Storage Automated Diagnostic Environment then tries to find to which
enclosure this path corresponds (target=t3:diag244-t3b0/90.0.0.40) by reviewing
its database of Sun StorEdge T3+ arrays and virtualization engines.

Category: host   Component: lun.VE   EventType: Alarm-   Severity: Red   Action: Y
Description: [ Info ] The state of lun.VE.c14t50020F2300003EE5d0s2.statusA
on diag.xxxxx.xxx.com changed from OK to ERROR
Information: luxadm display reported a change in the port status of one of its
paths. The Storage Automated Diagnostic Environment then tries to find to which
enclosure this path corresponds (target=ve:diag244-ve0/90.0.0.40) by reviewing
its database of Sun StorEdge T3+ arrays and virtualization engines.

Category: host   Component: ifptest   EventType: Diagnostic Test-   Severity: Red   Action: Y
Description: ifptest (diag240) on host failed.

Category: host   Component: qlctest   EventType: Diagnostic Test-   Severity: Red
Description: qlctest (diag240) on host failed.

Category: host   Component: socaltest   EventType: Diagnostic Test-   Severity: Red
Description: socaltest (diag240) on host failed.

Category: host   Component: enclosure   EventType: PatchInfo
Description: [ Info ] New patch and package information generated.
Information: Sends changes to the output of showrev -p and pkginfo -l.

Category: host   Component: enclosure   EventType: backup
Description: [ Info ] Agent Backup
Information: Backup of the configuration file of the agent.
Replacing the Master, Alternate Master,
and Slave Monitoring Host
The following procedures are a high-level overview of the procedures that are
detailed in the Storage Automated Diagnostic Environment User’s Guide. Follow these
procedures when replacing a master, alternate master, or slave monitoring host.
Note – The procedures for replacing the master host are different from the
procedures for replacing an alternate master or slave monitoring host.
▼ To Replace the Master Host
Refer to Chapter 2 of the Storage Automated Diagnostic Environment User’s Guide for
detailed instructions for the next four steps.
1. Install the SUNWstade package on a new Master Host.
2. Run /opt/SUNWstade/bin/ras_install on the new Master Host.
3. Configure the Host as the Master Host.
4. Connect to the Master Server’s GUI at http://<servername>:7654.
5. Choose Utilities -> System -> Recover Config.
Refer to Chapter 7 of the Storage Automated Diagnostic Environment User’s Guide for
detailed instructions.
a. In the Recover Config window, enter the IP address of any alternate master or
slave monitoring host (all hosts keep a copy of the configuration).
b. Make sure the Recover Config and Reset slave to this master checkboxes are
checked.
c. Click Recover.
6. Choose Maintenance -> General Maintenance.
Ensure that all host and device settings are recovered correctly.
Refer to Chapter 3 of the Storage Automated Diagnostic Environment User’s Guide for
detailed instructions.
7. Choose Maintenance -> General Maintenance -> Start/Stop Agent to start the
agent on the master host.
▼ To Replace the Alternate Master or Slave
Monitoring Host
1. Choose Maintenance -> General Maintenance -> Maintain Hosts.
Refer to Chapter 3, “Maintenance,” of the Storage Automated Diagnostic Environment
User’s Guide.
2. In the Maintain Hosts window, select the host to be replaced from the Existing
Hosts list, and click Delete.
3. Install the new host.
Refer to Chapter 2 of the Storage Automated Diagnostic Environment User’s Guide for
detailed instructions for the next four steps.
4. Install the SUNWstade package on the new host.
5. Run /opt/SUNWstade/bin/ras_install.
6. Configure the host as a slave.
7. Choose Maintenance -> General Maintenance -> Maintain Hosts.
Refer to Chapter 3, “Maintenance,” of the Storage Automated Diagnostic Environment
User’s Guide for detailed instructions.
8. In the Maintain Hosts window, select the new host.
9. Configure the options as needed.
10. Choose Maintenance -> Topology Maintenance -> Topology Snapshot.
a. In the Topology Snapshot window, select the new host.
b. Click Create and Retrieve Selected Topologies.
c. Click Merge and Push Master Topology.
Conclusion
Any time a master, alternate master, or slave monitoring host is replaced, you must
recover the configuration using the procedures described above. This is especially
important when the Storage Service Processor is replaced as a FRU, whether the
Storage Service Processor is the master or the slave.
CHAPTER 6
Troubleshooting Sun StorEdge FC
Switch-8 and Switch-16 Devices
This chapter describes how to troubleshoot the switch components of a Sun
StorEdge 3900 or 6900 series system. It contains the following sections:
■ “Sun StorEdge Network FC Switch-8 and Switch-16 Switch Description” on
page 61
■ “Switch Event Grid” on page 62
■ “setupswitch Exit Values” on page 68
■ “Replacing the Master Midplane” on page 68
Sun StorEdge Network FC Switch-8 and
Switch-16 Switch Description
The Sun StorEdge network FC switch-8 and switch-16 switches provide cable
consolidation and increased connectivity for the internal data interconnection
infrastructure.
The switches are paired to provide redundancy. Two switches are used in each Sun
StorEdge 3900 series, and four switches are used in each Sun StorEdge 6900 series.
Each Sun StorEdge network FC switch-8 and switch-16 switch is connected by way
of an Ethernet to the service network for management and service from the Storage
Service Processor.
These switches can be monitored through the SANSurfer GUI, which is available on
the Storage Service Processor. You configure and modify the switches using the
Configuration Utilities. Do not configure or modify the switches using any method
other than the SUNWsecfg tools.
▼ To Diagnose and Troubleshoot Switch Hardware
1. To diagnose and troubleshoot the switch hardware, begin by running the
SUNWsecfg checkswitch utility.
2. For detailed troubleshooting procedures, refer to the Sun StorEdge SAN Field
Troubleshooting Guide, Release 3.0.
The Sun StorEdge SAN Field Troubleshooting Guide, Release 3.0 describes how to
diagnose and troubleshoot the switch hardware. The scope of this document
includes the Sun StorEdge network FC switch-8 and switch-16 switch and the
interconnections (HBA, GBIC, cables) on either side of the switch. In addition, the
document provides examples of fault isolation and includes a Brocade switch
appendix.
Switch Event Grid
The Storage Automated Diagnostic Environment Event Grid enables you to sort
switch events by component, category, or event type. The Storage Automated
Diagnostic Environment GUI displays an event grid that describes the severity of the
event, whether action is required, a description of the event, and the recommended
action. Refer to the Storage Automated Diagnostic Environment User’s Guide for more
information.
1. From the Storage Automated Diagnostic Environment Help menu, click the Event
Grid link.
2. Select the criteria from the Storage Automated Diagnostic Environment event
grid, like the one shown in FIGURE 6-1.
TABLE 6-1 lists the switch events.

TABLE 6-1   Storage Automated Diagnostic Environment Event Grid for Switches

Category: switch   Component: port statistics   EventType: Log   Severity: Yellow   Action: Y
Description: Change in port statistics on switch diag156-sw1b (ip=192.168.0.31)
Information: The switch has reported a change in an error counter. This could
indicate a failing component in the link.
Action: Check the Topology GUI for any link errors. Run linktest on the link to
isolate the failing FRU. Quiesce I/O on the link before running linktest.

Category: switch   Component: chassis.fan   EventType: Alarm   Severity: Yellow
Description: chassis.fan.1 status changed from OK

Category: switch   Component: chassis.power   EventType: Alarm   Severity: Yellow
Description: chassis.power.1 status changed from OK
Information: [ Info ] This event monitors changes in the status of the chassis'
power supply, as reported by SANbox chassis_status.

Category: switch   Component: chassis.temp   EventType: Alarm   Severity: Yellow
Description: chassis.temp.1 status changed from OK
Information: [ Info ] This event monitors changes in the status of the chassis'
temperature, as reported by SANbox chassis_status.

Category: switch   Component: chassis.zone   EventType: Alarm   Severity: Yellow
Description: [ Info ] Switch sw1a was rezoned: [ new zones ...]
Information: This event reports changes in the zoning of a switch.

Category: switch   Component: enclosure   EventType: Audit
Description: Auditing a new switch called rasd2-swb1 (ip=xxx.0.0.41)
10002000007a609

Category: switch   Component: oob   EventType: Comm_Established
Description: Communication regained with sw1a (ip=xxx.20.67.213)

Category: switch   Component: oob   EventType: Comm_Lost   Severity: Down   Action: Yes
Description: [ Info/Action ] Lost communication with sw1a (ip=xxx.20.67.213)
Information: Ethernet connectivity to the switch has been lost.
Recommended action:
1. Check Ethernet connectivity to the switch.
2. Verify that the switch is booted correctly with no POST errors.
3. Verify that the switch Test Mode is set for normal operations.
4. Verify the TCP/IP settings on the switch via Forced PROM Mode access.
5. Replace the switch, if needed.

Category: switch   Component: switchtest   EventType: Diagnostic Test-   Severity: Red
Description: switchtest (diag240) on d2-swb1 (ip=xxx.0.0.41) 10002000007a609

Category: switch   Component: enclosure   EventType: Discovery
Description: [ Info ] Discovered a new switch called rasd2-swb1 (ip=xxx.0.0.41)
10002000007a609
Information: Discovery events occur the very first time the agent probes a storage
device. It creates a detailed description of the device monitored and sends it
using any active notifier (NetConnect, Email).

Category: switch   Component: enclosure   EventType: LocationChange
Description: Location of switch rasd2-swb0 (ip=xxx.0.0.40) was changed

Category: switch   Component: port   EventType: StateChange+
Description: [ Info/Action ] port.1 in SWITCH diag185 (ip=xxx.20.67.185) is now
Available (status-state changed from OFFLINE to ONLINE)
Information: Port on switch is now available.

Category: switch   Component: port   EventType: StateChange-   Severity: Red   Action: Y
Description: [ Info/Action ] port.1 in SWITCH diag185 (ip=xxx.20.67.185) is now
Not-Available (status-state changed from ONLINE to OFFLINE)
Information: A port on the switch has logged out of the Fabric and has gone
offline.
Recommended action:
1. Verify cables, GBICs, and connections along the Fibre Channel path.
2. Check the Storage Automated Diagnostic Environment SAN Topology GUI to
identify the failing segment of the data path.
3. Verify the correct FC switch configuration.

Category: switch   Component: enclosure   EventType: Statistics
Description: [ Info ] Statistics about switch d2-swb1 (ip=xxx.0.0.41)
10002000007a609
Information: Port Statistics
Replacing the Master Midplane
Follow this procedure when replacing the master midplane in a Sun StorEdge
network FC switch-8 or switch-16 switch or a Brocade Silkworm switch. This
procedure is detailed in the Storage Automated Diagnostic Environment User’s Guide.
▼ To Replace the Master Midplane
1. Choose Maintenance --> General Maintenance --> Maintain Devices.
Refer to Chapter 3 of the Storage Automated Diagnostic Environment User’s Guide.
2. In the Maintain Devices window, delete the device that is to be replaced.
3. Choose Maintenance --> General Maintenance --> Discovery.
4. In the Device Discovery window, rediscover the device.
5. Choose Maintenance --> Topology Maintenance --> Topology Snapshot.
a. Select the host that monitors the replaced FRU.
b. Click Create and Retrieve Selected Topologies.
c. Click Merge and Push Master Topology.
Conclusion
Any time a master midplane is replaced, you must rediscover the device using the
procedure described above. This is especially important when the Storage Service
Processor is replaced as a FRU, whether the Storage Service Processor is the master
or the slave.
CHAPTER 7
Troubleshooting Virtualization
Engine Devices
This chapter describes how to troubleshoot the virtualization engine component of a
Sun StorEdge 6900 series system. It contains the following sections:
■ “Virtualization Engine Description” on page 69
■ “Translating Host Device Names” on page 78
■ “Sun StorEdge 6900 Series Multipathing Example” on page 89
■ “Virtualization Engine Event Grid” on page 95
Virtualization Engine Description
The virtualization engine supports the multipathing functionality of the Sun
StorEdge T3+ array. Each virtualization engine has physical access to all underlying
Sun StorEdge T3+ arrays and controls access to half of the Sun StorEdge T3+ arrays.
The virtualization engine has the ability to assume control of all arrays in the event
of component failure. The configuration is maintained between virtualization engine
pairs through redundant T Port connections by way of a pair of Sun StorEdge
network FC switch-8 or switch-16 switches.
Virtualization Engine Diagnostics
The virtualization engine monitors the following components:
■ Virtualization engine router
■ Sun StorEdge T3+ array
■ Cabling between the router and the storage
Service Request Numbers
The service request numbers are used to inform the user of storage subsystem
activities.
Service and Diagnostic Codes
The virtualization engine’s service and diagnostic codes inform the user of
subsystem activities. The codes are presented as a LED readout. See Appendix A for
the table of codes and actions to take. In some cases, you might not be able to receive
Service Request Numbers (SRNs) because of communication errors. If this occurs,
you must read the virtualization engine LEDs to determine the problem.
▼ To Retrieve Service Information
You can retrieve service information in two ways:
■ CLI Interface
■ Error Log Analysis Commands
Both of these methods are described in the following sections.
CLI Interface
The SLIC daemon, which runs on the Storage Service Processor, communicates with
the virtualization engine. The SLIC daemon periodically polls the virtualization
engine for all subsystem errors and for topology changes. It then passes this
information in the form of an SRN to the Error Log file.
▼ To Display Log Files and Retrieve SRNs
Use the /opt/svengine/sduc/sreadlog command to display log files and
retrieve the Service Request Numbers (SRNs) for errors that need action. Data is
returned in the following format:
TimeStamp:nnn:Txxxxx.uuuuuuuu SRN=mmmmm
TimeStamp:nnn:Txxxxx.uuuuuuuu SRN=mmmmm
TimeStamp:nnn:Txxxxx.uuuuuuuu SRN=mmmmm
Item        Description
TimeStamp   Time and date when the error occurred
nnn         The name of the virtualization engine pair (v1 or v2)
Txxxxx      The LUN where the error occurred.
            Note: Txxxxx can represent a physical or a logical LUN.
uuuuuuuu    The unique ID of the drive or the virtualization engine router
SRN=mmmmm   The SRN, defined in numerical order
Example
# /opt/svengine/sduc/sreadlog -d v1
2002:Jan:3:10:13:05:v1.29000060-220041F9.SRN=70030
2002:Jan:3:10:13:31:v1.29000060-220041F9.SRN=70030
2002:Jan:3:10:17:10:v1.29000060-220041F9.SRN=70030
2002:Jan:3:10:17:37:v1.29000060-220041F9.SRN=70030
2002:Jan:3:10:22:26:v1.29000060-220041F9.SRN=70030
2002:Jan:3:10:25:54:v1.29000060-220041F9.SRN=70030
Item        Description
TimeStamp   2002:Jan:3:10:13:05 (time and date of the error)
nnn         v1 (virtualization engine pair v1)
uuuuuuuu    29000060-220041F9 (v1a, obtained by checking the virtualization
            engine map from the SEcfg utility)
SRN=mmmmm   SRN=70030: SAN Configuration Changed
            (Refer to Appendix A for codes.)
▼ To Clear the Log
● Use the /opt/svengine/sduc/sclrlog command.
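For example, assuming sclrlog accepts the same -d <ve-pair> option as the other
sduc commands shown in this chapter (an assumption; verify against your installed
version):

# /opt/svengine/sduc/sclrlog -d v1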
Virtualization Engine LEDs
TABLE 7-1 describes the LEDs on the back of the virtualization engine.

TABLE 7-1   Virtualization Engine LEDs

LED       Color   State        Description
Power     Green   Solid on     The virtualization engine is powered on
Status 1  Green   • Solid on   • Normal operating mode
                  • Blink      • Number of blinks indicates a service code
Fault     Amber   Solid on     Serious problem. Decipher the blinking of the
                               Status LED to determine the service code. Once you
                               have determined the service code, look up its
                               decimal number in Appendix A.

1. The Status LED blinks a service code when the Fault LED is solid on.
Power LED Codes
The virtualization engine LEDs are shown in FIGURE 7-1.
FIGURE 7-1 Virtualization Engine Front Panel LEDs
Interpreting LED Service and Diagnostic Codes
The Status LED communicates the status of the virtualization engine in decimal
numbers. Each decimal digit is represented by a number of blinks, followed by a
medium duration (two seconds) of LED off. TABLE 7-2 lists the status LED code
descriptions.

TABLE 7-2   LED Service and Diagnostic Codes

0    Fast blink
1    LED blinks once
2    LED blinks twice with one short duration (one second) between blinks
3    LED blinks three times with one short duration (one second) between blinks
...
10   LED blinks ten times with one short duration (one second) between blinks

The blink code repeats continuously, with a four-second off interval between code
sequences. For example, three blinks, a two-second pause, and then a single blink,
repeating after a four-second off interval, reads as the decimal code 31.
Back Panel Features
The back panel of the virtualization engine contains the data ports that connect
to the Sun StorEdge network FC switch-8 or switch-16 switches, a socket for the
AC power input, and various LEDs.
Ethernet Port LEDs
The Ethernet port LEDs indicate the speed, activity, and validity of the link, as
shown in TABLE 7-3.

TABLE 7-3   Speed, Activity, and Validity of the Link

LED            Color   State     Description
Speed          Amber   Solid on  The link is 100Base-TX
                       Off       The link is 10Base-T
Link Activity  Green   Solid on  A valid link is established
                       Blink     Normal operations, including data activity
Fibre Channel Link Error Status Report
The virtualization engine’s host-side and device-side interfaces provide statistical
data for the counts listed in TABLE 7-4.
TABLE 7-4   Virtualization Engine Statistical Data

Count Type               Description
Link Failure Count       The number of times the virtualization engine's frame
                         manager detects a non-operational state or other failure
                         of N_Port initialization protocol.
Loss of Synchronization  The number of times that the virtualization engine
Count                    detects a loss in synchronization.
Loss of Signal Count     The number of times that the virtualization engine's
                         frame manager detects a loss of signal.
Primitive Sequence       The number of times that the virtualization engine's
Protocol Error           frame manager detects N_Port protocol errors.
Invalid Transmission     The number of times that the virtualization engine's
Word                     8b/10b decoder does not detect a valid 10-bit code.
Invalid CRC Count        The number of times that the virtualization engine
                         receives frames with a bad CRC and a valid EOF. A valid
                         EOF includes EOFn, EOFt, or EOFdti.
▼ To Check Fibre Channel Link Error Status
Manually
The Storage Automated Diagnostic Environment, which runs on the Storage Service
Processor, monitors the Fibre Channel link status of the virtualization engine. The
virtualization engine must be power-cycled to reset the counters. Therefore, you
should manually check the accumulation of errors over a fixed period of time. To
check the status manually, follow these steps:
1. Use the svstat command to take a reading, as shown in CODE EXAMPLE 7-1.
A Status report for the host-side and device-side ports is displayed.
2. Within the next few minutes, take another reading.
The number of new errors that occurred within that time frame represents the
number of link errors.
Note – If t3ofdg(1M) is running while you perform these steps, the following
error message is displayed:
Daemon error: check the SLIC router.
CODE EXAMPLE 7-1 Fibre Channel Link Error Status Example
# /opt/svengine/sduc/svstat -d v1
I00001 Host Side FC Vital Statistics:
    Link Failure Count      0
    Loss of Sync Count      0
    Loss of Signal Count    0
    Protocol Error Count    0
    Invalid Word Count      8
    Invalid CRC Count       0
I00001 Device Side FC Vital Statistics:
    Link Failure Count      0
    Loss of Sync Count      0
    Loss of Signal Count    0
    Protocol Error Count    0
    Invalid Word Count      139
    Invalid CRC Count       0
I00002 Host Side FC Vital Statistics:
    Link Failure Count      0
    Loss of Sync Count      0
    Loss of Signal Count    0
    Protocol Error Count    0
    Invalid Word Count      11
    Invalid CRC Count       0
I00002 Device Side FC Vital Statistics:
    Link Failure Count      0
    Loss of Sync Count      0
    Loss of Signal Count    0
    Protocol Error Count    0
    Invalid Word Count      135
    Invalid CRC Count       0
diag.xxxxx.xxx.com: root#

Note – v1 represents the first virtualization engine pair.

Note – The SLIC daemon must be running for the
/opt/svengine/sduc/svstat -d v1 command to work.
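A minimal sketch of Steps 1 and 2: capture two readings and compare them, so that
only the counters that changed within the interval are shown (the five-minute
sleep is illustrative):

# /opt/svengine/sduc/svstat -d v1 > /tmp/svstat.1
# sleep 300
# /opt/svengine/sduc/svstat -d v1 > /tmp/svstat.2
# diff /tmp/svstat.1 /tmp/svstat.2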
Translating Host Device Names
You can translate host device names to VLUN, disk pool, and physical Sun StorEdge
T3+ array LUNs.
The luxadm output for a host device, shown in CODE EXAMPLE 7-2, does not include
the unique VLUN serial number that is needed to identify this LUN.
CODE EXAMPLE 7-2 luxadm Output for a Host Device
# luxadm display /dev/rdsk/c4t2B00006022004186d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c4t2B00006022004186d0s2
  Status(Port A):         O.K.
  Vendor:                 SUN
  Product ID:             SESS01
  WWN(Node):              2a00006022004186
  WWN(Port A):            2b00006022004186
  Revision:               080E
  Serial Num:             Unsupported
  Unformatted capacity:   56320.000 MBytes
  Write Cache:            Enabled
  Read Cache:             Enabled
  Minimum prefetch:       0x0
  Maximum prefetch:       0x0
  Device Type:            Disk device
  Path(s):
  /dev/rdsk/c4t2B00006022004186d0s2
  /devices/pci@1f,4000/pci@2/SUNW,qlc@5/fp@0,0/
  ssd@w2b00006022004186,0:c,raw
▼ To Display the VLUN Serial Number
Devices That Are Not Sun StorEdge Traffic Manager-Enabled
1. Use the format -e command.
2. Type the disk on which you are working at the format prompt.
3. Type inquiry at the scsi prompt.
4. Find the VLUN serial number in the displayed Inquiry list.
# format -e c4t2B00006022004186d0
format> scsi
...
scsi> inquiry
Inquiry:
00 00 03 12 2b 00 00 02 53 55 4e 20 20 20 20 20   ....+...SUN
53 45 53 53 30 31 20 20 20 20 20 20 20 20 20 20   SESS01
30 38 30 45 62 57 33 4b 30 30 31 48 30 30 30      080EbW3K001H000
Vendor:            SUN
Product:           SESS01
Revision:          080E
Removable media:   no
Device type:       0
From this screen, note that the VLUN serial number is 62 57 33 4b 30 30 31 48,
beginning with the 5th pair of numbers on the 3rd line, up to and including the
12th pair. (These bytes are the hexadecimal encoding of the ASCII string
bW3K001H, which is also visible in the right-hand column of the inquiry output.)
Sun StorEdge Traffic Manager-Enabled Devices
If the devices support the Sun StorEdge Traffic Manager software, you can use the
following shortcut.
● Type:
# luxadm display /dev/rdsk/c6t29000060220041956257334B30303148d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/
c6t29000060220041956257334B30303148d0s2
  Status(Port A):         O.K.
  Status(Port B):         O.K.
  Vendor:                 SUN
  Product ID:             SESS01
  WWN(Node):              2a00006022004195
  WWN(Port A):            2b00006022004195
  WWN(Port B):            2b00006022004186
  Revision:               080E
  Serial Num:             Unsupported
  Unformatted capacity:   56320.000 MBytes
  Write Cache:            Enabled
  Read Cache:             Enabled
  Minimum prefetch:       0x0
  Maximum prefetch:       0x0
  Device Type:            Disk device
  Path(s):
  /dev/rdsk/c6t29000060220041956257334B30303148d0s2
  /devices/scsi_vhci/ssd@g29000060220041956257334b30303148:c,raw
   Controller           /devices/pci@1f,4000/SUNW,qlc@4/fp@0,0
    Device Address      2b00006022004195,0
    Class               primary
    State               ONLINE
   Controller           /devices/pci@1f,4000/pci@2/SUNW,qlc@5/fp@0,0
    Device Address      2b00006022004186,0
    Class               primary
    State               ONLINE
The c#t# portion of the /dev/rdsk device name is the Global Unique Identifier of
the device. It is 32 hexadecimal digits long:
■ The first 16 digits correspond to the WWN of the master virtualization engine
router.
■ The remaining 16 digits are the VLUN serial number.
In this example:
Virtualization engine WWN = 2900006022004195
VLUN serial number = 6257334B30303148
▼ To View the Virtualization Engine Map
The virtualization engine map is stored on the Storage Service Processor.
1. To view the virtualization engine map, type:
# showvemap -n v1 -f
VIRTUAL LUN SUMMARY
Disk pool  VLUN Serial       MP Drive  VLUN    VLUN     Size  Slic Zones
           Number            Target    Target  Name     GB
---------------------------------------------------------------------------
t3b00      6257334B30303148  T49152    T16384  VDRV000  55.0
t3b00      6257334B30303149  T49152    T16385  VDRV001  55.0
*****
DISK POOL SUMMARY
Disk pool  RAID  MP Drive  Size   Free Space  T3+ Active        Number of
                 Target    GB     GB          Path WWN          VLUNs
-----------------------------------------------------------------------
t3b00      5     T49152    116.7  6.7         50020F2300006DFA  2
t3b01      5     T49153    116.7  116.7       50020F230000725B  0
*****
MULTIPATH DRIVE SUMMARY
Disk pool  MP Drive  T3+ Active        Controller Serial
           Target    Path WWN          Number
-------------------------------------------------------
t3b00      T49152    50020F2300006DFA  60020F2000006DFA
t3b01      T49153    50020F230000725B  60020F2000006DFA
*****
VIRTUALIZATION ENGINE SUMMARY
Initiator  UID               VE Host  Online  Revision  Number of SLIC Zones
--------------------------------------------------------------------------
I00001     2900006022004195  v1a      Yes     08.14     0
I00002     2900006022004186  v1b      Yes     08.14     0
*****
ZONE SUMMARY
Zone Name  HBA WWN           Initiator  Online  Number of VLUNs
---------------------------------------------------------------------
Undefined  210000E08B033401  I00001     Yes     0
Undefined  210000E08B026C0F  I00002     Yes     0
Note – This example uses the virtualization engine map file, which could include
old information.
2. You can optionally establish a telnet connection to the virtualization engine
and run the runsecfg utility to poll a live snapshot of the virtualization engine
map. Refer to “To Replace a Failed Virtualization Engine” on page 84 for telnet
instructions.
Determining the virtualization engine pairs on the system .........
MAIN MENU - SUN StorEdge 6910 SYSTEM CONFIGURATION TOOL
1) T3+ Configuration Utility
2) Switch Configuration Utility
3) Virtualization Engine Configuration Utility
4) View Logs
5) View Errors
6) Exit
Select option above:> 3
VIRTUALIZATION ENGINE MAIN MENU
1) Manage VLUNs
2) Manage Virtualization Engine Zones
3) Manage Configuration Files
4) Manage Virtualization Engine Hosts
5) Help
6) Return
Select option above:> 3
MANAGE CONFIGURATION FILES MENU
1) Display Virtualization Engine Map
2) Save Virtualization Engine Map
3) Verify Virtualization Engine Map
4) Help
5) Return
Select configuration option above:> 1
Do you want to poll the live system (time consuming) or view the file [l|f]: l
From the virtualization engine map output, you can match the VLUN serial number
to the VLUN name (VDRV000), the disk pool (t3b00) and the MP drive target
(T49152). This information can also help you find the controller serial number
(60020F2000006DFA), which you need to perform Sun StorEdge T3+ array LUN
failback commands.
▼ To Failback the Virtualization Engine
In the event of a Sun StorEdge T3+ array LUN failover, use the following procedure
to fail the LUN back to its original controller.
1. From the Storage Service Processor, type:
# /opt/svengine/sduc/mpdrive failback -d v1 -j 60020F2000006DFA
where:
-d   The virtualization engine pair on which to run the command
-j   The controller serial number, which corresponds to the Sun StorEdge T3+
     array WWN of the affected partner pair
The failback command is always performed on the controller serial number,
regardless of which controller (the Master or the Alt-Master) currently owns the
LUN. All VLUNs are affected by a failover and failback of the underlying physical
LUN.
The controller serial number is the system WWN for the Sun StorEdge T3+ array. In
this example, the active path WWN is 50020F2300006DFA, and the number used in
the failback command is 60020F2000006DFA.
2. The SLIC daemon must be running for the mpdrive failback command to work.
Ensure that the SLIC daemon is running by using the command shown in
CODE EXAMPLE 7-3.
If no SLIC processes are running, you can start them manually by running the
SUNWsecfg script /opt/SUNWsecfg/bin/startslicd -n v1.
CODE EXAMPLE 7-3 slicd Output Example
# ps -ef | grep slic
    root  6299  6295  0   Jan 04 ?        0:00 ./slicd
    root  6296  6295  0   Jan 04 ?        0:02 ./slicd
    root  6295     1  0   Jan 04 ?        0:01 ./slicd
    root  6357  6295  0   Jan 04 ?        0:00 ./slicd
    root  6362  6295  0   Jan 04 ?        0:03 ./slicd
For detailed information about the SUNWsecfg scripts, refer to the Sun StorEdge
3900 and 6900 Series Reference Manual.
▼ To Replace a Failed Virtualization Engine
1. Replace the old (failed) virtualization engine unit with a new unit.
2. Identify the MAC address of the new unit and replace the old MAC address with
the new one in the /etc/ethers file:
8:0:20:7d:82:9e virtualization engine-name
3. Verify that RARP is running on the Storage Service Processor.
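One way to verify this (a sketch; the Solaris RARP daemon is in.rarpd):

# ps -ef | grep in.rarpd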
4. Disable the switch port:
# /opt/SUNWsecfg/flib/setveport -v VE-name -d
5. Power on the new unit.
6. Log in to the new unit, for example:
# telnet v1a
7. From the User Service Utility Menu, enter 9 to clear the SAN database.
8. Choose Quit to clear the SAN database.
9. Configure the new unit:
# setupve -n virtualization engine-name
10. Check the configuration:
# checkve -n virtualization engine-name
11. Enable the switch port:
# /opt/SUNWsecfg/flib/setveport -v virtualization engine-name -e
12. Reset the virtualization engine:
# resetve -n virtualization engine-name
13. Find the initiator numbers for the new and old units:
# showvemap -n virtualization engine-pairname -l
The new unit will not have any zones defined.
14. If zones were present before the replacement, type the following:
# restorevemap -n virtualization engine pair -z \
-c old-ve-initiator-number -d new-ve-initiator-number
15. Verify the new unit by typing:
# showvemap -n virtualization engine-pairname -l
▼ To Manually Clear the SAN Database
It is occasionally necessary to manually clear the SAN database on the
virtualization engine routers.
Caution – This procedure wipes out the SAN database and removes the
configuration of disk pools, multipath drives, zoning, and VLUNs. After performing
this procedure, the virtualization map must be restored to the virtualization
engine pair using /opt/SUNWsecfg/bin/restorevemap. This requires a valid copy
of the /opt/SUNWsecfg/etc/v1.san or v2.san file.
▼ To Reset the SAN Database on Both
Virtualization Engines
● Type:
# resetsandb -n ve-pair
▼ To Reset the SAN Database on a Single
Virtualization Engine
1. Disconnect the virtualization engine device side FC cables.
2. Telnet to the first virtualization engine in the pair.
3. Enter the password.
The User Service Utility Menu is displayed.
4. Enter 9 to clear the SAN database.
■ A successful command displays the message:
SAN database has been cleared!
■ An unsuccessful command results in service code 051. If this occurs, repeat
Steps 1 through 3.
■ If the command continues to fail, replace the virtualization engine.
5. Reconnect the virtualization engine device side FC cables.
6. Enter B to warm reboot both virtualization engines.
Stopping and Restarting the SLIC Daemon
Follow this procedure to restart the SLIC daemon if it becomes unresponsive, or if
messages such as the following are displayed:
connect: Connection refused or Socket error encountered.
▼ To Restart the SLIC Daemon
1. Check whether the SLIC daemon is running:
# ps -eaf | grep slicd
2. Check for any message queues, shared memory, or semaphores still in use:
# ipcs
IPC status from <running system> as of Wed Feb 20 12:48:30 MST 2002
T         ID         KEY         MODE         OWNER   GROUP
Message Queues:
Shared Memory:
m          0  0x50000483  --rw-r--r--   root    root
m        301  0x5555aa8a  --rw-------   root    other
m        302  0x5555aaaa  --rw-------   root    other
m        303  0x5555aaba  --rw-------   root    other
m          4  0x7cc       --rw-------   root    root
Semaphores:
s     196608  0x5555aa9a  --ra-------   root    other
s     196609  0x5555aa7a  --ra-------   root    other
s     196610  0x5555aaba  --ra-------   root    other
s          3  0x10e1      --ra-------   root    root
Segments identified with 0x5555aa in the address are associated with the SLIC
daemon.
3. Remove the segments by typing the following:
# ipcrm -m 301 -m 302 -m 303 -s 196608 -s 196609 -s 196610
Check the ipcrm(1M) man page for details.
4. Restart the SLIC daemon:
# /opt/SUNWsecfg/bin/startslicd -n v1
(or v2, depending on the configuration)
5. Confirm that the SLIC daemon is running:
# ps -eaf | grep slicd
    root 16130     1  0 11:45:00 ?        0:00 ./slicd
    root 16131 16130  0 11:45:00 ?        0:00 ./slicd
    root 16132 16130  0 11:45:00 ?        0:00 ./slicd
    root 16135 16130  0 11:45:00 ?        0:00 ./slicd
    root 16143 16130  0 11:45:00 ?        0:00 ./slicd
    root 16189 15877  0 11:48:49 pts/1    0:00 grep slicd
The message queues, shared memory, and semaphores have been removed.
Sun StorEdge 6900 Series Multipathing
Example
This example assumes one Sun StorEdge T3+ array partner pair with one 500GB
RAID 5 LUN per brick (2 LUNs total).
Currently, there is one 10GB VLUN created from each physical LUN, for a total of
two VLUNs. In a Sun StorEdge 6900 series, there are four possible physical paths to
each Sun StorEdge T3+ array Volume (LUN). Refer to FIGURE 7-4 and FIGURE 7-3.
For example, to access the LUN on the Alt-Master, the Sun StorEdge T3+ array I/ O
could travel:
■ From HBA-0 -> Switch -> SVE(1) -> Switch -> Alt-Master Controller (Primary
Route from HBA-0)
■ From HBA-0 -> Switch -> SVE(1) -> Switch -> Switch -> Master Controller ->
Backend Loop to Alt-Master (Secondary Route from HBA-0)
■ From HBA-1 -> Switch -> SVE(2) -> Switch -> Switch -> Alt-Master Controller
(Primary Route from HBA-1)
■ From HBA-1 -> Switch -> SVE(2) -> Switch -> Master Controller -> Backend Loop
to Alt-Master (Secondary Route from HBA-1)
The virtualization engine recognizes the primary (active) and secondary (passive)
pathing for the LUNs and routes the I/O to the primary controller, unless there is
a failure on the primary path. In this case, the virtualization engine initiates a
failover and routes the I/O to the secondary path (through the interconnect
cables). Refer to FIGURE 7-6.
The host, using multipathing software, is presented two primary (active) paths for
each LUN, allowing the host to route I/ O through either or both HBAs.
In the event of a path failure before the second tier of Sun StorEdge network FC
switch-8 and switch-16 switches (refer to FIGURE 7-5), one of the paths is disabled,
but the other path continues sending I/ O as normal and takes over the entire load.
No Sun StorEdge T3+ array LUN failure is noted because of the redundant path by
way of the Sun StorEdge network FC switch-8 and switch-16 switch T Ports.
In the event of a path failure after the second tier of Sun StorEdge network FC
switch-8 and switch-16 switches (or in the event of both T Ports failing between the
switches), the virtualization engines force a LUN failover of the affected Sun
StorEdge T3+ array and routes all I/ O to its secondary path. From the host side,
nothing has changed; all I/ O is routed through both HBAs (refer to FIGURE 7-6).
FIGURE 7-2 Sun StorEdge 6900 Series Logical View
Virtualization Engine Event Grid
The Storage Automated Diagnostic Environment Event Grid enables you to sort
virtualization engine events by component, category, or event type. The Storage
Automated Diagnostic Environment GUI displays an event grid that describes the
severity of the event, whether action is required, a description of the event, and the
recommended action. Refer to the Storage Automated Diagnostic Environment User’s
Guide Help section for more information.
▼ Using the Virtualization Engine Event Grid
1. From the Storage Automated Diagnostic Environment Help menu, click the Event
Grid link.
2. Select the criteria from the Storage Automated Diagnostic Environment event
grid, like the one shown in FIGURE 7-7.
FIGURE 7-7 Virtualization Engine Event Grid
TABLE 7-5 lists the virtualization engine events.

TABLE 7-5   Storage Automated Diagnostic Environment Event Grid for Virtualization Engine

Category: virtualization engine   Component: enclosure   EventType: Alarm   Severity: Yellow
Description: Volume E00012 on v1a changed mapping.

Category: virtualization engine   Component: enclosure   EventType: Alarm.log   Severity: Yellow
Description: Change in Port Statistics on virtualization engine v1a

Category: virtualization engine   Component: enclosure   EventType: Audit
Description: [ Info ] Auditing a Virtualization Engine called v1a
Information: Audits occur every week and send a detailed description of the
enclosure to the Sun Network Storage Command Center (NSCC).

Category: virtualization engine   Component: oob   EventType: Comm_Established
Description: Communication regained with virtualization engine v1a

Category: virtualization engine   Component: oob   EventType: Comm_Lost   Severity: Down   Action: Y
Description: [ Info/Action ] Lost communication with virtualization engine v1a
Information: Ethernet connectivity to the virtualization engine unit has been
lost.
Recommended action:
1. Check Ethernet connectivity to the virtualization engine.
2. Make sure the virtualization engine is booted correctly.
3. Verify that the TCP/IP settings on the virtualization engine are correct.
4. Replace the virtualization engine if necessary.

Category: virtualization engine   Component: ve_diag   EventType: Diagnostic Test-   Severity: Red
Description: ve_diag (diag240) on ve-1 (ip=xxx.20.67.213) failed

Category: virtualization engine   Component: veluntest   EventType: Diagnostic Test-   Severity: Red
Description: veluntest (diag240) on ve-1 (ip=xxx.20.67.213) failed

Category: virtualization engine   Component: enclosure   EventType: Discovery
Description: [ Info ] Discovered a new Virtualization Engine called v1a
Information: Discovery events occur the first time the agent probes a storage
device and creates a detailed description of the device monitored. The discovery
event is sent using any active notifier, such as NetConnect or email.
CHAPTER 8
Troubleshooting the Sun StorEdge
T3+ Array Devices
This chapter contains the following sections:
■ “Explorer Data Collection Utility” on page 99
■ “Sun StorEdge T3+ Array Event Grid” on page 109
Explorer Data Collection Utility
The Explorer Data Collection Utility script is included on the Storage Service
Processor in the /export/packagesdirectory.
The Explorer Data Collection Utility is not installed by default, but can be installed
during rack setup. Customer-specific site information can be entered at that time.
▼ To Install Explorer Data Collection Utility on the
Storage Service Processor
# cd /export/packages
# pkgadd -d . SUNWexplo
As part of the installation procedure, you will be asked to enter site-specific
information. You can optionally press Return to accept the blank defaults.
Do not accept automatic emailing of the Explorer Data Collection Utility output,
unless the Storage Service Processor is properly set up to handle mail correctly.
Automatic Email Submission
Would you like all explorer output to be sent to:
at the completion of explorer when -mail or -e is specified?
[y,n] n
Before running the Explorer Data Collection Utility, make sure that the switch and
Sun StorEdge T3+ array information is added to the proper files in
/opt/SUNWexplo/etc.
Example
1. Type switch information into the /opt/SUNWexplo/etc/saninput.txt file.
Edit the file with a text editor such as vi.
CODE EXAMPLE 8-1 Editing Switch Information Using vi
# vi saninput.txt
# Input file for extended data collection
# Format is SWITCH SWITCH-TYPE PASSWORD LOGIN
# Valid switch types are ancor and brocade
# LOGIN is required for brocade switches, the default is admin
sw1a ancor
sw1b ancor
sw2a ancor
sw2b ancor
:wq!
2. Type Sun StorEdge T3+ array information into the
/opt/SUNWexplo/etc/t3input.txt file. Edit the file with a text editor such as vi.
3. Type the password for your specific site.
CODE EXAMPLE 8-2 Editing Sun StorEdge T3+ Array Information Using vi
# vi t3input.txt
# Input file for extended data collection
# Format is HOST PASSWORD
t3b0 XXXX
t3b2 XXXX
t3b3 XXXX
:wq!
Note – xxxx represents Sun StorEdge T3+ array passwords.
■ You can now run /opt/SUNWexplo/bin/explorer to collect information about
the Storage Service Processor operating system, the Sun StorEdge network FC
switch-8 or switch-16 switch, and the Sun StorEdge T3+ array, which can be used
for troubleshooting purposes. (See the example following this list.)
■ A tar/gzip file will be put into the /opt/SUNWexplo/output directory. The
tar/gzip file can be sent to Sun Service for evaluation.
■ The Sun StorEdge network FC switch-8 and switch-16 switch information will be
placed in the san directory of the tar file.
■ Sun StorEdge T3+ array information will be placed in the disks/t3 directory.
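A minimal sketch of a collection run, invoked here with no options (the -mail and
-e options mentioned earlier control automatic emailing):

# /opt/SUNWexplo/bin/explorer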
Troubleshooting the T1/T2 Data Path

Notes
■ There are two T Port links for redundancy.
■ If one of the two links is lost, no Sun StorEdge T3+ array LUN failover occurs,
and no pathing failures are noted.
■ If both T Port links fail, a Sun StorEdge T3+ array LUN failover occurs: one of
the virtualization engines takes control of the I/O operations, and all I/O is
routed to the controlling virtualization engine.
■ The host will notice a pathing failure in its multipathing software, as the sketch
following these notes illustrates.
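On a Sun StorEdge 6900 series data host running VERITAS DMP, one way to confirm the pathing failure is to inspect the path states of an affected disk. This is a hedged sketch; the device names and output are illustrative:

# vxdisk list c6t2d0s2
...
numpaths: 2
c6t2d0s2        state=enabled
c7t2d0s2        state=disabled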
T1/T2 Notification Events

The example below shows a typical port failure event.
Site     : Lab 3286 - DSQA1 Broomfield
Source   : diag.xxxxx.xxx.com
Severity : Error (Actionable)
Category : Switch
DeviceId : switch:100000c0dd00b682
EventType: StateChangeEvent.M.port.8
EventTime: 01/30/2002 11:17:22
’port.8’ in SWITCH diag209-sw2a (ip=192.168.0.32) is now Not-Available
(status-state changed from ’Online’ to ’Offline’):
INFORMATION:
A port on the switch has logged out of the fabric and gone offline
PROBABLE-CAUSE:
1. Verify cables, GBICs and connections along Fibre Channel path
2. Check Storage Automated Diagnostic Environment SAN Topology GUI to
identify failing segment of the data path
3. Verify correct FC switch configuration
----------------------------------------------------------------------
Site     : Lab 3286 - DSQA1 Broomfield
Source   : diag.xxxxx.xxx.com
Severity : Warning
Category : Switch
DeviceId : switch:100000c0dd00b682
EventType: LogEvent.MessageLog
EventTime: 01/30/2002 11:17:22
Change in Port Statistics on switch diag209-sw2a (ip=192.168.0.32):
Port-8: Received 9746 ’InvalidTxWds’ in 0 mins (value=9805 )
FIGURE 8-1 Storage Service Processor Event
If both T Ports go offline, you might see messages like the following. Note the
virtualization engine event reporting the LUN failover.
Site     : Lab 3286 - DSQA1 Broomfield
Source   : diag.xxxxx.xxx.com
Severity : Warning (Actionable)
Category : Ve
DeviceId : ve:6257335A-30303142
EventType: AlarmEvent.volume
EventTime: 01/30/2002 11:49:05
Volume T49152 on diag209-v1a changed from 6257335A-30303142(active=50020F23-
00006DFA,passive=) to 6257335A-30303142(active=50020F23-
00006DFA,passive=50020F23-0000725B)
INFORMATION:
This event occurs when the virtualization engine has detected a
change in status for a Multipath Drive or VLUN,
usually meaning a pathing problem to a Sun StorEdge T3+ array controller
1. Check for changes in Active/Passive paths
2. Check Sun StorEdge T3+ array for current LUN ownership. (‘port listmap‘)
3. Use ‘mpdrive failback‘, if needed, to fail LUNs back to the correct controller
----------------------------------------------------------------------
Site     : Lab 3286 - DSQA1 Broomfield
Source   : diag.xxxxx.xxx.com
Severity : Warning
Category : Message
DeviceId : message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.SSD_WARN
EventTime: 01/30/2002 11:50:07
Found 1 ’driver.SSD_WARN’ warning(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=809f76b4):
INFORMATION:
SSD warnings
Jan 30 11:49:48 WWN: Received 7 'SSD Warning' message(s) on 'ssd56' in 8
mins [threshold is 5 in 24hours]
Last-Message: 'diag.xxxxx.xxx.com scsi:
[ID 243001 kern.warning] WARNING: /scsi_vhci/
ssd@g29000060220041956257335a30303145 (ssd56): '
----------------------------------------------------------------------
Site     : Lab 3286 - DSQA1 Broomfield
Source   : diag.xxxxx.xxx.com
Severity : Warning
Category : Message
DeviceId : message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.Fabric_Warning
EventTime: 01/30/2002 11:50:07
Found 1 ’driver.Fabric_Warning’ warning(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=809f76b4):
INFORMATION:
Fabric warning
Jan 30 11:46:37 WWN:2b00006022004186
diag.xxxxx.xxx.com fp: [ID 517869
kern.warning] WARNING: fp(2): N_x Port with D_ID=108000,
PWWN=2b00006022004186 reappeared in fabric ( in backup:diag.xxxxx.xxx.com)
----------------------------------------------------------------------
Site     : Lab 3286 - DSQA1 Broomfield
Source   : diag.xxxxx.xxx.com
Severity : Warning (Actionable)
Category : Host
DeviceId : host:diag.xxxxx.xxx.com
EventType: AlarmEvent.P.hba
EventTime: 01/30/2002 11:50:10
status of hba /devices/pci@1f,4000/pci@2/SUNW,qlc@5/fp@0,0:devctl on
diag.xxxxx.xxx.com changed from NOT CONNECTED to CONNECTED
INFORMATION:
monitors changes in the output of luxadm -e port
FIGURE 8-2 Virtualization Engine Alert
Sun StorEdge T3+ Array Storage Service Processor
Verification
1. Run port listmap on the Sun StorEdge T3+ array to see the failover event.
t3b0:/:<1> port listmap
port   targetid   addr_type   lun   volume   owner   access
u1p1   0          hard        0     vol1     u1      primary
u1p1   0          hard        1     vol2     u1      failover
u2p1   1          hard        0     vol1     u1      failover
u2p1   1          hard        1     vol2     u1      primary
2. Compare the virtualization engine configuration to a saved configuration by
running /opt/SUNWsecfg/runsecfg and choosing Verify Virtualization Engine
Map.
The output is from the diff(1) command, which shows the lines that have been
added, changed, or deleted. Notice that the active Sun StorEdge T3+ array controller
WWN has changed for one of the Sun StorEdge T3+ arrays, indicating it is using its
alternate path.
MANAGE CONFIGURATION FILES MENU
1) Display Virtualization Engine Map
2) Save Virtualization Engine Map
3) Verify Virtualization Engine Map
4) Help
5) Return
Select configuration option above:> 3
Verifying Virtualization Engine map for v1........
ERROR: virtualization engine map for v1 has changed.
18c18
< t3b01   5   T49153   116.7   0.7   50020F230000725B   1   1
> t3b01   5   T49153   116.7   0.7   50020F2300006DFA   1   1
28c28
< t3b01   T49153   50020F230000725B 60020F2000006DFA
> t3b01   T49153   50020F2300006DFA 60020F2000006DFA
37c37
< I00002   2900006022004186 v1b       Yes   08.14     0   0
> I00002   2900006022004186 Unknown   No    Unknown   0   0
46d45
< Undefined   210000E08B026C0F I00002   Yes   0
checkvemap: virtualization engine map v1 verification complete: FAIL.
FIGURE 8-3 Manage Configuration Files Menu
T1/T2 FRU Tests Available
■ Switch - switchtest
■ Link - linktest

Running linktest from the Storage Automated Diagnostic Environment GUI
guides the service engineer in discovering the failed FRU.
Once the test has completed its run, an email message similar to the following
is sent to the email recipient that was specified in linktest.
running on diag.xxxxx.xxx.com
linktest started on FC interconnect: switch to switch
switchtest started on switch 100000c0dd00b682 port 8
Estimated test time 14 minute(s)
01/30/02 11:21:26 diag209 Storage Automated Diagnostic Environment: MSGID
6013 switchtest.FATAL
switch0: "Device: Switch Port: 8 is Offline"
switchtest failed
Remove FC Cable from switch: 100000c0dd00b682, port: 8
Insert FC loopback cable into switch: 100000c0dd00b682, port: 8
Continue Isolation ?
switchtest started on switch 100000c0dd00b682 port 8
Estimated test time 14 minute(s)
01/30/02 11:22:11 diag209 Storage Automated Diagnostic Environment: MSGID
6013 switchtest.FATAL
switch0: "Device: Switch Port: 8 is Offline"
switchtest failed
Remove FC loopback cable from switch: 100000c0dd00b682, port: 8
Insert a NEW FC GBIC into switch: 100000c0dd00b682, port: 8
Insert FC loopback cable into switch: 100000c0dd00b682, port: 8
Continue Isolation ?
switchtest started on switch 100000c0dd00b682 port 8
Estimated test time 14 minute(s)
01/30/02 11:25:12 diag209 Storage Automated Diagnostic Environment: MSGID
4001 switchtest.WARNING
switch0: "Maximum transfer size for a FABRIC port is 200. Changing transfer
size 2000 to 200"
switchtest completed successfully
Remove FC loopback cable from switch: 100000c0dd00b682, port: 8
Restore ORIGINAL FC Cable into switch: 100000c0dd00b682, port: 8
Suspect ORIGINAL FC GBIC in switch: 100000c0dd00b682, port: 8
Retest to verify FRU replacement.
linktest completed on FC interconnect: switch to switch
FIGURE 8-4 Example Link Test Text Output from the Storage Automated Diagnostic
Environment
Notes
■ When inserting a loopback connector into the T Port, there will be NO green light
indicating a proper insertion. However, the test will run and be valid. There is
currently an RFE to address this issue.
■ If only one of the links has failed and the I/O is traveling over the remaining
link, once the failed link is replaced and recabled, I/O is automatically
routed over the repaired link by the switch. No manual intervention is required.
■ If both links have failed and a LUN failover has occurred, after repairing and
recabling the links, the user must manually perform an 'mpdrive failback'
to return the paths to their optimal state. I/O then resumes as normal over
the T Ports. A sketch of confirming the resumed I/O follows these notes.
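To confirm that I/O has resumed over the repaired T Ports, standard Solaris I/O monitoring on the data host is one option. This is a sketch; the interval and the devices reported will vary by site:

# iostat -xnz 5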
T1/T2 Isolation Procedures
1. Run linktest from the Storage Automated Diagnostic Environment for a guided
isolation procedure.
2. After replacing the failed FRU, run mpdrive failback, if needed.
Sun StorEdge T3+ Array Event Grid
The Storage Automated Diagnostic Environment Event Grid enables you to sort Sun
StorEdge T3+ array events by component, category, or event type. The Storage
Automated Diagnostic Environment GUI displays an event grid that describes the
severity of the event, whether action is required, a description of the event, and the
recommended action. Refer to the Storage Automated Diagnostic Environment User’s
Guide for more information.
1. From the Storage Automated Diagnostic Environment Help menu, click the Event
Grid link.
2. Select the criteria from the Storage Automated Diagnostic Environment event
grid, like the one shown in FIGURE 8-5.
FIGURE 8-5 Sun StorEdge T3+ array Event Grid
The following table lists all of the events for the Sun StorEdge T3+ array.
Each event below is listed with its Category, Component, EventType, severity (Sev), and whether action is required (Action), followed by its Description and Information columns.

Category: t3   Component: power.temp   EventType: Alarm+
Description: The state of power.u1pcu1.PowTemp on diag213 (ip=xxx.20.67.213) is Normal

Category: t3   Component: disk.port   EventType: Alarm-   Sev: Red   Action: Y
Description: [ Info/Action ] The state of disk.u1d1.Port1State on Sun StorEdge T3+ array t300 changed from OK to failed.
Information: The Sun StorEdge T3+ array has reported that one port of a dual-ported disk has failed.
Recommended action:
1. Telnet to affected Sun StorEdge T3+ array.
2. Verify disk state in fru stat, fru list, and vol stat.

Category: t3   Component: interface.loopcard.cable   EventType: Alarm-   Sev: Red   Action: Y
Description: [ Info/Action ] The state of loopcable.u1l1.CableState changed from OK to failed.
Information: The Sun StorEdge T3+ array has reported that a loopcard is in a failed state.
Recommended action:
1. Telnet to affected Sun StorEdge T3+ array.
2. Verify the loopcard state with fru stat.
3. Verify the matching firmware with the other loopcard.
4. Re-enable the loopcard if possible (enable u (encid)|[1|2]). Replace loopcard if necessary.
5. Re-enable the disk if possible.
6. Replace the disk, if necessary.

Drive Status Messages:
Value   Description
0       Drive mounted
2       Drive present
3       Drive is spun up
4       Drive is disabled
5       Drive has been replaced
7       Invalid system area on drive
9       Drive not present
D       Drive disabled; drive is being reconstructed
S       Drive substituted

Category: t3   Component: power.battery   EventType: Alarm-   Sev: Red   Action: Y
Description: [ Info/Action ] The state of power.u1pcu1.BatState on diag213 (ip=xxx.20.67.213) is Fault
Information: The state of the batteries in the Sun StorEdge T3+ array is not optimal. Possible causes are:
1. Voltage level on power supply and battery have moved out of acceptable thresholds.
2. The internal PCU temp has exceeded acceptable thresholds.
3. A PCU fan has failed.
Recommended action:
1. Telnet to the affected Sun StorEdge T3+ array.
2. Run refresh -s to verify the battery state.
3. Replace the battery, if necessary.

Category: t3   Component: power.fan   EventType: Alarm-   Sev: Red   Action: Y
Description: [ Info/Action ] The state of power.u1pcu1.Fan1State on diag213 (ip=xxx.20.67.213) is Fault
Information: The state of a fan on the Sun StorEdge T3+ array is not optimal.
Recommended action:
1. Telnet to affected Sun StorEdge T3+ array.
2. Verify the fan state with fru stat.
3. Replace the power cooling unit, if necessary.

Category: t3   Component: power.output   EventType: Alarm-   Sev: Red   Action: Y
Description: [ Info/Action ] The state of power.u1pcu1.PowOutput on diag213 (ip=xxx.20.67.213) is Fault
Information: The state of the power in the Sun StorEdge T3+ array power cooling unit is not optimal.
Recommended action:
1. Telnet to affected Sun StorEdge T3+ array.
2. Verify power cooling unit state in fru stat.
3. Replace PCU, if necessary.

Category: t3   Component: power.temp   EventType: Alarm-   Sev: Red   Action: Y
Description: [ Info/Action ] The state of power.u1pcu1.PowTemp on diag213 (ip=xxx.20.67.213) is Fault
Information: The state of the temperature in the Sun StorEdge T3+ array power cooling unit is either too high or is unknown.
Recommended action:
1. Telnet to the affected Sun StorEdge T3+ array.
2. Verify the power cooling unit state in fru stat.
3. Replace the PCU, if necessary.

Category: t3   Component: enclosure   EventType: Alarm.log   Sev: Red   Action: Y
Description: [ Info/Action ] Errors(s) found in logfile: /var/adm/messages.t3
Information: This event includes all important errors found.
Recommended action: Check the messages file for appropriate action.

Category: t3   Component: enclosure   EventType: Alarm.time   Sev: Yellow
Description: [ Action ] Time of T3 diag213 (ip=xxx.20.67.213) is different from host: T3=Fri Oct 26 10:16:17 2001, Host=2001-10-26 12:21:04
Information: Discrepancy.
Recommended action: Fix the date and time on the Sun StorEdge T3+ array using the date command. Date and time should be the same as the monitoring host.

Category: t3   Component: enclosure   EventType: Audit
Description: [ Info ] Auditing a new Sun StorEdge T3+ array called rasd2-t3b1 (ip=xxx.0.0.41) slr-mi.370-3990-01-e-e1.003239
Information: Audits occur every week and send a detailed description of the enclosure to the Sun Network Storage Command Center (NSCC).

Category: t3   Component: ib   EventType: Comm_Established
Description: [ Info ] Communication regained (InBand(ccadieux)) with diag213 (ip=xxx.20.67.213) (last reboot was 2001-09-27 15:22:00)
Information: InBand communication.

Category: t3   Component: oob   EventType: Comm_Established
Description: [ Info ] Communication regained (OutOfBand) with diag213 (ip=xxx.20.67.213)
Information: OutOfBand communications.

Category: t3   Component: ib   EventType: Comm_Lost   Sev: Down   Action: Y
Description: [ Info/Action ] Lost communication (InBand) with diag213 (ip=xxx.20.67.213) (last reboot was 2001-09-27 15:22:00)
Information: InBand. This event is established using luxadm. This monitoring may not be activated for a particular Sun StorEdge T3+ array.
Recommended action:
1. Verify luxadm via command line (luxadm probe, luxadm display).
2. Verify cables, GBICs and connections along data path.
3. Check the Storage Automated Diagnostic Environment SAN Topology GUI to identify the failing segment of the data path.
4. Verify the correct FC switch configuration, if applicable.

Category: t3   Component: oob   EventType: Comm_Lost   Sev: Down   Action: Y
Description: [ Info/Action ] Lost communication (OutOfBand) with diag213 (ip=xxx.20.67.212)
Information: OutOfBand. This means that the Sun StorEdge T3+ array failed to answer to a ping or failed to return its tokens.
Probable Cause: This problem can also be caused by a very slow network, or because the Ethernet connection to this Sun StorEdge T3+ array was lost.
Recommended action:
1. Check Ethernet connectivity to the affected Sun StorEdge T3+ array.
2. Verify the Sun StorEdge T3+ array is booted correctly.
3. Verify the correct TCP/IP settings on the Sun StorEdge T3+ array.
4. Increase the http and/or ping timeout in Utilities-->System-->Timeouts. The current default timeouts are 10 seconds for ping and 60 seconds for http (tokens).

Category: t3   Component: t3ofdg   EventType: DiagnosticTest-   Sev: Red
Description: t3ofdg (diag240) on diag213 (ip=xxx.20.67.213) failed

Category: t3   Component: t3test   EventType: DiagnosticTest-   Sev: Red
Description: t3test (diag240) on diag213 (ip=xxx.20.67.213) failed

Category: t3   Component: t3volverify   EventType: DiagnosticTest-   Sev: Red
Description: t3volverify (diag240) on diag213 (ip=xxx.20.67.213) failed

Category: t3   Component: enclosure   EventType: Discovery
Description: [ Info ] Discovered a new Sun StorEdge T3+ array called rasd2-t3b1 (ip=xxx.0.0.41) slr-mi.370-3990-01-e-e1.003239
Information: Discovery events occur the first time the agent probes a storage device. The Discovery event creates a detailed description of the device monitored and sends it using any active notifier, such as NetConnect or Email.

Category: t3   Component: controller   EventType: Insert
Description: [ Info ] Component controller.u1ctr (id) was added to T3 diag213 (ip=xxx.20.67.213)
Information: A new Controller, as identified by its serial number, has been installed on the Sun StorEdge T3+ array.

Category: t3   Component: disk   EventType: Insert
Description: Component disk.u2d3 (SEAGATE.ST318203FSUN18G.LRG07139) was added to diag158 (ip=xxx.20.67.158)

Category: t3   Component: interface.loopcard   EventType: Insert
Information: [ Info ] A new LoopCard, as identified by its serial number, has been installed on the Sun StorEdge T3+ array.

Category: t3   Component: power   EventType: Insert
Description: [ Info ] Component 'power.u1pcu2' (TECTROL-CAN.300-1454-01(50).008275) was added to T3 diag213 (ip=xxx.20.67.213)

Category: t3   Component: enclosure   EventType: LocationChange
Description: Location of t3 rasd2-t3b0 (ip=xxx.0.0.40) was changed

Category: t3   Component: enclosure   EventType: QuiesceEnd
Description: Quiesce End on t3 d2-t3b1 (ip=xxx.0.0.41)

Category: t3   Component: enclosure   EventType: QuiesceStart
Description: Quiesce Start on t3 d2-t3b1 (ip=xxx.0.0.41)

Category: t3   Component: enclosure   EventType: Removal
Description: Monitoring of t3 d2-t3b1 (ip=xxx.0.0.41) ended

Category: t3   Component: controller   EventType: Remove   Sev: Red   Action: Y
Description: [ Info/Action ] Component 'controller.u1ctr' (id) was removed from T3 diag213 (ip=xxx.20.67.213)
Information: The Sun StorEdge T3+ array has reported that a controller was removed from the chassis.
Recommended action: Replace the Controller within the 30-minute power shutdown window.

Category: t3   Component: disk   EventType: Remove   Sev: Red   Action: Y
Description: [ Info/Action ] Component disk.u2d3 (SEAGATE.ST318203FSUN18G.LRG07139) was removed from diag158 (ip=xxx.20.67.158)
Information: The Sun StorEdge T3+ array has reported a disk has been removed from the chassis.
Recommended action: Replace the disk within the 30-minute power shutdown window.

Category: t3   Component: interface.loopcard   EventType: Remove   Sev: Red   Action: Y
Information: The Sun StorEdge T3+ array has reported that a loopcard has been removed from the chassis.
Recommended action: Replace the loopcard within the 30-minute power shutdown window.

Category: t3   Component: power   EventType: Remove   Sev: Red   Action: Y
Description: [ Info/Action ] Component 'power.u1pcu2' (TECTROL-CAN.300-1454-01(50).008275) was removed from T3 diag213 (ip=xxx.20.67.213)
Information: The Sun StorEdge T3+ array has reported that a power cooling unit has been removed from the chassis.
Recommended action: Replace the PCU within the 30-minute power shutdown window.

Category: t3   Component: controller   EventType: StateChange+
Description: 'controller.u1ctr' in T3 diag213 (ip=xxx.20.67.213) is now Available (status-state changed from disabled to ready-enabled)

Category: t3   Component: disk   EventType: StateChange+
Description: disk.u1d5 in Sun StorEdge T3+ array rasd3-t3b1 (ip=xxx.0.0.41) is now Available (status-state changed from fault-disabled to ready-enabled)

Category: t3   Component: interface.loopcard   EventType: StateChange+
Description: [ Info ] loopcard.u1l1 (SLR-MI.375-0085-01-G-G4.070924) in T3 msp0-t3b0
Information: The Sun StorEdge T3+ array has reported that a loopcard has been replaced or brought back online.

Category: t3   Component: volume   EventType: StateChange+
Description: 'volume.u1vol1' (slr-mi.370-3990-01-e-f0.022542.u1vol1) in T3 dvt2-t3b0 (ip=192.168.0.40) is now Available (status-state changed from unmounted to mounted)

Category: t3   Component: power   EventType: StateChange+
Description: power.u1pcu2 (TECTROL-CAN.300-1454-01(50).008275) in T3 rasd2-t3b1 (ip=xxx.0.0.41) is now Available (status-state changed from ready-disable to ready-enable).

Category: t3   Component: controller   EventType: StateChange-   Sev: Red   Action: Y
Description: [ Info/Action ] controller.u1ctr in T3 diag213 (ip=xxx.20.67.213) is now Not-Available (status-state changed from unknown to ready-disabled)
Information: The Sun StorEdge T3+ array controller has been disabled.
Recommended action:
1. Telnet to affected Sun StorEdge T3+ array.
2. Verify the controller state with 'fru stat' and 'sys stat'.
3. Run 'logger -dmprstlog' to capture controller information.
4. Re-enable the controller if possible (enable u).
5. Replace the controller, if necessary.

Category: t3   Component: disk   EventType: StateChange-   Sev: Red   Action: Y
Description: [ Info/Action ] disk.u1d5 in T3 rasd3-t3b1 (ip=xxx.0.0.41) is now Not-Available (status-state changed from unknown to fault-disabled).
Information: The Sun StorEdge T3+ array has reported that a disk has failed.
Recommended action:
1. Telnet to the affected Sun StorEdge T3+ array.
2. Verify the disk state in fru stat, fru list, and vol stat.
3. Replace the disk, if necessary.

Category: t3   Component: interface.loopcard   EventType: StateChange-   Sev: Red   Action: Y
Information: The Sun StorEdge T3+ array has indicated that the loopcard is no longer in an optimal state.
Recommended action:
1. Telnet to the affected Sun StorEdge T3+ array.
2. Verify loopcard state with fru stat.
3. Verify matching firmware with other loopcard.
4. Re-enable loopcard if possible (enable u (encid)|[1|2]).
5. Replace the loopcard, if necessary.

Category: t3   Component: volume   EventType: StateChange-   Sev: Red   Action: Y
Information: The Sun StorEdge T3+ array has reported that a LUN has changed state.
Recommended action:
1. Telnet to the affected Sun StorEdge T3+ array.
2. Check the status of LUNs via vol mode or vol stat.

Category: t3   Component: power   EventType: StateChange-   Sev: Red   Action: Y
Description: [ Info/Action ] power.u1pcu2 (TECTROL-CAN.300-1454-01(50).008275) in T3 rasd2-t3b1 (ip=xxx.0.0.41) is now Not-Available (status-state changed from ready-enabled to ready-disable).
Information: The Sun StorEdge T3+ array has reported that a power cooling unit has been disabled.
Recommended action:
1. Check the Sun StorEdge T3+ array syslog for battery hold times.
2. If < 6 minutes, replace the battery, or the entire PCU, as required.

Category: t3   Component: enclosure   EventType: Statistics
Description: Statistics about T3 d2-t3b1 (ip=xxx.0.0.41)
Replacing the Master Midplane
Follow this procedure when replacing the master midplane in a Sun StorEdge T3+
array. This procedure is detailed in the Storage Automated Diagnostic Environment
User’s Guide.
▼ To Replace the Master Midplane
1. Choose Maintenance --> General Maintenance --> Maintain Devices.
Refer to Chapter 3 of the Storage Automated Diagnostic Environment User’s Guide.
2. In the Maintain Devices window, delete the device that is to be replaced.
3. Choose Maintenance --> General Maintenance --> Discovery.
4. In the Device Discovery window, rediscover the device.
5. Choose Maintenance --> Topology Maintenance --> Topology Snapshot.
a. Select the host that monitors the replaced FRU.
b. Click Create and Retrieve Selected Topologies.
c. Click Merge and Push Master Topology.
Conclusion
Any time a master midplane is replaced, you must rediscover the device using the
procedure described above. This is especially important when the Storage Service
Processor is replaced as a FRU, whether the Storage Service Processor is the master
or the slave.
CHAPTER 9

Troubleshooting Ethernet Hubs
The Sun StorEdge 3900 and 6900 series uses an Ethernet hub as the backbone for the
internal service network. The allocation of Ethernet ports is as follows:
■ 1—Storage Service Processor (per subsystem)
■ 1—for each Fibre Channel Switch
■ 1—for each Virtualization Engine
■ 2—for each Sun StorEdge T3+ array partner group
■ 1—for the Ethernet hub that is installed on the second Sun StorEdge Expansion
Cabinet in the Sun StorEdge 3960 and 6960 systems
Note – Information about LED status lights, power information, and front panel
settings can be found in the SuperStack 3 Baseline Hub 12-Port TP (3C16440A) and 24-
Port TP (3C16441A) User Guide, pn: DUA1644-0AAA03. This is a 3COM document.
Log in to http://www.3com.com to access the documentation.
APPENDIX A

Virtualization Engine References
This appendix contains the following tables:
■ TABLE A-2, “SRN/SNMP Single Point of Failure Table”
■ TABLE A-3, “Port Communication”
■ TABLE A-4, “Service Codes”
TABLE A-1 provides an explanation of Service Request Numbers for the virtualization
engine.
TABLE A-1 SRN and SNMP Reference

SRN: 1xxxx
Description: Disk drive Check Condition status. xxxx is the Unit Error Code. The Unit Error Codes are returned by the drive in Sense Data bytes 20-21 in response to the SCSI Request Sense command.
Corrective Action: If too many Check Conditions are returned, then check the link status.

SRN: 70000
Description: SAN Configuration has changed.

SRN: 70001
Description: Rebuild process has started.

SRN: 70002
Description: Rebuild is completed without error.

SRN: 70003
Description: Rebuild is aborted with a read error. This means that the drive copying information cannot read from the primary drive.
Corrective Action: If a spare drive is available, it will be brought in and used to replace the failed drive. If no spare is available, replace the failed drive with a new drive.

SRN: 70004
Description: Write error is reported by follower. If the initiator is master, then its follower has detected a write error on a member within a mirror drive.
Corrective Action: If a spare drive is available, it will be brought in and used to replace the failed drive. If no spare is available, replace the failed drive with a new drive.

SRN: 70005
Description: Write error is detected by master. If the initiator is master, then it has detected a write error on a member within a mirror drive.
Corrective Action: If a spare drive is available, it will be brought in and used to replace the failed drive. If no spare is available, replace the failed drive with a new drive.

SRN: 70006
Description: virtualization engine-to-virtualization engine communication has failed.
Corrective Action: Internal error. Update firmware.

SRN: 70007
Description: Rebuild is aborted with write error. This means the primary drive cannot write to the drive being built.
Corrective Action: If a spare drive is available, it will be brought in and used to replace the failed drive. If no spare is available, replace the failed drive with a new drive.

SRN: 70008
Description: Read error is reported by follower. If the initiator is master, then its follower has detected a read error on a member within a mirror drive.
Corrective Action: If a spare drive is available, it will be brought in and used to replace the failed drive. If no spare is available, replace the failed drive with a new drive.

SRN: 70009
Description: Read error is detected by master. If the initiator is master, then it has detected a read error on a member within a mirror drive.
Corrective Action: If a spare drive is available, it will be brought in and used to replace the failed drive. If no spare is available, replace the failed drive with a new drive.

SRN: 70010
Description: CleanUp configuration table is completed.

SRN: 70020
Description: SAN physical configuration has changed.
Corrective Action: If unintentional, check condition of drives.

SRN: 70021
Description: Drive is offline.
Corrective Action: If unintentional, check condition of drives.

SRN: 70022
Description: virtualization engine is offline.
Corrective Action: If unintentional, check condition of drives.

SRN: 70023
Description: Drive is unresponsive.
Corrective Action: Check condition of drives.

SRN: 70024
Description: For Sun StorEdge T3+ array pack: Master virtualization engine has detected the partner virtualization engine’s IP Address.

SRN: 70025
Description: For Sun StorEdge T3+ array pack: Master virtualization engine is unable to detect the partner virtualization engine’s IP Address.
Corrective Action: Check the Ethernet connection between the two virtualization engines.

SRN: 70030
Description: SAN configuration changed by SV SAN Builder.

SRN: 70040
Description: Host zoning configuration has changed.

SRN: 70050
Description: MultiPath drive Failover.
Corrective Action: Check MultiPath drive.

SRN: 70051
Description: MultiPath drive Failback.

SRN: 70098
Description: Instant Copy degrade.
Corrective Action: If no spare is available, replace the failed drive with a new drive.

SRN: 70099
Description: Degrade because the drive has disappeared.
Corrective Action: Reinsert the missing drive, or replace it with a drive of equal or greater capacity.

SRN: 7009A
Description: Read degrade recorded. A mirror drive was written to, causing it to enter the degrade state.
Corrective Action: Reinsert the missing drive, or replace it with a drive of equal or greater capacity.

SRN: 7009B
Description: Write degrade recorded. If a spare drive is available, it will be brought in and used to replace the failed drive.
Corrective Action: The removed drive needs to be (if good) reinserted or (if bad) replaced.

SRN: 7009C
Description: Last primary failed during rebuild. This is a “multi-point failure” and is very rare.
Corrective Action:
• Backup drive data.
• Destroy mirror drive where failure has occurred.
• Format (mode 14) drives.
• Create new mirror drive.
• Re-assign old SCSI ID and LUN to mirror drive.
• Restore data.

SRN: 71000
Description: virtualization engine-to-virtualization engine communication has recovered.

SRN: 71001
Description: This is a generic error code for the SLIC. It signifies communication problems between the virtualization engine and the Daemon.
Corrective Action: Check the condition of the virtualization engine. Check the cabling between the virtualization engine and Daemon server. Error halt mode also forces this SRN.

SRN: 71002
Description: This indicates that the SLIC was busy.
Corrective Action: Check the condition of the virtualization engine. Check the cabling between the virtualization engine and the Daemon server. Error halt mode also forces this SRN.

SRN: 71003
Description: SLIC Master unreachable.
Corrective Action: Check conditions of the virtualization engines in the SAN.

SRN: 71010
Description: The status of the SLIC daemon has changed.

SRN: 72000
Description: Primary/Secondary SLIC daemon connection is active.

SRN: 72001
Description: Failed to read SAN drive configuration.

SRN: 72002
Description: Failed to lock on to SLIC daemon.

SRN: 72003
Description: Failed to read SAN SignOn Information.

SRN: 72004
Description: Failed to read Zone configuration.

SRN: 72005
Description: Failed to check for SAN changes.

SRN: 72006
Description: Failed to read SAN event log.

SRN: 72007
Description: SLIC daemon connection is down.
Corrective Action: Wait 1-5 minutes for the backup daemon to come up. If it doesn’t, check the network connection for virtualization engine halt, or hardware failure.
TABLE A-2 SRN/SNMP Single Point of Failure Table

SRN: 70020, 70030, 70050*, 70021
SNMP Description: SAN topology has changed; Global SAN configuration has changed; SAN configuration has changed; A physical device is missing.
Corrective Action: Check SAN cabling and connections between Sun StorEdge T3+ array and virtualization engine. Perform Sun StorEdge T3+ array failback, if necessary.
SRN after Corrective Action: 70020, 70030, 70051**

SRN: 70025
SNMP Description: Partner virtualization engine’s IP is not reachable.
Corrective Action: Check Ethernet cabling and connections.
SRN after Corrective Action: None.

SRN: 70020, 70030, 70050, 70025, 70021, 70022
SNMP Description: SAN topology has changed; Global SAN configuration has changed; SAN configuration has changed; Partner virtualization engine’s IP is not reachable; A physical device is missing; A SLIC virtualization engine is missing.
Corrective Action: Check cabling and connections between virtualization engines. Cycle power on failed virtualization engine, if fault LED flashes. Perform Sun StorEdge T3+ array failback, if necessary. Enable VERITAS path.
SRN after Corrective Action: 70020, 70030, 70050, 70024, 70021, 70022

SRN: 72007 (when error halt is on a virtualization engine that is not the master)
SNMP Description: SLIC daemon connection is inactive; failed to check for SAN changes.
Corrective Action: Daemon error; check the SLIC virtualization engine.
SRN after Corrective Action: 72000 (Secondary daemon connection is active.)

* Sun StorEdge T3+ array LUN Failover.
** Sun StorEdge T3+ array LUN Failback.
TABLE A-3 Port Communication

From                    To                      Port Number
Management Programs     Daemon                  20000
Daemon                  Daemon                  20001
Daemon                  virtualization engine   25000
virtualization engine   virtualization engine   25001
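As a quick cross-check of the assignments in TABLE A-3, the listener for a given port can be confirmed on the host running the SLIC daemon with a standard Solaris command. This is a sketch; the output line shown is illustrative:

# netstat -an | grep 20000
      *.20000              *.*     0      0 24576      0 LISTEN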
TABLE A-4 provides service codes for the virtualization engine.
TABLE A-4 Service Codes

Code Number: 005
Cause: PCI bus parity error.
Corrective Action: Replace virtualization engine.

Code Number: 24
Cause: The attempt to report one error resulted in another error.
Corrective Action: Cycle power to the virtualization engine.

Code Number: 40
Cause: Corrupt database.
Corrective Action:
• Clear SAN database.
• Cycle power to the virtualization engine.
• Import SAN zone configuration.

Code Number: 41
Cause: Corrupt database.
Corrective Action:
• Clear SAN database.
• Cycle power to the virtualization engine.
• Import SAN zone configuration.

Code Number: 42
Cause: Zone mapping database.
Corrective Action: Import SAN zone configuration.

Code Number: 050
Cause: This message indicates that an attempt to write a value into non-volatile storage failed. It could be a hardware failure, or it could be that one of the databases stored in Flash memory could not accept the entry being added.
Corrective Action:
• Clear the SAN database.
• Cycle power to the virtualization engine.

Code Number: 051
Cause: Cannot erase FLASH memory.
Corrective Action: Replace virtualization engine.

Code Number: 53
Cause: Unauthorized cabling configuration.
Corrective Action:
• Check cabling. Ensure the server/switch connects to the host side and storage connects to the device side of the virtualization engine.
• If necessary, clear SAN database.
• If necessary, cycle virtualization engine power.
• If necessary, import SAN zone configuration.

Code Number: 54
Cause: Unauthorized cabling configuration.
Corrective Action: Check cabling.

Code Number: 57
Cause: Too many HBAs attempting to log in.
Corrective Action: Check cabling.

Code Number: 60
Cause: Node mapping table cleared using SW2.
Corrective Action: No action required.

Code Number: 62
Cause: Improper SW2 setting.
Corrective Action:
• Correct SW2 setting.
• Cycle virtualization engine power.

Code Number: 126
Cause: Too many virtualization engines in SAN.
Corrective Action:
• Remove the extra virtualization engine.
• Cycle virtualization engine power.

Code Number: 130
Cause: Heartbeat connection between virtualization engines is down.
Corrective Action:
• Correct problem.
• Cycle the power on the follower virtualization engine.

Codes 400-599 are device-side interface driver errors:

Code Number: 409
Cause: FC device-side type code invalid.
Corrective Action:
• Cycle power.
• If problem persists, replace virtualization engine.

Code Number: 434
Cause: Too many elastic store errors to continue. Elastic store errors result from a clock mismatch between transmitter and receiver and indicate an unreliable link. This error can also occur if a device in the SAN loses power unexpectedly.
Corrective Action:
• Check for faulty component and replace.
• Cycle the power on the follower virtualization engine.
APPENDIX B

SUNWsecfg Error Messages
The Sun StorEdge 3900 and 6900 Series Reference Manual lists and defines these
messages and provides recommendations for corrective action, should you
encounter errors with the command utilities.
■ TABLE B-1 lists SUNWsecfg error messages specific to the virtualization engine
■ TABLE B-2 lists SUNWsecfg error messages specific to the Sun StorEdge network
FC switch-8 and switch-16 switches
■ TABLE B-3 lists SUNWsecfg error messages specific to the Sun StorEdge T3+ array
■ TABLE B-4 lists miscellaneous SUNWsecfg error messages common to all
components
TABLE B-1 Virtualization Engine SUNWsecfg Error Messages

Message: Common to virtualization engines
Description and Cause of Error: Invalid virtualization engine pair name $vepair, or virtualization engine is unavailable. Confirm that the configuration locks are set. This is usually due to the savevemap command running.
Suggested Action: Try ps -ef | grep savevemap or listavailable -v (which returns the status of individual virtualization engines).

Message: Common to virtualization engine
Description and Cause of Error: No virtualization engine pairs found, or the virtualization engine pairs are offline. Confirm that the configuration locks are set. This is usually due to the savevemap command running.
Suggested Action: Try ps -ef | grep savevemap or listavailable -v (which returns the status of individual virtualization engines).

Message: Common to virtualization engine
Description and Cause of Error: Unable to obtain lock on $vepair. Another command is running.
Suggested Action: Another virtualization engine command is updating the configuration. Try listavailable -v (which returns the status of individual virtualization engines) and check for the lock file directly by using ls -la /opt/SUNWsecfg/etc (look for .v1.lock or .v2.lock). If the lock is set in error, use the removelocks -v command to clear it. (See the sketch following this table.)

Message: Common to virtualization engine
Description and Cause of Error: Unable to start slicd on ${vepair}. Cannot execute command.
Suggested Action: Try running startslicd and then showlogs -e 50 to determine why startslicd could not start the daemon. You might have to reset or power off the virtualization engine if the problem persists.

Message: Common to virtualization engine
Description and Cause of Error: Login failed. The environment variable VEPASSWD might be set to an incorrect value. Try again.
Suggested Action: A password is required to log in to the virtualization engine. The utility uses the VEPASSWD environment variable to log in. Set the VEPASSWD environment variable with the proper value.

Message: Common to virtualization engine
Description and Cause of Error: After resetting the virtualization engine, the $VENAME is unreachable. The hardware might be faulty.
Suggested Action: Check the IP address and netmask that have been assigned to the virtualization engine hardware. Be aware that after a reset, it takes approximately 30 seconds to boot.

Message: Common to virtualization engine
Description and Cause of Error:
1. Device-side operating mode is not set properly.
2. Device-side UID reporting scheme is not set properly.
3. Host-side operating mode is not set properly.
4. Host-side LUN mapping mode is not set properly.
5. Host-side Command Queue Depth is not set properly.
6. Host-side UID distinguish is not set properly.
7. IP is not set properly.
8. Subnet mask is not set properly.
9. Default gateway is not set properly.
10. Server port number is not set properly.
11. Host WWN Authentications are not set properly.
12. Host IP Authentications are not set properly.
13. Other VEHOST IP is not set properly.
Suggested Action: Log in to the virtualization engine and verify that the device, host, and network settings are correct. Make sure the virtualization engine hardware is not in ERROR 50 mode. If required, power cycle the virtualization engine hardware, or disable the host-side switch port. Run the setupve -n ve_name command and enable the switch port.

Message: checkslicd
Description and Cause of Error: Cannot establish communication with ${vepair}.
Suggested Action: Run startslicd -n ${vepair}.

Message: checkslicd
Description and Cause of Error: Cannot establish communication with virtualization engine pair ${vepair} initiator {$initiator}.
Suggested Action: Determine the host name associated with ${initiator} by using the command output. Run the command resetve -n vename.

Message: checkvemap
Description and Cause of Error: Cannot establish communication with ${vepair}.
Suggested Action: Run the command again. If this fails, check the status of both virtualization engines. If there is an error condition, see Appendix A for corrective action.

Message: createvezone
Description and Cause of Error: Invalid WWN $wwn on $vepair initiator $init, or virtualization engine is unavailable. The WWN specified already has a SLIC zone and/or an HBA alias assigned. Note that for a WWN to be available for createvezone, the zone name in the map file (showvemap -n ve_pairname) must be “undefined” and the online status should be “yes.”
Suggested Action: If a zone name is assigned, run the rmvezone command. If there are still errors, try sadapter alias -d $vepair -r $initiator -a $zone -n “ “ and then run savemap -n $vepair.

Message: listavailable
Description and Cause of Error: No virtualization engines are available. They are either not found, or the configuration lock is set. Either the components (the Sun StorEdge T3+ array, the switch, or the virtualization engine) are down (cannot be pinged), or another SUNWsecfg command is running and is updating the configuration (ps -ef).
Suggested Action: If no other commands are running and you believe the configuration lock might be set in error, run the removelocks command.

Message: restorevemap
Description and Cause of Error:
1. Import zone data failed.
2. Restore physical and logical data failed.
3. Restore zone data failed.
Suggested Action: Check the status of both virtualization engines. If there is an error condition, refer to Appendix A for corrective action. Attempt to run the restorevemap command again.

Message: setdefaultconfig
Description and Cause of Error:
1. Unable to properly configure the virtualization engine host ${vehost}.
2. Cannot continue configuration of other components.
Suggested Action: Check the status of the virtualization engine and try again.

Message: setdefaultconfig
Description and Cause of Error: The setupve command failed.
Suggested Action: Try running setupve -n ve_hostname -v (verbose mode) and check the errors. Then run checkve -n ve_hostname. You can continue to configure VLUNs and zones only if both of these commands work.
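A minimal sketch of the lock check described in the table; the lock file listing is illustrative, and the path to the removelocks command is assumed to be under /opt/SUNWsecfg:

# ls -la /opt/SUNWsecfg/etc | grep lock
-rw-r--r--   1 root     other      0 Mar  1 10:02 .v1.lock
# /opt/SUNWsecfg/bin/removelocks -v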
TABLE B-2 Sun StorEdge Network FC Switch-8 and Switch-16 Switch SUNWsecfg Error Messages

Message: Common Switch
Description and Cause of Error: Sun StorEdge system type entered, ${cab_type}, does not match system type discovered, ${boxtype}.
Suggested Action: Either call the command with the -f force option to force the series type, or do not specify the cabinet type (no -c option).

Message: Common Switch
Description and Cause of Error: Unable to obtain lock on switch ${switch}. Another command is running.
Suggested Action:
1. Another switch command might be updating the configuration. Check listavailable -s.
2. If the switch in question does not appear, check for the existence of the lock file directly by typing ls -la /opt/SUNWsecfg/etc (look for .$switch.lock).
3. If the lock is set in error, use the removelocks -s command to clear it.

Message: checkswitch
Description and Cause of Error:
1. Current configuration on $switch does not match the defined configuration.
2. One of the predefined static switch configuration parameters, which can be overridden for special configurations such as NT connect or cascaded switches, is set incorrectly.
Suggested Action:
1. Select View Logs or directly view $LOGFILE for more details.
2. Re-run setupswitch on the specified $switch.

Message: listavailable
Description and Cause of Error: No Sun StorEdge network FC switch-8 or switch-16 switch devices are available. They are either not found, or the configuration lock is set. Either the components (the Sun StorEdge T3+ array, the switch, or the virtualization engine) are down (cannot be pinged), or another SUNWsecfg command is running and is updating the configuration (ps -ef).
Suggested Action: If no other commands are running and you believe the configuration lock might be set in error, run the removelocks command.

Message: setswitchflash
Description and Cause of Error: Invalid flash file $flashfile. Check the number of ports on switch $switch.
Suggested Action: You might be attempting to download a flash file for an 8-port switch to a 16-port switch. Check showswitch -s $switch and look for “number of ports.” Ensure that this matches the second and third characters of the flash file name; for example: m08030462.fls.

Message: setswitchflash
Description and Cause of Error: ${switch} timed out after reset. The switch took longer than two minutes to reset after a configuration change.
Suggested Action: The switch might not be set for rarp, or rarp is not working correctly. Try ping $switch after waiting a few more minutes. If errors persist, manually power cycle the switch.

Message: setupswitch
Description and Cause of Error: Switch ${switch} timed out after reset. The switch took longer than two minutes to reset after a configuration change.
Suggested Action: Try ping $switch after waiting a few more minutes. If errors persist, manually power cycle the switch.

Message: setupswitch
Description and Cause of Error: Could not set chassis ID on switch ${switch} to ${cid}.
Suggested Action: This should occur only in a SAN environment with cascaded switches. Be aware of the switch chassis IDs of all switches in the SAN and make sure the IDs are all unique. Once the chassis IDs are established, override the switch chassis IDs with the following command: setupswitch -s $switch_name -i $unique_chassis_id -v.
TABLE B-3 Sun StorEdge T3+ Array SUNWsecfg Error Messages

Message: Common to Sun StorEdge T3+ array
Description and Cause of Error: Present configuration does not match Reference configurations.
Suggested Action: Check the present Sun StorEdge T3+ array configuration with the showt3 -n <t3> command and verify whether the configuration is corrupted or has changed. If it is not one of the standard configurations, restore the configuration using the restoret3config command.

Message: Common to Sun StorEdge T3+ array
Description and Cause of Error:
1. Could not mount volume $vol.
2. $lunconfig does not match.
Suggested Action: There might be multiple drive failures or corrupted data or parity on the LUN. Replace the failed FRUs and restore the Sun StorEdge T3+ array configuration with the restoret3config -f -n t3_name command.

Message: Common to Sun StorEdge T3+ array
Description and Cause of Error: The $frustatus is not ready or enabled. Operations on the Sun StorEdge T3+ array are being aborted.
Suggested Action: The disk, controller, or loop interface card in the Sun StorEdge T3+ array might be bad. Replace the failed FRU and rerun the utility.

Message: Common to Sun StorEdge T3+ array
Description and Cause of Error:
1. The Sun StorEdge T3+ array is not of T3B type, and it cannot continue; aborting operations.
2. t3config utilities are supported only in the Sun StorEdge T3+ array; the t3config utilities are not supported on Sun StorEdge T3+ arrays with 1.xx firmware.
The Sun StorEdge T3 array configuration is not a standard configuration (refer to the t3 default/custom configuration table in the Sun StorEdge 3900 and 6900 Series Hardware Installation and Service Manual).
Suggested Action: Use showt3 -n t3_name to display the present configuration. Use the modifyt3config and restoret3config utilities to configure the Sun StorEdge T3+ array.

Message: checkt3config
Description and Cause of Error: vol init command is being executed by another user. Additional vol commands cannot run.
Suggested Action: Check whether any other secfg utility is running. If one is running, allow it to finish.

Message: checkt3config
Description and Cause of Error: An error occurred while checking proc list; aborting operation on $BRICK_IP{$brick_name}.
Suggested Action: Check whether any other secfg or native Sun StorEdge T3+ commands are being executed on the particular Sun StorEdge T3+ array.

Message: checkt3config
Description and Cause of Error: Snapshot configuration files are not present. Unable to check configuration.
Suggested Action: Make sure that the snapshot files are saved and have read permissions in the /opt/SUNWsecfg/etc/t3name/ directory. If the snapshot files are not available, create them by using the savet3config command.

Message: checkt3mount
Description and Cause of Error:
1. The $lun status reported a bad or nonexistent LUN.
2. While checking the configuration using the showt3 -n command, operations abort.
Suggested Action: Make sure that the requested LUN exists on the Sun StorEdge T3+ array by using the showt3 -n command. Confirm that the Sun StorEdge T3+ array configuration matches standard configurations.

Message: createvlun
Description and Cause of Error: Invalid diskpool $diskpool on $vepair, or diskpool is unavailable.
Suggested Action: Ensure the diskpool was created properly using the showvemap -n $vepair command. If the diskpool is unavailable, try creatediskpools -n $t3name. If that fails, check the Sun StorEdge T3+ array for unmounted volumes or path failures by using checkt3config -n $t3name -v.

Message: createvlun
Description and Cause of Error: Unable to execute command. The associated Sun StorEdge T3+ array physical LUN ${t3lun}, for disk pool ${diskpool}, might not be mounted.
Suggested Action: Run checkt3mount -n $t3name -l ALL to see the mount status of the volume. For further information about problems with the underlying Sun StorEdge T3+ array, try checkt3config -n $t3name -v.

Message: listavailable
Description and Cause of Error: No Sun StorEdge T3+ arrays are available. They are either not found, or the configuration lock is set. Either the components (the Sun StorEdge T3+ array, the switch, or the virtualization engine) are down (cannot be pinged), or another SUNWsecfg command is running and is updating the configuration (ps -ef).
Suggested Action: If no other commands are running and you believe the configuration lock might be set in error, run the removelocks command.

Message: modifyt3config
Description and Cause of Error: The lock file clear waiting period expired and the creatediskpools command is aborted.
Suggested Action: Check to see if the modifyt3config and restoret3config commands are executing on other Sun StorEdge T3+ arrays. If the commands are executing, wait for them to complete, and then run creatediskpools -n t3name.

Message: restoret3config
Description and Cause of Error: Error while the block size compare command is executing. The $BRICK_IP{$IPADD} command is aborted.
Suggested Action: The Sun StorEdge T3+ array block size parameter is different from the snapshot file. The Sun StorEdge T3+ array may have been reconfigured. Run restoret3config.

Message: restoret3config
Description and Cause of Error: $LUN configuration failed to restore and the force option was used to reinitialize, without success.
Suggested Action: Check the Sun StorEdge T3+ configuration with the showt3 -n t3_name command. Restore the Sun StorEdge T3+ array configuration with the restoret3config command.

Message: restoret3config
Description and Cause of Error: $LUN configuration is not found in the $restore_file. Cannot restore $LUN.
Suggested Action: Check for snapshot files in the /opt/SUNWsecfg/etc/t3_name/ directory. If the snapshot files are not found, use the modifyt3config command to configure the Sun StorEdge T3+ array.

Message: savet3config
Description and Cause of Error: While checking the configuration, the Sun StorEdge T3+ array configuration has not been saved.
Suggested Action: Check the Sun StorEdge T3+ array configuration by using the showt3 -n t3_name command; if the configuration is different from standard Sun StorEdge T3 configurations, use the modifyt3config command to reconfigure the device.
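For example, to verify a single array against its saved snapshot as suggested in the table above, something like the following can be run on the Storage Service Processor. The array name is illustrative, and the command path is assumed to be under /opt/SUNWsecfg:

# /opt/SUNWsecfg/bin/checkt3config -n t3b0 -v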
TABLE B-4 Other SUNWsecfg Error Messages

Message: Common to all components
Description and Cause of Error: If the Sun StorEdge 3900 or 6900 series has multiple (more than two) failures (for example, both virtualization engines and two switches are down), the getcabinet tool might not determine the correct cabinet type. In this example, the getcabinet script might determine the device to be a Sun StorEdge 3900 series when, in reality, it is a Sun StorEdge 6900 series.
Suggested Action: Set the BOXTYPE variable as follows: BOXTYPE=6910; export BOXTYPE

Message: checkdefaultconfig
Description and Cause of Error: Could not determine the Sun StorEdge system type. Multiple components might be down, and the getcabinet command could not determine the Sun StorEdge series type (3910, 3960, 6910, or 6960).
Suggested Action: Try using the command line interface (CLI) by setting the BOXTYPE environment variable to one of the four values (for example, BOXTYPE=3910; export BOXTYPE).

Message: setdefaultconfig
Description and Cause of Error: The system could not determine the Sun StorEdge system type.
Suggested Action: Try using the command line interface (CLI) by setting the BOXTYPE environment variable to one of the four values (for example, BOXTYPE=3910; export BOXTYPE). A sketch follows this table.
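A minimal sketch of the BOXTYPE workaround from a Bourne shell on the Storage Service Processor, before rerunning the configuration utility:

# BOXTYPE=6910; export BOXTYPE
# /opt/SUNWsecfg/runsecfg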
setupswitch Exit Values

TABLE 9-1 lists the setupswitch exit values. The associated messages are logged in the
/var/adm/log/SEcfglog log file. (A sketch of checking the exit value follows the table.)

TABLE 9-1 setupswitch Exit Values

Severity Level: 0   Message Type: INFO
Message Meaning: All switch settings are properly set. The switch setting matches the default configuration.

Severity Level: 1   Message Type: ERROR
Message Meaning: Errors occurred while trying to set the proper switch settings. The switch setting does not match the default configuration or any valid alternatives.

Severity Level: 2   Message Type: WARNING
Message Meaning: Errors occurred while trying to set the proper switch settings. The ports did not self-configure properly. A cable connection might not be working properly. T ports self-configure (that is, the configuration tool cannot control the configuration) from F ports when they are cabled properly. Specifically, these are the ports on the back-end switches in Sun StorEdge 6900 series configurations only. The ports support the ISL connections.

Severity Level: 3   Message Type: WARNING
Message Meaning: The Flash code is different from the release level. The switch Flash code does not match the current release version 30462. This is not an error; QLogic periodically releases new versions of the switch Flash code, and the new version will not match the default version.

Severity Level: 4   Message Type: WARNING
Message Meaning: The configuration is not set to the default, but the differences are likely supported alternatives. The default switch configurations were overridden with valid alternatives, which are also supported by the SUNWsecfg configuration tools. It should still be flagged as “not the default.” It can imply any of the following alternatives (these messages are printed to the screen and to the Storage Automated Diagnostic Environment GUI):
• INFO: Some ports have been set to SL mode, but should have been set using the setswitchsl command. View and verify this nonstandard configuration setup as required using the showswitch command. Refer to the Sun StorEdge 3900 and 6900 Series Reference Manual for detailed configuration information.
• INFO: The chassis ID on the switch is not set to the default value. This could be caused by unique ID settings or by conflicts in a SAN environment.
• INFO: Ports are identified that are not in the default hard zone. This could be because the port is set to the same hard zone as the cascaded switch in a SAN environment.

NOTE: If multiple solutions are connected to a switch, the switch settings might not match the default settings.
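A minimal sketch of checking the exit value and the associated log from the Storage Service Processor shell; the switch name is illustrative, and the command path is assumed to be under /opt/SUNWsecfg:

# /opt/SUNWsecfg/bin/setupswitch -s sw1a
# echo $?
3
# tail -5 /var/adm/log/SEcfglog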