Sun StorEdge™ 3900 and 6900 Series
Troubleshooting Guide
Sun Microsystems, Inc.
4150 Network Circle
Santa Clara, CA 95054 U.S.A.
650-960-1300
Part No. 816-4290-11
March 2002, Revision A
Send comments about this document to: [email protected]
Contents

1. Introduction

2. General Troubleshooting Procedures
   Troubleshooting Overview Tasks
   Multipathing Options in the Sun StorEdge 6900 Series
      Alternatives to Sun StorEdge Traffic Manager
      ▼ To Quiesce the I/O
      ▼ To Unconfigure the c2 Path
      ▼ To Suspend the I/O
      ▼ To Return the Path to Production
      ▼ To View the VxDisk Properties
      ▼ To Quiesce the I/O on the A3/B3 Link
      ▼ To Suspend the I/O on the A3/B3 Link
      ▼ To Return the Path to Production
   Fibre Channel Links
      Fibre Channel Link Diagrams
      Host Side Troubleshooting
      Storage Service Processor Side Troubleshooting
      Command Line Test Examples
         qlctest(1M)
         switchtest(1M)
   Storage Automated Diagnostic Environment Event Grid

3. Troubleshooting the Fibre Channel Links
   A1/B1 Fibre Channel (FC) Link
      ▼ To Verify the Data Host
      FRU Tests Available for A1/B1 FC Link Segment
      ▼ To Isolate the A1/B1 FC Link
   A2/B2 Fibre Channel (FC) Link
      ▼ To Verify the Host Side
      ▼ To Verify the A2/B2 FC Link
      FRU Tests Available for A2/B2 FC Link Segment
      ▼ To Isolate the A2/B2 FC Link
   A3/B3 Fibre Channel (FC) Link
      ▼ To Verify the Host Side
      ▼ To Verify the Storage Service Processor
      FRU Tests Available for the A3/B3 FC Link Segment
      ▼ To Isolate the A3/B3 FC Link
   A4/B4 Fibre Channel (FC) Link
      ▼ To Verify the Data Host
         Sun StorEdge 3900 Series
         Sun StorEdge 6900 Series
      ▼ To Isolate the A4/B4 FC Link

4. Configuration Settings
   Verifying Configuration Settings

5. Troubleshooting Host Devices
   Host Event Grid
      Using the Host Event Grid
   Replacing the Master, Alternate Master, and Slave Monitoring Host

6. Troubleshooting Sun StorEdge FC Switch-8 and Switch-16 Devices
   Sun StorEdge Network FC Switch-8 and Switch-16 Switch Description
   ▼ To Diagnose and Troubleshoot Switch Hardware
   Switch Event Grid
      Using the Switch Event Grid

7. Troubleshooting Virtualization Engine Devices
   Virtualization Engine Description
   Virtualization Engine Diagnostics
      Service Request Numbers
      Service and Diagnostic Codes
      ▼ To Retrieve Service Information
   CLI Interface
      ▼ To Display Log Files and Retrieve SRNs
      ▼ To Clear the Log
   Virtualization Engine LEDs
      Power LED Codes
      Interpreting LED Service and Diagnostic Codes
   Back Panel Features
      Ethernet Port LEDs
   Fibre Channel Link Error Status Report
      ▼ To Check Fibre Channel Link Error Status Manually
   Translating Host Device Names
      ▼ To Display the VLUN Serial Number
      Devices That Are Not Sun StorEdge Traffic Manager-Enabled
      Sun StorEdge Traffic Manager-Enabled Devices
      ▼ To View the Virtualization Engine Map
   ▼ To Failback the Virtualization Engine
   ▼ To Replace a Failed Virtualization Engine
   ▼ To Manually Clear the SAN Database
   ▼ To Reset the SAN Database on Both Virtualization Engines
   ▼ To Reset the SAN Database on a Single Virtualization Engine

8. Troubleshooting the Sun StorEdge T3+ Array Devices
   Explorer Data Collection Utility
      ▼ To Install Explorer Data Collection Utility on the Storage Service Processor
   Troubleshooting the T1/T2 Data Path
      Notes
      T1/T2 Notification Events
      Sun StorEdge T3+ Array Storage Service Processor Verification
      T1/T2 FRU Tests Available
      Notes
      T1/T2 Isolation Procedures
   Sun StorEdge T3+ Array Event Grid
      Using the Sun StorEdge T3+ Array Event Grid
   Conclusion

9. Troubleshooting Ethernet Hubs

setupswitch Exit Values
List of Figures

FIGURE 2-1  Sun StorEdge 3900 Series Fibre Channel Link Diagram
FIGURE 2-2  Sun StorEdge 6900 Series Fibre Channel Link Diagram
FIGURE 3-1  Data Host Notification of Intermittent Problems
FIGURE 3-2  Data Host Notification of Severe Link Error
FIGURE 3-3  Storage Service Processor Notification
FIGURE 3-4  A2/B2 FC Link Host Side Event
FIGURE 3-5  A2/B2 FC Link Storage Service Processor Side Event
FIGURE 3-6  A3/B3 FC Link Host-Side Event
FIGURE 3-7  A3/B3 FC Link Storage Service Processor-Side Event
FIGURE 3-8  A3/B3 FC Link Storage Service Processor-Side Event
FIGURE 3-9  A4/B4 FC Link Data Host Notification
FIGURE 3-10 Storage Service Processor Notification
Host Event Grid
Switch Event Grid
Virtualization Engine Front Panel LEDs
Sun StorEdge 6900 Series Logical View
Primary Data Paths to the Alternate Master
Primary Data Paths to the Master Sun StorEdge T3+ Array
Path Failure—Before the Second Tier of Switches
Path Failure—I/O Routed through Both HBAs
Virtualization Engine Event Grid
Storage Service Processor Event
Virtualization Engine Alert
Manage Configuration Files Menu
Example Link Test Text Output from the Storage Automated Diagnostic Environment
Sun StorEdge T3+ Array Event Grid
Preface
The Sun StorEdge 3900 and 6900 Series Troubleshooting Guide provides guidelines
for isolating problems in supported configurations of the Sun StorEdge™ 3900 and
6900 series. For detailed configuration information, refer to the Sun StorEdge 3900
and 6900 Series Reference Manual.
The scope of this troubleshooting guide is limited to information pertaining to the
components of the Sun StorEdge 3900 and 6900 series, including the Storage Service
Processor and the virtualization engines in the Sun StorEdge 6900 series. This guide
is written for Sun personnel who have been fully trained on all the components in
the configuration.
How This Book Is Organized
This book contains the following topics:
Chapter 2 offers general troubleshooting guidelines, such as quiescing the I/O, and
tools you can use to isolate and troubleshoot problems.
Chapter 3 provides Fibre Channel link troubleshooting procedures.
Chapter 4 presents information about configuration settings, specific to the Sun
StorEdge 3900 and 6900 series. It also provides a procedure for how to clear the lock
file.
Chapter 5 provides information on host device troubleshooting.
Chapter 6 provides information on Sun StorEdge network FC switch-8 and
switch-16 switch device troubleshooting.
Chapter 7 provides detailed information for troubleshooting the virtualization
engines.
Chapter 8 describes how to troubleshoot the Sun StorEdge T3+ array devices. Also
included in this chapter is information about the Explorer Data Collection Utility.
Chapter 9 discusses Ethernet hub troubleshooting.
Appendix A provides virtualization engine references, including SRN and SNMP
Reference, an SRN/ SNMP single point of failure table, and port communication and
service code tables.
Appendix B provides a list of SUNWsecfg Error Messages and recommendations for
corrective action.
Using UNIX Commands
This document may not contain information on basic UNIX® commands and
procedures such as shutting down the system, booting the system, and configuring
devices.
See one or more of the following for this information:
■ Solaris Handbook for Sun Peripherals
■ AnswerBook2™ online documentation for the Solaris™ operating environment
■ Other software documentation that you received with your system
Typographic Conventions
Typeface    Meaning                               Examples
AaBbCc123   The names of commands, files, and     Edit your .login file.
            directories; on-screen computer       Use ls -a to list all files.
            output                                % You have mail.
AaBbCc123   What you type, when contrasted        % su
            with on-screen computer output        Password:
AaBbCc123   Book titles, new words or terms,      Read Chapter 6 in the User's Guide.
            words to be emphasized                These are called class options.
                                                  You must be superuser to do this.
            Command-line variable; replace        To delete a file, type rm filename.
            with a real name or value
Shell Prompts
Shell                                   Prompt
C shell                                 machine_name%
C shell superuser                       machine_name#
Bourne shell and Korn shell             $
Bourne shell and Korn shell superuser   #
Related Documentation
Late-breaking news:
• Sun StorEdge 3900 and 6900 Series Release Notes (816-3247)

Sun StorEdge 3900 and 6900 series hardware information:
• Sun StorEdge 3900 and 6900 Series Site Preparation Guide (816-3242)
• Sun StorEdge 3900 and 6900 Series Regulatory and Safety Compliance Manual (816-3243)
• Sun StorEdge 3900 and 6900 Series Hardware Installation and Service Manual (816-3244)

Sun StorEdge T3 and T3+ array:
• Sun StorEdge T3 and T3+ Array Start Here (816-0772)
• Sun StorEdge T3 and T3+ Array Installation, Operation, and Service Manual (816-0773)
• Sun StorEdge T3 and T3+ Array Administrator's Guide (816-0776)
• Sun StorEdge T3 and T3+ Array Configuration Guide (816-0777)
• Sun StorEdge T3 and T3+ Array Site Preparation Guide (816-0778)
• Sun StorEdge T3 and T3+ Field Service Manual (816-0779)
• Sun StorEdge T3 and T3+ Array Release Notes (816-0781)

Diagnostics:
• Storage Automated Diagnostics Environment User's Guide (816-3142)

Sun StorEdge network FC switch-8 and switch-16:
• Sun StorEdge Network FC Switch-8 and Switch-16 Release Notes (816-0842)
• Sun StorEdge Network FC Switch-8 and Switch-16 Installation and Configuration Guide (816-0830)
• Sun StorEdge Network FC Switch-8 and Switch-16 Best Practices Manual (816-2688)
• Sun StorEdge Network FC Switch-8 and Switch-16 Operations Guide (816-1986)
• Sun StorEdge Network FC Switch-8 and Switch-16 Field Troubleshooting Guide (816-1701)

SANbox switch management using SANsurfer:
• SANbox 8/16 Segmented Loop Switch Management User's Manual (875-3060)
• SANbox-8 Segmented Loop Fibre Channel Switch Installer's/User's Manual (875-1881)
• SANbox-16 Segmented Loop Fibre Channel Switch Installer's/User's Manual (875-3059)

Expansion cabinet:
• Sun StorEdge Expansion Cabinet Installation and Service Manual (805-3067)

Storage Service Processor:
• Netra X1 Server User's Guide (806-5980)
• Netra X1 Server Hard Disk Drive Installation Guide (806-7670)
Accessing Sun Documentation Online
A broad selection of Sun system documentation is located at:
http://www.sun.com/products-n-solutions/hardware/docs
A complete set of Solaris documentation and many other titles are located at:
http://docs.sun.com
Sun Welcomes Your Comments
Sun is interested in improving its documentation and welcomes your comments and
suggestions. You can email your comments to Sun at: [email protected]

Please include the part number (816-4290-11) of your document in the subject line of
your email.
CHAPTER 1

Introduction
The Sun StorEdge 3900 and 6900 series storage subsystems are complete
preconfigured storage solutions. The configurations for each of the storage
subsystems are shown in TABLE 1-1.
TABLE 1-1

Series                     System                     Sun StorEdge Fibre     Sun StorEdge T3+ Array     Additional Array Partner Groups
                                                      Channel Switch         Partner Groups             Supported with Optional Additional
                                                      Supported              Supported                  Expansion Cabinet
Sun StorEdge 3900 series   Sun StorEdge 3910 system   Two 8-port switches    1 to 4                     Not applicable
                           Sun StorEdge 3960 system   Two 16-port switches   1 to 4                     1 to 5
Sun StorEdge 6900 series   Sun StorEdge 6910 system   Two 8-port switches    1 to 3                     Not applicable
                           Sun StorEdge 6960 system   Two 16-port switches   1 to 3                     1 to 4
Predictive Failure Analysis Capabilities
The Storage Automated Diagnostic Environment software provides the health and
monitoring functions for the Sun StorEdge 3900 and 6900 series systems. This
software provides the following predictive failure analysis (PFA) capabilities.
■ FC links—Fibre Channel links are monitored at all end points using the FC-ELS
link counters. When link errors surpass the threshold values, an alert is sent.
This enables Sun personnel to replace components that are experiencing high
transient fault levels before a hard fault occurs.
■ Enclosure status—Many devices, like the Sun StorEdge network FC switch-8 and
switch-16 switch and the Sun StorEdge T3+ array, cause Storage Automated
Diagnostic Environment alerts to be sent if temperature thresholds are exceeded.
This enables Sun-trained personnel to address the problem before the component
or enclosure fails.
■ SPOF notification—Storage Automated Diagnostic Environment notification for
path failures and failovers (that is, Sun StorEdge Traffic Manager software
failover) can be considered PFA, since Sun-trained personnel are notified and can
repair the primary path. This eliminates the time of exposure to single points of
failure and helps to preserve customer availability during the repair process.
PFA is not always effective in detecting or isolating failures. The remainder of this
document provides guidelines that can be used to troubleshoot problems that occur
in supported components of the Sun StorEdge 3900 and 6900 series.
CHAPTER 2

General Troubleshooting Procedures
This chapter contains the following sections:
■ “Troubleshooting Overview Tasks” on page 3
■ “Multipathing Options in the Sun StorEdge 6900 Series” on page 7
■ “Fibre Channel Links” on page 15
■ “Storage Automated Diagnostic Environment Event Grid” on page 21
Troubleshooting Overview Tasks
This section lists the high-level steps to isolate and troubleshoot problems in the Sun
StorEdge 3900 and 6900 series. It offers a methodical approach and lists the tools and
resources available at each step.
Note – A single problem can cause various errors throughout the SAN. A good
practice is to begin by investigating the devices that have experienced “Loss of
Communication” events in the Storage Automated Diagnostic Environment. These
errors usually indicate more serious problems.
A “Loss of Communication” error on a switch, for example, could cause multiple
ports and HBAs to go offline. Concentrating on the switch and fixing that failure can
help bring the ports and HBAs back online.
1. Discover the error by checking one or more of the following messages or files (a
search sketch follows this list):

■ Storage Automated Diagnostic Environment alerts or email messages
■ /var/adm/messages
■ Sun StorEdge T3+ array syslog file
■ Storage Service Processor messages
■ /var/adm/messages.t3 messages
■ /var/adm/log/SEcfg log file
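The following is a minimal, hedged sketch of one way to scan the data host log
named above for the Fibre Channel events shown throughout this guide. The search
strings are taken from the sample messages in Chapters 2 and 3; adjust them to the
errors you are chasing.

# egrep "Loop OFFLINE|disappeared from fabric|multipath status" /var/adm/messages | tail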
2. Determine the extent of the problem by using one or more of the following
methods:
■ Storage Automated Diagnostic Environment Topology view
■ Storage Automated Diagnostic Environment Revision Checking (manual patch or
package, to check whether the package or patch is installed)
■ Verify the functionality using one of the following:
  ■ checkdefaultconfig(1M)
  ■ checkt3config(1M)
  ■ cfgadm -al output
  ■ luxadm(1M) output
■ Check the multipathing status using the Sun StorEdge Traffic Manager software
or VxDMP.
3. Check the status of a Sun StorEdge T3+ array by using one or more of the
following methods:
■ Storage Automated Diagnostic Environment device monitoring reports
■ Run the SEcfg script, which displays the Sun StorEdge T3+ array configuration
■ Manually open a telnet session to the Sun StorEdge T3+ array
■ luxadm(1M) display output
■ LED status on the Sun StorEdge T3+ array
■ Explorer Data Collection Utility output (located on the Storage Service Processor)
4. Check the status of the Sun StorEdge FC network switch-8 and switch-16 switches
using the following tools:
■ Storage Automated Diagnostic Environment device monitoring reports
■ Run the SEcfg script, which displays the switch configuration
■ LED status (online/offline; POST error codes are listed in the Sun StorEdge
Network FC Switch-8 and Switch-16 Installation and Configuration Guide)
■ Explorer Data Collection Utility output (located on the Storage Service Processor)
■ SANsurfer GUI
Note – To run the SANsurfer GUI from the Storage Service Processor, you must
export the X display.
5. Check the status of the virtualization engine using one or more of the following
methods:
■ Storage Automated Diagnostic Environment device monitoring reports
■ Run the SEcfg script, which displays the virtualization engine configuration
■ Refer to the LED status blink codes in Chapter 7.
6. Quiesce the I/O along the path to be tested, as follows:

■ For installations using VERITAS VxDMP, disable the path with vxdmpadm.
■ For installations using the Sun StorEdge Traffic Manager software, unconfigure
the Fabric device.
■ Refer to "To Quiesce the I/O" on page 8.
■ Halt the application.
7. Test and isolate the FRUs using the following tools:
■ Storage Automated Diagnostic Environment diagnostic tests (this might require
the use of a loopback cable for isolation)
■ Sun StorEdge T3+ array tests, including t3test(1M), t3ofdg(1M), and
t3volverify(1M), which can be found in the Storage Automated Diagnostic
Environment User’s Guide.
Note – These tests isolate the problem to a FRU that must be replaced. Follow the
instructions in the Sun StorEdge 3900 and 6900 Series Reference Manual and the Sun
StorEdge 3900 and 6900 Series Hardware Installation and Service Manual for proper FRU
replacement procedures.
8. Verify the fix using the following tools:
■ Storage Automated Diagnostic Environment GUI Topology View and Diagnostic
Tests
■ /var/adm/messages on the data host
9. Return the path to service by using one of the following methods:
■ Multipathing software
■ Restarting the application
Multipathing Options in the Sun StorEdge 6900 Series
Using the virtualization engines presents several challenges in how multipathing is
handled in the Sun StorEdge 6900 series.
Unlike Sun StorEdge T3+ array and Sun StorEdge network FC switch-8 and switch-
16 switch installations, which present primary and secondary pathing options, the
virtualization engines present only primary pathing options to the data host. The
virtualization engines handle all failover and failback operations and mask those
operations from the multipathing software on the data host.
The following example illustrates a Sun StorEdge Traffic Manager problem on a Sun
StorEdge 6900 series system.
# luxadm display /dev/rdsk/c6t29000060220041F96257354230303052d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c6t29000060220041F96257354230303052d0s2
  Status(Port A):       O.K.
  Status(Port B):       O.K.
  Vendor:               SUN
  Product ID:           SESS01
  WWN(Node):            2a000060220041f4
  WWN(Port A):          2b000060220041f4
  WWN(Port B):          2b000060220041f9
  Revision:             080C
  Serial Num:           Unsupported
  Unformatted capacity: 102400.000 MBytes
  Write Cache:          Enabled
  Read Cache:           Enabled
  Minimum prefetch:     0x0
  Maximum prefetch:     0x0
  Device Type:          Disk device
  Path(s):
  /dev/rdsk/c6t29000060220041F96257354230303052d0s2
  /devices/scsi_vhci/ssd@g29000060220041f96257354230303052:c,raw
   Controller           /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
    Device Address      2b000060220041f4,0
    Class               primary
    State               ONLINE
   Controller           /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
    Device Address      2b000060220041f9,0
    Class               primary
    State               ONLINE
Note that in the Class and State fields, the virtualization engines are presented as
two primary/ONLINE devices. The current Sun StorEdge Traffic Manager design
does not enable you to manually halt the I/O (that is, you cannot perform a failover
to the secondary path) when only primary devices are present.
Alternatives to Sun StorEdge Traffic Manager
As an alternative to using Sun StorEdge Traffic Manager, you can manually halt the
I/O using one of two methods: quiescing the I/O or unconfiguring the c2 path.
These methods are explained below.
▼ To Quiesce the I/O

1. Determine the path you want to disable.

2. Type:
# cfgadm -c unconfigure device
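The command above is generic. On a Sun StorEdge 6900 series system, the device is
typically addressed by the Ap_Id of the virtualization engine port, as in this sketch
(the WWN shown is the example used later in this chapter; take the actual Ap_Id
from cfgadm -al output):

# cfgadm -c unconfigure c2::2b000060220041f4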
▼ To Unconfigure the c2 Path
1. Type:
# cfgadm -al
Ap_Id                  Type        Receptacle  Occupant      Condition
c0                     scsi-bus    connected   configured    unknown
c0::dsk/c0t0d0         disk        connected   configured    unknown
c0::dsk/c0t1d0         disk        connected   configured    unknown
c1                     scsi-bus    connected   configured    unknown
c1::dsk/c1t6d0         CD-ROM      connected   configured    unknown
c2                     fc-fabric   connected   configured    unknown
c2::210100e08b23fa25   unknown     connected   unconfigured  unknown
c2::2b000060220041f4   disk        connected   configured    unknown
c3                     fc-fabric   connected   configured    unknown
c3::210100e08b230926   unknown     connected   unconfigured  unknown
c3::2b000060220041f9   disk        connected   configured    unknown
c4                     fc-private  connected   unconfigured  unknown
c5                     fc          connected   unconfigured  unknown
2. Using the Storage Automated Diagnostic Environment Topology GUI, determine
which virtualization engine is in the path you need to disable.

3. Use the world wide name (WWN) of that virtualization engine in the
unconfigure command, as follows:
# cfgadm -c unconfigure c2::2b000060220041f4
# cfgadm -al
Ap_Id                  Type        Receptacle  Occupant      Condition
c0                     scsi-bus    connected   configured    unknown
c0::dsk/c0t0d0         disk        connected   configured    unknown
c0::dsk/c0t1d0         disk        connected   configured    unknown
c1                     scsi-bus    connected   configured    unknown
c1::dsk/c1t6d0         CD-ROM      connected   configured    unknown
c2                     fc-fabric   connected   unconfigured  unknown
c2::210100e08b23fa25   unknown     connected   unconfigured  unknown
c2::2b000060220041f4   disk        connected   unconfigured  unknown
c3                     fc-fabric   connected   configured    unknown
c3::210100e08b230926   unknown     connected   unconfigured  unknown
c3::2b000060220041f9   disk        connected   configured    unknown
c4                     fc-private  connected   unconfigured  unknown
c5                     fc          connected   unconfigured  unknown
4. Verify that I/O has halted.

This halts the I/O only up to the A3/B3 link (see FIGURE 2-2). I/O continues to move
over the T1 and T2 paths, as well as the A4/B4 links, to the Sun StorEdge T3+ array.
▼ To Suspend the I/O

Use one of the following methods to suspend the I/O while the failover occurs:

1. Stop all customer applications that are accessing the Sun StorEdge T3+ array.

2. Manually pull the link from the Sun StorEdge T3+ array to the switch and wait
for a Sun StorEdge T3+ array LUN failover.

■ After the failover occurs, replace the cable and proceed with testing and FRU
isolation.

■ After testing and any FRU replacement is finished, return the controller state
back to the default by using virtualization engine failback. Refer to
"Virtualization Engine Failback" on page 81.
Note – To confirm that a failover is occurring, open a telnet session to the Sun
StorEdge T3+ array and check the output of port listmap.

Another, but slower, method is to run the runsecfg script and verify the
virtualization engine maps by polling them against a live system.
Caution – During the failover, SCSI errors will occur on the data host and a brief
suspension of I/O will occur.
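To illustrate the port listmap check mentioned in the Note above, here is a hedged
sketch of Sun StorEdge T3+ array output; the column layout varies by firmware
release, and the values are examples only. During a failover, the access column for
the affected LUN changes from primary to failover.

t3b0:/:<1> port listmap
port     targetid  addr_type  lun  volume  owner  access
u1p1     1         hard       0    v0      u1     primary
u2p1     2         hard       1    v1      u2     failover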
▼ To Return the Path to Production

1. Type cfgadm -c configure device:
# cfgadm -c configure c2::2b000060220041f4
2. Verify that I/O has resumed on all paths.
▼ To View the VxDisk Properties
1. Type the following:
# vxdisk list Disk_1
Device:    Disk_1
devicetag: Disk_1
type:      sliced
hostid:    diag.xxxxx.xxx.COM
disk:      name=t3dg02 id=1010283311.1163.diag.xxxxx.xxx.com
group:     name=t3dg id=1010283312.1166.diag.xxxxx.xxx.com
flags:     online ready private autoconfig nohotuse autoimport imported
pubpaths:  block=/dev/vx/dmp/Disk_1s4 char=/dev/vx/rdmp/Disk_1s4
privpaths: block=/dev/vx/dmp/Disk_1s3 char=/dev/vx/rdmp/Disk_1s3
version:   2.2
iosize:    min=512 (bytes) max=2048 (blocks)
public:    slice=4 offset=0 len=209698816
private:   slice=3 offset=1 len=4095
update:    time=1010434311 seqno=0.6
headers:   0 248
configs:   count=1 len=3004
logs:      count=1 len=455
Defined regions:
 config   priv 000017-000247[000231]: copy=01 offset=000000 enabled
 config   priv 000249-003021[002773]: copy=01 offset=000231 enabled
 log      priv 003022-003476[000455]: copy=01 offset=000000 enabled
Multipathing information:
numpaths:   2
c20t2B000060220041F4d0s2   state=enabled
c23t2B000060220041F9d0s2   state=enabled

# vxdmpadm listctlr all
CTLR-NAME   ENCLR-TYPE    STATE     ENCLR-NAME
=====================================================
c0          OTHER_DISKS   ENABLED   OTHER_DISKS
c2          SENA          ENABLED   SENA0
c3          SENA          ENABLED   SENA0
c20         Disk          ENABLED   Disk
c23         Disk          ENABLED   Disk
From the vxdisk output, notice that there are two physical paths to the LUN:

■ c20t2B000060220041F4d0s2
■ c23t2B000060220041F9d0s2

Both of these paths are currently enabled with VxDMP.
2. Use the luxadm(1M) command to display further information about the
underlying LUN.
# luxadm display /dev/rdsk/c20t2B000060220041F4d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c20t2B000060220041F4d0s2
  Status(Port A):       O.K.
  Vendor:               SUN
  Product ID:           SESS01
  WWN(Node):            2a000060220041f4
  WWN(Port A):          2b000060220041f4
  Revision:             080C
  Serial Num:           Unsupported
  Unformatted capacity: 102400.000 MBytes
  Write Cache:          Enabled
  Read Cache:           Enabled
  Minimum prefetch:     0x0
  Maximum prefetch:     0x0
  Device Type:          Disk device
  Path(s):
  /dev/rdsk/c20t2B000060220041F4d0s2
  /devices/pci@a,2000/pci@2/SUNW,qlc@4/fp@0,0/ssd@w2b000060220041f4,0:c,raw

# luxadm display /dev/rdsk/c23t2B000060220041F9d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c23t2B000060220041F9d0s2
  Status(Port A):       O.K.
  Vendor:               SUN
  Product ID:           SESS01
  WWN(Node):            2a000060220041f9
  WWN(Port A):          2b000060220041f9
  Revision:             080C
  Serial Num:           Unsupported
  Unformatted capacity: 102400.000 MBytes
  Write Cache:          Enabled
  Read Cache:           Enabled
  Minimum prefetch:     0x0
  Maximum prefetch:     0x0
  Device Type:          Disk device
  Path(s):
  /dev/rdsk/c23t2B000060220041F9d0s2
  /devices/pci@e,2000/pci@2/SUNW,qlc@4/fp@0,0/ssd@w2b000060220041f9,0:c,raw
▼ To Quiesce the I/O on the A3/B3 Link

1. Determine the path you want to disable.

2. Disable the path by typing the following:

# vxdmpadm disable ctlr=<c#>

3. Verify that the path is disabled:

# vxdmpadm listctlr all

Steps 1 and 2 halt I/O only up to the A3/B3 link. I/O will continue to move over the
T1 and T2 paths, as well as the A4/B4 links, to the Sun StorEdge T3+ array.
▼ To Suspend the I/O on the A3/B3 Link

Use one of the following methods to suspend I/O while the failover occurs:

1. Stop all customer applications that are accessing the Sun StorEdge T3+ array.

2. Manually pull the link from the Sun StorEdge T3+ array to the switch and wait
for a Sun StorEdge T3+ array LUN failover.

a. After the failover occurs, replace the cable and proceed with testing and FRU
isolation.

b. After testing is complete and any FRU replacement is finished, return the
controller state back to the default by using the virtualization engine failback
command.

Caution – This action will cause SCSI errors on the data host and a brief suspension
of I/O while the failover occurs.
Fibre Channel Links
The following sections provide troubleshooting information for the basic
components and Fibre Channel links, listed in TABLE 2-1.
TABLE 2-1

Link       Provides Fibre Channel Link Between These Components
A1 to B1   Data host, sw1a, and sw1b
A2         sw1a and v1a*
B2         sw1b and v1b*
A3         v1a and sw2a*
B3         v1b and sw2b*
A4         Master Sun StorEdge T3+ array and the "A" path switch
B4         AltMaster Sun StorEdge T3+ array and the "B" path switch
T1 to T2   sw2a and sw2b*

* Sun StorEdge 6900 series only
Note – In an actual Sun StorEdge 3900 or 6900 series configuration, there could be
more Sun StorEdge T3+ arrays than are shown in FIGURE 2-1 and FIGURE 2-2.
By using the Storage Automated Diagnostic Environment, you should be able to
isolate the problem to one particular segment of the configuration.
The information found in this section is based on the assumption that the Storage
Automated Diagnostic Environment is running on the data host. If it is not installed
on the data host, there will be areas of limited monitoring, diagnosis, and isolation.
The following diagrams provide troubleshooting information for the basic
components and Fibre Channel links specific to the Sun StorEdge 3900 series, shown
in FIGURE 2-1, and the Sun StorEdge 6900 series, shown in FIGURE 2-2.
Fibre Channel Link Diagrams
FIGURE 2-1 shows the basic components and the Fibre Channel links for a Sun
StorEdge 3900 series system:
■ A1 to B1—HBA to Sun StorEdge network FC switch-8 and switch-16 switch link
■ A4 to B4—Sun StorEdge network FC switch-8 and switch-16 switch to Sun
StorEdge T3+ array link

[Figure: host with HBA-A and HBA-B connected over links A1 and B1 to switches
sw1a and sw1b, which connect over links A4 and B4 to the master and alternate
master Sun StorEdge T3+ arrays]

FIGURE 2-1 Sun StorEdge 3900 Series Fibre Channel Link Diagram
FIGURE 2-2 shows the basic components and the Fibre Channel links for a Sun
StorEdge 6900 series system:
■ A1 to B1—HBA to Sun StorEdge network FC switch-8 and switch-16 switch link
■ A2 to B2—Sun StorEdge network FC switch-8 and switch-16 switch to
virtualization engine link on the host side
■ A3 to B3—Sun StorEdge network FC switch-8 and switch-16 switch to the
virtualization engine link on the device side
■ A4 to B4—Sun StorEdge network FC switch-8 and switch-16 switch to Sun
StorEdge T3+ array link
■ T1 to T2—T Port switch-to-switch link
[Figure: host with HBA-A and HBA-B connected over links A1 and B1 to switches
sw1a and sw1b; links A2 and B2 to virtualization engines v1a and v1b; links A3 and
B3 to second-tier switches sw2a and sw2b (joined by the T1/T2 inter-switch links);
and links A4 and B4 to the master and alternate master Sun StorEdge T3+ arrays]
FIGURE 2-2 Sun StorEdge 6900 Series Fibre Channel Link Diagram
Host Side Troubleshooting
Host-side troubleshooting refers to the messages and errors the data host detects.
Usually, these messages appear in the /var/adm/messages file.
Storage Service Processor Side Troubleshooting
Storage Service Processor-side troubleshooting refers to messages, alerts, and errors
that the Storage Automated Diagnostic Environment, running on the Storage Service
Processor, detects. You can find these messages by monitoring the following Sun
StorEdge 3900 series and the Sun StorEdge 6900 series components:
■ Sun StorEdge network FC switch-8 and switch-16 switches
■ Virtualization engine
■ Sun StorEdge T3+ array
Combining the host side messages and errors and the Storage Service Processor-side
messages, alerts, and errors into a meaningful context is essential for proper
troubleshooting.
Command Line Test Examples
To run a single Sun StorEdge diagnostic test from the command line rather than
through the Storage Automated Diagnostic Environment interface, you must log in
to the appropriate host or slave for testing the components. The following two tests,
qlctest(1M) and switchtest(1M), are provided as examples.
qlctest(1M)
The qlctest(1M) comprises several subtests that test the functions of the Sun
StorEdge PCI dual Fibre Channel (FC) host adapter board. This board is an HBA that
has diagnostic support. This diagnostic test is not scalable.
CODE EXAMPLE 2-1 qlctest(1M)
# /opt/SUNWstade/Diags/bin/qlctest -v -o "dev=
/devices/pci@6,4000/SUNW,qlc@3/fp@0,0:devctl|run_connect
=Yes|mbox=Disable|ilb=Disable|ilb_10=Disable|elb=Enable"
"qlctest: called with options: dev=/devices/pci@6,4000/SUNW,qlc@3/
fp@0,0:devctl|run_connect=Yes|mbox=Disable|ilb=Disable|ilb_10=Disable|el
b=Enable"
"qlctest: Started."
"Program Version is 4.0.1"
"Testing qlc0 device at /devices/pci@6,4000/SUNW,qlc@3/fp@0,0:devctl."
"QLC Adapter Chip Revision = 1, Risc Revision = 3,
Frame Buffer Revision = 1029, Riscrom Revision = 4,
Driver Revision = 5.a-2-1.15 "
"Running ECHO command test with pattern 0x7e7e7e7e"
"Running ECHO command test with pattern 0x1e1e1e1e"
"Running ECHO command test with pattern 0xf1f1f1f1"
<snip>
"Running ECHO command test with pattern 0x4a4a4a4a"
"Running ECHO command test with pattern 0x78787878"
"Running ECHO command test with pattern 0x25252525"
"FCODE revision is ISP2200 FC-AL Host Adapter Driver: 1.12 01/01/16"
"Firmware revision is 2.1.7f"
"Running CHECKSUM check"
"Running diag selftest"
"qlctest: Stopped successfully."
switchtest(1M)
switchtest(1M) is used to diagnose the Sun StorEdge network FC switch-8 and
switch-16 switch devices. The switchtest process also provides command-line
access to switch diagnostics. switchtest supports testing on local and remote
switches.

switchtest runs the port diagnostic on connected switch ports. While
switchtest is running, the port statistics are monitored for errors, and the chassis
status is checked.
CODE EXAMPLE 2-2 switchtest(1M)
# /opt/SUNWstade/Diags/bin/switchtest -v -o "dev=
2:192.168.0.30:0x0|xfersize=200"
"switchtest: called with options: dev=2:192.168.0.30:0x0|xfersize=200"
"switchtest: Started."
"Testing port: 2"
"Using ip_addr: 192.168.0.30, fcaddr: 0x0 to access this port."
"Chassis Status for Device: Switch Power: OK Temp: OK 23.0c Fan 1: OK Fan
2: OK"
"Testing Device: Switch Port: 2 Pattern: 0x7e7e7e7e"
"Testing Device: Switch Port: 2 Pattern: 0x1e1e1e1e"
"Testing Device: Switch Port: 2 Pattern: 0xf1f1f1f1"
"Testing Device: Switch Port: 2 Pattern: 0xb5b5b5b5"
"Testing Device: Switch Port: 2 Pattern: 0x4a4a4a4a"
"Testing Device: Switch Port: 2 Pattern: 0x78787878"
"Testing Device: Switch Port: 2 Pattern: 0xe7e7e7e7"
"Testing Device: Switch Port: 2 Pattern: 0xaa55aa55"
"Testing Device: Switch Port: 2 Pattern: 0x7f7f7f7f"
"Testing Device: Switch Port: 2 Pattern: 0x0f0f0f0f"
"Testing Device: Switch Port: 2 Pattern: 0x00ff00ff"
"Testing Device: Switch Port: 2 Pattern: 0x25252525"
"Port: 2 passed all tests on Switch"
"switchtest: Stopped successfully."
All Storage Automated Diagnostic Environment diagnostics tests are located in
/opt/SUNWstade/Diags/bin. Refer to the Storage Automated Diagnostic
Environment User’s Guide for a complete list of tests, subtests, options, and
restrictions.
Storage Automated Diagnostic
Environment Event Grid
The Storage Automated Diagnostic Environment generates component-specific event
grids that describe the severity of an event, whether action is required, a description
of the event, and the recommended action. Refer to Chapters 5 through 9 of this
troubleshooting guide for component-specific event grids.

1. Click the Event Grid link on the Storage Automated Diagnostic Environment
Help menu.

2. Select the criteria from the Storage Automated Diagnostic Environment event
grid, like the one shown in TABLE 2-2.
TABLE 2-2   Event Grid Sorting Criteria

Category: All (Default), Agent, Host, Message, Sun Switch, Sun StorEdge T3+ array,
Sun StorEdge A3500FC array, Sun StorEdge A5000 array, Tape, Virtualization engine

Component: All (Default), Backplane, Controller, Disk, Interface, LUN, Port, Power

Event Type: Agent Deinstall, Agent Install, Alarm, Alternate Master +, Alternate
Master —, Audit, Backup, CommunicationEstablished, CommunicationLost,
Discovery, Heartbeat, Insert Component, Location Change, Patch Info, Quiesce End,
Quiesce Start, Removal, Remove Component, State Change + (from offline to
online), State Change — (from online to offline), Statistics

Severity: Red—Critical (Error); Yellow—Alert (Warning); Down—System Down

Action: Y—This event is actionable and is sent to RSS/SRS; N—This event is
nonactionable
CHAPTER 3

Troubleshooting the Fibre Channel Links
A1/B1 Fibre Channel (FC) Link

If a problem occurs with the A1/B1 FC link:

■ In a Sun StorEdge 3900 series system, the Sun StorEdge T3+ array will fail over.
■ In a Sun StorEdge 6900 series system, no Sun StorEdge T3+ array will fail over,
but a severe problem can cause a path to go offline.

FIGURE 3-1, FIGURE 3-2, and FIGURE 3-3 are examples of A1/B1 Fibre Channel Link
Notification Events.
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Normal
Category : Message
Key      : message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.LOOP_OFFLINE
EventTime: 01/08/2002 14:34:45
Found 1 ’driver.LOOP_OFFLINE’ error(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=80fee746):
Info: Loop Offline
Jan 8 14:34:25 WWN: Received 2 'Loop Offline' message(s) [threshold is 1
in 5mins] Last-Message: 'diag.xxxxx.xxx.com qlc: [ID 686697 kern.info] NOTICE:
Qlogic qlc(0): Loop OFFLINE '
FIGURE 3-1 Data Host Notification of Intermittent Problems
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Normal
Category : Message
Key      : message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.MPXIO_offline
EventTime: 01/08/2002 14:48:02
Found 2 ’driver.MPXIO_offline’ warning(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=80fee746):
Jan 8 14:47:07 WWN:2b000060220041f9
diag.xxxxx.xxx.com mpxio: [ID
779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303053
(ssd19) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0
(fp1) to target address: 2b000060220041f9,1 is offline
Jan 8 14:47:07 WWN:2b000060220041f9
diag.xxxxx.xxx.com mpxio: [ID
779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303052
(ssd18) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0
(fp1) to target address: 2b000060220041f9,0 is offline
FIGURE 3-2 Data Host Notification of Severe Link Error
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Normal
Category : Switch
Key      : switch:100000c0dd0057bd
EventType: StateChangeEvent.X.port.6
EventTime: 01/08/2002 14:54:20
’port.6’ in SWITCH diag-sw1a (ip=192.168.0.30) is now Unknown (status-
state changed from ’Online’ to ’Admin’):
FIGURE 3-3 Storage Service Processor Notification
Note – An A1/B1 FC link error can cause a port in sw1a or sw1b to change state.
▼ To Verify the Data Host
An error in the A1/B1 FC link can cause a path to go offline in the multipathing
software.
CODE EXAMPLE 3-1 luxadm(1M) Display

# luxadm display /dev/rdsk/c6t29000060220041F96257354230303052d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c6t29000060220041F96257354230303052d0s2
  Status(Port A):       O.K.
  Status(Port B):       O.K.
  Vendor:               SUN
  Product ID:           SESS01
  WWN(Node):            2a000060220041f4
  WWN(Port A):          2b000060220041f4
  WWN(Port B):          2b000060220041f9
  Revision:             080C
  Serial Num:           Unsupported
  Unformatted capacity: 102400.000 MBytes
  Write Cache:          Enabled
  Read Cache:           Enabled
  Minimum prefetch:     0x0
  Maximum prefetch:     0x0
  Device Type:          Disk device
  Path(s):
  /dev/rdsk/c6t29000060220041F96257354230303052d0s2
  /devices/scsi_vhci/ssd@g29000060220041f96257354230303052:c,raw
   Controller           /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
    Device Address      2b000060220041f9,0
    Class               primary
    State               OFFLINE
   Controller           /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
    Device Address      2b000060220041f4,0
    Class               primary
    State               ONLINE
...
An error in the A1/B1 FC link can also cause a device to enter the "unusable" state
in cfgadm. In this case, the output of luxadm -e port will show that a device that
was "connected" changed to an "unconnected" state.
CODE EXAMPLE 3-2 cfgadm -al Display

...
# cfgadm -al
Ap_Id                  Type        Receptacle  Occupant      Condition
c0                     scsi-bus    connected   configured    unknown
c0::dsk/c0t0d0         disk        connected   configured    unknown
c0::dsk/c0t1d0         disk        connected   configured    unknown
c1                     scsi-bus    connected   configured    unknown
c1::dsk/c1t6d0         CD-ROM      connected   configured    unknown
c2                     fc-fabric   connected   configured    unknown
c2::210100e08b23fa25   unknown     connected   unconfigured  unknown
c2::2b000060220041f4   disk        connected   configured    unknown
c3                     fc-fabric   connected   configured    unknown
c3::2b000060220041f9   disk        connected   configured    unusable
c4                     fc-private  connected   unconfigured  unknown
c5                     fc          connected   unconfigured  unknown
FRU Tests Available for A1/B1 FC Link Segment

■ HBA—qlctest(1M)
  ■ Available only if the Storage Automated Diagnostic Environment is installed
    on a data host
  ■ Causes the HBA to go "offline" and "online" during tests
■ Switch—switchtest(1M)
  ■ Can be run while the link is still cabled and online (connected to the HBA)
  ■ You must specify a payload of 200 bytes or less when testing the A1/B1 FC
    link while the link is connected to the HBA (a limitation in the HBA ASIC).
  ■ Can be run only from the Storage Service Processor
  ■ The dev option to switchtest is in the following format:
    Port:IP-Address:FCAddress. The FCAddress can be set to 0x0.
CODE EXAMPLE 3-3 switchtest(1M) called with options
# ./switchtest -v -o "dev=2:192.168.0.30:0"
"switchtest: called with options: dev=2:192.168.0.30:0"
"switchtest: Started."
"Testing port: 2"
"Using ip_addr: 192.168.0.30, fcaddr: 0x0 to access this port."
"Chassis Status for Device: Switch Power: OK Temp: OK 23.0c Fan 1: OK
Fan 2: OK "
02/06/02 15:09:45 diag Storage Automated Diagnostic Environment MSGID 4001
switchtest.WARNING
switch0: "Maximum transfer size for a FABRIC port is 200. Changing
transfer size 2000 to 200"
"Testing Device: Switch Port: 2 Pattern: 0x7e7e7e7e"
"Testing Device: Switch Port: 2 Pattern: 0x1e1e1e1e"
Note – The Storage Automated Diagnostic Environment automatically resets the
transfer size if it notes that it is about to test a switch to HBA connection. This is
done both in the Storage Automated Diagnostic Environment GUI and from the
command-line interface (CLI).
▼ To Isolate the A1/B1 FC Link

1. Quiesce the I/O on the A1/B1 FC link path.

2. Run switchtest or qlctest to test the entire link.

3. Break the connection by uncabling the link.

4. Insert a loopback connector into the switch port.

5. Rerun switchtest.

a. If switchtest fails, replace the GBIC and rerun switchtest.

b. If switchtest fails again, replace the switch.

6. Insert a loopback connector into the HBA.

7. Run qlctest.

■ If the test fails, replace the HBA.
■ If the test passes, replace the cable.

8. Recable the entire link.

9. Run switchtest or qlctest to validate the fix.

10. Return the path to production.
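As a hedged illustration of step 7, qlctest can be run against the HBA device path
using the same option format as CODE EXAMPLE 2-1; the device path and option
values below are examples copied from that listing (with the qlc@2 HBA
substituted), not a prescription for every configuration:

# /opt/SUNWstade/Diags/bin/qlctest -v -o "dev=/devices/pci@6,4000/SUNW,qlc@2/fp@0,0:devctl|run_connect=Yes|mbox=Disable|ilb=Disable|ilb_10=Disable|elb=Enable"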
A2/B2 Fibre Channel (FC) Link

If a problem occurs with the A2/B2 FC link:

■ In a Sun StorEdge 3900 series system, the Sun StorEdge T3+ array will fail over.
■ In a Sun StorEdge 6900 series system, no Sun StorEdge T3+ array will fail over,
but a severe problem can cause a path to go offline.

FIGURE 3-4 and FIGURE 3-5 are examples of A2/B2 FC Link Notification Events.
From root Tue Jan 8 18:39:48 2002
Date: Tue, 8 Jan 2002 18:39:47 -0700 (MST)
Message-Id: <[email protected]>
From: Storage Automated Diagnostic Environment.Agent
Subject: Message from ’diag.xxxxx.xxx.com’ (2.0.B2.002)
Content-Length: 2742
You requested the following events be forwarded to you from
’diag.xxxxx.xxx.com’.
Site     : FSDE LAB Broomfield CO
Source   : diag226.xxxxx.xxx.com
Severity : Normal
Category : Message
Key      : message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.Fabric_Warning
EventTime: 01/08/2002 17:34:47
Found 1 ’driver.Fabric_Warning’ warning(s) in logfile: /var/adm/messages
on diag.xxxxx.xxx.com (id=80fee746):
Info: Fabric warning
Jan 8 17:34:36 WWN:2b000060220041f4
diag.xxxxx.xxx.com fp: [ID 517869
kern.warning] WARNING: fp(0): N_x Port with D_ID=108000,
PWWN=2b000060220041f4 disappeared from fabric
<snip>
multipath status: degraded, path /pci@6,4000/SUNW,qlc@2/fp@0,0 (fp0) to
target address: 2b000060220041f4,1 is offline
Jan 8 17:34:55 WWN:2b000060220041f4
diag.xxxxx.xxx.com
mpxio: [ID 779286 kern.info] /scsi_vhci/
ssd@g29000060220041f96257354230303052 (ssd18)
multipath status: degraded, path /pci@6,4000/SUNW,qlc@2/fp@0,0 (fp0) to
target address: 2b000060220041f4,0 is offline
FIGURE 3-4 A2/ B2 FC Link Host Side Event
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Normal
Category : Switch
Key      : switch:100000c0dd0061bb
EventType: StateChangeEvent.X.port.1
EventTime: 01/08/2002 17:38:32
’port.1’ in SWITCH diag-sw1b (ip=192.168.0.31) is now Unknown (status-
state changed from ’Online’ to ’Admin’):
----------------------------------------------------------------
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Normal
Category : San
Key      : switch:100000c0dd0061bb:1
EventType: LinkEvent.ITW.switch|ve
EventTime: 01/08/2002 17:39:47
ITW-ERROR (765 in 11 mins): Origin: port 1 on switch 'sw1b/192.168.0.31'.
Destination: port 1 on ve 'diag-v1b/29000060220041f4':
Info:
An invalid transmission word (ITW) was detected between two components.
This could indicate a potential problem.
Cause:
Likely Causes are: GBIC, FC Cable and device optical connections.
Action:
To isolate further please run the Storage Automated Diagnostic Environment
tests associated with this link segment.
FIGURE 3-5 A2/ B2 FC Link Storage Service Processor Side Event
▼ To Verify the Host Side
An error in the A2/B2 FC link can result in a device being listed in an "unusable"
state in cfgadm, but no HBAs are listed in the "unconnected" state in luxadm
output. The multipathing software will note an OFFLINE path.
CODE EXAMPLE 3-4 cfgadm -al

# cfgadm -al
Ap_Id                  Type        Receptacle  Occupant      Condition
c0                     scsi-bus    connected   configured    unknown
<snip>
# luxadm -e port
Found path to 2 HBA ports
/devices/pci@6,4000/SUNW,qlc@2/fp@0,0:devctl                CONNECTED
/devices/pci@6,4000/SUNW,qlc@3/fp@0,0:devctl                CONNECTED
# luxadm display /dev/rdsk/c6t29000060220041F96257354230303052d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c6t29000060220041F96257354230303052d0s2
  Status(Port A):       O.K.
  Status(Port B):       O.K.
  Vendor:               SUN
  Product ID:           SESS01
  WWN(Node):            2a000060220041f9
  WWN(Port A):          2b000060220041f9
  WWN(Port B):          2b000060220041f4
  Revision:             080C
  Serial Num:           Unsupported
  Unformatted capacity: 102400.000 MBytes
  Write Cache:          Enabled
  Read Cache:           Enabled
  Minimum prefetch:     0x0
  Maximum prefetch:     0x0
  Device Type:          Disk device
  Path(s):
  /dev/rdsk/c6t29000060220041F96257354230303052d0s2
  /devices/scsi_vhci/ssd@g29000060220041f96257354230303052:c,raw
   Controller           /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
    Device Address      2b000060220041f9,0
    Class               primary
    State               ONLINE
   Controller           /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
    Device Address      2b000060220041f4,0
    Class               primary
    State               OFFLINE
Note – You can find procedures for restoring virtualization engine settings in the
Sun StorEdge 3900 and 6900 Series Reference Manual.
▼ To Verify the A2/B2 FC Link

You can check the A2/B2 FC link using the Storage Automated Diagnostic
Environment, Diagnose—Test from Topology functionality. The Storage Automated
Diagnostic Environment’s implementation of diagnostic tests verifies the operation
of user-selected components. Using the Topology view, you can select specific tests,
subtests, and test options.
Refer to Chapter 5 of the Storage Automated Diagnostic Environment User’s Guide for
more information.
FRU Tests Available for A2/B2 FC Link Segment

■ The linktest is not available.
■ The switch and/or GBIC—switchtest:
  ■ Can be used only in conjunction with the loopback connector.
  ■ Cannot be cabled to the virtualization engine while switchtest runs.
■ No virtualization engine tests are available at this time.
▼ To Isolate the A2/B2 FC Link

1. Quiesce the I/O on the A2/B2 FC link path.

2. Break the connection by uncabling the link.

3. Insert the loopback connector into the switch port.

4. Run switchtest:

a. If the test fails, replace the GBIC and rerun switchtest.

b. If the test fails again, replace the switch.
5. If the switch or the GBIC show no errors, replace the remaining components in
the following order:
a. Replace the virtualization engine-side GBIC, recable the link, and monitor the
link for errors.
b. Replace the cable, recable the link, and monitor the link for errors.
c. Replace the virtualization engine, restore the virtualization engine settings,
recable the link, and monitor the link for errors.
6. Return the path to production.
The procedures for restoring virtualization engine settings are in the Sun StorEdge
3900 and 6900 Series Reference Manual.
A3/B3 Fibre Channel (FC) Link

If a problem occurs with the A3/B3 FC link:

■ In a Sun StorEdge 3900 series system, the Sun StorEdge T3+ array will fail over.
■ In a Sun StorEdge 6900 series system, no Sun StorEdge T3+ array will fail over,
but a severe problem can cause a path to go offline.

FIGURE 3-6, FIGURE 3-7, and FIGURE 3-8 are examples of A3/B3 FC link Notification
Events.
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Normal
Category : Message
Key      : message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.MPXIO_offline
EventTime: 01/08/2002 18:25:18
Found 2 ’driver.MPXIO_offline’ warning(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=80fee746):
Jan 8 18:24:24 WWN:2b000060220041f9
diag.xxxxx.xxx.com mpxio: [ID
779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303053
(ssd19) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0
(fp1) to target address: 2b000060220041f9,1 is offline
Jan 8 18:24:24 WWN:2b000060220041f9
diag.xxxxx.xxx.com mpxio: [ID
779286 kern.info] /scsi_vhci/ssd@g29000060220041f96257354230303052
(ssd18) multipath status: degraded, path /pci@6,4000/SUNW,qlc@3/fp@0,0
(fp1) to target address: 2b000060220041f9,0 is offline
----------------------------------------------------------------
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Normal
Category : Message
Key      : message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.Fabric_Warning
EventTime: 01/08/2002 18:25:18
Found 1 ’driver.Fabric_Warning’ warning(s) in logfile: /var/adm/messages
on diag.xxxxx.xxx.com (id=80fee746):
Info:
Fabric warning
Jan 8 18:24:04 WWN:2b000060220041f9
diag.xxxxx.xxx.com fp: [ID 517869
kern.warning] WARNING: fp(1): N_x Port with D_ID=104000,
PWWN=2b000060220041f9 disappeared from fabric
FIGURE 3-6 A3/ B3 FC Link Host-Side Event
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Normal
Category : Switch
Key      : switch:100000c0dd0057bd
EventType: StateChangeEvent.M.port.1
EventTime: 01/08/2002 18:28:38
’port.1’ in SWITCH diag-sw1a (ip=192.168.0.30) is now Not-Available
(status-state changed from ’Online’ to ’Offline’):
Info:
A port on the switch has logged out of the fabric and gone offline
Action:
1. Verify cables, GBICs and connections along Fibre Channel path
2. Check Storage Automated Diagnostic Environment SAN Topology GUI to
identify failing segment of the data path
3. Verify correct FC switch configuration
FIGURE 3-7 A3/ B3 FC Link Storage Service Processor-Side Event
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Normal
Category : Switch
Key      : switch:100000c0dd00cbfe
EventType: StateChangeEvent.M.port.1
EventTime: 01/08/2002 18:28:40
’port.1’ in SWITCH diag-sw2a (ip=192.168.0.32) is now Not-Available
(status-state changed from ’Online’ to ’Offline’):
Info:
A port on the switch has logged out of the fabric and gone offline
Action:
1. Verify cables, GBICs and connections along Fibre Channel path
2. Check Storage Automated Diagnostic Environment SAN Topology GUI to
identify failing segment of the data path
3. Verify correct FC switch configuration
FIGURE 3-8 A3/ B3 FC Link Storage Service Processor-Side Event
▼ To Verify the Host Side
An error in the A3/B3 FC link results in a device being listed in an "unusable" state
in cfgadm, but no HBAs are listed in the "unconnected" state in luxadm output.
The multipathing software will note an "offline" path.
CODE EXAMPLE 3-5 Devices in the "connected" State

# cfgadm -al
Ap_Id                  Type        Receptacle  Occupant      Condition
c0                     scsi-bus    connected   configured    unknown
c0::dsk/c0t0d0         disk        connected   configured    unknown
c0::dsk/c0t1d0         disk        connected   configured    unknown
c1                     scsi-bus    connected   configured    unknown
c1::dsk/c1t6d0         CD-ROM      connected   configured    unknown
c2                     fc-fabric   connected   configured    unknown
c2::210100e08b23fa25   unknown     connected   unconfigured  unknown
c2::2b000060220041f4   disk        connected   configured    unknown
c3                     fc-fabric   connected   configured    unknown
c3::2b000060220041f9   disk        connected   configured    unusable
c3::210100e08b230926   unknown     connected   unconfigured  unknown
c4                     fc-private  connected   unconfigured  unknown
c5                     fc          connected   unconfigured  unknown

# luxadm -e port
Found path to 2 HBA ports
/devices/pci@6,4000/SUNW,qlc@2/fp@0,0:devctl                CONNECTED
/devices/pci@6,4000/SUNW,qlc@3/fp@0,0:devctl                CONNECTED

# luxadm display /dev/rdsk/c6t29000060220041F96257354230303052d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c6t29000060220041F96257354230303052d0s2
<snip>
  /devices/scsi_vhci/ssd@g29000060220041f96257354230303052:c,raw
   Controller           /devices/pci@6,4000/SUNW,qlc@3/fp@0,0
    Device Address      2b000060220041f9,0
    Class               primary
    State               OFFLINE
   Controller           /devices/pci@6,4000/SUNW,qlc@2/fp@0,0
    Device Address      2b000060220041f4,0
    Class               primary
    State               ONLINE
CODE EXAMPLE 3-6 VxDMP Error Message
Jan 8 18:26:38 diag.xxxxx.xxx.com vxdmp: [ID 619769 kern.notice] NOTICE:
vxdmp: Path failure on 118/0x1f8
Jan 8 18:26:38 diag.xxxxx.xxx.com vxdmp: [ID 997040 kern.notice] NOTICE:
vxvm:vxdmp: disabled path 118/0x1f8 belonging to the dmpnode 231/0xd0
▼ To Verify the Storage Service Processor
You can check the A3/B3 FC link using the Storage Automated Diagnostic
Environment Diagnose—Test from Topology functionality. The Storage Automated
Diagnostic Environment's implementation of diagnostic tests verifies the operation
of user-selected components. Using the Topology view, you can select specific tests,
subtests, and test options.
Refer to the Storage Automated Diagnostic Environment User’s Guide for more
information.
FRU Tests Available for the A3/B3 FC Link Segment

■ The linktest is not available.
■ The switch and/or GBIC—switchtest:
  ■ Can be used only in conjunction with the loopback connector.
  ■ Cannot be cabled to the virtualization engine while switchtest runs.
■ No virtualization engine tests are available at this time.
▼ To Isolate the A3/B3 FC Link

1. Quiesce the I/O on the A3/B3 FC link path.

2. Break the connection by uncabling the link.

3. Insert the loopback connector into the switch port.

4. Run switchtest:

a. If the test fails, replace the GBIC and rerun switchtest.

b. If the test fails again, replace the switch.

5. If the switch or the GBIC show no errors, replace the remaining components in
the following order:

a. Replace the virtualization engine-side GBIC, recable the link, and monitor the
link for errors.

b. Replace the cable, recable the link, and monitor the link for errors.

c. Replace the virtualization engine, restore the virtualization engine settings,
recable the link, and monitor the link for errors.

6. Return the path to production.
The procedures for restoring virtualization engine settings are in the Sun StorEdge
3900 and 6900 Series Reference Manual.
A4/B4 Fibre Channel (FC) Link

If a problem occurs with the A4/B4 FC link:

■ In a Sun StorEdge 3900 series system, the Sun StorEdge T3+ array will fail over.
■ In a Sun StorEdge 6900 series system, no Sun StorEdge T3+ array will fail over,
but a severe problem can cause a path to go offline.

FIGURE 3-9 and FIGURE 3-10 are examples of A4/B4 Link Notification Events.
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Warning
Category : Message
DeviceId : message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.MPXIO_offline
EventTime: 01/29/2002 14:28:06
Found 2 ’driver.MPXIO_offline’ warning(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=80e4aa60):
<snip>
----------------------------------------------------------------------
Site     : FSDE LAB Broomfield CO
Source   : diag.xxxxx.xxx.com
Severity : Warning
Category : Message
DeviceId : message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.Fabric_Warning
EventTime: 01/29/2002 14:28:06
Found 1 ’driver.Fabric_Warning’ warning(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=80e4aa60):
INFORMATION:
Fabric warning
<snip>
status of hba /devices/pci@a,2000/pci@2/SUNW,qlc@5/fp@0,0:devctl on
diag.xxxxx.xxx.com changed from CONNECTED to NOT CONNECTED
INFORMATION:
monitors changes in the output of luxadm -e port
Found path to 20 HBA ports
/devices/sbus@2,0/SUNW,socal@d,10000:0
NOT CONNECTED
FIGURE 3-9 A4/ B4 FC Link Data Host Notification
Site     : FSDE LAB Broomfield CO
Source   : diag
Severity : Warning
Category : Switch
DeviceId : switch:100000c0dd0061bb
EventType: LogEvent.MessageLog
EventTime: 01/29/2002 14:25:05
Change in Port Statistics on switch diag-sw1b (ip=192.168.0.31):
Port-1: Received 16289 ’InvalidTxWds’ in 0 mins (value=365972 )
----------------------------------------------------------------------
Site     : FSDE LAB Broomfield CO
Source   : diag
Severity : Warning
Category : T3message
DeviceId : t3message:83060c0c
EventType: LogEvent.MessageLog
EventTime: 01/29/2002 14:25:06
Warning(s) found in logfile: /var/adm/messages.t3 on diag (id=83060c0c):
Jan 29 14:12:58 t3b0 ISR1[2]: W: u2ctr ISP2100[2] Received LOOP DOWN async
event
Jan 29 14:13:32 t3b0 MNXT[1]: W: u1ctr starting lun 1 failover
---------------------------------------------------------------------
Site     : FSDE LAB Broomfield CO
Source   : diag
Severity : Warning
Category : T3message
DeviceId : t3message:83060c0c
EventType: LogEvent.MessageLog
EventTime: 01/29/2002 14:11:14
Warning(s) found in logfile: /var/adm/messages.t3 on diag (id=83060c0c):
Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d4 SVD_PATH_FAILOVER: path_id = 0
Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d5 SVD_PATH_FAILOVER: path_id = 0
Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d6 SVD_PATH_FAILOVER: path_id = 0
Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d7 SVD_PATH_FAILOVER: path_id = 0
Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d8 SVD_PATH_FAILOVER: path_id = 0
Jan 29 14:05:18 t3b0 ISR1[1]: W: u2d9 SVD_PATH_FAILOVER: path_id = 0
FIGURE 3-10 Storage Service Processor Notification
▼ To Verify the Data Host
A problem in the A4/B4 FC link appears differently on the data host, depending on
whether the system is a Sun StorEdge 3900 series or a Sun StorEdge 6900 series
device.
Sun StorEdge 3900 Series
In a Sun StorEdge 3900 series device, the data host multipathing software is
responsible for initiating the failover and reports it in /var/adm/messages; these
are the same messages reported by the Storage Automated Diagnostic Environment
email notifications.
The luxadm failover command is used to fail the Sun StorEdge T3+ array LUNs
back to the proper configuration after the failing FRU is replaced. This command is
issued from the data host.
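A minimal sketch of the failback invocation (the device path shown is illustrative;
use the path that luxadm display reports as OFFLINE for the affected LUN):

# luxadm failover primary /dev/rdsk/c26t60020F20000064433C3352A60003E82Fd0s2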
Sun StorEdge 6900 Series
In a Sun StorEdge 6900 series device, the virtualization engine pairs handle the
failover and the failover is not noted on the data host. All paths would remain
ONLINE and ACTIVE.
The mpdrive failback command is used instead, and is issued from the Storage
Service Processor.
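For example, a sketch using the controller serial number from the examples in
Chapter 7 (see "To Failback the Virtualization Engine" for the full procedure):

# /opt/svengine/sduc/mpdrive failback -d v1 -j 60020F2000006DFA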
Note – In the event of a complete sw1b or sw2b failure in a Sun StorEdge 6900
series configuration, the virtualization engine pairs handle the failover. In addition,
the multipathing software notes a path failure on the data host, Sun StorEdge Traffic
Manager or VxDMP takes the entire path that was connected to the failed switch
offline, and the ISL ports on the surviving switch go offline as well.
To verify the failover, use luxadm display; the failed path is marked OFFLINE, as
shown in CODE EXAMPLE 3-7.
CODE EXAMPLE 3-7 Failed Path marked OFFLINE
# luxadm display /dev/rdsk/c26t60020F20000064433C3352A60003E82Fd0s2
DEVICE PROPERTIES for disk: /dev/rdsk/
c26t60020F20000064433C3352A60003E82Fd0s2
  Status(Port A):         O.K.
  Status(Port B):         O.K.
  Vendor:                 SUN
  Product ID:             T300
  WWN(Node):              50020f2000006443
  WWN(Port A):            50020f2300006355
  WWN(Port B):            50020f2300006443
  Revision:               0118
  Serial Num:             Unsupported
  Unformatted capacity:   488642.000 MBytes
  Write Cache:            Enabled
  Read Cache:             Enabled
  Minimum prefetch:       0x0
  Maximum prefetch:       0x0
  Device Type:            Disk device
  Path(s):
  /dev/rdsk/c26t60020F20000064433C3352A60003E82Fd0s2
  /devices/scsi_vhci/ssd@g60020f20000064433c3352a60003e82f:c,raw
   Controller           /devices/pci@a,2000/pci@2/SUNW,qlc@5/fp@0,0
    Device Address      50020f2300006355,1
    Class               primary
    State               OFFLINE
   Controller           /devices/pci@e,2000/pci@2/SUNW,qlc@5/fp@0,0
    Device Address      50020f2300006443,1
    State               ONLINE
Note – This type of error may also cause the device to show up "unusable" in
cfgadm, as shown in CODE EXAMPLE 3-8.
CODE EXAMPLE 3-8 Failed Path marked “unusable”
# cfgadm -al
Ap_Id                  Type        Receptacle  Occupant      Condition
ac0:bank0              memory      connected   configured    ok
ac0:bank1              memory      empty       unconfigured  unknown
c1                     scsi-bus    connected   configured    unknown
c1::dsk/c1t6d0         CD-ROM      connected   configured    unknown
c16                    scsi-bus    connected   unconfigured  unknown
c18                    scsi-bus    connected   unconfigured  unknown
c19                    scsi-bus    connected   unconfigured  unknown
c20                    fc-private  connected   unconfigured  unknown
c21                    fc-fabric   connected   configured    unknown
c21::50020f2300006355  disk        connected   configured    unusable
FRU Tests Available for the A4/B4 FC Link
Segment
■ The switchtest can be run only from the Storage Service Processor.
■ The linktest can isolate the switch and the GBIC on the switch. It cannot
isolate the cable or the Sun StorEdge T3+ array controller.
▼ To Isolate the A4/ B4 FC Link
1. Quiesce the I/O on the A4/B4 FC link path.
2. Run linktest from the Storage Automated Diagnostic Environment GUI to
isolate suspected failing components.
Alternatively, follow these steps:
1. Quiesce the I/O on the A4/B4 FC link path.
2. Run switchtest to test the entire link (re-create the problem).
3. Break the connection by uncabling the link.
4. Insert the loopback connector into the switch port.
5. Rerun switchtest.
a. If switchtest fails, replace the GBIC and rerun switchtest.
b. If the test fails again, replace the switch.
6. If switchtest passes, assume that the suspect components are the cable and the
Sun StorEdge T3+ array controller.
a. Replace the cable.
b. Rerun switchtest.
7. If the test fails again, replace the Sun StorEdge T3+ array controller.
8. Return the path to production.
9. Return the Sun StorEdge T3+ array LUNs to the correct controllers if a failover
occurred (use the luxadm failover or mpdrive failback command, as appropriate).
CHAPTER 4
Configuration Settings
This chapter contains the following sections:
■ “Verifying Configuration Settings” on page 47
■ “To Clear the Lock File” on page 50
For a complete listing of SUNWsecfg error messages and recommended actions, refer
to Appendix B.
Verifying Configuration Settings
During the course of troubleshooting, you might need to verify configuration
settings on the various components in the Sun StorEdge 3900 or 6900 series.
▼ To Verify Configuration Settings
● Run one of the following scripts:
■ Use the /opt/SUNWsecfg/runsecfg script and select the various Verify menu
selections.
■ Run the /opt/SUNWsecfg/bin/checkdefaultconfig script to check all
accessible components. The output is shown in CODE EXAMPLE 4-1.
■ Run the checkswitch, checkt3config, checkve, or checkvemap scripts
manually from /opt/SUNWsecfg/bin.
The scripts listed above check the default configuration files in the
/opt/SUNWsecfg/etc directory and compare the current, live settings to the
defaults. Any differences are marked with a FAIL.
Note – For cluster configurations and systems that are attached to Windows NT, the
default configurations may not match the current installed configuration. Be aware
of this when running the verification scripts. Certain items may be flagged as FAIL
in these special circumstances.
CODE EXAMPLE 4-1 /opt/SUNWsecfg/checkdefaultconfig Output
# /opt/SUNWsecfg/checkdefaultconfig
Checking all accessible components.....
Checking switch: sw1a
Switch sw1a - PASSED
Checking switch: sw1b
Switch sw1b - PASSED
Checking switch: sw2a
Switch sw2a - PASSED
Checking switch: sw2b
Switch sw2b - PASSED
Please enter the Sun StorEdge T3+ array password :
Checking T3+: t3b0
Checking : t3b0 Configuration.......
Checking command ver           : PASS
Checking command vol stat      : PASS
Checking command port list     : PASS
Checking command port listmap  : PASS
Checking command sys list      : FAIL <-- Failure Noted
Checking T3+: t3b2
Checking : t3b2 Configuration.......
Checking command ver           : PASS
Checking command vol stat      : PASS
Checking command port list     : PASS
Checking command port listmap  : PASS
Checking command sys list      : PASS
<snip>
Checking Virtualization Engine Pair Parameters: v1a
v1a configuration check passed
Checking Virtualization Engine Pair Parameters: v1b
v1b configuration check passed
Checking Virtualization Engine Pair Configuration: v1
checkvemap: virtualization engine map v1 verification complete: PASS.
2. If anything is marked FAIL, check the /var/adm/log/SEcfg.log file for the
details of the failure.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : ----------
-SAVED CONFIGURATION--------------.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : blocksize : 16k.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : cache : auto.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : mirror : auto.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : mp_support : rw.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : rd_ahead : off.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : recon_rate : med.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : sys memsize : 32
MBytes.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : cache memsize :
256 MBytes.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : .
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : ----------
-CURRENT CONFIGURATION------------.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : blocksize : 16k.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : cache : auto.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : mirror : off.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : mp_support : rw.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : rd_ahead : off.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : recon_rate : med.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : sys memsize : 32
MBytes.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : cache memsize :
256 MBytes.
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : .
Mon Jan 7 18:07:51 PST 2002 checkt3config: t3b0 INFO : ----------
In this example, the mirror setting in the Sun StorEdge T3+ array system settings is
“off.” The SAVED CONFIGURATION setting for this parameter, which is the default
setting, should be “auto.”
3. Fix the FAIL condition, and then verify the settings again.
# /opt/SUNWsecfg/bin/checkt3config -n t3b0
Checking : t3b0 Configuration.......
Checking command ver           : PASS
Checking command vol stat      : PASS
Checking command port list     : PASS
Checking command port listmap  : PASS
Checking command sys list      : PASS
If you interrupt any of the SUNWsecfg scripts (by typing Control-C, for example),
a lock file might remain in the /opt/SUNWsecfg/etc directory, causing subsequent
commands to fail. Use the following procedure to clear the lock file.
▼ To Clear the Lock File
1. Type the following command:
# /opt/SUNWsecfg/bin/removelocks
usage : removelocks [-t|-s|-v]
where:
-t - remove all T3+ related lock files.
-s - remove all switch related lock files.
-v - remove all virtualization engine related lock files.
# /opt/SUNWsecfg/bin/removelocks -v
Note – After any virtualization engine configuration change, the script saves a new
copy of the virtualization engine map. This may take a minimum of two minutes,
during which time no additional virtualization engine changes are accepted.
2. Monitor the /var/adm/log/SEcfg.log file to see when the savevemap process
successfully exits.
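One way to watch the log in real time is with tail -f (a sketch; SEcfg.log is the
log file named in Step 2):

# tail -f /var/adm/log/SEcfg.log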
CODE EXAMPLE 4-2 savevemap output
Tue Jan 29 16:12:34 MST 2002 savevemap: v1 ENTER.
Tue Jan 29 16:12:34 MST 2002 checkslicd: v1 ENTER.
Tue Jan 29 16:12:42 MST 2002 checkslicd: v1 EXIT.
Tue Jan 29 16:14:01 MST 2002 savevemap: v1 EXIT.
When savevemap: <ve-pair> EXIT is displayed, the savevemap process has
successfully exited.
CHAPTER 5
Troubleshooting Host Devices
This chapter describes how to troubleshoot host components associated with a Sun
StorEdge 3900 or 6900 series system. It contains the following sections:
■ “Using the Host Event Grid” on page 53
■ “To Replace the Master Host” on page 57
■ “To Replace the Alternate Master or Slave Monitoring Host” on page 58
Host Event Grid
The Storage Automated Diagnostic Environment Event Grid enables you to sort host
events by component, category, or event type. The Storage Automated Diagnostic
Environment GUI displays an event grid that describes the severity of the event,
whether action is required, a description of the event, and the recommended action.
Refer to the Storage Automated Diagnostic Environment User’s Guide for more
information.
▼ Using the Host Event Grid
1. From the Storage Automated Diagnostic Environment Help menu, click the Event
Grid link.
2. Select the criteria from the Storage Automated Diagnostic Environment event
grid, like the one shown in FIGURE 5-1.
TABLE 5-1 lists all the host events in the Storage Automated Diagnostic Environment.

TABLE 5-1   Storage Automated Diagnostic Environment Event Grid for the Host

Category: host   Component: hba   EventType: Alarm+   Severity: Yellow
Description: [ Info ] status of hba
/devices/sbus@9,0/SUNW,qlc@0,30000/fp@0,0:devctl on diag.xxxxx.xxx.com
changed from NOT CONNECTED to CONNECTED
Information: Monitors changes in the output of luxadm -e port.

Category: host   Component: hba   EventType: Alarm-   Severity: Red   Action: Y
Description: [ Info ] status of hba
/devices/sbus@9,0/SUNW,qlc@0,30000/fp@0,0:devctl on diag.xxxxx.xxx.com
changed from CONNECTED to NOT CONNECTED
Information:
• Monitors changes in the output of luxadm -e port.
• Found path to 20 HBA ports.

Category: host   Component: lun.t300   EventType: Alarm-   Severity: Red   Action: Y
Description: [ Info ] The state of lun.T300.c14t50020F2300003EE5d0s2.statusA
on diag.xxxxx.xxx.com changed from OK to ERROR
Information: luxadm display reported a change in the port status of one of its
paths. The Storage Automated Diagnostic Environment then tries to find to which
enclosure this path corresponds (target=t3:diag244-t3b0/90.0.0.40) by reviewing
its database of Sun StorEdge T3+ arrays and virtualization engines.

Category: host   Component: lun.VE   EventType: Alarm-   Severity: Red   Action: Y
Description: [ Info ] The state of lun.VE.c14t50020F2300003EE5d0s2.statusA
on diag.xxxxx.xxx.com changed from OK to ERROR
Information: luxadm display reported a change in the port status of one of its
paths. The Storage Automated Diagnostic Environment then tries to find to which
enclosure this path corresponds (target=ve:diag244-ve0/90.0.0.40) by reviewing
its database of Sun StorEdge T3+ arrays and virtualization engines.

Category: host   Component: ifptest   EventType: Diagnostic Test-   Severity: Red   Action: Y
Description: ifptest (diag240) on host failed.

Category: host   Component: qlctest   EventType: Diagnostic Test-   Severity: Red
Description: qlctest (diag240) on host failed.

Category: host   Component: socaltest   EventType: Diagnostic Test-   Severity: Red
Description: socaltest (diag240) on host failed.

Category: host   Component: enclosure   EventType: PatchInfo
Description: [ Info ] New patch and package information generated.
Information: Sends changes to the output of showrev -p and pkginfo -l.

Category: host   Component: enclosure   EventType: backup
Description: [ Info ] Agent Backup
Information: Backup of the configuration file of the agent.
Replacing the Master, Alternate Master,
and Slave Monitoring Host
The following procedures are a high-level overview of the procedures that are
detailed in the Storage Automated Diagnostic Environment User’s Guide. Follow these
procedures when replacing a master, alternate master, or slave monitoring host.
Note – The procedures for replacing the master host are different from the
procedures for replacing an alternate master or slave monitoring host.
▼ To Replace the Master Host
Refer to Chapter 2 of the Storage Automated Diagnostic Environment User’s Guide for
detailed instructions for the next four steps.
1. Install the SUNWstade package on a new Master Host.
2. Run /opt/SUNWstade/bin/ras_install on the new Master Host.
3. Configure the Host as the Master Host.
4. Connect to the Master Server’s GUI at http://<servername>:7654.
5. Choose Utilities -> System -> Recover Config.
Refer to Chapter 7 of the Storage Automated Diagnostic Environment User’s Guide for
detailed instructions.
a. In the Recover Config window, enter the IP address of any alternate master or
slave monitoring host (all hosts keep a copy of the configuration).
b. Make sure the Recover Config and Reset slave to this master checkboxes are
checked.
c. Click Recover.
6. Choose Maintenance -> General Maintenance.
Ensure that all host and device settings are recovered correctly.
Refer to Chapter 3 of the Storage Automated Diagnostic Environment User’s Guide for
detailed instructions.
7. Choose Maintenance -> General Maintenance -> Start/Stop Agent to start the
agent on the master host.
▼ To Replace the Alternate Master or Slave
Monitoring Host
1. Choose Maintenance -> General Maintenance -> Maintain Hosts.
Refer to Chapter 3, “Maintenance,” of the Storage Automated Diagnostic Environment
User’s Guide.
2. In the Maintain Hosts window, select the host to be replaced from the Existing
Hosts list, and click Delete.
3. Install the new host.
Refer to Chapter 2 of the Storage Automated Diagnostic Environment User’s Guide for
detailed instructions for the next four steps.
4. Install the SUNWstade package on the new host.
5. Run /opt/SUNWstade/bin/ras_install.
6. Configure the host as a slave.
7. Choose Maintenance -> General Maintenance -> Maintain Hosts.
Refer to Chapter 3, “Maintenance,” of the Storage Automated Diagnostic Environment
User’s Guide for detailed instructions.
8. In the Maintain Hosts window, select the new host.
9. Configure the options as needed.
10. Choose Maintenance -> Topology Maintenance -> Topology Snapshot.
a. In the Topology Snapshot window, select the new host.
b. Click Create and Retrieve Selected Topologies.
c. Click Merge and Push Master Topology.
Conclusion
Any time a master, alternate master, or slave monitoring host is replaced, you must
recover the configuration using the procedures described above. This is especially
important when the Storage Service Processor is replaced as a FRU, whether the
Storage Service Processor is the master or the slave.
CHAPTER 6
Troubleshooting Sun StorEdge FC
Switch-8 and Switch-16 Devices
This chapter describes how to troubleshoot the switch components of a Sun
StorEdge 3900 or 6900 series system. It contains the following sections:
■ “Sun StorEdge Network FC Switch-8 and Switch-16 Switch Description” on
page 61
■ “Switch Event Grid” on page 62
■ “setupswitch Exit Values” on page 68
■ “Replacing the Master Midplane” on page 68
Sun StorEdge Network FC Switch-8 and
Switch-16 Switch Description
The Sun StorEdge network FC switch-8 and switch-16 switches provide cable
consolidation and increased connectivity for the internal data interconnection
infrastructure.
The switches are paired to provide redundancy. Two switches are used in each Sun
StorEdge 3900 series, and four switches are used in each Sun StorEdge 6900 series.
Each Sun StorEdge network FC switch-8 and switch-16 switch is connected by way
of an Ethernet to the service network for management and service from the Storage
Service Processor.
These switches can be monitored through the SANSurfer GUI, which is available on
the Storage Service Processor. You configure and modify the switches using the
Configuration Utilities. Do not configure or modify the switches using any method
other than the SUNWsecfg tools.
▼ To Diagnose and Troubleshoot Switch Hardware
1. To diagnose and troubleshoot the switch hardware, begin by running the
SUNWsecfg checkswitch utility.
2. For detailed troubleshooting procedures, refer to the Sun StorEdge SAN Field
Troubleshooting Guide, Release 3.0.
The Sun StorEdge SAN Field Troubleshooting Guide, Release 3.0 describes how to
diagnose and troubleshoot the switch hardware. The scope of this document
includes the Sun StorEdge network FC switch-8 and switch-16 switch and the
interconnections (HBA, GBIC, cables) on either side of the switch. In addition, the
document provides examples of fault isolation and includes a Brocade switch
appendix.
Switch Event Grid
The Storage Automated Diagnostic Environment Event Grid enables you to sort
switch events by component, category, or event type. The Storage Automated
Diagnostic Environment GUI displays an event grid that describes the severity of the
event, whether action is required, a description of the event, and the recommended
action. Refer to the Storage Automated Diagnostic Environment User’s Guide for more
information.
1. From the Storage Automated Diagnostic Environment Help menu, click the Event
Grid link.
2. Select the criteria from the Storage Automated Diagnostic Environment event
grid, like the one shown in FIGURE 6-1.
TABLE 6-1 lists the switch events.

TABLE 6-1   Storage Automated Diagnostic Environment Event Grid for Switches

Category: switch   Component: port statistics   EventType: Log   Severity: Yellow   Action: Y
Description: Change in port statistics on switch diag156-sw1b (ip=192.168.0.31)
Information: The switch has reported a change in an error counter. This could
indicate a failing component in the link.
Action: Check the Topology GUI for any link errors. Run linktest on the link to
isolate the failing FRU. Quiesce I/O on the link before running linktest.

Category: switch   Component: chassis.fan   EventType: Alarm   Severity: Yellow
Description: chassis.fan.1 status changed from OK

Category: switch   Component: chassis.power   EventType: Alarm   Severity: Yellow
Description: chassis.power.1 status changed from OK
Information: [ Info ] This event monitors changes in the status of the chassis'
power supply, as reported by SANbox chassis_status.

Category: switch   Component: chassis.temp   EventType: Alarm   Severity: Yellow
Description: chassis.temp.1 status changed from OK
Information: [ Info ] This event monitors changes in the status of the chassis'
temperature, as reported by SANbox chassis_status.

Category: switch   Component: chassis.zone   EventType: Alarm   Severity: Yellow
Description: [ Info ] Switch sw1a was rezoned: [ new zones ...]
Information: This event reports changes in the zoning of a switch.

Category: switch   Component: enclosure   EventType: Audit
Description: Auditing a new switch called rasd2-swb1 (ip=xxx.0.0.41)
10002000007a609

Category: switch   Component: oob   EventType: Comm_Established
Description: Communication regained with sw1a (ip=xxx.20.67.213)

Category: switch   Component: oob   EventType: Comm_Lost   Severity: Down   Action: Yes
Description: [ Info/Action ] Lost communication with sw1a (ip=xxx.20.67.213)
Information: Ethernet connectivity to the switch has been lost.
Recommended action:
1. Check Ethernet connectivity to the switch.
2. Verify that the switch is booted correctly with no POST errors.
3. Verify that the switch Test Mode is set for normal operations.
4. Verify the TCP/IP settings on the switch via Forced PROM Mode access.
5. Replace the switch, if needed.

Category: switch   Component: switchtest   EventType: Diagnostic Test-   Severity: Red
Description: switchtest (diag240) on d2-swb1 (ip=xxx.0.0.41) 10002000007a609

Category: switch   Component: enclosure   EventType: Discovery
Description: [ Info ] Discovered a new switch called rasd2-swb1 (ip=xxx.0.0.41)
10002000007a609
Information: Discovery events occur the very first time the agent probes a storage
device. It creates a detailed description of the device monitored and sends it
using any active notifier (NetConnect, Email).

Category: switch   Component: enclosure   EventType: LocationChange
Description: Location of switch rasd2-swb0 (ip=xxx.0.0.40) was changed

Category: switch   Component: port   EventType: StateChange+
Description: [ Info/Action ] port.1 in SWITCH diag185 (ip=xxx.20.67.185) is now
Available (status-state changed from OFFLINE to ONLINE)
Information: Port on switch is now available.

Category: switch   Component: port   EventType: StateChange-   Severity: Red   Action: Y
Description: [ Info/Action ] port.1 in SWITCH diag185 (ip=xxx.20.67.185) is now
Not-Available (status-state changed from ONLINE to OFFLINE)
Information: A port on the switch has logged out of the Fabric and has gone
offline.
Recommended action:
1. Verify cables, GBICs, and connections along the Fibre Channel path.
2. Check the Storage Automated Diagnostic Environment SAN Topology GUI to
identify the failing segment of the data path.
3. Verify the correct FC switch configuration.

Category: switch   Component: enclosure   EventType: Statistics
Description: [ Info ] Statistics about switch d2-swb1 (ip=xxx.0.0.41)
10002000007a609
Information: Port Statistics
Replacing the Master Midplane
Follow this procedure when replacing the master midplane in a Sun StorEdge
network FC switch-8 or switch-16 switch or a Brocade Silkworm switch. This
procedure is detailed in the Storage Automated Diagnostic Environment User’s Guide.
▼ To Replace the Master Midplane
1. Choose Maintenance --> General Maintenance --> Maintain Devices.
Refer to Chapter 3 of the Storage Automated Diagnostic Environment User’s Guide.
2. In the Maintain Devices window, delete the device that is to be replaced.
3. Choose Maintenance --> General Maintenance --> Discovery.
4. In the Device Discovery window, rediscover the device.
5. Choose Maintenance --> Topology Maintenance --> Topology Snapshot.
a. Select the host that monitors the replaced FRU.
b. Click Create and Retrieve Selected Topologies.
c. Click Merge and Push Master Topology.
Conclusion
Any time a master midplane is replaced, you must rediscover the device using the
procedure described above. This is especially important when the Storage Service
Processor is replaced as a FRU, whether the Storage Service Processor is the master
or the slave.
CHAPTER 7
Troubleshooting Virtualization
Engine Devices
This chapter describes how to troubleshoot the virtualization engine component of a
Sun StorEdge 6900 series system. It contains the following sections:
■ “Virtualization Engine Description” on page 69
■ “Translating Host Device Names” on page 78
■ “Sun StorEdge 6900 Series Multipathing Example” on page 89
■ “Virtualization Engine Event Grid” on page 95
Virtualization Engine Description
The virtualization engine supports the multipathing functionality of the Sun
StorEdge T3+ array. Each virtualization engine has physical access to all underlying
Sun StorEdge T3+ arrays and controls access to half of the Sun StorEdge T3+ arrays.
The virtualization engine has the ability to assume control of all arrays in the event
of component failure. The configuration is maintained between virtualization engine
pairs through redundant T Port connections by way of a pair of Sun StorEdge
network FC switch-8 or switch-16 switches.
Virtualization Engine Diagnostics
The virtualization engine monitors the following components:
■ Virtualization engine router
■ Sun StorEdge T3+ array
■ Cabling between the router and the storage
Service Request Numbers
The service request numbers are used to inform the user of storage subsystem
activities.
Service and Diagnostic Codes
The virtualization engine’s service and diagnostic codes inform the user of
subsystem activities. The codes are presented as a LED readout. See Appendix A for
the table of codes and actions to take. In some cases, you might not be able to receive
Service Request Numbers (SRNs) because of communication errors. If this occurs,
you must read the virtualization engine LEDs to determine the problem.
▼ To Retrieve Service Information
You can retrieve service information in two ways:
■ CLI Interface
■ Error Log Analysis Commands
Both of these methods are described in the following sections.
CLI Interface
The SLIC daemon, which runs on the Storage Service Processor, communicates with
the virtualization engine. The SLIC daemon periodically polls the virtualization
engine for all subsystem errors and for topology changes. It then passes this
information in the form of an SRN to the Error Log file.
▼ To Display Log Files and Retrieve SRNs
Use the /opt/svengine/sduc/sreadlog command to display log files and
retrieve the Service Request Numbers (SRNs) for errors that need action. Data is
returned in the following format:
TimeStamp:nnn:Txxxxx.uuuuuuuu SRN=mmmmm
TimeStamp:nnn:Txxxxx.uuuuuuuu SRN=mmmmm
TimeStamp:nnn:Txxxxx.uuuuuuuu SRN=mmmmm
Item        Description
TimeStamp   Time and date when the error occurred
nnn         The name of the virtualization engine pair (v1 or v2)
Txxxxx      The LUN where the error occurred.
            Note: Txxxxx can represent a physical or a logical LUN.
uuuuuuuu    The unique ID of the drive or the virtualization engine router
SRN=mmmmm   The SRN, defined in numerical order
Example
# /opt/svengine/sduc/sreadlog -d v1
2002:Jan:3:10:13:05:v1.29000060-220041F9.SRN=70030
2002:Jan:3:10:13:31:v1.29000060-220041F9.SRN=70030
2002:Jan:3:10:17:10:v1.29000060-220041F9.SRN=70030
2002:Jan:3:10:17:37:v1.29000060-220041F9.SRN=70030
2002:Jan:3:10:22:26:v1.29000060-220041F9.SRN=70030
2002:Jan:3:10:25:54:v1.29000060-220041F9.SRN=70030
Item        Description
TimeStamp   2002:Jan:3:10:13:05 (time and date of the error)
nnn         v1 (virtualization engine pair v1)
uuuuuuuu    29000060-220041F9 (v1a, obtained by checking the virtualization
            engine map from the SEcfg utility)
SRN=mmmmm   SRN=70030: SAN Configuration Changed
            (Refer to Appendix A for codes.)
▼ To Clear the Log
● Use the /opt/svengine/sduc/sclrlog command.
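For example, assuming sclrlog accepts the same -d <ve-pair> option as the other
sduc commands shown in this chapter (an assumption; verify against your installed
version):

# /opt/svengine/sduc/sclrlog -d v1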
Virtualization Engine LEDs
TABLE 7-1 describes the LEDs on the back of the virtualization engine.

TABLE 7-1   Virtualization Engine LEDs

LED       Color   State        Description
Power     Green   Solid on     The virtualization engine is powered on
Status 1  Green   • Solid on   • Normal operating mode
                  • Blink      • Number of blinks indicates a service code
Fault     Amber   Solid on     Serious problem. Decipher the blinking of the
                               Status LED to determine the service code. Once you
                               have determined the service code, look up its
                               decimal number in Appendix A.

1. The Status LED blinks a service code when the Fault LED is solid on.
Power LED Codes
The virtualization engine LEDs are shown in FIGURE 7-1.
FIGURE 7-1 Virtualization Engine Front Panel LEDs
Interpreting LED Service and Diagnostic Codes
The Status LED communicates the status of the virtualization engine in decimal
numbers. Each decimal digit is represented by a number of blinks, followed by a
medium duration (two seconds) of LED off. TABLE 7-2 lists the status LED code
descriptions.

TABLE 7-2   LED Service and Diagnostic Codes

0    Fast blink
1    LED blinks once
2    LED blinks twice with one short duration (one second) between blinks
3    LED blinks three times with one short duration (one second) between blinks
...
10   LED blinks ten times with one short duration (one second) between blinks

The blink code repeats continuously, with a four-second off interval between code
sequences. For example, three blinks, a two-second pause, and then a single blink,
repeating after a four-second off interval, reads as the decimal code 31.
Back Panel Features
The back panel of the virtualization engine contains the data ports that connect
to the Sun StorEdge network FC switch-8 or switch-16 switches, a socket for the
AC power input, and various LEDs.
Ethernet Port LEDs
The Ethernet port LEDs indicate the speed, activity, and validity of the link, as
shown in TABLE 7-3.

TABLE 7-3   Speed, Activity, and Validity of the Link

LED            Color   State     Description
Speed          Amber   Solid on  The link is 100Base-TX
                       Off       The link is 10Base-T
Link Activity  Green   Solid on  A valid link is established
                       Blink     Normal operations, including data activity
Fibre Channel Link Error Status Report
The virtualization engine’s host-side and device-side interfaces provide statistical
data for the counts listed in TABLE 7-4.
TABLE 7-4   Virtualization Engine Statistical Data

Count Type               Description
Link Failure Count       The number of times the virtualization engine's frame
                         manager detects a non-operational state or other failure
                         of N_Port initialization protocol.
Loss of Synchronization  The number of times that the virtualization engine
Count                    detects a loss in synchronization.
Loss of Signal Count     The number of times that the virtualization engine's
                         frame manager detects a loss of signal.
Primitive Sequence       The number of times that the virtualization engine's
Protocol Error           frame manager detects N_Port protocol errors.
Invalid Transmission     The number of times that the virtualization engine's
Word                     8b/10b decoder does not detect a valid 10-bit code.
Invalid CRC Count        The number of times that the virtualization engine
                         receives frames with a bad CRC and a valid EOF. A valid
                         EOF includes EOFn, EOFt, or EOFdti.
▼ To Check Fibre Channel Link Error Status
Manually
The Storage Automated Diagnostic Environment, which runs on the Storage Service
Processor, monitors the Fibre Channel link status of the virtualization engine. The
virtualization engine must be power-cycled to reset the counters. Therefore, you
should manually check the accumulation of errors over a fixed period of time. To
check the status manually, follow these steps:
1. Use the svstat command to take a reading, as shown in CODE EXAMPLE 7-1.
A Status report for the host-side and device-side ports is displayed.
2. Within the next few minutes, take another reading.
The number of new errors that occurred within that time frame represents the
number of link errors.
Note – If t3ofdg(1M) is running while you perform these steps, the following
error message is displayed:
Daemon error: check the SLIC router.
CODE EXAMPLE 7-1 Fibre Channel Link Error Status Example
# /opt/svengine/sduc/svstat -d v1
I00001 Host Side FC Vital Statistics:
    Link Failure Count      0
    Loss of Sync Count      0
    Loss of Signal Count    0
    Protocol Error Count    0
    Invalid Word Count      8
    Invalid CRC Count       0
I00001 Device Side FC Vital Statistics:
    Link Failure Count      0
    Loss of Sync Count      0
    Loss of Signal Count    0
    Protocol Error Count    0
    Invalid Word Count      139
    Invalid CRC Count       0
I00002 Host Side FC Vital Statistics:
    Link Failure Count      0
    Loss of Sync Count      0
    Loss of Signal Count    0
    Protocol Error Count    0
    Invalid Word Count      11
    Invalid CRC Count       0
I00002 Device Side FC Vital Statistics:
    Link Failure Count      0
    Loss of Sync Count      0
    Loss of Signal Count    0
    Protocol Error Count    0
    Invalid Word Count      135
    Invalid CRC Count       0
diag.xxxxx.xxx.com: root#

Note – v1 represents the first virtualization engine pair.

Note – The SLIC daemon must be running for the
/opt/svengine/sduc/svstat -d v1 command to work.
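A minimal sketch of Steps 1 and 2: capture two readings and compare them, so that
only the counters that changed within the interval are shown (the five-minute
sleep is illustrative):

# /opt/svengine/sduc/svstat -d v1 > /tmp/svstat.1
# sleep 300
# /opt/svengine/sduc/svstat -d v1 > /tmp/svstat.2
# diff /tmp/svstat.1 /tmp/svstat.2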
Translating Host Device Names
You can translate host device names to VLUN, disk pool, and physical Sun StorEdge
T3+ array LUNs.
The luxadm output for a host device, shown in CODE EXAMPLE 7-2, does not include
the unique VLUN serial number that is needed to identify this LUN.
CODE EXAMPLE 7-2 luxadm Output for a Host Device
# luxadm display /dev/rdsk/c4t2B00006022004186d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/c4t2B00006022004186d0s2
  Status(Port A):         O.K.
  Vendor:                 SUN
  Product ID:             SESS01
  WWN(Node):              2a00006022004186
  WWN(Port A):            2b00006022004186
  Revision:               080E
  Serial Num:             Unsupported
  Unformatted capacity:   56320.000 MBytes
  Write Cache:            Enabled
  Read Cache:             Enabled
  Minimum prefetch:       0x0
  Maximum prefetch:       0x0
  Device Type:            Disk device
  Path(s):
  /dev/rdsk/c4t2B00006022004186d0s2
  /devices/pci@1f,4000/pci@2/SUNW,qlc@5/fp@0,0/
  ssd@w2b00006022004186,0:c,raw
▼ To Display the VLUN Serial Number
Devices That Are Not Sun StorEdge Traffic Manager-Enabled
1. Use the format -e command.
2. Type the disk on which you are working at the format prompt.
3. Type inquiry at the scsi prompt.
4. Find the VLUN serial number in the displayed Inquiry list.
# format -e c4t2B00006022004186d0
format> scsi
...
scsi> inquiry
Inquiry:
00 00 03 12 2b 00 00 02 53 55 4e 20 20 20 20 20   ....+...SUN
53 45 53 53 30 31 20 20 20 20 20 20 20 20 20 20   SESS01
30 38 30 45 62 57 33 4b 30 30 31 48 30 30 30      080EbW3K001H000
Vendor:            SUN
Product:           SESS01
Revision:          080E
Removable media:   no
Device type:       0
From this screen, note that the VLUN serial number is 62 57 33 4b 30 30 31 48,
beginning with the 5th pair of numbers on the 3rd line, up to and including the
12th pair. (These bytes are the hexadecimal encoding of the ASCII string
bW3K001H, which is also visible in the right-hand column of the inquiry output.)
Sun StorEdge Traffic Manager-Enabled Devices
If the devices support the Sun StorEdge Traffic Manager software, you can use the
following shortcut.
● Type:
# luxadm display /dev/rdsk/c6t29000060220041956257334B30303148d0s2
DEVICE PROPERTIES for disk: /dev/rdsk/
c6t29000060220041956257334B30303148d0s2
  Status(Port A):         O.K.
  Status(Port B):         O.K.
  Vendor:                 SUN
  Product ID:             SESS01
  WWN(Node):              2a00006022004195
  WWN(Port A):            2b00006022004195
  WWN(Port B):            2b00006022004186
  Revision:               080E
  Serial Num:             Unsupported
  Unformatted capacity:   56320.000 MBytes
  Write Cache:            Enabled
  Read Cache:             Enabled
  Minimum prefetch:       0x0
  Maximum prefetch:       0x0
  Device Type:            Disk device
  Path(s):
  /dev/rdsk/c6t29000060220041956257334B30303148d0s2
  /devices/scsi_vhci/ssd@g29000060220041956257334b30303148:c,raw
   Controller           /devices/pci@1f,4000/SUNW,qlc@4/fp@0,0
    Device Address      2b00006022004195,0
    Class               primary
    State               ONLINE
   Controller           /devices/pci@1f,4000/pci@2/SUNW,qlc@5/fp@0,0
    Device Address      2b00006022004186,0
    Class               primary
    State               ONLINE
The c#t# portion of the /dev/rdsk device name is the Global Unique Identifier of
the device. It is 32 hexadecimal digits long:
■ The first 16 digits correspond to the WWN of the master virtualization engine
router.
■ The remaining 16 digits are the VLUN serial number.
In this example:
Virtualization engine WWN = 2900006022004195
VLUN serial number = 6257334B30303148
▼ To View the Virtualization Engine Map
The virtualization engine map is stored on the Storage Service Processor.
1. To view the virtualization engine map, type:
# showvemap -n v1 -f
VIRTUAL LUN SUMMARY
Disk pool  VLUN Serial       MP Drive  VLUN    VLUN     Size  Slic Zones
           Number            Target    Target  Name     GB
---------------------------------------------------------------------------
t3b00      6257334B30303148  T49152    T16384  VDRV000  55.0
t3b00      6257334B30303149  T49152    T16385  VDRV001  55.0
*****
DISK POOL SUMMARY
Disk pool  RAID  MP Drive  Size   Free Space  T3+ Active        Number of
                 Target    GB     GB          Path WWN          VLUNs
-----------------------------------------------------------------------
t3b00      5     T49152    116.7  6.7         50020F2300006DFA  2
t3b01      5     T49153    116.7  116.7       50020F230000725B  0
*****
MULTIPATH DRIVE SUMMARY
Disk pool  MP Drive  T3+ Active        Controller Serial
           Target    Path WWN          Number
-------------------------------------------------------
t3b00      T49152    50020F2300006DFA  60020F2000006DFA
t3b01      T49153    50020F230000725B  60020F2000006DFA
*****
VIRTUALIZATION ENGINE SUMMARY
Initiator  UID               VE Host  Online  Revision  Number of SLIC Zones
--------------------------------------------------------------------------
I00001     2900006022004195  v1a      Yes     08.14     0
I00002     2900006022004186  v1b      Yes     08.14     0
*****
ZONE SUMMARY
Zone Name  HBA WWN           Initiator  Online  Number of VLUNs
---------------------------------------------------------------------
Undefined  210000E08B033401  I00001     Yes     0
Undefined  210000E08B026C0F  I00002     Yes     0
Note – This example uses the virtualization engine map file, which could include
old information.
2. You can optionally establish a telnet connection to the virtualization engine
and run the runsecfg utility to poll a live snapshot of the virtualization engine
map. Refer to “To Replace a Failed Virtualization Engine” on page 84 for telnet
instructions.
Determining the virtualization engine pairs on the system .........
MAIN MENU - SUN StorEdge 6910 SYSTEM CONFIGURATION TOOL
1) T3+ Configuration Utility
2) Switch Configuration Utility
3) Virtualization Engine Configuration Utility
4) View Logs
5) View Errors
6) Exit
Select option above:> 3
VIRTUALIZATION ENGINE MAIN MENU
1) Manage VLUNs
2) Manage Virtualization Engine Zones
3) Manage Configuration Files
4) Manage Virtualization Engine Hosts
5) Help
6) Return
Select option above:> 3
MANAGE CONFIGURATION FILES MENU
1) Display Virtualization Engine Map
2) Save Virtualization Engine Map
3) Verify Virtualization Engine Map
4) Help
5) Return
Select configuration option above:> 1
Do you want to poll the live system (time consuming) or view the file [l|f]: l
From the virtualization engine map output, you can match the VLUN serial number
to the VLUN name (VDRV000), the disk pool (t3b00) and the MP drive target
(T49152). This information can also help you find the controller serial number
(60020F2000006DFA), which you need to perform Sun StorEdge T3+ array LUN
failback commands.
▼ To Failback the Virtualization Engine
In the event of a Sun StorEdge T3+ array LUN failover, use the following procedure
to fail the LUN back to its original controller.
1. From the Storage Service Processor, type:
# /opt/svengine/sduc/mpdrive failback -d v1 -j 60020F2000006DFA
where:
-d   The virtualization engine pair on which to run the command
-j   The controller serial number, which corresponds to the Sun StorEdge T3+
     array WWN of the affected partner pair
The failback command is always performed on the controller serial number,
regardless of which controller (the Master or the Alt-Master) currently owns the
LUN. All VLUNs are affected by a failover and failback of the underlying physical
LUN.
The controller serial number is the system WWN for the Sun StorEdge T3+ array. In
this example, the active path WWN is 50020F2300006DFA, and the number used in
the failback command is 60020F2000006DFA.
2. The SLIC daemon must be running for the mpdrive failback command to work.
Ensure that the SLIC daemon is running by using the command shown in
CODE EXAMPLE 7-3.
If no SLIC processes are running, you can start them manually by running the
SUNWsecfg script /opt/SUNWsecfg/bin/startslicd -n v1.
CODE EXAMPLE 7-3 slicd Output Example
# ps -ef | grep slic
    root  6299  6295  0   Jan 04 ?        0:00 ./slicd
    root  6296  6295  0   Jan 04 ?        0:02 ./slicd
    root  6295     1  0   Jan 04 ?        0:01 ./slicd
    root  6357  6295  0   Jan 04 ?        0:00 ./slicd
    root  6362  6295  0   Jan 04 ?        0:03 ./slicd
For detailed information about the SUNWsecfg scripts, refer to the Sun StorEdge
3900 and 6900 Series Reference Manual.
▼ To Replace a Failed Virtualization Engine
1. Replace the old (failed) virtualization engine unit with a new unit.
2. Identify the MAC address of the new unit and replace the old MAC address with
the new one in the /etc/ethers file:
8:0:20:7d:82:9e virtualization engine-name
3. Verify that RARP is running on the Storage Service Processor.
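One way to verify this (a sketch; the Solaris RARP daemon is in.rarpd):

# ps -ef | grep in.rarpd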
4. Disable the switch port:
# /opt/SUNWsecfg/flib/setveport -v VE-name -d
5. Power on the new unit.
6. Log in to the new unit, for example:
# telnet v1a
7. From the User Service Utility Menu, enter 9 to clear the SAN database.
8. Choose Quit to clear the SAN database.
9. Configure the new unit:
# setupve -n virtualization engine-name
10. Check the configuration:
# checkve -n virtualization engine-name
11. Enable the switch port:
# /opt/SUNWsecfg/flib/setveport -v virtualization engine-name -e
12. Reset the virtualization engine:
# resetve -n virtualization engine-name
13. Find the initiator numbers for the new and old units:
# showvemap -n virtualization engine-pairname -l
The new unit will not have any zones defined.
14. If zones were present before the replacement, type the following:
# restorevemap -n virtualization engine pair -z \
-c old-ve-initiator-number -d new-ve-initiator-number
15. Verify the new unit by typing:
# showvemap -n virtualization engine-pairname -l
▼ To Manually Clear the SAN Database
It is occasionally necessary to manually clear the SAN database on the
virtualization engine routers.
Caution – This procedure wipes out the SAN database and removes the
configuration of disk pools, multipath drives, zoning, and VLUNs. After performing
this procedure, the virtualization map must be restored to the virtualization
engine pair using /opt/SUNWsecfg/bin/restorevemap. This requires a valid copy
of the /opt/SUNWsecfg/etc/v1.san or v2.san file.
▼ To Reset the SAN Database on Both
Virtualization Engines
● Type:
# resetsandb -n ve-pair
▼ To Reset the SAN Database on a Single
Virtualization Engine
1. Disconnect the virtualization engine device side FC cables.
2. Telnet to the first virtualization engine in the pair.
3. Enter the password.
The User Service Utility Menu is displayed.
4. Enter 9 to clear the SAN database.
■ A successful command displays the message:
SAN database has been cleared!
■ An unsuccessful command results in service code 051. If this occurs, repeat
Steps 1 through 3.
■ If the command continues to fail, replace the virtualization engine.
5. Reconnect the virtualization engine device side FC cables.
6. Enter B to warm reboot both virtualization engines.
Stopping and Restarting the SLIC Daemon
Follow this procedure to restart the SLIC daemon if it becomes unresponsive, or if
messages such as the following are displayed:
connect: Connection refused or Socket error encountered.
▼ To Restart the SLIC Daemon
1. Check whether the SLIC daemon is running:
# ps -eaf | grep slicd
2. Check for any message queues, shared memory, or semaphores still in use:
# ipcs
IPC status from <running system> as of Wed Feb 20 12:48:30 MST 2002
T         ID         KEY         MODE         OWNER   GROUP
Message Queues:
Shared Memory:
m          0  0x50000483  --rw-r--r--   root    root
m        301  0x5555aa8a  --rw-------   root    other
m        302  0x5555aaaa  --rw-------   root    other
m        303  0x5555aaba  --rw-------   root    other
m          4  0x7cc       --rw-------   root    root
Semaphores:
s     196608  0x5555aa9a  --ra-------   root    other
s     196609  0x5555aa7a  --ra-------   root    other
s     196610  0x5555aaba  --ra-------   root    other
s          3  0x10e1      --ra-------   root    root
Segments identified with 0x5555aa in the address are associated with the SLIC
daemon.
3. Remove the segments by typing the following:
# ipcrm -m 301 -m 302 -m 303 -s 196608 -s 196609 -s 196610
Check the ipcrm(1M) man page for details.
4. Restart the SLIC daemon:
# /opt/SUNWsecfg/bin/startslicd -n v1
(or v2, depending on the configuration)
5. Confirm that the SLIC daemon is running:
# ps -eaf | grep slicd
    root 16130     1  0 11:45:00 ?        0:00 ./slicd
    root 16131 16130  0 11:45:00 ?        0:00 ./slicd
    root 16132 16130  0 11:45:00 ?        0:00 ./slicd
    root 16135 16130  0 11:45:00 ?        0:00 ./slicd
    root 16143 16130  0 11:45:00 ?        0:00 ./slicd
    root 16189 15877  0 11:48:49 pts/1    0:00 grep slicd
The message queues, shared memory, and semaphores have been removed.
Sun StorEdge 6900 Series Multipathing
Example
This example assumes one Sun StorEdge T3+ array partner pair with one 500GB
RAID 5 LUN per brick (2 LUNs total).
Currently, there is one 10GB VLUN created from each physical LUN, for a total of
two VLUNs. In a Sun StorEdge 6900 series, there are four possible physical paths to
each Sun StorEdge T3+ array Volume (LUN). Refer to FIGURE 7-4 and FIGURE 7-3.
For example, to access the LUN on the Alt-Master, the Sun StorEdge T3+ array I/ O
could travel:
■ From HBA-0 -> Switch -> SVE(1) -> Switch -> Alt-Master Controller (Primary
Route from HBA-0)
■ From HBA-0 -> Switch -> SVE(1) -> Switch -> Switch -> Master Controller ->
Backend Loop to Alt-Master (Secondary Route from HBA-0)
■ From HBA-1 -> Switch -> SVE(2) -> Switch -> Switch -> Alt-Master Controller
(Primary Route from HBA-1)
■ From HBA-1 -> Switch -> SVE(2) -> Switch -> Master Controller -> Backend Loop
to Alt-Master (Secondary Route from HBA-1)
The virtualization engine recognizes the primary (active) and secondary (passive)
pathing for the LUNs and routes the I/O to the primary controller, unless there is
a failure on the primary path. In this case, the virtualization engine initiates a
failover and routes the I/O to the secondary path (through the interconnect
cables). Refer to FIGURE 7-6.
The host, using multipathing software, is presented two primary (active) paths for
each LUN, allowing the host to route I/ O through either or both HBAs.
In the event of a path failure before the second tier of Sun StorEdge network FC
switch-8 and switch-16 switches (refer to FIGURE 7-5), one of the paths is disabled,
but the other path continues sending I/ O as normal and takes over the entire load.
No Sun StorEdge T3+ array LUN failure is noted because of the redundant path by
way of the Sun StorEdge network FC switch-8 and switch-16 switch T Ports.
In the event of a path failure after the second tier of Sun StorEdge network FC
switch-8 and switch-16 switches (or in the event of both T Ports failing between the
switches), the virtualization engines force a LUN failover of the affected Sun
StorEdge T3+ array and routes all I/ O to its secondary path. From the host side,
nothing has changed; all I/ O is routed through both HBAs (refer to FIGURE 7-6).
FIGURE 7-2 Sun StorEdge 6900 Series Logical View
Virtualization Engine Event Grid
The Storage Automated Diagnostic Environment Event Grid enables you to sort
virtualization engine events by component, category, or event type. The Storage
Automated Diagnostic Environment GUI displays an event grid that describes the
severity of the event, whether action is required, a description of the event, and the
recommended action. Refer to the Storage Automated Diagnostic Environment User’s
Guide Help section for more information.
▼ Using the Virtualization Engine Event Grid
1. From the Storage Automated Diagnostic Environment Help menu, click the Event
Grid link.
2. Select the criteria from the Storage Automated Diagnostic Environment event
grid, like the one shown in FIGURE 7-7.
FIGURE 7-7 Virtualization Engine Event Grid
TABLE 7-5 lists the virtualization engine events.

TABLE 7-5   Storage Automated Diagnostic Environment Event Grid for Virtualization Engine

Category: virtualization engine   Component: enclosure   EventType: Alarm   Severity: Yellow
Description: Volume E00012 on v1a changed mapping.

Category: virtualization engine   Component: enclosure   EventType: Alarm.log   Severity: Yellow
Description: Change in Port Statistics on virtualization engine v1a

Category: virtualization engine   Component: enclosure   EventType: Audit
Description: [ Info ] Auditing a Virtualization Engine called v1a
Information: Audits occur every week and send a detailed description of the
enclosure to the Sun Network Storage Command Center (NSCC).

Category: virtualization engine   Component: oob   EventType: Comm_Established
Description: Communication regained with virtualization engine v1a

Category: virtualization engine   Component: oob   EventType: Comm_Lost   Severity: Down   Action: Y
Description: [ Info/Action ] Lost communication with virtualization engine v1a
Information: Ethernet connectivity to the virtualization engine unit has been
lost.
Recommended action:
1. Check Ethernet connectivity to the virtualization engine.
2. Make sure the virtualization engine is booted correctly.
3. Verify that the TCP/IP settings on the virtualization engine are correct.
4. Replace the virtualization engine if necessary.

Category: virtualization engine   Component: ve_diag   EventType: Diagnostic Test-   Severity: Red
Description: ve_diag (diag240) on ve-1 (ip=xxx.20.67.213) failed

Category: virtualization engine   Component: veluntest   EventType: Diagnostic Test-   Severity: Red
Description: veluntest (diag240) on ve-1 (ip=xxx.20.67.213) failed

Category: virtualization engine   Component: enclosure   EventType: Discovery
Description: [ Info ] Discovered a new Virtualization Engine called v1a
Information: Discovery events occur the first time the agent probes a storage
device and creates a detailed description of the device monitored. The discovery
event is sent using any active notifier, such as NetConnect or email.
CHAPTER 8
Troubleshooting the Sun StorEdge
T3+ Array Devices
This chapter contains the following sections:
■ “Explorer Data Collection Utility” on page 99
■ “Sun StorEdge T3+ Array Event Grid” on page 109
Explorer Data Collection Utility
The Explorer Data Collection Utility script is included on the Storage Service
Processor in the /export/packagesdirectory.
The Explorer Data Collection Utility is not installed by default, but can be installed
during rack setup. Customer-specific site information can be entered at that time.
▼ To Install Explorer Data Collection Utility on the
Storage Service Processor
# cd /export/packages
# pkgadd -d . SUNWexplo
As part of the installation procedure, you will be asked to enter site-specific
information. You can optionally press Return to accept the blank defaults.
Do not accept automatic emailing of the Explorer Data Collection Utility output,
unless the Storage Service Processor is properly set up to handle mail correctly.
Automatic Email Submission
Would you like all explorer output to be sent to:
at the completion of explorer when -mail or -e is specified?
[y,n] n
Before running the Explorer Data Collection Utility, make sure that the switch and
Sun StorEdge T3+ array information is added to the proper files in
/opt/SUNWexplo/etc.
Example
1. Type switch information into the /opt/SUNWexplo/etc/saninput.txt file.
Edit the file with a text editor such as vi.
CODE EXAMPLE 8-1 Editing Switch Information Using vi
# vi saninput.txt
# Input file for extended data collection
# Format is SWITCH SWITCH-TYPE PASSWORD LOGIN
# Valid switch types are ancor and brocade
# LOGIN is required for brocade switches, the default is admin
sw1a ancor
sw1b ancor
sw2a ancor
sw2b ancor
:wq!
2. Type Sun StorEdge T3+ array information into the
/opt/SUNWexplo/etc/t3input.txt file. Edit the file with a text editor such as vi.
3. Type the password for your specific site.
CODE EXAMPLE 8-2 Editing Sun StorEdge T3+ Array Information Using vi
# vi t3input.txt
# Input file for extended data collection
# Format is HOST PASSWORD
t3b0 XXXX
t3b2 XXXX
t3b3 XXXX
:wq!
Note – xxxx represents Sun StorEdge T3+ array passwords.
■ You can now run /opt/SUNWexplo/bin/explorer to collect information about
the Storage Service Processor operating system, the Sun StorEdge network FC
switch-8 or switch-16 switch, and the Sun StorEdge T3+ array, which can be used
for troubleshooting purposes. (See the example following this list.)
■ A tar/gzip file will be put into the /opt/SUNWexplo/output directory. The
tar/gzip file can be sent to Sun Service for evaluation.
■ The Sun StorEdge network FC switch-8 and switch-16 switch information will be
placed in the san directory of the tar file.
■ Sun StorEdge T3+ array information will be placed in the disks/t3 directory.
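A minimal sketch of a collection run, invoked here with no options (the -mail and
-e options mentioned earlier control automatic emailing):

# /opt/SUNWexplo/bin/explorer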
Troubleshooting the T1/T2 Data Path

Notes
■ There are two T Port links for redundancy.
■ If one of the two links is lost, no Sun StorEdge T3+ array LUN failover occurs,
and no pathing failures are noted.
■ If both T Port links fail, a Sun StorEdge T3+ array LUN failover occurs: one of
the virtualization engines takes control of the I/O operations, and all I/O is
routed to the controlling virtualization engine.
■ The host will notice a pathing failure in its multipathing software, as the sketch
following these notes illustrates.
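On a Sun StorEdge 6900 series data host running VERITAS DMP, one way to confirm the pathing failure is to inspect the path states of an affected disk. This is a hedged sketch; the device names and output are illustrative:

# vxdisk list c6t2d0s2
...
numpaths: 2
c6t2d0s2        state=enabled
c7t2d0s2        state=disabled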
T1/T2 Notification Events

The example below shows a typical port failure event.
Site     : Lab 3286 - DSQA1 Broomfield
Source   : diag.xxxxx.xxx.com
Severity : Error (Actionable)
Category : Switch
DeviceId : switch:100000c0dd00b682
EventType: StateChangeEvent.M.port.8
EventTime: 01/30/2002 11:17:22
’port.8’ in SWITCH diag209-sw2a (ip=192.168.0.32) is now Not-Available
(status-state changed from ’Online’ to ’Offline’):
INFORMATION:
A port on the switch has logged out of the fabric and gone offline
PROBABLE-CAUSE:
1. Verify cables, GBICs and connections along Fibre Channel path
2. Check Storage Automated Diagnostic Environment SAN Topology GUI to
identify failing segment of the data path
3. Verify correct FC switch configuration
----------------------------------------------------------------------
Site     : Lab 3286 - DSQA1 Broomfield
Source   : diag.xxxxx.xxx.com
Severity : Warning
Category : Switch
DeviceId : switch:100000c0dd00b682
EventType: LogEvent.MessageLog
EventTime: 01/30/2002 11:17:22
Change in Port Statistics on switch diag209-sw2a (ip=192.168.0.32):
Port-8: Received 9746 ’InvalidTxWds’ in 0 mins (value=9805 )
FIGURE 8-1 Storage Service Processor Event
If both T Ports go offline, you might see messages like the following. Note the
virtualization engine event reporting the LUN failover.
Site     : Lab 3286 - DSQA1 Broomfield
Source   : diag.xxxxx.xxx.com
Severity : Warning (Actionable)
Category : Ve
DeviceId : ve:6257335A-30303142
EventType: AlarmEvent.volume
EventTime: 01/30/2002 11:49:05
Volume T49152 on diag209-v1a changed from 6257335A-30303142(active=50020F23-
00006DFA,passive=) to 6257335A-30303142(active=50020F23-
00006DFA,passive=50020F23-0000725B)
INFORMATION:
This event occurs when the virtualization engine has detected a
change in status for a Multipath Drive or VLUN,
usually meaning a pathing problem to a Sun StorEdge T3+ array controller
1. Check for changes in Active/Passive paths
2. Check Sun StorEdge T3+ array for current LUN ownership. (‘port listmap‘)
3. Use ‘mpdrive failback‘, if needed, to fail LUNs back to the correct controller
----------------------------------------------------------------------
Site     : Lab 3286 - DSQA1 Broomfield
Source   : diag.xxxxx.xxx.com
Severity : Warning
Category : Message
DeviceId : message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.SSD_WARN
EventTime: 01/30/2002 11:50:07
Found 1 ’driver.SSD_WARN’ warning(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=809f76b4):
INFORMATION:
SSD warnings
Jan 30 11:49:48 WWN: Received 7 'SSD Warning' message(s) on 'ssd56' in 8
mins [threshold is 5 in 24hours]
Last-Message: 'diag.xxxxx.xxx.com scsi:
[ID 243001 kern.warning] WARNING: /scsi_vhci/
ssd@g29000060220041956257335a30303145 (ssd56): '
----------------------------------------------------------------------
Site     : Lab 3286 - DSQA1 Broomfield
Source   : diag.xxxxx.xxx.com
Severity : Warning
Category : Message
DeviceId : message:diag.xxxxx.xxx.com
EventType: LogEvent.driver.Fabric_Warning
EventTime: 01/30/2002 11:50:07
Found 1 ’driver.Fabric_Warning’ warning(s) in logfile: /var/adm/messages on
diag.xxxxx.xxx.com (id=809f76b4):
INFORMATION:
Fabric warning
Jan 30 11:46:37 WWN:2b00006022004186
diag.xxxxx.xxx.com fp: [ID 517869
kern.warning] WARNING: fp(2): N_x Port with D_ID=108000,
PWWN=2b00006022004186 reappeared in fabric ( in backup:diag.xxxxx.xxx.com)
----------------------------------------------------------------------
Site     : Lab 3286 - DSQA1 Broomfield
Source   : diag.xxxxx.xxx.com
Severity : Warning (Actionable)
Category : Host
DeviceId : host:diag.xxxxx.xxx.com
EventType: AlarmEvent.P.hba
EventTime: 01/30/2002 11:50:10
status of hba /devices/pci@1f,4000/pci@2/SUNW,qlc@5/fp@0,0:devctl on
diag.xxxxx.xxx.com changed from NOT CONNECTED to CONNECTED
INFORMATION:
monitors changes in the output of luxadm -e port
FIGURE 8-2 Virtualization Engine Alert
Sun StorEdge T3+ Array Storage Service Processor
Verification
1. Run port listmap on the Sun StorEdge T3+ array to see the failover event.
t3b0:/:<1> port listmap
port   targetid   addr_type   lun   volume   owner   access
u1p1   0          hard        0     vol1     u1      primary
u1p1   0          hard        1     vol2     u1      failover
u2p1   1          hard        0     vol1     u1      failover
u2p1   1          hard        1     vol2     u1      primary
2. Compare the virtualization engine configuration to a saved configuration by
running /opt/SUNWsecfg/runsecfg and choosing Verify Virtualization Engine
Map.
The output is from the diff(1) command, which shows the lines that have been
added, changed, or deleted. Notice that the active Sun StorEdge T3+ array controller
WWN has changed for one of the Sun StorEdge T3+ arrays, indicating it is using its
alternate path.
MANAGE CONFIGURATION FILES MENU
1) Display Virtualization Engine Map
2) Save Virtualization Engine Map
3) Verify Virtualization Engine Map
4) Help
5) Return
Select configuration option above:> 3
Verifying Virtualization Engine map for v1........
ERROR: virtualization engine map for v1 has changed.
18c18
< t3b01   5   T49153   116.7   0.7   50020F230000725B   1   1
> t3b01   5   T49153   116.7   0.7   50020F2300006DFA   1   1
28c28
< t3b01   T49153   50020F230000725B 60020F2000006DFA
> t3b01   T49153   50020F2300006DFA 60020F2000006DFA
37c37
< I00002   2900006022004186 v1b       Yes   08.14     0   0
> I00002   2900006022004186 Unknown   No    Unknown   0   0
46d45
< Undefined   210000E08B026C0F I00002   Yes   0
checkvemap: virtualization engine map v1 verification complete: FAIL.
FIGURE 8-3 Manage Configuration Files Menu
T1/T2 FRU Tests Available
■ Switch - switchtest
■ Link - linktest

Running linktest from the Storage Automated Diagnostic Environment GUI
guides the service engineer in discovering the failed FRU.
Once the test has completed its run, an email message similar to the following
is sent to the email recipient that was specified in linktest.
running on diag.xxxxx.xxx.com
linktest started on FC interconnect: switch to switch
switchtest started on switch 100000c0dd00b682 port 8
Estimated test time 14 minute(s)
01/30/02 11:21:26 diag209 Storage Automated Diagnostic Environment: MSGID
6013 switchtest.FATAL
switch0: "Device: Switch Port: 8 is Offline"
switchtest failed
Remove FC Cable from switch: 100000c0dd00b682, port: 8
Insert FC loopback cable into switch: 100000c0dd00b682, port: 8
Continue Isolation ?
switchtest started on switch 100000c0dd00b682 port 8
Estimated test time 14 minute(s)
01/30/02 11:22:11 diag209 Storage Automated Diagnostic Environment: MSGID
6013 switchtest.FATAL
switch0: "Device: Switch Port: 8 is Offline"
switchtest failed
Remove FC loopback cable from switch: 100000c0dd00b682, port: 8
Insert a NEW FC GBIC into switch: 100000c0dd00b682, port: 8
Insert FC loopback cable into switch: 100000c0dd00b682, port: 8
Continue Isolation ?
switchtest started on switch 100000c0dd00b682 port 8
Estimated test time 14 minute(s)
01/30/02 11:25:12 diag209 Storage Automated Diagnostic Environment: MSGID
4001 switchtest.WARNING
switch0: "Maximum transfer size for a FABRIC port is 200. Changing transfer
size 2000 to 200"
switchtest completed successfully
Remove FC loopback cable from switch: 100000c0dd00b682, port: 8
Restore ORIGINAL FC Cable into switch: 100000c0dd00b682, port: 8
Suspect ORIGINAL FC GBIC in switch: 100000c0dd00b682, port: 8
Retest to verify FRU replacement.
linktest completed on FC interconnect: switch to switch
FIGURE 8-4 Example Link Test Text Output from the Storage Automated Diagnostic
Environment
Notes
■ When inserting a loopback connector into the T Port, there will be NO green light
indicating a proper insertion. However, the test will run and be valid. There is
currently an RFE to address this issue.
■ If only one of the links has failed and the I/O is traveling over the remaining
link, once the failed link is replaced and recabled, I/O is automatically
routed over the repaired link by the switch. No manual intervention is required.
■ If both links have failed and a LUN failover has occurred, after repairing and
recabling the links, the user must manually perform an 'mpdrive failback'
to return the paths to their optimal state. I/O then resumes as normal over
the T Ports. A sketch of confirming the resumed I/O follows these notes.
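To confirm that I/O has resumed over the repaired T Ports, standard Solaris I/O monitoring on the data host is one option. This is a sketch; the interval and the devices reported will vary by site:

# iostat -xnz 5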
T1/T2 Isolation Procedures
1. Run linktest from the Storage Automated Diagnostic Environment for a guided
isolation procedure.
2. After replacing the failed FRU, run mpdrive failback, if needed.
Sun StorEdge T3+ Array Event Grid
The Storage Automated Diagnostic Environment Event Grid enables you to sort Sun
StorEdge T3+ array events by component, category, or event type. The Storage
Automated Diagnostic Environment GUI displays an event grid that describes the
severity of the event, whether action is required, a description of the event, and the
recommended action. Refer to the Storage Automated Diagnostic Environment User’s
Guide for more information.
1. From the Storage Automated Diagnostic Environment Help menu, click the Event
Grid link.
2. Select the criteria from the Storage Automated Diagnostic Environment event
grid, like the one shown in FIGURE 8-5.
FIGURE 8-5 Sun StorEdge T3+ array Event Grid
The following table lists all of the events for the Sun StorEdge T3+ array.
Each event below is listed with its Category, Component, EventType, severity (Sev), and whether action is required (Action), followed by its Description and Information columns.

Category: t3   Component: power.temp   EventType: Alarm+
Description: The state of power.u1pcu1.PowTemp on diag213 (ip=xxx.20.67.213) is Normal

Category: t3   Component: disk.port   EventType: Alarm-   Sev: Red   Action: Y
Description: [ Info/Action ] The state of disk.u1d1.Port1State on Sun StorEdge T3+ array t300 changed from OK to failed.
Information: The Sun StorEdge T3+ array has reported that one port of a dual-ported disk has failed.
Recommended action:
1. Telnet to affected Sun StorEdge T3+ array.
2. Verify disk state in fru stat, fru list, and vol stat.

Category: t3   Component: interface.loopcard.cable   EventType: Alarm-   Sev: Red   Action: Y
Description: [ Info/Action ] The state of loopcable.u1l1.CableState changed from OK to failed.
Information: The Sun StorEdge T3+ array has reported that a loopcard is in a failed state.
Recommended action:
1. Telnet to affected Sun StorEdge T3+ array.
2. Verify the loopcard state with fru stat.
3. Verify the matching firmware with the other loopcard.
4. Re-enable the loopcard if possible (enable u (encid)|[1|2]). Replace loopcard if necessary.
5. Re-enable the disk if possible.
6. Replace the disk, if necessary.

Drive Status Messages:
Value   Description
0       Drive mounted
2       Drive present
3       Drive is spun up
4       Drive is disabled
5       Drive has been replaced
7       Invalid system area on drive
9       Drive not present
D       Drive disabled; drive is being reconstructed
S       Drive substituted

Category: t3   Component: power.battery   EventType: Alarm-   Sev: Red   Action: Y
Description: [ Info/Action ] The state of power.u1pcu1.BatState on diag213 (ip=xxx.20.67.213) is Fault
Information: The state of the batteries in the Sun StorEdge T3+ array is not optimal. Possible causes are:
1. Voltage level on power supply and battery have moved out of acceptable thresholds.
2. The internal PCU temp has exceeded acceptable thresholds.
3. A PCU fan has failed.
Recommended action:
1. Telnet to the affected Sun StorEdge T3+ array.
2. Run refresh -s to verify the battery state.
3. Replace the battery, if necessary.

Category: t3   Component: power.fan   EventType: Alarm-   Sev: Red   Action: Y
Description: [ Info/Action ] The state of power.u1pcu1.Fan1State on diag213 (ip=xxx.20.67.213) is Fault
Information: The state of a fan on the Sun StorEdge T3+ array is not optimal.
Recommended action:
1. Telnet to affected Sun StorEdge T3+ array.
2. Verify the fan state with fru stat.
3. Replace the power cooling unit, if necessary.

Category: t3   Component: power.output   EventType: Alarm-   Sev: Red   Action: Y
Description: [ Info/Action ] The state of power.u1pcu1.PowOutput on diag213 (ip=xxx.20.67.213) is Fault
Information: The state of the power in the Sun StorEdge T3+ array power cooling unit is not optimal.
Recommended action:
1. Telnet to affected Sun StorEdge T3+ array.
2. Verify power cooling unit state in fru stat.
3. Replace PCU, if necessary.

Category: t3   Component: power.temp   EventType: Alarm-   Sev: Red   Action: Y
Description: [ Info/Action ] The state of power.u1pcu1.PowTemp on diag213 (ip=xxx.20.67.213) is Fault
Information: The state of the temperature in the Sun StorEdge T3+ array power cooling unit is either too high or is unknown.
Recommended action:
1. Telnet to the affected Sun StorEdge T3+ array.
2. Verify the power cooling unit state in fru stat.
3. Replace the PCU, if necessary.

Category: t3   Component: enclosure   EventType: Alarm.log   Sev: Red   Action: Y
Description: [ Info/Action ] Errors(s) found in logfile: /var/adm/messages.t3
Information: This event includes all important errors found.
Recommended action: Check the messages file for appropriate action.

Category: t3   Component: enclosure   EventType: Alarm.time   Sev: Yellow
Description: [ Action ] Time of T3 diag213 (ip=xxx.20.67.213) is different from host: T3=Fri Oct 26 10:16:17 2001, Host=2001-10-26 12:21:04
Information: Discrepancy.
Recommended action: Fix the date and time on the Sun StorEdge T3+ array using the date command. Date and time should be the same as the monitoring host.

Category: t3   Component: enclosure   EventType: Audit
Description: [ Info ] Auditing a new Sun StorEdge T3+ array called rasd2-t3b1 (ip=xxx.0.0.41) slr-mi.370-3990-01-e-e1.003239
Information: Audits occur every week and send a detailed description of the enclosure to the Sun Network Storage Command Center (NSCC).

Category: t3   Component: ib   EventType: Comm_Established
Description: [ Info ] Communication regained (InBand(ccadieux)) with diag213 (ip=xxx.20.67.213) (last reboot was 2001-09-27 15:22:00)
Information: InBand communication.

Category: t3   Component: oob   EventType: Comm_Established
Description: [ Info ] Communication regained (OutOfBand) with diag213 (ip=xxx.20.67.213)
Information: OutOfBand communications.

Category: t3   Component: ib   EventType: Comm_Lost   Sev: Down   Action: Y
Description: [ Info/Action ] Lost communication (InBand) with diag213 (ip=xxx.20.67.213) (last reboot was 2001-09-27 15:22:00)
Information: InBand. This event is established using luxadm. This monitoring may not be activated for a particular Sun StorEdge T3+ array.
Recommended action:
1. Verify luxadm via command line (luxadm probe, luxadm display).
2. Verify cables, GBICs and connections along data path.
3. Check the Storage Automated Diagnostic Environment SAN Topology GUI to identify the failing segment of the data path.
4. Verify the correct FC switch configuration, if applicable.

Category: t3   Component: oob   EventType: Comm_Lost   Sev: Down   Action: Y
Description: [ Info/Action ] Lost communication (OutOfBand) with diag213 (ip=xxx.20.67.212)
Information: OutOfBand. This means that the Sun StorEdge T3+ array failed to answer to a ping or failed to return its tokens.
Probable Cause: This problem can also be caused by a very slow network, or because the Ethernet connection to this Sun StorEdge T3+ array was lost.
Recommended action:
1. Check Ethernet connectivity to the affected Sun StorEdge T3+ array.
2. Verify the Sun StorEdge T3+ array is booted correctly.
3. Verify the correct TCP/IP settings on the Sun StorEdge T3+ array.
4. Increase the http and/or ping timeout in Utilities-->System-->Timeouts. The current default timeouts are 10 seconds for ping and 60 seconds for http (tokens).

Category: t3   Component: t3ofdg   EventType: DiagnosticTest-   Sev: Red
Description: t3ofdg (diag240) on diag213 (ip=xxx.20.67.213) failed

Category: t3   Component: t3test   EventType: DiagnosticTest-   Sev: Red
Description: t3test (diag240) on diag213 (ip=xxx.20.67.213) failed

Category: t3   Component: t3volverify   EventType: DiagnosticTest-   Sev: Red
Description: t3volverify (diag240) on diag213 (ip=xxx.20.67.213) failed

Category: t3   Component: enclosure   EventType: Discovery
Description: [ Info ] Discovered a new Sun StorEdge T3+ array called rasd2-t3b1 (ip=xxx.0.0.41) slr-mi.370-3990-01-e-e1.003239
Information: Discovery events occur the first time the agent probes a storage device. The Discovery event creates a detailed description of the device monitored and sends it using any active notifier, such as NetConnect or Email.

Category: t3   Component: controller   EventType: Insert
Description: [ Info ] Component controller.u1ctr (id) was added to T3 diag213 (ip=xxx.20.67.213)
Information: A new Controller, as identified by its serial number, has been installed on the Sun StorEdge T3+ array.

Category: t3   Component: disk   EventType: Insert
Description: Component disk.u2d3 (SEAGATE.ST318203FSUN18G.LRG07139) was added to diag158 (ip=xxx.20.67.158)

Category: t3   Component: interface.loopcard   EventType: Insert
Information: [ Info ] A new LoopCard, as identified by its serial number, has been installed on the Sun StorEdge T3+ array.

Category: t3   Component: power   EventType: Insert
Description: [ Info ] Component 'power.u1pcu2' (TECTROL-CAN.300-1454-01(50).008275) was added to T3 diag213 (ip=xxx.20.67.213)

Category: t3   Component: enclosure   EventType: LocationChange
Description: Location of t3 rasd2-t3b0 (ip=xxx.0.0.40) was changed

Category: t3   Component: enclosure   EventType: QuiesceEnd
Description: Quiesce End on t3 d2-t3b1 (ip=xxx.0.0.41)

Category: t3   Component: enclosure   EventType: QuiesceStart
Description: Quiesce Start on t3 d2-t3b1 (ip=xxx.0.0.41)

Category: t3   Component: enclosure   EventType: Removal
Description: Monitoring of t3 d2-t3b1 (ip=xxx.0.0.41) ended

Category: t3   Component: controller   EventType: Remove   Sev: Red   Action: Y
Description: [ Info/Action ] Component 'controller.u1ctr' (id) was removed from T3 diag213 (ip=xxx.20.67.213)
Information: The Sun StorEdge T3+ array has reported that a controller was removed from the chassis.
Recommended action: Replace the Controller within the 30-minute power shutdown window.

Category: t3   Component: disk   EventType: Remove   Sev: Red   Action: Y
Description: [ Info/Action ] Component disk.u2d3 (SEAGATE.ST318203FSUN18G.LRG07139) was removed from diag158 (ip=xxx.20.67.158)
Information: The Sun StorEdge T3+ array has reported a disk has been removed from the chassis.
Recommended action: Replace the disk within the 30-minute power shutdown window.

Category: t3   Component: interface.loopcard   EventType: Remove   Sev: Red   Action: Y
Information: The Sun StorEdge T3+ array has reported that a loopcard has been removed from the chassis.
Recommended action: Replace the loopcard within the 30-minute power shutdown window.

Category: t3   Component: power   EventType: Remove   Sev: Red   Action: Y
Description: [ Info/Action ] Component 'power.u1pcu2' (TECTROL-CAN.300-1454-01(50).008275) was removed from T3 diag213 (ip=xxx.20.67.213)
Information: The Sun StorEdge T3+ array has reported that a power cooling unit has been removed from the chassis.
Recommended action: Replace the PCU within the 30-minute power shutdown window.

Category: t3   Component: controller   EventType: StateChange+
Description: 'controller.u1ctr' in T3 diag213 (ip=xxx.20.67.213) is now Available (status-state changed from disabled to ready-enabled)

Category: t3   Component: disk   EventType: StateChange+
Description: disk.u1d5 in Sun StorEdge T3+ array rasd3-t3b1 (ip=xxx.0.0.41) is now Available (status-state changed from fault-disabled to ready-enabled)

Category: t3   Component: interface.loopcard   EventType: StateChange+
Description: [ Info ] loopcard.u1l1 (SLR-MI.375-0085-01-G-G4.070924) in T3 msp0-t3b0
Information: The Sun StorEdge T3+ array has reported that a loopcard has been replaced or brought back online.

Category: t3   Component: volume   EventType: StateChange+
Description: 'volume.u1vol1' (slr-mi.370-3990-01-e-f0.022542.u1vol1) in T3 dvt2-t3b0 (ip=192.168.0.40) is now Available (status-state changed from unmounted to mounted)

Category: t3   Component: power   EventType: StateChange+
Description: power.u1pcu2 (TECTROL-CAN.300-1454-01(50).008275) in T3 rasd2-t3b1 (ip=xxx.0.0.41) is now Available (status-state changed from ready-disable to ready-enable).

Category: t3   Component: controller   EventType: StateChange-   Sev: Red   Action: Y
Description: [ Info/Action ] controller.u1ctr in T3 diag213 (ip=xxx.20.67.213) is now Not-Available (status-state changed from unknown to ready-disabled)
Information: The Sun StorEdge T3+ array controller has been disabled.
Recommended action:
1. Telnet to affected Sun StorEdge T3+ array.
2. Verify the controller state with 'fru stat' and 'sys stat'.
3. Run 'logger -dmprstlog' to capture controller information.
4. Re-enable the controller if possible (enable u).
5. Replace the controller, if necessary.

Category: t3   Component: disk   EventType: StateChange-   Sev: Red   Action: Y
Description: [ Info/Action ] disk.u1d5 in T3 rasd3-t3b1 (ip=xxx.0.0.41) is now Not-Available (status-state changed from unknown to fault-disabled).
Information: The Sun StorEdge T3+ array has reported that a disk has failed.
Recommended action:
1. Telnet to the affected Sun StorEdge T3+ array.
2. Verify the disk state in fru stat, fru list, and vol stat.
3. Replace the disk, if necessary.

Category: t3   Component: interface.loopcard   EventType: StateChange-   Sev: Red   Action: Y
Information: The Sun StorEdge T3+ array has indicated that the loopcard is no longer in an optimal state.
Recommended action:
1. Telnet to the affected Sun StorEdge T3+ array.
2. Verify loopcard state with fru stat.
3. Verify matching firmware with other loopcard.
4. Re-enable loopcard if possible (enable u (encid)|[1|2]).
5. Replace the loopcard, if necessary.

Category: t3   Component: volume   EventType: StateChange-   Sev: Red   Action: Y
Information: The Sun StorEdge T3+ array has reported that a LUN has changed state.
Recommended action:
1. Telnet to the affected Sun StorEdge T3+ array.
2. Check the status of LUNs via vol mode or vol stat.

Category: t3   Component: power   EventType: StateChange-   Sev: Red   Action: Y
Description: [ Info/Action ] power.u1pcu2 (TECTROL-CAN.300-1454-01(50).008275) in T3 rasd2-t3b1 (ip=xxx.0.0.41) is now Not-Available (status-state changed from ready-enabled to ready-disable).
Information: The Sun StorEdge T3+ array has reported that a power cooling unit has been disabled.
Recommended action:
1. Check the Sun StorEdge T3+ array syslog for battery hold times.
2. If < 6 minutes, replace the battery, or the entire PCU, as required.

Category: t3   Component: enclosure   EventType: Statistics
Description: Statistics about T3 d2-t3b1 (ip=xxx.0.0.41)
Replacing the Master Midplane
Follow this procedure when replacing the master midplane in a Sun StorEdge T3+
array. This procedure is detailed in the Storage Automated Diagnostic Environment
User’s Guide.
▼ To Replace the Master Midplane
1. Choose Maintenance --> General Maintenance --> Maintain Devices.
Refer to Chapter 3 of the Storage Automated Diagnostic Environment User’s Guide.
2. In the Maintain Devices window, delete the device that is to be replaced.
3. Choose Maintenance --> General Maintenance --> Discovery.
4. In the Device Discovery window, rediscover the device.
5. Choose Maintenance --> Topology Maintenance --> Topology Snapshot.
a. Select the host that monitors the replaced FRU.
b. Click Create and Retrieve Selected Topologies.
c. Click Merge and Push Master Topology.
Conclusion
Any time a master midplane is replaced, you must rediscover the device using the
procedure described above. This is especially important when the Storage Service
Processor is replaced as a FRU, whether the Storage Service Processor is the master
or the slave.
CHAPTER 9

Troubleshooting Ethernet Hubs
The Sun StorEdge 3900 and 6900 series uses an Ethernet hub as the backbone for the
internal service network. The allocation of Ethernet ports is as follows:
■ 1—Storage Service Processor (per subsystem)
■ 1—for each Fibre Channel Switch
■ 1—for each Virtualization Engine
■ 2—for each Sun StorEdge T3+ array partner group
■ 1—for the Ethernet hub that is installed on the second Sun StorEdge Expansion
Cabinet in the Sun StorEdge 3960 and 6960 systems
Note – Information about LED status lights, power information, and front panel
settings can be found in the SuperStack 3 Baseline Hub 12-Port TP (3C16440A) and 24-
Port TP (3C16441A) User Guide, pn: DUA1644-0AAA03. This is a 3COM document.
Log in to http://www.3com.com to access the documentation.
APPENDIX A

Virtualization Engine References
This appendix contains the following tables:
■ TABLE A-2, “SRN/SNMP Single Point of Failure Table”
■ TABLE A-3, “Port Communication”
■ TABLE A-4, “Service Codes”
TABLE A-1 provides an explanation of Service Request Numbers for the virtualization
engine.
TABLE A-1 SRN and SNMP Reference

SRN: 1xxxx
Description: Disk drive Check Condition status. xxxx is the Unit Error Code. The Unit Error Codes are returned by the drive in Sense Data bytes 20-21 in response to the SCSI Request Sense command.
Corrective Action: If too many Check Conditions are returned, then check the link status.

SRN: 70000
Description: SAN Configuration has changed.

SRN: 70001
Description: Rebuild process has started.

SRN: 70002
Description: Rebuild is completed without error.

SRN: 70003
Description: Rebuild is aborted with a read error. This means that the drive copying information cannot read from the primary drive.
Corrective Action: If a spare drive is available, it will be brought in and used to replace the failed drive. If no spare is available, replace the failed drive with a new drive.

SRN: 70004
Description: Write error is reported by follower. If the initiator is master, then its follower has detected a write error on a member within a mirror drive.
Corrective Action: If a spare drive is available, it will be brought in and used to replace the failed drive. If no spare is available, replace the failed drive with a new drive.

SRN: 70005
Description: Write error is detected by master. If the initiator is master, then it has detected a write error on a member within a mirror drive.
Corrective Action: If a spare drive is available, it will be brought in and used to replace the failed drive. If no spare is available, replace the failed drive with a new drive.

SRN: 70006
Description: virtualization engine-to-virtualization engine communication has failed.
Corrective Action: Internal error. Update firmware.

SRN: 70007
Description: Rebuild is aborted with write error. This means the primary drive cannot write to the drive being built.
Corrective Action: If a spare drive is available, it will be brought in and used to replace the failed drive. If no spare is available, replace the failed drive with a new drive.

SRN: 70008
Description: Read error is reported by follower. If the initiator is master, then its follower has detected a read error on a member within a mirror drive.
Corrective Action: If a spare drive is available, it will be brought in and used to replace the failed drive. If no spare is available, replace the failed drive with a new drive.

SRN: 70009
Description: Read error is detected by master. If the initiator is master, then it has detected a read error on a member within a mirror drive.
Corrective Action: If a spare drive is available, it will be brought in and used to replace the failed drive. If no spare is available, replace the failed drive with a new drive.

SRN: 70010
Description: CleanUp configuration table is completed.

SRN: 70020
Description: SAN physical configuration has changed.
Corrective Action: If unintentional, check condition of drives.

SRN: 70021
Description: Drive is offline.
Corrective Action: If unintentional, check condition of drives.

SRN: 70022
Description: virtualization engine is offline.
Corrective Action: If unintentional, check condition of drives.

SRN: 70023
Description: Drive is unresponsive.
Corrective Action: Check condition of drives.

SRN: 70024
Description: For Sun StorEdge T3+ array pack: Master virtualization engine has detected the partner virtualization engine’s IP Address.

SRN: 70025
Description: For Sun StorEdge T3+ array pack: Master virtualization engine is unable to detect the partner virtualization engine’s IP Address.
Corrective Action: Check the Ethernet connection between the two virtualization engines.

SRN: 70030
Description: SAN configuration changed by SV SAN Builder.

SRN: 70040
Description: Host zoning configuration has changed.

SRN: 70050
Description: MultiPath drive Failover.
Corrective Action: Check MultiPath drive.

SRN: 70051
Description: MultiPath drive Failback.

SRN: 70098
Description: Instant Copy degrade.
Corrective Action: If no spare is available, replace the failed drive with a new drive.

SRN: 70099
Description: Degrade because the drive has disappeared.
Corrective Action: Reinsert the missing drive, or replace it with a drive of equal or greater capacity.

SRN: 7009A
Description: Read degrade recorded. A mirror drive was written to, causing it to enter the degrade state.
Corrective Action: Reinsert the missing drive, or replace it with a drive of equal or greater capacity.

SRN: 7009B
Description: Write degrade recorded. If a spare drive is available, it will be brought in and used to replace the failed drive.
Corrective Action: The removed drive needs to be (if good) reinserted or (if bad) replaced.

SRN: 7009C
Description: Last primary failed during rebuild. This is a “multi-point failure” and is very rare.
Corrective Action:
• Backup drive data.
• Destroy mirror drive where failure has occurred.
• Format (mode 14) drives.
• Create new mirror drive.
• Re-assign old SCSI ID and LUN to mirror drive.
• Restore data.

SRN: 71000
Description: virtualization engine-to-virtualization engine communication has recovered.

SRN: 71001
Description: This is a generic error code for the SLIC. It signifies communication problems between the virtualization engine and the Daemon.
Corrective Action: Check the condition of the virtualization engine. Check the cabling between the virtualization engine and Daemon server. Error halt mode also forces this SRN.

SRN: 71002
Description: This indicates that the SLIC was busy.
Corrective Action: Check the condition of the virtualization engine. Check the cabling between the virtualization engine and the Daemon server. Error halt mode also forces this SRN.

SRN: 71003
Description: SLIC Master unreachable.
Corrective Action: Check conditions of the virtualization engines in the SAN.

SRN: 71010
Description: The status of the SLIC daemon has changed.

SRN: 72000
Description: Primary/Secondary SLIC daemon connection is active.

SRN: 72001
Description: Failed to read SAN drive configuration.

SRN: 72002
Description: Failed to lock on to SLIC daemon.

SRN: 72003
Description: Failed to read SAN SignOn Information.

SRN: 72004
Description: Failed to read Zone configuration.

SRN: 72005
Description: Failed to check for SAN changes.

SRN: 72006
Description: Failed to read SAN event log.

SRN: 72007
Description: SLIC daemon connection is down.
Corrective Action: Wait 1-5 minutes for the backup daemon to come up. If it doesn’t, check the network connection for virtualization engine halt, or hardware failure.
TABLE A-2 SRN/SNMP Single Point of Failure Table

SRN: 70020, 70030, 70050*, 70021
SNMP Description: SAN topology has changed; Global SAN configuration has changed; SAN configuration has changed; A physical device is missing.
Corrective Action: Check SAN cabling and connections between Sun StorEdge T3+ array and virtualization engine. Perform Sun StorEdge T3+ array failback, if necessary.
SRN after Corrective Action: 70020, 70030, 70051**

SRN: 70025
SNMP Description: Partner virtualization engine’s IP is not reachable.
Corrective Action: Check Ethernet cabling and connections.
SRN after Corrective Action: None.

SRN: 70020, 70030, 70050, 70025, 70021, 70022
SNMP Description: SAN topology has changed; Global SAN configuration has changed; SAN configuration has changed; Partner virtualization engine’s IP is not reachable; A physical device is missing; A SLIC virtualization engine is missing.
Corrective Action: Check cabling and connections between virtualization engines. Cycle power on failed virtualization engine, if fault LED flashes. Perform Sun StorEdge T3+ array failback, if necessary. Enable VERITAS path.
SRN after Corrective Action: 70020, 70030, 70050, 70024, 70021, 70022

SRN: 72007 (when error halt is on a virtualization engine that is not the master)
SNMP Description: SLIC daemon connection is inactive; failed to check for SAN changes.
Corrective Action: Daemon error; check the SLIC virtualization engine.
SRN after Corrective Action: 72000 (Secondary daemon connection is active.)

* Sun StorEdge T3+ array LUN Failover.
** Sun StorEdge T3+ array LUN Failback.
TABLE A-3 Port Communication

From                    To                      Port Number
Management Programs     Daemon                  20000
Daemon                  Daemon                  20001
Daemon                  virtualization engine   25000
virtualization engine   virtualization engine   25001
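As a quick cross-check of the assignments in TABLE A-3, the listener for a given port can be confirmed on the host running the SLIC daemon with a standard Solaris command. This is a sketch; the output line shown is illustrative:

# netstat -an | grep 20000
      *.20000              *.*     0      0 24576      0 LISTEN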
TABLE A-4 provides service codes for the virtualization engine.
TABLE A-4 Service Codes

Code Number: 005
Cause: PCI bus parity error.
Corrective Action: Replace virtualization engine.

Code Number: 24
Cause: The attempt to report one error resulted in another error.
Corrective Action: Cycle power to the virtualization engine.

Code Number: 40
Cause: Corrupt database.
Corrective Action:
• Clear SAN database.
• Cycle power to the virtualization engine.
• Import SAN zone configuration.

Code Number: 41
Cause: Corrupt database.
Corrective Action:
• Clear SAN database.
• Cycle power to the virtualization engine.
• Import SAN zone configuration.

Code Number: 42
Cause: Zone mapping database.
Corrective Action: Import SAN zone configuration.

Code Number: 050
Cause: This message indicates that an attempt to write a value into non-volatile storage failed. It could be a hardware failure, or it could be that one of the databases stored in Flash memory could not accept the entry being added.
Corrective Action:
• Clear the SAN database.
• Cycle power to the virtualization engine.

Code Number: 051
Cause: Cannot erase FLASH memory.
Corrective Action: Replace virtualization engine.

Code Number: 53
Cause: Unauthorized cabling configuration.
Corrective Action:
• Check cabling. Ensure the server/switch connects to the host side and storage connects to the device side of the virtualization engine.
• If necessary, clear SAN database.
• If necessary, cycle virtualization engine power.
• If necessary, import SAN zone configuration.

Code Number: 54
Cause: Unauthorized cabling configuration.
Corrective Action: Check cabling.

Code Number: 57
Cause: Too many HBAs attempting to log in.
Corrective Action: Check cabling.

Code Number: 60
Cause: Node mapping table cleared using SW2.
Corrective Action: No action required.

Code Number: 62
Cause: Improper SW2 setting.
Corrective Action:
• Correct SW2 setting.
• Cycle virtualization engine power.

Code Number: 126
Cause: Too many virtualization engines in SAN.
Corrective Action:
• Remove the extra virtualization engine.
• Cycle virtualization engine power.

Code Number: 130
Cause: Heartbeat connection between virtualization engines is down.
Corrective Action:
• Correct problem.
• Cycle the power on the follower virtualization engine.

Codes 400-599 are device-side interface driver errors:

Code Number: 409
Cause: FC device-side type code invalid.
Corrective Action:
• Cycle power.
• If problem persists, replace virtualization engine.

Code Number: 434
Cause: Too many elastic store errors to continue. Elastic store errors result from a clock mismatch between transmitter and receiver and indicate an unreliable link. This error can also occur if a device in the SAN loses power unexpectedly.
Corrective Action:
• Check for faulty component and replace.
• Cycle the power on the follower virtualization engine.
APPENDIX B

SUNWsecfg Error Messages
The Sun StorEdge 3900 and 6900 Series Reference Manual lists and defines these
messages and provides recommendations for corrective action, should you
encounter errors with the command utilities.
■ TABLE B-1 lists SUNWsecfg error messages specific to the virtualization engine
■ TABLE B-2 lists SUNWsecfg error messages specific to the Sun StorEdge network
FC switch-8 and switch-16 switches
■ TABLE B-3 lists SUNWsecfg error messages specific to the Sun StorEdge T3+ array
■ TABLE B-4 lists miscellaneous SUNWsecfg error messages common to all
components
TABLE B-1 Virtualization Engine SUNWsecfg Error Messages

Message: Common to virtualization engines
Description and Cause of Error: Invalid virtualization engine pair name $vepair, or virtualization engine is unavailable. Confirm that the configuration locks are set. This is usually due to the savevemap command running.
Suggested Action: Try ps -ef | grep savevemap or listavailable -v (which returns the status of individual virtualization engines).

Message: Common to virtualization engine
Description and Cause of Error: No virtualization engine pairs found, or the virtualization engine pairs are offline. Confirm that the configuration locks are set. This is usually due to the savevemap command running.
Suggested Action: Try ps -ef | grep savevemap or listavailable -v (which returns the status of individual virtualization engines).

Message: Common to virtualization engine
Description and Cause of Error: Unable to obtain lock on $vepair. Another command is running.
Suggested Action: Another virtualization engine command is updating the configuration. Try listavailable -v (which returns the status of individual virtualization engines) and check for the lock file directly by using ls -la /opt/SUNWsecfg/etc (look for .v1.lock or .v2.lock). If the lock is set in error, use the removelocks -v command to clear it. (See the sketch following this table.)

Message: Common to virtualization engine
Description and Cause of Error: Unable to start slicd on ${vepair}. Cannot execute command.
Suggested Action: Try running startslicd and then showlogs -e 50 to determine why startslicd could not start the daemon. You might have to reset or power off the virtualization engine if the problem persists.

Message: Common to virtualization engine
Description and Cause of Error: Login failed. The environment variable VEPASSWD might be set to an incorrect value. Try again.
Suggested Action: A password is required to log in to the virtualization engine. The utility uses the VEPASSWD environment variable to log in. Set the VEPASSWD environment variable with the proper value.

Message: Common to virtualization engine
Description and Cause of Error: After resetting the virtualization engine, the $VENAME is unreachable. The hardware might be faulty.
Suggested Action: Check the IP address and netmask that have been assigned to the virtualization engine hardware. Be aware that after a reset, it takes approximately 30 seconds to boot.

Message: Common to virtualization engine
Description and Cause of Error:
1. Device-side operating mode is not set properly.
2. Device-side UID reporting scheme is not set properly.
3. Host-side operating mode is not set properly.
4. Host-side LUN mapping mode is not set properly.
5. Host-side Command Queue Depth is not set properly.
6. Host-side UID distinguish is not set properly.
7. IP is not set properly.
8. Subnet mask is not set properly.
9. Default gateway is not set properly.
10. Server port number is not set properly.
11. Host WWN Authentications are not set properly.
12. Host IP Authentications are not set properly.
13. Other VEHOST IP is not set properly.
Suggested Action: Log in to the virtualization engine and verify that the device, host, and network settings are correct. Make sure the virtualization engine hardware is not in ERROR 50 mode. If required, power cycle the virtualization engine hardware, or disable the host-side switch port. Run the setupve -n ve_name command and enable the switch port.

Message: checkslicd
Description and Cause of Error: Cannot establish communication with ${vepair}.
Suggested Action: Run startslicd -n ${vepair}.

Message: checkslicd
Description and Cause of Error: Cannot establish communication with virtualization engine pair ${vepair} initiator {$initiator}.
Suggested Action: Determine the host name associated with ${initiator} by using the command output. Run the command resetve -n vename.

Message: checkvemap
Description and Cause of Error: Cannot establish communication with ${vepair}.
Suggested Action: Run the command again. If this fails, check the status of both virtualization engines. If there is an error condition, see Appendix A for corrective action.

Message: createvezone
Description and Cause of Error: Invalid WWN $wwn on $vepair initiator $init, or virtualization engine is unavailable. The WWN specified already has a SLIC zone and/or an HBA alias assigned. Note that for a WWN to be available for createvezone, the zone name in the map file (showvemap -n ve_pairname) must be “undefined” and the online status should be “yes.”
Suggested Action: If a zone name is assigned, run the rmvezone command. If there are still errors, try sadapter alias -d $vepair -r $initiator -a $zone -n “ “ and then run savemap -n $vepair.

Message: listavailable
Description and Cause of Error: No virtualization engines are available. They are either not found, or the configuration lock is set. Either the components (the Sun StorEdge T3+ array, the switch, or the virtualization engine) are down (cannot be pinged), or another SUNWsecfg command is running and is updating the configuration (ps -ef).
Suggested Action: If no other commands are running and you believe the configuration lock might be set in error, run the removelocks command.

Message: restorevemap
Description and Cause of Error:
1. Import zone data failed.
2. Restore physical and logical data failed.
3. Restore zone data failed.
Suggested Action: Check the status of both virtualization engines. If there is an error condition, refer to Appendix A for corrective action. Attempt to run the restorevemap command again.

Message: setdefaultconfig
Description and Cause of Error:
1. Unable to properly configure the virtualization engine host ${vehost}.
2. Cannot continue configuration of other components.
Suggested Action: Check the status of the virtualization engine and try again.

Message: setdefaultconfig
Description and Cause of Error: The setupve command failed.
Suggested Action: Try running setupve -n ve_hostname -v (verbose mode) and check the errors. Then run checkve -n ve_hostname. You can continue to configure VLUNs and zones only if both of these commands work.
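A minimal sketch of the lock check described in the table; the lock file listing is illustrative, and the path to the removelocks command is assumed to be under /opt/SUNWsecfg:

# ls -la /opt/SUNWsecfg/etc | grep lock
-rw-r--r--   1 root     other      0 Mar  1 10:02 .v1.lock
# /opt/SUNWsecfg/bin/removelocks -v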
TABLE B-2 Sun StorEdge Network FC Switch-8 and Switch-16 Switch SUNWsecfg Error Messages

Message: Common Switch
Description and Cause of Error: Sun StorEdge system type entered, ${cab_type}, does not match system type discovered, ${boxtype}.
Suggested Action: Either call the command with the -f force option to force the series type, or do not specify the cabinet type (no -c option).

Message: Common Switch
Description and Cause of Error: Unable to obtain lock on switch ${switch}. Another command is running.
Suggested Action:
1. Another switch command might be updating the configuration. Check listavailable -s.
2. If the switch in question does not appear, check for the existence of the lock file directly by typing ls -la /opt/SUNWsecfg/etc (look for .$switch.lock).
3. If the lock is set in error, use the removelocks -s command to clear it.

Message: checkswitch
Description and Cause of Error:
1. Current configuration on $switch does not match the defined configuration.
2. One of the predefined static switch configuration parameters, which can be overridden for special configurations such as NT connect or cascaded switches, is set incorrectly.
Suggested Action:
1. Select View Logs or directly view $LOGFILE for more details.
2. Re-run setupswitch on the specified $switch.

Message: listavailable
Description and Cause of Error: No Sun StorEdge network FC switch-8 or switch-16 switch devices are available. They are either not found, or the configuration lock is set. Either the components (the Sun StorEdge T3+ array, the switch, or the virtualization engine) are down (cannot be pinged), or another SUNWsecfg command is running and is updating the configuration (ps -ef).
Suggested Action: If no other commands are running and you believe the configuration lock might be set in error, run the removelocks command.

Message: setswitchflash
Description and Cause of Error: Invalid flash file $flashfile. Check the number of ports on switch $switch.
Suggested Action: You might be attempting to download a flash file for an 8-port switch to a 16-port switch. Check showswitch -s $switch and look for “number of ports.” Ensure that this matches the second and third characters of the flash file name; for example: m08030462.fls.

Message: setswitchflash
Description and Cause of Error: ${switch} timed out after reset. The switch took longer than two minutes to reset after a configuration change.
Suggested Action: The switch might not be set for rarp, or rarp is not working correctly. Try ping $switch after waiting a few more minutes. If errors persist, manually power cycle the switch.

Message: setupswitch
Description and Cause of Error: Switch ${switch} timed out after reset. The switch took longer than two minutes to reset after a configuration change.
Suggested Action: Try ping $switch after waiting a few more minutes. If errors persist, manually power cycle the switch.

Message: setupswitch
Description and Cause of Error: Could not set chassis ID on switch ${switch} to ${cid}.
Suggested Action: This should occur only in a SAN environment with cascaded switches. Be aware of the switch chassis IDs of all switches in the SAN and make sure the IDs are all unique. Once the chassis IDs are established, override the switch chassis IDs with the following command: setupswitch -s $switch_name -i $unique_chassis_id -v.
TABLE B-3 Sun StorEdge T3+ Array SUNWsecfg Error Messages

Message: Common to Sun StorEdge T3+ array
Description and Cause of Error: Present configuration does not match Reference configurations.
Suggested Action: Check the present Sun StorEdge T3+ array configuration with the showt3 -n <t3> command and verify whether the configuration is corrupted or has changed. If it is not one of the standard configurations, restore the configuration using the restoret3config command.

Message: Common to Sun StorEdge T3+ array
Description and Cause of Error:
1. Could not mount volume $vol.
2. $lunconfig does not match.
Suggested Action: There might be multiple drive failures or corrupted data or parity on the LUN. Replace the failed FRUs and restore the Sun StorEdge T3+ array configuration with the restoret3config -f -n t3_name command.

Message: Common to Sun StorEdge T3+ array
Description and Cause of Error: The $frustatus is not ready or enabled. Operations on the Sun StorEdge T3+ array are being aborted.
Suggested Action: The disk, controller, or loop interface card in the Sun StorEdge T3+ array might be bad. Replace the failed FRU and rerun the utility.

Message: Common to Sun StorEdge T3+ array
Description and Cause of Error:
1. The Sun StorEdge T3+ array is not of T3B type, and it cannot continue; aborting operations.
2. t3config utilities are supported only in the Sun StorEdge T3+ array; the t3config utilities are not supported on Sun StorEdge T3+ arrays with 1.xx firmware.
The Sun StorEdge T3 array configuration is not a standard configuration (refer to the t3 default/custom configuration table in the Sun StorEdge 3900 and 6900 Series Hardware Installation and Service Manual).
Suggested Action: Use showt3 -n t3_name to display the present configuration. Use the modifyt3config and restoret3config utilities to configure the Sun StorEdge T3+ array.

Message: checkt3config
Description and Cause of Error: vol init command is being executed by another user. Additional vol commands cannot run.
Suggested Action: Check whether any other secfg utility is running. If one is running, allow it to finish.

Message: checkt3config
Description and Cause of Error: An error occurred while checking proc list; aborting operation on $BRICK_IP{$brick_name}.
Suggested Action: Check whether any other secfg or native Sun StorEdge T3+ commands are being executed on the particular Sun StorEdge T3+ array.

Message: checkt3config
Description and Cause of Error: Snapshot configuration files are not present. Unable to check configuration.
Suggested Action: Make sure that the snapshot files are saved and have read permissions in the /opt/SUNWsecfg/etc/t3name/ directory. If the snapshot files are not available, create them by using the savet3config command.

Message: checkt3mount
Description and Cause of Error:
1. The $lun status reported a bad or nonexistent LUN.
2. While checking the configuration using the showt3 -n command, operations abort.
Suggested Action: Make sure that the requested LUN exists on the Sun StorEdge T3+ array by using the showt3 -n command. Confirm that the Sun StorEdge T3+ array configuration matches standard configurations.

Message: createvlun
Description and Cause of Error: Invalid diskpool $diskpool on $vepair, or diskpool is unavailable.
Suggested Action: Ensure the diskpool was created properly using the showvemap -n $vepair command. If the diskpool is unavailable, try creatediskpools -n $t3name. If that fails, check the Sun StorEdge T3+ array for unmounted volumes or path failures by using checkt3config -n $t3name -v.

Message: createvlun
Description and Cause of Error: Unable to execute command. The associated Sun StorEdge T3+ array physical LUN ${t3lun}, for disk pool ${diskpool}, might not be mounted.
Suggested Action: Run checkt3mount -n $t3name -l ALL to see the mount status of the volume. For further information about problems with the underlying Sun StorEdge T3+ array, try checkt3config -n $t3name -v.

Message: listavailable
Description and Cause of Error: No Sun StorEdge T3+ arrays are available. They are either not found, or the configuration lock is set. Either the components (the Sun StorEdge T3+ array, the switch, or the virtualization engine) are down (cannot be pinged), or another SUNWsecfg command is running and is updating the configuration (ps -ef).
Suggested Action: If no other commands are running and you believe the configuration lock might be set in error, run the removelocks command.

Message: modifyt3config
Description and Cause of Error: The lock file clear waiting period expired and the creatediskpools command is aborted.
Suggested Action: Check to see if the modifyt3config and restoret3config commands are executing on other Sun StorEdge T3+ arrays. If the commands are executing, wait for them to complete, and then run creatediskpools -n t3name.

Message: restoret3config
Description and Cause of Error: Error while the block size compare command is executing. The $BRICK_IP{$IPADD} command is aborted.
Suggested Action: The Sun StorEdge T3+ array block size parameter is different from the snapshot file. The Sun StorEdge T3+ array may have been reconfigured. Run restoret3config.

Message: restoret3config
Description and Cause of Error: $LUN configuration failed to restore and the force option was used to reinitialize, without success.
Suggested Action: Check the Sun StorEdge T3+ configuration with the showt3 -n t3_name command. Restore the Sun StorEdge T3+ array configuration with the restoret3config command.

Message: restoret3config
Description and Cause of Error: $LUN configuration is not found in the $restore_file. Cannot restore $LUN.
Suggested Action: Check for snapshot files in the /opt/SUNWsecfg/etc/t3_name/ directory. If the snapshot files are not found, use the modifyt3config command to configure the Sun StorEdge T3+ array.

Message: savet3config
Description and Cause of Error: While checking the configuration, the Sun StorEdge T3+ array configuration has not been saved.
Suggested Action: Check the Sun StorEdge T3+ array configuration by using the showt3 -n t3_name command; if the configuration is different from standard Sun StorEdge T3 configurations, use the modifyt3config command to reconfigure the device.
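For example, to verify a single array against its saved snapshot as suggested in the table above, something like the following can be run on the Storage Service Processor. The array name is illustrative, and the command path is assumed to be under /opt/SUNWsecfg:

# /opt/SUNWsecfg/bin/checkt3config -n t3b0 -v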
TABLE B-4 Other SUNWsecfg Error Messages

Message: Common to all components
Description and Cause of Error: If the Sun StorEdge 3900 or 6900 series has multiple (more than two) failures (for example, both virtualization engines and two switches are down), the getcabinet tool might not determine the correct cabinet type. In this example, the getcabinet script might determine the device to be a Sun StorEdge 3900 series when, in reality, it is a Sun StorEdge 6900 series.
Suggested Action: Set the BOXTYPE variable as follows: BOXTYPE=6910; export BOXTYPE

Message: checkdefaultconfig
Description and Cause of Error: Could not determine the Sun StorEdge system type. Multiple components might be down, and the getcabinet command could not determine the Sun StorEdge series type (3910, 3960, 6910, or 6960).
Suggested Action: Try using the command line interface (CLI) by setting the BOXTYPE environment variable to one of the four values (for example, BOXTYPE=3910; export BOXTYPE).

Message: setdefaultconfig
Description and Cause of Error: The system could not determine the Sun StorEdge system type.
Suggested Action: Try using the command line interface (CLI) by setting the BOXTYPE environment variable to one of the four values (for example, BOXTYPE=3910; export BOXTYPE). A sketch follows this table.
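A minimal sketch of the BOXTYPE workaround from a Bourne shell on the Storage Service Processor, before rerunning the configuration utility:

# BOXTYPE=6910; export BOXTYPE
# /opt/SUNWsecfg/runsecfg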
setupswitch Exit Values

TABLE 9-1 lists the setupswitch exit values. The associated messages are logged in the
/var/adm/log/SEcfglog log file. (A sketch of checking the exit value follows the table.)

TABLE 9-1 setupswitch Exit Values

Severity Level: 0   Message Type: INFO
Message Meaning: All switch settings are properly set. The switch setting matches the default configuration.

Severity Level: 1   Message Type: ERROR
Message Meaning: Errors occurred while trying to set the proper switch settings. The switch setting does not match the default configuration or any valid alternatives.

Severity Level: 2   Message Type: WARNING
Message Meaning: Errors occurred while trying to set the proper switch settings. The ports did not self-configure properly. A cable connection might not be working properly. T ports self-configure (that is, the configuration tool cannot control the configuration) from F ports when they are cabled properly. Specifically, these are the ports on the back-end switches in Sun StorEdge 6900 series configurations only. The ports support the ISL connections.

Severity Level: 3   Message Type: WARNING
Message Meaning: The Flash code is different from the release level. The switch Flash code does not match the current release version 30462. This is not an error; QLogic periodically releases new versions of the switch Flash code, and the new version will not match the default version.

Severity Level: 4   Message Type: WARNING
Message Meaning: The configuration is not set to the default, but the differences are likely supported alternatives. The default switch configurations were overridden with valid alternatives, which are also supported by the SUNWsecfg configuration tools. It should still be flagged as “not the default.” It can imply any of the following alternatives (these messages are printed to the screen and to the Storage Automated Diagnostic Environment GUI):
• INFO: Some ports have been set to SL mode, but should have been set using the setswitchsl command. View and verify this nonstandard configuration setup as required using the showswitch command. Refer to the Sun StorEdge 3900 and 6900 Series Reference Manual for detailed configuration information.
• INFO: The chassis ID on the switch is not set to the default value. This could be caused by unique ID settings or by conflicts in a SAN environment.
• INFO: Ports are identified that are not in the default hard zone. This could be because the port is set to the same hard zone as the cascaded switch in a SAN environment.

NOTE: If multiple solutions are connected to a switch, the switch settings might not match the default settings.
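A minimal sketch of checking the exit value and the associated log from the Storage Service Processor shell; the switch name is illustrative, and the command path is assumed to be under /opt/SUNWsecfg:

# /opt/SUNWsecfg/bin/setupswitch -s sw1a
# echo $?
3
# tail -5 /var/adm/log/SEcfglog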