IBM Power Systems
Performance Capabilities Reference
IBM i operating system Version 6.1
January/April/October 2008
This document is intended for use by qualified performance-related programmers or analysts from
IBM, IBM Business Partners, and IBM customers using the IBM Power™ Systems platform
running the IBM i operating system. Information in this document may be readily shared with
IBM i customers to help them understand the performance and tuning factors in IBM i operating
system 6.1 and earlier, where applicable. For the latest updates and IBM i performance
information, please refer to the Performance Management Website:
Requests for use of performance information by the technical trade press or consultants should
be directed to Systems Performance Department V3T, IBM Rochester Lab, Rochester, MN
55901 USA.
Table of Contents
Special Notices
Purpose of this Document
Chapter 1. Introduction
Chapter 2. iSeries and AS/400 RISC Server Model Performance Behavior
2.1 Overview
2.1.1 Interactive Indicators and Metrics
2.1.2 Disclaimer and Remaining Sections
2.1.3 V5R3
2.1.4 V5R2 and V5R1
2.2 Server Model Behavior
2.2.1 In V4R5 - V5R2
2.2.2 Choosing Between Similarly Rated Systems
2.2.3 Existing Older Models
2.3 Server Model Differences
2.4 Performance Highlights of Model 7xx Servers
2.5 Performance Highlights of Model 170 Servers
2.6 Performance Highlights of Custom Server Models
2.7 Additional Server Considerations
2.8 Interactive Utilization
2.9 Server Dynamic Tuning (SDT)
2.10 Managing Interactive Capacity
2.11 Migration from Traditional Models
2.12 Upgrade Considerations for Interactive Capacity
2.13 iSeries for Domino and Dedicated Server for Domino Performance Behavior
2.13.1 V5R2 iSeries for Domino & DSD Performance Behavior updates
2.13.2 V5R1 DSD Performance Behavior
Chapter 3. Batch Performance
3.1 Effect of CPU Speed on Batch
3.2 Effect of DASD Type on Batch
3.3 Tuning Parameters for Batch
Chapter 4. DB2 for i5/OS Performance
4.1 New for i5/OS V6R1
    i5/OS V6R1 SQE Query Coverage
4.2 DB2 i5/OS V5R4 Highlights
    i5/OS V5R4 SQE Query Coverage
4.3 i5/OS V5R3 Highlights
    i5/OS V5R3 SQE Query Coverage
    Partitioned Table Support
4.4 V5R2 Highlights - Introduction of the SQL Query Engine
4.5 Indexing
4.6 DB2 Symmetric Multiprocessing feature
4.7 DB2 for i5/OS Memory Sharing Considerations
4.8 Journaling and Commitment Control
4.9 DB2 Multisystem for i5/OS
4.10 Referential Integrity
4.11 Triggers
4.12 Variable Length Fields
4.13 Reuse Deleted Record Space
4.14 Performance References for DB2
Chapter 5. Communications Performance
5.2 Communication Performance Test Environment
5.5 TCP/IP Secure Performance
5.6 Performance Observations and Tips
5.7 APPC, ICF, CPI-C, and Anynet
5.8 HPR and Enterprise extender considerations
5.9 Additional Information
Chapter 6. Web Server and WebSphere Performance
6.1 HTTP Server (powered by Apache)
6.2 PHP - Zend Core for i
6.3 WebSphere Application Server
6.4 IBM WebFacing
6.5 WebSphere Host Access Transformation Services (HATS)
6.6 System Application Server Instance
6.7 WebSphere Portal
6.8 WebSphere Commerce
6.9 WebSphere Commerce Payments
6.10 Connect for iSeries
Chapter 7. Java Performance
7.1 Introduction
7.2 What’s new in V6R1
7.3 IBM Technology for Java (32-bit and 64-bit)
    Native Code
    Garbage Collection
7.4 Classic VM (64-bit)
    JIT Compiler
    Garbage Collection
    Bytecode Verification
7.5 Determining Which JVM to Use
7.6 Capacity Planning
    General Guidelines
7.7 Java Performance – Tips and Techniques
    Introduction
    i5/OS Specific Java Tips and Techniques
    Classic VM-specific Tips
    Java Language Performance Tips
    Java i5/OS Database Access Tips
Resources
Chapter 8. Cryptography Performance
8.1 System i Cryptographic Solutions
8.2 Cryptography Performance Test Environment
8.3 Software Cryptographic API Performance
8.4 Hardware Cryptographic API Performance
8.5 Cryptography Observations, Tips and Recommendations
8.6 Additional Information
Chapter 9. iSeries NetServer File Serving Performance
9.1 iSeries NetServer File Serving Performance
Chapter 10. DB2 for i5/OS JDBC and ODBC Performance
10.1 DB2 for i5/OS access with JDBC
    JDBC Performance Tuning Tips
    References for JDBC
10.2 DB2 for i5/OS access with ODBC
    References for ODBC
Chapter 11. Domino on i
11.1 Domino Workload Descriptions
11.2 Domino 8
11.3 Domino 7
11.4 Domino 6
    Notes client improvements with Domino 6
    Domino Web Access client improvements with Domino 6
11.5 Response Time and Megahertz relationship
11.6 Collaboration Edition and Domino Edition offerings
11.7 Performance Tips / Techniques
11.8 Domino Web Access
11.9 Domino Subsystem Tuning
11.10 Performance Monitoring Statistics
11.11 Main Storage Options
11.12 Sizing Domino on System i
11.13 LPAR and Partial Processor Considerations
11.14 System i NotesBench Audits and Benchmarks
Chapter 12. WebSphere MQ for iSeries
12.1 Introduction
12.2 Performance Improvements for WebSphere MQ V5.3 CSD6
12.3 Test Description and Results
12.4 Conclusions, Recommendations and Tips
Chapter 13. Linux on iSeries Performance
13.1 Summary
    Key Ideas
13.2 Basic Requirements -- Where Linux Runs
13.3 Linux on iSeries Technical Overview
    Linux on iSeries Architecture
    Linux on iSeries Run-time Support
13.4 Basic Configuration and Performance Questions
13.5 General Performance Information and Results
    Computational Performance -- C-based code
    Computational Performance -- Java
    Web Serving Performance
    Network Operations
    Gcc and High Optimization (gcc compiler option -O3)
    The Gcc Compiler, Version 3
13.6 Value of Virtual LAN and Virtual Disk
    Virtual LAN
    Virtual Disk
13.7 DB2 UDB for Linux on iSeries
13.8 Linux on iSeries and IBM eServer Workload Estimator
13.9 Top Tips for Linux on iSeries Performance
Chapter 14. DASD Performance
14.1 Internal (Native) Attachment
14.1.0 Direct Attach (Native)
14.1.1 Hardware Characteristics
14.1.2 iV5R2 Direct Attach DASD
14.1.3 571B
14.1.3.1 571B RAID5 vs RAID6 - 10 15K 35GB DASD
14.1.3.2 571B IOP vs IOPLESS - 10 15K 35GB DASD
14.1.4 571B, 5709, 573D, 5703, 2780 IOA Comparison Chart
14.1.5 Comparing Current 2780/574F with the new 571E/574F and 571F/575B
    (NOTE: iV5R3 has support for the features in this section, but all of our
    performance measurements were done on iV5R4 systems. For information on the
    supported features, see the IBM Product Announcement Letters.)
14.1.6 Comparing 571E/574F and 571F/575B IOP and IOPLess
14.1.7 Comparing 571E/574F and 571F/575B RAID5 and RAID6 and Mirroring
14.1.8 Performance Limits on the 571F/575B
14.1.9 Investigating 571E/574F and 571F/575B IOA, Bus and HSL limitations
14.1.10 Direct Attach 571E/574F and 571F/575B Observations
14.2 New in iV5R4M5
14.2.1 9406-MMA CEC vs 9406-570 CEC DASD
14.2.2 RAID Hot Spare
14.2.3 12X Loop Testing
14.3 New in iV6R1M0
14.3.1 Encrypted ASP
14.3.2 57B8/57B7 IOA
14.3.3 572A IOA
14.4 SAN - Storage Area Network (External)
14.5.1 General VIOS Considerations
14.5.1.1 Generic Concepts
14.5.1.2 Generic Configuration Concepts
14.5.1.3 Specific VIOS Configuration Recommendations -- Traditional (non-blade) Machines
14.5.1.3 VIOS and JS12 Express and JS22 Express Considerations
14.5.1.3.1 BladeCenter H JS22 Express running IBM i operating system/VIOS
14.5.1.3.2 BladeCenter S and JS12 Express
14.5.1.3.3 JS12 Express and JS22 Express Configuration Considerations
14.5.1.3.4 DS3000/DS4000 Storage Subsystem Performance Tips
14.6 IBM i operating system 5.4 Virtual SCSI Performance
14.6.1 Introduction
14.6.2 Virtual SCSI Performance Examples
14.6.2.1 Native vs. Virtual Performance
14.6.2.2 Virtual SCSI Bandwidth-Multiple Network Storage Spaces
14.6.2.3 Virtual SCSI Bandwidth-Network Storage Description (NWSD) Scaling
14.6.2.4 Virtual SCSI Bandwidth-Disk Scaling
14.6.3 Sizing
14.6.3.1 Sizing when using Dedicated Processors
14.6.3.2 Sizing when using Micro-Partitioning
14.6.3.3 Sizing memory
14.6.4 AIX Virtual IO Client Performance Guide
14.6.5 Performance Observations and Tips
14.6.6 Summary
Chapter 15. Save/Restore Performance
15.1 Supported Backup Device Rates
15.2 Save Command Parameters that Affect Performance
    Use Optimum Block Size (USEOPTBLK)
    Data Compression (DTACPR)
    Data Compaction (COMPACT)
15.3 Workloads
15.4 Comparing Performance Data
15.5 Lower Performing Backup Devices
15.6 Medium & High Performing Backup Devices
15.7 Ultra High Performing Backup Devices
15.8 The Use of Multiple Backup Devices
15.9 Parallel and Concurrent Library Measurements
15.9.1 Hardware (2757 IOAs, 2844 IOPs, 15K RPM DASD)
15.9.2 Large File Concurrent
15.9.3 Large File Parallel
15.9.4 User Mix Concurrent
15.10 Number of Processors Affect Performance
15.11 DASD and Backup Devices Sharing a Tower
15.12 Virtual Tape
15.13 Parallel Virtual Tapes
15.14 Concurrent Virtual Tapes
15.15 Save and Restore Scaling using a Virtual Tape Drive
15.16 Save and Restore Scaling using 571E IOAs and U320 15K DASD units to a 3580 Ultrium 3 Tape Drive
15.17 High-End Tape Placement on System i
15.18 BRMS-Based Save/Restore Software Encryption and DASD-Based ASP Encryption
15.19 5XX Tape Device Rates
15.20 5XX Tape Device Rates with 571E & 571F Storage IOAs and 4327 (U320) Disk Units
15.21 5XX DVD RAM and Optical Library
15.23 9406-MMA DVD RAM
15.24 9406-MMA 576B IOPLess IOA
Chapter 16. IPL Performance
16.1 IPL Performance Considerations
16.2 IPL Test Description
16.3 9406-MMA System Hardware Information
16.3.1 Small system Hardware Configuration
16.3.2 Large system Hardware Configurations
16.4 9406-MMA IPL Performance Measurements (Normal)
16.5 9406-MMA IPL Performance Measurements (Abnormal)
16.6 NOTES on MSD
16.6.1 MSD Affects on IPL Performance Measurements
16.7 5XX System Hardware Information
16.7.1 5XX Small system Hardware Configuration
16.7.2 5XX Large system Hardware Configuration
16.8 5XX IPL Performance Measurements (Normal)
16.9 5XX IPL Performance Measurements (Abnormal)
16.10 5XX IOP vs IOPLess effects on IPL Performance (Normal)
16.11 IPL Tips
Chapter 17. Integrated BladeCenter and System x Performance
17.1 Introduction
17.2 Effects of Windows and Linux loads on the host system
17.2.1 IXS/IXA Disk I/O Operations
17.2.2 iSCSI Disk I/O Operations
17.2.3 iSCSI virtual I/O private memory pool
17.2.4 Virtual Ethernet Connections
17.2.5 IXS/IXA IOP Resource
17.3 System i memory rules of thumb for IXS/IXA and iSCSI attached servers
17.3.1 IXS and IXA attached servers
17.3.2 iSCSI attached servers
17.4 Disk I/O CPU Cost
17.4.1 Further notes about IXS/IXA Disk Operations
17.5 Disk I/O Throughput
17.6 Virtual Ethernet CPU Cost and Capacities
17.6.1 VE Capacity Comparisons
17.6.2 VE CPW Cost
17.6.3 Windows CPU Cost
17.7 File Level Backup Performance
17.8 Summary
17.9 Additional Sources of Information
Chapter 18. Logical Partitioning (LPAR)
18.1 Introduction
    V5R3 Information
    V5R2 Additions
    General Tips
    V5R1 Additions
18.2 Considerations
18.3 Performance on a 12-way system
18.4 LPAR Measurements
18.5 Summary
Chapter 19. Miscellaneous Performance Information
19.1 Public Benchmarks (TPC-C, SAP, NotesBench, SPECjbb2000, VolanoMark)
19.2 Dynamic Priority Scheduling
19.3 Main Storage Sizing Guidelines
19.4 Memory Tuning Using the QPFRADJ System Value
19.5 Additional Memory Tuning Techniques
19.6 User Pool Faulting Guidelines
19.7 AS/400 NetFinity Capacity Planning
Chapter 20. General Performance Tips and Techniques
20.1 Adjusting Your Performance Tuning for Threads
20.2 General Performance Guidelines -- Effects of Compilation
20.3 How to Design for Minimum Main Storage Use (especially with Java, C, C++)
    Theory -- and Practice
    System Level Considerations
    Typical Storage Costs
    A Brief Example
    Which is more important?
    A Short but Important Tip about Data Base
    A Final Thought About Memory and Competitiveness
20.4 Hardware Multi-threading (HMT)
    HMT Described
    HMT and SMT Compared and Contrasted
    Models With/Without HMT
20.5 POWER6 520 Memory Considerations
20.6 Aligning Floating Point Data on Power6
Chapter 21. High Availability Performance
21.1 Switchable IASP’s
21.2 Geographic Mirroring
Chapter 22. IBM Systems Workload Estimator
22.1 Overview
22.2 Merging PM for System i data into the Estimator
22.3 Estimator Access
22.4 What the Estimator is Not
Appendix A. CPW and CIW Descriptions
A.1 Commercial Processing Workload - CPW
A.2 Compute Intensive Workload - CIW
Appendix B. System i Sizing and Performance Data Collection Tools
B.1 Performance Data Collection Services
B.2 Batch Modeling Tool (BCHMDL)
Appendix C. CPW and MCU Relative Performance Values for System i
C.1 V6R1 Additions (October 2008)
C.2 V6R1 Additions (August 2008)
C.3 V6R1 Additions (April 2008)
C.4 V6R1 Additions (January 2008)
C.5 V5R4 Additions (July 2007)
C.6 V5R4 Additions (January/May/August 2006 and January/April 2007)
C.7 V5R3 Additions (May, July, August, October 2004, July 2005)
C.7.1 IBM eServer® i5 Servers
C.8 V5R2 Additions (February, May, July 2003)
C.8.1 iSeries Model 8xx Servers
C.8.2 Model 810 and 825 iSeries for Domino (February 2003)
C.9 V5R2 Additions
C.9.1 Base Models 8xx Servers
C.9.2 Standard Models 8xx Servers
C.10 V5R1 Additions
C.10.1 Model 8xx Servers
C.10.2 Model 2xx Servers
C.10.3 V5R1 Dedicated Server for Domino
C.10.4 Capacity Upgrade on-demand Models
C.10.4.1 CPW Values and Interactive Features for CUoD Models
C.11 V4R5 Additions
C.11.1 AS/400e Model 8xx Servers
C.11.2 Model 2xx Servers
C.11.3 Dedicated Server for Domino
C.11.4 SB Models
C.12 V4R4 Additions
C.12.1 AS/400e Model 7xx Servers
C.12.2 Model 170 Servers
C.13 AS/400e Model Sxx Servers
C.14 AS/400e Custom Servers
C.15 AS/400 Advanced Servers
C.16 AS/400e Custom Application Server Model SB1
C.17 AS/400 Models 4xx, 5xx and 6xx Systems
C.18 AS/400 CISC Model Capacities
Special Notices
DISCLAIMER NOTICE
Performance is based on measurements and projections using standard IBM benchmarks in a controlled
environment. This information is presented along with general recommendations to assist the reader in
better understanding IBM(*) products. The actual throughput or performance that any user will
experience will vary depending upon considerations such as the amount of multiprogramming in the
user's job stream, the I/O configuration, the storage configuration, and the workload processed.
Therefore, no assurance can be given that an individual user will achieve throughput or performance
improvements equivalent to the ratios stated here.
All performance data contained in this publication was obtained in the specific operating environment and
under the conditions described within the document and is presented as an illustration. Performance
obtained in other operating environments may vary and customers should conduct their own testing.
Information is provided "AS IS" without warranty of any kind.
The use of this information or the implementation of any of these techniques is a customer responsibility
and depends on the customer's ability to evaluate and integrate them into the customer's operational
environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there
is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt
these techniques to their own environments do so at their own risk.
All statements regarding IBM future direction and intent are subject to change or withdrawal without
notice, and represent goals and objectives only. Contact your local IBM office or IBM authorized reseller
for the full text of the specific Statement of Direction.
Some information addresses anticipated future capabilities. Such information is not intended as a
definitive statement of a commitment to specific levels of performance, function or delivery schedules
with respect to any future products. Such commitments are only made in IBM product announcements.
The information is presented here to communicate IBM's current investment and development activities
as a good faith effort to help with our customers' future planning.
IBM may have patents or pending patent applications covering subject matter in this document. The
furnishing of this document does not give you any license to these patents. You can send license
inquiries, in writing, to the IBM Director of Commercial Relations, IBM Corporation, Purchase, NY
10577.
Information concerning non-IBM products was obtained from a supplier of these products, published
announcement material, or other publicly available sources and does not constitute an endorsement of
such products by IBM. Sources for non-IBM list prices and performance numbers are taken from
publicly available information, including vendor announcements and vendor worldwide homepages. IBM
has not tested these products and cannot confirm the accuracy of performance, capability, or any other
claims related to non-IBM products. Questions on the capability of non-IBM products should be
addressed to the supplier of those products.
The following terms, which may or may not be denoted by an asterisk (*) in this publication, are trademarks of the
IBM Corporation.
iSeries or AS/400
C/400
System/370
IPDS
Operating System/400
i5/OS
OS/400
System i5
System i
PS/2
OS/2
DB2
COBOL/400
RPG/400
CallPath
Application System/400
OfficeVision
Facsimile Support/400
Distributed Relational Database Architecture
Advanced Function Printing
Operational Assistant
DRDA
SQL/400
ImagePlus
VTAM
AFP
Client Series
IBM
SQL/DS
400
CICS
S/370
RPG IV
AIX
Micro-partitioning
POWER
Power™ Systems
PowerPC
APPN
Workstation Remote IPL/400
Advanced Peer-to-Peer Networking
OfficeVision/400
iSeries Advanced Application Architecture
ADSTAR Distributed Storage Manager/400
IBM Network Station
Lotus, Lotus Notes, Lotus Word Pro, Lotus 1-2-3
POWER4+
POWER5+
SystemView
ValuePoint
DB2/400
ADSM/400
AnyNet/400
POWER4
POWER5
POWER6
POWER6+
Power™ Systems Software
The following terms, which may or may not be denoted by a double asterisk (**) in this publication, are trademarks
or registered trademarks of other companies as follows:
TPC Benchmark                      Transaction Processing Performance Council
TPC-A, TPC-B                       Transaction Processing Performance Council
TPC-C, TPC-D                       Transaction Processing Performance Council
ODBC, Windows NT Server, Access    Microsoft Corporation
Visual Basic, Visual C++           Microsoft Corporation
Adobe PageMaker                    Adobe Systems Incorporated
Borland Paradox                    Borland International Incorporated
CorelDRAW!                         Corel Corporation
Paradox                            Borland International
WordPerfect                        Satellite Software International
BEST/1                             BGS Systems, Inc.
NetWare                            Novell, Inc.
Novell                             Novell, Inc.
Compaq                             Compaq Computer Corporation
Proliant                           Compaq Computer Corporation
BAPCo                              Business Application Performance Corporation
Harvard Graphics                   Software Publishing Corporation
HP-UX                              Hewlett Packard Corporation
HP 9000                            Hewlett Packard Corporation
INTERSOLV                          Intersolve, Inc.
Q+E                                Intersolve, Inc.
SPEC                               Systems Performance Evaluation Cooperative
UNIX                               UNIX Systems Laboratories
WordPerfect                        WordPerfect Corporation
Powerbuilder                       Powersoft Corporation
SQLWindows                         Gupta Corporation
NetBench                           Ziff-Davis Publishing Company
DEC Alpha                          Digital Equipment Corporation
Microsoft, Windows, Windows 95, Windows NT, Internet Explorer, Word, Excel, and Powerpoint, and the Windows logo are
trademarks of Microsoft Corporation in the United States, other countries, or both.
Intel, Intel Inside (logos), MMX and Pentium are trademarks of Intel Corporation in the United States, other countries, or both.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Other company, product or service names may be trademarks or service marks of others.
Purpose of this Document
This document provides guidance on IBM i operating system performance, capacity planning
information, and tips for obtaining optimal performance with the IBM i operating system. It is
typically updated with each new release, or more often if needed.
This October 2008 edition of the IBM i V6.1 Performance Capabilities Reference Guide is an update to
the April 2008 edition to reflect new product functions announced on October 7, 2008.
This edition includes performance information on newly announced IBM Power Systems including
Power 520 and Power 550, utilizing POWER6 processor technology. This document further includes
information on IBM System i 570 using POWER6 processor technology, IBM i5/OS running on IBM
BladeCenter JS22 using POWER6 processor technology, recent System i5 servers (model 515, 525, and
595) featuring new user-based licensing for the 515 and 525 models and a new 2.3GHz model 595, DB2
UDB for iSeries SQL Query Engine Support, WebSphere Application Server including WAS V6.1 both
with the Classic VM and the IBM Technology for Java (32-bit) VM, WebSphere Host Access
Transformation Services (HATS) including the IBM WebFacing Deployment Tool with HATS
Technology (WDHT), PHP - Zend Core for i, Java including Classic JVM (64-bit), IBM Technology for
Java (32-bit), IBM Technology for Java (64-bit) and bytecode verification, Cryptography, Domino 7,
Workplace Collaboration Services (WCS), RAID6 versus RAID5 disk comparisons, new internal storage
adapters, Virtual Tape, and IPL Performance.
The wide variety of applications available makes it extremely difficult to describe a "typical" workload.
The data in this document is the result of measuring or modeling certain application programs in very
specific and unique configurations, and should not be used to predict specific performance for other
applications. The performance of other applications can be predicted using a system sizing tool such as
IBM Systems Workload Estimator (refer to Chapter 22 for more details on Workload Estimator).
Chapter 1. Introduction
IBM System i and IBM System p platforms unified the value of their servers into a single,
powerful lineup of servers based on industry leading POWER6 processor technology with
support for IBM i operating system (formerly known as i5/OS), IBM AIX and Linux for Power.
Following along with this exciting unification are a number of naming changes to the formerly
named i5/OS, now officially called IBM i operating system. Specifically, recent versions of the
operating system are referred to as IBM i operating system 6.1 and IBM i operating system 5.4,
formerly i5/OS V6R1 and i5/OS V5R4 respectively. Shortened forms of the new operating
system name are IBM i 6.1, i 6.1, i V6.1, i V6R1, and sometimes simply ‘i’. As always,
references to legacy hardware and software will commonly use the naming conventions of the
time.
The Power 520 Express Edition is the entry member of the Power Systems portfolio, supporting
both IBM i 5.4 and IBM i 6.1. The System i 570 is enhanced to enable medium and large
enterprises to grow and extend their IBM i business applications more affordably and with more
granularity, while offering effective and scalable options for deploying Linux and AIX
applications on the same secure, reliable system.
The IBM Power 570 running IBM i offers IBM's fastest POWER6 processors in 2 to 16-way
configurations, plus an array of other technology advances. It is designed to deliver outstanding
price/performance, mainframe-inspired reliability and availability features, flexible capacity
upgrades, and innovative virtualization technologies. New 5.0GHz and 4.4GHz POWER6
processors use the very latest 64-bit IBM POWER processor technology. Each 2-way 570
processor card contains one two-core chip (two processors) and comes with 32 MB of L3 cache
and 8 MB of L2 cache.
The CPW ratings for systems with POWER6 processors are approximately 70% higher than
equivalent POWER5 systems and approximately 30% higher than equivalent POWER5+
systems. For some compute-intensive applications, the new System i 570 can deliver up to twice
the performance of the original 570 with 1.65 GHz POWER5 processors.
The 515 and 525 models, introduced in April 2007, offer user-based licensing for IBM i. For
assistance in determining the required number of user licenses, see the licensing information for
these models. User-based licensing is not a
replacement for system sizing; instead, user-based licensing enables appropriate user
connectivity to the system. Application environments require different amounts of system
resources per user. See Chapter 22 (IBM Systems Workload Estimator) for assistance in system
sizing.
Customers who wish to remain with their existing hardware but want to move to IBM i 6.1 may
find functional and performance improvements. IBM i 6.1 continues to help protect the
customer's investment while providing more function and better price/performance over previous
versions. The primary public performance information web site is found at:
Chapter 2. iSeries and AS/400 RISC Server Model Performance Behavior
2.1 Overview
iSeries and AS/400 servers are intended for use primarily in client/server or other non-interactive work
environments such as batch, business intelligence, network computing etc. 5250-based interactive work
can be run on these servers, but with limitations. With iSeries and AS/400 servers, interactive capacity
can be increased with the purchase of additional interactive features. Interactive work is defined as any
job doing 5250 display device I/O. This includes:
All 5250 sessions
RUMBA/400
Any green screen interface
Screen scrapers
Telnet or 5250 DSPT workstations
5250/HTML workstation gateway
PCs using 5250 emulation
Interactive program debugging
PC Support/400 work station function
Interactive subsystem monitors
Twinax printer jobs
BSC 3270 emulation
5250 emulation
Note that printer work that passes through twinax media is treated as interactive, even though there is no
“user interface”. This is true regardless of whether the printer is working in dedicated mode or is printing
spool files from an out queue. Printer activity that is routed over a LAN through a PC print controller is
not considered to be interactive.
This explanation differs from the one found in previous versions of this document, which incorrectly
stated that spooled work was not considered interactive.
As of January 2003, 5250 On-line Transaction Processing (OLTP) replaces the term “interactive” when
referencing interactive CPW or interactive capacity. Also new in 2003, when ordering an iSeries server, the
customer must choose between a Standard Package and an Enterprise Package in most cases. The
Standard Package comes with zero 5250 CPW, and 5250 OLTP workloads are not supported. However,
the Standard Package does support a limited 5250 CPW for a system administrator to manage various
aspects of the server. Multiple administrative jobs will quickly exceed this capability. The Enterprise
Package does not have any limits relative to 5250 OLTP workloads. In other words, 100% of the server
capacity is available for 5250 OLTP applications whenever you need it.
5250 OLTP applications that have been converted with the WebFacing Tool of IBM WebSphere
Development Studio for iSeries require no 5250 CPW when running on V5R2 with model 800, 810, 825,
870, or 890 hardware.
2.1.1 Interactive Indicators and Metrics
Prior to V4R5, there were no system metrics that would allow a customer to determine the overall
interactive feature capacity utilization. It was difficult for the customer to determine how much of the
total interactive capacity was being used and which jobs were consuming interactive capacity. This
became much easier with the system enhancements made in V4R5 and V5R1.
Starting with V4R5, two new metrics were added to the data generated by Collection Services to report
the system's interactive CPU utilization (ref file QAPMSYSCPU). The first metric (SCIFUS) is the
interactive utilization - an average for the interval. Since average utilization does not indicate potential
problems associated with peak activity, a second metric (SCIFTE) reports the amount of interactive
utilization that occurred above threshold. Also, interactive feature utilization was reported when printing
a System Report generated from Collection Services data. In addition, Management Central now
monitors interactive CPU relative to the system/partition capacity.
Also in V4R5, a new operator message, CPI1479, was introduced to indicate that the system has consistently
exceeded the purchased interactive capacity. The message is not issued every time the
capacity is reached, but it will be issued on an hourly basis if the system is consistently at or above the
limit. In V5R2, this message may appear slightly more frequently for 8xx systems, even if there is no
change in the workload. This is because the message event was changed from a point that was beyond the
purchased capacity to the actual capacity for these systems in V5R2.
In V5R1, Collection Services was enhanced to mark all tasks that are counted against interactive capacity
(ref file QAPMJOBMI, field JBSVIF set to ‘1’). It is possible to query this file to understand what tasks
have contributed to the system’s interactive utilization and the CPU utilized by all interactive tasks. Note:
the system’s interactive capacity utilization may not be equal to the utilization of all interactive tasks.
Reasons for this are discussed in Section 2.10, Managing Interactive Capacity.
With the above enhancements, a customer can easily monitor usage of the interactive feature and decide
when an interactive feature upgrade is needed.
2.1.2 Disclaimer and Remaining Sections
The performance information and equations in this chapter represent ideal environments. This
information is presented along with general recommendations to assist the reader to have a better
understanding of the iSeries server models. Actual results may vary significantly.
This chapter is organized into the following sections:
• Server Model Behavior
• Server Model Differences
• Performance Highlights of New Model 7xx Servers
• Performance Highlights of Current Model 170 Servers
• Performance Highlights of Custom Server Models
• Additional Server Considerations
• Interactive Utilization
• Server Dynamic Tuning (SDT)
• Managing Interactive Capacity
• Migration from Traditional Models
• Migration from Server Models
• Dedicated Server for Domino (DSD) Performance Behavior
2.1.3 V5R3
Beginning with V5R3, the processing limitations associated with the Dedicated Server for Domino (DSD)
models have been removed. Refer to section 2.13, “Dedicated Server for Domino Performance
Behavior”, for additional information.
2.1.4 V5R2 and V5R1
There were several new iSeries 8xx and 270 server model additions in V5R1 and the i890 in V5R2.
However, with the exception of the DSD models, the underlying server behavior did not change from
V4R5. All 27x and 8xx models, including the new i890, utilize the same server behavior algorithm that
was announced with the first 8xx models supported by V4R5. For more details on these new models,
please refer to Appendix C, “CPW, CIW and MCU Values for iSeries”.
Five new iSeries DSD models were introduced with V5R1. In addition, V5R1 expanded the capability of
the DSD models with enhanced support of Domino-complementary workloads such as Java Servlets and
WebSphere Application Server. Please refer to Section 2.13, Dedicated Server for Domino Performance
Behavior, for additional information.
2.2 Server Model Behavior
2.2.1 In V4R5 - V5R2
Beginning with V4R5, all 2xx, 8xx and SBx model servers utilize an enhanced server algorithm that
manages the interactive CPU utilization. This enhanced server algorithm may provide significant user
benefit. On prior models, when interactive users exceed the interactive CPW capacity of a system,
additional CPU usage, visible in one or more CFINT tasks, reduces system capacity for all users including
client/server. New in V4R5, the system attempts to hold interactive CPU utilization below the threshold
where CFINT CPU usage begins to increase. Only in cases where interactive demand exceeds the
limitations of the interactive capacity for an extended time (for example, from long-running,
CPU-intensive transactions) will overhead be visible via the CFINT tasks. Highlights of this new
algorithm include the following:
• As interactive users exceed the installed interactive CPW capacity, the response times of those
applications may significantly lengthen and the system will attempt to manage these interactive
excesses below a level where CFINT CPU usage begins to increase. Generally, increased CFINT
may still occur but only for transient periods of time. Therefore, there should be remaining system
capacity available for non-interactive users of the system even though the interactive capacity has
been exceeded. It is still a good practice to keep interactive system use below the system interactive
CPW threshold to avoid long interactive response times.
• Client/server users should be able to utilize most of the remaining system capacity even though the
interactive users have temporarily exceeded the maximum interactive CPW capacity.
• The iSeries Dedicated Server for Domino models behave similarly when the Non Domino CPW
capacity has been exceeded (i.e. the system attempts to hold Non Domino CPW capacity below the
threshold where CFINT overhead is normally activated). Thus, Domino users should be able to run in
the remaining system capacity available.
• With the advent of the new server algorithm, there is no longer a concept of an interactive knee or
interactive cap. The system simply attempts to manage the interactive CPU utilization to the level of the
interactive CPW capacity.
• Dynamic priority adjustment (system value QDYNPTYADJ) will not have any effect managing the
interactive workloads as they exceed the system interactive CPW capacity. On the other hand, it
won’t hurt to have it activated.
IBM i 6.1 Performance Capabilities Reference - January/April/October 2008
© Copyright IBM Corp. 2008 Chapter 2 - Server Performance Behavior
17
Download from Www.Somanuals.com. All Manuals Search And Download.
• The new server algorithm only applies to the new hardware available in V4R5 (2xx, 8xx and SBx
models). The behavior of all other hardware, such as the 7xx models, is unchanged (see section 2.2.3,
Existing Older Models, for the 7xx algorithm).
2.2.2 Choosing Between Similarly Rated Systems
Sometimes it is necessary to choose between two systems that have similar CPW values but different
processor megahertz (MHz) values or L2 cache sizes. If your applications tend to be compute intensive
such as Java, WebSphere, EJBs, and Domino, you may want to go with the faster MHz processors
because you will generally get faster response times. However, if your response times are already
sub-second, it is not likely that you will notice the response time improvements. If your applications tend
to be L2 cache friendly such as many traditional commercial applications are, you may want to choose the
system that has the larger L2 cache. In either case, you can use the IBM eServer Workload Estimator to
help determine which system best meets your performance requirements.
2.2.3 Existing Older Models
Server model behavior applies to:
• AS/400 Advanced Servers
• AS/400e servers
• AS/400e custom servers
• AS/400e model 150
• iSeries model 170
• iSeries model 7xx
Relative performance measurements are derived from commercial processing workload (CPW) on iSeries
and AS/400. CPW is representative of commercial applications, particularly those that do significant
database processing in conjunction with journaling and commitment control.
Traditional (non-server) AS/400 system models had a single CPW value which represented the maximum
workload that can be applied to that model. This CPW value was applicable to either an interactive
workload, a client/server workload, or a combination of the two.
Now there are two CPW values. The larger value represents the maximum workload the model could
support if the workload were entirely client/server (i.e. no interactive components). This CPW value is for
the processor feature of the system. The smaller CPW value represents the maximum workload the model
could support if the workload were entirely interactive. For 7xx models this is the CPW value for the
interactive feature of the system.
The two CPW values are NOT additive - interactive processing will reduce the system's
client/server processing capability. When 100% of client/server CPW is being used, there is no CPU
available for interactive workloads. When 100% of interactive capacity is being used, there is no CPU
available for client/server workloads.
For model 170s announced in 9/98 and all subsequent systems, the published interactive CPW represents
the point (the "knee of the curve") where the interactive utilization may cause increased overhead on the
system. (As will be discussed later, this threshold point (or knee) is at a different value for previously
announced server models). Up to the knee the server/batch capacity is equal to the processor capacity
(CPW) minus the interactive workload. As interactive requirements grow beyond the knee, overhead
grows at a rate which can eventually eliminate server/batch capacity and limit additional interactive
growth. It is best for interactive workloads to execute below (less than) the knee of the curve.
(However, for those models having the knee at 1/3 of the total interactive capacity, satisfactory
performance can be achieved.) The following graph illustrates these points.
[Figure: “Model 7xx and 9/98 Model 170 CPU Distribution vs. Interactive Utilization.” X-axis: fraction of
interactive CPW, from 0 through the announced capacity (“Stop Here!”) to 7/6 of it; y-axis: CPU percent,
0-100. Below the knee at the announced capacity, the CPU divides between interactive work and capacity
available for client/server; beyond the knee, overhead consumes the remainder. Applies to: Model 170
announced in 9/98 and ALL systems announced on or after 2/99.]
Figure 2.1. Server Model behavior
The figure above shows a straight line for the effective interactive utilization. Real/customer
environments will produce a curved line since most environments will be dynamic, due to job initiation,
interrupts, etc.
In general, a single interactive job will not cause a significant impact to client/server performance.
Microcode task CFINTn, for all iSeries models, handles interrupts, task switching, and other similar
system overhead functions. For the server models, when interactive processing exceeds a threshold
amount, the additional overhead required will be manifest in the CFINTn task. Note that a single
interactive job will not incur this overhead.
There is one CFINTn task for each processor. For example, on a single processor system only CFINT1
will appear. On an 8-way processor, system tasks CFINT1 through CFINT8 will appear. It is possible to
see significant CFINT activity even when server/interactive overhead does not exist, for example, when
there is a large amount of synchronous or communications I/O, or many jobs with frequent task switches.
The effective interactive utilization (EIU) for a server system can be defined as the usable interactive
utilization plus the total CFINT utilization.
2.3 Server Model Differences
Server models were designed for a client/server workload and to accommodate an interactive workload.
When the interactive workload exceeds an interactive CPW threshold (the “knee of the curve”) the
client/server processing performance of the system becomes increasingly impacted at an accelerating rate
beyond the knee as interactive workload continues to build. Once the interactive workload reaches the
maximum interactive CPW value, all the CPU cycles are being used and there is no capacity available for
handling client/server tasks.
Custom server models interact with batch and interactive workloads similar to the server models but the
degree of interaction and priority of workloads follows a different algorithm and hence the knee of the
curve for workload interaction is at a different point which offers a much higher interactive workload
capability compared to the standard server models.
For the server models the knee of the curve is approximately:
• 100% of interactive CPW for:
  - iSeries model 170s announced on or after 9/98
  - 7xx models
• 6/7 (86%) of interactive CPW for:
  - AS/400e custom servers
• 1/3 of interactive CPW for:
  - AS/400 Advanced Servers
  - AS/400e servers
  - AS/400e model 150
  - iSeries model 170s announced in 2/98
For the 7xx models the interactive capacity is a feature that can be sized and purchased like any other
feature of the system (i.e. disk, memory, communication lines, etc.).
The following charts show the CPU distribution vs. interactive utilization for Custom Server and pre-2/99
Server models.
[Figure: “Custom Server Model CPU Distribution vs. Interactive Utilization.” X-axis: fraction of
interactive CPW, from 0 through the knee at 6/7 to Full; y-axis: CPU percent, 0-100. Below the knee, the
CPU divides between interactive work and capacity available for client/server; beyond the knee, CFINT
overhead grows. Applies to: AS/400e Custom Servers, AS/400e Mixed Mode Servers.]
Figure 2.2. Custom Server Model behavior
[Figure: “Server Model CPU Distribution vs. Interactive Utilization.” X-axis: fraction of interactive CPW,
from 0 through the knee at 1/3 Int-CPW to Full Int-CPW; y-axis: CPU percent, 0-100. Below the knee, the
CPU divides between interactive work and capacity available for client/server; beyond the knee, CFINT
overhead grows. Applies to: AS/400 Advanced Servers, AS/400e servers, Model 150, Model 170s
announced in 2/98.]
Figure 2.3. Server Model behavior
2.4 Performance Highlights of Model 7xx Servers
7xx models were designed to accommodate a mixture of traditional “green screen” applications and more
intensive “server” environments. Interactive features may be upgraded if additional interactive capacity is
required. This is similar to disk, memory, or other features.
Each system is rated with a processor CPW which represents the relative performance (maximum
capacity) of a processor feature running a commercial processing workload (CPW) in a client/server
environment. Processor CPW is achievable when the commercial workload is not constrained by main
storage or DASD.
Each system may have one of several interactive features. Each interactive feature has an interactive
CPW associated with it. Interactive CPW represents the relative performance available to perform
host-centric (5250) workloads. The amount of interactive capacity consumed will reduce the available
processor capacity by the same amount. The following example will illustrate this performance capacity
interplay:
[Figure: “Model 7xx and 9/98 Model 170 CPU Distribution vs. Interactive Utilization,” for a Model 7xx
processor FC 206B (240 / 70 CPW). X-axis: percent of published interactive CPU, from 0 to 117% (7/6);
y-axis: CPU percent, 0-100. The knee falls at 29.2% of total CPU, where the announced capacity (“Stop
Here!”) lies; saturation occurs at 34%, where CFINT overhead has consumed all remaining capacity.
Applies to: Model 170 announced in 9/98 and ALL systems announced on or after 2/99.]
Figure 2.4. Model 7xx Utilization Example
At 110% of the published interactive CPU, or 32.1% of total CPU, CFINT will use an
additional 39.8% (approximate) of the total CPU, yielding an effective interactive CPU utilization of
approximately 71.9%. This leaves approximately 28.1% of the total CPU available for client/server
work. Note that the CPU is completely utilized once the interactive workload reaches about 34%.
(CFINT would use approximately 66% CPU). At this saturation point, there is no CPU available for
client/server.
2.5 Performance Highlights of Model 170 Servers
iSeries Dedicated Server for Domino models will be generally available on September 24, 1999. Please
refer to Section 2.13, iSeries Dedicated Server for Domino Performance Behavior, for additional
information.
Model 170 servers (features 2289, 2290, 2291, 2292, 2385, 2386 and 2388) are significantly more
powerful than the previous Model 170s announced in Feb. '98. They have a faster processor (262MHz vs.
125MHz) and more main memory (up to 3.5GB vs. 1.0GB). In addition, the interactive workload
balancing algorithm has been improved to provide a linear relationship between the client/server (batch)
and published interactive workloads as measured by CPW.
The CPW rating for the maximum client/server workload now reflects the relative processor capacity
rather than the "system capacity" and therefore there is no need to state a "constrained performance"
CPW. This is because some workloads will be able to run at processor capacity if they are not DASD,
memory, or otherwise limited.
Just like the model 7xx, the current model 170s have a processor capacity (CPW) value and an
interactive capacity (CPW) value. These values behave in the same manner as described in the
Performance highlights of new model 7xx servers section.
As interactive workload is added to the current model 170 servers, the remaining available client/server
(batch) capacity available is calculated as: CPW (C/S batch) = CPW(processor) - CPW(interactive)
This is valid up to the published interactive CPW rating. As long as the interactive CPW workload does
not exceed the published interactive value, interactive and client/server (batch) workloads will both be
optimized for best performance. Bottom line: customers can use the entire interactive capacity with no
impact to client/server (batch) workload response times.
On the current model 170s, if the published interactive capacity is exceeded, system overhead grows
very quickly, and the client/server (batch) capacity is quickly reduced and becomes zero once the
interactive workload reaches 7/6 of the published interactive CPW for that model.
The absolute limit for dedicated interactive capacity on the current models can be computed by
multiplying the published interactive CPW rating by a factor of 7/6. The absolute limit for dedicated
client/server (batch) is the published processor capacity value. This assumes that sufficient disk and
memory as well as other system resources are available to fit the needs of the customer's programs, etc.
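
The following is a minimal Python sketch of this arithmetic; the ratings used (460 processor CPW, 50
interactive CPW, 30 CPW of interactive load) are illustrative assumptions, not the published ratings of
any particular feature.

# Sketch of the model 170 capacity arithmetic described above.
# Ratings below are illustrative; substitute the published values
# for the feature being evaluated.

def remaining_batch_cpw(processor_cpw, interactive_cpw_used):
    """Client/server (batch) CPW available while the interactive load
    stays at or below the published interactive rating."""
    return processor_cpw - interactive_cpw_used

def absolute_interactive_limit(published_interactive_cpw):
    """Dedicated interactive limit: 7/6 of the published rating."""
    return published_interactive_cpw * 7 / 6

# Hypothetical 170 rated at 460 processor CPW / 50 interactive CPW:
print(remaining_batch_cpw(460, 30))              # 430 CPW left for batch work
print(round(absolute_interactive_limit(50), 1))  # 58.3 CPW absolute limit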
Customer workloads that would require more than 10 disk arms for optimum performance should not be
expected to give optimum performance on the model 170, as 10 disk access arms is the maximum
configuration.
When the model 170 servers are running less than the published interactive workload, no Server Dynamic
Tuning (SDT) is necessary to achieve balanced performance between interactive and client/server (batch)
workloads. However, as with previous server models, a system value (QDYNPTYADJ - Server Dynamic
Tuning ) is available to determine how the server will react to work requests when interactive workload
exceeds the "knee". If the QDYNPTYADJ value is turned on, client/server work is favored over
additional interactive work. If it is turned off, additional interactive work is allowed at the expense of
low-priority client/server work. QDYNPTYADJ only affects the server when interactive requirements
exceed the published interactive capacity rating. The shipped default value is for QDYNPTYADJ to be
turned on.
The next chart shows the performance capacity of the current and previous Model 170 servers.
[Figure: bar chart “Previous vs. Current AS/400e server 170 Performance,” plotting processor and
interactive CPW ratings on a 0-1200 scale (the largest current processor rating shown is 1090) for the
previous Model 170 features 2159, 2160, 2164, 2176 and 2183 (unconstrained V4R2 rates) and the current
features 2289, 2290, 2291, 2292, 2385, 2386 and 2388.]
Figure 2.5. Previous vs. Current Server 170 Performance
2.6 Performance Highlights of Custom Server Models
Custom server models were available in releases V4R1 through V4R3. They interact with batch and
interactive workloads similar to the server models but the degree of interaction and priority of workloads
is different, and the knee of the curve for workload interaction is at a different point. When the
interactive workload exceeds approximately 6/7 of the maximum interactive CPW (the knee of the curve),
the client/server processing performance of the system becomes increasingly impacted. Once the
interactive workload reaches the maximum interactive CPW value, all the CPU cycles are being used and
there is no capacity available for handling client/server tasks.
2.7 Additional Server Considerations
It is recommended that the System Operator job run at runpty(9) or less. This is because the possibility
exists that runaway interactive jobs will force server/interactive overhead to their maximum. At this
point it is difficult to initiate a new job and one would need to be able to work with jobs to hold or cancel
runaway jobs.
You should monitor the interactive activity closely. To do this, take advantage of PM/400, or run
Collection Services nearly continuously and query the monitor database each day for high interactive use
and higher than normal CFINT values. The goal is to avoid exceeding the threshold (knee of the curve)
value of interactive capacity.
2.8 Interactive Utilization
When the interactive CPW utilization is beyond the knee of the curve, the following formulas can be used
to determine the effective interactive utilization or the available/remaining client/server CPW. These
equations apply to all server models.
CPWcs(maximum) = client/server CPW maximum value
CPWint(maximum) = interactive CPW maximum value
CPWint(knee) = interactive CPW at the knee of the curve
CPWint = interactive CPW of the workload

X is the ratio that says how far into the overhead zone the workload has extended:
X = (CPWint - CPWint(knee)) / (CPWint(maximum) - CPWint(knee))

EIU = Effective interactive utilization. In other words, the free running, CPWint(knee), interactive plus the
combination of interactive and overhead generated by X:
EIU = CPWint(knee) + (X * (CPWcs(maximum) - CPWint(knee)))

CPW remaining for batch = CPWcs(maximum) - EIU
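
A minimal Python sketch of these formulas follows, stated in percent of total CPU so that
CPWcs(maximum) corresponds to 100; the inputs reproduce Example 1 below.

# Sketch of the EIU formulas above, expressed in percent of total CPU
# as in the worked examples that follow.

def effective_interactive_utilization(int_pct, knee_pct, max_pct):
    """EIU once interactive utilization (int_pct) exceeds the knee;
    max_pct is the maximum interactive CPU percent (7/6 of the knee
    for 7xx models and current 170s)."""
    x = (int_pct - knee_pct) / (max_pct - knee_pct)  # depth into the overhead zone
    return knee_pct + x * (100 - knee_pct)

# Example 1 inputs: Processor CPW 240, Interactive CPW 70.
knee = 70 / 240 * 100    # 29.2% of total CPU at the knee
peak = knee * 7 / 6      # 34% of total CPU at saturation
eiu = effective_interactive_utilization(32.1, knee, peak)
print(round(eiu, 1))        # 71.9 - effective interactive utilization
print(round(100 - eiu, 1))  # 28.1 - CPU remaining for batch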
Example 1:
A model 7xx server has a Processor CPW of 240 and an Interactive CPW of 70.
The interactive CPU percent at the knee equals (70 CPW / 240 CPW) or 29.2%.
The maximum interactive CPU percent (7/6 of the Interactive CPW ) equals (81.7 CPW / 240 CPW) or
34%.
Now if the interactive CPU is held to less than 29.2% CPU (the knee), then the CPU available for the
System, Batch, and Client/Server work is 100% - the Interactive CPU used.
If the interactive CPU is allowed to grow above the knee, say for example 32.1 % (110% of the knee),
then the CPU percent remaining for the Batch and System is calculated using the formulas above:
X = (32.1 - 29.2) / (34 - 29.2) = .604
EIU = 29.2 + (.604 * (100 - 29.2)) = 71.9%
CPW remaining for batch = 100 - 71.9 = 28.1%
Note that a swing of + or - 1% interactive CPU yields a swing of effective interactive utilization (EIU)
from 57% to 87%. Also note that on custom servers and 7xx models, environments that go beyond the
interactive knee may experience erratic behavior.
Example 2:
A Server Model has a Client/Server CPW of 450 and an Interactive CPW of 50.
The maximum interactive CPU percent equals (50 CPW / 450 CPW) or 11%.
The interactive CPU percent at the knee is 1/3 the maximum interactive value. This would equal 4%.
Now if the interactive CPU is held to less than 4% CPU (the knee), then the CPU available for the
System, Batch, and Client/Server work is 100% - the Interactive CPU used.
If the interactive CPU is allowed to grow above the knee, say for example 9% (or 41 CPW), then the CPU
percent remaining for the Batch and System is calculated using the formulas above:
X = (9 - 4) / (11 - 4) = .71 (fraction into the overhead area)
EIU = 4 + (.71 * (100 - 4)) = 72%
CPW remaining for batch = 100 - 72 = 28%
Note that a swing of + or - 1% interactive CPU yields a swing of effective interactive utilization (EIU)
from 58% to 86%.
On earlier server models, the dynamics of the interactive workload beyond the knee is not as abrupt, but
because there is typically less relative interactive capacity the overhead can still cause inconsistency in
response times.
2.9 Server Dynamic Tuning (SDT)
Logic was added in V4R1 and is still in use today so customers could better control the impact of
interactive work on their client/server performance. Note that with the new Model 170 servers (features
2289, 2290, 2291, 2292, 2385, 2386 and 2388) this logic only affects the server when interactive
requirements exceed the published interactive capacity rating. For further details see the section,
Performance highlights of current model 170 servers.
Through dynamic prioritization, all interactive jobs will be put lower in the priority queue, approximately
at the knee of the curve. Placing the interactive jobs at a lesser priority causes the interactive jobs to slow
down, and more processing power to be allocated to the client/server processing. As the interactive jobs
receive less processing time, their impact on client/server processing will be lessened. When the
interactive jobs are no longer impacting client/server jobs, their priority will dynamically be raised again.
The dynamic prioritization acts as a regulator which can help reduce the impact to client/server
processing when additional interactive workload is placed on the system. In most cases, this results in
better overall throughput when operating in a mixed client/server and interactive environment, but it can
cause a noticeable slowdown in interactive response.
To fully enable SDT, customers MUST use a non-interactive job run priority (RUNPTY parameter) value
of 35 or less (which raises the priority, closer to the default priority of 20 for interactive jobs).
Changing the existing non-interactive job’s run priority can be done either through the Change Job
(CHGJOB) command or by changing the RUNPTY value of the Class Description object used by the
non-interactive job. This includes IBM-supplied or application provided class descriptions.
Examples of IBM-supplied class descriptions with a run priority value higher than 35 include QBATCH
and QSNADS and QSYSCLS50. Customers should consider changing the RUNPTY value for
QBATCH and QSNADS class descriptions or changing subsystem routing entries to not use class
descriptions QBATCH, QSNADS, or QSYSCLS50.
If customers modify an IBM-supplied class description, they are responsible for ensuring the priority
value is 35 or less after each new release or cumulative PTF package has been installed. One way to do
this is to include the Change Class (CHGCLS) command in the system Start Up program.
NOTE: Several IBM-supplied class descriptions already have RUNPTY values of 35 or less. In these
cases no user action is required. One example of this is class description QPWFSERVER with
RUNPTY(20). This class description is used by Client Access database server jobs QZDAINIT (APPC)
and QZDASOINIT (TCP/IP).
The system deprioritizes jobs according to groups or "bands" of RUNPTY values. For example, 10-16 is
band 1, 17-22 is band 2, 23-35 is band 3, and so on.
Interactive jobs with priorities 10-16 are an exception case with V4R1. Their priorities will not be
adjusted by SDT. These jobs will always run at their specified 10-16 priority.
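
As an illustrative sketch of this demotion scheme: the 10-16, 17-22, 23-35, 47-51 and 52-89 bands come
from this section, while the 36-46 band is an assumption filling the “and so on”.

# Illustrative sketch of SDT band demotion. Bands 10-16, 17-22, 23-35,
# 47-51 and 52-89 are named in this section; the 36-46 band is an
# assumption filling the "and so on".

BANDS = [(10, 16), (17, 22), (23, 35), (36, 46), (47, 51), (52, 89)]

def demote_one_band(runpty):
    """Return the band SDT would move a job into when the knee is
    exceeded. Jobs at priority 10-16 are exempt and never adjusted;
    the 52-89 band is the lowest a job can be pushed."""
    for i, (lo, hi) in enumerate(BANDS):
        if lo <= runpty <= hi:
            if i == 0:
                return (lo, hi)  # 10-16: exempt, never reprioritized
            return BANDS[min(i + 1, len(BANDS) - 1)]
    raise ValueError("priority outside the banded range")

print(demote_one_band(20))  # (23, 35): an interactive job at 20 drops one band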
When only a single interactive job is running, it will not be dynamically reprioritized.
When the interactive workload exceeds the knee of the curve, the priority of all interactive jobs is
decreased one priority band, as defined by the Dynamic Priority Scheduler, every 15 seconds. If needed,
the priority will be decreased to the 52-89 band. Then, if/when the interactive CPW work load falls
below the knee, each interactive job's priority will gradually be reset to its starting value when the job is
dispatched.
If the priority of non-interactive jobs is not set to 35 or lower, SDT still works, but its effectiveness is
greatly reduced, resulting in server behavior more like V3R6 and V3R7. That is, once the knee is
exceeded, interactive priority is automatically decreased. Assuming non-interactive is set at priority 50,
interactive could eventually get decreased to the 52-89 priority band. At this point, the processor is
slowed and interactive and non-interactive are running at about the same priority. (There is little priority
difference between 47-51 band and the 52-89 band.) If the Dynamic Priority Scheduler is turned off,
SDT is also turned off.
Note that even with SDT, the underlying server behavior is unchanged. Customers get no more CPU
cycles for either interactive or non-interactive jobs. SDT simply tries to regulate interactive jobs once
they exceed the knee of the curve.
Obviously systems can still easily exceed the knee and stay above it, by having a large number of
interactive jobs, by setting the priority of interactive jobs in the 10-16 range, by having a small
client/server workload with a modest interactive workload, etc. The entire server behavior is a partnership
with customers to give non-interactive jobs the bulk of the CPU while not entirely shutting out
interactive.
To enable the Server Dynamic Tuning enhancement, ensure the following system values are on (the
shipped defaults are that they are set on):
• QDYNPTYSCD - this improves the job scheduling based on job impact on the system.
• QDYNPTYADJ - this uses the scheduling tool to shift interactive priorities after the threshold is
reached.
The Server Dynamic Tuning enhancement is most effective if the batch and client/server priorities are in
the range of 20 to 35.
Server Dynamic Tuning Recommendations
On the new systems and mixed-mode servers, have the QDYNPTYSCD and QDYNPTYADJ system
values set on. This preserves non-interactive capacities, and the interactive response times will be dynamic
beyond the knee regardless of the setting. Also set non-interactive class run priorities to less than 35.
On earlier servers and 2/98 model 170 systems, use your interactive requirements to determine the
settings. For “pure interactive” environments, turn the QDYNPTYADJ system value off. In mixed
environments with important non-interactive work, leave the values on and change the run priority of
important non-interactive work to be less than 35.
Effects of Server Dynamic Tuning
[Figure: two charts of CPU distribution vs. fraction of interactive CPW (0 to 1/3 Int-CPW to Full
Int-CPW) with Server Dynamic Tuning active. Mixed “server” demand: without high “server” demand,
interactive work is allowed to grow to its limit. High “server” demand: with sufficient batch or
client/server load, interactive work is constrained to the knee level by priority degradation; overhead is
introduced just as when Dynamic Priority Adjust is turned off, and interactive suffers poor response
times.]
Figure 2.6.
2.10 Managing Interactive Capacity
Interactive/Server characteristics in the real world.
Graphs and formulas listed thus far work perfectly, provided the workload on the system is highly regular
and steady in nature. Of course, very few systems have workloads like that. The more typical case is a
dynamic combination of transaction types, user activity, and batch activity. There may very well be cases
where the interactive activity exceeds the documented limits of the interactive capacity, yet decreases
quickly enough so as not to seriously affect the response times for the rest of the workload. On the other
hand, there may also be some intense transactions that force the interactive activity to exceed the
documented limits of the interactive feature for a period of time, even though the average CPU utilization
appears to be less than these documented limits.
For 7xx systems, current 170 systems, and mixed-mode servers, a goal should be set to only rarely exceed
the threshold value for interactive utilization. This will deliver the most consistent performance for both
interactive and non-interactive work.
The questions that need to be answered are:
1. “How do I know whether my system is approaching the interactive limits or not?”
2. “What is viewed as ‘interactive’ by the system?”
3. “How close to the threshold can a system get without disrupting performance?”
This section attempts to answer these questions.
Observing Interactive CPU utilization
The most commonly available method for observing interactive utilization is Collection Services used in
conjunction with the Performance Tools program product. The monitor collects system data as well as
data for each job on the system, including the CPU consumed and the type of job. Interactive utilization
can be determined by examining the reports generated by the Performance Tools product, or by writing a
query against the data in the various performance database files (a query sketch follows the list below).
Note: data is written to these files based on the sample interval (the smallest is 5 minutes; the default is 15
minutes). This data is an average for the duration of a measurement interval.
1. The first metric of interest is how much of the system’s interactive capacity has been used. The file
QAPMSYSCPU field SCIFUS contains the amount of interactive feature CPU time used. This metric
became available with Collection Services in V4R5.
2. Even though average CPU may be reasonable your interactive workload may still be exceeding limits
at times. The file QAPMSYSCPU field SCIFTE contains the amount of time the interactive threshold
was exceeded during the interval. This metric became available with Collection Services in V4R5.
3. To determine what jobs are responsible for interactive feature consumption, you can look at the data
in QAPMJOBL (Collection Services) or QAPMJOBS (Performance Monitor):
• If using Collection Services on a V5R1 or later system, those jobs which the machine considers to
be interactive are indicated by the field JBSVIF = ’1’. These are all jobs that could contribute to
your interactive feature utilization.
• In all cases you can examine the jobs that are considered interactive by OS/400 as indicated by
field JBTYPE = “I”. Although not totally accurate, in most cases this will provide an adequate list
of jobs that contributed to interactive utilization.
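
A sketch of such a query from Python follows; the ODBC DSN and the QPFRDATA collection library
are placeholder assumptions for your environment, while the file and field names are the ones given
above.

# Sketch: query Collection Services data for interactive-capacity metrics.
# The DSN and the QPFRDATA library are placeholders for your environment.
import pyodbc

conn = pyodbc.connect("DSN=MYSYSTEM")   # hypothetical connection
cur = conn.cursor()

# Interactive feature CPU used (SCIFUS) and time spent over the
# threshold (SCIFTE), one row per collection interval (V4R5 and later).
cur.execute("SELECT SCIFUS, SCIFTE FROM QPFRDATA.QAPMSYSCPU")
for scifus, scifte in cur.fetchall():
    print(f"feature CPU used: {scifus}  time over threshold: {scifte}")

# Count the jobs the machine charged against the interactive feature
# (V5R1 and later).
cur.execute("SELECT COUNT(*) FROM QPFRDATA.QAPMJOBL WHERE JBSVIF = '1'")
print("interactive-feature jobs:", cur.fetchone()[0])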
There are other means for determining interactive utilization. The easiest of these is the performance
monitoring function of Management Central, which became available with V4R3. Management Central
can provide:
• Graphical, real-time monitoring of interactive CPU utilization
• Creation of an alert threshold, when an alert feature is turned on and the graph is highlighted
• Creation of a reverse threshold below which the highlights are turned off
• Multiple methods of handling the alert, from a console message to the execution of a command to the
forwarding of the alert to another system.
By taking the ratio of the Interactive CPW rating and the Processor CPW rating for a system, one can
determine at what CPU percentage the threshold is reached (This ratio works for the 7xx models and the
current model 170 systems. For earlier models, refer to other sections of this document to determine what
fraction of the Interactive CPW rating to use.) Depending on the workload, an alert can be set at some
percentage of this level to send a warning that it may be time to redistribute the workload or to consider
upgrading the interactive feature.
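
A small Python sketch of that calculation follows; the 70%-of-threshold alert point follows the guideline
given later in this section, and the ratings used are the example 7xx values from earlier, not values tied to
any particular system.

# Sketch: CPU percentage at which the interactive threshold is reached,
# plus an alert level set below it (the 70% guideline appears later in
# this section; pick a fraction that suits the workload's volatility).

def interactive_threshold_pct(interactive_cpw, processor_cpw):
    """Knee as a percent of total CPU (7xx and current 170 models;
    earlier models use a fraction of the interactive rating)."""
    return interactive_cpw / processor_cpw * 100

threshold = interactive_threshold_pct(70, 240)  # illustrative 7xx ratings
alert_at = 0.70 * threshold
print(f"knee at {threshold:.1f}% CPU; alert at {alert_at:.1f}% CPU")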
Finally, the functions of PM400 can also show the same type of data that Collection Services shows, with
the advantage of maintaining a historical view, and the disadvantage of being only historical. However,
signing up for the PM400 service can yield a benefit in determining the trends of how interactive
capacities are used on the system and whether more capacity may be needed in the future.
Is Interactive really Interactive?
Earlier in this document, the types of jobs that are classified as interactive were listed. In general, these
jobs all have the characteristic that they have a 5250 workstation communications path somewhere within
the job. It may be a 5250 data stream that is translated into html, or sent to a PC for graphical display, but
the work on the iSeries is fundamentally the same as if it were communicating with a real 5250-type
display. However, there are cases where jobs of type “I” may be charged with a significant amount of
work that is not “interactive”. Some examples follow:
• Job initialization: If a substantial amount of processing is done by an interactive job’s initial program,
prior to actually sending and receiving a display screen as a part of the job, that processing may not
be included as a part of the interactive work on the system. However, this may be somewhat rare,
since most interactive jobs will not have long-running initial programs.
• More common will be parallel activities that are done on behalf of an interactive job but are not done
within the job. There are two database-related activities where this may be the case.
1. If the QQRYDEGREE system value is adjusted to allow for parallelism or the CHGQRYA
command is used to adjust it for a single job, queries may be run in service jobs which are not
interactive in nature, and which do not affect the total interactive utilization of the system.
However, the work done by these service jobs is charged back to the interactive job. In this case,
Collection Services and most other mechanisms will all show a higher sum of interactive CPU
utilization than actually occurs. The exception to this is the WRKSYSACT command, which may
show the current activity for the service jobs and/or the activity that they have “charged back” to
the requesting jobs. Thus, in this situation it is possible for WRKSYSACT to show a lower
system CPU utilization than the sum of the CPU consumption for all the jobs.
2. A similar effect can be found with index builds. If parallelism is enabled, index creation (CRTLF,
Create Index, opening a file with MAINT(*REBUILD), or running a query that requires an index to
be built) will be sent to service jobs that operate in non-interactive mode, but charge their work
back to the job that requested the service. Again, the work does not count as “interactive”, but the
performance data will show the resource consumption as if it were.
• Lastly, when only a single interactive job is running, the machine grants an exemption and does not
include this job’s activity in the interactive feature utilization.
There are two key ideas in the statements above. First, if the workload has a significant component that is
related to queries or there is a single interactive job running, it will be possible to show an interactive job
utilization in the performance tools that is significantly higher than what would be assumed and reported
from the ratings of the Interactive Feature and the Processor Feature. Second, although it may make
monitoring interactive utilization slightly more difficult, in the case where the workload has a significant
query component, it may be beneficial to set the QQRYDEGREE system value to allow at least 2
processes, so that index builds and many queries can be run in non-interactive mode. Of course, if the
nature of the query is such that it cannot be split into multiple tasks, the whole query is run inside the
interactive job, regardless of how the system value is set.
How close to the threshold can a system get without disrupting performance?
The answer depends on the dynamics of the workload, the percentage of work that is in queries, and the
projected growth rate. It also may depend on the number of processors and the overall capacity of the
interactive feature installed. For example, a job that absorbs a substantial amount of interactive CPU on a
uniprocessor may easily exceed the threshold, even though the “normal” work on the system is well under
it. On the other hand, the same job on a 12-way can use at most 1/12th of the CPU, or 8.3%. A single,
intense transaction may exceed the limit for a short duration on a small system without adverse effects,
but on a larger system the chances of having multiple intense transactions may be greater.
With all these possibilities, how much of the Interactive feature can be used safely? A good starting point
is to keep the average utilization below about 70% of the threshold value (Use double the threshold value
for the servers and earlier Model 170 systems that use the 1/3 algorithm described earlier in this
document.) If the measurement mechanism averages the utilization over a 15 minute or longer period, or
if the workload has a lot of peaks and valleys, it might be worthwhile to choose a target that is lower than
70%. If the measurement mechanism is closer to real-time, such as with Management Central, and if the
workload is relatively constant, it may be possible to safely go above this mark. Also, with large
interactive features on fairly large processors, it may be possible to safely go to a higher point, because
the introduction of workload dynamics will have a smaller effect on more powerful systems.
As with any capacity-related feature, the best answer will be to regularly monitor the activity on the
system and watch for trends that may require an upgrade in the future. If the workload averages 60% of
the interactive feature with almost no overhead, but when observed at 65% of the feature capacity it
shows some limited amount of overhead, that is a clear indication that a feature upgrade may be required.
This will be confirmed as the workload grows to a higher value, but the proof point will be in having the
historical data to show the trend of the workload.
2.11 Migration from Traditional Models
This section describes a suggested methodology to determine which server model is appropriate to
contain the interactive workload of a traditional model when a migration of a workload is occurring.
It is assumed that the server model will have both interactive and client/server workloads.
To get the same performance and response time, from a CPU perspective, the interactive CPU utilization
of the current traditional model must be known. Traditional CPU utilization can be determined in a
number of ways. One way is to sum up the CPU utilization for interactive jobs shown on the Work with
Active Jobs (WRKACTJOB) command.
***************************************************************************
                            Work with Active Jobs
CPU %:  33.0     Elapsed time:  00:00:00     Active jobs:  152

Type options, press Enter.
  2=Change  3=Hold  4=End  5=Work with  6=Release  7=Display message
  8=Work with spooled files  13=Disconnect ...

Opt  Subsystem/Job  User        Type  CPU %  Function       Status
__   BATCH          QSYS        SBS     0                   DEQW
__   QCMN           QSYS        SBS     0                   DEQW
__   QCTL           QSYS        SBS     0                   DEQW
__     QSYSSCD      QPGMR       BCH     0    PGM-QEZSCNEP   EVTW
__   QINTER         QSYS        SBS     0                   DEQW
__     DSP05        TESTER      INT     0.2  PGM-BUPMENUNE  DSPW
__     QPADEV0021   TEST01      INT     0.7  CMD-WRKACTJOB  RUN
__   QSERVER        QSYS        SBS     0                   DEQW
__     QPWFSERVSD   QUSER       BCH     0                   SELW
__     QPWFSERVS0   QUSER       PJ      0                   DEQW
**************************************************************************
(Calculate the average of the CPU utilization for all job types "INT" for the desired time interval for
interactive CPU utilization - "P" in the formula shown below.)
Another method is to run Collection Services during selected time periods and review the first page of the
Performance Tools for iSeries licensed program Component Report. The following is an example of this
section of the report:
***********************************************************************************
                                Component Report
                           Component Interval Activity
                          Data collected 190396 at 1030
Member . . . : Q960791030    Model/Serial . : 310-2043/10-0751D    Main St...
Library. . : PFR             System name. . : TEST01               Version/Re..

 ITV     Tns/hr    Rsp/Tns    CPU %    CPU %    CPU %    Disk I/O    Disk I/O
 End                          Total    Inter    Batch    per sec     per sec
                                                         Sync        Async
10:36    6,164     0.8        85.2     32.2     46.3     102.9       39.0
10:41    7,404     0.9        91.3     45.2     39.5     103.3       33.9
10:46    5,466     0.7        97.6     38.8     51.0      96.6       33.2
10:51    5,622     1.2        97.9     35.6     57.4      86.6       49.0
10:56    4,527     0.8        97.9     16.5     77.4      64.2       40.7
  :
11:51    5,068     1.8        99.9     74.2     25.7      56.5       19.9
11:56    5,991     2.4        99.9     46.8     45.5      65.5       32.6

Itv End------Interval end time (hour and minute)
Tns/hr-------Number of interactive transactions per hour
Rsp/Tns-----Average interactive transaction response time
***********************************************************************************
(Calculate the average of the CPU utilization under the "Inter" heading for the desired time interval for
interactive CPU utilization - "P" in the formula shown below.)
It is possible to have interactive jobs that do not show up with type "INT" in Collection Services or the
Component Report. An example is a job that is submitted as a batch job that acquires a work station.
These jobs should be included in the interactive CPU utilization count.
Most systems have peak workload environments. Care must be taken to ensure that peaks can be
contained in server model environments. Some environments could have peak workloads that exceed
the interactive capacity of a server model or could cause unacceptable response times and
throughput.
In the following equations, let the interactive CPU utilization of the existing traditional system be
represented by percent P. A server model that should then produce the same response time and throughput
would have a CPW of:
Server Interactive CPW = 3 * P * Traditional CPW
or for Custom Models use:
Server Interactive CPW = 1.0 * P * Traditional CPW (when P < 85%)
or
Server interactive CPW = 1.5 * P * Traditional CPW (when P >= 85%)
Use the 1.5 factor to ensure the custom server is sized less than 85% CPU utilization.
These equations provide the server interactive CPU cycles required to keep the interactive utilization at or
below the knee of the curve, with the current interactive workload. The equations given at the end of the
Server and Custom Server Model Behavior section can be used to determine the effective interactive
utilization above the knee of the curve. The interactive workload below the knee of the curve represents
one third of the total possible interactive workload, for non-custom models. The equation shown in this
section will migrate a traditional system to a server system and keep the interactive workload at or below
the knee of the curve, that is, using less than two thirds of the total possible interactive workload. In some
environments these equations will be too conservative. A value of 1.2, rather than 1.5, would be less
conservative. The equations presented in the Interactive Utilization section should be used by those
customers who understand how server models work above the knee of the curve and the ramifications of
the V4R1 enhancement.
These equations are for migration of “existing workload” situations only. Installation workload
projections for “initial installation” of new custom servers are generally sized by the business partner for
50 - 60% CPW workloads and no “formula increase” would be needed.
For example, assume a model 510-2143 with a single V3R6 CPW rating of 66.7 and assume the
Performance Tools for iSeries report lists interactive work CPU utilization as 21%. Using the previous
formula, the server model must have an interactive CPW rating of at least 42 to maintain the same
performance as the 510-2143.
Server interactive CPW = 3 * P * Traditional CPW
= 3 * .21 * 66.7
= 42
A server model with an interactive CPW rating of at least 42 could approximate the same interactive
work of the 510-2143, and still leave system capacity available for client/server activity. An S20-2165 is
the first AS/400e series with an acceptable CPW rating (49.7).
Note that interactive and client/server CPWs are not additive. Interactive workloads which exceed (even
briefly) the knee of the curve will consume a disproportionate share of the processing power and may
result in insufficient system capacity for client/server activity and/or a significant increase in interactive
response times.
2.12 Upgrade Considerations for Interactive Capacity
When upgrading a system to obtain more processor capacity, it is important to consider upgrading the
interactive capacity, even if additional interactive work is not planned. Consider the following
hypothetical example:
• The original system has a processor capacity of 1000 CPW and an interactive capacity of 250 ICPW.
• The proposed upgrade system has a processor capacity of 4000 CPW and also offers an interactive
  capacity of 250 ICPW.
• On the original system, the interactive capacity allowed 25% of the total system to be used for
  interactive work. On the new system, the same interactive capacity only allows 6.25% of the total
  system to be used for interactive work.
• Even though the total interactive capacity of the system has not changed, the faster processors (and
  likely larger memory and faster disks) will allow interactive requests to complete more rapidly, which
  can cause greater spikes of interactive demand.
• So, just as it is important to consider balancing memory and disk upgrades with processor upgrades,
  optimal performance may also require an interactive capacity upgrade when moving to a new system.
2.13 iSeries for Domino and Dedicated Server for Domino Performance Behavior
In preparation for future Domino releases which will provide support for DB2 files, the previous
processing limitations associated with DSD models have been removed in i5/OS V5R3.
In addition, a PTF is available for V5R2 which also removes the processing limitations for DSD models
and allows full use of DB2. Please refer to PTF MF32968 and its prerequisite requirements.
The sections below from previous versions of this document are provided for those users on OS/400
releases prior to V5R3.
2.13.1 V5R2 iSeries for Domino & DSD Performance Behavior updates
Included in the V5R2 February 2003 iSeries models are five iSeries for Domino offerings. These include
three i810 and two i825 models. The iSeries for Domino offerings are specially priced and configured for
Domino workloads. Unlike non-Domino processing on the Dedicated Server for Domino models, there
are no processing guidelines for the iSeries for Domino offerings. With the iSeries for Domino
offerings the full amount of DB2 processing is available, and it is no longer necessary to have Domino
processing active for non-Domino applications to run well. Please refer to Chapter 11 for additional
information on Domino performance in iSeries, and Appendix C for information on performance
specifications for iSeries servers.
For existing iSeries servers, OS/400 V5R2 (both the June 2002 and the updated February 2003 version)
exhibits performance behavior similar to V5R1 on the Dedicated Server for Domino models. The
following discussion of the V5R1 Domino-complementary behavior is applicable to V5R2.
Five new DSD models were announced with V5R1. These included the iSeries Model 270 with a 1-way
and a 2-way feature, and the iSeries Model 820 with 1-way, 2-way, and 4-way features. In addition,
OS/400 V5R1 was enhanced to bolster DSD server capacity for robust Domino applications that require
Java Servlet and WebSphere Application Server integration. The new behavior which supports
Domino-complementary workloads on the DSD was available after September 28, 2001 with a refreshed
version of OS/400 V5R1. This enhanced behavior is applicable to all DSD models including the model
170 and previous 270 and 820 models. Additional information on Lotus Domino for iSeries can be found
in Chapter 11, “Domino for iSeries”.
For information on the performance behavior of DSD models for releases prior to V5R1, please refer
to the V4R5 version of this document.
Please refer to Appendix C for performance specifications for DSD models, including the number of Mail
and Calendaring Users (MCU) supported.
2.13.2 V5R1 DSD Performance Behavior
This section describes the performance behavior for all DSD models for the refreshed version of OS/400
V5R1 that was available after September 28, 2001.
A white paper, Enhanced V5R1 Processing Capability for the iSeries Dedicated Server for Domino,
provides additional information on DSD behavior and can be accessed at:
Domino-Complementary Processing
Prior to V5R1, processing that did not spend the majority of its time in Domino code was considered
non-Domino processing and was limited to approximately 10-15% of the system capacity. With V5R1,
many applications that would previously have been treated as non-Domino may now be considered as
Domino-complementary when they are used in conjunction with Domino. Domino-complementary
processing is treated the same as Domino processing, provided it also meets the criteria that the DB2
processing is less than 15% CPU utilization as described below. This behavioral change has been made to
support the evolving complexity of Domino applications which frequently require integration with
function such as Java Servlets and WebSphere Application Server. The DSD models will continue to
have a zero interactive CPW rating which allows sufficient capacity for systems management processing.
Please see the section below on Interactive Processing.
In other words, non-Domino workloads are considered complementary when used simultaneously with
Domino, provided they meet the DB2 processing criteria. With V5R1, the amount of DB2 processing on a
DSD must be less than 15% CPU. The DB2 utilization is tracked on a system-wide basis and all
applications on the DSD cumulatively should not exceed 15% CPU utilization. Should the 15% DB2
processing level be reached, the jobs and/or threads that are currently accessing DB2 resources may
experience increased response times. Other processing will not be impacted.
Several techniques can be used to determine and monitor the amount of DB2 processing on DSD (and
non-DSD) iSeries servers for V4R5 and V5R1:
• Work with System Status (WRKSYSSTS) command, via the % DB capability statistic
• Work with System Activity (WRKSYSACT) command, which is part of the IBM Performance Tools
  for iSeries, via the Overall DB CPU util statistic
• Management Central, by starting a monitor to collect the CPU Utilization (Database Capability)
  metric
• Workload section in the System Report, which can be generated using the IBM Performance Tools for
  iSeries, via the Total CPU Utilization (Database Capability) statistic
V5R1 Non-Domino Processing
Since all non-interactive processing is considered Domino-complementary when used simultaneously
with Domino, provided it meets the DB2 criteria, non-Domino processing with V5R1 refers to the
processing that is present on the system when there is no Domino processing present. (Interactive
processing is a special case and is described in a separate section below). When there is no Domino
processing present, all processing, including DB2 access, should be less than 10-15% of the system
capacity. When the non-Domino processing capacity is reached, users may experience increased
response times. In addition, CFINT processing may be present as the system attempts to manage the
non-Domino processing to the available capacity. The announced “Processor CPW” for the DSD models
refers to the amount of non-Domino processing that is supported.
Non-Domino processing on the 270 and 820 DSD models can be tracked using the Management Central
function of Operations Navigator. Starting with V4R5, Management Central provides a special metric
called “secondary utilization” which shows the amount of non-Domino processing. Even when Domino
processing is present, the secondary utilization metric will include the Domino-complementary
processing. And, as discussed above, the Domino-complementary processing running in conjunction
with Domino will not be limited unless it exceeds the DB2 criteria.
Interactive Processing
Similar to previous DSD performance behavior for interactive processing, the Interactive CPW rating of 0
allows for system administrative functions to be performed by a single interactive user. In practice, a
single interactive user will be able to perform necessary administrative functions without constraint. If
multiple interactive users are simultaneously active on the DSD, the Interactive CPW capacity will likely
be exceeded and the response times of those users may significantly lengthen. Even though the
Interactive CPW capacity may be temporarily exceeded and the interactive users experience increased
response times, other processing on the system will not be impacted. Interactive processing on the 270
and 820 DSD models can be tracked using the Management Central function of Operations Navigator.
Logical Partitioning on a Dedicated Server
With V5R1, iSeries logical partitioning is supported on both the Model 270 and Model 820. Just to be
clear, iSeries logical partitioning is different from running multiple Domino partitions (servers). It is not
necessary to use iSeries logical partitioning in order to be able to run multiple Domino servers on an
iSeries system. iSeries logical partitioning lets you run multiple independent servers, each with its own
processor, memory, and disk resources within a single symmetric multiprocessing iSeries. It also provides
special capabilities such as having multiple versions of OS/400, multiple versions of Domino, different
system names, languages, and time zone settings. For additional information on logical partitioning on the
iSeries please refer to Chapter 18. Logical Partitioning (LPAR) and LPAR web at:
When you use logical partitioning with a Dedicated Server, the DSD CPU processing guidelines are
pro-rated for each logical partition based on how you divide up the CPU capability. For example,
suppose you use iSeries logical partitioning to create two logical partitions, and specify that each logical
partition should receive 50% of the CPU resource. From a DSD perspective, each logical partition runs
independently from the other, so you will need to have Domino-based processing in each logical
partition in order for non-Domino work to be treated as complementary processing. Other DSD
processing requirements such as the 15% DB2 processing guidelines and the 15% non-Domino
processing guideline will be divided between the logical partitions based on how the CPU was allocated
to the logical partitions. In our example above with 50% of the CPU in each logical partition, the DB2
database guideline will be 7.5% CPU for each logical partition. Keep in mind that WRKSYSSTS and
other tools show utilization only for the logical partition they are running in; so in our example of a
partition that has been allocated 50% of the processor resource, a 7.5% system-wide load will be shown
as 15% within that logical partition. The non-Domino processing guideline would be divided in a similar
manner as the DB2 database guideline.
Running Linux on a Dedicated Server
As with other iSeries servers, to run Linux on a DSD it is necessary to use logical partitioning. Because
Linux is its own unique operating environment and is not part of OS/400, Linux needs to have its own
logical partition of system resources, separate from OS/400. The iSeries Hypervisor allows each partition
to operate independently. When using logical partitioning on iSeries, the first logical partition, the
primary partition, must be configured to run OS/400. For more information on running Linux on iSeries,
please refer to Chapter 13. iSeries Linux Performance and Linux for iSeries web site at:
Running Linux in a DSD logical partition will exhibit different performance characteristics than running
OS/400 in a DSD logical partition. As described in the section above, when running OS/400 in a DSD
logical partition, the DSD capacities such as the 15% DB2 processing guideline and the 15% non-Domino
processing guidelines are divided proportionately between the logical partitions based on how the
processor resources were allocated to the logical partitions. However, for Linux logical partitions, the
DSD guidelines are relaxed, and the Linux logical partition is able to use all of the resources allocated to
it outside the normal guidelines for DSD processing. This means that it is not necessary to have Domino
processing present in the Linux logical partition, and all resources allocated to the Linux logical partition
can essentially be used as though they were complementary processing. It is not necessary to
proportionally increase the amount of Domino processing in the OS/400 logical partition to account for
the fact that Domino processing is not present in the Linux logical partition.
Support for running Linux logical partitions on the Dedicated Server allows customers to run
Linux-based applications, such as Internet firewalls, to further enhance their Domino processing
environment on iSeries. At the time of this publication, there is not a version of Domino that is supported
for Linux logical partitions on iSeries.
Chapter 3. Batch Performance
In a commercial environment, batch workloads tend to be I/O intensive rather than CPU intensive. The
factors that affect batch throughput for a given batch application include the following:
• Memory (Pool size)
• CPU (processor speed)
• DASD (number and type)
• System tuning parameters
Batch Workload Description
The Batch Commercial Mix is a synthetic batch workload designed to represent multiple types of batch
processing often associated with commercial data processing. The different variations allow testing of
sequential vs random file access, changing the read to write ratio, generating "hot spots" in the data and
running with expert cache on or off. It can also represent some jobs that run concurrently with interactive
work where the work is submitted to batch because of a requirement for a large amount of disk I/O.
3.1 Effect of CPU Speed on Batch
The capacity available from the CPU affects the run time of batch applications. More capacity can be
provided by either a CPU with a higher CPW value, or by having other contending applications on the
same system consuming less CPU.
Conclusions/Recommendations
• For CPU-intensive batch applications, run time scales inversely with Relative Performance Rating
  (CPWs). This assumes that the number of synchronous disk I/Os is only a small factor.
• For I/O-intensive batch applications, run time may not decrease with a faster CPU. This is because
  I/O subsystem time would make up the majority of the total run time.
• It is recommended that capacity planning for batch be done with tools that are available for iSeries.
  For example, PATROL for iSeries - Predict from BMC Software, Inc. * (PID# 5620FIF) can be used
  for modeling batch growth and throughput. BATCH400 (an IBM internal tool) can be used for
  estimating batch run-time.
3.2 Effect of DASD Type on Batch
For batch applications that are I/O-intensive, the overall batch performance is very dependent on the
speed of the I/O subsystem. Depending on the application characteristics, batch performance (run time)
will be improved by having DASD that has:
• faster average service times
• read ahead buffers
• write caches
Additional information on DASD devices in a batch environment can be found in Chapter 14, “DASD
Performance”.
3.3 Tuning Parameters for Batch
There are several system parameters that affect batch performance. The magnitude of the effect for each
of them depends on the specific application and overall system characteristics. Some general information
is provided here.
• Expert Cache
Expert Cache did not have a significant effect on the Commercial Mix batch workload. Expert Cache
does not start to provide improvement unless the following are true for a given workload:
  • the application that is running is disk intensive, and disk I/Os are limiting the throughput.
  • the processor is under-utilized, at less than 60%.
  • the system has sufficient main storage.
For Expert Cache to operate effectively, there must be spare CPU, so that when the average disk
access time is reduced by caching in main storage, the CPU can process more work. In the
Commercial Mix benchmark, the CPU was the limiting factor.
However, specific batch environments that are DASD I/O intensive, and process data sequentially
may realize significant performance gains by taking advantage of larger memory sizes available on
the RISC models, particularly at the high-end. Even though in general applications require more
main storage on the RISC models, batch applications that process data sequentially may only require
slightly more main storage on RISC. Therefore, with larger memory sizes in conjunction with using
Expert Cache, these applications may achieve significant performance gains by decreasing the
number of DASD I/O operations.
• Job Priority
Batch jobs can be given a priority value that will affect how much CPU processing time the job will
get. For a system with high CPU utilization and a batch job with a low job priority, the batch
throughput may be severely limited. Likewise, if the batch job has a high priority, the batch
throughput may be high at the expense of interactive job performance.
• Dynamic Priority Scheduling
See 19.2, “Dynamic Priority Scheduling” for details.
• Application Techniques
The batch application can also be tuned for optimized performance. Some suggestions include:
  • Breaking the application into pieces and having multiple batch threads (jobs) operate concurrently.
    Since batch jobs are typically serialized by I/O, this will decrease the overall batch window
    requirements.
  • Reduce the number of opens/closes, I/Os, etc. where possible.
  • If you have a considerable amount of main storage available, consider using the Set Object Access
    (SETOBJACC) command. This command pre-loads the complete database file, database index, or
    program into the assigned main storage pool if sufficient storage is available. The objective is to
improve performance by eliminating disk I/O operations.
  • If communications lines are involved in the batch application, try to limit the number of
    communications I/Os by doing fewer (and perhaps larger) application sends and receives.
    Consider blocking data in the application. Try to place the application on the same system as the
    frequently accessed data.
* BMC Software, the BMC Software logos and all other BMC Software products including PATROL for
iSeries - Predict are registered trademarks or trademarks of BMC Software, Inc.
Chapter 4. DB2 for i5/OS Performance
This chapter provides a summary of the new performance features of DB2 for i5/OS on V6R1, V5R4 and
V5R3, along with V5R2 highlights. Summaries of selected key topics on the performance of DB2 for
i5/OS are provided. General information and some recommendations for improving performance are
included along with links to the latest information on these topics. Also included is a section of
performance references for DB2 for i5/OS.
4.1 New for i5/OS V6R1
In i5/OS V6R1 there are several performance enhancements to DB2 for i5/OS. The evolution of the SQL
Query Engine (SQE), with this release, again supports more queries. Some of the new function supported
may also have a sizable effect on performance, including derived key indexes, decimal floating-point data
type, and select from insert. Lastly, modifications specifically to improve performance were made in
several key areas, including optimization improvements to produce more efficient access plans, reducing
full open and optimization time, and path length reduction of some basic, high use paths.
i5/OS V6R1 SQE Query Coverage
The query dispatcher controls whether an SQL query will be routed to SQE or to the Classic Query
Engine (CQE). SQL queries with the following attributes, which were routed to CQE in previous releases,
may now be routed to SQE in i5/OS V6R1:
• NLSS/CCSID translation between columns
• User-defined table functions
• Sort sequence
• Lateral correlation
• UPPER/LOWER functions
• UTF8/16 Normalization support (NORMALIZE_DATA INI option of *YES)
• LIKE with UTF8/UTF16 data
• Character based substring and length for UTF8/UTF16 data
Also, in V6R1, the default value for the QAQQINI option IGNORE_DERIVED_INDEX has changed
from *NO to *YES. The default behavior will now be to run supported queries through SQE even if
there is a select/omit logical file index created over any of the tables in the query. In V6R1 many types
of derived indexes are now supported by the SQE optimizer and usage of the QAQQINI option
IGNORE_DERIVED_INDEX only applies to select/omit logical file indexes.
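For example, the previous behavior can be restored by setting the option back to *NO in a user copy of
the query options file. This is a sketch only: it assumes such a copy already exists in an illustrative
library MYLIB, has been made active with CHGQRYA QRYOPTLIB, and contains a row for this option:

   -- Revert IGNORE_DERIVED_INDEX to its pre-V6R1 default.
   UPDATE MYLIB.QAQQINI
      SET QQVAL = '*NO'
    WHERE QQPARM = 'IGNORE_DERIVED_INDEX'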
SQL queries with the attributes listed above will be processed by the SQE optimizer and engine in V6R1.
Due to the robust SQE optimizer potentially choosing a better plan along with the more efficient query
engine processing, there is the potential for better performance with these queries than was experienced in
previous releases.
SQL queries which continue to be routed to CQE in i5/OS V6R1 have the following attributes:
• INSERT WITH VALUES statement or the target of an INSERT with subselect statement
• Logical files referenced in the FROM clause
• Tables with Read Triggers
• Read-only queries with more than 1000 dataspaces or updateable queries with more than 256
  dataspaces
• DB2 Multisystem tables
New function available in V6R1 that may affect SQL performance includes derived key indexes, decimal
floating-point data type support, and the select from insert statement. A derived key index can
have an expression in place of a column name that can use built-in functions, user defined functions, or
some other valid expression. Additionally, you can use the SQL CREATE INDEX statement to create a
sparse index using a WHERE condition.
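For example, these two new index types might be created as follows; the table, column, and index
names here are illustrative only:

   -- Derived key index: the key is an expression rather than a column.
   CREATE INDEX MYLIB.CUSTNAME_IX
       ON MYLIB.CUSTOMER (UPPER(LASTNAME))

   -- Sparse index: a WHERE condition limits which rows are indexed.
   CREATE INDEX MYLIB.OPENORD_IX
       ON MYLIB.ORDERS (ORDERDATE)
       WHERE STATUS = 'OPEN'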
The decimal floating-point data type has been implemented in V6R1. A decimal floating-point number is
an IEEE 754R number with a decimal point. The position of the decimal point is stored in each decimal
floating-point value. The maximum precision is 34 digits. A decimal floating-point number has either 16
or 34 digits of precision, and an exponent range of 10^-383 to 10^384 or 10^-6143 to 10^6144, respectively.
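For example, a column of the new type might be declared as follows (names are illustrative only;
DECFLOAT(16) is the other supported precision):

   -- DECFLOAT(34): decimal floating point with 34 digits of precision.
   CREATE TABLE MYLIB.RATES (
       RATE_ID   INTEGER NOT NULL,
       RATE_VAL  DECFLOAT(34)
   )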
Use of the new decimal floating-point data type depends on whether you desire the new functionality. In
general, more CPU is used to process data of this type versus decimal or floating-point data. The
increased amount of processing time needed depends on the processor technology being used. Power6
hardware has special hardware support for processing decimal floating-point data, while Power5 does not.
Power6 hardware enables much better performance for decimal floating-point processing. The CPU used
to process this data depends on other factors also, including the application code, the functions used, and
the data itself. As an example, for a specific set of queries run over a particular database, ranges for
increased processing time for decimal floating-point data versus either decimal or floating point are
shown in the chart below in Figure 4.1. The query attribute column shows the type of operations over the
decimal floating-point columns in the queries.
Query Attribute                               POWER5 Processor       POWER6 Processor
Select                                        0% to 15%              0% to 15%
Arithmetic ( +, -, *, / )                     15% improved to 400%   35% improved to 45%
Functions ( AVG, MAX, MIN, SUM, CHAR, TRUN)   15% improved to 1200%  35% improved to 300%
Casts ( to/from int, decimal, float)          40% improved to 600%   35% improved to 500%
Inserts, Updates, and Create Index            0% to 20%              0% to 35%
Figure 4.1 Processing time degradation with decimal floating-point data versus decimal or float
Given the additional processing time needed for decimal floating-point data, the recommendation is to use
this data type only when the increased precision and rounding capabilities are needed. It is also
recommended to avoid conversions to and from this data type, when possible. It should not normally be
necessary to migrate existing packed or zoned decimal fields within a mature data base file to the new
decimal floating point data type. Any decimal fields in the file will be converted to decimal float in host
variables, as provided by the languages and APIs chosen. That will, in many cases, be a better performer
overall (especially including existing code considerations) than a migration of the data field to a new
format.
The ability to insert records into a table and then select from those inserted records in one statement,
called Select From Insert, has been added to V6R1. Using a single SQL statement to insert and then
retrieve the records can perform much better than doing an insert followed by a select statement. The
chart below in figure 4.2 shows an example of the performance of a basic select from insert compared to
the insert followed by select when inserting/selecting various number of records, from 1 to 1000. The
data is for a particular database and SQL queries, and one specific hardware and software configuration
running V6R1 i5/OS. The ratio of the clock times for these operations is shown. A ratio of less than 1
indicates that the select from insert ran faster than the insert followed by a select. Select from insert
using NEW TABLE performs better than insert then select for all quantities of rows inserted. Select
from insert using FINAL TABLE performs better in the one row case, but takes longer with more rows.
This is due to the additional locking needed with FINAL TABLE to ensure the rows are not modified until
the statement is complete. The implementation to invoke the locking causes a physical DASD write to
the journal for each record, which causes journal waits. Journal caching on allows the journal writes to
accumulate in memory and have one DASD write per multiple journal entries, greatly reducing the
journal wait time. So select from insert statements with FINAL TABLE run much faster with journal
caching on. Figure 4.2 shows that select from insert with FINAL TABLE and journal caching on ran
faster than the insert followed by select for all but the 1000 row insert size.
[Figure 4.2 is a chart of clock time ratios (select from insert versus insert followed by select) for 1, 10,
100, and 1000 records inserted/selected, on a scale of 0.00 to 6.00, with four data series: Select from
Insert: Final Table; Select from Insert: Final Table, journal caching on; Select from Insert: New Table;
and Select from Insert: New Table, journal caching on.]
Figure 4.2 Select from Insert versus Insert followed by Select clock time ratios
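For example, the statement form measured above might be coded as follows; the table and column
names are illustrative only:

   -- Insert a row and retrieve it in the same statement. FINAL TABLE
   -- returns the rows as they appear after the insert completes;
   -- NEW TABLE may be specified instead, avoiding the additional
   -- locking described above.
   SELECT ORDER_ID, ORDER_DATE
     FROM FINAL TABLE (
          INSERT INTO MYLIB.ORDERS (CUSTOMER_ID, ORDER_DATE)
                 VALUES (1234, CURRENT DATE)
          )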
In addition to updates for new functionality, in V6R1 substantial performance improvements were made
to some SQL code paths. Improvements were made to the optimizer to make query execution cost
estimates more accurate. This means that the optimizer is producing more efficient access plans for some
queries, which may reduce their run time. The time required to full open and optimize queries was also
largely reduced for many queries in V6R1. On average, for a group of greatly varying queries, the total
open time including optimization has been reduced 45%. For a given set of very simple queries which go
through a full open, but whose access plan already exists in the plan cache, the full open time was reduced
by up to 30%.
In addition to the optimization and full open performance improvements, for V6R1 there was a
comprehensive effort to reduce the basic path of a simple query which is running in re-use mode (pseudo
open), and in particular is using JDBC to access the database. The results of this are potentially large
reductions in the CPU time used in processing queries, particularly very simple queries. For a stock trade
workload running through JDBC, throughput improvements of up to 78% have been measured. For more
information please see Chapter 6. Web Server and WebSphere Performance.
4.2 DB2 i5/OS V5R4 Highlights
In i5/OS V5R4 there were several performance enhancements to DB2 for i5/OS. With support in SQE for
Like/Substring, LOBs and the use of temporary indexes, many more queries now go down the SQE path.
Thus there is the potential for better performance due to the robust SQE optimizer choosing a better plan
along with the more efficient query engine processing. Also supported is use of Recursive Common
Table Expressions (RCTE) which allow for more elegant and better performing implementations of
recursive processing. In addition, enhancements have been made in i5/OS V5R4 to the support for
materialized query tables (MQTs) and partitioned table processing, which were both new in i5/OS V5R3.
i5/OS V5R4 SQE Query Coverage
The query dispatcher controls whether an SQL query will be routed to SQE or to CQE. SQL queries with
the following attributes, which were routed to CQE in previous releases, may now be routed to SQE in
i5/OS V5R4:
• Sensitive cursor
• Like/Substring predicates
• LOB columns
• ALWCPYDTA(*NO)
SQL queries which continue to be routed to CQE in i5/OS V5R4 have the following attributes:
• References to DDS logical files
• NLSS/CCSID translation between columns
• User-defined table functions
• DB2 Multisystem
• Tables with select/omit logicals over them
In general, queries with Like and Substring predicates which are newly routed to SQE see substantial
performance improvements in i5/OS V5R4. For a group of widely varying queries and data, including a
wide range of Like and Substring predicates and various file sizes, a large percentage of the queries saw
up to a 10X reduction in query run time. Queries with references to LOB columns, which were newly
routed to SQE, in general also experience substantial performance improvements in i5/OS V5R4. For a
set of queries which have references to LOB columns, in which the queries and data vary greatly, a large
percentage ran up to 5X faster.
A new addition to SQE is the creation and use of temporary indexes. These indexes will be created
because they are required for implementing certain types of query requests or because they allow for
better performance. The implementation of queries which require live data may require temporary
indexes, for example, queries that run with a sensitive cursor or with ALWCPYDTA(*NO). In the case
of using a temporary index for better performance, the SQE optimizer costs the creation and use of
temporary indexes during plan optimization. An access plan will choose a temporary index if the
amortized cost of building the index, provided one does not exist, reduces the query run time of the access
plan enough that this plan wins over other plan options. The temporary indexes that the optimizer
considers building are the same indexes in the ‘index advised’ list for a given query. Features unique to
SQE temporary indexes, compared to CQE temporary indexes, are the longer lifetimes and higher degree
of sharing of these indexes. SQE temporary indexes may be reused by the same query or other queries in
the same job or in other jobs. The SQE temporary indexes will persist and will be maintained until the last
query which references the temporary index is hard closed and the plan is removed from the plan cache.
In many cases, this means the temporary indexes will persist until the last job which was using the index
is ended. The high degree of sharing and longer lifetime allow for more reuse of the indexes without
repeated create index cost.
New function for implementing applications that work with recursive data has been added to i5/OS
V5R4. Recursive Common Table Expressions (RCTE) and Recursive Views may now be used in these
types of applications, versus using SQL Stored Procedures and temporary results tables. For more
information on using RCTEs and Recursive Views see the DB2 for System i Database Performance and
Query Optimization manual.
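For example, a recursive query over an illustrative bill-of-material table (all names here are invented)
might be written as:

   -- Walk a part hierarchy: start at part 100, then repeatedly join
   -- each level's children back to the components table.
   WITH BOM (PARENT_ID, CHILD_ID, LVL) AS (
        SELECT PARENT_ID, CHILD_ID, 1
          FROM MYLIB.COMPONENTS
         WHERE PARENT_ID = 100
        UNION ALL
        SELECT C.PARENT_ID, C.CHILD_ID, B.LVL + 1
          FROM BOM B, MYLIB.COMPONENTS C
         WHERE B.CHILD_ID = C.PARENT_ID
   )
   SELECT PARENT_ID, CHILD_ID, LVL
     FROM BOM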
Enhancements to extend the use of materialized query tables (MQTs) were added in i5/OS V5R4. New
supported function in MQT queries by the MQT matching algorithm are unions and partitioned tables,
along with limited support for scalar subselects, UDFs and user defined table functions, RCTE, and some
scalar functions. Also new to i5/OS V5R4, the MQT matching algorithm now tries to match constants in
the MQT with parameter markers or host variable values in the query. For more information on using
MQTs see the DB2 for System i Database Performance and Query Optimization manual and the white
paper, The creation and use of materialized query tables within IBM DB2 FOR i5/OS, available at
The performance of queries which reference partitioned tables has been enhanced in i5/OS V5R4. The
overhead when optimizing queries which reference a partitioned table has been reduced. Additionally,
general improvements in plan quality have yielded run time improvements as well.
4.3 i5/OS V5R3 Highlights
In i5/OS V5R3, the SQL Query Engine (SQE) roll-out in DB2 for i5/OS took the next step. The new SQL
Query Optimizer, SQL Query Engine and SQL Database Statistics were introduced in V5R2 with a
limited set of queries being routed to SQE. In i5/OS V5R3 many more SQL queries are implemented in
SQE. In addition, many performance enhancements were made to SQE in i5/OS V5R3 to decrease query
runtime and to use System i resources more efficiently. Additional significant new features in this release
are: table partitioning, the lookahead predicate generation (LPG) optimization technique for enhanced
star-join support and a technology preview of materialized query tables. Also an April 2005 addition to
the DB2 FOR i5/OS V5R3 support was query optimizer support for recognizing and using materialized
query tables (MQTs) (also referred to as automatic summary tables or materialized views) for limited
query functions. Two other improvements worth mentioning are faster delete support and SQE constraint
awareness. This section contains a summary of the V5R3 information in the System i Performance
Capabilities Reference i5/OS Version 5 Release 3 available at
i5/OS V5R3 SQE Query Coverage
The query dispatcher controls whether an SQL query will be routed to SQE or to CQE (Classic Query
Engine). The staged implementation of SQE enabled a very limited set of queries to be routed to SQE in
V5R2. In general, read only single table queries with a limited set of attributes would be routed to SQE.
The details of the query attributes for routing to SQE versus CQE in V5R2 are documented in the V5R2
redbook Preparing for and Tuning the V5R2 SQL Query Engine. With the V5R2 enabling PTF applied,
PTF SI07650 documented in Info APAR II13486, the dispatcher routes many more queries through SQE.
More single table queries and a limited set of multi-table queries are able to take advantage of the SQE
enhancements. Queries with OR and IN predicates may be routed to SQE with the enabling PTF as will
SQL queries with the appropriate attributes on systems with SMP enabled.
In i5/OS V5R3 a much larger set of queries are implemented in SQE including those with the enabling
PTF on V5R2 and many queries with the following types of attributes:
• Subqueries
• Views
• Common table expressions
• Derived tables
• Unions
• Updates
• Deletes
SQL queries which continue to be routed to CQE in i5/OS V5R3 have the following attributes:
• Sensitive cursor
• Like/Substring predicates
• LOB columns
• NLSS/CCSID translation between columns
• DB2 Multisystem
• ALWCPYDTA(*NO)
• References to DDS logical files
• Tables with select/omit logicals over them
i5/OS V5R3 SQE Performance Enhancements
Many enhancements were made in i5/OS V5R3 to enable faster query runtime and use less system
resource. Highlights of these enhancements include the following:
• New optimization techniques including Lookahead Predicate Generation and Constraint Awareness
• Sharing of temporary result sets across jobs
• Reduction in size of temporary result sets
• More efficient I/O for temporary result sets
• Ability to do some aggregates with EVI symbol table access only
• Reduction in memory used during optimization
• Reduction in DB structure memory usage
• More efficient statistics generation during optimization
• Greater accuracy of statistics usage for optimization plan generation
The DB2 performance enhancements in i5/OS V5R3 substantially reduced the runtime of many queries.
Performance improvements vary substantially due to many factors -- file size and layout, indexes and
statistics available -- making generalization of performance expectations for a given query difficult.
However, longer running queries which are newly routed to SQE in i5/OS V5R3, in general, have a
greater likelihood of significant performance benefit.
For the short running queries, those that run less than 2 seconds, performance improvements are nominal.
For subsecond queries there is little to no improvement for most queries. As the runtime increases, the
reduction in runtime and CPU time become more substantial. In general, for short running queries there is
less opportunity for improving performance. Also, the first execution of each of the queries measured
required a database open and full optimization. Database open and full
optimization overhead may be higher with SQE, as it evaluates more information and examines more
potential query implementation plans. As this overhead is much more expensive relative to actual query
implementation for short running queries, performance benefits from SQE for the short running queries
are minimized. However, in OLTP environments the plan caching and open data path (ODP) reuse design
minimizes the number of opens and full optimizations needed. A very small percentage of queries in
typical customer OLTP workloads go through full open and optimization.
The performance benefits are substantial for many of the medium to long running queries newly routed to
SQE in i5/OS V5R3. Typically, the longer the runtime, the more potential for improvements. This is due
to the optimizer constructing a more efficient access plan and the faster execution of the access plan with
the SQE query engine. Many of the queries with runtimes greater than 2 seconds, especially those with
runtimes greater than 10 seconds, reduced their runtime by a factor of 2 or more. Queries which run
longer than 200 seconds were typically improved from 15% to over 100 times.
Partitioned Table Support
Table partitioning is a new feature introduced in i5/OS V5R3. The design is localized on an individual
table basis rather than an entire library. The user specifies one or more fields which collectively act as a
partitioning key. Next the records in the table are distributed into multiple disjoint sets based on the
partitioning scheme used: either a system-supplied hashing function or a set of value ranges (such as dates
by month or year) supplied by the user. The user can partition data using up to 256 partitions in i5/OS
V5R3. The partitions are stored as multiple members associated with the same file object, which
continues to represent the overall table as a single entity from an SQL data-access viewpoint.
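As a sketch only (the names and values are invented, and the complete syntax and options are
documented in the white paper referenced at the end of this section), a range-partitioned table might be
created as:

   CREATE TABLE MYLIB.SALES (
          SALE_DATE DATE NOT NULL,
          AMOUNT    DECIMAL(11,2)
   )
   PARTITION BY RANGE (SALE_DATE)
      (STARTING FROM ('2003-01-01') ENDING AT ('2003-12-31'),
       STARTING FROM ('2004-01-01') ENDING AT ('2004-12-31'))

A hash scheme is requested with a PARTITION BY HASH (column) INTO n PARTITIONS clause instead.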
The primary motivations for the initial release of this feature are twofold:
• Eliminate the limitation of at most 4 billion (2^32) rows in a single table
• Enhance data administration tasks such as save/restore, import/export, and add/drop which can be
  done more quickly on a partition basis (subset of a table)
In theory, table partitioning also offers opportunities for performance gains on queries that specify
selection referencing a single or small number of partitions. In reality, however, the performance impact
of partitioned tables in this initial release is limited on the positive side and may instead result in
performance degradation when adopted too eagerly without carefully considering the ramifications of
such a change. The resulting performance after partitioning a table depends critically on the mix of
queries used to access the table and the number of partitions created. If fields used as partitioning keys
are frequently included in selection criteria the resulting performance can be much better due to improved
locality of reference for the desired records. When used incorrectly, table partitioning may degrade the
performance of queries by an order of magnitude or more -- particularly when a large number of
partitions (>32) are created.
Performance expectations of table partitioning on i5/OS V5R3 should not be equated at this time with
partitioning concepts on other database platforms such as DB2 for Linux, Unix and Windows or offerings
from other competitors. Nor should table partitioning on V5R3 be confused with the DB2 Multisystem
for i5/OS offering. Carefully planned data storage schemes with active end-user disk arm management
lead to the performance gains experienced with partitioned databases on those other platforms. Further
gains are realized in other approaches through execution on clusters of physical nodes (in an approach
similar to DB2 Multisystem for i5/OS). In addition, the entire schema is involved in the partitioning
approach. On the other hand, the System i table partitioning design continues to utilize single level
storage which already automatically spreads data to all disks in the relevant ASP. No new performance
gains from I/O balancing are achieved when partitioning a table. Instead the gains tend to involve
improved locality of reference for a subset of the data contained in a single partition or ease of
administration when adding or deleting data on partition boundaries.
An in-depth discussion of table partitioning for i5/OS V5R3 is available in the white paper Table
Partitioning Strategies for DB2 FOR i5/OS available at
This publication covers additional details such as:
• Migration strategies for deployment
• Requirements and Limitations
• Sample Environments (OLTP, OLAP, Limits to Growth, etc.) & Recommended Settings
• Indexing Strategies
• Statistical Strategies
• SMP Considerations
• Administration Examples (Adding a Partition, Dropping a Partition, etc.)
Materialized Query Table Support
The initial release of i5/OS V5R3 includes the Materialized Query Table (MQT) (also referred to as
automatic summary tables or materialized views) support in UDB DB2 for i5/OS as essentially a
technology preview. Pre-April 2005 i5/OS V5R3 provides the capability of creating materialized query
tables, but no optimizer awareness of these MQTs. An April 2005 addition to DB2 for i5/OS V5R3 is
query optimizer support for recognizing and using MQTs. This additional support for recognizing and
using MQTs is limited to certain query functions. MQTs can provide performance enhancements in a
manner similar to indexes. This is done by precomputing and storing results of a query in the materialized
query table. The database engine can use these results instead of recomputing them for a user specified
query. The query optimizer will look for any applicable MQTs and can choose to implement the query
using a given MQT provided this is a faster implementation choice. For long running queries, the run time
may be substantially improved with judicious use of MQTs. For more information on MQTs including
how to enable this new support, for which queries support MQTs and how to create and use MQTs see
the DB2 for System i Database Performance and Query Optimization manual. For the latest information
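For example, a user-maintained MQT might be created and populated as follows (table and column
names are illustrative only):

   -- Precompute an aggregate; ENABLE QUERY OPTIMIZATION allows the
   -- optimizer to consider this MQT when it matches a user query.
   CREATE TABLE MYLIB.SALES_SUM AS (
          SELECT YEAR(SALE_DATE) AS YR, SUM(AMOUNT) AS TOTAL
            FROM MYLIB.SALES
           GROUP BY YEAR(SALE_DATE)
   )
   DATA INITIALLY DEFERRED
   REFRESH DEFERRED
   ENABLE QUERY OPTIMIZATION
   MAINTAINED BY USER

   -- Populate or later recompute the stored results.
   REFRESH TABLE MYLIB.SALES_SUM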
Fast Delete Support
As developers have moved from native I/O to embedded SQL, they often wonder why a Clear Physical
File Member (ClrPfm) command is faster than the SQL equivalent of DELETE FROM table. The reason
is that the SQL DELETE statement deletes a single row at a time. In i5/OS V5R3, DB2 for System i has
been enhanced with new techniques to speed up processing when every row in the table is deleted. If the
DELETE statement is not run under commitment control, then DB2 for System i will actually use the
ClrPfm operation underneath the covers. If the Delete is performed with commitment control, then DB2
FOR i5/OS can use a new method that’s faster than the old delete one row at a time approach. Note
however that not all DELETEs will use the new faster support. For example, delete triggers are still
processed the old way.
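For example, a full-table delete such as the following (table name invented for illustration) will use the
fast ClrPfm-style support described above when run without commitment control:

   DELETE FROM MYLIB.WORKTBL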
4.4 V5R2 Highlights - Introduction of the SQL Query Engine
In V5R2 major enhancements, entitled SQL Query Engine (SQE), were implemented in DB2 for i5/OS.
SQE encompasses changes made in the following areas:
• SQL query optimizer
• SQL query engine
• Database statistics
A subset of the read-only SQL queries is able to take advantage of these enhancements in V5R2.
SQE Optimizer
The SQL query optimizer has been enhanced with new optimization capabilities implemented in object
oriented technology. This object oriented framework implements new optimization techniques and
allows for future extendibility of the optimizer. Among the new capabilities of the optimizer are
enhanced query access plan costing. For queries which can take advantage of the SQE enhancements,
more information may be used in the query plan costing phase than was available to the optimizer
previously. The optimizer may now use newly implemented database statistics to make more accurate
decisions when choosing the query access plan. Also, the enhanced optimizer may more often select
plans using hash tables and sorted partial result lists to hold partial query results during query processing,
rather than selecting access plans which build temporary indexes. With less reliance on temporary indexes
the SQE optimizer is able to select more efficient plans which save the overhead of building temporary
indexes and more fully take advantage of single-level store. The optimizer changes were designed to
create efficient query access plans for the enhanced database engine.
SQE Query Engine
The database engine is the part of the database implementation which executes the access plan produced
by the query optimizer. It accesses the data, processes it, and returns the SQL query results. The new
engine enhancements, the SQE database engine, employ state of the art object oriented implementation.
The SQE database engine was developed in tandem with the SQE optimization enhancements to allow for
an efficient design which is readily extendable. Efficient new algorithms for the data access methods are
used in query processing by the SQE engine.
The basic data access algorithms in SQE are designed to take full advantage of the System i single-level
store to give the fastest query response time. The algorithms reduce I/O wait time by making use of
available main memory and aggressively reading data from disk into memory. The goal of the data
read-ahead algorithms is that the data is in memory when it is needed. This is done through the use of
asynchronous I/Os. SQL queries which access large amounts of data may see a considerable
improvement in the query runtime. This may also result in higher peak disk utilization.
The effects of the SQE enhancements on SQL query performance will vary greatly depending on many
factors. Among these factors are hardware configuration (processor, memory size, DASD
configuration...), system value settings, file layout, indexes available, query options file QAQQINI
settings, and the SQL queries being run.
SQE Database Statistics
The third area of SQE enhancements is the collection and use of new database statistics. Efficient
processing of database queries depends primarily on a query optimizer that is able to make judicious
choices of access plans. The ability of an optimizer to make a good decision is critically influenced by the
availability of database statistics on tables referenced in queries. In the past such statistics were
automatically gathered during optimization time for columns of tables over which indexes exist. With
SQE, statistics on columns without indexes can now be gathered and will be used during optimization.
Column statistics comprise histograms, frequent values list, and column cardinality.
With System i servers, the database statistics collection process is handled automatically, while on many
platforms statistics collection is a manual process that is the responsibility of the database administrator. It
is rarely necessary for the statistics to be manually updated, even though it is possible to manage statistics
manually. The Statistics Manager determines on what columns statistics are needed, when the statistics
collection should be run and when the statistics need to be refreshed. Statistics are automatically
collected as low priority work in the background, so as to minimize the impact to other work on the
system. The manual collection of statistics is run with the normal user job priority.
The system automatically determines what columns to collect statistics on based on what queries have run
on the system. Therefore for queries which have slower than expected performance results, a check
should be made to determine if the needed statistics are available. Also in environments where long
running queries are run only one time, it may be beneficial to ensure that statistics are available prior to
running the queries.
Some properties of database column statistics are as follows:
• Column statistics occupy little storage, on average 8-12k per column.
• Column statistics are gathered through one full scan of the database file for any given number of
  columns in the database file.
• Column statistics are maintained periodically through means of statistics refreshing mechanisms
  that require a full scan of the database file.
• Column statistics are packed in one concise data structure that requires few I/Os to page it into
  main memory during query optimization.
As stated above, statistics may have a direct effect on the quality of the access plan chosen by the query
optimizer and thereby influence the end user query performance. Shown below is an illustrative example
that underscores the effect of statistics on the access plan selection process.
Statistic Usage Example:
   Select * from T1, T2 where T1.A = T2.A and T1.B = 'VALUE1' and T2.C = 'VALUE2'
Database characteristics: indexes on T1.A and T2.A exist, no column statistics, T1 has 100 million rows,
T2 has 10 million rows. T1 is 1 GB and T2 is 0.1 GB.
Since statistics are not available, the optimizer has to consider default estimates for selectivity:
T1.B = 'VALUE1' ==> 10%; T2.C = 'VALUE2' ==> 10%.
The actual selectivities are T1.B = 'VALUE1' ===> 10% and T2.C = 'VALUE2' ===> 0.00001%.
Based on the default selectivity estimates, the optimizer will select the following access plan:
   Scan(T1) -> Probe(T2.A index) -> Probe(T2 table)
The real cost for the above access plan would be approximately 8192 I/Os + 3600 I/Os ~ 11792 I/Os.
If column statistics existed on T2.C, the selectivity estimate for T2.C = 'VALUE2' would be 10 rows or
0.00001%, and the query optimizer would select the following plan instead:
   Scan(T2) -> Probe(T1.A index) -> Probe(T1 table)
Accordingly, the real cost could be calculated as follows: 819 I/Os + 10 I/Os ~ 830 I/Os. Having
statistics on T2.C led to an access plan that is faster by an order of magnitude than the case where no
statistics exist.
For more information on database statistics collection see the DB2 for i5/OS Database Performance and
Query Optimization manual.
SQE for V5R2 Summary
Enhancements to DB2 for i5/OS, called SQE, were made in V5R2. The SQE enhancements are object
oriented implementations of the SQE optimizer, the SQE query engine and the SQE database statistics. In
V5R2 a subset of the read-only SQL queries will be optimized and run with the SQE enhancements. The
effect of SQE on performance will vary by workload and configuration. For the most recent information
on SQE please see the SQE web page on the DB2 for i5/OS web site located at
www.iseries.ibm.com/db2/sqe.html. More information on SQE for V5R2 is also available in the V5R2
redbook Preparing for and Tuning the V5R2 SQL Query Engine.
4.5 Indexing
Index usage can dramatically improve the performance of DB2 SQL queries. For detailed information on
using indexes see the white paper Indexing Strategies for DB2 for i5/OS at
http://www-1.ibm.com/servers/enable/site/education/abstracts/indxng_abs.html . The paper provides basic
information about indexes in DB2 for i5/OS, the data structures underlying them, how the system uses
them and index strategies. Also discussed are the additional indexing considerations related to
maintenance, tools and methods.
Encoded Vector Indices (EVIs)
DB2 for i5/OS supports the Encoded Vector Index (EVI) which can be created through SQL. EVIs
cannot be used to order records, but in many cases, they can improve query performance. An EVI has
several advantages over a traditional binary radix tree index.
•	The query optimizer can scan EVIs and automatically build dynamic (on-the-fly) bitmaps much more quickly than from traditional indexes.
•	EVIs can be built much faster and are much smaller than traditional indexes. Smaller indexes require less DASD space and also less main storage when the query is run.
•	EVIs automatically maintain exact statistics about the distribution of key values, whereas traditional indexes only maintain estimated statistics. These EVI statistics are not only more accurate, but also can be accessed more quickly by the query optimizer.
EVIs are used by the i5/OS query optimizer with dynamic bitmaps and are particularly useful for
advanced query processing. EVIs will have the biggest impact on the complex query workloads found in
business intelligence solutions and ad-hoc query environments. Such queries often involve selecting a
limited number of rows based on the key value being among a set of specific values (e.g. a set of state
names).
When an EVI is created and maintained, a symbol table records each distinct key value and also a
corresponding unique binary value (the binary value will be 1, 2, or 4 bytes long, depending on the
number of distinct key values) that is used in the main part of the EVI, the vector (array). The subscript
of each vector (array) element represents the relative record number of a database table row. The vector
has an entry for each row. The entry in each element of the vector contains the unique binary value
corresponding to the key value found in the database table row.
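For illustration, an EVI of the kind just described could be created with an SQL statement like the following sketch (library, table, column and index names are hypothetical; the optional WITH clause tells the system roughly how many distinct key values to expect so it can size the vector elements):

    -- Hypothetical names; WITH gives the expected number of distinct keys
    CREATE ENCODED VECTOR INDEX CUSTLIB/CUST_STATE_EVI
        ON CUSTLIB/CUSTOMER (STATE)
        WITH 60 DISTINCT VALUES

With about 60 distinct key values, each vector element needs only 1 byte, so the EVI stays small and can be scanned quickly when the optimizer builds dynamic bitmaps.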
4.6 DB2 Symmetric Multiprocessing feature
Introduction
The DB2 SMP feature provides application transparent support for parallel query operations on a single
tightly-coupled multiprocessor System i (shared memory and disk). In addition, the symmetric
multiprocessing (SMP) feature provides additional query optimization algorithms for retrieving data. The
database manager can automatically activate parallel query processing in order to engage one or more
system processors to work simultaneously on a single query. The response time can be dramatically
improved when a processor bound query is executed in parallel on multiple processors. For more
information on access methods which use the SMP feature and how to enable SMP see the DB2 for i5/OS
Database Performance and Query Optimization manual in the System i information center.
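As a sketch of how a job can request parallel query processing once the SMP feature is installed (the value shown is illustrative; the CHGQRYA CL command offers comparable job-level control):

    -- Let the optimizer choose the degree of parallelism for this job
    SET CURRENT DEGREE = 'ANY'

Setting the register back to 'NONE' turns parallel processing off again for the job.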
Decision Support Queries
The SMP feature is most useful when running decision support (DSS) queries. DSS queries, which generally give answers to critical business questions, tend to have the following characteristics:
•	examine large volumes of data
•	are far more complex than most OLTP transactions
•	are highly CPU intensive
•	include multiple joins, summarizations and groupings
DSS queries tend to be long running and can utilize much of the system resources such as processor
capacity (CPU) and disk. For example, it is not unusual for DSS queries to have a response time longer
than 20 seconds. In fact, complex DSS queries may run an hour or longer. The CPU required to run a
DSS query can easily be 100 times greater than the CPU required for a typical OLTP transaction. Thus, it
is very important to choose the right System i for your DSS query and data warehousing needs.
SMP Performance Summary
The SMP feature provides performance improvement for query response times. The overall response time
for a set of DSS queries run serially at a single work station may improve more than 25 percent when
SMP support is enabled. The amount of improvement will depend in part on the number of processors
participating in each query execution and the optimization algorithms used to implement the query. Some
individual queries can see significantly larger gains.
An online course, DB2 Symmetric Multiprocessing for System i: Database Parallelism within i5/OS, including a PDF form of the course materials, is available online.
4.7 DB2 for i5/OS Memory Sharing Considerations
DB2 for i5/OS has internal algorithms to automatically manage and share memory among jobs. This
eliminates the complexity of setting and tuning many parameters which are essential to getting good
performance on other database products. The memory sharing algorithms within SQE and i5/OS will
limit the amount of memory available to execute an SQL query to a ‘job share’. The optimizer will
choose an access plan which is optimal for the job’s share of the memory pool and the query engine will
limit the amount of data it brings into and keeps in memory to a job’s share of memory. The amount of
memory available to each job is inversely proportional to the number of active jobs in a memory pool.
The memory-sharing algorithms discussed above provide balanced performance for all the jobs running in
a memory pool. Running short transactional queries in the same memory pool as long running, data
intensive queries is acceptable. However, if it is desirable to get maximum performance for long-running,
data-intensive queries it may be beneficial to run these types of queries in a memory pool dedicated to
this type of workload. Executing long-running, data-intensive queries in the same memory pool with a
large volume of short transactional queries will limit the amount of memory available for execution of the
long-running query. The plan choice and engine execution of the long-running query will be tuned to run
in the amount of memory comparable to that available to the jobs running the short transactional queries.
In many cases, data-intensive, long-running queries will get improved performance with larger amounts
of memory. With more memory available the optimizer is able to consider access plans which may use
more memory, but will minimize runtime. The query engine will also be able to take advantage of
additional memory by keeping more data in memory potentially eliminating a large number of DASD
I/Os. Also, for a job executing long-running performance critical queries in a separate pool, it may be
beneficial to set QQRYDEGREE=*MAX. This will allow all memory in the pool to be used by the job to
process a query. Thus running the longer-running, data intensive queries in a separate pool may
dramatically reduce query runtime.
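As a sketch of one way to set this up (the library name is hypothetical), a copy of the QAQQINI query options file can be updated with SQL and then assigned to the jobs running the long-running queries through the CHGQRYA command's QRYOPTLIB parameter:

    -- MYTUNLIB is an illustrative library holding a copy of QAQQINI
    UPDATE MYTUNLIB/QAQQINI
       SET QQVAL = '*MAX'
     WHERE QQPARM = 'PARALLEL_DEGREE'

The PARALLEL_DEGREE option plays the same role for those jobs as the QQRYDEGREE system value does system-wide.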
4.8 Journaling and Commitment Control
Journaling
The primary purpose of journal management is to provide a method to recover database files. Additional
uses related to performance include the use of journaling to decrease the time required to back up
database files and the use of access path journaling for a potentially large reduction in the length of
abnormal IPLs. For more information on the uses and management of journals, refer to the System i
Backup and Recovery Guide. For more detailed information on the performance impact of journaling see
the redbook Striving for Optimal Journal Performance on DB2 Universal Database for System i.
The addition of journaling to an application will impact performance in terms of both CPU and I/O as the
application changes to the journaled file(s) are entered into the journal. Also, the job that is making the
changes to the file must wait for the journal I/O to be written to disk, so response time will in many cases
be affected as well.
Journaling impacts the performance of each job differently, depending largely on the amount of database
writes being done. Applications doing a large number of writes to a journaled file will most likely show a
significant degradation both in CPU and response time while an application doing only a limited number
of writes to the file may show only a small impact.
Remote Journal Function
The remote journal function allows replication of journal entries from a local (source) System i to a
remote (target) System i by establishing journals and journal receivers on the target system that are
associated with specific journals and journal receivers on the source system. Some of the benefits of
using remote journal include:
•	Allows customers to replace current programming methods of capturing and transmitting journal entries between systems with more efficient system programming methods. This can result in lower CPU consumption and increased throughput on the source system.
•	Can significantly reduce the amount of time and effort required by customers to reconcile their source and target databases after a system failure. If the synchronous delivery mode of remote journal is used (where journal entries are guaranteed to be deposited on the target system prior to control being returned to the user application), then there will be no journal entries lost. If asynchronous delivery mode is used, there may be some journal entries lost, but the number of entries lost will most likely be fewer than if customer programming methods were used, due to the reduced system overhead of remote journal.
•	Journal receiver save operations can be offloaded from the source system to the target system, thus further reducing resource consumption on the source system.
Hot backup, data replication and high availability applications are good examples of applications which
can benefit from using remote journal. Customers who use related or similar software solutions from
other vendors should contact those vendors for more information.
System-Managed Access Path Protection (SMAPP)
System-Managed Access Path Protection (SMAPP) offers system monitoring of potential access path
rebuild time and automatically starts and stops journaling of system selected access paths. In the unlikely
event of an abnormal IPL, this allows for faster access path recovery time.
SMAPP does implicit access path journaling which provides for limited partial/localized recovery of the
journaled access paths. This provides for much faster IPL recovery steps. An estimation of how long
access path recovery will take is provided by SMAPP, and SMAPP provides a setting for the acceptable
length of recovery. SMAPP is shipped enabled with a default recovery time. For most customers, the
default value will minimize the performance impact, while at the same time provide a reasonable and
predictable recovery time and protection for critical access paths. But the overhead of SMAPP will vary
from system to system and application to application. As the target access path recovery time is lowered,
the performance impact from SMAPP will increase as the SMAPP background tasks have to work harder
to meet this target. There is a balance of recovery time requirements vs. the system resources required by
SMAPP.
Although SMAPP may start journaling access paths, it is recommended that the most
important/large/critical/performance sensitive access paths be journaled explicitly with STRJRNAP. This
eliminates the extra overhead of SMAPP evaluating these access paths and implicitly starting journaling
for the same access path day after day. A list of the currently protected access paths may be seen as an
option from the DSPRCYAP screen. Indexes which consistently show up at the top of this list may be
good candidates for explicit journaling via the STRJRNAP command. As identifying important access
paths can be a difficult task, SMAPP provides a good safety net to protect those not explicitly journaled.
In addition to the setting to specify a target recovery time, SMAPP also has the following special settings
which may be selected with the EDTRCYAP and CHGRCYAP commands:
•	*MIN - all exposed indexes will be protected
•	*NONE - no indexes will be protected; SMAPP statistics will be maintained
•	*OFF - no indexes will be protected; no SMAPP statistics will be maintained (Restricted Mode)
It is highly recommended that SMAPP protection NOT be turned off.
There are 3 sets of tasks which do the SMAPP work. These tasks work in the background at low priority
to minimize the impact of SMAPP on system performance. The tasks are as follows:
•	JO-EVALUATE-TASK - Evaluates indexes, estimates rebuild time for an index, and may start or stop implicit journaling of an index.
•	JO-TUNING-TASK - Periodically wakes up to consider where the user recovery threshold is set and manages which indexes should be implicitly journaled.
•	JORECRA-DEF-XXX and JORECRA-USR-XXX tasks are the worker tasks which sweep aged journal pages from main memory to minimize the amount of recovery needed during IPL.
Here are guidelines for lowering the amount of work for each of these tasks:
•	If the JO-TUNING-TASK seems busy, you may want to increase the SMAPP recovery target time.
•	If the JO-EVALUATE-TASK seems busy, explicitly journaling the largest access paths may help, or look for jobs that are opening and closing files repeatedly.
•	If the JORECRA tasks seem busy, you may want to increase the journal recovery ratio.
•	Also, if the target recovery time is not being met, there may be SMAPP-ineligible access paths. These should be modified so as to become SMAPP eligible.
To monitor the performance impacts of SMAPP there are Performance Explorer trace points and a
substantial set of Collection Services counters which provide information on the SMAPP work.
SMAPP makes a decision of where to place the implicit access path journal entries. If the underlying
physical file is not journaled, SMAPP will place the entries in a default (hidden) system journal. If the
underlying physical file is journaled, SMAPP will place the implicit journal entries in the same place.
SMAPP automatically manages the system journal. For the user journal receivers used by SMAPP,
RCVSIZOPT(*RMVINTENT), as specified on the CHGJRN command, is a recommended option. The
disk space used by SMAPP may be displayed with the EDTRCYAP and DSPRCYAP commands. It
rarely exceeds 1% of the ASP size.
For more information on SMAPP see the Systems management -> Journal management ->
System-managed access path protection section in the System i information center.
Commitment Control
Commitment control is an extension to the journal function that allows users to ensure that all changes within a transaction are either all complete or, if not complete, can be easily backed out. The use of commitment
control adds two more journal entries, one at the beginning of the committed transaction and one at the
end, resulting in additional CPU and I/O overhead. In addition, the time that record level locks are held
increases with the use of commitment control. Because of this additional overhead and possible additional
record lock contention, adding commitment control will in many cases result in a noticeable degradation
in performance for an application that is currently doing journaling.
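The following minimal SQL sketch (library, table and column names are hypothetical) shows one commit cycle covering two updates, so that either both take effect or both are backed out:

    -- BANKLIB/ACCOUNTS, ACCTNO and BALANCE are illustrative names
    UPDATE BANKLIB/ACCOUNTS SET BALANCE = BALANCE - 100.00 WHERE ACCTNO = '1001';
    UPDATE BANKLIB/ACCOUNTS SET BALANCE = BALANCE + 100.00 WHERE ACCTNO = '2002';
    COMMIT;
    -- or ROLLBACK to back out both changes and release the record locks

Note that the record-level locks acquired by the two updates are held until the COMMIT or ROLLBACK, which is the lock contention cost described above.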
4.9 DB2 Multisystem for i5/OS
DB2 Multisystem for i5/OS offers customers the ability to distribute large databases across multiple
System i servers in order to gain nearly unlimited scalability and improved performance for many large
query operations. Multiple System i servers are coupled together in a shared-nothing cluster where each
system uses its own main memory and disk storage. Once a database is properly partitioned among the
multiple nodes in the cluster, access to the database files is seamless and transparent to the applications
and users that reference the database. To the users, the partitioned files still behave as though they were
local to their system.
The most important aspect of obtaining optimal performance with DB2 Multisystem is to plan ahead for
what data should be partitioned and how it should be partitioned. The main idea behind this planning is to
ensure that the systems in the cluster run in parallel with each other as much as possible when processing
distributed queries while keeping the amount of communications data traffic to a minimum. Following is
a list of items to consider when planning for the use of distributed data via DB2 Multisystem.
•	Avoid large amounts of data movement between systems. A distributed query often achieves optimal performance when it is able to divide the query among several nodes, with each node running its portion of the query on data that is local to that system and with a minimum number of accesses to remote data on other systems. Also, if a file that is heavily used for transaction processing is to be distributed, it should be done such that most of the database accesses are local, since remote accesses may add significantly to response times.
•	Choosing which files to partition is important. The largest improvements will be for queries on large files. Files that are primarily used for transaction processing and not much query processing are generally not good candidates for partitioning. Also, partitioning files with only a small number of records will generally not result in much improvement and may actually degrade performance due to the added communications overhead.
•	Choose a partitioning key that has many different values (see the sketch following this list). This will help ensure a more even distribution of the data across the multiple nodes. In addition, performance will be best if the partitioning key is a single field that is a simple data type.
•	It is best to choose a partition key that consists of a field or fields whose values are not updated. Updates on partition keys are only allowed if the change to the field(s) in the key will not cause that record to be partitioned to a different node.
•	If joins are often performed on multiple files using a single field, use that field as the partitioning key for those files. Also, the fields used for join processing should be of the same data type.
•	It will be helpful to partition the database files based on how quickly each node can process its portion of the data when running distributed queries. For example, it may be better to place a larger amount of data on a large multiprocessor system than on a smaller single processor system. In addition, current normal utilization levels of other resources such as main memory, DASD and IOPs should be considered on each system in order to ensure that no one individual system becomes a bottleneck for distributed query performance.
•	For the best query performance involving distributed files, avoid the use of commitment control when possible. DB2 Multisystem uses two-phase commit, which can add a significant amount of overhead when running distributed queries.
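As a sketch of the partitioning key guidance above (library, table and node group names are hypothetical, and the node group is assumed to have been created beforehand, for example with the CRTNODGRP CL command):

    -- PRODGRP is an illustrative node group spanning the systems in the cluster
    CREATE TABLE SALESLIB/ORDERS
        (ORDER_ID INTEGER NOT NULL,
         CUSTNO   INTEGER NOT NULL,
         AMOUNT   DECIMAL(11,2) NOT NULL)
        IN PRODGRP
        PARTITIONING KEY (CUSTNO)

Here CUSTNO is a single, simple-data-type field with many distinct values that is unlikely to be updated; using the same field as the partitioning key for related files keeps joins on CUSTNO local to each node.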
For more information on DB2 Multisystem refer to the DB2 Multisystem manual.
4.10 Referential Integrity
In a database user environment, there are frequent cases where the data in one file is dependent upon the
data in another file. Without support from the database management system, each application program
that updates, deletes or adds new records to the files must contain code that enforces the data dependency
rules between the files. Referential Integrity (RI) is the mechanism supported by DB2 that offers its users
the ability to enforce these rules without specifically coding them in their application(s). The data
dependency rules are implemented as referential constraints via either CL commands or SQL statements
that are available for adding, removing and changing these constraints.
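For example, a common order-to-customer dependency could be declared with a referential constraint like the following sketch (all names are hypothetical; the parent column is assumed to be a primary or unique key):

    -- Hypothetical names; CUSTOMER's CUSTNO must be a primary or unique key
    ALTER TABLE SALESLIB/ORDERS
        ADD CONSTRAINT ORDERS_CUSTOMER_FK
        FOREIGN KEY (CUSTNO)
        REFERENCES SALESLIB/CUSTOMER (CUSTNO)
        ON DELETE RESTRICT
        ON UPDATE RESTRICT

Once the constraint exists, the system enforces the rule for every insert, update and delete, and the equivalent checking code can be removed from the applications.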
For those customers that have implemented application checking to maintain integrity of data among
files, there may be a noticeable performance gain when they change the application to use the referential
integrity support. The amount of improvement depends on the extent of checking in the existing
application. Also, the performance gain when using RI may be greater if the application currently uses
SQL statements instead of HLL native database support to enforce data dependency rules.
When implementing RI constraints, customers need to consider which data dependencies are the most
commonly enforced in their applications. The customer may then want to consider changing one or more
of these dependencies to determine the level of performance improvement prior to a full scale
implementation of all data dependencies via RI constraints.
For more information on Referential Integrity see the chapter Ensuring Data Integrity with Referential Constraints in the DB2 Universal Database for System i Database Programming manual and the redbook Advanced Functions and Administration on DB2 Universal Database for System i.
4.11 Triggers
Trigger support for DB2 allows a user to define triggers (user written programs) to be called when records
in a file are changed. Triggers can be used to enforce consistent implementation of business rules for
database files without having to add the rule checking in all applications that are accessing the files. By
doing this, when the business rules change, the user only has to change the trigger program.
There are three different types of events in the context of trigger programs: insert, update and delete.
Separate triggers can be defined for each type of event. Triggers can also be defined to be called before or
after the event occurs.
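As a sketch, an after-insert trigger that logs new orders might look like the following (all names are hypothetical):

    -- Hypothetical names throughout
    CREATE TRIGGER SALESLIB/ORDERS_INS_TRG
        AFTER INSERT ON SALESLIB/ORDERS
        REFERENCING NEW AS N
        FOR EACH ROW MODE DB2ROW
        INSERT INTO SALESLIB/ORDERS_LOG (ORDER_ID, LOGGED_AT)
            VALUES (N.ORDER_ID, CURRENT TIMESTAMP)

The REFERENCING clause exposes the inserted row to the trigger body, and FOR EACH ROW causes the body to run once per inserted record.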
Generally, the impact to performance from applying triggers on the same system for files opened without
commitment control is relatively low. However, when the file(s) are under commitment control, applying
triggers can result in a significant impact to performance.
Triggers are particularly useful in a client server environment. By defining triggers on selected files on
the server, the client application can cause synchronized, systematic update actions to related files on the
server with a single request. Doing this can significantly reduce communications traffic and thus provide
noticeably better performance both in terms of response time and CPU. This is true whether or not the file
is under commitment control.
The following are performance tips to consider when using triggers support:
•	Triggers are activated by an external call. The user needs to weigh the benefit of the trigger against the cost of the external call.
•	If a trigger is going to be used, leave as much validation to the trigger program as possible.
•	Avoid opening files in a trigger program under commitment control if the trigger program does not cause changes to commitable resources.
•	Since trigger programs are called repeatedly, minimize the cost of program initialization and unneeded repeated actions. For example, the trigger program should not have to open and close a file every time it is called. If possible, design the trigger program so that the files are opened during the first call and stay open throughout. To accomplish this, avoid SETON LR in RPG, STOP RUN in COBOL and exit() in C.
•	If the trigger program opens a file multiple times (perhaps in a program which it calls), make use of shared opens whenever possible.
•	If the trigger program is written for the Integrated Language Environment (ILE), make sure it uses the caller's activation group. Having to start a new activation group every time the trigger program is called is very costly.
•	If the trigger program uses SQL statements, it should be optimized such that SQL makes use of reusable ODPs.
In conclusion, the use of triggers can help enforce business rules for user applications and can possibly
help improve overall system performance, particularly in the case of applying changes to remote systems.
However, some care needs to be used in designing triggers for good performance, particularly in the cases
where commitment control is involved. For more information see the redbook Stored Procedures,
Triggers and User Defined Functions on DB2 Universal Database for System i.
4.12 Variable Length Fields
Variable length field support allows a user to define any number of fields in a file as variable length, thus
potentially reducing the number of bytes that need to be stored for a particular field.
Description
Variable length field support on i5/OS has been implemented with a spill area, thus creating two possible
situations: the non-spill case and the spill case. With this implementation, when the data overflows, all of
the data is stored in the spill portion. An example would be a variable length field that is defined as
having a maximum length of 50 bytes and an allocated length of 20 bytes. In other words, it is expected
that the majority of entries in this field will be 20 bytes or less and occasionally there will be a longer
entry up to 50 bytes in length. When inserting an entry that has a length of 20 bytes or less that entry will
be inserted into the allocated part of the field. This is an example of a non-spill case. However, if an entry
is inserted that is, for example, 35 bytes long, all 35 bytes will go into the spill area.
To create the variable length field just described, use the following DB2 statement:
CREATE TABLE library/table-name
(field VARCHAR(50) ALLOCATE(20) NOT NULL)
In this particular example the field was created with the NOT NULL option. The other two options are
NULL and NOT NULL WITH DEFAULT. Refer to the NULLS section in the SQL Reference to
determine which NULLS option would be best for your use. Also, for additional information on variable
length field support, refer to either the SQL Reference or the SQL Programming Concepts.
Performance Expectations
•	Variable length field support, when used correctly, can provide performance improvements in many environments. The savings in I/O when processing a variable length field can be significant. The biggest performance gains from using variable length fields are for description or comment types of fields that are converted to variable length. However, because there is additional overhead associated with accessing the spill area, it is generally not a good idea to convert a field to variable length if the majority (70-100%) of the records would have data in this area. To avoid this problem, design the variable length field(s) with the proper allocated length so that the amount of data in the spill area stays below the 60% range (a sketch of adjusting the allocated length appears at the end of this section). This will also prevent a potential waste of space with the variable length implementation.
•	Another potential savings from the use of variable length fields is in DASD space. This is particularly true in implementations where there is a large difference between the ALLOCATE and the VARCHAR attributes AND the amount of spill data is below 60%. Also, by minimizing the size of the file, the performance of operations such as CPYF (Copy File) will also be improved.
•	When using a variable length field as a join field, the impact to performance for the join will depend on the number of records returned and the amount of data that spills. For a join field that contains a low percentage of spill data and which already has an index built over it that can be used in the join, a user would most likely find the performance acceptable. However, if an index must be built and/or the field contains a large amount of overflow, a performance problem will likely occur when the join is processed.
•	Because of the extra processing that is required for variable length fields, it is not a good idea to convert every field in a file to variable length. This is particularly true for fields that are part of an index key. Accessing records via a variable length key field is noticeably slower than via a fixed length key field. Also, index builds over variable length fields will be noticeably slower than over fixed length fields.
•	When accessing a file that contains variable length fields through a high-level language such as COBOL, the variable that the field is read into must be defined as variable or of a varying length. If this is not done, the data that is read in to the fixed length variable will be treated as fixed length. If the variable is defined as PIC X(40) and only 25 bytes of data is read in, the remaining 15 bytes will be space filled. The value in that variable will now contain 40 bytes. The following COBOL example shows how to declare the receiving variable as a variable length variable:
01 DESCR.
   49 DESCR-LEN     PIC S9(4) COMP-4.
   49 DESCRIPTION   PIC X(40).

EXEC SQL
   FETCH C1 INTO :DESCR
END-EXEC.
For more detail about varying-length character strings, refer to the SQL Programmer's Guide.
The above point is also true when using a high-level language to insert values into a variable length field. The variable that contains the value to be inserted must be declared as variable or varying. A PL/I example follows:

DCL FLD1 CHAR(40) VARYING;
FLD1 = 'XYZ Company';
EXEC SQL
   INSERT INTO library/file VALUES
       ('001453', FLD1, ...);

Having defined FLD1 as VARYING will, for this example, insert a data string of 11 bytes into the field corresponding to FLD1 in this file. If variable FLD1 had not been defined as VARYING, a data string of 40 bytes would be inserted into the corresponding field. For additional information on the VARYING attribute, refer to the PL/I User's Guide and Reference.
In summary, the proper implementation and use of DB2 variable length field support can help provide overall improvements in both function and performance for certain types of database files. However, the amount of improvement can be greatly impacted if the new support is not used correctly, so users need to take care when implementing this function.
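As a sketch of the allocated-length guidance in the first bullet of this section (the names are placeholders, and the ability to change ALLOCATE on an existing column is an assumption to verify against the SQL Reference), the allocated length of a variable length column could be raised so that more entries fit in the allocated part and fewer spill:

    -- Assumption: ALLOCATE can be adjusted via ALTER COLUMN; verify in the SQL Reference
    ALTER TABLE library/table-name
        ALTER COLUMN field SET DATA TYPE VARCHAR(50) ALLOCATE(30)

The maximum length stays at 50 bytes; only the portion stored in the fixed part of the row grows, trading some DASD space for fewer spill area accesses.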
4.13 Reuse Deleted Record Space
Description of Function
This section discusses the support for reuse of deleted record space. This database support provides the
customer a way of placing newly-added records into previously deleted record spaces in physical files.
This function should reduce the requirement for periodic physical file reorganizations to reclaim deleted
record space. File reorganization can be a very time consuming process depending on the size of the file
and the number of indexes over it, along with the reorganize options selected. To activate the reuse
function, set the Reuse deleted records (REUSEDLT) parameter to *YES on the CRTPF (Create Physical File) command. The default value when creating a file with CRTPF is *NO (do not reuse). The default for SQL CREATE TABLE is *YES.
Comparison to Normal Inserts
Inserts into deleted record spaces are handled differently than normal inserts and have different
performance characteristics. For normal inserts into a physical file, the database support will find the end
of the file and seize it once for exclusive use for the subsequent adds. Added records will be written in
blocks at the end of the file. The size of the blocks written will be determined by the default block size or
by the size specified using an Override Database File (OVRDBF) command. The SEQ(*YES number of
records) parameter can be used to set the block size.
In contrast, when reuse is active, the database support will process the added record more like an update
operation than an add operation. The database support will maintain a bit map to keep track of deleted
records and to provide fast access to them. Before a record can be added, the database support must use
the bit-map to find the next available deleted record space, read the page containing the deleted record
entry into storage, and seize the deleted record to allow replacement with the added record. Lastly, the
added records are blocked as much as permissible and then written to the file.
To summarize, additional CPU processing will be required when reuse is active to find the deleted
records, perform record level seizes and maintain the bit-map of deleted records. Also, there may be some
additional disk I/O required to read in the deleted records prior to updating them. However, this extra
overhead is generally less than the overhead associated with a sequential update operation.
Performance Expectations
The impact to performance from implementing the reuse deleted records function will vary depending on
the type of operation being done. Following is a summary of how this function will affect performance
for various scenarios:
•	When blocking was not specified, reuse was slightly faster than or equivalent to the normal insert application. This is due to the fact that reuse by default blocks up records for disk I/Os as much as possible.
•	Increasing the number of indexes over a file will cause degradation for all insert operations, regardless of whether reuse is used or not. However, with reuse activated, the degradation to insert operations from each additional index is generally higher than for normal inserts.
•	The RGZPFM (Reorganize Physical File Member) command can run for a long period of time, depending on the number of records in the file, the number of indexes over the file, and the chosen command options. Even though activating the reuse function may cause some performance degradation, it may be justified when considering reorganization costs to reclaim deleted record space.
•	The reuse function can always be deactivated if the customer encounters a critical time window where no degradation is permissible. The cost of activating/deactivating reuse is relatively low in most cases.
•	Because the reuse function can lead to smaller sized files, the performance of some applications may actually improve, especially in cases where sequential non-keyed processing of a large portion of the file(s) is taking place.
4.14 Performance References for DB2
1. The home page for DB2 Universal Database for System i includes recent announcement information, white papers and technical articles, and DB2 education information.
2. The System i information center section on DB2 for i5/OS under Database and file systems has information on all aspects of DB2 for i5/OS, including the section Monitor and Tune database under Administrative topics.
3. Information on creating efficient running queries and query performance monitoring and tuning is
found in the DB2 for i5/OS Database Performance and Query Optimization manual. This document
contains detailed information on access methods, the query optimizer, and optimizing query
performance including using database monitor to monitor queries, using QAQQINI file options and
using indexes. To access this document look in the Printable PDF section in the System i information
center.
4. The System i redbooks provide performance information on a variety of topics for DB2.
Chapter 5. Communications Performance
There are many factors that affect System i performance in a communications environment. This chapter
discusses some of the common factors and offers guidance on how to help achieve the best possible
performance. Much of the information in this chapter was obtained as a result of analysis experience
within the Rochester development laboratory. Many of the performance claims are based on supporting
performance measurement and analysis with the NetPerf and Netop workloads. In some cases, the actual
performance data is included here to reinforce the performance claims and to demonstrate capacity
characteristics. The NetPerf and Netop workloads are described in section 5.2.
This chapter focuses on communication in non-secure and secure environments on Ethernet solutions
using TCP/IP. Many applications require network communications to be secure. Communications and
cryptography, in these cases, must be considered together. Secure Socket Layer (SSL), Transport Layer
Security (TLS) and Virtual Private Networking (VPN) capacity characteristics will be discussed in
section 5.5 of this chapter. For information about how the Cryptographic Coprocessor improves
performance on SSL/TLS connections, see section 8.4 of Chapter 8, “Cryptography Performance.”
Communications Performance Highlights for IBM i operating system 5.4:
•	The support for the new Internet Protocol version 6 (IPv6) has been enhanced. The new IPv6 functions are consistent at the product level with their respective IPv4 counterparts.
•	Support is added for the 10 Gigabit Ethernet optical fiber input/output adapters (IOAs) 573A and 576A. These IOAs do not require an input/output processor (IOP) to be installed in conjunction with the IOA. Instead, the IOA can be plugged into a PCI bus slot and the IOA is controlled by the main processor. The 573A is a 10 Gigabit SR (short reach) adapter, which uses multimode fiber (MMF) and has a duplex LC connector. The 573A can transmit to lengths of 300 meters. The 576A is a 10 Gigabit LR (long reach) adapter, which uses single mode fiber (SMF) and has a duplex SC connector. The 576A can transmit to lengths of 10 kilometers. Both of these adapters support TCP/IP, 9000-byte jumbo frames, checksum offloading and the IEEE 802.3ae standard.
•	The IBM 5706 2-Port 10/100/1000 Base-TX PCI-X IOA and IBM 5707 2-Port Gigabit Ethernet-SX PCI-X IOA support checksum offloading and 9000-byte jumbo frames (1 Gigabit only). These adapters do not require an IOP to be installed in conjunction with the IOA.
•	The IBM 5701 10/100/1000 Base-TX PCI-X IOA does not require an IOP to be installed in conjunction with the IOA.
•	The IBM Cryptographic Access Provider product, 5722-AC3 (128-bit), is no longer required. This is a new development for the 5.4 release of the IBM i operating system. All 5.4 systems are capable of the function that was previously provided in the 5722-AC3 product. This is relevant for SSL communications.
Communications Performance Highlights for IBM i operating system 5.4.5:
•	The IBM 5767 2-Port 10/100/1000 Base-TX PCI-E IOA and IBM 5768 2-Port Gigabit Ethernet-SX PCI-E IOA support checksum offloading and 9000-byte jumbo frames (1 Gigabit only). These adapters do not require an IOP to be installed in conjunction with the IOA.
•	IBM's Host Ethernet Adapter (HEA) integrated 2-Port 10/100/1000 Base-TX PCI-E IOA supports checksum offloading, 9000-byte jumbo frames (1 Gigabit only) and LSO - Large Send Offload (IPv4 only). These adapters do not require an IOP to be installed in conjunction with the IOA. Additionally, each physical port has 16 logical ports that may be assigned to other partitions, which allows each partition to utilize the same physical port simultaneously with the following limitation: one logical port, per physical port, per partition.
Communications Performance Highlights for IBM i operating system 6.1:
•	Additional enhancements in Internet Protocol version 6 (IPv6) in the following areas:
   1. Advanced Sockets APIs
   2. Path MTU Discovery
   3. Correspondent Node Mobility Support
   4. Support of privacy extensions to stateless address auto-configuration
   5. Virtual IP address
   6. Multicast Listener Discovery v2 support
   7. Router preferences and more specific route advertisement support
   8. Router load sharing
•	Additional enhancements in Internet Protocol version 4 (IPv4) in the following areas:
   1. Remote access proxy fault tolerance
   2. IGMP v3 support for IPv4 multicast
•	Large Send Offload support was implemented for Host Ethernet Adapter ports on Internet Protocol version 4 (IPv4).
5.1 System i Ethernet Solutions
The need for communication between computer systems has grown over the last decades, and TCP/IP over Ethernet has grown with it. We have currently arrived at a point where different factors influence the capabilities of Ethernet. Some of these influences can come from the cabling and adapter type chosen. Limiting factors can be the capabilities of the hub or switch used, the frame size you are able to transmit and receive, and the type of connection used. The System i server is capable of transmitting and receiving data at speeds of 10 megabits per second (10 Mbps) to 10 gigabits per second (10 Gbps or 10,000 Mbps) using an Ethernet IOA. Functions such as full duplex also enhance the communication speeds and the overall performance of Ethernet.
Table 5.1 contains a list of Ethernet input/output adapters that are used to create the results in this chapter.
Table 5.1 - Ethernet input/output adapters

CCIN(3)     Description                                   Speed(6) (Mbps)
2849(1)     10/100 Mbps Ethernet                          10 / 100
5700(2)     IBM Gigabit Ethernet-SX PCI-X                 1000
5701(1)     IBM 10/100/1000 Base-TX PCI-X                 10 / 100 / 1000
5706(1)     IBM 2-Port 10/100/1000 Base-TX PCI-X(7)       10 / 100 / 1000
5707(2)     IBM 2-Port Gigabit Ethernet-SX PCI-X(7)       1000
5767(1)     IBM 2-Port 10/100/1000 Base-TX PCI-e(7)       10 / 100 / 1000
5768(2)     IBM 2-Port Gigabit Ethernet-SX PCI-e(7)       1000
573A(2)     IBM 10 Gigabit Ethernet-SX PCI-X              10000
181A(1)     IBM 2-Port 10/100/1000 Base-TX PCI-e(7)       10 / 100 / 1000
181B(2)     IBM 2-Port Gigabit Base-SX PCI-e              10000
181C(1)     IBM 4-Port 10/100/1000 Base-TX PCI-e(7)       10 / 100 / 1000
1819(1)     IBM 4-Port 10/100/1000 Base-TX PCI-e(7,9)     10 / 100 / 1000
N/A         Virtual Ethernet(4)                           n/a(5)
N/A         Blade(8)                                      n/a(5)

All adapters support auto-negotiation.

Notes:
1. Unshielded Twisted Pair (UTP) card; uses copper wire cabling
2. Uses fiber optics
3. Custom Card Identification Number and System i Feature Code
4. Virtual Ethernet enables you to establish communication via TCP/IP between logical partitions and can be used without any additional hardware or software.
5. Depends on the hardware of the system.
6. These are theoretical hardware unidirectional speeds.
7. Each port can handle 1000 Mbps.
8. Blade communicates with the VIOS partition via Virtual Ethernet.
9. Host Ethernet Adapter for IBM Power 550, 9409-M50 running the IBM i operating system.
5.2 Communication Performance Test Environment
Hardware
All PCI-X measurements for 100 Mbps and 1 Gigabit were completed on an IBM System i 570+ 8-Way
(2.2 GHz). Each system is configured as an LPAR, and each communication test was performed between
two partitions on the same system with one dedicated CPU. The gigabit IOAs were installed in a 133 MHz PCI-X slot.
The measurements for 10 Gigabit were completed on two IBM System i 520+ 2-Way (1.9 GHz) servers.
Each System i server is configured as a single LPAR system with one dedicated CPU. Each
communication test was performed between the two systems and the 10 Gigabit IOAs were installed in
the 266 MHz PCI-X DDR (double data rate) slot for maximum performance. Only the 10 Gigabit Short Reach (573A) IOAs were used in our test environment.
All PCI-e measurements were completed on an IBM System i 9406-MMA 7061 16-way or IBM Power 550, 9409-M50. Each system is configured as an LPAR, and each communication test was performed between two partitions on the same system with one dedicated CPU. The Gigabit IOAs were installed in a PCI-e 8x slot.
All Blade Center measurements were collected on a 4-processor 7998-61X Blade in a Blade Center H chassis with 32 GB of memory. The AIX partition running the VIOS server was not limited. All performance data was collected with the Blade running as the server. The System i partition (on the Blade) was limited to 1 CPU with 4 GB of memory and communicated with an external IBM System i 570+ 8-Way (2.2 GHz) configured as a single LPAR system with one dedicated CPU and 4 GB of memory.
Software
The NetPerf and Netop workloads are primitive-level function workloads used to explore communications performance. The workloads consist of programs that run between a System i client and a System i server. Multiple instances of the workloads can be executed over multiple connections to increase the system load. The programs communicate with each other using sockets or SSL APIs.
To demonstrate communications performance in various ways, several workload scenarios are analyzed.
Each of these scenarios may be executed with regular nonsecure sockets or with secure SSL using the
GSK API:
1. Request/Response (RR): The client and server send a specified amount of data back and forth over
a connection that remains active.
2. Asymmetric Connect/Request/Response (ACRR): The client establishes a connection with the server, a single small request (64 bytes) is sent to the server, a response (8K bytes) is sent by the server back to the client, and the connection is closed.
3. Large transfer (Stream): The client repetitively sends a given amount of data to the server over a
connection that remains active.
The NetPerf and Netop tools used to measure these benchmarks merely copy and transfer the data from
memory. Therefore, additional consideration must be given to account for other normal application
processing costs (for example, higher CPU utilization and higher response times due to disk access time).
A real user application will have this type of processing as only a percentage of the overall workload.
The IBM Systems Workload Estimator, described in Chapter 23, reflects the performance of real user
applications while averaging the impact of the differences between the various communications protocols.
The real world perspective offered by the Workload Estimator can be valuable for projecting overall
system capacity.
5.3 Communication and Storage observations
With the continued progress in both communication and storage technology, it is possible that the performance bottleneck shifts. Especially with high bandwidth communication such as 10 Gigabit and Virtual Ethernet, storage technology could become the limiting factor.
DASD Performance
Storage performance is dependent on the configuration and the number of disk units within your partition. Table 14.1.2.2 in Chapter 14, DASD Performance, shows this for save and restore operations for two different IOAs. See that chapter for detailed information.
Table 5.2 - Copy of Table 14.1.2.2 in Chapter 14, DASD Performance

                                Number of 35 GB DASD units (measurement numbers in GB/HR)
IOA and operation               15 Units        30 Units        45 Units
2778 IOA   Save (*SAVF)            41              83             122
2778 IOA   Restore (*SAVF)         41              83             122
2757 IOA   Save (*SAVF)            82             165             250
2757 IOA   Restore (*SAVF)         82             165             250
Large data transfer (FTP)
When transferring large amounts of data, for example with FTP, DASD performance plays an important role. Both the sending and the receiving end could limit the communication speed when using high bandwidth communication. Also, in a multi-threading environment, having more than one streaming session could improve overall communication performance when the DASD throughput is available.
Table 5.3 - Virtual Ethernet FTP Performance (MB per second)

Sessions        1 Disk Unit ASP on 2757 IOA     15 Disk Units ASP on 2757 IOA
1 Session               10.8                            42.0
2 Sessions              10.5                            70.0
3 Sessions              10.4                            75.0
5.4 TCP/IP non-secure performance
Table 5.4 below lists the payload information for the different Ethernet types. The most important factor with streaming is to determine how much data can be transferred. The results are listed in megabits per second. Virtual Ethernet does not have a raw bit rate, since the maximum throughput is determined by the CPU.
Table 5.4 - Streaming Performance

Ethernet Type       Raw bit rate(1)      MTU(2)    Payload Simplex(3)    Payload Duplex(4)
                    (Mbits per second)             (Mbits per second)    (Mbits per second)
100 Megabit                100           1,492             93.5                170.0
1 Gigabit                1,000           1,492            935.4               1740.3
                                         8,992            935.9               1753.1
10 Gigabit(5)           10,000           1,492           3745.4               4400.7
                                         8,992           8789.6               9297.0
HEA 1 Gigabit            1,000           1,492            986.4               1481.4
                                         8,992            941.1               1960.9
                     16,000(7)           1,492           2811.8               6331.0
                                         8,992           9800.7              10586.4
HEA 10 Gigabit          10,000           1,492           2913.1               3305.2
                                         8,992           9392.3               9276.9
                     16,000(7)           1,492           2823.5               6332.3
                                         8,992           9813.7              10602.3
Blade(8)                   n/a           1,492            933.1               1014.4
Virtual(6)                 n/a           8,992           8553.0              11972.3

Notes:
1. The Raw bit rate value is the physical media bit rate and does not reflect physical media overheads.
2. Maximum Transmission Unit. The large (8,992 bytes) MTU is also referred to as Jumbo Frames.
3. Simplex is a single direction TCP data stream.
4. Duplex is a bidirectional TCP data stream.
5. The 10 Gigabit results were obtained by using multiple sessions, because a single session is incapable of fully utilizing the 10 Gigabit adapter.
6. Virtual Ethernet uses Jumbo Frames only, since large packets are supported throughout the whole connection path.
7. HEA P.P.U.T (Partition to Partition Unicast Traffic or internal switch) 16 Gbps per port group.
8. 4 Processor 7998-61X Blade.
9. All measurements are performed with Full Duplex Ethernet.
Streaming data is not the only type of communication handled through Ethernet. Often server and client
applications communicate with small packets of data back and forth (RR). In the case of web browsers,
the most common type is to connect, request and receive data, then disconnect (ACRR). Table 5.5
provides some rough capacity planning information for these RR and ACRR communications.
Table 5.5 - RR & ACRR Performance (Transactions per second per server CPU)

Transaction Type                                   Threads     1 Gigabit     Virtual
Request/Response (RR) 128 Bytes                       1          991.32       873.62
                                                     26         1330.45       912.34
Asym. Connect/Request/Response (ACRR) 8K Bytes        1          261.51       218.82
                                                     26          279.64       221.21

Notes:
•	Capacity metrics are provided for nonsecure transactions.
•	The table data reflects System i as a server (not a client).
•	The data reflects Sockets and TCP/IP.
•	This is only a rough indicator for capacity planning. Actual results may differ significantly.
•	All measurements were taken with Packet Trainer off (see 5.6 for line-dependent performance enhancements).
These results show the difference in performance for different Ethernet cards compared with Virtual Ethernet. We also added test results with multiple threads to give insight into performance when a system is stressed with multiple sessions.
This information is of a similar type to that provided in Chapter 6, Web Server Performance. There are also capacity planning examples in that chapter.
5.5 TCP/IP Secure Performance
With the growth of communication over public network environments like the Internet, securing the communication data becomes a greater concern. Good examples are customers providing personal data to complete a purchase order (SSL), or someone working away from the office who is still able to connect to the company network (VPN).
SSL
SSL was created to provide a method of session security, authentication of a server or client, and message
authentication. SSL is most commonly used to secure web communication, but SSL can be used for any
reliable communication protocol (such as TCP). The successor to SSL is called TLS. There are slight
differences between SSL v3.0 and TLS v1.0, but the protocol remains substantially the same. For the
data gathered here we only use the TLS v1.0 protocol. Table 5.6 provides some rough capacity planning
information for SSL communications, when using 1 Gigabit Ethernet.
Table 5.6 - SSL Performance (transactions per second per server CPU)

Transaction Type                     Nonsecure   RC4 /    RC4 /    AES128 /   AES256 /   TDES /
                                     TCP/IP      MD5      SHA-1    SHA-1      SHA-1      SHA-1
Request/Response
(RR) 128 Byte                        1167        565.4    530.0    479.6      462.1      202.2
Asym. Connect/Request/Response
(ACRR) 8K Bytes                      249.7       53.4     48.0     31.3       27.4       4.8
Large Transfer
(Stream) 16K Bytes                   478.4       55.7     53.3     36.9       31.9       6.5

Notes:
•	Capacity metrics are provided for nonsecure and each variation of security policy.
•	The table data reflects System i as a server (not a client).
•	This is only a rough indicator for capacity planning. Actual results may differ significantly.
•	Each SSL connection was established with a 1024 bit RSA handshake.
This table gives an overview of performance results for different encryption methods in SSL compared to regular TCP/IP. The encryption methods used range from fast but less secure (RC4 with MD5) to slower but more secure (AES or TDES with SHA-1).
With SSL there is always a fixed overhead, such as the session handshake. The variable overhead is based on the number of bytes that need to be encrypted/decrypted, the size of the public key, the type of encryption, and the size of the symmetric key.
These results may be used to estimate a system's potential transaction rate at a given CPU utilization, assuming a particular workload and security policy. Say the result of a given test is 5 transactions per second per server CPU. Multiplying that result by 50 shows that at 50% CPU utilization a transaction rate of 250 transactions per second is possible for this type of SSL communication in this environment. Similarly, when a capacity of 100 transactions per second is required, the CPU utilization can be approximated by dividing 100 by 5, which gives a 20% CPU utilization in this environment. These are only estimates of how to size the workload, since actual results might vary. Similar information about SSL capacity planning can be found in Chapter 6, Web Server Performance.
Table 5.7 below illustrates relative CPU consumption for SSL instead of potential capacity. Essentially,
this is a normalized inverse of the CPU capacity data from Table 5.6. It gives another view of the impact
of choosing one security policy over another for various NetPerf scenarios.
Table 5.7  SSL Relative Performance (scaled to Nonsecure baseline)

Transaction Type                                 Nonsecure   RC4 /   RC4 /   AES128 /   AES256 /   TDES /
                                                 TCP/IP      MD5     SHA-1   SHA-1      SHA-1      SHA-1
Request/Response (RR) 128 Byte                   1.0 x       2.1     2.2     2.4        2.5        5.8
Asym. Connect/Request/Response (ACRR) 8K Bytes   1.0 y       4.7     5.2     8.0        9.1        51.7
Large Transfer (Stream) 16K Bytes                1.0 z       8.6     9.0     13.0       15.0       73.7

Notes:
• Capacity metrics are provided for nonsecure and each variation of security policy
• The table data reflects System i as a server (not a client)
• This is only a rough indicator for capacity planning. Actual results may differ significantly.
• Each SSL connection was established with a 1024-bit RSA handshake.
• x, y and z are scaling constants, one for each NetPerf scenario.
VPN
Although the term virtual private network (VPN) did not come into use until early 1997, the concepts behind VPN date back to the birth of the Internet. A VPN creates a secure tunnel for communicating from one point to another, using an unsecured network as the medium. Table 5.8 provides some rough capacity planning information for VPN communication when using 1 Gigabit Ethernet.
Table 5.8  VPN Performance (transactions per second per server CPU)

Transaction Type                                 Nonsecure   AH with   ESP with    ESP with         ESP with
                                                 TCP/IP      MD5       RC4 / MD5   AES128 / SHA-1   TDES / SHA-1
Request/Response (RR) 128 Byte                   1167.0      428.5     322.9       307.71           148.4
Asym. Connect/Request/Response (ACRR) 8K Bytes   249.7       49.9      37.7        32.7             9.1
Large Transfer (Stream) 16K Bytes                478.4       44.0      31.0        25.6             5.4

Notes:
• Capacity metrics are provided for nonsecure and each variation of security policy
• The table data reflects System i as a server (not a client)
• VPN measurements used transport mode; TDES, AES128 or RC4 with a 128-bit key symmetric cipher; and MD5 message digest with RSA public/private keys. VPN anti-replay was disabled.
• This is only a rough indicator for capacity planning. Actual results may differ significantly.
This table also shows a range of encryption methods to give insight into the performance trade-off between less secure but faster methods and more secure but slower methods, all compared to unsecured TCP/IP.
Table 5.9 below illustrates relative CPU consumption for VPN instead of potential capacity. Essentially, this is a normalized inverse of the CPU capacity data from Table 5.8. It gives another view of the impact of choosing one security policy over another for various NetPerf scenarios.
Table 5.9  VPN Relative Performance (scaled to Nonsecure baseline)

Transaction Type                                 Nonsecure   AH with   ESP with    ESP with         ESP with
                                                 TCP/IP      MD5       RC4 / MD5   AES128 / SHA-1   TDES / SHA-1
Request/Response (RR) 128 Byte                   1.0 x       2.7       3.6         3.8              7.9
Asym. Connect/Request/Response (ACRR) 8K Bytes   1.0 y       5.0       6.6         7.6              27.5
Large Transfer (Stream) 16K Bytes                1.0 z       10.9      15.4        18.7             88.8

Notes:
• Capacity metrics are provided for nonsecure and each variation of security policy
• The table data reflects System i as a server (not a client)
• VPN measurements used transport mode; TDES, AES128 or RC4 with a 128-bit key symmetric cipher; and MD5 message digest with RSA public/private keys. VPN anti-replay was disabled.
• This is only a rough indicator for capacity planning. Actual results may differ significantly.
• x, y and z are scaling constants, one for each NetPerf scenario.
The SSL and VPN measurements are based on a specific set of cipher methods and public key sizes.
Other choices will perform differently.
5.6 Performance Observations and Tips
• Communication performance on Blades may see an increase when the processors are in shared mode. This is workload dependent.
• Host Ethernet Adapters require 40 to 56 MB of memory per logical port to vary on.
• An IBM Power 550 (9409-M50) may show a 2 to 5 percent increase over an IBM Power 520 (9408-M25) due to the incorporation of L3 cache. Results will vary based on workload and configuration.
• Virtual Ethernet should always be configured with jumbo frames enabled.
• In 6.1, Packet Trainer defaults to "off" but can be configured per line description. Virtual Ethernet may see performance increases with Packet Trainer turned on. This depends on workload, connection type and utilization.
• Physical Gigabit lines may see performance increases with Packet Trainer off. This depends on workload, connection type and utilization.
• Host Ethernet Adapter should not be used for performance-sensitive workloads; your throughput can be greatly affected by the use of other logical ports connected to your physical port on additional partitions.
• Host Ethernet Adapter may see performance increases with Packet Trainer set to on, especially with regard to the HEA's internal logical switch and partition-to-partition traffic via the same port group.
• For additional information regarding your Host Ethernet Adapter, please see your specification manual and the Performance Management page for future white papers regarding iSeries and HEA.
• 1 Gigabit jumbo frame Ethernet enables 12% greater throughput compared to normal frame 1 Gigabit Ethernet. This may vary significantly based on your system, network and workload attributes. Measured 1 Gigabit jumbo frame Ethernet throughput approached 1 Gigabit/sec.
• The jumbo frame option requires 8992-byte MTU support by all of the network components, including switches, routers and bridges. For System Adapter configuration, LINESPEED(*AUTO) and DUPLEX(*FULL) or DUPLEX(*AUTO) must also be specified. To confirm that jumbo frames have been successfully configured throughout the network, use NETSTAT option 3 to “Display Details” for the active jumbo frame network connection.
• Using *ETHV2 for the "Ethernet Standard" attribute of CRTLINETH may yield a slight performance increase in streaming workloads for 1 Gigabit lines.
• Always ensure that the entire communications network is configured optimally. The maximum frame size parameter (MAXFRAME on LIND) should be maximized. The maximum transmission unit (MTU) size parameter (CFGTCP command) for both the interface and the route affects the actual size of the line flows and should be configured to *LIND and *IFC respectively. Having configured a large frame size does not negatively impact performance for small transfers. Note that both the System i and the other link station must be configured for large frames. Otherwise, the smaller of the two maximum frame size values is used in transferring data. Bridges may also limit the maximum frame size.
• When transferring large amounts of data, maximize the size of the application's send and receive requests. This is the amount of data that the application transfers with a single sockets API call. Because sockets does not combine (block) multiple application sends, it is important to block in the application if possible (see the sketch at the end of this list of tips).
• With the CHGTCPA command, using the parameters TCPRCVBUF and TCPSNDBUF, you can alter the TCP receive and send buffers. When transferring large amounts of data, you may experience higher throughput by increasing these buffer sizes, up to 8 MB. The exact buffer size that provides the best throughput will depend on several network environment factors, including the types of switches and systems, ACK timing, error rate and network topology. In our test environment we used 1 MB buffers. Read the help for this command for more information.
• Application time for transfer environments, including accessing a database file, decreases the maximum potential data rate. Because the CPU has additional work to process, a smaller percentage of the CPU is available to handle the transfer of data. Also, serialization from the application's use of both database and communications will reduce the transfer rates.
• TCP/IP Attributes (CHGTCPA) now includes a parameter to set the TCP closed connection wait time-out value (TCPCLOTIMO). This value indicates the amount of time, in seconds, for which a socket pair (client IP address and port, server IP address and port) cannot be reused after a connection is closed. Normally it is set to at least twice the maximum segment lifetime. For typical applications the default value of 120 seconds, which limits the system to approximately 500 new socket pairs per second (roughly 65,535 ports divided by 120 seconds), is fine. Some applications such as primitive communications benchmarks work best if this setting reflects a value closer to twice the true maximum segment lifetime. In these cases a setting of
only a few seconds may perform best. Setting this value too low may result in extra error handling that impacts system capacity.
• No single station can, or is expected to, use the full bandwidth of the LAN media. The media offers up to its rated speed of aggregate capacity for the attached stations to share. The disk access time is usually the limiting resource. The data rate is governed primarily by the application's efficiency attributes (for example, the number of disk accesses, the amount of CPU processing of data, application blocking factors, etc.).
• A LAN can achieve a significantly higher data rate than most supported WAN protocols. This is due to the desirable combination of a high media speed and optimized protocol software.
• Communications applications consume CPU resource (to process data, to support disk I/O, etc.) and communications line resource (to send and receive data). The amount of line resource consumed is proportional to the total number of bytes sent or received on the line. Some additional CPU resource is consumed to process the communications software to support the individual sends (puts or writes) and receives (gets or reads).
• When several sessions use a line concurrently, the aggregate data rate may be higher. This is due to the inherent inefficiency of a single session in using the link. In other words, when a single job is executing disk operations or doing non-overlapped CPU processing, the communications link is idle. If several sessions transfer concurrently, the jobs may be more interleaved and make better use of the communications link.
• The CPU usage for high-speed connections is similar to "slower speed" lines running the same type of work. As the speed of a line increases from a traditional low speed to a high speed, performance characteristics may change:
  - Interactive transactions may be slightly faster
  - Large transfers may be significantly faster
  - A single job may be too serialized to utilize the entire bandwidth
  - High throughput is more sensitive to frame size
  - High throughput is more sensitive to application efficiency
  - System utilization from other work has more impact on throughput
• When developing scalable communication applications, consider taking advantage of the Asynchronous and Overlapped I/O Sockets interface. This interface provides methods for threaded client/server model applications to perform highly concurrent, memory-efficient I/O. Additional implementation information is available in the Sockets Programming guide.
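To make the earlier blocking and buffer-size tips concrete, here is a minimal client-side sketch using PHP's sockets extension (PHP is the language of the application examples in section 6.2). It is illustrative only: the peer address, port, and sizes are placeholders, and the per-socket SO_SNDBUF option shown is the application-level counterpart of the system-wide CHGTCPA values described above.

  <?php
  // Send data in large, application-blocked writes rather than many small
  // ones, so each sockets API call carries more data.
  $sock = socket_create(AF_INET, SOCK_STREAM, SOL_TCP);

  // Optionally enlarge the per-socket send buffer (bytes); the best size is
  // environment dependent, as described in the tips above. 1 MB shown here.
  socket_set_option($sock, SOL_SOCKET, SO_SNDBUF, 1024 * 1024);

  socket_connect($sock, '192.0.2.10', 5001);   // hypothetical peer

  $data = str_repeat('X', 64 * 1024);          // one 64 KB blocked send
  $sent = 0;
  while ($sent < strlen($data)) {
      // socket_write may accept fewer bytes than requested; loop until done.
      $n = socket_write($sock, substr($data, $sent));
      if ($n === false) {
          break;                               // handle the error in real code
      }
      $sent += $n;
  }
  socket_close($sock);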
5.7 APPC, ICF, CPI-C, and AnyNet
• Ensure that APPC is configured optimally for best performance. LANMAXOUT on the CTLD (for APPC environments): this parameter governs how often the sending system waits for an acknowledgment. Never allow LANACKFRQ on one system to have a greater value than LANMAXOUT on the other system. The parameter values of the sending system should match the values on the receiving system. In general, a value of *CALC (i.e., LANMAXOUT=2) offers the best performance for interactive environments, and adequate performance for large transfer environments. For large transfer environments, changing LANMAXOUT to 6 may provide a significant performance increase. LANWNWSTP for APPC on the controller description (CTLD): if there is network congestion or overruns to certain target system adapters, then increasing the value from the default of *NONE to 2 or something larger may improve performance. MAXLENRU for APPC on the mode description (MODD): if a value of *CALC is selected for the maximum SNA request/response unit (RU), the system will select an efficient size that is compatible with the frame size (on the LIND) that you choose. The newer LAN IOPs support IOP assist; changing the RU size to a value other than *CALC may negate this performance feature.
• Some APPC APIs provide blocking (e.g., ICF and CPI-C); therefore, scenarios that include repetitive small puts (that may be blocked) may achieve much better performance.
• A large transfer with the System i sending each record repetitively, using the default blocking provided by OS/400, to the System i client provides the best level of performance.
• A large transfer with the System i flushing the communications buffer after each record (FRCDTA keyword for ICF) to the System i client consumes more CPU time and reduces the potential data rate. That is, each record is forced out of the server system to the client system without waiting to be blocked with any subsequent data. Note that ICF and CPI-C support blocking; Sockets does not.
• A large transfer with the System i sending each record requiring a synchronous confirm (e.g., CONFIRM keyword for ICF) to the System i client uses even more CPU and places a high level of serialization, reducing the data rate. That is, each record is forced out of the server system to the client system. The server system program then waits for the client system to respond with a confirm (acknowledgment). The server application cannot send the next record until the confirm has been received.
• Compression with APPC should be used with caution and only for slower-speed WAN environments. Many suggest that compression should be used with speeds of 19.2 kbps and slower, and its benefit is dependent on the data being transmitted (number of blanks, number and type of repetitions, etc.). Compression is very CPU-intensive. For the CPB benchmark, compression increases the CPU time by up to 9 times. RLE compression uses less CPU time than LZ9 compression (MODD parameters).
• ICF and CPI-C have very similar performance for small data transfers.
• ICF allows for locate mode, which means one less move of the data. This makes a significant difference when using larger records.
• The best-case data rate comes from using the normal blocking that OS/400 provides. For best performance, the use of the ICF keywords for force data and confirm should be minimized. An application's use of these keywords has its place, but the trade-off with performance should be considered. Any deviation from the normal blocking that OS/400 provides may cause additional trips through the communications software and hardware; therefore, it increases both the overall delay and the amount of resources consumed.
• Having ANYNET = *YES causes extra CPU processing. Only set it to *YES if it is needed functionally; otherwise, leave it set to *NO.
• For send and receive pairs, the most efficient use of an interface is with its "native" protocol stack. That is, ICF and CPI-C perform best with APPC, and Sockets performs best with TCP/IP. There is CPU time overhead when the "cross over" is processed. Each interface/stack may perform differently depending on the scenario.
• Copyfile with DDM provides an efficient way to transfer files between System i systems. DDM provides large blocking, which limits the number of times the communications support is invoked. It also maximizes efficiencies with the database by doing fewer, larger I/Os. Generally, a higher data rate can be achieved with DDM compared with user-written APPC programs (doing database accesses) or with ODF.
• When ODF is used with the SNDNETF command, it must first copy the data to the distribution queue on the sending system. This activity is highly CPU-intensive and takes a considerable amount of time, dependent on the number and size of the records in the file. Sending an object to more than one target System i server requires only one copy to the distribution queue; therefore, the realized data rate may appear higher for the subsequent transfers.
• FTS is a less efficient way to transfer data. However, it offers built-in data compression for line speeds less than a given threshold. In some configurations, it will compress data when using a LAN; this significantly slows down LAN transfers.
5.8 HPR and Enterprise Extender Considerations
Enterprise Extender is a protocol that allows the transmission of APPC data over an IP-only infrastructure. System i support for Enterprise Extender was added in 5.4. Communications using the Enterprise Extender protocol are achieved by creating a special kind of APPC controller with a LINKTYPE parameter of *HPRIP.
Enterprise Extender (*HPRIP) APPC controllers are not attached to a specific line. Because of this, the controller uses the LDLCLNKSPD parameter to determine the initial link speed to the remote system. After a connection has been started, this speed is adjusted automatically using the measured network values. However, if the value of LDLCLNKSPD is too different from the real link speed at the beginning, the initial connections will not use the network optimally. Too high a value will cause too many packets to be dropped, and too low a value will keep the system from reaching the real link speed for short bursts of data.
In a laboratory-controlled environment with an isolated 100 Mbps Ethernet network, the following average response times were observed on the system (not including the time required to start a SNA session and allocate a conversation):

Table 5.10  HPR and Enterprise Extender Response Times

Test Type                        HPRIP Link       HPRIP Link        AnyNet      LAN
                                 Speed = 10Mbps   Speed = 100Mbps
Short Request with echo          0.001 sec        0.001 sec         0.001 sec   0.001 sec
Short Request                    0.001 sec        0.001 sec         0.003 sec   0.003 sec
64K Request with echo            0.019 sec        0.010 sec         13 sec      2 sec
64K Request                      0.019 sec        0.010 sec         5 sec       1 sec
1GB Request with echo            6:14 min         6:08 min          7:22 min    6:04 min
1GB Request                      2:32 min         2:17 min          3:33 min    3:00 min
Send File using SNDNETF (1GB)    5:12 min         5:16 min          5:40 min    5:23 min
The tests were done between two IBM System i5 (9406-820 and 9402-400) servers in an isolated network.
Allocation time refers to the time that it takes for the system to start a conversation to the remote system. The allocation time might be greater when a SNA session has not yet been started to the remote system. Measured allocation times averaged 14 ms on HPRIP systems, while AnyNet allocation times averaged 41 ms.
The HPRIP controllers have slightly higher CPU usage than controllers that use a direct LAN attach. The CPU usage is similar to that measured on AnyNet APPC controllers. In laboratory testing, a LAN transaction took 3 CPW, while HPRIP and AnyNet transactions both took 3.7 CPW.
5.9 Additional Information
Extensive information can be found at the System i Information Center Web site:
• For network information, select “Networking”:
  - See “TCP/IP setup” → “Internet Protocol version 6” for IPv6 information.
  - See “Network communications” → “Ethernet” for Ethernet information.
• For application development, select “Programming”:
  - See “Communications” → “Socket Programming” for the Sockets Programming guide.
Information about Ethernet cards can be found at the IBM Systems Hardware Information Center. The link for this information center is located on the IBM Systems Information Centers page:
• See “Managing your server and devices” → “Managing devices” → “Managing Peripheral Component Interconnect (PCI) adapters” for information about Ethernet PCI adapters.
Chapter 6. Web Server and WebSphere Performance
This section discusses System i performance information in Web serving and WebSphere environments.
Specific products that are discussed include: HTTP Server (powered by Apache) (in section 6.1), PHP -
Zend Core for i (6.2), WebSphere Application Server and WebSphere Application Server - Express (6.3),
WebFacing (6.4), Host Access Transformation Services (6.5), System Application Server Instance (6.6),
WebSphere Portal Server (6.7), WebSphere Commerce (6.8), WebSphere Commerce Payments (6.9), and
Connect for iSeries (6.10).
The primary focus of this section will be to discuss the performance characteristics of the System i
platform as a server in a Web environment, provide capacity planning information, and recommend
actions to help achieve high performance. Having a high-performance network infrastructure is very
important for Web environments; please refer to Chapter 5, “Communications Performance” for related
information and tuning tips.
Web Overview: There are many factors that can impact overall performance (e.g., end-user response time, throughput) in the complex Web environment, some of which are listed below:
1) Web Browser or client
   • processing speed of the client system
   • performance characteristics and configuration of the Web browser
   • client application performance characteristics
2) Network
   • speed of the communications links
   • capacity and caching characteristics of any proxy servers
   • the responsiveness of any other related remote servers (e.g., payment gateways)
   • congestion of network resources
3) System i Web Server and Applications
   • System i processor capacity (indicated by the CPW value)
   • utilization of key System i server resources (CPU, IOP, memory, disk)
   • Web server performance characteristics
   • application (e.g., CGI, servlet) performance characteristics
Comparing traditional communications to Web-based transactions: For commercial applications, data access across the Internet differs distinctly from access across 'traditional' communications networks. The additional resources to support Internet transactions by the CPU, IOP, and line are significant and must be considered in capacity planning. Typically, in a traditional network:
• there is a request and response (between client and server)
• connections/sessions are maintained between transactions
• networks are well-understood and tuned
Typically for Web transactions, there may be a dozen or more line transmissions per transaction:
• a connection is established/closed for each transaction
• there is a request and response (between client and server)
• one user transaction may contain many separate Internet transactions
• secure transactions are more frequent and consume more resource
• with the Internet, the network may not be well-understood (route, components, performance)
Information source and disclaimer: The information in the sections that follow is based on
performance measurements and analysis done in the internal IBM performance lab. The raw data is not
provided here, but the highlights, general conclusions, and recommendations are included. Results listed
here do not represent any particular customer environment. Actual performance may vary significantly
from what is provided here. Note that these workloads are measured in best-case environments (e.g., local LAN, large MTU sizes, no errors). Real Internet networks typically have higher contention, higher levels of logging and security, MTU size limitations, and intermediate network servers (e.g., proxy, SOCKS), and therefore would likely consume more resources.
6.1 HTTP Server (powered by Apache)
The HTTP Server (powered by Apache) for i5/OS has some exciting new features for V5R4. The level of the HTTP Server has been increased to support Apache 2.0.52 and is now a UTF-8 server. This means that requests are received and then processed as UTF-8, rather than first being converted to EBCDIC and then processed. This makes porting open source modules for the HTTP Server on your IBM System i easier than before. For more information on what's new for HTTP Server for i5/OS, see the HTTP Server Web site.
This section discusses some basic information about HTTP Server (powered by Apache) and gives you some insight into the relative performance between primitive HTTP Server tests.
The typical high-level flow for Web transactions: the connection is made, the request is received and processed by the HTTP server, the response is sent to the browser, and the connection is ended. If the browser has multiple file requests for the same HTTP server, it is possible to serve the multiple requests over one connection. This feature is known as persistent connection and can be set using the KeepAlive directive in the HTTP server configuration; a client-side sketch follows.
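The effect of persistent connections is easy to observe from the client side. The following PHP sketch is illustrative only (the URL is a placeholder): reusing one cURL handle lets libcurl keep the TCP connection open across requests, mirroring what a browser does under KeepAlive.

  <?php
  // Reusing a single cURL handle allows the underlying TCP connection to be
  // reused for multiple requests to the same HTTP server (KeepAlive).
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  foreach (array('/index.html', '/logo.gif', '/style.css') as $path) {
      curl_setopt($ch, CURLOPT_URL, 'http://example.com' . $path);
      $body = curl_exec($ch);   // subsequent requests reuse the connection
  }
  curl_close($ch);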
To understand the test environment and to better interpret performance tools reports or screens it is
helpful to know that the following jobs and tasks are involved: communications router tasks
(IPRTRnnn), several HTTP jobs with at least one with many threads, and perhaps an additional set of
application jobs/threads.
“Web Server Primitives” Workload Description: The “Web Server Primitives” workload is driven by the program ApacheBench 2.0.40-dev, which runs on a client system and simulates multiple Web browser clients by issuing URL requests to the Web server. The number of simulated clients can be adjusted to vary the offered load, which was kept at a moderate level. Files and programs exist on the IBM System i platform to support the various transaction types. Each of the transaction types used is quite simple and serves a static response page of specified data length back to the client. Each of the transactions can be served in a secure (HTTPS) or a non-secure (HTTP) fashion. The HTTP server environment is a partition of an IBM System i 570+ 8-way (2.2 GHz), configured with one dedicated CPU and a 1 Gbps communication adapter.
• Static Page: HTTP retrieves a file from the IFS and serves the static page. The HTTP server can be configured to cache the file in its local cache to reduce server resource consumption. FRCA (Fast Response Caching Accelerator) can also be configured to cache the file deeper in the operating system and further reduce resource consumption.
• CGI: HTTP invokes a CGI program, which builds a simple HTML page and serves it via the HTTP server. This CGI program can run in either a new or a named activation group. The CGI programs were compiled using a "named" activation group unless specified otherwise.
Web Server Capacity Planning: Please use the IBM Systems Workload Estimator to do capacity planning for Web environments using the following workloads: Web Serving, WebSphere, WebFacing, WebSphere Portal Server, WebSphere Commerce. This tool allows you to suggest a transaction rate and to further characterize your workload. You'll find the tool, along with good help text, on the IBM Systems Workload Estimator Web site (see also Chapter 23).
The following tables provide a summary of the measured performance data for both static and dynamic
Web server transactions. These charts should be used in conjunction with the rest of the information in
this section for correct interpretation. Results listed here do not represent any particular customer
environment. Actual performance may vary significantly from what is provided here.
Relative Performance Metrics:
• Relative Capacity Metric: This metric is used throughout this section to demonstrate the relative capacity performance between primitive tests. Because of the diversity of each environment, the ability to scale these results could be challenging, but they are provided to give you an insight into the relation between the performance of each primitive HTTP Server test.
Table 6.1  i5/OS V5R4 Web Serving Relative Capacity - Static Page

Transaction Type              Non-secure   Secure
Static Page - IFS             2.016        1.481
Static Page - Local Cache     3.538        2.235
Static Page - FRCA            34.730       n/a

Notes/Disclaimers:
• Data assumes no access logging, no name server interactions, KeepAlive on, LiveLocalCache off
• Secure: 128-bit RC4 symmetric cipher and MD5 message digest with 1024-bit RSA public/private keys
• These results are relative to each other and do not scale with other environments
• Transactions using more complex programs or serving larger files will have lower capacities than what is listed here.
[Bar chart: “HTTP Server (powered by Apache) for i5/OS - V5R4 Relative Capacity for Static Page,” comparing non-secure and secure relative capacities for IFS, Local Cache, and FRCA.]
Figure 6.1 i5/OS V5R4 Web Serving Relative Capacities - Various Transactions
Table 6.2  i5/OS V5R4 Web Serving Relative Capacity - CGI

Transaction Type              Non-secure   Secure
CGI - New Activation          0.092        0.090
CGI - Named Activation        0.475        0.436

Notes/Disclaimers:
• Data assumes no access logging, no name server interactions, KeepAlive on, LiveLocalCache off
• Secure: 128-bit RC4 symmetric cipher and MD5 message digest with 1024-bit RSA public/private keys
• These results are relative to each other and do not scale with other environments
• Transactions using more complex programs or serving larger files will have lower capacities than what is listed here.
[Bar chart: “HTTP Server (powered by Apache) for i5/OS - V5R4 Relative Capacity for CGI,” comparing non-secure and secure relative capacities for New Activation and Named Activation.]
Figure 6.2 i5/OS V5R4 Web Serving Relative Capacities - Various Transactions
Table 6.3  i5/OS V5R4 Web Serving Relative Capacity for Static Pages (varied sizes)

                              1K Bytes          10K Bytes         100K Bytes
Transaction Type              Off      On       Off      On       Off      On
Static Page - IFS             1.558    2.016    1.347    1.793    0.830    1.068
Static Page - Local Cache     2.407    3.538    2.095    3.044    0.958    1.243
Static Page - FRCA            11.564   34.730   7.691    13.539   1.873    2.622
(Off/On = KeepAlive off/on)

Notes/Disclaimers:
• These results are relative to each other and do not scale with other environments.
• IBM System i CPU features without an L2 cache will have lower web server capacities than the CPW value would indicate.
[Bar chart: “HTTP Server (powered by Apache) for i5/OS - V5R4 Relative Capacities for Static Pages by Size,” comparing IFS, Local Cache, and FRCA for 1 KB, 10 KB, and 100 KB pages with KeepAlive on and off.]
Figure 6.3 i5/OS V5R4 Web Serving Relative Capacity for Static Pages and FRCA
Web Serving Performance Tips and Techniques:
1. HTTP software optimizations by release:
a. V5R4 provides similar Web server performance compared with V5R3 for most transactions (with similar hardware). In V5R4 there are opportunities to exploit improved CGI performance. More information can be found in the FAQ section of the HTTP Server Web site, under “How can I improve the performance of my CGI program?”
b. V5R3 provided similar Web server performance compared with V5R2 for most transactions (with
similar hardware).
c. V5R2 provided opportunities to exploit improved performance. HTTP Server (powered by Apache) was updated to current levels with improved performance and scalability. FRCA (Fast Response Caching Accelerator) was new with V5R2 and provided a high-performance complement to the HTTP Server for highly used static content. FRCA generally reduces the CPU consumption to serve static pages by half, potentially doubling the Web server capacity.
2. Web Server Cache for IFS Files: Serving static pages that are cached locally in the HTTP Server’s
cache can significantly increase Web server capacity (refer to Table 6.3 and Figure 6.3). Ensure that
highly used files are selected to be in the cache to limit the overhead of accessing IFS. To keep the
cache most useful, it may be best not to consume the cache with extremely large files. Ensure that
highly used small/medium files are cached. Also, consider using the LiveLocalCache off directive if possible. If the files you are caching do not change, you can avoid the processing associated with checking each file for any updates to the data. A great deal of caution is recommended before enabling this directive.
3. FRCA: Fast Response Caching Accelerator is newly implemented for V5R2. FRCA is based on
AFPA (Adaptive Fast Path Architecture), utilizes NFC (Network File Cache) to cache files, and
interacts closely with the HTTP Server (powered by Apache). FRCA greatly improves Web server
performance for serving static content (refer to Table 6.3 and Figure 6.3). For best performance,
FRCA should be used to store static, non-secure content (pages, gifs, images, thumbnails). Keep in
mind that HTTP requests served by FRCA are not authenticated and that the files served by FRCA
need to have an ASCII CCSID and correct authority. Taking advantage of all levels of caching is
really the key for good e-Commerce performance (local HTTP cache, FRCA cache, WebSphere
Commerce cache, etc.).
4. Page size: The data in Table 6.1 and Table 6.2 assumes that a small amount of data is being
served (say 100 bytes). Table 6.3 illustrates the impact of serving larger files. If the pages are larger,
more bytes are processed, CPU processing per transaction significantly increases, and therefore the
transaction capacity metrics are reduced. This also increases the communication throughput, which
can be a limiting factor for the larger files. The IBM Systems Workload Estimator can be used for
capacity planning with page size variations (see chapter 23).
5. CGI with named activations: Significant performance benefits can be realized by compiling a CGI
program into a "named" versus a "new" activation group, perhaps up to 5x better. It is essential for
good performance that CGI-based applications use named activation groups. Refer to the i5/OS ILE
Concepts for more details on activation groups. When changing architectures, recompiling CGI
programs could boost server performance by taking advantage of compiler optimizations.
6. Secure Web Serving: Secure Web serving involves additional overhead to the server for Web
environments. There are primarily two groups of overhead: First, there is the fixed overhead of
establishing/closing a secure connection, which is dominated by key processing. Second, there is the
variable overhead of encryption/decryption, which is proportional to the number of bytes in the transaction. Note the capacity factors in the tables above comparing non-secure and secure serving. From Table 6.1, note that for simple transactions (e.g., static page serving), the impact of secure serving is around 20%. For complex transactions (e.g., CGI, servlets), the overhead is proportionally smaller. This relationship assumes that KeepAlive is used, and therefore the overhead of key processing can be minimized. If KeepAlive is not used (i.e., a new connection, a new cached or abbreviated handshake, more key processing, etc.), then there will be a hit of 7x or more CPU time for using secure transactions. To illustrate this, a noncached SSL static transaction using KeepAlive has a relative capacity of 1.481 (from Table 6.1); this compares to 0.188 (not included in the table) when KeepAlive is off. However, if the handshake is forced to be a regular or full handshake, then the CPU time hit will be around 50x (relative capacity 0.03). The lessons here are to: 1) limit the use of security to where it is needed, and 2) use KeepAlive if possible.
7. Persistent Requests and KeepAlive: Keeping the TCP/IP connection active during a series of
transactions is called persistent connection. Taking advantage of the persistent connection for a series
of Web transactions is called Persistent Requests or KeepAlive. This is tuned to satisfy an entire
typical Web page being able to serve all embedded files on that same connection.
a. Performance Advantages: The CPU and network overhead of establishing and closing a
connection is very significant, especially for secure transactions. Utilizing the same connection
for several transactions usually allows for significantly better performance, in terms of reduced
resource consumption, higher potential capacity, and lower response time.
b. The down side: If persistent requests are used, the Web server thread associated with that series of requests is tied up (only if the Web Server directive AsyncIO is turned Off). If there is a shortage of available threads, some clients may wait disproportionately long for a thread. A time-out parameter is used to enforce a maximum amount of time that the connection and thread can remain active.
8. Logging: Logging (e.g., access logging) consumes additional CPU and disk resources. Typically, it
may consume 10% additional CPU. For best performance, turn off unnecessary logging.
9. Proxy Servers: Proxy servers can be used to cache highly-used files. This is a great performance
advantage to the HTTP server (the originating server) by reducing the number of requests that it must
serve. In this case, an HTTP server would typically be front-ended by one or more proxy servers. If
the file is resident in the proxy cache and has not expired, it is served by the proxy server, and the
back-end HTTP server is not impacted at all. If the file is not cached or if it has expired, then a request is made to the back-end HTTP server, and the response is served to the client through the proxy.
10. Response Time (general): User response time is made up of Web browser (client work station) time,
network time, and server time. A problem in any one of these areas may cause a significant
performance problem for an end-user. To an end-user, it may seem apparent that any performance
problem would be attributable to the server, even though the problem may lie elsewhere. It is
common for pages that are being served to have embedded files (e.g., gifs, images, buttons). Each of
these transactions may be a separate Internet transaction. Each adds to the response time since they
are treated as independent HTTP requests and can be retrieved from various servers (some browsers
can retrieve multiple URLs concurrently). Using Persistent Connection or KeepAlive directive can
improve this.
11. HTTP and TCP/IP Configuration Tips: Information to assist with the configuration of HTTP and TCP/IP:
a. The number of HTTP server threads: The reason for having multiple server threads is that
when one server is waiting for a disk or communications I/O to complete, a different server job
can process another user's request. Also, if persistent requests are being used and AsyncIO is Off,
a server thread is allocated to that user for the entire length of the connection. For N-way
systems, each CPU may simultaneously process server jobs. The system will adjust the number of
servers that are needed automatically (within the bounds of the minimum and maximum
parameters). The values specified are for the number of "worker" threads. Typically, the default
values will provide the best performance for most systems. For larger systems, the maximum
number of server threads may have to be increased. A starting point for the maximum number of
threads can be the CPW value (the portion that is being used for Web server activity) divided by 20; for example, 1000 CPW of Web serving capacity suggests a starting maximum of 50 threads. Try not to have excessively more than what is needed, as this may cause unnecessary system activity.
b. The maximum frame size parameter (MAXFRAME on LIND) is generally satisfactory for
Ethernet because the default value is equal to the maximum value (1.5K). For Token-Ring, it can
be increased from 1994 bytes to its maximum of 16393 to allow for larger transmissions.
c. The maximum transmission unit (MTU) size parameter (CFGTCP command) for both the route
and interface affect the actual size of the line flows. Optimizing the MTU value will most likely
reduce the overall number of transmissions, and therefore, increase the potential capacity of the
CPU and the IOP. The MTU on the interface should be set to the frame size (*LIND). The MTU
on the route should be set to the interface (*IFC). Similar parameters also exist on the Web
browsers. The negotiated value will be the minimum of the server and browser (and perhaps any
bridges/routers), so increase them all.
d. Increasing the TCP/IP buffer size (TCPRCVBUF and TCPSNDBUF on the CHGTCPA or
CFGTCP command) from 8K bytes to 64K bytes (or as high as 8MB) may increase the
performance when sending larger amounts of data. If most of the files being served are 10K
bytes or less, it is recommended that the buffer size not be increased to the maximum of 8 MB, because it may negatively affect throughput.
e. Error and Access Logging: Having logging turned on causes a small amount of system overhead
(CPU time, extra I/O). Typically, it may increase the CPU load by 5-10%. Turn logging off for
best capacity. Use the Administration GUI to make changes to the type and amount of logging
needed.
f. Name Server Accesses: For each Internet transaction, the server accesses the name server for
information (IP address and name translations). These accesses cause significant overhead (CPU
time, comm I/O) and greatly reduce system capacity. These accesses can be eliminated by editing
the server’s config file and adding the line: “HostNameLookups Off”.
12. HTTP Server Memory Requirements: Follow the faulting threshold guidelines suggested in the
work management guide by observing/adjusting the memory in both the machine pool and the pool
that the HTTP servers run in (WRKSYSSTS). Factors that may significantly affect the memory
requirements include using larger document sizes and using CGI programs.
13. File System Considerations: Web serving performance varies significantly based on which file
system is used. Each file system has different overheads and performance characteristics. Note that
serving from the ROOT or QOPENSYS directories provides the best system capacity. If Web page
development is done from another directory, consider copying the data to a higher-performing file
system for production use. The Web serving performance of the non-thread-safe file systems is
significantly less than the root directory. Using QDLS or QSYS may decrease capacity by 2-5 times.
Also, be sensitive to the number of sub-directories. Additional overhead is introduced with each
sub-directory you add due to the authorization checking that is performed. The HTTP Server serves
the pages in ASCII, so make sure that the files have the correct format; otherwise, the HTTP Server needs to convert the pages, which results in additional overhead.
14. Communications/LAN IOPs: Since there are a dozen or more line flows per transaction (assuming
KeepAlive is off), the Web serving environment utilizes the IOP more than other communications
environments. Use the Performance Monitor or Collection Services to measure IOP utilization.
Attempt to keep the average IOP utilization at 60% or less for best performance. IOP capacity
depends on page size, the MTU size, the use of KeepAlive directive, etc. For the best projection of
IOP capacity, consider a measurement and observe the IOP utilization.
6.2 PHP - Zend Core for i
This section discusses the different performance aspects of running PHP transaction based applications
using Zend Core for i, including DB access considerations, utilization of RPG program call, and the
benefits of using Zend Platform.
Zend Core for i
Zend Core for i delivers a rapid development and production PHP foundation for applications using PHP
running on i with IBM DB2 for i or MySQL databases. Zend Core for i includes the capability for Web
servers to communicate with DB2 and MySQL databases. It is easy to install, and is bundled with Apache
2, PHP 5, and PHP extensions such as ibm_db2.
The PHP application used for this study is a DVD store application that simulates users logging into an
online catalog, browsing the catalog, and making DVD purchases. The entire system configuration is a
two-tier model with tier one executing the driver that emulates the activities of Web users. Tier two
comprises the Web application server that intercepts the requests and sends database transactions to a
DB2 for i or MySQL server, configured on the same machine.
System Configuration
The hardware setup used for this study comprised a driver machine, and a separate system that hosted
both the web and database server. The driver machine emulated Web users of an online DVD store
generating HTTP requests. These HTTP requests were routed to the Web server that contained the DVD
store application logic. The Web server processed the HTTP requests from the Web browsers and
maintained persistent connections to the database server jobs. This allowed the connection handle to be
preserved after the transaction completed; future incoming transactions re-use the same connection
handle. The web and database server was a 2-processor partition on an IBM System i Model 9406-570 server (POWER5, 2.2 GHz) with 2 GB of storage. Both IBM i 5.4 and 6.1 were used in the measurements, but for this workload there was minimal difference between the two versions.
Database and Workload Description
The workload used simulates an Online Transaction Processing (OLTP) environment. A driver simulates
users logging in and browsing the catalog of available products via simple search queries. Returning
customers are presented with their online purchase transactions history, while new users may register to
create customer accounts. Users may select items they would like to purchase and proceed to check out or
continue to view available products. In this workload, the browse-buy ratio is 5:1. In total, for a given
order (business transaction) there are 10 web requests consisting of login, initiate shopping, five product
browse requests, shopping cart update, checkout, and product purchase. This is a transaction-oriented workload, utilizing commit processing to ensure data integrity. In up to 2% of the orders, rollbacks occur due to insufficient product quantities. Restocking is done once every 30 seconds to replenish the product quantities and control the number of rollbacks.
Performance Characterization
The metrics used to characterize the performance of the workload were the following:
• Throughput - Orders Per Minute (OPM). Each order actually consists of 10 web requests to complete the order.
• Order response time (RT) in milliseconds
• Total CPU - Total system processor utilization
• CPU Zend/AP - CPU for the Zend Core / Apache component
• CPU DB - CPU for the DB component
Database Access
The following four methods were used to access the backend database for the DVD Store application. In the first three cases, SQL requests were issued directly from the PHP pages. In the fourth case, the i5 PHP API toolkit program call interface was used to call RPG programs that issue i5 native DB I/O. For all the environments, the same presentation logic was used.
• ibm_db2 extension shipped with Zend Core for i that provides the SQL interface to DB2 for i.
• mysqli extension that provides the SQL interface to MySQL databases. In this case the MySQL InnoDB and MyISAM storage engines were used.
• i5 PHP API Toolkit SQL functions included with Zend Core for i that provide an SQL interface to DB2 for i.
• i5 PHP API Toolkit classes included with Zend Core for i that provide a program call interface.
When using ibm_db2, there are two ways to connect to DB2. If empty strings are passed for the user ID and password on the connect, the database access occurs within the same job in which the PHP script is executing. If a specific user ID and password are used, database access occurs via a QSQSRVR job, which is called server mode processing. In all tests using ibm_db2, server mode processing was used. This may have a minimal performance impact due to the management of QSQSRVR jobs, but it does prevent the Apache job servicing the PHP request from becoming unresponsive if a DB error occurs.
When using ibm_db2 and the i5 toolkit (SQL functions), the accepted practice of using prepare and execute was utilized. In addition, stored procedures were utilized for processing the purchase transactions. For MySQL, prepared statements were not utilized because of performance overhead.
Finally, in the case of the i5 PHP API toolkit and ibm_db2, persistent connections were used. Persistent connections provide dramatic performance gains versus non-persistent connections. This is discussed in more detail in the next section.
In the following table, we compare the performance of the different DB access methods.

                        ibm_db2        mysqli              i5 Toolkit      i5 Toolkit
                                                           SQL function    Pgm call function
OS / DB                 i 5.4 / DB2    i 5.4 / MySQL 5.0   i 5.4 / DB2     i 5.4 / DB2
ZendCore Version        V2.5.2         V2.5.2              V2.5.2          V2.5.2
Connect                 db2_pconnect   mysqli              i5_pconnect     i5_pconnect
OPM                     4997           3935                3920            5240
RT (ms)                 176            225                 227             169
Total CPU (%)           99             98                  99              98
CPU - Zend/AP (%)       62             49                  63              88
CPU - DB (%)            33             47                  33              7
Conclusions:
1. The performance of each DB connection interface provides exceptional response time at very high throughput. Each order processed consists of ten web requests; as a result, the capacity ranges from about 650 transactions per second up to about 870 transactions per second (the arithmetic is sketched in code after these conclusions). Using Zend Platform will provide even higher performance (refer to the section on Zend Platform).
2. The i5 PHP API Toolkit is network enabled, so it provides the capability to run in a 3-tier environment, i.e., where the PHP application runs on a web server deployed on a separate system from the backend DB server. However, when running in a 2-tier environment, it is recommended to use the ibm_db2 PHP extension to access DB2 locally, given the optimized performance.
The i5 PHP API Toolkit provides a wealth of interfaces to integrate PHP pages with native i5 system services. When standardizing on the use of the i5 toolkit API, the use of the SQL functions to access DB2 will provide very good performance. In addition to SQL functions, the toolkit provides a program call interface to call existing programs. Calling existing programs using native DB I/O may provide significantly more performance.
3. The most compelling reason to use MySQL on IBM i is when you are deploying an application that is written to the MySQL database.
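As a quick check on the capacity numbers in conclusion 1, the conversion from orders per minute to web requests per second follows directly from the throughput metric definition (illustrative PHP, using the OPM values from the table above):

  <?php
  // Each order consists of 10 web requests, so:
  //   requests/second = OPM * 10 / 60
  foreach (array(3920, 5240) as $opm) {
      printf("%d OPM = about %d requests/sec\n", $opm, $opm * 10 / 60);
  }
  // Prints about 653 and 873, the "about 650 ... 870" range quoted above.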
Database - Persistent versus Non-Persistent Connections
If you're connecting to a DB2 database in your PHP application, you'll find that there are two alternative connection functions: db2_connect, which establishes a new connection each time, and db2_pconnect, which uses persistent connections. The main advantage of using a persistent connection is that it avoids much of the initialization and teardown normally associated with getting a connection to the database. When db2_close() is called against a persistent connection, the call always returns TRUE, but the underlying DB2 client connection remains open and waiting to serve the next matching db2_pconnect() request.
One main area of concern with persistent connections is commitment control. You need to be very diligent when using persistent connections for transactions that require the use of commitment control boundaries. In this case, DB2_AUTOCOMMIT_OFF is specified and the programmer controls the commit points using db2_commit() statements. If not managed correctly, mixing managed commitment control and persistent connections can result in unknown transaction states if errors occur.
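The following is a minimal sketch of the two connection styles and the commitment-control care described above, using the ibm_db2 functions named in this section; the database name, credentials, SQL statement, and table are placeholders, not part of the measured workload:

  <?php
  // db2_connect: a new connection is created and torn down each time.
  $conn = db2_connect('SAMPLE', 'user', 'password');

  // db2_pconnect: the underlying connection persists and is reused by the
  // next matching db2_pconnect() call; db2_close() returns TRUE but leaves
  // the connection open.
  $pconn = db2_pconnect('SAMPLE', 'user', 'password');

  // With persistent connections and commitment control, turn autocommit off
  // and commit (or roll back) explicitly, so no transaction is left open for
  // the next user of the connection.
  db2_autocommit($pconn, DB2_AUTOCOMMIT_OFF);
  $stmt = db2_prepare($pconn, 'UPDATE products SET qty = qty - 1 WHERE id = ?');
  if (db2_execute($stmt, array(42))) {
      db2_commit($pconn);
  } else {
      db2_rollback($pconn);
  }
  db2_close($pconn);   // connection remains open under the covers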
In the following table, we compare the performance of using non-persistent connections in all cases, a mix of persistent and non-persistent connections, and persistent connections in all cases.

                        db2_connect    Mixed          db2_pconnect
OS / DB                 i 5.4 / DB2    i 5.4 / DB2    i 5.4 / DB2
ZendCore Version        V2.5.2         V2.5.2         V2.5.2
OPM                     445            2161           4997
RT (ms)                 2021           414            176
Total CPU (%)           91             99             99
CPU - Zend/AP (%)       9              33             62
CPU - DB (%)            78             62             33
Conclusions:
1. As stated earlier, persistent connections can dramatically improve overall performance. When using persistent connections for all transactions, the DB CPU utilization is significantly less than when using non-persistent connections.
2. For any transactions that run with autocommit turned on, use persistent connections. If the transaction requires that autocommit be turned off, non-persistent connections may be sufficient for pages that don't have heavy usage. However, if a page is heavily used, persistent connections may be required to achieve acceptable performance. In this case, you will need a well-designed transaction that handles error processing to ensure no commits are left outstanding.
Database - Isolation Levels
Because the transaction isolation level determines how data is locked and isolated from other processes
while the data is being accessed, you should select an isolation level that balances the requirements of
concurrency and data integrity. DB2_I5_TXN_SERIALIZABLE is the most restrictive and protected
transaction isolation level, and it incurs significant overhead. Many workloads do not require this level of
isolation protection. We did limited testing comparing the performance of using
DB2_I5_TXN_READ_COMMITTED versus DB2_I5_TXN_READ_UNCOMMITTED isolation levels.
With this workload, running under DB2_I5_TXN_READ_COMMITTED reduced the overall capacity by
about 5%. However, a given application might never update the underlying data or run with other
concurrent updaters, in which case DB2_I5_TXN_READ_UNCOMMITTED may be sufficient. Therefore,
review your isolation level requirements and adjust them appropriately.
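With the ibm_db2 extension, the isolation level can be selected when the connection is created; a
minimal sketch, again with placeholder credentials:

<?php
// Sketch: request a less restrictive isolation level at connect time.
// The i5_commit option takes effect when the connection is established.
$options = array('i5_commit' => DB2_I5_TXN_READ_COMMITTED);
$conn = db2_connect('SAMPLE', 'DBUSER', 'secret', $options);
?>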
Zend Platform
Zend Platform for i is the production environment that ensures PHP applications are always available,
fast, reliable and scalable on the i platform. Zend Platform provides caching and optimization of compiled
PHP code, which provides significant performance improvement and scalability. Other features of Zend
Platform that bring additional value include:
• 5250 Bridge – API for accessing 5250 data streams, which allows Web front ends to be created
for existing applications.
• PHP Intelligence – provides monitoring of PHP applications and captures all the information
needed to pinpoint the root cause of problems and performance bottlenecks.
• Online debugging and immediate error resolution with Zend Studio for i
• PHP/Java integration bridge
By automatically caching and optimizing the compiled PHP code, application response time and system
capacity improve dramatically. The best part is that no application changes are required to take advantage
of this optimization. In the measurements included below, the default Zend Platform settings were used.
OS / DB          i 6.1 / DB2     i 6.1 / DB2         i 6.1 / MySQL 5.0   i 6.1 / MySQL 5.0
Zend Version     V2.5.2          V2.5.2/Platform     V2.5.2              V2.5.2/Platform
Connect          db2_pconnect    db2_pconnect        mysqli              mysqli
OPM              5041            6795                3974                4610
RT (ms)          176             129                 224                 191
Total CPU        98              95                  98                  96
CPU - Zend/AP    62              44                  49                  31
CPU - DB         31              46                  47                  62
Conclusions:
1. In both cases above, the overall system capacity improved significantly when using Zend Platform, by
about 15-35% for this workload. With each order consisting of 10 web requests, processing 6795 orders
per minute translates into about 1132 transactions per second.
2. Zend Platform will reduce the amount of processing in the Zend Core component since the PHP code is
compiled once and reused. In both of the above cases, the amount of processing done in Zend Core on a
per transaction basis was dramatically reduced by a factor of about 1.9X.
PHP System Sizing
The IBM Systems Workload Estimator (a.k.a., the Estimator or WLE) is a web-based sizing tool for IBM
Power Systems, System i, System p, and System x. You can use this tool to size a new system, to size an
upgrade to an existing system, or to size a consolidation of several systems. The Estimator allows
measurement input to best reflect your current workload and provides a variety of built-in workloads to
reflect your emerging application requirements.
Currently, a new built-in workload is being developed to allow the sizing of PHP workloads on Power
Systems running IBM i. This built-in workload is expected to be available in November 2008. To access WLE, use the
following URL:
6.3 WebSphere Application Server
This section discusses System i performance information for the WebSphere Application Server,
including WebSphere Application Server V6.1, WebSphere Application Server V6.0, WebSphere
Application Server V5.0 and V5.1, and WebSphere Application Server Express V5.1. Historically, both
WebSphere and i5/OS Java performance improve with each version. Note from the figures and data in
this section that the most recent versions of WebSphere and/or i5/OS generally provide the best
performance.
What’s new in V6R1?
The release of i5/OS V6R1 brings with it significant performance benefits for many WebSphere
applications. The following chart shows the amount of improvement in transactions per second (TPS) for
the Trade 6.1 workload using various data access methods:
[Figure: Trade 6.1 throughput (transactions per second), V5R4 GA versus V6R1 GA, for the Native
JDBC, Toolbox JDBC, and Universal JCC data access methods; improvements of +68%, +78%, and
+50% are shown.]
This chart shows that in V6R1, throughput levels for Trade 6.1 increased from 50% to nearly 80% versus V5R4,
depending on which JDBC provider was being used. All measurement results were obtained in a 2-tier
environment (both application and database on the same partition) on a 2-core 2.2Ghz System i partition,
using Version 6.1 of WebSphere Application Server and IBM Technology for Java VM. Although many
of the improvements are applicable to 3-tier environments as well, the communications overhead in these
environments may affect the amount of improvement that you will see.
The improvements in V6R1 were primarily in the JDBC, DB2 for i5/OS and Java areas, as well as
changes in other i5/OS components such as seize/release and PASE call overhead. The majority of the
improvements will be achieved without any changes to your application, although some improvements do
require additional tuning (discussed below in Tuning Changes for V6R1). Although some of the changes
are now available via V5R4 PTFs, the majority of the improvement will only be realized by moving to
V6R1. The actual amount of improvement in any particular application will vary, particularly depending
on the amount of JDBC/DB activity, where a significant majority of the changes were made. In addition,
because the improvements largely resulted from significant reductions in pathlength and CPU,
environments that are constrained by other resources such as IO or memory may not show the same level
of improvements seen here.
Tuning changes in V6R1
As indicated above, most improvements will require no changes to an application. However, there are a
few changes that will require some tuning in order to be realized:
• Using direct map (native JDBC)
For System i, the JDBC interfaces run more efficiently if direct mapping of data is used, where the
data being retrieved is in a form that closely matches how the data is stored in the database files. In
V6R1, significant enhancements were made in JDBC to allow direct map to be used in more cases.
For the toolbox and JCC JDBC drivers, where direct map is the default, there is no change needed to
realize these gains. However, for native JDBC, you will need to use the “directMap=true” custom
property for the datasource in order to maximize the gain from these changes. For Trade 6.1,
measurements show that adding this property results in about a 3-5% improvement in throughput.
Note that there is no detrimental effect from using this property, since the JDBC interfaces will only
use direct map if it is functionally viable to do so.
• Use of unix sockets (toolbox JDBC)
For toolbox JDBC, the default is to use TCP/IP inet sockets for requests between the application
server and the database connections. In V6R1, an enhancement was added to allow the use of unix
sockets in a 2-tier toolbox environment (application and database reside on the same partition). Using
unix sockets for the Trade 6.1 2-tier workload in V6R1 resulted in about an 8-10% improvement in
throughput. However, as the default is still to use inet sockets, you will need to ensure that the class
path specified in the JDBC provider is set to use the jt400native.jar file (not the jt400.jar file) in order
to use unix sockets. Note that the improvement is applicable only to 2-tier toolbox environments. Inet
sockets will continue to be used for all other multiple tier toolbox environments no matter which .jar
file is used.
• Using "threadUsed=false" custom property (toolbox JDBC)
In toolbox JDBC, the default method of operation is to use multiple application server threads for
each request to a database connection, with one thread used for sending data to the connection and
another thread being used to receive data from the connection. In V6R1, changes were made to allow
both the send and receive activity to be done within a single application server thread for each
request, thus reducing the overhead associated with the multiple threads. To gain the potential
improvement from this change, you will need to specify the “threadUsed=false” custom property in
the toolbox datasource, since the default is still to use multiple threads. For the Trade 6.1 workload,
use of this property resulted in about a 10% improvement in throughput.
Tuning for WebSphere is important to achieve optimal performance. Please refer to the WebSphere
Application Server for iSeries Performance Considerations or the WebSphere Info Center documents for
more information. These documents describe the performance differences between the different
WebSphere Application Server versions on the System i platform. They also contain many performance
recommendations for environments using servlets, Java Server Pages (JSPs), and Enterprise Java Beans.
For WebSphere 5.1 and earlier refer to the Performance Considerations guide at:
For WebSphere 5.1, 6.0 and 6.1 please refer to the following page and follow the appropriate link:
Although some capacity planning information is included in these documents, please use the IBM
Systems Workload Estimator as the primary tool to size WebSphere environments. The Workload
Estimator is kept up to date with the latest capacity planning information available.
Trade 6 Benchmark (IBM Trade Performance Benchmark Sample for WebSphere Application Server)
Description:
Trade 6 is the fourth generation of the WebSphere end-to-end benchmark and performance sample
application. The Trade benchmark is designed and developed to cover the significantly expanding
programming model and performance technologies associated with WebSphere Application Server. This
application provides a real-world workload, enabling performance research and verification test of the
JavaTM 2 Platform, Enterprise Edition (J2EETM) 1.4 implementation in WebSphere Application Server,
including key performance components and features.
Overall, the Trade application is primarily used for performance research on a wide range of software
components and platforms. This latest revision of Trade builds off of Trade 3, by moving from the J2EE
1.3 programming model to the J2EE 1.4 model that is supported by WebSphere Application Server V6.0.
Trade 6 adds DistributedMap based data caching in addition to the command bean caching that is
used in Trade 3. Otherwise, the implementation and workflow of the Trade application remains
unchanged.
Trade 6 also supports the recent DB2® V8.2 and Oracle® 10g databases. The new design of Trade 6
enables performance research on J2EE 1.4 including the new Enterprise JavaBeansTM (EJBTM) 2.1
component architecture, message-driven beans, transactions (1-phase, 2-phase commit) and Web services
(SOAP, WSDL, JAX-RPC, enterprise Web services). Trade 6 also drives key WebSphere Application
Server performance components such as dynamic caching, WebSphere Edge Server, and EJB caching.
NOTE: Trade 6 is an updated version of Trade 3 which takes advantage of the new JMS messaging
support available with WebSphere 6.0. The application itself is essentially the same as Trade 3 so
direct comparisons can be made between Trade 6 and Trade 3. However, it is important to note
that direct comparisons between Trade 2 and Trade 3 are NOT valid. As a result of the redesign and
additional components that were added to Trade 3, Trade 3 is more complex and is a heavier
application than the previous Trade 2 versions.
Figure 6.1 Topology of the Trade Application
The Trade 6 application allows a user, typically using a Web browser, to perform the following actions:
• Register to create a user profile, user ID/password and initial account balance
• Login to validate an already registered user
• Browse current stock price for a ticker symbol
• Purchase shares
• Sell shares from holdings
• Browse portfolio
• Logout to terminate the user's active interval
Each action is comprised of many primitive operations running within the context of a single HTTP
request/response. For any given action there is exactly one transaction comprised of 2-5 remote method
calls. A Sell action, for example, would involve the following primitive operations:
• Browser issues an HTTP GET command on the TradeAppServlet
• TradeServlet accesses the cookie-based HTTP Session for that user
• HTML form data input is accessed to select the stock to sell
• The stock is sold by invoking the sell() method on the Trade bean, a stateless Session EJB. To
achieve the sell, a transaction is opened and the Trade bean then calls methods on Quote, Account
and Holdings Entity EJBs to execute the sell as a single transaction.
• The results of the transaction, including the new current balance, total sell price and other data,
are formatted as HTML output using a Java Server Page, portfolio.jsp.
• Message Driven Beans are used to inform the user that the transaction has completed on the next
logon of that user.
To measure performance across various configuration options, the Trade 6 application can be run in
several modes. A mode defines the environment and component used in a test and is configured by
modifying settings through the Trade 6 interface. For example, data object access can be configured to
use JDBC directly or to use EJBs under WebSphere by setting the Trade 6 runtime mode. In the Sell
example above, operations are listed for the EJB runtime mode. If the mode is set to JDBC, the sell action
is completed by direct data access through JDBC from the TradeAppServlet. Several testing modes are
available and are varied for individual tests to analyze performance characteristics under various
configurations.
WebSphere Application Server V6.1
Historically, new releases of WebSphere Application Server have offered improved performance and
functionality over prior releases of WebSphere. WebSphere Application Server V6.1 is no exception.
Furthermore, the availability of WebSphere Application Server V6.1 offers an entirely new opportunity
for WebSphere customers. Applications running on V6.1 can now operate with either the “Classic”
64-bit Virtual Machine (VM) or the recently released IBM Technology for Java, a 32-bit VM that is built
on technology being introduced across all the IBM Systems platforms.
Customers running releases of WebSphere Application Server prior to V6.1 will likely be familiar with the
Classic 64-bit VM. This continues to be the default VM on i5/OS, offering competitive performance and
excellent vertical scalability. Experiments conducted using the Trade6 benchmark show that WebSphere
Application Server V6.1 running on the Classic VM realized 5-10% better throughput
when compared to WebSphere Application Server V6.0 on identical hardware.
In addition to the presence of the Classic 64-bit VM, WebSphere Application Server V6.1 can also take
advantage of IBM Technology for Java, a 32-bit implementation of Java supported on Java 5.0 (JDK 1.5).
For V6.1 users, IBM Technology for Java has two key potentially beneficial characteristics:
y Significant performance improvements for many applications - Most applications will see at least
equivalent performance when comparing WebSphere Application Server on the Classic VM to IBM
Technology for Java, with many applications seeing improvements of up to 20%.
y 32-bit addressing allows for a potentially considerable reduction in memory footprint - Object
references require only 4 bytes of memory as opposed to the 8 bytes required in the 64-bit Classic
VM. For users running on small systems with relatively low memory demands this could offer a
substantially smaller memory footprint. Performance tests have shown approximately 40% smaller
Java Heap sizes when using IBM Technology for Java when compared to the Classic VM.
It is important to realize that both the Classic VM and IBM Technology for Java have excellent benefits
for different applications. Therefore, choosing which VM to use is an extremely important consideration.
Chapter 7 - Java Performance has an extensive overview of many of the key decisions that go into
choosing which VM to use for a given application. Most of the points in Chapter 7 are very
important to WebSphere Application Server users. One issue that will likely not be a concern to
WebSphere Application Server users is the additional overhead to native ILE calls that is seen in IBM
Technology for Java. However, if native calls are relevant to a particular application, that consideration
will of course be important. While choosing the appropriate VM is important, WebSphere Application
Server V6.1 allows users to toggle between the Classic VM and IBM Technology for Java either for the
entire WebSphere installation or for individual application server profiles.
While 32-bit addressing can provide smaller memory footprints for some applications, it is imperative to
understand the other end of the spectrum: applications requiring large Java heaps may not be able to fit in
the space available to a 32-bit implementation of Java. The 32-bit VM has a maximum heap size of 3328
MB for Java applications. However, WebSphere Application Server V6.1 using IBM Technology for
Java has a practical maximum heap size of around 2500 MB due in part to WebSphere related memory
demands like shared classes. The Classic VM should be used for applications that require a heap larger
than 2500 MB (see Chapter 7 - Java Performance for further details).
Trade3 Measurement Results:
[Figure: Trade on System i - Historical View. Capacity of Trade 3/6 on a model 825 2-way LPAR,
Trade3-EJB and Trade3-JDBC, for V5R2 WAS 5.0, V5R3 WAS 5.0, V5R3 WAS 5.1, V5R3 WAS 6.0
(Trade6), V5R4 WAS 6.0 (Trade6), V5R4 WAS 6.1 Classic (Trade 6.1), and V5R4 WAS 6.1 IBM Tech
For Java (Trade 6.1).]
Figure 6.2 Trade Capacity Results
WebSphere Application Server Trade Results
Notes/Disclaimers:
• Trade3 chart:
WebSphere 5.0 was measured on both V5R2 and V5R3 on a 4 way (LPAR) 825/2473 system
WebSphere 5.1 was measured on V5R3 on a 4 way (LPAR) 825/2473 system
WebSphere 6.0 was measured on V5R3 on a 4 way (LPAR) 825/2473 system
WebSphere 6.0 was measured on V5R4 on a 2 way (LPAR) 570/7758 system
WebSphere 6.1 using Classic VM was measured on V5R4 on a 2 way (LPAR) 570/7758 system
WebSphere 6.1 using IBM Technology for Java was measured on V5R4 on a 2 way (LPAR) 570/7758 system
Trade Scalability Results:
[Figure: Trade on System i - Scaling of Hardware and Software. Throughput for EJB and JDBC access
across Power4 2-way (LPAR) 1.1 GHz, Power5 2-way 1.65 GHz, Power5 2-way (LPAR) 2.2 GHz, and
Power6 2-way (LPAR) 4.6 GHz systems, with Trade 3 results for V5R2 WAS 5.0, V5R2 WAS 5.1, and
V5R3 WAS 5.1.]
Figure 6.3 Trade Scaling Results
WebSphere Application Server Trade Results
Notes/Disclaimers:
• Trade 3 chart:
V5R2 - 890/2488 32-Way 1.3 GHz, V5R2 was measured with WebSphere 5.0 and WebSphere 5.1
V5R3 - 890/2488 32-Way 1.3 GHz, V5R3 was measured with WebSphere 5.1
• POWER5 chart:
POWER4 - V5R3 825/2473 2-Way (LPAR) 1.1 GHz, Power4 was measured with WebSphere 5.1
POWER5 - V5R3 520/7457 2-Way 1.65 GHz, Power5 was measured with WebSphere 5.1
POWER5 - V5R4 570/7758 2-Way (LPAR) 2.2 GHz, Power5 was measured with WebSphere 6.0
• POWER6 chart:
POWER5 - V5R4 570/7758 2-Way (LPAR) 2.2 GHz, Power5 was measured with WebSphere 6.0
POWER6 - V5R4 9406-MMA 2-Way (LPAR) 4.7 GHz, Power6 was measured with WebSphere 6.1
Trade 6 Primitives
Trade 6 provides an expanded suite of Web primitives, which singularly test key operations in the
enterprise Java programming model. These primitives are very useful in the Rochester lab for
release-to-release comparison tests, to determine whether a degradation occurs between releases and
which areas to target for performance improvements. Table 6.1 describes all of the primitives that are
shipped with Trade 6, and Figure 6.4 shows the results of the primitives from WAS 5.0 and
WAS 5.1. In V5R4 a few of the primitives were tracked on WAS 6.0, showing a change of
0-2%; those results are not included in Figure 6.4. In the future, additional primitives are
planned to be measured for comparison.
Primitive Name - Description of Primitive

PingHtml - PingHtml is the most basic operation providing access to a simple "Hello World" page of
static HTML.

PingServlet - PingServlet tests fundamental dynamic HTML creation through server-side servlet
processing.

PingServletWriter - PingServletWriter extends PingServlet by using a PrintWriter for formatted output
vs. the output stream used by PingServlet.

PingServlet2Include - PingServlet2Include tests response inclusion. Servlet 1 includes the response of
Servlet 2.

PingServlet2Servlet - PingServlet2Servlet tests request dispatching. Servlet 1, the controller, creates a
new JavaBean object and forwards the request with the JavaBean added to Servlet 2. Servlet 2 obtains
access to the JavaBean through the Servlet request object and provides dynamic HTML output based on
the JavaBean data.

PingJSP - PingJSP tests a direct call to a JavaServer Page providing server-side dynamic HTML through
JSP scripting.

PingJSPEL - PingJSPEL tests a direct call to a JavaServer Page providing server-side dynamic HTML
through JSP scripting and the usage of the new JSP 2.0 Expression Language.

PingServlet2JSP - PingServlet2JSP tests a commonly used design pattern, where a request is issued to a
servlet providing server-side control processing. The servlet creates a JavaBean object with dynamically
set attributes and forwards the bean to the JSP through a RequestDispatcher. The JSP obtains access to
the JavaBean and provides formatted display with dynamic HTML output based on the JavaBean data.

PingHTTPSession1 - SessionID tests fundamental HTTP session function by creating a unique session
ID for each individual user. The ID is stored in the user's session and is accessed and displayed on each
user request.

PingHTTPSession2 - session create/destroy further extends the previous test by invalidating the HTTP
Session on every 5th user access. This results in testing HTTPSession create and destroy.

PingHTTPSession3 - large session object tests the server's ability to manage and persist large
HTTPSession data objects. The servlet creates a large custom java object. The class contains multiple
data fields and results in 2048 bytes of data. This large session object is retrieved and stored to the
session on each user request.

PingJDBCRead - PingJDBCRead tests fundamental servlet to JDBC access to a database performing a
single-row read using a prepared SQL statement.

PingJDBCWrite - PingJDBCWrite tests fundamental servlet to JDBC access to a database performing a
single-row write using a prepared SQL statement.

PingServlet2JNDI - PingServlet2JNDI tests the fundamental J2EE operation of a servlet allocating a
JNDI context and performing a JNDI lookup of a JDBC DataSource.

PingServlet2SessionEJB - PingServlet2SessionEJB tests key function of a servlet call to a stateless
SessionEJB. The SessionEJB performs a simple calculation and returns the result.

PingServlet2EntityEJBLocal / PingServlet2EntityEJBRemote - PingServlet2EntityEJB tests key function
of a servlet call to an EJB 2.0 Container Managed Entity. In this test the EJB entity represents a single
row in the database table. The Local version uses the EJB Local interface while the Remote version uses
the Remote EJB interface. (Note: PingServlet2EntityEJBLocal will fail in a multi-tier setup where the
Trade3 Web and EJB apps are separated.)

PingServlet2Session2Entity - Tests the full servlet to Session EJB to Entity EJB path to retrieve a single
row from the database.

PingServlet2Session2EntityCollection - This test extends the previous EJB Entity test by calling a
Session EJB which uses a finder method on the Entity that returns a collection of Entity objects. Each
object is displayed by the servlet.

PingServlet2Session2CMROne2One - This test drives an Entity EJB to get another Entity EJB's data
through an EJB 2.0 CMR One-to-One relationship.

PingServlet2Session2CMROne2Many - This test drives an Entity EJB to get another Entity EJB's data
through an EJB 2.0 CMR One-to-Many relationship.

PingServlet2MDBQueue - PingServlet2MDBQueue drives messages to a Queue-based Message Driven
EJB (MDB). Each request to the servlet posts a message to the Queue. The MDB receives the message
asynchronously and prints message delivery statistics on each 100th message.

PingServlet2MDBTopic - PingServlet2MDBTopic drives messages to a Topic-based Publish/Subscribe
Message Driven EJB (MDB). Each request to the servlet posts a message to the Topic. The
TradeStreamMDB receives the message asynchronously and prints message delivery statistics on each
100th message. Other subscribers to the Topic will also receive the messages.

PingServlet2TwoPhase - PingServlet2TwoPhase drives a Session EJB which invokes an Entity EJB with
findByPrimaryKey (DB access) followed by posting a message to an MDB through a JMS Queue
(message access). These operations are wrapped in a global 2-phase transaction and commit.

Table 6.1 Description of Trade primitives in Figure 6.4
IBM i 6.1 Performance Capabilities Reference - January/April/October 2008
© Copyright IBM Corp. 2008
Chapter 6 - Web Server and WebSphere
101
Download from Www.Somanuals.com. All Manuals Search And Download.
[Figure: WebSphere Trade 3 primitive throughput, WAS 5.0.2 versus WAS 5.1.]
Figure 6.4 WebSphere Trade 3 primitive results.
Note: The measurements were performed on the same machine, a 270-2434 600 MHz 2-Way. All results are for a non-secure
environment.
Accelerator for System i
Coinciding with the release of i5/OS V5R4, IBM introduced new entry IBM System i models. These
models introduce accelerator technologies and/or L3 cache in order to improve options for clients in the
low-end server space. As an overview, the Accelerator for System i affects two 520 models: (1) 600
CPW with no L3 cache and (2) 1200 CPW with L3 cache. With the Accelerator for System i, the 600
CPW model can be accelerated to a 3100 CPW system, whereas the 1200 CPW model can be accelerated
to 3800 CPW.
In order to showcase the abilities of these systems, experiments were completed on WAS 6.0 running
Trade 6 to display the benefits. The following information describes the models in the context of both
capacity and response time. Results were collected on a System i Model 520 with varying feature codes
depending on the presence of the Accelerator for System i.
With regard to capacity, Figure 6.5 shows that the 600 CPW model accelerated to 3100 CPW increases
capacity 5.5 times. Additionally, the 1200 CPW model accelerated to 3800 CPW increases capacity 3
times. This provides an extraordinary benefit when running WebSphere applications.
It is also important to note the benefits of L3 cache. For example, the 1200 CPW model has 2.5 times
more capacity than that of the 600 CPW system. Additionally, Java workloads tend to perform better
with L3 cache. Thus, besides the benefit of increased capacity, a move from a system with no L3 cache
to a system with L3 cache may scale better than CPW ratings would indicate.
[Figure: Trade JDBC Throughput versus CPW rating (500, 600, 1000, 1200, 2400, 3100, 3300, 3800)
for Old Models, New Models, and Accelerator models.]
Figure 6.5 - Accelerator for System i performance data - Capacity comparison (WAS 6.0 running Trade 6).
Figure 6.6 provides insight into response time information regarding low-end System i models. There are
two key concepts that are displayed in the data in Figure 6.6. The first is that Accelerator for System i
models can provide substantially better response times than previous models for a single or many users.
The 600 CPW accelerated to 3100 CPW reduces the response time by 5 times while the 1200 CPW
accelerated to 3800 CPW reduces the response time by 2.5 times. The second idea to note is that the
presence of L3 cache has little effect on the response time of a single user. Of course there are benefits of
L3 cache, however, the absence of L3 cache does not imply poorer performance with regards to response
time.
[Figure: Trade JDBC response time (seconds, 1 user) versus CPW rating (500, 600, 1000, 1200, 2400,
3100, 3300, 3800) for Old Models, New Models, and Accelerator models.]
Figure 6.6 - Accelerator for System i performance data - Single user response time comparison (WAS 6.0
running Trade 6).
Performance Considerations When Using WebSphere Transaction Processing (XA)
In a general sense, a transaction is the execution of a set of related operations that must be completed
together. This set of operations is referred to as a unit-of-work. A transaction is said to commit when it
completes successfully. Otherwise it is said to roll back. When an application needs to access more than
one resource (backend) and needs to guarantee job consistency, a global transaction is required to
coordinate the interactions among the resources, application and transaction manager, as defined by the
XA specification.
WebSphere Application Server is compliant with the XA specification. It uses the two-phase commit
protocol to guarantee all-or-nothing semantics: either all the resources commit the change
permanently or none of them proceed with the update (rollback). Such a transaction scenario requires the
participating resource managers, such as WebSphere JMS/MQ and DB2 UDB, to support the XA
specification and provide the XA interface for a two-phase commit. It is the role of the resource managers
to manage access to shared resources involved in a transaction and guarantee that the ACID properties
(Atomicity, Consistency, Isolation, and Durability) of a transaction are maintained. It is the role of
WebSphere Transaction manager to control all of the global transaction logic as defined by the J2EE
Standard. Within WebSphere there are two ways of using WebSphere global transaction: Container
Managed Transaction (CMT) and Bean Managed Transaction (BMT). With Container Managed, you do
not need to write any code to control the transaction behavior. Instead, the J2EE container, WebSphere in
this case, controls all the transaction logic. It is a service provided by WebSphere.
When your application involves multiple resources, such as DB2, in a transaction, you need to ensure that
you select an XA compliant JDBC provider. For WebSphere on the System i platform you have two
options depending on if you are running in a two tier environment (application server and database server
on the same system) or in a three tier environment (application server and database server on separate
systems). For a two tier environment you would select DB2 UDB for iSeries (Native XA - V5R2 and
later). For a three tier environment you would select DB2 UDB for iSeries (Toolbox XA).
However, since the overhead of running XA is quite significant, you should ensure that you do not
configure an XA compliant JDBC provider if your application does not require XA functionality. In this
case, in a two tier environment you would select DB2 UDB for iSeries (Native - V5R2 and later) and for
a three tier environment you would select DB2 UDB for iSeries (Toolbox).
In WebSphere 6.0 the JMS provider was totally rewritten. It is now 100% pure Java and it no longer
requires WebSphere MQ to be installed. Also, the datastore for the messaging engine can be configured to
store persistent data in DB2 UDB for iSeries. As a result, you can configure your application to share the
JDBC connection used by a messaging engine, and the EJB container. This enables you to use one-phase
commit (non-XA) transactions since you now have only one resource manager (DB2) involved in a
transaction. Previously with 5.1 you had to use XA since the transaction would involve MQ and DB2
resource managers. By utilizing one-phase commit optimization, you can improve the performance of
your application.
You can benefit from the one-phase commit optimization in the following circumstances:
• Your application must use the assured persistent reliability attribute for its JMS messages.
• Your application must use CMP entity beans that are bound to the same JDBC data source that the
messaging engine uses for its data store.
Restriction: You cannot benefit from the one-phase commit optimization in the following circumstances:
• If your application uses a reliability attribute other than assured persistent for its JMS messages.
• If your application uses Bean Managed Persistence (BMP) entity beans, or JDBC clients.
Before you configure your system, ensure that you consider all of the components of your J2EE
application that might be affected by one-phase commits. Also, since the JDBC datasource connection
will now be shared by the messaging engine and the EJB container, you need to ensure that you increase
the number of connections allocated to the connection pool. To optimize for one-phase commit
transactions, refer to the following website:
http://publib.boulder.ibm.com/infocenter/ws60help/index.jsp?topic=/com.ibm.websphere.pmc.doc/tasks/tjm0280_.html
WebSphere Application Server V5.1 Express
For information on WAS V5.1 Express, please refer to older versions of the Performance Capabilities
Reference Manual that can be found here:
6.4 IBM WebFacing
The IBM WebFacing tool converts your 5250 application DDS display files, menu source, and help files
into Java Servlets, JSPs, JavaBeans, and JavaScript to allow your application to run in either WebSphere
Application Server V5 or V4. This is an easy way to bring your application to the Internet or an
intranet, quickly and inexpensively.
The number of screens processed per second and the number of input/output fields per screen are
the main metrics that tell how heavy a WebFaced application will be on the WebSphere Application Server.
The number of input/output fields is simple to count for most of the screens, except when dealing with
subfiles. Subfiles can affect the number of input/output fields dramatically. The number of fields in
subfiles is significantly impacted by two DDS keywords:
1. SFLPAG - The number of rows shown on a 5250 display.
2. SFLSIZ - The number of rows of data extracted from the database.
When using a DDS subfile, there are 3 typical modes of operation:
1. SFLPAG=SFLSIZ. In this mode, there are no records that are cached. When more records are
requested, WebFacing will have to get more rows of data. This is the recommended way to run your
WebFacing application.
2. SFLPAG < SFLSIZ. In this mode, WebFacing will get SFLSIZ rows of data at a time. WebFacing
will display SFLPAG rows, and cache the rest of the rows. When the user requests more rows with a
page-down, WebFacing will not have to access the database again, unless they page below the value
of SFLSIZ. When this happens, WebFacing will go back to the database and receive more rows.
3. SFLSIZ = (SFLPAG) * (Number of times requesting the data). This is a special case of option 2
above, and is the recommended approach for green-screen applications. The first time the page
is requested, SFLPAG rows will be returned. If the user performs a page down, then SFLPAG * 2
rows will be returned. This is very efficient in 5250 applications, but less efficient with WebFacing.
Since WebFacing is performance sensitive to the number of input/output fields that are requested from
WebFacing, the best option would be the first mode, since this will minimize the number of these fields
for each 5250 panel requested through WebFacing. The number of fields for a subfile is the number of
rows requested from the database (SFLSIZE) times the number of columns in each row.
Figure 6.7 shows a theoretical algorithm to graphically describe the effect the number of input/output
fields has on the performance of the WebFaced application. The Y-axis metric is not important, but
merely can be used to measure the relative amount of CPU horsepower that the application needs to
serve one single 5250 panel. In this case, serving one single panel with 50 I/O fields takes approximately
one half the CPU horsepower needed to serve one 5250 panel with 350 I/O fields. As you can see, the
number of I/O fields dramatically impacts the performance of your WebFacing application, so reducing
the number of I/O fields will improve your performance.
[Figure: Relative CPU per panel versus average number of fields per panel (50-350), split into constant
overhead per panel and overhead for I/O fields.]
Figure 6.7 Shows the impact on CPU that the number of I/O fields has per WebFaced panel
In our studies, we selected three customer WebFaced applications: one simple, one moderate, and one
complex. See Table 6.4 for
details on the number of I/O fields for each of these workloads. We ran the workloads on three separate
machines (see table 6.5) to validate the performance characteristics with regard to CPW. In our running
of the workloads, we tolerated only a 1.5 second server response time per panel. This value does not
include the time it takes to render the image on the client system, but only the time it took the server to
get the information to the client system. The machines that we used are in Table 6.5, and include the 800
and i810 (V5R2 Hardware) and the 170 (V4R4 Hardware). All systems were running OS/400 V5R2.
Some of the results that we saw in our tests are shown in Figure 6.8. This figure shows the scalability
across different hardware running the same workload. A user is defined as a client that requests one new
5250 panel every 15 seconds. According to our tests, we see relatively even results across the three
machines. The one machine that shows a slight difference is the V4R4 hardware (1090 CPW). This
difference can be explained by release-to-release degradation. Since the CPW measurements were made
in V4R4, there have been three major releases, each bringing a slight degradation in performance. This
results in a slight difference in CPW value. With this taken into effect, the CPW/User measurement is
more in line with the other two machines.

Name          Average number of I/O Fields per panel
Workload A    37
Workload B    99
Workload C    612
Table 6.4 Average number of I/O fields for each workload defined in this section.

Many 5250 applications have been implemented with "best performance" techniques, such as
minimized number of fields and amount of data exchanged between the device and the application.
Other 5250 applications may not be as efficiently implemented, such as restoring a complete window of
data, when it was not required. Therefore it is difficult to give a generalized performance comparison
between the same application written to a 5250 device and that application using WebFacing to a
browser. In the three workloads that we measured, we saw a significant amount of resource needed to
WebFace these applications. The numbers varied from 3x up to 8x the amount of CPU resources needed
for the 5250 green screen application.
Use the IBM Systems Workload Estimator to predict the capacity characteristics for IBM WebFacing.
This site will be updated more often than this paper, so it will contain the most recent information. The
Workload Estimator will ask you to specify a transaction rate (5250 panels per hour) for a peak time of
day. It will further attempt to characterize your workload by considering the complexity of the panels
and the number of unique panels that are displayed by the JSP. You'll find the tool at:
A workload description along with good help text is available on this site. Work with your marketing
representative to utilize this tool (also see chapter 23).
[Figure: CPW per User for Workload C and Workload B on the 2700 CPW, 1090 CPW, and 300 CPW
machines.]
Figure 6.8 CPW per User across the machines documented in table 6.5
Version 5.0 of Webfacing
There have been a significant number of enhancements delivered with V5.0 of Webfacing, including:
• (Advanced Edition Only) Support for viewing and printing spooled files
• (Advanced Edition Only) Struts-compliant code generated by the WebFacing Tool conversion
process, which sets the foundation for extending your Webfaced applications using
struts-compliant action architecture
• Automatic configuration for UTF-8 support when you deploy to WebSphere Application Server
version 5.0
• Support for function keys within window records
• Enhanced hyperlink support
• Improved memory optimization for record I/O processing
• Support to enable compression to improve response times on slow connections
The two enhancements that are most important from a performance perspective are discussed below. For
other information related to Webfacing V5.0, please refer to the following website:
Display File Record I/O Processing
Display file record I/O processing has been optimized to decrease the WebSphere Application Server
runtime memory utilization. This has been accomplished by enhancing the Webfacing runtime to better
utilize the java objects required for processing display I/O requests for each end user transaction.
Formerly on each record I/O, Webfacing had to create a record data bean object to describe the I/O
request, and then create the record bean using this definition to pass the I/O data to the associated JSP.
These definition objects were not reused and were created for each user. With the optimization
implemented in V5.0, the record bean definitions are now reused and cached so that one instance for each
display file record can be shared by all users.
This optimization has decreased the overall memory requirements for Webfacing V5.0 versus V4.0. This
memory savings helps reduce the total memory required by the WebSphere Application Server, which is
referred to as the JVM Heap Size. The amount of memory savings depends on a number of parameters,
such as the complexity of the screens (based on number of fields per screen), the transaction rate, and the
number of concurrent end users. On measurements made with approximately 250 users and varying
screen complexity, the JVM Heap decreased by approximately 5 % for simple to moderate screens (99
fields per screen) and up to 20 % for applications with more complex screens (600 fields per screen).
When looking at the overall memory requirements for an application, the JVM Heap size is just one
component. If you are running the back-end application on the same server as the WebSphere Application
server, the overall decrease in system memory required for the Webfaced application will be less.
In terms of WebSphere CPU utilization, this optimization offers up to a 10% improvement for complex
workloads. However, when taking into account the overall CPU utilization for a Webfaced application
(Webfacing plus the application), you can expect equal or slightly better performance with Webfacing
V5.0.
Tuning the Record Definition Cache
In order to best use the optimization provided by this enhancement, servlet utilities have been included in
the Webfacing support to assess cache efficiency, set the cache size, and preload it with the most
frequently accessed record definitions. If you do not use the Record Definition Cache, or it is not tuned
properly, you will see degraded performance of Webfacing V5.0 versus V4.0.
When set to an appropriate level for the Webfaced application, the Record Definition Cache can provide a
decrease in memory usage, and slightly decreased processor usage. The number of record definitions that
the cache will retain is set by an initialization parameter in the Webfaced application’s deployment
descriptor (web.xml). By changing the cache size, the Webfaced application can be tuned for best
performance and minimum memory requirements. The cache size determines the number of record data
definitions that will be retained in the cache. There is one record data definition for each record format.
Cache Size   Effect
too small    When the cache size is set too small for the Webfaced application, it will adversely affect
performance. In this case, the definitions would be cached and then discarded before being
re-used. There is significant overhead to create the record definitions.
correct      With the cache set correctly, 90% of all accessed record data definitions would be retained
in the cache, with few cache misses for not commonly used records.
too large    If the cache is set too large, then all record data definitions for the Webfaced application
would be cached, likely consuming memory for seldom used definitions.
In order to determine the correct size for a given Webfaced application, the number of commonly
used record formats needs to be estimated. This can be used as a starting point for setting the cache size.
The default size, if no size is specified, is 600 record data definitions. To set the cache size to
something other than the default size, you need to add a session context parameter in the Webfaced
application’s web.xml file. In the following example the cache size is set to 200 elements, which may be
appropriate for a very small application, like the Order Entry example program.
<context-param>
<param-name>WFBeanCacheSize</param-name>
<param-value>200</param-value>
<description>WebFacing Record Definition Bean Cache Size</description>
</context-param>
NOTE: For information on defining a session context parameter in the web.xml file, refer to the
WebSphere Application Server Info Center. You can also edit the web.xml file of a deployed application.
Typically this file will be located in the following directory for WebSphere V5.0 applications:
/QIBM/UserData/WebAS5/Base/<application-server>/config/cells/..../WEB_INF
And the following directory for WebSphere Express V5.0 applications:
/QIBM/UserData/WebASE/ASE5/<application-server>/config/cells/..../WEB_INF
Cache Management - Definition Cache Content Viewer
To assist with managing the Record Definition Cache, two servlets can be enabled. One is used to display
the elements currently in the cache and the other can be used to load the cache. Neither servlet is
normally enabled in a WebFacing application, in order to prevent misuse or exposure of data.
To enable the servlet that will display the contents of the cache, first add the following segments to the
Webfaced application’s web.xml.
<servlet>
<servlet-name>CacheDumper</servlet-name>
<display-name>CacheDumper</display-name>
<servlet-class>com.ibm.etools.iseries.webfacing.diags.CacheDumper</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>CacheDumper</servlet-name>
<url-pattern>/CacheDumper</url-pattern>
</servlet-mapping>
This servlet can then be invoked with a URL like: http://<server>:<port>/<webapp>/CacheDumper.
Then a Web page like that shown below will be displayed. Notice that the total numbers of cache hits and
misses are displayed, as are the hits for each record definition.
Refer to the following table for the functionality provided by the Cache Viewer servlet.
Cache Viewer Button operations
Button          Operation
Reset Counters  Resets the cache hit and miss counters back to 0.
Set Limit       Temporarily sets the cache limit to a new value. Setting the value lower than the current
value will cause the cache to be cleared as well.
Refresh         Refreshes the display of cache elements.
Clear Cache     Drops all the cached definitions.
Save List       Saves a list of all the cached record data definitions. This list is saved in the RecordJSPs
directory of the Webfaced application. The actual record definitions are not saved, just the
list of what record definitions are cached. Once the cache is optimally tuned, this list can
be used to preload the Record Definition cache.
Cache Management - Record Definition Loader
As a companion to the Cache Content Viewer tool, there is also a Record Definition Cache Loader tool,
which is also referred to as the Bean Loader. This servlet can be used to pre-load the cache to aid in the
determination of the optimal cache size, and then finally, to pre-load the cache for production use. To
enable this servlet add the following two xml segments in the web.xml file.
<servlet>
<servlet-name>BeanLoader</servlet-name>
<display-name>BeanLoader</display-name>
<servlet-class>com.ibm.etools.iseries.webfacing.diags.BeanLoader</servlet-class>
</servlet>
<servlet-mapping>
<servlet-name>BeanLoader</servlet-name>
<url-pattern>/BeanLoader</url-pattern>
</servlet-mapping>
Invoking this servlet will present a Web page similar to the following.
Refer to the following table for the functionality provided by the Record Definition Loader servlet.
Record Definition Loader Button operations
Button                Operation
Infer from JSP Names  Causes the loader servlet to infer record definition names from the names of the
JSPs contained in the RecordJSPs directory. It will not find all the record
definitions, but it will get most of them.
Load from File        Loads the record definitions listed in a file in the RecordJSPs directory.
Typically this file is created with the CacheDumper servlet previously described.
The Record Definition Loader servlet can also be used to pre-load the bean definitions when the
Webfaced application is started. To enable this the servlet definition in the web.xml needs to be updated
to define two init parameters: FileName and DisableUI. The FileName parameter indicates the name of
the file in the RecordJSPs directory that contains the list of definitions to pre-load the cache with. The
DisableUI parameter indicates that the Web UI (as presented above) would be disabled so that the servlet
can be used to safely pre-load the definitions without exposing the Webfaced application.
<servlet>
<servlet-name>BeanLoader</servlet-name>
<display-name>BeanLoader</display-name>
<servlet-class>com.ibm.etools.iseries.webfacing.diags.BeanLoader</servlet-class>
<init-param>
<param-name>FileName</param-name>
<param-value>cachedbeannames.lst</param-value>
</init-param>
<init-param>
<param-name>DisableUI</param-name>
<param-value>true</param-value>
</init-param>
<load-on-startup>10</load-on-startup>
</servlet>
Compression
LAN connection speeds and Internet hops can have a large impact on page response times. A fast server
but slow LAN connection will yield slow end-user performance and an unhappy customer.
It is very common for a browser page to contain 15-75K of data. Customers who may be running a
Webfaced application over a 256K internet connection might find results unacceptable. If every screen
averages 60K, the time for that data spent on the wire is significant. Multiply that by several users
simultaneously using the application, and page response times will be longer.
There are now two options available to support HTTP compression for Webfaced applications, which will
significantly improve response times over a slow internet connection. As of July 1, 2003, compression
support was added with the latest set of PTFs for IBM HTTP Server (powered by Apache) for i5/OS
(5722-DG1). Also, Version 5.0 of Webfacing was updated to support compression available in
WebSphere Application Server. On System i servers, the recommended WebSphere application
configuration is to run Apache as the web server and WebSphere Application Server as the application
server. Therefore, it is recommended that you configure HTTP compression support in Apache. However,
in certain instances HTTP compression configuration may be necessary using the Webfacing/WebSphere
Application Server support. This is discussed below.
The overall performance in both cases is essentially equivalent. Both provide significant improvement for
end-user response times on slower Internet connections, but also require additional HTTP/WebSphere
Application Server CPU resources. In measurements done with compression, the amount of CPU
required by HTTP/WebSphere Application Server increased by approximately 25-30%. When
compression is enabled, ensure that there is sufficient CPU to support it. Compression is particularly
beneficial when end users are attached via a Wide Area Network (WAN) where the network connection
speed is 256K or less. In these cases, the end user will realize significantly improved response times (see
chart below). If the end users are attached via a 512K connection, evaluate whether the realized response
time improvements offset the increased CPU requirements. Compression should not be used if end users
are connected via a local intranet due to the increased CPU requirements and no measurable improvement
in response time.
[Figure: Webfacing page response times, with and without compression, at 64K, 128K, 512K, and Local
network data rates.]
NOTE: The above results were achieved in a controlled environment and may not be repeatable in other
environments. Improvements depend on many factors.
Enabling Compression in IBM HTTP Server (powered by Apache)
The HTTP compression support was added with the latest set of PTFs for IBM HTTP Server for
i5/OS (5722-DG1). For V5R1, the PTFs are SI09287 and SI09223. For V5R2, the PTFs are SI09286 and
SI09224.
There is a LoadModule directive that needs to be added to the HTTP config file in order to get
compression based on this new support. It looks like this:
LoadModule deflate_module /QSYS.LIB/QHTTPSVR.LIB/QZSRCORE.SRVPGM
You also need to add the directive:
SetOutputFilter DEFLATE
to the container to be compressed, or globally if the compression can always be done. There is
documentation on the Apache website on mod_deflate
(http://httpd.apache.org/docs-2.0/mod/mod_deflate.html) that has information specific to setting up for
compression. That is the best place to look for details. The LoadModule and SetOutputFilter directives
are required for mod_deflate to work. Any other directives are used to further define how the
compression is done.
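As a sketch, assuming a WebFacing application deployed under a hypothetical /wfapp location, the
relevant portion of the configuration file might look like the following (the Location path is illustrative,
and the comment lines are optional):

# Load the compression module shipped with IBM HTTP Server for i5/OS
LoadModule deflate_module /QSYS.LIB/QHTTPSVR.LIB/QZSRCORE.SRVPGM
# Compress responses only for this container
<Location /wfapp>
SetOutputFilter DEFLATE
</Location>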
Since the compression support in Apache for i5/OS is a recent enhancement, Information Center
documentation for the HTTP compression support was not available when this paper was created. The
IBM HTTP Server for i5/OS website (http://www.ibm.com/servers/eserver/iseries/software/http/) will be
updated with an announcement when the InfoCenter documentation has been completed. Until the
documentation is available, the information at http://httpd.apache.org/docs-2.0/mod/mod_deflate.html can
be used as a reference for tuning how mod_deflate compression is done.
Enabling Compression using IBM Webfacing Tool and WebSphere Application Server Support
You would configure compression using the Webfacing/WebSphere support in environments where the
internal HTTP server in WebSphere Application Server is used. This may be the case in a test
environment, or in environments running WebSphere Express V5.0 on an xSeries Server.
With the IBM WebFacing Tool V5.0, compression is ‘turned on’ by default. This should be ‘turned off’ if
compression is configured in Apache or if the LAN environment is a local high speed connection. This is
particularly important if the CPU utilization of interactive types of users (Priority 20 jobs) is about
70-80% of the interactive capacity. In order to ‘turn off’ compression, edit the web.xml file for a
deployed Web application. There is a filter definition and filter mapping definition that defines
compression should be used by the WebFacing application (see below). These statements should be
deleted in order to ‘turn off’ compression. In a future service pack of the WebFacing Tool, it is planned
that compression will be configurable from within WebSphere Development Studio Client.
<filter id="Filter_1051910189313">
<filter-name>CompressionFilter</filter-name>
<display-name>CompressionFilter</display-name>
<description>WebFacing Compression Filter</description>
<filter-class>com.ibm.etools.iseries.webfacing.runtime.filters.CompressionFilter</filter-class>
</filter>
<filter-mapping id="FilterMapping_1051910189315">
<filter-name>CompressionFilter</filter-name>
<url-pattern>/WFScreenBuilder</url-pattern>
</filter-mapping>
Additional Resources
The following are additional resources that include performance information for WebFacing, including
how to set up pretouch support to improve JSP first-touch performance:
PartnerWorld for Developers Webfacing website:
IBM WebFacing Tool Performance Update - This white paper explains how to help optimize
WebFaced applications on IBM System i servers. Requests for the paper require user registration; there
are no charges.
6.5 WebSphere Host Access Transformation Services (HATS)
WebSphere Host Access Transformation Services (HATS) gives you all the tools you need to quickly and
easily extend your legacy applications to business partners, customers, and employees. HATS makes your
5250 applications available as HTML through the most popular Web browsers, while converting your
host screens to a Web look and feel. With HATS it is easy to improve the workflow and navigation of
your host applications without any access or modification to source code.
What’s new with V5R4 and HATS 6.0.4
The IBM WebFacing Tool has been delivering reliable and customizable Web-enabled applications for
years. Host Access Transformation Services (HATS) has been providing seamless runtime
Web-enablement. Now, with the IBM WebFacing Deployment Tool with HATS Technology (WDHT),
IBM offers a single product with the power of both technologies.
This offering replaces HATS for iSeries and HATS for System i model 520. For HATS applications
created using HATS Toolkit 6.0.4 and deployed to a V5R4 system, you can now connect to the
WebFacing Server and eliminate the Online Transaction Processing charge. Without the OLTP
requirement for deploying a HATS application to i5/OS starting with V5R4, the overall cost of HATS
solutions is significantly reduced. HATS applications can now be deployed to i5/OS Standard Edition.
With WDHT, WebFacing applications can call non-WebFacing applications and those programs will be
dynamically transformed for the Web using HATS technology.
HATS Customization
HATS uses a rules-based engine to dynamically transform 5250 applications to HTML. The process
preserves the flow of the application and requires very little technical skill or customization.
Unless you do explicit customization for an application, the default HATS rules will be used to transform
the application interface dynamically at runtime. This is referred to as default rendering. Basically a
default template JSP is used for all application screens. There is the capability to change the default
template to customize the web appearance, but at runtime the application screens are still dynamically
transformed.
As an alternative, you can use HATS studio (built upon the common WebSphere Studio Workbench
foundation) to capture and customize select screens or all screens in an application. In this case a JSP is
created for each screen that is captured. Then at runtime the first step HATS performs is to check to see if
there are any screens that have been captured and identified that match the current host screen. If there are
no screen customizations, then the default dynamic transformation is applied. If there is a screen
customization that matches the current host screen, then whatever actions have been associated with this
screen are executed.
Since default rendering results in dynamic screen transformation at run time, it will require more CPU
resources than if the screens of an application have been customized. When an application is customized,
JSPs are created so that much of the transformation is static at run time. Based on measurements for a mix
of applications using the following levels of customizations, Moderate Customization typically requires
5-10% less CPU as compared to Default Rendering. With Advanced Customization, typically 20-25%
less CPU is required as compared to Default Rendering. You have to take into account, though, that
customization requires development effort, while Default Rendering requires minimal development
resources.
Default: The screens in the application’s main path are unchanged.
Moderate: An average of 30% of the screens have been customized.
Advanced: All screens have been customized.
HATS Customization (CPW/User)
[Figure: bar chart comparing CPW per user at the Default, Moderate, and Advanced customization levels
for two applications (Application1 and Application2).]
IBM Systems Workload Estimator for HATS
The purpose of the IBM Systems Workload Estimator (WLE) is to provide a comprehensive System i
sizing tool for new and existing customers interested in deploying new emerging workloads standalone or
in combination with their current workloads. The Estimator recommends the model, processor, interactive
feature, memory, and disk resources necessary for a mixed set of workloads. WLE was enhanced to
support sizing a System i server to meet your HATS workload requirements.
This tool allows you to input an interactive transaction rate and to further characterize your workload.
Work with your marketing representative to utilize this tool, and also refer to Chapter 22 for more
information.
6.6 System Application Server Instance
WebSphere Application Server - Express for iSeries V5 (5722-IWE) is delivered with i5/OS V5R3
providing an out-of-the-box solution for building static and dynamic websites. In addition, V5R3 is
shipped with a pre-configured Express V5 application server instance referred to as the System
Application Server Instance (SYSINST). The SYSINST has the following IBM supplied system
administrative web applications pre-installed1, providing an easy-to-use web GUI interface to
administration tasks:
• iSeries Navigator Tasks for the Web
Access core systems management tasks and access multiple systems through one System i server
from a web browser. Please see the following for more information:
• Tivoli Directory Server Web Administration Tool
Set up new or manage existing (LDAP) directories for business application data. Please see the
following for more information:
The SYSINST is not started by default when V5R3 is installed. Before you begin working with the above
functions, the Administration instance of the HTTP Server (port 2001) must be running on your system.
The HTTP Admin instance provides an easy-to-use interface to manage web server and application server
instances, and allows you to configure the SYSINST to start whenever the HTTP Admin instance is
started. The above administrative web applications will then be accessible once the SYSINST is started.
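For example, the Administration instance can be started from a CL command line as follows (a minimal
sketch; the website referenced below covers the full configuration steps):

STRTCPSVR SERVER(*HTTP) HTTPSVR(*ADMIN)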
Please refer to the following website for more information on how to work with the HTTP Admin
instance in configuring the SYSINST:
The minimum recommended requirements to support a limited number of users accessing a set of
administration functions provided by the SYSINST are 1.25 GB of memory and a system with at least 450
CPW. If you are utilizing only one of the administration functions, such as iSeries Navigator Tasks for
the Web or Tivoli Directory Server Web Administration Tool, then the recommended minimum memory
is 1 GB. Since the administration functions are integrated with the HTTP Administration Server, the
resources for this are included in the minimum recommended requirements. The recommended minimum
requirements do not take into account the requirements of other web applications, such as customer
applications. You should use the IBM Systems Workload Estimator
(http://www-912.ibm.com/wle/EstimatorServlet) to determine the system requirements for additional web
applications.

1 Only IBM supplied administrative web applications can be installed in the SYSINST. Customer web
applications will need to be deployed to a customer-created application server instance.
6.7 WebSphere Portal
The IBM WebSphere Portal suite of products enables companies to build a portal website serving the
individual needs of their employees, business partners and customers. Users can sign on to the portal and
view personalized web pages that provide access to the information, people and applications they need.
This personalized, single point of access to resources reduces information overload, accelerates
productivity and increases website usage. As WebSphere Portal supports access through mobile devices,
as well as the desktop browser, critical information is always available. Visit the WebSphere Portal
InfoCenter for more information:
Use the IBM Systems Workload Estimator (Estimator) to predict the capacity characteristics for
WebSphere Portal (using the WebSphere Portal workload category). For custom applications, the
Workload Estimator will ask you questions about your portal pages served, such as the number of portlets
per page and the complexity of each portlet. It will also ask you to specify a transaction rate (visits per
hour) for a peak time of day. In addition to custom applications, the Estimator supports Portal Document
Manager (PDM) and Web Content Management (WCM) for some releases of WebSphere Portal. Because
of potential performance differences between WebSphere Portal releases, the projections for one release
cannot be applied to other releases.
• WebSphere Portal Enable 5.1 - Custom applications only.
• WebSphere Portal 6.0 - Custom applications and PDM.
• WebSphere Portal Express 6.0 - Custom applications, PDM, and WCM.
Workload descriptions and help text for the Portal workloads are available in the Estimator. Please work with your
marketing representative when using the Estimator to size Portal workloads (see also chapter 22).
6.8 WebSphere Commerce
Use the IBM Systems Workload Estimator to predict the capacity characteristics for WebSphere
Commerce performance (using the Web Commerce workload category). The Workload Estimator will
ask you to specify a transaction rate (visits per hour) for a peak time of day. It will further attempt to
characterize your workload by considering the complexity of shopping visits (browse/order ratio, number
of transactions per user visit, database size, etc.). Recently, the Estimator has also been enhanced to
include WebSphere Commerce Pro Entry Edition. The Web Commerce workload also incorporates
WebSphere Commerce Payments to process payment transactions. You’ll find the tool at:
http://www.ibm.com/eserver/iseries/support/estimator. A workload description along with good help text
is available on this site. Work with your marketing representative to utilize this tool (see also chapter 23).
To help you tune your WebSphere Commerce website for better performance on the System i platform,
there is a performance tuning guide available at:
http://www-1.ibm.com/support/docview.wss?uid=swg21198883. This guide provides tips and techniques,
as well as recommended settings or adjustments, for several key areas of WebSphere and DB2 that are
important to ensuring that your website performs at a satisfactory level.
6.9 WebSphere Commerce Payments
Use the IBM Systems Workload Estimator to predict the capacities and resource requirements for
WebSphere Commerce Payments. The Estimator allows you to predict a standalone WCP environment
or a WCP environment associated with the buy visits from a WebSphere Commerce estimation. Work
with your marketing representative to utilize this tool. You’ll find the tool at:
Workload Description: The PayGen workload was measured using clients that emulate the payment
transaction initiated when Internet users purchase a product from an e-commerce shopping site. The
payment transaction includes the Accept and Approve processing for the initiated payment request.
WebSphere Commerce Payments has the flexibility and capability to integrate different types of payment
cassettes due to the independent architecture. Payment cassettes are the plugins used to accommodate
payment requirements on the Internet for merchants who need to accept multiple payment methods. For
more information about the various cassettes, follow the link below:
Performance Tips and Techniques:
1. DTD Path Considerations: When using the Java Client API Library (CAL), the performance of the
WebSphere Commerce Payments can be significantly improved if the merchant application specifies
the dtdPath parameter when creating a PaymentServerClient. When this parameter is specified, the
overhead of sending the entire IBMPaymentServer.dtd file with each response is avoided. The
dtdPath parameter should contain the path of the locally stored copy of the IBMPaymentServer.dtd
file. For the exact location of this file, refer to the Programmer's Guide and Reference at the
following link:
2. Other Tuning Tips: More performance tuning tips can be found in the Administrator’s Guide under
Appendix D at the following link:
3. WebSphere Tuning Tips: Please refer to the WebSphere section in section 6.2, for a discussion on
WebSphere Application Server performance as well as related web links.
6.10 Connect for iSeries
IBM Connect for iSeries is a software solution designed to provide System i server customers and
business partners a way to communicate with an eMarketplace. Connect for iSeries was developed as a
software integration framework that allows customers to integrate new and existing back-end business
applications with those of their trading partners. It is built on industry standards such as Java, XML and
MQ Series.
The framework supports plugins for multiple trading partner protocols. Connect for iSeries also provides
pluggable connectors that make it easy to communicate to various back-end applications through a variety
of access mechanisms. Please see the Connect for iSeries white paper located at the following URL for
more information on Connect for iSeries.
“B2B New Order Request” Workload Description: This workload is driven by a program that runs on
a client work station that simulates multiple Web users. These simulated users send in cXML “New
Order Request” transactions to the System i server by issuing an HTTP post which includes the cXML
New Order Request file as the body of the message. Besides the Connect for iSeries product, other files
and back-end application code exist to complete this transaction flow. For this workload, XML validation
was disabled for both requests and response flows. The intention of this workload is to drive the server
with a heavy load and to quantify the performance of Connect for iSeries.
Measurement Results: One of the main focal points was to evaluate and compare the differences
between the back-end application connector types. The five connector types compared were the Java,
JDBC, MQ Series, Data Queue, and PCML connectors. The graph below illustrates the relative
capacities for each of the connector types. Please visit this link to learn about differences in connector
types.
[Figure: bar chart showing the relative capacities of the Java, JDBC, Data Queue, MQ Series, and PCML
connector types.]
Figure 6.9 Connect for iSeries - Connector Types
Performance Observations/Tips:
1. Connector relative capacity: The different back-end connector types are meant to allow users a
simple way to connect the Connect for iSeries product to their back-end application. Your choice in a
connector type may be dictated by several factors. Clearly, one of these factors relates to your existing
back-end application and the programming language it is written in. This, in itself, may limit your
choice for a back-end connector type. Please see the Connect for iSeries white paper to assist you in
understanding the different connector types.
Performance was measured for a simple cXML New Order Request. The Java connector performance
may vary depending on the code you write for it. All connectors “mapped” approximately the same
number of “fields” to make a fair comparison. The PCML connector has overhead associated with it
in starting a job for each transaction via “SBMJOB”. You can pre-start a pool of these jobs, which
may improve performance for this connector type.
2. XML Validation: XML validation should be avoided when not needed. Although many businesses
will decide to have this feature on (you may not be able to assume the request is both “well formed
and validated”), there are significant performance implications with this property “on”. One approach
would be to enable XML validation during your testing phase. Once you're confident that your trading
partner is sending valid and well-formed XML, you may want to disable XML validation to improve
performance.
3. Tracing: Try to avoid tracing when possible. If enabled, it will impact your performance. However,
in some cases it is unavoidable (e.g., troubleshooting problems).
4. Management Central Logging: This feature will log transaction data to be queried and viewed with
Management Central. Performance is impacted with this feature “on” and must be taken into
consideration when deciding to use this feature.
5. MQ Series Management Central Audit Queue: Because Management Central Auditing logs
messages into an MQ Series queue for processing, the default queue size may not be large enough if
you run at a very high transaction rate. This can be adjusted by issuing WRKMQM, selecting the
queue manager for your Connect for iSeries instance, selecting option 18 (work with queues) on that
queue manager, then selecting option 2 (change) and increasing the Maximum Queue Depth property.
Auditing, when enabled, added approximately 15% overhead to the “B2B New Order Request”
workload.
6. Recovery (Check pointing): Enabling transaction recovery adds significant overhead. This should
be avoided when not needed. This property when enabled added approximately 50% overhead to the
“B2B New Order Request” workload.
7. MQ Series Connector Queue Configuration: By default, in MQ Series 5.2, the queue manager
uses a single threaded listener which submits a job to handle each incoming connection request. This
has performance implications also. The queue manager can be changed to having a multithreaded
listener by adding the following property to the file \QIBM\UserData\mqm\qmgrs\QMANAGERNAME\qm.ini:

Channels:
   ThreadedListener=Yes
The multithreaded listener can provide higher throughput, but the single threaded listener is able to
handle many more concurrent connections. Please see the MQ Series website for help with MQ Series.
Chapter 7. Java Performance
Highlights:
• Introduction
• What’s new in V6R1
• IBM Technology for Java (32-bit and 64-bit)
• Classic VM (64-bit)
• Determining Which JVM to Use
• Capacity Planning
• Tips and Techniques
• Resources
7.1 Introduction
Beginning in V5R4, IBM began a transition to a new VM implementation for i5/OS, IBM Technology for
Java, to replace the Classic VM. This transition continues in V6R1 with the introduction of a 64-bit
version of IBM Technology for Java, providing a new solution for Java applications which require large
amounts of memory. The transition is expected to be completed in the next version of i5/OS, which will
no longer support the Classic VM. In the meantime, one of the key performance-related decisions for
i5/OS Java users is which JVM to use.
Earlier versions of this document have followed the performance of Java from its infancy to maturity.
Early Java applications were often a departure from the traditional OS/400 application architecture, with
custom application code responsible for a large portion of the CPU used by the application. Therefore,
earlier versions of this document emphasized micro-optimizations – relatively small (though often
pervasive) changes to application code to improve performance.
Today’s Java applications, however, typically rely on a variety of system services such as JDBC,
encryption, and security provided by i5/OS, the Java Virtual Machine (VM), and WebSphere Application
Server (WAS), along with other products built on top of WebSphere. As a result, many Java applications
now spend far more time in these system services than in custom code. For many applications, this means
that performance depends mainly on the performance of IBM code (i5/OS, the Java VM, WebSphere,
etc.) and the way that these services are used by the application. Micro-optimizations can still be
important in some cases, but are not as critical as they have been in the past.
Tuning is also important for getting good performance in the Java environment. Tuning garbage
collection is perhaps the most common example. Thread and connection pool tuning is also frequently
important. Proper tuning of i5/OS can also make a big impact on Java application performance.
7.2 What’s new in V6R1
In V5R4 IBM introduced IBM Technology for Java, a new VM implementation built on technology used
across all of the IBM Systems platforms. In V5R4 only a 32-bit version of IBM Technology for Java was
supported; in V6R1, a new 64-bit version of IBM Technology for Java is also available, providing a new
option for Java applications which require large amounts of memory. The Classic VM remains available
in V6R1, but future i5/OS releases are expected to support only IBM Technology for Java.
The default VM in V6R1 is IBM Technology for Java 5.0, 32-bit. Other supported versions of IBM
Technology for Java include 5.0 64-bit, 6.0 32-bit, and 6.0 64-bit. (6.0 versions will require the latest
PTFs to be loaded.) The Classic VM supports Java versions 1.4, 5.0, and 6.0. In V5R4, the default VM
is Classic 1.4. Classic 1.3, 5.0, and 6.0 are also supported, as well as IBM Technology for Java 5.0 32-bit
and 6.0 32-bit.
Java applications using the Classic VM will generally have equivalent performance between V5R4 and
V6R1, although applications which use JDBC to access database may see some improvement. The
Classic VM no longer supports Direct Execution (DE) in V6R1; all applications will run with the Just In
Time (JIT) compiler. As a result, applications which previously used DE may see some performance
difference (usually a significant improvement) when moving to V6R1. Because the same underlying VM
is used for all versions of Classic, most applications will see little performance difference between the
different JDK levels.
V6R1 offers significant performance improvements over V5R4 when running IBM Technology for Java
-- on the order of 10% for many applications, with larger improvements possible when using the -Xlp64k
flag to enable 64k pages. In addition, there are substantial performance improvements when moving from
IBM Technology for Java 5.0 to 6.0. Performance improvements are frequently introduced in PTFs.
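For example, from qsh, 64k pages could be enabled when launching an application as follows (the class
name is hypothetical):

java -Xlp64k com.example.MyApp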
Recent generations of hardware have greatly improved the performance of computationally-intensive
applications, which include most Java applications. Since their introduction in V5R3, System i5 servers
employing POWER5 processors – models 520, 550, 570, and 595 – have a proven record of providing
excellent performance for Java applications. The POWER5+ models introduced with V5R4 build on this
success, with performance improvements of up to 30% for the same number of processors in some
models. The new POWER6 models introduced in 2007 provide further performance gains, especially for
Java applications, which tend to be computationally intensive.
The 515 and 525 models introduced in April 2007 all include a minimum of 3800 CPW and include L3
cache. These systems deliver solid Java performance at the low end. Other attractive options at the
low end are the 600 and 1200 CPW models (520-7350 and 520-7352), which have an accelerator feature
that allows them to be upgraded to 3100 and 3800 CPW (non-interactive), respectively.
7.3 IBM Technology for Java (32-bit and 64-bit)
IBM’s extensive research and development in Java technology has resulted in significant advances in
performance and reliability in IBM’s Java implementations. Many of these advances have been
incorporated into the i5/OS Classic VM, but in order to make the latest developments available to
System i customers as quickly as possible, IBM introduced a new 32-bit implementation of Java to i5/OS
in V5R4. This VM is built on the same technology as IBM’s VMs for other platforms, and provides a
modular and flexible base for further improvements in the future. In V6R1, a 64-bit version of the same
VM is also available.
IBM Technology for Java currently supports Java 5.0 (JDK version 1.5) and (with the latest PTFs) Java 6
(JDK version 1.6). Older versions of the JDK are only supported with the Classic 64-bit VM.
On i5/OS, IBM Technology for Java runs in i5/OS Portable Application Solutions Environment (i5/OS
PASE) with either a 32-bit (for the 32-bit VM) or 64-bit (for the 64-bit VM) environment. Due to
sophisticated memory management, both the 32-bit and 64-bit VMs provide a significant reduction in
memory requirements over the Classic VM for most applications. Because the 32-bit VM uses only 4
bytes (instead of 8 bytes) for object references, applications will have an even smaller memory footprint
with the 32-bit VM; however, the 32-bit address space leads to a maximum heap size of 2.5 - 3.25 GB,
which may not be enough memory for some applications.
Because IBM Technology for Java shares a common implementation with IBM’s VMs on other
platforms, the available tuning parameters are essentially the same on i5/OS as on other platforms. This
will require some adjustment for users of the i5/OS Classic VM, but may be a welcome change for those
who work with Java on multiple platforms.
Some of the key areas to be aware of when considering use of IBM Technology for Java are described
below.
Native Code
Because IBM Technology for Java runs in i5/OS PASE, there is some additional overhead in calls to
native ILE code. This may affect performance of certain applications which make calls to native ILE
code through the Java Native Interface (JNI). Calls to certain operating system services, such as IFS file
access and socket communication, may also have some additional overhead, although the overhead
should be minimal for applications with a typical use of these services. Conversely, JNI calls to PASE
native methods will have less overhead than they did with the Classic VM, offering a performance
improvement for some applications.
The performance impact for JNI method calls to ILE will depend on the frequency of JNI calls and the
complexity of the native methods. If the calls are infrequent, or if the native methods are very complex
(and therefore take a long time to execute), the increased overhead may not make a big difference in the
overall performance of the application. Applications which make frequent calls to simple native methods
may see a more significant performance impact compared to the 64-bit Classic VM.
For some applications, it may be possible to port these native methods to run in i5/OS PASE rather than
in ILE, greatly reducing the overhead of the native call. In other cases, it may be possible to modify the
application to require fewer JNI calls.
Garbage Collection
Recommendations for Garbage Collector (GC) tuning with the i5/OS Classic VM have always been a bit
different from tuning recommendations for Java VMs on other platforms. While the main GC tuning
parameters (initial and max heap size) have the same names as the key parameters for other VMs, and are
set in the same way when running Java from qsh (-Xms and -Xmx), the meaning of these parameters in
the Classic 64-bit VM is significantly different. However, with IBM Technology for Java these
parameters mean the same thing that they do in IBM VMs on other platforms. Many users will welcome
this commonality; however, it does make the transition to the new VM a bit more complicated. The move
from a 64-bit VM to a 32-bit VM also complicates matters somewhat, as the ideal heap size will be
significantly lower in a 32-bit VM than in a 64-bit VM.
Fortunately, it is not too difficult to come up with parameter values which will provide good performance.
If you are moving an application from the Classic VM to IBM Technology for Java, you can use a tool
like DMPJVM or verbose GC to determine how large the heap grows when running your application.
This value can be used as the maximum heap size for 64-bit IBM Technology for Java; in 32-bit IBM
Technology for Java, about 75% of this value is a reasonable starting point. For example, if your
application’s heap grows to 256 MB when running in the Classic VM, try setting the maximum heap size
to 192 MB when running in the 32-bit VM. The initial heap size can be set to about half of this value –
96 MB in our example. These settings are unlikely to provide the best possible performance or the
smallest memory footprint, but the application should run reasonably well. Additional performance tests
and tuning could result in better settings.
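Continuing the example above, a reasonable first attempt when moving that application to the 32-bit VM
might be launched from qsh as follows (the class name is hypothetical):

java -Xms96m -Xmx192m com.example.MyApp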
If your application also runs on IBM VMs on other platforms, such as AIX, then you might consider
trying the GC parameters from those platforms as a starting point when using IBM Technology for Java
on i5/OS.
If you are testing a new application, or aren’t certain about the performance characteristics of an existing
application running in the Classic 64-bit VM, start by running the application with the default heap size
parameters (currently an initial heap size of 4 MB and a maximum of 2 GB). Run the application and see
how large the heap grows under a typical peak load. The maximum heap size can be set to this value (or
perhaps slightly larger). Then the initial heap size can be increased to improve performance. The optimal
value will depend on the application and several other factors, but setting the initial heap size to about
25% of the maximum heap size often provides reasonable performance.
Keep in mind that the maximum heap size for the 32-bit VM is 3328 MB. Attempting to use a larger
value for the initial or maximum heap size will result in an error. The maximum heap size is reduced
when using IBM Technology for Java’s “Shared Classes” feature or when files are mapped into memory
(via the java.nio APIs). The maximum heap size can also be impacted when running large numbers of
threads, or by the use of native code running in i5/OS PASE, since the memory used by this native code
must share the same 32-bit address space as the Java VM. As a result, many applications will have a
practical limit of 3 GB (3072 MB) or even less. Applications with larger heap requirements may need to
use one of the 64-bit VMs (either IBM Technology for Java or the Classic VM).
When heap requirements are not a factor, the 64-bit version of IBM Technology for Java will tend to be
slightly slower (on the order of 10%) than 32-bit with a somewhat larger (on the order of 70%) memory
footprint. Thus, the 32-bit VM should be preferred for applications where the maximum heap size
limitation is not an issue.
7.4 Classic VM (64-bit)
The 64-bit Classic Java Virtual Machine continues to be supported in V6R1, though most applications
should begin migrating to IBM Technology for Java to take advantage of its performance benefits. The
integration of the Classic VM into i5/OS provides some unique features and benefits, although this can
result in some confusion to users who are familiar with running Java applications on other platforms.
Some of the performance-related features you may need to be aware of are described below.
JIT Compiler
Interpreting the platform-neutral bytecodes of a Java class file, bytecode by bytecode, is one valid and
robust way to execute Java object code; it is not, however, the fastest way. To approach optimal Java
performance, it pays to apply analysis and optimizations to the Java bytecodes, and the resulting machine
code.
One approach to optimizing Java bytecode involves analyzing the object code “ahead of time” – before it
is actually running. This “ahead-of-time” (AOT) compiler technology was used exclusively by the
original AS/400 Java Virtual Machine, whose success proved the power of such an approach.
However, any static AOT analysis suffers one fatal flaw: in a dynamically loading language such as Java,
it is impossible for an AOT compiler to know exactly what the environment will look like when the code
is actually being executed. Certain valuable optimizations – such as inter-class method inlining or
parameter-passing optimizations – cannot be made without adding extra checks to ensure that the
optimization is still valid at run-time. While these checks are trimmed down as much as possible, some
amount of overhead is unavoidable.
When Java was first introduced to the AS/400 it used an AOT compilation approach, with a combination
of bytecode interpretation and Direct Execution (DE) programs to statically optimize Java code for the
OS/400 environment, with startup and runtime performance usually significantly faster than what other
Java implementations at the time could provide.
Later, “Just-In-Time” (JIT) compiler technology was introduced in many Java VMs. Unlike AOT
compilation, JIT compiles Java bytecodes to machine code on-the-fly as the application is running. While
this introduces some overhead as the compilation occurs, the compiler can optimize much more
aggressively, because it knows the exact state of the system it is compiling for.
Over time, JIT compilation technology improved and was implemented alongside DE in the i5/OS Classic
VM. JIT performance overtook DE in the V5R2 time frame for most applications, and has continued to
improve at a faster rate. In V6R1, support for DE was eliminated, so the JIT will be used for all Java
applications.
Despite the improvements to JIT for both runtime and startup performance, startup time does tend to be
slightly longer for JIT than DE. Beginning in V5R2, the Mixed Mode Interpreter (MMI) is used to
interpret code until it has been executed a number of times (2000 by default, can be overridden by setting
the system property os400.jit.mmi.threshold) before JIT compiling it, resulting in improved
startup time. V5R3 introduced asynchronous JIT compilation, which further improved startup time,
especially on multiprocessor systems. As a result of these and other improvements, many applications
will no longer see a significant difference in startup time between DE and JIT. Even if startup time is a
bit longer with JIT, the improvement in runtime performance may be worth it, especially for long-running
applications which don’t start up frequently.
Prior to V6R1, the default execution mode was “jitc_de”, which uses DE for Java classes which already
have DE programs, and JIT for classes which do not. Notably, JDK classes are shipped with DE program
objects created, and will therefore use DE by default. Set the system property java.compiler to jitc
to force JIT to be used for all Java code in your application. (See InfoCenter for instructions about setting
Java system properties.)
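For example, the property could be set for a single invocation from qsh as follows (the class name is
hypothetical):

java -Djava.compiler=jitc com.example.MyApp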
Note that even when running with the JIT, the VM will have to create a Java program object (with
optimization level *INTERPRET) the first time a particular Java class is used on the system, if one does
not already exist. Creation of this program object is much faster than creating a full DE program, but it
may still make a noticeable difference in startup time the first time your application is used, particularly in
applications with a large number of classes. Running CRTJVAPGM with OPTIMIZE(*INTERPRET)
will create this program ahead of time, making the first startup faster.
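For example (the jar file path is hypothetical):

CRTJVAPGM CLSF('/home/myapp/myapp.jar') OPTIMIZE(*INTERPRET)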
Garbage Collection
Java uses Garbage Collection (GC) to automatically manage memory by cleaning up objects and memory
when they are no longer in use. This eliminates certain types of memory leaks which can be caused by
application bugs for applications written in other languages. But GC does have some overhead as it
determines which objects can be collected. Tuning the garbage collector is often the simplest way to
improve performance for Java applications.
The Garbage Collector in the i5/OS Classic VM works differently from collectors in Java VMs on other
platforms, and therefore must be tuned differently. There are two parameters that can be used to tune GC:
GCHINL (-Xms) and GCHMAX (-Xmx). The JAVA/RUNJVA commands also include GCHPTY and
GCHFRQ, but these parameters are ignored and have no effect on performance.
The Garbage Collector runs asynchronously in one or more background threads. When a GC cycle is
triggered, the Garbage Collector will scan the entire Java heap, and mark each of the objects which can
still be accessed by the application. At the end of this “mark” phase, any objects which have not been
marked are no longer accessible by the application, and can be deleted. The Garbage Collector then
“sweeps” the heap, freeing the memory used by all of these inaccessible objects.
A GC cycle can be triggered in a few different ways. The three most common are:
1. An amount of memory exceeding the collection threshold value (GCHINL) has been allocated since
the previous GC cycle began.
2. The heap size has reached the maximum heap value (GCHMAX).
3. The application called java.lang.System.gc() [not recommended for most applications]
The collection threshold value (GCHINL or -Xms, often referred to as the “initial heap size”) is the most
important value to tune. The default size for V5R3 and later is 16 MB. Using larger values for this
parameter will allow the heap to grow larger, which means that GC will run less frequently, but each
cycle will take longer. Using smaller values will keep the heap smaller, but GC will run more often. The
best value depends on the number, size, and lifetime of objects in your application as well as the amount
of memory available to the application. Most applications will benefit from using a larger collection
threshold value – 96 MB is reasonable for many applications. For WebSphere applications on larger
systems, heap threshold values of 512 MB or more are not uncommon.
The maximum heap size (GCHMAX, or -Xmx) specifies the largest that the heap is allowed to grow. If
the heap reaches this size, a synchronous garbage collection will be performed. All other application
threads will have to wait while this GC cycle occurs, resulting in longer response times. If this
synchronous GC cycle is not able to free up enough memory for the application to continue, an
OutOfMemoryError will be thrown. The default value for this parameter is *NOMAX, meaning that
there is no limit to the heap size. In practice, a well behaved application will settle out to some steady
state heap size, so *NOMAX does not mean that the heap will grow infinitely large. Most applications
can leave this parameter at its default value.
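For example, a 96 MB collection threshold with no maximum could be specified as follows (the class
name is hypothetical; GCHINL is given here in kilobytes):

RUNJVA CLASS(com.example.MyApp) GCHINL(98304) GCHMAX(*NOMAX)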
One important consideration is to not allow the Java heap to grow beyond the amount of physical memory
available to the application. For example, if the application is running in the *BASE memory pool with a
size of 1 GB, and the heap grows to 1.5 GB, the paging rate will tend to get quite high, especially when a
GC cycle is running. This will show up as non-database page faults on the WRKSYSSTS command
display; rates of 20 to 30 faults per second are usually acceptable, but larger values may indicate a
performance problem. In this case, the size of the memory pool should be increased, or the collection
threshold value (GCHINL or -Xms) should be decreased so the heap isn’t allowed to grow as large. In
many cases the scenario may be complicated by the fact that multiple applications may be running in the
same memory pool. Therefore, the total memory requirements of all of these applications must be
considered when setting the pool size. In some environments it may be useful to run key Java
applications in a private pool in order to have more control over the memory available to these
applications.
In some cases it may also be helpful to set the maximum heap size to be slightly larger than the memory
pool size. This will act as a safety net so that if the heap does grow beyond the memory pool size, it will
not cause high paging rates. In this case, the application will probably not be usable (due to the
synchronous garbage collection cycles and OutOfMemoryErrors that may occur), but it will have less
impact on any other applications running on the system.
A final consideration is the application’s use of objects. While the garbage collector will prevent certain
types of memory leaks, it is still possible for an application to have an “object leak”. One common
example is when the application adds new objects to a List or Map, but never removes the objects.
Therefore the List or Map continues to grow, and the heap size grows along with it. As this growth
continues, the garbage collector will begin taking longer to run each cycle, and eventually you may
exhaust the physical memory available to the application. In this case, the application should be modified
to remove the objects from the List or Map when they are no longer needed so the heap can remain at a
reasonable size. A similar example involves the use of caches inside the application. If these caches are
allowed to grow too large, they may consume more memory than is physically available on the system.
Using smaller cache sizes may improve the performance of your application.
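As a minimal illustration of such an object leak (hypothetical code, not taken from any particular
application), consider a cache that entries are added to but never removed from:

import java.util.HashMap;
import java.util.Map;

public class OrderCache {
    // Entries are added but never removed, so the heap grows without
    // bound and each GC cycle has more live objects to mark.
    private static final Map<String, byte[]> cache = new HashMap<String, byte[]>();

    public static byte[] lookup(String orderId) {
        byte[] data = cache.get(orderId);
        if (data == null) {
            data = loadOrder(orderId);
            cache.put(orderId, data);  // never removed: an object leak
        }
        return data;
    }

    private static byte[] loadOrder(String orderId) {
        return new byte[4096];         // stand-in for real order data
    }
}

Bounding the cache, or removing entries when they are no longer needed, keeps the heap at a steady size.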
Bytecode Verification
In order to maintain system stability and security, it is important that Java bytecodes are verified before
they are executed, to ensure that the bytecodes don’t try to do anything not allowed by the Java VM
specification. This verification is important for any Java implementation, but especially critical for server
environments, and perhaps even more so on i5/OS where the JVM is integrated into the operating system.
Therefore, in i5/OS, bytecode verification is not only turned on by default, but it is impossible to turn it
off. While the bytecode verification step isn’t especially slow, it can impact startup time in certain cases
– especially when compared to VMs on other platforms which may not do bytecode verification by
default. In most cases, full bytecode verification can be done just once for a class, and the resulting
JVAPGM objects saved with its corresponding class or jar file as long as the class doesn’t change.
However, when user classloaders are used to load classes, the VM may not be able to locate the file from
which the class was loaded (in particular, if the standard URLClassLoader mechanism is not being used
by the user classloader). In this case, the bytecode verification cache is used to minimize the cost of
bytecode verification.
In V5R3 and later the bytecode verification cache is enabled by default, and tuning is usually
unnecessary. In V5R2 and earlier releases the cache was disabled by default, and tuning was sometimes
necessary. The cache can be turned on by specifying a valid value (e.g.,
/QIBM/ProdData/Java400/QDefineClassCache.jar) for the os400.define.class.cache.file system
property. It may also be helpful to set os400.define.class.cache.maxpgms to a value of around
20000, since the default of 5000 has been shown to be too small for many applications. In V5R3 and
later releases the cache is enabled and the maxpgms set to 20000 by default, so no adjustment is usually
necessary.
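Where adjustment is needed (on V5R2, for example), the properties can be set for a single invocation
from qsh as follows (the class name is hypothetical):

java -Dos400.define.class.cache.file=/QIBM/ProdData/Java400/QDefineClassCache.jar \
     -Dos400.define.class.cache.maxpgms=20000 com.example.MyApp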
The verification cache operates by caching JVAPGMs that have been dynamically created for
dynamically loaded classes. When the verification cache is not operating, these JVAPGMs are created as
temporary objects, and are deleted as the JVM shuts down. When the verification cache is enabled,
however, these JVAPGMs are created as persistent objects, and are cached in the (user specified)
machine-wide cache file. If the same (byte-for-byte identical) class is dynamically loaded a second time
(even after the machine is re-IPLed), the cached JVAPGM for that class is located in the cache and
reused, eliminating the need to verify the class and create a new JVAPGM (and eliminating the time and
performance impact that would be required for these actions). Older JVAPGMs are "aged out" of the
cache if they are not used within a given period of time (default is one week).
In general, the only cost of enabling the verification cache is a modest amount of disk space. If it turns out
that your application is not using one of the problem user class loaders, the cache will have no impact,
positive or negative, while if your application is using such a class loader then the time taken to create
and cache the persistent JVAPGM is only slightly more than the time required to create a temporary
JVAPGM. With next to zero downside risk, and a decent potential to improve performance, the
verification cache is well worth a try.
Maintenance is not a problem either: if the source for a cached JVAPGM is changed, the currently-cached
version will simply "age out" (since its class will no longer be a byte-for-byte match), and a new
JVAPGM will be silently created and cached. Likewise, the cache doesn't care about JDK versions, PTFs
installed, application upgrades, etc.
7.5 Determining Which JVM to Use
Beginning in V5R4, applications can run in either the Classic 64-bit VM or with IBM Technology for
Java (32-bit only in V5R4, 32-bit or 64-bit in V6R1). Both VM implementations provide a fully
compliant implementation of the Java specifications, and pure Java applications should be able to run
without changes in either VM by setting the JAVA_HOME environment variable appropriately. (See
InfoCenter for details on specifying which VM will be used to execute a Java program.) However, some
applications may have dependencies which will prevent them from working on one of the VM
implementations.
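For example, from qsh, the 32-bit IBM Technology for Java 5.0 VM could be selected as follows (the
path shown is typical, but verify the exact path for your release in InfoCenter):

export JAVA_HOME=/QOpenSys/QIBM/ProdData/JavaVM/jdk50/32bit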
In general, applications should use 32-bit IBM Technology for Java when possible. Applications which
require larger heaps than can be managed with a 32-bit VM should use 64-bit IBM Technology for Java
(on V6R1). The Classic VM also remains available for cases where IBM Technology for Java is not
appropriate and to ease migration from older releases.
Some factors to consider include:
Functional Considerations
1. IBM Technology for Java was introduced in i5/OS V5R4M0. Older versions of OS/400 and i5/OS
support only the Classic VM.
2. IBM Technology for Java only supports Java 5.0 (JDK 1.5) and higher. Older versions of Java (1.4,
1.3, etc.) are not supported. While the Java versions are generally backward compatible, some
libraries and environments may require a particular version. The Classic VM continues to support
JDK 1.3, 1.4, 1.5 (5.0), and 1.6 (6.0) in V5R4, and JDK 1.4, 1.5 (5.0), and 1.6 (6.0) in V6R1.
3. The Classic VM supported an i5/OS-specific feature called Adopted Authority. IBM Technology for
Java does not support this feature, so applications which require Adopted Authority must run in the
Classic VM. This will not affect most applications. Applications which do use Adopted Authority
should consider migrating to APIs in IBM Toolbox for Java which can serve a similar purpose.
4. Java applications can call native methods through the Java Native Interface (JNI) with either VM.
When using IBM Technology for Java, these native programs must be compiled with teraspace
storage enabled (see the example after this list). In addition, whenever a buffer is passed to JNI
functions such as GetxxxArrayRegion, the pointer must point to teraspace storage.
5. Because 32-bit IBM Technology for Java runs in a 32-bit PASE environment, any PASE native
methods must also be 32-bit. With 64-bit IBM Technology for Java, PASE native methods must be
64-bit. The Classic VM can call both 32-bit and 64-bit PASE native methods. All of the VMs can
call ILE native methods as well.
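As an example of the teraspace requirement in item 4 above, an ILE C module containing JNI native
methods could be compiled as follows (the library, module, and source file names are hypothetical):

CRTCMOD MODULE(MYLIB/MYJNI) SRCFILE(MYLIB/QCSRC) TERASPACE(*YES *TSIFC)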
Performance Considerations
1. When properly tuned, applications will tend to use significantly less memory when running in IBM
Technology for Java than in the Classic VM. Performance tests have shown a reduction of 40% or
more in the Java heap for most applications when using the 32-bit IBM Technology for Java VM,
primarily because object references are stored with only 4 bytes (32 bits) rather than 8 bytes (64 bits).
Therefore, an application using 512 MB of heap space in the 64-bit Classic VM might require 300
MB or even less when running in 32-bit IBM Technology for Java. The difference between the
Classic VM and 64-bit IBM Technology for Java is somewhat less noticeable, but 64-bit IBM
Technology for Java will still tend to have a smaller footprint than Classic for most applications.
2. The downside to using a 32-bit address space is that it limits the amount of memory available to the
application. As discussed above, the 32-bit VM has a maximum heap size of 3328 MB, although
most applications will have a practical limit of 3 GB or less. Applications which require a larger heap
should use 64-bit IBM Technology for Java or the Classic VM. Since applications will use less
memory when running in the 32-bit VM, this means that applications which require up to about 5 GB
in the Classic VM will probably be able to run in the 32-bit VM. Of course, applications with heap
requirements near the 3 GB limit will require extra testing to ensure that they are able to run properly
under full load over an extended period of time.
3. Applications which use a single VM to fully utilize large systems (especially 8-way and above) will
tend to require larger heap sizes, and therefore may not be able to use the 32-bit VM. In some cases it
may be possible to divide the work across two or more VMs. Otherwise, it may be necessary to use
one of the 64-bit VMs on large systems to allow larger heap sizes.
4. Because calls to native ILE code are more expensive in IBM Technology for Java, extra care should
be taken when moving Java applications which make heavy use of native ILE code to the new VM.
Performance testing should be performed to determine whether or not the overhead of the native ILE
calls are hurting performance in your application. If this is an issue, the techniques discussed above
should be used to attempt to improve the performance. If the performance is still unacceptable, it
may be best to continue using the Classic VM at this time. Conversely, applications which make use
of i5/OS PASE native methods may see a performance improvement when running in IBM
Technology for Java due to the reduced overhead of calling i5/OS PASE methods.
5. Remember that microbenchmarks (small tests to exercise a specific function) do not provide a good
measure of performance. Comparisons between the IBM Technology for Java and Classic based on
microbenchmarks will not give an accurate picture of how your application will perform in the two
VMs, because your application will have different characteristics than the microbenchmark. The best
way to determine which VM provides the best performance for your application is to test with the
application itself or a reasonably complete subset of the application, using a load generating tool to
simulate a load representative of your planned deployment environment.
WebSphere applications running with IBM Technology for Java will be subject to the same constraints as
plain Java applications; however, there are some considerations which are specific to WebSphere, as
described in Chapter 6 (Web Server and WebSphere Performance).
7.6 Capacity Planning
Due to the wide variety of Java applications which can be developed, it is impossible to make precise
capacity planning recommendations which would apply to all applications. It is possible, however, to
make some general statements which will apply to most applications. Determining specific system
requirements for a particular application requires performance testing with that application. The
Workload Estimator can also be used to assist with capacity planning for specific environments, such as
WebSphere Application Server or WebSphere Commerce applications.
Despite substantial progress at the language execution level, Java continues to require, on average,
processors with substantially higher capabilities than the same machine primarily running RPG and
COBOL. This is partially due to the overhead of using an object oriented, garbage collected language.
But perhaps more important is that Java applications tend to do more than their counterparts written in
more traditional languages. For example, Java applications frequently include more network access and
data transformation (like XML) than the RPG and COBOL applications they replace. Java applications
also typically use JDBC with SQL to access the database, while traditional iSeries applications tend to use
less expensive data access methods like Record Level Access. Therefore, Java applications will continue
to require more processor cycles than applications with “similar” functionality written in RPG or
COBOL.
As a result, some models at the low end may be suitable for traditional applications, but will not provide
acceptable performance for applications written in Java.
General Guidelines
• Remember to account for non-Java work on the system. Few System i servers are used for a single application; most will have a combination of Java and non-Java applications running on the system. Be sure to factor in capacity requirements for both the Java and the non-Java applications which will run on the system. The eServer Workload Estimator can be used to estimate system requirements for a variety of application types.
• Similarly, be sure to consider additional system services which will be used when adding a new Java application to the system. Most Java applications will make use of system services like network communications and database, which may require additional system resources. In particular, the use of JDBC and dynamic SQL can increase the cost of database access from Java compared to traditional applications with similar function.
• Also consider which applications on the system are likely to experience future growth, and adjust the system requirements accordingly. For example, if a Java/WebSphere application is used as the core of an e-business application, then it may see significantly more growth (requiring additional system resources) over time or during particular times of the year than other applications on the system.
• Beware of misleading benchmarks. Many benchmarks are available to test Java performance, but most of these are not good predictors of server-side Java performance. Some of these benchmarks are single-threaded, or run for a very short period of time. Others will stress certain components of the JVM heavily, while avoiding other functionality that is more typical of real applications. Even the best benchmarks will exercise the JVM differently than real applications with real data. This doesn’t mean that benchmarks aren’t useful; however, results from these benchmarks must be interpreted carefully.
• 5250 OLTP isn’t needed for Java applications, although some Java applications will execute 5250 operations that do require 5250 OLTP. Again, be sure to account for non-Java workloads on the system that do require 5250 OLTP.
• Java applications are inherently multi-threaded. Even if the application itself runs in a single thread, VM functionality like Garbage Collection and asynchronous JIT compilation will run in separate threads. As a result, Java will tend to benefit from processors which support Simultaneous Multi-threading (SMT). See Chapter 20 for additional information on SMT. Java applications may also benefit more from systems with multiple processors than single-threaded traditional applications, as multiple application threads can be running in parallel.
• Java tends to require more main storage (memory) than other languages, especially when using the Classic VM. The 64-bit VMs (both Classic and IBM Technology for Java) will also tend to require more memory than is needed by 32-bit VMs on other platforms.
• Along the same lines, Java applications generally benefit more from L3 cache than applications in other languages. Therefore, Java performance may scale better than CPW ratings would indicate when moving from a system with no L3 cache to a system that does have L3 cache. Conversely, Java performance on a system without L3 cache may be worse than the CPW rating suggests. See Appendix C of this document for information on which systems include L3 cache.
• DASD (hard disk) requirements typically don’t change much for Java applications compared to applications written in languages like RPG. The biggest use of DASD is usually database, and database sizes do not inherently change when running Java.
7.7 Java Performance – Tips and Techniques
Introduction
Tips and techniques for Java fall into several basic categories:
1. i5/OS Specific. These should be checked out first to ensure you are getting all you should be from
your i5/OS Java application.
2. Classic VM Specific. Many i5/OS-specific tips apply only when using the Classic VM and not for
IBM Technology for Java.
3. Java Language Specific. Coding tips that will ordinarily improve any Java application, or especially
improve it on i5/OS.
4. Database Specific. Use of database can invoke significant path length in i5/OS. Invoking it
efficiently can maximize the performance and value of a Java application.
i5/OS Specific Java Tips and Techniques
• Load the latest CUM package and PTFs
To be sure that you have the best performing code, be sure to load the latest CUM packages and PTFs for all products that you are using. In particular, performance improvements are often introduced in new Java Group PTFs (SF99269 for V5R3, SF99291 for V5R4, and SF99562 for V6R1).
• Explore the General Performance Tips and Techniques in Chapter 20
Some of the discussion in that chapter will apply to Java. Pay particular attention to the discussion "Adjusting Your Performance Tuning for Threads." Specifically, ensure that MAXACT is set high enough to allow all Java threads to run.
• Consider running Java applications in a separate memory pool
On systems running multiple workloads simultaneously, putting Java applications in their own pool will ensure that the Java applications have enough memory allocated to them.
• Make sure SMT is enabled on systems that support it
Java applications are always multi-threaded, and should benefit from Simultaneous Multi-threading (SMT). Ensure that it is turned on by setting the system value QPRCMLTTSK to 1 (On). See Chapter 20 for additional details on SMT.
• Avoid starting new Java VMs frequently
Starting a new VM (e.g. through the JAVA/RUNJVA commands) is expensive on any platform, but perhaps a bit more so on i5/OS, due to the relatively high cost of starting a new job. Other factors which make Java startup slow include class loading, bytecode verification, and JIT compilation. As a result, it is far better to use long-running Java programs rather than frequently starting new VMs. If you need to invoke Java frequently from non-Java programs, consider passing messages through an i5/OS Data Queue. The ToolBox Data Queue classes may be used to implement "hot" VMs, as in the sketch below.
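A rough sketch of that approach, assuming the IBM Toolbox for Java is available; the library and queue names below are hypothetical, and a real implementation would add error handling and a shutdown protocol. The long-running "hot" VM blocks on the data queue and processes each request as it arrives:

import com.ibm.as400.access.AS400;
import com.ibm.as400.access.DataQueue;
import com.ibm.as400.access.DataQueueEntry;

public class HotVm {
    public static void main(String[] args) throws Exception {
        AS400 system = new AS400();   // local system, current user profile
        // MYLIB/REQQ is a hypothetical data queue created with CRTDTAQ
        DataQueue requests = new DataQueue(system, "/QSYS.LIB/MYLIB.LIB/REQQ.DTAQ");
        while (true) {
            DataQueueEntry entry = requests.read(-1);   // block until work arrives
            String request = entry.getString();
            // ... process the request; no VM startup cost is paid per request ...
        }
    }
}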
Classic VM-specific Tips
• Use java.compiler=jitc
The JIT compiler now outperforms Direct Execution for nearly all applications. Therefore, java.compiler=jitc should be used for most Java applications. One possible exception is when startup time is a critical issue, but JIT may be appropriate even in these cases. Setting java.compiler is not necessary for Classic on V6R1, or for IBM Technology for Java on either V5R4 or V6R1 -- the JIT compiler is always used in these cases.
• Delete existing DE program objects
When using the JIT, JVAPGM objects containing Direct Execution machine code are not used.
These program objects can be large, so removing the unused JVAPGM objects can free up disk space.
This is not needed on V6R1. To determine if your class/zip/jar file has a permanent, hidden program
object on previous releases, use the DSPJVAPGM command. If a Java program is associated with the
file, and the “Optimization” level is something other than *INTERPRET, use DLTJVAPGM to delete
the hidden program. DLTJVAPGM does not affect the jar or zip file itself; only the hidden program.
Do not use DLTJVAPGM on IBM-shipped JDK jar files (such as rt.jar). As explained earlier, the JIT
does take advantage of programs created at optimization *INTERPRET. These programs require
significantly less space and do not need to be deleted. Program objects (even at *INTERPRET) are
not used by IBM Technology for Java.
• Consider the special property os400.jit.mmi.threshold
This property sets the threshold for the MMI of the JIT. Setting this to a small value will result in
compilation of the classes at startup time and will increase the startup time. In addition, using a very
small value (less than 50) may result in a slower compiled version, since profiling data gathered
during the interpreted phase may not be representative of the actual application characteristics.
Setting this to a high value may result in a somewhat faster startup time and compilation of the
classes will occur once the threshold is reached. However, if the value is set too high then an
increased warm-up time may occur since it will take additional time for the classes to be optimized by
the JIT compiler.
The default value of 2000 is usually OK for most scenarios. This property has no effect when using
IBM Technology for Java.
• Package your Java application as a .jar or .zip file
Packaging multiple classes in one .zip or .jar file should improve class loading time and also code
optimization when using Direct Execution (DE). Within a .zip or .jar file, i5/OS Java will attempt
to in-line code from other members of the .zip or .jar file.
Java Language Performance Tips
Due to advances in JIT technology, many common code optimizations which were critical for
performance a few years ago are no longer as necessary in modern JVMs. Even today, these techniques
will not hurt performance. But they may not make a big positive difference either. When making these
types of optimizations, care should be taken to balance the need for performance with other factors such
as code readability and the ease of future maintenance. It is also important to remember that the majority
of the application’s CPU time will be spent in a small amount of code. CPU profiling should be used to
identify these “hot spots”, and optimizations should be focused on these sections of code.
Various Java code optimizations are well documented. Some of the more common optimizations are
described below:
• Minimize object creation
Excessive object creation is a common cause of poor application performance. In addition to the cost
of allocating memory for the new object and invoking its constructor, the new object will use space in
the Java heap, which will result in longer garbage collection cycles. Of course, object creation cannot
be avoided, but it can be minimized in key areas.
The most important areas to look at for reducing object creation are inside loops and other commonly-executed code paths. Some common causes of object creation include:
  • String.substring( ) creates a new String object.
  • The arithmetic methods in java.math.BigDecimal (add, divide, etc.) create a new BigDecimal object.
  • The I/O method readLine( ) (e.g. in java.io.BufferedReader) will create a new String.
  • String concatenation (e.g.: “The value is: “ + value) will generally result in creation of a StringBuffer, a String, and a character array.
  • Putting primitive values (like int or long) into a collection (like List or Map) requires wrapping them in a new object (e.g. java.lang.Integer). This is usually obvious in the code, but Java 5.0 introduced the concept of autoboxing which will perform this wrapping automatically, hiding the object creation from the programmer.
Some objects, like StringBuffer, provide a way to reset the object to its initial state, which can be
useful for avoiding object creation, especially inside loops. For StringBuffer, this can be done by
calling setLength(0).
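For example, a minimal sketch of the setLength(0) reset technique described above, reusing one StringBuffer across loop iterations:

void printRows(String[] rows) {
    StringBuffer buf = new StringBuffer();
    for (int i = 0; i < rows.length; i++) {
        buf.setLength(0);                 // reset the buffer; no new object created
        buf.append("Row ").append(i).append(": ").append(rows[i]);
        System.out.println(buf.toString());  // toString() still creates one String per row
    }
}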
• Minimize synchronized methods
Synchronized methods/blocks can have significantly more overhead than non-synchronized code.
This includes some overhead in acquiring locks, flushing caches to correctly implement the Java
memory model, and contention on locks when multiple threads are trying to hold the same lock at the
same time. From a performance standpoint, it is best if synchronized code can be avoided. However,
it is important to remember that improperly synchronized code can cause functional or data-integrity
issues; some of these issues may be difficult to debug since they may only occur under heavy load.
As a result, it is important to ensure that changes to synchronization are “safe”. In many cases,
removing synchronization from code may require design changes in the application.
Some common synchronization patterns are easily illustrated with Java’s built-in String classes. Most
other Java classes (including user-written classes) will follow one of these patterns. Each has
different performance characteristics.
  • java.lang.String is an immutable object – once constructed, it cannot be changed. As a result, it is inherently thread-safe and does not require synchronization. However, since Strings cannot be modified, operations which require a modified String (like String.substring()) will have to create a new String, resulting in more object creation.
  • java.lang.StringBuffer is a mutable object which can change after it is constructed. In order to make it thread-safe, nearly all methods in the class (including some which do not modify the StringBuffer) are synchronized.
  • java.lang.StringBuilder (introduced in Java 5.0) is an unsynchronized version of StringBuffer. Because its methods are not synchronized, this class is not thread-safe, so StringBuilder instances cannot be shared between threads without external synchronization.
Dealing with synchronization correctly requires a good understanding of Java and your application,
so be careful about applying this tip.
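For example, a sketch of a helper confined to a single thread, where StringBuilder avoids the per-call locking that StringBuffer would incur:

String join(String[] parts) {
    StringBuilder sb = new StringBuilder();   // not shared between threads, so no synchronization needed
    for (int i = 0; i < parts.length; i++) {
        if (i > 0) sb.append(", ");
        sb.append(parts[i]);
    }
    return sb.toString();
}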
• Use exceptions only for “exceptional” conditions
The “try” block of an exception handler carries little overhead. However, there is significant
overhead when an exception is actually thrown and caught. Therefore, you should use exceptions
only for “exceptional” conditions; that is, for conditions that are not likely to happen during normal
execution. For example, consider the following procedure:
public void badPrintArray (int arr[]) {
    int i = 0;
    try {
        while (true) {
            System.out.println (arr[i++]);
        }
    } catch (ArrayIndexOutOfBoundsException e) {
        // Reached the end of the array....exit
    }
}
Instead, the above procedure should be written as:
public void goodPrintArray (int arr[]) {
    int len = arr.length;
    for (int i = 0; i < len; i++) {
        System.out.println (arr[i]);
    }
}
In the “bad” version of this code, an exception will always be thrown (and caught) in every execution
of the method. In the “good” version, most calls to the method will not result in an exception.
However, if you passed “null” to the method, it would throw a NullPointerException. Since this is
probably not something that would normally happen, an exception may be appropriate in this case.
(On the other hand, if you expect that null will be passed to this method frequently, it may be a good
idea to handle it specifically rather than throwing an exception.)
• Use static final when creating constants
When data is invariant, declare it as static final. For example here are two array initializations:
class test1 {
    int myarray[] =
        { 1,2,3,4,5,6,7,8,9,10,
          2,3,4,5,6,7,8,9,10,11,
          3,4,5,6,7,8,9,10,11,12,
          4,5,6,7,8,9,10,11,12,13,
          5,6,7,8,9,10,11,12,13,14 };
}

class test2 {
    static final int myarray2[] =
        { 1,2,3,4,5,6,7,8,9,10,
          2,3,4,5,6,7,8,9,10,11,
          3,4,5,6,7,8,9,10,11,12,
          4,5,6,7,8,9,10,11,12,13,
          5,6,7,8,9,10,11,12,13,14 };
}
Since the array myarray2 in class test2 is defined as static, there is only one myarray2 array for all the
many creations of the test2 object. In the case of the test1 class, there is an array myarray for each
test1 instance. The use of final ensures that the array cannot be changed, making it safe to use from
multiple threads.
Java i5/OS Database Access Tips
• Use the native JDBC driver
There are two i5/OS JDBC drivers that may be used to access local data: the Native driver (using a
JDBC URL "jdbc:db2:system-name") and the Toolbox driver (with a JDBC URL
"jdbc:as400:system-name"). The native JDBC driver is optimized for local database access, and
gives the best performance when accessing the database on the same system as your Java
applications. The Toolbox driver supports remote access, and should be used when accessing the
database on a separate system. This recommendation is true for both the 64-bit Classic VM and the
new 32-bit VM.
• Pool Database Connections
Connection pooling is a technique for sharing a small number of database connections among a
number of threads. Rather than each thread opening a connection to the database, executing some
requests, and then closing the connection, a connection can be obtained from the connection pool,
used, and then returned to the pool. This eliminates much of the overhead in establishing a new
JDBC connection. WebSphere Application Server uses built-in connection pooling when getting a
JDBC connection from a DataSource.
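In a WebSphere environment, a pooled connection is typically obtained through a DataSource looked up in JNDI. A minimal sketch, assuming a hypothetical JNDI name of "jdbc/MyDataSource" configured by the administrator:

import java.sql.Connection;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class PooledQuery {
    public void run() throws Exception {
        InitialContext ctx = new InitialContext();
        // "jdbc/MyDataSource" is a hypothetical JNDI name configured in WebSphere
        DataSource ds = (DataSource) ctx.lookup("jdbc/MyDataSource");
        Connection conn = ds.getConnection();   // borrowed from the pool
        try {
            // ... execute SQL using the connection ...
        } finally {
            conn.close();   // returns the connection to the pool, not a hard close
        }
    }
}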
• Use Prepared Statements
The JDBC prepareStatement method should be used for repeatable executeQuery or executeUpdate
methods. If prepareStatement, which generates a reusable PreparedStatement object, is not used, the
execute statement will implicitly re-do this work on every execute or executeQuery, even if the query
is identical. WebSphere’s DataSource will automatically cache your PreparedStatements, so you
don’t have to keep a reference to them – when WebSphere sees that you are attempting to prepare a
statement that it has already prepared, it will give you a reference to the already prepared statement,
rather than creating a new one. In non-WebSphere applications, it may be necessary to explicitly
cache PreparedStatement objects.
When using PreparedStatements, be sure to use parameter markers for variable data, rather than
dynamically building query strings with literal data. This will enable reuse of the PreparedStatement
with new parameter values.
Avoid placing the prepareStatement inside of loops (e.g. just before the execute). In some non-i5/OS
environments, this just-before-the-query coding practice is common for non-Java languages, which
required a "prepare" function for any SQL statement. Programmers may carry this practice over to
Java. However, in many cases, the prepareStatement contents don't change (this includes parameter
markers) and the Java code will run faster on all platforms if it is executed only one time, instead of
once per loop. This technique may show a greater improvement on i5/OS.
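A minimal sketch combining these tips: the statement is prepared once with a parameter marker, outside the loop, and executed many times over the native driver. The table and column names are hypothetical, and "*LOCAL" is assumed as the name of the local database:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PreparedQuery {
    public static void main(String[] args) throws Exception {
        // Native driver URL for local database access
        Connection conn = DriverManager.getConnection("jdbc:db2:*LOCAL");
        PreparedStatement ps = conn.prepareStatement(
            "SELECT AMOUNT FROM SALES WHERE REGION = ?");   // prepared once
        String[] regions = { "EAST", "WEST" };
        for (String region : regions) {                     // executed many times
            ps.setString(1, region);
            ResultSet rs = ps.executeQuery();
            while (rs.next()) {
                double amount = rs.getDouble(1);  // primitive double; no BigDecimal created
                System.out.println(region + ": " + amount);
            }
            rs.close();
        }
        ps.close();
        conn.close();
    }
}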
• Store or at least fetch numeric data in DB2 as double
Fixed-precision decimal data cannot be represented in Java as a primitive type. When accessing
numeric and decimal fields from the database through JDBC, values can be retrieved using
getDouble( ) or getBigDecimal( ). The latter method will create a new java.math.BigDecimal object
each time it is called. Using getDouble (which returns a primitive double) will give better
performance, and should be preferred when floating-point values are appropriate for your application
(i.e. for most applications outside the financial industry).
• Consider using ToolBox record I/O
The IBM Toolbox for Java provides native record level access classes. These classes are specific to
the i5/OS platform. They may provide a significant performance gain over the use of JDBC access
for applications where portability to other databases is not required. See the AS400File object under
Record Level access in the InfoCenter.
Resources
The i5/OS Java and WebSphere performance team maintains a list of performance-related documents at
The Java Diagnostics Guide provides detailed information on performance tuning and analysis when
using IBM Technology for Java. Most of the document applies to all platforms using IBM’s Java VM; in
addition, one chapter is written specifically for i5/OS. The Diagnostics Guide is available at
Chapter 8. Cryptography Performance
With an increasing demand for security in today’s information society, cryptography enables us to
encrypt the communication and storage of secret or confidential data. This also requires data integrity,
authentication and transaction non-repudiation. Together, cryptographic algorithms, shared/symmetric
keys and public/private keys provide the mechanisms to support all of these requirements. This chapter
focuses on the way that System i cryptographic solutions improve the performance of secure e-Business
transactions.
There are many factors that affect System i performance in a cryptographic environment. This chapter
discusses some of the common factors and offers guidance on how to achieve the best possible
performance. Much of the information in this chapter was obtained as a result of analysis experience
within the Rochester development laboratory. Many of the performance claims are based on supporting
performance measurement and other performance workloads. In some cases, the actual performance data
is included here to reinforce the performance claims and to demonstrate capacity characteristics.
Cryptography Performance Highlights for i5/OS V5R4M0:
• Support for the 4764 Cryptographic Coprocessor is added. This adapter provides both cryptographic coprocessor and secure-key cryptographic accelerator function in a single PCI-X card.
• 5722-AC3 Cryptographic Access Provider withdrawn. This product is no longer required to enable data encryption.
• Cryptographic Services API function added. Key management function has been added, which helps you securely store and handle cryptographic keys.
8.1 System i Cryptographic Solutions
On a System i, cryptographic solutions are based on software and hardware Cryptographic Service
Providers (CSP). These solutions include services required for Network Authentication Service,
SSL/TLS, VPN/IPSec, LDAP and SQL.
IBM Software Solutions
The software solutions are either part of the i5/OS Licensed Internal Code or the Java Cryptography
Extension (JCE).
IBM Hardware Solutions
One of the hardware based cryptographic offload solutions for the System i is the IBM 4764 PCI-X
Cryptography Coprocessor (Feature Code 4806). This solution will offload portions of cryptographic
processing from the host CPU. The host CPU issues requests to the coprocessor hardware. The hardware
then executes the cryptographic function and returns the results to the host CPU. Because this hardware
based solution handles selected compute-intensive functions, the host CPU is available to support other
system activity. SSL/TLS network communications can use these options to dramatically offload
cryptographic processing related to establishing an SSL/TLS session.
CSP API Sets
User applications can utilize cryptographic services indirectly via i5/OS functions (SSL/TLS, VPN IPSec)
or directly via the following APIs:
• The Common Cryptographic Architecture (CCA) API set is provided for running cryptographic operations on a Cryptographic Coprocessor.
• The i5/OS Cryptographic Services API set is provided for running cryptographic operations within the Licensed Internal Code.
• Java Cryptography Extension (JCE) is a standard extension to the Java Software Development Kit (JDK).
• GSS (Generic Security Services), Java GSS, and Kerberos APIs are part of the Network Authentication Service that provides authentication and security services. These services include session level encryption capability.
• i5/OS SSL and JSSE support the Secure Sockets Layer Protocol. APIs provide session level encryption capability.
• Structured Query Language (SQL) is used to access or modify information in a database. SQL supports encryption/decryption of database fields.
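For reference, a Java application drives the JCE provider through the standard javax.crypto interfaces. A minimal sketch (no specific provider is assumed) that encrypts and decrypts a 1024-byte buffer with a generated 128-bit AES key:

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class JceAesSketch {
    public static void main(String[] args) throws Exception {
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);                        // 128-bit key
        SecretKey key = keyGen.generateKey();

        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] clear = new byte[1024];           // 1024-byte transaction
        byte[] encrypted = cipher.doFinal(clear);

        cipher.init(Cipher.DECRYPT_MODE, key);
        byte[] decrypted = cipher.doFinal(encrypted);
    }
}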
8.2 Cryptography Performance Test Environment
All measurements were completed on an IBM System i5 570+ 8-Way (2.2 GHz). The system is
configured as an LPAR, and each test was performed on a single partition with one dedicated CPU. The
partition was solely dedicated to run each test. The IBM 4764 PCI-X Cryptographic Coprocessor card is
installed in a PCI-X slot.
This System i model is a POWER5 hardware system, which provides Simultaneous Multi-Threading. The
tools used to obtain this data are in some cases only single threaded (single instruction stream)
applications, which don’t take advantage of the performance benefits of SMT. See section 8.6 for
additional information.
Cryptperf is an IBM internal-use, primitive-level cryptographic function test driver used to explore and
measure System i cryptographic performance. It supports parameterized calls to various i5/OS CSPs. See
section 8.6 for additional information.
Cipher: Measures the performance of either symmetric or asymmetric key encrypt depending on
algorithm selected.
Digest: Measures the performance of hash functions.
Sign: Measures the performance of hash with private key encrypt.
Pin: Measures encrypted PIN verify using the IBM 3624 PIN format with the IBM 3624 PIN
calculation method.
All i5/OS and JCE test cases run at a near 100% CPU utilization. The test cases that use the
Cryptographic Coprocessor will offload all cryptographic functions, so that CPU utilization is negligible.
The relative performance and recommendations found in this chapter are similar for other models, but the
data presented here is not representative of a specific customer environment. Cryptographic functions are
very CPU intensive and scale easily. Adding or removing CPUs in an environment will change
performance, so results in other environments may vary significantly.
8.3 Software Cryptographic API Performance
This section provides performance information for System i systems using the following cryptographic
services; i5/OS Cryptographic Services API and IBM JCE 1.2.1, an extension of JDK 1.4.2.
Cryptographic performance is an important aspect of capacity planning, particularly for applications using
secure network communications. The information in this section may be used to assist in capacity
planning for this complex environment.
Measurement Results
The cryptographic performance measurements in the following three tables were made using i5/OS
Cryptographic Services API and Java Cryptography Extension.
Table 8.1 Cipher Encrypt Performance

Encryption    Key Length  Transaction     Threads  i5/OS                  i5/OS           JCE                    JCE
Algorithm     (Bits)      Length (Bytes)           (Transactions/Second)  (Bytes/Second)  (Transactions/Second)  (Bytes/Second)
DES           56          1024            1        11,276                 11,547,058      15,537                 15,909,515
DES           56          1024            10       15,402                 15,771,656      19,768                 20,241,955
Triple DES    112         1024            1        5,039                  5,159,756       5,997                  6,140,893
Triple DES    112         65536           1        87                     5,710,925       93                     6,086,464
Triple DES    112         1024            10       6,625                  6,783,658       7,517                  7,697,917
Triple DES    112         65536           10       109                    7,139,814       117                    7,657,551
RC4           128         262144          1        947                    248,224,207     125                    32,704,635
RC4           128         262144          10       1,017                  266,579,889     207                    54,321,919
AES           128         1024            1        26,636                 27,275,585      28,110                 28,784,259
AES           128         65536           1        1,479                  96,930,853      428                    28,080,038
AES           256         1024            1        24,025                 24,601,428      22,767                 23,313,526
AES           256         65536           1        1,111                  72,782,397      345                    22,614,607
AES           128         1024            10       30,408                 31,137,523      34,916                 35,754,190
AES           128         65536           10       1,692                  110,892,831     524                    34,350,709
AES           256         1024            10       27,349                 28,005,446      27,172                 27,824,575
AES           256         65536           10       1,257                  82,392,038      415                    27,183,773
RSA           1024        100             1        897                    n/a             197                    n/a
RSA           2048        100             1        128                    n/a             30                     n/a
RSA           1024        100             10       1,187                  n/a             246                    n/a
RSA           2048        100             10       165                    n/a             35                     n/a

Notes:
• See section 8.2 for Test Environment Information
Table 8.2 Signing Performance

Encryption     RSA Key Length  Threads  i5/OS                  JCE
Algorithm      (Bits)                   (Transactions/Second)  (Transactions/Second)
SHA-1 / RSA    1024            1        901                    197
SHA-1 / RSA    1024            10       1,155                  240
SHA-1 / RSA    2048            1        129                    30
SHA-1 / RSA    2048            10       163                    35

Notes:
• Transaction Length set at 1024 bytes
• See section 8.2 for Test Environment Information
Table 8.3 Digest Performance

Encryption   Threads  i5/OS                  i5/OS           JCE                    JCE
Algorithm             (Transactions/Second)  (Bytes/Second)  (Transactions/Second)  (Bytes/Second)
SHA-1        1        6,753                  110,642,896     2,295                  37,608,172
SHA-1        10       10,875                 178,172,751     2,954                  48,401,773
SHA-256      1        3,885                  63,645,228      2,049                  33,576,523
SHA-256      10       4,461                  73,086,411      2,392                  39,184,923
SHA-384      1        7,050                  115,505,548     4,020                  65,865,327
SHA-384      10       8,075                  132,301,878     4,634                  75,925,668
SHA-512      1        7,031                  115,201,800     4,217                  69,098,731
SHA-512      10       8,060                  132,059,807     4,801                  78,659,561

Notes:
• Key Length set at 1024 bits
• Transaction Length set at 16384 bytes
• See section 8.2 for Test Environment Information
8.4 Hardware Cryptographic API Performance
This section provides information on the hardware based cryptographic offload solution IBM 4764
PCI-X Cryptography Coprocessor (Feature Code 4806). This solution improves available system CPU capacity by offloading CPU-demanding cryptographic functions.
IBM Common Name:                 IBM 4764 PCI-X Cryptographic Coprocessor
System i hardware feature code:  #4806
Applications:                    Banking/finance (B/F), Secure accelerator (SSL), Secure hardware module, Cryptographic Key Protection
Required Hardware:               No IOP Required
Platform Support:                IBM System i5
The 4764 Cryptographic Coprocessor provides both cryptographic coprocessor and secure-key
cryptographic accelerator functions in a single PCI-X card. The coprocessor functions are targeted to
banking and finance applications. The secure-key accelerator functions are targeted to improving the
performance of SSL (secure socket layer) and TLS (transport layer security) based transactions. The 4764
Cryptographic Coprocessor supports secure storage of cryptographic keys in a tamper-resistant module,
which is designed to meet FIPS 140-2 Level 4 security requirements. This new cryptographic card offers
the security and performance required to support e-Business and emerging digital signature applications.
For banking and finance applications the 4764 Cryptographic Coprocessor delivers improved
performance for T-DES, RSA, and financial PIN processing. IBM CCA (Common Cryptographic
Architecture) APIs are provided to enable finance and other specialized applications to access the services
of the coprocessor. For banking and finance applications the 4764 Coprocessor is a replacement for the
4758-023 Cryptographic Coprocessor (feature code 4801).
The 4764 Cryptographic Coprocessor can also be used to improve the performance of
high-transaction-rate secure applications that use the SSL and TLS protocols. These protocols are used
between server and client applications over a public network like the Internet, when private information is
being transmitted in the case of Consumer-to-Business transactions (for example, a web transaction with
payment information containing credit card numbers) or Business-to-Business transactions. SSL/TLS is
the predominant method for securing web transactions. Establishing SSL/TLS secure web connections
requires very compute intensive cryptographic processing. The 4764 Cryptographic Coprocessor
off-loads cryptographic RSA processing associated with the establishment of a SSL/TLS session, thus
freeing the server for other processing. For cryptographic accelerator applications the 4764 Cryptographic
Coprocessor is a replacement for the 2058 Cryptographic Accelerator (feature code 4805).
Cryptographic performance is an important aspect of capacity planning, particularly for applications using
SSL/TLS network communications. Besides host processing capacity, the impact of one or more
Cryptographic Coprocessors must be considered. Adding a Cryptographic Coprocessor to your
environment can often be more beneficial than adding a CPU. The information in this chapter may be
used to assist in capacity planning for this complex environment.
Measurement Results
The following three tables display the cryptographic test cases that use the Common Cryptographic
Architecture (CCA) interface to measure transactions per second for a variety of 4764 Cryptographic
Coprocessor functions.
Table 8.4 Cipher Encrypt Performance - CCA CSP

Encryption    Key Length  Transaction     Threads  4764                   4764
Algorithm     (Bits)      Length (Bytes)           (Transactions/second)  (Bytes/second)
DES           56          1024            1        1,026                  1,050,283
DES           56          1024            10       1,053                  1,078,458
Triple DES    112         1024            1        1,002                  1,025,798
Triple DES    112         65536           1        110                    7,191,327
Triple DES    112         1024            10       1,021                  1,045,535
Triple DES    112         65536           10       123                    8,035,164
RSA           1024        100             1        796                    n/a
RSA           2048        100             1        307                    n/a
RSA           1024        100             10       1,044                  n/a
RSA           2048        100             10       462                    n/a

Notes:
• See section 8.2 for Test Environment information
• AES is not supported by the IBM 4764 Cryptographic Coprocessor
Table 8.5 Signing Performance - CCA CSP

Encryption     RSA Key Length  Threads  4764
Algorithm      (Bits)                   (Transactions/second)
SHA-1 / RSA    1024            1        794
SHA-1 / RSA    1024            10       1,074
SHA-1 / RSA    2048            1        308
SHA-1 / RSA    2048            10       465

Notes:
• Transaction Length set at 1024 bytes
• See section 8.2 for Test Environment information
Table 8.6 Financial PINs Performance - CCA CSP

Threads  Total Repetitions  4764 (Transactions/second)
1        10000              945
10       100000             966

Notes:
• See section 8.2 for Test Environment information
8.5 Cryptography Observations, Tips and Recommendations
• The IBM Systems Workload Estimator, described in Chapter 23, reflects the performance of real user applications while averaging the impact of the differences between the various communications protocols. The real world perspective offered by the Workload Estimator may be valuable in some cases.
• SSL/TLS client authentication requested by the server is quite expensive in terms of CPU and should be requested only when needed. Client authentication full handshakes use two to three times the CPU resource of server-only authentication. RSA authentication requests can be offloaded to an IBM 4764 Cryptographic Coprocessor.
• With the use of Collection Services you can count the SSL/TLS handshake operations. This capability allows you to better understand the performance impact of secure communications traffic. Use this tool to count how many full versus cached handshakes per second are being serviced by the server. Start the Collection Services with the default “Standard plus protocol”. When the collection is done you can find the SSL/TLS information in the QAPMJOBMI database file in the fields JBASH (full) and JBFSHA (cached) for server authentications or JBFSHA (full) and JBASHA (cached) for server and client authentications. Accumulate the full handshake numbers for all jobs and you will have a good method to determine the need for a 4764 Cryptographic Coprocessor (a sketch of such a query follows this list). Information about Collection Services can be found at the System i Information Center. See section 8.6 for additional information.
• Symmetric key encryption and signing performance improves significantly when multithreaded.
• Supported number of 4764 Cryptographic Coprocessors:

Table 8.8
Server models                       Maximum per server  Maximum per partition
IBM System i5 570 8/12/16W, 595     32                  8
IBM System i5 520, 550, 570 2/4W    8                   8
• Applications requiring a FIPS 140-2 Level 4 certified, tamper resistant module for storing cryptographic keys should use the IBM 4764 Cryptographic Coprocessor.
• Cryptographic functions demand a lot of system CPU, but the performance does scale well when you add a CPU to your system. If your CPU handles a large number of cryptographic requests, offloading them to an IBM 4764 Cryptographic Coprocessor might be more beneficial than adding a new CPU.
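As referenced in the Collection Services item above, a sketch of accumulating the full handshake counts with SQL over the native JDBC driver; the collection library QPFRDATA is an assumption (substitute the library that holds your collection data), and the JBASH field is the one named in the text above:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HandshakeCount {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection("jdbc:db2:*LOCAL");
        Statement stmt = conn.createStatement();
        // Sum full server-authentication handshakes across all jobs in the collection
        ResultSet rs = stmt.executeQuery(
            "SELECT SUM(JBASH) FROM QPFRDATA.QAPMJOBMI");
        if (rs.next()) {
            System.out.println("Full SSL/TLS handshakes: " + rs.getLong(1));
        }
        conn.close();
    }
}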
8.6 Additional Information
Extensive information about using System i Cryptographic functions may be found under “Security” and
“Networking Security” at the System i Information Center web site at:
IBM Security and Privacy specialists work with customers to assess, plan, design, implement and manage
a security-rich environment for your online applications and transactions. These Security, Privacy,
Wireless Security and PKI services are intended to help customers build trusted electronic relationships
with employees, customers and business partners. These general IBM security services are described at:
System i Security White Paper, “Security is fundamental to the success of doing e-business” is available
at:
IBM Global Services provides a variety of Security Services for customers and Business Partners.
Links to other Cryptographic Coprocessor documents, including custom programming information, can be found at:
Other performance information can be found at the System i Performance Management website at:
More details about POWER5 and SMT can be found in the document Simultaneous Multi-Threading
(SMT) on eServer iSeries POWER5 Processors at:
Chapter 9. iSeries NetServer File Serving Performance
This chapter will focus on iSeries NetServer File Serving Performance.
9.1 iSeries NetServer File Serving Performance
iSeries Support for Windows Network Neighborhood (iSeries NetServer) supports the Server Message
Block (SMB) protocol through the use of Transmission Control Protocol/Internet Protocol (TCP/IP) on
iSeries. This communication allows clients to access iSeries shared directory paths and shared output
queues. PC clients on the network utilize the file and print-sharing functions that are included in their
operating systems. iSeries NetServer properties and the properties of iSeries NetServer file shares and
print shares are configured with iSeries Navigator.
Clients can use iSeries NetServer support to install Client Access from the iSeries since the clients use
function that is included in their operating system. See:
iSeries NetServer.
In V5R4, enhancements were made to help optimize the performance of the iSeries NetServer, increasing
throughput and reducing client response time. The optimizations allow access to thread safe file systems
in the integrated file system from a new multithreaded file serving job. In addition, other optimizations
have been added and are used when accessing/using files in the “root” (/), QOpenSys, and user-defined
file systems (UDFS). See the iSeries NetServer articles in the iSeries Information Center for more
information.
iSeries NetServer Performance
Server
iSeries partition with 2 dedicated processors having equivalent CPW of 2400.
16384 MB main memory
5-4318 CCIN 6718 18 GB disk drives
2-5700 1000 MB (1 GB) Ethernet IOAs [2]
Clients
60 6862-27U IBM PC 300PL Pentium II 400 MHz 512KB L2, 320 MB RAM, 6.4 GB disk drive
Intel® 8255x based PCI Ethernet Adapter 10/100
Microsoft Windows XP Professional Version 2002 Service Pack 1
Controller PC: 6862-27U IBM PC 300PL Pentium II 400 MHz 512KB L2, 320 MB RAM, 6.4 GB disk
drive Intel® 8255x based PCI Ethernet Adapter 10/100
Microsoft Windows 2000 5.00.2195 Service Pack 4
Workload
PC Magazine’s NetBench® 7.0.3 with the test suite ent_dm.tst was used to provide the benchmark data [3].
[2] The clients used 100 MB Ethernet and were switched into the 1 GB network of the server.
[3] The testing was performed without independent verification by VeriTest testing division of Lionbridge Technologies, Inc. ("VeriTest") or Ziff Davis Media Inc., and neither Ziff Davis Media Inc. nor VeriTest make any representations or warranties as to the result of the test. NetBench® is a registered trademark of Ziff Davis Media Inc. or its affiliates in the U.S. and other countries. Further details on the test
Measurement Results:
[Chart: iSeries NetServer Throughput for V5R2, V5R3, and V5R4 as the number of clients scales from 1 to 60]
[Chart: iSeries NetServer Response Time for V5R2, V5R3, and V5R4 as the number of clients scales from 1 to 60]
Conclusion/Explanations:
From the charts above in the Measurement Results section, it is evident that when customers upgrade to
V5R4 they can expect to see an improvement in throughput and response time when using iSeries
NetServer.
Chapter 10. DB2 for i5/OS JDBC and ODBC Performance
DB2 for i5/OS can be accessed through many different interfaces. Among these interfaces are: Windows
.NET, OLE DB, Windows database APIs, ODBC and JDBC. This chapter will focus on access through
JDBC and ODBC by providing programming and tuning hints as well as links to detailed information.
10.1 DB2 for i5/OS access with JDBC
Access to the System i data from portable Java applications can be achieved with the universal database
access APIs available in JDBC (Java Database Connectivity). There are two JDBC drivers for the System
i. The Native JDBC driver is a type 2 driver. It uses the SQL Call Level Interface for database access
and is bundled in the System i Developer Kit for Java. The JDBC Toolbox driver is a type 4 driver
which is bundled in the System i Toolbox for Java. In general, the Native driver is chosen when running
on the System i server directly, while the Toolbox driver is typically chosen when accessing data on the
System i server from another machine. The Toolbox driver is typically used when accessing System i data
from a Windows machine, but it could be used when accessing the System i server from any Java capable
system. More detailed information on which driver to choose may be found in the JDBC references.
JDBC Performance Tuning Tips
JDBC performance depends on many factors ranging from generic best programming practices for
databases to specific tuning which optimizes JDBC API performance. Tips for both SQL programming
and JDBC tuning techniques to improve performance are included here.
• In general when accessing a database it takes less time to retrieve smaller amounts of data. This is even more significant for remote database access where the data is sent over a network to a client. For good performance, SQL queries should be written to retrieve only the data that is needed. Select only needed fields so that additional data is not unnecessarily retrieved and sent. Use appropriate predicates to minimize row selection on the server side to reduce the amount of data sent for client processing.
• Follow the ‘Prepare once, execute many times’ rule of thumb. For statements that are executed many times, use the PreparedStatement object to prepare the statement once. Then use this object to do subsequent executes of this statement. This significantly reduces the overhead of parsing and compiling the statement every time it is executed (see the sketch following this list).
• Do not use a PreparedStatement object if an SQL statement is run only one time. Compiling and running a statement at the same time has less overhead than compiling the statement and running it in two separate operations.
• Consider using JDBC stored procedures. Stored procedures can help reduce network communication time and traffic which improves response time. Java supports stored procedures via CallableStatement objects.
• Turn off autocommit, if possible. Explicitly manage commits in the application, but do not leave transactions uncommitted for long periods of time.
• Use the lowest isolation level required by the application. Higher isolation levels can reduce performance levels as more locking and synchronization are required. Transaction levels in order of increasing level are: TRANSACTION_NONE, TRANSACTION_READ_UNCOMMITTED, TRANSACTION_READ_COMMITTED, TRANSACTION_REPEATABLE_READ, TRANSACTION_SERIALIZABLE.
• Reuse connections. Minimize the opening and closing of connections where possible. These operations are very expensive. If possible, keep connections open and reuse them. A connection pool can help considerably.
• Consider use of Extended Dynamic support. It generally provides better performance by caching the SQL statements in SQL packages on the System i.
• Use appropriate cursor settings. Use a fetch forward only cursor type if the data does not need to be scrollable. Use read only cursors for retrieving data which will not be updated.
• Use block inserts and batch updates.
• Tune connection properties to maximize application performance. The connection properties are explained in the driver documentation. Among the properties are ‘block size’ and ‘data compression’ which should be tuned as follows:
1. Choose the right ‘block size’ for the application. ‘block size’ specifies the amount of data to retrieve from the server and cache on the client. For the Toolbox driver ‘block size’ specifies the transfer size in kilobytes, with 32 as the default. For the native driver ‘block size’ specifies the number of rows that will be fetched at a time for a result set, with 32 as the default. When larger amounts of data are retrieved a larger block size may help minimize communication time.
2. The Toolbox driver has a ‘data compression’ property to enable compressing the data blocks before sending them to the client. This is set to true by default. In general this gives better response time, but may use more CPU.
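Several of these tips appear together in the following sketch using the Toolbox driver. The host name, table, and credentials are hypothetical; the URL sets the ‘block size’ and ‘data compression’ properties discussed above:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class JdbcTuningSketch {
    public static void main(String[] args) throws Exception {
        // Toolbox driver URL with hypothetical host name and tuned properties
        Connection conn = DriverManager.getConnection(
            "jdbc:as400://mysystem;block size=128;data compression=true",
            "user", "password");
        conn.setAutoCommit(false);                 // manage commits explicitly

        PreparedStatement ps = conn.prepareStatement(
            "UPDATE ORDERS SET STATUS = ? WHERE ORDERNO = ?");
        int[] orders = { 101, 102, 103 };
        for (int orderNo : orders) {               // reuse the prepared statement
            ps.setString(1, "SHIPPED");
            ps.setInt(2, orderNo);
            ps.addBatch();                         // batch updates reduce flows to the server
        }
        ps.executeBatch();
        conn.commit();
        conn.close();
    }
}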
References for JDBC
• The System i Information Center
• The home page for Java and DB2 for i5/OS
• Sun’s JDBC web page
10.2 DB2 for i5/OS access with ODBC
ODBC (Open Database Connectivity) is a set of APIs which provides clients with an open interface to
any ODBC supported database. The ODBC APIs are part of System i Access.
In general, the JDBC Performance tuning tips also apply to the performance of ODBC applications:
• Employ efficient SQL programming techniques to minimize the amount of data processed
• Prepared statement reuse to minimize parsing and optimization overhead for frequently run queries
• Use stored procedures when appropriate to bundle processing into fewer database requests
• Consider extended dynamic package support for SQL statement and package caching
• Process data in blocks of multiple rows rather than single records when possible (e.g. block inserts)
In addition for ODBC performance ensure that each statement has a unique statement handle. Sharing
statement handles for multiple sequential SQL statements causes DB2 on i5/OS to do FULL OPEN
operations since the database cursor cannot be reused. By ensuring that an SQLAllocStmt is done before
any SQLPrepare or SQLExecDirect commands, database processing can be optimized. This is especially
important when a set of SQL statements are executed in a loop. Ensuring each SQL statement has its own
handle reduces the DB2 overhead.
Tools such as ODBC Trace (available through the ODBC Driver Manager) are useful in understanding
what ODBC calls are made and what activity occurs as a result. Client application profilers may also be
useful in tuning client applications. These are often included in application development toolkits.
ODBC Performance Settings
You may be able to further improve the performance of your ODBC application by configuring the
ODBC data source through the Data Sources (ODBC) administrator in the Control Panel. Listed below
are some of the parameters that can be set to better tune the performance of the System i Access ODBC
Driver. The ODBC performance parameters discussed in detail are:
• Prefetch
• ExtendedDynamic
• RecordBlocking
• BlockSizeKB
• LazyClose
• LibraryView
Prefetch : The Prefetch option is a performance enhancement to allow some or all of the rows of a
particular ODBC query to be fetched at PREPARE time. We recommend that this setting be turned ON.
However, if the client application uses EXTENDED FETCH (SQLExtendedFetch) this option should be
turned OFF.
ExtendedDynamic: Extended dynamic support provides a means to "cache" dynamic SQL statements on
the System i server. With extended dynamic, information about the SQL statement is saved away in an
SQL package object on the System i the first time the statement is run. On subsequent uses of the
statement, System i Access ODBC recognizes that the statement has been run before and can skip a
significant part of the processing by using the information saved in the SQL package. Statements which
are cached include SELECT, positioned UPDATE and DELETE, INSERT with subselect, DECLARE
PROCEDURE, and all other statements which contain parameter markers.
All extended dynamic support is application based. This means that each application can have its own
configuration for extended dynamic support. Extended dynamic support as a whole is controlled through
the use of the ExtendedDynamic option. If this option is not selected, no packages are used. If the option
is selected (default) custom settings per application can be configured with the “Custom Settings Per
Application” button. When this button is clicked a “Package information for application” window pops
up and package library and name fields can be filled in and usage options can be selected.
Packages may be shared by several clients to reduce the number of packages on the System i server. To
enable sharing, the default libraries of the clients must be the same and the clients must be running the
same application. Extended dynamic support will be deactivated if two clients try to use the same
package but have different default libraries. In order to reactivate extended dynamic support, the package
should be deleted from the System i and the clients should be assigned different libraries in which to store
the package(s).
Package Usage: The default and preferred performance setting enables the ODBC driver to use the
package specified and adds statements to the package as they are run. If the package does not exist when a
statement is being added, the package is created on the server.
Considerations for using package support: It is recommended that if an application has a fixed number
of SQL statements in it, a single package be used by all users. An administrator should create the
package and run the application to add the statements from the application to the package. Once that is
done, configure all users of the package to not add any further statements but to just use the package.
Note that for a package to be shared by multiple users each user must have the same default library listed
in their ODBC library list. This is set by using the ODBC Administrator.
Multiple users can add to or use a given package at the same time. Keep in mind that as a statement is
added to the package, the package is locked. This could cause contention between users and reduce the
benefits of using the extended dynamic support.
If the application being used has statements that are generated by the user and are ad hoc in nature, then it
is recommended that each user have his own package. Each user can then be configured to add
statements to their private package. Either the library name or all but the last 3 characters of the package
name can be changed.
RecordBlocking: The RecordBlocking switch allows users to control the conditions under which the
driver will retrieve multiple rows (block data) from the System i. The default and preferred performance
setting to Use Blocking will enable blocking for everything except SELECT statements containing an
explicit "FOR UPDATE OF" clause.
BlockSizeKB (choices 2 through 512): The BlockSizeKB parameter allows users to control the number
of rows fetched from the System i per communications flow (send/receive pair). This value represents the
client buffer size in kilobytes and is divided by the size of one row of data to determine the number of
rows to fetch from the System i in one request. The primary use of this parameter is to speed up queries
that send a lot of data to the client. The default value of 32 performs very well for most queries. If you
have the memory available on the client, setting a higher value may improve some queries.
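As a worked example (the 320-byte row size is hypothetical), with the default setting of 32:

    32 KB x 1024 = 32,768 bytes of client buffer
    32,768 / 320 bytes per row = 102 rows fetched per communications flow

Doubling BlockSizeKB to 64 would allow roughly 204 rows per flow, halving the number of
send/receive pairs needed to return a large result set.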
LazyClose: The LazyClose switch allows users to control the way SQLClose commands are handled by
the System i Access ODBC Driver. The default and preferred performance setting enables Lazy Close.
Enabling LazyClose will delay sending an SQLClose command to the System i until the next ODBC
request is sent. If Lazy Close is disabled, a SQLClose command will cause an immediate explicit flow to
the System i to perform the close. This option is used to reduce flows to the System i, and is purely a
performance enhancing option.
LibraryView: The LibraryView switch allows users to control the way the System i Access ODBC
Driver deals with certain catalog requests that ask for all of the tables on the system. The default and
preferred performance setting ‘Default Library List’ will cause catalog requests to use only the libraries
specified in the default library list when going after library information. Setting the LibraryView value to
‘All libraries on the system’ will cause all libraries on the system to be used for catalog requests and may
cause significant degradation in response times due to the potential volume of libraries to process.
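A minimal sketch of a connection string tying the preceding performance options together follows. Again,
the keyword names (BLOCKSIZE, LAZYCLOSE, LIBVIEW) are assumptions modeled on the System i
Access ODBC driver's configuration keywords and should be verified against the driver documentation;
SYSNAME and APPLIB are placeholders:

    DRIVER={iSeries Access ODBC Driver};SYSTEM=SYSNAME;DBQ=APPLIB;
    BLOCKSIZE=32;LAZYCLOSE=1;LIBVIEW=0

Here BLOCKSIZE corresponds to BlockSizeKB, LAZYCLOSE=1 enables Lazy Close, and LIBVIEW=0
requests the 'Default Library List' catalog behavior described above.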
References for ODBC
• DB2 Universal Database for System i SQL Call Level Interface (ODBC), found in the System i
  Information Center under Printable PDFs and Manuals
• The System i Information Center
• Microsoft ODBC webpage
Chapter 11. Domino on i
This chapter includes performance information for Lotus Domino on the IBM i operating system. Some
of the information previously included in this chapter has been removed; it remains available in earlier
versions of this document.
April 2008 Update:
• Workload Estimator 2008.2

January 2008 Updates:
• V6R1
• Domino 8 white papers
• Workload Estimator 2008.1
V6R1
V6R1 may provide improvements in processing capability for Domino environments. V6R1 also requires
object conversion for all program objects, including the Domino program objects. This conversion occurs
when starting a Domino server for the first time after installing V6R1, or after installing Domino on a
V6R1 system, and may take a significant amount of time to complete. For more information, refer to the
Lotus Domino support information for V6R1.
POWER6 hardware
Hardware models based on POWER6 processors may provide improvements in processing capability for
Domino environments. For systems that use POWER5 and earlier processors, MCU (Mail and Calendar
Users) ratings, rather than CPW ratings, are used to compare Domino performance across hardware
models. With the introduction of the POWER6 models, it is less necessary to provide separate MCU and
CPW ratings. Appendix C provides projected MCU ratings for POWER6 models that were available as of
July 2007, but will not provide ratings for newer hardware models. The IBM Systems Workload
Estimator should be used for sizing Domino mail and application workloads. When sizing Domino on i,
the latest maintenance release of the selected version is assumed.
Workload Estimator 2008.2
Domino sizing support has been changed as follows:
• Support for IBM Power System models has been added.
• Domino 8 disk drive projections have been updated.
Workload Estimator 2008.1
Domino sizing support has been changed as follows:
• Sametime support has been updated.
• Quickr for Domino support has been added.
The remainder of this chapter provides performance information for Domino environments.
Additional Resources
Additional performance information for Domino on i can be found in the following articles, redbooks and
redpapers:
• IBM Lotus Notes V8 workloads: Taking performance to a new level, September 2007
• IBM Lotus Domino V8 server with the IBM Lotus Notes V8 client: Performance, October 2007
• Lotus Domino 7 Server Performance, Part 2, November 2005
• Lotus Domino 7 Server Performance, Part 3, November 2005
• Best Practices for Large Lotus Notes Mail Files, October 2005
• Lotus Domino 7 Server Performance, Part 1, September 2005
• Domino 6 for iSeries Best Practices Guide (SG24-6937), March 2004
• Lotus Domino 6 for iSeries Multi-Versioning Support on iSeries (SG24-6940), March 2004
• Sizing Large-Scale Domino Workloads on iSeries (redpaper), December 2003
• Domino 6 for iSeries Implementation (SG24-6592), February 2003
• Upgrading to Domino 6: The Performance Benefits (redpaper), January 2003
• Domino for iSeries Sizing and Performance Tuning (SG24-5162), April 2002
• iNotes Web Access on the IBM eServer iSeries Server (SG24-6553), February 2002
11.1 Domino Workload Descriptions
The Mail and Calendaring Users workload and the Domino Web Access mail scenarios discussed in this
chapter were driven by an automated environment which ran a script similar to the mail workloads from
Lotus NotesBench. Lotus NotesBench is a collection of benchmarks, or workloads, for evaluating the
performance of Domino servers. The results from the Mail and Calendaring Users and Domino Web
Access workloads are not official NotesBench tests. The numbers discussed for these workloads may not
be used officially or publicly to compare to NotesBench results published for other Domino server
environments.
Official NotesBench audit results for System i are discussed in section 11.14, System i NotesBench Audits and Benchmarks.
• Mail and Calendaring Users (MCU)

  Each user completes the following actions an average of every 15 minutes except where noted:
  - Open mail database which contains documents that are 10Kbytes in size
  - Open the current view
  - Open 5 documents in the mail file
  - Categorize 2 of the documents
  - Send 1 new mail memo/reply 10Kbytes in size to 3 recipients (every 90 minutes)
  - Mark several documents for deletion
  - Delete documents marked for deletion
  - Create 1 appointment (every 90 minutes)
  - Schedule 1 meeting invitation (every 90 minutes)
  - Close the view
• Domino Web Access (formerly known as iNotes Web Access)

  Each user completes the following actions an average of every 15 minutes except where noted:
  - Open mail database which contains documents that are 10Kbytes in size
  - Open the current view
  - Open 5 documents in the mail file
  - Send 1 new mail memo/reply 10Kbytes in size to 3 recipients (every 90 minutes)
  - Mark one document for deletion
  - Delete document marked for deletion
  - Close the view
The Domino Web Access workload scenario is similar to the Mail and Calendaring workload except
that the Domino mail files are accessed through HTTP from a Web browser and there is no
scheduling or calendaring taking place. When accessing mail through Notes, the Notes client
performs the majority of the work. When a web browser accesses mail from a Domino server, the
Domino server bears the majority of the processing load. The browser’s main purpose is to display
information.
11.2 Domino 8
Domino 8 may provide performance improvements for Notes clients. Test results comparing Domino
performance with Domino 7 and Domino 8 have been published in a 2-part series of articles. The
following links refer to these articles:
• IBM Lotus Notes V8 workloads: Taking performance to a new level, September 2007
• IBM Lotus Domino V8 server with the IBM Lotus Notes V8 client: Performance, October 2007
The most up-to-date sizing information on Domino 8 can be found in the Workload Estimator.
11.3 Domino 7
Domino 7 provides performance improvements for both Notes and Domino Web Access clients. Test
results comparing Domino performance with Domino 6.5 and Domino 7 have been published in a 3-part
series of articles titled Domino 7 Server Performance. The results show that Domino 7 reduces the
amount of CPU required for a given number of users and workload rate as compared with Domino 6.5.
The articles also show that the added function in the new Domino 7 mail templates does require some extra
processing resources. Results with Domino 7 using the Domino 7 templates show improvements over
Domino 6.5 with the Domino 6 mail templates, while Domino 7 with the Domino 6 template provides the
optimal performance, though of course without the function provided in the Domino 7 templates. The
following links refer to these articles:
• Lotus Domino 7 Server Performance, Part 1, September 2005
• Lotus Domino 7 Server Performance, Part 2, November 2005
• Lotus Domino 7 Server Performance, Part 3, November 2005
Additional improvements have been made in Domino 7 to support more users per Domino partition.
Internal benchmark tests have been run with as many as 18,000 Notes clients in a single partition. While
we understand most customers will not configure a single Domino server to regularly run that number of
clients, improvements have been made to enable this type of scaling within a Domino partition if desired.
A recently published audit report (January 2006) for the System i5 595 demonstrates that 250,500 R6Mail
users were run using only 14 Domino mail partitions, most running 18,000 users each.
Domino Domain Monitor
Included with Domino 7 is the Domino Domain Monitoring facility which provides a means to monitor
and determine the health of an entire domain from a single location and quickly resolve problems. Some of
the System i guidelines included for acceptable faulting rates have proven to be somewhat aggressive, such
that customers have reported memory alerts and alarms being triggered even though system performance
and response times are acceptable. Work is in progress to adjust the faulting guidelines in future versions
of the tool. If you are running this tool and experience alerts for high faulting rates (above 100 per
processor), the alerts can be disregarded if you are experiencing acceptable response time and system
performance.
11.4 Domino 6
Domino 6 provided some very impressive performance improvements over Domino 5, both for workloads
we’ve tested in our lab and for customers who have already deployed Domino 6 on iSeries. In this
section we’ll provide data showing these improvements based on testing done with the Mail and
Calendaring User and Domino Web Access workloads.
Notes client improvements with Domino 6
Using the Mail and Calendaring User workload, we compared performance using Domino 5.0.11 and
Domino 6. The table below summarizes our results.
Domino Version   Number of Mail and   Average CPU   Average Response   Average Disk
                 Calendaring Users    Utilization   Time               Utilization
Domino 5.0.11    3,000                39.4%         26ms               7.1%
Domino 6         3,000                27.6%         18ms               5.2%
Domino 5.0.11    8,000                69.7%         67ms               25.2%
Domino 6*        8,000                46.7%         46ms               26.1%
* Additional memory was added for this test
The 3000 user comparison above was done on an iSeries model i270-2253 which has a 2-way 450MHz
processor. This system was configured with 8 Gigabytes (GB) of memory and 12 18GB disk drives
configured with RAID5. Notice the 30% improvement in CPU utilization with Domino 6, along with a
substantial improvement in response time.
The 8000 user comparison was done on a model i810-2469 which has a 2-way 750MHz processor. The
system had 24 8.5GB disk drives configured with RAID5. In this test we notice a slightly greater than
30% improvement in CPU utilization as well as a significant reduction in response time with Domino 6.
For this comparison we intentionally created a slightly constrained main storage (memory) environment
with 8GB of memory available for the 8000 users. We found that we needed to add 13% more memory,
an additional 1GB in this case, when running with Domino 6 in order to achieve the same paging rates,
faulting rates, and average disk utilization as the Domino 5.0.11 test. In Domino 6 new memory caching
techniques are being used for the Notes client to improve response time and may require additional
memory.
Both comparisons shown in the table above were made using single Domino partitions. Similar
improvements can be expected for environments using multiple Domino partitions.
Domino Web Access client improvements with Domino 6
Using the Domino Web Access workload, we compared performance using Domino 5.0.11 and Domino
6. The table below summarizes our results.
Domino Version   Number of Domino   Average CPU   Average Response   Average Disk
                 Web Access users   Utilization   Time               Utilization
Domino 5.0.11    2,000              41.5%         96ms               <1%
Domino 6         2,000              24.0%         64ms               <1%
Domino 5.0.11    3,800              19.4%         119ms              <1%
Domino 6         3,800              11.0%         65ms               <1%
Domino 5.0.11    20,000             96.2%         >5sec              <1%
Domino 6         20,000             51.5%         72ms               <1%
Notice that Domino 6 provides at least a 40% CPU improvement in each of the Domino Web Access
comparisons shown above, along with significant response time reductions. The comparisons shown
above were made on systems with abundant main storage and disk resources so that CPU was the only
constraining factor. As a result, the average disk utilization during all of these tests was less than one
percent. The purpose of the tests was to compare iNotes Web Access performance using Domino 5.0.11
and Domino 6.
The 2000 user comparison was done on a model i825-2473 with 6 1.1GHz POWER4 processors, 45GB
of memory, and 60 18GB disk drives configured with RAID5, in a single Domino partition. The 3800
user comparison used a single Domino partition on a model i890-0198 with 32 1.3GHz POWER4
processors. This system had 64GB of memory and 89 18GB disk drives configured with RAID5
protection. The 20,000 user comparison used ten Domino partitions, also on an i890-0198 32-way
system with 1.3GHz POWER4 processors. This particular system was equipped with 192GB of memory
and 360 18GB disk drives running with RAID5 protection.
In addition to the test results shown above, many more measurements were performed to study the
performance characteristics of Domino 6. One form of test conducted is what we call a “paging curve.”
To accomplish the paging curves, a steady state was achieved using the workload. Then, over the course of
several hours, we gradually reduced the main storage available to the Domino server(s) and observed the
effect on paging rates, faulting rates, and response times. These tests allowed us to build a performance
curve of the amount of memory available per user versus the paging rate and response time. Based on a
paging curve study of the Domino Web Access workload on Domino 6, we determined that, similar to the
Mail and Calendaring Users workload, some additional memory was required in order to achieve the
same faulting and paging rates as with Domino 5.0.11.
11.5 Response Time and Megahertz relationship
The iSeries models and processor speeds described in this section are obviously dated, but the concepts
and relationships of response time and megahertz (and gigahertz) described herein are still applicable.
NOTE: When comparing models which have different processor types, such as SSTAR, POWER4, and
POWER5, it is important to use appropriate rating metrics (see Appendix C) or a sizing tool such as the
IBM Systems Workload Estimator. The POWER4 and POWER5 processors have been designed to run at
significantly higher MHz than SSTAR processors, and the MHz on SSTAR does not compare directly to
the MHz on POWER4 or POWER5.
In general, Domino-related processing can be described as compute intensive (See Appendix C for more
discussion of compute intensive workloads). That is, faster processors will generally provide lower
response times for Domino processing. Of course other factors besides CPU time need to be considered
when evaluating overall performance and response time, but for the CPU portion of the response time the
following applies: a faster processor will deliver better response times than a set of slower processors
whose total megahertz is “equivalent.” For example, the 270-2423 has one processor rated at 450MHz
and the 170-2409 has 2 processors rated at 255MHz; the 1-way 450MHz configuration will provide
better response time than the 2-way 255MHz configuration. The 540MHz, 600MHz, and 750MHz
processors perform even faster. Figure 11.3 below depicts the response time performance
for three processor types over a range of utilizations. Actual results will vary based on the type of
workload being performed on the system.
Using a web shopping application, we measured the following results in the lab. In tests involving 100
web shopping users, the 2-way 170-2409 ran at 71.5% CPU utilization with 0.78 seconds average
response time. The 1-way 450MHz 270-2423 ran at 73.6% CPU with average response time of 0.63
seconds. This shows a response time improvement of approximately 20% near 70% CPU utilization
which corresponds with the data shown in Figure 11.3. Response times at lower CPU utilizations will
see even more improvement from faster processors. The 270-2454 was not measured with the web
shopping application, but would provide even better response times than the 270-2423 as projected in
Figure 11.3.
When using MHz alone to compare performance capabilities between models, it is necessary for those
models to have the same processor technology and configuration. Factors such as L2 cache and type and
speed of memory controllers also influence performance behavior and must be considered. For this
reason we recommend using the tables in Appendix C when comparing performance capabilities between
iSeries servers. The data in the Appendix C tables takes the many performance and processor factors into
account and provides comparisons between the iSeries models using three different metrics: CPW, CIW,
and MCU.
[Figure: “Megahertz and Response Time Relationship” — response time (0 to 3 seconds) versus CPU
utilization (10% to 90%) for the 2 x 255MHz 170-2409, the 450MHz 270-2423, and the 540MHz
270-2452]
Figure 11.3 Response Time and Megahertz relationship
11.6 Collaboration Edition and Domino Edition offerings
Collaboration Edition
The System i Collaboration Edition, announced May 9, 2006, delivers a lower-priced business system to
help support the transformation for small and medium sized clients. This edition helps support flexible
deployment of Domino, Workplace, and Portal solutions, enabling clients to build an on demand
computing environment. It provides support for collaboration applications while offering the flexibility
of a customizable package of hardware, software, and middleware. Please visit IBM's website for
additional information.
Domino Edition
The eServer i5 Domino Edition builds on the tradition of the DSD (Dedicated Server for Domino) and the
iSeries for Domino offering - providing great price/performance for Lotus software on System i5 and
i5/OS. Please visit IBM's website for the latest information on Domino Edition solutions.
11.7 Performance Tips / Techniques
1. Refer to the redbooks listed at the beginning of this chapter, which provide tips and techniques
   for tuning and analyzing Domino environments on System i servers.

2. Our mail tests show approximately a 10% reduction in CPU utilization with the system value
   QPRCMLTTSK (processor multi-tasking) set to 1 for the pre-POWER4 models. This allows the
   system to have two sets of task data ready to run for each physical processor. When one of the
   tasks has a cache miss, the processor can switch to the second task while the cache miss for the
   first task is serviced. With QPRCMLTTSK set to 0, the processor is essentially idle during a
   cache miss. This parameter does not apply to the POWER4-based i825, i870, and i890 servers.
   NOTE: It is recommended to always set QPRCMLTTSK to “1” for the POWER5 models for
   Domino processing, as it has an even greater CPU impact than the 10% described above. The
   system value can be changed as shown below.
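   For example, processor multi-tasking can be enabled from an i5/OS command line (DSPSYSVAL
   shows the current value first):

       DSPSYSVAL SYSVAL(QPRCMLTTSK)
       CHGSYSVAL SYSVAL(QPRCMLTTSK) VALUE('1')

   Note that this system value affects all workloads on the system, not just Domino.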
3. It has been shown in customer settings that maintaining a machine pool faulting rate of less than 5
   faults per second is optimal for response time performance.

4. iSeries notes.ini / server document settings (see the example notes.ini entries after this list):
   • Mail.box setting
     Setting the number of mail boxes to more than 1 may reduce contention and reduce the CPU
     utilization. Setting this to 2, 3, or 4 should be sufficient for most environments. This is in the
     Server Configuration document for R5.

   • Mail Delivery and Transfer Threads
     You can configure the following in the Server Configuration document:
     - Maximum delivery threads. These pull mail out of mail.box and place it in the users'
       mail files. These threads tended to use more resources than the transfer threads, so
       we needed to configure twice as many of these so they would keep up.
     - Maximum transfer threads. These move mail from one server's mail.box to another
       server's mail.box. In the peer-to-peer topology, at least 3 were needed. In the hub
       and spoke topology, only 1 was needed in each spoke since mail was transferred to
       only one location (the hub). Twenty-five were configured for the hubs (one for each
       spoke).
     - Maximum concurrent transfer threads. This is the number of transfer threads from
       server 'A' to server 'B'. We set this to 1, which was sufficient in all our testing.
   • NSF_Buffer_Pool_Size_MB
     This controls the size of the memory section used for buffering I/Os to and from disk storage.
     If you make this too small and more storage is needed, Domino will begin using its own
     memory management code, which adds unnecessary overhead since OS/400 is already
     managing the virtual storage. If it is made too large, Domino will use the space inefficiently
     and will overrun the main storage pool and cause high faulting. The general rule of thumb is
     that the larger the buffer pool size, the higher the fault rate but the lower the CPU cost. If the
     faulting rate looks high, decrease the buffer pool size. If the faulting rate is low but your CPU
     utilization is high, try increasing the buffer pool size. Increasing the buffer pool size
     allocates larger objects specifically for Domino buffers, thus increasing storage pool
     contention and making less storage available for the paging/faulting of other objects on the
     system. To help optimize performance, increase the buffer pool size until it starts to impact
     the faulting rate, then back it down just a little. Changes to the buffer pool size in the notes.ini
     file require the server to be restarted before taking effect. In Domino 8.0 and later
     releases, the default buffer pool size is 512MB. In earlier releases, if
     NSF_Buffer_Pool_Size_MB was not set in the notes.ini file, the buffer pool size could be as
     large as 1.5GB. A buffer pool size that large might cause performance issues.
   • Server_Pool_Tasks
     In the notes.ini file, starting with 5.0.1, you can set the number of server threads in a
     partition. Our tests showed best results when this was set to 1-2% of the number of active
     users. For example, with 3000 active users, Server_Pool_Tasks was set to 60.
     Configuring extra threads will increase the thread management cost, and can increase your
     overall CPU utilization by up to 5%.
   • Route at once
     In the Server Connection document, you can specify the number of normal-priority messages
     that accumulate before the server routes mail. For our large server runs, we set this to 20.
     Overall, this decreased the CPU utilization by approximately 10% by allowing the router to
     deliver more messages when it makes a connection, rather than 1 message per connection.

   • Hub-and-spoke topology versus peer-to-peer topology
     We attempted the large server runs with both a peer-to-peer topology and a hub-and-spoke
     topology (see the Domino Administrator's guide for more details on how to set this up).
     While the peer-to-peer topology functioned well for up to 60,000 users, the hub-and-spoke
     topology had better performance beyond 60,000 users due to the reduced number of server-to-server
     connections (on the order of 50 versus 600) and the associated costs. A hub topology is also
     easier to manage, and is sometimes necessitated by the LAN or WAN configuration. Also,
     according to the Domino Administrator's guide, the hub-and-spoke topology is more stable.
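   To make item 4 concrete, the notes.ini entries for a large Notes mail partition might look like the
   following. NSF_Buffer_Pool_Size_MB=300 and Server_Pool_Tasks=60 are the values used in the
   tests described in this chapter; MAIL_NUMBER_OF_BOXES is the assumed notes.ini equivalent of
   the mail.box setting, which can also be set in the Server Configuration document:

       NSF_Buffer_Pool_Size_MB=300
       Server_Pool_Tasks=60
       MAIL_NUMBER_OF_BOXES=2

   Remember that changes such as the buffer pool size take effect only after the Domino server is
   restarted.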
5. Dedicate servers to a specific task
   This allows you to separate out groups of users. For example, you may want your mail delivered
   at a different priority than you want database accesses. This will reduce the contention between
   different types of users. Separate servers for different tasks are also recommended for high
   availability.

6. MIME format
   For users accessing mail from both the Internet and Notes, store the messages in both Notes and
   MIME format. This offers the best performance during mail retrieval because a format conversion
   is not necessary. NOTE: This will take up extra disk space, so there is a trade-off of increased
   performance over disk space.
7. Full text indexes
   Consider whether to allow users to create full text indexes for their mail files, and avoid the use
   of them whenever possible. These indexes are expensive to maintain since they take up CPU
   processing time and disk space.

8. Replication
   To improve replication performance, you may need to do the following:
   • Use selective replication
   • Replicate more often so there are fewer updates per replication
   • Schedule replications at off-peak hours
   • Set up replication groups based on replication priority. Set the replication priority to high,
     medium, or low to replicate databases of different priorities at different times.
9. Unread marks
   Select “Don't maintain unread marks” in the advanced properties section of Database properties
   if unread marks are not important. This can save a significant amount of CPU time on certain
   applications. Depending on the amount of changes being made to the database, not maintaining
   unread marks can yield a significant improvement. Test results in the lab with a Web shopping
   application showed a CPU reduction of up to 20%. For mail, setting this in the NAB
   decreased the CPU cost by 1-2%. Setting this in all of the users' mail files showed a large
   memory and CPU reduction (on the order of 5-10% for both). However, unread marks are an
   often-used feature for mail users, and should be disabled only after careful analysis of the tradeoff
   between the performance gain and the loss of usability.
10. Don't overwrite free space
    Select “Don't overwrite free space” in the advanced properties section of Database properties if
    system security can be maintained through other means, such as the default of PUBLIC
    *EXCLUDE for mail files. This can save on the order of 1-5% of CPU. Note that you can set this
    for the mail.box files as well.

11. Full vs. half duplex on Ethernet LAN
    Ensure the iSeries and the Ethernet switches in the network are configured to enable a full duplex
    connection in order to achieve maximum performance. Poor performance can result when
    running half duplex. This seems rather obvious, but the connection may end up running half
    duplex even if the i5/OS line description is set to full duplex and even if the switch is enabled for
    full duplex processing. Both the line description duplex parameter and the switch must be set to
    agree with each other, and typically it is best to use auto-negotiation to achieve this (*AUTO for
    the duplex parameter in the line description, as shown in the example below). Just checking the
    settings is usually not sufficient; a LAN tester must be plugged into the network to verify full
    versus half duplex.
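    For example, assuming an Ethernet line description named ETHLINE (a hypothetical name),
    auto-negotiation can be configured with:

        VRYCFG CFGOBJ(ETHLINE) CFGTYPE(*LIN) STATUS(*OFF)
        CHGLINETH LIND(ETHLINE) DUPLEX(*AUTO)
        VRYCFG CFGOBJ(ETHLINE) CFGTYPE(*LIN) STATUS(*ON)

    The switch port must also be set to auto-negotiate, and as noted above, a LAN tester is the only
    reliable way to confirm the link actually negotiated full duplex.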
12. Transaction logging
    Enabling transaction logging typically adds CPU cost and additional I/Os. These CPU and disk
    costs can be justified if transaction logging is determined to be necessary for server reliability and
    recovery speed. The redbook listed at the beginning of this chapter, “Domino for iSeries Sizing
    and Performance Tuning,” contains an entire chapter on transaction logging and performance
    impacts.
11.8 Domino Web Access
The following recommendations help optimize your Domino Web Access environment:
1. Refer to the redbooks listed at the beginning of this chapter. The redbook, “iNotes Web Access
   on the IBM eServer iSeries server,” contains performance information on Domino Web Access
   including the impact of running with SSL.
2. Use the default number of 40 HTTP threads. However, if you find that
   Domino.Threads.Active.Peak is equal to Domino.Threads.Total, HTTP requests may be waiting
   for the HTTP server to make an active thread idle before handling the request. If this is the case
   for your environment, increase the number of active threads until Domino.Threads.Active.Peak is
   less than Domino.Threads.Total. Remember that if the number of threads is set very large, CPU
   utilization will increase. Therefore, the number of threads should not exceed the peak by very
   much.
3. Enable Run Web Agents Concurrently on the Internet Protocols HTTP tab in the Server
Document.
4. For optimal messaging throughput, enable two MAIL.BOX files. Keep in mind that MAIL.BOX
   files grow as messages queue, and this can potentially impact disk I/O operations. Therefore, we
   recommend that you monitor MAIL.BOX statistics such as Mail.Waiting and
   Mail.Maximum.Deliver.Time (see the console example after this list). If either or both statistics
   increase over time, you should increase the number of active MAIL.BOX files and continue to
   monitor the statistics.
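For example, the statistics mentioned above can be sampled periodically from the Domino server
console:

    > show stat Mail.Waiting
    > show stat Mail.Maximum.Deliver.Time

A steadily climbing Mail.Waiting value is the signal to add another MAIL.BOX file.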
11.9 Domino Subsystem Tuning
The objects needed for making subsystem changes to Domino are located in library QUSRNOTES and
have the same name as the subsystem that the Domino servers run in. The objects you can change are:
• Class (timeslice, priority, etc.)
• Subsystem description (pool configuration)
• Job queue (max active)
• Job description
The system supplied defaults for these objects should enable Domino to run with optimal performance.
However, if you want to ensure a specific server has better response time than another server, you could
configure that server in its own partition and change the priority for that subsystem (change the class),
and could also run that server in its own private pool (change the subsystem description).
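For example, to change the run priority and timeslice of a Domino server's jobs, assuming a subsystem
(and therefore class) named DOMSVR1 (a hypothetical name), something like the following could be
used; the values shown are illustrative only:

    CHGCLS CLS(QUSRNOTES/DOMSVR1) RUNPTY(19) TIMESLICE(2000)

Lower RUNPTY values mean higher priority, so a server whose class has RUNPTY(19) will be favored
over a server running at a numerically higher priority.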
You can create a class for each task in a Domino server. You would do this if, for example, you wanted
mail serving (SERVER task) to run at a higher priority than mail routing (ROUTER task). To enable this
level of priority setting, you need to do the following:
1. Create the classes that you want your Domino tasks to use.
2. Modify the following IFS file ‘/QIBM/USERDATA/LOTUS/NOTES/DOMINO_CLASSES’. In
that file, you can associate a class with a task within a given server.
3. Refer to the release notes in READAS4.NSF for details.
11.10 Performance Monitoring Statistics
Support for monitoring performance statistics was added in Domino Release 5.0.3. Domino tracks
performance metrics of the operating system and outputs the results to the server. Type "show stat
platform" at the server console to display them. This feature can be enabled by setting the parameter
PLATFORM_STATISTICS_ENABLED=1 in the notes.ini file and restarting your server; it is
automatically enabled in some versions of Domino. Informal testing in the lab has shown that the
overhead of having statistics collection enabled is quite small and typically not even measurable.
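Enabling this, together with the Collection Services reporting described in the next paragraph, comes
down to two notes.ini entries (the ServerTasks list shown is the example used below; your server will
typically run additional tasks):

    PLATFORM_STATISTICS_ENABLED=1
    ServerTasks=UPDATE,COLSRV400,ROUTER,COLLECT,HTTP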
The i5/OS Performance Tools and Collection Services function can be enabled to collect and report
Domino performance information by specifying to run the COLSRV400 task in the Domino server
notes.ini file parameter: ServerTasks=UPDATE,COLSRV400,ROUTER,COLLECT,HTTP. With V5R4
a new Domino Server Activity section has been added to the Performance Tools Component Report
which looks like the following:
                                        Component Report                              10/20/05 16:21:33
                                      Domino Server Activity                          Page 76
Member . . . . : PERFMON01      Model/Serial . . : 595/55-55555     Main storage . . : 374.0 GB
Library  . . . : NZEN101905     System name  . . : MySystemi        Started  . . . . : 10/20/05 07:56:10
Version/Release: 5/ 4.0         Partition ID . . : 001              Stopped  . . . . : 10/20/05 08:28:00
Int Threshold  : 100.00 %       Feature Code . . : 7487-8966
Processor Units: 16.0           Virtual Processors: 16

Server : 855626/QNOTES/SERVER
                              Peak    ------- Mail ------  ---- Database ----  --- Name lookup --
 Itv       Tns           CPU  Concur  Pending   Waiting    Cache    Cache      Cache    Cache      URLs
 End      /Hour   Users  Util Users   Outbound  Inbound    Hits     Lookups    Hits     Lookups    Rcv/Sec
 -----  --------  -----  ----- -----  --------  --------   ------   -------    ------   -------    -------
 07:58   1515420  18001  39.50 18001        23         0        0       264         0         0          0
 07:59   1550099  18001  37.25 18001        23         0        0         0         0         0          0
 08:00   1536840  18001  31.95 18001        24         0        0         0         0         0          0
 08:01   1874520  18001  35.46 18001        24         0        0         0         0         0          0
 .
 .
 .
 08:24   1580400  18001  37.21 18001        34         0        0         0         0         0          0
 08:25   1589159  18001  34.79 18001        40         0        0         0         0         0          0
 08:26   1575959  18001  35.99 18001        40         0        0         0         0         0          0
 08:27   1579739  18001  35.31 18001        40         0        0         0         0         0          0
 08:28   1516680  18001  36.84 18001        40         0        0         0         0         0          0

 Column                          Average
 ------------------------------  ----------
 Tns/Hour                         1,531,036
 Users                               18,001
 CPU Util                             36.55
 Peak Concurrent Users               18,001
 Mail Pending Outbound                  829
 Mail Waiting Inbound                     0
 Database Cache Hits                      0
 Database Cache Lookups                 906
 Name Lookup Cache Hits                   0
 Name Lookup Cache Lookups                0
 URLs Rcv/Sec                             0
11.11 Main Storage Options
V5R3 provides performance improvements for the *DYNAMIC setting for Main Storage Option on
stream files. The charts found later in this section show the improved performance characteristics that can
be observed when using the *DYNAMIC setting in V5R3.
In V5R2 two new attributes were added to the OS/400 CHGATR command, *DISKSTGOPT and
*MAINSTGOPT. In this section we will describe our results testing the *MAINSTGOPT using the Mail
and Calendar workload. The allowed values for this attribute include the following:
1. *NORMAL
The main storage will be allocated normally. That is, as much main storage as possible will be
allocated and used. This minimizes the number of disk I/O operations since the information is cached
in main storage. If the *MAINSTGOPT attribute has not been specified for an object, this value is
the default.
2. *MINIMIZE
The main storage will be allocated to minimize the space used by the object. That is, as little main
storage as possible will be allocated and used. This minimizes main storage usage while increasing
the number of disk I/O operations since less information is cached in main storage.
3. *DYNAMIC
The system will dynamically determine the optimum main storage allocation for the object depending
on other system activity and main storage contention. That is, when there is little main storage
contention, as much storage as possible will be allocated and used to minimize the number of disk I/O
operations. When there is significant main storage contention, less main storage will be allocated and
used to minimize the main storage contention. This option only has an effect when the storage pool's
paging option is *CALC. When the storage pool's paging option is *FIXED, the behavior is the same
as *NORMAL. When the object is accessed through a file server, this option has no effect. Instead,
its behavior is the same as *NORMAL.
These values can be used to affect the performance of your Domino environment. As described above, the
default setting is *NORMAL which will work similarly to V5R1. However, there is a new default for the
block transfer size of stream files which are created in V5R2. Stream files created in V5R2 will use a
block transfer size of 16k bytes, versus 32k bytes in V5R1 and earlier. Files created prior to V5R2 will
retain the 32k byte block transfer size. To change stream files created prior to V5R2 to use the 16k block
transfer size, you can use the CHGATR command and specify the *NORMAL attribute. Testing showed
that the 16k block transfer size is advantageous for the Domino mail and calendaring function, which
typically accesses less than 16k at a time. This may affect the performance of applications that access
stream files with random access patterns. This change will likely improve the performance of applications
that read and write data in logical I/O sizes smaller than 16k. Conversely, it may slightly degrade the
performance of applications that read and write data with a specified data length greater than 16k.
The *MINIMIZE main storage option is intended to minimize the main storage required when reading
and writing stream files and changes the block transfer size of the stream file object to 8k. When reading
or writing sequentially, main storage pages for the stream file are recycled to minimize the working set
size. To offset some of the adverse effects of the smaller block transfer size and the reduced likelihood
that a page is resident, *MINIMIZE synchronously reads several pages from disk when a read or write
request would cause multiple page faults. Also, *MINIMIZE avoids reading data from disk when the
block of data to be written is page aligned and has a length that is a multiple of the page size.
The *DYNAMIC main storage option is intended to provide a compromise between the *NORMAL and
*MINIMIZE settings. This option only has an effect when the storage pool is set to *CALC. The Expert
Cache feature of the iSeries allows the file system read and write functions to adjust their internal
algorithms based on system tuning recommendations. A system with low paging rates will use an
algorithm similar to *NORMAL, but when the paging rates are too high due to main storage contention,
the algorithm used will be more like *MINIMIZE. When specifying *DYNAMIC, the block transfer size
is set to 12k, midway between the value of *NORMAL and *MINIMIZE.
Deciding when it is appropriate to use the CHGATR command to change the *MAINSTGOPT for a
Domino environment is not necessarily straightforward. The rest of this section will discuss test results of
using the various attributes. For all of the test results shown here for the *MINIMIZE and *DYNAMIC
attributes, the CHGATR command was used to change all of the user mail .NSF files being used in the
test.
The following is an example of how to issue the command:

    CHGATR OBJ(name-of-object) ATR(*MAINSTGOPT) VALUE(value)

where value is *NORMAL, *MINIMIZE, or *DYNAMIC.
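For instance, to request *DYNAMIC behavior for a single mail database (the path shown is
hypothetical):

    CHGATR OBJ('/domino/data/mail/user001.nsf') ATR(*MAINSTGOPT) VALUE(*DYNAMIC)

For the test results below, the command was issued against every user mail .NSF file used in the test.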
The chart below depicts V5R3-based paging curve measurements performed with the following settings
for the mail databases: *NORMAL, *MINIMIZE, and *DYNAMIC.
[Figure: “V5R3 Main Storage Options Fault Rates” — page fault rates for the *DYNAMIC, *NORMAL,
and *MINIMIZE settings as the base pool size is reduced from 60,775,040 KB to 21,787,000 KB]
Figure 11.4 V5R3 Main Storage Options on a Power4 System - Page Fault Rates
In Figures 11.4 and 11.5, results are shown for tests that were performed with a Mail and Calendaring
Users workload and various settings for the Main Storage Option. The tests started with the users running
at steady state with adequate main storage resource available, and then the main storage available to the
*base pool containing the Domino servers was gradually reduced. The tests used an NSF Buffer Pool Size
of 300MB with multiple Domino partitions.
Notice in Figure 11.4 above that as the base pool decreased in size (moving to the right on the chart), the
page faulting increased for all settings of main storage option. Using the *DYNAMIC and *NORMAL
attributes provided the lowest fault rates when memory was most abundant at the left side of the curve.
Moving to the right on the chart, as main storage became more constrained, less page faulting took place
with the *MINIMIZE storage option than with the other two options. Less page faulting
will generally provide better performance.
In V5R3 the performance of *DYNAMIC has been improved and provides a better improvement for
faulting rates as compared to *NORMAL than was the case in V5R2. When running with *DYNAMIC
in V5R2, information about how the file is being accessed is accumulated for the open instance and
adjustments are made for that file based on that data. But when the file is closed and reopened, the
algorithm essentially needs to start over. V5R3 includes improvements to keep track of the history of the
file access information over open/close instances.
During the tests, the *DYNAMIC and *MINIMIZE settings used up to 5% more CPU resource than
*NORMAL.
Figure 11.5 below shows the response time data rather than fault rates for the same test shown in Figure
11.4 for the attributes *NORMAL, *DYNAMIC, and *MINIMIZE.
[Figure: “V5R3 Main Storage Options Response Times” — average response times (0 to 80ms) for the
*NORMAL, *DYNAMIC, and *MINIMIZE settings as the base pool size is reduced from 60,775,040 KB
to 21,787,000 KB]
Figure 11.5 V5R3 Main Storage Options - Response Times
Notice that there is not an exact correlation between fault rates and response times as shown in Figures
11.4 and 11.5. The *NORMAL and *DYNAMIC options showed the lowest average response times at the
left side of the chart, where the most main storage was available. As main storage was constrained
(moving to the right on the chart), *MINIMIZE provided lower response times.
As is the case with many performance settings, “your mileage will vary” for the use of *DYNAMIC and
*MINIMIZE. Depending on the relationship between the CPU, disk and memory resources on a given
system, use of the Main Storage Options may yield different results. As has already been mentioned,
both *MINIMIZE and *DYNAMIC required up to 5% more CPU resource than *NORMAL. The test
environment used to collect the results in Figures 11.4 and 11.5 had an adequate number of disk drives
such that disk utilizations were below recommended levels for all tests.
11.12 Sizing Domino on System i
To compare Domino processing capabilities for System i servers that use POWER5 and earlier
processors, you should use the MCU ratings provided in Appendix C. The ratings are based on the Mail
and Calendaring User workload and provide a better means of comparison for Domino processing than do
CPW ratings for these earlier models.
NOTE: MCU ratings should NOT be used directly as a sizing guideline for the number of supported
users. MCU ratings provide a relative comparison metric which enables System i models to be
compared with each other based on their Domino processing capability. MCU ratings are based on
an industry standard workload and the simulated users do not necessarily represent a typical load
exerted by “real life” Domino users.
When comparing models which have different processor types, such as SSTAR, POWER4 and
POWER5, it is important to use appropriate rating metrics (see Appendix C) or a sizing tool such as the
IBM Systems Workload Estimator. The POWER4 and POWER5 processors have been designed to run at
significantly higher MHz than SSTAR processors, and the MHz on SSTAR does not compare directly to
the MHz on POWER4 or POWER5.
For sizing Domino mail and application workloads on System i servers, including the new POWER6
models, the recommended method is the IBM Systems Workload Estimator. This tool was previously
called the IBM eServer Workload Estimator. You can access the Workload Estimator from the Domino
on iSeries home page (select “Sizing Information”).
The Workload Estimator is typically refreshed 3 to 4 times each year, and enhancements are continually
added for Domino workloads. Be sure to read the “What’s New” section for updates related to Domino
sizing information. The estimator's rich help text describes the enhancements in more detail. Some of the
recent additions include: SameTime Application Profiles for Chat, Meeting, and Audio/Video, Domino
6, Transaction Logging, a heavier mail client type, adjustments to take the size of the database into
account, LPAR updates, and enhancements to defining and handling Domino Clustering activity.
Be sure to note the redpaper “Sizing Large-Scale Domino Workloads on iSeries,” which describes a
variety of experiments, such as mail and calendar workloads using different sized documents, and
comparisons of the effect of a small versus a very large mail database size. A more recent article
describes “Best Practices for Large Lotus Notes Mail Files.”

Additional information on sizing Domino HTTP applications for AS/400 is also available; examples are
provided that represent typical Web-enabled applications running on a Domino for AS/400 server. The
examples show projected throughput rates for various iSeries servers. To observe transaction rates for a
Domino server you can use the “show stat domino” command and note the Domino.Requests.Per1hour,
Domino.Requests.Per1min, and Domino.Requests.Per5min results. The applications described in these
examples are included as IBM defined applications in the Workload Estimator.
For more information on performance data collection and sizing, see Appendix B - iSeries Sizing and
Performance Data Collection Tools.
11.13 LPAR and Partial Processor Considerations
Many customers have asked whether the lowest rated i520 models will provide acceptable performance
for Domino. Given the CPU intensive nature of most collaborative transactions, the 500 CPW and 600
CPW models may not provide acceptable response times for these types of workloads. The issue is one of
response time rather than capacity. So even if the anticipated workload only involves a small number of
users or relatively low transaction rates, response times may be significantly higher for a small LPAR
(such as 0.2 processor) or partial processor model as compared to a full processor allocation of the same
technology. The IBM Systems Workload Estimator will not recommend the 500 CPW or 600 CPW
models for Domino processing.
Be sure to read the section “Accelerator for System i5” in Chapter 6, Web Server and WebSphere
Performance. That section describes the new “Accelerator” offerings which provide improved
performance characteristics for the i520 models. In particular, note Figure 6.6 to observe potential
response time differences for a 500 CPW or 600 CPW model as compared with a higher rated or
Accelerated CPW model for a CPU intensive workload.
11.14 System i NotesBench Audits and Benchmarks
Chapter 12. WebSphere MQ for iSeries
12.1 Introduction
The WebSphere MQ for iSeries product allows application programs to communicate with each other
using messages and message queuing. The applications can reside either on the same machine or on
different machines or platforms that are separated by one or more networks. For example, iSeries
applications can communicate with other iSeries applications through WebSphere MQ for iSeries, or they
can communicate with applications on other platforms by using WebSphere MQ for iSeries and the
appropriate MQ Series product(s) for the other platform (HP-UX, OS/390, etc.).
MQ Series supports all important communications protocols, and shields applications from having to deal
with the mechanics of the underlying communications being used. In addition, MQ Series ensures that
data is not lost due to failures in the underlying system or network infrastructure. Applications can also
deliver messages in a time independent mode, which means that the sending and receiving applications
are decoupled so the sender can continue processing without having to wait for acknowledgement that the
message has been received.
This chapter will discuss performance testing that has been done for Version 5.3 of WebSphere MQ for
iSeries and how you can access the available performance data and reports generated from these tests. A
brief list of conclusions and results are provided here, although it is recommended to obtain the reports
provided for a more comprehensive look at WebSphere MQ for iSeries performance.
12.2 Performance Improvements for WebSphere MQ V5.3 CSD6
WebSphere MQ V5.3 CSD6 introduces substantial performance improvements at queue manager start
and during journal maintenance.
Queue Manager Start Following an Abnormal End
WebSphere MQ cold starts by customers in the field are a common occurrence after a queue manager
ends abnormally because the time needed to clean up outstanding units of work is lengthy (or worse,
because the restart does not complete). Note that during a normal shutdown, messages in the outstanding
units of work would be cleaned up gracefully.
In tests done in our Rochester development lab, we simulated a large customer environment with 50-500
customers connected, each with an outstanding unit of work in progress, and then ended the queue
manager abnormally. These tests showed that with the performance enhancement applied, a queue
manager start that previously took hours to complete finished in less than three minutes. Overall, we saw
90% or greater improvement in start times in these cases.
Checkpoint Following a Journal Receiver Roll-over
Our goal in this case was to improve responsiveness and throughput with regards to persistent messaging,
and reduce the amount of time WebSphere MQ is unavailable during the checkpoint taken after a journal
receiver roll-over. Tests were done in the Rochester lab with several different journal receiver sizes and
various numbers of journal receivers in the chain in order to assess the impact of this performance
enhancement. Our results showed up to a 90% improvement depending on the size and number of journal
receivers involved, with scenarios having larger amounts of journal data receiving the most benefit. This
enhancement should allow customers to run with smaller, more manageable receivers with less concern
about the checkpoint taken following a receiver roll-over during business hours.
12.3 Test Description and Results
Version 5.3 of WebSphere MQ for iSeries includes several performance enhancements designed to
significantly improve queue manager throughput and application response time, as well as improve the
overall throughput capacity of MQ Series. Measurements were done in the IBM Rochester laboratory
with assistance from IBM Hursley to help show how Version 5.3 compares to Version 5.2 of MQ Series
for iSeries.
The workload used for these tests is the standard CSIM workload provided by Hursley to measure
performance for all MQ Series platforms. Measurements were done using both client-server and
distributed queuing processing. Results of these tests, along with test descriptions, conclusions,
recommendations, and tips and techniques are available in support pacs on the IBM MQ SupportPac
website. From this page, you can select to view all performance support pacs. The most current support
pac document is “WebSphere MQ for iSeries V5.3 - Performance Evaluations”. This document
contains performance highlights for V5.3 of this product, and includes measurement data, performance
recommendations, and performance tips and techniques.
12.4 Conclusions, Recommendations and Tips
Following are some basic performance conclusions, recommendations and tips/techniques to consider for
WebSphere MQ for iSeries. More details are available in the previously mentioned support pacs.
• MQ V5.3 shows an improvement in peak throughput over MQ V5.2 for persistent and nonpersistent
  messaging, both in client-server and distributed messaging environments. The peak throughput for
  persistent messaging improved by 15-20%, while for nonpersistent messaging, the peak increased by
  about 5-10%.

• Tests were also done to determine how many driving applications could be run with a reduced rate of
  messages per second. The purpose of these tests was not to measure peak throughput, but instead how
  many of these applications could be running and still achieve response times under 1 second.
  Compared to MQ Series V5.2, WebSphere MQ for iSeries V5.3 shows an improvement of 40-70% in
  the number of client-server applications that can be driven in this manner, and an improvement of
  about 10% in the number of distributed applications.
• Use of a trusted listener process generally results in a reduction in CPU utilization of 5-10% versus
  using the standard default listener. In addition, the use of trusted applications can result in reductions
  in CPU of 15-40%. However, there are other considerations to take into account prior to using a
  trusted listener or applications; refer to the “Other Sources of Information” section below to find
  other references on this subject. (A sketch of the relevant configuration appears after this list.)

• MQ performance can be sensitive to the amount of memory that is available for use by this product. If
  you are seeing a significant amount of faulting and paging occurring in the memory pools where
applications using MQ Series are running, you may need to consider adding memory to these pools to
help performance.
• Nonpersistent messages use significantly less CPU and I/O resource than persistent messages do
  because persistent messages use native journaling support on the iSeries to ensure that messages are
  recoverable. Because of this, persistent messages should not be used where nonpersistent messages
  will be sufficient.
• If persistent messages are needed, the user can manually create the journal receiver used by MQ
  Series on a user ASP in order to ensure best overall performance (MQ defaults to creating the receiver
  on the system ASP); see the sketch following this list. In addition, the disk arms and IOPs in the user
  ASP should have good response times to ensure that you achieve maximum capacities for your
  applications that use persistent messages.
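A sketch of the last two bullets follows. The qm.ini stanza is the standard WebSphere MQ mechanism
for trusted (fastpath) binding; the library, receiver, and journal names and the ASP number are
assumptions for illustration only and must be adapted to your queue manager's actual objects:

    /* Place the MQ journal receiver in a library on user ASP 2 (names hypothetical) */
    CRTLIB LIB(MQRCVLIB) ASP(2)
    CRTJRNRCV JRNRCV(MQRCVLIB/AMQR000001)
    CHGJRN JRN(QMQMDATA/AMQAJRN) JRNRCV(MQRCVLIB/AMQR000001)

    qm.ini:
      Channels:
        MQIBindType=FASTPATH

Weigh the isolation trade-off before using FASTPATH: a trusted application runs in the queue
manager's process space, so an application error can affect the queue manager itself.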
Other Sources of Information
In addition to the above mentioned support pacs, you can refer to the IBM website for reference guides,
online manuals, articles, white papers, and other sources of information on MQ Series.
Chapter 13. Linux on iSeries Performance
13.1 Summary
Linux on iSeries expands the iSeries platform solutions portfolio by allowing customers and software
vendors to port existing Linux applications to the iSeries with minimal effort. But, how does it shape up
in terms of performance? What does it look like generally and from a performance perspective? How
can one best configure an iSeries machine to run Linux?
Key Ideas
•  "Linux is Linux." Broadly speaking, Linux on iSeries has the same tools, function, and look-and-feel
of any other Linux.
•  Linux operates in its own independent partition, though it has some dependency on OS/400 for a few
key services like IPL ("booting").
•  Virtual LAN and Virtual Disk provide differentiation for iSeries Linux.
•  Shared Processors (fractional CPUs) provide additional differentiation.
•  Linux on iSeries provides a mechanism to port many UNIX and Linux applications to iSeries.
•  Linux on iSeries particularly permits Linux-based middleware to exploit OS/400 function and data in
a single hardware package.
•  Linux on iSeries is available on selected iSeries hardware (see the IBM web site for details).
•  Linux is not dependent per se on OS/400 releases. Technically, any Linux distribution could be
hosted by either of the two releases (V5R1 or V5R2) that allow Linux. It becomes a question of
service and support. Users should consult product literature to make sure there is support for their
desired combination.
•  Linux and other Open Source tools are almost all constructed from a single Open Source compiler
known as gcc. Therefore, the quality of its code generation is of significant interest. Java is a
significant exception to this, having its own code generation.
13.2 Basic Requirements -- Where Linux Runs
For various technical reasons, Linux may only be deployed on systems with certain hardware facilities.
These are:
•  Logical partitioning (LPAR). Linux is not part of OS/400. It needs to have its own partition of the
system resources, segregated from OS/400 and, for that matter, any other Linux partitions. A special
software feature called the Hypervisor keeps each partition operating separately.
•  “Guest” Operating System Capability. This begins in V5R1. Part of the iSeries Linux freedom
story is to run Linux as Linux, including code from third parties running with root authority and other
privilege modes. By definition, such code is not provided by IBM. Therefore, to keep OS/400 and
Linux segregated from each other, a few key hardware facilities are needed that are not present on
earlier models. (When all partitions run OS/400, the hypervisor’s task is simplified, permitting older
iSeries and AS/400 to run LPAR).
In addition, some models and processor feature codes can run Linux more flexibly than others. The two
key features that not all Linux-capable processors support are:
•  Shared Processors. This variation of LPAR allows the Hypervisor to use a given processor in
multiple partitions. Thus, a uni-processor might be divided in various fractions between (say) three
LPAR partitions. A four way SMP might give 3.9 CPUs to one partition and 0.1 CPUs to another.
This is a large and potentially profitable subject, suitable for its own future paper. Imagine
consolidating racks of old, underutilized servers to several partitions, each with a fraction of an
iSeries CPU driving it.
•  Hardware Multi-tasking. This is controlled by the system-wide value QPRCMLTTSK, which, in
turn, is controlled by the primary partition. Recent AS/400 and iSeries machines have a feature called
hardware multi-tasking. This enables OS/400 (or, now, Linux) to load two jobs (tasks, threads, Linux
processes, etc.) into the CPU. The CPU itself will then alternate execution between the two tasks if
one task waits on a hardware resource (such as during a cache miss). Due to particular details of
some models, Linux cannot run with this enabled. If so, as a practical matter, the entire machine must
run with it disabled. In machines where Linux supports this, the choice would be based on
experience -- enabling hardware multi-tasking usually boosts throughput, but on occasion it should be
turned off.
Which models and feature codes support Linux at all, and which enable specific features such as
shared processors and hardware multi-tasking, are identified on the IBM iSeries Linux web site.
13.3 Linux on iSeries Technical Overview
Linux on iSeries Architecture
iSeries Linux is a program-execution environment on the iSeries system that provides a traditional
memory model (not single-level store) and allows direct access to machine instructions (without the
mapping of MI architecture). Because they run in their own partition under the Linux operating system,
programs running in iSeries Linux have direct access to the full user-state capabilities, and even most
supervisor-state capabilities, of the original PowerPC architecture. They do not have access to the
single level store and OS/400 facilities. Reaching OS/400 facilities requires some sort of
machine-to-machine interface, such as sockets. A high speed Virtual LAN is available to expedite and
simplify this communication.
Storage for Linux comes from two sources: Native and Virtual disks (the latter implemented as OS/400
Network Storage). Native access is provided by allocating ordinary iSeries hard disk to the Linux
partition. Linux can, by a suitable and reasonably conventional mount point strategy, intermix both
native and virtual disks. The Virtual Disk is analogous to some of the less common Linux on Intel
distributions where a Linux file system is emulated out of a large DOS/Windows file, except that on
OS/400, the storage is automatically “striped” to multiple disks and, ordinarily, RAIDed.
Linux partitions can also have virtual or native local area networks. Typically, a native LAN would be
used for communications to the outside world (including the next firewall) and the virtual LAN would be
used to communicate with OS/400. In a full-blown DMZ (“demilitarized zone”) solution, one Linux
application partition could provide a LAN interface to the outer firewall. It could then talk to a second
partition providing the inner firewall, and that second Linux partition could use virtual LAN to talk to
OS/400 to obtain OS/400 services like data base. This could be done with three Linux partitions in total
and an OS/400 partition in the back-end.
See "The Value of Virtual LAN and Virtual Disk" for more on the virtual facilities.
Linux on iSeries Run-time Support
Linux brings significant support including X-Windows and a large number of shells and utilities.
Languages other than C (e.g. Perl, Python, PHP, etc.) are also supported. These have their own history
and performance implications, but we can do no more than acknowledge that here. There are a couple of
generic issues worth highlighting, however.
Applications running in iSeries Linux work in ASCII. At present, no Linux-based code generator
supports EBCDIC nor is that likely. When talking from Linux to OS/400, care must be taken to deal with
ASCII/EBCDIC questions. However, for a great fraction of the ordinary Internet and other sockets
protocols, it is the OS/400 that is required to shoulder the burden of translation -- the Linux code can and
should supply the same ASCII information it would provide in a given protocol. Typically, the
translation costs are on the order of five percent of the total CPU costs, usually on the OS/400 side.
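As a minimal sketch of the conversion itself, using only the Python standard library (the EBCDIC code
page cp037 is chosen here purely for illustration; an actual deployment would use whichever CCSID the
OS/400 side expects):

    # Round-trip a string between ASCII/Unicode and one EBCDIC code page (cp037).
    text = "SELECT * FROM SALES"
    ebcdic_bytes = text.encode("cp037")   # str -> EBCDIC bytes
    back = ebcdic_bytes.decode("cp037")   # EBCDIC bytes -> str
    assert back == text
    print(ebcdic_bytes.hex())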
iSeries Linux, as a regular Linux distribution, has as much support for Unicode as the application itself
provides. Generally, the Linux kernel itself currently has no support for Unicode. This can complicate
the question of file names, for instance, but no more or no less than any other Linux environment. Costs
for translating to and from Unicode, if present, will also be around five percent, but this will be
comparable to other Linux solutions.
13.4 Basic Configuration and Performance Questions
Since, by definition, iSeries Linux means at least two independent partitions, questions of configuration
and performance get surprisingly complicated, at least in the sense that not everything is on one operating
system, and overall performance is not visible to a single set of tools.
Consider the following environments:
•  A machine with a Linux and an OS/400 partition, both running CPU-bound work with little I/O.
•  A machine with a Linux and an OS/400 partition, both running work with much I/O, or with Linux
running much I/O and the OS/400 partition extremely CPU-bound.
The first machine will tend to run as expected. If Linux has 3 of 4 CPUs, it will consume about 0.75 of
the machine's CPW rating. In many cases, it will more accurately be observed to consume 0.75 of the
CIW rating (processor bound may be better predicted by CIW, absent specific history to the contrary).
The second machine may be less predictable. This is true for regular applications as well, but it could be
much more visible here.
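A back-of-the-envelope sketch of the first case; the CPW rating used here is hypothetical, purely to show
the arithmetic:

    # Estimate the capacity a CPU-bound Linux partition consumes, as a fraction
    # of the whole machine's rating. The 1000 CPW figure is hypothetical.
    total_cpw = 1000.0
    linux_cpus, total_cpus = 3, 4

    share = linux_cpus / total_cpus   # 0.75 of the machine
    print(f"Linux consumes about {share:.2f} of the rating, "
          f"or roughly {share * total_cpw:.0f} CPW-equivalent capacity")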
Special problems for I/O bound applications:
•  The Linux environment is independently operated.
•  Virtual disk, generally a good thing, may result in OS/400 and Linux fighting each other for disk
access. This is normal if one simply were deploying two traditional applications on an iSeries, but
the partitioning may make this more difficult to observe. In fact, one may not be able to attribute the
I/O to “anything” running on the OS/400 side, since the various OS/400 performance tools don’t
know about any other partition, much less a Linux one. Tasks representing Licensed Internal Code
may show more activity, but attributing this to Linux is not straightforward.
•  If the OS/400 partition has a 100 per cent busy CPU for long periods of time, the facilities driving the
I/O on the OS/400 side (virtual disk, virtual LAN, shared CD ROM) must fight other OS/400 work
for the processor. They will get their share and perhaps a bit more, but this can still slow down I/O
response time if the OS/400 partition is extremely busy over a long period of time.
Some solutions:
•  In many cases, awareness of this situation may be enough. After all, new applications are deployed in
a traditional OS/400 environment all the time. These often fight existing, concurrent applications for
the disk and may add "system" level overhead beyond the new jobs alone. In fact, deploying Virtual
Disk in a large, existing ASP will normally optimize performance overall, and would be the first
choice. Still, problems may be a bit harder to understand if they occur.
•  Existing OS/400 guidelines suggest that disk utilization be kept below 42 per cent for non-load source
units. That is, controlling disk utilization for both OS/400 and the aggregate Linux Virtual Disks will
also control CPU costs. If this can be managed, sharing an ASP should usually work well (see the
sketch following this list).
•  However, since Linux is in its own partition, and doesn’t support OS/400 notions of subsystem and
job control, awareness may not be enough. Alternate solutions include native disk and, usually better,
segregating the Linux Virtual Disk (using OS/400 Network Storage objects) into a separate ASP.
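A minimal sketch of applying that guideline, assuming per-arm busy percentages have already been
gathered from your usual performance tools (the arm names and values below are illustrative):

    # Flag disk arms that exceed the 42 per cent utilization guideline.
    GUIDELINE = 42.0

    def arms_over_guideline(busy_by_arm):
        """busy_by_arm: dict of arm name -> busy per cent (from your tools)."""
        return {arm: pct for arm, pct in busy_by_arm.items() if pct > GUIDELINE}

    sample = {"DD001": 18.5, "DD002": 47.0, "DD003": 39.9}   # illustrative data
    print(arms_over_guideline(sample))                        # {'DD002': 47.0}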
13.5 General Performance Information and Results
A limited number of performance related tests have been conducted to date, comparing the performance
of iSeries Linux to other environments on iSeries, and to similarly configured (especially in CPU MHz)
pSeries systems running the same application in an AIX environment.
Computational Performance -- C-based code
A factor not immediately obvious is that most Linux and Open Source code is constructed with a single
compiler, the GNU (gcc or g++) compiler.
In Linux, computational performance is usually dominated by how the gcc/g++ compiler stacks up
against commercial alternatives such as xlc (OS/400 PASE) and ILE C/C++ (OS/400). The leading cause
of any CPU performance deficit for Linux (compared to Native OS/400 or OS/400 PASE) is the quality of
the gcc compiler's code generation. This is widely known in the Open Source community and is
independent of the CPU architecture.
Generally, for integer-based applications (general commercial):
•  OS/400 PASE (xlc) gives the fastest integer performance.
•  ILE C/C++ is usually next.
•  Linux (gcc) is last.
Ordinarily, all would be well within a binary order of magnitude of each other. The difference is close
enough that ILE C/C++ sometimes is faster than OS/400 PASE. Linux usually lags slightly more, but is
usually not significantly slower.
Generally, for applications dominated by floating point, the rankings change somewhat.
•  OS/400 PASE almost always gives the fastest performance.
•  Linux and ILE C/C++ often trail substantially. In one measurement, Linux took 2.4 times longer than
PASE.
ILE C/C++ floating point performance will be closer to Linux than to OS/400 PASE. Note carefully that
most commercial applications do not feature floating point.
This chart shows some general expectations that have been confirmed in several workloads.
[Chart: Fraction of ILE Performance. Relative integer and floating point computational performance of
Linux, ILE, and PASE, normalized to ILE (y-axis 0 to 1.2).]
One virtue of the i870, i890, and i825 machines is that the hardware floating point unit can make up for
some of the code generation deficit due to its superior hardware scheduling capabilities.
Computational Performance -- Java
Generally, Java computational performance will be dictated by the quality of the JVM used. Gcc
performance considerations don't apply (occasional exception: Java Native Methods). Performance on
the same hardware with other IBM JVMs will be roughly equal, except that newer JVMs will often arrive
a bit later on Linux. The IBM JVM is almost always much faster than the typical open source JVM
supplied in many distributions.
Web Serving Performance
Work has been done with web serving solutions. Here is some information (primarily useful for sizing,
not performance per se), which gives some idea of the web serving capacity for static web serving.
Number of 840 Processors    # of web server hits per second,    # of web server hits per second,
in Partition                Apache 1.3                          khttpd
0.5                         514                                 860
1                           1,024                               1,878
2                           1,726                               3,984
4                           3,755                               4,961
Here, a model 840 was subdivided into the partition sizes shown and a typical web serving load was used.
A "hit" is one web page or one image. The kttpd is a kernel-based daemon available on Linux which
serves only static web pages or images. It can be cascaded with ordinary Apache to provide dynamic
content as well. The other is a standard Apache 1.3 installation. The 820 or 830 would be a bit less, by
about 10 per cent, than the above numbers.
Network Operations
Here are some results using Virtual and 100 megabit ethernet. This pattern was repeated in several
workloads using 820 and 840 processors:
TCP/IP Function       100 megabit Ethernet LAN            Virtual LAN
Transmit Data         50-90 megabits per second           200-400 megabits per second
Make Connections      200-3000 connections per second     1100-9500 connections per second
The 825, 870, and 890 should produce slightly higher virtual data rates and nearly the same 100 megabit
ethernet rates (since the latter is ultimately limited by hardware). The very high variance in the "make
connections" relates, in part, to the fact that several workloads with different complexities were involved.
We also have more limited measurements on Gigabit ethernet showing about 450 megabits per second for
some forms of data transmission. For planning purposes, a rough parity between gigabit and Virtual LAN
should be assumed.
Gcc and High Optimization (gcc compiler option -O3)
The current gcc compiler is used for a great fraction of Linux applications and the Linux kernel. At this
writing, the current gcc version is ordinarily 2.95, but this will change over time. This section applies
regardless of the gcc version used. Note also that some things that appear to be different compilers (e.g.
g++) are front-ends for other languages (e.g. C++) but use gcc for actual generation of code.
Generally speaking, RISC architectures have assumed that the final, production version of an application
would be deployed at a high optimization. Therefore, it is important to specify the best optimization level
(gcc option -O3) when compiling with gcc or any gcc derivatives. When debugging (-g), optimization is
less important and even counterproductive, but for final distribution, optimization often has dramatic
performance differences from the base case where optimization isn’t specified.
Programs can run twice as fast or even faster at high versus low optimization. It may be worthwhile to
check and adjust Makefiles for performance critical, open source products. Likewise, if compilers other
than gcc are deployed, they should be examined to see if their best optimizations are used.
One should check the man page (man gcc at a command line) to see if other optimizations are warranted.
Some potentially useful optimizations are not automatically turned on because not all applications may
safely use them.
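A minimal sketch of auditing a source tree for this, assuming conventional CFLAGS/CXXFLAGS usage
in the Makefiles (real build systems vary widely, so treat this as a starting point only):

    # Report Makefile compiler-flag lines that specify no -O optimization level.
    import pathlib
    import re

    def flag_lines_missing_opt(root="."):
        hits = []
        for mk in pathlib.Path(root).rglob("Makefile*"):
            for line in mk.read_text(errors="ignore").splitlines():
                if re.match(r"\s*C(XX)?FLAGS\s*[:+?]?=", line) and "-O" not in line:
                    hits.append((str(mk), line.strip()))
        return hits

    for path, line in flag_lines_missing_opt("."):
        print(f"{path}: {line}")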
The Gcc Compiler, Version 3
As noted above, many distributions are based on the 2.95 gcc compiler. The more recent 3.2 gcc is also
used by some distributions. Results there show some variability and not much net improvement. To the
extent it improves, the gap with ILE should close somewhat. Floating point performance is improved, but
only proportionately. None of the recommendations in terms of Linux versus other platforms change,
because the improvement is too inconsistent to alter the rankings, though it bears watching in the future
as gcc has more room to improve. This comparison is at -O3, as per the prior section's recommendations.
13.6 Value of Virtual LAN and Virtual Disk
Virtual LAN
Virtual LAN is a high speed interconnect mechanism which appears to be an ordinary ethernet LAN as
far as Linux is concerned.
There are several benefits from using Virtual LAN:
•  Performance. It functions approximately on a par with Gigabit ethernet (see the previous section,
Network Operations).
•  Cost. Since it uses built-in processor facilities accessed via the Hypervisor (running on the hosting
OS/400 partition), there are no switches, hubs, or wires to deal with. At gigabit speeds, these costs
can be significant.
•  Simplification and Consolidation. It is easy to put multiple Linux partitions on the same Virtual
LAN, achieving the same kinds of topologies available in the real world. This makes Virtual LAN
ideal for server consolidation scenarios.
The exact performance of Virtual LAN, as is always the case, varies based on items like average IP
packet size and so on. However, in typical use, we've observed speeds of 200 to 400 megabits per second
on 600 MHz processors. The consumption on the OS/400 side is usually 10 per cent of one CPU or less.
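To make those rates concrete, a small worked example; the payload size and the mid-range rates picked
here are illustrative only:

    # Rough seconds to move a payload at the observed LAN rates (1 byte = 8 bits).
    def transfer_seconds(megabytes, megabits_per_sec):
        return megabytes * 8 / megabits_per_sec

    file_mb = 500                          # hypothetical payload
    print(transfer_seconds(file_mb, 70))   # 100 Mb Ethernet at ~70 Mb/s -> ~57 s
    print(transfer_seconds(file_mb, 300))  # Virtual LAN at ~300 Mb/s    -> ~13 s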
Virtual Disk
Virtual Disk simulates an arbitrarily sized disk. Most distributions make it "look like" a large, single IDE
disk, but that is an illusion. In reality, the disks used to implement it are based on OS/400 Network
Storage (*NWSSTG object) and will be allocated from all available (SCSI) disks on the Auxiliary
Storage Pool (ASP) containing the Network Storage. By design, OS/400 Single Level Store always
"stripes" the data, so Linux files of a nontrivial size are accordingly spread over multiple physical disks.
Likewise, a typical ASP on OS/400 will have RAID-5 or mirrored protection, providing all the benefits of
these functions without any complexity on the Linux side at all.
Thus, the advantages are:
•  Performance. Parallel access is possible. Since the data is striped, it is possible for the data to be
concurrently read from multiple disks.
•  Reduction in Complexity. Because it looks like one large disk to Linux, but is typically implemented
with RAID-5 and striping, the user does not need to deploy complex strategies such as Linux Volume
Management and other schemes to achieve RAID-5 and striping. Moreover, obtaining both strategies
at once (which are, in effect, true by default in OS/400) is more complex still in the Linux environment.
•  Cost. Because the disk is virtual, it can be created to any size desired. For some kinds of Linux
partitions, a single modern physical disk is overkill -- providing far more data than required. These
requirements only increase if RAID, in particular, is specified. Here, the Network Storage object can
be created to any desired size, which helps keep down the cost of the partition. For instance, for some
kinds of middleware function, Linux can be deployed anywhere between 200 MB and 1 GB or so,
assuming minimal user data. Physical disks are nowadays much larger than this and, often, much
larger than the actual need, even when user/application data is added on.
•  Simplification and Consolidation. The above advantages strongly support consolidation scenarios.
By "right sizing" the required disk, multiple Linux partitions can be deployed on a single iSeries,
using only the required amount of disk space, not some disk-dictated or RAID-5-dictated minimum.
Additional virtual disks can be readily added, and they can be saved, copied, etc. using OS/400
facilities.
In terms of performance, the next comparison is compelling, but also limited. Virtual Disk can be much
faster than single Native disks. In a really large and complex case, a Native Disk strategy would also
have multiple disks, possibly managed by the various Linux facilities available for RAID and striping.
Such a usage would be more competitive. But we anticipate that, for many uses of Linux, that level of
complexity will be avoided. This makes our comparison fair in the sense that we are comparing the
solutions real customers will actually select between -- solutions which, for the iSeries customer, have
comparable complexity to deploy.
•  1-disk Intel box, 667 MHz CPU: 5 MB/sec for block writes, 3.4 MB/sec for block reads.
•  Virtual Disk, OS/400, one 600 MHz CPU: 112 MB/sec for block writes, 97 MB/sec for block reads.
As noted, this is not an absolute comparison. Linux has some file system caching facilities that will
moderate the difference in many cases. The absolute numbers are less important than the fact that there is
an advantage. The point is: To be sure of this level of performance from the Intel side, more work has to
be done, including getting the right hardware, BIOS, and Linux tools in place. Similar work would also
have to be done using Native Disk on iSeries Linux. By contrast, the default iSeries Virtual Disk
implementation has this kind of capability built in.
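A minimal sketch of the kind of sequential block-write probe behind such numbers. The path, transfer
size, and block size here are arbitrary, and results depend heavily on caching, so treat it as a rough
indicator only:

    # Time sequential block writes and report an approximate MB/sec rate.
    import os
    import time

    def block_write_mb_per_sec(path="/tmp/blocktest.bin", total_mb=256, block_kb=64):
        block = os.urandom(block_kb * 1024)
        start = time.time()
        with open(path, "wb") as f:
            for _ in range(total_mb * 1024 // block_kb):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())          # force data to disk, not just cache
        elapsed = time.time() - start
        os.remove(path)
        return total_mb / elapsed

    print(f"{block_write_mb_per_sec():.1f} MB/sec")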
13.7 DB2 UDB for Linux on iSeries
One exciting development has been the release of DB2 UDB V8.1 for Linux on iSeries. The iSeries now
offers customers the choice of an enterprise level database in Linux as well as OS/400.
The choice of which operating environment to use (OS/400 or Linux) will typically be determined by
which database a specific application supports. In some cases (e.g., home-grown applications), both
operating environments are choices to support the new application. Is performance a reason to select
Linux or OS/400 for DB2 UDB workloads?
Initial performance work suggests:
1. If an OLTP application runs well with either of these two data base products, there would not normally
be enough performance difference to make the effort of porting from one to the other worthwhile. The
OS/400-based DB2 product is a bit faster in our measurements, but not enough to make a compelling
difference. Note also that all Linux DB2 performance work to date has used the iSeries virtual storage
capabilities where the Linux storage is managed as objects within OS/400. The virtual storage option is
typically recommended because it allows the Linux partitions to leverage the storage subsystem the
customer has in the OS/400 hosting partition.
2. As the application gains in complexity, it is probably less likely that the application should switch
from one product to the other. Such applications tend to implicitly play to particular design choices of
their current product and there is probably not much to gain from moving them between products.
3. As scalability requirements grow beyond a 4-way, the DB2 on OS/400 product provides proven
scalability that Linux may not match at this time. If functional requirements of the application require
DB2 UDB on Linux and scaling beyond 4 processors, then a partitioned data base and multiple LPARs
should be explored.
See also the IBM eServer Workload Estimator for sizing information and further considerations when
adding the DB2 UDB for Linux on iSeries workload to your environment.
13.8 Linux on iSeries and IBM eServer Workload Estimator
At this writing, the Workload Estimator contains the following workloads for Linux on iSeries:
•  File Serving
•  Web Serving
•  Network Infrastructure (Firewall, DNS/DHCP)
•  Linux DB2 UDB
These contain estimators for the above popular applications, helpful for estimating system requirements.
Consult the latest version of Workload Estimator, including its on-line help text, when specifying a
system containing relevant Linux partitions. The workload estimator can be accessed from a web browser
13.9 Top Tips for Linux on iSeries Performance
Here's a summary of top tips for improving your Linux on iSeries LPAR performance:
•  Keep up to date on OS/400 PTFs for your hosting partition. This is a traditional, but still useful
recommendation. So far, some substantial performance improvements have been delivered in fixes
for Virtual LAN and Virtual Disk in particular.
•  Investigate keeping up to date with your distribution's kernel. Since these are not offered by
IBM, this document cannot make any claims whatever about the value of upgrading the kernel
provided by your Linux distributor. That said, it may be worth your while to investigate and see if
any kernel updates are provided and whether you yourself can determine if they aid your
performance.
•  If possible, compare your Distribution's versions. This is a topic well beyond this paper in any
detail, but in practice fairly simple. A Linux distributor might offer several versions of Linux at any
given moment. Usually, you will wish the latest version, as it should be the fastest. But, if you can
do so, you may wish to compare with the next previous version. This would be especially important
if you have one key piece of open source code largely responsible for the performance of a given
partition. There is no way of ensuring that a new distribution is actually faster than the predecessor
except to test it out. While, formally, no open source product can ever be withdrawn from the
marketplace, actual support (from your distributor or possibly other sources) is always a consideration
in making such a call.
•  Evaluate upgrading to gcc 3 or sticking with 2.95. At this writing, the 3.2 version of gcc and
perhaps later versions are being delivered, but some other version may be more relevant by the time
you read these words. Check with your Linux distributor about when or if they choose to make it
available. With sufficiently strong Linux skills, you might evaluate and perform the upgrade to this
level yourself for some key applications if it helps them. The distribution may also continue to make
2.95 available (largely for functional reasons). Note also that many distributions will distribute only
one compiler. If multiple compilers are shipped with your distribution, and the source isn't dependent
on updated standards, you might have the luxury of deciding which to use.
Avoid "awkward" LPAR sizes. If you are running with shared processors, and your sizing
recommends one Linux partition to have 0.29 CPUs and the other one 0.65 CPUs, check again. You
might be better off running with 0.30 and 0.70 CPUs. The reason this may be beneficial is that your
two partitions would tend to get allocated to one processor most of the time, which should give a little
better utilization of the cache. Otherwise, you may get some other partition using the processor
sometimes and/or your partitions may more frequently migrate to other processors. Similarly, on a
very large machine (e.g. an 890), the overall limit of 32 partitions on the one hand and the larger
number of processors on the other begins to make shared processors less interesting as a strategy.
•  Use IBM's JVM, not the default Java typically provided. IBM's PowerPC Java for Linux is now
present on most distributions or it might be obtained in various ways from IBM. For both function
and performance, the IBM Java should be superior for virtually all uses. On at least one distribution,
deselecting the default Java and selecting IBM's Java made IBM's Java the default. In other cases,
you might have to set the PATH and CLASSPATH environment variables to put IBM's Java ahead of
the one shipped with most distributions.
•  For Web Serving, investigate khttpd. There is a kernel extension, khttpd, which can be used to
serve "static" web pages and still use Apache for the remaining dynamic functionality. Doing so
ordinarily improves performance.
•  Keep your Linux partitions to a 4-way or less if possible. There will be applications that can
handle larger CPU counts in Linux, and this is improving as new kernels roll out (up to 8-way is now
possible). Still, Linux scaling remains inferior to OS/400 overall. In many cases, Linux will run
middleware function which can be readily split up and run in multiple partitions.
•  Make sure you have enough storage in the machine pool to run your Virtual Disk function.
Often, an added 512 MB is ample, and it can be less. In addition, make sure you have enough CPU to
handle your requirements as well. These are often very nominal (often, less than a full CPU for a
fairly large Linux partition, such as a 4-way), but they do need to be covered. Keep some reserve CPU
capacity in the OS/400 partition to avoid being "locked out" of the CPU while Linux waits for Virtual
Disk and LAN function.
•  Make sure you have some "headroom" in your OS/400 hosting partition for Virtual I/O. A rule
of thumb would be 0.1 CPUs in the host for every CPU in a Linux partition, presuming it uses a
substantial amount of Virtual I/O. This is probably on the high side, but it is important to have
something left over. If the hosting partition uses all its CPU, Virtual I/O may slow substantially.
•  Use Virtual LAN for connections between iSeries partitions, whether OS/400 or Linux. If your
OS/400 PTFs are up to date, it performs roughly on a par with gigabit ethernet and has zero hardware
cost, no switches and wires, etc.
•  Use Virtual Disk for disk function. Because virtual disk is spread ("striped") amongst all the disks
on an OS/400 ASP, virtual disk will ordinarily be faster. Moreover, with available features like
mirroring and RAID-5, the data is also protected much better than on a single disk. Certainly, the
equivalent function can be built with Linux, but it is much more complex (especially if both RAID-5
and striping are desired). A virtual disk gives the advantages of both RAID-5 and data "striping" and
yet it looks like an ordinary, single hard file to Linux.
•  Use Hardware Multithreading if available. While this will not always work, Hardware
multithreading (a global parameter set for all partitions) will ordinarily improve performance by 10 to
25 per cent. Make sure that it profits all important partitions, not just the current one under study,
however. Note that some models cannot run with QPRCMLTTSK set to one ("on") and for the
models 825, 870, and 890, it is not applicable.
•  Use Shared Processors, especially to support consolidation. There is a global cost of about 8 per
cent (sometimes less) for using the Shared Processors facility. This is a general Hypervisor overhead.
While this overhead is not always visible, it should be planned for as it is a normal and expected
result. After paying this penalty, however, you can often consolidate several existing Linux servers
with low utilization into a single iSeries box with a suitable partition strategy. Moreover, the Virtual
LAN and Virtual Disk provide further performance, functional, and cost leverage to support such
uses. Remember that some models do not support Shared Processors.
•  Use spread_lpevents=n when using multiple Virtual Processors from a Shared Processor Pool.
This kernel parameter causes processor interrupts for your Linux partition to be spread across n
processors. Workloads that experience a high number of processor interrupts may benefit when using
this parameter. See the Redbooks or manuals for how to set kernel parameters at boot time.
•  Avoid Shared Processors when their benefits are absent. Especially as larger iSeries boxes are
used (larger in terms of CPU count), the benefits of consolidation may often be present without using
Shared Processors and its expected overhead penalty. After all, with 16 or more processors, adding
or subtracting a processor is now less than 10 per cent of the overall capacity of the box. Similarly,
boxes lacking Shared Processor capability may still manage to fit particular consolidation
circumstances very well and this should not be overlooked.
•  Watch your "MTU" sizes on LANs (see the sketch following this tip). Normally, they are set up
correctly, but it is possible to mismatch the MTU (maximum transmission unit) sizes for OS/400 and
Linux, whether Virtual or Native LAN.
mismatch the MTU (transmission unit) sizes for OS/400 and Linux whether Virtual or Native LAN.
For Virtual LAN, both sides should be 9000. For 100 megabit Native, they should be 1500. These
are the values seen in ifconfig under Linux. On OS/400, for historical reasons, the correct values are
8996 and 1496 respectively and tend to be called "frame size." If OS/400 says 1496 and Linux says
1500, they are identical. Also, when looking at the OS/400 line description, make sure the "Source
Service Access Point" for code AA is also the same as the frame size value. The others aren't critical.
While it is certain that the frame sizes on the same device should be identical, it may also be
profitable to have all the sizes match. In particular, testing may show that virtual LAN should be
changed to 1500/1496 due to end-to-end considerations on critical network paths involving both
Native and Virtual LAN (e.g. from outside the box on Native LAN, through the partition with the
Native LAN, and then moving to a second partition via Virtual LAN then to another).
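A minimal sketch of that frame-size sanity check, encoding the 4-byte offset described above (the values
are the ones from this tip, not queried from a live system):

    # OS/400 reports "frame size" (8996/1496); Linux ifconfig reports MTU (9000/1500).
    # Per the tip above, the two differ by 4 bytes when they actually match.
    def frames_match(os400_frame_size, linux_mtu):
        return os400_frame_size + 4 == linux_mtu

    print(frames_match(8996, 9000))   # True  -- Virtual LAN
    print(frames_match(1496, 1500))   # True  -- 100 megabit Native LAN
    print(frames_match(8996, 1500))   # False -- mismatched configuration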
Chapter 14. DASD Performance
This chapter discusses DASD subsystems available for the System i platform.
There are two separate considerations. Before IBM i operating system V6R1, one only had to
consider particular devices, IOAs, IOPs, and SAN devices. All attached directly to the IBM i
operating system through similar strategies, and all were supported natively.
Starting in IBM iV6R1, however, IBM i operating system will be permitted to become a virtual
client of an IBM product known as VIOS. The supported BladeCenter products like the JS12
Express and JS22 Express will only be available in this fashion. For other IBM Power Systems
it will be possible to attach all or some of the disks in this manner. This product and its
implications will be discussed commencing with section 14.5.
14.1 Internal (Native) Attachment.
This section is intended to show relative performance differences in disk controllers (which I
will refer to as IOAs), DASD and IOPs, for customers to compare some of the available
hardware. The workload used for our throughput measurements should not be used to gauge the
workload capabilities of your system, since it is not a customer-like workload.
The workload is designed to concentrate more on DASD, IOAs and IOPs, not the system as a
whole. Workload throughput is not a measurement of operations per second but an activity
counter in the workload itself. No LPARs were used; all system resources were dedicated to the
testing. The workload is batch and I/O intensive (small block reads and writes).
This chapter refers to disk drives and disk controllers (IOAs) using their CCIN number/code.
The CCIN is what the system uses to understand what components are installed; it is unique to
each device and is a four character, alphanumeric code. When you use commands in the IBM i
operating system to print your system configuration, like PRTSYSINF, or use the WRKHDWRSC
*STG command to display hardware configuration information for storage devices like the
571E or 571F disk controllers, you see a listing of CCIN codes.
Note that the feature codes used in IBM's ordering system, e-config tool and inventory records
are a four character numeric code which may or may not match the CCIN. IBM will sometimes
use different features for the exact same physical device in order to communicate how the
hardware is configured to the e-config tool or to provide packaging or pricing structures for the
disk drive or IOA. For example, feature codes 5738 and 5777 both identify a 571E IOA. A
fairly complete list of CCIN and their feature codes can be found in an appendix of the System
Builder located near the end of the publication, and a partial list can be found on the following
page.
14.1.0 Direct Attach (Native)
14.1.1 Hardware Characteristics
14.1.1.1 Devices & Controllers
CCIN     Approximate    RPM     Seek Time (ms)      Latency    Max Drive Interface Speed (MB/s) when
Codes    Size (GB)              Read     Write      (ms)       mounted in a given enclosure
                                                               5074/5079       5094/5294       5786/5787
6718     18             10K     4.9      5.9        3          80              80              NA
6719     35             10K     4.7      5.3        3          80              80              NA
4326     35             15K     3.6      4.0        2          Not Supported   160             320
4327     70             15K     3.6      4.0        2          Not Supported   160             320
4328     140            15K     3.6      4.0        2          Not Supported   160             320
4329     280            15K     3.6      4.0        2          Not Supported   160             320
433B     70             15K     3.5      4.0        2          N/A             N/A             N/A
433C     140            15K     3.5      4.0        2          N/A             N/A             N/A
433D     280            15K     3.5      4.0        2          N/A             N/A             N/A
CCIN Codes    Feature Codes               Cache non-compressed           Min/Max # of     Max Drive Interface
(IOA)                                     / up to compressed             drives in a      Speed supported
                                                                         RAID set         (MB/s)
5702          5705, 5712, 5715, 0624      NA                             NA               160
5703          5703                        40 MB                          3/18             160
2757          5581, 2757, 5591            235 MB / up to 757 MB          3/18             320
2780          5580, 2780, 5590            235 MB write/up to 757 MB,     3/18             320
                                          256 MB read/up to 1 GB
5709 (write   5709, 5726, 9509            16 MB                          3/8              NA
cache card for
built in IOA)
573D (write   5727, 5728, 9510            40 MB                          3/8              NA
cache card for
built in IOA)
57B8 (aux     5679                        175 MB                         3/18 RAID5       300
cache card                                                               4/18 RAID6
57B7)
571A          5736, 5775, 0647            NA                             NA               320
571B          5737, 5776, 0648            90 MB                          3/18 RAID5       320
                                                                         4/18 RAID6
571E/574F     5738, 5777, 5582, 5583      390 MB write/up to 1.5 GB,     3/18 RAID5       320
                                          415 MB read/up to 1.6 GB       4/18 RAID6
571F/575B     5739, 5778, 5781, 5782,     390 MB write/up to 1.5 GB,     3/18 RAID5       320
              5799, 5800                  415 MB read/up to 1.6 GB       4/18 RAID6
572C          572C                        NA                             NA               300
572A          572A                        NA                             NA               300
Note: The actual drive interface speed (MB/s) is the minimum of the maximum supported speeds of the
drive, the enclosure and the IOA. The minimum value for the various drive and enclosure combinations
is identified in the table above.
Not all disk enclosures support the maximum number of disks in a RAID set.
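As a rough worked example of what the seek and latency columns imply -- a simplistic upper bound that
ignores transfer time, caching, and queuing:

    # Approximate random read ops/sec for a single arm: 1000 / (seek + latency).
    def random_ops_per_sec(avg_seek_ms, avg_latency_ms):
        return 1000.0 / (avg_seek_ms + avg_latency_ms)

    print(f"{random_ops_per_sec(4.9, 3.0):.0f} ops/sec")   # 6718, 10K RPM -> ~127
    print(f"{random_ops_per_sec(3.6, 2.0):.0f} ops/sec")   # 4326, 15K RPM -> ~179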
14.1.2 iV5R2 Direct Attach DASD
This section discusses the direct attach DASD subsystem performance improvements that were
new with the iV5R2 release. These consist of the following new hardware and software
offerings:
•  2757 SCSI PCI RAID Disk Unit Controller (IOA)
•  2780 SCSI PCI RAID Disk Unit Controller (IOA)
•  2844 PCI Node I/O Processor (IOP)
•  4326 35 GB 15K RPM DASD
•  4327 70 GB 15K RPM DASD
Note: for more information on the older IOAs and IOPs listed here see a previous copy of the
Performance Capabilities Reference.
14.1.2.1 I/O Intensive Workload Performance Comparison
[Chart: Compare 2778/2757 & 5074 vs 5094. Configurations: 2778/6718 5074 tower 18GB 10K rpm;
2757/6718 5074 tower 18GB 10K rpm; 2778/6719 5074 tower 35GB 10K rpm; 2757/6719 5074 tower
35GB 10K rpm; 2757/6719 5094 tower 35GB 10K rpm; 2757/4326 5094 tower 35GB 15K rpm;
2757/4327 5094 tower 70GB 15K rpm; 2757/4328 5094 tower 140GB 15K rpm. X-axis: Workload
Throughput, 0 to 7000; the 40% DASD Subsystem Utilization point is marked.]
For our workload we attempt to fill the DASD units to between 40 and 50% full so you are
comparing units with more actual data, but trying to keep the relative seek distances similar. The
reason is that larger capacity drives can appear to be faster than lower capacity drives in the
same environment running the same workload in the same size database. That perceived
improvement can disappear, or even reverse depending upon the workload (primarily because of
where on the disks the data is physically located).
14.1.2.2 I/O Intensive Workload
[Chart: 2778 IOA vs 2757 IOA, 15 RAID DASD. Configurations: 2778/6718 10K RPM; 2778/6719
10K RPM; 2757/6718 10K RPM; 2757/6719 10K RPM; 2757/4326 15K RPM; 2757/4327 15K RPM;
2757/4328 15K RPM. X-axis: Ops/Sec, 0 to 2000.]
IOA and operation        Number of 35 GB DASD units (Measurement numbers in GB/HR)
                         15 Units    30 Units    45 Units
2778 IOA  Save *SAVF     41          83          122
          Restore        41          83          122
2757 IOA  Save *SAVF     82          165         250
          Restore        82          165         250
This restrictive test is intended to show the effect of the 2757 IOAs in a backup and recovery
environment. The save and restore operations to *SAVF (save files) were done on the same set
of DASD, meaning we were reading from and writing to the same 15, 30, and 45 DASD units at
the same time, so the number of DASD I/O operations is double when saving to *SAVF.
This was not meant to show what can be expected from a backup environment; see Chapter 15
for save and restore device information.
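A quick check of how the table numbers scale, using the GB/HR values above:

    # Rates from the table above, keyed by number of 35 GB DASD units.
    gb_per_hr_2778 = {15: 41, 30: 83, 45: 122}
    gb_per_hr_2757 = {15: 82, 30: 165, 45: 250}

    for units in (15, 30, 45):
        ratio = gb_per_hr_2757[units] / gb_per_hr_2778[units]
        print(f"{units} units: 2757 is {ratio:.1f}x the 2778 rate")   # ~2.0x each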
14.1.3 571B
iV5R4 offers two new options on DASD configuration:
•  RAID6, which offers improved system protection on supported IOAs. (RAID6 is supported under
iV5R3, but we have chosen to look at performance data on an iV5R4 system.)
•  IOPLess operation on supported IOAs.
14.1.3.1 571B RAID5 vs RAID6 - 10 15K 35GB DASD
[Chart: RAID-5 vs RAID-6 on the 571B. Configurations: 571B RAID-5; 571B RAID-6. X-axis:
Workload Throughput, 0 to 3500.]
14.1.3.2 571B IOP vs IOPLESS - 10 15K 35GB DASD
The system CPU % used with and without an IOP was basically the same for the 571B with our
workload tests.
[Chart: IOP vs IOPLess on the 571B. Configurations: 571B RAID-5 IOP; 571B RAID-5 IOPLess.
X-axis: Workload Throughput, 0 to 3500.]
14.1.4 571B, 5709, 573D, 5703, 2780 IOA Comparison Chart
In the following two charts we are modeling a System i 520 with a 573D IOA using RAID5,
comparing 3 70GB 15K RPM DASD to 4 70GB 15K RPM DASD. The 520 is capable of
holding up to 8 DASD but many of our smaller customers do not need the storage. The charts
try to point out that there may be performance considerations even when the space isn’t needed.
14.1.4.1
[Chart: 573D with 3 RAID5 DASD vs 573D with 4 RAID5 DASD. X-axis: Workload Throughput,
0 to 800.]
14.1.4.2
[Chart: 573D with 3 RAID5 DASD vs 573D with 4 RAID5 DASD. X-axis: Ops/Sec, 0 to 500.]
The charts below are an attempt to allow the different IOAs available to be compared on a single
chart. An I/O Intensive Workload was used for our throughput measurements. The system used
was a 520 model with a single 5094 attached which contained the IOAs for the measurements.
Note: the 5709 and 573D are cache cards for the built in IOA in the 520/550/570 CECs, even
though I show them in the following chart as if they were the IOA. The 5709 had 8 15K 35GB
DASD units and the 573D had 8 15K 70 GB DASD units, the maximum DASD allowed on the built
in IOAs.
The other IOAs used 10 - 15K 35GB DASD units. Again this is all for relative comparison
purposes as the 571B only supports 10 DASD units in a 5094 enclosure and a maximum of 12
DASD units in a 5095 enclosure. The 2757 and 2780 can support up to 18 DASD units with the
same performance characteristics as they display with the 10 DASD units, so when you are
considering the right IOA for your environment remember to take into account your capacity
needs along with this performance information.
14.1.4.3
[Chart: IOA comparison -- 571B RAID-5 IOP; 571B RAID-6; 573D RAID-5; 2780 RAID-5; 571B
RAID-5 IOPLess; 5709 RAID-5; 5703 RAID-5. X-axis: Workload Throughput, 0 to 5000.]
14.1.4.4
[Chart: IOA comparison -- 571B RAID-5 IOP; 571B RAID-6; 573D RAID-5; 2780 RAID-5; 571B
RAID-5 IOPLess; 5709 RAID-5; 5703 RAID-5. X-axis: Ops/Sec, 0 to 1400.]
14.1.5 Comparing Current 2780/574F with the new 571E/574F and 571F/575B
NOTE: iV5R3 has support for the features in this section, but all of our performance measurements
were done on iV5R4 systems. For information on the supported features see the IBM
Product Announcement Letters.
A model 570 4 way system with 48 GB of mainstore memory was used for the following. In
comparing the new 571E/574F and 571F/575B with the current 2780/574F IOA, the larger read
and write cache available on the new IOAs can have a very positive effect on workloads.
Remember this workload is used to get a general comparison between new and current hardware
and cannot predict what will happen with all workloads.
Also note the 571E/574F requires the auxiliary cache card to turn on RAID and the 571F/575B
has the function included in its double-wide card packaging for better system protection.
The general results are intended to help customers gauge what might happen in
their environments.
14.1.5.1
[Chart: 2780/571E/571F with 15 DASD, RAID5. Configurations: 2780/574F (15 DASD) IOP RAID-5;
571E/574F (15 DASD) IOPLess RAID-5; 571F/575B (15 DASD) IOPLess RAID-5. X-axis: Workload
Throughput, 0 to 10000.]
14.1.5.2
[Chart: 2780/571E/571F with 15 DASD, RAID5. Configurations: 2780/574F (15 DASD) IOP RAID-5;
571E/574F (15 DASD) IOPLess RAID-5; 571F/575B (15 DASD) IOPLess RAID-5. X-axis: Ops/Sec,
0 to 4,000.]
14.1.6 Comparing 571E/574F and 571F/575B IOP and IOPLess
In comparing IOP and IOPLess runs we did not see any significant differences, including the
system CPU used. The system we used was a model 570 4 way, on the IOP run the system CPU
was 11.6% and on the IOPLess run the system CPU was 11.5%. The 571E/574F and
571F/575B display similar characteristics when comparing IOP and IOPLess environments, so
we have chosen to display results from only the 571E/574F.
14.1.6.1
[Chart: IOP compared to IOPLess. Configurations: 571E/574F (15 35 GB DASD) IOP RAID-5;
571E/574F (15 35 GB DASD) IOPLess RAID-5. X-axis: Workload Throughput, 0 to 10000.]
14.1.6.2
[Chart: IOP compared to IOPLess. Configurations: 571E/574F (15 35 GB DASD) IOP RAID-5;
571E/574F (15 35 GB DASD) IOPLess RAID-5. X-axis: Ops/Sec, 0 to 4000.]
14.1.7 Comparing 571E/574F and 571F/575B RAID5 and RAID6 and Mirroring
With RAID5, RAID6 and Mirroring, we are interested in weighing the strength of failure protection
against storage capacity and the performance impacts to the system workloads.
A model 570 4 way system with 48 GB of mainstore memory was used for the following. First,
comparing characteristics of RAID5 and RAID6: a customer can use Operations Navigator to
better control the number of DASD in a RAID set, but for this testing we signed on at DST and
used the defaults available to turn on our protection schemes. When turning on RAID5 the system
configured two RAID sets under our IOA, one with 9 DASD and one with 6 DASD, with a total
disk capacity of 457 GB. For RAID6 the system created one RAID set with 15 DASD and a
capacity of 456 GB. This would generally be true for most customer configurations.
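The capacity arithmetic behind those numbers, sketched with nominal 35 GB drive sizes (actual
formatted capacities are slightly larger, which is why the text reports 457 and 456 GB):

    # RAID-5 keeps one parity drive per set; RAID-6 keeps two per set.
    def raid5_capacity_gb(set_sizes, drive_gb=35):
        return sum((n - 1) * drive_gb for n in set_sizes)

    def raid6_capacity_gb(drives, drive_gb=35):
        return (drives - 2) * drive_gb

    print(raid5_capacity_gb([9, 6]))   # (8 + 5) * 35 = 455 GB, two RAID-5 sets
    print(raid6_capacity_gb(15))       # 13 * 35      = 455 GB, one RAID-6 set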
As you look at our run information you will notice that the performance boundaries of RAID6 on
the 571E/574F are about the same as the performance boundaries of our 2780/574F configured
using RAID5, so better protection could be achieved at current performance levels.
Another point of interest is that, as long as a system is not pushing the boundaries, performance is
similar in both the RAID5 and RAID6 environments. RAID6 is overwhelmed more quickly than
RAID5, so if RAID6 is desired for protection and the system workloads are approaching the
boundaries, DASD and IOAs may need to be added to the system to achieve the desired
performance levels. NOTE: If customers need protection greater than RAID5 provides, it might be
worth considering the IOA level mirroring information on the following page.
14.1.7.1
[Chart: RAID-5 compared to RAID-6. Configurations: 2780/574F (15 35 GB DASD) IOP RAID-5;
571E/574F (15 35 GB DASD) IOPLess RAID-5; 571E/574F (15 35 GB DASD) IOPLess RAID-6.
X-axis: Workload Throughput, 0 to 10000.]
14.1.7.2
[Chart: RAID-5 compared to RAID-6. Configurations: 2780/574F (15 35 GB DASD) IOP RAID-5;
571E/574F (15 35 GB DASD) IOPLess RAID-5; 571E/574F (15 35 GB DASD) IOPLess RAID-6.
X-axis: Ops/Sec, 0 to 4000.]
In comparing Mirroring and RAID one of the concerns is capacity differences and the hardware
needed. We tried to create an environment where the capacity was the same in both
environments. To do this we built the same size database on “15 35GB DASD using RAID5”
and “14 70GB DASD using Mirroring spread across 2 IOAs”. The protection in the Mirrored
environment is better but it also has the cost of an extra IOA in this low number DASD
environment. For the 30 DASD and 60 DASD environments the number of IOAs needed is
equal.
14.1.7.3
[Chart: RAID-5 compared to Mirroring. Configurations: 571E/574F (15 35 GB DASD) IOPLess
RAID-5; 571E (14 70 GB DASD) IOPLess, 7 Mirrored Pairs, 2 IOAs; 571E (30 DASD) IOPLess,
15 Mirrored Pairs, 2 IOAs; 571E/574F (30 DASD) IOPLess RAID-5, 2 IOAs; 571E (60 DASD)
IOPLess, 30 Mirrored Pairs, 4 IOAs; 571E/574F (60 DASD) IOPLess RAID-5, 4 IOAs. X-axis:
Workload Throughput, 0 to 40000.]
14.1.7.4
[Chart: RAID-5 compared to Mirroring. Configurations: 571E/574F (15 35 GB DASD) IOPLess
RAID-5; 571E (14 70 GB DASD) IOPLess, 7 Mirrored Pairs, 2 IOAs; 571E (30 DASD) IOPLess,
15 Mirrored Pairs, 2 IOAs; 571E/574F (30 DASD) IOPLess RAID-5, 2 IOAs; 571E (60 DASD)
IOPLess, 30 Mirrored Pairs, 4 IOAs; 571E/574F (60 DASD) IOPLess RAID-5, 4 IOAs. X-axis:
Ops/Sec, 0 to 10000.]
14.1.8 Performance Limits on the 571F/575B
In the following charts we try to characterize the 571F/575B in different DASD configurations.
The 15 DASD experiment is used to give a comparison point with the DASD experiments in
charts 14.1.5.1 and 14.1.5.2. The 18, 24 and 36 DASD configurations are used to help in the
discussion of performance vs capacity.
Our DASD IO workload scaled well from 15 DASD to 36 DASD on a single 571F/575B.
14.1.8.1
[Chart: 571F/575B Scaling. Configurations: 571F/575B (15 DASD) IOPLess RAID-5; 571F/575B
(1 IOA, 18 DASD, 3 cages each off 1 571F port) IOPLess RAID-5; 571F/575B (24 DASD) IOPLess
RAID-5; 571F/575B (36 DASD, 1 IOA, 12 DASD off each 571F port) IOPLess RAID-5; 571F/575B
(36 DASD, 2 IOAs, 6 DASD off each 571F port) IOPLess RAID-5. X-axis: Workload Throughput,
0 to 20000.]
14.1.8.2
[Chart: 571F/575B Scaling. Configurations: 571F/575B (15 DASD) IOPLess RAID-5; 571F/575B
(1 IOA, 18 DASD, 3 cages each off 1 571F port) IOPLess RAID-5; 571F/575B (24 DASD) IOPLess
RAID-5; 571F/575B (36 DASD, 1 IOA, 12 DASD off each 571F port) IOPLess RAID-5; 571F/575B
(36 DASD, 2 IOAs, 6 DASD off each 571F port) IOPLess RAID-5. X-axis: Ops/Sec, 0 to 6,000.]
14.1.9 Investigating 571E/574F and 571F/575B IOA, Bus and HSL limitations.
With the new DASD controllers and IOPLess capabilities, IBM has created many new options
for our customers. Customers who needed more storage in their smaller configurations can now
grow. With the ability to add more storage into an HSL loop, capacity and performance have
the potential to grow. In the past a single HSL loop only allowed 6 5094 towers with 45 DASD
per tower, giving a loop a capacity of 270 DASD; with the new DASD controllers that capacity
has grown to 918 DASD. With the new configurations, 500 and even 600 DASD can make
better use of the HSL loop's potential, as opposed to the previous limit of 270 DASD. Customer
environments are unique, and these options will allow our customers to look at their space,
performance, and capacity needs in new ways.
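The loop arithmetic, restated as a sketch (45 native DASD per 5094 tower, plus 108 more per tower via
571F/575B IOAs and #5786 EXP24 drawers, as described in this section):

    # DASD capacity of one HSL loop, before and after the new controllers.
    towers_per_loop = 6
    native_dasd_per_tower = 45      # 5094 tower, direct attach
    exp24_dasd_per_tower = 108      # added via 571F/575B and #5786 EXP24 drawers

    old_limit = towers_per_loop * native_dasd_per_tower
    new_limit = towers_per_loop * (native_dasd_per_tower + exp24_dasd_per_tower)
    print(old_limit, new_limit)     # 270 918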
With the ability to attach so much more DASD to existing towers, we want to characterize
where possible bottlenecks might exist. The first limits are the IOAs, and we have attempted to
characterize the 571E/574F and 571F/575B in RAID and mirroring environments. The next
limit will be the buses in a single tower. We used concurrent large-file RSTLIB operations
from multiple virtual tape drives located on the DASD in the target HSL loop to help
characterize the bus and HSL limits. The tower is by itself in a single HSL loop, with all the
DASD configured into a single user ASP and RAID5 activated on the IOAs.
As the scenarios progress, 2, then 3, and finally up to 6 towers are added to the HSL loop. All 6
have 3 571E/574Fs controlling the 45 DASD in the 5094 towers and 3 571F/575B IOAs
controlling 108 DASD in #5786 EXP24 Disk Drawers. Multiple virtual tape drives were created
in the user ASP. The 3 other HSL loops contained the system ASP to which the data was
written. We used three HSL loops to prevent the destination ASP from being the bottleneck.
The system was a 570 16-way with 256 GB of memory, and the originating ASP contained 916
DASD units on 571E/574F and 571F/575B IOAs. Restoring from the virtual tape created runs
of 100% reads from the ASP on the single loop. The charts show the maximum throughput we
were able to achieve from this workload.
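For illustration, each restore stream was of this general form; the library and virtual tape device
names here are hypothetical, not the ones used in the lab:

    RSTLIB SAVLIB(BIGLIB01) DEV(VRTTAP01)

Running several such commands concurrently, one per virtual tape drive, drives the 100% read
load described above.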
NOTE: This is a DASD only workload. No other IOAs such as communication IOAs were
present.
14.1.9.1 Large Block READs on a Single 5094 Tower in an HSL Loop
[Chart: maximum large-block read throughput for a single 5094 tower in an HSL loop.]
14.1.9.2 Large Block READs on Multiple 5094 Towers in a Single HSL Loop
[Chart: maximum large-block read throughput as 5094 towers are added to a single HSL loop.]
14.1.10 Direct Attach 571E/574F and 571F/575B Observations
We did some simple comparison measurements to provide graphical examples from which
customers can observe characteristics of the new hardware. We collected performance data
using Collection Services and Performance Explorer to create our graphs after running our
DASD IO workload (small block reads and writes).
IOP vs. IOPLess: no measurable difference in CPU or throughput.
Newer models of DASD are U320 capable and, with the new IOAs, can improve workload
throughput with the same number of DASD, or even fewer in some workload situations.
The 571E/574F and 571F/575B IOAs achieved up to 25% better throughput at the 40% DASD
subsystem utilization point than the 2780/574F IOA. The 571E/574F and 2780/574F were
measured with 15 DASD units in a 5094 enclosure. The 571F/575B IOAs attached to #5786
EXP24 Disk Drawers.
System Models and Enclosures: Although an enclosure supports the new DASD or new IOA,
you must ensure the system is configured optimally to achieve the increased performance
documented above. This is because some card slots or backplanes may only support the PCI
protocol, versus the PCI-X protocol. Your performance can vary significantly from our
documentation depending upon device placement. For more information on card placement
rules see the following links:
IBM i operating system iV5R2: http://www.redbooks.ibm.com/redpapers/pdfs/redp3638.pdf
IBM i operating system iV5R3, iV5R4:
Conclusions: There can be great benefits in upgrading to new hardware, depending upon the
system workload. Most DASD-intense workloads should benefit from the new IOAs available.
Large block operations will greatly benefit from the 5094/5294 feature code #6417/9517
enclosures in combination with the new IOAs and DASD units.
Note: The #6417/9517 provides a faster HSL-2 interface compared to the #2887/9877 and is
available for I/O attached to POWER5-based systems.
14.2 New in iV5R4M5
14.2.1 9406-MMA CEC vs 9406-570 CEC DASD
[Chart: the following configurations plotted against Workload Throughput: 9406-MMA 4-way, 6 433B 70 GB DASD mirrored "No Cache"; 9406-570 4-way, 6 4327 70 GB DASD mirrored "No Cache"; 9406-570 4-way, 6 4327 70 GB DASD mirrored "With Cache".]
[Chart: the same three configurations plotted against Ops/Sec.]
14.2.2 RAID Hot Spare
[Chart: the following configurations plotted against Workload Throughput: 9406-570 4-way, 24 4328 140 GB RAID5, 24 active; RAID5, 22 active, 2 hot spares; RAID6, 24 active; RAID6, 22 active, 2 hot spares.]
For the following test, the IO workload was set up to run for 14 hours. About 5 hours after
starting, a DASD was pulled from the configuration. This forced a RAID set rebuild.
[Chart: 570 4-way, 24 4328 140 GB DASD, RAID5 with 2 hot spares, plotted against Runtime in Hours.]
14.2.3 12X Loop Testing
[Chart: 12X loop testing from 1 571F to 8 571F IOAs, with 36 DASD off each 571F, plotted against Number of 571F IOAs.]
A 9406-MMA 8-way system with 96 GB of mainstore was used, with 396 DASD in #5786
EXP24 Disk Drawers on 3 12X loops for the system ASP. ASP 2 was created on a 4th 12X loop
by adding 5796 system expansion units with 571F IOAs attaching 36 4327 70 GB DASD in
#5786 EXP24 Disk Drawers with RAID5 turned on. A virtual tape drive was created in ASP 2,
and a 320 GB file was saved to the tape drive for this test.
When the testing was completed up to 288 DASD on 8 IOAs, the 12X loop was moved to the
other 12X GX adapter in the CEC and the test was run again; there was no difference in the
results between the two loops. The 12X loop is rated for more throughput than this DASD
configuration could drive, so the test is not a full characterization of the 12X loop's capabilities,
only a statement of support for the maximum number of 571F IOAs allowed in the loop.
14.3 New in iV6R1M0
14.3.1 Encrypted ASP
More CPU and memory may be needed to achieve the same performance once encryption is
enabled.
[Chart: Non Encrypted ASP vs. Encrypted ASP; 9406-MMA 4-way, 571F with 24 DASD, plotted against DASD/IO Workload Throughput.]
[Chart: the same two configurations plotted against Ops/sec.]
[Chart: the same two configurations plotted against Workload Throughput.]
14.3.2 57B8/57B7 IOA
With the addition of the POWER6 520 and 550 systems comes the new 57B8/57B7 SAS RAID
Enablement Controller with Auxiliary Write Cache. This controller is only available in the
POWER6 520 and 550 systems and provides RAID5/6 capabilities with a 175 MB redundant
write cache. Below are some charts comparing the storage controllers for the POWER5 570
(573D), which can be either mirrored or RAID5 protected; the POWER6 570 (572C), which can
only be mirrored; and the POWER6 520/550 (57B8/57B7), which can be RAID5/6 or mirror
protected.
[Chart: the following configurations plotted against Workload Throughput: V5R4M5 9406-570 573D, 6 DASD mirrored (No Cache); V6R1M0 9406-MMA 572C, 6 DASD mirrored (No Cache); V6R1M0 9406 57B8, 6 DASD mirrored (Cache); V6R1M0 9406 57B8, 6 DASD RAID5 (Cache).]
[Chart: the same four configurations plotted against Ops/Sec.]
The POWER6 520 and 550 also have an external SAS port, controlled by the 57B8/57B7, that is
used to connect a single #5886 EXP 12S SAS Disk Drawer, which can contain up to 12 SAS
DASD. Below are charts showing the addition of the #5886 EXP 12S SAS Disk Drawer.
[Chart: POWER6 520 57B8/57B7 with 6 RAID5 DASD in the CEC plus 12 RAID5 DASD in an EXP 12S SAS Disk Drawer, vs. 6 RAID5 DASD in the CEC alone, plotted against Workload Throughput.]
[Chart: the same two configurations plotted against Ops/Sec.]
14.3.3 572A IOA
The 572A IOA is a SAS IOA that is mainly used for SAS tape attachment, but the 5886 EXP
12S SAS Disk Drawer can also be attached. Performance will be poor, as the IOA does not have
any cache. The following charts show the performance characteristics observed during
experiments in the Rochester lab. If storage space is all that is needed, then the 5886 EXP 12S
SAS Disk Drawer on a 572A could be an option.
[Chart: the following configurations plotted against DASD/IO Workload Throughput: 9406-MMA 4-way, 571F with 24 RAID5 DASD in a 5786 EXP24 Disk Drawer; 9406-MMA 4-way, 572A with 24 mirrored DASD in 5886 EXP12S SAS Disk Drawers; 9406 p5 570 4-way, 2780 IOA with 15 RAID5 DASD.]
[Chart: the same three configurations plotted against Ops/sec.]
14.4 SAN - Storage Area Network (External)
There are many factors to consider when looking at external storage options. You can get more
information through your IBM representative and the white papers that are available at the
following location.
14.5 iV6R1M0 -- VIOS and IVM Considerations
Beginning in iV6R1M0, the IBM i operating system will participate in a new virtualization
strategy by becoming a client of the VIOS product. Customers will view the VIOS product in
two different ways:
• On blade products, through the regular configuration tool IVM (which includes an
easy-to-use interface to VIOS).
• On traditional (non-blade) products, through a combination of HMC and the VIOS command
line.
The blade products have a simpler interface which, in our testing, appears to be sufficient for the
environments involved. On blades, customers are restricted to a single set of 16 logical units
(which the IBM i operating system perceives as if they were physical drives). This substantially
reduces the number and value of tuning options. It is possible for blade-based customers to use
the VIOS command line. However, we did not discover the need to do so and do not think most
customers will need to either. The tuning available from IVM proved sufficient and should be
preferred for its ease of use when it is workable.
Customers should strongly consider their disk requirements here and consult with their support
teams before ordering. Customers with more sophisticated disk-based requirements (or, simply,
larger numbers of disks) should choose systems that allow a greater number of LUNs and
thereby enable the more substantial tuning options provided by the VIOS command line. No
hard and fast rules are possible here, and we again emphasize that customers consult with their
support team on what will work for them. However, as a broad rule of thumb, customers with
200 or more physical drives very likely need something beyond the 16 LUNs provided by the
IVM environment. Customers below 100 physical disks can, in most cases, get by with IVM.
Customers with 50 or fewer very likely will do just fine with IVM.
14.5.1 General VIOS Considerations
14.5.1.1 Generic Concepts
520 versus 512. Long-time IBM i operating system users know that IBM i operating system
disks are traditionally configured with 520 byte sectors. The extra eight bytes beyond the 512
used for data are used for various purposes by Single Level Store.
For a variety of reasons, VIOS will always surface 512 byte sectors to the IBM i operating
system, whatever the actual sector size of the disk may be. This means that 520 byte sectors
must be emulated within 512 byte sectors when using disks supported by VIOS. This is done,
simply enough, by reading nine 512 byte sectors for every eight sectors of actual data and
placing the Single Level Store information within the extra sector. Since all disk operations are
controlled by Single Level Store in an IBM i operating system, there are no added security
implications from this extra sector, provided standard, sensible configuration practices are
followed just as they would be for regular 520 byte devices.
However, reading nine sectors when only eight contain data will cost some performance, most of
it being the sheer cost of transferring the extra sector. The gains are the standard ones of
virtualization -- one might be able to share or re-purpose existing hardware for System i's use in
various ways.
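To put a rough bound on that cost (a simple calculation, not a measurement): every eight sectors
of user data (8 x 512 = 4,096 bytes) travel in nine physical sectors (9 x 512 = 4,608 bytes), so
the emulation adds 512 / 4,096 = 12.5% to the raw bytes transferred, before any other overheads.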
Note carefully that some "512" byte sectored devices actually have a range of sizes like 522,
524, and others. Confusingly for us, the industry has gone away from strictly 512 byte sectors
for some devices. They, too, have headers that consume extra bytes. However, as noted above,
these extra bytes are not available for IBM i operating system and so, for our purposes, they
should be considered as if they were 512 byte sectored, because that is what IBM i operating
system will see. Some configuration tools, however, will discuss "522 byte" or whatever the
actual size of the sectors is in various interfaces (IVM users will not see any of this).
VIOS will virtualize the devices. Many configuration options are available for mapping physical
devices, as seen by VIOS, to virtual devices that VIOS will export to DST and Single Level
Store. Much more of this will be done by the customer than was done with internal disks.
Regardless of whether the environment is blades or traditional, it is important to make good
choices here. Even though there is much functional freedom, many choices are not optimized for
performance or optimized in an IBM i operating system context. Moreover, nearly as a matter of
sheer physics, some choices, once made, cannot be much improved without very drastic steps
(e.g. dedicating the system, moving masses of data around, etc.). Choosing the right
configuration in the first place, in other words, is very important. Most devices, especially SAN
devices, will have “Best Practices” manuals that should be consulted.
IBM i 6.1 Performance Capabilities Reference - January/April/October 2008
© Copyright IBM Corp. 2008
Chapter 14 DASD Performance
216
Download from Www.Somanuals.com. All Manuals Search And Download.
14.5.1.2 Generic Configuration Concepts
There are several important principles to keep track of in terms of getting good performance.
Most of the following are issues when the disks are configured. A great many problems can be
eliminated (or created) when the drives are originally configured. The exact nature of some of
these difficulties might not be easily predicted. But much of what follows will simply avoid
trouble at no other cost.
1. Ensure that RAID protection is performed as close to the physical device as possible. This
is typically done out at an I/O adapter level or on the external disk array product. This means
that either the external disk's configuration tools or (for internal disks assigned to VIOS) VIOS'
tools will be used to create RAID configurations (RAID5, RAID10, or RAID1). When this is
done, as far as IBM i operating system disk status displays are concerned, the resulting virtual
drives appear to be "unprotected." It might be superficially reassuring to have IBM i operating
system do the protection (if IBM i operating system even permits it). WRKDSKSTS would then
show the protection on that path. DST/SST disk configuration functions would show the
protection, too. However, it is better to put up with what appears to IBM i operating system's
disk status routines to be unprotected devices (which are, after all, actually protected) than to
take on the performance problems of doing this under IBM i operating system. RAID recovery
procedures will have to be pursued outside of IBM i operating system in any event, so the
protection may as well go where the true physicality is understood (either in VIOS or the
external disk array product).
Note also that you want to configure things so that the outboard devices, rather than VIOS,
do the RAID protection whenever possible. This enables I/O to flow directly from the device to
the IBM i operating system as directed by VIOS.
High Availability scenarios also need to be considered. In some cases, to enable appropriate
redundancy, it may be necessary to do the protection a little farther away from the device (e.g.
spread over a couple of adapters) so as to enable the proper duplexing for high availability. If
this applies to you, consult the documentation. Some external storage devices have extensive
duplexing within themselves, for instance, which could allow one to keep the protection close to
the device after all.
2. Recognize that Internal Disks remain the "gold standard" for performance. We have
consistently measured external disks as having less performance than 520 byte, internally
attached disks. However, the loss of throughput, with proper configuration, is not a major
concern. What is harder to control is response time. If you have sensitivity to response time,
consider internal disks more strongly.
3. Prefer external disks attached directly to IBM i operating system over those attached via
VIOS. This is basically a question of which partition owns the Fibre Channel adapter; in some
cases, it affects which adapter is purchased. If you do not need to share a given external disk's
resources with non-IBM i operating system partitions, and the support is available, avoiding
VIOS altogether will give better performance. First, the disks will usually have 520 byte
support. Second, the IBM i operating system support will know the device it is dealing with.
Third, VIOS will typically run as a separate partition. If you run VIOS as your first shared
partition, simply turning on shared support costs about five to eight percent overall. The
alternative, a dedicated partition for VIOS, would be a nice thing to avoid if possible. If you
would not have used shared processor support otherwise, or would have had to give VIOS a
whole processor or more, this is a consideration.
4. Prefer standard IBM i operating system internal disks to VIOS internal disks. This
describes who should own a given set of internal disks. If there is a choice, giving the available
internal disks to IBM i operating system instead of going through VIOS will result in noticeably
better performance. VIOS is a better fit for external disk products that do not support the IBM i
operating system 520 byte sector. The VIOS case would include internal disks that came
originally from pSeries or System p. However, one should investigate those devices also. If
those devices support 520 byte sectors (or, alternatively, if it is stated they are supported by IBM
i operating system), they should be reconfigured instead as native IBM i operating system
internal disks. It should be exceptional to use VIOS for internal disks.
5. Prefer RAID 1 or RAID 10 to RAID 5. We are now beginning to generally recommend
RAID 1 ("mirroring") or RAID 10 (a "mirroring" variant) for disks generally in On-line
Transaction Processing (OLTP) environments. OLTP environments have long had to deal with
configurations based on total arm count, not capacity as such. If that applies to you, you have
extra space that is of marginal value. Those in this situation can nowadays use the same number
of arms deployed as RAID 1 or RAID 10 to gain increased performance. This is at least as true
for external disks as it is for internal disks. Note that in this recommendation, one deploys the
same arm count -- just deploys them differently, trading unused space for performance. Also
note that if one goes this route, two physical disks per RAID 10 or RAID 1 set is better than a
larger number of disks per RAID 1 or RAID 10 set. (See also “Ensure, within reason” below).
6. For VIOS, Prefer External Disks (SAN disks) to Internal Disks. SAN disks will have
greater flexibility and better tuning options than internal disks. Accordingly, when there is a
choice, VIOS is best used for external disks.
7. Separate Journal ASPs from other ASPs. Generally, we have long recommended that a
given set of data base files (aka SQL tables) keep its set of journal receivers in a separate ASP
from the data base ASP or ASPs. With VIOS, we recommend that this continue to the extent
feasible. It may be necessary to share things like Fibre Channel links, but it should be possible
to have separate physical devices at the very least. To the extent possible, arrange for journal to
use its own internal buses also (of whatever sort the device provides).
IBM i 6.1 Performance Capabilities Reference - January/April/October 2008
© Copyright IBM Corp. 2008
Chapter 14 DASD Performance
218
Download from Www.Somanuals.com. All Manuals Search And Download.
8. Ensure, within reason, a reasonable number of virtual disks are created and made
available to IBM i operating system. One is tempted to simply lump all the storage one has in a
virtual environment into a couple (or even one) large virtual disk. Avoid this if at all possible.
For traditional (non-blade) systems: There is a great deal of variability here, so
generalizations are difficult. However, in the end, favor virtual disks that are within a binary
order of magnitude or two of the physical disk sizes. Make each of them as close to the same
size as possible. In any case, strive to have half a dozen or more in an ASP if you can. Years of system
tuning (at all levels) tacitly expect a reasonable number of devices, so it makes sense to provide a
bunch. You don't need a count larger than the physical device count, however, unless the device
count is very small.
For blades-based systems: You only have 16 LUNs available. However, you should use
a good fraction of them rather than merely one or two. In our tests, we tended to use twelve to
sixteen LUNs. One wishes a sufficient number for IBM i operating system to work with -- one
wishes also to segregate physical devices between ASPs to the extent feasible.
9. Prefer Symmetrical Configurations. To the extent possible, we have found that physical
symmetry pays off more than we have seen before. Balancing the number of physical disks as
much as possible seems to help. Strive to have uniform LUN sizes, uniform number of disks in
each RAID set, balance (at least at the static configuration level) between the various internal
and external buses, etc. To the extent practical, the user should strive for even numbers of items.
10. In general, do not share the same physical disk with multiple partitions. Only if you are
running some minimal IBM i operating system partition (say, a very small Domino partition or
perhaps a middle-tier partition that has no local data base) should you consider strategies where
the IBM i operating system is sharing physical disks with other partitions. For more traditional
application sets (whether on a traditional system or a blade), you'll have a data base, or generally
data contents large enough, to give each IBM i operating system partition its own physical
devices. Once you get to multiple devices, sharing them with other partitions will lead to
performance problems as the two partitions fight (in mutual ignorance) for the same arm, which
may increase seek time anywhere from a little to a lot. Service time could be adversely affected
as well.
11. To the extent possible, think multiple VIOS partitions for multiple IBM i operating system
partitions. If the physical disks deserve segmentation, multiple VIOS partitions may also be
justified. The main issue is load. If the IBM i operating system partitions are small (under two
CPUs), then you're probably better off with a shared VIOS partition hosting a couple of small
IBM i operating system partitions. As the IBM i operating system partitions grow, it will be
possible to justify dedicated VIOS partitions. Our current measurements suggest one VIOS
processor for every three IBM i operating system processors, but this will vary by the
application.
14.5.1.3 Specific VIOS Configuration Recommendations -- Traditional (non-blade)
Machines
1. Avoid volume groups if possible. VIOS "hdisks" must have a volume identifier (PVID).
Creating a volume group is an easy way to assign one and some literature will lead you to do it
that way. However, the volume group itself adds overhead for no particular value in a typical
IBM i operating system context where physical volumes (or, at least, RAID sets) are exported as
a whole without any sort of partitioning or sub-setting. Volume groups help multiple clients
share the same physical disks. In an IBM i operating system setting, this is seldom relevant and
the overhead volume groups employ is therefore not needed. It is better to assign a PVID by
simply changing the attribute of each individual hdisk. For instance, the VIOS command
chdev -dev hdisk3 -attr pv=yes
will assign a PVID to hdisk3.
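As a quick check (standard VIOS commands; the disk name and PVID value shown are
illustrative), lspv lists each hdisk with its PVID and volume group, so you can confirm the
identifier was assigned without creating a volume group:

    $ lspv
    hdisk3          00c5e9de00a8b2c6    None

A disk whose PVID column still shows "none" has not yet been assigned one.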
2. For VIOS disks, use available location information to aid your RAID planning. To obtain
RAID sets in IBM i operating system, you simply point DST at particular groups you want and
IBM i operating system decides which disks go together. Under VIOS, for internal disks, you
have to do this yourself. The names help show you what to do. For instance, suppose VIOS
shows the following for a set of internal disks:
Name    Location      State   Description   Size
pdisk0  07-08-00-2,0  Active  Array Member  35.1GB
pdisk1  07-08-00-3,0  Active  Array Member  35.1GB
pdisk2  07-08-00-4,0  Active  Array Member  35.1GB
pdisk3  07-08-00-5,0  Active  Array Member  35.1GB
pdisk4  07-08-00-6,0  Active  Array Member  35.1GB
pdisk5  07-08-01-0,0  Active  Array Member  35.1GB
pdisk6  07-08-01-1,0  Active  Array Member  35.1GB
Here, it turns out that these particular physical disks are on two internal SCSI buses (00 and 01)
and have device IDs of 2, 3, 4, 5, and 6 on SCSI bus 00 and device IDs of 0 and 1 on SCSI bus
01. If this was all there was, a three disk RAID set of pdisk0, pdisk1, and pdisk5 would be a
good choice. Why? Because pdisk0 and pdisk1 are on internal SCSI bus 00 and the other one is
on SCSI bus 01. That provides a good balance for the available drives. This could also be
repeated for pdisk2, pdisk3, pdisk4, and pdisk6. This would result in two virtual drives being
created to represent the seven physical drives. The fact that these are two RAID5 disk sets (of
three and four physical disks, respectively) would be unknown to the IBM i operating system,
but managed instead by VIOS. One or maybe two virtual SCSI buses would be required for
VIOS to present them to the IBM i operating system. A large configuration could provide for
RAID5 balance over even more SCSI buses (real and virtual).
On external storage, the discussion is slightly more complicated, because these products tend to
package data into LUNs that already involve multiple physical drives. Your RAID set work
would have to use whatever the external disk storage product gives you to work with in terms of
naming conventions and what degree of control you have available to reflect favorable physical
boundaries. Still, the principles are the same.
3. Limited number of virtual devices per virtual SCSI adapter. You will have to configure
some number of virtual SCSI adapters so that VIOS can provide a path for IBM i operating
system to talk to VIOS as if these were really physical SCSI devices. These adapters, in turn,
implement some existing rules, so that only 16 virtual disks can be made part of a given virtual
adapter. You probably would not want to exceed this limit anyway. Note that the virtual
adapters need not relate to physical boundaries of the various underlying devices. The main
issue is to balance the load. You may be able to segregate data base and journal data at this
level. Note also that in a proper configuration, the virtual SCSI adapters will carry command
traffic only. The actual data DMA will be direct to the IBM i operating system partition.
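For reference, the mapping itself is done on the VIOS command line with mkvdev, and lsmap
shows what is already attached to a given virtual adapter (the device names here are
illustrative):

    mkvdev -vdev hdisk4 -vadapter vhost0 -dev vtscsi0
    lsmap -vadapter vhost0

Each vhostN device is one of the virtual SCSI server adapters discussed above, so the 16-disk
limit applies per vhost device.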
4. VIOS and Shared Processors. On the whole, dedicated VIOS processors will work better
than shared processors, especially as the IBM i operating system partition needs three or more
CPUs itself. If you do not need shared processors for other reasons, experiment and see if
dedicated VIOS processors work better. In fact, it might be an experiment worth running even if
you have shared processors configured generally.
5. VIOS and memory. VIOS arranges for the DMA to go directly to the IBM i operating system
memory (with the help of PHYP and the IBM i operating system to ensure integrity). This
means that actual data transfer will not go through VIOS. VIOS only needs enough main
storage to manage the disk traffic, not to hold the data the traffic itself carries. Our current
measurements suggest that 1 GB of main storage is the minimum recommended. Other work
suggests that unless substantial virtual LAN is involved, between 1 GB and 2 GB tends to
suffice at the 1 to 3 CPU ratio we typically measured.
6. VIOS and Queue Depth. Queue depth is a value you can change, so one can experiment to
find the best value, at least on a per-IPL basis. VIOS tends to set the queue depth parameter to
smaller values. Especially if you follow our recommendations for the number of virtual disks,
you will find values like 32 to work well for the device as a starting point. If you do that, you
will also want to set the queue depth for the adapter (usually called num_cmd_elems) to its
larger value, often 512. Consult the documentation.
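As a sketch of what that tuning looks like (the device names are illustrative, and the available
attributes vary by device and driver; -perm defers the change until the device is next
reconfigured), the VIOS commands would be:

    chdev -dev hdisk4 -attr queue_depth=32 -perm
    chdev -dev fcs0 -attr num_cmd_elems=512 -perm

lsdev -dev hdisk4 -attr can be used first to display the current attribute values.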
14.5.1.4 VIOS and JS12 Express and JS22 Express Considerations
Most of our work consisted of measurements with the JS22 offering and external disks using the
DS4800 product. The following are the results obtained in various measurements; a few general
comments about configuration follow.
14.5.1.4.1 BladeCenter H JS22 Express running IBM i operating system/VIOS
The following tests were run using a 4 processor JS22 Express in a BladeCenter H chassis, 32
GB of memory, and a DS4800 with a total of 90 DDMs (8 DDMs using RAID1 externalized in
2 LUNs for the system ASP, 6 DDMs in each of 12 RAID1 LUNs (a total of 72 DDMs) for the
database ASP, and 10 DDMs unprotected externalized in 2 LUNs for the journal ASP). We had
two Fibre Channel attachments to the DS4800, with half of the LUNs in each of the ASPs using
controller A as the preferred path and the other half of the LUNs using controller B as the
preferred path. The following charts show some of the performance characteristics we observed
running our Commercial Performance Workload in our test environment. Your results may vary
based on the characteristics of your workload. A description of the Commercial Performance
Workload can be found in appendix A of the Performance Capabilities Reference.
Creating and running multiple LPARs can lead to unique system management challenges. The
following is a link to an LPAR white paper.
For most of our testing we only utilized one IBM i operating system partition on our JS22
Express. Note that VIOS is the base operating system on the JS22 Express, installed on the
internal SAS disk, and VIOS must virtualize the DS4800 LUNs and communication resources to
the IBM i operating system partition, which resides on the DS4800 DDMs.
VIOS/IVM must have some of the memory and processor resources for this virtualization. The
amount of resources needed will depend on the physical hardware in the BladeCenter and
the number of partitions being supported on a particular blade. For our testing we found that we
could not operate with under 1 GB of memory, and for all of the tests in this section we used 2
GB of memory. The number of processors varied for each experiment, and the charts define the
processors used in each experiment.
One important thing to note is that we only changed the amount of memory and processors in the
VIOS partition. Otherwise the settings for the VIOS partition are left as they default when
the basic configuration is created during the VIOS install. So the VIOS partition processors in
these experiments are always set up as shared; only the IBM i operating system partition is
created using dedicated processors.
[Chart: VIOS/IBM i operating system JS22 Express, DS4800 (90 DDMs), Commercial Performance Workload, plotted against Transactions/Minute for: IBM i .8 processor / VIOS .2 processor; IBM i 1.7 processors / VIOS .3 processor; IBM i 2.6 processors / VIOS .4 processor; IBM i 3.5 processors / VIOS .5 processor.]
The chart above shows some basic performance scaling for 1, 2, 3 and 4 processors. For this
comparison both partition measurements were done with the processors set up as shared, and
with the IBM i operating system partition set to capped. The rest of the resources stay constant:
90 RAID1 DDMs in a DS4800 under 16 LUNs, 2 GB of memory assigned to VIOS, and 28 GB
assigned to the IBM i operating system partition. Note that only 1 LPAR is running at the time
of the experiment.
[Chart: VIOS/IBM i operating system JS22 blade, DS4800 (90 DDMs), Commercial Performance Workload, plotted against ASP Throughput OP/S for the same four processor splits.]
The following charts are a view of the characteristics we observed during our Commercial
Performance Workload testing on our JS22 Express. The first chart shows the effect on the
Commercial Performance Workload when we apply 3 dedicated processors, switch to 3 shared
processors, and then increment the number of virtual processors available.
The "red line" is our dedicated processor setup, which is our baseline. The "blue line" is turning
on shared processors in what might be thought of as a fair comparison, where 1 virtual processor
was assigned for each real processor, resulting in 1 virtual processor for VIOS and 3 virtual
processors for the IBM i operating system. The next experiment, the "green line," was to
increase the number of virtual processors assigned to VIOS but not the number of virtual
processors assigned to the IBM i operating system. Four virtual processors assigned to VIOS
seemed to work best for our environment. Next was to increase the number of virtual processors
assigned to the IBM i operating system environment. Six virtual processors, seen in the "purple
line," optimized our environment best. As we increased beyond 6 virtual processors we started
losing performance, all the way up to the 28 virtual processors available to us, shown in the
"dark red line."
Not all workloads will react in the same way, but it is important to note that a small change to
your configuration can have a large influence on your performance, positive or negative.
[Chart: IBM i operating system JS22 Express, Commercial Performance Workload, plotted against Transactions/Minute for the following DS4800 setups: 3 dedicated processors IBM i, 1 shared processor / 4 virtual processors VIOS; 3 shared processors / 3 virtual processors IBM i, 1 processor / 1 virtual processor VIOS; 3 shared processors / 3 virtual processors IBM i, 1 shared processor / 4 virtual processors VIOS; 3 shared processors / 6 virtual processors IBM i, 1 shared processor / 4 virtual processors VIOS; 3 shared processors / 28 virtual processors IBM i, 1 shared processor / 4 virtual processors VIOS.]
In the following single-partition Commercial Performance Workload runs, the average VIOS
CPU stayed under 40%. So there appears to be VIOS resource available; but in many customer
environments, communications and other resources are also running, and these will also be
routed through VIOS.
[Chart: IBM i operating system CPU usage while running the Commercial Performance Workload, plotted against Transactions/Minute: 3 dedicated processors IBM i partition, 1 processor VIOS; 3 shared processors IBM i partition, 1 processor VIOS; 3.5 shared processors IBM i partition, .5 processor VIOS.]
[Chart: VIOS CPU usage while running the Commercial Performance Workload, plotted against Transactions/Minute, for the same three configurations.]
The following chart shows two IBM i operating system partitions, each using 14 GB of memory
and 1.7 processors, served by 1 VIOS partition using 2 GB of memory and .6 processors. The
Commercial Performance Workload was running the same number of transactions on each of
the partitions over the same time intervals. Although there is an observed cost for VIOS to
manage multiple partitions, VIOS was able to balance services to the two partitions.
Experimenting with the number of processors and memory assigned to the partitions might yield
a better environment for other workloads.
[Chart: VIOS/IBM i operating system JS22 2-partition experiments, Commercial Performance Workload, plotted against Transactions/Minute: partitions 1 and 2 of two i5/OS 1.7 processor partitions on a .6 processor VIOS partition, using 48 DS4800 DDMs each; and 1 single i5/OS 3.5 processor partition on a .5 processor VIOS partition using 96 DS4800 DDMs.]
[Chart: the same three configurations plotted against ASP Throughput OP/S.]
14.5.1.4.2 BladeCenter S and JS12 Express
The IBM i operating system is now supported on a JS12 Express in a BladeCenter S. The
system is limited to 12 SAS DASD and the following charts try to characterize the performance
we achieved during experiments with the Commercial Performance Workload in the IBM lab.
Using a JS22 Express in a BladeCenter H connected to a DS4800, we limited the resources in
order to get a comparison to the SAS DASD used in the BladeCenter S.
[Chart: Commercial Performance Workload Transactions/Minute for: BladeCenter S JS12, 12 BladeCenter SAS DASD, IBM i mirroring, 16 GB memory, 1.8 processors IBM i / .2 VIOS; BladeCenter H JS22, 12 DS4800 DDMs, RAID1 done on the DS4800, 16 GB memory, 1.8 processors IBM i / .2 VIOS; P6 M25, 12 internal SAS DASD mirrored on the 572C (no cache), 16 GB memory, 2 processors; P6 M25, 12 internal SAS DASD mirrored on the 57B8 (cache), 16 GB memory, 2 processors.]
[Chart: the same four configurations plotted against ASP Throughput (OP/S).]
14.5.1.4.3 JS12 Express and JS22 Express Configuration Considerations
1. The aggregate total of virtual disks (LUNs) will be sixteen at most. Many customers will
want to deploy between 12 and 16 LUNs and maximize symmetry. Consult carefully with your
support team on the choices here. This is the most important consideration, as it is difficult to
change later. Consult also any available Best Practices manuals for a given SAN attached
storage server.
2. The VIOS partition should be provided with between 1 and 2 GB of memory for disk-based
usage. If virtual LAN is a substantial factor, more memory may be required.
14.5.1.4.4 DS3000/DS4000 Storage Subsystem Performance Tips
Physical disks can be configured in various ways: RAID levels, the number of disks in each
array, and the number of LUNs created over those arrays. There are also various reasons for
the configurations that are chosen. One end user might be looking for ease of use and choose to
create one array with multiple LUNs, whereas another end user might consider performance to
be a more critical issue and choose to create multiple arrays. The following charts are meant to
show the possible performance effects of various configurations using the Commercial
Performance Workload.
[Chart: BladeCenter H with a JS22 4-way, Commercial Performance Workload, plotted against Transactions/Minute for: DS3400, 9 LUNs on 1 36-DDM RAID10 array in the DB ASP; DS3400, 9 LUNs on 9 4-DDM RAID10 arrays; DS3400, 18 LUNs on 18 2-DDM RAID1 arrays; DS3400, 9 LUNs on 9 4-DDM RAID5 arrays; DS4800, 9 LUNs on 9 4-DDM RAID10 arrays.]
[Chart: the same five configurations plotted against ASP Throughput (OP/S).]
14.6 IBM i operating system 5.4 Virtual SCSI Performance
The primary goal of virtualization is to lower the total cost of ownership of equipment by
improving utilization of the overall system resources and reducing the labor requirements to
operate and manage many servers.
With virtualization, IBM Power Systems can now be used the way mainframes have been used
for decades, sharing the hardware between many programs, services, applications, or users. Of
course, for each of these individual users of the hardware, sharing resources may result in lower
performance than having dedicated hardware, but the overall cost is usually far less than when
dedicating hardware to each user. The decision to use virtualization is therefore a trade-off
between cost and performance.
IBM i operating system Virtual SCSI is based on a client/server relationship. An IBM i
operating system server partition owns the physical resources, and client partitions access the
virtual SCSI resources provided by the server partition. The IBM i operating system server
partition has physically attached I/O devices and exports one or more of these devices to other
partitions. The client partition is a partition that has a virtual disk and relies on the IBM i
operating system server partition to provide access to one or more physical devices. POWER5
and future POWER technologies provide virtual SCSI support for AIX 5L V5.3 and Linux.
Previous POWER technology supported Linux virtual SCSI.
The performance considerations that we detail in this section must be balanced against the
savings made on the overall system cost. For example, the smallest physical disk that is available
to the IBM i operating system is 70 GB. An AIX or Linux operating system requires only 4 GB
of disk. If one disk is dedicated to the operating system, nearly 95% of this physical disk space is
unused. Furthermore, the system disk I/O rate is often very low. With the help of IBM i
operating system Virtual SCSI, it is possible to split the same disk into 9 virtual disks of about
8 GB each. If each of these disks is used for installation of the operating system, you can support
nine separate instances of the operating system, with nine times fewer disks and perhaps as many
physical SCSI adapters. Compare these savings with the extra cost of processing power needed
to handle the virtual disks.
Enabling IBM i operating system Virtual SCSI results in using extra processing power compared
to directly attached disks, due to extra POWER VIO activity. Depending on the configuration,
this may or may not yield the same performance when comparing virtual hosted disk devices to
physically attached SCSI devices. If a partition has high performance and disk I/O requirements
that justify the cost of dedicated hardware, then using virtual SCSI is not recommended.
However, partitions with non-critical performance and low disk I/O requirements often can be
configured to use virtual SCSI, which in turn lowers hardware and operating costs.
The test results that follow show the CPU required by the IBM i operating system Virtual SCSI
server; the benefits of the IBM i operating system Virtual SCSI implementation should be
assessed for a given environment. Simultaneous multithreading should be enabled in a virtual
hosted disk environment. For the most efficient virtual hosted disk implementation with larger
IO loads, it may be advantageous to give the IBM i operating system Virtual SCSI server
partition a dedicated processor. Processor micro-partitioning should be used with low IO loads
or with workloads which are not latency dependent.
Virtual storage can be created in an ASP using the CRTNWSSTG command and linked, using
the ADDNWSSTGL command, to a network server description created with CRTNWSD. The
disk can then be manipulated in the client AIX or Linux partition the same as an ordinary
physical disk. Some performance considerations from dedicated storage are still applicable
when using virtual storage, such as spreading ASPs across multiple disks on multiple RAID
adapters so that parallel access is possible. From the server's point of view, a virtual drive can
be served using an entire ASP or a portion of an ASP. If the server partition provides the client
with a portion of a drive, then the server decides the area of the drive to serve to the client when
the network storage space is created. This allows reads and writes of an ASP to be shared
among several virtual devices. If the entire ASP is served to the client, then the rules and
procedures apply on the client side as if the drive were local.
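As a minimal sketch (the object names are hypothetical; NWSSIZE is in megabytes, so 8192
creates roughly the 8 GB virtual disk discussed earlier), the create-and-link sequence looks like:

    CRTNWSSTG NWSSTG(AIXDSK1) NWSSIZE(8192) FORMAT(*OPEN)
    ADDNWSSTGL NWSSTG(AIXDSK1) NWSD(AIXSRV)

The storage space becomes visible to the client partition when the network server description
AIXSRV is next varied on.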
Consider the following general performance issues when using virtual SCSI:
• Only use virtual hosted disk in low I/O loads.
• Virtual hosted disk is a client/server model, so the combined CPU cycles required on the I/O
client and the I/O server will always be higher than for local I/O.
• If multiple partitions are competing for resources from a virtual hosted disk server, care must
be taken to ensure that enough server resources (processor, memory, and disk) are allocated
to do the job.
• There is data read caching in memory on the virtual hosted disk server partition. Thus, all
I/Os that it services could benefit from the caching of heavily used data. Read performance
can be improved by increasing the memory in the virtual hosted disk server.
14.6.1 Introduction
In general, applications are functionally isolated from the exact nature of their storage
subsystems by the operating system. An application does not have to be aware of whether its
storage is contained on one type of disk or another when performing I/O. But different I/O
subsystems have subtly different performance qualities, and virtual SCSI is no exception. What
differences might an application observe using IBM i operating system Virtual SCSI versus
directly attached storage? Broadly, we can categorize the possibilities into I/O latency and I/O
bandwidth.
We define I/O response time as the time that passes between the initiation of I/O and completion
as observed by the application. Latency is a very important attribute of disk I/O. Consider a
program that performs 1000 random disk I/Os, one at a time. If the time to complete an average
I/O is six milliseconds, the application will run no less than 6 seconds. However, if the average
I/O response time is reduced to three milliseconds, the application's run time could be reduced by
three seconds. Applications that are multi-threaded or use asynchronous I/O may be less
sensitive to latency, but under most circumstances, less latency is better for performance.
We define I/O bandwidth as the maximum amount of data that can be read or written to storage
in a unit of time. Bandwidth can be measured from a single thread or from a set of threads
executing concurrently. Though many applications are more sensitive to latency than
bandwidth, bandwidth is crucial for many typical operations such as backup and restore of
persistent data.
Because disks are mechanical devices, they tend to be rather slow when compared to
high-performance microprocessors such as IBM POWER Systems. As such, we will show that
virtual hosted disk performance is comparable to directly attached storage under most workload
environments.
IBM i operating system hosts disk space in a Network Storage Space (NWSSTG). A network
server description (NWSD) is used to give a name to the configuration, to provide an interface
for starting and stopping an AIX logical partition, and to provide a link between AIX and its
virtual storage.
There are many factors that affect IBM i operating system performance in a virtual SCSI
environment. This chapter discusses some of the common factors and offers guidance on how to
help achieve the best possible performance. Much of the information in this chapter was
obtained as a result of analysis experience within the Rochester development laboratory. Many
of the performance claims are based on supporting performance measurement and analysis with
a primitive disk workload. In some cases, the actual performance data is included here to
reinforce the performance claims and to demonstrate capacity characteristics.
All measurements were completed on a POWER5 570+ 4-Way (2.2 GHz). Each system is
configured as an LPAR, and each virtual SCSI test was performed between two partitions on the
same system with one CPU for each partition. IBM i operating system 5.4 was used on the
virtual SCSI server and AIX 5.3 was used on the client partitions.
The primitive disk workload used to evaluate the performance of virtual SCSI is an in-house,
multi-processed application that performs all types of synchronous or asynchronous I/O
(read/write/sequential/random) against a target device. The program runs on an AIX or Linux
client, reports CPU consumption, and gathers disk statistics. Remote statistics are gathered via a
socket-based application which collects CPU statistics from the IBM i operating system hosting
partition along with physical disk statistics.
The purpose of this section is to help virtual SCSI users better understand the performance of
their virtual SCSI system. A customer should be able to estimate the expected performance of
their application from the information presented here.
Note: You will see different terms in this publication that refer to the various components involved with virtual
SCSI. Depending on the context, these terms may vary. With SCSI, usually the terms server and client are used, so
you may see terms such as virtual SCSI client and virtual SCSI server. On the Hardware Management Console, the
terms virtual SCSI server adapter and virtual SCSI client adapter are used. They refer to the same thing. When
describing the client/server relationship between the partitions involved in virtual SCSI, the terms hosting partition
(meaning the IBM i operating system Server) and hosted partition (meaning the client partition) are used.
14.6.2 Virtual SCSI Performance Examples
The following sections compare virtual to native I/O performance on bandwidth tests. In these
tests, a single thread operates sequentially on a fixed 6 GB file, with a dedicated IBM i operating
system Server partition. More I/O operations are issued when reading or writing
to the file using a small block size than with a larger block size. Because of the larger number of
operations and the fact that each operation has a fixed amount of overhead regardless of transfer
length, the bandwidth measured with small block sizes is much lower than with large block
sizes.
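The shape of this effect can be sketched with a simple model; the per-operation overhead and
wire rate below are illustrative assumptions, not measured values:

def effective_bandwidth_mb_s(block_kb: float,
                             per_op_overhead_ms: float = 0.5,
                             wire_rate_mb_s: float = 100.0) -> float:
    """Model bandwidth when every I/O pays a fixed overhead plus a
    transfer time proportional to the block size."""
    transfer_ms = block_kb / 1024.0 / wire_rate_mb_s * 1000.0
    op_time_s = (per_op_overhead_ms + transfer_ms) / 1000.0
    return (block_kb / 1024.0) / op_time_s

for kb in (4, 8, 16, 32, 64, 128, 256):
    print(f"{kb:>3} KB blocks -> {effective_bandwidth_mb_s(kb):6.1f} MB/s")

Because the fixed overhead is amortized over more bytes, the modeled bandwidth rises steadily
with block size, matching the behavior described above.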
For tests with multiple Network Storage Spaces (NWSS), one thread per network storage space
operates sequentially on its own fixed 6 GB file, again with a dedicated IBM i operating system
Server partition. The following sections compare native vs. virtual I/O, multiple network storage
spaces, multiple network storage descriptions, and disk scaling.
14.6.2.1 Native vs. Virtual Performance
Figure 1 shows a comparison of measured bandwidth using virtual SCSI and locally attached
DASD for reads with varying block sizes of operations. The difference in reads between
virtual I/O and native I/O in these tests is attributable to the increased latency of virtual I/O.
The difference in writes is caused by misalignment, which causes a read for every write. A write
alignment change is planned for a future IBM i operating system release which will make virtual
and native writes similar in speed.
[Chart: Native vs. Virtual bandwidth; series: Native Read, Virtual Read, Native Write, Virtual Write]
Figure 1 - The chart above compares native and virtual I/O. This experiment shows that virtual write performance is
significantly lower than native. Virtual read performance is similar to or better than native, depending on read-cache
performance.
14.6.2.2 Virtual SCSI Bandwidth-Multiple Network Storage Spaces
Figure 2 shows a comparison of measured bandwidth while scaling network storage spaces with
varying block sizes of operations. The difference in the scaling of these tests is attributable to the
performance gain that can be achieved by adding multiple network storage spaces. This
experiment shows that, to achieve better performance from the hard disk, multiple network
storage spaces can be used.
[Charts: 15 Disk 5 GB Sequential Write and 15 Disk 5 GB Sequential Read; bandwidth vs. I/O block size (4K-256K) for 1, 2, 4, and 8 NWSS]
Figure 2 - The charts above show performance while scaling network storage spaces. This experiment shows that adding
NWSS increases read/write throughput; the best performance is achieved with 8 NWSS.
14.6.2.3 Virtual SCSI Bandwidth-Network Storage Description (NWSD) Scaling
Figure 3 shows a comparison of measured bandwidth while scaling network storage descriptions
with varying block sizes of operations. Each network storage description has a single network
storage space attached to it. The difference in the scaling of these tests is attributable to the
performance gain which can be achieved by adding multiple network storage descriptions. This
experiment shows that in order to achieve better write performance from the hard disk, multiple
network storage descriptions can be used. For the best performance, one network storage space
should be used for every 2-4 disk drives in the ASP, and each network storage space should be
attached to its own network storage description.
[Charts: NWSD Read Scaling and NWSD Write Scaling; bandwidth for Small Transactions (4k-16k), Medium Transactions (32k-64k), and Large Transactions (128k+); series: AIX Native and 1, 2, 4, and 8 NWSD]
Figure 3 - The charts above show performance while scaling network storage descriptions. This experiment shows that
adding NWSDs increases write throughput, which was not achievable using one network storage description. Read
performance increases similarly to the network storage space scaling figure.
14.6.2.4 Virtual SCSI Bandwidth-Disk Scaling
Figure 4 shows a comparison of measured bandwidth while scaling disk drives with varying
block sizes of operations. Each network storage description has a single network storage space
attached to it. The difference in the scaling of these tests is attributable to the performance gain
which can be achieved by adding disk drives and I/O adapters. The charts below include small
(4k-64k) transactions and large (128k+) transactions.
[Charts: Read and Write Performance for Small Transactions (4k-64k) and Large Transactions (128k+); bandwidth vs. number of disks (15, 30, 45); series: 1, 2, 4, 8, 16, and 24 NWSD]
Figure 4 - The charts above show read and write performance for small (4k-64k) and large (128k+) transactions. This
experiment shows that adding disk drives increases throughput: a system with 45 disk drives can transfer approximately
3 times faster than a system with 15 disk drives. Note that 24 network storage descriptions were used in order to achieve
maximum performance.
14.6.3 Sizing
Sizing methodology is based on the observation that processor time required to perform an I/O
on the IBM i operating system Virtual SCSI server is fairly constant for a given I/O size. The
I/O devices supported by the Virtual SCSI server are sufficiently similar to provide good
recommendations. These numbers are measured at the physical processor.
There are considerations to address when designing and implementing a Virtual SCSI
environment. The primary considerations are:
• Dedicated processor server partitions or Micro-Partitioning
• Server partition memory requirements
• One thing that does not have to be factored into sizing is the processor impact of using
Virtual I/O on the client. The processor cycles executed on the client to perform a Virtual
SCSI I/O are comparable to that of a locally attached I/O. Thus, there is no increase or
decrease in sizing on the client partition for a known task.
14.6.3.1 Sizing when using Dedicated Processors
One sizing method is to size the Virtual SCSI server to the maximum I/O rate of the attached
storage subsystem. The sizing could be biased to small I/Os or large I/Os. Sizing to maximum
capacity for large I/Os balances the processor capacity of the Virtual SCSI server to the potential
I/O bandwidth of the attached I/O. The negative facet of this sizing methodology is that, in
nearly every case, we will assign more processor entitlement to the Virtual SCSI server than it
typically consumes.
Consider a case where an I/O server manages 15 physical SCSI disks. We can arrive at an upper
bound on the processor capacity required based on assumptions about the I/O rates that the disks
can achieve. If it is known that the workload is dominated by 16 KB operations, we could assume
that each of the 15 disks is capable of one read transaction every 36 milliseconds. For
comparison, an IBM i operating system Virtual SCSI server could support around 30,000 read
transactions per second on a single processor, provided enough disks were present.
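Completing the arithmetic of this example (a sketch using only the figures quoted in this section
and in Figure 5 below):

# Figures from the example: 15 disks, one 16 KB read per disk every 36 ms,
# and roughly 34 microseconds of server CPU per 16 KB read (Figure 5).
disks = 15
ms_per_read_per_disk = 36.0
cpu_us_per_read = 34.0

reads_per_sec = disks * (1000.0 / ms_per_read_per_disk)   # about 417 reads/sec
cpu_utilization = reads_per_sec * cpu_us_per_read / 1_000_000

print(f"{reads_per_sec:.0f} reads/sec -> {cpu_utilization:.1%} of one processor")
# About 417 reads/sec consume roughly 1.4% of one processor; conversely,
# 1,000,000 us / 34 us per read is roughly 30,000 reads per second per processor.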
To calculate IBM i operating system Virtual SCSI CPU requirements, the following formula is
provided. The number of transactions per second can be collected with the IBM i operating
system command WRKDSKSTS. Based on the average transaction size shown in
WRKDSKSTS, select a number from the table below.
Size of I/O (KB)      4    8   16   32   64   128   256
Read                 16   22   34   57   92   163   314
Write                21   26   36   54   82   148   282
Figure 5 - CPU microseconds to process one virtual SCSI I/O transaction
The table above shows the CPU time in microseconds that the Virtual SCSI server takes to
process one transaction. This value can be used in the formula below to estimate the amount of
CPU required for the partition:

CPU utilization = (transactions per second x microseconds per transaction) / 1,000,000

For example, if your workload performed 10,000 16 KB read transactions per second, the
equation would be (34 is selected from the table above):

(10,000 x 34) / 1,000,000 = 0.34 (34% of one CPU)
The total CPU required for a workload that performs 10,000 16 KB read transactions per second
would therefore be 34% of a 2.2 GHz POWER5 processor. If a different size processor is used,
adjust these numbers accordingly. Remember that the number obtained from WRKDSKSTS is
an average across all I/Os; your workload could be a mixture of very large and very small
transactions. This is a guideline for sizing your CPU, and your results might vary.
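As a minimal sketch, the table and formula can be combined as follows (the table values are the
per-transaction CPU times from Figure 5):

# CPU microseconds per virtual SCSI I/O transaction (Figure 5), keyed by I/O size in KB.
CPU_US_PER_OP = {
    "read":  {4: 16, 8: 22, 16: 34, 32: 57, 64: 92, 128: 163, 256: 314},
    "write": {4: 21, 8: 26, 16: 36, 32: 54, 64: 82, 128: 148, 256: 282},
}

def vscsi_cpu_utilization(tps: float, io_kb: int, op: str = "read") -> float:
    """Estimated fraction of one 2.2 GHz POWER5 processor consumed by the
    virtual SCSI server at the given transaction rate and I/O size."""
    return tps * CPU_US_PER_OP[op][io_kb] / 1_000_000

# 10,000 16 KB reads per second -> 0.34, i.e. 34% of one processor.
print(vscsi_cpu_utilization(10_000, 16, "read"))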
Using dedicated processor partitions may reserve more CPU than necessary, capacity that could
otherwise be used by other partitions, but it guarantees peak performance. Dedicated sizing is
most effective if the average I/O size can be estimated so that peak bandwidth does not have to
be assumed. Most Virtual SCSI servers will not run at maximum I/O rates all the time, so with
dedicated processor partitions the surplus processor time is potentially wasted.
14.6.3.2 Sizing when using Micro-Partitioning
Defining Virtual SCSI servers in micro-partitions enables much better granularity of processor
resource sizing and potential recovery of unused processor time by uncapped partitions.
Tempering those benefits, use of micro-partitions for Virtual SCSI servers slightly increases I/O
response time and creates somewhat more complex processor entitlement sizing.
The sizing methodology should be based on the same operation costs as for IBM i operating
system Server partitions, with additional entitlement added for running in micro-partitions. We
recommend that the IBM i operating system Server partition be configured as uncapped; that
way, if other partitions leave capacity unused, the server can get more processor time to
service I/O.
Because I/O latency with Virtual SCSI varies with the machine utilization and IBM i operating
system Server topology, consider the following:
1. For the most demanding I/O traffic (high bandwidth or very low latency), try to use native
I/O.
2. If using Virtual I/O and the system contains enough processors, consider putting the IBM i
operating system Server in a dedicated processor partition.
3. If using a Micro-Partitioning IBM i operating system Server, use as few virtual processors as
possible.
4. To avoid latency issues, size the CPU generously.
14.6.3.3 Sizing memory
The IBM i operating system Virtual SCSI server supports data read caching on the virtual hosted
disk server partition. Thus all I/Os that it services could benefit from effects of caching heavily
used data. Read performance can vary depending upon the amount of memory which is assigned
to the server partition. Workloads which have a small memory footprint can improve their
performance greatly by increasing the amount of memory in the IBM i operating system Virtual
SCSI server. Alternatively, a system which works on a large amount of data may not see any
benefit from caching. The memory for the IBM i operating system Virtual SCSI server in this
case can be set at less then 1 GB.
One method to size this is to begin by looking at the ASP in which your network storage space
is located. While the system is running the desired workload, enter the command
WRKDSKSTS. Write down the average number of I/O requests per second in the ASP being
used by the network storage space. Now dynamically add memory to the partition. Check the
number of I/O requests per second once again (remember to reset the statistics using F10). The
number of I/O requests per second should decrease, and your throughput to the IBM i operating
system Virtual SCSI server should increase.
Continue adding memory to the IBM i operating system server until you no longer see the
number of I/O requests per second change. If your workload changes at a later date, the memory
can be readjusted accordingly.
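The procedure above can be summarized as a simple loop; the two callables here are
hypothetical stand-ins for the manual steps (reading the ASP's I/O requests per second from
WRKDSKSTS after an F10 statistics reset, and dynamically adding memory to the hosting
partition):

from typing import Callable

def size_server_memory(measure_asp_io_per_sec: Callable[[], float],
                       add_memory_mb: Callable[[int], None],
                       step_mb: int = 1024,
                       tolerance: float = 0.01) -> None:
    """Keep adding memory while each increment still reduces the
    physical I/O rate to the ASP holding the network storage space."""
    previous = measure_asp_io_per_sec()
    while True:
        add_memory_mb(step_mb)
        current = measure_asp_io_per_sec()
        if previous - current <= tolerance * previous:
            break  # more memory no longer reduces I/O; stop here
        previous = current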
Figure 6 below shows a comparison of measured bandwidth of cached transactions with varying
block sizes of operations. The figure includes small (4k-64k) transactions and large (128k+)
transactions. A partition which runs completely from memory can experience throughput rates
as high as 6 GB/sec. If it is memory constrained, the system's throughput will be lower.
[Chart: 15 Disk 1 GB Sequential Read; throughput (GB/sec) vs. I/O block size (4K-2048K) for varying amounts of cached data]
Figure 6 - The chart above shows how small, medium, and large transactions are affected by memory caching. The lines
represent the amount of data that is cached in memory. The efficiency of I/O improves with cache hits and larger I/O size;
effectively, there is a fixed latency to start and complete an I/O, with some additional cycle time based on the size of
the I/O.
14.6.4 AIX Virtual IO Client Performance Guide
The following link provides more in-depth performance tuning information for the AIX virtual
SCSI client:
Advanced POWER Virtualization on IBM p5 Servers: Architecture and Performance
Considerations, http://www.redbooks.ibm.com/abstracts/sg247940.html
14.6.5 Performance Observations and Tips
• For the best performance, use one network storage description for every 2-4 disks within
an ASP.
• One method to improve write performance is to create 8 NWSDs for every 15 disks.
• Best performance was obtained with one network storage description for every network
storage space.
• Sizing your memory correctly can greatly improve read performance.
• Multiple network storage descriptions (NWSD) can be attached to a single ASP; no
performance benefit from using multiple ASPs was seen.
• For maximum logical volume throughput, use multiple network storage spaces attached to
a single logical volume.
• With low I/O loads and a small number of partitions, Micro-Partitioning of the IBM i
operating system Server partition has little effect on performance.
• For a more efficient Virtual SCSI implementation with larger loads, it may be
advantageous to keep the I/O server as a dedicated processor partition.
• Extensive information can be found at the System i Information Center web site.
14.6.6 Summary
Virtualization is an innovative technology that redefines the utilization and economics of
managing an on demand operating environment. POWER5 and future POWER architectures
provide new opportunities for clients to take advantage of virtualization capabilities. The IBM i
operating system family provides the capability for a single physical I/O adapter to be used by
multiple logical partitions of the same server, enabling consolidation of I/O resources.
The system resource cost of Virtual SCSI implementation is small, and clients should assess the
benefits of the Virtual SCSI implementation for their environment. Simultaneous multithreading
should be enabled in a virtual SCSI environment.
Virtual SCSI implementation is an excellent solution for clients looking to consolidate I/O
resources with a modest amount of processor overhead. The IBM i operating system POWER
Systems Virtual SCSI capability creates new opportunities for consolidation, and demonstrates
strong performance and manageability.
Chapter 15. Save/Restore Performance
This chapter's focus is on the IBM i operating system platform. For legacy system models, older
device attachment cards, and lower-performing backup devices, see the V5R3 performance
capabilities reference.
Many factors influence the observable performance of save and restore operations. These factors include:
• The backup device models, the number of DASD units the data is spread across, processors, LPAR
configurations, and the IOA used to attach the devices.
• Workload type: Large Database File, User Mix, Source File, integrated file system (Domino, Network
Storage, 1 Directory Many Objects, Many Directories Many Objects).
• The use of data compression, data compaction, and Optimum Block Size (USEOPTBLK).
• Directory structure, which can have a dramatic effect on save and restore operations.
15.1 Supported Backup Device Rates
As you look at backup devices and their performance rates, you need to understand the backup device
hardware and the capabilities of that hardware. The different backup devices and IOAs have different
capabilities for handling data for the best results in their target market. The following table contains
backup devices and rates. Later in this document the rates are used to help determine possible
performance. A study of some customer data showed that compaction on their database file data
occurred at a ratio of approximately 2.8 to 1. The database files used for the performance workloads were
created to simulate that result.
Table 15.1.1 Backup device speed and compaction information
Backup Device                     Rate (MB/S)             Compaction Factor
DVD-RAM                           0.75 Write / 2.8 Read   2.8 #1
SAS DVD-RAM                       2.5                     2.8 #1
SLR60                             4.0                     2.0
SLR100                            5.0                     2.0
VXA-2                             6.0                     2.0
6279 VXA-320                      12.0                    2.0
6258 4MM Tape Drive               6.0                     2.0
5755 ½ High Ultrium-2             18.0                    2.8
3580 Ultrium 2                    35.0                    2.8
3592J Fiber Channel               40.0                    2.8
3580 Ultrium 3 (Fiber Channel)    80.0                    2.0
5746 Half High Ultrium 4          120.0                   2.0
3580 Ultrium 4 Fiber Channel      120.0                   2.0
3592E Fiber Channel               100.0                   2.5
#1. Software compression is used here because the hardware doesn't support device compaction.
Note: the compaction factor is a number that, used with the formulas in the sections that follow, helps describe the actual
rates observed as the lab workloads were run using the above drives. It is not the compression ratio of the data being written
to tape; it is listed here to help relate what our experiments achieved to the published drive speed.
15.2 Save Command Parameters that Affect Performance
Use Optimum Block Size (USEOPTBLK)
The USEOPTBLK parameter is used to send a larger block of data to backup devices that can take
advantage of the larger block size. Every block of data that is sent carries a certain amount of overhead
with it, including block transfer time, IOA overhead, and backup device overhead. The block size does
not change the IOA and backup device overhead per block, but it does change the number of blocks:
sending 8 small blocks results in 8 times as much IOA and backup device overhead as sending one large
block. Using larger blocks therefore allows the actual transfer time of the data to become the gating
factor. In this example, 8 software operations with 8 hardware operations essentially become 1 software
operation with 1 hardware operation when USEOPTBLK(*YES) is specified. The usual results are
significantly lower CPU utilization and more efficient backup device operation.
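The effect can be sketched numerically; the per-block overhead and device rate below are
illustrative assumptions, not measured values:

def transfer_time_ms(total_kb: float, block_kb: float,
                     per_block_overhead_ms: float = 1.0,
                     device_rate_mb_s: float = 50.0) -> float:
    """Time to move total_kb of data when every block carries a fixed
    IOA/device overhead in addition to its transfer time."""
    blocks = total_kb / block_kb
    transfer_ms = total_kb / 1024.0 / device_rate_mb_s * 1000.0
    return blocks * per_block_overhead_ms + transfer_ms

# Eight 32 KB blocks pay the fixed overhead eight times; one 256 KB block
# pays it once, so the data transfer time becomes the gating factor.
print(transfer_time_ms(256, 32))    # 8 blocks: 13.0 ms
print(transfer_time_ms(256, 256))   # 1 block:   6.0 ms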
Data Compression (DTACPR)
Data compression is the ability to compress strings of identical characters and mark the beginning of the
compressed string with a control byte. Strings of blanks from 2 to 63 bytes are compressed to a single
byte. Strings of identical characters between 3 and 63 bytes are compressed to 2 bytes. If a string cannot
be compressed, a control character is still added, which actually expands the data. This parameter is
usually used to conserve storage media. If the backup device does not support data compaction, the
System i software can be used to compress the data; this can require a considerable amount of CPU.
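A rough sketch of these encoding rules (an estimate of output size only; this is not the actual
DTACPR on-media format):

def rle_estimate(data: bytes, blank: int = 0x20) -> int:
    """Estimate compressed size: runs of 2-63 blanks become 1 byte, runs of
    3-63 identical characters become 2 bytes, and incompressible strings
    gain a control byte per 63 literal bytes, slightly expanding the data.
    The blank defaults to ASCII space here; on IBM i the blank is EBCDIC 0x40."""
    out = literal = 0
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 63:
            run += 1
        if (data[i] == blank and run >= 2) or run >= 3:
            if literal:              # flush pending incompressible bytes
                out += 1 + literal   # control byte + literal data
                literal = 0
            out += 1 if data[i] == blank else 2
        else:
            literal += run
            while literal >= 63:     # each control byte covers up to 63 bytes
                out += 1 + 63
                literal -= 63
        i += run
    if literal:
        out += 1 + literal
    return out

# "ABC" + 40 blanks + "XXXXXXXXXX": 53 input bytes shrink to 7.
print(rle_estimate(b"ABC" + b" " * 40 + b"X" * 10))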
Data Compaction (COMPACT)
Data compaction is the same concept as software compression but available at the hardware level. If you
wish to use data compaction, the backup device you choose must support it.
15.3 Workloads
The following workloads were designed to help evaluate the performance of single, concurrent and
parallel save and restore operations for selected devices. Familiarization with these workloads can help in
understanding differences in the save and restore rates.
Database File related Workloads:
The following workloads are designed to show some possible customer environments using database
files.
User Mix
User Mix 3GB, User Mix 12GB - The User Mix data is contained in a single library and
made up of a combination of source files, database files, programs, command objects,
data areas, menus, query definitions, etc. User Mix 12GB contains 49,500 objects and
User Mix 3GB contains 12,300 objects.
Source File
Source File 1GB - 96 source files with approximately 30,000 members.
Large Database File Large File 4GB, 32GB, 64GB, 320GB - The Large Database File workload is a
single database file. The members in the 4GB and 32GB files are 4GB in size. The
Members in the 64GB and 320GB files are 64GB in size.
Integrated File System related Workloads:
Analysis of customer systems indicates about 1.5 to 1 compaction on the tape drives with integrated file
system data. This is partly because the IBM i operating system programs that store data in the integrated
file system perform disk management functions that keep the IFS space cleaned up and compressed, and
partly because the objects tend to be smaller by nature, or are mail documents, HTML files, or graphic
objects that don't compact well. The following workloads (1 Directory Many Objects, Many
Directories Many Objects, Domino, Network Storage Space) show some possible customer integrated file
system environments.
1 Directory Many objects
This integrated file system workload consists of 111,111 stream files in a
single directory where the stream files have 32K of allocated space, 24K of which is data.
Approximately 4 GB total sampling size.
Many Directories Many objects
This integrated file system workload is 6 levels deep, 10
directories wide where each directory level contains 10 directories resulting in a total of
111,111 Directories and 111,111 stream files, where the stream files have 32K of
allocated space, 24K of which is data. Approximately 5 GB total sampling size.
Domino
This integrated file system workload consists of a single directory containing 90 mail
files. Each mail file is 152 MB in size. The mail files contain mail documents with
attachments where approximately 75% of the 152 MB is attachments. Approximately 13
GB total sampling size.
Network Storage Space
This integrated file system workload consists of a Linux storage space of
approximately 6 GB total sampling size.
15.4 Comparing Performance Data
When comparing the performance data in this document with the actual performance on your system,
remember that the performance of save and restore operations is data dependent. If the same backup
device was used on data from three different systems, three different rates may result.
The performance of save and restore operations is also dependent on the system configuration, most
directly on the number and type of DASD units on which the data is stored and the type of storage IOAs
being used.
Generally speaking, the Large Database File data that was used in testing for this document was designed
to compact at an approximate 2.8 to 1 ratio. If we were to write a formula to illustrate how performance
ratings are obtained, it would be as follows:

DeviceSpeed x LossFromWorkLoadType x Compaction = MB/sec; MB/sec x 3600 = MB/hr; MB/hr / 1000 = GB/hr
But the reality of this formula is that the “LossFromWorkLoadType” is far more complex than described
here. The different workloads have different overheads, different compaction rates, and the backup
devices use different buffer sizes and different compaction algorithms. The attempt here is to group these
workloads as examples of what might happen with a certain type of backup device and a certain
workload.
Note: Remember that these formulas and charts are to give you an idea of what you might achieve from
a particular backup device. Your data is as unique as your company and the correct backup device
solution must take into account many different factors.
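As a minimal sketch, the formula can be applied directly (the factors come from the tables in the
sections that follow):

def estimated_gb_per_hour(device_speed_mb_s: float,
                          workload_factor: float,
                          compaction: float = 1.0) -> float:
    """DeviceSpeed x LossFromWorkLoadType x Compaction gives MB/sec;
    x 3600 gives MB/hr; / 1000 gives GB/hr."""
    return device_speed_mb_s * workload_factor * compaction * 3600 / 1000

# DVD-RAM, Large Database File (0.95), 2.8 software compression: ~7 GB/HR
print(estimated_gb_per_hour(0.75, 0.95, 2.8))
# 3580 Ultrium-2, Large Database File, 2.8 compaction: ~335 GB/HR
print(estimated_gb_per_hour(35.0, 0.95, 2.8))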
The save and restore rates listed in this document were obtained on a dedicated system. A dedicated
system is one where the system is up and fully functioning but no other users or jobs are running except
the save and restore operations. All processors and memory were dedicated to the system and no partial
processors were used. Other subsystems such as QBATCH are required in order to run concurrent and
parallel operations. All workloads were deleted before restoring them again.
15.5 Lower Performing Backup Devices
With the lower-performing backup devices (e.g., DVD-RAM), the devices themselves become the gating
factor, so the save rates are approximately the same regardless of system CPU size.
Table 15.5.1 Lower performing backup devices LossFromWorkLoadType Approximations (Save Operations)
Workload Type                                                              Amount of Loss
Large Database File                                                        95%
User Mix / Domino / Network Storage Space                                  55%
Source File / 1 Directory Many Objects / Many Directories Many Objects    25%
Example for a DVD-RAM (DeviceSpeed x LossFromWorkLoad x Compaction Factor):
0.75 x 0.95 = 0.71 MB/S; x 2.8 compaction = 1.995 MB/S; x 3600 = 7182 MB/HR = 7 GB/HR
0.75 x 0.95 = 0.71 MB/S; no compression; x 3600 = 2556 MB/HR = 2.5 GB/HR
15.6 Medium & High Performing Backup Devices
Medium & high performing backup devices (SLR60, SLR100, VXA-2, VXA-320).
Table 15.6.1 Medium performing backup devices LossFromWorkLoadType Approximations (Save Operations)
Workload Type                                                              Amount of Loss
Large Database File                                                        95%
User Mix / Domino / Network Storage Space                                  65%
Source File / 1 Directory Many Objects / Many Directories Many Objects    25%
Example for SLR100 (DeviceSpeed x LossFromWorkLoad x Compaction Factor):
5.0 x 0.95 = 4.75 MB/S; x 2.0 compaction = 9.5 MB/S; x 3600 = 34200 MB/HR = 34 GB/HR
15.7 Ultra High Performing Backup Devices
High speed backup devices are designed to perform best on large files. The use of multiple high speed
backup devices concurrently or in parallel can also help to minimize system save times; see the section
on multiple backup devices for more information. This section covers the 3580 Ultrium-2, 3580
Ultrium-3 (2Gb & 4Gb Fiber Channel), 3592J, and 3592E.
Table 15.7.1 Higher performing backup devices LossFromWorkLoadType Approximations (Save Operations)
Workload Type                                                              Amount of Loss
Large Database File                                                        95%
User Mix / Domino / Network Storage Space                                  50%
Source File / 1 Directory Many Objects / Many Directories Many Objects    5%
Example for 3580 ULTRIUM-2 Fiber (DeviceSpeed x LossFromWorkLoad x Compaction Factor):
LG File:  35.0 x 0.95 = 33.25 MB/S; x 2.8 = 93.1 MB/S; x 3600 = 335160 MB/HR = 335 GB/HR
UserMix:  35.0 x 0.50 = 17.5 MB/S;  x 2.8 = 49 MB/S;   x 3600 = 176400 MB/HR = 176 GB/HR
Source:   35.0 x 0.05 = 1.75 MB/S;  x 2.8 = 4.9 MB/S;  x 3600 = 17640 MB/HR = 17.6 GB/HR
NOTE: Actual performance is data dependent, these formulas are for estimating purposes and may
not match actual performance on customer systems.
15.8 The Use of Multiple Backup Devices
Concurrent Saves and Restores - The ability to save or restore different objects from a single
library/directory to multiple backup devices or different libraries/directories to multiple backup devices at
the same time from different jobs. The workloads that were used for the testing were Large Database
File and User Mix from libraries. For the tests, multiple identical libraries were created, one for each
backup device being used.
Parallel Saves and Restores - The ability to save or restore a single object or library/directory across
multiple backup devices from the same job. This function was designed to help customers with very
large database files that dominate the backup window, and the goal is to provide options to help reduce
that window. For large objects, using multiple backup devices with the parallel function can greatly
reduce the time needed for the operation to complete as compared to a serial operation on the same
object.
Concurrent operations to multiple backup devices will probably be the preferred solution for most
customers, who will have to weigh the benefits of using parallel versus concurrent operations for
multiple backup devices in their environment. The following are some thoughts on possible solutions
to save and restore situations. Remember that memory, processors, and DASD play a large factor in
whether or not you will be able to make use of parallel or concurrent operations to affect the backup
window.
- For save and restore with a User Mix or small to medium object workloads, the use of concurrent
operations will allow multiple objects to be processed at the same time from different jobs, making better
use of the backup devices and the system.
- For systems with a large quantity of data and a few very large database files whether in libraries or
directories, a mixture of concurrent and parallel might be helpful. (Example: Save all of the
libraries/directories to one backup device, omitting the large files from the library or the directory the file
is located in. At the same time run a parallel save of those large files to multiple backup devices.)
- For systems dominated by Large Files the only way to make use of multiple backup devices is by using
the parallel function.
- For systems with a few very large files that can be balanced over the backup devices, use concurrent
saves.
- For backups where libraries/directories increase or decrease in size significantly, constantly throwing
concurrent saves out of balance, the customer might benefit from the parallel function, as the
libraries/directories would tend to be balanced against the backup devices no matter how the libraries
change. Again, this depends upon the size and number of data objects on the system.
- Customers planning for future growth, where they would be adding backup devices over time, might
benefit from setting up Backup Recovery Media Services (BRMS/400) using *AVAIL for backup
devices. Then, when a new backup device is added to the system and recognized by BRMS/400, it will be
used, leaving the BRMS/400 configuration the same but benefiting from the additional backup device.
The same is true in reverse: if a backup device is lost, the weekly backup doesn't have to be postponed
and the BRMS/400 configuration doesn't need to change; the backup will just use the backup devices
available at the time of the save.
15.9 Parallel and Concurrent Library Measurements
This section discusses parallel and concurrent library measurements for tape drives, while sections later in
this chapter discuss measurements for virtual tape drives.
15.9.1 Hardware (2757 IOAs, 2844 IOPs, 15K RPM DASD)
Hardware Environment.
This testing used an 840 24-way system with 128 GB of memory. The model 840 doesn't support 15K
RPM DASD in the main tower, so only 4 18-GB 10K RPM RAID-protected DASD units were in the
main tower.
Fifteen PCI-X towers (5094 towers) were attached, each filled with 45 35-GB 15K RPM RAID-protected
DASD units, with 2757 IOAs and 2844 IOPs in all 15 towers. All of the towers attached to the system
were configured into 8 High Speed Links (HSL) with two towers in each link, and one 5704 fiber channel
connector in each tower, or two per HSL. There was a total of 679 DASD, 675 of which were 35 GB
15K RPM DASD units, all in the system ASP. We used the new high-speed ULTRIUM GEN 2 tape
drives, model 3580 002, fiber channel attached.
There were many different configuration options we could have chosen to exercise this new hardware;
we were looking for a reasonable system to get the maximum data flow, knowing that at some point
someone will ask what the maximum is. As you look at this information you will need to put it in
perspective of your own system or system needs.
We chose 8 HSLs because our bus information tells us that only so much data can flow across a single
HSL. We believe the total number of 3580 002 tape drives that could be put on a link is something a
little greater than 2, but a 3rd tape drive would probably be slowed greatly by what the HSL can support,
so to maximize the data flow we chose to put only two on an HSL.
What does this mean to your configuration? If you are running large file save and restore operations we
would recommend only 2 high speed tape drives per HSL. If your data leans more toward user mix you
could probably make use of more drives in a single HSL. How many will depend upon your data.
Remember there are other factors that affect save and restore operations, like memory, number of
processors available, number and type of DASD available to feed those tape drives, and type of storage
IOAs being used.
Large File operations create a great deal of data flow without using a lot of processing power, but User
Mix data needs those processors, memory, and DASD. Could the large file tests have been done with
fewer processors? Yes, probably with something between 8 and 16, but in order to also do the user mix
in the same environment we chose to have the 24 processors available. The user mix is a more generic
customer environment, so it is informational to a larger set of customers, and we wanted to be able to
provide comparison information that most customers can use.
15.9.2 Large File Concurrent
For the concurrent testing 16 libraries were built, each containing a single 320 GB file with 80 4 GB
members. The file size was chosen to sustain a flow across the HSL, system bus, processors, memory,
and tape drives for about an hour; we were interested in sustained rather than peak performance.
Measurements were done to show scaling from 1 to 16 tape drives, knowing that near the top number of
tape drives the system, not the tape drives, would become the limiting factor. This could be used by
customers to estimate what might be a reasonable number of tape drives
for their situation.
Table 15.9.2.1 iV5R2 16 - 3580.002 Fiber Channel Tape Device Measurements (Concurrent)
Workload: 320 GB DB file with 80 4 GB members (S = Save, R = Restore)
# 3580.002 tape drives      S             R
1                           365 GB/HR     340 GB/HR
2                           730 GB/HR     680 GB/HR
3                           1.09 TB/HR    1.01 TB/HR
4                           1.45 TB/HR    1.33 TB/HR
8                           2.88 TB/HR    2.56 TB/HR
12                          4.15 TB/HR    3.73 TB/HR
13                          4.63 TB/HR    4.02 TB/HR
14                          4.90 TB/HR    4.28 TB/HR
15                          5.14 TB/HR    4.54 TB/HR
16                          5.21 TB/HR    4.68 TB/HR
In the table above, you will notice that the 16th drive starts to lose value. Even though there is gain, we
feel we are starting to see system saturation points factor in. Unfortunately, we didn't have any more
drives to add, but we believe that the total data throughput would remain relatively equal even if more
drives were added.
[Chart: Save and Restore Rates, Large File Concurrent Runs; GB/HR vs. number of 3580 model 002 tape drives (1-16); save peaks at about 5.2 TB/HR]
15.9.3 Large File Parallel
For the measurements in this environment, BRMS was used to manage the save and restore, taking
advantage of the ability built into BRMS to split an object between multiple tape drives. We started with
a 320 GB file in a single library and built it up to 2.1 TB for the 1- to 4- and 8-tape-drive tests. For the
12- to 16-tape-drive tests the file was duplicated in the library, so a single library with two 2.1 TB files
was used. This is not quite the same as having a 4.2 TB file, but because of certain limitations in building
our test data, we felt this was the best way to build the test data. The goal is to see scaling of tape drives
on the system along with trying to locate any saturation points that might help our customers identify
limitations in their own environment.
Table 15.9.3.1 iV5R2 16 - 3580.002 Fiber Channel Tape Device Measurements (Parallel)
(S = Save, R = Restore)
# 3580.002 tape drives      S             R
1                           363 GB/HR     340 GB/HR
2                           641 GB/HR     613 GB/HR
3                           997 GB/HR     936 GB/HR
4                           1.34 TB/HR    1.23 TB/HR
8                           2.1 TB/HR     1.90 TB/HR
12                          3.29 TB/HR    2.95 TB/HR
13                          3.60 TB/HR    3.14 TB/HR
14                          3.71 TB/HR    3.35 TB/HR
15                          3.90 TB/HR    3.55 TB/HR
16                          4 TB/HR       3.65 TB/HR
[Chart: Save and Restore Rates, Large File Parallel Runs; GB/HR vs. number of 3580 model 002 tape drives (1-16); save peaks at about 4 TB/HR]
15.9.4 User Mix Concurrent
User Mix generally portrays a fair population of customer systems, where the real data is a mixture of
programs, menus, and commands along with their database files. The new ultra-high-speed tape drives
are in their glory when streaming large file data, but a lot of other factors play a part when saving and
restoring multiple smaller objects.
Table 15.9.4.1 iV5R2 16 - 3580.002 Fiber Channel Tape Device Measurements (Concurrent)
Workload: 12 GB total library size User Mix, as described in section 15.3 (S = Save, R = Restore)
# 3580.002 tape drives    S (GB/HR)    R (GB/HR)
1                         140          69
2                         272          135
3                         399          150
4                         504          176
5                         627          213
6                         699          231
7                         782          277
8                         858          290
9                         932          321
10                        965          333
11                        995          358
12                        1010         380
[Chart: Save and Restore Rates, User Mix Concurrent Runs; GB/HR vs. number of 3580 model 002 tape drives (1-12); save approaches 1 TB/HR]
15.10 Number of Processors Affects Performance
With the Large Database File workload, it is possible to fully feed two backup devices with a single
processor, but with the User Mix workload it takes more than one processor to fully feed a backup
device. A recommendation might be 1 and 1/3 processors for each backup device you want to feed with
User Mix data.
[Charts: .1 Processor to 2 Processors Backup Operations (Large File and User Mix workloads) to Ultrium 3 tape, 520 2-way, 16 GB memory, 60 DASD; save and restore rates (GB/HR) with .1, .2, .3, .5, 1.0, and 2.0 processors]
15.11 DASD and Backup Devices Sharing a Tower
The system architecture does not require that DASD and backup devices be kept separated. In the IBM
Rochester Lab testing for the 3580 002, we attached one backup device to each tower, and all towers had
45 DASD units in them. The 3592J has similar characteristics to the 3580 002, but the 3580 003 and
3592E models have greater capacities, which create new scenarios. You aren't physically limited to
putting one backup device in a tower, but for the newest high-speed backup devices you can saturate the
bus if you have multiple devices in a tower. You need to look at your total system or partition
configuration in order to determine whether it is possible to use multiple high-speed devices on the
system and still get the most out of them. No matter what you determine is possible, we advocate
spreading your backup devices among the available towers.
[Charts: 5 to 75 DASD Backup Operations (Large File and User Mix workloads) to Ultrium 3 tape, 520 2-way, 16 GB memory; save and restore rates (GB/HR) with 5 to 75 DASD units]
15.12 Virtual Tape
Virtual tape drives were introduced in iV5R4 so that customers can make use of the speed of saving to
DASD, then copy the data to physical tape with DUPTAP, reducing the backup window during which
the system is unavailable to users. There are many pieces to consider in setting up and using virtual tape
drives. The block size must match the block capabilities of the physical backup device you will be
using; if your tape drive uses smaller block sizes, your virtual tape drive must use small blocks. The
following helps to show that even if your workload is large file, you may not gain anything in your
backup window by using virtual tape drives.
[Chart: VRT32K vs. VRT256K block size, 570 16-way, 128 GB memory, 800 DASD units for virtual tape drives; large file save and restore rates (GB/HR) for one virtual tape device]
The following measurements were done on a system with newer hardware including a 3580 Ultrium 3
4Gb Fiber Channel Tape Drive, 571E storage adapters, and 4327 70GB (U320) DASD.
[Charts: Save to Tape vs. Save to Virtual Tape then DUPTAP to Tape, 570 8-way, 96 GB memory, 3580 Ultrium 3 4Gb Fiber tape drive, 5761 4Gb Fiber adapter, 571E storage adapters, 4327 70GB (U320) DASD, 305 DASD units for virtual tape drives; rates for restricted state save to tape, restricted state save to virtual tape, and non-restricted DUPTAP; the second chart shows the 1000 empty libraries case]
Measurements were also done comparing save of 1000 empty libraries to tape versus save of these
libraries to virtual tape followed by DUPTAP from the virtual tape to tape. The save to tape was much
slower, which can be explained as follows. When data is being saved to tape, a flush buffer is requested
after each file is written to ensure that the file is actually on the tape. This forces the drive to backhitch for
each file and greatly reduces the performance. The DUPTAP command does not need to send a flush
buffer until the duplicate command completes, so it does not have the same performance impact.
15.13 Parallel Virtual Tapes
NOTE: Virtual tape reads and writes to the same DASD, so the maximum throughput in our concurrent
and parallel measurements differs from our tape drive tests, where we were reading from DASD and
writing to tape.
[Charts: Parallel Virtual Tape for Large File and User Mix, 570 16-way, 128 GB memory, 800 DASD units for virtual tape drives; save and restore rates (GB/HR) for 1 to 5 virtual tape devices]
15.14 Concurrent Virtual Tapes
NOTE: Virtual tape reads and writes to the same DASD, so the maximum throughput in our concurrent
and parallel measurements differs from our tape drive tests, where we were reading from DASD and
writing to tape.
[Charts: Concurrent Virtual Tape for Large File and User Mix, 570 16-way, 128 GB memory, 800 DASD units for virtual tape drives; save and restore rates (GB/HR) for 1 to 5 virtual tape devices]
15.15 Save and Restore Scaling using a Virtual Tape Drive.
A 570 8-way System i was used for the following tests. A user ASP was created using up to 3 571F
IOAs with up to 36 U320 70 GB DASD on each IOA; the chart shows the number of DASD in each test,
and the virtual tape drive was created using that DASD.
The workload data was restored into the system ASP and was then saved to the virtual tape drive in the
user ASP. The system ASP consisted of 2 HSL loops, a mix of 571E and 571F IOAs, and 312 70GB
U320 DASD units. These charts are very specific to this DASD; the scaling flow would be similar with
different IOAs, but the actual rates would vary. For more information on the IOAs and DASD see
Chapter 14 of this guide. Restoring the workloads from the virtual tape drives started at 900 GB/HR
reading from 6 DASD and scaled up to 1.5 TB/HR on 108 DASD. The bottleneck will be determined by
where you are writing and how many DASD are available to the write operation.
[Charts: Large File and User Mix Virtual Tape Scaling (SAVE, write to virtual tape); save rates (GB/HR) for RAID5, RAID6, and mirroring]
15.16 Save and Restore Scaling using 571E IOAs and U320 15K DASD units to a 3580
Ultrium 3 Tape Drive.
A 570 8-way System i was used for the following tests. A user ASP was created with the number of
DASD listed in each test. The workload data was then saved to the tape drive, deleted from the system,
and restored to the user ASP. These charts are very specific to the new IOAs and U320-capable DASD
available. For more information on the IOAs and DASD see Chapter 14 of this guide.
[Charts: Large File Save and Large File Restore; rates (GB/HR) for RAID5, RAID6, and mirroring]
[Charts: User Mix Saves and User Mix Restores; rates (GB/HR) for RAID5, RAID6, and mirroring]
15.17 High-End Tape Placement on System i
The current high-end tape drives (ULTRIUM-2 / ULTRIUM-3 and 3592-J / 3592-E) need to be placed
carefully on the System i buses and HSLs in order to avoid bottlenecking. The following rules of thumb
will help optimize performance in a large-file save environment, and help position the customer for future
growth in tape activity:
• Limit the number of drives per fibre tape adapter as follows:
  - For ULTRIUM-2, 3592-J, and slower drives, two drives can share a fc 5704 or fc 5761 fibre tape
    adapter. If running on a 2 GByte loop, a 3rd drive can share a fc 5761 fibre tape adapter.
  - For ULTRIUM-3 and TS1120 (3592-E) drives, each drive should be on a separate fibre tape adapter.
• Place the fc 5704 or fc 5761 in a 64-bit slot on a "fast bus" as follows:
  - PCI-X: In a 5094/5294 tower use slot C08 or C09. In a 5088/0588 tower use slot C08 or C09 (you
    may need to purchase RPQ #847204 to allow the tower to connect with RIO-G performance). In an
    0595, 5095, or 5790 expansion unit, use any valid slot.
  - PCI: In a 5074/5079/5078 tower, use slot C02, C03, or C04.
  - Note: Ensure the fc 5761 is supported on your CPU type.
• Put one fc 5704 or fc 5761 per tower initially. On loops running at 2 GByte speeds, a second fc 5704
  card can be added in the locations recommended above if needed.
• Spread tape fibre cards across as many HSLs as possible, with maximums as follows:
  - On loops running at 1 GByte (e.g., all loops on 8xx systems, or loops with HSL-1 towers): maximum
    of two drives per HSL loop.
  - On loops running at 2 GByte (e.g., loops with all HSL-2 / RIO-G towers on System i systems):
    maximum of six ULTRIUM-2 or 3592-J drives per RIO-G loop; maximum of four ULTRIUM-3 or
    TS1120 (3592-E) drives per RIO-G loop using the fc 5704 IOA; maximum of two TS1120 (3592-E)
    drives per RIO-G loop using the fc 5761 IOA.
• If Gbit Ethernet cards are present on the system and will be running during the backups, treat them as
  though they were ULTRIUM-3 or TS1120 (3592-E) tape drives when designing the card and HSL
  placement using the rules above, since they can command similar bandwidth.
The rules above assume that the customer is running a large-file workload and that all tape drives are active
simultaneously. If your customer is running a user-mix tape workload or the high load cards are not running
simultaneously, then it may be possible to put more gear on the bus/HSL than shown. There may also be certain
card layouts that will allow more drives per bus/tower/HSL, but these need to be reviewed individually.
15.18 BRMS-Based Save/Restore Software Encryption and DASD-Based ASP Encryption
The Ultrium-3 was used in the following experiments, which attempt to characterize the effects of
BRMS-based save/restore software encryption and DASD-based ASP encryption. Some of the newer
tape drives offer hardware encryption as an option, but for those who are not looking to upgrade or invest
in these tape units at this time, software encryption can be a fair solution. In the experiments we used full
processors that were dedicated to the partition: a 9406-MMA 4-way partition and a 9406-570 4-way
partition, each with 40 GB of main memory. The workload data was located on a single user ASP with
36 70GB 15K RPM DASD attached through a 571F IOA. The experiments were not set up to show the
best possible environment or to take into account all possible hardware environments; instead, they
attempt to portray some of the differences customers might observe if they choose software encryption
as a backup strategy over their current non-encrypted environment. Software encryption has a significant
impact on save times but only a minor impact on restore times.
[Charts: Tape Backup Performance - Saves (GB/HR) and CPU Utilization during Saves (%) for the 1 GB Source File, 12 GB User Mix, 64 GB Large File, and 320 GB Large File workloads; series compare the 9406-MMA 4-way (encrypted and non-encrypted ASP) and 9406-570 4-way (non-encrypted ASP), each running SAVLIBBRM with and without software encryption]
Figure: Tape Backup Performance - Restores (GB/HR, scale 0 to 300) for the 1 GB Source File, 12 GB User Mix, 64 GB Large File, and 320 GB Large File workloads, for the same system/encryption combinations (RSTLIBBRM with and without software encryption).
Figure: CPU Utilization during Restores (%CPU used, scale 0 to 20) for the same workloads and system/encryption combinations.
Performance will be limited to the native drive rates (shown in table 15.1.1) because encrypted data
blocks have a very low compaction ratio.
15.19 5XX Tape Device Rates
Note: Measurements for the high speed devices were completed on a 570 4-way system with 2844 IOPs and 2780 IOAs and 180 15K RPM RAID5 DASD units. The smaller tape device tests were completed on a 520 2-way with 75 DASD units. The virtual tape and *SAVF runs were completed on a 570 ML16 with 256 GB of memory and 924 DASD units. The goal of each of the tests is to show the capabilities of the device, so a system large enough to achieve the maximum throughput for that device was used. Customer performance will depend on overall system resources and on whether those resources match the maximum capabilities of the device. See other sections in this guide about memory, CPU and DASD.
Table 15.19.1 - Measurements in GB/HR; workload data saved and restored from user ASP 2. The table lists save (S) and restore (R) rates, measured on iV5R3 and iV5R4, for the workloads Source File 1GB, User Mix 3GB, User Mix 12GB, Large File 4GB, Large File 32GB, Large File 64GB, Large File 320GB, 1 Directory Many Objects, Many Directories Many Objects, Domino Mail Files, and Network Storage Space, across twelve tape, virtual tape, and *SAVF device columns. (The per-device rate columns are not recoverable from this copy of the document.)
Table 15.19.2 - iV5R4M0 measurements on a 5XX 1-way system, 8 RAID5 protected DASD units, 8 GB memory. Measurements in GB/HR, all 8 DASD in the system ASP, giving save (S) and restore (R) rates for the workloads Source File 1GB, User Mix 12GB, Large File 32GB, 1 Directory Many Objects, Many Directories Many Objects, Domino Mail Files, and Network Storage Space. (The per-device rate columns are not recoverable from this copy of the document.)
15.20 5XX Tape Device Rates with 571E & 571F Storage IOAs and 4327 (U320) Disk Units
Save/restore rates of 3580 Ultrium 3 (2Gb and 4Gb Fiber Channel) tape devices and of virtual tape
devices were measured on a 570 8-way system with 571E and 571F storage adapters and 714 type 4327
70GB (U320) disk units. Customer performance will be dependent on overall system resources and how
well those resources match the maximum capabilities of the device. See other sections in this guide about
memory, CPU and DASD.
Table 15.20.1 - Measurements in GB/HR; workload data saved and restored from user ASP 2. The table compares 2780 storage IOAs with 4326 35GB (U160) DASD (data from Table 15.19.1) against 571E/571F storage IOAs with 4327 70GB (U320) DASD, all measured on iV5R4, giving save (S) and restore (R) rates for the workloads Source File 1GB, User Mix 12GB, Large File 64GB, Large File 320GB, 1 Directory Many Objects, Many Directories Many Objects, Domino Mail Files, and Network Storage Space. (The per-configuration rate columns are not recoverable from this copy of the document.)
15.21 5XX DVD RAM and Optical Library
Table 15.21.1 - iV5R3 measurements on a 520 2-way system, 53 RAID protected DASD units, 16 GB memory; ASP 1 (system ASP, 23 DASD), ASP 2 (30 DASD). Measurements in GB/HR; workload data saved and restored from user ASP 2, giving save (S) and restore (R) rates for the workloads Source File 1GB, User Mix 3GB, User Mix 12GB, Large File 4GB, Large File 32GB, Large File 64GB, 1 Directory Many Objects, Many Directories Many Objects, Domino Mail Files, and Network Storage Space across eight V5R3 DVD RAM and optical library device columns. (The per-device rate columns are not recoverable from this copy of the document.)
15.22 Software Compression
The rates a customer will achieve depend upon the system resources available. This test was run in a very favorable environment to try to achieve the maximum rates. Software compression rates were gathered using the QSRSAVO API. The CPU used in all compression schemes was near 100%. The compression algorithm cannot span CPUs, so the fact that measurements were performed on a 24-way system doesn't affect the software compression scenario.
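As a point of reference, a minimal CL sketch of a compressed save to a save file. The library and save file names are illustrative; the *LOW/*MEDIUM/*HIGH values are accepted on the DTACPR parameter of the QSRSAVO API, and on the CL save commands only in releases that support them (the measurements below were gathered through QSRSAVO):

    CRTSAVF FILE(QGPL/TESTSAVF)                  /* target save file           */
    SAVLIB LIB(TESTLIB) DEV(*SAVF)
           SAVF(QGPL/TESTSAVF) DTACPR(*HIGH)     /* save with high compression */

Higher compression levels trade CPU time for media savings, which is visible in the save rates of Table 15.22.1.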
Table 15.22.1 - Measurements on an 840 24-way system, 1080 RAID protected DASD units, 128 GB mainstore; rates in GB/HR. Rows: iV5R1, iV5R2, and iV5R2 using the QSRSAVO API with DTACPR *LOW (approximate compression ratio 1.5:1), *MED (2.7:1), and *HIGH (3:1); save and restore rates are given for the NSRC1GB, NUMX12GB, and SR16GB workloads. (The individual rate cells are not recoverable from this copy of the document.)
15.23 9406-MMA DVD RAM
Table 15.23.1 - iV5R4M5 measurements on a 9406-MMA 4-way system, 6 mirrored DASD in the CEC and 24 RAID5 protected DASD units attached, 32 GB memory; measurements in GB/HR, all 30 DASD in the system ASP. Save (S) and restore (R) rates are given for the workloads Source File 1GB, User Mix 3GB, Large File 4GB, 1 Directory Many Objects, Many Directories Many Objects, Domino Mail Files, and Network Storage Space across two iV5R4M5 DVD RAM device columns. (The per-device rate cells are not recoverable from this copy of the document.)
15.24 9406-MMA 576B IOPLess IOA
Table 15.24.1 - iV6R1M0 measurements on a 9406-MMA 4-way system, 200 RAID5 protected DASD units in the system ASP attached via 571F IOAs, 40 GB memory; measurements in GB/HR. Includes two virtual tape experiments, one with 60 RAID5 DASD in ASP2 and one with 120 RAID5 DASD in ASP2, plus a two-drive (1st drive / 2nd drive) configuration. Save (S) and restore (R) rates are given for the workloads Source File 1GB, User Mix 12GB, Large File 64GB, Large File 320GB, 1 Directory Many Objects, Many Directories Many Objects, and Domino Mail Files across eight V6R1M0 device columns. (The per-device rate cells are not recoverable from this copy of the document.)
15.25 What’s New and Tips on Performance
What’s New
iV6R1M0
March 2008
BRMS-Based Save/Restore Software Encryption and DASD-Based ASP Encryption
576B IOPLess Storage IOA
iV5R4M5
July 2007
3580 Ultrium 4 - 4Gb Fiber Channel Tape Drive
6331 SAS DVD RAM for 9406-MMA system models
iV5R4
January 2007
571E and 571F storage IOAs (see DASD Performance chapter for more information)
August 2006
1. DUPTAP performance PTFs (iV5R4 - SI23903, MF39598, MF39600, and MF39601)
2. 3580 Ultrium 3 4Gb Fiber Channel Tape Drive
January 2006
1. Virtual Tape
2. Parallel IFS
3. 3580 Ultrium 3 2Gb Fiber Channel Tape Drive
4. 3592E
5. VXA-320
6. ½ High Ultrium 2
7. IFS Restore Improvements for the Directory Workloads
8. 5761 4Gb Fiber Adapter
TIPS
1. Backup devices are affected by the media type. For most backup devices, the right media and density can greatly affect the capacity and speed of your save or restore operation. USE THE RIGHT MEDIA FOR YOUR BACKUP DEVICE (e.g., use a 25 GB tape cartridge in a 25 GB drive).
2. A Backup and Recovery Management System such as BRMS/400 is recommended to keep track of
the data and make the most of multiple backup devices.
3. Domino Online Performance Tips:
Chapter 16 IPL Performance
Performance information for Initial Program Load (IPL) is included in this section.
The primary focus of this section is to present observations from IPL tests on different System i models.
The data for both normal and abnormal IPLs are broken down into phases, making it easier to see the
detail. For information on previous models see a prior Performance Capabilities Reference.
NOTE: The information that follows is based on performance measurements and analysis done in the
Server Group Division laboratory. Actual performance may vary significantly from these tests.
16.1 IPL Performance Considerations
The wide variety of hardware configurations and software environments available makes it difficult to characterize a 'typical' IPL environment and predict the results. The following section provides a simple description of the IPL tests.
16.2 IPL Test Description
Normal IPL
• Power-on IPL (cold start after the managed system was powered down completely).
• For a normal IPL, benchmark time is measured from power-on until the System i server console sign-on screen is available.
Abnormal IPL
• The system is abnormally terminated, causing recovery processing to be done during the IPL. The amount of processing is determined by the system activities at the time the system terminates.
• For an abnormal IPL, the benchmark consists of bringing up a database workload and letting it run until the desired number of jobs are running on the system. Once the workload is stabilized, the system is forced to terminate, forcing a mainstore dump (MSD). The dump is then copied to DASD via the Auto Copy function, which is enabled through System Service Tools (SST). The System i partition is set to normal so that once the dump is copied, the system completes the remaining IPL with no user intervention. Benchmark time is measured from the time the system is forced to terminate to when the System i server console sign-on screen is available.
• Settings: the HDWDIAG parameter on the CHGIPLA command is set to *MIN. All physical files are explicitly journaled. Logical files are protected with SMAPP (System Managed Access Path Protection) by setting the target recovery time to *MIN with the EDTRCYAP command.
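For reference, a minimal CL sketch of those settings (the library, file, and journal names are illustrative only):

    CHGIPLA HDWDIAG(*MIN)           /* minimum hardware diagnostics at IPL */
    STRJRNPF FILE(APPLIB/ORDERS)    /* explicitly journal a physical file  */
             JRN(APPLIB/APPJRN)
    EDTRCYAP                        /* interactive: set the system access  */
                                    /* path recovery time to *MIN (SMAPP)  */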
NOTE: Due to some longer-starting tasks (such as TCP/IP), all workstations may not be up and ready at the same time as the console workstation displays a sign-on screen.
16.3 9406-MMA System Hardware Information
16.3.1 Small system Hardware Configuration
9406-MMA 7051 4 way - 32 GB Mainstore
DASD / 30 70GB 15K rpm arms,
6 DASD in CEC Mirrored
24 DASD in a #5786 EXP24 Disk Drawer attached with a 571F IOA RAID5 Protected
Software Configuration
100,000 spool files (100,000 completed jobs with 1 spool file per job)
500 jobs in job queues (inactive)
600 active jobs in system during Mainstore dump
1000 user profiles, 1000 libraries
Active Database: 2 libraries with 500 physical files and 20 logical files
16.3.2 Large system Hardware Configurations
9406-MMA 7056 8 way - 96 GB Mainstore
DASD / 432 70GB 15K rpm arms,
3 ASP's defined, 108 RAID5 DASD in ASP1, 288 RAID5 DASD in ASP2, 36 DASD no protection in
ASP3 - Mainstore dump set to ASP 2
• This system was tested with database files, unrelated to this test, covering 30% of the available DASD space; this database load causes a long directory recovery.
9406-MMA 7061 16 way - 512 GB Mainstore
DASD / 1000 70GB 15K rpm arms,
3 ASP's defined, 196 Nonconfigured DASD, 120 RAID5 DASD in ASP1, 612 RAID5 DASD in ASP2,
72 DASD no protection in ASP3 - Mainstore dump set to ASP 2
• This system was tested with database files, unrelated to this test, covering 30% of the available DASD space; this database load causes a long directory recovery.
Software Configuration
400,000 spool files (400,000 completed jobs with 1 spool files each)
1000 jobs waiting on job queues (inactive)
11000 active jobs in system during mainstore dump
200 remote printers, 6000 user profiles, 6000 libraries
Active Database:
• 25 libraries with 2600 physical files and 452 logical files
• 2 libraries with 10,000 physical files and 200 logical files
NOTE:
• Physical files are explicitly journaled
• Logical files are journaled using SMAPP set to *MIN
• Commitment control used on 20% of the files
16.4 9406-MMA IPL Performance Measurements (Normal)
The following tables provide a comparison summary of the measured performance data for a normal and
abnormal IPL. Results provided do not represent any particular customer environment.
Measurement units are in minutes and seconds
Table 16.4.1 iV5R4M5 Normal IPL - Power-On (Cold Start)

            iV5R4M5        iV6R1          iV5R4M5        iV5R4M5        iV6R1
            GA1 Firmware   GA3 Firmware   GA1 Firmware   GA1 Firmware   GA3 Firmware
            4 Way          4 Way          8 Way          16 Way         16 Way
            9406-MMA 7051  9406-MMA 7051  9406-MMA 7056  9406-MMA 7061  9406-MMA 7061
            32 GB          32 GB          96 GB          512 GB         512 GB
            30 DASD        30 DASD        432 DASD       1000 DASD      1000 DASD
Hardware    3:10           3:12           7:53           22:07          19:17
SLIC        4:49           5:07           7:53           9:58           10:05
OS/400      0:48           1:23           2:12           2:22           2:41
Total       8:47           9:42           17:58          34:27          32:03
Generally, the hardware phase is composed of C1xx xxxx, C3xx xxxx and C7xx xxxx. SLIC is
composed of C200 xxxx and C600 xxxx. OS/400 is composed of C900 xxxx SRCs to the System i server
console sign-on.
16.5 9406-MMA IPL Performance Measurements (Abnormal)
Measurement units are in minutes and seconds.
Table 16.5.1 iV5R4M5 Abnormal IPL (Partition MSD)

                        iV5R4M5        iV6R1          iV5R4M5        iV5R4M5        iV6R1
                        GA1 Firmware   GA3 Firmware   GA1 Firmware   GA1 Firmware   GA3 Firmware
                        4 Way          4 Way          8 Way          16 Way         16 Way
                        9406-MMA 7051  9406-MMA 7051  9406-MMA 7056  9406-MMA 7061  9406-MMA 7061
                        32 GB          32 GB          96 GB          512 GB         512 GB
                        30 DASD        30 DASD        432 DASD       1000 DASD      1000 DASD
Processor MSD           1:02           1:50           4:12           4:28           4:34
SLIC MSD IPL with Copy  10:45          7:23           7:00           11:35          10:56
Shutdown re-ipl         2:24           2:00           3:18           2:28           3:04
SLIC re-ipl             1:29           3:09           2:32           4:02           3:28
OS/400                  5:04           4:22           28:06          29:27          20:47
Total                   20:44          18:44          45:08          52:00          42:49
16.6 NOTES on MSD
MSD is Mainstore Dump. The general IPL phases relate to the SRCs posted on the operation panel as follows: Processor MSD includes the D2xx xxxx and C2xx xxxx SRCs right after the system is forced to terminate. SLIC MSD IPL with Copy follows with the next series of C6xx xxxx; see the next heading for more information on the SLIC MSD IPL with Copy. The copy occurs during the C6xx 4404 SRCs. Shutdown includes the Dxxx xxxx SRCs. Hardware re-IPL includes the next phase of D2xx xxxx and C2xx xxxx. SLIC re-IPL follows, which are the C600 xxxx SRCs. OS/400 completes with the C900 xxxx SRCs.
16.6.1 MSD Effects on IPL Performance Measurements
SLIC MSD IPL with Copy is affected by the number of DASD units and the jobs executing at the time of
the mainstore dump.
When a system is abnormally terminated, in-process changes to the directories used by the system to
manage storage may be lost. During the subsequent IPL, storage management directory recovery is
performed to ensure the integrity of the directories and the underlying storage allocations.
The duration of this recovery step will depend on the type of recovery performed and on the size of the
directories. In most cases, a subset directory recovery (SRC C6004250) will be performed which may
typically run from 2 minutes to 30 minutes depending upon the system. In rare cases, a full directory
recovery (SRC C6004260) is performed which typically runs much longer than a subset directory
recovery. The duration of the subset directory recovery is dependent on the size of the directory (which
relates to the amount of data stored on the system) and on the amount of in-process changes. With the
amount of data stored on our largest configurations with one to two thousand disk units, subset directory
recovery (SRC C6004250) took from 14 minutes to 50 minutes depending upon the system.
DASD units' effect on MSD time - Through experimental testing we found that the time spent in MSD copying the data to disk is related to the number of DASD arms available. Assigning the MSD copy to an ASP with a larger number of DASD can help reduce your recovery time if an MSD should occur.
16.7 5XX System Hardware Information
16.7.1 5XX Small system Hardware Configuration
520 7457 2 way - 16 GB Mainstore
DASD / 23 35GB 15K rpm arms,
RAID Protected
Software Configuration
100,000 spool files (100,000 completed jobs with 1 spool file per job)
500 jobs in job queues (inactive)
500 active jobs in system during Mainstore dump
1000 user profiles
1000 libraries
Database:
y
2 libraries with 500 physical files and 20 logical files
16.7.2 5XX Large system Hardware Configuration
570 7476 16 way - 256 GB Mainstore
DASD / 924 35GB arms 15K rpm arms,
RAID protected, 3 ASP's defined, majority of the DASD in ASP2 - Mainstore dump was to ASP 2
• This system was tested with 2 TB of database files unrelated to this test, but this load causes a long directory recovery.
595 7499 32-way - 384 GB Mainstore
DASD / 1125 35GB arms 15K rpm arms
RAID protected, 3 ASP's defined, majority of the DASD in ASP2 - Mainstore dump was to ASP 2
• This system was tested with 4 TB of database files unrelated to this test, but this load causes a long directory recovery.
Software Configuration
400,000 spool files (400,000 completed jobs with 1 spool files each)
1000 jobs waiting on job queues (inactive)
11000 active jobs in system during mainstore dump
200 remote printers
6000 user profiles
6000 libraries
Database:
• 25 libraries with 2600 physical files and 452 logical files
• 2 libraries with 10,000 physical files and 200 logical files
NOTE:
• Physical files are explicitly journaled
• Logical files are journaled using SMAPP set to *MIN
• Commitment control used on 20% of the files
16.8 5XX IPL Performance Measurements (Normal)
The following tables provide a comparison summary of the measured performance data for a normal and
abnormal IPL. Results provided do not represent any particular customer environment.
Measurement units are in minutes and seconds
Table 16.8.1 Normal IPL - Power-On (Cold Start)

            V5R3           iV5R4          V5R3           iV5R4          V5R3           iV5R4
            GA3 Firmware   GA7 Firmware   GA3 Firmware   GA7 Firmware   GA3 Firmware   GA7 Firmware
            2 Way          2 Way          16 Way         16 Way         32 Way         32 Way
            520 7457       520 7457       570 7476       570 7476       595 7499       595 7499
            16 GB          16 GB          256 GB         256 GB         384 GB         384 GB
            23 DASD        23 DASD        924 DASD       924 DASD       1125 DASD      1125 DASD
Hardware    5:19           3:30           18:37          17:44          25:50          26:27
SLIC        3:49           4:30           6:42           6:43           8:50           9:36
OS/400      1:00           0:50           1:32           2:32           2:30           3:43
Total       10:08          8:50           26:51          26:59          37:10          39:46
The workloads were increased for iV5R4 to better reflect common system load affecting the OS/400 portion of the IPL
Generally, the hardware phase is composed of C1xx xxxx, C3xx xxxx and C7xx xxxx on the 5xx
systems. SLIC is composed of C200 xxxx and C600 xxxx. OS/400 is composed of C900 xxxx SRCs to
the IBM i operating system console sign-on.
16.9 5XX IPL Performance Measurements (Abnormal)
Measurement units are in hours, minutes and seconds.
Table 16.9.1 Abnormal IPL (Partition MSD)

                        V5R3           iV5R4          V5R3           iV5R4          V5R3           iV5R4
                        GA3 Firmware   GA7 Firmware   GA3 Firmware   GA7 Firmware   GA3 Firmware   GA7 Firmware
                        2 Way          2 Way          16 Way         16 Way         32 Way         32 Way
                        520 7457       520 7457       570 7476       570 7476       595 7499       595 7499
                        16 GB          16 GB          256 GB         256 GB         384 GB         384 GB
                        23 DASD        23 DASD        924 DASD       924 DASD       1125 DASD      1125 DASD
Processor MSD           0:35           4:54           1:53           6:39           2:41           6:06
SLIC MSD IPL with Copy  4:50           15:40          24:10          42:18          43:10          40:03
Shutdown re-ipl         2:46           2:50           4:19           2:23           3:59           3:57
SLIC re-ipl             1:59           2:17           3:59           5:22           4:16           6:21
OS/400                  3:21           4:20           9:56           25:45          13:56          44:10
Total                   13:31          30:01          44:17          1:22:27        1:08:02        1:40:37

The workloads were increased for iV5R4 to better reflect common system load affecting the MSD and the OS/400 portion of the IPL.
16.10 5XX IOP vs IOPLess effects on IPL Performance (Normal)
Measurement units are in minutes and seconds.
Table 16.10.2 Normal IPL - Power-On (Cold Start)

            iV5R4 GA7 Firmware   iV5R4 GA7 Firmware
            16 Way IOP           16 Way IOPLess
            570 7476             570 7476
            256 GB               256 GB
            924 DASD             924 DASD
Hardware    17:44                18:06
SLIC        6:43                 7:20
OS/400      2:32                 2:52
Total       26:59                28:18
16.11 IPL Tips
Although IPL duration is highly dependent on hardware and software configuration, there are tasks that
can be performed to reduce the amount of time required for the system to perform an IPL. The following
is a partial list of recommendations for IPL performance:
• Remove unnecessary spool files. Use the Display Job Tables (DSPJOBTBL) command to monitor the size of the job tables on the system. The Change IPL Attributes (CHGIPLA) command can be used to compress job tables if there is a large number of available job table entries. The IPL that compresses the tables may be longer, so try to plan it along with a normal maintenance IPL where you have the time to wait for the tables to compress. (See the CL sketch after this list.)
• Reduce the number of device descriptions by removing any obsolete device descriptions.
• Control the level of hardware diagnostics by setting the CHGIPLA command to specify HDWDIAG(*MIN), so the system performs only a minimum, critical set of hardware diagnostics. This type of IPL is appropriate in most cases. The exceptions include a suspected hardware problem, or when new hardware, such as additional memory, is being introduced to the system.
• Reduce the amount of rebuild time for access paths during an IPL by using System Managed Access Path Protection (SMAPP). The IBM i operating system Backup and Recovery book (SC41-5304) describes this method for protecting access paths from long recovery times during an IPL.
• For additional information on how to improve IPL performance, refer to IBM i operating system Basic System Operation, Administration, and Problem Handling (SC41-5206) or to the redbook The System Administrator's Companion to IBM i operating system Availability and Recovery (SG24-2161).
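A minimal CL sketch of the monitoring and diagnostic settings above (CHGIPLA also carries the job-table compression attribute, whose value names vary by release, so it is not shown here):

    DSPJOBTBL                /* monitor job table size and available entries */
    CHGIPLA HDWDIAG(*MIN)    /* minimum hardware diagnostics on the next IPL */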
Chapter 17. Integrated BladeCenter and System x Performance
This chapter provides a performance overview and recommendations for the Integrated xSeries Server4, the Integrated xSeries Adapter and the iSCSI host bus adapter. In addition, the chapter presents some performance characteristics and impacts of these solutions on System i™.
17.1 Introduction
The Internet SCSI Host Bus Adapter (iSCSI HBA), the Integrated xSeries® Server for iSeries™ (IXS),
and the Integrated xSeries Adapter (IXA) extend the utility of the System i solution by integrating x86
and AMD based servers with the System i platform. Selected models of Intel based servers may run
Windows® 2000 Server editions, Windows Server 2003 editions, Red Hat® Enterprise Linux®, or
SUSE® LINUX Enterprise Server. In addition, the iSCSI HBA allows System i models to integrate and
control IBM System x and IBM BladeCenter® model servers.
For more information about supported models, operating systems, and options, please see the “System i
integration with BladeCenter and System x” web page referenced at the end of this chapter. Also, see the
iSeries Information Center content titled “Integrated operating environments - Windows environment on
iSeries” for iSCSI and IXS/IXA concepts and operation details.
In the text following, the System i platform is often referred to as the “host” server. The IXS, IXA
attached server, or the iSCSI HBA attached server is referred to as the “guest” server.
V5R4 iSCSI Host Bus Adapter (iSCSI HBA)
The iSCSI host bus adapters (hardware type #573B Copper and #573C Fiber-optic) have been introduced
in V5R4. The iSCSI HBA supports 1Gbit Ethernet network connections, and provides i5/OS system
management and disk consolidation for guest System x and BladeCenter platforms.
The iSCSI HBA solution provides an extensive scalability range - from connecting up to 8 guest servers
through one iSCSI HBA for a lower cost connectivity, to allowing up to 4 iSCSI HBAs per individual
guest server for scalable bandwidth. For information about the numbers of supported adapters please see
the “System i integration with BladeCenter and System x” web page.
With the iSCSI HBA solution, no disks are installed on the guest servers. The host i5/OS server provides
disks, storage consolidation, guest server management, along with the tape, optical, and virtual ethernet
devices. Currently, only Windows 2003 Server editions, with SP1 or release 2, are supported with the
iSCSI solution.
The Integrated xSeries Adapter (IXA)
The Integrated xSeries Adapter (IXA - hardware type #2689-001 or #2689-002) is a PCI-based interface
card that installs inside selected models of System x, providing a High Speed Link (HSL) connection to a
host i5/OS system. The guest server provides the processors, memory, and Server Proven adapters, but no
disks5. IXA attached SMP servers support larger workloads, more users and greater flexibility to attach
devices than the IXS uni-processor models.
4 The IBM System i and IBM System x product family names have replaced the IBM eServer iSeries and xSeries product family names. However, the IXS and IXA adapters retain the iSeries and xSeries brand labels.
5 With the IXA, all the disk drives used by the guest server are under the control of the host server. There are no disk drives in the guest server.
IBM i 6.1 Performance Capabilities Reference - January/April/October 2008
© Copyright IBM Corp. 2008
Chapter 17 - Integrated BladeCenter and System x Performance
280
Download from Www.Somanuals.com. All Manuals Search And Download.
Integrated xSeries Servers (IXS)
An Integrated xSeries Server is an Intel processor-based server on a PCI-based interface card that plugs
into a host system. This card provides the processor, memory, USB interfaces, and in some cases, a
built-in gigabit Ethernet adapter. There are several hardware versions of the IXS:
• The 2.0 GHz Pentium® M IXS (hardware type #4812-001)6.
• The 2.0 GHz PCI IXS (hardware type #2892-002).
Older versions of the IXS card are: the 1.6 GHz PCI IXS (hardware type #2892-001), 1 GHz (type #2890-003), 850 MHz (type #2890-002), and 700 MHz (type #2890-001).
17.2 Effects of Windows and Linux loads on the host system
The impact of IXS and IXA device I/O operations on the host system is similar for all versions of the IXA and IXS cards. The IXA and IXS cards and drivers channel the host-directed device I/O through an input/output processor (IOP) on the card.
The iSCSI HBA is an IOP-less input/output adapter. Its internal operation and performance characteristics differ from the IXS/IXA solutions, with slightly higher host CPU and memory requirements. However, the iSCSI solution offers greater scalability and overall improved performance compared to the IXS/IXA. The iSCSI solution adds support for IBM BladeCenter and additional System x models.
Depending on the Windows or Linux application activity, integrated guest server I/O operations impose
an indirect load on the System i native CPU, memory and storage subsystems. The rest of this chapter
describes some of the performance and memory resource impacts.
17.2.1 IXS/IXA Disk I/O Operations:
The integrated xSeries servers use i5/OS network server storage spaces for their hard drives, which are allocated out of i5/OS storage.
For the IXS/IXA server - IBM supplied virtual SCSI drivers render these storage spaces as physical disks
in Windows or Linux. The device drivers cooperate with i5/OS to perform the disk operations, and
additional host CPU resource is used during disk access, along with a fixed amount of storage. The
amount of CPU impact is a function of the disk I/O rate and disk operation size.
The disk linkage type and the Windows driver write cache property alter the CPU cost and average write latency.
• Fixed and Dynamically Linked Disks
Fixed (statically) and dynamically linked disk drives have the same performance characteristics. One advantage of dynamically linked drives is that in most cases, they may be linked and unlinked while the server is active.
• Shared Disks
System i Windows integration supports the Microsoft Clustering Service, which supports "shared" disks. When a storage space is linked as a "shared" disk, all write operations require extended communications to ensure data integrity, which slightly increases the host CPU cost and response time.
6
Requires a separate IOP card, which is included with features 4811, 4812 and 4813.
IBM i 6.1 Performance Capabilities Reference - January/April/October 2008
© Copyright IBM Corp. 2008 Chapter 17 - Integrated BladeCenter and System x Performance
281
Download from Www.Somanuals.com. All Manuals Search And Download.
• Write Cache Property
When the disk device write cache property is disabled, disk operations have similar performance characteristics to shared disks. You may examine or change the "Write Cache" property on Windows by selecting disk "Properties" and then the "Hardware" tab. Then view "Properties" for a selected disk and view the "Disk Properties" or "Device Options" tab.
All dynamically and statically linked storage spaces have "Write Cache" enabled by default. Shared links have "Write Cache" disabled by default. While it is possible to enable "Write Cache" on shared disks, we recommend keeping it disabled to ensure integrity during clustering fail-over operations. There is also negligible performance benefit to enabling the write cache on shared disks.
• Extended Write Operations
Even though a Windows disk driver may have write cache enabled, the system or applications consider some write operations sensitive enough to request extended writes or flush operations that "write through" to the disk. These operations incur the higher CPW cost regardless of the write caching property.
• For the IXS and IXA solutions, do not enable the disk driver "Enable advanced performance" property provided in Windows 2003. When enabled, all extended writes are turned into normal cached operations and flush operations are masked. This option is only intended to be used when the integrity of the write operations can be guaranteed via write-through or battery-backed memory. The IXS/IXA with write caching enabled cannot make this guarantee.
IXS/IXA Disk Capacity Considerations
The level of disk I/O achieved on the IXS or IXA varies depending on many variables, but given an
adequate storage subsystem, the upper cap on I/O for a single server is limited by the IXS/IXA IOP
component. Except in extreme test loads, it’s unlikely the IOP will saturate due to disk activity.
When multiple IXS/IXA servers are attached under the same System i partition, the partition software
imposes a cap on the aggregate total I/O from all the servers. It is not a strict limitation, but a typical
capacity level is approximately 6000 to 10000 disk operations/sec.
17.2.2 iSCSI Disk I/O Operations:
• The iSCSI disk operations use a more scalable storage I/O access architecture than the IXS and IXA solutions. As a result, a single integrated server can scale to greater capacity by using multiple target and initiator iSCSI HBAs to allow multiple data paths.
• In addition, there is no inherent partition cap on iSCSI disk I/O. The entire performance capacity of installed disks and disk IOAs is available to iSCSI attached servers.
• The Windows disk drive "write cache" policy does not directly affect iSCSI operations. Write operations always "write through" to the host disk IOAs, which may or may not cache in battery-backed memory (depending on the capabilities and configuration of the disk IOA).
• iSCSI attached servers use non-reserved System i virtual storage in order to perform disk input or output. Thus, disk operations use host memory as an intermediate read cache. Write operations are flushed to disk immediately, but the disk data remains in memory and can be read on subsequent operations to the same sectors.
While the disk operations page through a memory pool, the paging activity is not visible in the "Non-DB" pages counters displayable via the WRKSYSSTS command. This doesn't mean the memory is not actively used; it's just difficult to visualize how much memory is active. WRKSYSSTS will show faults and paging activity if the memory pool becomes constrained, but some write operations also result in faulting activity.
IBM i 6.1 Performance Capabilities Reference - January/April/October 2008
© Copyright IBM Corp. 2008 Chapter 17 - Integrated BladeCenter and System x Performance
282
Download from Www.Somanuals.com. All Manuals Search And Download.
• With iSCSI, there are some Windows-side disk configuration rules you must take into account to enable efficient disk operations. Windows disks should be configured with:
  - 1 disk partition per virtual drive.
  - A file system formatted with cluster sizes of 4 KByte or 4 KByte multiples.
  - 2 GByte or larger storage spaces (for which Windows creates a default NTFS cluster size of 4 KBytes).
If necessary, you can use care to configure multiple disk partitions on a single virtual drive:
  - For storage spaces that are 1024 MB or less, make the partitions a multiple of 1 MB (1,048,576 bytes).
  - For storage spaces that are 511,000 MB or less, the partition should be a multiple of 63 MB (66,060,288 bytes).
  - For storage spaces that are greater than 511,000 MB, the partition should be a multiple of 252 MB (264,241,152 bytes).
These guidelines allow file system structures to align efficiently between iSCSI Windows/Linux and i5/OS7. They allow i5/OS to efficiently manage the storage space memory, mitigate disk operation faulting activity, and thus improve overall iSCSI disk I/O performance. Failure to follow these guidelines will cause iSCSI disk write operations to incur performance penalties, including page faults and increased serialization of disk operations.
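As a worked example (the size is hypothetical): a 200,000 MB storage space falls in the "511,000 MB or less" range, so its partition should be a multiple of 63 MB. 200,000 is not (200,000 / 63 is about 3174.6), but 3174 x 63 = 199,962 MB is, so a 199,962 MB partition would keep the file system aligned with i5/OS storage management.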
• In V5R4, the CHGNWSSTG command and iSeries Navigator support the expansion of a storage space size. After the expansion, the file system in the disk should also be expanded, but take care on iSCSI disks: don't create a new partition in the expanded disk free space (unless the new partition meets the size guidelines above).
The Windows 2003 "DISKPART" command can be used to perform the file system expansion; however, it only actually expands the file system on "basic" disks. If a disk has been converted to a "dynamic" disk8, the DISKPART command creates a new partition and configures a spanned set across the partitions. With iSCSI, the second partition may experience degraded disk performance.
17.2.3 iSCSI virtual I/O private memory pool
Applications sharing the same memory pool with iSCSI disk operations may be adversely impacted if the iSCSI network servers perform levels of disk I/O that can flush the memory pool. Thus, it is possible for other applications to begin to page fault because their memory has been flushed out to disk by the iSCSI operations. By default, the iSCSI virtual disk I/O operations occur through the *BASE memory pool.
In order to segregate iSCSI disk activity, V5R4 PTF SI23027 has been created to enable iSCSI virtual
disk I/O operations to run out of an allocated private memory pool. The pool is enabled by creating a
subsystem description named QGPL/QFPHIS and allocating a private memory pool of at least 4096
kilobytes. The amount of memory you want to allocate will depend on a number of factors, including
number of iSCSI network servers, expected sustained disk activity for all servers, etc. See the “System i
Memory Rules of Thumb” section below for more guidelines on the “iSCSI private pool” minimum size.
To activate the private memory pool for all iSCSI network servers, perform the following:
1. CRTSBSD SBSD(QGPL/QFPHIS) POOLS((1 10000 1))9
7 These guidelines could also slightly improve IXS and IXA attached servers' disk performance, but to a much smaller degree.
8 Not to be confused with "dynamically linked" storage spaces.
9 Pick a larger or smaller pool size as appropriate. 10,000 KByte is a reasonable minimum value, but 4096 KByte is the absolute minimum supported.
2. Vary on any Network Server Description (NWSD) with a network server connection type of *ISCSI.
During iSCSI network server vary-on processing, the QFPHIS subsystem is started automatically if necessary. The subsystem will activate the private memory pool. iSCSI network server descriptions that are varied on will then utilize the first private memory pool configured with at least the minimum (4 MB) size for virtual disk I/O operations.
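Taken together, a minimal CL sketch of both steps (the NWSD name ISCSISVR is illustrative only):

    CRTSBSD SBSD(QGPL/QFPHIS) POOLS((1 10000 1))       /* private pool of     */
                                                       /* 10,000 KB, activity */
                                                       /* level 1             */
    VRYCFG CFGOBJ(ISCSISVR) CFGTYPE(*NWS) STATUS(*ON)  /* vary on the iSCSI   */
                                                       /* NWSD; QFPHIS starts */
                                                       /* automatically       */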
The private memory pool is used by the server as long as the subsystem remains active. If the QFPHIS subsystem is ended prematurely (while an iSCSI network server is active), the server will continue to function properly, but future virtual disk I/O operations will revert to the *BASE memory pool until the private memory pool is once again allocated.
NOTE: When ending the QFPHIS subsystem, i5/OS can reallocate the memory pool, possibly assigning the same identifier to another subsystem. Any active iSCSI network servers that are varied on and using the memory pool at the time the subsystem is ended may adversely impact other applications, either when the memory pool reverts to *BASE or when the memory pool identifier is reassigned to another subsystem. To prevent unexpected impacts, do not end the QFPHIS subsystem while iSCSI servers are active.
17.2.4 Virtual Ethernet Connections:
The virtual Ethernet connections utilize System i Licensed Internal Code tasks during operation. When a virtual Ethernet port is used to communicate between integrated servers, or between servers across i5/OS partitions, the host server CPU is used during the transfer. The amount of CPU used is primarily a function of the number of transactions and their size.
There are three forms of virtual Ethernet connections used with the IXS/IXA and iSCSI attached servers:
• The "point to point virtual Ethernet" is primarily used for the controlling partition to communicate with the integrated server. This network is called point to point because it has only two endpoints, the integrated server and the i5/OS platform. It is emulated within the host platform, and no additional physical network adapters or cables are used. On host models, it is configured as an Ethernet line description with a port number value of *VRTETHPTP.
• A "port-based"10 virtual Ethernet connection allows IXS, IXA or iSCSI attached servers to communicate together over a virtual Ethernet (typically used for clustered IXS configurations), or to join an inter-LPAR virtual Ethernet available on non-POWER5 based systems. This type of virtual Ethernet uses "network numbers", and integrated servers can participate by configuring a port number value in the range *VRTETH0 through *VRTETH9. "Port-based" virtual Ethernet communications also require the host CPU to switch the communications data between guest servers.
• A "VLAN-based" (noted as Phyp in charts) virtual Ethernet connection allows IXS, IXA and iSCSI attached servers to participate in inter-LPAR virtual Ethernets. Each participating integrated server needs an Ethernet line description that associates a port value such as *VRTETH0 with a virtual adapter having a virtual LAN ID. You create the virtual adapter via the Hardware Management Console (HMC). VLAN-based communications also use the System i CPU to switch the communications data between servers.
17.2.5 IXS/IXA IOP Resource:
10 "Port-based" refers to the original method of supporting VE introduced in V5R2 for models earlier than System i5. It is still available for integrated servers to communicate within a single partition on System i models.
IXS and IXA I/O (disk, tape, optical and virtual Ethernet) communications occur through the individual IXS and IXA IOP resource. This IOP imposes a finite capacity. The IOP processor utilization may be examined via the iSeries Collection Services utilities.
The performance results presented in the rest of this chapter are based on measurements and
projections using standard IBM benchmarks in a controlled environment. The actual throughput
or performance that any user will experience will vary depending upon considerations such as the
amount of multiprogramming in the user's job stream, the I/O configuration, the storage
configuration, and the workload processed. Therefore, no assurance can be given that an
individual user will achieve throughput or performance improvements equivalent to the ratios
stated here.
17.3 System i memory rules of thumb for IXS/IXA and iSCSI attached servers.
The i5/OS machine pool memory "rule of thumb" is generally to size the machine pool with at least twice the active machine pool reserved size. Automatic performance adjustments may alter this according to the active load characteristics. But there are base memory requirements needed to support the hardware and set of adapters used by the i5/OS partition. You can refer to the System i sales manual or the Workload Estimator for estimates of these base requirements. The "rules of thumb" below estimate the additional memory required to support iSCSI and IXS/IXA.
17.3.1 IXS and IXA attached servers:
I/O occurs through fixed memory in the machine pool. The IXS and IXA attached servers require
approximately an additional 4MBytes of memory per server in the machine pool.
17.3.2 iSCSI attached servers:
IBM Director Server is required in each i5/OS partition running iSCSI HBA targets. IBM Director
requires a minimum of 500 Mbytes in the base pool.
The specific memory requirements of iSCSI servers vary based on many configuration choices, including
the number of LUNs, number of iSCSI target HBAs, number of NWSDs, etc. A suggested minimum
memory “rule of thumb”11 is:
11 Based on a rough configuration of 5 LUNs per server, 2 VE connections per server, and two target HBA connections per server.
                          For Each Target HBA    For Each NWSD
Machine Pool:             21 MBytes              1 MByte
Base Pool:                1 MByte                1 MByte
QFPHIS Private Pool:12    0.5 MByte              0.5 MByte
Total:                    22.5 MBytes            2.5 MBytes
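For illustration, a hypothetical partition hosting two target HBAs and four NWSDs would need roughly 2 x 22.5 + 4 x 2.5 = 55 MBytes by this rule (2 x 21 + 4 x 1 = 46 MBytes of it in the machine pool), in addition to the 500 MBytes of base pool for IBM Director. The QFPHIS share works out to 2 x 0.5 + 4 x 0.5 = 3 MBytes, which is below the 4 MByte minimum, so the private pool would still be sized at no less than 4096 KBytes.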
Warning: To ensure expected performance and continuing machine operation, it is critical to
allocate sufficient memory to support all of the devices that are varied on. Inadequate memory
pools can cause unexpected machine operation.
17.4 Disk I/O CPU Cost
Disk Operation Rules of Thumb                                       CPWs13 per 1k ops/sec
iSCSI linked disks                                                  190
IXS/IXA static or dynamically linked disks with write caching
enabled                                                             130
IXS/IXA shared or quorum linked disks, or write caching disabled    155
While the disk I/O activity driven by the IXS/IXA or iSCSI is not strictly a "CPW" type load, the CPW estimate is still a useful metric to estimate the amount of i5/OS CPU required for a load. You can use the values above to estimate the CPW requirements if you know the expected I/O rate. For example, if you expect the Windows application server to generate 800 disk ops/sec on a dynamically or statically linked storage space, you can estimate the CPW usage as:
130 CPWs/1k ops * 800 ops * 1k ops/1000 ops = 130 * 800/1000 = 104 CPWs
out of the host processor CPW capacity. While it is always better to project the performance of an application from measurements based on that same application, it is not always possible. This calculation technique gives a relative estimate of performance.
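By the same rule, a hypothetical iSCSI attached server expected to generate 2,000 disk ops/sec would be estimated at 190 * 2000/1000 = 380 CPWs of the host processor capacity.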
These rules of thumb are estimated from the results of performing file serving or application types of
loads. In more detail, the chart below indicates an approximate amount of host processor (in CPW)
required to perform a constant number of disk operations (1000) of various sizes. You can reasonably
adjust this estimate linearly for your expected I/O level.
12 The private pool assigned to QFPHIS must still be a 4 MB minimum size.
13 A CPW is the "Relative System Performance Metric" from Appendix C. Note that the I/O CPU capacities may not scale exactly by rated system CPW, as the disk I/O doesn't represent a CPW type of load. This calculation is a convenient metric to size the load impacts. The measured CPW cost will actually decrease from the above values as the number of processors in the NWSD hosting partition increases, and may be higher than estimated when partial processors are used.
Figure: CPW per 1k disk operations (scale 0 to 600) by operation type and size, comparing iSCSI, IXS/IXA with caching disabled or shared, and IXS/IXA with caching enabled.
The chart shows the relative cost when performing 5 different types of operations14:
• Random write operations of a uniform size (512, 1k, ..., 64k).
• Random read operations of a uniform size (512, 1k, ..., 64k).
• A 35% random write, 65% random read mix of operations with a uniform size (termed a transaction processing type load).
• A fileserving type load, which consists of a mix of operations of various sizes similar in ratio to typical fileserving loads. This load is 80% random reads.
• An application server - database type load. This is also a mix to simulate the character of application and database type accesses - mostly random, with about 40% reads.
17.4.1 Further notes about IXS/IXA Disk Operations
• The maximum disk operation size supported by the IXS or IXA is 32k. Thus, any Windows disk operations greater than 32k will result in the Windows operating system splitting the operation into 2 or more sequential operations.
• The IXS/IXA cost calculation is slightly greater on a POWER5® system (System i5) than on earlier 8xx systems. This includes some increased overhead costs in V5R4, and the new processor types. Use the newer rules of thumb listed above for CPW calculations.
• It does not matter whether a storage space is linked statically or dynamically; the performance characteristics are identical.
• It does not matter whether a server is an IXS or an IXA attached System x server; the disk performance is almost identical.
14 Measured on a System i Model 570 - 2-way 26F2 processor (7495 capacity card), rated at 6350 CPWs, V5R4 release of i5/OS, 40 parity protected (RAID5) 4326 disks, 3 2780 disk controllers. The IXA attached server was an xSeries 365 (4-way 2.5 GHz Xeon) with IXA and Windows Server 2003 with SP1. The iSCSI servers were HS20 BladeCenter servers with a copper iSCSI (p/n 26K6489) daughter card. Switches were Nortel L2/3 Ethernet (p/n 26K6524) and Cisco Intelligent Gigabit Switch (p/n ).
• A storage space which is linked as shared, or a disk with caching disabled, requires more CPU to process write operations (approximately 45% more).
• Sequential operations cost approximately 10% less than the random I/O results shown above.
• Even though a Windows disk driver may have write cache enabled, some Windows applications may request to bypass the cache for some operations (extended writes), and these operations would incur the higher CPW cost.
• If your application load is skewed toward smaller or larger average disk operation sizes, you may encounter a smaller or larger CPW cost than indicated by the "rules of thumb".
17.5 Disk I/O Throughput
The chart below compares the throughput performance characteristics of the current IXA product against the new iSCSI solution. It indicates the approximate capacity of a single target HBA when running various sizes and types of random operations.
The chart also demonstrates that the new page-based architecture utilized in the iSCSI solution provides better overall performance in write and read scenarios. While the sector-based architecture of the current IXA provides slightly better performance in small block write operations, that difference is quickly reversed when block sizes reach 4k. Block sizes of 4k and higher are representative of most I/O server scenarios.
As with all performance analysis, the actual values that you will achieve are dependent on a number of variables including workload, network traffic, etc.
Figure: iSCSI Target, IXA Capacity Comparison (MB per second, scale 0 to 160), comparing the IXA, iSCSI with 1 target and 1 initiator (standard frames), iSCSI with 1 target and 4 initiators (standard frames), and iSCSI with 1 target and 4 initiators (jumbo frames).
IBM i 6.1 Performance Capabilities Reference - January/April/October 2008
© Copyright IBM Corp. 2008
Chapter 17 - Integrated BladeCenter and System x Performance
288
Download from Www.Somanuals.com. All Manuals Search And Download.
The blue square line shows an iSCSI connection with a single target iSCSI HBA to a single initiator iSCSI HBA, configured to run with standard frames. The pink circle line is a single target iSCSI HBA to multiple servers and initiators, also running with standard frames. With the initiators and switches configured to use 9k jumbo frames, a 15% to 20% increase in upper capacity is demonstrated.15
17.6 Virtual Ethernet CPU Cost and Capacities
If the virtual Ethernet connections are used for any significant LAN traffic, you need to account for additional System i CPU requirements. There is no single rule of thumb applicable to network traffic, as there are a great number of variables involved.
The charts below demonstrate approximate capacity for single TCP/IP connections, and illustrate the minimum CPW impacts for some network transaction sizes (send/receive operations) and types, gathered with the Netperf16 exerciser. The CPW chart below gives CPWs per Mbit/sec for increasing transaction sizes. When the transaction size is small, the CPW requirements are greater per transaction. When the arrival rate is high enough, some consolidation of operations within the process stream can occur and increase the efficiency of operations.
Several charts are presented comparing the virtual Ethernet capacity and costs between an iSCSI server running jumbo frames and standard frames, and an IXS or IXA server. In addition, a comparison of costs while using external NICs is added, to place the measurements in context. "Point-to-Point" refers to the cost between an iSCSI, IXS or IXA attached server and a host system across the point to point connection. "Port Based VE" refers to a port-based connection between two guest servers in the same partition. "VLAN based VE" refers to a virtual LAN based connection between two guest servers in the same partition, but using the VLAN-to-port associated virtual adapters. In the latter two cases, the total CPW cost would be split across partitions if the communication were to occur between guest servers hosted by different partitions17.
17.6.1 VE Capacity Comparisons
In general, VE has less capacity than an external Gigabit NIC. Greater capacity with VE is possible using 9k jumbo frames than with standard 1.5k frames. Also, the iSCSI connection has a greater capacity than an IXS or IXA attached VE connection. "Stream" means that the data is pushed in one direction, with only the TCP acknowledge packets running in the other direction.
15 In addition, jumbo frame configuration has no effect on the CPW cost of iSCSI disk operations.
16 Note that the Netperf benchmark consists of C programs which use a socket connection to read and write data between buffers. The CPW results above don't attempt to factor out the minimal application CPU cost. That is, the CPW results above include the primitive Netperf application, socket, TCP, and Ethernet operation costs. A real user application will only have this type of processing as a percentage of the overall workload.
17 Netperf TCP_STREAM measured on a System i Model 570 - 2-way 26F2 processor (7495 capacity card), rated at 6350 CPWs, V5R4 release of i5/OS. The IXA attached server was an xSeries 365 (4-way 2.5 GHz Xeon) with IXA and Windows Server 2003 with SP1. The iSCSI servers were HS20 BladeCenter 3.2 GHz uniprocessor servers with a copper iSCSI (p/n 26K6489) daughter card. Switches were Nortel L2/3 Ethernet (p/n 26K6524). This is only a rough indicator for capacity planning; actual results may differ for other hardware configurations.
than an IXS or IXA attached VE connection. “Stream” means that the data is pushed in one direction,
with only the TCP acknowledgment packets running in the other direction.
[Charts: VE capacity comparisons by transaction size (16 to 16384 bytes), on a 0-1000 scale (Mbit/sec), for “PP TCP Stream Windows to i5/OS,” “PP TCP Stream i5/OS to Windows,” “VEPhyp Stream TCP,” and “VE Internal (port based).” Each chart compares iSCSI jumbo, iSCSI standard, IXS/IXA, and external NICs.]
17.6.2 VE CPW Cost
CPW cost below is listed as CPW per Mbit/sec. For the point-to-point connection, the results differ
depending on the direction of transfer. For connections between guest servers, the direction of transfer
does not affect the results.
[Charts: VE CPW cost, in CPW per Mbit/sec, by transaction size (bytes), for “PP TCP Stream i5/OS to Windows,” “PP TCP Stream Windows to i5/OS,” “VE Internal (port based),” and “VEPhyp Stream TCP.” Each chart compares iSCSI jumbo, iSCSI standard, and IXS/IXA; the CPW per Mbit/sec scale differs for each chart.]
The charts above show the CPW efficiency of operations (larger is better). Note the CPW per Mbit/sec
scale on the left, as it is different for each chart.
For an IXS or IXA, the port-based VE has the least CPW cost for smaller packets due to consolidation of
transfers available in Licensed Internal Code. The VLAN-based transfers have the greatest cost (however,
the total would be split during inter-LPAR communications).
For iSCSI, the cost of using standard frames is 1.5 to 4.5 times higher than jumbo frames.
17.6.3 Windows CPU Cost
The next chart illustrates the cost of iSCSI port-based and VLAN-based virtual Ethernet operations on a
Windows CPU. In this case, the CPUs used are 3.2GHz Xeon uniprocessors in HS20 BladeCenter
servers. This cost is compared to operations across the external Gigabit NIC connections. Again, the
jumbo frames operations are less expensive than standard frames, though the external NIC is, in general,
twice as efficient.
[Chart: Streaming TCP Total Windows CPU Utilization (0.00-1.00) by transaction size (bytes), comparing External - Windows to Windows (std), External - Windows to Windows (jumbo), Port Based VE - Windows to Windows (std), Port Based VE - Windows to Windows (jumbo), VLAN Based VE - Windows to Windows (std), and VLAN Based VE - Windows to Windows (jumbo).]
17.7 File Level Backup Performance
The Integrated Server support allows you to save integrated server data (files, directories, shares, and the
Windows registry) to tape, optical or disk (*SAVF) in conjunction with your other i5/OS data. That is,
this “file level backup” approach saves or restores the Windows files on an individual basis within the
stream of other i5/OS data objects. This approach is not recommended as a primary backup procedure.
Rather, you should still periodically save your NWS storage spaces and the NWSD associated with your
Windows server or Linux server for disaster recovery.
Saving individual files is not as fast as saving the storage spaces themselves. The save of a storage space
on an equivalent machine and tape is about 210 Gbytes per hour, compared to the approximately 70
Gbytes per hour achieved using iSCSI below.
The chart below compares some SAV and RST rates for iSCSI and an IXA attached server. These results
were measured on a System i Model 570 2-way 26F2 processor (7495 capacity card). The i5/OS release
was V5R4. The IXA attached comparison server was an xSeries x365 (4-way 2.5GHz Xeon) with IXA and
Windows Server 2003 with SP1. The iSCSI servers were HS20 BladeCenter servers with a copper
iSCSI (p/n 26K6489) daughter card. Switches were Nortel L2/3 Ethernet (p/n 26K6524). The target tape
drive was a model 5755-001 (Ultrium LTO 2). All tests were run with jumbo frames enabled.
The legend label “Mixed Files” indicates a save of many files of mixed sizes - equivalent to the save of
the Windows system file disk. “Large files” indicates a save of many large files - in this case many
100MB files.
[Chart: FLBU SAV / RST Rates, on a 0-90 scale (Gbytes per hour), for SAV to disk, SAV to tape, RST from disk, and RST from tape, comparing iSCSI Mixed, iSCSI Large, IXA Mixed, and IXA Large.]
17.8 Summary
The iSCSI host bus adapter, Integrated xSeries Server, and Integrated xSeries Adapter provide scalable
integration for full file, print and application servers running Windows 2000 Server, Windows Server
2003, or Intel Linux editions. They provide flexible consolidation of System i solutions and
Windows or Linux services, in combination with improved hardware control, availability, and reduced
maintenance costs. These solutions perform well as a file or application server for popular applications,
using the System i host disk, tape and optical resources. The iSCSI HBA addition in V5R4 increases
Integrated server configuration flexibility and performance scalability. As part of the preparation for
integrated server installations, care should be taken to estimate the expected workload of the Windows or
Linux server applications and to reserve sufficient i5/OS resources for the integrated servers.
17.9 Additional Sources of Information
System i integration with BladeCenter and System x URL:
Redbook: “Microsoft Windows Server 2003 Integration with iSeries”, SG246959 at
Redbook: “Tuning IBM eServer xSeries Servers for Performance”, SG24-5287
While this document doesn’t address the integrated server configurations specifically, it is an
excellent resource for understanding and addressing performance issues with Windows or Linux.
Online documentation: “Integrated operating environments on iSeries”
Choose V5R4. In the “Contents” panel choose “iSeries Information Center”.
Expand “Integrated operating environments” and then “Windows environment on iSeries” for
Windows environment information, or “Linux” and then “Linux on an integrated xSeries solution”
for Linux information on an IXS or attached xSeries server.
Microsoft Hardware Compatibility Test URL: See
search on IBM for product types Storage/SCSI Controller and System/Server Uniprocessor.
Chapter 18. Logical Partitioning (LPAR)
18.1 Introduction
Logical partitioning (LPAR) is a mode of machine operation where multiple copies of operating systems
run on a single physical machine.
A logical partition is a collection of machine resources that are capable of running an operating system.
The resources include processors (and associated caches), main storage, and I/O devices. Partitions
operate independently and are logically isolated from other partitions. Communication between partitions
is achieved through I/O operations.
The primary partition provides functions on which all other partitions are dependent. Any partition that
is not a primary partition is a secondary partition. A secondary partition can perform an IPL, can be
powered off, can dump main storage, and can have PTFs applied independently of the other partitions on
the physical machine. The primary partition may affect the secondary partitions when activities occur that
cause the primary partition’s operation to end. An example is when the PWRDWNSYS command is run
on a primary partition. Without the primary partition’s continued operation, all secondary partitions are
ended.
V5R3 Information
Please refer to the whitepaper ‘i5/OS LPAR Performance on POWER4 and POWER5 Systems’ for the
latest information on LPAR performance. It is located at the following website:
V5R2 Additions
In V5R2, some significant items may affect one's LPAR strategy (see "General Tips"):
• "Zero" interactive partitions. You do not have to allocate a minimum amount of interactive
  performance to every partition when V5R2 OS is in the Primary partition.
In V5R2, the customer no longer has to assign a minimum interactive percentage to LPAR partitions (can
be 0). For partitions with no assigned interactive capability, LPAR system code will allow interactive as
follows: 0.1% x (processors in partition/total processors) x processor CPW.
In V5R1, the customer had to allocate a minimum interactive percentage to LPAR partitions as follows:
1.5% x (processors in partition/total processors) x processor CPW. It is expected that a PTF will be
issued to change the percentage from 1.5% to 0.5% for V5R1 systems.
Notes:
1. The above formulas yield the minimum ICPW for an LPAR region. The customer still has to divide this value by
the total ICPW to get the percentage value to specify for the LPAR partition.
2. If there is not enough interactive CPW available for the partition given the previous formula ... the interactive
percentage can be set to the percentage of the (processors in partition/total processors).
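As a worked example of the V5R2 formula, with hypothetical numbers: for a 4-processor partition on a
12-way system whose processor CPW is 4700, the minimum interactive allowance would be
0.1% x (4/12) x 4700, or roughly 1.6 ICPW. Per note 1 above, dividing by a hypothetical total
interactive CPW of 120 gives approximately 1.3% as the percentage value to specify for the partition.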
General Tips
• Allocate fractional CPUs wisely. If your sizing indicates two partitions need 0.7 and 0.4 CPUs, see if
  there will be enough remaining capacity in one of the partitions with 0.6 and 0.4 or else 0.7 and 0.3
  CPUs allocated. By adding fractional CPUs up to a "whole" processor, fewer physical processors
  will be used. Design implies that some performance will be gained.
• Avoid shared processors on large partitions if possible. Since there is a penalty for having shared
  processors (see later discussion), decide if this is really needed. On a 32-way machine, a whole
  processor is only about 3 percent of the configuration. On a 24-way, this is about 4 percent. Though
  we haven't measured this, the general penalty for invoking shared processors (often, five percent)
  means that rounding up to whole processors may actually gain performance on large machines.
V5R1 Additions
In V5R1, LPAR provides additional support that includes: dynamic movement of resources without a
system or partition reset, processor sharing, and creating a partition using Operations Navigator. For more
information on these enhancements, click on System Management at URL:
With processor sharing, processors no longer have to be dedicated to logical partitions. Instead, a shared
processor pool can be defined which will facilitate sharing whole or partial processors among partitions.
There is an additional system overhead of approximately 5% (CPU processing) to use processor sharing.
• Uniprocessor Shared Processors. You can now LPAR a single processor and allocate as little as 0.1
  CPUs to a partition. This may be particularly useful for Linux (see the Linux chapter).
18.2 Considerations
This section provides some guidelines to be used when sizing partitions versus stand-alone systems. The
actual results measured on a partitioned system will vary greatly with the workloads used, relative sizes,
and how each partition is utilized. For information about CPW values, refer to Appendix D, “CPW, CIW
and MCU Values for iSeries”.
When comparing the performance of a standalone system against a single logical partition with similar
machine resources, do not expect them to have identical performance values as there is LPAR overhead
incurred in managing each partition. For example, consider the measurements we ran on a 4-way system
using the standard AS/400 Commercial Processing Workload (CPW) as shown in the chart below.
For the standalone 4-way system, we measured a CPW value of 1950. We then partitioned the
standalone 4-way system into two 2-way partitions. When we added up the partitioned 2-way values as
shown below, we got a total CPW value of 2044. This is a 5% increase over our measured standalone
4-way CPW value of 1950, i.e., (2044 - 1950)/1950 = 5%. The reason for this increased capacity can be
attributed primarily to a reduction in the contention for operating system resources that exist on the
standalone 4-way system.
Separately, when you compare the CPW value of a standalone 2-way system to one of the partitions (i.e.,
one of the two 2-ways), you can get a feel for the LPAR overhead cost. Our test measurement showed a
capacity degradation of 3%. That is, two standalone 2-ways have a combined CPW value of 2100. The
total CPW value of the two 2-ways running on a partitioned 4-way, as shown above, is 2044, a 3%
reduction: (2100 - 2044)/2044 = 3%.
The LPAR overhead can be attributed to contention for the shared memory bus on a
partitioned system, to the aggregate bandwidth of the standalone systems being greater than the
bandwidth of the partitioned system, and to a lower number of system resources configured for a system
partition than on a standalone system. For example, on a standalone 2-way system the main memory
available may be X, while on a partitioned system the amount of main storage available for the 2-way
partition is X-2.
[Chart: LPAR Performance Considerations — CPW values (0-2500 scale): standalone 4-way (S/Alone 4-W) = 1950; LPAR 2/2-W partitions = 1019 + 1025 (a 5% increase); two standalone 2-ways (2 x 2-W) = 1050 + 1050 (a 3% decrease for the partitioned system).]
Figure 18.1. LPAR Performance Measured Against Standalone Systems
In summary, the measurements on the 4-way system indicate that when a workload can be logically split
between two systems, using LPAR to configure two systems will result in system capacities that are
greater than when the two applications are run on a single system, and somewhat less than splitting the
applications to run on two physically separate systems. The amount of these differences will vary
depending on the size of the system and the nature of the application.
18.3 Performance on a 12-way system
As the machine size increases, we have seen an increase in both the performance of a partitioned system
and in the LPAR overhead on the partitioned system. As shown below, the capacity increase and LPAR
overhead are greater on a 12-way system than what was shown above for a 4-way system.
Also note that part of the performance increase of a larger system may have come about because of a
reduction in contention within the CPW workload itself. That is, the measurement of the standalone
12-way system required a larger number of users to drive the system’s CPU to 70 percent than is
required on a 4-way system. The larger number of users may have increased the CPW workload’s
internal contention. With a lower number of users required to drive the system’s CPU to 70 percent on a
standalone 4-way system, there is less opportunity for the workload’s internal contention to be a factor in
the measurements.
The overall performance of a large system depends greatly on the workload and how well the workload
scales to the large system. The overall performance of a large partitioned system is far more complicated
because the workload of each partition must be considered as well as how each workload scales to the
size of the partition and the resources allocated to the partition in which it is running. While the partitions
in a system do not contend for the same main storage, processor, or I/O resources, they all use the same
main storage bus to access their data. The total contention on the bus affects the performance of each
partition, but the degree of impact to each partition depends on its size and workload.
In order to develop guidelines for partitioned systems, the standard AS/400 Commercial Processing
Workload (CPW) was run in several environments to better understand two things. First, how does the
sum of the capacity of each partition in a system compare to the capacity of that system running as a
single image? This is to show the cost of consolidating systems. Second, how does the capacity of a
partition compare to that of an equivalently sized stand-alone system?
The experiments were run on a 12-way 740 model with sufficient main storage and DASD arms so that
CPU utilization was the key resource. The following data points were collected:
• Stand-alone CPW runs of a 4-way, 6-way, 8-way, and 12-way
• Total CPW capacity of a system partitioned into an 8-way and a 4-way partition
• Total CPW capacity of a system partitioned into two 6-way partitions
• Total CPW capacity of a system partitioned into three 4-way partitions
The total CPW capacity of a partitioned system is greater than the CPW capacity of the stand-alone
12-way, but the percentage increase is inversely proportional to the size of the largest partition. The CPW
workload does not scale linearly with the number of processors. The larger the number of processors, the
closer the contention on the main storage bus approaches the contention level of the stand-alone 12-way
system.
For the partition combinations listed above, the total capacity of the 12-way system increases as shown in
the chart below.
[Chart: LPAR Throughput Increase — total CPW capacity of an LPAR system (4600-5400 scale) by LPAR configuration: 12-way (baseline), 8-way+4-way (+7%), 2 x 6-way (+9%), 3 x 4-way (+13%).]
Figure 18.2. 12-way LPAR Throughput Example
To illustrate the impact that varying the workload in the partitions has on an LPAR system, the CPW
workload was run at an extremely high utilization in the stand-alone 12-way. This high utilization
increased the contention on the main storage bus significantly. This same high utilization CPW
benchmark was then run concurrently in the three 4-way partitions. In this environment, the total
capacity of the partitioned 12-way exceeded that of the stand-alone 12-way by 18% because the total
main storage bus contention of the three 4-way partitions is much less than that of a stand-alone 12-way.
The capacity of a partition of a large system was also compared to the capacity of an equally sized
stand-alone system. If all the partitions except the partition running the CPW are idle or at low
utilization, the capacity of the partition and an equivalent stand-alone system are nearly identical.
However, when all of the partitions of the system are running the CPW, the total contention for the
main storage bus has a measurable effect on each of the partitions.
The impact is greater on the smaller partitions than on the larger partitions because the relative increase of
the main storage bus contention is more significant in the smaller partitions. For example, the 4-way
partition is degraded by 12% when an 8-way partition is also running the CPW, but the 8-way partition is
only degraded by 9%. The two 6-way partitions and three 4-way partitions are all degraded by about 8%
when they run CPW together. The impact to each partition is directly proportional to the size of the
largest partition.
18.4 LPAR Measurements
The following chart shows measurements taken on a partitioned 12-way system with the system’s CPU
utilized at 70 percent capacity. The system was at the V4R4M0 release level.
Note that the standalone 12-way CPW value of 4700 in our measurement is higher than the published
V4R3M0 CPW value of 4550. This is because there was a contention point that existed in the CPW
workload when the workload was run on large systems. This contention point was relieved in V4R4M0
and this allowed the CPW value to be improved and be more representative of a customer workload when
the workload is run on large systems.
Table 18.1. 12-way system measurements

LPAR Configuration   Stand-alone CPW   Total LPAR CPW   CPW Increase   Average LPAR Overhead   Primary CPW   Secondary CPW   Secondary CPW
8-way, 4-way         4700              5020             7%             10%                     3330          1690            n/a
(2) 6-ways           4700              5140             9%             9%                      2605          2535            n/a
(3) 4-ways           4700              5290             13%            9%                      1770          1770            1750
While we saw performance improvements on a 12-way system as shown above, part of those
improvements may have come about because of a reduction in contention within the CPW workload
itself. That is, the measurement of the standalone 12-way system required a larger number of users to
drive the system’s CPU to 70 percent than what is required on a 4-way system. The larger number of
users may have increased the CPW workload’s internal contention.
With a lower number of users required to drive the system’s CPU to 70 percent on a standalone 4-way
system, there is less opportunity for the workload’s internal contention to be a factor in the
measurements.
The following chart shows our 4-way measurements.
Table 18.2. 4-way system measurements

LPAR Configuration   Stand-alone CPW   Total LPAR CPW   CPW Increase   Average LPAR Overhead   Primary CPW   Secondary CPW
(2) 2-ways           1950              2044             5%             3%                      1025          1019
The following chart shows the overhead on n-ways of running a single LPAR partition alone vs. running
with other partitions. The differing values for managing partitions are due to the size of the memory nest
and the number of processors to manage (n-way size).
Table 18.3. LPAR overhead per partition

Processors   Measured   Projected
2            -          1.5 %
4            3.0 %      -
8            -          6.0 %
12           9.0 %      -
The following chart shows projected LPAR capacities for several LPAR configurations. The projections
are based on 1-way and 2-way measurements taken when the system’s CPU was utilized at 70
percent capacity. The LPAR overhead was also factored into the projections. The system was at the
V4R4M0 release level.
Table 18.4. Projected LPAR Capacities

LPAR Configuration   Projected LPAR CPW   Projected CPW Increase Over a Standalone 12-way
(12) 1-ways          5920                 26 %
(6) 2-ways           5700                 21 %
18.5 Summary
On a partitioned system, the capacity increases will range from 5% to 26%. The capacity increase will
depend on the number of processors partitioned and on the number of partitions. In general, the greater
the number of partitions, the greater the capacity increase.
When consolidating systems, a reasonable and safe guideline is that a partition may have about 10% less
capacity than an equivalent stand-alone system if all partitions will be running their peak loads
concurrently. This cross-partition contention is significant enough that the system operator of a
partitioned system should consider staggering peak workloads (such as batch windows) as much as
possible.
Chapter 19. Miscellaneous Performance Information
19.1 Public Benchmarks (TPC-C, SAP, NotesBench, SPECjbb2000, VolanoMark)
iSeries systems have been represented in several public performance benchmarks. The purpose of these
benchmarks is to give an indication of relative strength in a general field of computing. Benchmark
results can give confidence in a system's capabilities, but should not be viewed as a sole criterion for the
purchase or upgrading of a system. We do not include specific benchmark results in this chapter, because
the positioning of these results is constantly changing as other vendors submit their own results. Instead,
this section will reference several locations on the internet where current information may be found.
A good source of information on many benchmark results can be found at the ideasInternational website.
TPC-C Commercial Performance
The Transaction Processing Performance Council's TPC Benchmark C (TPC-C (**)) is a public
benchmark that stresses systems in a full integrity transaction processing environment. It was designed to
stress systems in a way that is closely related to general business computing, but the functional emphasis
may still vary significantly from an actual customer environment. It is fair to note that the business model
for TPC-C was created in 1990, so computing technologies that were developed in subsequent years are
not included in the benchmark.
There are two methods used to measure the TPC-C benchmark. One uses multiple small systems
connected to a single database server. This implementation is called a "non-cluster" implementation by
the TPC. The other implementation method grows this configuration by coupling multiple database
servers together in a clustered environment. The benchmark is designed in such a way that these clusters
scale far better than might be expected in a real environment. Less than 10% of the transactions touch
more than one of the database server systems, and for that small number the cross-system access is
typically for only a single record. Because the benchmark allows unrealistic scaling of clustered
configurations, we would advise against making comparisons between clustered and non-clustered
configurations. All iSeries results and AS/400 results in this benchmark are non-clustered configurations -
showing the strengths of our system as a database server.
The most current level of TPC-C benchmark standards is Version 5, which requires the same performance
reporting metrics but now requires pricing of configurations to include 24 hr x 7 day a week maintenance
rather than 8 hr x 5 day a week, along with some additional changes in pricing the communication
connections. All previous version submissions from reporting vendors have been offered the opportunity
to simply republish their results with these new metric ground rules. As of April 2001, not all vendors had
chosen to republish their results to the new Version 5 standard. iSeries and pSeries have republished.
For additional information on the benchmark and current results, please refer to the TPC's web site at:
SAP Performance Information
Several Business Partner companies have defined benchmarks for which their applications can be rated on
different hardware and middle ware platforms. Among the first to do this was SAP. SAP has defined a
suite of "Standard Application Benchmarks", each of which stresses a different part of SAP's solutions.
The most commonly run of these is the SAP-SD (Sales and Distribution) benchmark. It can be run in a
2-tier environment, where the application and database reside on the same system, or in a 3-tier
environment, where there are many application servers feeding into a database server.
Care must be taken to ensure that the same level of software is being run when comparing results of SAP
benchmarks. Like most software suppliers, SAP strives to enhance their product with useful functions in
each release. This can yield significantly different performance characteristics between releases such as
4.0B, 4.5B, and 4.6C. It should be noted that, although SAP is used as an example here, this situation is
not restricted to SAP software.
Current results can be found in SAP's Standard Application Benchmarks Published Results.
NotesBench
There are several benchmarks that are called "NotesBench xxx". All come from the NotesBench
Consortium, a consortium of vendors interested in using benchmarks to help quantify system capabilities
using Lotus Domino functions. The most popular benchmark is NotesBench R5 Mail, which is actually a
mail and calendar benchmark that was designed around the functions of Lotus Domino Release 5.0.
AS/400 and iSeries systems have traditionally demonstrated very strong performance in both capacity and
response time in NotesBench results.
For official iSeries audited NotesBench results, see http://www.notesbench.org. (Note: in order to access
the NotesBench results you will need to apply for a userid/password through the NotesBench
organization. Click on Site Registration at the above address.) An alternative is to refer to the
ideasInternational web site listed above.
For more information on iSeries performance in Lotus Domino environments, refer to Chapter 11 of this
document.
SPECjbb2000
The Standard Performance Evaluation Corporation (SPEC) defined, in June 2000, a server-side Java
benchmark called SPECjbb2000. It is one of the few Java-related benchmarks in the industry that
concentrates on activity in the server, rather than a single client. The iSeries architecture is well suited for
an object-oriented environment and provides one of the most efficient and scalable environments for
server-side Java workloads. iSeries and AS/400 results are consistently at or near the top rankings for this
benchmark.
For more information on SPECjbb2000 and for published results, see http://www.spec.org/osg/jbb2000/
For more information on iSeries performance in Java environments, refer to Chapter 7 of this document.
VolanoMark
IBM has chosen the VolanoMark benchmark as another means for demonstrating strength with
server-side Java applications. VolanoMark is a 100% Pure Java server benchmark characterized by
long-lasting network connections and high thread counts. It is as much a test of TCP/IP strengths as it is of
multithreaded, server-side Java strengths. In order to scale well in this benchmark, a solution needs to
scale well in TCP/IP, Java-based applications, multithreaded applications, and the operating system in
general. Additional information on the benchmark can be found at
This web site is primarily focused on results for systems that the Volano company measures themselves.
These results tend to be for much smaller, Intel-based systems that are not comparable with iSeries
servers. The web site also references articles written by other groups regarding their measurements of the
benchmark, including AS/400 and iSeries articles. iSeries servers have demonstrated significant strengths
in this benchmark, particularly in scaling to large systems.
19.2 Dynamic Priority Scheduling
On an AS/400 CISC-model, all ready-to-run OS/400 jobs and Licensed Internal Code (LIC) tasks are
sequenced on the Task Dispatching Queue (TDQ) based on priority assigned at creation time. In addition,
for N-way models, there is a cache affinity field used by Horizontal Licensed Internal Code (HLIC) to
keep track of the processor on which the job was most recently active. A job is assigned to the processor
for which it has cache affinity, unless that would result in a processor remaining idle or an excessive
number of higher-priority jobs being skipped. The priority of jobs varies very little such that the
resequencing for execution only affects jobs of the same initially assigned priority. This is referred to as
Fixed Priority Scheduling.
For V3R6 and beyond, the new algorithm being used is Dynamic Priority Scheduling. This scheduler
schedules jobs according to "delay costs" dynamically computed based on their time waiting in
the TDQ as well as on priority. The job priority may be adjusted if it exceeds its resource usage limit. The
cache affinity field is no longer used in an N-way multiprocessor machine. Thus, on an N-way
multiprocessor machine, a job will have equal affinity for all processors, based only on delay cost.
A new system value, QDYNPTYSCD, has been implemented to select the type of job dispatching. The
job scheduler uses this system value to determine the algorithm for scheduling jobs running on the
system. The default for this system value is to use Dynamic Priority Scheduling (set to '1'). This
scheduling scheme allows the CPU resource to be spread to all jobs in the system.
The benefits of Dynamic Priority Scheduling are:
• No job or set of jobs will monopolize the CPU
• Low priority jobs, like batch, will have a chance to progress
• Jobs which use too much resource will be penalized by having their priority reduced
• Job response time/throughput will still behave much like fixed priority scheduling
By providing this type of scheduling, long running, batch-type interactive transactions, such as a query,
will not run at priority 20 all the time. In addition, batch jobs will get some CPU resource, rather than
interactive jobs running at high CPU utilization and delivering response times that may be faster than
required.
To use Fixed Priority Scheduling, the system value has to be set to '0'.
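As a minimal CL sketch of working with this system value (see the Work Management Guide for when
a change takes effect):

    DSPSYSVAL SYSVAL(QDYNPTYSCD)             /* Display the current scheduler setting       */
    CHGSYSVAL SYSVAL(QDYNPTYSCD) VALUE('1')  /* '1' = Dynamic Priority Scheduling (default) */
    CHGSYSVAL SYSVAL(QDYNPTYSCD) VALUE('0')  /* '0' = Fixed Priority Scheduling             */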
Delay Cost Terminology
• Delay Cost
  Delay cost refers to how expensive it is to keep a job in the system. The longer a job spends in the
  system waiting for resources, the larger its delay cost. The higher the delay cost, the higher the
  priority. Just like the priority value, jobs of higher delay cost will be dispatched ahead of other jobs
of relatively lower delay cost.
• Waiting Time
  The waiting time is used to determine the delay cost of a job at a particular time. The waiting time of
  a job which affects the cost is the time the job has been waiting on the TDQ for execution.
• Delay Cost Curves
  The end-user interface for setting job priorities has not changed. However, internally the priority of a
  job is mapped to a set of delay cost curves (see "Priority Mapping to Delay Cost Curves" below).
  The delay cost curve is used to determine a job's delay cost based on how long it has been waiting on
  the TDQ. This delay cost is then used to dynamically adjust the job's priority, and as a result, possibly
  the position of the job in the TDQ.
On a lightly loaded system, the jobs' cost will basically stay at their initial point. The jobs will not
climb the curve. As the workload is increased, the jobs will start to climb their curves, but will have
little, if any, effect on dispatching. When the workload gets around 80-90% CPU utilization, some of
the jobs on lower slope curves (lower priority), begin to overtake jobs on higher slope curves which
have only been on the dispatcher for a short time. This is when the Dynamic Priority Scheduler
begins to benefit as it prevents starvation of the lower priority jobs. When the CPU utilization is at a
point of saturation, the lower priority jobs are climbing quite a way up the curve and interacting with
other curves all the time. This is when the Dynamic Priority Scheduler works the best.
Note that when a job begins to execute, its cost is constant at the value it had when it began
executing. This allows other jobs on the same curve to eventually catch-up and get a slice of the
CPU. Once the job has executed, it "slides" down the curve it is on, to the start of the curve.
Priority Mapping to Delay Cost Curves
The mapping scheme divides the 99 'user' job priorities into 2 categories:
• User priorities 0-9
  This range of priorities is meant for critical jobs like system jobs. Jobs in this range will NOT be
  overtaken by user jobs of lower priorities. NOTE: You should generally not assign long-running,
  resource intensive jobs within this range of priorities.
• User priorities 10-99
  This range of priorities is meant for jobs that will execute in the system with dynamic priorities. In
  other words, the dispatching priorities of jobs in this range will change depending on waiting time in
  the TDQ if the QDYNPTYSCD system value is set to '1'.
The priorities in this range are divided into groups:
• Priority 10-16
• Priority 17-22
• Priority 23-35
• Priority 36-46
• Priority 47-51
• Priority 52-89
• Priority 90-99
Jobs in the same group will have the same resource (CPU seconds and Disk I/O requests) usage
limits. Internally, each group will be associated with one set of delay cost curves. This would give
some preferential treatment to jobs of higher user priorities at low system utilization.
With this mapping scheme, and using the default priorities of 20 for interactive jobs and 50 for batch jobs,
users will generally see that the relative performance for interactive jobs will be better than that of batch
jobs, without CPU starvation.
Performance Testing Results
Following are the detailed results of two specific measurements to show the effects of the Dynamic
Priority Scheduler:
In Table 19.1, the environment consists of the RAMP-C interactive workload running at approximately
70% CPU utilization with 120 workstations and a CPU intensive interactive job running at priority 20.
In Table 19.2 below, the environment consists of the RAMP-C interactive workload running at
approximately 70% CPU utilization with 120 workstations and a CPU intensive batch job running at
priority 50.
Table 19.1. Effect of Dynamic Priority Scheduling: Interactive Only

                                     QDYNPTYSCD = ‘1’ (ON)   QDYNPTYSCD = ‘0’
Total CPU Utilization                93.9%                   97.8%
Interactive CPU Utilization          77.6%                   82.2%
RAMP-C Transactions per Hour         60845                   56951
RAMP-C Average Response Time         0.32                    0.75
Priority 20 CPU Intensive Job CPU    21.9%                   28.9%
Table 19.2. Effect of Dynamic Priority Scheduling: Interactive and Batch

                                     QDYNPTYSCD = ‘1’ (ON)   QDYNPTYSCD = ‘0’
Total CPU Utilization                89.7%                   90.0%
Interactive CPU Utilization          56.3%                   57.2%
RAMP-C Transactions per Hour         61083                   61692
RAMP-C Average Response Time         0.30                    0.21
Batch Priority 50 Job CPU            15.0%                   14.5%
Batch Priority 50 Job Run Time       01:06:52                01:07:40
Conclusions/Recommendations
• When you have many jobs running on the system and want to ensure that no one CPU intensive job
  'takes over' (see Table 19.1 above), Dynamic Priority Scheduling will give you the desired result. In
  this case, the RAMP-C jobs have higher transaction rates and faster response times, and the priority
  20 CPU intensive job consumes less CPU.
• Dynamic Priority Scheduling will ensure your batch jobs get some of the CPU resources without
  significantly impacting your interactive jobs (see Table 19.2). In this case, the RAMP-C workload gets
  less CPU utilization, resulting in slightly lower transaction rates and slightly longer response times.
  However, the batch job gets more CPU utilization and consequently a shorter run time.
• It is recommended that you run with Dynamic Priority Scheduling for optimum distribution of
  resources and overall system performance.
For additional information, refer to the Work Management Guide.
19.3 Main Storage Sizing Guidelines
To take full advantage of the performance of the new AS/400 Advanced Series using PowerPC
technology, larger amounts of main storage are required. To account for this, the new models are
provided with substantially more main storage included in their base configurations. In addition, since
more memory is required when moving to RISC, memory prices have been reduced.
The increase in main storage requirements is basically due to two reasons:
• When moving to the PowerPC RISC architecture, the number of instructions to execute the same
  program as on CISC has increased. This does not mean the function takes longer to execute, but it
  does result in the function requiring more main storage. This obviously has more of an impact on
  smaller systems where fewer users are sharing the program.
• The main storage page size has increased from 512 bytes to 4096 bytes (4KB). The 4KB page size is
  needed to improve the efficiency of main storage management algorithms as main storage sizes
  increase dramatically. For example, 4GB of main storage will be available on AS/400 Advanced
  System model 530.
The impact of the 4KB page size on main storage utilization varies by workload. The impact of the
4KB page size is dependent on the way data is processed. If data is being processed sequentially, the
4KB page size will have little impact on main storage utilization. However, if you are processing
data randomly, the 4KB page size will most likely increase the main storage utilization.
19.4 Memory Tuning Using the QPFRADJ System Value
The Performance Adjustment support (QPFRADJ system value) is used for initially sizing memory pools
and managing them dynamically at run time. In addition, the CHGSHRPOOL and WRKSHRPOOL
commands allow you to tailor memory tuning parameters used by QPFRADJ. You can specify your own
faulting guidelines, storage pool priorities, and minimum/maximum size guidelines for each shared
memory pool. This allows you the flexibility to set unique QPFRADJ parameters at the pool level.
For a detailed discussion of what changes are made by QPFRADJ, see the Work Management Guide.
What follows is a description of some of the effects of this system value and some discussion of when the
various settings might be appropriate.
When the system value is set to 1, adjustments are made to try to balance the machine pool, base pool,
spooling pool, and interactive pool at IPL time. The machine pool is based on the amount of storage
needed for the physical configuration of the system; the spool pool is fairly small and reflects the number
of printers in the configuration. 70% of the remaining memory is allocated to the interactive pool; 30% to
the base pool.
A QPFRADJ value of 1 ensures that memory is allocated on the system in a way that the system will
perform adequately at IPL time. It does not allow for reaction to changes in workload over time. In
general, this value is avoided unless a routine will be run shortly after an IPL that will make adjustments
to the memory pools based on the workload.
When the system value is set to 2, adjustments are made as described, plus dynamic changes are made
as changes in workload occur. In addition to the pools mentioned above, shared pools (*SHRPOOLxxx)
are also managed dynamically. Adjustments are based on the number of jobs active in the subsystem
using the pool, the faulting rates in the pool, and on changes in the workload over the course of time.
This is a good option for most environments. It attempts to balance system memory resources based on
the workload that is being run at the time. When workload changes occur, such as time-of-day changes
when one workload may increase while another may decrease, memory resources are gradually shifted to
accommodate the heaviest loads.
When the system value is set to 3, adjustments are only made during the runtime, not as a result of an
IPL.
This is a good option if you believe that your memory configuration was reasonable prior to scheduling
an IPL. Overall, having the system value set to 2 or 3 will yield a similar effect for most environments.
When the system value is set to 0, no adjustments are made. This is a good option if you plan on
managing the memory by yourself. Examples of this may be if you know times when abrupt changes in
memory are likely to be required (such as a difference between daytime operations and nighttime
operations) or when you want to always have memory available for specific, potentially sporadic work,
even at the expense of not having that memory available for other work. It should be noted, however, that
this latter case can also be covered by using a private memory pool for this work. The QPFRADJ system
value only affects tuning of system-supplied shared pools.
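As a minimal CL sketch of selecting the adjuster and then reviewing the shared pool tuning parameters
it uses:

    CHGSYSVAL SYSVAL(QPFRADJ) VALUE('2')  /* '2' = adjust at IPL and dynamically at run time */
    WRKSHRPOOL                            /* Review or tailor faulting guidelines, pool      */
                                          /* priorities, and min/max sizes per shared pool   */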
19.5 Additional Memory Tuning Techniques
Expert Cache
Normally, the system will treat all data that is brought into a memory pool in a uniform way. In a purely
random environment, this may be the best option. However, there are often situations where some files
are accessed more often than others or when some are accessed in blocks of information instead of
randomly. In these situations, the use of "Expert Cache" may improve the efficiency of the memory in a
pool. Expert Cache is enabled by changing the pool attribute from *FIXED to *CALC. One advantage for
using Expert Cache (*CALC) is that the system dynamically determines which objects should have larger
blocks of data brought into main storage. This is based on how frequently the object is accessed. If the
object is no longer accessed heavily, the system automatically makes the storage available for other
objects that are accessed. If the newly accessed objects then become heavily accessed, the objects have
larger blocks of data placed in main storage.
Expert Cache is often the best solution for batch processing, when relatively few files may be accessed in
large blocks at a time or in sequential order. It is also beneficial in many interactive environments when
IBM i 6.1 Performance Capabilities Reference - January/April/October 2008
© Copyright IBM Corp. 2008
Chapter 19 - Misc Perf Information
307
Download from Www.Somanuals.com. All Manuals Search And Download.
files of differing characteristics are being accessed. The pool attribute can be changed from *FIXED to
*CALC and back at any time, so making a change and evaluating its effect over a period of time is a
fairly safe experiment.
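A minimal CL sketch of toggling Expert Cache on a shared pool (the pool name is illustrative):

    CHGSHRPOOL POOL(*SHRPOOL2) PAGING(*CALC)   /* Enable Expert Cache for the pool  */
    CHGSHRPOOL POOL(*SHRPOOL2) PAGING(*FIXED)  /* Revert to the fixed paging option */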
More information about Expert Cache can be found in the Work Management guide.
In some situations, you may find that you can achieve better memory utilization by defining the caching
characteristics yourself, rather than relying on the system algorithms. This can be done using the
QWCCHGTN (Change Pool Tuning Information) API, which is described in the Work Management API
reference manual. This API was provided prior to the offering of the *CALC option for the system. It is
still available for use, although most situations will see relatively little improvement over the *CALC
option and it is quite possible to achieve less improvement than with *CALC. When the API is used to
adjust the pool attribute, the value that is shown for the pool is USRDFN (user defined).
SETOBJACC (Set Object Access)
In some cases, object access performance is improved when the user manually defines (names a
specific object) which object is placed into main storage. This can be achieved with the SETOBJACC
command. This command clears any pages of an object that are in other storage pools and moves the
object to the specified pool. If the object is larger than the pool, the first portions of the object are
replaced with the later pages that are moved into the pool. The command reports the current amount of
storage that is used in the pool.
If SETOBJACC is used when the QPFRADJ system value is set to either 2 or 3, the pool that is used to
hold the object should be a private pool so that the dynamic adjustment algorithms do not shrink the pool
because of the lack of job activity in the pool.
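A hedged CL sketch (the library, file, and pool names are illustrative) of pinning a file into a pool and
later purging it from main storage:

    SETOBJACC OBJ(APPLIB/CUSTMAST) OBJTYPE(*FILE) POOL(*SHRPOOL10)  /* Load the file into the pool */
    SETOBJACC OBJ(APPLIB/CUSTMAST) OBJTYPE(*FILE) POOL(*PURGE)      /* Clear it from main storage  */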
Large Memory Systems
Normally, you will use memory pools to separate specific sets of work, leaving all jobs which do a similar
activity in the same memory pool. With today's ability to configure many gigabytes of mainstore, you
may also find that work can be done more efficiently if you divide large groups of similar jobs into
separate memory pools. This may allow for more efficient operation of the algorithms which need to
search the pool for the best candidates to purge when new data is being brought in. Laboratory
experiments using the I/O intensive CPW workload on a fully configured 24-way system have shown
about a 2% improvement in CPU utilization when the transaction jobs were split among pools of about
16GB each, rather than all running in a single memory pool.
19.6 User Pool Faulting Guidelines
Due to the large range of AS/400 processors and an ever increasing variance in the complexity of
user applications, paging guidelines for user pools are no longer published. Even the system wide
guidelines are just that: guidelines. Each customer needs to track response time, throughput, and CPU
utilization against the paging rates to determine a reasonable paging rate.
There are two choices for tuning user pools:
1. Set system value QPFRADJ = 2 or 3, as described earlier in this chapter.
2. Manual tuning. Move storage around until the response times and throughputs are acceptable. The
rest of this section deals with how to determine these acceptable levels.
To determine a reasonable level of page faulting in user pools, determine how much the paging is
affecting the interactive response time or batch throughput. These calculations will show the percentage
of time spent doing page faults.
The following steps can be used (all data can be gathered with STRPFRMON and printed with
PRTSYSRPT). The following assumes interactive jobs are running in their own pool, and batch jobs are
running in their own pool.
Interactive:
1. flts = sum of database and non-database faults per second during a meaningful sample interval for the
interactive pool.
2. rt = interactive response time for that interval.
3. diskRt = average disk response time for that interval.
4. tp = interactive throughput for that interval in transactions per second. (transactions per hour/3600
seconds per hour)
5. fltRtTran = diskRt * flts / tp = average page faulting time per transaction.
6. flt% = fltRtTran / rt * 100 = percentage of response time due to page faulting.
7. If flt% is less than 10% of the total response time, then there's not much potential benefit of adding
storage to this interactive pool. But if flt% is 25% or more of the total response time, then adding
storage to the interactive pool may be beneficial (see NOTE below and the worked example that follows).
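As a worked example, with hypothetical numbers: suppose flts = 50 faults/second, diskRt = 0.010
seconds, tp = 5 transactions/second, and rt = 0.5 seconds. Then fltRtTran = 0.010 * 50 / 5 = 0.1 seconds
per transaction, and flt% = 0.1 / 0.5 * 100 = 20%, which falls between the 10% and 25% thresholds, so
adding storage might help, but the potential gain is not clear-cut.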
Batch:
1. flts = sum of database and non-database faults per second during a meaningful sample interval for the
batch pool.
2. flt% = flts * diskRt * 100 = percentage of time spent page faulting in the batch pool. If multiple batch
jobs are running concurrently, you will need to divide flt% by the number of concurrently running
batch jobs.
3. batchcpu% = batch cpu utilization for the sample interval. If higher priority jobs (other than the batch
jobs in the pool you are analyzing) are consuming a high percentage of the processor time, then flt%
will always be low. This means adding storage won't help much, but only because most of the batch
time is spent waiting for the processor. To eliminate this factor, divide flt% by the sum of flt% and
batchcpu%. That is: newflt% = flt% / (flt% + batchcpu%)
This is the percentage of time the job is spent page faulting compared to the time it spends at the
processor.
4. Again, the potential gain of adding storage to the pool needs to be evaluated. If flt% is less than 10%,
then the potential gain is low. If flt% is greater than 25% then the potential gain is high enough to
warrant moving main storage into this batch pool.
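As a worked example, with hypothetical numbers: suppose flts = 30 faults/second and diskRt = 0.010
seconds, so flt% = 30 * 0.010 * 100 = 30%. If batchcpu% = 45%, then newflt% = 30 / (30 + 45) * 100 =
40%, which is above the 25% threshold, so moving main storage into the batch pool may be worthwhile.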
NOTE:
It is very difficult to predict the improvement of adding storage to a pool, even if the potential gain
calculated above is high. There may be instances where adding storage may not improve anything
because of the application design. For these circumstances, changes to the application design may be
necessary.
Also, these calculations are of limited value for pools that have expert cache turned on. Expert cache can
reduce I/Os given more main storage, but those I/Os may or may not be page faults.
19.7 AS/400 NetFinity Capacity Planning
Performance information for AS/400 NetFinity attached to a V4R1 AS/400 is included below. The
following NetFinity functions are included:
• Time to collect software inventory from client PCs
• Time to collect hardware inventory from client PCs
The figures below illustrate the time it takes to collect software and hardware inventory from various
numbers of client PCs. This test was conducted using the Rochester development site, during normal
working hours with normal activity (i.e., not a dedicated environment). This environment consists of:
• 16 and 4Mb token ring LANs (mostly 16)
• LANs connected via routers and gateways
• Dedicated AS/400
• TCP/IP
• Client PCs varied from 386s to Pentiums (mostly 100 MHz with 32MB memory), using OS/2,
  Windows/95 and NT
• About 20K of data was collected, hardware and software, for each client
While these tests were conducted in a typical work environment, results from other environments may
vary significantly from what is provided here.
[Chart: AS/400 NetFinity Software Inventory Performance — collection time in minutes (0-240) versus number of PC clients (0-600); AS/400 510-2142, token rings, TCP/IP, V4R1. About 100 clients were collected in 42 minutes.]
Figure 19.1. AS/400 NetFinity Software Inventory Performance
[Chart: AS/400 NetFinity Hardware Inventory Performance — collection time in minutes (0-100) versus number of PC clients (0-600); AS/400 510-2142, token rings, TCP/IP, V4R1.]
Figure 19.2. AS/400 NetFinity Hardware Inventory Performance
Conclusions/Recommendations for NetFinity
1. The time to collect hardware or software information for a number of clients is fairly linear.
2. The size of the AS/400 CPU is not a limitation. Data collection is performed at a batch priority. CPU
utilization can spike quite high (e.g., 80%) when data is arriving, but in general is quite low (e.g., 10%).
3. The LAN type (4 or 16Mb Token Ring or Ethernet) is not a limitation. Hardware collection tends to
be more chatty on the LAN than software collection, depending on the hardware features.
4. The communications protocol (IPX, TCP/IP, or SNA) is not a limitation.
5. Collected data is automatically stored in a standard DB2/400 database file, accessible by SQL and
other APIs.
6. Collection time depends on clients being powered-on and the needed software turned on. The server
will retry 5 times.
7. The number of jobs on the server increases during collection and decreases when not needed.
Chapter 20. General Performance Tips and Techniques
This chapter's intent is to cover a variety of useful topics that "don't fit" elsewhere in the document: useful things customers might do, and special problems customers might run into, on iSeries. It also contains some general guidelines.
20.1 Adjusting Your Performance Tuning for Threads
History
Historically, iSeries and AS/400 programmers have not had to worry very much about threads. True, threads were introduced into the machine some time ago, but the average RPG application does not use them and perhaps never will, even though it is now allowed to. Multiple-thread jobs have been fairly rare. That means that those who set up and organize AS/400 subsystems (e.g. QBATCH, QINTER, MYOWNSUBSYSTEM, etc.) have not had to think much about the distinction between a "job" and a "thread."
The Coming Change
But threads are a good thing, and so applications are increasingly using them. Especially for customers deploying (say) a significant new Java application, or Domino, a machine with the typical one-thread-per-job model may suddenly have dozens or even hundreds of threads in a particular job.
Unfortunately, jobs and threads are distinct ideas, and certain AS/400 commands carefully distinguish them. If iSeries System Administrators are careless about these distinctions, as it is so easy to be today, poor performance can result as the system moves on to new applications such as Lotus Domino or especially Java.
With Java generally, and with certain applications, it will be commonplace to have multiple threads in a job. That means taking a closer look at some old friends: MAXACT and MAXJOBS.
Recall that every subsystem has at least one pool entry. Recall further that, in the subsystem description
itself, the pool number is an arbitrary number. What is more important is that the arbitrary number maps
to a particular, real storage pool (*BASE, *SHRPOOL1, etc.). When a subsystem is actually started, the
actual storage pool (*SHRPOOL1), if someone else isn't already using it, comes to life and obtains its
storage.
However, storage pools are about more than storage. They are also about job and thread control. Each
pool has an associated value called MAXACT that also comes into play. No matter how many
subsystems share the pool, MAXACT limits the total number of threads able to reside and execute in the
pool. Note that this is threads and not jobs.
Each subsystem also has a MAXJOBS value associated with it. If you reach that value, you are not
supposed to be able to start any more jobs in the subsystem. Note that this is a jobs value and not a
threads value. Further, within the subsystem, there are usually one or more JOBQs in the subsystem.
Within each job queue entry you can also control the number of jobs using a parameter. Due to an unfortunate turn in history, this parameter, which might more logically be called MAXJOBS today, is called MAXACT. However, it controls jobs, not threads.
Problem
It is too easy to use the overall pool's value of MAXACT as a surrogate for controlling the number of jobs. That is, you can forget the distinction between jobs and threads and use MAXACT to control the activity in a storage pool. But you are not controlling jobs; you are controlling threads.
It is also too easy to have your existing MAXACT set too low if your existing QBATCH subsystem
suddenly sees lots of new Java threads from new Java applications.
If you make this mistake (and it is easy to do), you'll see several possible symptoms:
v Mysterious failures in Java. If you set the value of MAXACT really low, certainly as low as one,
sometimes Java won't run, but it also won't always give a graceful message explaining why.
v Mysterious "hangs" and slowdowns in the system. If you don't set the value pathologically low,
but still too low, the system will function. But it will also dutifully "kick out" threads to a limbo
known as "ineligible" because that's what MAXACT tells it to do. When MAXACT is too low, the
result is useless wait states and a lot of system churn. In severe cases, it may be impossible to
"load up" a CPU to a high utilization and/or response times will substantially increase.
v Note carefully that this can happen as a result of an upgrade. If you have just purchased a new
machine and it runs slower instead of faster, it may be because you're using "yesterday's" limits for
MAXACT.
If you're having threads thrown into "ineligible", this will be visible via the WRKSYSSTS command.
Simply bring it up, perhaps press PF11 a few times, and see if the Act->Inel is something other than zero.
Note that other transitions, especially Act->Wait, are normal.
Solution
Make sure the storage pool's MAXACT is set high enough for each individual storage pool. A MAXACT of *NOMAX will sometimes work quite well, especially if you use MAXJOBS to control the amount of work coming into each subsystem.
Use CHGSHRPOOL to change the number of threads that can be active in the pool (note that multiple
subsystems can share a pool):
CHGSHRPOOL ACTLVL(newmax)
Use MAXJOBS in the subsystem description to control the amount of outstanding work in terms of jobs:
CHGSBSD QBATCH MAXJOBS(newmax)
Use the Job Queue Entry in the subsystem to have even finer control of the number of jobs:
CHGJOBQE SBSD(QBATCH) JOBQ(QBATCH) MAXACT(newqueue job maximum)
Note in this particular case that MAXACT does refer to jobs and not threads.
20.2 General Performance Guidelines -- Effects of Compilation
In general, the higher the optimization, the less easy the code will be to debug. It may also be the case
that the program will do things that are initially confusing.
In-lining
For instance, suppose that ILE Module A calls ILE Module B. ILE Module B is a C program that does
allocation (malloc/free in C terms). However, in the right circumstances, compiler optimization will
"inline" Module B. In-lining means that the code for B is not called, but it is copied into the calling
module instead and then further optimized. So, at least within Module A, the "in-lined" Module B ceases to be an individual compiled unit; its code is simply copied, verbatim, into A.
Accordingly, when performance traces are run, the allocation activity of Module B will show up under
Module A in the reports. Exceptions would also report the exception taking place in Module A of
Program X.
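To make the consequence concrete, here is a minimal C sketch of the situation just described (the module and function names are hypothetical, purely for illustration). If the compiler in-lines module_b, a performance trace attributes the allocation activity to module_a:

#include <stdlib.h>

/* Small function: a likely candidate for compiler in-lining. */
static void *module_b(size_t n)
{
    return malloc(n);    /* after in-lining, this allocation is reported under the caller */
}

void module_a(void)
{
    void *p = module_b(100);    /* no call to module_b remains in the optimized code */
    free(p);
}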
In-lining of "final" methods is possible in Java as well, with similar implications.
Optimization Levels
Most of the compilers, and Java, support a reasonably compatible view of optimization. That is, if you specify OPTIMIZE(10) in one language, you get a similar level of optimization in another language, including on Java's CRTJVAPGM command. However, these things can differ at the detailed level. Consult the manuals in case of uncertainty.
Generally:
v OPTIMIZE(10) is the lowest and most debuggable.
v OPTIMIZE(20) is a trade-off between rapid compilation and some minimal optimization.
v OPTIMIZE(30) provides a higher level of optimization, though it usually avoids the more
aggressive options. Code at this level can be debugged, though with difficulty.
v OPTIMIZE(40) provides the highest level of optimization. This includes sophisticated analysis,
"code motion" (so that the execution results are what you asked for, but not on a
statement-by-statement basis), and other optimizations that make debugging difficult. At this level
of optimization, the programmer must pay stricter attention to the manuals. While it is surprisingly
often irrelevant in actual cases, many languages have specific definitions that allow latitude to
highly optimized compilers to do or, more importantly, "not do" certain functions. If the coder is
not aware of this, the code may behave differently than expected at high optimization levels.
LICOPT
A new option, called LICOPT, has been added to most ILE languages. It allows language-specific optimizations to be turned on and off as individual items. A full description is well beyond the scope of this paper, but those interested in the highest level of performance, while minimizing potential difficulties with specific optimization types, would do well to study these options.
20.3 How to Design for Minimum Main Storage Use (especially with Java, C, C++)
The iSeries family has added popular languages whose usage continues to increase -- Java, C, C++.
These languages frequently use a different kind of storage -- heap storage.
Many iSeries programmers, with a background in RPG or COBOL, are unaware of the influence this may have on storage consumption. Why? Simply because those languages, by their nature, do not make much, if any, use of the heap. Meanwhile, C, C++, and Java very typically do.
The implications can be very profound. Many programmers are unclear about the tradeoffs and, when
reducing memory usage, frequently attack the wrong problem. It is surprisingly easy, with these
languages, to spend many megabytes and even hundreds of megabytes of main storage without really
understanding how and why this was done.
Conversely, with the right understanding of heap storage, a programmer might be able to solve a much
larger problem on the identical machine.
Theory -- and Practice
This is one place where theory really matters. Often, programmers wonder whether a theory applies in
practice. After surveying a set of applications, we have concluded that the theory of memory usage
applies very widely in practice.
In computer science theory, programmers are taught to think about how many “entities” there are, not
how big the entity is. It turns out that controlling the number of entities matters most in terms of
controlling main storage -- and even processor usage (it costs some CPU, after all, to have and initialize
storage in the first place). This is largely a function of design, but also of storage layout. It is also
knowing which storage is critical and which is not. Formally, the literature talks about:
Order(1) -- about one entity per system
Order(N) -- about “N” entities, where “N” are things like number of data base records, Java objects, and
like items.
Order(N log N) -- this can arise because there is a data base and it has an accompanying index.
Order(N squared) -- data base joins of two data bases can produce this level of storage cost
Note the emphasis on “about.” It is the number of entities in relation to the elements of the problem that
count. An element of the problem is not a program or a subsystem description. Those are Order(1) costs.
It is a data base record, objects allocated from the heap inside of loops, or anything like these examples.
In practice, Order(N) storage predominates, so this paper will concentrate on Order(N).
Of course, one must eventually get down to actual sizes. Thus, one ends up with actual Order(N) costs estimated like this:
ActualCostForOrder(1) = a
ActualCostInBytes(N) = a + (b x N)
Where a and b are constants. “a” is determined by adding up things like the static storage taken up by the
application program. “b” is the size of the data base record plus the size of anything else, such as a Java
object, that is created one entity per data base record. In some applications, “N” will refer to some
freestanding fact, like the maximum number of concurrent web serving operations or the number of
outstanding new orders being processed.
However, the number of data base records will very often be the source of “N.” Of course, with multiple
data base files, there may be more than one actual “N”. Still, it is usually true that the record count of one
file compared to another will often be available as a ratio. For instance, one could have an “Order” record
and an average of three and a half “Order Detail” records. As long as the ratio is reasonably stable or can be
planned at a stable value, it is a matter of convention which is picked to be “N” in that case; one merely
adjusts “b” in the above equation to account for what is picked for “N”.
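As a purely hypothetical illustration of the equation above (all numbers invented for the example): suppose the fixed overhead is a = 10 MB, an order record costs 2.0 KB, and each order averages 3.5 detail records of 0.5 KB each. Picking "N" to be the number of outstanding orders gives b = 2.0 KB + (3.5 x 0.5 KB) = 3.75 KB, so:

ActualCostInBytes(N) = 10 MB + (3.75 KB x N)

For N = 100,000 outstanding orders, that is roughly 10 MB + 375,000 KB, or about 385 MB; the Order(N) term, not the fixed term, dominates.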
System Level Considerations
In terms of the computer science textbooks, we are largely done. But, for someone in charge of
commercial application deployment, there is one more practical thing to consider: Jobs and those newer
items that now often come with them, threads.
Formally, if there is only one job or thread, then these are part of the Order(1) storage. If there are many,
they end up proportional to N (e.g. One job for every 100,000 active records) and so are part of the
Order(N) storage cost.
However, it is frequently possible to adjust these based on observed performance effects; the ratio to N is not entirely fixed. So, it remains of interest to segregate these when planning storage. While Order(j) and Order(t) will not appear in the formal computer science literature, this paper will talk about them anyway: Order(j) for per-job storage and Order(t) for per-thread storage.
Typical Storage Costs
Here are typical things in modern systems and where they ordinarily sit in terms of their “entity”
relationships.
Order(1)                    Order(j)                   Order(t)               Order(N)
ILE and OS/400 Programs     Just In Time compiled      Java threads           Data Base Records and
                            programs (Java *JIT)                              IFS file records
Subsystem Descriptions      Total Job Storage          File Buffers of all    Java (and C/C++)
                                                       kinds                  objects
Direct Execution Java       Static storage from RPG    SQL Result Set         Operating System (e.g.
Programs                    and COBOL. Static final    (nonrecord)            Data Base) copies of
                            in Java.                                          application records
System values               Java Virtual Machine       Program stack          SQL records in a
                            and most WebSphere         storage                result set
                            storage
A Brief Example
To show these concepts, consider a simple example.
Part of a financial system has three logical elements to deal with:
1. An order record (order summary including customer information, sales tax, etc.)
2. An order detail record (individual purchased items, quantities, prices).
3. A table containing international currency rates of exchange between two arbitrary countries.
Question: Which is more important: reducing the cost of the detail record by a couple of bytes, or reducing the currency table from a cost of N squared (where “N” is the number of countries) to 2 times N?
There are two obvious implementations of the currency table:
1. Implement the table as a two-dimensional array such that CurrencyExchange[i][j] gives the exchange rate between country i and country j for all countries.
2. Implement the table as a single-dimension array, with the i-th element being the exchange rate between country i and the US dollar. One can convert between any two currencies simply by converting twice: once to dollars and once to the other currency.
Clearly, the second is more storage efficient.
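A minimal C sketch of the two designs follows. The array names, the country count, and the convention that usdRate[i] holds units of currency i per US dollar are assumptions for illustration, not part of the original example:

#define NCOUNTRIES 200    /* hypothetical count; here "N" is the number of countries */

/* Design 1: Order(N squared) entries. */
double CurrencyExchange[NCOUNTRIES][NCOUNTRIES];

/* Design 2: Order(N) entries; usdRate[i] = units of currency i per US dollar. */
double usdRate[NCOUNTRIES];

/* Design 2 conversion: convert twice, via the dollar. */
double convert(double amount, int from, int to)
{
    double dollars = amount / usdRate[from];    /* first conversion: to dollars */
    return dollars * usdRate[to];               /* second conversion: to the target currency */
}

With 200 countries, design 1 holds 40,000 entries while design 2 holds 200, at the price of one extra multiply per conversion.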
Now consider the first problem. The detail record looks like this:
Quantity as a four byte number (9B or 10B in RPG terms).
Name of the item (up to 60 characters)
Price of the item (as a zoned decimal field, 15 total digits with two decimal positions).
A simple scrub would give:
Quantity as a two byte number (4B in RPG terms).
Name of the item (probably still 60 characters)
Price of the item (as a packed decimal field, probably 10 total digits with two decimal positions).
How practical this change would be, if it represented a large, existing data base, would be a separate
question. If this is at the initial design, however, this is an easy change to make.
Boundary considerations. In Java, we are done because Java will order the three entities such that the
least amount of space is wasted. In C and C++, it might be possible to lay out the storage entities such
that the compiler will not introduce padding between elements. In this particular example, the order given
above would work out well.
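A small C sketch of the scrubbed record may make the padding point concrete. Representing the packed decimal as a plain 6-byte array is an assumption for illustration (10 digits packed occupies (10+1)/2 bytes, rounded up to 6); ILE C also offers a native packed decimal type:

struct OrderDetail
{
    short quantity;    /* two-byte quantity (4B in RPG terms) */
    char  name[60];    /* item name, up to 60 characters */
    char  price[6];    /* packed decimal, 10 digits with 2 decimal positions */
};

Because the character arrays are byte-aligned and the two-byte field sits at offset zero, the compiler has no reason to insert padding between the elements.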
Which is more important?
Reading the above superficially, one would expect the currency table improvement to matter most. There was a reduction from an N squared to a 2 times N relationship. However, this cannot be right. In fact, the number of countries is not "N" for this problem. "N" is the number of outstanding orders, a number that is likely, in a practical system, to be much larger than the number of countries. More critically, the number of countries is essentially fixed. Yes, the number of countries in the world changes from time to time. But, of course, this is not the same degree of change as order records in an order entry system. In fact, the currency table is part of the Order(1) storage. The choice between 2 times N and N squared should be based on whatever is operationally simpler.
Perform this test to know what “N” really is: If your department merged with a department of the same
size, doing the same job, which storage requirements would double? It is these factors that reveal what
the value of “N” is for your circumstances.
And, of course, the detail order record would be one such item. So, where are the savings? The above
recommendations will save 9 bytes per record. If you write the code in RPG, this does not seem like
much. That would be 9 bytes times the number of jobs used to process the incoming records. After all,
there is only one copy of the record in a typical RPG program.
However, one must account for the data base. Especially when accessing the records through an index of some kind, the number of records the data base will keep lying about will be proportional to “N” -- the total
number of outstanding orders. In Java, this can be even more clear-cut. In some Java programs, one
processes records one at a time, just as in RPG. The most straightforward case is some sort of “search”
for a particular record. In Java, this would look roughly the same as RPG and potentially consume the
same storage.
However, Java can also use the power of the heap storage to build huge networks of records. A custom
sort of some kind is one easy example of this.
In that case, it is easy for Java to contain the summary record and “dozens” of detail records, all at once,
all connected together in a whole variety of ways. If necessary, modern applications might bring in the
entire file for the custom sort function, which would then have a peak size at least as large as the data base
file or files themselves.
Once you get above a couple hundred records, even in but one application, the storage savings for the
record scrub will swamp the currency table savings. And, since one might have to buy for peak storage
usage, even one application that references thousands of detail records would be enough to tip the scale.
A Short but Important Tip about Data Base
One thing easily misunderstood is variable-length character fields. At first, one would think every character field should be variable length, especially if one codes in Java, where variable-length data is the norm. However, when one considers the internals of the data base, a field ought to be ten to twenty bytes long before variable length is even considered, because there is a cost of about ten bytes per record for the first variable-length field. Obviously, this cost should not be introduced to “save” a few bytes of data. Likewise, the “ALLOCATE” value should be understood (in OS/400 SQL, “ALLOCATE” represents the minimum amount of a variable-length field that is always present in the record). Getting this right can improve performance. Getting it wrong simply wastes space. If in doubt, do not specify it at all.
A Final Thought About Memory and Competitiveness
The currency storage reduction example remains a good one -- just at the wrong level of granularity.
Avoiding a SQL join that produces N squared records would be an example where the 2 times N
alternative, if available, saves great amounts of storage.
But, more critically, deploying the least amount of Order(N) storage in actual implementation is a
competitive advantage for your enterprise, large or small. Reducing the size of each N in main storage (or
even on disk) eventually means more “things” in the same unit of storage. That is more competitive
whether the cost of main storage falls by half tomorrow or not. More “things” per byte is always an
advantage. It will always be cheaper. Your competitor, after all, will have access to the same costs. The
question becomes: Who uses it better?
20.4 Hardware Multi-threading (HMT)
Hardware multi-threading is a facility present in several iSeries processors. The eServer i5 models instead have the Simultaneous Multi-threading (SMT) facility, which is discussed in the SMT white paper. HMT is mentioned here primarily to compare and contrast with SMT. Moreover, several system facilities operate slightly differently on HMT machines versus SMT machines, and these differences need some highlighting.
HMT Described
Broadly, HMT exploited the concept that modern processors are often quite fast relative to certain
memory accesses.
Without HMT, a modern CPU might spend a lot of time stalled on things like cache misses. In modern
machines, the memory can be a considerable distance from the CPU, which translates to more cycles per
fetch when a cache miss occurs. The CPU idles during such accesses.
Since many OS/400 applications feature database activity, cache misses often figured noticeably in the
execution profile. Could we keep the CPU busy with something else during these misses?
HMT created two securely segregated streams of execution on one physical CPU, both controlled by
hardware. It was created by replicating key registers including another instruction counter. Generally,
there is a distinction between the one physical processor and its two logical processors. However, for
HMT, the customer seldom sees any of this as the various performance facilities of the system continue to
report on a physical CPU basis.
Unlike SMT, HMT allows only one instruction stream to execute at a time. But, if one instruction stream
takes a cache miss, the hardware switches to the other instruction stream (hence, "hardware
multi-threading" or, some say, "hardware multi-tasking"). There would, of course, be times when both
were waiting on cache misses, or, conversely, applications that hardly ever had misses. Yet, on the
whole, the facility works well for OS/400 applications.
The system value QPRCMLTTSK was introduced in order to turn HMT on or off. The change could only take effect when the whole system was IPLed, so (for clarity) one should change the system value itself shortly before a full system IPL. The default is to have it set on ('1').
Generally, in most commercial workloads, HMT enabled ('1') gives gains in throughput between 10 and
25 percent, often without impact to response time.
In rare cases, HMT results in losses rather than gains.
HMT and SMT Compared and Contrasted
Some key similarities and differences are:
• HMT can be turned on and off only by a whole-system IPL. SMT can be turned on and off dynamically, at any time; no IPL is required.
• All partitions have the same value for HMT. Because SMT is more dynamic, the SMT state need not be identical across partitions.
• HMT executes only one instruction stream at a time. SMT allows multiple streams of execution simultaneously.
• CPU utilization measurements are not greatly affected by HMT. SMT complicates the question of measuring CPU utilization.
• On HMT machines, system performance counters and CPU utilization values continue to be reported on a physical CPU basis. SMT machines also continue to report data on a physical processor basis, but some of the measurements are harder to interpret (reporting on a logical CPU basis would be no better).
• HMT operation is controlled by the system value QPRCMLTTSK ("1" means active, "0" means inactive), and HMT needs a full IPL for a change to QPRCMLTTSK to be activated. SMT has three values for QPRCMLTTSK: "0" for off (also called "ST mode"), "1" for on, and "2" for "controlled", where OS/400 decides, dynamically, whether to be in ST or SMT mode; SMT allows QPRCMLTTSK to change at any time.
• HMT typically improves throughput by 10 to 25 per cent. SMT can improve throughput by up to 40 per cent and, in rare cases, more.
Models With/Without HMT
Not all prior models have HMT. In fact, some recent models have neither HMT nor SMT.
The following models have HMT available: 270, 800, 810, 820, 830, 840.
The following models have neither SMT nor HMT: 825, 870, 890.
Models earlier than the 270 and 820 series (e.g. 170, 7xx, etc.) have neither HMT nor SMT.
20.5 POWER6 520 Memory Considerations
Because of the design of the Power6 520 system, there are some key factors with the memory
subsystem that one should keep in mind when sizing this system. The Power6 520, unlike the Power6
570, has no L3 cache, which does have an effect on memory sensitive workloads, like Java applications
for instance. Having no L3 cache makes memory speed, or the bandwidth rating in megabytes per
second, even more critical for memory sensitive workloads. The Power6 520 has 8 memory DIMM
slots, which are positioned in groups of four behind each of the Power6-SCM modules and each group of
four will be referred to as a quad for this discussion. The available number of active memory slots
depends on the Processor Feature Code of the system.
When only one SCM module is installed, only one quad of memory is active and all slots must
contain DIMMs of the same size and speed. When two SCM modules are installed (except in the case of
the 4-way capable Capacity-on-Demand model with only one module enabled, which activates both
memory quads), both quads of memory are active. When both are active, it is important to note that the
first and second modules are separate and independent. So this means that even though the size and speed
of memory DIMMs behind each module have to be the same, the size and speed of memory DIMMs
behind the first module do not have to match the memory DIMMs behind the second module. For
DIMMs ranging from 512 MB to 4 GB, the speed is 667 MHz (PC2-5300). The 8 GB DIMMs are different, however, with a speed of 400 MHz (PC2-3200). This decrease in speed for 8 GB DIMMs can
have a negative effect on performance with memory sensitive workloads. This effect, along with the fact
that there is no L3 cache, should be considered when planning for current and future growth and also
LPAR configurations.
To test the performance difference of 4 GB DIMMs versus 8 GB DIMMs (essentially testing the
difference in speed) and what occurs when the DIMMs of different sizes are “mixed”, we used a Power6
520 (9408-M25) F/C 5635 (a fully enabled system) with one partition using all the available resources.
“Mixed” here means the DIMMs in one quad behind a module are 4 GB and the DIMMs in the opposite
quad are 8 GB. We started with a baseline consisting of all 4 GB DIMMs behind both modules, which is
the best performing case. Then we switched to all 8 GB DIMMs behind both modules and ran the same tests again. The performance of the workloads that were memory sensitive followed suit with the decrease in memory speed, which was expected. This is very important to keep in mind when deciding on the amount of
memory needed for a system. Deciding to go with the larger capacity 8 GB DIMMs does reduce your
memory’s speed and can have a negative performance effect on your workload. Of course each workload
will behave differently based on its sensitivity to memory.
Next we placed 4 GB DIMMs behind one module and 8 GB DIMMs behind the opposite module.
Because the one module had the faster 4 GB DIMMs behind it, the same workloads produced results that
ranged between the best case, all 4 GB DIMMs, and the worst case, all 8 GB DIMMs. Again, we used
only one partition that utilized all the available resources, but there are other factors to consider when
using LPAR.
LPAR, or Logical Partitioning, increases flexibility, enabling selected system resources like
processors, memory and I/O components to be utilized by various partitions, either in a shared or
dedicated environment, on the same system. In the “mixed” environment previously described, it is
possible to have one partition utilizing memory on 4 GB DIMMs and a second partition, configured with
exactly the same amount of resources, utilizing memory on 8 GB DIMMs. This can cause an application
to have different performance characteristics on the partitions. It is also possible for partitions to be
assigned a mix of memory from different DIMMs, depending on how the memory is allocated at partition
activation time. This means that a partition that requires 4 GB of memory could be assigned 2 GB from
the quad with 4 GB DIMMs and the other 2 GB from the quad with 8 GB DIMMs. This too can cause an
application to have different performance characteristics on partitions configured with exactly the same
amount of resources.
When planning a Power6 520 system, there are a number of memory-related factors that should be considered, each of which can affect the performance of memory sensitive workloads. First and
foremost, the Power6 520 has no L3 cache. Having no L3 cache makes memory speed even more critical
for memory sensitive workloads. If memory capacity needs can be achieved with 4 GB DIMMs or
smaller, this will give the best memory speed. If memory capacity needs result in mixing 4 GB and 8 GB
DIMMs, that option is available, but can have a negative performance effect on memory sensitive
workloads. Mixing DIMMs can also cause partitions configured with exactly the same amount of
resources to have varying performance characteristics. Since the Power6 520 only has 8 available
memory DIMM slots, memory capacity can be an issue. If memory capacity is a concern, the 8 GB
DIMMs will increase the capacity, but result in a slower memory speed.
20.6 Aligning Floating Point Data on Power6
The PowerPC architecture specifies that storage operands ought to be appropriately aligned. In many cases there is a slight performance benefit, and the compiler knows this. In other cases, the operands must be aligned for functional reasons. For example:
1. Pointers used by IBM i must be aligned on a 16-byte boundary.
2. PowerPC instructions in a program must be word aligned.
3. Binary Floating-Point operands ought to be word-aligned and should not cross a page boundary.
Other operand types allow generally free alignment of the data.
Although such a specification exists for Binary Floating-Point operands, processor designs have the option of allowing free alignment of Binary Floating-Point operands as well. The Power6 processors, however, took a different approach. If either a 4-byte short form or an 8-byte long form is not word-aligned, the Power6 processor will produce an alignment interrupt. Fortunately, the IBM i
alignment interrupt handler recognizes this and does allow programs to successfully execute even if the
Binary Floating-Point operand is not word aligned. However, this emulation of each such operation
comes at a very considerable cost to the performance of such floating-point load and store instructions.
While an appropriately aligned floating-point load or store can execute extremely rapidly, the emulation
when misaligned can take thousands of times longer. If such accesses are rare compared to the remainder
of the function being provided, this emulation may not matter to the performance of the application. As
such floating-point accesses become more frequent, this emulation alone can account for most of the time
spent within an application.
The compiler does attempt to assure that such Binary Floating-Point operands are at least word aligned.
However, there are ways that the compiler's intent can be overridden. Packing data which includes
floating-point variables within a structure may result in this occurring; packing of structures can
occasionally save some space in memory. For this reason, it is prudent to assure that floating-point
variables are allowed to be at least word aligned. If this cannot be done, it may be appropriate to first copy the floating-point variables to a local, aligned variable in storage. This copy may need to be done via an explicit move operation that is unaware of the type of the data, for if the type is known, the floating-point data may be copied using floating-point loads and stores, resulting in an alignment interrupt.
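A minimal C sketch of such a type-unaware copy follows (the function name and parameters are illustrative only). memcpy copies bytes, so no floating-point load is issued against the possibly misaligned address:

#include <string.h>

/* Load a double from a possibly misaligned address without performing
   a floating-point load on that address; the byte-wise copy avoids
   the alignment interrupt. */
double load_unaligned_double(const void *p)
{
    double d;
    memcpy(&d, p, sizeof d);
    return d;
}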
As an example, consider the following structures, one specifying "packed" and the other allowed to be aligned per the compiler:
#pragma pack(1)    /* request byte packing; the exact directive varies by compiler */
struct FPAlignmentStructPacked
{
    double FloatingPointOp1;
    char   ACharacter;
    double FloatingPointOp2;    /* Byte aligned; can result in alignment interrupts. */
};
#pragma pack()

struct FPAlignmentStructNormal    /* Allows for preferred alignment */
{
    double FloatingPointOp1;
    char   ACharacter;
    double FloatingPointOp2;    /* Compiler padding added before this field. */
};
The first of these structures uses packing in order to minimize the amount of storage used. Here the
structure consumes exactly 17 bytes, 8 each for the two floating-point values and one byte for the
character. Assuming that the first is doubleword aligned as preferred, the second floating-point variable
will be aligned on a doubleword+1 boundary. Each access of this second floating-point variable will
result in an interrupt on Power6 processors.
The second of these structures allows the compiler to assure preferred alignment. Here the structure
consumes exactly 24 bytes. The extra 7 bytes over the first comes from the compiler adding padding of
seven bytes after the character variable in order to assure that the second floating-point variable is
doubleword aligned.
If minimal storage is nonetheless required, there is another technique which will assure preferred
alignment and minimal storage. This is accomplished by packaging the larger variables first as in the
following example:
struct FPAlignmentStructReordered    /* Larger variables first: preferred alignment, no padding. */
{
    double FloatingPointOp1;
    double FloatingPointOp2;    /* Aligned without padding. */
    char   ACharacter;
};
This structure packs its seventeen bytes of data without internal padding and still assures preferred alignment (a compiler may still round the total structure size up for use in arrays).
Chapter 21. High Availability Performance
The primary focus of this chapter is to present data that compares the effects of high availability scenarios using different hardware configurations. The data for the high availability tests are broken down into two categories: switchable IASPs and geographic mirroring.
High Availability Switchable Resources Considerations
Switchable IASPs are the physical resource that can be switched between systems in a cluster. A
switchable IASP contains objects, the directories and libraries that contain the objects, and other object
attributes such as authorization and ownership attributes.
Geographic Mirroring is a subfunction of cross-site mirroring (XSM) that generates a mirror image of an
IASP on a system, which can be geographically distant from the originating site.
21.1 Switchable IASPs
There are three different switchover/failover scenarios that can occur in a switchable IASP
environment.
Switchover: A cluster event where the primary database server or application server switches
over to a backup system due to a manual intervention from the cluster management interface.
Failover: A cluster event where the primary database server or application server automatically switches to a backup system due to the failure of the primary server.
Partition: A cluster event where communication is lost between one or more nodes in the cluster
and a failure of the lost nodes cannot be confirmed. When a cluster partition condition is
detected, cluster resource services limits the types of actions that you can perform on the nodes
in the cluster partition.
NOTE: Failover performance is similar to switchover performance, and therefore the workload was run only for switchover performance.
Workload Description
Switchable IASPs using hardware resources
• Active Switchover - For an active switchover, the workload consists of bringing up a database workload on the IASP until the desired number of jobs are running on the system. Once the workload is stabilized, the CHGCRGPRI (Change Cluster Resource Group Primary) command is issued from the command prompt. Switching time is measured from the time the CHGCRGPRI command is issued on the primary system until the new primary system's IASP is available. The CHGCRGPRI command ends all the jobs in the subsystems that are using the IASP, so the switching time depends heavily on how many jobs are active on the IASP at the time the command is issued.
• Inactive Switchover - The switching time is measured from the point at which the CHGCRGPRI command is issued from the primary system, which has no work, until the IASP is available on the new primary system.
• Partition - An active partition is created by starting the database workload on the IASP. Once the workload is stabilized, an option 22 (force MSD) is issued on the panel. Switching time is measured from the time the MSD is forced on the primary side until the new primary node varies on the IASP.
Workload Configuration
The wide variety of hardware configurations and software environments available makes it difficult to characterize a ‘typical’ high availability environment. The following section provides a simple description of the high availability test environment used in our lab.
System Configuration

Hardware Configuration         System A    System B
Model                          870         870
Feature code                   7421        7421
Interactive Code               7421        7421
Processor                      2486        2486
CPW                            22000       22000

System ASP                     System A    System B
# of Arms in System ASP        50          50
Speed of Dasd                  15k         15k
Type of IOA's                  2757        2757
Size of Dasd                   35 Gig      35 Gig
Memory                         64 Gig      64 Gig

Switchable iASP (shared between the systems)
# of Arms                      180
Speed of Dasd                  15k
Type of IOA's                  2757
Size of Dasd                   35 Gig

(The original figure, the 870 Cluster HSL Cabling Map, showed the two systems and the switchable IASP connected by HSL lines; it is not reproduced here.)
Switchover Measurements
NOTE: The information that follows is based on performance measurements and analysis done in the
Server Group Division laboratory. Actual performance may vary significantly from these tests.
Switchable IASPs using Hardware Resources
Time Required to Switch the IASP using Hardware Resources

                   Inactive Switchover    Active Switchover    Active Partition
Time (Minutes)     4:31                   10:19                6:55
Switchover Tips
When planning an iSeries availability solution, consider the characteristics of IASPs, as well as their
advantages and disadvantages. For example, consider these statements regarding switched disks or IASPs
when determining their value in an availability solution:
• For a faster vary on, keep the user ID (UID) and group ID (GID) of user profiles that own objects on the IASP the same between nodes of the cluster group. Having different UIDs lengthens the vary on time significantly.
• The time to vary on an IASP during the switching process depends on the number of objects on the IASP, not the size of the objects. If possible, keep the number of objects small.
• The number of devices in a tower affects switchover time. A larger number of devices in a switchable resource increases switchover time because the devices must be reset.
• Keep the number of database objects in SYSBAS low on both systems. A large number of objects in SYSBAS can slow the switchover.
21.2 Geographic Mirroring
A variety of scenarios exist in cross-site mirroring that could be tested for performance. The scenarios most representative of the majority of our customers were measured.
With Geographic Mirroring we assessed the performance of the following components:
Synchronization: The geographic mirroring processing that copies data from the production copy to the
mirror copy. During synchronization the mirror copy contains unusable data. When synchronization is
completed, the mirror copy contains usable data.
Three resume priority options exist which may affect synchronization performance and time. A resume priority of low, medium, or high for geographic mirroring affects the CPU utilization and the speed at which data is transferred. The default value is medium. A system set at high will transfer data faster and consume more CPU than a lower setting. Your choice will depend on how much time and CPU you want to allocate to this synchronization function.
Switchable IASPs using Geographic Mirroring: Refer to section 21.1, Switchable IASPs, for a description of the different switchover scenarios that can occur using switchable IASPs with geographic mirroring.
Active State: In geographic mirroring, the configuration state of a mirror copy that indicates geographic mirroring is being performed while the IASP is online.
Workload Description
Synchronization: This workload is performed by starting the synchronization process on the source side
from an unsynchronized geographic mirrored IASP. The workload time is measured from the time
geographic mirroring is activated on the source side until the target side has completed synchronization.
Synchronization time is measured using the three resume priority levels of low, medium, and high.
Switchable IASPs using Geographic Mirroring:
• Active Switchover - The workload consists of bringing up a database workload on the IASP and letting it run until the desired number of jobs are active on the system. Once the workload is stabilized and the geographic mirror copy is synchronized, the command to change the primary owner of the geographic mirrored copy is issued from the GUI or via the CHGCRGPRI command. Switchover time is measured from the time the role change is issued from the GUI or the CHGCRGPRI command until the new primary system's IASP is available.
• Inactive Switchover - Once the geographic mirror copy is synchronized, the switchover is issued from the GUI or the CHGCRGPRI command. Switchover time is measured from the time at which the switchover command is issued until the new primary system's IASP is available.
• Partition - After the geographic mirror copy is synchronized and the active workload is stabilized, an option 22 (force MSD) is issued on the panel. Switchover time is measured from the time the MSD is forced on the source side until the source node reports a failed status. After the failed status is reported, the commands are issued to perform the switchover of the mirrored copy to a production copy.
Active State: The workload used was a slightly modified CPW workload for IASP environments. Initially, a baseline without geographic mirroring is performed at 70% CPU utilization on a System/User ASP. The baseline value is then compared to various environments to assess the overhead of geographic mirroring.
Workload Configuration
The wide variety of hardware configurations and software environments available makes it difficult to characterize a ‘typical’ high availability environment and predict the results. The following section provides a simple description of the high availability test.
Large System Configuration

Hardware Configuration         System A    System B
Model                          870         870
Feature code                   7421        7421
Interactive Code               7421        7421
Processor                      2486        2486
CPW                            22000       22000

System ASP                     System A    System B
# of Arms in System ASP        50          50
Speed of Dasd                  15k         15k
Type of IOA's                  2757        2757
Size of Dasd                   35 Gig      35 Gig
Memory                         64 Gig      64 Gig

iASP (one per system)
# of Arms                      180
Speed of Dasd                  15k
Type of IOA's                  2757
Size of Dasd                   35 Gig

(The original figure, the 870 Cluster HSL Cabling Map, showed each system with its own IASP and the two systems connected through a switch by four gigabit lines; it is not reproduced here.)
Geographic Mirroring Measurements
NOTE: The information that follows is based on performance measurements and analysis done in the
IBM Server Group Division laboratory. Actual performance may vary significantly from this test.
Synchronization on an idle system:
The following data shows the time required to synchronize 1 terabyte of data. This test case could vary greatly depending on the speed and latency of communication between the two systems. In the following measurements, the same switch was used for both the source and target systems. Also, large objects were synchronized, which tend to synchronize faster than small objects.
Time Required in Hours to Synchronize 1 Terabyte of Data using High Priority in Asynchronous Mode

                 1 Gigabit Line    2 Gigabit Lines    3 Gigabit Lines    4 Gigabit Lines
Time (Hours)     2.75              1.4                1.25               1.1

*This case represents a best case scenario. An environment with objects ¼ the size of the objects used in this test produced synchronization times four times longer.
Effects of Resume Priority Settings on Synchronization of 1 Terabyte of Data using 4 Gigabit Lines in Asynchronous Mode

                              Low      Medium    High
Time (Minutes)                96:00    75:00     63:00
Source System CPU Overhead    9%       12%       16%
Target System CPU Overhead    12%      16%       18%

*This case represents a best case scenario. An environment with objects ¼ the size of the objects used in this test produced synchronization times four times longer.
Switchable Towers using Geographic Mirroring:
The following data shows the time required to switch a geographic mirrored IASP that is synchronized
from the source system to the target system.
Time Required to Switch Towers using Geographic Mirroring in Asynchronous Mode

                   Inactive Switchover    Active Switchover
Time (Minutes)     6:00                   6:00
Active State:
The following measurements show the CPU overhead relative to a System/User ASP baseline on the source system.
CPU Overhead caused by Geographic Mirroring

                              Asynchronous       Asynchronous    Synchronous        Synchronous
                              Geographic         Geographic      Geographic         Geographic
                              Mirroring          Mirroring       Mirroring          Mirroring
                              Synchronization                    Synchronization
                              Stage                              Stage
Source System CPU Overhead    19%                24%             19%                24%
Target System CPU Overhead    13%                13%             13%                13%
• Geographic Mirroring Synchronization Stage: The number reflects the amount of CPU utilized while in the synchronization mode on a 1-line system.
• Asynchronous Geographic Mirroring: The number reflects the overhead caused by mirroring in asynchronous mode using a 1-line system and running the CPW workload.
• Synchronous Geographic Mirroring: The number reflects the overhead caused by mirroring in synchronous mode using 1 line and running the CPW workload.
Geographic Mirroring Tips
• For a quicker switchover time, keep the user ID (UID) and group ID (GID) of user profiles that own objects on the IASP the same between nodes of the cluster group. Having different UIDs lengthens vary on times.
• Geographic mirroring is optimized for large files. A large number of small files will produce a slower synchronization rate.
• The priority settings available in the disk management section of iSeries Navigator can improve the speed of the synchronization process. The tradeoff for faster speed, however, is higher CPU utilization, which could degrade the applications running on the system during the synchronization process.
• Multiple TCP lines should be configured using TCP routes. Failure to use TCP routes will cause a single line on the target side to be flooded with data. For more information, see the IBM Information Center.
• If geographic mirroring is not being used, it should not be configured. Configuring geographic mirroring without actually mirroring your data consumes up to 5% extra CPU.
• Increasing the number of lines will increase performance and reliability.
• Place the journal in an IASP separate from the database to help the synchronization process.
Chapter 22. IBM Systems Workload Estimator
22.1 Overview
The IBM Systems Workload Estimator (a.k.a., the Estimator or WLE), located at:
http://www.ibm.com/systems/support/tools/estimator, is a web-based sizing tool for System i, System p,
and System x. You can use this tool to size a new system, to size an upgrade to an existing system, or to
size a consolidation of several systems. The Workload Estimator allows measurement input to best reflect
your current workload and provides a variety of built-in workloads to reflect your emerging application
requirements. Virtualization can be reflected in the sizing to yield a more robust solution, by using
various types of partitioning and virtual I/O. The Workload Estimator will provide current and growth
recommendations for processor, memory, and disk that satisfy the overall customer performance
requirements.
The Estimator supports sizings dealing with multiple systems, multiple partitions, multiple operating
systems, and multiple time intervals. The Estimator also provides the ability to easily do multiple sizings.
These features can be coordinated by using the functions on the Workload Selection screen.
The Estimator will recommend the system model including processor, memory, and DASD requirements
that are necessary to handle the overall workload with reasonable performance expectations. In the case of
System i5™, the Estimator may also recommend the 5250 OLTP feature or the Accelerator feature. To
use the Estimator, you select one or more workloads and answer a few questions about each workload.
Based on the answers, the Estimator generates a recommendation and shows the predicted CPU utilization
of the recommended system in graphical format. The results can be viewed, printed, or generated in
Portable Document Format (PDF). The visualize solution function can be used to better understand the
recommendation in terms of time intervals and virtualization. The Estimator can also be optionally linked
to the System Planning Tool so that the configuration and validation may continue.
Sizing recommendations from the Estimator are based on processing capacity, which reflect the system's
overall ability to handle the aggregate transaction rate. Again, this recommendation will yield processor,
memory, and DASD requirements. Other aspects of sizing must also be considered beyond the scope of
this tool. For example, to satisfy overnight batch windows or to deal with single-threaded applications,
there may be additional unique hardware requirements that would allow adequate completion time. Also,
you may need to increase the overall DASD recommendation to ensure that there is enough space to
satisfy the overall storage requirements.
Sizing recommendations start with benchmarks and performance measurements based on well-defined,
consistent workloads. For the built-in workloads in the Estimator, measurements have been done with
numerous systems to characterize the workloads. Most of those workloads have parameters that allow
them to be tailored to best suit the customer environment. This, again, is based on measurements and
feedback from customers and Business Partners. Keep in mind, however, that many of these technologies
are constantly evolving. IBM will continue to refine these workloads and sizing rules of thumb as IBM
and our customers gain more experience.
As with every performance estimate (whether a rule of thumb or a sophisticated model), you always need
to treat it as an estimate. This is particularly true with robust IBM systems that offer so many different
capabilities where each installation will have unique performance characteristics and demands. The
typical disclaimers that go with any performance estimate ("your experience might vary...") are especially
true. We provide these sizing estimates as general guidelines only.
22.2 Merging PM for System i data into the Estimator
The Measured Data workload of the Estimator is designed to accept data from various data sources. The
most common ones are PM for System i™ and PM for System p™, two tools available for IBM System i™ and IBM System p™ respectively. These tools assist with many of the
functions associated with capacity planning and performance analysis -- automatically. Either one will
collect various data from your system that is critical to sizing and growth estimation. This data is then
consolidated and sent to the Estimator. The Estimator will then use monthly, or weekly, statistics recorded
from your system and show the system performance over time. The Estimator can use this information to
more accurately determine the growth trends of the system workload.
The PM data is easily merged into the Estimator while viewing your PM graphs on the web. To view
your PM iSeries graphs on the web, go to http://www.ibm.com/eserver/iseries/pm. Choose the ‘click here
to view your online reports’ button.
Follow these instructions to merge the PM for System i data into the Estimator:
1. Enter your user id/password on the PM web site
2. Choose the ‘Size my next upgrade’ button.
Your PM data is then passed into the Estimator. If this is your first time using PM data with the
Estimator, it is recommended that you take a few minutes to read the Measured Workload Integration
tutorial, found on the help/tutorial tab in the Estimator.
22.3 Estimator Access
The intent is to provide a new version of the IBM Systems Workload Estimator 3 to 4 times per year. Each version displays an update message after approximately 3 months.
The IBM Systems Workload Estimator is available in two formats, on-line and as a download. Both are
preferred.
It is also highly recommended that IBM Sales or IBM Business Partners be involved before any
purchasing decisions are made based on the results obtained from the Estimator.
The approximate size requirements are about 16.5 MB of hard disk space for the Workload Estimator and
60 MB for the server setup. A rough expectation of the time required to install the entire tool (server and
Workload Estimator) is 20 minutes. A rough estimate of the time required for installing the update to
Workload Estimator only, assuming the server was set up previously, is about 5 minutes.
22.4 What the Estimator is Not
The Estimator focuses on sizing based on capacity for processor, memory, and DASD. The Estimator
does not recommend network adapters, communications media, I/O adapters, or configuration topology.
The Estimator is not a configurator nor a configuration validation tool. The Estimator does not take into
account features like detailed journaling, resource locking, single-threaded applications, time-limited
batch job windows, or poorly tuned environments.
The Estimator is a capacity sizing tool. Even though it does not represent actual transaction response
times, it does adhere to the policy of giving recommendations that abide by generally accepted utilization
thresholds. This means that the recommended solution will likely deliver acceptable response times,
since it is positioned prior to the knee of the curve on the common throughput versus response time graph.
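As a rough illustration of why such thresholds matter, consider a simple M/M/1 queue (a modeling
simplification offered here for intuition only, not what the Estimator literally computes). With average
service time S and utilization U, the average response time R is:

    R = \frac{S}{1 - U}, \qquad R\big|_{U=0.7} \approx 3.3\,S, \qquad R\big|_{U=0.9} = 10\,S

Below roughly 70% utilization the curve is still fairly flat; beyond the knee, small increases in load
inflate response time rapidly, which is why capacity recommendations stay prior to that point.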
Appendix A. CPW and CIW Descriptions
"Due to road conditions and driving habits, your results may vary." "Every workload is different." These
are two hallmark statements of measuring performance in two very different industries. They are both
absolutely correct. For iSeries and AS/400 systems, IBM has provided a measure called CPW to represent
the relative computing power of these systems in a commercial environment. Caveats of this type are
always included because no prediction can be made that a specific workload will perform in the same way
as the workload used to generate CPW information.
Over time, IBM analysts have identified two sets of characteristics that appear to represent a large number
of environments on iSeries and AS/400 systems. Many applications tend to follow the same patterns as
CPW - which stands for Commercial Processing Workload. These applications tend to have many jobs
running brief transactions in an environment that is dominated by IBM system code performing database
operations. Other applications tend to follow the same patterns as CIW - which stands for Compute
Intensive Workload. These applications tend to have somewhat fewer jobs running transactions which
spend a substantial amount of time in the application itself. The term "Compute Intensive" does not mean
that commercial processing is not done. It simply means that more CPU power is typically expended in
each transaction because more work is done at the application level instead of at the IBM licensed internal
code level.
A.1 Commercial Processing Workload - CPW
The CPW rating of a system is generated using measurements of a specific workload that is maintained
internally within the iSeries Systems Performance group. CPW is designed to evaluate a computer system
and associated software in the commercial environment. It is rigidly defined for function, performance
metrics, and price/performance metrics. It is NOT representative of any specific environment, but it is
generally applicable to the commercial computing environment.
What CPW is:
• Test of a range of data base applications, including simple and medium complexity updates, simple
  and medium complexity inquiries, realistic user interfaces, and a combination of interactive and batch
  activities.
• Test of commitment control
• Test of concurrent data access by large numbers of users running a single group of programs.
• Reasonable approximation of a steady-state, data base oriented commercial application.

What CPW is not:
• An indication of the performance capabilities of a system for any specific customer situation
• A test of "ad-hoc" (query) data base performance

When to use CPW data:
• Approximate product positioning between different AS/400 models where the primary application is
  expected to be oriented to traditional commercial business uses (order entry, payroll, billing, etc.)
  using commitment control
CPW Application Description
The CPW application simulates the database server of an online transaction processing (OLTP)
environment. Requests for transactions are received from an outside source and are processed by
application service jobs on the database server. It is based, in part, on the business model from
benchmarks owned and managed by the Transaction Processing Performance Council. However, there are
substantive differences between this workload and public benchmarks that preclude drawing any
correlation between them. For more information on public benchmarks from the Transaction Processing
Performance Council, see http://www.tpc.org.
Specific choices were made in creating CPW to try to best represent the relative positioning of iSeries and
AS/400 systems. Some of the differences between CPW and public benchmarks are:
• The code base for public benchmarks is constantly changing to try to obtain the best possible results,
  while an attempt is made to keep the base for CPW as constant as possible to better represent relative
  improvements from release to release and system to system.
• Public benchmarks typically do not require full security, but since IBM customers tend to run on
  secure systems, Security Level 50 is specified for the CPW workload.
• Public benchmarks are super-tuned to obtain the best possible results for that specific benchmark,
  whereas for CPW we tend to use more of the system defaults to better represent the way the system is
  shipped to our customers.
• Public benchmarks can use different applications for different sized systems and take advantage of all
  of the resources available on a particular system, while CPW has been designed to run as the same
  application at all levels with approximately the same disk and memory resources per simulated user
  on all systems.
• Public benchmarks tend to stress extreme levels of scaling at very high CPU utilizations for very
  limited applications. To avoid misrepresenting the capacity of larger systems, CPW is measured at
  approximately 70% CPU utilization.
• Public benchmarks require extensive, sophisticated driver and middle tier configurations. In order to
  simplify the environment and add a small computational component into the workload, CPW is
  driven by a batch driver that is included as a part of the overall workload.
The net result is an application that IBM believes provides an excellent indicator of transaction processing
performance capacity when comparing between members of the iSeries and AS/400 families. As
indicated above, CPW is not intended to be a guarantee of performance, but can be viewed as a good
indicator.
The CPW application simulates the database server of an online transaction processing (OLTP)
environment. There are five business functions of varying complexity that are simulated. These
transactions are all executed by batch server jobs, although they could easily represent the type of
transactions that might be done interactively in a customer environment. Each of the transactions interacts
with 3-8 of the 9 database files that are defined for the workload. Database functions and file sizes vary.
Functions exercised are single and multiple row retrieval, single and multiple row insert, single row
update, single row delete, journal, and commitment control. These operations are executed against files
that vary from 100's of rows to 100's of millions of rows. Some files have multiple indexes, some only
one. Some accesses are to the actual data and some take advantage of advanced functions such as
index-only access.
A.2 Compute Intensive Workload - CIW
Unlike CPW values, CIW values are not derived from specific measurements of a single workload. They
are modeled projections which are based upon the characteristics of internal workloads such as Domino
workloads and application server environments such as can be found with SAP or JDEdwards
applications. CIW is meant to depict a workload that has the following characteristics:
• The majority of the system processing time is spent in the user (or software supplier) application
  instead of system services. For example, a Domino Mail and Calendar workload might spend 80% of
  the total processing time outside of OS/400, while the CPW workload spends most of its time in
  OS/400 database code.
• Compute intensive applications tend to be considerably less I/O intensive than most commercial
  application processing. That is, more time is spent manipulating each piece of data than in a
  CPW-like environment.

What CIW is:
• Indicator of relative performance in environments where a significant amount of transaction time is
  spent computing within the processor
• Indicator of some of the differences between this type of workload and a "commercial" workload

What CIW is not:
• An indication of the performance capabilities of a system for any specific customer situation
• A measure of pure numeric-intensive computing

When to use CIW data:
• Approximate product positioning between different iSeries or AS/400 models where the primary
  application spends much of its time in the application code or middleware.
What guidelines exist to help decide whether my workload is CIW-like or CPW-like?
An absolute assignment of a workload is difficult without doing a very detailed analysis. The general
rules listed here are probable placements, but not absolute guarantees. The importance of having the two
measures is to show that different workloads react differently to changes in the configuration. IBM's
Workload Estimator tries to take some of these differences into account when projecting how a workload
will fit into a new system (see Appendix B.)
In general, if your application is online transaction processing (order entry, billing, accounts receivable,
and the like), it will be CPW-like. If there are many, many jobs that spend more time waiting for a user to
enter data than for the system to process it, it is likely to be CPW-like. If a significant part of the
transaction response time is spent in disk and communications I/O, it is likely to be CPW-like. If the
primary purpose of the application is to retrieve, process, and store database information, it is likely to be
CPW-like.
CIW-like workloads tend to process less data with more instructions than CPW-like workloads. If your
application is an "information manipulator" rather than an "information processor", it is probable that it
will be CIW-like. This includes web-servers where much time is spent in generating and sending web
frames and pages. It also includes application servers, where data is received from end-users, massaged
and formatted into transaction requests, and then sent on to another system to actually service the
database requests. If an application is both a "manipulator" and a "processor", experience has shown that
enough time is spent in the manipulation portion of the application that it tends to be the dominant factor
and the workload tends to be CIW-like. This is especially true of applications that are written using
"modern" tools like Java, WebSphere Application Server, and WebSphere Commerce Suite. Another
category that often fits into the CIW-like classification is overnight batch. Even though batch jobs often
process a great deal of database work, there are relatively few jobs, which means there is little switching
of jobs from processor to processor. As a result, overnight batch data processing jobs sometimes act more
like compute-intensive jobs.
What are the differences in how these workloads react to hardware configurations?
When you upgrade your system, the effectiveness of the upgrade may be affected by the type of workload
you are running. CPW-like workloads tend to respond well to upgrades in memory and to processor
upgrades where the increase in MHz of the processor is accompanied by improvements in the processor
cache and memory subsystem. CIW-like workloads tend to respond more to pure MHz improvements and
to increasing the number of processors. You may experience both kinds of improvements. For example,
there may be a difference between the way the daytime OLTP application reacts to an upgrade and the
way the nighttime batch application reacts.
In a CPW-type workload, a lot of data is moved around and a wide variety of instructions are executed to
manage the data. Because the transactions tend to be fairly short and because tasks are often waiting for
new data to be brought from disk, processors are switched rapidly from task to task to task. This type of
workload runs most efficiently when large amounts of the data it must process are readily available. Thus,
it reacts favorably to large memory and large processor caches. We say that this type of workload is
cache-sensitive. The bigger and faster the cache is, the more efficiently the workload runs (Note that
cache is not an orderable feature. For iSeries, we attempt to balance processor upgrades with cache and
memory subsystem upgrades whenever possible.) Increasing the MHz of the processor also helps, but
you should not expect performance to scale directly with MHz unless other aspects of the system are
equally improved. An example of this scenario can be found in V4R1, when the Model 640 systems were
introduced as an upgrade path to Model 530 systems. The Model 640 systems actually had a lower MHz
than the Model 530s, yet because they had much more cache and a much stronger memory
implementation, they delivered a significantly higher CPW rating. Another aspect of CPW-type
workloads is a dependency on a strong memory I/O subsystem. iSeries systems have always had a strong
memory subsystem, but with the model 890, that subsystem was again significantly enhanced. Thus,
CPW-like workloads see an additional benefit when moving to these systems.
In a CIW-type workload, the situation is somewhat different. Compute intensive workloads tend to
process less data with more instructions. As a consequence, the opportunity for both instruction and
data cache hits is much higher in this kind of workload. Furthermore, because the instruction path length
tends to be longer, it is likely that processors will switch from task to task much less often. Having some
cache is very important for these workloads, but having a big cache is not nearly as important as it is for
CPW-like workloads. For systems that are designed with enough cache and memory to accommodate
CPW-like work, there is usually more than enough to assist CIW-like work and so an increase in MHz
will tend to have a more dramatic effect on these workloads than it does on CPW-like work. CIW-like
workloads tend to be MHz-sensitive. Furthermore, since tasks stay resident on individual processors
longer, we tend to see better scaling on multiprocessor systems.
CPW and CIW ratings for iSeries systems can be found in Appendix D of this manual.
Appendix B. System i Sizing and Performance Data Collection Tools
The following section presents some of the alternative tools available for sizing and capacity planning.
(Note: There are products from vendors not included here that perform similar functions.) All of the
tools discussed here support the current range of System i products, and include the capability to model
logical partitions, partial processors (micropartitions) and server workload consolidation. The products
supplied by vendors other than IBM require usage licenses from the respective vendor. All of these
products depend on performance data collected through Collection Services.
• Performance Data Collection Services
  This tool, which is part of the operating system, collects the system and job performance data that is
  the input for many of the sizing tools available today. It replaced the Performance Monitor in V5R1
  and provides a more efficient and flexible way to collect performance data.
• IBM Systems Workload Estimator
  The IBM Systems Workload Estimator (a.k.a., the Estimator or WLE) is a web-based sizing tool for
  System i, System p, and System x. You can use this tool to size a new system, to size an upgrade to
  an existing system, or to size a consolidation of several systems. See Chapter 22 for a discussion and
  a link for the IBM Systems Workload Estimator.
• System i Batch Modeling tool STRBCHMDL (BATCH400)
  This is best for MES upgrade sizing where the 'Batch Window' is important. BCHMDL uses
  Collection Services data to allow the user to view the batch jobs on a timeline. The elapsed time
  components (cpu, cpu queuing, disk, disk queuing, page faulting, etc.) are also available for viewing.
  The user can change the jobs or the configuration and run an analysis to determine the effect on batch
  runtime. The user can also model the effect of changing a single job into multiple jobs running in
  parallel.
• PATROL for iSeries - Predict - BMC Software Inc
  Users of PATROL for iSeries - Predict interact through a 5250 interface with the performance
  database files to develop a capacity model for a system based on performance data collected during a
  period of peak utilization. The model is then downloaded to a PC for stand-alone predictive
  evaluation. The results show resource utilization and response times in reports and graphics. An
  enhancement supports LPAR configurations and the assignment of partial processors to partitions and
  attempts to predict the response time impact of virtual processor specifications. You will be able to
  find information about this product at http://www.bmc.com.
• Performance Navigator - Midrange Performance Group Inc
  Sizing is one of the options included in the Performance Navigator product. Performance data can be
  collected on a regular basis (with the installation of Performance Navigator on the System i) or on an
  ad hoc basis. It can also use data prepared by PM for the System i platform. The sizing option is
  usually selected after evaluation of summary performance data, and represents a day of data.
  Continuous interaction with a System i host is required. A range of graphics present resource
  utilization of the system selected. The tool supports LPAR configurations and the assignment of
  partial processors to partitions. Performance Navigator presents a consolidated view of a
  multi-partition system. You will be able to find information about this product at:
For more information on other System i Performance Tools, see the Performance Management web page
B.1 Performance Data Collection Services
Collection Services is an operating system function designed to run continuously; it collects system and
job level performance data at regular intervals, which can be set from 15 seconds to 1 hour. It runs a
number of collection routines called probes, which collect data from many system resources including
jobs, disk units, IOPs, buses, pools, and communication lines.
Collection Services is the replacement for the Performance Monitor function which you may have used in
previous releases to collect performance data by running the STRPFRMON command. Collection
Services has been available in OS/400 since V4R4. The Performance Monitor remained on the system
through V4R5 to allow time to switch over to the new Collection Services function.
How Collection Services works
Collection Services has an improved method for storing the performance data that is collected. A system
object called a management collection object (*MGTCOL) was created in V4R4 to store Collection
Services data. The management collection object takes advantage of teraspace support to make it a more
efficient way to store large quantities of performance data. Collection Services stores the data in a single
collection object and supports a release independent design which allows you to move data to a system at
a different release without requiring database file conversions.
A command called CRTPFRDTA (Create Performance Data) can be used to create the database files from
the contents of the management collection object. The CRTPFRDTA command gives you the flexibility
to generate only the database files you need to analyze a specific situation. If you decide that you always
want to generate the database, you can configure Collection Services to run CRTPFRDTA as a
low-priority batch job while data is being collected. Separating the collection of the data from the
database generation, and running the database function at a lower priority are key reasons why Collection
Services is efficient and can collect data from large quantities of jobs and threads at very frequent
intervals. With Collection Services, you can collect performance data at intervals as frequent as every 15
seconds if you need that level of granularity to diagnose a performance problem. Collection Services also
supports collection intervals of 30 seconds, and 1, 5, 15, 30, and 60 minutes.
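For example, the following is a minimal command-line sketch of generating the database files (the
collection object name Q123456789 and target library MYPERF are placeholders; substitute your own
names):

    /* Create performance database files in library MYPERF from a  */
    /* management collection object in the default collection      */
    /* library QPFRDATA                                            */
    CRTPFRDTA  FROMMGTCOL(QPFRDATA/Q123456789)  TOLIB(MYPERF)

Additional parameters (such as the categories to process and the time range) can be prompted with F4
and are left at their defaults here.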
The overhead associated with collecting performance data is minimal enough that Collection Services can
run continuously, no matter what workload is being run on your system. If Collection Services is run
continuously as designed, you will capture the data needed to analyze and solve many performance
slowdowns before they turn into a serious problem.
Starting Collection Services
You can start Collection Services by using option 2 on the Performance menu (GO PERFORM), the
Collection Services component of iSeries Navigator, the STRPFRCOL command, or the QYPSSTRC
(Start Collector) API. For more details on these options, see Performance under the Systems
Management topic in the latest version of the Information Center, which is available at
http://www.ibm.com/eserver/iseries/infocenter.
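For example, a minimal command-line sketch (both commands accept additional parameters, left at their
defaults here):

    /* Start Collection Services with its currently configured     */
    /* collection interval and categories                          */
    STRPFRCOL

    /* Later, end collection; the data gathered so far remains in  */
    /* the management collection object                            */
    ENDPFRCOL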
When using the Collection Services component of iSeries Navigator, you will find that it gives you
flexibility to collect only the performance data you are interested in. Collection Services data is
organized into over 20 categories and you have the ability to turn on and off each category or select a
predefined profile containing commonly used categories. For example, if you do not have a need to
monitor the performance of SNADS transaction data on a regular basis, you can choose to turn that
category off so that SNADS transaction data is not collected.
Since Collection Services is intended to be run continuously and trace mode is not, trace mode was not
integrated into the start options of Collection Services. To run the trace mode facility you need to use two
commands: STRPFRTRC (Start Performance Trace) and ENDPFRTRC (End Performance Trace). For
more information on these commands, see Performance under the Systems Management topic in the latest
Information Center which is available at http://www.ibm.com/eserver/iseries/infocenter.
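As a minimal sketch of a trace-mode session (parameters are omitted here and can be prompted with F4):

    /* Start the system performance trace facility                 */
    STRPFRTRC

    /* ... recreate the performance problem being diagnosed ...    */

    /* End the trace so the trace data can be analyzed with the    */
    /* Performance Tools reports                                   */
    ENDPFRTRC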
B.2 Batch Modeling Tool (BCHMDL).
BCHMDL is a tool for Batch Window Analysis available for recent systems. Instructions for requesting a
copy are at the end of this description.
BCHMDL is a tool to enable System i batch window analysis to be done using information collected by
Collection Services.
BCHMDL addresses the often-asked question: 'What can I do to my system in order to meet my
overnight batch run-time requirements (also known as the Batch Window)?'
BCHMDL creates a 'model' from Collection Services performance data. This model will reside in a set of
files named ‘QAB4*’ in the target library. The tool can then be asked to analyze the model and provide
results for various 'what-if' conditions. Individual batch job run-time, and overall batch window
run-times will be reported by this tool.
BCHMDL Output description:
1. Configuration summary shows the current and modeled hardware for DASD and CPU.
2. Job Statistics show the modeled results such as the following: elapsed time, cpu seconds, cpu queuing
seconds (how long the job waited for the processor due to it being in use by other jobs), disk seconds,
disk queuing, exceptional wait time, cpu %busy, etc.
3. Graph of Threads vs. Time of Day shows a 'horizontal' view of all threads in the model. This output
is very handy in showing the relationship of job transitions within threads. It might indicate
opportunities to break threads up to allow jobs to start earlier and run in parallel with jobs currently
running in a sequential order.
4. Total CPU utilization shows a 'horizontal' view of how busy the CPU is. This report is on the same
time-line as the previous Threads report.
After looking at the results, use the change option to make changes to the processor, disk, or to the jobs
themselves. You can increase the total workload by making copies of jobs or by increasing the amount of
work done by any given job. If you have a long-running single-threaded job, you could model how fast it
would run as 4 multithreaded jobs by making 4 copies and having each job do 1/4th the work.
This tool will be available soon at:
Unzip this file, transfer it to your System i platform as a save file, and restore library QBCHMDL. Add
this library to your library list and start the tool by using the STRBCHMDL command. Tips, disclaimers,
and general help are available in the QBCHMDL/README file. It is recommended that you work
closely with your IBM Technical Support Representative when using this tool.
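As a rough sketch of that sequence (the save file name BCHMDL and library QGPL are assumptions; use
the names you chose when transferring the file):

    /* Create a save file to receive the transferred tool          */
    CRTSAVF  FILE(QGPL/BCHMDL)

    /* ... FTP the unzipped file into QGPL/BCHMDL in binary mode ...*/

    /* Restore the tool library, add it to the library list, and   */
    /* start the tool                                              */
    RSTLIB   SAVLIB(QBCHMDL) DEV(*SAVF) SAVF(QGPL/BCHMDL)
    ADDLIBLE LIB(QBCHMDL)
    STRBCHMDL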
Appendix C. CPW and MCU Relative Performance Values for System i
This appendix details the relative system performance values:
• Commercial Processing Workload (CPW). For a detailed description, refer to Appendix A, "CPW
  Benchmark Description". CPW values are relative system performance metrics and reflect the
  relative system capacity for the CPW workload. CPW values can be used with caution in a capacity
  planning analysis (e.g., to scale CPU-constrained capacities, CPU time per transaction). However,
  these values may not appropriately reflect the performance of workloads other than CPW because of
  differing detailed characteristics (e.g., cache miss ratios, average cycles per instruction, software
  contention, I/O characteristics, memory requirements, and application performance characteristics).
  The CPW values shown in the tables are based on IBM internal tests. Actual performance in a
  customer environment may vary significantly. Use the "IBM Systems Workload Estimator" for
  assistance with sizing; please refer to Chapter 22.
• Mail and Calendar Users (MCU). For a detailed description, refer to Chapter 11, "Domino for
  System i". MCU values can be used to help size Domino environments for POWER5 and prior
  hardware. For new models, MCU values are not utilized or provided here.
• Compute Intensive Workload (CIW). For a detailed description, refer to Appendix A. CIW values
  are no longer utilized or provided here.
• User-based Licensing. Many newer models utilize user-based licensing for i5/OS. For assistance in
  determining the required number of user licenses, see the product web pages. Note that user-based
  licensing is not a performance statement or a replacement for system sizing; instead, user-based
  licensing only enables appropriate user connectivity to the system. Application environments differ
  in their requirements for system resources. Use the "IBM Systems Workload Estimator" for
  assistance with sizing based on performance.
• Relative Performance metric for System p (rPerf). System i systems that run AIX can be expected to
  produce the same performance as equivalent System p models given the same memory, disk, I/O, and
  workload configurations. The relative capacity of System p is often expressed in terms of rPerf
  values. The definition and the performance ratings for System p can be found at:
C.1 V6R1 Additions (October 2008)
C.1.1 CPW values for the IBM Power Systems - IBM i operating system
Table C.1.1. CPW values for Power System Models

Model            Processor  Chip Speed  L2/L3 cache (1)  Processor CPW
                 Feature    GHz         per chip         2 cores  4 cores  8 cores  12 cores  16 cores
570 (9117-MMA)   7387       4.4         2x4MB / 32MB     9850     19400    36200    51500     70000
570 (9117-MMA)   7388       5.0         2x4MB / 32MB     11000    21600    40300    56800     77600

*Note: 1. These models have a dedicated L2 cache per processor core, and share the L3 cache between
two processor cores.
2. Memory speed differences account for some slight variations in performance difference
between models.
3. CPW values for Power System models introduced in October 2008 were based on IBM i 6.1
plus enhancements in post-release PTFs.
C.1.2 CPW values for the IBM Power Systems - IBM i operating system
Table C.1.2. CPW values for Power System Models

Model            Processor  Chip Speed  L2/L3 cache (1)  Processor CPW
                 Feature    GHz         per chip         4 cores  8 cores  16 cores  24 cores  32 cores
570 (9117-MMA)   7540       4.2         2x4MB / 32MB     16200    31900    56400     81600     104800

*Note: 1. These models have a dedicated L2 cache per processor core, and share the L3 cache between
two processor cores.
2. Memory speed differences account for some slight variations in performance difference
between models.
3. For large partitions, some workloads may experience nonlinear scaling at high system
utilization on these new models.
4. CPW values for Power System models introduced in October 2008 were based on IBM i 6.1
plus enhancements in post-release PTFs.
C.1.3 CPW values for IBM Power Systems - IBM i operating system
Table C.1.3. CPW values for Power System Models

Model            Processor  Chip Speed  L2/L3 cache (1)  Processor CPW
                 Feature    GHz         per chip         4 cores  8 cores  16 cores
560 (8234-EMA)   7537       3.6         2x4MB / 32MB     14100    27600    48500

*Note: 1. These models have a dedicated L2 cache per processor core, and share the L3 cache between
two processor cores.
2. Memory speed differences account for some slight variations in performance difference
between models.
3. CPW values for Power System models introduced in October 2008 were based on IBM i 6.1
plus enhancements in post-release PTFs.
C.1.4 CPW values for IBM Power Systems - IBM i operating system
Table C.1.4. CPW values for Power System Models

Model            Processor  Chip Speed  L2/L3 cache (1)  CPU (2)  Processor
                 Feature    GHz         per chip         Range    CPW
520 (8203-E4A)   5633       4.2         2x4MB / 0MB      1        4300
520 (8203-E4A)   5634       4.2         2x4MB / 0MB      2        8300
520 (8203-E4A)   5635       4.2         2x4MB / 0MB      4        15600
550 (8204-E8A)   4965       3.5         2x4MB / 32MB     2 - 8    7750-27600
550 (8204-E8A)   4966       4.2         2x4MB / 32MB     2 - 8    9200-32650

*Note: 1. These models have a dedicated L2 cache per processor core, and share the L3 cache
between two processor cores.
2. The range of the number of processor cores per system.
3. Memory speed differences account for some slight variations in performance difference
between models.
4. CPW values for Power System models introduced in October 2008 were based on IBM i 6.1
plus enhancements in post-release PTFs.
C.2 V6R1 Additions (August 2008)
C.2.1 CPW values for the IBM Power 595 - IBM i operating system
Table C.2.1. CPW values for Power System Models

Model            Processor  Chip Speed  L2/L3 cache (1)  Processor CPW
                 Feature    MHz         per chip         8 cores  16 cores  24 cores  32 cores  64 cores (2) (2x32)
595 (9119-FHA)   4695       5000        2x4MB / 32MB     41000    77000     108100    147900    294700
595 (9119-FHA)   4694       4200        2x4MB / 32MB     35500    66400     93800     128000    256200

*Note: 1. These models have a dedicated L2 cache per processor core, and share the L3 cache
between two processor cores.
2. This configuration was measured with two 32-core partitions running simultaneously on a
64-core system.
C.3 V6R1 Additions (April 2008)
C.3.1 CPW values for IBM Power Systems - IBM i operating system
Table C.3.1. CPW values for Power System Models

Model            Processor  Chip Speed  L2/L3 cache (1)  CPU (2)  Processor
                 Feature    MHz         per chip         Range    CPW
520 (9407-M15)   5633       4200        2x4MB / 0MB      1        4300
520 (9408-M25)   5634       4200        2x4MB / 0MB      1 - 2    4300-8300
550 (9409-M50)   4966       4200        2x4MB / 32MB     1 - 4    4800-18000

*Note: 1. These models have a dedicated L2 cache per processor core, and share the L3 cache
between two processor cores.
2. The range of the number of processor cores per system.
C.3.2 CPW values for IBM BladeCenter JS12 - IBM i operating system
Table C.3.2. IBM BladeCenter models

Blade Model       Processor  Chip Speed  L2/L3 cache (1)  CPUs (2)   Processor
                  Feature    MHz         per chip                    CPW (3)
JS12 (7998-60X)   52BF       3800        2x4MB / 0 MB     1.8 of 2   7100

*Note: 1. These models have a dedicated L2 cache per processor core, and no L3 cache.
2. CPW value is for a 1.8-core partition with shared processors and a 0.2-core VIOS partition.
3. The value listed is unconstrained CPW (there is sufficient I/O such that the processor would be
the first constrained resource). The I/O constrained CPW value for a 12-disk configuration is
approximately 1200 CPW (100 CPW per disk).
C.3.3 CPW values for IBM Power Systems - IBM i operating system
Table C.3.3. CPW values for Power System Models

Model            Processor   Chip Speed  L2/L3 cache (1)  Processor CPW
                 Feature     MHz         per chip         2 cores  4 cores  8 cores  16 cores
570 (9117-MMA)   5620        3500        2x4MB / 32MB     8150     16100    30100    57600
570 (9117-MMA)   5621/5622   4200        2x4MB / 32MB     9650     19200    35500    68600
570 (9117-MMA)   7380        4700        2x4MB / 32MB     10800    21200    40100    76900

*Note: 1. These models have a dedicated L2 cache per processor core, and share the L3 cache
between two processor cores.
C.4 V6R1 Additions (January 2008)
C.4.1 IBM i5/OS running on IBM BladeCenter JS22 using POWER6 processor technology
Table C.4.1. IBM BladeCenter models

Blade Model       Server    Edition   Processor  Chip Speed  L2/L3 cache (1)  CPUs          Processor
                  Feature   Feature   Feature    MHz         per chip                       CPW
JS22 (7998-61X)   n/a       n/a       52BE       4000        2x4MB / 0 MB     3 of 4 (2)    11040
JS22 (7998-61X)   n/a       n/a       52BE       4000        2x4MB / 0 MB     3.7 of 4 (3)  13800

*Note: 1. These models have a dedicated L2 cache per processor core, and no L3 cache.
2. CPW value is for a 3-core dedicated partition and a 1-core VIOS.
3. CPW value is for a 3.7-core partition with shared processors and a 0.3-core VIOS partition.
C.5 V5R4 Additions (July 2007)
C.5.1 IBM System i using the POWER6 processor technology
Table C.5.1. System i models

Model             Edition      Server    Processor  Chip Speed  L2/L3 cache (1)  CPU (5)  Processor     MCU (4)
                  Feature (2)  Feature   Feature    MHz         per chip         Range    CPW
i570 (9406-MMA)   4910         5460      7380       4700        2x4MB / 32MB     1 - 4    5500-21200    12300-47500
i570 (9406-MMA)   4911         5461      7380       4700        2x4MB / 32MB     2 - 8    10800-40100   24200-89700
i570 (9406-MMA)   4912         5462      7380       4700        2x4MB / 32MB     4 - 16   20100-76900   45000-172000
i570 (9406-MMA)   4922         7053(3)   7380       4700        2x4MB / 32MB     1 - 4    5500-21200    12300-47500
i570 (9406-MMA)   4923         7058(3)   7380       4700        2x4MB / 32MB     1 - 8    5500-40100    12300-89700
i570 (9406-MMA)   4924         7063(3)   7380       4700        2x4MB / 32MB     2 - 16   10800-76900   24200-172000

*Note: 1. These models have a dedicated L2 cache per processor core, and share the L3 cache
between two processor cores.
2. This is the Edition Feature for the model. This is the feature displayed when you display
the system value QPRCFEAT.
3. Capacity Backup model.
4. Projected values. See Chapter 11 for more information.
5. The range of the number of processor cores per system.
C.6 V5R4 Additions (January/May/August 2006 and January/April 2007)
C.6.1 IBM System i using the POWER5 processor technology
Table C.6.1.1. System i models

Model      Edition        Accelerator  Chip Speed  L2/L3 cache   CPU          Processor      5250 OLTP      MCU
           Feature (2)    Feature      MHz         per CPU (1)   Range        CPW            CPW
9406-595   5892           NA           2300        1.9/36MB      32 - 64 (8)  108000-216000  Per Processor  242K(7)-460K(7)
9406-595   5872           NA           2300        1.9/36MB      32 - 64 (8)  108000-216000  0              242K(7)-460K(7)
9406-595   5891           NA           2300        1.9/36MB      16 - 32      61000-108000   Per Processor  131K(7)-242K(7)
9406-595   5871           NA           2300        1.9/36MB      16 - 32      61000-108000   0              131K(7)-242K(7)
9406-595   5896(4)        NA           2300        1.9/36MB      4 - 32       16000-108000   Per Processor  35800(7)-242K(7)
9406-595   5876(4)        NA           2300        1.9/36MB      4 - 32       16000-108000   0              35800(7)-242K(7)
9406-595   5890           NA           2300        1.9/36MB      8 - 16       31500-58800    Per Processor  68400(7)-131K(7)
9406-595   5870           NA           2300        1.9/36MB      8 - 16       31500-58800    0              68400(7)-131K(7)
9406-595   5895(4)        NA           2300        1.9/36MB      2 - 16       8200-58800     Per Processor  18300(7)-131K(7)
9406-595   5875(4)        NA           2300        1.9/36MB      2 - 16       8200-58800     0              18300(7)-131K(7)
9406-595   7583(5)        NA           1900        1.9/36MB      32 - 64 (8)  92000-184000   Per Processor  213K(7)-405K(7)
9406-595   7487           NA           1900        1.9/36MB      32 - 64 (8)  92000-184000   Per Processor  213K(7)-405K(7)
9406-595   7486           NA           1900        1.9/36MB      32 - 64 (8)  92000-184000   0              213K(7)-405K(7)
9406-595   7581(5)        NA           1900        1.9/36MB      16 - 32      51000-92000    Per Processor  115000-213K(7)
9406-595   7483           NA           1900        1.9/36MB      16 - 32      51000-92000    Per Processor  115000-213K(7)
9406-595   7482           NA           1900        1.9/36MB      16 - 32      51000-92000    0              115000-213K(7)
9406-595   7590(4)        NA           1900        1.9/36MB      4 - 32       13600-92000    Per Processor  31500-213K(7)
9406-595   7912(4)        NA           1900        1.9/36MB      4 - 32       13600-92000    Per Processor  31500-213K(7)
9406-595   7580(5)        NA           1900        1.9/36MB      8 - 16       26700-50500    Per Processor  60500-114000
9406-595   7481           NA           1900        1.9/36MB      8 - 16       26700-50500    Per Processor  60500-114000
9406-595   7480           NA           1900        1.9/36MB      8 - 16       26700-50500    0              60500-114000
9406-595   7910(4)        NA           1900        1.9/36MB      2 - 16       6675-50500     Per Processor  15125-114000
9406-595   7911(4)        NA           1900        1.9/36MB      2 - 16       6675-50500     Per Processor  15125-114000
9406-570   7760(4)        NA           2200        1.9/36MB      2 - 16       8100-58500     Per Processor  18200-130000
9406-570   7918(4)        NA           2200        1.9/36MB      2 - 16       8100-58500     Per Processor  18200-130000
9406-570   7765(5)        NA           2200        1.9/36MB      8 - 16       31100-58500    Per Processor  67500-130000
9406-570   7749           NA           2200        1.9/36MB      8 - 16       31100-58500    Per Processor  67500-130000
9406-570   7759           NA           2200        1.9/36MB      8 - 16       31100-58500    0              67500-130000
9406-570   7764(5)        NA           2200        1.9/36MB      4 - 8        16700-31100    Per Processor  35500-67500
9406-570   7748           NA           2200        1.9/36MB      4 - 8        16700-31100    Per Processor  35500-67500
9406-570   7758           NA           2200        1.9/36MB      4 - 8        16700-31100    0              35500-67500
9406-570   7916(4)        NA           2200        1.9/36MB      1 - 8        4200-31100     Per Processor  9100-67500
9406-570   7917(4)        NA           2200        1.9/36MB      1 - 8        4200-31100     Per Processor  9100-67500
9406-570   7763(5)        NA           2200        1.9/36MB      2 - 4        8400-16000     Per Processor  18200-34500
9406-570   7747           NA           2200        1.9/36MB      2 - 4        8400-16000     Per Processor  18200-34500
9406-570   7757           NA           2200        1.9/36MB      2 - 4        8400-16000     0              18200-34500
9406-570   7914(4)        NA           2200        1.9/36MB      1 - 4        4200-16000     Per Processor  9100-34500
9406-570   7915(4)        NA           2200        1.9/36MB      1 - 4        4200-16000     Per Processor  9100-34500
9406-550   7551(5)        NA           1900        1.9/36MB      1 - 4        3800-14000     Per Processor  8200-30000
9406-550   7629(6)        NA           1900        1.9/36MB      1 - 4        3800-14000     0              8200-30000
9406-550   7155           NA           1900        1.9/36MB      1 - 4        3800-14000     Per Processor  8200-30000
9406-550   7154           NA           1900        1.9/36MB      1 - 4        3800-14000     0              8200-30000
9406-550   7920(4)        NA           1900        1.9/36MB      1 - 4        3800-14000     Per Processor  8200-30000
9406-550   7921(4)        NA           1900        1.9/36MB      1 - 4        3800-14000     Per Processor  8200-30000
9406-525   7792(11)       NA           1900        1.9/36MB      1 - 2        3800-7100      3800-7100      8200-15600
9406-525   7791(11)       NA           1900        1.9/36MB      1 - 2        3800-7100      3800-7100      8200-15600
9406-525   7790(11)       NA           1900        1.9/36MB      1 - 2        3800-7100      3800-7100      8200-15600
9407-515   6028(11)       NA           1900        1.9/36MB      2            7100           7100(12)       15600(12)
9407-515   6021(11)       NA           1900        1.9/36MB      2            7100           7100(12)       15600(12)
9407-515   6018(11)       NA           1900        1.9/36MB      1            3800           3800(12)       8200(12)
9407-515   6011(11)       NA           1900        1.9/36MB      1            3800           3800(12)       8200(12)
9407-515   6010(11)       NA           1900        1.9/36MB      1            3800           3800(12)       8200(12)
9406-520   7375(5)        NA           1900        1.9/36MB      1 - 2        3800-7100      3800-7100      8200-15600
9406-520   7736           NA           1900        1.9/36MB      1 - 2        3800-7100      0              8200-15600
9406-520   7785           NA           1900        1.9/36MB      1 - 2        3800-7100      3800-7100      8200-15600
9406-520   7784           NA           1900        1.9/36MB      1            3800           0              8200
9406-520   7691(10)       NA           1900        1.9/36MB      1            3800           0              8200
9406-520   7374(5)        NA           1900        1.9/36MB      1            2800           2800           6100
9406-520   7735           NA           1900        1.9/36MB      1            2800           2800           6100
9406-520   7373(5)        NA           1900        1.9/36MB      1(3)         1200           1200           2600
9406-520   7734           NA           1900        1.9/36MB      1(3)         1200           1200           2600
9406-520   7352 (Value)   7357         1900        1.9/36MB      1(3)         1200-3800 (9)  60             2600-8200 (9)
9406-520   7350 (Express) 7355         1900        1.9MB/NA      1(3)         600-3100 (9)   30             NR-6600 (9)
9405-520   7152           NA           1900        1.9/36MB      1            3800           60             8200
9405-520   7144           NA           1900        1.9/36MB      1            3800           60             8200
9405-520   7143           7354         1900        1.9/36MB      1(3)         1200-3800 (9)  60             2600-8200 (9)
9405-520   7148           7687         1900        1.9/36MB      1(3)         1200-3800 (9)  60             2600-8200 (9)
9405-520   7156           7353         1900        1.9MB/NA      1(3)         600-3100 (9)   30             NR-6600 (9)
9405-520   7142           7682         1900        1.9MB/NA      1(3)         600-3100 (9)   30             NR-6600 (9)
9405-520   7141           7681         1900        1.9MB/NA      1(3)         600-3100 (9)   30             NR-6600 (9)
9405-520   7140           7680         1900        1.9MB/NA      1(3)         600-3100 (9)   30             NR-6600 (9)

*Note: 1. These models share L2 and L3 cache between two processor cores.
2. This is the Edition Feature for the model. This is the feature displayed when you display the
system value QPRCFEAT.
3. CPU Range - entry model is a partial processor model, offering multiple price/performance
points for the entry market.
4. Capacity Backup model.
5. High Availability model.
6. Domino edition.
7. The MCU rating is a projected value.
8. The 64-way CPW value reflects two 32-way partitions.
9. These models are accelerator models. The base CPW or MCU value is the capacity with the
default processor feature. The max CPW or MCU value is the capacity when
purchasing the accelerator processor feature.
10. Collaboration Edition. (Announced May 9, 2006)
11. User-based pricing models.
12. The values listed are unconstrained CPW or MCU values (there is sufficient I/O such that
the processor would be the first constrained resource). The I/O constrained CPW value
for an 8-disk configuration is approximately 800 CPW (100 CPW per disk).
NR - Not Recommended: the 600 CPW processor offering is not recommended for Domino.
C.7 V5R3 Additions (May, July, August, October 2004, July 2005)
New for this release are the eServer i5 servers, which provide a significant performance improvement
when compared to iSeries model 8xx servers.
C.7.1 IBM eServer® i5 Servers

Table C.7.1.1. eServer® i5 Servers

Model               Chip Speed  L2 cache     L3 cache     CPU          Processor      5250 OLTP     MCU
                    MHz         per CPU (1)  per CPU (2)  Range        CPW            CPW
595-0952 (7485)     1650        1.9 MB       36 MB        32 - 64 (8)  86000-165000   12000-165000  196000(7)-375000(7)
595-0952 (7484)     1650        1.9 MB       36 MB        32 - 64 (8)  86000-165000   0             196000(7)-375000(7)
595-0947 (7499)     1650        1.9 MB       36 MB        16 - 32      46000-85000    12000-85000   105000-194000(7)
595-0947 (7498)     1650        1.9 MB       36 MB        16 - 32      46000-85000    0             105000-194000(7)
595-0946 (7497)     1650        1.9 MB       36 MB        8 - 16       24500-45500    12000-45500   54000-104000
595-0946 (7496)     1650        1.9 MB       36 MB        8 - 16       24500-45500    0             54000-104000
570-0926 (7476)     1650        1.9 MB       36 MB        13 - 16      36300-44700    12000-44700   83600-102000
570-0926 (7475)     1650        1.9 MB       36 MB        13 - 16      36300-44700    0             83600-102000
570-0926 (7563)(5)  1650        1.9 MB       36 MB        13 - 16      36300-44700    12000-44700   83600-102000
570-0928 (7570)(4)  1650        1.9 MB       36 MB        2 - 16       6350-44700     6350-44700    14100-102000
570-0928 (7474)     1650        1.9 MB       36 MB        9 - 12       25500-33400    12000-33400   57300-77000
570-0924 (7473)     1650        1.9 MB       36 MB        9 - 12       25500-33400    0             57300-77000
570-0924 (7562)(5)  1650        1.9 MB       36 MB        9 - 12       25500-33400    12000-33400   57300-77000
570-0922 (7472)     1650        1.9 MB       36 MB        5 - 8        15200-23500    12000-23500   33600-52500
570-0922 (7471)     1650        1.9 MB       36 MB        5 - 8        15200-23500    0             33600-52500
570-0921 (7495)     1650        1.9 MB       36 MB        2 - 4        6350-12000     12000         14100-26600
570-0921 (7494)     1650        1.9 MB       36 MB        2 - 4        6350-12000     0             14100-26600
570-0921 (7560)(5)  1650        1.9 MB       36 MB        2 - 4        6350-12000     12000         14100-26600
570-0930 (7491)     1650        1.9 MB       36 MB        1 - 2        3300-6000      6000          7300-13300
570-0930 (7490)     1650        1.9 MB       36 MB        1 - 2        3300-6000      0             7300-13300
570-0930 (7559)(5)  1650        1.9 MB       36 MB        1 - 2        3300-6000      6000          7300-13300
570-0920 (7470)     1650        1.9 MB       36 MB        2 - 4        6350-12000     Max           14100-26600
570-0920 (7469)     1650        1.9 MB       36 MB        2 - 4        6350-12000     0             14100-26600
570-0919 (7489)     1650        1.9 MB       36 MB        1 - 2        3300-6000      Max           7300-13300
570-0919 (7488)     1650        1.9 MB       36 MB        1 - 2        3300-6000      0             7300-13300
550-0915 (7530)(6)  1650        1.9 MB       36 MB        2 - 4        6350-12000     0             14100-26600
550-0915 (7463)     1650        1.9 MB       36 MB        1 - 4        3300-12000     3300-12000    7300-26600
550-0915 (7462)     1650        1.9 MB       36 MB        1 - 4        3300-12000     0             7300-26600
550-0915 (7558)     1650        1.9 MB       36 MB        1 - 4        3300-12000     3300-12000    7300-26600
520-0905 (7457)     1650        1.9 MB       36 MB        2            6000           3300-6000     13300
520-0905 (7456)     1650        1.9 MB       36 MB        2            6000           0             13300
520-0905 (7555)(5)  1650        1.9 MB       36 MB        2            6000           3300-6000     13300
520-0904 (7455)     1650        1.9 MB       36 MB        1            3300           3300          7300
520-0904 (7454)     1650        1.9 MB       36 MB        1            3300           0             7300
520-0904 (7554)(5)  1650        1.9 MB       36 MB        1            3300           3300          7300
520-0903 (7453)     1500        1.9 MB       NA           1            2400           2400          5500
520-0912 (7397)     1500        1.9 MB       NA           1            2400           60            5500
520-0912 (7395)     1500        1.9 MB       NA           1            2400           60            5500
520-0903 (7452)     1500        1.9 MB       NA           1            2400           0             5500
520-0903 (7553)(5)  1500        1.9 MB       NA           1            2400           2400          5500
520-0902 (7459)     1500        1.9 MB       NA           1 (3)        1000           1000          2300
520-0902 (7458)     1500        1.9 MB       NA           1 (3)        1000           0             2300
520-0902 (7552)(5)  1500        1.9 MB       NA           1 (3)        1000           1000          2300
520-0901 (7451)     1500        1.9 MB       NA           1 (3)        1000           60            2300
520-0900 (7450)     1500        1.9 MB       NA           1 (3)        500            30            Not recommended

*Note: 1. 1.9MB - These models share L2 cache between 2 processors.
2. 36MB - These models share L3 cache between 2 processors.
3. CPU Range - Partial processor models, offering multiple price/performance points for the
entry market.
4. Capacity Backup model.
5. High Availability model.
6. Domino edition.
7. The MCU rating is a projected value.
8. The 64-way is measured as two 32-way partitions since i5/OS does not support a 64-way
partition.
9. IBM stopped publishing CIW ratings for iSeries after V5R2. It is recommended that
the IBM Systems Workload Estimator be used for sizing guidance, available at:
C.8 V5R2 Additions (February, May, July 2003)
New for this release is a product line refresh of the iSeries hardware, which simplifies the model structure
and minimizes the number of interactive choices. In most cases, the customer must choose between a
Standard edition which includes a 5250 interactive CPW value of 0, or an Enterprise edition which
supports the maximum 5250 OLTP capacity. The table in the following section lists the entire product
line for 2003.
C.8.1 iSeries Model 8xx Servers
Table C.8.1.1. iSeries Models 8xx Servers

Model             Chip Speed  L2 cache   CPU      Processor    5250 OLTP  Processor    MCU
                  MHz         per CPU    Range    CPW          CPW*       CIW*
890-2498 (7427)   1300        1.41 MB*   24 - 32  29300-37400  Max        12900-16700  84100-108900
890-2498 (7425)   1300        1.41 MB*   24 - 32  29300-37400  0          12900-16700  84100-108900
890-2497 (7424)   1300        1.41 MB*   16 - 24  20000-29300  Max        8840-12900   57600-84100
890-2497 (7422)   1300        1.41 MB*   16 - 24  20000-29300  0          8840-12900   57600-84100
870-2486 (7421)   1300        1.41 MB*   8 - 16   11500-20000  Max        5280-9100    29600-57600
870-2486 (7419)   1300        1.41 MB*   8 - 16   11500-20000  0          5280-9100    29600-57600
870-2489 (7431)   1300        1.41 MB*   5 - 8    7700-11500   0          3600-5280    20200-29600
870-2489 (7433)   1300        1.41 MB*   5 - 8    7700-11500   Max        3600-5280    20200-29600
825-2473 (7418)   1100        1.41 MB*   3 - 6    3600-6600    Max        1570-2890    8700-17400
825-2473 (7416)   1100        1.41 MB*   3 - 6    3600-6600    0          1570-2890    8700-17400
810-2469 (7430)   750         4 MB       2        2700         Max        950          7900
810-2469 (7428)   750         4 MB       2        2700         0          950          7900
810-2467 (7412)   750         4 MB       1        1470         Max        530          4200
810-2467 (7410)   750         4 MB       1        1470         0          530          4200
810-2466 (7409)   540         2 MB       1        1020         Max        380          3100
810-2466 (7407)   540         2 MB       1        1020         0          380          3100
810-2465 (7406)   540         2 MB       1        750          Max        250          1900
810-2465 (7404)   540         2 MB       1        750          0          250          1900
800-2464 (7408)   540         2 MB       1        950          50         350          2900
800-2463 (7400)   540         0 MB       1        300          25         -            -

*Note: 1. 5250 OLTP CPW - Max (maximum CPW value). There is no limit on 5250 OLTP workloads
and the full capacity of the server (Processor CPW) is available for 5250 OLTP work.
2. 1.41MB - These models share L2 cache between 2 processors.
3. IBM does not intend to publish CIW ratings for iSeries after V5R2. It is recommended that
the eServer Workload Estimator be used for sizing guidance, available at:
C.8.2 Model 810 and 825 iSeries for Domino (February 2003)
Table C.8.2.1. iSeries for Domino 8xx Servers

Model             Chip Speed  L2 cache  CPU    Processor  5250 OLTP  Processor  MCU
                  MHz         per CPU   Range  CPW        CPW*       CIW*
825-2473 (7416)   1100        1.41 MB   6      6600       0          2890       17400
825-2473 (7416)   1100        1.41 MB   4      na         0          na         11600
810-2469 (7428)   750         4 MB      2      2700       0          950        7900
810-2467 (7410)   750         4 MB      1      1470       0          530        4200
810-2466 (7407)   540         2 MB      1      1020       0          380        3100

*Note: 1. 5250 OLTP CPW - With a rating of 0, adequate interactive processing is available for a
single 5250 job to perform system administration functions.
2. IBM does not intend to publish CIW ratings for iSeries after V5R2. It is recommended that
the eServer Workload Estimator be used for sizing guidance, available at:
na - indicates the rating is not available for the 4-way processor configuration
C.9 V5R2 Additions
In V5R2 the following new iSeries models were introduced:
• 890 Base and Standard models
• 840 Base models
• 830 Base and Standard models
Base models represent server systems with “0” interactive capability. Standard Models represent systems
that have interactive features available and also may have Capacity Upgrade on Demand Capability.
See Chapter 2, iSeries RISC Server Model Performance Behavior, for a description of the performance
highlights of the new Dedicated servers for Domino models.
C.9.1 Base Models 8xx Servers
Table C.9.1.1. Base Models 8xx Servers

Model             Chip Speed  L2 cache   CPUs  Processor  Interactive  Processor  MCU
                  MHz         per CPU          CPW        CPW          CIW
890-0198 (none)   1300        1.41 MB*   32    37400      0            16700      108900
890-0197 (none)   1300        1.41 MB*   24    29300      0            12900      84100
840-0159 (none)   600         16 MB      24    20200      0            10950      77800
840-0158 (none)   600         16 MB      12    12000      0            5700       40500
830-0153 (none)   540         4 MB       8     7350       0            3220       20910

* 890 Models share L2 cache between 2 processors
C.9.2 Standard Models 8xx Servers
Standard models have an initial offering of processor and interactive capacity with featured upgrades for
activation of additional processors and increased interactive capacity. Processor features are offered
through Capacity Upgrade on Demand, described in C.10 V5R1 Additions.
Table C.9.2.1. Standard Models 8xx Servers

Model             Chip Speed  L2 cache   CPU      Processor    Interactive  Processor    MCU
                  MHz         per CPU    Range    CPW          CPW          CIW
890-2488 (1576)   1300        1.41 MB*   24 - 32  29300-37400  120          12900-16700  84100-108900
890-2488 (1577)   1300        1.41 MB*   24 - 32  29300-37400  240          12900-16700  84100-108900
890-2488 (1578)   1300        1.41 MB*   24 - 32  29300-37400  560          12900-16700  84100-108900
890-2488 (1579)   1300        1.41 MB*   24 - 32  29300-37400  1050         12900-16700  84100-108900
890-2488 (1581)   1300        1.41 MB*   24 - 32  29300-37400  2000         12900-16700  84100-108900
890-2488 (1583)   1300        1.41 MB*   24 - 32  29300-37400  4550         12900-16700  84100-108900
890-2488 (1585)   1300        1.41 MB*   24 - 32  29300-37400  10000        12900-16700  84100-108900
890-2488 (1587)   1300        1.41 MB*   24 - 32  29300-37400  16500        12900-16700  84100-108900
890-2488 (1588)   1300        1.41 MB*   24 - 32  29300-37400  20200        12900-16700  84100-108900
890-2488 (1591)   1300        1.41 MB*   24 - 32  29300-37400  37400        12900-16700  84100-108900
890-2487 (1576)   1300        1.41 MB*   16 - 24  20000-29300  120          8840-12900   57600-84100
890-2487 (1577)   1300        1.41 MB*   16 - 24  20000-29300  240          8840-12900   57600-84100
890-2487 (1578)   1300        1.41 MB*   16 - 24  20000-29300  560          8840-12900   57600-84100
890-2487 (1579)   1300        1.41 MB*   16 - 24  20000-29300  1050         8840-12900   57600-84100
890-2487 (1581)   1300        1.41 MB*   16 - 24  20000-29300  2000         8840-12900   57600-84100
890-2487 (1583)   1300        1.41 MB*   16 - 24  20000-29300  4550         8840-12900   57600-84100
890-2487 (1585)   1300        1.41 MB*   16 - 24  20000-29300  10000        8840-12900   57600-84100
890-2487 (1587)   1300        1.41 MB*   16 - 24  20000-29300  16500        8840-12900   57600-84100
890-2487 (1588)   1300        1.41 MB*   16 - 24  20000-29300  20200        8840-12900   57600-84100
830-2349 (1531)   540         4 MB       4 - 8    4200-7350    70           1630-3220    10680-20910
830-2349 (1532)   540         4 MB       4 - 8    4200-7350    120          1630-3220    10680-20910
830-2349 (1533)   540         4 MB       4 - 8    4200-7350    240          1630-3220    10680-20910
830-2349 (1534)   540         4 MB       4 - 8    4200-7350    560          1630-3220    10680-20910
830-2349 (1535)   540         4 MB       4 - 8    4200-7350    1050         1630-3220    10680-20910
830-2349 (1536)   540         4 MB       4 - 8    4200-7350    2000         1630-3220    10680-20910
830-2349 (1537)   540         4 MB       4 - 8    4200-7350    4550         1630-3220    10680-20910

* 890 Models share L2 cache between 2 processors
Other models available in V5R2 and listed in C.10 V5R1 Additions are as follows:
• All 270 Models
• All 820 Models
• Model 830-2400
• All 840 models listed in Table C.10.4.1.1 V5R1 Capacity Upgrade on-demand Models
C.10 V5R1 Additions
In V5R1 the following new iSeries models were introduced:
•  820 and 840 server models
•  270 server models
•  270 and 820 Dedicated Servers for Domino
•  840 Capacity Upgrade on-demand models (including the V4R5 models of December 2000)
See Chapter 2, iSeries RISC Server Model Performance Behavior, for a description of the performance
highlights of the new Dedicated Servers for Domino (DSD) models.
C.10.1 Model 8xx Servers
Table C.10.1.1 Model 8xx Servers
Model            Chip Speed  L2 cache  CPUs  Processor  Interactive  Processor  MCU
                 MHz         per CPU         CPW        CPW          CIW
820-0150 (none)  600         2 MB      1     1100       0            385        3110
820-0151 (none)  600         4 MB      2     2350       0            840        6660
820-0152 (none)  600         4 MB      4     3700       0            1670       11810
820-2435 (1521)  600         2 MB      1     600        35           200        1620
820-2435 (1522)  600         2 MB      1     600        70           200        1620
820-2435 (1523)  600         2 MB      1     600        120          200        1620
820-2435 (1524)  600         2 MB      1     600        240          200        1620
820-2436 (1521)  600         2 MB      1     1100       35           385        3110
820-2436 (1522)  600         2 MB      1     1100       70           385        3110
820-2436 (1523)  600         2 MB      1     1100       120          385        3110
820-2436 (1524)  600         2 MB      1     1100       240          385        3110
820-2436 (1525)  600         2 MB      1     1100       560          385        3110
820-2437 (1521)  600         4 MB      2     2350       35           840        6660
820-2437 (1522)  600         4 MB      2     2350       70           840        6660
820-2437 (1523)  600         4 MB      2     2350       120          840        6660
820-2437 (1524)  600         4 MB      2     2350       240          840        6660
820-2437 (1525)  600         4 MB      2     2350       560          840        6660
820-2437 (1526)  600         4 MB      2     2350       1050         840        6660
820-2438 (1521)  600         4 MB      4     3700       35           1670       11810
820-2438 (1522)  600         4 MB      4     3700       70           1670       11810
820-2438 (1523)  600         4 MB      4     3700       120          1670       11810
820-2438 (1524)  600         4 MB      4     3700       240          1670       11810
820-2438 (1525)  600         4 MB      4     3700       560          1670       11810
820-2438 (1526)  600         4 MB      4     3700       1050         1670       11810
820-2438 (1527)  600         4 MB      4     3700       2000         1670       11810
830-2400 (1531)  400         2 MB      2     1850       70           580        4490
830-2400 (1532)  400         2 MB      2     1850       120          580        4490
830-2400 (1533)  400         2 MB      2     1850       240          580        4490
830-2400 (1534)  400         2 MB      2     1850       560          580        4490
830-2400 (1535)  400         2 MB      2     1850       1050         580        4490
830-2402 (1531)  540         4 MB      4     4200       70           1630       10680
830-2402 (1532)  540         4 MB      4     4200       120          1630       10680
830-2402 (1533)  540         4 MB      4     4200       240          1630       10680
830-2402 (1534)  540         4 MB      4     4200       560          1630       10680
830-2402 (1535)  540         4 MB      4     4200       1050         1630       10680
830-2402 (1536)  540         4 MB      4     4200       2000         1630       10680
830-2403 (1531)  540         4 MB      8     7350       70           3220       20910
830-2403 (1532)  540         4 MB      8     7350       120          3220       20910
830-2403 (1533)  540         4 MB      8     7350       240          3220       20910
830-2403 (1534)  540         4 MB      8     7350       560          3220       20910
830-2403 (1535)  540         4 MB      8     7350       1050         3220       20910
830-2403 (1536)  540         4 MB      8     7350       2000         3220       20910
830-2403 (1537)  540         4 MB      8     7350       4550         3220       20910
840-2461 (1540)  600         16 MB     24    20200      120          10950      77800
840-2461 (1541)  600         16 MB     24    20200      240          10950      77800
840-2461 (1542)  600         16 MB     24    20200      560          10950      77800
840-2461 (1543)  600         16 MB     24    20200      1050         10950      77800
840-2461 (1544)  600         16 MB     24    20200      2000         10950      77800
840-2461 (1545)  600         16 MB     24    20200      4550         10950      77800
840-2461 (1546)  600         16 MB     24    20200      10000        10950      77800
840-2461 (1547)  600         16 MB     24    20200      16500        10950      77800
840-2461 (1548)  600         16 MB     24    20200      20200        10950      77800
Note: 830 models were first available in V4R5.
C.10.2 Model 2xx Servers
Table C.10.2.1 Model 2xx Servers
Model            Chip Speed  L2 cache  CPUs  Processor  Interactive  Processor  MCU
                 MHz         per CPU         CPW        CPW          CIW
270-2431 (1518)  540         n/a       1     465        30           185        1490
270-2432 (1516)  540         2 MB      1     1070       0            380        3070
270-2432 (1519)  540         2 MB      1     1070       50           380        3070
270-2434 (1516)  600         4 MB      2     2350       0            840        6660
270-2434 (1520)  600         4 MB      2     2350       70           840        6660
C.10.3 V5R1 Dedicated Server for Domino
Table C.10.3.1 Dedicated Servers for Domino
Model            Chip Speed  L2 cache  CPUs  NonDomino  Interactive  Processor  MCU
                 MHz         per CPU         CPW        CPW          CIW
270-2452 (none)  540         2 MB      1     100        0            380        3070
270-2454 (none)  600         4 MB      2     240        0            840        6660
820-2456 (none)  600         2 MB      1     120        0            385        3110
820-2457 (none)  600         4 MB      2     240        0            840        6660
820-2458 (none)  600         4 MB      4     380        0            1670       11810
C.10.4 Capacity Upgrade on-demand Models
New in V4R5 (December 2000), the Capacity Upgrade on Demand (CUoD) capability offered for the
iSeries Model 840 enables users to start small, then increase processing capacity without disrupting any
of their current operations. To accomplish this, six processor features are available for the Model 840.
These processor features offer a startup number of active processors (8-way, 12-way, or 18-way), with
additional On-Demand processor capacity built in (standby). The customer can add capacity in
increments of one processor (or more), up to the maximum number of On-Demand processors built into
the Model 840. CUoD has significant value for installations that want to upgrade without disruption. To
activate processors, the customer simply enters a unique activation code ("software key") at the server
console (DST/SST screen).
The table below lists the Capacity Upgrade on Demand features.

Model            Startup Processors  On-Demand Processors  TOTAL Processors
                 ("Active")          ("Stand-by")
840-2352 (2416)  8                   4                     12
840-2353 (2417)  12                  6                     18
840-2354 (2419)  18                  6                     24

Note: Features 23xx were added in V5R1. Features 24xx were available in V4R5 (December 2000).
C.10.4.1 CPW Values and Interactive Features for CUoD Models
The following tables list only the processor CPW value for the Startup number of processors as well as a
processor CPW value that represents the full capacity of the server with all processors active (Startup +
On-Demand). Interpolation between these values can give an approximate rating for incremental
processor activations, although the actual improvement will vary by workload, and earlier activations
may take advantage of caching resources that are shared among processors.
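As a rough illustration of that interpolation, the following sketch (not from the original document; the
function name and error handling are assumptions made for this example) estimates a rating for a
partially activated CUoD model:

    # Minimal sketch of the linear interpolation described above; illustrative
    # only, not IBM-published code.
    def interpolate_rating(active, startup, total, startup_rating, full_rating):
        """Approximate a CPW/CIW/MCU rating for `active` processors."""
        if not startup <= active <= total:
            raise ValueError("active processor count outside the CUoD range")
        fraction = (active - startup) / (total - startup)
        return startup_rating + fraction * (full_rating - startup_rating)

    # Example: an 840-2352 (8 startup, 12 total, 9000 - 12000 CPW) with 10 of
    # its 12 processors active interpolates to roughly 10500 CPW. As noted
    # above, the real increment varies by workload and shared-cache effects.
    print(interpolate_rating(10, 8, 12, 9000, 12000))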
Interactive Features are available for the Model 840 ordered with CUoD Processor Features. Interactive
performance is limited by the total capacity of the active processors. When ordering FC 1546, FC 1547, or
FC 1548, one should consider that the full interactive capacity is not available unless all of the
On-Demand processors have been activated. For more information on Capacity Upgrade on Demand, see:
Note: In V5R2, CUoD features come with all standard models, which are described in the V5R2
Additions section of this appendix.
Table C.10.4.1.1 V5R1 Capacity Upgrade on-demand Models
Model            Chip Speed  L2 cache  CPU Range  Processor CPW  Interactive CPW  Processor CIW  MCU
                 MHz         per CPU
840-2352 (1540)  600         16 MB     8 - 12     9000 - 12000   120              3850 - 5700    27400 - 40500
840-2352 (1541)  600         16 MB     8 - 12     9000 - 12000   240              3850 - 5700    27400 - 40500
840-2352 (1542)  600         16 MB     8 - 12     9000 - 12000   560              3850 - 5700    27400 - 40500
840-2352 (1543)  600         16 MB     8 - 12     9000 - 12000   1050             3850 - 5700    27400 - 40500
840-2352 (1544)  600         16 MB     8 - 12     9000 - 12000   2000             3850 - 5700    27400 - 40500
840-2352 (1545)  600         16 MB     8 - 12     9000 - 12000   4550             3850 - 5700    27400 - 40500
840-2352 (1546)  600         16 MB     8 - 12     9000 - 12000   10000            3850 - 5700    27400 - 40500
840-2353 (1540)  600         16 MB     12 - 18    12000 - 16500  120              5700 - 8380    40500 - 59600
840-2353 (1541)  600         16 MB     12 - 18    12000 - 16500  240              5700 - 8380    40500 - 59600
840-2353 (1542)  600         16 MB     12 - 18    12000 - 16500  560              5700 - 8380    40500 - 59600
840-2353 (1543)  600         16 MB     12 - 18    12000 - 16500  1050             5700 - 8380    40500 - 59600
840-2353 (1544)  600         16 MB     12 - 18    12000 - 16500  2000             5700 - 8380    40500 - 59600
840-2353 (1545)  600         16 MB     12 - 18    12000 - 16500  4550             5700 - 8380    40500 - 59600
840-2353 (1546)  600         16 MB     12 - 18    12000 - 16500  10000            5700 - 8380    40500 - 59600
840-2353 (1547)  600         16 MB     12 - 18    12000 - 16500  16500            5700 - 8380    40500 - 59600
840-2354 (1540)  600         16 MB     18 - 24    16500 - 20200  120              8380 - 10950   59600 - 77800
840-2354 (1541)  600         16 MB     18 - 24    16500 - 20200  240              8380 - 10950   59600 - 77800
840-2354 (1542)  600         16 MB     18 - 24    16500 - 20200  560              8380 - 10950   59600 - 77800
840-2354 (1543)  600         16 MB     18 - 24    16500 - 20200  1050             8380 - 10950   59600 - 77800
840-2354 (1544)  600         16 MB     18 - 24    16500 - 20200  2000             8380 - 10950   59600 - 77800
840-2354 (1545)  600         16 MB     18 - 24    16500 - 20200  4550             8380 - 10950   59600 - 77800
840-2354 (1546)  600         16 MB     18 - 24    16500 - 20200  10000            8380 - 10950   59600 - 77800
840-2354 (1547)  600         16 MB     18 - 24    16500 - 20200  16500            8380 - 10950   59600 - 77800
840-2354 (1548)  600         16 MB     18 - 24    16500 - 20200  20200            8380 - 10950   59600 - 77800
Table C.10.4.1.2 V4R5 Capacity Upgrade on-demand Models (12/00)
Model            Chip Speed  L2 cache  CPU Range  Processor CPW  Interactive CPW  Processor CIW  MCU
                 MHz         per CPU
840-2416 (1540)  500         8 MB      8 - 12     7800 - 10000   120              3100 - 4590    22000 - 32600
840-2416 (1541)  500         8 MB      8 - 12     7800 - 10000   240              3100 - 4590    22000 - 32600
840-2416 (1542)  500         8 MB      8 - 12     7800 - 10000   560              3100 - 4590    22000 - 32600
840-2416 (1543)  500         8 MB      8 - 12     7800 - 10000   1050             3100 - 4590    22000 - 32600
840-2416 (1544)  500         8 MB      8 - 12     7800 - 10000   2000             3100 - 4590    22000 - 32600
840-2416 (1545)  500         8 MB      8 - 12     7800 - 10000   4550             3100 - 4590    22000 - 32600
840-2416 (1546)  500         8 MB      8 - 12     7800 - 10000   10000            3100 - 4590    22000 - 32600
840-2417 (1540)  500         8 MB      12 - 18    10000 - 13200  120              4590 - 6750    32600 - 48000
840-2417 (1541)  500         8 MB      12 - 18    10000 - 13200  240              4590 - 6750    32600 - 48000
840-2417 (1542)  500         8 MB      12 - 18    10000 - 13200  560              4590 - 6750    32600 - 48000
840-2417 (1543)  500         8 MB      12 - 18    10000 - 13200  1050             4590 - 6750    32600 - 48000
840-2417 (1544)  500         8 MB      12 - 18    10000 - 13200  2000             4590 - 6750    32600 - 48000
840-2417 (1545)  500         8 MB      12 - 18    10000 - 13200  4550             4590 - 6750    32600 - 48000
840-2417 (1546)  500         8 MB      12 - 18    10000 - 13200  10000            4590 - 6750    32600 - 48000
840-2419 (1540)  500         8 MB      18 - 24    13200 - 16500  120              6750 - 8820    48000 - 62700
840-2419 (1541)  500         8 MB      18 - 24    13200 - 16500  240              6750 - 8820    48000 - 62700
840-2419 (1542)  500         8 MB      18 - 24    13200 - 16500  560              6750 - 8820    48000 - 62700
840-2419 (1543)  500         8 MB      18 - 24    13200 - 16500  1050             6750 - 8820    48000 - 62700
840-2419 (1544)  500         8 MB      18 - 24    13200 - 16500  2000             6750 - 8820    48000 - 62700
840-2419 (1545)  500         8 MB      18 - 24    13200 - 16500  4550             6750 - 8820    48000 - 62700
840-2419 (1546)  500         8 MB      18 - 24    13200 - 16500  10000            6750 - 8820    48000 - 62700
840-2419 (1547)  500         8 MB      18 - 24    13200 - 16500  16500            6750 - 8820    48000 - 62700
C.11 V4R5 Additions
For the V4R5 hardware additions, the tables show each new server model's characteristics and its
maximum interactive CPW capacity. For previously existing hardware, the tables show for each server
model the maximum interactive CPW with its corresponding CPU %, and the point (the knee of the curve)
where the interactive utilization begins to increasingly impact client/server performance. For models
that have multiple processors, where the knee of the curve is also given in CPU %, the percent value is the
percent of all the processors (not of a single one).
CPW values may be increased as enhancements are made to the operating system (e.g., each feature of the
Model 53S for V3R7 and V4R1). The server model behavior, however, remains fixed at the original CPW
values. For example, the Model 53S-2157 had V3R7 CPWs of 509.9/30.7 and V4R1 CPWs of 650.0/32.2.
When using the 53S with V4R1, the knee of the curve is still 2.6% CPU and the maximum interactive is
7.7% CPU, the same as it was in V3R7.
The 2xx, 8xx and SBx models are new in V4R5. See the chapter, AS/400 RISC Server Model
Performance Behavior, for a description of the performance highlights of these new models.
C.11.1 AS/400e Model 8xx Servers
Table C.11.1 Model 8xx Servers (all new Condor models)
Model            Chip Speed  L2 cache  CPUs  Processor CPW  Interactive CPW
                 MHz         per CPU
820-2395 (1521)  400         n/a       1     370            35
820-2395 (1522)  400         n/a       1     370            70
820-2395 (1523)  400         n/a       1     370            120
820-2395 (1524)  400         n/a       1     370            240
820-2396 (1521)  450         2 MB      1     950            35
820-2396 (1522)  450         2 MB      1     950            70
820-2396 (1523)  450         2 MB      1     950            120
820-2396 (1524)  450         2 MB      1     950            240
820-2396 (1525)  450         2 MB      1     950            560
820-2397 (1521)  500         4 MB      2     2000           35
820-2397 (1522)  500         4 MB      2     2000           70
820-2397 (1523)  500         4 MB      2     2000           120
820-2397 (1524)  500         4 MB      2     2000           240
820-2397 (1525)  500         4 MB      2     2000           560
820-2397 (1526)  500         4 MB      2     2000           1050
820-2398 (1521)  500         4 MB      4     3200           35
820-2398 (1522)  500         4 MB      4     3200           70
820-2398 (1523)  500         4 MB      4     3200           120
820-2398 (1524)  500         4 MB      4     3200           240
820-2398 (1525)  500         4 MB      4     3200           560
820-2398 (1526)  500         4 MB      4     3200           1050
820-2398 (1527)  500         4 MB      4     3200           2000
830-2400 (1531)  400         2 MB      2     1850           70
830-2400 (1532)  400         2 MB      2     1850           120
830-2400 (1533)  400         2 MB      2     1850           240
830-2400 (1534)  400         2 MB      2     1850           560
830-2400 (1535)  400         2 MB      2     1850           1050
830-2402 (1531)  540         4 MB      4     4200           70
830-2402 (1532)  540         4 MB      4     4200           120
830-2402 (1533)  540         4 MB      4     4200           240
830-2402 (1534)  540         4 MB      4     4200           560
830-2402 (1535)  540         4 MB      4     4200           1050
830-2402 (1536)  540         4 MB      4     4200           2000
830-2403 (1531)  540         4 MB      8     7350           70
830-2403 (1532)  540         4 MB      8     7350           120
830-2403 (1533)  540         4 MB      8     7350           240
830-2403 (1534)  540         4 MB      8     7350           560
830-2403 (1535)  540         4 MB      8     7350           1050
830-2403 (1536)  540         4 MB      8     7350           2000
830-2403 (1537)  540         4 MB      8     7350           4550
840-2418 (1540)  500         8 MB      12    10000          120
840-2418 (1541)  500         8 MB      12    10000          240
840-2418 (1542)  500         8 MB      12    10000          560
840-2418 (1543)  500         8 MB      12    10000          1050
840-2418 (1544)  500         8 MB      12    10000          2000
840-2418 (1545)  500         8 MB      12    10000          4550
840-2418 (1546)  500         8 MB      12    10000          10000
840-2420 (1540)  500         8 MB      24    16500          120
840-2420 (1541)  500         8 MB      24    16500          240
840-2420 (1542)  500         8 MB      24    16500          560
840-2420 (1543)  500         8 MB      24    16500          1050
840-2420 (1544)  500         8 MB      24    16500          2000
840-2420 (1545)  500         8 MB      24    16500          4550
840-2420 (1546)  500         8 MB      24    16500          10000
840-2420 (1547)  500         8 MB      24    16500          16500
C.11.2 Model 2xx Servers
Table C.11.2.1 Model 2xx Servers
Model            Chip Speed  L2 cache  CPUs  Processor CPW  Interactive CPW
                 MHz         per CPU
250-2295         200         n/a       1     50             15
250-2296         200         n/a       1     75             20
270-2248 (1517)  400         n/a       1     150            25
270-2250 (1516)  400         n/a       1     370            0
270-2250 (1518)  400         n/a       1     370            30
270-2252 (1516)  450         2 MB      1     950            0
270-2252 (1519)  450         2 MB      1     950            50
270-2253 (1516)  450         4 MB      2     2000           0
270-2253 (1520)  450         4 MB      2     2000           70
C.11.3 Dedicated Server for Domino
Table C.11.3.1 Dedicated Server for Domino
Model     Chip Speed  L2 cache  CPUs  Non Domino CPW  Interactive CPW
          MHz         per CPU
820-2425  450         2 MB      1     100             0
820-2426  500         4 MB      2     200             0
820-2427  500         4 MB      4     300             0
270-2422  400         n/a       1     50              0
270-2423  450         2 MB      1     100             0
270-2424  450         4 MB      2     200             0
C.11.4 SB Models
Table C.11.4.1 SB Models
Model     Chip Speed  L2 cache  CPUs  Processor CPW*  Interactive CPW
          MHz         per CPU
SB2-2315  540         4 MB      8     7350            70
SB3-2316  500         8 MB      12    10000           120
SB3-2318  500         8 MB      24    16500           120
* Note: The "Processor CPW" values listed for the SB models are identical to the
830-2403-1531 (8-way), the 840-2418-1540 (12-way) and the 840-2420-1540 (24-way).
However, due to the limited disk and memory of the SB models, it would not be possible
to measure these values using the CPW workload. Disk space is not a high priority for
middle-tier servers performing CPU-intensive work because they are always connected to
another computer acting as the "database" server in a multi-tier implementation.
C.12 V4R4 Additions
The Model 7xx is new in V4R4. Model 170 features 2289 and 2388 were also added in V4R4.
See the chapter, AS/400 RISC Server Model Performance Behavior, for a description of the
performance highlights of these new models.
Testing in the Rochester laboratory has shown that systems executing traditional commercial
applications, such as RPG or COBOL interactive general business applications, may experience about a
5% increase in CPU requirements. This effect was observed using the workload used to compute CPW, as
shown in the tables that follow. Except for systems that are nearing the need for an upgrade, we do not
expect this increase to significantly affect transaction response times. It is recommended that other
sections of the Performance Capabilities Reference Manual (or other sizing and positioning documents)
be used to estimate the impact of upgrading to the new release.
C.12.1 AS/400e Model 7xx Servers
MAX Interactive CPW = Interactive CPW (Knee) * 7/6
CPU % used by Interactive @ Knee = Interactive CPW (Knee) / Processor CPW * 100
CPU % used by Processor @ Knee = 100 - CPU % used by Interactive @ Knee
CPU % used by Interactive @ Max = Max Interactive CPW / Processor CPW * 100
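To make these relationships concrete, the following sketch (not part of the original document; names are
illustrative) applies them to one row of the table below:

    # Minimal sketch applying the four Model 7xx relationships listed above.
    def interactive_profile(processor_cpw, knee_cpw):
        max_cpw = knee_cpw * 7 / 6                       # MAX Interactive CPW
        pct_inter_knee = knee_cpw / processor_cpw * 100  # CPU % used by Interactive @ Knee
        pct_proc_knee = 100 - pct_inter_knee             # CPU % used by Processor @ Knee
        pct_inter_max = max_cpw / processor_cpw * 100    # CPU % used by Interactive @ Max
        return max_cpw, pct_inter_knee, pct_proc_knee, pct_inter_max

    # Example: 720-2064 (1505) below has Processor CPW 1600 and a 1050 knee,
    # which yields the 1225 interactive maximum shown in the table.
    print(interactive_profile(1600, 1050))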
Table C.12.1.1 Model 7xx Servers (all new Northstar models)
Model            Chip Speed  L2 cache  CPUs  Processor CPW  Interactive CPW  Interactive CPW
                 MHz         per CPU                        (Knee)           (Max)
720-2061 (Base)  200         n/a       1     240            35               40.8
720-2061 (1501)  200         n/a       1     240            70               81.7
720-2061 (1502)  200         n/a       1     240            120              140
720-2062 (Base)  200         4 MB      1     420            35               40.8
720-2062 (1501)  200         4 MB      1     420            70               81.7
720-2062 (1502)  200         4 MB      1     420            120              140
720-2062 (1503)  200         4 MB      1     420            240              280
720-2063 (Base)  200         4 MB      2     810            35               40.8
720-2063 (1502)  200         4 MB      2     810            120              140
720-2063 (1503)  200         4 MB      2     810            240              280
720-2063 (1504)  200         4 MB      2     810            560              653.3
720-2064 (Base)  255         4 MB      4     1600           35               40.8
720-2064 (1502)  255         4 MB      4     1600           120              140
720-2064 (1503)  255         4 MB      4     1600           240              280
720-2064 (1504)  255         4 MB      4     1600           560              653.3
720-2064 (1505)  255         4 MB      4     1600           1050             1225
730-2065 (Base)  262         4 MB      1     560            70               81.7
730-2065 (1507)  262         4 MB      1     560            120              140
730-2065 (1508)  262         4 MB      1     560            240              280
730-2065 (1509)  262         4 MB      1     560            560              653.3
730-2066 (Base)  262         4 MB      2     1050           70               81.7
730-2066 (1507)  262         4 MB      2     1050           120              140
730-2066 (1508)  262         4 MB      2     1050           240              280
730-2066 (1509)  262         4 MB      2     1050           560              653.3
730-2066 (1510)  262         4 MB      2     1050           1050             1225
730-2067 (Base)  262         4 MB      4     2000           70               81.7
730-2067 (1508)  262         4 MB      4     2000           240              280
730-2067 (1509)  262         4 MB      4     2000           560              653.3
730-2067 (1510)  262         4 MB      4     2000           1050             1225
730-2067 (1511)  262         4 MB      4     2000           2000             2333.3
730-2068 (Base)  262         4 MB      8     2890           70               81.7
730-2068 (1508)  262         4 MB      8     2890           240              280
730-2068 (1509)  262         4 MB      8     2890           560              653.3
730-2068 (1510)  262         4 MB      8     2890           1050             1225
730-2068 (1511)  262         4 MB      8     2890           2000             2333.3
740-2069 (Base)  262         8 MB      8     3660           120              140
740-2069 (1510)  262         8 MB      8     3660           1050             1225
740-2069 (1511)  262         8 MB      8     3660           2000             2333.3
740-2069 (1512)  262         8 MB      8     3660           3660             4270
740-2070 (Base)  262         8 MB      12    4550           120              140
740-2070 (1510)  262         8 MB      12    4550           1050             1225
740-2070 (1511)  262         8 MB      12    4550           2000             2333.3
740-2070 (1512)  262         8 MB      12    4550           3660             4270
740-2070 (1513)  262         8 MB      12    4550           4550             5308.3
C.12.2 Model 170 Servers
Current 170 Servers
MAX Interactive CPW = Interactive CPW (Knee) * 7/6
CPU % used by Interactive @ Knee = Interactive CPW (Knee) / Processor CPW * 100
CPU % used by Processor @ Knee = 100 - CPU % used by Interactive @ Knee
CPU % used by Interactive @ Max = Max Interactive CPW / Processor CPW * 100
Table C.12.2.1 Current Model 170 Servers
Feature #  CPUs  Chip     L2 cache  Processor  Interactive  Interactive  Interactive  Processor    Interactive
                 Speed    per CPU   CPW        CPW (Knee)   CPW (Max)    CPU % @Knee  CPU % @Knee  CPU % @Max
2289       1     200 MHz  n/a       50         15           17.5         30           70           35
2290       1     200 MHz  n/a       73         20           23.3         27.4         72.6         32
2291       1     200 MHz  n/a       115        25           29.2         21.7         78.3         25.4
2292       1     200 MHz  n/a       220        30           35           13.6         86.4         15.9
2385       1     252 MHz  4 MB      460        50           58.3         10.9         89.1         12.7
2386       1     252 MHz  4 MB      460        70           81.7         15.2         84.8         17.8
2388       2     255 MHz  4 MB      1090       70           81.7         6.4          92.3         7.5
Note: The CPU not used by the interactive workloads at their maximum CPW is used by the system CFINTnn
jobs. For example, for the 2386 model the interactive workloads use 17.8% of the CPU at their maximum,
and the CFINTnn jobs use the remaining 82.2%. The processor workloads get 0% of the CPU when the
interactive workloads are using their maximum value.
AS/400e Dedicated Server for Domino
Table C.12.2.2 Dedicated Server for Domino
Feature #  CPUs  Chip   L2 cache  Processor  Interactive  Processor    Processor   Interactive  Interactive
                 Speed  per CPU   CPW        CPW          CPU% @Knee   CPU% @Max   CPU% @Knee   CPU% @Max
2407       1     n/a    n/a       30         10           -            -           -            -
2408       1     n/a    4 MB      60         15           -            -           -            -
2409       2     n/a    4 MB      120        20           -            -           -            -
Previous Model 170 Servers
On previous Model 170s the knee of the curve is about 1/3 the maximum interactive CPW value.
Note that a constrained (c) CPW rating means the maximum memory or DASD configuration is the
constraining factor, not the processor. An unconstrained (u) CPW rating means the processor is the first
constrained resource.
Table C.12.2.3 Previous Model 170 Servers
Feature #  Constrain/  Client/Server  Interactive  Interactive   Interactive  Interactive
           Unconstr    CPW            CPW (Max)    CPU % @ Max   CPW (Knee)   CPU % @ Knee
2159       c           73             16           22.2          5.3          7.7
2159       u           73             16           22.2          5.3          7.7
2160       c           114            23           21.2          7.7          7.4
2160       u           114            23           21.2          7.7          7.4
2164       c           125            29           14            9.7          4.7
2164       u           210            29           14            9.7          4.7
2176       c           125            40           12.9          13.3         4.4
2176       u           319            40           12.9          13.3         4.4
2183       c           125            67           21.5          22.3         7.2
2183       u           319            67           21.5          22.3         7.2
C.13 AS/400e Model Sxx Servers
For AS/400e servers the knee of the curve is about 1/3 the maximum interactive CPW value.
Table C.13.1 AS/400e Servers
Model  Feature #  CPUs  Max      Max        CPU % @ Max  1/3 Max       CPU %
                        C/S CPW  Inter CPW  Interact     Interact CPW  @ the Knee
S10    2118       1     45.4     16.2       35.7         5.4           11.9
S10    2119       1     73.1     24.4       33.4         8.1           11.1
S20    2161       1     113.8    31         27.2         10.3          9.1
S20    2163       1     210      35.8       17           11.9          5.7
S20    2165       2     464.3    49.7       10.7         16.7          3.6
S20    2166       4     759      56.9       7.5          19.0          2.5
S30    2257       1     319      51.5       16.1         17.2          5.4
S30    2258       2     583.3    64         11           21.3          3.7
S30    2259       4     998.6    64         6.4          21.3          2.1
S30    2260       8     1794     64         3.6          21.3          1.2
S40    2207       8     3660     120        3.2          40            1.1
S40    2208       12    4550     120        2.6          40            0.8
S40    2256       8     1794     64         3.6          21.3          1.2
S40    2261       12    2340     64         2.7          21.3          0.9
C.14 AS/400e Custom Servers
For custom servers the knee of the curve is about 6/7 of the maximum interactive CPW value.
Table C.14.1 AS/400e Custom Servers
Model  Feature #  CPUs  Max      Max        CPU % @  6/7 Max       CPU %
                        C/S CPW  Inter CPW  Max      Inter CPW     @ the Knee
S20    2177       4     759      110.7      14.6     94.9          12.5
S20    2178       4     759      221.4      29.2     189.8         25.0
S30    2320       4     998.6    215.1      21.5     184.4         18.5
S30    2321       8     1794     386.4      21.5     331.2         18.5
S30    2322       8     1794     579.6      32.5     496.8         27.7
S40    2340       8     3660     1050.0     28.6     900.0         24.5
S40    2341       12    4550     2050.0     38.6     1757.1        33.1
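As a quick check of the 6/7 rule, the sketch below (illustrative only, not from the original document)
reproduces the knee column from the maximum interactive CPW:

    # Minimal sketch of the custom-server knee rule stated above.
    def custom_server_knee(max_interactive_cpw):
        # The knee sits at about 6/7 of the maximum interactive CPW.
        return max_interactive_cpw * 6 / 7

    # Example: S40 feature 2341 has a 2050.0 maximum, giving a knee of
    # about 1757.1.
    print(custom_server_knee(2050.0))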
C.15 AS/400 Advanced Servers
For AS/400 Advanced Servers the knee of the curve is about 1/3 the maximum interactive CPW value.
For releases prior to V4R1 the model 150 was constrained due to the memory capacity. With the larger
capacity for V4R1, memory is no longer the limiting resource. In V4R1, the limit of 4 DASD devices is
the constraining resource. For workloads that do not perform as many disk operations or don't require as
much memory, the unconstrained CPW value may be more representative of the performance capabilities.
An unconstrained CPW rating means the processor is the first constrained resource.
Table C.15.1 AS/400 Advanced Servers: V4R1 and V4R2
Model  Feature #  Constrain/  CPUs  Max      Max        CPU % @ Max  1/3 Max       CPU %
                  Unconstr          C/S CPW  Inter CPW  Interact     Interact CPW  @ the Knee
150    2269       c           1     20.2     13.8       51.1         4.6           17
150    2269       u           1     27       13.8       51.1         4.6           17
150    2270       c           1     20.2     20.2       61.9         6.7           20.6
150    2270       u           1     35       20.6       61.9         6.9           20.6
40S    2109       n/a         1     27       9.4        30.1         3.1           10
40S    2110       n/a         1     35       14.5       37.4         3.9           12.5
40S    2111       n/a         1     63.0     21.6       29.8         7.2           9.9
40S    2112       n/a         1     91.0     32.2       29.8         10.8          9.9
50S    2120       n/a         1     81.6     22.5       27.8         8.1           9.3
50S    2121       n/a         1     111.5    32.2       30           10.7          10
50S    2122       n/a         1     138.0    32.2       23.8         12.0          8.9
53S    2154       n/a         1     188.2    32.2       20.3         15.9          6.8
53S    2155       n/a         2     319.0    32.2       13.5         10.7          4.5
53S    2156       n/a         4     598.0    32.2       9            10.7          3
53S    2157       n/a         4     650.0    32.2       7.7          10.9          2.6
Table C.15.2 AS/400 Advanced Servers: V3R7
Model  Feature #  Constrain/  CPUs  Max      Max        CPU % @ Max  1/3 Max       CPU %
                  Unconstr          C/S CPW  Inter CPW  Interact     Interact CPW  @ the Knee
150    2269       c           1     10.9     10.9       100.0        3.6           33.0
150    2269       u           1     10.9     10.9       100.0        3.6           33.0
150    2270       c           1     27.0     13.8       51.1         4.6           17.0
150    2270       u           1     33.3     20.6       61.9         6.9           20.6
40S    2109       n/a         1     27.0     9.4        30.1         3.1           10
40S    2110       n/a         1     33.3     13.8       37.4         3.7           12.5
40S    2111       n/a         1     59.8     20.6       29.8         6.9           9.9
40S    2112       n/a         1     87.3     30.7       29.8         10.3          9.9
50S    2120       n/a         1     77.7     21.4       27.8         7.7           9.3
50S    2121       n/a         1     104.2    30.7       30           10.2          10
50S    2122       n/a         1     130.7    30.7       23.8         11.5          8.9
53S    2154       n/a         1     162.7    30.7       20.3         13.3          6.8
53S    2155       n/a         2     278.8    30.7       13.5         10.2          4.5
53S    2156       n/a         4     459.3    30.7       9            10.2          3
53S    2157       n/a         4     509.9    30.7       7.7          10.4          2.6
C.16 AS/400e Custom Application Server Model SB1
AS/400e application servers are particularly suited for environments with minimal database needs,
minimal disk storage needs, lots of low-cost memory, high-speed connectivity to a database server, and
minimal upgrade importance.
The throughput rates for Financial (FI) dialogsteps (ds) per hour may be used to size systems for customer
orders. Note: 1 SD ds = 2.5 FI ds (SD = Sales & Distribution). For example, the 109,770.49 SD ds/hr shown
below for feature 2312 under SAP release 3.1H corresponds to 109,770.49 x 2.5 = 274,426.23 FI ds/hr.
Table C.16.1 AS/400e Custom Application Server Model SB1
Model  CPUs  SAP      SD ds/hr               FI ds/hr
             Release  @ 65% CPU Utilization  @ 65% CPU Utilization
2312   8     3.1H     109,770.49             274,426.23
2312   8     4.0B     65,862.29              164,655.74
2313   12    3.1H     158,715.76             396,789.40
2313   12    4.0B     95,229.46              238,073.64
C.17 AS/400 Models 4xx, 5xx and 6xx Systems
Table C.17.1 AS/400 RISC Systems
Model  Feature Code  CPUs  Memory (MB)  Disk (GB)  V3R7 CPW  V4R1 CPW
                           Maximum      Maximum
400    2130          1     160          50         13.8      13.8
400    2131          1     224          50         20.6      20.6
400    2132          1     224          50         27        27
400    2133          1     224          50         33.3      35
500    2140          1     768          652        21.4      21.4
500    2141          1     768          652        30.7      30.7
500    2142          1     1024         652        43.9      43.9
510    2143          1     1024         652        77.7      81.6
510    2144          1     1024         652        104.2     111.5
530    2150          1     4096         996        131.1     148
530    2151          1     4096         996        162.7     188.2
530    2152          2     4096         996        278.8     319
530    2153          4     4096         996        459.3     598
530    2162          4     4096         996        509.9     650
Table C.17.2 AS/400e Systems
Model  Feature Code  CPUs  Memory (MB)  Disk (GB)  V4R3 CPW
                           Maximum      Maximum
600    2129          1     384          175.4      22.7
600    2134          1     384          175.4      32.5
600    2135          1     384          175.4      45.4
600    2136          1     512          175.4      73.1
620    2175          1     1856         944.8      50
620    2179          1     2048         944.8      85.6
620    2180          1     2048         944.8      113.8
620    2181          1     2048         944.8      210
620    2182          2     4096         944.8      464.3
640    2237          1     16384        1340       319
640    2238          2     8704         1340       583.3
640    2239          4     16384        1340       998.6
650    2188          8     40960        2095.9     1794
650    2189          12    40960        2095.9     2340
650    2240          8     32768        2095.9     3660
650    2243          12    32768        2095.9     4550
C.18 AS/400 CISC Model Capacities
Table C.18.1 AS/400 CISC Model: 9401
Model  Feature  CPUs  Memory (MB)  Disk (GB)  CPW
                      Maximum      Maximum
P02    n/a      1     16           2.1        7.3
P03    2114     1     24           2.99       7.3
P03    2115     1     40           3.93       9.6
P03    2117     1     56           3.93       16.8
Table C.18.2 AS/400 CISC Model: 9402 Systems
Model  CPUs  Memory (MB)  Disk (GB)  CPW
             Maximum      Maximum
C04    1     12           1.3        3.1
C06    1     16           1.3        3.6
D02    1     16           1.2        3.8
D04    1     16           1.6        4.4
E02    1     24           2.0        4.5
D06    1     20           1.6        5.5
E04    1     24           4.0        5.5
F02    1     24           2.1        5.5
F04    1     24           4.1        7.3
E06    1     40           7.9        7.3
F06    1     40           8.2        9.6
Table C.18.3 AS/400 CISC Model: 9402 Servers
Feature Code  CPUs  Memory (MB)  Disk (GB)  C/S CPW  Interactive CPW
                    Maximum      Maximum
S01           1     56           3.9        17.1     5.5
100           1     56           7.9        17.1     5.5
Table C.18.4 AS/400 CISC Model: 9404 Systems
Model  CPUs  Memory (MB)  Disk (GB)  CPW
             Maximum      Maximum
B10    1     16           1.9        2.9
C10    1     20           1.9        3.9
B20    1     28           3.8        5.1
C20    1     32           3.8        5.3
D10    1     32           4.8        5.3
C25    1     40           3.8        6.1
D20    1     40           4.8        6.8
E10    1     40           19.7       7.6
D25    1     64           6.4        9.7
F10    1     72           20.6       9.6
E20    1     72           19.7       9.7
F20    1     80           20.6       11.6
E25    1     80           19.7       11.8
F25    1     80           20.6       13.7
Table C.18.5 AS/400 CISC Model: 9404 Servers
Feature Code  CPUs  Memory (MB)  Disk (GB)  C/S CPW  Interactive CPW
                    Maximum      Maximum
135           1     384          27.5       32.3     9.6
140           2     512          47.2       65.6     11.6
Table C.18.6 AS/400 CISC Model: 9406 Systems
Model  CPUs  Memory (MB)  Disk (GB)  CPW
             Maximum      Maximum
B30    1     36           13.7       3.8
B35    1     40           13.7       4.6
B40    1     40           13.7       5.2
B45    1     40           13.7       6.5
D35    1     72           67.0       7.4
B50    1     48           27.4       9.3
E35    1     72           67.0       9.7
D45    1     80           67.0       10.8
D50    1     128          98.0       13.3
E45    1     80           67.0       13.8
F35    1     80           67.0       13.7
B60    1     96           54.8       15.1
F45    1     80           67.0       17.1
E50    1     128          98.0       18.1
B70    1     192          54.8       20.0
D60    1     192          146        23.9
F50    1     192          114        27.8
E60    1     192          146        28.1
D70    1     256          146        32.3
E70    1     256          146        39.2
F60    1     384          146        40.0
D80    2     384          256        56.6
F70    1     512          256        57.0
E80    2     512          256        69.4
E90    3     1024         256        96.7
F80    2     768          256        97.1
E95    4     1152         256        116.6
F90    3     1024         256        127.7
F95    4     1280         256        148.8
F97    4     1536         256        177.4
Table C.18.7 AS/400 Advanced Systems (CISC)
Model  Feature Code  CPUs  Memory (MB)  Disk (GB)  CPW
                           Maximum      Maximum
200    2030          1     24           23.6       7.3
200    2031          1     56           23.6       11.6
200    2032          1     128          23.6       16.8
300    2040          1     72           117.4      11.6
300    2041          1     80           117.4      16.8
300    2042          1     160          117.4      21.1
310    2043          1     832          159.3      33.8
310    2044          2     832          159.3      56.5
320    2050          1     1536         259.6      67.5
320    2051          2     1536         259.6      120.3
320    2052          4     1536         259.6      177.4
Table C.18.8 AS/400 Advanced Servers (CISC)
Model  Feature Code  CPUs  Memory (MB)  Disk (GB)  C/S CPW  Interactive CPW
                           Maximum      Maximum
20S    2010          1     128          23.6       17.1     5.5
2FS    2010          1     128          7.8        17.1     5.5
2SG    2010          1     128          7.8        17.1     5.5
2SS    2010          1     128          7.8        17.1     5.5
30S    2411          1     384          86.5       32.3     9.6
30S    2412          2     832          86.5       68.5     11.6