Implementation Supplement Fujitsu SPARC64 V User Manual

SPARC JPS1  
Implementation Supplement:  
Fujitsu SPARC64 V  
Fujitsu Limited  
Release 1.0, 1 July 2002  
Fujitsu Limited  
4-1-1 Kamikodanaka  
Nakahara-ku, Kawasaki, 211-8588  
Japan  
Part No. 806-6755-1.0  
SPARC JPS1 Implementation Supplement: Fujitsu SPARC64 V • Release 1.0, 1 July 2002  
Contents  
Component Overview 4  
Execution Unit (EU) 6  
Storage Unit (SU) 7  
4. Data Formats 15  
Tick (TICK) Register 19  
Privileged Registers 19  
Trap State (TSTATE) Register 19  
Version (VER) Register 20  
Ancillary State Registers (ASRs) 20  
Registers Referenced Through ASIs 22  
Floating-Point Deferred-Trap Queue (FQ) 24  
IU Deferred-Trap Queue 24  
Processor Pipeline 31  
Instruction Fetch Stages 31  
Issue Stages 33  
Execution Stages 33  
Completion Stages 34  
Trap-Table Entry Addresses 38  
Trap Type (TT) 38  
Details of Supported Traps 39  
Trap Processing 39  
Exception and Interrupt Descriptions 39  
SPARC V9 Implementation-Dependent, Optional Traps That Are  
Mandatory in SPARC JPS1 39  
SPARC JPS1 Implementation-Dependent Traps 39  
8. Memory Models 41  
Overview 42  
SPARC V9 Memory Model 42  
Mode Control 42  
Synchronizing Instruction and Data Memory 42  
Read State Register 58  
SHUTDOWN (VIS I) 58  
Write State Register 59  
Deprecated Instructions 59  
Store Barrier 59  
B. IEEE Std 754-1985 Requirements for SPARC V9 61  
Traps Inhibiting Results 61  
Floating-Point Nonstandard Mode 61  
fp_exception_other Exception (ftt=unfinished_FPop) 62  
Operation Under FSR.NS = 1 65  
C. Implementation Dependencies 69  
Definition of an Implementation Dependency 69  
Hardware Characteristics 70  
Implementation Dependency Categories 70  
List of Implementation Dependencies 70  
E. Opcode Maps 83  
Reset, Disable, and RED_state Behavior 91  
Internal Registers and ASI operations 92  
I/D TLB Data In, Data Access, and Tag Read Registers 93  
MMU Bypass 104  
K. Programming with the Memory Models 115  
L. Address Space Identifiers 117  
SPARC64 V ASI Assignments 117  
Special Memory Access ASIs 119  
Barrier Assist for Parallel Processing 121  
Interface Definition 121  
ASI Registers 122  
M. Cache Organization 125  
Cache Types 125  
Level-1 Instruction Cache (L1I Cache) 126  
Cache Control/Status Instructions 128  
Flush Level-1 Instruction Cache (ASI_FLUSH_L1I) 129  
Level-2 Cache Control Register (ASI_L2_CTRL) 130  
L2 Diagnostics Tag Read (ASI_L2_DIAG_TAG_READ) 130  
L2 Diagnostics Tag Read Registers (ASI_L2_DIAG_TAG_READ_REG) 131  
Interrupt Global Registers 136  
Interrupt-Related ASR Registers 136  
Interrupt Vector Dispatch Register 136  
Interrupt Vector Dispatch Status Register 136  
Interrupt Vector Receive Register 136  
CPU Fatal Error state 141  
Processor State after Reset and in RED_state 141  
Operating Status Register (OPSR) 146  
Hardware Power-On Reset Sequence 147  
Firmware Initialization Sequence 147  
P. Error Handling 149  
Error Classification 149  
Fatal Error 149  
error_state Transition Error 150  
Urgent Error 150  
Restrainable Error 152  
Action and Error Control 153  
Control of Error Action (ASI_ERROR_CONTROL) 161  
Fatal Error and error_state Transition Error 163  
Action of async_data_error (ADE) Trap 168  
Instruction End-Method at ADE Trap 170  
Expected Software Handling of ADE Trap 171  
Instruction Access Errors 173  
Data Access Errors 173  
Restrainable Errors 174  
ASI_ASYNC_FAULT_STATUS (ASI_AFSR) 174  
ASI_ASYNC_FAULT_ADDR_D1 177  
ASI_ASYNC_FAULT_ADDR_U2 178  
Expected Software Handling of Restrainable Errors 179  
Handling of Internal Register Errors 181  
Register Error Handling (Excluding ASRs and ASI Registers) 181  
ASR Error Handling 182  
ASI Register Error Handling 183  
Cache Error Handling 188  
Handling of a Cache Tag Error 188  
Handling of an I1 Cache Data Error 190  
Handling of a D1 Cache Data Error 190  
Handling of a U2 Cache Data Error 192  
Automatic Way Reduction of I1 Cache, D1 Cache, and U2 Cache 193  
Handling of TLB Entry Errors 195  
Automatic Way Reduction of sTLB 196  
Handling of Extended UPA Bus Interface Error 197  
Handling of Extended UPA Address Bus Error 197  
Handling of Extended UPA Data Bus Error 197  
Trap-Related Statistics 206  
MMU Event Counters 207  
Cache Event Counters 208  
UPA Event Counters 210  
UPA PortID Register 214  
UPA Config Register 215  
S. Summary of Differences between SPARC64 V and UltraSPARC-III 219  
Bibliography 223  
General References 223  
Index 225  
F.CHAPTER 1  
Overview  
1.1 Navigating the SPARC64 V  
We suggest that you approach this Implementation Supplement to the SPARC Joint  
Programming Specification as follows.  
1. Familiarize yourself with the SPARC64 V processor and its components by  
reading these sections:  
   • The SPARC64 V processor on page 2  
   • Processor Pipeline on page 31  
2. Study the terminology in Chapter 2, Definitions.  
3. For details of architectural changes, see the remaining chapters in this  
Implementation Supplement as your interests direct.  
For this revision, we added new appendixes: Appendix R, UPA Programmer's Model,  
and Appendix S, Summary of Differences between SPARC64 V and UltraSPARC-III.  
1.2 Fonts and Notational Conventions  
Please refer to Section 1.2 of Commonality for font and notational conventions.  
1.3 The SPARC64 V processor  
The SPARC64 V processor is a high-performance, high-reliability, and high-integrity  
processor that fully implements the instruction set architecture that conforms to  
SPARC V9, as described in JPS1 Commonality. In addition, the SPARC64 V processor  
implements the following features:  
• 64-bit virtual address space and 43-bit physical address space  
• Advanced RAS features that enable high-integrity error handling  
Microarchitecture for High Performance  
The SPARC64 V is an out-of-order execution superscalar processor that issues up to  
four instructions per cycle. Instructions in the predicted path are issued in program  
order and are stored temporarily in reservation stations until they are dispatched out  
of program order to appropriate execution units. Instructions commit in program  
order when no exceptional conditions occur during execution and all prior  
instructions commit (that is, the result of the instruction execution becomes visible).  
Out-of-order execution in SPARC64 V contributes to high performance.  
SPARC64 V implements a large branch history buffer to predict its instruction path.  
The history buffer is large enough to sustain a good prediction rate for large-scale  
programs such as DBMS and to support the advanced instruction fetch mechanism  
of SPARC64 V. This instruction fetch scheme predicts the execution path beyond the  
multiple conditional branches in accordance with the branch history. It then tries to  
prefetch instructions on the predicted path as much as possible to reduce the effect  
of the performance penalty caused by instruction cache misses.  
High Integration  
SPARC64 V integrates an on-board, associative, level-2 cache. The level-2 cache is  
unified for instruction and data. It is the lowest layer in the cache hierarchy.  
This integration contributes to both performance and reliability of SPARC64 V. It  
enables shorter access time and more associativity and thus contributes to higher  
performance. It contributes to higher reliability by eliminating the external  
connections for level-2 cache.  
High Reliability and High Integrity  
SPARC64 V implements the following advanced RAS features for reliability and  
integrity beyond that of ordinary microprocessors.  
1. Advanced RAS features for caches  
   • Strong cache error protection:  
     - ECC protection for D1 (Data level 1) cache data, U2 (unified level 2) cache  
       data, and the U2 cache tag.  
     - Parity protection for I1 (Instruction level 1) cache data.  
     - Parity protection and duplication for the I1 cache tag and the D1 cache tag.  
   • Automatic correction of all types of single-bit error:  
     - Automatic single-bit error correction for the ECC protected data.  
     - Invalidation and refilling of I1 cache data for the I1 cache data parity error.  
     - Copying from duplicated tag for I1 cache tag and D1 cache tag parity errors.  
   • Dynamic way reduction while cache consistency is maintained.  
   • Error marking for cacheable data uncorrectable errors:  
     - Special error-marking pattern for cacheable data with uncorrectable errors.  
       The identification of the module that first detects the error is embedded in  
       the special pattern.  
     - Error-source isolation with faulty module identification in the special  
       error-marking. The identification information enables the processor to avoid  
       repetitive error logging for the same error cause.  
2. Advanced RAS features for the core  
   • Strong error protection:  
     - Parity protection for all data paths.  
     - Parity protection for most software-visible registers and internal  
       temporary registers.  
     - Parity prediction or residue checking for the accumulator output.  
   • Hardware instruction retry  
   • Support for software instruction retry (after failure of hardware instruction retry)  
   • Error isolation for software recovery:  
     - Error indication for each programmable register group.  
     - Indication of retryability of the trapped instruction.  
     - Use of different error traps to differentiate degrees of adverse effects on  
       the CPU and the system.  
3. Extended RAS interface to software  
   • Error classification according to the severity of the effect on program execution:  
     - Urgent error (nonmaskable): Unable to continue execution without OS  
       intervention; reported through a trap.  
     - Restrainable error (maskable): OS controls whether the error is reported  
       through a trap, so the error does not directly affect program execution.  
   • Isolated error indication to determine the effect on software  
   • Asynchronous data error (ADE) trap for additional errors:  
     - Relaxed instruction end method (precise, retryable, not retryable) for the  
       async_data_error exception to indicate how the instruction should end; depends  
       on the executing instruction and the detected error.  
     - Some ADE traps that are deferred but retryable.  
     - Simultaneous reporting of all detected ADE errors at the error barrier for  
       correct handling of retryability.  
1.3.1 Component Overview  
The SPARC64 V processor contains these components.  
• Instruction control Unit (IU)  
• Execution Unit (EU)  
• Storage Unit (SU)  
• Secondary cache and eXternal access Unit (SXU)  
FIGURE 1-1 illustrates the major units; the following subsections describe them.  
[Block diagram. Units and labeled blocks:  
 I-Unit: instruction fetch pipeline, branch history, instruction buffer,  
   reservation stations, commit stack entry, PC, nPC, CCR, FSR, control logic.  
 E-Unit: GUB, GPR, FUB, FPR, ALU input registers and output registers,  
   ALUs (EXA, EXB, FLA, FLB, EAGA, EAGB), S-Unit interface.  
 S-Unit: Level-1 I cache (128 KB, 2-way), Level-1 D cache (128 KB, 2-way),  
   I-TLB and D-TLB (tag and data, 2048 + 32 entry each), store queue,  
   SX order queue, SX interface.  
 SX-Unit: U2$ tag, U2$ data (2M, 4-way), MoveIn buffer, MoveOut buffer,  
   UPA interface logic, attached to the Extended UPA Bus.]  
FIGURE 1-1 SPARC64 V Major Units  
1.3.2 Instruction Control Unit (IU)  
The IU predicts the instruction execution path, fetches instructions on the predicted  
path, distributes the fetched instructions to appropriate reservation stations, and  
dispatches the instructions to the execution pipeline. The instructions are executed  
out of order, and the IU commits the instructions in order. Major blocks are defined  
in TABLE 1-1.  
TABLE 1-1 Instruction Control Unit Major Blocks  
Name                         Description  
Instruction fetch pipeline   Five stages: fetch address generation, iTLB access, iTLB match,  
                             I-Cache fetch, and a write to I-buffer.  
Branch history               16K entries, 4-way set associative.  
Instruction buffer           Six entries, 32 bytes/entry.  
Reservation station          Six reservation stations to hold instructions until they can  
                             execute: RSBR for branch and the other control-transfer  
                             instructions; RSA for load/store instructions; RSEA and RSEB for  
                             integer arithmetic instructions; RSFA and RSFB for floating-point  
                             arithmetic and VIS instructions.  
Commit stack entries         Sixty-four entries; basically one instruction/entry, to hold  
                             information about instructions issued but not yet committed.  
PC, nPC, CCR, FSR            Program-visible registers for instruction execution control.  
1.3.3 Execution Unit (EU)  
The EU carries out execution of all integer arithmetic, logical, and shift  
instructions, all floating-point instructions, and all VIS graphic instructions.  
TABLE 1-2 describes the EU major blocks.  
TABLE 1-2 Execution Unit Major Blocks  
Name                                   Description  
General register (gr) renaming         Thirty-two entries, 8 read ports, 2 write ports  
register file (GUB: gr update  
buffer)  
Gr architecture register file (GPR)    160 entries, 1 read port, 2 write ports  
Floating-point (fr) renaming           Thirty-two entries, 8 read ports, 2 write ports  
register file (FUB: fr update  
buffer)  
Fr architecture register file (FPR)    Thirty-two entries, 6 read ports, 2 write ports  
EU control logic                       Controls the instruction execution stages: instruction  
                                       selection, register read, and execution.  
Interface registers                    Input/output registers to other units.  
Two integer execution pipelines        64-bit ALU and shifters.  
(EXA, EXB)  
Two floating-point and graphics        Each floating-point execution pipeline can execute  
execution pipelines (FLA, FLB)         floating-point multiply, floating-point add/sub,  
                                       floating-point multiply and add, floating-point  
                                       div/sqrt, and floating-point graphics instructions.  
Two virtual address adders for         Two 64-bit virtual addresses for load/store.  
memory access pipeline (EAGA,  
EAGB)  
1.3.4 Storage Unit (SU)  
The SU handles all sourcing and sinking of data for load and store instructions.  
TABLE 1-3 describes the SU major blocks.  
TABLE 1-3 Storage Unit Major Blocks  
Name                             Description  
Instruction level-1 cache        128-Kbyte, 2-way associative, 64-byte line; provides low  
                                 latency instruction source.  
Data level-1 cache               128-Kbyte, 2-way associative, 64-byte line, writeback;  
                                 provides the low latency data source for loads and stores.  
Instruction Translation Buffer   1024 entries, 2-way associative TLB for 8-Kbyte pages;  
                                 1024 entries, 2-way associative TLB for 4-Mbyte pages (note 1);  
                                 32 entries, fully associative TLB for unlocked 64-Kbyte,  
                                 512-Kbyte, 4-Mbyte (note 1) pages and locked pages in all sizes.  
Data Translation Buffer          1024 entries, 2-way associative TLB for 8-Kbyte pages;  
                                 1024 entries, 2-way associative TLB for 4-Mbyte pages (note 1);  
                                 32 entries, fully associative TLB for unlocked 64-Kbyte,  
                                 512-Kbyte, 4-Mbyte (note 1) pages and locked pages in all sizes.  
Store queue                      Decouples the pipeline from the latency of store operations.  
                                 Allows the pipeline to continue flowing while the store waits  
                                 for data, and eventually writes into the data level 1 cache.  
1. An unlocked 4-Mbyte page entry is stored exclusively in either the 2-way  
associative TLB or the fully associative TLB, depending on the setting.  
1.3.5 Secondary Cache and External Access Unit (SXU)  
The SXU controls the operation of unified level-2 caches and the external data access  
interface (extended UPA interface). TABLE 1-4 describes the major blocks of the SXU.  
TABLE 1-4 Secondary Cache and External Access Unit Major Blocks  
Name                     Description  
Unified level-2 cache    2-Mbyte, 4-way associative, 64-byte line, writeback; provides low  
                         latency data source for both instruction level-1 cache and data  
                         level-1 cache.  
Movein buffer            Sixteen entries, 64 bytes/entry; catches returning data from  
                         memory system in response to the cache line read request. A  
                         maximum of 16 outstanding cache read operations can be issued.  
Moveout buffer           Eight entries, 64 bytes/entry; holds writeback data. A maximum  
                         of 8 outstanding writeback requests can be issued.  
Extended UPA interface   Send/receive transaction packets to/from Extended UPA  
control logic            interface connected to the system.  
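The cache figures quoted in TABLE 1-3 and TABLE 1-4 imply a set count for each cache and a byte capacity for each buffer. The following sketch works those numbers out. It assumes that 1 Kbyte is 1024 bytes and that sets are derived as size / (ways x line size); the manual does not state the indexing scheme, and the helper names are illustrative only.

```python
# Illustrative arithmetic only; names are not taken from the manual.
KB = 1024
MB = 1024 * KB


def cache_sets(size_bytes, ways, line_bytes):
    """Number of sets in a set-associative cache of the given geometry."""
    return size_bytes // (ways * line_bytes)


# Level-1 instruction and data caches: 128 Kbyte, 2-way, 64-byte line.
l1_sets = cache_sets(128 * KB, ways=2, line_bytes=64)

# Unified level-2 cache: 2 Mbyte, 4-way, 64-byte line.
l2_sets = cache_sets(2 * MB, ways=4, line_bytes=64)

# Movein buffer: 16 entries of 64 bytes; moveout buffer: 8 entries of 64 bytes.
movein_bytes = 16 * 64
moveout_bytes = 8 * 64

print(l1_sets, l2_sets, movein_bytes, moveout_bytes)
```

Under these assumptions each level-1 cache has 1024 sets and the level-2 cache has 8192 sets.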
F.CHAPTER 2  
Definitions  
This chapter defines concepts unique to the SPARC64 V, the Fujitsu implementation  
of SPARC JPS1. For definition of terms that are common to all implementations,  
please refer to Chapter 2 of Commonality.  
committed Term applied to an instruction when it has completed without error and all  
prior instructions have completed without error and have been committed. When  
an instruction is committed, the state of the machine is permanently changed  
to reflect the result of the instruction; the previously existing state is no longer  
needed and can be discarded.  
completed Term applied to an instruction after it has finished, has sent a nonerror status to  
the issue unit, and all of its source operands are nonspeculative. Note:  
Although the state of the machine has been temporarily altered by completion  
of an instruction, the state has not yet been permanently changed and the old  
state can be recovered until the instruction has been committed.  
executed Term applied to an instruction that has been processed by an execution unit  
such as a load unit. An instruction is in execution as long as it is still being  
processed by an execution unit.  
fetched Term applied to an instruction that is obtained from the I2 instruction cache or  
from the on-chip internal cache and sent to the issue unit.  
finished Term applied to an instruction when it has completed execution in a functional  
unit and has forwarded its result onto a result bus. Results on the result bus are  
transferred to the register file, as are the waiting instructions in the instruction  
queues.  
initiated Term applied to an instruction when it has all of the resources that it needs (for  
example, source operands) and has been selected for execution.  
instruction dispatch Synonym: instruction initiation.  
instruction issued Term applied to an instruction when it has been dispatched to a reservation  
station.  
instruction retired Term applied to an instruction when all machine resources (serial numbers,  
renamed registers) have been reclaimed and are available for use by other  
instructions. An instruction can only be retired after it has been committed.  
instruction stall Term applied to an instruction that is not allowed to be issued. Not every  
instruction can be issued in a given cycle. The SPARC64 V implementation  
imposes certain issue constraints based on resource availability and program  
requirements.  
issue-stalling instruction An instruction that prevents new instructions from  
being issued until it has committed.  
machine sync The state of a machine when all previously executing instructions have  
committed; that is, when no issued but uncommitted instructions are in the  
machine.  
Memory Management Unit (MMU) Refers to the address translation hardware in  
SPARC64 V that translates a 64-bit virtual address into a physical address. The  
MMU is composed of the mITLB, mDTLB, uITLB, uDTLB, and the ASI registers used to  
manage address translation.  
mTLB Main TLB. Split into I and D, called mITLB and mDTLB, respectively. Contains  
address translations for the uITLB and uDTLB. When the uITLB or uDTLB does  
not contain a translation, it asks the mTLB for the translation. If the mTLB  
contains the translation, it sends the translation to the respective uTLB. If the  
mTLB does not contain the translation, it generates a fast access exception to a  
software translation trap handler, which will load the translation information  
(TTE) into the mTLB and retry the access. See also TLB.  
uDTLB Micro Data TLB. A small, fully associative buffer that contains address  
translations for data accesses. Misses in the uDTLB are handled by the mTLB.  
uITLB Micro Instruction TLB. A small, fully associative buffer that contains address  
translations for instruction accesses. Misses in the uITLB are handled by the  
mTLB.  
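The lookup order described in the mTLB, uDTLB, and uITLB entries can be sketched as follows. This is a minimal model, not the hardware algorithm: the TLBs are plain dictionaries keyed by virtual page number, capacity and replacement are ignored, and the names `TlbMiss`, `translate`, `access`, and `page_table` are hypothetical stand-ins (the last one for whatever the software trap handler consults to find the TTE).

```python
class TlbMiss(Exception):
    """Stands in for the trap taken to the software translation handler."""


def translate(vpn, utlb, mtlb):
    """Look up vpn in the micro TLB first, then in the main TLB."""
    if vpn in utlb:
        return utlb[vpn]
    if vpn in mtlb:
        # The mTLB sends the translation to the requesting uTLB.
        utlb[vpn] = mtlb[vpn]
        return utlb[vpn]
    raise TlbMiss(vpn)


def access(vpn, utlb, mtlb, page_table):
    """Translate vpn; on an mTLB miss, emulate the handler and retry."""
    try:
        return translate(vpn, utlb, mtlb)
    except TlbMiss:
        # Hypothetical handler: load the translation into the mTLB ...
        mtlb[vpn] = page_table[vpn]
        # ... then retry the access, which now hits in the mTLB.
        return translate(vpn, utlb, mtlb)


utlb, mtlb = {}, {}
page_table = {0x5: 0x2A}
ppn = access(0x5, utlb, mtlb, page_table)
```

After the call, both `mtlb` and `utlb` hold the translation, so a second access to the same page hits in the micro TLB.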
nonspeculative A distribution system whereby a result is guaranteed known correct or an  
operand state is known to be valid. SPARC64 V employs speculative  
distribution, meaning that results can be distributed from functional units  
before the point at which guaranteed validity of the result is known.  
reclaimed The status when all instruction-related resources that were held until commit  
have been released and are available for subsequent instructions. Instruction  
resources are usually reclaimed a few cycles after they are committed.  
rename registers A large set of hardware registers implemented by SPARC64 V that are invisible  
to the programmer. Before instructions are issued, source and destination  
registers are mapped onto this set of rename registers. This allows instructions  
that normally would be blocked, waiting for an architected register, to proceed  
in parallel. When instructions are committed, results in renamed registers are  
posted to the architected registers in the proper sequence to produce the correct  
program results.  
scan A method used to initialize all of the machine state within a chip. In a chip that  
has been designed to be scannable, all of the machine state is connected in one  
or several loops called scan rings. Initialization data can be scanned into the  
chip through the scan rings. The state of the machine also can be scanned out  
through the scan rings.  
reservation station A holding location that buffers dispatched instructions until all input operands  
are available. SPARC64 V implements dataflow execution based on operand  
availability. When operands are available, the instructions in the reservation  
station are scheduled for execution. Reservation stations also contain special  
tag-matching logic that captures the appropriate operand data. Reservation  
stations are sometimes referred to as queues (for example, the integer queue).  
speculative A distribution system whereby a result is not guaranteed as known to be  
correct or an operand state is not known to be valid. SPARC64 V employs  
speculative distribution, meaning results can be distributed from functional  
units before the point at which guaranteed validity of the result is known.  
superscalar An implementation that allows several instructions to be issued, executed, and  
committed in one clock cycle. SPARC64 V issues up to 4 instructions per clock  
cycle.  
sync Synonym: machine sync.  
syncing instruction An instruction that causes a machine sync. Thus, before a syncing instruction is  
issued, all previous instructions (in program order) must have been committed.  
At that point, the syncing instruction is issued, executed, completed, and  
committed by itself.  
TLB Translation lookaside buffer.  
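The instruction-state terms defined in this chapter (fetched, issued, initiated, executed, finished, completed, committed, retired) can be read as an ordered progression. The sketch below encodes that reading and the two rules stated in the definitions of committed and machine sync. The strictly linear ordering and the names `Stage`, `can_commit`, and `machine_sync` are simplifying assumptions made for illustration; they are not part of the specification.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Instruction states, ordered as the definitions suggest (assumed linear)."""
    FETCHED = 0
    ISSUED = 1      # dispatched to a reservation station
    INITIATED = 2   # selected for execution ("instruction dispatch")
    EXECUTED = 3
    FINISHED = 4
    COMPLETED = 5
    COMMITTED = 6
    RETIRED = 7


def can_commit(index, stages):
    """An instruction commits only when it has completed and all prior
    instructions (earlier in program order) have been committed."""
    prior_done = all(s >= Stage.COMMITTED for s in stages[:index])
    return stages[index] == Stage.COMPLETED and prior_done


def machine_sync(stages):
    """True when no issued but uncommitted instruction is in the machine."""
    return not any(Stage.ISSUED <= s < Stage.COMMITTED for s in stages)


# Program order: oldest instruction first.
window = [Stage.COMMITTED, Stage.COMPLETED, Stage.COMPLETED]
print(can_commit(1, window), can_commit(2, window), machine_sync(window))
```

In this example the second instruction may commit, the third must wait for it, and the machine is not yet in sync.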
F.CHAPTER 3  
Architectural Overview  
Please refer to Chapter 3 in the Commonality section of the SPARC Joint  
Programming Specification.  
F.CHAPTER 4  
Data Formats  
Please refer to Chapter 4, Data Formats in Commonality.  
F.CHAPTER 5  
Registers  
The SPARC64 V processor includes two types of registers: general-purpose (that  
is, working, data, control/status) and ASI registers.  
The SPARC V9 architecture also defines two implementation-dependent registers:  
the IU Deferred-Trap Queue and the Floating-Point Deferred-Trap Queue (FQ);  
SPARC64 V does not need or contain either queue. All processor traps caused by  
instruction execution are precise, and there are several disrupting traps caused by  
asynchronous events, such as interrupts, asynchronous error conditions, and  
RED_state entry traps.  
For general information, please see parallel subsections of Chapter 5 in  
Commonality. For easier referencing, this chapter follows the organization of  
Chapter 5 in Commonality.  
For information on MMU registers, please refer to Section F.10, Internal Registers and  
ASI operations, on page 92.  
The chapter contains these sections:  
• Nonprivileged Registers on page 17  
• Privileged Registers on page 19  
5.1 Nonprivileged Registers  
Most of the definitions for the registers are as described in the corresponding  
sections of Commonality. Only SPARC64 V-specific features are described in this  
section.  
5.1.7 Floating-Point State Register (FSR)  
Please refer to Section 5.1.7 of Commonality for the description of FSR.  
The sections below describe SPARC64 V-specific features of the FSR register.  
SPARC V9 defines the FSR.NS bit which, when set to 1, causes the FPU to produce  
implementation-dependent results that may not conform to IEEE Std 754-1985.  
SPARC64 V implements this bit.  
When FSR.NS = 1, denormal input operands and denormal results that would  
otherwise trap are flushed to 0 of the same sign and an inexact exception is signalled  
(that may be masked by FSR.TEM.NXM). See Section B.6, Floating-Point Nonstandard  
Mode, on page 61 for details.  
When FSR.NS = 0, the normal IEEE Std 754-1985 behavior is implemented.  
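The flush behavior under FSR.NS = 1 can be illustrated for a single double-precision value. This is a sketch only: it flushes any denormal value it is given, whereas the processor flushes only those operands and results "that would otherwise trap" (the precise conditions are in Section B.6). The function names and the returned `inexact` flag are illustrative assumptions.

```python
import math
import struct


def is_denormal(x):
    """True if x is a nonzero IEEE 754 double with a zero exponent field."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return exponent == 0 and fraction != 0


def flush_if_nonstandard(x, ns):
    """Return (value, inexact) for one operand or result.

    With ns = 1, a denormal value becomes a zero of the same sign and the
    inexact condition is raised; otherwise the value passes through.
    """
    if ns and is_denormal(x):
        return math.copysign(0.0, x), True
    return x, False


tiny = -5e-324  # smallest-magnitude negative denormal double
flushed, inexact = flush_if_nonstandard(tiny, ns=1)
```

Here `flushed` is negative zero, preserving the sign of the denormal input, and `inexact` is set.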
FSR_version (ver)  
For each SPARC V9 IU implementation (as identified by its VER.impl field), there  
may be one or more FPU implementations or none. This field identifies the  
particular FPU implementation present. For the first SPARC64 V, FSR.ver = 0 (impl.  
dep. #19); however, future versions of the architecture may set FSR.ver to other  
values. Consult the SPARC64 V Data Sheet for the setting of FSR.ver for your  
chipset.  
FSR_floating-point_trap_type (ftt)  
The complete conditions under which SPARC64 V triggers fp_exception_other with  
trap type unfinished_FPop are described in Section B.6, Floating-Point Nonstandard Mode,  
on page 61 (impl. dep. #248).  
FSR_current_exception (cexc)  
Bits 4 through 0 indicate that one or more IEEE_754 floating-point exceptions were  
generated by the most recently executed FPop instruction. The absence of an  
exception causes the corresponding bit to be cleared.  
In SPARC64 V, the cexc bits are set according to the following pseudocode:  
if (<LDFSR or LDXFSR commits>)  
    <update using data from LDFSR or LDXFSR>;  
else if (<FPop commits with ftt = 0>)  
    <update using value from FPU>;  
else if (<FPop commits with IEEE_754_exception>)  
    <set one bit in the CEXC field as supplied by FPU>;  
else if (<FPop commits with unfinished_FPop error>)  
    <no change>;  
else if (<FPop commits with unimplemented_FPop error>)  
    <no change>;  
else  
    <no change>;  
FSR Conformance  
SPARC V9 allows the TEM, cexc, and aexc fields to be implemented in hardware in  
either of two ways (both of which comply with IEEE Std 754-1985). SPARC64 V  
follows case (1); that is, it implements all three fields in conformance with IEEE Std  
754-1985. See FSR Conformance in Section 5.1.7 of Commonality for more  
information about other implementation methods.  
5.1.9 Tick (TICK) Register  
SPARC64 V implements the TICK.counter register as a 63-bit register (impl. dep.  
#105).  
Implementation Note – On SPARC64 V, the counter part of the value returned  
when the TICK register is read is the value of TICK.counter when the RDTICK  
instruction is executed. The difference between the counter values read from the  
TICK register on two reads reflects the number of processor cycles executed between  
the executions of the RDTICK instructions, not their commits. In longer code  
sequences, the difference between this value and the value that would have been  
obtained when the instructions are committed is small.  
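Computing elapsed cycles from two TICK reads can be sketched as below. The sketch assumes the SPARC V9 layout in which TICK.counter occupies bits 62:0 and bit 63 holds the NPT bit, and it assumes at most one counter wraparound between the two reads. The function names are illustrative.

```python
COUNTER_BITS = 63
COUNTER_MASK = (1 << COUNTER_BITS) - 1  # TICK.counter, bits 62:0


def tick_counter(tick_value):
    """Strip bit 63 (NPT in SPARC V9) and keep the 63-bit counter."""
    return tick_value & COUNTER_MASK


def elapsed_cycles(first_read, second_read):
    """Cycles between two TICK reads, modulo the 63-bit counter width."""
    return (tick_counter(second_read) - tick_counter(first_read)) & COUNTER_MASK


# Two reads taken with bit 63 set; the counter parts are 100 and 250.
npt = 1 << 63
delta = elapsed_cycles(npt | 100, npt | 250)
```

The modulo subtraction keeps the result correct when the counter wraps from its maximum value back to zero between the reads.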
5.2 Privileged Registers  
Please refer to Section 5.2 of Commonality for the description of privileged registers.  
5.2.6 Trap State (TSTATE) Register  
SPARC64 V implements only bits 2:0 of the TSTATE.CWP field. Writes to bits 4 and 3  
are ignored, and reads of these bits always return zeroes.  
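The effect of the unimplemented CWP bits on a TSTATE write can be modeled with a mask. The sketch assumes the SPARC V9 layout, in which TSTATE.CWP occupies bits 4:0 of the register, and models only the CWP behavior described above; other fields are passed through unchanged. The names are illustrative.

```python
MASK64 = (1 << 64) - 1
CWP_IGNORED_BITS = 0x18  # TSTATE bits 4 and 3 (upper two bits of CWP)


def tstate_after_write(value):
    """Value observed on a later read: CWP bits 4:3 always read as zero."""
    return value & MASK64 & ~CWP_IGNORED_BITS


stored = tstate_after_write(0x1F)  # attempt to write CWP = 31
```

Writing a CWP of 31 therefore leaves 7 in the field, since only bits 2:0 are retained.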
Note – Spurious setting of the PSTATE.RED bit by privileged software should not  
be performed, since it will take the SPARC64 V into RED_state without the  
required sequencing.  
5.2.9 Version (VER) Register  
TABLE 5-1 shows the values for the VER register for SPARC64 V.  
TABLE 5-1 VER Register Encodings  
Bits     Field     Value  
63:48    manuf     0004₁₆ (impl. dep. #104)  
47:32    impl      5 (impl. dep. #13)  
31:24    mask      n (The value of n depends on the processor chip version)  
15:8     maxtl     5  
4:0      maxwin    7  
The manuf field contains Fujitsu's 8-bit JEDEC code in the lower 8 bits and zeroes in  
the upper 8 bits. The manuf, impl, and mask fields are implemented so that they  
may change in future SPARC64 V processor versions. The mask field is incremented  
by 1 any time a programmer-visible revision is made to the processor. See the  
SPARC64 V Data Sheet to determine the current setting of the mask field.  
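The field positions in TABLE 5-1 translate directly into shifts and masks. The following sketch decodes a VER value into its fields; the function name and the example mask value 0x12 are hypothetical (the real mask value depends on the chip version).

```python
def decode_ver(ver):
    """Split a 64-bit VER value into the fields listed in TABLE 5-1."""
    return {
        "manuf": (ver >> 48) & 0xFFFF,   # bits 63:48
        "impl": (ver >> 32) & 0xFFFF,    # bits 47:32
        "mask": (ver >> 24) & 0xFF,      # bits 31:24
        "maxtl": (ver >> 8) & 0xFF,      # bits 15:8
        "maxwin": ver & 0x1F,            # bits 4:0
    }


# Example value built from TABLE 5-1, with a made-up mask of 0x12.
example_ver = (0x0004 << 48) | (5 << 32) | (0x12 << 24) | (5 << 8) | 7
fields = decode_ver(example_ver)
```

Software that checks for SPARC64 V would typically compare the `manuf` and `impl` fields and treat `mask` as a revision number.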
5.2.11 Ancillary State Registers (ASRs)  
Please refer to Section 5.2.11 of Commonality for details of the ASRs.  
Performance Control Register (PCR) (ASR 16)  
SPARC64 V implements the PCR register as described in SPARC JPS1 Commonality.  
In SPARC64 V, the accessibility of PCR when PSTATE.PRIV = 0 is determined by  
PCR.PRIV. If PSTATE.PRIV = 0 and PCR.PRIV = 1, an attempt to execute either  
RDPCR or WRPCR will cause a privileged_action exception. If PSTATE.PRIV = 0 and  
PCR.PRIV = 0, RDPCR operates without privilege violation and WRPCR causes a  
privileged_action exception only when an attempt is made to change (that is, write 1  
to) PCR.PRIV (impl. dep. #250).  
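The access rules above reduce to a small decision function. The sketch below returns whether an RDPCR or WRPCR attempt raises privileged_action. The text covers only the PSTATE.PRIV = 0 cases; treating every privileged-mode access as allowed is an assumption of this sketch, and the function and parameter names are illustrative.

```python
def pcr_access_traps(pstate_priv, pcr_priv, op, new_priv=0):
    """Return True if the access causes a privileged_action exception.

    op is "RDPCR" or "WRPCR"; new_priv is the PCR.PRIV bit in the value
    being written (ignored for RDPCR).
    """
    if pstate_priv == 1:
        return False          # assumption: privileged software may access PCR
    if pcr_priv == 1:
        return True           # nonprivileged access while PCR.PRIV = 1
    # PSTATE.PRIV = 0 and PCR.PRIV = 0: only writing 1 to PCR.PRIV traps.
    return op == "WRPCR" and new_priv == 1


user_read_ok = not pcr_access_traps(0, 0, "RDPCR")
user_write_priv_traps = pcr_access_traps(0, 0, "WRPCR", new_priv=1)
```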
See Appendix Q, Performance Instrumentation, for a detailed discussion of the PCR  
and PIC register usage and event count definitions.  
The Performance Control Register in SPARC64 V is illustrated in FIGURE 5-1 and  
described in TABLE 5-2.  
Bits     63:48  47:32  31:27  26    25  24:22  21  20:18  17  16:11  10  9:4  3     2   1   0  
Field    0      OVF    0      OVRO  0   NC     0   SC     0   SU     0   SL   ULRO  UT  ST  PRIV  
FIGURE 5-1 SPARC64 V Performance Control Register (PCR) (ASR 16)  
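The PCR layout in FIGURE 5-1 can be decoded with a table of bit ranges. The sketch below extracts each named field and derives the number of counter pairs from NC, using the encoding given in TABLE 5-2 (NC = 0 to 7 means 1 to 8 counter pairs). The dictionary and function names are illustrative and not part of the specification.

```python
# Field name -> (high bit, low bit), taken from FIGURE 5-1.
PCR_FIELDS = {
    "OVF": (47, 32),
    "OVRO": (26, 26),
    "NC": (24, 22),
    "SC": (20, 18),
    "SU": (16, 11),
    "SL": (9, 4),
    "ULRO": (3, 3),
    "UT": (2, 2),
    "ST": (1, 1),
    "PRIV": (0, 0),
}


def extract(value, high, low):
    """Return bits high:low of value as an unsigned integer."""
    return (value >> low) & ((1 << (high - low + 1)) - 1)


def decode_pcr(pcr):
    """Split a PCR value into its named fields."""
    return {name: extract(pcr, hi, lo) for name, (hi, lo) in PCR_FIELDS.items()}


def counter_pairs(pcr):
    """NC encodes the number of counter pairs minus one."""
    return decode_pcr(pcr)["NC"] + 1


# Example: NC = 3 (the SPARC64 V hardcoded value), SC = 2, PRIV = 1.
example_pcr = (3 << 22) | (2 << 18) | 1
decoded = decode_pcr(example_pcr)
```

For the example value, `counter_pairs(example_pcr)` yields 4, matching the four counter pairs that SPARC64 V implements.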
TABLE 5-2 PCR Bit Description  
Bit      Field   Description  
47:32    OVF     Overflow Clear/Set/Status. Used to read counter overflow status (via  
                 RDPCR) and clear or set counter overflow status bits (via WRPCR).  
                 PCR.OVF is a SPARC64 V-specific field (impl. dep. #207).  
                 The following figure depicts the bit layout of the SPARC64 V OVF field  
                 for four counter pairs. Counter status bits are cleared on write of 0  
                 to the appropriate OVF bit.  
                 Bits     15:8  7   6   5   4   3   2   1   0  
                 Field    0     U3  L3  U2  L2  U1  L1  U0  L0  
26       OVRO    Overflow read-only. Write-only/read-as-zero field specifying PCR.OVF  
                 update behavior for WRPCR.PCR. The OVRO field is  
                 implementation-dependent (impl. dep. #207). WRPCR.PCR with  
                 PCR.OVRO = 1 inhibits updating of PCR.OVF for the current write only.  
                 The intention of PCR.OVRO is to write PCR while preserving the current  
                 PCR.OVF value. PCR.OVF is maintained internally by hardware, so a  
                 subsequent RDPCR.PCR returns accurate overflow status at the time.  
24:22    NC      Number of counter pairs. Three-bit, read-only field specifying the  
                 number of counter pairs, encoded as 0 to 7 for 1 to 8 counter pairs  
                 (impl. dep. #207). For SPARC64 V, the hardcoded value of NC is 3  
                 (indicating presence of 4 counter pairs).  
20:18    SC      Select PIC. In SPARC64 V, three-bit field specifying which counter pair  
                 is currently selected as PIC (ASR 17) and which SU/SL values are  
                 visible to software. On write, PCR.SC selects which counter pair is  
                 updated (unless PCR.ULRO is set; see below). On read, PCR.SC selects  
                 which counter pair is to be read through PIC (ASR 17).  
16:11    SU      Defined (as S1) in SPARC JPS1 Commonality.  
9:4      SL      Defined (as S0) in SPARC JPS1 Commonality.  
3        ULRO    Implementation-dependent field (impl. dep. #207) that specifies whether  
                 SU/SL are read-only. In SPARC64 V, this field is  
                 write-only/read-as-zero, specifying update behavior of SU/SL on write.  
                 When PCR.ULRO = 1, SU/SL are considered as read-only; the values set on  
                 PCR.SU/PCR.SL are not written into SU/SL. When PCR.ULRO = 0, SU/SL are  
                 updated. PCR.ULRO is intended to switch the visible PIC by writing  
                 PCR.SC, without affecting the current selection of SU/SL of that PIC.  
                 On PCR read, PCR.SU/PCR.SL always shows the current setting of the PIC  
                 regardless of PCR.ULRO.  
2
1
UT  
ST  
Defined in SPARC JPS1 Commonality.  
Defined in SPARC JPS1 Commonality.  
Release 1.0, 1 July 2002  
F. Chapter 5  
Registers  
21  
             
TABLE 5-2 PCRBit Description (Continued)  
Bit  
Field  
Description  
0
PRIV  
Defined in SPARC JPS1 Commonality, with the additional function of controlling PCR  
accessibility as described above (impl. dep. #250).  
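As a reading aid for the PCR bit description, the following Python sketch extracts the named fields from a 64-bit PCR value. The bit positions are the ones given in the bit description; the helper names (`PCR_FIELDS`, `decode_pcr`, `counter_pairs`) are hypothetical and not part of any SPARC64 V software interface.

```python
# Field positions as (high bit, low bit), taken from the PCR bit description.
PCR_FIELDS = {
    "OVF": (47, 32), "OVRO": (26, 26), "NC": (24, 22), "SC": (20, 18),
    "SU": (16, 11), "SL": (9, 4), "ULRO": (3, 3), "UT": (2, 2),
    "ST": (1, 1), "PRIV": (0, 0),
}

def decode_pcr(value: int) -> dict:
    """Split a raw PCR value into its named fields."""
    fields = {}
    for name, (high, low) in PCR_FIELDS.items():
        width = high - low + 1
        fields[name] = (value >> low) & ((1 << width) - 1)
    return fields

def counter_pairs(nc: int) -> int:
    """NC encodes 0-7 for 1-8 counter pairs."""
    return nc + 1
```

With NC hardcoded to 3, `counter_pairs(3)` gives the four counter pairs stated for SPARC64 V.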
Performance Instrumentation Counter (PIC) Register (ASR 17)
The PIC register is implemented as described in SPARC JPS1 Commonality.
Four PICs are implemented in SPARC64 V. Each is accessed through ASR 17, using
PCR.SC as a select field. Read/write access to the PIC will access the PICU/PICL
counter pair selected by PCR. For PICU/PICL encodings of specific event counters,
see Appendix Q, Performance Instrumentation.
Counter Overflow. On overflow, counters wrap to 0, SOFTINT register bit 15 is set,
and an interrupt level-15 exception is generated. The counter overflow trap is
triggered on the transition from value FFFFFFFF₁₆ to value 0. If multiple overflows
are generated simultaneously, then multiple overflow status bits will be set. If
overflow status bits are already set, then they remain set on counter overflow.
Overflow status bits are cleared by software writing 0 to the appropriate bit of
PCR.OVF and may be set by writing 1 to the appropriate bit. Setting these bits by
software does not generate a level 15 interrupt.
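The overflow behavior can be illustrated with a small software model. This is a sketch under stated assumptions, not a description of the hardware: the class and method names are hypothetical, counters are indexed by their OVF bit position (L0 = 0, U0 = 1, and so on up to U3 = 7), and the PCR.OVRO handling follows the PCR bit description.

```python
class PicOverflowModel:
    """Toy model of eight 32-bit counters and their PCR.OVF status bits."""

    def __init__(self) -> None:
        self.count = [0] * 8   # index = OVF bit position (L0, U0, L1, U1, ...)
        self.ovf = 0           # PCR.OVF status bits
        self.softint = 0       # SOFTINT register image

    def increment(self, index: int) -> None:
        if self.count[index] == 0xFFFFFFFF:
            self.count[index] = 0          # counter wraps to 0
            self.ovf |= 1 << index         # status bit set; stays set if already set
            self.softint |= 1 << 15        # requests the level-15 interrupt
        else:
            self.count[index] += 1

    def write_ovf(self, value: int, ovro: int = 0) -> None:
        if ovro:
            return                         # OVRO = 1: OVF untouched by this write
        self.ovf = value & 0xFF            # 0 clears, 1 sets; no interrupt raised
```

Note that `write_ovf` never touches `softint`, mirroring the statement that setting overflow bits by software does not generate a level 15 interrupt.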
Dispatch Control Register (DCR) (ASR 18)  
The DCR is not implemented in SPARC64 V. Zero is returned on read, and writes to
the register are ignored. The DCR is a privileged register; attempted access by
nonprivileged (user) code generates a privileged_opcode exception.
5.2.12 Registers Referenced Through ASIs
Data Cache Unit Control Register (DCUCR)  
ASI 45₁₆ (ASI_DCU_CONTROL_REGISTER), VA = 0₁₆.
The Data Cache Unit Control Register contains fields that control several memory-
related hardware functions. The functions include instruction, prefetch, write, and
data caches; MMUs; and watchpoint setting. SPARC64 V implements most of
DCUCR's functions described in Section 5.2.12 of Commonality.
After a power-on reset (POR), all fields of DCUCR, including implementation-
dependent fields, are set to 0. After a WDR, XIR, or SIR reset, all fields of DCUCR,
including implementation-dependent fields, are set to 0.
The Data Cache Unit Control Register is illustrated in FIGURE 5-2 and described in  
TABLE 5-3. In the table, bits are grouped by function rather than by strict bit sequence.  
Bits   63:50  49:48  47:42       41         40:33  32:25  24  23  22  21  20:4  3   2   1:0
Field  0      0      impl. dep.  WEAK_SPCA  PM     VM     PR  PW  VR  VW  0     DM  IM  0
FIGURE 5-2 DCU Control Register Access Data Format (ASI 45₁₆)
TABLE 5-3 DCUCR Description
Bits    Field       Type  Description
49:48   CP, CV      RW    Not implemented in SPARC64 V (impl. dep. #232). It reads as 0
                          and writes to it are ignored.
47:42   impl. dep.        Not used. It reads as 0 and writes to it are ignored.
41      WEAK_SPCA   RW    Used for disabling speculative memory access (impl. dep.
                          #240). When DCUCR.WEAK_SPCA = 1, the branch history table is
                          cleared and no longer issues aggressive instruction prefetch.
                          During DCUCR.WEAK_SPCA = 1, aggressive instruction
                          prefetching is disabled and any load and store instructions
                          are considered presync instructions that are executed when
                          all previous instructions are committed. Because all CTIs are
                          considered as not taken, instructions residing beyond 1 Kbyte
                          of a CTI may be fetched and executed.
                          On entering aggressive instruction prefetch disable mode,
                          supervisor software should issue membar #Sync, to make sure
                          all in-flight instructions in the pipeline are discarded.
                          During DCUCR.WEAK_SPCA = 1, an L2 cache flush by writing 1 to
                          ASI_L2_CTRL.U2_FLUSH remains pending internally until
                          DCUCR.WEAK_SPCA is set to 0. To wait for completion of the
                          cache flush, a membar #Sync must be issued after
                          DCUCR.WEAK_SPCA is set to 0. Executing a membar #Sync while
                          DCUCR.WEAK_SPCA = 1 after writing 1 to ASI_L2_CTRL.U2_FLUSH
                          does not wait for the cache flush to complete.
40:33   PM<7:0>           Defined in SPARC JPS1 Commonality.
32:25   VM<7:0>           Defined in SPARC JPS1 Commonality.
24, 23  PR, PW            Defined in SPARC JPS1 Commonality.
22, 21  VR, VW            Defined in SPARC JPS1 Commonality.
20:4                      Reserved.
3       DM                Defined in SPARC JPS1 Commonality.
2       IM                Defined in SPARC JPS1 Commonality.
1       DC          RW    Not implemented in SPARC64 V (impl. dep. #252). It reads as 0
                          and writes to it are ignored.
0       IC          RW    Not implemented in SPARC64 V (impl. dep. #253). It reads as 0
                          and writes to it are ignored.
Data Watchpoint Registers  
No implementation-dependent feature of SPARC64 V reduces the reliability of data  
watchpoints (impl. dep. #244).  
SPARC64 V employs a conservative check of PA/VA watchpoints over partial store
instructions. See Section A.42, Partial Store (VIS I), on page 57 for details.
Instruction Trap Register
SPARC64 V implements the Instruction Trap Register (impl. dep. #205).
In SPARC64 V, the least significant 11 bits (bits 10:0) of a CALL or branch (BPcc,
FBPfcc, Bicc, BPr) instruction in an instruction cache are identical to their
architectural encoding (as it appears in main memory) (impl. dep. #245).
5.2.13 Floating-Point Deferred-Trap Queue (FQ)
SPARC64 V does not contain a Floating-Point Deferred-Trap Queue (impl. dep. #24).
An attempt to read FQ with an RDPR instruction generates an illegal_instruction
exception (impl. dep. #25).
5.2.14 IU Deferred-Trap Queue
SPARC64 V neither has nor needs an IU deferred-trap queue (impl. dep. #16).
CHAPTER 6
Instructions
This chapter presents SPARC64 V implementation-specific instruction details and the
processor pipeline information in these subsections:
Instruction Execution
Instruction Formats and Fields
Instruction Categories
Processor Pipeline
For additional, general information, please see parallel subsections of Chapter 6 in
Commonality. For easy referencing, we follow the organization of Chapter 6 in
Commonality.
6.1 Instruction Execution
SPARC64 V is an advanced superscalar implementation of SPARC V9. Several  
instructions may be issued and executed in parallel. Although SPARC64 V provides  
serial program execution semantics, some of the implementation characteristics  
described below are part of the architecture visible to software for correctness and  
efficiency. The affected software includes optimizing compilers and supervisor code.  
6.1.1 Data Prefetch
SPARC64 V employs speculative (out of program order) execution of instructions; in
most cases, the effect of these instructions can be undone if the speculation proves to
be incorrect (an async_data_error may nevertheless be signalled during speculative
data prefetching). However, exceptions can occur because of speculative data
prefetching. Formally, SPARC64 V employs the following rules regarding speculative
prefetching:
1. If a memory operation y resolves to a volatile memory address (location[y]),
SPARC64 V will not speculatively prefetch location[y] for any reason; location[y]
will be fetched or stored to only when operation y is committable.
2. If a memory operation y resolves to a nonvolatile memory address (location[y]),
SPARC64 V may speculatively prefetch location[y], subject to the following
subrules:
a. If an operation y can be speculatively prefetched according to the prior rule,  
operations with store semantics are speculatively prefetched for ownership  
only if they are prefetched to cacheable locations. Operations without store  
semantics are speculatively prefetched even if they are noncacheable as long as  
they are not volatile.  
b. Atomic operations (CAS(X)A, LDSTUB, SWAP) are never speculatively  
prefetched.  
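The prefetch rules above can be restated as a single predicate. The Python sketch below is an illustrative model only; the function name, its parameters, and the way an operation is described (by mnemonic plus three flags) are hypothetical choices made for this example.

```python
ATOMICS = {"CASA", "CASXA", "LDSTUB", "SWAP"}   # never speculatively prefetched

def may_speculatively_prefetch(opcode: str, volatile: bool,
                               cacheable: bool, store_semantics: bool) -> bool:
    """Model of the speculative data prefetch rules (rule 1, rule 2a, rule 2b)."""
    if volatile:
        return False          # rule 1: volatile locations are never prefetched
    if opcode in ATOMICS:
        return False          # rule 2b: atomic operations are excluded
    if store_semantics:
        return cacheable      # rule 2a: prefetch for ownership only if cacheable
    return True               # rule 2a: nonvolatile loads, even if noncacheable
```

A nonvolatile, noncacheable load therefore qualifies, while a store to the same location does not.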
SPARC64 V provides two mechanisms to avoid speculative execution of a load:
1. Avoid speculation by disallowing speculative accesses to certain memory pages or
I/O spaces. This can be done by setting the E (side-effect) bit in the PTE for all
memory pages that should not allow speculation. All accesses made to memory
pages that have the E bit set in their PTE will be delayed until they are no longer
speculative or until they are cancelled. See Appendix F, Memory Management Unit,
for details.
2. Alternate space load instructions that force program order, such as
ASI_PHYS_BYPASS_WITH_EBIT[_L] (ASI = 15₁₆, 1D₁₆), will not be speculatively
executed.
6.1.2 Instruction Prefetch
The processor prefetches instructions to minimize cases where the processor must  
wait for instruction fetch. In combination with branch prediction, prefetching may  
cause the processor to access instructions that are not subsequently executed. In  
some cases, the speculative instruction accesses will reference data pages.  
SPARC64 V does not generate a trap for any exception that is caused by an
instruction fetch until all of the instructions before it (in program order) have been
committed. (Hardware errors and other asynchronous errors may generate a trap even
if the instruction that caused the trap is never committed.)
6.1.3 Syncing Instructions
SPARC64 V has instructions, called syncing instructions, that stop execution for the
number of cycles it takes to clear the pipeline and to synchronize the processor.
There are two types of synchronization, pre and post. A presyncing instruction waits
for all previous instructions to commit, commits by itself, and then issues successive
instructions. A postsyncing instruction issues by itself and prevents the successive
instructions from issuing until it is committed. Some instructions have both pre- and
postsync attributes.
In SPARC64 V almost all instructions commit in order, but store instructions commit
before becoming globally visible. A few syncing instructions cause the processor to
discard prefetched instructions and to refetch the successive instructions. TABLE 6-1
lists all pre-/postsync instructions and the effects of instruction execution.
TABLE 6-1 SPARC64 V Syncing Instructions
Opcode | Presyncing Sync? | Wait for store global visibility? | Postsyncing Sync? | Discard prefetched instructions?
ALIGNADDRESS{_LITTLE}  
Yes  
Yes  
Yes  
BMASK  
DONE  
Yes  
Yes  
FCMP(GT,LE,NE,EQ)(16,32)  
Yes  
Yes  
Yes  
Yes  
Yes  
Yes  
Yes  
Yes  
FLUSH  
FMOV(s,d)icc  
FMOVr  
LDD  
Yes  
Yes  
Yes  
LDDA  
LDDFA  
memory access with  
ASI=ASI_PHYS_BYPASS_EC{_LITTLE},  
ASI_PHYS_BYPASS_EC_WITH_E_BIT{_LITTLE}  
LDFSR, LDXFSR  
MEMBAR  
MOVfcc  
MULScc  
PDIST  
Yes  
Yes  
1
Yes  
Yes  
Yes  
Yes  
Yes  
RDASR  
Yes  
RETRY  
Yes  
Yes  
Yes  
SIAM  
STBAR  
Yes  
Yes  
STD  
STDA  
Yes  
STDFA  
Yes  
Yes  
Yes  
Yes  
STFSR, STXFSR  
Tcc  
Yes  
Yes  
Yes  
2
WRASR  
1. When #cmask != 0.
2. WRGSR only.
6.2 Instruction Formats and Fields
Instructions are encoded in five major 32-bit formats and several minor formats.  
Please refer to Section 6.2 of Commonality for illustrations of four major formats.  
FIGURE 6-1 illustrates Format 5, unique to SPARC64 V.  
Format 5 (op = 2, op3 = 37₁₆): FMADD, FMSUB, FNMADD, and FNMSUB (in place of IMPDEP2B)
Bits   31:30  29:25  24:19  18:14  13:9  8:7  6:5   4:0
Field  op     rd     op3    rs1    rs3   var  size  rs2
FIGURE 6-1 Summary of Instruction Formats: Format 5
Instruction fields are those shown in Section 6.2 of Commonality. Three additional
fields are implemented in SPARC64 V. They are described in TABLE 6-2.
TABLE 6-2 Instruction Fields Specific to SPARC64 V
Bits   Field  Description
13:9   rs3    This 5-bit field is the address of the third f register source operand
              for the floating-point multiply-add and multiply-subtract instructions.
8:7    var    This 2-bit field specifies which specific operation (variation) to
              perform for the floating-point multiply-add and multiply-subtract
              instructions.
6:5    size   This 2-bit field specifies the size of the operands for the floating-
              point multiply-add and multiply-subtract instructions.
Since size = 00 is not IMPDEP2B and since size = 11 assumes quad operations but
is not implemented in SPARC64 V, an instruction with size = 00 or 11 generates an
illegal_instruction exception in SPARC64 V.
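To make the Format 5 layout concrete, the following Python sketch packs and unpacks the fields at the bit positions given for Format 5 and flags the two size encodings that trap. The helper names are hypothetical, and the sketch deliberately does not map var or size values to mnemonics, since those encodings are defined elsewhere in the manual.

```python
# (name, low bit, width) for each Format 5 field.
FORMAT5_FIELDS = (
    ("op", 30, 2), ("rd", 25, 5), ("op3", 19, 6), ("rs1", 14, 5),
    ("rs3", 9, 5), ("var", 7, 2), ("size", 5, 2), ("rs2", 0, 5),
)

def decode_format5(word: int) -> dict:
    """Extract all Format 5 fields from a 32-bit instruction word."""
    return {name: (word >> low) & ((1 << width) - 1)
            for name, low, width in FORMAT5_FIELDS}

def encode_format5(**fields: int) -> int:
    """Pack named fields into a 32-bit word; omitted fields are zero."""
    word = 0
    for name, low, width in FORMAT5_FIELDS:
        word |= (fields.get(name, 0) & ((1 << width) - 1)) << low
    return word

def size_is_illegal(word: int) -> bool:
    """True when size = 00 or 11, the cases that raise illegal_instruction."""
    return decode_format5(word)["size"] in (0b00, 0b11)
```

`size_is_illegal` assumes the caller has already established that the word is a Format 5 instruction (op = 2, op3 = 37 hex).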
6.3 Instruction Categories
SPARC V9 instructions comprise the categories listed below. All categories are  
described in Section 6.3 of Commonality. Subsections in bold face are SPARC64 V  
implementation dependencies.  
Memory access  
Memory synchronization  
Integer arithmetic  
Control transfer (CTI)  
Conditional moves  
Register window management  
State register access  
Privileged register access  
Floating-point operate (FPop)  
Implementation-dependent  
6.3.3 Control-Transfer Instructions (CTIs)
These are the basic control-transfer instruction types:  
Conditional branch (Bicc, BPcc, BPr, FBfcc, FBPfcc)  
Unconditional branch  
Call and link (CALL)  
Jump and link (JMPL, RETURN)  
Return from trap (DONE, RETRY)  
Trap (Tcc)  
Instructions other than CALL and JMPL are described in their entirety in Section 6.3.2
of Commonality. SPARC64 V implements CALL and JMPL as described below.
CALL and JMPL Instructions
SPARC64 V writes all 64 bits of the PC into the destination register when
PSTATE.AM = 0. The upper 32 bits of r[15] (CALL) or of r[rd] (JMPL) are written
as zeroes when PSTATE.AM = 1 (impl. dep. #125).
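The effect of PSTATE.AM on the value written by CALL and JMPL amounts to a 32-bit mask. A minimal Python sketch, with a hypothetical function name:

```python
def link_register_value(pc: int, pstate_am: int) -> int:
    """Value written to r[15] (CALL) or r[rd] (JMPL) for a given PC."""
    pc &= 0xFFFFFFFFFFFFFFFF           # PC is a 64-bit quantity
    if pstate_am:
        return pc & 0x00000000FFFFFFFF  # upper 32 bits written as zeroes
    return pc                           # all 64 bits written
```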
SPARC64 V implements JMPL and CALL return prediction hardware in the form of a
special stack, called the Return Address Stack (RAS). Whenever a CALL or JMPL that
writes to %o7 (r[15]) occurs, SPARC64 V pushes the return address (PC+8) onto
the RAS. When either of the synthetic instructions retl (JMPL [%o7+8]) and ret (JMPL
[%i7+8]) is subsequently executed, the return address is predicted to be the
address stored on the top of the RAS and the RAS is popped. If the prediction in
the RAS is incorrect, SPARC64 V backs up and starts issuing instructions from the
correct target address. This backup takes a few extra cycles.
Programming Note: For maximum performance, software and compilers must
take into account how the RAS works. For example, tricks that do nonstandard
returns in hopes of boosting performance may require more cycles if they cause the
wrong RAS value to be used for predicting the address of the return. Heavily nested
calls can also cause earlier entries in the RAS to be overwritten by newer entries,
since the RAS only has a limited number of entries. Eventually, some return
addresses will be mispredicted because of the overflow of the RAS.
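The push/pop behavior and the effect of overflow can be seen in a small software model. This is a sketch only: the class name is hypothetical, and the depth used here is an arbitrary illustrative value, since the text says only that the RAS has a limited number of entries.

```python
class ReturnAddressStack:
    """Toy model of a fixed-depth return address predictor."""

    def __init__(self, depth: int = 4) -> None:   # depth is a made-up value
        self.depth = depth
        self.entries = []

    def push_call(self, pc: int) -> None:
        """CALL or JMPL writing %o7: push the return address PC+8."""
        if len(self.entries) == self.depth:
            self.entries.pop(0)                   # oldest entry is overwritten
        self.entries.append(pc + 8)

    def predict_return(self):
        """retl/ret: pop the predicted target, or None if nothing is left."""
        return self.entries.pop() if self.entries else None
```

With a depth of 2 and three nested calls, the two innermost returns are predicted correctly, but the outermost one finds the stack empty, which corresponds to the misprediction described in the Programming Note.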
6.3.7 Floating-Point Operate (FPop) Instructions
The complete conditions of generating an fp_exception_other exception with
FSR.ftt = unfinished_FPop are described in Section B.6, Floating-Point Nonstandard
Mode on page 61.
The SPARC64 V-specific FMADD and FMSUB instructions (described below) are also
floating-point operations. They require the floating-point unit to be enabled;
otherwise, an fp_disabled trap is generated. They also affect the FSR, like FPop
instructions. However, these instructions are not included in the FPop category and,
hence, reserved encodings in these opcodes generate an illegal_instruction exception, as
defined in Section 6.3.9 of Commonality.
6.3.8 Implementation-Dependent Instructions
SPARC64 V uses the IMPDEP2 instruction to implement the Floating-Point Multiply-
Add/Subtract and Negative Multiply-Add/Subtract instructions; these have an op3
field = 37₁₆ (IMPDEP2). See Floating-Point Multiply-Add/Subtract on page 50 for fuller
definitions of these instructions. Opcode space is reserved in IMPDEP2 for the quad-
precision forms of these instructions. However, SPARC64 V does not currently
implement the quad-precision forms, and the processor generates an illegal_instruction
exception if a quad-precision form is specified. Since these instructions are not part
of the required SPARC V9 architecture, the operating system does not supply
software emulation routines for the quad versions of these instructions.
SPARC64 V uses the IMPDEP1 instruction to implement the graphics acceleration
instructions.
6.4 Processor Pipeline
The pipeline of SPARC64 V consists of fifteen stages, shown in FIGURE 6-2. Each  
stage is referenced by one or two letters as follows:  
IA  IT  IM  IB  IR  E  D  P  B  X  U  W
Ps  Ts  Ms  Bs  Rs
6.4.1 Instruction Fetch Stages
IA (Instruction Address generation): Calculate fetch target address.
IT (Instruction TLB Tag access): Instruction TLB tag search. Search of BRHIS
and RAS is also started.
IM (Instruction TLB tag Match): Check that the TLB tag is matched.
The result of the BRHIS and RAS search is also available at this stage and is
forwarded to the IA stage for subsequent fetch.
IB (Instruction cache Buffer read): Read L1 cache data if the TLB is hit.
IR (Instruction read Result): Write to I-buffer.
IA through IR stages are dedicated to instruction fetch. These stages work in concert
with the cache access unit to supply instructions to subsequent stages. The
instructions fetched from memory or cache are stored in the Instruction Buffer
(I-buffer). The I-buffer has six entries, each of which can hold 32-byte-aligned
32-byte data (eight instructions).
SPARC64 V has a branch prediction mechanism and resources named BRHIS
(BRanch HIStory) and RAS (Return Address Stack). Instruction fetch stages use these
resources to determine fetch addresses.
Instruction fetch stages are designed so that they work independently of subsequent
stages as much as possible, and they can fetch instructions even when execution
stages stall. These stages fetch until the I-buffer is full; further fetches are possible by
requesting prefetches to the L1 cache.
[Pipeline diagram. Labeled elements: stages IA, IT, IM, IB, IR, E, D, P, B, X, U, W
and Ps, Ts, Ms, Bs, Rs; blocks IF EAG, iTLB, L1I, BRHIS, Instruction Buffer, IWR,
RSFA, RSFB, RSEA, RSEB, RSA, RSBR, CSE, FXA, FXB, EXA, EXB, EAGA, EAGB, RR,
dTLB, L1D, LB, LR, FUB, GUB, FPR, GPR, ccr, fsr, PC, nPC.]
FIGURE 6-2 SPARC64 V Pipeline
6.4.2 Issue Stages
E (Entry): Instructions are passed from fetch stages.
D (Decode): Assign resources and dispatch to reservation station (RS).
SPARC64 V is an out-of-order execution CPU. It has six execution units (two
arithmetic and logic units, two floating-point units, and two load/store units). Each
unit except the load/store unit has its own reservation station. E and D stages are
issue stages that decode instructions and dispatch them to the target RS. SPARC64 V
can issue up to four instructions per cycle.
The resources needed to execute an instruction are assigned in the issue stages. The
resources to be allocated include the following:
Commit stack entry (CSE)
Renaming registers of integer (GUB) and floating-point (FUB)
Entries of reservation stations
Memory access ports
Resources needed for an instruction are specific to the instruction, but all resources
must be assigned at these stages. In normal execution, assigned resources are
released at the very last stage of the pipeline, the W-stage (an entry in a reservation
station, however, is released at the X-stage). Instructions between the E-stage and
W-stage are considered to be in-flight. When an exception is signalled, all
in-flight instructions and the resources used by them are released immediately. This
behavior enables the decoder to restart issuing instructions as quickly as possible.
The number of in-flight instructions depends on how many resources are needed by
them. The maximum number is 64.
6.4.3 Execution Stages
P (priority): Select an instruction from those that have met the conditions for
execution.
B (buffer read): Read register file, or receive forwarded data from another
pipeline.
X (execute): Execution.
Instructions in reservation stations will be executed when certain conditions are met;
for example, the values of source registers are known and the execution unit is
available. Execution latency varies from one cycle to many, depending on the
instruction.
Execution Stages for Cache Access  
Memory access requests are passed to the cache access pipeline after the target  
address is calculated. Cache access stages work the same way as instruction fetch  
stages, except for the handling of branch prediction. See Section 6.4.1, Instruction  
Fetch Stages, for details. Stages in instruction fetch and cache access correspond as  
follows:  
Instruction Fetch Stages   Cache Access Stages
IA                         Ps
IT                         Ts
IM                         Ms
IB                         Bs
IR                         Rs
When an exception is signalled, fetch ports and store ports used by memory access  
instructions are released. The cache access pipeline itself remains working in order to  
complete outgoing memory accesses. When data is returned, it is then stored to the  
cache.  
6.4.4 Completion Stages
U (Update): Update of physical (renamed) register.
W (Write): Update of architectural registers and retire; exception handling.
After an out-of-order execution, execution reverts to program order to complete.
Exception handling is done in the completion stages. Exceptions occurring in
execution stages are not handled immediately but are signalled when the
instruction is completed. (A RAS-related exception may be signalled before
completion.)
CHAPTER 7
Traps
Please refer to Chapter 7 of Commonality. Section numbers in this chapter  
correspond to those in Chapter 7 of Commonality.  
This chapter adds SPARC64 V-specific information in the following sections:
Processor States, Normal and Special Traps
Reset Traps
Uses of the Trap Categories
Trap Control
PIL Control
Trap-Table Entry Addresses
Trap Type (TT)
Details of Supported Traps
Exception and Interrupt Descriptions
7.1 Processor States, Normal and Special Traps
Please refer to Section 7.1 of Commonality.  
7.1.1 RED_state
RED_state Trap Table  
The RED_state trap vector is located at an implementation-dependent address
referred to as RSTVaddr. The value of RSTVaddr is a constant within each
implementation; in SPARC64 V this virtual address is FFFFFFFFF0000000₁₆,
which translates to physical address 000007FFF0000000₁₆ in RED_state (impl.
dep. #114).
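The two RSTVaddr constants are related by simple truncation: the physical address equals the low 43 bits of the virtual address. The Python sketch below checks that arithmetic. The 43-bit width is an assumption inferred from the two constants, not a statement from this section about how RED_state translation is implemented, and the names are hypothetical.

```python
RSTV_VADDR = 0xFFFFFFFFF0000000   # RED_state trap vector, virtual address
RSTV_PADDR = 0x000007FFF0000000   # corresponding physical address
PA_BITS = 43                      # assumed width, inferred from the constants

def truncate_to_physical(vaddr: int, pa_bits: int = PA_BITS) -> int:
    """Keep only the low pa_bits bits of a virtual address."""
    return vaddr & ((1 << pa_bits) - 1)
```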
RED_state Execution Environment  
In RED_state, the processor is forced to execute in a restricted environment by  
overriding the values of some processor controls and state registers.  
Note: The values are overridden, not set, allowing them to be switched atomically.
SPARC64 V has the following implementation-dependent behavior in RED_state  
(impl. dep. #115):  
While in RED_state, all internal ITLB-based translation functions are disabled.  
DTLB-based translations are disabled upon entry but may be reenabled by  
software while in RED_state. However, ASI-based access functions to the TLBs  
are still available.  
While mTLBs and uTLBs are disabled, all accesses are assumed to be  
noncacheable and strongly ordered for data access.  
XIR errors are not masked and can cause a trap.
Note: When RED_state is entered because of component failures, the handler
should attempt to recover from potentially catastrophic error conditions or to disable
the failing components. When RED_state is entered after a reset, the software
should create the environment necessary to restore the system to a running state.
7.1.2 error_state
The processor enters error_state when a trap occurs while the processor is
already at its maximum supported trap level (that is, when TL = MAXTL) (impl. dep.
#39).
Although the standard behavior of the CPU upon an entry into error_state is to
internally generate a watchdog_reset (WDR), the CPU optionally stays halted upon an
entry to error_state depending on a setting in the OPSR register (impl. dep. #40,
#254).
7.2 Trap Categories
Please refer to Section 7.2 of Commonality.  
An exception or interrupt request can cause any of the following trap types:  
Precise trap  
Deferred trap  
Disrupting trap  
Reset trap  
7.2.2 Deferred Traps
Please refer to Section 7.2.2 of Commonality.  
SPARC64 V implements a deferred trap to signal certain error conditions (impl. dep.
#32). Please refer to the description of the I_UGE error in the "Relation between %tpc
and the instruction that caused the error" row in TABLE P-2 (page 156) for details. See
also Instruction End-Method at ADE Trap on page 170.
7.2.4 Reset Traps
Please refer to Section 7.2.4 of Commonality.
In SPARC64 V, a watchdog reset (WDR) occurs when the processor has not
committed an instruction for 2^33 processor clocks.
7.2.5 Uses of the Trap Categories
Please refer to Section 7.2.5 of Commonality.
All exceptions that occur as the result of program execution are precise in  
SPARC64 V (impl. dep. #33).  
An exception caused after the initial access of a multiple-access load or store  
instruction (LDD(A), STD(A), LDSTUB, CASA, CASXA, or SWAP) that causes a  
catastrophic exception is precise in SPARC64 V.  
7.3 Trap Control
Please refer to Section 7.3 of Commonality.  
7.3.1 PIL Control
SPARC64 V receives external interrupts from the UPA interconnect. They cause an
interrupt_vector_trap (TT = 60₁₆). The interrupt vector trap handler reads the interrupt
information and then schedules SPARC V9-compatible interrupts by writing bits in
the SOFTINT register. Please refer to Section 5.2.11 of Commonality for details.
During handling of SPARC V9-compatible interrupts by SPARC64 V, the PIL  
register is checked. If an interrupt has sufficient priority, SPARC64 V will stop  
issuing new instructions, will flush all uncommitted instructions, and then will  
vector to the trap handler. The only exception to this process occurs when  
SPARC64 V is processing a higher-priority trap.  
SPARC64 V takes a normal disrupting trap upon receipt of an interrupt request.  
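The priority check against PIL can be sketched as follows. The model uses the standard SPARC V9 condition that an interrupt at level n is taken only when interrupts are enabled (PSTATE.IE = 1) and n is greater than PIL; it considers only SOFTINT bits 15:1 and ignores any other SOFTINT bits. The function name and signature are hypothetical.

```python
from typing import Optional

def pending_interrupt_level(softint: int, pil: int,
                            pstate_ie: int = 1) -> Optional[int]:
    """Return the highest SOFTINT level (15..1) above PIL, or None."""
    if not pstate_ie:
        return None                         # interrupts globally disabled
    for level in range(15, 0, -1):          # highest priority first
        if (softint >> level) & 1 and level > pil:
            return level
    return None
```

For example, with SOFTINT bits 3 and 7 set and PIL = 5, only the level-7 request has sufficient priority.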
7.4 Trap-Table Entry Addresses
Please refer to Section 7.4 of Commonality.  
7.4.2 Trap Type (TT)
Please refer to Section 7.4.2 of Commonality.  
SPARC64 V implements all mandatory SPARC V9 and SPARC JPS1 exceptions, as  
described in Chapter 7 of Commonality, plus the exception listed in TABLE 7-1, which  
is specific to SPARC64 V (impl. dep. #35; impl. dep. #36).  
TABLE 7-1 Exceptions Specific to SPARC64 V
Exception or Interrupt Request   TT      Priority
async_data_error                 040₁₆   2
7.4.4 Details of Supported Traps
Please refer to Section 7.4.4 in Commonality.  
SPARC64 V Implementation-Specific Traps  
SPARC64 V supports the following implementation-specific trap type:  
async_data_error  
7.5 Trap Processing
Please refer to Section 7.5 of Commonality.  
7.6 Exception and Interrupt Descriptions
Please refer to Section 7.6 of Commonality.  
7.6.4 SPARC V9 Implementation-Dependent, Optional Traps That Are Mandatory in SPARC JPS1
Please refer to Section 7.6.4 of Commonality.  
SPARC64 V implements all six traps that are implementation dependent in SPARC
V9 but mandatory in JPS1 (impl. dep. #35). See Section 7.6.4 of Commonality for
details.
7.6.5 SPARC JPS1 Implementation-Dependent Traps
Please refer to Section 7.6.5 of Commonality.  
SPARC64 V implements the following traps that are implementation dependent  
(impl. dep. #35).  
async_data_error [tt = 040₁₆] (Preemptive or disrupting) (impl. dep. #218):
SPARC64 V implements the async_data_error exception to signal the following
errors:
Uncorrectable errors in the internal architecture registers (general registers gr,
floating-point registers fr, ASR, ASI registers)
Uncorrectable errors in the core pipeline
System data corruption
Watchdog timeout (first time)
TLB access error upon access by an ldxa or stxa instruction
Multiple errors may be reported in a single generation of the async_data_error
exception. Depending on the situation, the async_data_error trap becomes a precise
trap, a disrupting trap, or a preemptive trap upon error detection. The TPC and
TNPC stacked by the exception may indicate the exact instruction, the preceding
instruction, or the subsequent instruction inducing the error. See Appendix P for
details of the async_data_error exception in SPARC64 V.
F.CHAPTER  
8
Memory Models  
The SPARC V9 architecture is a model that specifies the behavior observable by  
software on SPARC V9 systems. Therefore, access to memory can be implemented in  
any manner, as long as the behavior observed by software conforms to that of the  
models described in Chapter 8 of Commonality and defined in Appendix D, Formal  
Specification of the Memory Models, also in Commonality.  
The SPARC V9 architecture defines three different memory models: Total Store Order  
(TSO), Partial Store Order (PSO), and Relaxed Memory Order (RMO). All SPARC V9  
processors must provide Total Store Order (or a more strongly ordered model, for  
example, Sequential Consistency) to ensure SPARC V8 compatibility.  
Whether the PSO or RMO models are supported by SPARC V9 systems is  
implementation dependent; SPARC64 V behaves in a manner that guarantees  
adherence to whichever memory model is currently in effect.  
This chapter describes the following major SPARC64 V-specific details of memory  
models.  
SPARC V9 Memory Model on page 42  
For general information, please see parallel subsections of Chapter 8 in  
Commonality. For easier referencing, this chapter follows the organization of  
Chapter 8 in Commonality, listing subsections whether or not there are  
implementation-specific details.  
8.1  
Overview  
Note: The words "hardware memory model" denote the underlying hardware
memory models as differentiated from the "SPARC V9 memory model," which is the
memory model the programmer selects in PSTATE.MM.
SPARC64 V supports only one mode of memory handling to guarantee correct  
operation under any of the three SPARC V9 memory ordering models (impl. dep.  
#113):  
Total Store Order: All loads are ordered with respect to loads, and all stores are
ordered with respect to loads and stores. This behavior is a superset of the
requirements for the SPARC V9 memory models TSO, PSO, and RMO. When
PSTATE.MM selects TSO or PSO, SPARC64 V operates in this mode. Since
programs written for PSO (or RMO) will always work if run under Total Store
Order, this behavior is safe but does not take advantage of the reduced restrictions
of PSO.
8.4  
SPARC V9 Memory Model  
Please refer to Section 8.4 of Commonality.  
In addition, this section describes SPARC64 V-specific details about the processor/  
memory interface model.  
8.4.5  
Mode Control  
SPARC64 V implements Total Store Ordering for all values of PSTATE.MM. Writing
11₂ into PSTATE.MM also causes the machine to use TSO (impl. dep. #119). However,
the encoding 11₂ should not be used, since future versions of SPARC64 V may use
this encoding for a new memory model.
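
The mode-control rule can be summarized in a few lines. The following is only an illustrative sketch: the function names are hypothetical, and the 00/01/10 encodings for TSO/PSO/RMO are taken from the SPARC V9 definition of PSTATE.MM rather than from this supplement.

```python
# PSTATE.MM encodings as defined by SPARC V9; 0b11 is the reserved value.
REQUESTED_MODEL = {0b00: "TSO", 0b01: "PSO", 0b10: "RMO", 0b11: "reserved"}


def effective_memory_model(pstate_mm: int) -> str:
    """Return the model SPARC64 V enforces for a 2-bit PSTATE.MM value."""
    if pstate_mm not in REQUESTED_MODEL:
        raise ValueError("PSTATE.MM is a 2-bit field")
    # Every encoding, including the reserved 0b11, is executed as TSO.
    return "TSO"


def is_portable_encoding(pstate_mm: int) -> bool:
    """False for 0b11, which a future SPARC64 V version may redefine."""
    return REQUESTED_MODEL[pstate_mm] != "reserved"
```

Software that wants to stay forward compatible would check is_portable_encoding before writing PSTATE.MM, even though the current hardware treats all four values alike.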
8.4.6  
Synchronizing Instruction and Data Memory  
All caches in a SPARC64 V-based system (uniprocessor or multiprocessor) have a  
unified cache consistency protocol and implement strong coherence between  
instruction and data caches. Writes to any data cache cause invalidations to the  
corresponding locations in all instruction caches; references to any instruction cache  
cause corresponding modified data to be flushed and corresponding unmodified  
data to be invalidated from all data caches. The flush operation is still operative in  
SPARC64 V, however.  
Since the FLUSH instruction synchronizes the processor, the total latency varies
depending on the situation in SPARC64 V. Assuming all prior instructions are
completed, the latency of FLUSH is 18 CPU cycles.
F.APPENDIX  
A
Instruction Definitions:  
SPARC64 V Extensions  
This appendix describes the SPARC64 V-specific implementation of the instructions  
in Appendix A of Commonality. If an instruction is not described in this appendix,  
then no SPARC64 V implementation-dependency applies.  
See TABLE A-1 of Commonality for the location at which general information about  
the instruction can be found.  
Section numbers refer to the parallel section numbers in Appendix A of  
Commonality.  
TABLE A-1 lists four instructions that are unique to SPARC64 V.

TABLE A-1  Implementation-Specific Instructions

Operation     Name                                      Page      V9 Ext?
FMADD(s,d)    Floating-point multiply add               page 50
FMSUB(s,d)    Floating-point multiply subtract          page 50
FNMADD(s,d)   Floating-point multiply negate add        page 50
FNMSUB(s,d)   Floating-point multiply negate subtract   page 50

Each instruction definition consists of these parts:  
1. A table of the opcodes defined in the subsection with the values of the field(s)
that uniquely identify the instruction(s).
2. An illustration of the applicable instruction format(s). In these illustrations a dash
(-) indicates that the field is reserved for future versions of the architecture and
shall be 0 in any instance of the instruction. If a conforming SPARC V9
implementation encounters nonzero values in these fields, its behavior is
undefined.
3. A list of the suggested assembly language syntax, as described in Appendix G,
Assembly Language Syntax.
4. A description of the features, restrictions, and exception-causing conditions.  
5. A list of exceptions that can occur as a consequence of attempting to execute the  
instruction(s). Exceptions due to an instruction_access_error,  
instruction_access_exception, fast_instruction_access_MMU_miss, async_data_error,  
ECC_error, and interrupts are not listed because they can occur on any instruction.  
Also, any instruction that is not implemented in hardware shall generate an  
illegal_instruction exception (or fp_exception_other exception with  
ftt= unimplemented_FPop for floating-point instructions) when it is executed.  
The illegal_instruction trap can occur during chip debug on any instruction that has
been programmed into the processor's IIU_INST_TRAP (ASI = 60₁₆, VA = 0).
These traps are also not listed under each instruction.  
The following traps never occur in SPARC64 V:  
instruction_access_MMU_miss  
data_access_MMU_miss  
data_access_protection  
unimplemented_LDD  
unimplemented_STD  
internal_processor_error  
fp_exception_other (ftt= invalid_fp_register)  
This appendix does not include any timing information (in either cycles or clock  
time).  
The following SPARC64 V-specific extensions are described.  
Implementation-Dependent Instructions on page 49  
Load Quadword, Atomic [Physical] on page 54  
Memory Barrier on page 55  
Partial Store (VIS I) on page 57  
Prefetch Data on page 57  
Read State Register on page 58  
SHUTDOWN (VIS I) on page 58  
Write State Register on page 59  
Deprecated Instructions on page 59  
A.4  
Block Load and Store Instructions (VIS I)  
The following notes summarize behavior of block load/store instructions in
SPARC64 V.
1. Block load and store operations are not atomic, in that they are internally
decomposed into eight independent, 8-byte load/store operations in SPARC64 V.
Each load/store is always issued and performed in the RMO memory model and
obeys all prior MEMBAR and atomic instruction-imposed ordering constraints.
2. Block load/store instructions are out of the scope of V9 memory models, meaning
that self-consistency of memory reference instructions is not always maintained if
block load/store instructions are involved in the execution flow. The following
table describes the implemented ordering constraints for block load/store
instructions with respect to the other memory reference instructions with an
operand address conflict in SPARC64 V:

Program Order for conflicting bld/bst/ld/st
first         next          Ordered/Out-of-Order
store         blockstore    Ordered
store         blockload     Ordered
load          blockstore    Ordered
load          blockload     Ordered
blockstore    store         Out-of-Order
blockstore    load          Out-of-Order
blockstore    blockstore    Out-of-Order
blockstore    blockload     Out-of-Order
blockload     store         Ordered
blockload     load          Ordered
blockload     blockstore    Ordered
blockload     blockload     Ordered

To maintain the memory ordering even for memory address conflicts, MEMBAR
instructions shall be inserted into the appropriate locations in the program.
Although self-consistency with respect to the block load/store and the other
memory reference instructions is not maintained in some cases, register conflicts
between the other instructions and block load/store instructions are maintained
in SPARC64 V. The read-after-write, write-after-read, and write-after-write
obstructions between a block load/store instruction and the other arithmetic
instructions are detected and handled appropriately.
3. Block load instructions operate on the cache if the operand is present.
4. The block store with commit instruction always stores the operand in main
storage and invalidates the line in the L1D cache if it is present. The invalidation
is performed through an S_INV_REQ transaction through UPA by the system
controller.
5. The block store instruction stores the operand into main storage if it is not present
in the operand cache and the status of the line is invalid, shared, or owned. In
case the line is not present in the L1D cache and is exclusive or modified on the
L2 cache, the block store instruction modifies only the line in L2 cache. If the line
is present in the operand cache and the status is either clean/shared or clean/
owned, the line is stored in main storage. If the line is present in the operand
cache and the status is clean/exclusive, the line in the operand cache is
invalidated and the operand is stored in the L2 cache. If the line is in the operand
cache and the status is modified/modified, the operand is stored in the operand
cache. The following table summarizes each cache status before block store and
the results of the block store. Blank cells mean that no action occurred in the
corresponding cache or memory, and the data, if it exists, is unchanged.

Cache status before bst     Action
L1 status   L2 status       L1           L2        Memory
Invalid     E, M                         update
Invalid     I, S, O                                update
Valid       E               invalidate   update
Valid       M               update
Valid       S, O                         S         update

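
The ordering table in note 2 reduces to a single rule: among the listed pairs, only those whose first access is a block store are performed out of order. The sketch below encodes that reading as a helper that tells a programmer whether a MEMBAR is needed between two conflicting accesses. The function name and the short access names ("ld", "st", "bld", "bst") are hypothetical, and pairs of ordinary loads and stores are outside the table and are simply reported as not needing a MEMBAR for this reason.

```python
ACCESS_KINDS = {"ld", "st", "bld", "bst"}  # illustrative short names


def needs_membar(first: str, nxt: str) -> bool:
    """True if two address-conflicting accesses, in program order,
    are not kept ordered by SPARC64 V according to the table in note 2."""
    if first not in ACCESS_KINDS or nxt not in ACCESS_KINDS:
        raise ValueError("unknown access kind")
    if "bld" not in (first, nxt) and "bst" not in (first, nxt):
        # Plain load/store pairs are governed by the memory model,
        # not by the block load/store table.
        return False
    # Every listed pair that starts with a block store is Out-of-Order;
    # all other listed pairs are Ordered.
    return first == "bst"
```

For example, a block store followed by a load of the same address would call for a MEMBAR in between, while a block load followed by a block store would not.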
Exceptions  
fp_disabled  
PA_watchpoint  
VA_watchpoint  
illegal_instruction (misaligned rd)  
mem_address_not_aligned (see Block Load and Store ASIs on page 120)  
data_access_exception (see Block Load and Store ASIs on page 120)  
LDDF_mem_address_not_aligned (see Block Load and Store ASIs on page 120)  
data_access_error  
fast_data_access_MMU_miss  
fast_data_access_protection  
A.12 Call and Link  
SPARC64 V clears the upper 32 bits of the PC value in r[15] when PSTATE.AM is
set (impl. dep. #125). The value written into r[15] is visible to the instruction in the
delay slot.
SPARC64 V has a special hardware table, called the return address stack, to predict
the return address from a subroutine. Though the return address stack achieves
better performance in normal cases, there is a special use of the CALL instruction
(call .+8) that may have an undesirable effect on the return address stack. In this
case, the CALL instruction is used to read the PC contents, not to call a subroutine. In
SPARC64 V, the return address of the CALL (PC+8) is not stored in its return
address stack, to avoid a detrimental performance effect. When a ret or retl is
executed, the value in the return address stack is used to predict the return address.
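
A small behavioral model may help make the call .+8 special case concrete. This is a sketch only: the class name, the stack depth of 8, the overflow policy, and the assumption that a return pops the entry it uses are all hypothetical and are not stated in this supplement.

```python
class ReturnAddressStack:
    """Toy model of a return address predictor (depth is an assumption)."""

    def __init__(self, depth: int = 8):
        self.depth = depth
        self.entries = []

    def on_call(self, pc: int, target: int) -> None:
        """Record the return address PC+8 unless this is the call .+8 idiom."""
        if target == pc + 8:
            # call .+8 only reads the PC; pushing it would desynchronize
            # the stack from the real call/return nesting.
            return
        if len(self.entries) == self.depth:
            self.entries.pop(0)  # assumed policy: drop the oldest entry
        self.entries.append(pc + 8)

    def predict_return(self):
        """Predicted target for ret/retl, or None if the stack is empty."""
        return self.entries.pop() if self.entries else None
```

With this model, a real call at 0x1000 followed by a call .+8 at 0x2004 still predicts 0x1008 for the next return, which is the effect the text describes.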
A.24 Implementation-Dependent Instructions  
Opcode     op3       Operation
IMPDEP1    11 0110   Implementation-Dependent Instruction 1
IMPDEP2    11 0111   Implementation-Dependent Instruction 2
The IMPDEP1 and IMPDEP2 instructions are completely implementation dependent.
Implementation-dependent aspects include their operation, the interpretation of bits
29:25 and 18:0 in their encodings, and which (if any) exceptions they may cause.
SPARC64 V uses IMPDEP1 to encode VIS instructions (impl. dep. #106).
SPARC64 V uses IMPDEP2B to encode the Floating-Point Multiply Add/Subtract
instructions (impl. dep. #106). See Section A.24.1, Floating-Point Multiply-Add/
Subtract, on page 50 for details.
See I.1.2, Implementation-Dependent and Reserved Opcodes, in Commonality for
information about extending the SPARC V9 instruction set by means of the
implementation-dependent instructions.
Compatibility Note: These instructions replace the CPopn instructions in
SPARC V8.
Exceptions  
implementation-dependent (IMPDEP2)  
A.24.1  
Floating-Point Multiply-Add/ Subtract  
SPARC64 V uses the IMPDEP2B opcode space to encode the Floating-Point Multiply
Add/Subtract instructions.

Opcode     Variation   Size†   Operation
FMADDs     00          01      Multiply-Add Single
FMADDd     00          10      Multiply-Add Double
FMSUBs     01          01      Multiply-Subtract Single
FMSUBd     01          10      Multiply-Subtract Double
FNMADDs    11          01      Negative Multiply-Add Single
FNMADDd    11          10      Negative Multiply-Add Double
FNMSUBs    10          01      Negative Multiply-Subtract Single
FNMSUBd    10          10      Negative Multiply-Subtract Double

† 11 is reserved for quad.

Format (5)

| 10    | rd    | 110111 | rs1   | rs3   | var  | size | rs2  |
 31 30   29 25   24   19  18 14   13  9   8  7   6  5   4  0

Operation                     Implementation
Multiply-Add                  rd ← rs1 × rs2 + rs3
Multiply-Subtract             rd ← rs1 × rs2 − rs3
Negative Multiply-Subtract    rd ← −(rs1 × rs2 − rs3)
Negative Multiply-Add         rd ← −(rs1 × rs2 + rs3)

Assembly Language Syntax
fmadds     freg_rs1, freg_rs2, freg_rs3, freg_rd
fmaddd     freg_rs1, freg_rs2, freg_rs3, freg_rd
fmsubs     freg_rs1, freg_rs2, freg_rs3, freg_rd
fmsubd     freg_rs1, freg_rs2, freg_rs3, freg_rd
fnmadds    freg_rs1, freg_rs2, freg_rs3, freg_rd
fnmaddd    freg_rs1, freg_rs2, freg_rs3, freg_rd
fnmsubs    freg_rs1, freg_rs2, freg_rs3, freg_rd
fnmsubd    freg_rs1, freg_rs2, freg_rs3, freg_rd
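
As an illustration of Format (5), the sketch below packs and unpacks the instruction word using the field positions given above (op in bits 31:30, rd in 29:25, op3 in 24:19, rs1 in 18:14, rs3 in 13:9, var in 8:7, size in 6:5, rs2 in 4:0). The function names are hypothetical, and the register arguments are the raw 5-bit field values; the SPARC V9 rules for encoding double-precision register numbers into a 5-bit field are not modeled.

```python
VARIATION = {"fmadd": 0b00, "fmsub": 0b01, "fnmsub": 0b10, "fnmadd": 0b11}
SIZE = {"s": 0b01, "d": 0b10}  # 00 and 11 raise illegal_instruction
OP, OP3_IMPDEP2 = 0b10, 0b110111


def encode_fma(mnemonic: str, rs1: int, rs2: int, rs3: int, rd: int) -> int:
    """Build the 32-bit word for e.g. 'fmadds' from raw 5-bit fields."""
    var, size = VARIATION[mnemonic[:-1]], SIZE[mnemonic[-1]]
    for field in (rs1, rs2, rs3, rd):
        if not 0 <= field < 32:
            raise ValueError("register fields are 5 bits wide")
    return ((OP << 30) | (rd << 25) | (OP3_IMPDEP2 << 19) | (rs1 << 14)
            | (rs3 << 9) | (var << 7) | (size << 5) | rs2)


def decode_fma(word: int):
    """Inverse of encode_fma; returns (mnemonic, rs1, rs2, rs3, rd)."""
    if (word >> 30) != OP or ((word >> 19) & 0x3F) != OP3_IMPDEP2:
        raise ValueError("not an IMPDEP2 instruction")
    size = {v: k for k, v in SIZE.items()}.get((word >> 5) & 0x3)
    if size is None:
        raise ValueError("size 00 or 11: illegal_instruction")
    base = {v: k for k, v in VARIATION.items()}[(word >> 7) & 0x3]
    return (base + size, (word >> 14) & 0x1F, word & 0x1F,
            (word >> 9) & 0x1F, (word >> 25) & 0x1F)
```

A round trip through encode_fma and decode_fma is a quick way to check an assembler or disassembler table against the opcode variations listed in this section.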
Description  
The Floating-point Multiply-Add instructions multiply the registers specified by the
rs1 field times the registers specified by the rs2 field, add that product to the
registers specified by the rs3 field, then write the result into the registers specified
by the rd field.
The Floating-point Multiply-Subtract instructions multiply the registers specified by
the rs1 field times the registers specified by the rs2 field, subtract from that
product the registers specified by the rs3 field, and then write the result into the
registers specified by the rd field.
The Floating-point Negative Multiply-Add instructions multiply the registers
specified by the rs1 field times the registers specified by the rs2 field, negate the
product, subtract from that negated value the registers specified by the rs3 field, and
then write the result into the registers specified by the rd field.
The Floating-point Negative Multiply-Subtract instructions multiply the registers
specified by the rs1 field times the registers specified by the rs2 field, negate the
product, add that negated product to the registers specified by the rs3 field, and
then write the result into the registers specified by the rd field.
All of the operations above are treated as separate multiply and add/subtract
operations in SPARC64 V. That is, a multiply operation is first performed with a
complete rounding step (as if it were a single multiply operation), and then an add/
subtract operation is performed with a complete rounding step (as if it were a single
add/subtract operation). Consequently, at most two rounding errors can be
incurred (see footnote 1).
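
The difference between two rounding steps and one can be seen with ordinary double-precision arithmetic. The sketch below is illustrative only: the function names are hypothetical, it assumes the host evaluates Python floats as IEEE 754 doubles with round-to-nearest-even, and the single-rounding variant is emulated with exact rational arithmetic rather than any hardware feature.

```python
from fractions import Fraction


def fmadd_two_roundings(a: float, b: float, c: float) -> float:
    """Multiply, round, then add, round: the behavior described above."""
    product = a * b      # first rounding
    return product + c   # second rounding


def fmadd_one_rounding(a: float, b: float, c: float) -> float:
    """Exact a*b + c rounded once, as a fused implementation would do."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # single, correctly rounded conversion


# Operands chosen so that a*b = 1 - 2**-54 lies exactly halfway between
# two doubles and is rounded up to 1.0 by the separate multiply.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0
```

With these operands the two-step evaluation returns 0.0, while the single-rounding evaluation returns -2**-54. Code that depends on the fused result, for example some error-compensated summation routines, would therefore behave differently under the two-rounding scheme.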
Special behaviors in handling traps are generated in a Floating-point Multiply-Add/
Subtract instruction in SPARC64 V because of its implementation characteristics. If
any trapping exception is detected in the multiply part in the process of a
Floating-point Multiply-Add/Subtract instruction, the execution of the instruction is aborted,
the exception condition is recorded in FSR.cexc and FSR.aexc, and the CPU traps
with the exception condition. The add/subtract part of the instruction is only
performed when the multiply part of the instruction does not have any trapping
exceptions.
As described in TABLE A-2, if there are trapping IEEE754 exception conditions in
either of the operations FMUL or FADD/SUB, only the trapping exception condition is
recorded in cexc, and aexc is not modified. If there are no trapping IEEE754
exception conditions, every nontrapping exception condition is ORed into cexc,
and cexc is accumulated into aexc. The boundary conditions of an
unfinished_FPop trap for Floating-point Multiply-Add/Subtract instructions are
exactly the same as for FMUL and FADD/SUB instructions; if either of the operations
detects any conditions for an unfinished_FPop trap, the Floating-point Multiply-Add/
Subtract instruction generates the unfinished_FPop exception. In this case, none of rd,
cexc, or aexc is modified.

1. Note that this implementation differs from previous SPARC64 implementations,
which incurred at most one rounding error.

TABLE A-2  Exceptions in Floating-Point Multiply-Add/Subtract Instructions

FMUL       IEEE754 trap          No trap               No trap
FADD/SUB   -                     IEEE754 trap          No trap
cexc       Exception condition   Exception condition   Logical OR of the nontrapping
           of FMUL               of FADD               exception conditions of FMUL
                                                       and FADD/SUB
aexc       No change             No change             Logical OR of the cexc (above)
                                                       and the aexc
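
The three columns of TABLE A-2 can be read as a small decision procedure. The following sketch expresses it under stated assumptions: the function name is hypothetical, the bit positions of the exception flags follow the SPARC V9 FSR layout (nx, dz, uf, of, nv from bit 0 upward), an exception is treated as trapping when its bit is set in the TEM mask, and priority among several simultaneously trapping conditions is not modeled.

```python
NX, DZ, UF, OF, NV = 0x01, 0x02, 0x04, 0x08, 0x10  # cexc/aexc/TEM bit order


def fmadd_exception_update(mul_exc: int, add_exc: int, tem: int, aexc: int):
    """Return (traps, cexc, new_aexc) for one multiply-add/subtract.

    mul_exc / add_exc: IEEE conditions raised by the FMUL and FADD/SUB parts.
    tem: trap enable mask; aexc: accrued exceptions before the instruction.
    """
    if mul_exc & tem:
        # Column 1: FMUL traps, the add/subtract part is not performed.
        return True, mul_exc & tem, aexc
    if add_exc & tem:
        # Column 2: FMUL is clean or nontrapping, FADD/SUB traps.
        return True, add_exc & tem, aexc
    # Column 3: no trap; OR both parts into cexc, then accumulate into aexc.
    cexc = mul_exc | add_exc
    return False, cexc, aexc | cexc
```

For instance, an inexact multiply followed by an underflowing, inexact add with all traps disabled yields cexc = uf|nx and ORs those bits into aexc.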
Detailed contents of cexc and aexc depending on the various conditions are
described in TABLE A-3 and TABLE A-4. The following terminology is used: uf, of, inv,
and nx are nontrapping IEEE exception conditions: underflow, overflow, invalid
operation, and inexact, respectively.
TABLE A-3  Non-Trapping cexc When FSR.NS = 0
FADD  
none  
none  
nx  
nx  
of nx  
of nx  
of nx  
of nx  
uf of nx  
inv  
none  
nx  
nx  
inv  
nx  
inv nx  
inv of nx  
uf inv nx  
inv  
FMUL  
of nx  
uf nx  
inv  
of nx  
uf nx  
inv  
of nx  
uf nx  
TABLE A-4  Non-Trapping aexc When FSR.NS = 1
FADD  
none  
none  
nx  
nx  
of nx  
of nx  
of nx  
of nx  
uf nx  
uf nx  
uf nx  
inv  
none  
nx  
nx  
inv  
nx  
inv nx  
inv of nx  
uf inv nx  
inv  
FMUL  
of nx  
uf nx  
inv  
of nx  
uf nx  
inv  
of nx  
In the tables, the conditions in the shaded columns are all reported as an
unfinished_FPop trap by SPARC64 V. In addition, the conditions marked with a dash
do not exist.
Programming Note: The Multiply Add/Subtract instructions are encoded in the
SPARC V9 IMPDEP2 opcode space, and they are specific to the SPARC64 V
implementation. They cannot be used in any programs that will be executed on any
other SPARC V9 processor, unless that implementation exactly matches the
SPARC64 V use for the IMPDEP2 opcode.
Exceptions  
fp_disabled  
fp_exception_ieee_754 (NV, NX, OF, UF)  
illegal_instruction (size = 00₂ or 11₂) (fp_disabled is not checked for these encodings)
fp_exception_other (unfinished_FPop)  
A.29 Jump and Link  
SPARC64 V clears the upper 32 bits of the PC value in r[rd] when PSTATE.AM is set
(impl. dep. #125). The value written into r[rd] is visible to the instruction in the
delay slot.
If either of the low-order two bits of the jump address is nonzero, a
mem_address_not_aligned exception occurs. However, when the JMPL instruction
causes a mem_address_not_aligned trap, DSFSR and DSFAR are not updated.
If the JMPL instruction has rd = 15, SPARC64 V stores PC + 8 in a hardware table
called the return address stack (RAS). When a ret (jmpl %i7+8, %g0) or retl (jmpl
%o7+8, %g0) is executed, the value in the RAS is used to predict the return address.
JMPL with rd = 0 can be used to return from a subroutine. The typical return
address is "r[31] + 8" if a nonleaf routine (one that uses the SAVE instruction) is
entered by a CALL instruction, or "r[15] + 8" if a leaf routine (one that does not
use the SAVE instruction) is entered by a CALL instruction or by a JMPL instruction
with rd = 15.
A.30 Load Quadword, Atomic [Physical]  
The Load Quadword ASIs in this section are specific to SPARC64 V, as an extension  
to SPARC JPS1.  
Format (3) LDDA  
Description  
ASIs 34₁₆ and 3C₁₆ are used with the LDDA instruction to atomically read a 128-bit  
TTE.NFO= 0  
TTE.CP = 1  
TTE.CV = 0  
TTE.E = 0  
TTE.P = 1  
TTE.W = 0  
Note: TTE.IE depends on the endianness of the ASI. When the ASI is 034₁₆,
TTE.IE = 0; TTE.IE = 1 when the ASI is 03C₁₆.
Therefore, the atomic quad load physical instruction can only be applied to a
cacheable memory area. Semantically, ASI_QUAD_LDD_PHYS{_L} (034₁₆ and
03C₁₆) is a combination of ASI_NUCLEUS_QUAD_LDD and ASI_PHYS_USE_EC.
With respect to little endian memory, a Load Quadword Atomic instruction behaves  
as if it comprises two 64-bit loads, each of which is byte-swapped independently  
before being written into its respective destination register.  
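
The little-endian rule above can be illustrated with a short model. This is a sketch under assumptions: the function name is hypothetical, and it assumes that the lower-addressed 8 bytes go to the first (even) destination register, which this supplement does not restate.

```python
def load_quad(mem16: bytes, little_endian: bool):
    """Model a 128-bit quad load as two independent 64-bit loads.

    Returns (first_register, second_register). For the little-endian ASI,
    each 8-byte half is byte-swapped on its own; the halves are not
    exchanged with each other.
    """
    if len(mem16) != 16:
        raise ValueError("a quad load transfers exactly 16 bytes")
    order = "little" if little_endian else "big"
    return (int.from_bytes(mem16[:8], order),
            int.from_bytes(mem16[8:], order))
```

Note the contrast with a single 128-bit byte swap, which would place the byte at the highest address in the most significant position of the first register; here that byte stays in the second register.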
Exceptions:  
privileged_action  
PA_watchpoint (recognized on only the first 8 bytes of a transfer)  
illegal_instruction (misaligned rd)  
mem_address_not_aligned  
data_access_exception  
data_access_error  
fast_data_access_MMU_miss  
fast_data_access_protection  
A.35 Memory Barrier  
Format (3)

| 10    | 0     | op3   | 0 1111 | i=1 | -     | cmask | mmask |
 31 30   29 25   24 19   18  14    13   12  7   6   4   3   0

Assembly Language Syntax
membar    membar_mask
Description  
The memory barrier instruction, MEMBAR, has two complementary functions: to
express order constraints between memory references and to provide explicit control
of memory-reference completion. The membar_mask field in the suggested assembly
language is the concatenation of the cmask and mmask instruction fields.
The mmask field is encoded in bits 3 through 0 of the instruction. TABLE A-5 specifies
the order constraint that each bit of mmask (selected when set to 1) imposes on
memory references appearing before and after the MEMBAR. From zero to four mask
bits can be selected in the mmask field.
TABLE A-5  Order Constraints Imposed by mmask Bits

Mask Bit   Name          Description
mmask<3>   #StoreStore
    The effects of all stores appearing before the MEMBAR instruction must be
    visible to all processors before the effect of any stores following the MEMBAR.
    Equivalent to the deprecated STBAR instruction. Has no effect on SPARC64 V
    since all stores are performed in program order.
mmask<2>   #LoadStore
    All loads appearing before the MEMBAR instruction must have been performed
    before the effects of any stores following the MEMBAR are visible to any other
    processor. Has no effect on SPARC64 V since all stores are performed in
    program order and must occur after performance of any load.
mmask<1>   #StoreLoad
    The effects of all stores appearing before the MEMBAR instruction must be
    visible to all processors before loads following the MEMBAR may be performed.
mmask<0>   #LoadLoad
    All loads appearing before the MEMBAR instruction must have been performed
    before any loads following the MEMBAR may be performed. Has no effect on
    SPARC64 V since all loads are performed after any prior loads.

The cmask field is encoded in bits 6 through 4 of the instruction. Bits in the cmask
field, described in TABLE A-6, specify additional constraints on the order of memory
references and the processing of instructions. If cmask is zero, then MEMBAR enforces
the partial ordering specified by the mmask field; if cmask is nonzero, then
completion and partial order constraints are applied.
TABLE A-6  Bits in the cmask Field

Mask Bit   Function                  Name
cmask<2>   Synchronization barrier   #Sync
    All operations (including nonmemory reference operations)
    appearing before the MEMBAR must have been performed, and
    the effects of any exceptions become visible before any
    instruction after the MEMBAR may be initiated.
cmask<1>   Memory issue barrier      #MemIssue
    All memory reference operations appearing before the MEMBAR
    must have been performed before any memory operation after
    the MEMBAR may be initiated. Equivalent to #Sync in
    SPARC64 V.
cmask<0>   Lookaside barrier         #Lookaside
    A store appearing before the MEMBAR must complete before
    any load following the MEMBAR referencing the same address
    can be initiated. Equivalent to #Sync in SPARC64 V.
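
To tie the two mask tables together, the sketch below builds the 7-bit membar_mask (cmask in bits 6:4, mmask in bits 3:0) and reduces it to the constraints that actually change behavior on SPARC64 V according to the descriptions above. The function names are hypothetical; the bit values follow directly from the mask bit positions.

```python
MMASK = {"#LoadLoad": 0x01, "#StoreLoad": 0x02,
         "#LoadStore": 0x04, "#StoreStore": 0x08}
CMASK = {"#Lookaside": 0x10, "#MemIssue": 0x20, "#Sync": 0x40}


def membar_mask(*names: str) -> int:
    """OR together the named constraints into a 7-bit membar_mask."""
    table = {**MMASK, **CMASK}
    mask = 0
    for name in names:
        mask |= table[name]
    return mask


def effective_on_sparc64v(mask: int) -> set:
    """Constraints that are not no-ops on SPARC64 V for a given mask."""
    effective = set()
    if mask & MMASK["#StoreLoad"]:
        effective.add("#StoreLoad")  # the only mmask bit with an effect
    if mask & 0x70:
        effective.add("#Sync")       # #MemIssue and #Lookaside act as #Sync
    return effective
```

For example, membar_mask("#LoadLoad", "#StoreStore") reduces to an empty set, which reflects that those two bits alone impose nothing beyond what the hardware already guarantees.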
A.42 Partial Store (VIS I)  
Please refer to Section A.42 in Commonality for general details.
Watchpoint exceptions on partial store instructions occur conservatively on
SPARC64 V. The DCUCR Data Watchpoint masks are only checked for a nonzero value
(watchpoint enabled). The byte store mask (r[rs2]) in the partial store instruction
is ignored, and a watchpoint exception can occur even if the mask is zero (that is, no
store will take place) (impl. dep. #249).
For a partial store instruction with mask = 0, SPARC64 V still issues a UPA
transaction with a zero-byte mask.
Exceptions:  
fp_disabled  
PA_watchpoint  
VA_watchpoint  
illegal_instruction (misaligned rd)  
mem_address_not_aligned (see Partial Store ASIs on page 120)  
data_access_exception (see Partial Store ASIs on page 120)  
LDDF_mem_address_not_aligned (see Partial Store ASIs on page 120)  
data_access_error  
fast_data_access_MMU_miss  
fast_data_access_protection  
A.49 Prefetch Data  
Please refer to Section A.49, Prefetch Data, of Commonality for principal information.  
The prefetcha instruction of SPARC64 V works for the following ASIs.
ASI_PRIMARY (080₁₆), ASI_PRIMARY_LITTLE (088₁₆)
ASI_SECONDARY (081₁₆), ASI_SECONDARY_LITTLE (089₁₆)
ASI_NUCLEUS (04₁₆), ASI_NUCLEUS_LITTLE (0C₁₆)
ASI_PRIMARY_AS_IF_USER (010₁₆), ASI_PRIMARY_AS_IF_USER_LITTLE (018₁₆)
ASI_SECONDARY_AS_IF_USER (011₁₆), ASI_SECONDARY_AS_IF_USER_LITTLE (019₁₆)
If an ASI other than the above is specified, prefetcha is executed as a nop.
TABLE A-7 describes prefetch variants implemented in SPARC64 V.  
TABLE A-7  Prefetch Variants

fcn     Fetch to:   Status   Description
0       L1D         S
1       L2          S
2       L1D         M
3       L2          M
4                            NOP
5-15                         reserved (SPARC V9); illegal_instruction exception
                             is signalled.
16-19                        implementation dependent; NOP.
20      L1D         S        If an access causes an mTLB miss,
                             fast_data_access_MMU_miss exception is signalled.
21      L2          S        If an access causes an mTLB miss,
                             fast_data_access_MMU_miss exception is signalled.
22      L1D         M        If an access causes an mTLB miss,
                             fast_data_access_MMU_miss exception is signalled.
23      L2          M        If an access causes an mTLB miss,
                             fast_data_access_MMU_miss exception is signalled.
24-31                        implementation dependent; NOP.
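
The prefetch variants can be expressed as a lookup on the fcn field. The sketch below follows one reading of TABLE A-7, in which fcn 0 to 3 select L1D/S, L2/S, L1D/M, and L2/M, and fcn 20 to 23 select the same targets but signal fast_data_access_MMU_miss on an mTLB miss. The function name and the returned dictionary keys are hypothetical.

```python
PREFETCH_TARGETS = {0: ("L1D", "S"), 1: ("L2", "S"),
                    2: ("L1D", "M"), 3: ("L2", "M")}


def prefetch_behavior(fcn: int) -> dict:
    """Describe what a prefetch with the given 5-bit fcn does."""
    if not 0 <= fcn <= 31:
        raise ValueError("fcn is a 5-bit field")
    if fcn in PREFETCH_TARGETS or 20 <= fcn <= 23:
        traps_on_miss = fcn >= 20
        cache, status = PREFETCH_TARGETS[fcn - 20 if traps_on_miss else fcn]
        return {"action": "prefetch", "cache": cache, "status": status,
                "mmu_miss_trap": traps_on_miss}
    if 5 <= fcn <= 15:
        # Reserved by SPARC V9.
        return {"action": "illegal_instruction"}
    # fcn 4, 16-19, and 24-31 execute as a NOP.
    return {"action": "nop"}
```

A compiler or library that selects prefetch variants could use such a table to avoid emitting the reserved range and to decide whether a trapping variant is acceptable in a given code path.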
A.51 Read State Register  
In SPARC64 V, an RDPCR instruction will generate a privileged_action exception if
PSTATE.PRIV = 0 and PCR.PRIV = 1. If PSTATE.PRIV = 0 and PCR.PRIV = 0,
RDPCR will not cause any access privilege violation exception (impl. dep. #250).
A.70 SHUTDOWN (VIS I)  
In SPARC64 V, SHUTDOWN acts as a NOP in privileged mode (impl. dep. #206).
A.70 Write State Register  
In SPARC64 V, a WRPCR instruction will cause a privileged_action exception if
PSTATE.PRIV = 0 and PCR.PRIV = 1. If PSTATE.PRIV = 0 and PCR.PRIV = 0,
WRPCR causes a privileged_action exception only when an attempt is made to change
(that is, write 1 to) PCR.PRIV (impl. dep. #250).
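
The access rules for the PCR can be written as two predicates. This is a sketch: the function names are hypothetical, and the treatment of privileged mode (PSTATE.PRIV = 1) as never raising privileged_action is an assumption, since the text only describes the nonprivileged cases.

```python
def rdpcr_traps(pstate_priv: int, pcr_priv: int) -> bool:
    """True if RDPCR raises privileged_action."""
    return pstate_priv == 0 and pcr_priv == 1


def wrpcr_traps(pstate_priv: int, pcr_priv: int, new_pcr_priv: int) -> bool:
    """True if WRPCR raises privileged_action.

    new_pcr_priv is the value the instruction tries to write to PCR.PRIV.
    """
    if pstate_priv == 1:
        return False             # assumed: privileged code may always write
    if pcr_priv == 1:
        return True              # PCR is reserved for privileged access
    return new_pcr_priv == 1     # user code may not set PCR.PRIV
```

Together the predicates show that user code can read and update the performance counters only while PCR.PRIV is 0, and cannot lock itself into privileged-only access.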
A.71 Deprecated Instructions  
The deprecated instructions in A.71 of Commonality are provided only for  
compatibility with previous versions of the architecture. They should not be used in  
new software.  
A.71.10 Store Barrier  
In SPARC64 V, STBAR behaves as a NOP since the hardware memory models always
enforce the semantics of these MEMBARs for all memory accesses.
F.APPENDIX  
B
IEEE Std 754-1985 Requirements for  
SPARC V9  
The IEEE Std 754-1985 floating-point standard contains a number of implementation  
dependencies.  
Please see Appendix B of Commonality for choices for these implementation  
dependencies, to ensure that SPARC V9 implementations are as consistent as  
possible.  
Following is information specific to the SPARC64 V implementation of SPARC V9 in  
these sections:  
Traps Inhibiting Results on page 61  
Floating-Point Nonstandard Mode on page 61  
B.1
Traps Inhibiting Results
Please refer to Section B.1 of Commonality.
The SPARC64 V hardware, in conjunction with kernel or emulation code, produces
the results described in this section.
B.6
Floating-Point Nonstandard Mode
In this section, the hardware boundary conditions for the unfinished_FPop exception  
and the nonstandard mode of SPARC64 V floating-point hardware are discussed.  
SPARC64 V floating-point hardware has its specific range of computation. If either
the values of input operands or the value of the intermediate result shows that the
computation may not fall in the range that hardware provides, SPARC64 V generates
an fp_exception_other exception (tt = 022₁₆) with FSR.ftt = 02₁₆ (unfinished_FPop)
and the operation is taken over by software.
The kernel emulation routine completes the remaining floating-point operation in
accordance with the IEEE 754-1985 floating-point standard (impl. dep. #3).
SPARC64 V implements a nonstandard mode, enabled when FSR.NS is set (see
FSR_nonstandard_fp (NS) on page 18). Depending on the setting in FSR.NS, the
behavior of SPARC64 V with respect to the floating-point computation varies.
B.6.1  
fp_exception_other Exception (ftt=unfinished_FPop)  
SPARC64 V may invoke an fp_exception_other (tt = 022₁₆) exception with FSR.ftt =
unfinished_FPop (ftt = 02₁₆) in FsTOd, FdTOs, FADD(s,d), FSUB(s,d),
FsMULd, FMUL(s,d), FDIV(s,d), and FSQRT(s,d) floating-point instructions. In
addition, Floating-point Multiply-Add/Subtract instructions generate the exception,
since each instruction is the combination of a multiply and an add/subtract operation:
FMADD(s,d), FMSUB(s,d), FNMADD(s,d), and FNMSUB(s,d).
The following basic policies govern the detection of boundary conditions:  
1. When one of the operands is a denormalized number and the other operand is a  
normal non-zero floating-point number (except for a NaN or an infinity), an  
fp_exception_other with unfinished_FPop condition is signalled. The cases in which  
the result is a zero or an overflow are excluded.  
2. When both operands are denormalized numbers, except for the cases in which the  
result is a zero or an overflow, an fp_exception_other with unfinished_FPop condition  
is signalled.  
3. When both operands are normal, the result before rounding is a denormalized
number, and TEM.UFM = 0, an fp_exception_other with unfinished_FPop condition
is signalled, except for the cases in which the result is a zero.
When the result is expected to be a constant, such as an exact zero or an infinity, and  
an insignificant computation will furnish the result, SPARC64 V tries to calculate the  
result without signalling an unfinished_FPop exception.  
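The three policies above depend only on the class of each operand, so they can be illustrated with a short classifier. The sketch below works on double-precision values and reports which policy (1, 2, or 3) is a candidate for a given operand pair. It does not model the zero and overflow exclusions or the TEM.UFM test, and the function names are hypothetical.

```python
import struct


def classify_double(x: float) -> str:
    """Classify an IEEE 754 double by its exponent and fraction fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if exponent == 0x7FF:
        return "nan" if fraction else "infinity"
    if exponent == 0:
        return "denormal" if fraction else "zero"
    return "normal"


def candidate_policy(op1: float, op2: float):
    """Return 1, 2, or 3 for the matching basic policy, or None if none applies."""
    c1, c2 = classify_double(op1), classify_double(op2)
    if c1 == "denormal" and c2 == "denormal":
        return 2                      # both operands denormalized
    if {c1, c2} == {"denormal", "normal"}:
        return 1                      # one denormalized, one normal nonzero
    if c1 == "normal" and c2 == "normal":
        return 3                      # signalled only if the result is denormal and UFM = 0
    return None                       # a zero, NaN, or infinity is involved
```

For example, `candidate_policy(5e-324, 1.0)` selects policy 1 because 5e-324 is the smallest double-precision denormalized number.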
Implementation Note: Detecting the exact boundary conditions requires a large
amount of hardware. To avoid that cost, SPARC64 V detects approximate boundary
conditions by calculating the exponent of the intermediate result (the exponent
before rounding) from the input operands. Because the computation of the boundary
conditions is approximate, the detection of a zero result or an overflow result is
pessimistic, and SPARC64 V generates an unfinished_FPop exception pessimistically.
The equations to calculate the result exponent to detect the boundary conditions  
from the input exponents are presented in TABLE B-1, where Er is the approximation  
of the biased result exponent before rounding and is calculated only from the input  
exponents (esrc1, esrc2). Er is to be used for detecting the boundary condition for an  
unfinished_FPop.  
TABLE B-1 Result Exponent Approximation for Detecting unfinished_FPop Boundary
Conditions

Operation   Formula
fmuls       Er = esrc1 + esrc2 - 126
fmuld       Er = esrc1 + esrc2 - 1022
fdivs       Er = esrc1 - esrc2 + 126
fdivd       Er = esrc1 - esrc2 + 1022
esrc1 and esrc2 are the biased exponents of the input operands. When the  
corresponding input operand is a denormalized number, the value is 0.  
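The formulas of TABLE B-1 can be tried out with a few lines of code. The following is a sketch only: it extracts the biased exponent fields with Python's struct module and reads the constant in the fmuls and fmuld rows as being subtracted, which is an assumption about the table as printed. The helper names are hypothetical.

```python
import struct


def esrc_single(x: float) -> int:
    """Biased exponent field of x as a single-precision number (0 for denormals)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits >> 23) & 0xFF


def esrc_double(x: float) -> int:
    """Biased exponent field of x as a double-precision number (0 for denormals)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return (bits >> 52) & 0x7FF


def er_fmuls(a: float, b: float) -> int:
    return esrc_single(a) + esrc_single(b) - 126


def er_fmuld(a: float, b: float) -> int:
    return esrc_double(a) + esrc_double(b) - 1022


def er_fdivs(a: float, b: float) -> int:
    return esrc_single(a) - esrc_single(b) + 126


def er_fdivd(a: float, b: float) -> int:
    return esrc_double(a) - esrc_double(b) + 1022
```

Because Er is computed from the exponents alone, it is an approximation: for 1.0 × 1.0 in single precision the sketch gives Er = 128, while the exact biased exponent of the product is 127.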
From Er, eres is calculated. eres is a biased result exponent, after mantissa alignment  
and before rounding, where the appropriate adjustment of the exponent is applied to  
the result mantissa: left-shifting or right-shifting the mantissa to the implicit 1 at the  
left of the binary point, subtracting or adding the shift-amount to the exponent. The  
result mantissa is assumed to be 1.xxxx in calculating eres. If the result is a  
denormalized number, eres is less than zero.  
TABLE B-2 describes the boundary condition of each floating-point instruction that  
generates an unfinished_FPop exception.  
TABLE B-2 unfinished_FPop Boundary Conditions

Operation / Boundary Conditions

FdTOs
    -25 < eres < 1 and TEM.UFM = 0.
FsTOd
    Second operand (rs2) is a denormalized number.
FADDs, FSUBs, FADDd, FSUBd
    1. One of the operands is a denormalized number, and the other operand is a
       normal, nonzero floating-point number (except for a NaN and an infinity)
       (see note 1).
    2. Both operands are denormalized numbers.
    3. Both operands are normal, nonzero floating-point numbers (except for a NaN
       and an infinity), eres < 1, and TEM.UFM = 0.
FMULs, FMULd
    1. One of the operands is a denormalized number, the other operand is a normal,
       nonzero floating-point number (except for a NaN and an infinity), and
           single precision: -25 < Er
           double precision: -54 < Er
    2. Both operands are normal, nonzero floating-point numbers (except for a NaN
       and an infinity), TEM.UFM = 0, and
           single precision: -25 < eres < 1
           double precision: -54 < eres < 1
FsMULd
    1. One of the operands is a denormalized number, and the other operand is a
       normal, nonzero floating-point number (except for a NaN and an infinity).
    2. Both operands are denormalized numbers.
FDIVs, FDIVd
    1. The dividend (operand1; rs1) is a normal, nonzero floating-point number
       (except for a NaN and an infinity), the divisor (operand2; rs2) is a
       denormalized number, and
           single precision: Er < 255
           double precision: Er < 2047
    2. The dividend (operand1; rs1) is a denormalized number, the divisor
       (operand2; rs2) is a normal, nonzero floating-point number (except for a
       NaN and an infinity), and
           single precision: -25 < Er
           double precision: -54 < Er
    3. Both operands are denormalized numbers.
    4. Both operands are normal, nonzero floating-point numbers (except for a NaN
       and an infinity), TEM.UFM = 0, and
           single precision: -25 < eres < 1
           double precision: -54 < eres < 1
FSQRTs, FSQRTd
    The input operand (operand2; rs2) is a positive, nonzero, denormalized number.

Note 1: An operation on 0 and a denormalized number generates a result in
accordance with the IEEE 754-1985 standard.
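As a worked example of the FMULs and FMULd rows, the sketch below evaluates condition 1 (one denormalized operand, one normal nonzero operand). It uses the TABLE B-1 formula with the denormalized operand's exponent taken as 0, reads the fmul formula constant as subtracted (an assumption), and uses hypothetical names. It does not model condition 2, which needs eres.

```python
# Lower bounds on Er from condition 1 of the FMULs, FMULd row.
ER_LOWER_BOUND = {"single": -25, "double": -54}
# Constants of the fmuls / fmuld formulas in TABLE B-1.
FMUL_OFFSET = {"single": 126, "double": 1022}


def fmul_denormal_by_normal_traps(esrc_normal: int, precision: str) -> bool:
    """True if denormal x normal raises unfinished_FPop under condition 1.

    esrc_normal is the biased exponent of the normal operand; the
    denormalized operand contributes an exponent of 0.
    """
    er = esrc_normal + 0 - FMUL_OFFSET[precision]
    return er > ER_LOWER_BOUND[precision]
```

With these bounds, a single-precision normal operand with biased exponent 102 or more leads to the exception, and smaller exponents do not.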
Pessimistic Zero  
If a condition in TABLE B-3 is true, SPARC64 V generates the result as a pessimistic  
zero, meaning that the result is a denormalized minimum or a zero, depending on  
the rounding mode (FSR.RD).  
TABLE B-3 Conditions for a Pessimistic Zero

Operations / Conditions

FdTOs
    One operand is denormalized (note 1): always
    Both are normal fp-numbers (note 2):  eres ≤ -25
FMULs, FMULd
    One operand is denormalized (note 1): single precision: Er ≤ -25
                                          double precision: Er ≤ -54
    Both are denormalized:                always
    Both are normal fp-numbers (note 2):  single precision: eres ≤ -25
                                          double precision: eres ≤ -54
FDIVs, FDIVd
    One operand is denormalized (note 1): single precision: Er ≤ -25
                                          double precision: Er ≤ -54
    Both are denormalized:                never
    Both are normal fp-numbers (note 2):  single precision: eres ≤ -25
                                          double precision: eres ≤ -54

Note 1: Both operands are non-zero, non-NaN, and non-infinity numbers.
Note 2: Both may be zero, but both are non-NaN and non-infinity numbers.
Pessimistic Overflow  
If a condition in TABLE B-4 is true, SPARC64 V regards the operation as having an  
overflow condition.  
TABLE B-4 Pessimistic Overflow Conditions

Operations  Conditions
FDIVs       The divisor (operand2; rs2) is a denormalized number and Er ≥ 255.
FDIVd       The divisor (operand2; rs2) is a denormalized number and Er ≥ 2047.
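The relationship between the FDIV row of TABLE B-2 and TABLE B-4 can be sketched for the case of a normal dividend and a denormalized divisor. The example assumes that the overflow threshold is the complement of the unfinished_FPop bound (Er at or above 255 for single precision, 2047 for double precision). That reading is an assumption, and the function name is hypothetical.

```python
# Constants of the fdivs / fdivd formulas in TABLE B-1.
FDIV_OFFSET = {"single": 126, "double": 1022}
# Er values at or above these limits are treated as overflow (assumed).
ER_OVERFLOW_LIMIT = {"single": 255, "double": 2047}


def fdiv_by_denormal_outcome(esrc_dividend: int, precision: str) -> str:
    """Outcome for normal / denormal; the divisor's exponent counts as 0."""
    er = esrc_dividend - 0 + FDIV_OFFSET[precision]
    if er < ER_OVERFLOW_LIMIT[precision]:
        return "unfinished_FPop"
    return "pessimistic overflow"
```

Under these assumptions, a single-precision dividend with a biased exponent of 129 or more is handled as an overflow, and smaller dividends are passed to software.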
B.6.2 Operation Under FSR.NS = 1
When FSR.NS = 1 (nonstandard mode), SPARC64 V zeroes all the input
denormalized operands before the operation and signals an inexact exception if
enabled. If the operation generates a denormalized result, SPARC64 V zeroes the
result and also signals an inexact exception if enabled. The following list defines the
operation in detail.
• If either operand is a denormalized number and both operands are non-zero,
  non-NaN, and non-infinity numbers, the input denormalized operand is replaced
  with a zero of the same sign, and the operation is performed. If enabled, an
  inexact exception is signalled; an fp_exception_ieee_754 (tt = 021₁₆) is
  generated, with nxc = 1 in FSR.cexc (FSR.ftt = 01₁₆; IEEE_754_exception).
  However, if the operation is FDIV(s,d) and either a division_by_zero or an
  invalid_operation condition is detected, or if the operation is FSQRT(s,d) and
  an invalid_operation condition is detected, the inexact condition is not
  reported.
• If the result before rounding is a denormalized number, the result is flushed
  to a zero of the same sign, and either an underflow exception or an inexact
  exception is signalled, depending on FSR.TEM.
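The input-flushing step of the list above can be sketched in software. The example below is a model for illustration only, not a description of the hardware data path. It works on double-precision values, it covers only the treatment of input operands (not the flushing of denormalized results), and its function names are hypothetical.

```python
import math
import struct


def is_denormal(x: float) -> bool:
    """True if x is a denormalized double (zero exponent, nonzero fraction)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return ((bits >> 52) & 0x7FF) == 0 and (bits & ((1 << 52) - 1)) != 0


def flush_denormal(x: float) -> float:
    """Replace a denormalized input with a zero of the same sign."""
    return math.copysign(0.0, x) if is_denormal(x) else x


def flush_inputs_ns(op1: float, op2: float):
    """Return (op1', op2', inexact) as seen by the operation when FSR.NS = 1.

    inexact is set when either operand is denormalized and both operands
    are non-zero, non-NaN, and non-infinity numbers.
    """
    def ordinary(x: float) -> bool:
        return math.isfinite(x) and x != 0.0

    inexact = ((is_denormal(op1) or is_denormal(op2))
               and ordinary(op1) and ordinary(op2))
    return flush_denormal(op1), flush_denormal(op2), inexact
```

For instance, `flush_inputs_ns(5e-324, 1.0)` yields a zero first operand with the inexact condition raised, so an addition would return the second operand unchanged.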
As observed from the preceding, when FSR.NS = 1, SPARC64 V generates neither  
an unfinished_FPop exception nor a denormalized number as a result. TABLE B-5  
summarizes the behavior of SPARC64 V floating-point hardware depending on  
FSR.NS.  
Note: The results and behavior of SPARC64 V in the shaded columns of TABLE B-5
and TABLE B-6 conform to the IEEE 754-1985 standard.
Note: Throughout TABLE B-5 and TABLE B-6, lowercase exception conditions such as
nx, uf, of, dz, and nv are nontrapping IEEE 754 exceptions. Uppercase exception
conditions such as NX, UF, OF, DZ, and NV are trapping IEEE 754 exceptions.
TABLE B-5  
TABLE B-6 describes how SPARC64 V behaves when FSR.NS = 1 (nonstandard mode).

TABLE B-6 Nonarithmetic Operations Under FSR.NS = 1

Operations      op1=denorm  op2=denorm       UFM  NXM  DVM  NVM  Result
FsTOd           -           Yes              -    1    -    -    NX
                                             -    0    -    -    nx, a signed zero
FdTOs           -           Yes              1    -    -    -    UF
                                             0    1    -    -    NX
                                             0    0    -    -    uf + nx, a signed zero
FADDs, FSUBs,   Yes         No               -    1    -    -    NX
FADDd, FSUBd                                 -    0    -    -    nx, op2
                No          Yes              -    1    -    -    NX
                                             -    0    -    -    nx, op1
                Yes         Yes              -    1    -    -    NX
                                             -    0    -    -    nx, a signed zero
FMULs, FMULd,   Yes         -                -    1    -    -    NX
FsMULd                                       -    0    -    -    nx, a signed zero
                -           Yes              -    1    -    -    NX
                                             -    0    -    -    nx, a signed zero
FDIVs, FDIVd    Yes         No               -    1    -    -    NX
                                             -    0    -    -    nx, a signed zero
                No          Yes              -    -    1    -    DZ
                                             -    -    0    -    dz, a signed infinity
                Yes         Yes              -    -    -    1    NV
                                             -    -    -    0    nv, dNaN (note 1)
FSQRTs, FSQRTd  -           Yes and op2 > 0  -    1    -    -    NX
                                             -    0    -    -    nx, zero
                -           Yes and op2 < 0  -    -    -    1    NV
                                             -    -    -    0    nv, dNaN (note 1)

Note 1: A single precision dNaN is 7FFF.FFFF₁₆, and a double precision dNaN is
7FFF.FFFF.FFFF.FFFF₁₆.
C. Implementation Dependencies
This appendix summarizes implementation dependencies. In SPARC V9 and SPARC
JPS1, the notation IMPL. DEP. #nn: identifies the definition of an implementation
dependency; the notation (impl. dep. #nn) identifies a reference to an
implementation dependency. These dependencies are described by their number nn
in TABLE C-1 on page 70. These numbers have been removed from the body of this
document for SPARC64 V to make the document more readable. TABLE C-1 has been
modified to include descriptions of the manner in which SPARC64 V has resolved
each implementation dependency.
Note: SPARC International maintains a document, Implementation Characteristics of
Current SPARC-V9-based Products, Revision 9.x, that describes the
implementation-dependent design features of all SPARC V9-compliant
implementations. Contact SPARC International for this document at
    home page: www.sparc.org
    email: info@sparc.org
C.1 Definition of an Implementation Dependency
Please refer to Section C.1 of Commonality.

C.2 Hardware Characteristics
Please refer to Section C.2 of Commonality.

C.3 Implementation Dependency Categories
Please refer to Section C.3 of Commonality.

C.4 List of Implementation Dependencies
TABLE C-1 provides a complete list of how each implementation dependency is
treated in the SPARC64 V implementation.
TABLE C-1 SPARC64 V Implementation Dependencies

Nbr     SPARC64 V Implementation Notes [Page]

1       The operating system emulates all instructions that generate
        illegal_instruction or unimplemented_FPop exceptions.
2       Number of IU registers
        SPARC64 V supports eight register windows (NWINDOWS = 8).
        SPARC64 V supports an additional two global register sets (Interrupt
        globals and MMU globals) for a total of 160 integer registers.
3       Incorrect IEEE Std 754-1985 results [page 62]
        See Section B.6, Floating-Point Nonstandard Mode, on page 61 for details.
4-5     Reserved.
6       I/O registers privileged status
        This dependency is beyond the scope of this publication. It should be
        defined in each system that uses SPARC64 V.
7       I/O register definitions
        This dependency is beyond the scope of this publication. It should be
        defined in each system that uses SPARC64 V.
8       RDASR/WRASR target registers
        See A.50 and A.70 in Commonality for details of implementation-dependent
        RDASR/WRASR instructions.
9       RDASR/WRASR privileged status
        See A.50 and A.70 in Commonality for details of implementation-dependent
        RDASR/WRASR instructions.
10-12   Reserved.
13      VER.impl [page 20]
        VER.impl = 5 for the SPARC64 V processor.
14-15   Reserved.
16      IU deferred-trap queue [page 24]
        SPARC64 V neither has nor needs an IU deferred-trap queue.
17      Reserved.
18      Nonstandard IEEE 754-1985 results [pages 18, 62]
        SPARC64 V flushes denormal operands and results to zero when
        FSR.NS = 1. For the treatment of denormalized numbers, see
        Section B.6, Floating-Point Nonstandard Mode, on page 61.
19      FPU version, FSR.ver [page 18]
        FSR.ver = 0 for SPARC64 V.
20-21   Reserved.
22      FPU TEM, cexc, and aexc [page 19]
        SPARC64 V implements all bits in the TEM, cexc, and aexc fields in
        hardware.
23      Floating-point traps [page 24]
        In SPARC64 V, floating-point traps are always precise; no FQ is needed.
24      SPARC64 V neither has nor needs a floating-point deferred-trap queue.
        [page 24]
25      RDPR of FQ with nonexistent FQ [page 24]
        Attempting to execute an RDPR of the FQ causes an illegal_instruction
        exception.
26-28   Reserved.
29      Address space identifier (ASI) definitions
        The ASIs that are supported by SPARC64 V are defined in Appendix L,
        Address Space Identifiers.
30      ASI address decoding [page 117]
        SPARC64 V supports all of the listed ASIs.
31      Catastrophic error exceptions [page 138]
        SPARC64 V contains a watchdog timer that times out after no instruction
        has been committed for a specified number of cycles. If the timer times
        out, the CPU tries to invoke an async_data_error trap. If the counter
        continues counting and reaches 2³³, the processor enters error_state.
        Upon entry to error_state, the processor optionally generates a WDR
        reset to recover from error_state.
32      Deferred traps [pages 37, 149]
        SPARC64 V signals a deferred trap in a few of its severe error conditions.
        SPARC64 V does not contain a deferred trap queue.
33      Trap precision [page 37]
        There are no deferred traps in SPARC64 V other than the trap caused by a
        few severe error conditions. All traps that occur as the result of program
        execution are precise.
34      Interrupt clearing
        For details of interrupt handling see Appendix N, Interrupt Handling.
35      Implementation-dependent traps [page 39]
        SPARC64 V supports the following traps that are implementation
        dependent:
            interrupt_vector_trap (tt = 060₁₆)
            PA_watchpoint (tt = 061₁₆)
            VA_watchpoint (tt = 062₁₆)
            ECC_error (tt = 063₁₆)
            fast_instruction_access_MMU_miss (tt = 064₁₆ through 067₁₆)
            fast_data_access_MMU_miss (tt = 068₁₆ through 06B₁₆)
            fast_data_access_protection (tt = 06C₁₆ through 06F₁₆)
            async_data_error (tt = 040₁₆)
36      Trap priorities [page 38]
        SPARC64 V's implementation-dependent traps have the following
        priorities:
            interrupt_vector_trap (priority = 16)
            PA_watchpoint (priority = 12)
            VA_watchpoint (priority = 1)
            fast_instruction_access_MMU_miss (priority = 2)
            fast_data_access_MMU_miss (priority = 12)
            fast_data_access_protection (priority = 12)
            async_data_error (priority = 2)
37      Reset trap [page 37]
        SPARC64 V implements power-on reset (POR) and watchdog reset.
38      Effect of reset trap on implementation-dependent registers [page 141]
        See Section O.3, Processor State after Reset and in RED_state, on page 141.
39      Entering error_state on implementation-dependent errors [page 36]
        CPU watchdog timeout at 2³³ ticks, a normal trap, or an SIR at
        TL = MAXTL causes the CPU to enter error_state.
40      Error_state processor state [page 36]
        SPARC64 V optionally takes a watchdog reset trap after entry to
        error_state. Most error-logging register state will be preserved. (See
        also impl. dep. #254.)
41      Reserved.
42      FLUSH instruction
        SPARC64 V implements the FLUSH instruction in hardware.
43      Reserved.
44      Data access FPU trap
        The destination register(s) are unchanged if an access error occurs.
45-46   Reserved.
47      RDASR
        See A.50, Read State Register, in Commonality for details.
48      WRASR
        See A.70, Write State Register, in Commonality for details.
49-54   Reserved.
55      Floating-point underflow detection
        See FSR_underflow in Section 5.1.7 of Commonality for details.
56-100  Reserved.
101     Maximum trap level [page 20]
        MAXTL = 5.
102     Clean windows trap
        SPARC64 V generates a clean_window exception; register windows are
        cleaned in software.
103     Prefetch instructions
        SPARC64 V prefetch instructions have the following
        implementation-dependent characteristics:
            The prefetches have observable effects in privileged code.
            Prefetch variants 0-3 do not cause a fast_data_access_MMU_miss trap,
            because the prefetch is dropped when a fast_data_access_MMU_miss
            condition happens. On the other hand, prefetch variants 20-23 cause
            data_access_MMU_miss traps on TLB misses.
            All prefetches are for 64-byte cache lines, which are aligned on a
            64-byte boundary.
            See Section A.49, Prefetch Data, on page 57, for implemented
            variations and their characteristics.
            Prefetches will work normally if the ASI is ASI_PRIMARY,
            ASI_SECONDARY, ASI_NUCLEUS, ASI_PRIMARY_AS_IF_USER,
            ASI_SECONDARY_AS_IF_USER, or one of their little-endian pairs.
104     VER.manuf [page 20]
        VER.manuf = 0004₁₆. The least significant 8 bits are Fujitsu's JEDEC
        manufacturing code.
105     TICK register [page 19]
        SPARC64 V implements 63 bits of the TICK register; it increments on every
        clock cycle.
106     IMPDEPn instructions [page 49]
        SPARC64 V uses the IMPDEP2 opcode for the Multiply Add/Subtract
        instructions. SPARC64 V also conforms to Sun's specification for VIS-1
        and VIS-2.
107     Unimplemented LDD trap
        SPARC64 V implements LDD in hardware.
108     Unimplemented STD trap
        SPARC64 V implements STD in hardware.
109     LDDF_mem_address_not_aligned
        If the address is word aligned but not doubleword aligned, SPARC64 V
        generates the LDDF_mem_address_not_aligned exception. The trap handler
        software emulates the instruction.
110     STDF_mem_address_not_aligned
        If the address is word aligned but not doubleword aligned, SPARC64 V
        generates the STDF_mem_address_not_aligned exception. The trap handler
        software emulates the instruction.
111     LDQF_mem_address_not_aligned
        SPARC64 V generates an illegal_instruction exception for all LDQFs. The
        processor does not perform the check for fp_disabled. The trap handler
        software emulates the instruction.
112     STQF_mem_address_not_aligned
        SPARC64 V generates an illegal_instruction exception for all STQFs. The
        processor does not perform the check for fp_disabled. The trap handler
        software emulates the instruction.
113     SPARC64 V implements Total Store Order (TSO) for all the memory models
        specified in PSTATE.MM. See Chapter 8, Memory Models, for details.
        [page 42]
114     RED_state trap vector address (RSTVaddr) [page 36]
        RSTVaddr is a constant in SPARC64 V, where:
            VA = FFFF FFFF F000 0000₁₆ and
            PA = 07FF F000 0000₁₆
115     RED_state processor state [page 36]
        See RED_state on page 36 for details of implementation-specific actions
        in RED_state.
116     SIR_enable control flag
        See Section A.60, SIR, in Commonality for details.
117     MMU disabled prefetch behavior [page 91]
        Prefetch and nonfaulting Load always succeed when the MMU is disabled.
118     Identifying I/O locations
        This dependency is beyond the scope of this publication. It should be
        defined in a system that uses SPARC64 V.
119     Unimplemented values for PSTATE.MM [page 42]
        Writing 11₂ into PSTATE.MM causes the machine to use the TSO memory
        model. However, the encoding 11₂ should not be used, since future
        versions of SPARC64 V may use this encoding for a new memory model.
120     Coherence and atomicity of memory operations
        Although SPARC64 V implements the UPA-based cache coherency
        mechanism, this dependency is beyond the scope of this publication. It
        should be defined in a system that uses SPARC64 V.
121     Implementation-dependent memory model
        SPARC64 V implements TSO, PSO, and RMO memory models. See
        Chapter 8, Memory Models, for details.
        Accesses to pages with the E (Volatile) bit of their MMU page table entry
        set are also made in program order.
122     FLUSH latency
        Since the FLUSH instruction synchronizes the processor, its total latency
        varies depending on many portions of the SPARC64 V processor's state.
        Assuming that all prior instructions are completed, the latency of FLUSH
        is 18 processor cycles.
123     Input/output (I/O) semantics
        This dependency is beyond the scope of this publication. It should be
        defined in a system that uses SPARC64 V.
124     Implicit ASI when TL > 0
        See Section 5.1.7 of Commonality for details.
125     Address masking [pages 29, 49, 53]
        When PSTATE.AM = 1, SPARC64 V masks out the high-order 32 bits of
        the PC when transmitting it to the destination register.
126     Register Windows State Registers width
        NWINDOWS for SPARC64 V is 8; therefore, only 3 bits are implemented for
        the following registers: CWP, CANSAVE, CANRESTORE, OTHERWIN. If an
        attempt is made to write a value greater than NWINDOWS - 1 to any of
        these registers, the extraneous upper bits are discarded. The CLEANWIN
        register contains 3 bits.
127-201 Reserved.
202     fast_ECC_error trap
        The fast_ECC_error trap is not implemented in SPARC64 V.
203     Dispatch Control Register bits 13:6 and 1 [page 22]
        SPARC64 V does not implement DCR.
204     DCR bits 5:3 and 0 [page 22]
        SPARC64 V does not implement DCR.
205     Instruction Trap Register [page 24]
        SPARC64 V implements the Instruction Trap Register.
206     SHUTDOWN instruction [page 58]
        In privileged mode the SHUTDOWN instruction executes as a NOP in
        SPARC64 V.
207     PCR register bits 47:32, 26:17, and bit 3 [pages 20, 21, 201]
        SPARC64 V uses these bits for the following purposes:
            Bits 47:32 for set/clear/show status of overflow (OVF).
            Bit 26 for validity of OVF field (OVRO).
            Bits 24:22 for number of counter pair (NC).
            Bits 20:18 for counter selector (SC).
            Bit 3 for validity of SU/SL field (ULRO).
        Other implementation-dependent bits are read as 0 and writes to them are
        ignored.
208     Ordering of errors captured in instruction execution
        The order in which errors are captured during instruction execution is
        implementation dependent. Ordering can be in program order or in order
        of detection.
209     Software intervention after instruction-induced error
        Precision of the trap to signal an instruction-induced error for which
        recovery requires software intervention is implementation dependent.
210     ERROR output signal
        The causes and the semantics of the ERROR output signal are
        implementation dependent.
211     Error logging registers' information
        The information that the error logging registers preserve beyond the
        reset induced by an ERROR signal is implementation dependent.
212     Trap with fatal error
        Generation of a trap along with ERROR signal assertion upon detection of
        a fatal error is implementation dependent.
213     AFSR.PRIV
        SPARC64 V does not implement the AFSR.PRIV bit.
214     Enable/disable control for deferred traps
        SPARC64 V does not implement a control feature for deferred traps.
215     Error barrier
        DONE and RETRY instructions may implicitly provide an error barrier
        function as MEMBAR #Sync. Whether DONE and RETRY instructions provide
        an error barrier is implementation dependent.
216     data_access_error trap precision
        The data_access_error trap is always precise in SPARC64 V.
217     instruction_access_error trap precision
        The instruction_access_error trap is always precise in SPARC64 V.
218     async_data_error [page 39]
        The async_data_error trap is implemented in SPARC64 V, using tt = 40₁₆.
        See Appendix P for details.
219     Asynchronous Fault Address Register (AFAR) allocation [pages 177, 178]
        SPARC64 V implements two AFARs:
            VA = 00₁₆ for an error occurring in D1 cache.
            VA = 08₁₆ for an error occurring in U2 cache.
220     Addition of logging and control registers for error handling
        SPARC64 V implements various features for sustaining reliability. See
        Appendix P for details.
221     Special/signalling ECCs
        The method to generate "special" or "signalling" ECCs and whether
        processor-ID is embedded into the data associated with special/signalling
        ECCs is implementation dependent.
222     TLB organization [page 85]
        SPARC64 V has the following TLB organization:
            Level-1 micro ITLB (uITLB), 32-way fully associative
            Level-1 micro DTLB (uDTLB), 32-way fully associative
            Level-2 IMMU-TLB, consisting of sITLB (set-associative Instruction
            TLB) and fITLB (fully associative Instruction TLB).
            Level-2 DMMU-TLB, consisting of sDTLB (set-associative Data TLB) and
            fDTLB (fully associative Data TLB).
223     TLB multiple-hit detection [page 86]
        On SPARC64 V, TLB multiple hit detection is supported. However, the
        multiple hit is not detected at every TLB reference. When the micro-TLB
        (uTLB), which is the cache of sTLB and fTLB, matches the virtual address,
        the multiple hit in sTLB and fTLB is not detected. The multiple hit is
        detected only when the micro-TLB mismatches and the main TLB is
        referenced.
224     MMU physical address width [page 86]
        The SPARC64 V MMU implements 43-bit physical addresses. The PA field of
        the TTE holds a 43-bit physical address. Bits 46:43 of each TTE always
        read as 0 and writes to them are ignored. The MMU translates virtual
        addresses into 43-bit physical addresses. Each cache tag holds bits 42:6
        of physical addresses.
225     TLB locking of entries [page 87]
        In SPARC64 V, when a TTE with its lock bit set is written into the TLB
        through the Data In register, the TTE is automatically written into the
        corresponding fully associative TLB and locked in the TLB. Otherwise, the
        TTE is written into the corresponding sTLB or fTLB, depending on its page
        size.
226     TTE support for CV bit [page 87]
        SPARC64 V does not support the CV bit in TTE. Since I1 and D1 are
        virtually indexed caches, unaliasing is supported by SPARC64 V. See also
        impl. dep. #232.
227     TSB number of entries [page 88]
        SPARC64 V supports a maximum of 16 million entries in the common TSB
        and a maximum of 32 million lines in the split TSB.
228     TSB_Hash supplied from TSB or context-ID register [page 88]
        TSB_Hash is generated from the context-ID register in SPARC64 V.
229     TSB_Base address generation [page 88]
        SPARC64 V generates the TSB_Base address directly from the TLB
        Extension Registers. To maintain compatibility with UltraSPARC I/II,
        SPARC64 V provides the mode flag MCNTL.JPS1_TSBP. When
        MCNTL.JPS1_TSBP = 0, the TSB_Base register is used.
230     data_access_exception trap [page 89]
        SPARC64 V generates data_access_exception only for the causes listed in
        Section 7.6.1 of Commonality.
231     MMU physical address variability [page 91]
        SPARC64 V supports both 41-bit and 43-bit physical address modes. The
        initial width of the physical address is controlled by OPSR.
232     DCU Control Register CP and CV bits [pages 23, 91]
        SPARC64 V does not implement CP and CV bits in the DCU Control
        Register. See also impl. dep. #226.
233     TSB_Hash field [page 92]
        SPARC64 V does not implement TSB_Hash.
234     TLB replacement algorithm [page 93]
        For fTLB, SPARC64 V implements a pseudo-LRU. For sTLB, LRU is used.
235     TLB data access address assignment [page 94]
        The MMU TLB data-access address assignment and the purpose of the
        address are implementation dependent.
236     TSB_Size field width [page 97]
        In SPARC64 V, TSB_Size is 4 bits wide, occupying bits 3:0 of the TSB
        register. The maximum number of TSB entries is, therefore, 512 × 2¹⁵
        (16M entries).
237     DSFAR/DSFSR for JMPL/RETURN mem_address_not_aligned [pages 89, 97]
        A mem_address_not_aligned exception that occurs during a JMPL or RETURN
        instruction does not update either the D-SFAR or D-SFSR register.
238     TLB page offset for large page sizes [page 87]
        On SPARC64 V, even for a large page, written data for the TLB Data
        Register is preserved for bits representing an offset in a page, so the
        data previously written is returned regardless of the page size.
239     Register access by ASIs 55₁₆ and 5D₁₆ [page 92]
        In SPARC64 V, VA<63:19> of IMMU ASI 55₁₆ and DMMU ASI 5D₁₆ are
        ignored. An access to virtual addresses 40000₁₆ to 60FF8₁₆ is treated as
        an access to 00000₁₆ to 20FF8₁₆.
TABLE C-1 SPARC64 V Implementation Dependencies (10 of 11)  
Nbr  
SPARC64 V Implementation Notes  
Page  
240  
DCU Control Register bits 47:41  
23  
SPARC64 V uses bit 41 for WEAK_SPCA, which enables/ disables memory  
access in speculative paths.  

241 Address Masking and DSFAR
SPARC64 V writes zeroes to the more significant 32 bits of DSFAR.

242 TLB lock bit (page 87)
In SPARC64 V, only the fITLB and the fDTLB support the lock bit. The lock
bit in sITLB and sDTLB is read as 0 and writes to it are ignored.

243 Interrupt Vector Dispatch Status Register BUSY/NACK pairs (page 136)
In SPARC64 V, 32 BUSY/NACK pairs are implemented in the Interrupt
Vector Dispatch Status Register.

244 Data Watchpoint Reliability (page 24)
No implementation-dependent features of SPARC64 V reduce the reliability
of data watchpoints.

245 Call/Branch displacement encoding in I-Cache (page 24)
In SPARC64 V, the least significant 11 bits (bits 10:0) of a CALL or branch
(BPcc, FBPfcc, Bicc, BPr) instruction in an instruction cache are identical
to the architectural encoding (as they appear in main memory).

246 VA<38:29> for Interrupt Vector Dispatch Register Access (page 136)
SPARC64 V ignores all 10 bits of VA<38:29> when the Interrupt Vector
Dispatch Register is written.

247 Interrupt Vector Receive Register SID fields (page 136)
SPARC64 V obtains the interrupt source identifier SID_L from the UPA
packet.

248 Conditions for fp_exception_other with unfinished_FPop (page 18)
SPARC64 V triggers fp_exception_other with trap type unfinished_FPop
under the standard conditions described in Commonality Section 5.1.7.

249 Data watchpoint for Partial Store instruction (page 57)
Watchpoint exceptions on Partial Store instructions occur conservatively on
SPARC64 V. The DCUCR Data Watchpoint masks are only checked for
nonzero value (watchpoint enabled). The byte store mask (r[rs2]) in the
Partial Store instruction is ignored, and a watchpoint exception can occur
even if the mask is zero (that is, no store will take place).

250 PCR accessibility when PSTATE.PRIV = 0 (pages 20, 22, 58)
In SPARC64 V, the accessibility of PCR when PSTATE.PRIV = 0 is
determined by PCR.PRIV. If PSTATE.PRIV = 0 and PCR.PRIV = 1, an
attempt to execute either RDPCR or WRPCR will cause a privileged_action
exception. If PSTATE.PRIV = 0 and PCR.PRIV = 0, RDPCR operates without
privilege violation and WRPCR generates a privileged_action exception only
when an attempt is made to change (that is, write 1 to) PCR.PRIV.

251 Reserved.
TABLE C-1 SPARC64 V Implementation Dependencies (11 of 11)
Columns: Nbr, SPARC64 V Implementation Notes, Page

252 DCUCR.DC (Data Cache Enable) (page 24)
SPARC64 V does not implement DCUCR.DC.

253 DCUCR.IC (Instruction Cache Enable) (page 24)
SPARC64 V does not implement DCUCR.IC.

254 Means of exiting error_state (pages 37, 146)
The standard behavior of a SPARC64 V CPU upon entry into
error_state is to reset itself by internally generating a watchdog_reset
(WDR). However, OPSR can be set so that when error_state is entered, the
processor remains halted in error_state instead of generating a
watchdog_reset.

255 LDDFA with ASI E0₁₆ or E1₁₆ and misaligned destination register number (page 120)
No exception is generated based on the destination register rd.

256 LDDFA with ASI E0₁₆ or E1₁₆ and misaligned memory address (page 120)
For LDDFA with ASI E0₁₆ or E1₁₆ and a memory address aligned on a 2^n-byte
boundary, a SPARC64 V processor behaves as follows:
n ≥ 3 (≥ 8-byte alignment): no exception related to memory address
alignment is generated.
n = 2 (4-byte alignment): LDDF_mem_address_not_aligned exception is
generated.
n ≤ 1 (≤ 2-byte alignment): mem_address_not_aligned exception is
generated.

257 LDDFA with ASI C0₁₆-C5₁₆ or C8₁₆-CD₁₆ and misaligned memory address (page 120)
For LDDFA with ASI C0₁₆-C5₁₆ or C8₁₆-CD₁₆ and a memory address aligned on
a 2^n-byte boundary, a SPARC64 V processor behaves as follows:
n ≥ 3 (≥ 8-byte alignment): no exception related to memory address
alignment is generated.
n = 2 (4-byte alignment): LDDF_mem_address_not_aligned exception is
generated.
n ≤ 1 (≤ 2-byte alignment): mem_address_not_aligned exception is
generated.

258 ASI_SERIAL_ID (page 119)
SPARC64 V provides an identification code for each processor.
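
The alignment rule given above for LDDFA with a misaligned memory address can be summarized as a small decision function. The Python sketch below is illustrative only: the function name lddfa_alignment_exception is hypothetical, and the model considers nothing except the alignment of the memory address.

```python
def lddfa_alignment_exception(address):
    """Return the alignment exception name for an LDDFA access, or None.

    Models only the memory-address alignment rule; all other exception
    sources are ignored in this sketch.
    """
    if address % 8 == 0:
        # n >= 3: aligned on an 8-byte (or larger) boundary
        return None
    if address % 4 == 0:
        # n = 2: aligned on a 4-byte boundary only
        return "LDDF_mem_address_not_aligned"
    # n <= 1: aligned on a 2-byte boundary or not aligned at all
    return "mem_address_not_aligned"
```

For example, an address ending in 4₁₆ falls in the n = 2 case, while an address ending in 2₁₆ or 1₁₆ falls in the n ≤ 1 case.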
APPENDIX D
Formal Specification of the Memory Models
Please refer to Appendix D of Commonality.  
APPENDIX E
Opcode Maps
Please refer to Appendix E in Commonality. TABLE E-1 lists the opcode map for the
SPARC64 V IMPDEP2 instruction.
TABLE E-1 IMPDEP2 (op = 2, op3 = 37₁₆)

                          var (instruction<8:7>)
size (instruction<6:5>)   00        01        10        11
00                        (not used; reserved)
01                        FMADDs    FMSUBs    FNMSUBs   FNMADDs
10                        FMADDd    FMSUBd    FNMSUBd   FNMADDd
11                        (reserved for quad operations)
APPENDIX F
Memory Management Unit
The Memory Management Unit (MMU) architecture of SPARC64 V conforms to the
MMU architecture defined in Appendix F of Commonality, with some
model-dependent behavior. See Appendix F in Commonality for the basic
definitions of the SPARC64 V MMU.
Section numbers in this appendix correspond to those in Appendix F of  
Commonality. Figures and tables, however, are numbered consecutively.  
This appendix describes the implementation dependencies and other additional  
information about the SPARC64 V MMU. For SPARC64 V implementations, we first  
list the implementation dependency as given in TABLE C-1 of Commonality, then  
describe the SPARC64 V implementation.  
F.1 Virtual Address Translation
IMPL. DEP. #222: TLB organization is JPS1 implementation dependent.  
SPARC64 V has the following TLB organization:  
• Level-1 micro ITLB (uITLB), 32-way fully associative
• Level-1 micro DTLB (uDTLB), 32-way fully associative
• Level-2 IMMU-TLB, which consists of sITLB (set-associative Instruction TLB)
  and fITLB (fully associative Instruction TLB)
• Level-2 DMMU-TLB, which consists of sDTLB (set-associative Data TLB) and
  fDTLB (fully associative Data TLB)
TABLE F-1 shows the organization of SPARC64 V TLBs.  
The hardware contains a micro-ITLB and a micro-DTLB, which serve as
temporary storage for the main TLBs shown in TABLE F-1. In contrast to the
micro-TLBs, the sTLB and fTLB are called main TLBs.
The micro-TLBs are coherent with the main TLBs and are not visible to software,
with the exception of TLB multiple-hit detection. Hardware maintains the
consistency between the micro-TLBs and the main TLBs.
No other details on the micro-TLBs are provided because software cannot
operate on them directly and their configuration is invisible to software.
TABLE F-1 Organization of SPARC64 V TLBs

Feature                     sITLB and sDTLB        fITLB and fDTLB
Entries                     2048                   32
Associativity               2-way set associative  Fully associative
Page size supported         8 KB / 4 MB            8 KB / 64 KB / 512 KB / 4 MB
Locked translation entry    Not supported          Supported
Unlocked translation entry  Supported              Supported
IMPL. DEP. #223: Whether TLB multiple-hit detections are supported in JPS1 is  
implementation dependent.  
On SPARC64 V, TLB multiple-hit detection is supported. However, a multiple
hit is not detected at every TLB reference. When the micro-TLB (uTLB), which
caches entries of the sTLB and fTLB, matches the virtual address, a multiple
hit in the sTLB and fTLB is not detected. A multiple hit is detected only when
the micro-TLB misses and the main TLB is referenced.
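
The consequence of this rule is that duplicate main-TLB entries can stay unnoticed for as long as the micro-TLB keeps matching. The following Python sketch models that behavior under simplifying assumptions: the names translate and TLBMultipleHit, the dictionary used for the micro-TLB, and the refill of the micro-TLB on a main-TLB hit are all hypothetical and chosen only for illustration. The actual micro-TLB organization is not visible to software.

```python
class TLBMultipleHit(Exception):
    """Raised when more than one main-TLB entry matches a virtual page."""


def translate(vpn, micro_tlb, main_tlb):
    """Translate a virtual page number in a toy two-level TLB model.

    micro_tlb: dict mapping vpn -> ppn (hypothetical stand-in for the uTLB)
    main_tlb:  list of (vpn, ppn) pairs (stand-in for sTLB plus fTLB)
    """
    # A micro-TLB match returns at once; the main TLBs are not consulted,
    # so duplicate main-TLB entries go unnoticed on this path.
    if vpn in micro_tlb:
        return micro_tlb[vpn]

    # Micro-TLB miss: the main TLBs are referenced and duplicates are seen.
    matches = [ppn for tag, ppn in main_tlb if tag == vpn]
    if len(matches) > 1:
        raise TLBMultipleHit(f"{len(matches)} main-TLB entries match vpn {vpn:#x}")
    if not matches:
        return None  # TLB miss; a real MMU would trap to software here

    micro_tlb[vpn] = matches[0]  # assumed refill policy, for illustration
    return matches[0]
```

In this model, a lookup of a duplicated page raises TLBMultipleHit only while the page is absent from micro_tlb; once it is present, the same lookup succeeds silently.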
F.2 Translation Table Entry (TTE)
IMPL. DEP. in Commonality TABLE F-1: TTE_Data bits 46:43 are implementation
dependent.
On SPARC64 V, TTE_Data bits 46:43 are reserved.
IMPL. DEP. #224: Physical address width support by the MMU is implementation
dependent in JPS1; minimum PA width is 43 bits.
The SPARC64 V MMU implements 43-bit physical addresses. The PA field of the
TTE holds a 43-bit physical address. The MMU translates virtual addresses into
43-bit physical addresses. Each cache tag holds bits 42:6 of physical addresses.
Bits 46:43 of each TTE always read as 0 and writes to them are ignored.
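
The bit ranges in this paragraph translate directly into masks and shifts. The sketch below is a minimal Python illustration; the names PA_BITS, TTE_RESERVED_MASK, cache_tag, and tte_read are hypothetical and do not correspond to any hardware interface.

```python
PA_BITS = 43                   # physical address width
TTE_RESERVED_MASK = 0xF << 43  # TTE bits 46:43


def cache_tag(pa):
    """Return PA bits 42:6, the portion of the address held in a cache tag."""
    if not 0 <= pa < (1 << PA_BITS):
        raise ValueError("physical address does not fit in 43 bits")
    return pa >> 6


def tte_read(stored):
    """Return a TTE value as read back, with bits 46:43 forced to 0."""
    return stored & ~TTE_RESERVED_MASK
```

Bits 42:6 span 37 bits, so cache_tag returns a value below 2^37 for every valid physical address.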
A cacheable access for a physical address 400 0000 0000₁₆ always causes a
cache miss in the U2 cache and generates a UPA request for the cacheable access.
The urgent error ASI_UGESR.SDC is signalled after the UPA cacheable access is
requested.
The physical address length to be passed to the UPA interface is 41 bits or 43 bits,
as designated in the ASI_UPA_CONFIG.AM field. When the 41-bit PA is specified
in ASI_UPA_CONFIG.AM, the most significant 2 bits of the CPU internal physical
address are discarded and only the remaining least significant 41 bits are passed
to the UPA address bus. If the discarded most significant 2 bits are not 0, the
urgent error ASI_UGESR.SDC is detected after the invalid address transfer to the
UPA interface. Otherwise, when the 43-bit PA is specified in
ASI_UPA_CONFIG.AM, the entire 43 bits of CPU internal physical address are
passed to the UPA address bus.
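
The truncation rule can be written out as a short function. The Python sketch below is illustrative only: the function name upa_address and the boolean parameter am_41bit are hypothetical, and the actual encoding of the ASI_UPA_CONFIG.AM field is not modeled.

```python
def upa_address(pa, am_41bit):
    """Return (address passed to the UPA bus, SDC error flag).

    pa:       43-bit CPU internal physical address
    am_41bit: True when the 41-bit PA mode is selected (hypothetical
              stand-in for the ASI_UPA_CONFIG.AM setting)
    """
    pa &= (1 << 43) - 1
    if not am_41bit:
        # 43-bit mode: the whole internal address is passed through.
        return pa, False

    discarded = pa >> 41           # the most significant 2 bits
    low41 = pa & ((1 << 41) - 1)   # the bits actually driven on the bus
    # Nonzero discarded bits correspond to the ASI_UGESR.SDC urgent error.
    return low41, discarded != 0
```

Note that in 41-bit mode the truncated address is still transferred; the error flag only records that the transfer was invalid.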
IMPL. DEP. #238: When page offset bits for larger page size (PA<15:13>, PA<18:13>,  
and PA<21:13> for 64-Kbyte, 512-Kbyte, and 4-Mbyte pages, respectively) are stored  
in the TLB, it is implementation dependent whether the data returned from those  
fields by a Data Access read are zero or the data previously written to them.  
On SPARC64 V, the data returned from PA<15:13>, PA<18:13>, and PA<21:13> for  
64-Kbyte, 512-Kbyte, and 4-Mbyte pages, respectively, by a Data Access read are  
the data previously written to them.  
IMPL. DEP. #225: The mechanism by which entries in TLB are locked is  
implementation dependent in JPS1.  
In SPARC64 V, when a TTE with its lock bit set is written into TLB through the  
Data In register, the TTE is automatically written into the corresponding fully  
associative TLB and locked in the TLB. Otherwise, the TTE is written into the  
corresponding sTLB or fTLB, depending on its page size.  
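
The selection of the destination TLB can be sketched as follows. This Python fragment is an illustration under stated assumptions: the function name target_tlb and the page-size strings are hypothetical, and the rule for unlocked entries (8-KB and 4-MB pages go to the sTLB, other sizes to the fTLB) is inferred from the page sizes listed in TABLE F-1 rather than stated explicitly in the text.

```python
ALL_PAGE_SIZES = {"8K", "64K", "512K", "4M"}
STLB_PAGE_SIZES = {"8K", "4M"}  # page sizes the sTLB supports (TABLE F-1)


def target_tlb(page_size, locked):
    """Return "sTLB" or "fTLB" for a TTE written through the Data In register."""
    if page_size not in ALL_PAGE_SIZES:
        raise ValueError(f"unsupported page size: {page_size}")
    if locked:
        # Only the fully associative TLB implements the lock bit.
        return "fTLB"
    # Assumption: unlocked entries use the sTLB whenever it supports the size.
    return "sTLB" if page_size in STLB_PAGE_SIZES else "fTLB"
```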
IMPL. DEP. #242: An implementation containing multiple TLBs may implement the L  
(lock) bit in all TLBs but is only required to implement a lock bit in one TLB for each  
page size. If the lock bit is not implemented in a particular TLB, it is read as 0 and  
writes to it are ignored.  
In SPARC64 V, only the fITLB and the fDTLB support the lock bit as described in  
TABLE F-1. The lock bit in sITLB and sDTLB is read as 0 and writes to it are  
ignored.  
IMPL. DEP. #226: Whether the CV bit is supported in TTE is implementation
dependent in JPS1. When the CV bit in TTE is not provided and the implementation
has virtually indexed caches, the implementation should support hardware
unaliasing for the caches.
In SPARC64 V, no TLB supports the CV bit in TTE. SPARC64 V supports hardware
unaliasing for the caches. The CV bit in any TLB entry is read as 0 and writes to it
are ignored.
F.3.3 TSB Organization
IMPL. DEP. #227: The maximum number of entries in a TSB is implementation
dependent in JPS1. See impl. dep. #228 for the limitation of TSB_size in TSB
registers.
SPARC64 V supports a maximum of 16 million lines in the common TSB and a
maximum of 32 million lines in the split TSB. The maximum number N in
FIGURE F-4 of Commonality is 16 million (16 × 2^20).
F.4.2 TSB Pointer Formation
IMPL. DEP. #228: Whether TSB_Hash is supplied from a TSB Extension Register or
from a context-ID register is implementation dependent in JPS1. Only for cases of
direct hash with context-ID can the width of the TSB_size field be wider than 3
bits.
On SPARC64 V, TSB_Hash is supplied from a context-ID register. The width of
the TSB_size field is 4 bits.
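
A 4-bit TSB_size field allows values from 0 to 15, which is consistent with the 16-million and 32-million line limits quoted for the common and split TSB. The Python sketch below shows the arithmetic; the function name tsb_entries is hypothetical, and the base of 512 lines for TSB_size = 0 is an assumption derived from the 9-bit index VA[21+N:13] with N = 0.

```python
def tsb_entries(tsb_size, split=False):
    """Return the number of TSB lines for a given 4-bit TSB_size value."""
    if not 0 <= tsb_size <= 15:
        raise ValueError("TSB_size is a 4-bit field (0 to 15)")
    entries = 512 << tsb_size  # 512 lines, doubled for each size step
    # A split TSB holds separate 8K and 64K halves of that size each.
    return 2 * entries if split else entries
```

With tsb_size = 15 this gives 512 × 2^15 = 16 × 2^20 lines for a common TSB and twice that for a split TSB.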
IMPL. DEP. #229: Whether the implementation generates the TSB Base address by
exclusive-ORing the TSB Base Register and a TSB Extension Register or by taking the
TSB_Base field directly from the TSB Extension Register is implementation
dependent in JPS1. This implementation dependency is only to maintain
compatibility with the TLB miss handling software of UltraSPARC I/II.
On SPARC64 V, when ASI_MCNTL.JPS1_TSBP = 1, the TSB Base address is
generated by taking the TSB_Base field directly from the TSB Extension Register.
TSB Pointer Formation  
On SPARC64 V, the number N in the following equations ranges from 0 to 15; N is
defined to be the TSB_Size field of the TSB Base or TSB Extension Register.
8K_POINTER = TSB_Extension[63:14+N] ∥ 0 ∥ (VA[21+N:13] ⊕ TSB_Hash) ∥ 0000
64K_POINTER = TSB_Extension[63:14+N] ∥ 1 ∥ (VA[24+N:16] ⊕ TSB_Hash) ∥ 0000

Here ∥ denotes bit concatenation and ⊕ denotes exclusive-OR.

Value of TSB_Hash for both a shared TSB and a split TSB:
When 0 <= N <= 4,
TSB_Hash = context_register[N+8:0]
Otherwise, when 5 <= N <= 15,
TSB_Hash[12:0] = context_register[12:0]
TSB_Hash[N+8:13] = 0 (N-4 bits zero)
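
The pointer equations and the TSB_Hash definition can be checked with a small model. The Python sketch below is a non-authoritative illustration: the function names tsb_hash and split_tsb_pointer are hypothetical, registers are modeled as plain integers, and only the two pointer forms given above (with the selector bit 0 for 8K and 1 for 64K) are covered.

```python
MASK64 = (1 << 64) - 1


def tsb_hash(context, n):
    """Return TSB_Hash (N+9 bits wide) from the 13-bit context register."""
    if not 0 <= n <= 15:
        raise ValueError("N must be in the range 0 to 15")
    # For N <= 4 the hash is context[N+8:0]; for N >= 5 it is context[12:0]
    # with the remaining upper bits zero.
    width = min(n + 9, 13)
    return context & ((1 << width) - 1)


def split_tsb_pointer(tsb_extension, va, context, n, is_64k):
    """Form the 8K or 64K TSB pointer from the equations above."""
    index_bits = 9 + n                      # width of VA[21+N:13] or VA[24+N:16]
    va_shift = 16 if is_64k else 13
    va_field = (va >> va_shift) & ((1 << index_bits) - 1)
    index = va_field ^ tsb_hash(context, n)

    base = tsb_extension & MASK64 & ~((1 << (14 + n)) - 1)  # bits 63:14+N
    select = 1 if is_64k else 0                             # bit 13+N
    return base | (select << (13 + n)) | (index << 4)       # low 4 bits are 0
```

Because the index is shifted left by 4 bits, every pointer is a multiple of 16, matching the trailing 0000 in the equations.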
F.5  
IMPL. DEP. #230: The cause of a data_access_exception trap is implementation  
dependent in JPS1, but there are several mandatory causes of data_access_exception  
trap.  
SPARC64 V signals a data_access_exception for the causes defined in Section F.5
of Commonality. However, an invalid ASI requires careful handling; see
Section F.10.9 for details.
IMPL. DEP. #237: Whether the fault status and/or address (DSFSR/DSFAR) are
captured when mem_address_not_aligned is generated during a JMPL or RETURN
instruction is implementation dependent.
On SPARC64 V, the fault status and address (DSFSR/DSFAR) are not captured
when a mem_address_not_aligned exception is generated during a JMPL or RETURN
instruction.
Additional information: On SPARC64 V, two precise traps,
instruction_access_error and data_access_error, are recorded by the MMU in addition
to those in TABLE F-2 of Commonality. A modified version of that table, with
the two traps added, is shown below.
TABLE F-2 MMU Trap Types, Causes, and Stored State Register Update Policy

Columns: Ref #; Trap Name; Trap Cause; Registers Updated (Stored State in MMU):
I-SFSR, I-MMU Tag Access, D-SFSR and SFAR, D-MMU Tag Access; Trap Type

1. fast_instruction_access_MMU_miss
   Trap Cause: I-TLB miss
   Registers Updated: I-SFSR (X, footnote 2), I-MMU Tag Access (X)
   Trap Type: 64₁₆-67₁₆