PowerPC 476FP Embedded Processor Core
User’s Manual
Version 2.2
July 31, 2014

Copyright and Disclaimer

© Copyright International Business Machines Corporation 2009, 2014

Printed in the United States of America July 2014

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.

Other company, product, and service names may be trademarks or service marks of others.

All information contained in this document is subject to change without notice. The products described in this document are NOT intended for use in applications such as implantation, life support, or other hazardous uses where malfunction could result in death, bodily injury, or catastrophic property damage. The information contained in this document does not affect or change IBM product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of IBM or third parties. All information contained in this document was obtained in specific environments, and is presented as an illustration. The results obtained in other operating environments may vary.

THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN “AS IS” BASIS. In no event will IBM be liable for damages arising directly or indirectly from any use of the information contained in this document.

IBM Systems and Technology Group
2070 Route 52, Bldg. 330
Hopewell Junction, NY 12533-6351

The IBM home page can be found at ibm.com®.

The IBM microelectronics home page can be found at ibm.com/chips.
Contents

List of Figures
List of Tables
Revision Log
About this Document

1. Overview
1.1 General Features
1.2 Power Control Features
1.2.1 Power Control Modes
1.2.2 Power Control Procedures
1.2.2.1 CPU Sleep Mode
1.2.2.2 CPU Doze Mode
1.2.2.3 Waking up the Processor
1.3 Implemented Instruction Set
1.4 Test and Debug Facilities
1.5 Floating-Point Unit Overview
1.6 Instruction Cache Overview
1.7 Data Cache Unit Overview
1.8 Memory Management Unit Overview
1.9 Timers

2. Programming Model
2.1 Storage Addressing
2.1.1 Storage Operands
2.1.2 Effective Address Calculation
2.1.2.1 Data Storage Addressing Modes
2.1.2.2 Instruction Storage Addressing Modes
2.1.3 Byte Ordering
2.1.3.1 Structure Mapping Examples
2.1.3.2 Instruction Byte Ordering
2.1.3.3 Data Byte Ordering
2.1.3.4 Byte-Reverse Instructions
2.2 Registers
2.2.1 Register Types
2.2.1.1 General Purpose Registers
2.2.1.2 Special Purpose Registers
2.2.1.3 Condition Register
2.2.1.4 Machine State Register
2.2.1.5 Device Control Registers
2.3 Instruction Classes
2.3.1 Defined Instruction Class
2.3.2 Preserved Instruction Class
2.3.3 Reserved Instruction Class
2.4 Implemented Instruction Set Summary
2.4.1 Integer Instructions
2.4.1.1 Integer Storage Access Instructions
2.4.1.2 Integer Arithmetic Instructions
2.4.1.3 Integer Logical Instructions
2.4.1.4 Integer Compare Instructions
2.4.1.5 Integer Trap Instructions
2.4.1.6 Integer Rotate Instructions
2.4.1.7 Integer Shift Instructions
2.4.1.8 Integer Select Instruction
2.4.2 Branch Instructions
2.4.3 Processor Control Instructions
2.4.3.1 Condition Register Logical Instructions
2.4.3.2 Register Management Instructions
2.4.3.3 System Linkage Instructions
2.4.3.4 Processor Synchronization Instruction
2.4.4 Storage Control Instructions
2.4.4.1 Cache Management Instructions
2.4.4.2 TLB Management Instructions
2.4.4.3 Storage Synchronization Instructions
2.4.5 Previous Integer Multiply-Accumulate Instructions
2.5 Branch Processing
2.5.1 Branch Addressing
2.5.2 Branch Instruction BI Field
2.5.3 Branch Instruction BO Field
2.5.4 Branch Prediction
2.5.5 Branch Control Registers
2.5.5.1 Link Register (LR)
2.5.5.2 Count Register (CTR)
2.5.5.3 Condition Register (CR)
2.6 Integer Processing
2.6.1 General Purpose Registers (GPRs)
2.6.2 Fixed-Point Exception Register (XER)
2.6.2.1 Summary Overflow (SO) Field
2.6.2.2 Overflow (OV) Field
2.6.2.3 Carry (CA) Field
2.6.2.4 Transfer Byte Count (TBC) Field
2.7 Processor Control
2.7.1 Special Purpose Registers General (USPRG0, SPRG0 - SPRG8)
2.7.2 Processor Version Register (PVR)
2.7.3 Processor Identification Register (PIR)
2.7.4 Core Configuration Register 0 (CCR0)
2.7.5 Core Configuration Register 1 (CCR1)
2.7.6 Core Configuration Register 2 (CCR2)
2.7.7 Reset Configuration (RSTCFG)
2.7.8 Device Control Register Immediate Prefix Register (DCRIPR)
2.8 User and Supervisor Modes
2.8.1 Privileged Instructions
2.8.2 Privileged SPRs
2.9 Speculative Accesses
2.10 Synchronization
2.10.1 Context Synchronization
2.10.2 Execution Synchronization
2.10.3 Storage Ordering and Synchronization
2.10.4 SPRs Requiring Context Synchronization
2.10.5 Instructions Requiring a Context Synchronization Instruction
2.11 Storage Model

3. Floating-Point Unit Programming Model
3.1 Floating-Point Exceptions
3.2 Floating-Point Registers
3.2.1 Register Types
3.2.1.1 Floating-Point Registers (FPR0 - FPR31)
3.2.1.2 Floating-Point Status and Control Register (FPSCR)
3.3 Floating-Point Data Formats
3.3.1 Value Representation
3.3.2 Binary Floating-Point Numbers
3.3.2.1 Normalized Numbers
3.3.2.2 Denormalized Numbers
3.3.2.3 Zero Values
3.3.3 Infinities
3.3.3.1 Not a Numbers
3.3.4 Sign of Result
3.3.5 Data Handling and Precision
3.3.6 Rounding
3.4 Floating-Point Instructions
3.4.1 Instructions By Category
3.4.2 Load and Store Instructions
3.4.3 Floating-Point Store Instructions
3.4.4 Floating-Point Move Instructions
3.4.5 Floating-Point Arithmetic Instructions
3.4.5.1 Floating-Point Multiply-Add Instructions
3.4.6 Floating-Point Rounding and Conversion Instructions
3.4.7 Floating-Point Compare Instructions
3.4.8 Floating-Point Status and Control Register Instructions

4. Memory Management Unit
4.1 Overview
4.2 Address Translation
4.3 MMU Implementation
4.3.1 Translation Lookaside Buffer
4.3.2 UTLB Index Address Hash
4.3.3 Initialize a Single UTLB Entry
4.3.4 Tag Array
4.3.5 Comparison
4.3.6 Data Array
4.3.6.1 Hardware Enforced I = 1 = IL1I = IL1D
4.3.7 Writing UTLB Entries
4.3.8 Bolted UTLB Entries
4.3.9 Hardware Assisted Way Selection
4.3.10 Searching UTLB Entries
4.3.10.1 Instruction-Side and Data-Side TLB Miss Searches
4.3.11 Reading UTLB Entries
4.3.12 Invalidating UTLB Entries
4.4 Access Control
4.4.1 Execute Access
4.4.2 Write Access
4.4.3 Read Access
4.4.4 Access Control Applied to Cache Management Instructions
4.5 Storage Attributes
4.5.1 Write-Through (W)
4.5.2 Caching Inhibited (I)
4.5.3 Hardware Enforced IL1I and IL1D
4.5.4 Memory Coherence Required (M)
4.5.5 Guarded (G)
4.5.6 Endian (E)
4.5.7 User-Definable (U0 - U3)
4.5.8 Supported Storage Attribute Combinations
4.5.9 Aliasing
4.6 MMU Registers
4.6.1 Process ID Register (PID)
4.6.2 Real Mode Page Description Register (RMPD)
4.6.3 MMU Bolted Entries 0 Register (MMUBE0)
4.6.4 MMU Bolted Entries 1 Register (MMUBE1)
4.6.5 Search Priority Configuration Registers
4.6.6 Supervisor Search Priority Configuration Register (SSPCR)
4.6.7 Invalidate Search Priority Configuration Register (ISPCR)
4.6.8 User Search Priority Configuration Register (USPCR)
4.6.9 Reset Configuration Register (RSTCFG)
4.6.10 MMU Configuration Register (MMUCR)
4.7 UTLB Block Descriptions
4.7.1 Tag Array
4.8 Software Considerations
4.8.1 TLB Search Indexed (tlbsx)
4.8.2 TLB Read Entry (tlbre)
4.8.3 TLB Write Entry (tlbwe)
4.8.4 TLB Invalidate, Virtual Address Indexed (tlbivax)
4.9 UTLB Coherency
4.10 tlbsync Special Operations
4.10.1 Remote tlbsync operation
4.10.1.1 CPU Remote tlbsync Operation
4.10.1.2 L2 Cache Remote tlbsync Operations

5. Instruction and Data Caches
5.1 Cache Array Organization and Operation
5.2 Instruction Cache Controller
5.2.1 I-Cache Operations
5.2.2 Instruction Cache Parity Operations
5.2.2.1 Instruction Cache Block Lock Clear (icblc)
5.2.2.2 Instruction Cache Block Invalidate (icbi)
5.2.2.3 Instruction Cache Invalidate (ici)
5.2.2.4 icbt
5.2.2.5 icbtls
5.2.2.6 icread
5.2.2.7 Instruction Cache Debug Data Register 0 (ICDBDR0)
5.2.2.8 Instruction Cache Debug Data Register 1 (ICDBDR1)
5.2.2.9 Instruction Cache Debug Tag Register Low (ICDBTRL)
5.2.2.10 Instruction Cache Debug Tag Register High (ICDBTRH)
5.2.2.11 Instruction Cache Parity Operations
5.2.3 Speculative Prefetch
5.2.4 Exceptions
5.2.4.1 Instruction Storage Interrupt
5.2.4.2 Instruction-Side UTLB Miss
5.2.4.3 Instruction-Side Machine Check
5.3 ICU Special Purpose Registers
5.3.1 Instruction Cache Error Syndrome Register (ICESR)
5.4 Self-Modifying Code
5.5 Data Cache Controller
5.5.1 DCU Operations
5.5.1.1 Load Operations
5.5.1.2 Store Operations
5.5.2 Store Gathering
5.5.3 Line Flush Operations
5.5.4 Storage Access Ordering
5.5.5 Data Cache Coherency
5.5.6 Data Cache Control and Debug
5.5.7 Data Cache Management and Debug Instruction Summary
5.5.7.1 Data Cache Block Zero (dcbz)
5.5.8 Data Cache Block Lock Clear (dcblc)
5.5.9 Data Cache Block Store (dcbst)
5.5.10 Data Cache Block Flush (dcbf)
5.5.11 Data Cache Block Invalidate (dcbi)
5.5.12 Data Cache Invalidate (dci)
5.5.13 Data Cache Block Touch (dcbt)
5.5.14 Data Cache Block Touch with Lock Set (dcbtls)
5.5.15 Data Cache Block Touch for Store (dcbtst)
5.5.16 Data Cache Block Touch For Store with Lock Set (dcbtstls)
5.5.17 Data Cache Read (dcread)
5.5.18 Memory Barrier Instructions
5.5.18.1 Memory Synchronization (msync)
5.5.18.2 Memory Barrier (mbar)
5.5.18.3 Lightweight Sync (lwsync)
5.5.19 Core Configuration Registers (CCR0, CCR1, and CCR2)
5.5.20 dcbt and dcbtst Operation
5.5.21 dcread Operation
5.5.22 Data Cache Debug Tag Register Low (DCDBTRL)
5.5.23 Data Cache Debug Tag Register High (DCDBTRH)
5.5.24 Data Cache Parity Operations
5.5.24.1 Data Cache Exception Status Register (DCESR)
5.5.25 Simulating Data Cache Parity Errors for Software Testing

6. Timer Facilities
6.1 Time Base
6.1.1 Reading the Time Base
6.1.2 Writing the Time Base
6.2 Decrementer and Decrementer Autoreload Registers
6.3 Fixed-Interval Timer
6.4 Watchdog Timer
6.5 Timer Control Register
6.6 Timer Status Register
6.7 Halting the Timer Facilities
6.8 Selection of the Timer Clock Source

7. Processor Interrupts and Exceptions
7.1 Overview
7.2 Interrupt Classes
7.2.1 Asynchronous Interrupts
7.2.2 Synchronous Interrupts
7.2.2.1 Synchronous, Precise Interrupts
7.2.2.2 Synchronous, Imprecise Interrupts
7.2.3 Critical and Noncritical Interrupts
7.2.4 Machine Check Interrupts
7.3 Interrupt Processing
7.3.1 Partially Executed Instructions
7.4 Interrupt Processing Registers
7.4.1 Machine State Register (MSR)
7.4.2 Save/Restore Register 0 (SRR0)
7.4.3 Save/Restore Register 1 (SRR1)
7.4.4 Critical Save/Restore Register 0 (CSRR0)
7.4.5 Critical Save/Restore Register 1 (CSRR1)
7.4.6 Machine Check Save/Restore Register 0 (MCSRR0)
7.4.7 Machine Check Save/Restore Register 1 (MCSRR1)
7.4.8 Data Exception Address Register (DEAR)
7.4.9 Interrupt Vector Offset Registers (IVOR0 - IVOR15)
7.4.10 Interrupt Vector Prefix Register (IVPR)
7.4.11 Exception Syndrome Register (ESR)
7.4.12 Machine Check Syndrome Register (MCSR)
7.5 Interrupt Definitions
7.5.1 Critical Input Interrupt
7.5.2 Machine Check Interrupt
7.5.3 Data Storage Interrupt
7.5.4 Instruction Storage Interrupt
7.5.5 External Input Interrupt
7.5.6 Alignment Interrupt
7.5.7 Program Interrupt
7.5.8 Floating-Point Unavailable Interrupt
7.5.9 System Call Interrupt
7.5.10 Decrementer Interrupt
7.5.11 Fixed-Interval Timer Interrupt
7.5.12 Watchdog Timer Interrupt
7.5.13 Data TLB Error Interrupt
7.5.14 Instruction TLB Error Interrupt
7.5.15 Debug Interrupt
7.6 Interrupt Ordering and Masking
7.6.1 Interrupt Ordering Software Requirements
7.6.2 Interrupt Order
7.7 Exception Priorities
7.7.1 Exception Priorities for Integer Load, Store, and Cache Management Instructions
7.7.2 Exception Priorities for Floating-Point Load and Store Instructions
7.7.3 Exception Priorities for Allocated Load and Store Instructions
7.7.4 Exception Priorities for Floating-Point Instructions (Other)
7.7.5 Exception Priorities for Allocated Instructions (Other)
7.7.6 Exception Priorities for Privileged Instructions
7.7.7 Exception Priorities for Trap Instructions
7.7.8 Exception Priorities for System Call Instruction
7.7.9 Exception Priorities for Branch Instructions
7.7.10 Exception Priorities for Return From Interrupt Instructions
7.7.11 Exception Priorities for Preserved Instructions
7.7.12 Exception Priorities for Reserved Instructions
7.7.13 Exception Priorities for All Other Instructions

8. Debug Facilities
8.1 Development Tool Support
8.2 Debug Modes
8.2.1 Internal Debug Mode
8.2.2 External Debug Mode
8.2.3 Trace Mode
8.2.4 Debug Wait Enable Mode
8.3 Debug Events
8.3.1 Broadcast of Debug Events
8.3.2 Exceptions
8.3.3 Instruction Address Comparison
8.3.3.1 IAC Debug Events
8.3.3.2 Exact Comparison Mode
8.3.3.3 Range Inclusive Comparison Mode
8.3.3.4 Range Exclusive Comparison Mode
8.3.3.5 IAC User/Supervisor Field
8.3.3.6 IAC Effective/Real Address Field
8.3.3.7 IAC Range Mode Autotoggle Field
8.3.4 Data Address Comparison
8.3.4.1 DAC Debug Event Fields
8.3.4.2 DAC Debug Events Applied to Instructions that Result in Multiple Storage Accesses
8.3.4.3 DAC Debug Events Applied to Various Instruction Types
8.3.4.4 Data Value Compare (DVC) Debug Event
8.3.5 Trap
8.3.6 Branch Taken
8.3.7 Instruction Completed
8.3.8 Return Debug Events
8.3.9 Interrupt Debug Events
8.3.10 Unconditional Debug Events
8.4 Debug Timer Freeze
8.5 Debug Special Purpose Registers
8.5.1 Debug Control Register 0 (DBCR0)
8.5.2 Debug Control Register 1 (DBCR1)
8.5.3 Debug Control Register 2 (DBCR2)
8.5.4 Debug Status Register (DBSR)
8.5.5 Setting the DBSR Based on MSR[DE] and DBCR0[IDM]
8.5.6 Instruction Address Comparison 1 - 4 (IAC1 - IAC4)
8.5.7 Setup Order for IACs, DACs, and DVCs
8.6 JTAG and Debug Capabilities in a Multiprocessor SoC Environment
8.6.1 Debug Bus Out Mask Register (DBOMask)
8.6.2 Debug Input Mask Register (DBIMask)

9. Initialization
9.1 Processor Core State after Reset
9.2 Reset Types
9.3 Reset Sources
9.4 Initialization Software Requirements

10. L2 Cache and UTLB Synchronous Interfaces
10.1 L2 Cache Interface
10.2 L2 Cache Features
10.2.1 L2 Cache Storage Reservation Management
10.2.2 Performance Monitor
10.2.2.1 Performance Monitor Unit Core Control Register 0 (PMUCC0)
10.2.3 Cache Operations Handling
10.2.4 tlbivax, tlbsync, msync, mbar Handling
10.3 L1 Cache UTLB Snoop Interface

Appendix A. Register Summary
A.1 Data Cache Address Compare 1 Register (DAC1)
A.2 Data Cache Address Compare 2 Register (DAC2)
A.3 Data Cache Value Compare 1 Register (DVC1)
A.4 Data Cache Value Compare 2 Register (DVC2)
A.5 Debug Data Register (DBDR)
A.6 Data Cache Exception Syndrome Register (DCESR)
A.7 Instruction Opcode Compare Control Register (IOCCR)
A.8 Instruction Opcode Compare Register 1 (IOCR1)
A.9 Instruction Opcode Compare Register 2 (IOCR2)

Appendix B. Instruction Summary
B.1 Instructions That Behave Differently from the Power ISA Specification
B.2 Unsupported Power ISA Instructions
B.3 Integer Instructions in the PowerPC 476FP Processor
B.4 Floating-Point Instructions

Appendix C. Instruction Execution Performance for Code Optimization
C.1 PowerPC 476FP Pipeline Overview
C.1.1 PowerPC 476FP Integer Pipelines
C.1.1.1 ICRD, IST, and ISD Pipeline Stages
C.1.1.2 DISS Stage
C.1.1.3 RACC Stage
C.1.1.4 Execution Pipeline Stages
C.1.2 PowerPC 476FP Floating-Point Pipelines
C.2 Instruction Execution Latency and Penalty
C.3 Instruction Fetch and Decode
C.3.1 Instruction Fetch Address Arbitration and Fetch Process
C.3.2 Instruction Predecode, Instruction Field Adjust, and Endian Adjust
C.3.2.1 Instruction Field Adjust
C.3.3 Instruction Predecode
C.4 Branch Prediction and Branch Instruction Processing
C.4.1 Branch History Table Operation
C.4.2 Global History Register Operation
C.4.3 Branch Target Address CAM (BTAC) Operation
C.4.4 Branch Link-Stack Operation
C.4.5 Branch Instruction Process
C.4.6 Branch Information Queue Operation
C.5 Instruction Issue Operation
C.5.1 L-Pipe Instructions
C.5.2 I-Pipe Instructions
C.5.3 I-Pipe and J-Pipe Instructions
C.5.4 B-Pipe Instructions
C.5.5 FA-pipe Instructions
C.5.6 FP FL-pipe Instructions
C.5.7 Special Issue Rules for System Synchronizing Instructions
C.6 Instruction Execution and Penalties
C.6.1 Contention for the Same RACC Stage
C.6.2 GPR Operand Dependency
C.6.3 General CR Operand Dependency
C.6.4 Multiply Dependency
C.6.5 Multiply-Accumulate (MAC) Dependency
C.6.6 Divide Dependency
C.6.7 Move to Condition Register Fields (mtcrf) Instruction Dependency
C.6.8 Store Word Conditional Indexed (stwcx.) Instruction Dependency
C.6.9 Move from Condition Register (mfcr) Instruction Dependency
C.6.10 Move from Special Purpose Register (mfspr) Dependency
C.6.11 Move from Machine State Register (mfmsr) Dependency
C.6.12 Move to Special Purpose Register (mtspr) Dependency
C.6.13 TLB Management Instruction Dependency
C.6.14 DCR Register Managing Instruction Operation Dependency
C.6.15 Processor Control Instruction Operation
C.6.16 Load Instruction Dependency
C.6.17 Load/Store Operations
C.6.18 String and Multiple Operations
C.6.19 lwarx and stwcx. Operations
C.6.20 Storage Ordering and Synchronizing Operations
C.6.21 Special TLB Managing Operations
C.7 Interrupt Handling

Glossary
Index

List of Figures
Figure 1-1. PowerPC 476FP Embedded Processor Core Block Diagram
Figure 2-1. User Programming Model Registers
Figure 2-2. Supervisor Programming Model Registers
Figure 3-1. Approximation to Real Numbers
Figure 3-2. Selection of z1 and z2
Figure 4-1. Address Mapping for each Page Size
Figure 4-2. MMU Block Diagram
Figure 4-3. Supervisor Search Priority Configuration Registers
Figure 4-4. Invalidate Search Priority Configuration Register
Figure 4-5. User Search Priority Configuration Registers (USPCR)
Figure 6-1. Relationship of Timer Facilities to the Time Base
Figure 6-2. Watchdog State Machine
Figure 8-1. JTAG-Controlled MP DBSR Monitor Capability
Figure 8-2. JTAG-Controlled MP Stop and Run Control Capability
Figure 10-1. L2 Cache and Interface Block Diagram
Figure C-1. PowerPC 476FP Integer Pipeline Structure
Figure C-2. PowerPC 476FP Floating-Point Pipeline Structure
Figure C-3. Instruction Sequence Without a Dependency
Figure C-4. Instruction Sequence with a Dependency
Figure C-5. Load Instruction Followed by an add with a Dependency on the Load
Figure C-6. Typical Branch-Predict-Taken Timing Diagram (Branch Target Address is Computed at ISD)
Figure C-7. TBTAC and BHT Based Branch-Predict-Taken Timing Diagram (BTAC Hit and BTAC Contains the Branch Target Address)
Figure C-8. Link-Stack Based Branch-Predict-Taken Timing Diagram (Link-Stack Pops the Branch Target Address at Clock 3)
Figure C-9. GHR use for BHT Lookup
Figure C-10. Instruction Sequence Example with no Dependency on the Integer Unit

List of Tables
Table 1-1. PowerPC 476FP Power Savings Modes
Table 1-2. PowerPC 476FP Frequency Switching
Table 1-3. Frequency Switching Examples
Table 2-1. Data Operand Definitions
Table 2-2. Alignment Effects for Storage Access Instructions
Table 2-3. Big-Endian Mapping of Structure S
Table 2-4. Little-Endian Mapping of Structure S
Table 2-5. PowerPC 476FP SPRs
Table 2-6. Instruction Categories
Table 2-7. Integer Storage Access Instructions
Table 2-8. Integer Arithmetic Instructions
Table 2-9. Integer Logical Instructions
Table 2-10. Integer Compare Instructions
Table 2-11. Integer Trap Instructions
Table 2-12. Integer Rotate Instructions
Table 2-13. Integer Shift Instructions
Table 2-14. Integer Select Instruction
Table 2-15. Branch Instructions
Table 2-16. Condition Register Logical Instructions
Table 2-17. Register Management Instructions
Table 2-18. System Linkage Instructions
Table 2-19. Processor Synchronization Instruction
Table 2-20. Cache Management Instructions
Table 2-21. TLB Management Instructions
Table 2-22. Storage Synchronization Instructions
Table 2-23. Previous Integer Multiply-Accumulate Instructions
Table 2-24. BO Field Definition
Table 2-25. BO Field Examples
Table 2-26. CR Updating Instructions
Table 2-27. XER[SO,OV] Updating Instructions
Table 2-28. XER[CA] Updating Instructions
Table 2-29. Privileged Instructions
Table 3-1. Invalid Operation Exception Categories
Table 3-2. Format Fields
Table 3-3. IEEE 754 Floating-Point Fields
Table 3-4. Rounding Modes
Table 3-5. Floating-Point Load Instructions
Table 3-6. Floating-Point Store Instructions
Table 3-7. Floating-Point Move Instructions
Table 3-8. Floating-Point Elementary Arithmetic Instructions
Table 3-9. Floating-Point Multiply-Add Instructions
Table 3-10. Floating-Point Rounding and Conversion Instructions
Table 3-11.
Comparison Sets ..................................................................................................................102 Table 3-12. Floating-Point Compare and Select Instructions ..................................................................102 Table 3-13. Floating-Point Status and Control Register Instructions .......................................................102 Table 4-1. PowerPC 476FP Processor MMU ........................................................................................103 Table 4-2. UTLB Set Address Generation Hashing Function ................................................................107 Table 4-3. UTLB Tag Field Description .................................................................................................108 Table 4-4. EPN and EA Comparison .....................................................................................................109 Table 4-5. UTLB Data Field Description ................................................................................................110 Table 4-6. Access Control Applied to Cache Management Instructions ................................................117 Table 4-7. MMU SPR Summary ............................................................................................................119 Table 5-1. Instruction and Data Cache Array Organization ...................................................................133 Table 5-2. Cache Size and Parameters .................................................................................................134 Table 5-3. EA Format icread .................................................................................................................136 Table 5-4. ICU Special Purpose Registers ............................................................................................139 Table 5-5. 
Effective Address Format for icread and dcread .................................................................151 Table 6-1. Timer Register Summary ......................................................................................................158 Table 6-2. Fixed-Interval Timer Period Selection ..................................................................................161 Table 6-3. Watchdog Timer Period Selection ........................................................................................161 Table 6-4. Watchdog Timer Exception Behavior ...................................................................................162 Table 7-1. Interrupt Types Associated with each IVOR .........................................................................178 Table 7-2. Interrupt and Exception Types ..............................................................................................183 Table 7-3. BRT Debug Event Actions ....................................................................................................202 Table 7-4. TRAP Debug Event Actions .................................................................................................203 Table 7-5. RET Debug Event Actions ....................................................................................................203 Table 7-6. ICMP Debug Event Actions ..................................................................................................204 Table 7-7. IRPT Debug Event Actions ...................................................................................................204 Table 7-8. UDE Debug Event Actions ...................................................................................................204 Table 8-1. IAC Range Mode Toggle Summary ......................................................................................223 Table 8-2. 
Trap Debug Event Actions ....................................................................................................230 Table 8-3. BRT Debug Event Actions ....................................................................................................231 Table 8-4. ICMP Debug Event Actions ..................................................................................................231 Table 8-5. RET Debug Event Actions ....................................................................................................232 Table 8-6. IRPT Debug Event Actions ...................................................................................................233 Table 8-7. UDE Debug Event Actions ...................................................................................................234 Table 8-8. Setting the DBSR based on MSR[DE] and DBCR0[IDM] .....................................................240 List of Tables Page 16 of 322 Version 2.2 July 31, 2014 User’s Manual PowerPC 476FP Embedded Processor Core Table 9-1. Reset Values of Registers and Other PowerPC 476FP Facilities ........................................ 244 Table 10-1. lwarx and stwcx. Actions in the L2 Cache and Processor Core ......................................... 258 Table 10-2. CT Field Value and Cache Level ......................................................................................... 259 Table 10-3. Cache Operations ................................................................................................................ 259 Table A-1. Register Categories ............................................................................................................. 263 Table B-1. New Instructions in the PowerPC 476FP Core .................................................................... 271 Table B-2. Power ISA V2.05 Integer Instructions .................................................................................. 271 Table B-3. 
Floating-Point Instructions .................................................................................................... 276 Table C-1. Instruction Predecode Bit Definition ..................................................................................... 292 Table C-2. Branch Prediction and BHT, GHR, and BTAC Use .............................................................. 293 Table C-3. Link-Stack Operations .......................................................................................................... 297 Version 2.2 July 31, 2014 List of Tables Page 17 of 322 User’s Manual PowerPC 476FP Embedded Processor Core List of Tables Page 18 of 322 Version 2.2 July 31, 2014 User’s Manual PowerPC 476FP Embedded Processor Core Revision Log Revision Date Description July 31, 2014 Version 2.2 • Revised Section 2.7.4 Core Configuration Register 0 (CCR0) on page 69. February 26, 2014 Version 2.1 • Revised Section 2.7.4 Core Configuration Register 0 (CCR0) on page 69. • Revised Section 7.5.15 Debug Interrupt on page 201. • Revised Section 8.2.1 Internal Debug Mode on page 217. • Revised Section 8.2.3 Trace Mode on page 218. January 10, 2014 Version 2.0 • Revised Section 5.2.2.9 Instruction Cache Debug Tag Register Low (ICDBTRL) on page 137. June 24, 2013 May 1, 2013 Version 1.9 • Revised Appendix C.2 Instruction Execution Latency and Penalty on page 284. • Revised Figure C-5 Load Instruction Followed by an add with a Dependency on the Load on page 289. Version 1.8 • Revised Table 4-3 UTLB Tag Field Description on page 108. • Revised Table 4-5 UTLB Data Field Description on page 110. • Revised Figure 6-1 Relationship of Timer Facilities to the Time Base on page 157. • Revised Section 7.5.6 Alignment Interrupt on page 192. • Revised Section 9.1 Processor Core State after Reset on page 243. • Revised Section 9.4 Initialization Software Requirements on page 250. 

April 20, 2012, Version 1.7
• Revised Section 2.10.3 Storage Ordering and Synchronization on page 78.
• Revised Section 4.3.3 Initialize a Single UTLB Entry on page 107.
• Revised Section 4.8.1 TLB Search Indexed (tlbsx) on page 128.
• Revised Section 7.4.12 Machine Check Syndrome Register (MCSR) on page 181.
• Revised Table 9-1 Reset Values of Registers and Other PowerPC 476FP Facilities on page 244.
• Changed ESR[MCI] to ESR[ISMC] throughout the book.

October 26, 2011, Version 1.6
• Revised the book title.
• Revised About this Document on page 23.
• Revised Related Publications on page 23.
• Revised Section 2.7.4 Core Configuration Register 0 (CCR0) on page 69.
• Revised Section 2.7.5 Core Configuration Register 1 (CCR1) on page 70.
• Revised Section 3 Floating-Point Unit Programming Model on page 85.
• Revised Section 3.2.1.2 Floating-Point Status and Control Register (FPSCR) on page 87.
• Revised Section 4.5.5 Guarded (G) on page 117.
• Revised Section 4.6.2 Real Mode Page Description Register (RMPD) on page 120.
• Revised Section 5.2.2.6 icread on page 136.
• Revised Section 6.8 Selection of the Timer Clock Source on page 165.
• Revised Table 9-1 Reset Values of Registers and Other PowerPC 476FP Facilities on page 244.
• Revised Section 9.4 Initialization Software Requirements on page 250.

April 13, 2011, Version 1.5
• Added “PowerPC 470S Synthesizable Core” to the title page and a reference to the 470S core in About this Document on page 23.
• Revised Section 4.2 Address Translation on page 103.
• Revised Figure 4-1 Address Mapping for each Page Size on page 104.
• Revised Table 4-3 UTLB Tag Field Description on page 108.
• Revised Table 4-5 UTLB Data Field Description on page 110.
• Revised Table A-1 Register Categories on page 263.

January 19, 2011, Version 1.4
• Changed SSPCR to ISPCR in Section 4.3.12 Invalidating UTLB Entries on page 113.
• Made the following changes in Section 4.8.4 TLB Invalidate, Virtual Address Indexed (tlbivax) on page 130:
  – Removed a reference to tlbivax.
  – Changed SSPCR to ISPCR.
  – Removed a reference to USPCR entries.
  – Changed isync to tlbsync.
• Added references to SPR addresses of x‘23C’ and x‘33C’ in Section 7.4.12 Machine Check Syndrome Register (MCSR) on page 181.

November 23, 2010, Version 1.3
• Removed a reference to the mfapidi instruction in Section 2.3.1 Defined Instruction Class on page 47.
• Changed SPRG7 to SPRG8 in Section 2.7.1 Special Purpose Registers General (USPRG0, SPRG0 - SPRG8) on page 67.
• Added bit 21 DPC to Core Configuration Register 1 (CCR1) on page 70.
• Made corrections to text and code in Section 4.8.2 TLB Read Entry (tlbre) on page 128 and Section 4.8.3 TLB Write Entry (tlbwe) on page 129.
• Added information that tlbivax and tlbsync are never executed simultaneously in Section 4.9 UTLB Coherency on page 130.
• Added information that an isync must follow an ici instruction in Section 5.2.2.3 Instruction Cache Invalidate (ici) on page 135.
• Added information that an isync instead of an msync can follow a dci instruction in Section 5.5.12 Data Cache Invalidate (dci) on page 146.
• Changed “privileged instructions cannot be executed” to read “the processor is in privileged state” for bit 49 in Section 7.4.1 Machine State Register (MSR) on page 173.
• Removed a reference to tlbiva in Section 7.5.7 Program Interrupt on page 193.
• Changed dccci to dci, iccci to ici, and removed a reference to tlbiva in Section 7.7.6 Exception Priorities for Privileged Instructions on page 214.
• Added Section 8.6 JTAG and Debug Capabilities in a Multiprocessor SoC Environment on page 241.
• Added substeps to step 4 in Section 9.4 Initialization Software Requirements on page 250.
• Made minor updates to Table 10-3 Cache Operations on page 259.
• Made minor updates to Section A Register Summary on page 263.
• Moved the Debug Bus Out Mask Register (DBOMask) from Appendix A to Section 8.6.1 Debug Bus Out Mask Register (DBOMask) on page 241.
• Moved the Debug Input Mask Register (DBIMask) from Appendix A to Section 8.6.2 Debug Input Mask Register (DBIMask) on page 242.

September 16, 2010, Version 1.2
Made minor technical corrections to the following sections:
• Section 1 Overview on page 25.
• Section 4.3.3 Initialize a Single UTLB Entry on page 107.
• Section 4.6.9 Reset Configuration Register (RSTCFG) on page 126.
• Section 4.8.3 TLB Write Entry (tlbwe) on page 129.
• Section 5.1 Cache Array Organization and Operation on page 133.
• Section 9.4 Initialization Software Requirements (step f on page 252).

September 1, 2010, Version 1.1
• Made various changes to the following sections:
  – Section 1.2 Power Control Features on page 26
  – Section 2.2 Registers on page 40
  – Section 2.11 Storage Model on page 81
  – Section 3.2 Floating-Point Registers on page 86
  – Section 3.4 Floating-Point Instructions on page 95
  – Section 4.3 MMU Implementation on page 104
  – Section 4.4 Access Control on page 113
  – Section 4.10 tlbsync Special Operations on page 131
  – Section 5 Instruction and Data Caches on page 133
  – Section 5.1 Cache Array Organization and Operation on page 133
  – Section 5.2 Instruction Cache Controller on page 134
  – Section 5.3 ICU Special Purpose Registers on page 139
  – Section 5.4 Self-Modifying Code on page 140
  – Section 5.5 Data Cache Controller on page 141
  – Section 7.1 Overview on page 167
  – Section 7.5 Interrupt Definitions on page 182
  – Section 8 Debug Facilities on page 217 (throughout the entire chapter)
  – Section 10.2 L2 Cache Features on page 257
  – Section 10.3 L1 Cache UTLB Snoop Interface on page 261
  – Appendix A Register Summary on page 263
• Changed the register bit numbers from [0:31] to [32:63] in the following sections:
  – Section 2.5.5.2 Count Register (CTR) on page 60
  – Section 2.5.5.3 Condition Register (CR) on page 60
  – Section 2.6.1 General Purpose Registers (GPRs) on page 63
  – Section 2.6.2 Fixed-Point Exception Register (XER) on page 64
  – Section 2.7.1 Special Purpose Registers General (USPRG0, SPRG0 - SPRG8) on page 67
  – Section 2.7.2 Processor Version Register (PVR) on page 68
  – Section 2.7.3 Processor Identification Register (PIR) on page 68
  – Section 3.2.1.2 Floating-Point Status and Control Register (FPSCR) on page 87
  – Section 6.5 Timer Control Register on page 163
  – Section 6.6 Timer Status Register on page 164
  – Section 7.4.1 Machine State Register (MSR) on page 173
  – Section 7.4.2 Save/Restore Register 0 (SRR0) on page 174
  – Section 7.4.4 Critical Save/Restore Register 0 (CSRR0) on page 175
  – Section 7.4.5 Critical Save/Restore Register 1 (CSRR1) on page 176
  – Section 7.4.6 Machine Check Save/Restore Register 0 (MCSRR0) on page 176
  – Appendix A Register Summary on page 263
• Added Appendix C Instruction Execution Performance for Code Optimization on page 279.

September 15, 2009
Initial release.

About this Document
This user’s manual describes the IBM® PowerPC® 476FP core. The core can be embedded into higher-function application-specific integrated circuit (ASIC) designs to provide a comprehensive control and computation device. The document provides information about the registers, facilities, initialization, and use of the processor core. It is intended for the use of programmers and engineers who are creating software to control the processor.

Related Publications
The following document can be a helpful reference when reading this user’s manual:
• Power ISA, version 2.05
The following documents are IBM confidential. For access to these documents, contact your IBM representative:
• PowerPC 476FP Embedded Processor Core Support Manual
• PowerPC 470S Synthesizable Core Support Manual
• PowerPC 476FP L2 Cache Core Databook
• DCR Arbiter Core Data Book
• Multiprocessor Interrupt Controller Data Book

Documentation Conventions
This section explains the conventions this document uses for numbers, bit fields, instructions, and signals.

Representation of Numbers
Numbers are generally shown in decimal format, unless designated as follows:
• Hexadecimal values are preceded by an “x” and enclosed in single quotation marks. For example: x‘0A00’.
• Binary values in sentences are shown in single quotation marks. For example: ‘1010’.
Note: A bit value that is immaterial, which is called a “don't care” bit, is represented by an “x.”

Bit Significance
In the PowerPC 476FP documentation, the smallest bit number represents the most significant bit of a field, and the largest bit number represents the least significant bit of a field.

Other Conventions
PowerPC 476FP processor instruction mnemonics are shown in lower-case, bold text. For example: tlbivax. I/O signal names are shown in upper case.

1. Overview
The PowerPC 476FP embedded processor core is a 4-issue, 6-pipeline (the floating-point [FP] L-pipe and integer L-pipe are shared), superscalar, 32-bit reduced instruction set computer (RISC) processor. The core supports the Power Instruction Set Architecture (ISA) Version 2.05. The architectural flexibility of this core enhances IBM application-specific integrated circuit (ASIC) solutions and applications.
The core also supports memory coherency, which broadens ASIC solutions into multiprocessing system environments and increases the core's scalability for emerging wired communications, storage, and pervasive computing applications. Figure 1-1 shows the overall organization of the processor core.

Figure 1-1. PowerPC 476FP Embedded Processor Core Block Diagram
Abbreviations used in the figure: DCR, Device Control Register; DISSQ, decode issue queue; DTLB, data translation lookaside buffer; D-Data, data-cache data; FP, floating-point; GPR, general purpose registers; I-Data, instruction-cache data; ITLB, instruction translation lookaside buffer; JTAG, Joint Test Action Group; KB, 1024 bytes; L2, level 2 cache; MAC, multiply and accumulate; PGPR, pre-GPR buffers.

1.1 General Features
The PowerPC 476FP processor core provides the following features:
• Four-issue architecture (decode and issue [DISS] decode complexity)
• Five pipelines and a separate floating-point (FP) arithmetic pipeline:
  – Branch pipeline
  – L pipeline (for load and store operations)
  – J pipeline (for simple arithmetic and logical operations)
  – Instruction pipeline (simple and complex instruction pipeline, miscellaneous instruction pipeline)
  – Multiplication and division pipeline
  – FP execution pipeline
• Two-cycle pipelined cache accesses
• Real-address tagging for both the instruction cache and the data cache
• Indexing of the instruction cache with the virtual address
• Indexing of the data cache with the real address
• Snoopable instruction and data caches
• 1024-entry unified translation lookaside buffer (UTLB)
• Pre-general purpose register (PGPR) temporary buffers that capture and hold results until commitment time, when the results are transferred to the general purpose registers (GPRs)
• Early delivery of instructions to the floating-point unit (FPU), which is possible because all instructions are predecoded

1.2 Power Control Features
The following design features minimize the operating power of the PowerPC 476FP processor core:
• All latches are clock gated so that idle functions do not waste power.
• All nonexecuting and idle functions are disabled.
• Static random access memory (SRAM) is partitioned so that only the required portion of the SRAM is enabled or selected.
• Doze and idle sleep modes are available.
• The central logic and the floating-point unit have separate clock enables.

1.2.1 Power Control Modes
The power control modes for the PowerPC 476FP core are CPU sleep mode, CPU doze mode, and CPU cold mode.
CPU sleep mode has the following characteristics:
• The CPU clock is turned off by deasserting the clock enable signal, but the timer clock still runs to maintain the time base.
• The CPU can be awakened by re-enabling its clock in response to an interrupt, such as an external interrupt, decrementer (DEC) interrupt, watchdog timer interrupt, or fixed-interval timer (FIT) interrupt, if the application has implemented those mechanisms.
• Sleep mode can be entered and exited dynamically, at arbitrary times.
• Because the processor does not maintain cache or memory management unit (MMU) coherency while in sleep mode, the following steps must be taken:
  – The L2 cache must be in a state of not having the processor or a special DCR being set.
  – The processor caches must be invalidated by using the data cache invalidate (dci) and instruction cache invalidate (ici) instructions at the beginning of the awakening process.
  – The processor MMU must be invalidated by using the translation lookaside buffer write entry (tlbwe) instruction, except for the entry (or entries) required to handle the exception or interrupt.
CPU doze mode has the following characteristics:
• The processor is in either wait state or halt state when doze mode is used.
• The CPU clock continues to run, but no instructions are executed.
• Doze mode allows the processor to process exceptions and interrupts; however, when the return from interrupt (rfi) instruction is issued, the processor goes back into doze mode unless the interrupt handler has cleared SRR1[WE].
• The processor maintains cache and MMU coherency.
In CPU cold mode, the PowerPC 476FP core is powered off.
Table 1-1 lists the power control modes of the PowerPC 476FP core.

Table 1-1. PowerPC 476FP Power Savings Modes
Mode | Core Clock | Core Power | Core State | Definition | Operation Mode | Effect
Sleep | Off | On | Yes | Low power; timers on | Off | Clock gated; dynamically awakened or put to sleep
Doze | On | On | Yes | Lower power; timer on; interrupt serviced | Standby | Timer running, interrupt serviced
Cold | Off | Off | No | No power | Off | Power off, lose state

Table 1-2 defines how frequency switching affects the power control modes of the PowerPC 476FP core.

Table 1-2. PowerPC 476FP Frequency Switching
Mode | Core Clock | Core Power | Core State | Definition | Operation Mode | Effect
Shift | On | On | Yes | Frequency switch | Running | Allowed scale: CPU, L2, PLB6, and DCR clock ratios remain constant; glitchless switching

Table 1-3 provides examples of frequency switching.

Table 1-3. Frequency Switching Examples
Step | CPU Clock | L2 Clock | PLB6 Clock
Initial | 1600 MHz | 800 MHz | 800 MHz
Supervisor write (SW) 1 | 800 MHz | 400 MHz | 400 MHz
SW 2 | 533+ MHz | 266.5 MHz | 266.5 MHz
SW 3 | 400 MHz | 200 MHz | 200 MHz

1.2.2 Power Control Procedures
Power control procedures consist of putting the PowerPC 476FP core into sleep mode or doze mode and waking up the processor from those modes.

1.2.2.1 CPU Sleep Mode
To put the PowerPC 476FP core into sleep mode, perform the following steps.
Note: Ensure the core is not processing storage instructions.
1. Issue the system call (sc) instruction to put the processor in privileged mode.
2. Ensure the PowerPC 476FP core is not processing storage instructions, then issue the instruction synchronize (isync) instruction followed by the memory synchronize (msync) instruction.
3. Set up the L2 cache for sleep mode by setting L2SLEEPREQ[31] = ‘1’, then issue the isync instruction and the move to Machine State Register (mtmsr) instruction to set Machine State Register (MSR)[WE] = ‘1’. This puts the processor in wait mode (similar to doze mode).
4. The processor asserts the core sleep request signal, the clock and power management (CPM) interface deasserts the CPU clock enable signal, and the processor goes into sleep mode.

1.2.2.2 CPU Doze Mode
To put the processor in CPU doze mode, perform the following steps:
1. Set MSR[WE] = ‘1’, either by calling an executive routine that updates the MSR with the mtmsr instruction or by setting Save/Restore Register 1 (SRR1)[WE] = ‘1’ in an interrupt handler.
When in doze mode, the processor stops fetching and executing instructions and goes into the stop state. Any exception or interrupt wakes up the processor.
2. To keep the processor awake after the interrupt is handled, clear SRR1[WE] to ‘0’ in the interrupt handler. Otherwise, the rfi instruction restores MSR[WE] = ‘1’ from SRR1, and the processor returns to doze mode.

1.2.2.3 Waking up the Processor
To wake up the processor, perform the following steps:
1. Generate one of the following interrupts: external, DEC, FIT, or watchdog timer.
2. Enable the CPU clock by asserting the clock enable (CPMC476CLKEN) signal. The processor then enters the interrupt handler.
3. Set SRR1[WE] = ‘0’.
4. Invalidate MMU entries, except for entries required to handle interrupts and exceptions.
5. Invalidate the L1 instruction cache (I-cache) by issuing the ici instruction.
6. Invalidate the L1 data cache (D-cache) by issuing the dci instruction (note the CT field of the instruction).
7. Set up the L2 cache to wake up by setting L2 Sleep Request Register (L2SLEEPREQ)[31] = ‘0’ (see the PowerPC 476FP L2 Cache Core Databook), then issue the isync instruction.
8. Process the interrupt, and issue the rfi instruction. The processor is now awake.
Consult IBM PowerPC support for further details about the implementation and coding of sleep mode, doze mode, or waking up the processor.

1.3 Implemented Instruction Set
All Power ISA Version 2.05 instructions in Book I, Book II, and Book III-E are implemented in the PowerPC 476FP processor core. The following categories of instructions are supported:
• Base
• Embedded
• Embedded cache debug
• Embedded cache initialization
• Embedded little-endian
• Embedded cache locking
• Memory coherence
See Appendix B Instruction Summary on page 271 for a list of the implemented instructions.
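
The doze and wake-up procedures in Sections 1.2.2.2 and 1.2.2.3 rest on one handoff: interrupt entry saves the MSR in SRR1, and rfi copies SRR1 back into the MSR, so clearing SRR1[WE] in the handler is what keeps the core awake. The C sketch below models only that handoff on a host machine; it is not code that runs on the core. It assumes that MSR[WE] is bit 45 in the 32:63 numbering used by this manual (as on other Book E cores; check Section 7.4.1), and the struct and function names are hypothetical. The helper ppc_bit32 also shows how the manual's bit numbering, in which the smallest bit number is the most significant, maps onto a C mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Power ISA numbers the bits of a 32-bit register 32 (most significant)
 * through 63 (least significant). Convert such a bit number to a mask. */
static inline uint32_t ppc_bit32(unsigned n)
{
    assert(n >= 32 && n <= 63);
    return UINT32_C(1) << (63 - n);
}

/* Assumption: MSR[WE] (wait state enable) is bit 45, as on other Book E cores. */
#define MSR_WE ppc_bit32(45)

/* Hypothetical host-side model of the registers involved in doze mode. */
struct cpu_model {
    uint32_t msr;   /* Machine State Register */
    uint32_t srr0;  /* Save/Restore Register 0: return address */
    uint32_t srr1;  /* Save/Restore Register 1: saved MSR */
    uint32_t pc;    /* current instruction address */
};

/* The core dozes while MSR[WE] = '1'. */
static bool cpu_dozing(const struct cpu_model *c)
{
    return (c->msr & MSR_WE) != 0;
}

/* Interrupt entry: save the PC and MSR, then clear MSR[WE] so the handler
 * runs. Simplification: the other MSR bits that hardware also changes on
 * interrupt entry are not modeled. */
static void take_interrupt(struct cpu_model *c, uint32_t vector)
{
    c->srr0 = c->pc;
    c->srr1 = c->msr;
    c->msr &= ~MSR_WE;
    c->pc = vector;
}

/* Wake-up step 3: clear WE in the saved MSR image, not in the live MSR. */
static void handler_stay_awake(struct cpu_model *c)
{
    c->srr1 &= ~MSR_WE;
}

/* rfi: MSR <- SRR1 and PC <- SRR0. */
static void rfi(struct cpu_model *c)
{
    c->msr = c->srr1;
    c->pc = c->srr0;
}
```

In this model, an interrupt followed directly by rfi puts the core back into doze, while calling handler_stay_awake before rfi leaves it running, which mirrors step 2 of Section 1.2.2.2.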

1.4 Test and Debug Facilities
Like the previous PowerPC 4xx processor cores, the PowerPC 476FP processor core provides RISCWatch and Joint Test Action Group (JTAG) interfaces that enable the following functions:
• Reset the processor core
• Stop, halt, and start the processor core
• Perform debug operations
• Trace the status and operation of the processor core and other cores and devices in the ASIC
Additional debug and status-monitoring capabilities facilitate debugging and monitoring the processor cores and other cores in a multiprocessor system-on-a-chip (SoC) environment. The following features defined by the Power.org Common Debug Interface Technical Committee are implemented:
• JTAG-controlled multiprocessor Debug Status Register (DBSR) monitor capability
• JTAG stop and run controls
These capabilities provide the means to observe DBSR bits individually under JTAG control. The processor can also be started and stopped through JTAG control. See Section 8 Debug Facilities on page 217 for more information about the use of the JTAG features.

1.5 Floating-Point Unit Overview
The FPU is a pipelined, double-precision math computation unit that is attached to the processor core. The FPU conforms to the ANSI/IEEE Standard 754-1985, IEEE Standard for Binary Floating-Point Arithmetic.
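
On PowerPC FPUs, the floating-point registers hold values in double-precision format, and single-precision results are rounded to a 24-bit significand. The host-side C sketch below illustrates that rounding with ordinary IEEE 754 conversions. The helper names are hypothetical, and the analogy with the frsp (Floating Round to Single-Precision) instruction assumes the default round-to-nearest mode and a value within the single-precision range.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

/* Round a double-precision value to the single-precision format and widen
 * it back, similar in effect to frsp under round-to-nearest. C requires the
 * cast to float to discard any extra precision. x is assumed to lie within
 * the single-precision range. */
static double round_to_single(double x)
{
    return (double)(float)x;
}

/* True if x survives the trip through single precision unchanged. */
static bool exact_in_single(double x)
{
    return round_to_single(x) == x;
}
```

For example, 1.5 is exact in both formats, while 0.1 changes when rounded to single precision because its binary expansion does not terminate.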
The following key design features are included in the PowerPC 476FP FPU:
• Compliance with the ANSI/IEEE 754-1985 floating-point standard:
  – Single-precision floating-point standard
  – Double-precision floating-point standard
• PowerPC floating-point instruction set
• Compliance with Book E: Enhanced PowerPC Architecture
• Superscalar operation with independent floating-point load-and-store and execution units
• Six-stage super-pipelined floating-point arithmetic execution:
  – Extended division stages
  – Extended operation stages for denormalized operation
See Section 3 Floating-Point Unit Programming Model on page 85 for more information.

1.6 Instruction Cache Overview
The instruction cache unit (ICU) is divided into several subunits: the instruction cache array, the instruction cache control unit, the instruction-side translation lookaside buffer (ITLB), the branch history table, and the instruction fetch unit.
The instruction cache array consists of standard SRAMs for instruction data and tag information, arranged as a 4-way, set-associative cache. The instruction data bus that enters the core is either 128 or 256 bits wide and is loaded into an 8-word instruction-line fill buffer. The replacement method is a 6-bit least recently used (LRU) algorithm, with provisions for cache locking or partitioning.
The ITLB, instruction cache control unit, and instruction fetch unit are completely synthesizable logic blocks. They contain both data path and control logic.
The ITLB is an 8-entry, fully associative array. Its main purpose is to enhance performance and reduce TLB contention between instruction accesses and load-and-store operations. Each ITLB entry contains the translation information for a page. The processor uses this information to translate the address of instruction accesses when MSR[IS] = ‘1’.
The branch history module improves branch prediction by tracking the most likely outcome of recently taken program branches.
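
The manual does not spell out how the 6-bit LRU state is encoded. One standard way to track exact LRU order for four ways in six bits is a pairwise age matrix: one bit for each of the C(4,2) = 6 pairs of ways. The C sketch below uses that encoding, with a lock mask standing in for way locking; the bit assignment and function names are assumptions for illustration, not the 476FP's actual implementation.

```c
#include <assert.h>
#include <stdint.h>

#define NWAYS 4

/* One bit per unordered pair of ways: C(4,2) = 6 bits in total. */
static unsigned pair_bit(unsigned i, unsigned j)
{
    static const unsigned idx[NWAYS][NWAYS] = {
        { 0, 0, 1, 2 },   /* pairs (0,1)=0 (0,2)=1 (0,3)=2 */
        { 0, 0, 3, 4 },   /* pairs (1,2)=3 (1,3)=4         */
        { 1, 3, 0, 5 },   /* pair  (2,3)=5                 */
        { 2, 4, 5, 0 },
    };
    assert(i < NWAYS && j < NWAYS && i != j);
    return idx[i][j];
}

/* Convention: a set bit means the lower-numbered way of the pair is older. */
static int is_older(uint8_t age, unsigned a, unsigned b)
{
    unsigned bit = (age >> pair_bit(a, b)) & 1u;
    return a < b ? (int)bit : !bit;
}

/* Record an access to way w: w becomes newer than every other way. */
static uint8_t lru_touch(uint8_t age, unsigned w)
{
    for (unsigned o = 0; o < NWAYS; o++) {
        if (o == w)
            continue;
        uint8_t m = (uint8_t)(1u << pair_bit(w, o));
        if (o < w)
            age |= m;              /* lower-numbered way o is now older */
        else
            age &= (uint8_t)~m;    /* lower-numbered way w is now newer */
    }
    return age;
}

/* Pick the least recently used way that is not locked (bit n of lock_mask
 * set means way n is locked). Returns -1 if every way is locked. */
static int lru_victim(uint8_t age, unsigned lock_mask)
{
    for (unsigned c = 0; c < NWAYS; c++) {
        if (lock_mask & (1u << c))
            continue;
        int oldest = 1;
        for (unsigned o = 0; o < NWAYS && oldest; o++) {
            if (o != c && !(lock_mask & (1u << o)) && !is_older(age, c, o))
                oldest = 0;
        }
        if (oldest)
            return (int)c;
    }
    return -1;
}
```

Starting from age = 0 gives a consistent order (way 0 newest, way 3 oldest), and every lru_touch call preserves that consistency, so lru_victim always finds exactly one oldest unlocked way unless all ways are locked.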
See Section 5 Instruction and Data Caches on page 133 for more information about the instruction cache.
1.7 Data Cache Unit Overview
The data cache unit (DCU) consists primarily of three subunits: the data cache arrays, data cache control, and the data TLB (DTLB).
The data cache array subunit contains three arrays: the LRU, tag, and data arrays. The tag and data arrays are standard SRAMs; the LRU array is a smaller, dual-port register array. Both the tag and data arrays are 4-way set associative and have a pipelined, 2-cycle access. The DCU uses an LRU replacement algorithm that combines a 6-bit age vector with way locking to determine the best candidate for replacement. The DCU can receive 256 bits of read data from the bus simultaneously and can send up to 128 bits of write data. The data cache is nonblocking. Cache coherency is supported through the level 2 (L2) cache interface by using write-through mode.
The data cache control subunit includes all of the DCU pipeline controls, cache arbitration, the Special Purpose Registers (SPRs), and the snoop pipeline. It drives most of the data path flow and operation.
The DTLB is an 8-entry, fully associative cache that uses the effective address to quickly calculate the real address. It is accessed in parallel with the three data cache arrays. If a DTLB miss occurs, a request is made to the unified TLB (UTLB) to calculate the real address.
See Section 5 Instruction and Data Caches on page 133 for more information.
1.8 Memory Management Unit Overview
The memory management unit (MMU) provides cache control, access protection, and address translation. The MMU contains the UTLB, control logic, and registers that support the UTLB. The MMU interfaces with the execution unit (EU), the instruction unit (IU), the ICU, the DCU, and the TLB snoop interface.
The EU interface performs TLB instructions such as translation lookaside buffer read entry (tlbre), translation lookaside buffer write entry (tlbwe), translation lookaside buffer search indexed (tlbsx), and translation lookaside buffer invalidate virtual address indexed (tlbivax). The IU interface provides the translation space (TS) and data space (DS) selector bits for a lookup request from the DCU, ICU, or TLB snoop.
The ICU generates a lookup request to the UTLB on an ITLB miss; similarly, the DCU generates a lookup request to the UTLB on a DTLB miss. The MMU arbitrates requests from the ICU, DCU, EU, and snoop interface, and provides the data for each request.
Software manages the MMU, but hardware-assisting logic is provided for replacing entries. Software writes entries into the UTLB so that they can be located by using the hash function described in Section 4.3.2 UTLB Index Address Hash on page 106. Freescale-style MMU enhanced operation is not supported.
See Section 4 Memory Management Unit on page 103 for more information.
1.9 Timers
The PowerPC 476FP processor core contains a time base and three timers: a decrementer (DEC), a fixed-interval timer (FIT), and a watchdog timer.
The time base is a 64-bit counter that is incremented either at the processor core clock rate or at a rate controlled by a separate asynchronous timer clock supplied to the core. No interrupt is generated when the time base wraps back to zero.
The DEC is a 32-bit register that is decremented at the rate at which the time base is incremented. Software loads the DEC register with a value to create the required interval. When the register is decremented to zero, a status bit is set and an exception is generated that can notify software. Optionally, the DEC can be programmed to reload automatically with the value contained in the Decrementer Auto-Reload Register (DECAR), after which the DEC resumes decrementing.
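The count-to-zero and auto-reload behavior of the DEC can be illustrated with a small behavioral sketch. It is not a model of the actual hardware: it assumes one call to dec_tick() per time base increment, and the type dec_model and its fields (including the auto_reload flag, which stands in for whatever control bit enables reloading from DECAR) are hypothetical names used only for illustration. Exact reload timing relative to the exception is not modeled; Section 6 Timer Facilities gives the precise behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified decrementer model (not cycle-accurate). */
typedef struct {
    uint32_t dec;         /* Decrementer (DEC) */
    uint32_t decar;       /* Decrementer Auto-Reload Register (DECAR) */
    bool auto_reload;     /* hypothetical flag: reload DEC from DECAR at zero */
    bool status;          /* status bit set when DEC reaches zero */
    unsigned exceptions;  /* count of decrementer exceptions raised */
} dec_model;

/* Advance the model by one time base increment. */
static void dec_tick(dec_model *m)
{
    if (m->dec == 0)
        return;                  /* without auto-reload, DEC stays at zero */
    m->dec--;
    if (m->dec == 0) {
        m->status = true;        /* status bit is set ...                  */
        m->exceptions++;         /* ... and an exception notifies software */
        if (m->auto_reload)
            m->dec = m->decar;   /* resume decrementing from DECAR */
    }
}
```

With DECAR = 3 and auto-reload enabled, the sketch raises an exception on every third tick, which is how a periodic interval is obtained without software rewriting DEC after each expiry.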
The FIT can generate periodic interrupts based on a transition of one user-selected bit out of four time base bits. When the selected bit changes from ‘0’ to ‘1’, a status bit is set and an exception is generated that can notify software.
The watchdog timer also generates a periodic interrupt based on the transition of a selected time base bit. The user can choose one of four intervals for the watchdog timer. On the first ‘0’-to-‘1’ transition of the selected time base bit, the watchdog timer generates an exception that can notify software. The watchdog timer can also be configured to initiate a hardware reset if a second transition of the selected time base bit occurs before the first watchdog exception is serviced. This capability provides an extra measure of recoverability from potential system lockups.
The timer functions of the PowerPC 476FP processor core are described more fully in Section 6 Timer Facilities on page 157.
2. Programming Model
The programming model of the PowerPC 476FP core describes the following features and operations of the processor from a programmer’s perspective:
• Storage Addressing (including data types and byte ordering), starting on page 33
• Registers, starting on page 40
• Instruction Classes, starting on page 47
• Instruction Set, starting on page 49
• Branch Processing, starting on page 56
• Integer Processing, starting on page 63
• Processor Control, starting on page 66
• User and Supervisor Modes, starting on page 75
• Speculative Accesses, starting on page 76
• Synchronization, starting on page 76
2.1 Storage Addressing
As a 32-bit implementation of the Power Instruction Set Architecture (ISA) Version 2.05, the PowerPC 476FP core implements a uniform 32-bit effective address (EA) space.
Effective addresses are expanded into virtual addresses and are then translated by the memory management unit to 42-bit (4 TB) real addresses (see Section 4 Memory Management Unit on page 103 for more information about the translation process). The organization of the real address space into a physical address space is system-dependent and is described in the user’s manuals for chip-level products that incorporate a PowerPC 476FP core.
The PowerPC 476FP core generates an effective address whenever it executes a storage access, branch, cache management, or translation lookaside buffer (TLB) management instruction, or when it fetches the next sequential instruction.
2.1.1 Storage Operands
Bytes in storage are numbered consecutively starting with 0. Each number is the address of the corresponding byte.
Data storage operands accessed by the integer load/store instructions can be bytes (8-bit), halfwords (16-bit), words (32-bit), or doublewords (64-bit); or, for load/store multiple and string instructions, a sequence of words or bytes. Data storage operands accessed by floating-point (FP) load or store instructions can be bytes, halfwords, words, doublewords, or quadwords. The address of a storage operand is the address of its first (that is, lowest-numbered) byte. Byte ordering can be either big-endian or little-endian, as controlled by the endian storage attribute (see Section 2.1.3 Byte Ordering on page 36; also see Section 4.5.6 Endian (E) on page 118 for more information about the endian storage attribute).
Operand length is implicit for each scalar storage access instruction type (that is, each storage access instruction type other than the load/store multiple and string instructions). The operand of such a scalar storage access instruction has a natural alignment boundary equal to the operand length. Therefore, the natural address of an operand is an integral multiple of the operand length.
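Because the scalar operand lengths (1, 2, 4, and 8 bytes) are powers of two, the natural-alignment rule reduces to a test of the low-order effective address bits, which is what the ‘x’ patterns in Table 2-1 express. The following sketch is illustrative only; both function names are hypothetical. The second helper expresses the 8-byte-boundary crossing condition that Table 2-2 uses to describe when an access is split.

```c
#include <stdbool.h>
#include <stdint.h>

/* An operand is naturally aligned when its EA is an integral multiple of
 * its length. len must be a power of two (1, 2, 4, or 8). */
static bool is_naturally_aligned(uint32_t ea, uint32_t len)
{
    return (ea & (len - 1u)) == 0;
}

/* True when the bytes ea .. ea+len-1 fall in two different 8-byte blocks.
 * Operands that wrap past the maximum effective address are not
 * considered here (see Section 2.1.2). */
static bool crosses_8byte_boundary(uint32_t ea, uint32_t len)
{
    return (ea >> 3) != ((ea + len - 1u) >> 3);
}
```

For example, a word at EA x‘1002’ is unaligned, and a word at x‘1006’ both is unaligned and crosses an 8-byte boundary, whereas a word at x‘1004’ is aligned and stays within one 8-byte block.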
A storage operand is said to be aligned if it is aligned at its natural boundary; otherwise, it is said to be unaligned. Table 2-1 on page 34 lists the storage access instructions for the data storage operands.
Table 2-1. Data Operand Definitions
Storage Access Instruction Type | Operand Length | Addr[28:31] if Aligned
Byte (or String)                | 1 byte         | ‘xxxx’
Halfword                        | 2 bytes        | ‘xxx0’
Word (or Multiple)              | 4 bytes        | ‘xx00’
Doubleword (FP only)            | 8 bytes        | ‘x000’
Note: An x in an address bit position indicates that the bit can be ‘0’ or ‘1’ independently of the state of other bits in the address.
The alignment of the operand effective address of some storage access instructions can affect performance and, in some cases, can cause an alignment exception. For such storage access instructions, the best performance is obtained when the storage operands are aligned. Table 2-2 summarizes the effects of alignment on the storage access instruction types for which such effects exist. Instruction types not shown in the table have no alignment effects.
Table 2-2. Alignment Effects for Storage Access Instructions
Storage Access Instruction Type       | Alignment Effects
Integer load/store halfword           | Broken into two byte accesses if it crosses an 8-byte boundary; otherwise no effect.
Integer load/store word               | Broken into two accesses if it crosses an 8-byte boundary; otherwise no effect.
Integer load/store multiple or string | Broken into a series of 4-byte accesses until the last byte is accessed or an 8-byte boundary is reached, whichever occurs first. If bytes remain past an 8-byte boundary, accessing resumes 4 bytes at a time until the last byte is accessed or the next 8-byte boundary is reached, whichever occurs first; this repeats as needed.
FP load/store word                    | Alignment exception if it crosses a word boundary; otherwise no effect (see note).
FP load/store doubleword              | Alignment exception if it crosses a doubleword boundary; otherwise no effect (see note).
Note: The floating-point unit can specify that the EA for a particular FP load or store instruction must be aligned at the operand-size boundary or, alternatively, at a word boundary. If the FPU indicates this requirement and the calculated EA fails to meet it, the PowerPC 476FP core generates an alignment exception. Alternatively, the FPU can specify that the EA for a particular FP load or store instruction be forced into alignment by ignoring the appropriate number of low-order EA bits and processing the FP load or store as if those bits were ‘0’. Byte, halfword, word, doubleword, and quadword FP load or store instructions ignore 0, 1, 2, 3, and 4 low-order EA bits, respectively.
Cache management instructions access cache block operands; for the PowerPC 476FP core, the cache block size is 32 bytes. However, the effective addresses calculated by cache management instructions are not required to be aligned on cache block boundaries. Instead, the architecture specifies that the associated low-order effective address bits (bits 27:31 for the PowerPC 476FP core) are ignored during the execution of these instructions. Similarly, the TLB management instructions access page operands, and, as determined by the page size, the associated low-order effective address bits are ignored during the execution of these instructions.
Instruction storage operands, however, are always 4 bytes long, and the effective addresses calculated by branch instructions are therefore always word-aligned.
2.1.2 Effective Address Calculation
For a storage access instruction, if the address of the last byte of the operand (the effective address plus the operand length, minus 1) exceeds the maximum effective address of 2³² – 1 (that is, the storage operand itself crosses the maximum address boundary), the result of the operation is undefined, as specified by the architecture.
The PowerPC 476FP core performs the operation as if the storage operand wrapped around from the maximum effective address to effective address 0. However, software should not depend on this behavior, so that it can be ported to other implementations that do not handle this scenario in the same way. Accordingly, software should ensure that no data storage operand crosses the maximum address boundary.
Note: Because instructions are words and their effective addresses are always implicitly on word boundaries, an instruction storage operand cannot cross any word boundary, including the maximum address boundary.
Effective address arithmetic, which calculates the starting address for storage operands, wraps around from the maximum address to address 0 for all effective address computations except next sequential instruction fetching. See Section 2.1.2.2 for more information about next sequential instruction fetching at the maximum address boundary.
2.1.2.1 Data Storage Addressing Modes
The PowerPC 476FP core supports two data storage addressing modes:
• Base + displacement (D-mode) addressing mode: The 16-bit D field is sign-extended and added to the contents of the General Purpose Register (GPR) designated by RA, or to 0 if RA = ‘0’; the low-order 32 bits of the sum form the effective address of the data storage operand.
• Base + index (X-mode) addressing mode: The contents of the GPR designated by RB (or the value 0 for load string word immediate [lswi] and store string word immediate [stswi]) are added to the contents of the GPR designated by RA, or to 0 if RA = ‘0’; the low-order 32 bits of the sum form the effective address of the data storage operand.
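The two data addressing modes can be sketched in C as follows, assuming a hypothetical array gpr[] that holds the 32 GPR values; the function names are also hypothetical. Unsigned 32-bit arithmetic in C wraps modulo 2³², which matches the rule that only the low-order 32 bits of the sum form the effective address. For X-mode, the caller passes the index value directly: the contents of GPR[RB], or 0 for lswi and stswi.

```c
#include <stdint.h>

/* D-mode: EA = (RA == 0 ? 0 : GPR[RA]) + sign-extended 16-bit D field. */
static uint32_t ea_d_mode(const uint32_t gpr[32], unsigned ra, uint16_t d)
{
    uint32_t base = (ra == 0) ? 0u : gpr[ra];  /* RA = 0 selects the value 0 */
    uint32_t disp = d;
    if (d & 0x8000u)
        disp |= 0xFFFF0000u;                   /* sign-extend the D field */
    return base + disp;                        /* low-order 32 bits of the sum */
}

/* X-mode: EA = (RA == 0 ? 0 : GPR[RA]) + index, where index is GPR[RB]
 * (or 0 for lswi/stswi, supplied by the caller). */
static uint32_t ea_x_mode(const uint32_t gpr[32], unsigned ra, uint32_t index)
{
    uint32_t base = (ra == 0) ? 0u : gpr[ra];
    return base + index;                       /* wraps modulo 2^32 */
}
```

For example, with GPR[3] = x‘0000 1000’ and D = x‘FFF8’ (that is, –8), the D-mode EA is x‘0000 0FF8’; with a base of x‘FFFF FFFC’ and D = 8, the sum wraps to x‘0000 0004’.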
2.1.2.2 Instruction Storage Addressing Modes
The PowerPC 476FP core supports four instruction storage addressing modes:
• I-form branch instructions (unconditional): The 24-bit LI field is concatenated on the right with ‘00’, sign-extended, and then added either to the address of the branch instruction if the absolute address (AA) instruction field equals ‘0’, or to 0 if AA = ‘1’; the low-order 32 bits of the sum form the effective address of the next instruction.
• Taken B-form branch instructions: The 14-bit branch displacement (BD) field is concatenated on the right with ‘00’, sign-extended, and then added either to the address of the branch instruction if AA = ‘0’, or to 0 if AA = ‘1’; the low-order 32 bits of the sum form the effective address of the next instruction.
• Taken XL-form branch instructions: Bits 32:61 of the Link Register (LR) or of the Count Register (CTR) are concatenated on the right with ‘00’ to form the 32-bit effective address of the next instruction.
• Next sequential instruction fetching (including nontaken branch instructions): The value 4 is added to the address of the current instruction to form the 32-bit effective address of the next instruction. If the address of the current instruction is x‘FFFF FFFC’, the PowerPC 476FP core wraps the next sequential instruction address back to address 0. This behavior is not required by the architecture, which specifies that the next sequential instruction address is undefined under these circumstances. Therefore, software should not depend on this behavior, so that it can be ported to other implementations that do not handle this scenario in the same way. Accordingly, if software must execute across this maximum address boundary and wrap back to address 0, it should place an unconditional branch with a displacement of 4 at the boundary.
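The branch target rules above can be sketched in C as follows. The function names are hypothetical; li and bd are the raw 24-bit and 14-bit instruction fields, and cia stands for the address of the branch instruction. The next_sequential() helper reproduces the PowerPC 476FP wrap at x‘FFFF FFFC’, which, as stated above, the architecture leaves undefined, so portable code should not rely on it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low 'bits' bits of v to 32 bits (two's complement). */
static uint32_t sign_extend(uint32_t v, unsigned bits)
{
    uint32_t sign = 1u << (bits - 1);
    v &= (1u << bits) - 1u;        /* keep only the field */
    return (v ^ sign) - sign;
}

/* I-form: EXTS(LI || 0b00) added to the branch address, or to 0 if AA = 1. */
static uint32_t target_i_form(uint32_t cia, uint32_t li, bool aa)
{
    return (aa ? 0u : cia) + sign_extend(li << 2, 26);
}

/* Taken B-form: EXTS(BD || 0b00) added to the branch address, or to 0 if AA = 1. */
static uint32_t target_b_form(uint32_t cia, uint32_t bd, bool aa)
{
    return (aa ? 0u : cia) + sign_extend(bd << 2, 16);
}

/* Taken XL-form: LR or CTR with its two low-order bits replaced by 0b00. */
static uint32_t target_xl_form(uint32_t lr_or_ctr)
{
    return lr_or_ctr & ~3u;
}

/* Next sequential fetch as the 476FP performs it (wraps to 0). */
static uint32_t next_sequential(uint32_t cia)
{
    return cia + 4u;
}
```

For example, an I-form branch at x‘1000’ with LI = x‘FF FFFF’ (–1) targets x‘0FFC’, and a B-form branch at x‘2000’ with BD = x‘3FFE’ (–2) targets x‘1FF8’.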
In addition to the four instruction storage addressing modes, the following behavior applies to branch instructions:
• Any branch instruction with the link bit (LK) equal to ‘1’: The value 4 is added to the address of the current instruction and the low-order 32 bits of the result are placed into the LR. As with next sequential instruction fetching, if the address of the branch instruction is x‘FFFF FFFC’, the result placed into the LR is architecturally undefined, although the PowerPC 476FP core again wraps the LR update value back to address 0. Again, software should not depend on this behavior, so that it can be ported to implementations that do not handle this scenario in the same way.
2.1.3 Byte Ordering
If scalars (individual data items and instructions) were indivisible, there would be no such concept as byte ordering. It is meaningless to consider the order of bits or groups of bits within the smallest addressable unit of storage, because nothing can be observed about such order. Only when scalars, which the programmer and processor regard as indivisible quantities, can comprise more than one addressable unit of storage does the question of order arise.
For a system in which the smallest addressable unit of storage is the 64-bit doubleword, there is no question of the ordering of bytes within doublewords. All transfers of individual scalars between registers and storage are of doublewords, and the address of the byte containing the high-order 8 bits of a scalar is no different from the address of a byte containing any other part of the scalar.
For the Book III-E Enhanced PowerPC Architecture, as for most current computer architectures, the smallest addressable unit of storage is the 8-bit byte. Many scalars are halfwords, words, or doublewords that consist of groups of bytes.
When a word-length scalar is moved from a register to storage, the scalar occupies four consecutive byte addresses. It thus becomes meaningful to specify the order of the byte addresses with respect to the value of the scalar: which byte contains the highest-order 8 bits of the scalar, which byte contains the next-highest-order 8 bits, and so on.
Given a scalar that contains multiple bytes, the choice of byte ordering is essentially arbitrary. There are 4! = 24 ways to specify the ordering of 4 bytes within a word, but only two of these orderings are sensible:
• The ordering that assigns the lowest address to the highest-order (left-most) 8 bits of the scalar, the next sequential address to the next-highest-order 8 bits, and so on. This ordering is called big-endian because the big end (most significant end) of the scalar, considered as a binary number, comes first in storage. IBM eServer™ pSeries® and IBM zSeries® are examples of computer architectures that use this byte ordering.
• The ordering that assigns the lowest address to the lowest-order (right-most) 8 bits of the scalar, the next sequential address to the next-lowest-order 8 bits, and so on. This ordering is called little-endian because the little end (least significant end) of the scalar, considered as a binary number, comes first in storage. The Intel® x86 is an example of a processor architecture that uses this byte ordering.
Power ISA supports both big-endian and little-endian byte ordering, for both instruction and data storage accesses. The byte ordering in use is controlled on a per-page basis by the endian (E) storage attribute, which is a field within the TLB entry for the page. The endian storage attribute is set to ‘0’ for a big-endian page and to ‘1’ for a little-endian page. See Section 4 Memory Management Unit on page 103 for more information about memory pages, the TLB, and storage attributes, including the endian storage attribute.
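To make the two orderings concrete, the following sketch stores a word in either order. It illustrates the definitions only, not how the core performs the transfer; the function name is hypothetical, and its little_endian flag is a stand-in for the page's E storage attribute (‘0’ = big-endian, ‘1’ = little-endian).

```c
#include <stdint.h>

/* Store a 32-bit scalar into four consecutive bytes (addresses 0..3). */
static void store_word(uint8_t mem[4], uint32_t value, int little_endian)
{
    for (int i = 0; i < 4; i++) {
        /* big-endian: address 0 receives the most significant byte;
         * little-endian: address 0 receives the least significant byte */
        int shift = little_endian ? 8 * i : 8 * (3 - i);
        mem[i] = (uint8_t)(value >> shift);
    }
}
```

Storing x‘1112_1314’ places 11 12 13 14 at addresses 0 through 3 on a big-endian page and 14 13 12 11 on a little-endian page.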
2.1.3.1 Structure Mapping Examples
The following C language structure, s, contains an assortment of scalars and a character string. The comments show the value assumed to be in each structure element; these values show how the bytes comprising each structure element are mapped into storage.

struct {
    int       a;     /* x‘1112_1314’ word */
    long long b;     /* x‘2122_2324_2526_2728’ doubleword */
    char     *c;     /* x‘3132_3334’ word */
    char      d[7];  /* 'A','B','C','D','E','F','G' array of bytes */
    short     e;     /* x‘5152’ halfword */
    int       f;     /* x‘6162_6364’ word */
} s;

C structure mapping rules permit the use of padding (skipped bytes) to align scalars on desirable boundaries. The Big-Endian Mapping and Little-Endian Mapping examples show structure mappings in which each scalar is aligned at its natural boundary. This alignment introduces padding of 4 bytes between a and b, 1 byte between d and e, and 2 bytes between e and f. The same amount of padding is present in both the big-endian and little-endian mappings.
Big-Endian Mapping
The big-endian mapping of structure s is shown in Table 2-3. Each row shows 8 consecutive bytes: the row label is the hexadecimal address of the first byte, and the column heading is the byte offset within the row. The contents of each byte, as defined in structure s, are shown as a hexadecimal number or, for the string elements, as a character.
Table 2-3. Big-Endian Mapping of Structure S (-- marks a padding byte)
Address  +0   +1   +2   +3   +4   +5   +6   +7
x‘00’    11   12   13   14   --   --   --   --
x‘08’    21   22   23   24   25   26   27   28
x‘10’    31   32   33   34   'A'  'B'  'C'  'D'
x‘18’    'E'  'F'  'G'  --   51   52   --   --
x‘20’    61   62   63   64   --   --   --   --
Little-Endian Mapping
Table 2-4 shows structure s mapped in little-endian format.
Table 2-4. Little-Endian Mapping of Structure S (-- marks a padding byte)
Address  +0   +1   +2   +3   +4   +5   +6   +7
x‘00’    14   13   12   11   --   --   --   --
x‘08’    28   27   26   25   24   23   22   21
x‘10’    34   33   32   31   'A'  'B'  'C'  'D'
x‘18’    'E'  'F'  'G'  --   52   51   --   --
x‘20’    64   63   62   61   --   --   --   --
2.1.3.2 Instruction Byte Ordering
Power ISA defines instructions as aligned words (4 bytes) in memory. As such, instructions in a big-endian program image are arranged with the most significant byte (MSB) of the instruction word at the lowest-numbered address. Consider the big-endian mapping of instruction p at address x‘00’, where, for example, p = add r7, r7, r4:
Address:  x‘00’  x‘01’  x‘02’  x‘03’
Byte:     MSB                  LSB
In a little-endian mapping, the same instruction is arranged with the least significant byte (LSB) of the instruction word at the lowest-numbered address:
Address:  x‘00’  x‘01’  x‘02’  x‘03’
Byte:     LSB                  MSB
By the definition of Power ISA bit numbering, the most significant byte of an instruction is the byte containing bits 0:7 of the instruction, and therefore the byte that contains the primary opcode field (bits 0:5). Because of this difference in byte orderings, the processor must perform whatever byte reversal is required (depending on the byte ordering in use) to deliver the opcode field correctly to the instruction decoder. In the PowerPC 476FP core, this reversal is performed between the memory interface and the instruction cache, according to the value of the endian storage attribute for each memory page, so that the bytes in the instruction cache are always correctly arranged for delivery directly to the instruction decoder.
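As a concrete illustration, the sketch below encodes the example instruction add r7, r7, r4 (an XO-form instruction with primary opcode 31 and extended opcode 266, with OE = Rc = 0) and shows which byte lands at each address under each ordering. The helper names are hypothetical and the sketch models only the byte placement, not the core's fetch path.

```c
#include <stdint.h>

/* Encode add RT,RA,RB using Power ISA bit numbering (bit 0 = MSB). */
static uint32_t encode_add(unsigned rt, unsigned ra, unsigned rb)
{
    return (31u << 26)           /* primary opcode, bits 0:5 */
         | ((rt & 31u) << 21)    /* RT, bits 6:10 */
         | ((ra & 31u) << 16)    /* RA, bits 11:15 */
         | ((rb & 31u) << 11)    /* RB, bits 16:20 */
         | (266u << 1);          /* XO = 266, bits 22:30; OE and Rc are 0 */
}

/* Byte of the instruction stored at address offset (0..3) of its word. */
static uint8_t insn_byte(uint32_t insn, unsigned offset, int little_endian)
{
    unsigned shift = little_endian ? 8u * offset : 8u * (3u - offset);
    return (uint8_t)(insn >> shift);
}

/* Primary opcode: the upper 6 bits of the most significant byte. */
static unsigned primary_opcode(uint32_t insn)
{
    return (insn >> 26) & 0x3Fu;
}
```

The resulting word is x‘7CE7 2214’, so a big-endian image holds 7C E7 22 14 at x‘00’ through x‘03’ and a little-endian image holds 14 22 E7 7C. In both cases the byte x‘7C’, whose upper 6 bits are the primary opcode 31, is the one that must reach the decoder first.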
If the endian storage attribute for a memory page is reprogrammed from one byte ordering to the other, the contents of the memory page must be reloaded with program and data structures in the appropriate byte ordering. Furthermore, whenever the contents of instruction memory change, the instruction cache must be made coherent with the updates by invalidating the instruction cache and refetching the updated memory contents with the new byte ordering.
2.1.3.3 Data Byte Ordering
Unlike instruction fetches, data accesses cannot be byte-reversed between memory and the data cache. Data byte ordering in memory depends on the data type (byte, halfword, word, and so on) of a specific data item. Only when a data item of a specific type is moved to or from an architected register, as directed by the execution of a particular storage access instruction, does it become known what kind of byte reversal might be required because of the byte ordering of the memory page containing the data item. Therefore, byte reversal during load or store accesses is performed between the data cache (or, for a data cache miss, memory) and the load target register or store source register, depending on the specific type of load or store instruction (byte, halfword, word, and so on).
Comparing the big-endian and little-endian mappings of structure s, as shown in Section 2.1.3.1 Structure Mapping Examples on page 37, the difference between the byte locations of any data item in the structure depends on the size of that data item. For example:
• The word a has its 4 bytes reversed within the word spanning addresses x‘00’ through x‘03’.
• The halfword e has its 2 bytes reversed within the halfword spanning addresses x‘1C’ through x‘1D’.
• The array of bytes d, where each data item is a byte, is not reversed when the big-endian and little-endian mappings are compared. For example, the character 'A' is located at address x‘14’ in both mappings.
The size of the data item being loaded or stored must be known before the processor can determine whether, and if so how, to reorder the bytes when moving them between a register and the data cache (or memory):
• For byte loads and stores, including strings, no reordering of bytes occurs, regardless of byte ordering.
• For halfword loads and stores, bytes are reversed within the halfword for one byte order with respect to the other.
• For word loads and stores (including load/store multiple), bytes are reversed within the word for one byte order with respect to the other.
This mechanism applies regardless of data alignment. When a multibyte data operand is loaded with a scalar load instruction, bytes are accessed from the data cache (or memory) starting with the byte at the calculated effective address and continuing with consecutively higher-numbered bytes until the required number of bytes has been retrieved. The bytes are then arranged so that the byte from the highest-numbered address (for big-endian storage regions) or from the lowest-numbered address (for little-endian storage regions) is placed into the least significant byte of the register. The rest of the register is filled in corresponding order with the remaining accessed bytes. An analogous procedure is followed for scalar store instructions.
For load/store multiple instructions, each group of 4 bytes is transferred between memory and the register according to the procedure for a scalar load word instruction.
For load/store string instructions, the most significant byte of the first register is transferred to or from memory at the starting (lowest-numbered) effective address, regardless of byte ordering.
Subsequent register bytes (from most significant to least significant, then into the next register starting with its most significant byte, and so on) are transferred to or from memory at sequentially higher-numbered addresses. This behavior for byte strings ensures that if two strings are loaded into registers and then compared, the first bytes of the strings are treated as most significant with respect to the comparison.
2.1.3.4 Byte-Reverse Instructions
Power ISA defines load/store byte-reverse instructions, which access storage specified as having one byte ordering in the same manner that a regular (that is, non-byte-reverse) load/store instruction accesses storage specified as having the opposite byte ordering. That is, a load/store byte-reverse instruction to a big-endian memory page transfers data between the data cache (or memory) and the register in the same manner that a normal load/store transfers data to or from a little-endian memory page. Similarly, a load/store byte-reverse instruction to a little-endian memory page transfers data between the data cache (or memory) and the register in the same manner that a normal load/store transfers data to or from a big-endian memory page.
The load/store byte-reverse instructions are useful when a particular memory page contains a combination of big-endian and little-endian data. In such an environment, the endian storage attribute for the memory page is set according to the predominant byte ordering for the page, and the normal load/store instructions are used to access data operands that use this predominant byte ordering. Conversely, the load/store byte-reverse instructions are used to access the data operands that use the other (less prevalent) byte ordering.
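The scalar load procedure of Section 2.1.3.3 and the byte-reverse equivalence described above can be sketched as follows. This models only the data movement, under the assumptions that size is 1, 2, or 4 bytes and that mem points at a byte image of storage; both function names are hypothetical and are not part of any real API.

```c
#include <stddef.h>
#include <stdint.h>

/* Read 'size' bytes starting at ea, then arrange them so that the byte from
 * the highest address (big-endian page) or the lowest address (little-endian
 * page) ends up in the least significant byte of the result. */
static uint32_t load_scalar(const uint8_t *mem, size_t ea, unsigned size,
                            int little_endian)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < size; i++) {
        /* big-endian: address order, MSB first;
         * little-endian: reverse address order, MSB first */
        unsigned idx = little_endian ? size - 1u - i : i;
        r = (r << 8) | mem[ea + idx];
    }
    return r;
}

/* A load byte-reverse behaves like a normal load from a page of the
 * opposite byte ordering. */
static uint32_t load_scalar_byte_reverse(const uint8_t *mem, size_t ea,
                                         unsigned size, int little_endian)
{
    return load_scalar(mem, ea, size, !little_endian);
}
```

Applied to the big-endian image of structure s in Table 2-3, a word load at x‘00’ returns x‘1112_1314’, a halfword load at x‘1C’ returns x‘5152’, a byte load at x‘14’ returns 'A', and a byte-reverse word load at x‘00’ returns x‘1413_1211’.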
Compilers typically cannot make general use of the load/store byte-reverse instructions; such instructions are ordinarily used only in special, hand-coded device drivers.
2.2 Registers
This section provides an overview of the register categories and types provided by the PowerPC 476FP core. Detailed descriptions of each register are provided within the sections covering the functions with which they are associated (for example, the cache control and cache debug registers are described in Section 5 Instruction and Data Caches on page 133). An alphabetic summary of all registers, including bit definitions, is provided in Appendix A Register Summary on page 263.
All registers in the PowerPC 476FP core are architected as 32 bits wide (bits 32:63; the higher-order bits 0:31 are ignored unless specified otherwise), although certain bits in some registers are reserved and thus not necessarily implemented. For all registers with fields marked as reserved, the reserved fields should be written as 0 and read as undefined. The recommended coding practice is to perform the initial write to a register with reserved fields set to 0 and to perform all subsequent writes to the register by using a read-modify-write strategy: read the register; use logical instructions to alter defined fields, leaving reserved fields unmodified; and write the register.
All Floating-Point Registers (FPRs) are 64 bits wide, numbered as bits 0:63. See the floating-point processor chapter in Power ISA Version 2.05, Book I, for more information.
All of the registers are grouped into categories according to the processor functions with which they are associated. In addition, each register is classified as being of a particular type, as characterized by the specific instructions that are used to read and write registers of that type.
Finally, most of the registers in the PowerPC 476FP core are defined by the Power ISA, although some registers are implementation-specific and unique to the PowerPC 476FP core.

Figure 2-1 on page 41 illustrates the PowerPC 476FP core registers contained in the user programming model, that is, the registers to which access is nonprivileged and that are available to both user and supervisor programs.

[Figure 2-1. User Programming Model Registers: integer processing, branch control, processor control, and timer groups comprising GPR0-GPR31, CR, CTR, LR, XER, SPRG4-SPRG7, USPRG0, TBL, and TBU.]

Figure 2-2 on page 42 illustrates the PowerPC 476FP core registers contained in the supervisor programming model, to which access is privileged and which are available to supervisor programs only. See Section 2.8 User and Supervisor Modes on page 75 for more information about privileged instructions, register access, and the user and supervisor programming models.

[Figure 2-2. Supervisor Programming Model Registers: processor control, timer, storage control, debug, interrupt processing, and cache debug register groups.]

Table 2-5 lists the PowerPC 476FP Special Purpose Registers (SPRs), their decimal and binary SPR numbers (SPRNs), their access, and a cross-reference to the section that describes them more fully. Unless otherwise indicated, all registers have read/write access.

Note: See Table A-1 on page 263 for the register categories, the registers that belong to each category, and their types.

Table 2-5. PowerPC 476FP SPRs

| SPR | SPR Name | Decimal SPRN | Binary SPRN | Privileged | Access | Page |
|---|---|---|---|---|---|---|
| CCR0 | Core Configuration Register 0 | 947 | ‘11101 10011’ | Yes | R/W | 69 |
| CCR1 | Core Configuration Register 1 | 888 | ‘11011 11000’ | Yes | R/W | 70 |
| CCR2 | Core Configuration Register 2 | 889 | ‘11011 11001’ | Yes | R/W | 73 |
| CTR | Count Register | 9 | ‘00000 01001’ | No | R/W | 60 |
| CSRR0 | Critical Save/Restore Register 0 | 58 | ‘00001 11010’ | Yes | R/W | 175 |
| CSRR1 | Critical Save/Restore Register 1 | 59 | ‘00001 11011’ | Yes | R/W | 176 |
| DAC1 | Data Address Compare 1 | 316 | ‘01001 11100’ | Yes | R/W | 266 |
| DAC2 | Data Address Compare 2 | 317 | ‘01001 11101’ | Yes | R/W | 266 |
| DCDBTRH | Data Cache Debug Tag Register High | 925 | ‘11100 11101’ | Yes | Read | 153 |
| DCDBTRL | Data Cache Debug Tag Register Low | 924 | ‘11100 11100’ | Yes | Read | 152 |
| DEAR | Data Exception Address Register | 61 | ‘00001 11101’ | Yes | R/W | 177 |
| DVC1 | Data Value Compare 1 | 318 | ‘01001 11110’ | Yes | R/W | 266 |
| DVC2 | Data Value Compare 2 | 319 | ‘01001 11111’ | Yes | R/W | 266 |
| DCESR | D-cache Exception Syndrome Register | 850 | ‘11010 10010’ | Yes | R/W | 268 |
| DCRIPR | DCR Immediate Prefix Register | 891 | ‘11011 11011’ | Yes | R/W | 74 |
| DBCR0 | Debug Control Register 0 | 308 | ‘01001 10100’ | Yes | R/W | 235 |
| DBCR1 | Debug Control Register 1 | 309 | ‘01001 10101’ | Yes | R/W | 236 |
| DBCR2 | Debug Control Register 2 | 310 | ‘01001 10110’ | Yes | R/W | 237 |
| DBDR | Debug Data Register | 1011 | ‘11111 10011’ | Yes | R/W | 266 |
| DBSR | Debug Status Register | 304 | ‘01001 10000’ | Yes | Read/Clear | 239 |
| | | 816 | ‘11001 10000’ | Yes | Write | |
| DEC | Decrementer | 22 | ‘00000 10110’ | Yes | R/W | 159 |
| DECAR | Decrementer Autoreload | 54 | ‘00001 10110’ | Yes | R/W | 159 |
| ESR | Exception Syndrome Register | 62 | ‘00001 11110’ | Yes | R/W | 179 |
| ICESR | I-cache Exception Syndrome Register | 851 | ‘11010 10011’ | Yes | R/W | 140 |
| IAC1 | Instruction Address Compare 1 | 312 | ‘01001 11000’ | Yes | R/W | 240 |
| IAC2 | Instruction Address Compare 2 | 313 | ‘01001 11001’ | Yes | R/W | 240 |
| IAC3 | Instruction Address Compare 3 | 314 | ‘01001 11010’ | Yes | R/W | 240 |
| IAC4 | Instruction Address Compare 4 | 315 | ‘01001 11011’ | Yes | R/W | 240 |
| ICDBDR0 | Instruction Cache Debug Data Register 0 | 979 | ‘11110 10011’ | Yes | Read | 137 |
| ICDBDR1 | Instruction Cache Debug Data Register 1 | 980 | ‘11110 10100’ | Yes | Read | 137 |
| ICDBTRH | Instruction Cache Debug Tag Register High | 927 | ‘11100 11111’ | Yes | Read | 138 |
| ICDBTRL | Instruction Cache Debug Tag Register Low | 926 | ‘11100 11110’ | Yes | Read | 137 |
| IOCCR | Instruction Opcode Compare Control Register | 860 | ‘11010 11100’ | Yes | R/W | 269 |
| IOCR1 | Instruction Opcode Compare Register 1 | 861 | ‘11010 11101’ | Yes | R/W | 269 |
| IOCR2 | Instruction Opcode Compare Register 2 | 862 | ‘11010 11110’ | Yes | R/W | 270 |
| XER | Fixed-Point Exception Register | 1 | ‘00000 00001’ | No | R/W | 64 |
| IVOR[0-15] | Interrupt Vector Offset Register | 400 - 415 | ‘01100 1xxxx’ | Yes | R/W | 178 |
| IVPR | Interrupt Vector Prefix Register | 63 | ‘00001 11111’ | Yes | R/W | 179 |
| LR | Link Register | 8 | ‘00000 01000’ | No | R/W | 59 |
| MCSRR0 | Machine Check Save/Restore Register 0 | 570 | ‘10001 11010’ | Yes | R/W | 176 |
| MCSRR1 | Machine Check Save/Restore Register 1 | 571 | ‘10001 11011’ | Yes | R/W | 177 |
| MCSR | Machine Check Syndrome Register | 572 | ‘10001 11100’ | Yes | R/W | 181 |
| | | 828 | ‘11001 11100’ | Yes | Clear | |
| MSR | Machine State Register | - | - | Yes | R/W | 173 |
| MMUBE0 | MMU Bolted Entry-0 Spec Register | 820 | ‘11001 10100’ | Yes | R/W | 121 |
| MMUBE1 | MMU Bolted Entry-1 Spec Register | 821 | ‘11001 10101’ | Yes | R/W | 121 |
| MMUCR | MMU Control Register | 946 | ‘11101 10010’ | Yes | R/W | 126 |
| PMUCC0 | PMU Core Control Register | 858 | ‘11010 11010’ | Yes | R/W | 259 |
| PMUCC0 | PMU Core Control Register, User | 842 | ‘11010 01010’ | No | Read | 259 |
| PID | Process ID Register | 48 | ‘00001 10000’ | Yes | R/W | 120 |
| PIR | Processor ID Register | 286 | ‘01000 11110’ | Yes | Read | 68 |
| PVR | Processor Version Register | 287 | ‘01000 11111’ | Yes | Read | 68 |
| PWM | Pulse Width Margin Register | 886 | ‘11011 10110’ | Yes | R/W | |
| RMPD | Real Mode Page Descriptor Register | 825 | ‘11001 11001’ | Yes | R/W | 120 |
| RSTCFG | Reset Configuration Register | 923 | ‘11100 11011’ | Yes | Read | 126 |
| SRR0 | Save/Restore Register 0 | 26 | ‘00000 11010’ | Yes | R/W | 174 |
| SRR1 | Save/Restore Register 1 | 27 | ‘00000 11011’ | Yes | R/W | 175 |
| SPRG[0-3] | SPR General 0 - 3 | 272 - 275 | ‘01000 100xx’ | Yes | R/W | 67 |
| SPRG3 | SPR General 3 | 259 | ‘01000 00011’ | No | Read | 67 |
| SPRG[4-7] | SPR General 4 - 7 | 260 - 263 | ‘01000 001xx’ | No | Read | 67 |
| SPRG[4-7] | SPR General 4 - 7 | 276 - 279 | ‘01000 101xx’ | Yes | R/W | 67 |
| SPRG8 | SPR General 8 | 604 | ‘10010 11100’ | Yes | R/W | 67 |
| SSPCR | Supervisor Search Priority Configuration Register | 830 | ‘11001 11110’ | Yes | R/W | 122 |
| TBL | Time Base Lower | 268 | ‘01000 01100’ | No | Read | 158 |
| TBL | Time Base Lower | 284 | ‘01000 11100’ | Yes | Write | 158 |
| TBU | Time Base Upper | 269 | ‘01000 01101’ | No | Read | 158 |
| | | 285 | ‘01000 11101’ | Yes | Write | |
| TCR | Timer Control Register | 340 | ‘01010 10100’ | Yes | R/W | 163 |
| TSR | Timer Status Register | 336 | ‘01010 10000’ | Yes | Read/Clear | 164 |
| | | 848 | ‘11010 10000’ | Yes | Write | |
| ISPCR | tlbivax, tlbsx Search Priority Configuration Register | 829 | ‘11001 11101’ | Yes | R/W | 123 |
| USPCR | User Search Priority Configuration Register | 831 | ‘11001 11111’ | Yes | R/W | 125 |
| USPRG0 | User SPR General 0 | 256 | ‘01000 00000’ | No | R/W | 67 |

Note: R = read; W = write.

2.2.1 Register Types

The PowerPC 476FP core contains or supports five register types. Each register type is characterized by the instructions that are used to read and write the registers of that type. The following subsections provide an overview of each register type and the instructions associated with it.
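Table 2-5 writes each 10-bit SPR number as two 5-bit groups, high-order group first. In the mfspr and mtspr instruction words, Power ISA places these two groups in swapped order, and the Privileged column follows the Power ISA rule that an SPR is privileged when bit 0x10 of its SPR number is set (every row of Table 2-5 is consistent with this rule). The C sketch below assembles the instruction words and applies that rule. The function names are illustrative; the field layout and opcode values (primary opcode 31, extended opcode 339 for mfspr and 467 for mtspr) are those of Power ISA.

```c
#include <stdbool.h>
#include <stdint.h>

/* Power ISA XFX-form fields used by mfspr and mtspr. */
#define PPC_OPCD_31   31u   /* primary opcode, instruction bits 0:5 */
#define PPC_XO_MFSPR 339u   /* extended opcode, instruction bits 21:30 */
#define PPC_XO_MTSPR 467u

/* The 10-bit spr instruction field (bits 11:20) holds the SPR number with
 * its two 5-bit halves swapped: low-order half first, then high-order half. */
static uint32_t spr_field(uint32_t sprn)
{
    uint32_t low5 = sprn & 0x1Fu;
    uint32_t high5 = (sprn >> 5) & 0x1Fu;
    return (low5 << 5) | high5;
}

/* mfspr RT,SPRN */
static uint32_t encode_mfspr(uint32_t rt, uint32_t sprn)
{
    return (PPC_OPCD_31 << 26) | ((rt & 0x1Fu) << 21) |
           (spr_field(sprn) << 11) | (PPC_XO_MFSPR << 1);
}

/* mtspr SPRN,RS */
static uint32_t encode_mtspr(uint32_t sprn, uint32_t rs)
{
    return (PPC_OPCD_31 << 26) | ((rs & 0x1Fu) << 21) |
           (spr_field(sprn) << 11) | (PPC_XO_MTSPR << 1);
}

/* An SPR is privileged when bit 0x10 of its number is set; this is the
 * first (most significant) bit of the swapped spr instruction field. */
static bool spr_is_privileged(uint32_t sprn)
{
    return (sprn & 0x10u) != 0u;
}
```

For example, `encode_mfspr(3, 8)` produces `0x7C6802A6`, the encoding of mfspr r3,LR (mflr r3). `spr_is_privileged(268)` is false and `spr_is_privileged(284)` is true, matching the user-readable and supervisor-writable TBL entries.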
2.2.1.1 General Purpose Registers

The PowerPC 476FP core contains 32 GPRs, each of which holds a 32-bit integer. Data from the data cache or memory can be loaded into GPRs by using integer load instructions; the contents of GPRs can be stored to the data cache or memory by using integer store instructions. Most integer instructions reference GPRs. The GPRs are also used as targets and sources for most of the instructions that read and write the other register types. Section 2.6 Integer Processing on page 63 provides more information about integer operations and the use of GPRs.

2.2.1.2 Special Purpose Registers

Special Purpose Registers (SPRs) (see Table 2-5 on page 43 and Table A-1 on page 263) are accessed directly by using the mtspr and mfspr instructions. In addition, certain SPRs can be updated as a side effect of executing various instructions. For example, the Fixed-Point Exception Register (XER) (see Section 2.6.2 Fixed-Point Exception Register (XER) on page 64) is an SPR that is updated with arithmetic status (such as carry and overflow) when certain forms of integer arithmetic instructions execute.

SPRs control the use of the debug facilities, timers, interrupts, memory management, caches, and other architected processor resources. Table A-1 Register Categories on page 263 shows the name, mnemonic, and address of each SPR. Each SPR is described in more detail within the section covering the function with which it is associated. See Table 2-5 on page 43 for a list of all SPRs.

2.2.1.3 Condition Register

The Condition Register (CR) is a unique 32-bit register that is divided into eight independent 4-bit fields (CR0 - CR7). The CR can be used to record certain conditional results of various arithmetic and logical operations.
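To make the field layout concrete, the C sketch below treats the CR as a 32-bit value in which CR bit 0 is the most significant bit, so field CRn occupies CR bits 4n to 4n+3. It also shows how a compare instruction (Section 2.4.1.4) fills a field: following Power ISA, a compare sets exactly one of LT, GT, and EQ and copies XER[SO] into the fourth bit. The helper names are hypothetical, and this is a host-side model rather than anything the core executes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits within one 4-bit CR field, most significant first. */
enum { CR_LT = 0x8, CR_GT = 0x4, CR_EQ = 0x2, CR_SO = 0x1 };

/* CR bit 0 is the most significant bit of the register, so field n
 * (CR0..CR7) holds CR bits 4n..4n+3. */
static unsigned cr_get_field(uint32_t cr, unsigned n)
{
    return (unsigned)((cr >> (4u * (7u - n))) & 0xFu);
}

static uint32_t cr_set_field(uint32_t cr, unsigned n, unsigned value)
{
    unsigned shift = 4u * (7u - n);
    return (cr & ~(0xFu << shift)) | ((uint32_t)(value & 0xFu) << shift);
}

static bool cr_get_bit(uint32_t cr, unsigned bit)
{
    return ((cr >> (31u - bit)) & 1u) != 0u;
}

/* Field value written by a compare: exactly one of LT, GT, EQ, plus a copy
 * of XER[SO]. Arithmetic compares (cmp, cmpi) treat the operands as signed;
 * logical compares (cmpl, cmpli) treat them as unsigned. Flipping the sign
 * bit maps signed order onto unsigned order without casts. */
static unsigned compare_field(uint32_t a, uint32_t b, bool logical, bool xer_so)
{
    if (!logical) {
        a ^= 0x80000000u;
        b ^= 0x80000000u;
    }
    unsigned f = (a < b) ? CR_LT : (a > b) ? CR_GT : CR_EQ;
    return f | (xer_so ? CR_SO : 0u);
}
```

For example, a signed compare of -1 (`0xFFFFFFFF`) with 1 directed at CR1 sets CR bit 4 (CR1[LT]), whereas a logical compare of the same operands reports GT.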
Conditional branch instructions can subsequently designate a bit of the CR as one of the branch conditions (see Section 2.5 Branch Processing on page 56). Instructions are also provided for performing logical bit operations on the CR and for moving fields within it. See Section 2.5.5.3 Condition Register (CR) on page 60 and the condition register section of the branch chapter in Power ISA Version 2.05, Book-I for more information about the instructions that can update the CR.

2.2.1.4 Machine State Register

The Machine State Register (MSR) is a unique register that controls important chip functions, such as enabling or disabling various interrupt types. The MSR can be written from a GPR by using the mtmsr instruction, and its contents can be read into a GPR by using the mfmsr instruction. The MSR[EE] bit can be set or cleared atomically by using the wrtee or wrteei instructions. The MSR contents are also automatically saved, altered, and restored by the interrupt-handling mechanism. See Section 7.4.1 Machine State Register (MSR) on page 173 for detailed information about the MSR and the function of each of its bits.

2.2.1.5 Device Control Registers

Device Control Registers (DCRs) are on-chip registers that exist architecturally and physically outside the PowerPC 476FP core; thus, they are specified neither by the Power ISA nor by this user's manual. Rather, Power ISA defines the existence of the DCR address space and the instructions that access DCRs, but does not define any particular DCRs. The DCR access instructions are move to device control register (mtdcr) and move from device control register (mfdcr), which move data between GPRs and DCRs. DCRs can be used to control various on-chip system functions, such as the operation of on-chip buses, peripherals, and certain processor behaviors.
To accommodate additional DCRs and SPRs in a system, the PowerPC 476FP core provides additional instructions: mtdcrx (move to device control register indexed), mtdcrux (move to device control register user-mode indexed), mfdcrx (move from device control register indexed), and mfdcrux (move from device control register user-mode indexed).

2.3 Instruction Classes

Power ISA defines all instructions as falling into one of the following three classes, as determined by the primary opcode (and the extended opcode, if any):

1. Defined
2. Preserved
3. Reserved (reserved-illegal or reserved-no-op)

2.3.1 Defined Instruction Class

This class consists of all the instructions defined in Power ISA Version 2.05, Book-I, Book-II, and Book-III E. In general, defined instructions are guaranteed to be supported within a Power ISA system as specified by the architecture, either within the processor implementation itself or by emulation software supported by the system operating software. One exception is that, for implementations (such as the PowerPC 476FP core) that provide only the 32-bit subset of Power ISA, the system is not expected (and is likely unable) to emulate the 64-bit behavior of the defined instructions.
As defined by Power ISA, any attempt to execute a defined instruction produces one of the following effects:

• An illegal instruction exception type program interrupt, if the instruction is not recognized by the implementation
• An unimplemented instruction exception type program interrupt, if the instruction is not a floating-point instruction and is recognized, but not supported, by the implementation
• A floating-point unavailable interrupt, if the instruction is recognized as a floating-point instruction but floating-point processing is disabled
• Performance of the actions described in the rest of this document, if the instruction is recognized and supported by the implementation

The architected behavior can cause other exceptions.

The PowerPC 476FP core recognizes and fully supports all of the instructions in the defined class, with two exceptions. First, because the PowerPC 476FP core is a 32-bit implementation, operations that are defined specifically for 64-bit operation are not supported at all and always cause an illegal instruction exception type program interrupt. Second, one defined instruction, mfapidi (move from auxiliary processor ID indirect), is not supported. The mfapidi instruction is intended to assist with identifying the auxiliary processors that can be attached to a particular processor implementation; because the PowerPC 476FP core does not have an auxiliary processor, execution of mfapidi causes an illegal instruction exception type program interrupt.

2.3.2 Preserved Instruction Class

The preserved instruction class is provided to support compatibility with earlier versions of either the PowerPC Architecture or the Power ISA.
This class includes opcodes that were defined for these earlier architectures but are no longer defined for Power ISA. Any attempt to execute a preserved instruction results in one of the following effects:

• Performance of the actions described in the previous version of the architecture, if the instruction is recognized
• An illegal instruction exception type program interrupt, if the instruction is not recognized

The only preserved instruction recognized and supported by the PowerPC 476FP core is mftb (move from time base). This instruction was used in the PowerPC Architecture to read the Time Base Upper (TBU) and Time Base Lower (TBL) registers. Power ISA instead defines TBU and TBL as SPRs, so the mfspr (move from special purpose register) instruction is used to read them. To enable existing time base management software to run on the PowerPC 476FP core, the core also supports the preserved mftb opcode. However, mftb is not included in the sections of this document that describe the implemented instructions; to guarantee portability between the PowerPC 476FP core and future implementations of Power ISA, software should use the currently architected mechanism, mfspr, to read the time base registers.

2.3.3 Reserved Instruction Class

This class consists of all instruction primary opcodes (and associated extended opcodes, if applicable) that belong to neither the defined nor the preserved instruction class. Reserved instructions are available for future versions of Power ISA; that is, future versions of Power ISA can define any of these instructions to perform new functions or make them available for implementation-dependent use as allocated instructions. There are two types of reserved instructions: reserved-illegal and reserved-no-op.
Any attempt to execute a reserved-illegal instruction causes an illegal instruction exception type program interrupt on the PowerPC 476FP core. Reserved-illegal instructions are therefore available for future extensions to Power ISA that might affect the architected state. Such extensions might include new forms of integer or floating-point arithmetic instructions, or new forms of load or store instructions that affect architected registers or the contents of memory.

In contrast, under Power ISA any attempt to execute a reserved-no-op instruction either has no effect (that is, the instruction is treated as a no-operation instruction) or causes an illegal instruction exception type program interrupt. Because implementations are typically expected to treat reserved-no-op instructions as true no-ops, these opcodes are available for future extensions to Power ISA that have no effect on architected state. Such extensions might include performance-enhancing hints, such as new forms of cache touch instructions. Software can take advantage of the functions offered by the new instructions and still remain backward-compatible with implementations of previous versions of Power ISA. The PowerPC 476FP core implements all of the reserved-no-op instruction opcodes as true no-ops.

2.4 Implemented Instruction Set Summary

This section provides an overview of the various types and categories of instructions implemented within the PowerPC 476FP core. In addition, Appendix B Instruction Summary on page 271 lists the instructions that are unique to the PowerPC 476FP core.

Table 2-6 summarizes the PowerPC 476FP core instruction set by category. Instructions within each category are described in subsequent sections.

Table 2-6. Instruction Categories

| Category | Subcategory | Instruction Types |
|---|---|---|
| Integer | Integer Storage Access | load, store |
| | Integer Arithmetic | add, subtract, multiply, divide, negate |
| | Integer Logical | and, andc, or, orc, xor, nand, nor, xnor, extend sign, count leading zeros |
| | Integer Compare | compare, compare logical |
| | Integer Select | select operand |
| | Integer Trap | trap |
| | Integer Rotate | rotate and insert, rotate and mask |
| | Integer Shift | shift left, shift right, shift right algebraic |
| Branch | | branch, branch conditional, branch to link, branch to count |
| Processor control | Condition register logical | crand, crandc, cror, crorc, crnand, crnor, crxor, creqv |
| | Register management | move to/from SPR, move to/from DCR, move to/from MSR, write to external interrupt enable bit, move to/from CR |
| | System linkage | system call, return from interrupt, return from critical interrupt, return from machine check interrupt |
| | Processor synchronization | instruction synchronize |
| Storage control | Cache management | data allocate, data invalidate, data touch, data zero, data flush, data store, instruction invalidate, instruction touch |
| | TLB management | read, write, search, synchronize |
| | Storage synchronization | memory synchronize, memory barrier |
| Allocated | Allocated arithmetic | multiply-accumulate, negative multiply-accumulate, multiply halfword |
| | Allocated logical | detect left-most zero byte |
| | Allocated cache management | data congruence-class invalidate, instruction congruence-class invalidate |
| | Allocated cache debug | data read, instruction read |

2.4.1 Integer Instructions

Integer instructions transfer data between memory and the GPRs and perform various operations on the GPRs. This category of instructions is further divided into the eight subcategories described in the following sections.

2.4.1.1 Integer Storage Access Instructions

Integer storage access instructions load and store data between memory and the GPRs.
These instructions operate on bytes, halfwords, and words. Integer storage access instructions also support loading and storing multiple registers, character strings, and byte-reversed data, as well as loading data with sign extension. Table 2-7 on page 50 shows the integer storage access instructions in the PowerPC 476FP core. In the table, the syntax [u] indicates that the instruction has both an update form (in which the RA addressing register is updated with the calculated address) and a nonupdate form. Similarly, the syntax [x] indicates that the instruction has both an indexed form (in which the address is formed by adding the contents of GPRs RA and RB) and a base + displacement form (in which the address is formed by adding a 16-bit signed immediate value, specified as part of the instruction, to the contents of GPR RA). See the detailed instruction descriptions in Appendix B Instruction Summary on page 271.

Table 2-7. Integer Storage Access Instructions

| | Byte | Halfword | Word | Multiple/String |
|---|---|---|---|---|
| Loads | lbz[u][x] | lha[u][x], lhbrx, lhz[u][x] | lwarx, lwbrx, lwz[u][x] | lmw, lswi, lswx |
| Stores | stb[u][x] | sth[u][x], sthbrx | stw[u][x], stwbrx, stwcx. | stmw, stswi, stswx |

2.4.1.2 Integer Arithmetic Instructions

Arithmetic operations are performed on integer or ordinal operands stored in registers. Instructions that operate on two operands use a 3-operand format: the operation is performed on operands stored in two registers, and the result is placed in a third register. Instructions that operate on one operand use a 2-operand format: the operation is performed on the operand in one register, and the result is placed in another register. Several instructions also have immediate forms in which one of the source operands is a field in the instruction. Most integer arithmetic instructions have versions that can update CR[CR0] or XER[SO, OV] based on the result of the instruction.
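A host-side C model can make these status updates concrete. The sketch below follows the Power ISA definitions of add[o][.] and addc: the o form sets XER[OV] when signed overflow occurs (clearing it otherwise) and, on overflow, also sets the sticky XER[SO]; the record form sets CR0[LT, GT, EQ] from a signed comparison of the 32-bit result with zero and copies XER[SO] into CR0[SO]; addc records the carry out of the most significant bit in XER[CA]. The structure and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the status bits that integer arithmetic touches. */
struct int_status {
    bool so, ov, ca;    /* XER[SO], XER[OV], XER[CA] */
    unsigned cr0;       /* CR0 field: LT = 8, GT = 4, EQ = 2, SO = 1 */
};

struct arith_result {
    uint32_t value;         /* value written to RT */
    struct int_status st;   /* status after the instruction */
};

/* Record ([.]) forms: compare the 32-bit result with zero as a signed
 * value, then copy the already-updated XER[SO] into CR0[SO]. */
static unsigned cr0_from_result(uint32_t result, bool so)
{
    unsigned f = (result & 0x80000000u) ? 0x8u : (result != 0u) ? 0x4u : 0x2u;
    return f | (so ? 0x1u : 0u);
}

/* add[o][.] RT,RA,RB; o_form and record select the [o] and [.] variants. */
static struct arith_result model_add(struct int_status st, uint32_t a,
                                     uint32_t b, bool o_form, bool record)
{
    uint32_t sum = a + b;   /* unsigned addition wraps modulo 2^32 */
    if (o_form) {
        /* Signed overflow: both operands share a sign that the sum lacks. */
        st.ov = ((~(a ^ b) & (a ^ sum)) & 0x80000000u) != 0u;
        st.so = st.so || st.ov; /* SO is sticky: add never clears it */
    }
    if (record)
        st.cr0 = cr0_from_result(sum, st.so);
    return (struct arith_result){ sum, st };
}

/* addc RT,RA,RB (plain form): XER[CA] receives the carry out of the most
 * significant bit of the 32-bit sum. */
static struct arith_result model_addc(struct int_status st, uint32_t a,
                                      uint32_t b)
{
    uint32_t sum = a + b;
    st.ca = sum < a;        /* the sum wrapped, so a carry occurred */
    return (struct arith_result){ sum, st };
}
```

The model updates SO before computing CR0, so an addo. that overflows (for example, `0x7FFFFFFF + 1`) reports CR0 = LT | SO, while `0xFFFFFFFF + 1` yields 0 with no overflow, CR0 = EQ, and, for addc, CA = 1.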
Some integer arithmetic instructions also update XER[CA] (carry) implicitly. See Section 2.6 Integer Processing on page 63 for more information about how these instructions update the CR and the XER.

Table 2-8 lists the integer arithmetic instructions in the PowerPC 476FP core. In the table, the syntax [o] indicates that the instruction has both an o form (which updates the XER[SO, OV] fields) and a non-o form. Similarly, the syntax [.] indicates that the instruction has both a record form (which updates CR[CR0]) and a nonrecord form.

Table 2-8. Integer Arithmetic Instructions

| Add | Subtract | Multiply | Divide | Negate |
|---|---|---|---|---|
| add[o][.], addc[o][.], adde[o][.], addi, addic[.], addis, addme[o][.], addze[o][.] | subf[o][.], subfc[o][.], subfe[o][.], subfic, subfme[o][.], subfze[o][.] | mulhw[.], mulhwu[.], mulli, mullw[o][.] | divw[o][.], divwu[o][.] | neg[o][.] |

2.4.1.3 Integer Logical Instructions

Table 2-9 on page 51 lists the integer logical instructions in the PowerPC 476FP core. See Section 2.4.1.2 Integer Arithmetic Instructions for an explanation of the [.] syntax.

Table 2-9. Integer Logical Instructions

| Operation | Instructions |
|---|---|
| And | and[.], andi., andis. |
| And with complement | andc[.] |
| Nand | nand[.] |
| Or | or[.], ori, oris |
| Or with complement | orc[.] |
| Nor | nor[.] |
| Xor | xor[.], xori, xoris |
| Equivalence | eqv[.] |
| Extend sign | extsb[.], extsh[.] |
| Count leading zeros | cntlzw[.] |

2.4.1.4 Integer Compare Instructions

These instructions perform arithmetic or logical comparisons between two operands and update the CR with the result of the comparison. Table 2-10 lists the integer compare instructions in the PowerPC 476FP core.

Table 2-10. Integer Compare Instructions

| Arithmetic | Logical |
|---|---|
| cmp, cmpi | cmpl, cmpli |

2.4.1.5 Integer Trap Instructions

Table 2-11 lists the integer trap instructions in the PowerPC 476FP core.

Table 2-11. Integer Trap Instructions

| Trap |
|---|
| tw, twi |

2.4.1.6 Integer Rotate Instructions

These instructions rotate operands stored in the GPRs.
Rotate instructions can also mask rotated operands. Table 2-12 lists the rotate instructions in the PowerPC 476FP core. See Section 2.4.1.2 Integer Arithmetic Instructions on page 50 for an explanation of the [.] syntax.

Table 2-12. Integer Rotate Instructions

| Rotate and Insert | Rotate and Mask |
|---|---|
| rlwimi[.] | rlwinm[.], rlwnm[.] |

2.4.1.7 Integer Shift Instructions

Table 2-13 lists the integer shift instructions in the PowerPC 476FP core. Note that the shift right algebraic instructions implicitly update the XER[CA] field. See Section 2.4.1.2 Integer Arithmetic Instructions on page 50 for an explanation of the [.] syntax.

Table 2-13. Integer Shift Instructions

| Shift Left | Shift Right | Shift Right Algebraic |
|---|---|---|
| slw[.] | srw[.] | sraw[.], srawi[.] |

2.4.1.8 Integer Select Instruction

Table 2-14 lists the integer select instruction in the PowerPC 476FP core. The RA operand is 0 if the RA field of the instruction is 0; otherwise, it is the contents of GPR[RA].

Table 2-14. Integer Select Instruction

| Integer Select |
|---|
| isel |

2.4.2 Branch Instructions

These instructions branch to an address unconditionally or conditionally. Conditional branch instructions can test condition codes set in the CR by a previous instruction and branch accordingly. Conditional branch instructions can also decrement and test the CTR as part of branch determination, and can save the return address in the Link Register (LR). The target address for a branch can be a displacement from the current instruction address, an absolute address, or the contents of the LR or CTR. See Section 2.5 Branch Processing on page 56 for more information about branch operations.

Table 2-15 on page 52 lists the branch instructions in the PowerPC 476FP core. In the table, the syntax [l] indicates that the instruction has both a link update form (which updates LR with the address of the instruction after the branch) and a nonlink update form.
Similarly, the syntax [a] indicates that the instruction has both an absolute address form (in which the target address is formed directly from the immediate field specified as part of the instruction) and a relative form (in which the target address is formed by adding the immediate field to the address of the branch instruction).

Table 2-15. Branch Instructions

| Branch |
|---|
| b[l][a], bc[l][a], bcctr[l], bclr[l] |

2.4.3 Processor Control Instructions

Processor control instructions manipulate system registers, perform system software linkage, and synchronize processor operations. The following sections describe the four subcategories of processor control instructions: condition register logical, register management, system linkage, and processor synchronization.

2.4.3.1 Condition Register Logical Instructions

These instructions perform logical operations on a specified pair of bits in the CR and place the result in another specified bit. Their benefit is that they can logically combine the results of several comparison operations without incurring the extra processing time of a conditional branch between each one. Software performance can improve significantly if multiple conditions are tested as a group as part of a branch decision. Table 2-16 lists the condition register logical instructions in the PowerPC 476FP core.

Table 2-16. Condition Register Logical Instructions

| Condition Register Logical |
|---|
| crand, crandc, creqv, crnand, crnor, cror, crorc, crxor |

2.4.3.2 Register Management Instructions

These instructions move data between the GPRs and control registers in the PowerPC 476FP core. Table 2-17 lists the register management instructions in the PowerPC 476FP core.

Table 2-17. Register Management Instructions

| CR | DCR | MSR | SPR |
|---|---|---|---|
| mcrf, mcrxr, mfcr, mtcrf | mfdcr, mfdcrux, mfdcrx, mtdcr, mtdcrux, mtdcrx | mfmsr, mtmsr, wrtee, wrteei | mfspr, mtspr |

2.4.3.3 System Linkage Instructions

These instructions invoke supervisor-level software for system services and return from interrupts. Table 2-18 lists the system linkage instructions in the PowerPC 476FP core.

Table 2-18. System Linkage Instructions

| System Linkage |
|---|
| rfi, rfci, rfmci, sc |

2.4.3.4 Processor Synchronization Instruction

The processor synchronization instruction, isync, forces the processor to complete all instructions preceding the isync before allowing any context changes as a result of any instructions that follow it. Additionally, all instructions that follow the isync execute within the context established by the completion of all the instructions that precede it. See Section 2.10 Synchronization on page 76 for more information about the synchronizing effect of isync. Table 2-19 shows the processor synchronization instruction in the PowerPC 476FP core.

Table 2-19. Processor Synchronization Instruction

| Processor Synchronization |
|---|
| isync |

2.4.4 Storage Control Instructions

These instructions manage the instruction and data caches and the TLB of the PowerPC 476FP core. Instructions are also provided to synchronize and order storage accesses. The following sections describe the instructions in these three subcategories of storage control instructions.

2.4.4.1 Cache Management Instructions

These instructions control the operation of the data and instruction caches. Instructions are provided to fill, flush, invalidate, or zero data cache blocks, where a block is defined as a 32-byte cache line. Instructions are also provided to fill or invalidate instruction cache blocks. Table 2-20 lists the cache management instructions in the PowerPC 476FP core.

Table 2-20. Cache Management Instructions

| Data Cache | Instruction Cache |
|---|---|
| dcba (no-op), dcbf, dcbi, dcbst, dcbt, dcbtst, dcbz, dcbtls, dcbtstls, dcblc | icbi, icbt, icbtls, icblc |

2.4.4.2 TLB Management Instructions

The TLB management instructions read and write entries of the TLB array and search the TLB array for an entry that translates a particular virtual address. Table 2-21 on page 55 lists the TLB management instructions in the PowerPC 476FP core. See Section 2.4.1.2 Integer Arithmetic Instructions on page 50 for an explanation of the [.] syntax.

Table 2-21. TLB Management Instructions

| TLB Management |
|---|
| tlbre, tlbsx[.], tlbsync, tlbwe, tlbivax |

2.4.4.3 Storage Synchronization Instructions

The storage synchronization instructions allow software to enforce ordering among the storage accesses caused by load and store instructions, which by default are weakly ordered by the processor. Weakly ordered means that the processor is architecturally permitted, with some exceptions, to perform loads and stores out of order with respect to their sequence within the instruction stream. However, if a storage synchronization instruction is executed, all storage accesses prompted by instructions preceding the synchronizing instruction must be performed before any storage accesses prompted by instructions that follow it. See Section 2.10 Synchronization on page 76 for more information about storage synchronization. Table 2-22 shows the storage synchronization instructions in the PowerPC 476FP core.

Table 2-22. Storage Synchronization Instructions

| Storage Synchronization |
|---|
| lwsync, msync, mbar |

2.4.5 Previous Integer Multiply-Accumulate Instructions

The previous integer multiply-accumulate instructions implemented within the PowerPC 476FP core are divided into four subcategories and are shown in Table 2-23 on page 56.
See Section 2.4.1.2 Integer Arithmetic Instructions on page 50 for an explanation of the [.] and [o] syntax.

Table 2-23. Previous Integer Multiply-Accumulate Instructions
Arithmetic
  Multiply-Accumulate: macchw[o][.], macchws[o][.], macchwsu[o][.], macchwu[o][.], machhw[o][.], machhws[o][.], machhwsu[o][.], machhwu[o][.], maclhw[o][.], maclhws[o][.], maclhwsu[o][.], maclhwu[o][.]
  Negative Multiply-Accumulate: nmacchw[o][.], nmacchws[o][.], nmachhw[o][.], nmachhws[o][.], nmaclhw[o][.], nmaclhws[o][.]
  Multiply Halfword: mulchw[.], mulchwu[.], mulhhw[.], mulhhwu[.], mullhw[.], mullhwu[.]

2.5 Branch Processing
The following sections provide additional information about branch addressing, instruction fields, prediction, and registers.

2.5.1 Branch Addressing
The branch instruction (b[l][a]) specifies the displacement of the branch target address as a 26-bit value (the 24-bit LI field right-extended with ‘00’). This displacement is treated as a signed 26-bit number, covering an address range of ±32 MB. Similarly, the branch conditional instruction (bc[l][a]) specifies the displacement as a 16-bit value (the 14-bit BD field right-extended with ‘00’). This displacement covers an address range of ±32 KB.

For the relative forms of the branch and branch conditional instructions (b[l] and bc[l], with instruction field AA = ‘0’), the target address is the address of the branch instruction itself (the current instruction address) plus the signed displacement. This address calculation wraps around from the maximum effective address (x‘FFFF FFFF’) to x‘0000 0000’, and vice versa.

For the absolute forms of the branch and branch conditional instructions (ba[l] and bca[l], with instruction field AA = ‘1’), the target address is the sign-extended displacement.
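The relative and absolute target calculations above can be expressed as a short C sketch. It assumes 32-bit effective addresses and the standard Power ISA I-form and B-form layouts (primary opcode in bits 0:5, LI in bits 6:29 for b, BD in bits 16:29 for bc, AA in bit 30, with bit 0 as the most significant bit). The function name `branch_target` is hypothetical, and the code models only the address arithmetic, not the core's fetch logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of `value` to a 32-bit signed integer. */
static int32_t sign_extend(uint32_t value, unsigned bits) {
    assert(bits >= 2u && bits <= 31u);
    uint32_t mask = (1u << bits) - 1u;
    uint32_t sign = 1u << (bits - 1u);
    value &= mask;
    return (value & sign) ? (int32_t)(value | ~mask) : (int32_t)value;
}

/* Computes the target of a b[l][a] (opcode 18) or bc[l][a] (opcode 16)
 * instruction located at effective address `cia`. Returns false for any
 * other opcode, including the indirect bclr/bcctr forms. */
bool branch_target(uint32_t insn, uint32_t cia, uint32_t *target) {
    assert(target != NULL);
    unsigned opcode = insn >> 26;                     /* bits 0:5 */
    int32_t disp;
    if (opcode == 18u) {
        disp = sign_extend(insn & 0x03FFFFFCu, 26u);  /* LI || 0b00 */
    } else if (opcode == 16u) {
        disp = sign_extend(insn & 0x0000FFFCu, 16u);  /* BD || 0b00 */
    } else {
        return false;
    }
    bool absolute = ((insn >> 1) & 1u) != 0u;         /* AA, bit 30 */
    /* Unsigned addition wraps modulo 2^32, matching the architected
     * wrap between x'FFFF FFFF' and x'0000 0000'. */
    *target = absolute ? (uint32_t)disp : cia + (uint32_t)disp;
    return true;
}
```

For example, the word x‘4BFF FFFC’ is a relative b with a displacement of -4: at address x‘0000 1000’ it targets x‘0000 0FFC’, and at address x‘0000 0000’ the sum wraps to x‘FFFF FFFC’.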
With the absolute forms of the branch instruction, the branch target must be within the first or last 32 MB of the address space. With the absolute forms of the branch conditional instruction, the branch target must be within the first or last 32 KB of the address space.

The other two branch instructions, bclr (branch conditional to LR) and bcctr (branch conditional to CTR), use neither absolute nor relative addressing. Instead, they use indirect addressing, in which the branch target is the contents of the LR or the CTR.

2.5.2 Branch Instruction BI Field
Conditional branch instructions can optionally test one bit of the CR, as selected by bit 0 of the branch option (BO) instruction field (see the BO field description in Section 2.5.3). The value of the branch index (BI) instruction field specifies the CR bit to be tested (0 - 31). The BI field is ignored if BO[0] = ‘1’. The branch (b[l][a]) instruction is by definition unconditional and hence does not have a BI instruction field; instead, that bit position is part of the LI displacement field.

2.5.3 Branch Instruction BO Field
The BO field specifies the condition under which a conditional branch is taken and whether the branch decrements the CTR. The branch (b[l][a]) instruction is by definition unconditional and hence does not have a BO instruction field; instead, that bit position is part of the LI displacement field.

Conditional branch instructions can optionally test one bit in the CR. This option is selected when BO[0] = ‘0’; if BO[0] = ‘1’, the CR does not participate in the branch condition test. If the CR condition option is selected, the condition is satisfied (the branch can occur) if the CR bit selected by the BI instruction field matches BO[1].
Conditional branch instructions can also optionally decrement the CTR by one and test whether the decremented value is 0. This option is selected when BO[2] = ‘0’; if BO[2] = ‘1’, the CTR is not decremented and does not participate in the branch condition test. If the CTR decrement option is selected, BO[3] specifies the condition that must be satisfied for the branch to be taken: if BO[3] = ‘0’, CTR ≠ ‘0’ is required for the branch to occur; if BO[3] = ‘1’, CTR = ‘0’ is required for the branch to occur.

Table 2-24 summarizes the use of the bits of the BO field. BO[4] is further discussed in Section 2.5.4 Branch Prediction on page 58.

Table 2-24. BO Field Definition
BO Bit  Description
BO[0]  CR test control.
  0  Test the CR bit specified by the BI field for the value specified by BO[1].
  1  Do not test the CR.
BO[1]  CR test value.
  0  If BO[0] = ‘0’, test for CR[BI] = ‘0’.
  1  If BO[0] = ‘0’, test for CR[BI] = ‘1’.
BO[2]  CTR decrement and test control.
  0  Decrement the CTR by one and test whether the decremented CTR satisfies the condition specified by BO[3].
  1  Do not decrement the CTR; do not test the CTR.
BO[3]  CTR test value.
  0  If BO[2] = ‘0’, test for decremented CTR ≠ ‘0’.
  1  If BO[2] = ‘0’, test for decremented CTR = ‘0’.
BO[4]  Branch prediction reversal.
  0  Apply standard branch prediction.
  1  Reverse the standard branch prediction.

Table 2-25 on page 58 lists specific BO field contents and the resulting actions; z represents a mandatory value of zero, and y is a branch prediction option discussed in Section 2.5.4 on page 58.

Table 2-25. BO Field Examples
BO Value  Description
0000y  Decrement the CTR, then branch if the decremented CTR ≠ ‘0’ and CR[BI] = ‘0’.
0001y  Decrement the CTR, then branch if the decremented CTR = ‘0’ and CR[BI] = ‘0’.
001zy  Branch if CR[BI] = ‘0’.
0100y  Decrement the CTR, then branch if the decremented CTR ≠ ‘0’ and CR[BI] = ‘1’.
0101y  Decrement the CTR, then branch if the decremented CTR = ‘0’ and CR[BI] = ‘1’.
011zy  Branch if CR[BI] = ‘1’.
1z00y  Decrement the CTR, then branch if the decremented CTR ≠ ‘0’.
1z01y  Decrement the CTR, then branch if the decremented CTR = ‘0’.
1z1zz  Branch always.

The “a” and “t” bits of the BO field can be used by software to provide a hint about whether the branch is likely to be taken or likely not to be taken, as follows:
• BO = 0z1at (“at” bits = ‘11’): The branch is very likely to be taken.
• BO = 0z1at (“at” bits = ‘10’): The branch is very likely not to be taken.
This branch hint is enabled when CCR2[SPC5C1] is set; otherwise, the hint is ignored.

2.5.4 Branch Prediction
Conditional branches might be taken or not taken; if taken, instruction fetching is redirected to the target address. If the branch is not taken, instruction fetching falls through to the next sequential instruction. The PowerPC 476FP core attempts to predict whether a branch is taken before all information necessary to determine the branch direction is available. This action is called branch prediction. The core can then prefetch instructions down the predicted path. If the prediction is correct, performance improves because the branch target instruction is available immediately, instead of having to wait until the branch conditions are resolved. If the prediction is incorrect, the prefetched instructions (fetched from addresses down the wrong path of the branch) must be discarded and new instructions fetched from the correct path.

The PowerPC 476FP core combines the static prediction mechanism defined by the Power ISA with a dynamic branch prediction mechanism to predict branches correctly as often as possible. The dynamic branch prediction mechanism is an implementation optimization; it is not part of the architecture and is not visible to the programming model.
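The branch direction that the predictor tries to anticipate is ultimately resolved from the BO and BI fields as defined in Table 2-24. The following C sketch models that resolution in software. The name `cond_branch_taken` is hypothetical, the CR is modeled as a 32-bit value whose most significant bit is the bit selected by BI = 0, and BO[4] is ignored because it affects only prediction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* BO bits are numbered 0..4 from the most significant bit of the 5-bit field. */
static unsigned bo_bit(unsigned bo, unsigned n) {
    return (bo >> (4u - n)) & 1u;
}

/* Returns true if a conditional branch with fields BO and BI is taken.
 * *ctr is decremented first when BO[2] = 0, whether or not the branch
 * is then taken. */
bool cond_branch_taken(unsigned bo, unsigned bi, uint32_t cr, uint32_t *ctr) {
    assert(bo < 32u && bi < 32u && ctr != NULL);
    bool ctr_ok = true;
    if (bo_bit(bo, 2) == 0u) {               /* decrement and test the CTR */
        *ctr -= 1u;
        bool ctr_zero = (*ctr == 0u);
        ctr_ok = bo_bit(bo, 3) ? ctr_zero : !ctr_zero;
    }
    bool cond_ok = true;
    if (bo_bit(bo, 0) == 0u) {               /* test CR[BI] against BO[1] */
        unsigned cr_bit = (cr >> (31u - bi)) & 1u;
        cond_ok = (cr_bit == bo_bit(bo, 1));
    }
    return ctr_ok && cond_ok;
}
```

With this model, BO = 16 (binary 10000) behaves like a decrement-and-branch-if-nonzero loop branch, BO = 20 (binary 10100) branches always, and BO = 12 (binary 01100) with BI = 2 branches if the EQ bit of CR0 is set.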
The static branch prediction mechanism enables software to designate the preferred branch prediction through bits in the instruction encoding. The default static branch prediction for conditional branches is as follows:

Predict that the branch is taken if ((BO[0] ∧ BO[2]) ∨ s) = 1

where s is bit 16 of the instruction (the sign bit of the displacement for all branch conditional [bc] forms, and zero for all branch conditional to link register [bclr] and branch conditional to count register [bcctr] forms). That is, conditional branches are predicted taken if their branch displacement is negative (that is, if the branch goes backward from the current instruction address).

The standard prediction for this case derives from considering the relative form of bc, which is often used at the end of a loop to control the number of times that the loop is executed. Because the branch is taken each time the loop is executed except the last, it is best if the branch is predicted taken. The branch target is the beginning of the loop, so the branch displacement is negative and s = ‘1’. Because this situation is the most common, a branch is predicted taken if s = ‘1’. If the branch displacement is positive (s = ‘0’), the branch is predicted not taken. Also, if the branch instruction is any form of bclr or bcctr except the unconditional form, s = ‘0’, and the branch is predicted not taken.

There is a peculiar consequence of this prediction algorithm for the absolute forms of bc (bca and bcla). As described in Section 2.5.1 Branch Addressing on page 56, if s = ‘1’, the branch target is in high memory; if s = ‘0’, the branch target is in low memory. Because these are absolute-addressing forms, there is no reason to treat high and low memory differently. Nevertheless, for the high-memory case the standard prediction is taken, and for the low-memory case the standard prediction is not taken.
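The default rule above, combined with the BO[4] reversal bit described in the next paragraph, can be sketched in C as follows. This models only the architected static rule; it ignores the core's dynamic predictor and the CCR2[SPC5C1] “at” hint. The name `static_predict_taken` is hypothetical, and the sketch assumes the standard Power ISA encodings (primary opcode 16 for bc, 19 for bclr and bcctr, BO in instruction bits 6:10).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Static (architected) prediction for a bc, bclr, or bcctr instruction word.
 * Bit numbering follows the manual: instruction bit 0 is the MSB. */
bool static_predict_taken(uint32_t insn) {
    unsigned opcode = insn >> 26;
    assert(opcode == 16u || opcode == 19u);  /* caller passes bc, bclr, or bcctr */
    unsigned bo  = (insn >> 21) & 0x1Fu;     /* BO = bits 6:10 */
    unsigned bo0 = (bo >> 4) & 1u;
    unsigned bo2 = (bo >> 2) & 1u;
    unsigned bo4 = bo & 1u;
    /* s = instruction bit 16: the displacement sign bit for bc forms,
     * treated as 0 for bclr and bcctr. */
    unsigned s = (opcode == 16u) ? ((insn >> 15) & 1u) : 0u;
    bool standard = ((bo0 & bo2) | s) != 0u;
    /* BO[4] = 1 reverses the standard prediction. */
    return bo4 ? !standard : standard;
}
```

For example, a backward loop-closing bc 16,0 (x‘4200 FFF8’) is predicted taken, whereas a forward bc 12,2 (x‘4182 0008’) is predicted not taken unless BO[4] is set.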
Another bit in the BO field gives software further control over branch prediction. Specifically, BO[4] is the prediction reversal bit. If BO[4] = ‘0’, the default prediction is applied. If BO[4] = ‘1’, the reverse of the default prediction is applied. For the cases in Table 2-25 BO Field Examples on page 58 where BO[4] = y, software can reverse the default prediction by setting y to ‘1’. This should be done only when the default prediction is likely to be wrong. Note that for the branch always condition, reversal of the default prediction is not allowed, because BO[4] is designated z for this case, meaning that the bit must be set to ‘0’ or the instruction form is not valid.

2.5.5 Branch Control Registers
There are three registers in the PowerPC 476FP core that are associated with branch processing; they are described in the following sections.

2.5.5.1 Link Register (LR)
The LR is written from a GPR by using mtspr and can be read into a GPR by using mfspr. The LR can also be updated by the link update forms of branch instructions (instruction field LK = ‘1’). Such branch instructions load the LR with the address of the instruction that follows the branch instruction (4 + the address of the branch instruction). Thus, the LR contents can be used as a return address for a subroutine that was entered by a link update form of branch. The bclr instruction uses the LR in this fashion, enabling indirect branching to any address. When the LR is used as a return address by a bclr instruction, bits 30:31 of the LR are ignored, because all instruction addresses are on word boundaries.

Access to the LR is nonprivileged.

Bits  Field  Description
0:31  LR  Link Register contents. Target address of the bclr instruction.
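The link update and the bclr return-address masking described above are two small address computations. A minimal C sketch, assuming 32-bit effective addresses; both helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* A link update form of branch (LK = 1) stores the address of the
 * next sequential instruction in the LR. */
uint32_t link_update(uint32_t branch_addr) {
    assert((branch_addr & 3u) == 0u);   /* instructions are word aligned */
    return branch_addr + 4u;            /* wraps modulo 2^32 */
}

/* bclr ignores LR bits 30:31 (the two least significant bits), so the
 * effective branch target is the LR value with those bits cleared. */
uint32_t bclr_target(uint32_t lr) {
    return lr & ~UINT32_C(3);
}
```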
2.5.5.2 Count Register (CTR)
The CTR is written from a GPR by using mtspr and can be read into a GPR by using mfspr. The CTR contents can be used as a loop count that is decremented and tested by conditional branch instructions that specify count decrement as one of their branch conditions (instruction field BO[2] = ‘0’). Alternatively, the CTR contents can specify a target address for the bcctr instruction, enabling indirect branching to any address.

Access to the CTR is nonprivileged.

Bits  Field  Description
32:63  Count  Used as the count for branch conditional with decrement instructions, or as the target address for bcctr instructions.

2.5.5.3 Condition Register (CR)
The CR records certain information (conditions) related to the results of the various instructions that are enabled to update the CR. A bit in the CR can also be selected to be tested as part of the condition of a conditional branch instruction.

The CR is organized into eight 4-bit fields (CR0 - CR7). Table 2-26 on page 61 lists the instructions that update the CR.

Access to the CR is nonprivileged.

Bits  Field  Description
32:35  CR0  Condition register field 0.
36:39  CR1  Condition register field 1.
40:43  CR2  Condition register field 2.
44:47  CR3  Condition register field 3.
48:51  CR4  Condition register field 4.
52:55  CR5  Condition register field 5.
56:59  CR6  Condition register field 6.
60:63  CR7  Condition register field 7.

Table 2-26. CR Updating Instructions
Integer
  Storage Access: stwcx.
  Arithmetic: add.[o], addc.[o], adde.[o], addic., addme.[o], addze.[o], subf.[o], subfc.[o], subfe.[o], subfme.[o], subfze.[o], mulhw., mulhwu., mullw.[o], divw.[o], divwu.[o], neg.[o]
  Logical: and., andi., andis., andc., nand., or., orc., nor., xor., eqv., extsb., extsh., cntlzw.
  Compare: cmp, cmpi, cmpl, cmpli
  Rotate: rlwimi., rlwinm., rlwnm.
  Shift: slw., srw., sraw., srawi.
Processor Control
  CR-Logical and Register Management: crand, crandc, creqv, crnand, crnor, cror, crorc, crxor, mcrf, mcrxr, mtcrf
Storage Control
  TLB Management: tlbsx.
Auxiliary Processor
  Arithmetic and Logical: macchw.[o], macchws.[o], macchwsu.[o], macchwu.[o], machhw.[o], machhws.[o], machhwsu.[o], machhwu.[o], maclhw.[o], maclhws.[o], maclhwsu.[o], maclhwu.[o], nmacchw.[o], nmacchws.[o], nmachhw.[o], nmachhws.[o], nmaclhw.[o], nmaclhws.[o], mulchw., mulchwu., mulhhw., mulhhwu., mullhw., mullhwu., dlmzb.

The Power ISA provides detailed information about how each of these instructions updates the CR. To summarize, the CR can be accessed in any of the following ways:
• mfcr (move from Condition Register) reads the CR into a GPR. Note that this instruction does not update the CR and is therefore not listed in Table 2-26.
• Conditional branch instructions can designate a CR bit to be used as a branch condition. Note that these instructions do not update the CR and are therefore not listed in Table 2-26.
• mtcrf (move to Condition Register fields) sets specified CR fields by writing to the CR from a GPR, under control of a mask field specified as part of the instruction.
• mcrf (move Condition Register field) updates a specified CR field by copying another specified CR field into it.
• mcrxr (move to Condition Register from Fixed-Point Exception Register [XER]) copies certain bits of the XER into a specified CR field and clears the corresponding XER bits.
• Integer compare instructions update a specified CR field.
• CR-logical instructions update a specified CR bit with the result of any one of eight logical operations on a specified pair of CR bits.
• Certain forms of various integer instructions (the “.” forms) implicitly update CR[CR0], as do certain forms of the auxiliary processor instructions implemented within the PowerPC 476FP core.

CR[CR0] Implicit Update By Integer Instructions
Most of the CR-updating instructions listed in Table 2-26 on page 61 implicitly update the CR0 field. These are the various dot-form instructions, indicated by a “.” in the instruction mnemonic. Most of these instructions update CR[CR0] according to an arithmetic comparison of 0 with the 32-bit result that the instruction writes to the GPR file. That is, after performing the operation defined for the instruction, the 32-bit result written to the GPR file is compared to 0 by using a signed comparison, regardless of whether the operation performed by the instruction is considered signed. For example, logical instructions such as and., or., and nor. update CR[CR0] according to this signed comparison to 0, even though the result of such a logical operation is not typically interpreted as a signed value.

For each of these dot-form instructions, the individual bits in CR[CR0] are updated as follows:
CR[CR0[0]] (LT): Less than 0; set if the most significant bit of the 32-bit result is ‘1’.
CR[CR0[1]] (GT): Greater than 0; set if the 32-bit result is nonzero and the most significant bit of the result is ‘0’.
CR[CR0[2]] (EQ): Equal to 0; set if the 32-bit result is 0.
CR[CR0[3]] (SO): Summary overflow; a copy of XER[SO] at the completion of the instruction (including any XER[SO] update performed by the instruction itself).
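The CR[CR0] rule above can be sketched in C. The helper name `cr0_from_result` is hypothetical; it returns the 4-bit CR0 value with LT as the most significant bit, and it takes XER[SO] as an input because CR[CR0[3]] is only a copy of that bit.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the 4-bit CR0 field (LT GT EQ SO, LT = most significant bit)
 * set by a dot-form instruction that writes `result` to its target GPR.
 * `xer_so` is XER[SO] after the instruction completes. */
unsigned cr0_from_result(uint32_t result, unsigned xer_so) {
    assert(xer_so <= 1u);
    unsigned lt = (result >> 31) & 1u;                  /* MSB set */
    unsigned gt = (result != 0u && lt == 0u) ? 1u : 0u;
    unsigned eq = (result == 0u) ? 1u : 0u;
    return (lt << 3) | (gt << 2) | (eq << 1) | xer_so;
}
```

Because only the 32-bit result is examined, x‘7FFF FFFF’ + 1 wraps to x‘8000 0000’ and the sketch reports LT, which is the overflow case that the following note describes.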
Note: If an arithmetic overflow occurs, the sign of an instruction result indicated in CR[CR0] might not represent the true (infinitely precise) algebraic result of the instruction that set CR0. For example, if an add. instruction adds two large positive numbers and the magnitude of the result cannot be represented as a two's-complement number in a 32-bit register, an overflow occurs and CR[CR0[0]] is set, even though the infinitely precise result of the add is positive. Similarly, adding the most negative 32-bit two's-complement number (x‘8000 0000’) to itself results in an arithmetic overflow, and x‘0000 0000’ is recorded in the target register. CR[CR0[2]] is set, indicating a result of 0, but the infinitely precise result is negative.

CR[CR0[3]] is a copy of XER[SO] at the completion of the instruction, whether or not the instruction that updates CR[CR0] also updates XER[SO].

Note: If an instruction causes an arithmetic overflow but is not of the form that updates XER[SO], the value placed in CR[CR0[3]] does not reflect the arithmetic overflow that occurred on the instruction; it is merely a copy of the value of XER[SO] that was already in the XER before the instruction that updates CR[CR0] executed.

A few dot-form instructions do not update CR[CR0] in the fashion described previously: store word conditional indexed (stwcx.), TLB search indexed (tlbsx.), and determine leftmost zero byte (dlmzb.). See the instruction descriptions in the Power ISA for details on how these instructions update CR[CR0].

CR Update By Integer Compare Instructions
Integer compare instructions update a specified CR field with the result of a comparison of two 32-bit numbers, the first of which is from a GPR and the second of which is either an immediate value or from another GPR.
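Arithmetic and logical compares, described in the following paragraphs, differ only in how they interpret the same 32-bit operands. A minimal C sketch with hypothetical helper names: `compare_field` produces the 4-bit result (LT, GT, EQ, SO from most to least significant bit), and `cr_set_field` places it into CR field BF of a 32-bit CR image in which field 0 occupies the most significant four bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 4-bit CR field written by an integer compare:
 * bit 3 = LT, bit 2 = GT, bit 1 = EQ, bit 0 = SO (copy of XER[SO]).
 * `is_signed` selects an arithmetic (cmp/cmpi) rather than a logical
 * (cmpl/cmpli) compare. */
unsigned compare_field(uint32_t a, uint32_t b, bool is_signed, unsigned xer_so) {
    assert(xer_so <= 1u);
    bool lt, gt;
    if (is_signed) {
        lt = (int32_t)a < (int32_t)b;
        gt = (int32_t)a > (int32_t)b;
    } else {
        lt = a < b;
        gt = a > b;
    }
    unsigned eq = (a == b) ? 1u : 0u;
    return ((lt ? 1u : 0u) << 3) | ((gt ? 1u : 0u) << 2) | (eq << 1) | xer_so;
}

/* Writes a 4-bit field into CR field BF (0..7) of a 32-bit CR image. */
uint32_t cr_set_field(uint32_t cr, unsigned bf, unsigned field) {
    assert(bf <= 7u && field <= 0xFu);
    unsigned shift = 28u - 4u * bf;
    return (cr & ~(UINT32_C(0xF) << shift)) | ((uint32_t)field << shift);
}
```

For example, comparing 0 with x‘FFFF FFFF’ yields GT in the signed case and LT in the unsigned case.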
There are two types of integer compare instructions, arithmetic and logical, distinguished by the interpretation given to the 32-bit numbers being compared. For arithmetic compares, the numbers are treated as signed, whereas for logical compares, the numbers are treated as unsigned. For example, consider the comparison of 0 with x‘FFFF FFFF’. In an arithmetic compare, 0 is larger; in a logical compare, x‘FFFF FFFF’ is larger.

A compare instruction can direct its result to any CR field. The BF field (bits 6:8) of the instruction specifies the CR field to be updated. After a compare, the specified CR field is interpreted as follows:
CR[BF]0 (LT): The first operand is less than the second operand.
CR[BF]1 (GT): The first operand is greater than the second operand.
CR[BF]2 (EQ): The first operand is equal to the second operand.
CR[BF]3 (SO): Summary overflow; a copy of XER[SO].

2.6 Integer Processing
Integer processing includes loading and storing data between memory and GPRs and performing various operations on the values in GPRs and other registers (the categories of integer instructions are summarized in Table 2-6 on page 49). The sections that follow describe the registers used for integer processing and how they are updated by various instructions. In addition, Section 2.5.5.3 Condition Register (CR) on page 60 provides more information about the CR updates caused by integer instructions. Finally, the Power ISA provides details on the various register updates performed by integer instructions.

2.6.1 General Purpose Registers (GPRs)
The PowerPC 476FP core contains 32 GPRs. The contents of these registers can be transferred to and from memory by using integer storage access instructions. Most other instructions perform operations on the GPRs.

Access to the GPRs is nonprivileged.
Bits  Field  Description
32:63  Data  General Purpose Register data.

2.6.2 Fixed-Point Exception Register (XER)
The XER records overflow and carry indications from integer arithmetic and shift instructions. It also provides a byte count for the string indexed integer storage access instructions (lswx and stswx). Note that the term exception in the name of this register does not refer to exceptions as they relate to interrupts, but rather to the arithmetic exceptions of carry and overflow.

The fields of the XER are described below; Table 2-27 and Table 2-28 list the instructions that update the XER[SO,OV] fields and the XER[CA] field, respectively. The sections that follow the field descriptions and tables describe the fields of the XER in more detail.

Access to the XER is nonprivileged.

Bits  Field  Description
32  SO  Summary overflow.
  0  No overflow has occurred.
  1  Overflow has occurred.
  Can be set by mtspr or by integer or auxiliary processor instructions with the [o] option; can be reset by mtspr or by mcrxr.
33  OV  Overflow.
  0  No overflow has occurred.
  1  Overflow has occurred.
  Can be set by mtspr or by integer or allocated instructions with the [o] option; can be reset by mtspr, by mcrxr, or by integer or allocated instructions with the [o] option.
34  CA  Carry.
  0  Carry has not occurred.
  1  Carry has occurred.
  Can be set by mtspr or by certain integer arithmetic and shift instructions; can be reset by mtspr, by mcrxr, or by certain integer arithmetic and shift instructions.
35:56  Reserved
57:63  TBC  Transfer byte count. Used as a byte count by lswx and stswx; written by dlmzb[.] and by mtspr.
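Because the XER fields are numbered 32:63, with bit 32 as the most significant bit of the 32-bit register, extracting a field means converting bit numbers into shift amounts. A minimal C sketch, assuming a raw XER value such as one obtained by mfspr(XER); the `xer_fields` structure and both functions are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded view of the XER. */
struct xer_fields {
    unsigned so;   /* bit 32 */
    unsigned ov;   /* bit 33 */
    unsigned ca;   /* bit 34 */
    unsigned tbc;  /* bits 57:63 */
};

/* Extract bits first..last (numbered 32:63, bit 32 = MSB) of a 32-bit register. */
static uint32_t reg_bits(uint32_t value, unsigned first, unsigned last) {
    assert(32u <= first && first <= last && last <= 63u);
    unsigned width = last - first + 1u;
    uint32_t mask = (width == 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (value >> (63u - last)) & mask;
}

struct xer_fields xer_decode(uint32_t xer) {
    struct xer_fields f;
    f.so  = reg_bits(xer, 32u, 32u);
    f.ov  = reg_bits(xer, 33u, 33u);
    f.ca  = reg_bits(xer, 34u, 34u);
    f.tbc = reg_bits(xer, 57u, 63u);
    return f;
}
```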
Table 2-27. XER[SO,OV] Updating Instructions
Integer Arithmetic
  Add: addo[.], addco[.], addeo[.], addmeo[.], addzeo[.]
  Subtract: subfo[.], subfco[.], subfeo[.], subfmeo[.], subfzeo[.]
  Multiply: mullwo[.]
  Divide: divwo[.], divwuo[.]
  Negate: nego[.]
Auxiliary Processor
  Multiply-Accumulate: macchwo[.], macchwso[.], macchwsuo[.], macchwuo[.], machhwo[.], machhwso[.], machhwsuo[.], machhwuo[.], maclhwo[.], maclhwso[.], maclhwsuo[.], maclhwuo[.]
  Negative Multiply-Accumulate: nmacchwo[.], nmacchwso[.], nmachhwo[.], nmachhwso[.], nmaclhwo[.], nmaclhwso[.]
Processor Control
  Register Management: mtspr, mcrxr

Table 2-28. XER[CA] Updating Instructions
Integer Arithmetic
  Add: addc[o][.], adde[o][.], addic[.], addme[o][.], addze[o][.]
  Subtract: subfc[o][.], subfe[o][.], subfic, subfme[o][.], subfze[o][.]
Integer Shift
  Shift Right Algebraic: sraw[.], srawi[.]
Processor Control
  Register Management: mtspr, mcrxr

2.6.2.1 Summary Overflow (SO) Field
This field is set to ‘1’ when an instruction is executed that causes XER[OV] to be set to ‘1’, except for the case of mtspr(XER), which writes XER[SO] with the value in (RS[32]) and writes XER[OV] with the value in (RS[33]). After it is set, XER[SO] is not reset until either an mtspr(XER) is executed with data that explicitly writes ‘0’ to XER[SO], or an mcrxr instruction is executed. The mcrxr instruction sets XER[SO] (and XER[OV,CA]) to ‘0’ after copying all three fields into bits 0:2 of the CR field specified by the instruction (and setting bit 3 of that field to ‘0’).

Given this behavior, XER[SO] does not necessarily indicate that an overflow occurred on the most recent integer arithmetic operation, but rather that one occurred at some time since the last clearing of XER[SO] by mtspr(XER) or mcrxr.

XER[SO] is read (with the rest of the XER) into a GPR by mfspr(XER). In addition, various integer instructions copy XER[SO] into CR[CR0[3]] (see Section 2.5.5.3 Condition Register (CR) on page 60).
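The sticky behavior of XER[SO] can be illustrated with a small software model. This sketch uses hypothetical names, tracks only SO, OV, and CA, and treats mcrxr as returning the 4-bit value written to the target CR field; it is not a description of the hardware.

```c
#include <assert.h>
#include <stddef.h>

struct xer_state { unsigned so, ov, ca; };

/* An instruction with the [o] option writes OV for its own result;
 * SO is only ever set (sticky) by such instructions. */
void record_overflow(struct xer_state *x, unsigned overflowed) {
    assert(x != NULL && overflowed <= 1u);
    x->ov = overflowed;
    if (overflowed) {
        x->so = 1u;
    }
}

/* mcrxr: returns the 4-bit CR field value (SO OV CA 0) and clears the XER bits. */
unsigned mcrxr_model(struct xer_state *x) {
    assert(x != NULL);
    unsigned field = (x->so << 3) | (x->ov << 2) | (x->ca << 1);
    x->so = 0u;
    x->ov = 0u;
    x->ca = 0u;
    return field;
}
```

In this model, an overflowing addo. followed by a non-overflowing one leaves OV = ‘0’ but SO = ‘1’; only mcrxr (or mtspr(XER)) clears SO.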
2.6.2.2 Overflow (OV) Field
This field is updated by certain integer arithmetic instructions to indicate whether the infinitely precise result of the operation can be represented in 32 bits. For those integer arithmetic instructions that update XER[OV] and produce signed results, XER[OV] = ‘1’ if the result is greater than 2^31 – 1 or less than –2^31; otherwise, XER[OV] = ‘0’. For those integer arithmetic instructions that update XER[OV] and produce unsigned results (certain integer divide instructions and multiply-accumulate auxiliary processor instructions), XER[OV] = ‘1’ if the result is greater than 2^32 – 1; otherwise, XER[OV] = ‘0’. See the instruction descriptions in the Power ISA for more details on the conditions under which the integer divide instructions set XER[OV] to ‘1’.

The mtspr(XER) and mcrxr instructions also update XER[OV]. Specifically, mcrxr sets XER[OV] (and XER[SO,CA]) to ‘0’ after copying all three fields into bits 0:2 of the CR field specified by the instruction (and setting bit 3 of that field to ‘0’). mtspr(XER) writes XER[OV] with the value in (RS[33]).

XER[OV] is read (along with the rest of the XER) into a GPR by mfspr(XER).

2.6.2.3 Carry (CA) Field
This field is updated by certain integer arithmetic instructions (the carrying and extended versions of add and subtract) to indicate whether there is a carry-out of the most significant bit of the 32-bit result. XER[CA] = ‘1’ indicates a carry. The integer shift right algebraic instructions set XER[CA] to ‘1’ if the source operand is negative and any 1-bits are shifted out of the least significant bit position; otherwise, they set XER[CA] to ‘0’.

The mtspr(XER) and mcrxr instructions also update XER[CA]. Specifically, mcrxr sets XER[CA] (and XER[SO,OV]) to ‘0’ after copying all three fields into bits 0:2 of the CR field specified by the instruction (and setting bit 3 of that field to ‘0’). mtspr(XER) writes XER[CA] with the value in (RS[34]).
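The OV and CA definitions above can be checked with wider integer arithmetic. A minimal C sketch with hypothetical helper names that models the signed overflow of an addo-style add, the carry-out of an addc-style add, and the CA result of srawi (shift amount 0 to 31):

```c
#include <assert.h>
#include <stdint.h>

/* OV for a signed 32-bit add: the exact sum lies outside [-2^31, 2^31 - 1]. */
unsigned add_ov(uint32_t a, uint32_t b) {
    int64_t sum = (int64_t)(int32_t)a + (int64_t)(int32_t)b;
    return (sum > INT32_MAX || sum < INT32_MIN) ? 1u : 0u;
}

/* CA for a 32-bit add: carry-out of the most significant bit. */
unsigned add_ca(uint32_t a, uint32_t b) {
    uint64_t sum = (uint64_t)a + (uint64_t)b;
    return (unsigned)(sum >> 32) & 1u;
}

/* CA for srawi: set only if the source is negative and at least one
 * 1-bit is shifted out of the least significant bit position. */
unsigned srawi_ca(uint32_t rs, unsigned sh) {
    assert(sh < 32u);
    uint32_t shifted_out = rs & ((1u << sh) - 1u);
    return ((rs >> 31) != 0u && shifted_out != 0u) ? 1u : 0u;
}
```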
XER[CA] is read (with the rest of the XER) into a GPR by mfspr(XER). In addition, the extended versions of the add and subtract integer arithmetic instructions use XER[CA] as a source operand for their arithmetic operations.

2.6.2.4 Transfer Byte Count (TBC) Field
The TBC field is used by the string indexed integer storage access instructions (lswx and stswx) as a byte count. The TBC field is updated by the dlmzb[.] instruction with a value indicating the number of bytes up to and including the zero byte detected by the instruction. The TBC field is also written by mtspr(XER) with the value in (RS[57:63]).

XER[TBC] is read (with the rest of the XER) into a GPR by mfspr(XER).

2.7 Processor Control
The PowerPC 476FP core provides several registers for general processor control and status, including the following:
• Machine State Register (MSR): Controls interrupts and other processor functions.
• Special Purpose Registers General (SPRGs): SPRs for general-purpose software use.
• Processor Version Register (PVR): Indicates the specific implementation of a processor.
• Processor Identification Register (PIR): Indicates the specific instance of a processor in a multiprocessor system.
• Core Configuration Register 0 (CCR0): Controls specific processor functions, such as instruction prefetch.
• Core Configuration Register 1 (CCR1): Can be used to cause any of the possible parity error exceptions in order to verify correct machine check exception handler operation. Other CCR1 bits can force a full-line data cache flush and select a processor timer clock input other than CPUCLOCK.
• Core Configuration Register 2 (CCR2): Defines additional cache parameters.
• Reset Configuration (RSTCFG): Reports the values of certain fields of the TLB as supplied at reset.
• Device Control Register Immediate Prefix Register (DCRIPR): Provides the upper-order 22 bits of the DCR address to be used by mtdcr and mfdcr. This SPR has address x‘37B’, can be read and written, and is privileged.

Except for the MSR, each of these registers is described in more detail in the following sections. The MSR is described in Section 7 Processor Interrupts and Exceptions on page 167.

2.7.1 Special Purpose Registers General (USPRG0, SPRG0 - SPRG8)
USPRG0 and SPRG0 - SPRG8 are provided for general-purpose, system-dependent software use. One common use of these registers is as temporary storage. For example, a routine might save the contents of a GPR to an SPRG and later restore the GPR from it, which is faster than saving to and restoring from a memory location. These registers are written by using mtspr and read by using mfspr.

Access to USPRG0 is nonprivileged for both read and write. Access to SPRG4 - SPRG7 is nonprivileged for read but privileged for write; hence, different SPR numbers are used for reading than for writing. See Table 2-5 on page 43 for their accesses. Access to SPRG0 - SPRG3 and to SPRG8 is privileged for both read and write.

Bits  Field  Description
32:63  General data  Software value; hardware does not use the value.

2.7.2 Processor Version Register (PVR)
The PVR is a read-only register typically used to identify a specific processor core and chip implementation. Software can read the PVR to determine processor core and chip hardware features. The PVR can be read into a GPR by using mfspr.

Access to the PVR is privileged.
Bits  Field  Description
32:43  OWN  Owner identifier. Identifies the owner of a core. This implementation-specific value (after reset and otherwise) is specified by core input signals.
44:63  PVN  Processor version number. This implementation-specific value identifies the specific version and use of a processor core within a chip. This value (after reset and otherwise) is specified by core input signals.

2.7.3 Processor Identification Register (PIR)
The PIR is a read-only register that uniquely identifies a specific instance of a processor core within a multiprocessor configuration, enabling software to determine exactly which processor it is running on. This capability is important for operating system software within multiprocessor configurations. The PIR can be read into a GPR by using mfspr.

Access to the PIR is privileged.

Bits  Field  Description
32:59  Reserved
60:63  PIN  Processor identification number (PIN).

2.7.4 Core Configuration Register 0 (CCR0)
The CCR0 controls a number of special chip functions, including data cache and auxiliary processor operation, speculative instruction fetching, trace, and the operation of the cache block touch instructions. The CCR0 is written from a GPR by using mtspr and can be read into a GPR by using mfspr. A cross reference after the bit-field description indicates the section of this document that describes each field in more detail.

Access to the CCR0 is privileged.
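CCR0 numbers its bits 0:31 with bit 0 as the most significant bit, so changing one field usually means reading the register with mfspr, replacing the field, and writing the result back with mtspr. The following C sketch covers only the bit manipulation (the SPR accesses are omitted). The helper names are hypothetical, and the field positions used in the usage note, DTB at bit 16 and DQWPM at bits 28:29, are taken from the CCR0 field table that follows.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for bits first..last of a 32-bit register numbered 0:31 (bit 0 = MSB). */
uint32_t field_mask(unsigned first, unsigned last) {
    assert(first <= last && last <= 31u);
    unsigned width = last - first + 1u;
    uint32_t ones = (width == 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return ones << (31u - last);
}

/* Replace bits first..last of `reg` with `value` (right-aligned). */
uint32_t field_insert(uint32_t reg, unsigned first, unsigned last, uint32_t value) {
    uint32_t mask = field_mask(first, last);
    assert((value & ~(mask >> (31u - last))) == 0u);  /* value must fit the field */
    return (reg & ~mask) | (value << (31u - last));
}
```

For example, field_insert(ccr0, 16, 16, 1) sets DTB to disable trace broadcast, and field_insert(ccr0, 28, 29, 2) selects DQWPM = ‘10’.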
Bits    Field Name      Description
0       ITE             Internal trace enable.
                        0  Disable internal trace.
                        1  Enable internal trace.
                        The debugger tool or debug software must set this bit to enable instruction trace. If user software clears ITE, any debugger trace operation that is currently running is terminated.
1       PRE             Parity recoverability enable.
                        0  Semirecoverable parity mode enabled for the data cache.
                        1  Fully recoverable parity mode enabled for the data cache.
                        Must be set to ‘1’ to guarantee full recoverability from memory management unit (MMU) and data cache parity errors.
2:3     Reserved
4       CRPE            Cache read parity enable.
                        0  Disable parity information reads.
                        1  Enable parity information reads.
                        When enabled, execution of the following instructions loads parity information into the associated registers:
                        Instruction   Register
                        icread        ICDBTRH, ICDBTRL, ICDBDR1
                        dcread        DCDBTRH, DCDBTRL
                        tlbre         GPR (see tlbre operation)
5:9     Reserved
10      ICS             icbi request size.
                        0  32-byte icbi request.
                        1  128-byte icbi request.
11      DAPUIB          Disable APU instruction broadcast.
                        0  Enabled.
                        1  Disabled. Instructions are not broadcast to the APU for decoding.
12:15   ICWRIDX[0:3]    Instruction cache write index (for JTAG). Specifies the index value to write to the instruction cache.
16      DTB             Disable trace broadcast.
                        0  Enabled.
                        1  Disabled; no trace information is broadcast.
                        This mechanism is provided as a means of reducing power consumption when instruction tracing is not needed. See Initialization on page 243.
17:22   Reserved
23      FLSTA           Force load/store alignment.
                        0  No alignment exception occurs on integer storage access instructions, regardless of alignment.
                        1  An alignment exception occurs on integer storage access instructions if the data address is not on an operand boundary.
24      Reserved
25      DBTAC           Disable the branch target address CAM (BTAC).
                        0  Use the BTAC in the branch prediction unit.
                        1  Disable the BTAC in the branch prediction unit.
26:27   Reserved
28:29   DQWPM[0:1]      Data cache quadword prediction mode.
                        00  No prediction; cause a hold.
                        01  Use EA[19].
                        10  Use the last value for quadword EA[19].
                        11  Use NOT EA[19].
30      IQWPM           Instruction cache quadword prediction mode.
                        0  Use the last value for quadword EA[19].
                        1  Use EA[19].
31      Reserved

2.7.5 Core Configuration Register 1 (CCR1)

Bits 0:17 of the CCR1 can be used to cause every possible parity error exception, so that software can verify correct operation of the machine check exception handler. Other CCR1 bits can force a full-line data cache flush, or select a processor timer clock input other than CPUClock. The CCR1 is written from a GPR by using mtspr, and can be read into a GPR by using mfspr. Access to the CCR1 is privileged.

Bits    Field Name   Description
0:1     GPRPEI       GPR parity error insert.
                     GPRPEI[0]: Records parity in the I-pipe of the GPR file if set.
                     GPRPEI[1]: Records parity in the L-pipe of the GPR file if set.
2:3     FPRPEI       Floating-Point Register (FPR) parity error insert.
                     FPRPEI[2]: Records parity in the first FPR if set.
                     FPRPEI[3]: Records parity in the second FPR if set.
4:5     ICDPEI       Instruction cache data parity error insert.
                     0  Record odd parity (normal).
                     1  Record even parity (simulate parity errors).
                     Controls inversion of the parity bits that are recorded when the instruction cache is filled.
                     ICDPEI[4]: Records parity in the left array.
                     ICDPEI[5]: Records parity in the right array.
6:7     ICLPEI       Instruction cache LRU parity error insert.
                     0  Record odd parity (normal).
                     1  Record even parity (simulate parity error).
                     ICLPEI[6]: Records parity in the left array.
                     ICLPEI[7]: Records parity in the right array.
8:9     ICTPEI       Instruction cache tag parity error insert.
                     0  Record odd parity (normal).
                     1  Record even parity (simulate parity error).
                     Controls inversion of the parity bits that are recorded for the tag field in the instruction cache.
                     ICTPEI[8]: Records parity in the left array.
                     ICTPEI[9]: Records parity in the right array.
10:11   DCDPEI       Data cache data parity error insert.
                     0  Record odd parity (normal).
                     1  Record even parity (simulate parity error).
                     Controls inversion of the parity bits recorded for the data field in the data cache.
                     DCDPEI[10]: Records data parity in the even array.
                     DCDPEI[11]: Records data parity in the odd array.
12:13   DCLPEI       Data cache LRU parity error insert.
                     0  Record odd parity (normal).
                     1  Record even parity (simulate parity error).
                     Controls inversion of the parity bits recorded for the LRU field in the data cache.
                     DCLPEI[12]: Records data cache LRU parity in the even array.
                     DCLPEI[13]: Records data cache LRU parity in the odd array.
14:15   DCTPEI       Data cache tag parity error insert.
                     0  Record odd parity (normal).
                     1  Record even parity (simulate parity error).
                     Controls inversion of the parity bits recorded for the tag field in the data cache.
                     DCTPEI[14]: Records parity in the even array.
                     DCTPEI[15]: Records parity in the odd array.
16      MMUTPEI      Memory management unit tag parity error insert.
                     0  Record odd parity (normal).
                     1  Record even parity (simulate parity error).
                     Controls inversion of the parity bits recorded for the tag field in the MMU.
17      MMUDPEI      Memory management unit data parity error insert.
                     0  Record odd parity (normal).
                     1  Record even parity (simulate parity error).
                     Controls inversion of the parity bits recorded for the data field in the MMU.
18      Reserved
19      TSS          Timer clock source select.
                     0  The CPU timer source is the CPU clock.
                     1  The CPU timer source is an alternate timer clock.
20      Reserved
21      DPC          Disable parity checking (at reset, this bit is set to ‘1’).
                     0  Parity checking is enabled in the L1 cache core.
                     1  All parity checking in the L1 cache core is disabled.
22:23   TCS          Timer clock select, watchdog timer select.
                     00  The CPU timer advances by one at each rising edge of the CPU input clock.
                     01  The CPU timer advances by one at every fourth rising edge of the CPU input clock.
                     10  The CPU timer advances by one at every eighth rising edge of the CPU input clock.
                     11  The CPU timer advances by one at every sixteenth rising edge of the CPU input clock.
24:31   Reserved

2.7.6 Core Configuration Register 2 (CCR2)

Bits    Field Name   Description
0:1     DSTG         Disable store gathering.
                     00  Store gathering enabled: stores to all bytes within an L1 half-line can be gathered into a single transfer.
                     01  Only contiguous, overlapping store gathering is permitted. Noncontiguous store gathering is disabled.
                     10  Reserved.
                     11  All store gathering is disabled.
2       DLFPD        Data cache line fill prediction disable.
                     0  Line fill match prediction is enabled.
                     1  Line fill match prediction is disabled.
3       Reserved
4       DSTI         Disable shadow TLB invalidate.
                     0  When context synchronization occurs, invalidate the shadow TLBs (ITLB, DTLB).
                     1  Do not invalidate the shadow TLBs upon isync context synchronization.
5:8     Reserved
9       PMUD         Performance Monitor Unit disable.
                     0  Enable PMU counting.
                     1  Disable PMU counting of various events.
10      Reserved
11      DCSTGW       Disable cacheable store gathering write-through.
                     0  Cacheable stores with W = 1 can gather, but must be contiguous.
                     1  Cacheable stores with W = 1 cannot gather.
12:15   STGCTR       Store gathering counter.
                     This field controls how long a store request remains in the SBQ before a write request is sent. The counter is initialized to STGCTR × 2 whenever SBQ0 is loaded for a store. It decrements by one each cycle until it reaches zero or is initialized again. When it reaches zero, it forces the store in SBQ0 to be transmitted. It is gathered only by gatherable stores.
16      DISTG        Disable cache-inhibited store gathering.
                     0  Cache-inhibited stores can gather if they are in the same half of an L1 cache line, and the cache line is contiguous and not guarded.
                     1  Cache-inhibited stores do not gather.
17:19   Reserved
20      SPC5C1       ICU ‘AT’ field static branch predict on code C5 and C1.
                     0  No ‘AT’ field static branch predict.
                     1  Use ‘AT’ field static branch predict.
21      MCDTO        Machine check on DCR timeout enable.
                     0  No machine check on DCR timeout.
                     1  DCR timeout machine check enabled.
22:31   Reserved

2.7.7 Reset Configuration (RSTCFG)

The read-only RSTCFG register reports the values of certain TLB fields as supplied at reset. Access to RSTCFG is privileged.

Bits    Field Name   Description
0:1     Reserved
2:11    ERPN         Extended real page number (read only). Set to the value strapped by core inputs (chip implementation-specific configuration values).
12:16   Reserved
17      E            Endian (read only). Set to the value strapped by a core input (chip implementation-specific configuration value).
18:27   Reserved
28:31   U            U0 - U3 (read only). Set to the values strapped by core inputs (chip implementation-specific configuration values).
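The bit-field tables in this section use IBM big-endian numbering. Bit 0, or bit 32 for registers shown in the 32:63 view such as the PVR and PIR, is the most significant bit of the 32-bit register. The following C sketch shows one way to extract such fields from a register image after software has read it with mfspr. The helper names are hypothetical, and the decoding of CCR1[TCS] and CCR2[STGCTR] follows the tables above.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits first..last of a 32-bit register image, using IBM
   numbering in which bit 0 is the most significant bit. */
static uint32_t ibm_field32(uint32_t reg, unsigned first, unsigned last)
{
    assert(first <= last && last <= 31);
    unsigned width = last - first + 1;
    unsigned shift = 31 - last;                 /* distance of 'last' from the LSB */
    uint32_t mask = (width == 32) ? UINT32_MAX : ((UINT32_C(1) << width) - 1);
    return (reg >> shift) & mask;
}

/* Registers documented with 32:63 numbering (for example, the PVR and PIR)
   occupy the same 32 bits, so 32 is subtracted from both bit numbers. */
static uint32_t ibm_field_32to63(uint32_t reg, unsigned first, unsigned last)
{
    assert(first >= 32);
    return ibm_field32(reg, first - 32, last - 32);
}

/* CCR1[TCS], bits 22:23: CPU input-clock rising edges per timer increment. */
static unsigned ccr1_timer_divisor(uint32_t ccr1)
{
    static const unsigned divisor[4] = { 1, 4, 8, 16 };
    return divisor[ibm_field32(ccr1, 22, 23)];
}

/* CCR2[STGCTR], bits 12:15: the store-gathering counter starts at STGCTR x 2. */
static unsigned ccr2_stgctr_start(uint32_t ccr2)
{
    return ibm_field32(ccr2, 12, 15) * 2u;
}
```

For example, a CCR1 image of 0x00000100 has TCS = 01, so the CPU timer advances on every fourth rising edge of the CPU input clock. The PVR fields are read the same way: OWN is ibm_field_32to63(pvr, 32, 43), and PVN is ibm_field_32to63(pvr, 44, 63).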
2.7.8 Device Control Register Immediate Prefix Register (DCRIPR)

The Device Control Register Immediate Prefix Register (DCRIPR) provides the upper-order 22 bits of the DCR address used by the mtdcr and mfdcr instructions. This SPR has hexadecimal address x‘37B’, can be read and written, and is privileged. It is implementation dependent; that is, it is not part of the Power ISA Book III-E architecture specification.

To support the mtdcr[u]x and mfdcr[u]x instructions, the DCR interface adds 22 output pins for the upper-order address bits and one output pin for the privileged/nonprivileged indicator. The privileged signal indicates which type of opcode caused the DCR operation to be presented on the DCR interface. It is not directly related to the MSR[PR] bit. Privileged (also known as supervisor-mode) code can execute any of the six DCR opcodes, and can therefore produce DCR operations on the interface with either value of the privileged signal. Nonprivileged (user-mode) code generates DCR traffic only with a nonprivileged indication on the interface. If user-mode code attempts to execute a privileged opcode, an exception is signaled because of the privilege violation. Access to the DCRIPR is privileged.

Bits    Field Name   Description
0:21    UOA          Upper-order address. Implementation-specific. Used for the upper-order address bits of the DCR address.
22:31   Reserved

2.8 User and Supervisor Modes

The Power ISA defines two operating states, or modes: supervisor (privileged) and user (nonprivileged). MSR[PR] controls which mode the processor is operating in. When MSR[PR] is ‘0’, the processor is in supervisor mode and can execute all instructions and access all registers, including privileged ones.
When MSR[PR] is ‘1’, the processor is in user mode and can execute only nonprivileged instructions and access only nonprivileged registers. An attempt to execute a privileged instruction or to access a privileged register while in user mode causes a privileged instruction exception type program interrupt. The name PR for the MSR field refers to problem state, which is a historical alternative name for user mode. Hence, the value ‘1’ in the field indicates problem state, and not privileged as one might expect.

2.8.1 Privileged Instructions

The following instructions are privileged and cannot be executed in user mode:

Table 2-29. Privileged Instructions
Instruction   Comments
dcbi
dci
dcread
ici
icread
mfdcr
mfmsr
mfspr         For any SPR number with SPRN[5] = ‘1’. See Section 2.8.2 Privileged SPRs on page 75.
mtdcr
mtmsr
mtspr         For any SPR number with SPRN[5] = ‘1’. See Section 2.8.2 Privileged SPRs on page 75.
rfci
rfi
rfmci
tlbre
tlbsx
tlbsync
tlbwe
wrtee
wrteei

2.8.2 Privileged SPRs

Most SPRs are privileged. The only defined nonprivileged SPRs are the LR, CTR, XER, USPRG0, SPRG3 - SPRG7 (read access only), TBU (read access only), and TBL (read access only). The PowerPC 476FP core also treats all SPR numbers with a ‘1’ in bit 5 of the SPRN field as privileged, whether or not the particular SPR number is defined. Therefore, the core causes a privileged instruction exception type program interrupt on any attempt to access such an SPR number while in user mode. In addition, the core causes an illegal instruction exception type program interrupt on any attempt to access an undefined SPR number with a ‘0’ in SPRN[5] while in user mode. However, the result of attempting to access an undefined SPR number in supervisor mode is undefined, regardless of the value in SPRN[5].
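The SPRN[5] rule reduces to a single bit test. In the 10-bit SPR number, bit 5 (IBM numbering, bit 0 = MSB) has the binary weight 0x10. The following C sketch classifies a user-mode mfspr according to the rules in this section, as an instruction-set simulator or a static checker might. The function and type names are hypothetical. The SPR numbers in the table are the usual Book E assignments; they are assumed here rather than given in this section, so check them against Table 2-5.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* SPRN is a 10-bit number in IBM bit numbering 0:9, so SPRN[5] has the
   binary weight 1 << (9 - 5) = 0x10. */
static bool sprn_bit5(unsigned sprn)
{
    assert(sprn < 1024);
    return ((sprn >> (9 - 5)) & 1u) != 0;
}

/* Nonprivileged mfspr targets from the list in 2.8.2. The numeric values
   are the usual Book E assignments (assumed; check Table 2-5). */
static const unsigned user_readable_sprn[] = {
    1,                        /* XER */
    8,                        /* LR */
    9,                        /* CTR */
    256,                      /* USPRG0 */
    259, 260, 261, 262, 263,  /* SPRG3 - SPRG7, read-only numbers */
    268,                      /* TBL, read-only number */
    269,                      /* TBU, read-only number */
};

typedef enum { ACCESS_OK, PRIVILEGED_EXCEPTION, ILLEGAL_EXCEPTION } user_access;

/* Outcome of an mfspr executed with MSR[PR] = 1. Simplification: every
   SPRN[5] = 0 number that is not in the table is treated as undefined. */
static user_access classify_user_mfspr(unsigned sprn)
{
    if (sprn_bit5(sprn))
        return PRIVILEGED_EXCEPTION;
    for (size_t i = 0; i < sizeof user_readable_sprn / sizeof user_readable_sprn[0]; i++)
        if (user_readable_sprn[i] == sprn)
            return ACCESS_OK;
    return ILLEGAL_EXCEPTION;
}
```

For example, the DCRIPR (x‘37B’) has SPRN[5] = ‘1’. A user-mode access to it is therefore classified as a privileged instruction exception, which matches its description in Section 2.7.8.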
2.9 Speculative Accesses

The Power ISA permits implementations to perform speculative accesses to memory, either for instruction fetching or for data loads. A speculative access is any access that is not required by the sequential execution model (SEM). For example, the PowerPC 476FP core speculatively prefetches instructions down the predicted path of a conditional branch. If the branch is later determined not to go in the predicted direction, the SEM did not require fetching the instructions from the predicted path, so that fetching was speculative. Similarly, the PowerPC 476FP core executes load instructions out of order and can read data from memory for a load instruction that is past an unresolved branch.

However, speculative accesses are sometimes inappropriate. For example, accessing addresses to which I/O devices are mapped can cause problems: if the I/O device is a serial port, reading it speculatively can cause data to be lost. The architecture provides two mechanisms for protecting against errant accesses to such non-well-behaved memory addresses. The first is the guarded (G) storage attribute, which protects against speculative data accesses. The second is the execute permission mechanism, which protects against speculative instruction fetches. Both mechanisms are described in Section 4 Memory Management Unit on page 103.

2.10 Synchronization

The PowerPC 476FP core supports the synchronization operations defined by the architecture. The architecture defines three kinds of synchronization, each of which is described in the following sections.

2.10.1 Context Synchronization

The context of a program is the environment in which the program executes. For example, the mode (user or supervisor) is part of the context, as are the address translation space and the storage attributes of the memory pages that the program accesses.
Context is controlled by the contents of certain registers and other resources, such as the MSR and the TLB. Under certain circumstances, the hardware or software must force the synchronization of a program’s context. Context synchronizing operations include all interrupts except machine check, and the isync, sc, rfi, rfci, mtmsr, and rfmci instructions. Context synchronizing operations satisfy the following requirements:

1. The operation is not initiated until all instructions preceding the operation have completed to the point at which they have reported any and all exceptions that they will cause.
2. All instructions preceding the operation must complete in the context in which they were initiated. That is, they must not be affected by any context changes caused by the context synchronizing operation, or by any instructions after the context synchronizing operation.
3. If the operation is the sc instruction (which causes a system call interrupt) or is itself an interrupt, the operation is not initiated until no higher priority interrupt is pending (see Section 7 Processor Interrupts and Exceptions on page 167).
4. All instructions that follow the operation must be refetched and executed in the context that is established by the completion of the context synchronizing operation and all of the instructions that preceded it.
5. If the operation is an mtmsr instruction, the operation is not initiated until all instructions preceding the operation have completed to the point at which they have reported any exceptions that they will cause. Then, the MSR is updated with the contents of GPR bits 32:63, and the context synchronizing operation is performed.

Context synchronizing operations do not force the completion of storage accesses, nor do they enforce any ordering among accesses before or after the context synchronizing operation.
If such behavior is required, a storage synchronizing instruction must be used (see Section 2.10.3 Storage Ordering and Synchronization on page 78). Also, architecturally, machine check interrupts are not context synchronizing. Therefore, an instruction that precedes a context synchronizing operation can cause a machine check interrupt after the context synchronizing operation occurs and additional instructions have completed. For the PowerPC 476FP core, this can occur only with data machine check exceptions, not with instruction machine check exceptions.

The following scenarios use pseudocode examples to illustrate the effects of context synchronization. Subsequent text explains how software can further guarantee storage ordering.

1. Consider the following self-modifying code instruction sequence:

      stw       Store to caching-inhibited address XYZ.
      isync
      XYZ       Fetch and execute the instruction at address XYZ.

   In this sequence, the isync instruction does not guarantee that the XYZ instruction is fetched after the store has occurred to memory. There is no guarantee which XYZ instruction will execute; either the old version or the new (stored) version might.

2. Now consider the required self-modifying code sequence:

      stw       Write the new instruction to the data cache.
      dcbst     Push the new instruction from the data cache to memory.
      msync     Order the copy before invalidating the old instruction in the instruction cache.
      icbi      Invalidate the copy in the instruction cache.
      isync     Discard prefetched instructions and refetch the new instruction (context synchronization).

3. This example illustrates the use of isync with context changes to the debug facilities:

      mtdbcr0   Enable the instruction address compare (IAC) debug event.
      isync     Wait for the new Debug Control Register 0 (DBCR0) context to be established.
      XYZ       This instruction is at the IAC address. An isync is necessary to guarantee that the IAC event is recognized on the execution of this instruction; without the isync, the XYZ instruction can be prefetched and dispatched to execution before the core recognizes that the IAC event has been enabled.

4. The last example is the use of isync when accessing DCRs with mtdcr or mfdcr instructions that depend on the DCRIPR:

      mtspr DCRIPR   Set up the DCRIPR value for DCRn.
      isync          Ensure the new DCRn by context synchronization.
      mtdcr          Access the new DCR with the new value.

2.10.2 Execution Synchronization

Execution synchronization is a subset of context synchronization. An execution synchronizing operation satisfies the first two requirements of context synchronizing operations, but not the remaining ones. That is, execution synchronizing operations guarantee that preceding instructions execute in the old context, but do not guarantee that subsequent instructions operate in the new context. For example, execution synchronization is required just before the execution of a TLB-updating instruction (such as tlbwe). An execution synchronizing instruction should be executed to guarantee that all preceding storage access instructions have performed their address translations before tlbwe is executed to invalidate an entry that those preceding instructions might use.

There are five execution synchronizing instructions: wrtee, wrteei, msync, mbar, and lwsync. All context synchronizing instructions are also implicitly execution synchronizing, because context synchronization is a superset of execution synchronization.

The Power ISA imposes additional requirements on updates to MSR[EE] (the external interrupt enable bit).
Specifically, if a wrtee or wrteei instruction sets MSR[EE] = ‘1’, and an external input, decrementer, or fixed-interval timer exception is pending, the interrupt must be taken before the instruction that follows the MSR[EE]-updating instruction is executed. In this sense, these MSR[EE]-updating instructions can be thought of as context synchronizing with respect to the MSR[EE] bit. They guarantee that subsequent instructions execute (or are prevented from executing while an interrupt is taken) according to the new value of MSR[EE].

2.10.3 Storage Ordering and Synchronization

Storage synchronization enforces ordering between storage access instructions executed by the PowerPC 476FP core. There are three storage synchronizing instructions: msync, mbar, and lwsync. The Power ISA defines different ordering requirements for these instructions, but the PowerPC 476FP core implements msync and mbar identically. Architecturally, msync is the stronger of the two and is also execution synchronizing, whereas mbar is intended to be the equivalent of eieio. Therefore, for future compatibility, users are advised to use mbar rather than eieio for storage barrier operations. The lwsync instruction is a lighter-weight version of msync. For more information, see the lightweight sync description in the storage control instructions chapter of Book II of Power ISA Version 2.05.

In addition to being execution synchronizing, msync guarantees that all preceding storage accesses have actually been performed with respect to the memory subsystem before any instruction after the msync is executed.

Note: This requirement goes beyond the requirements of mere execution synchronization, in that execution synchronization does not require the completion of preceding storage accesses.

The following two examples illustrate the distinct uses of mbar and msync:

      stw       Store data to an I/O device.
      lwz       Dummy load from the same 32-byte line, to ensure that the store takes place before the msync.
      msync     Wait for the store to actually complete.
      mtdcr     Reconfigure the I/O device.

In this example, the mtdcr reconfigures the I/O device in a way that would cause the preceding store instruction to fail if the mtdcr changed the device before the store completed. Because mtdcr is not a storage access instruction, using mbar instead of msync would not guarantee that the store is performed before the mtdcr reconfigures the device. The mbar instruction only guarantees that subsequent storage accesses are not performed to memory or to any device before the earlier store.

Another example follows:

      stb X     Store data to an I/O device at address X, causing a status bit at address Y to be reset.
      mbar      Guarantee that the preceding store is performed to the device before any subsequent storage accesses are performed.
      lbz Y     Load status from the I/O device at address Y.

Here, mbar is appropriate instead of msync because the only requirement is that the store to the I/O device happens before the load does. Other instructions subsequent to the mbar are not executed before the store.
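Both patterns can be written in C with small barrier wrappers. This minimal sketch assumes a GCC-compatible compiler and, when building for the 476, an assembler that accepts the Book E mnemonics msync and mbar. On other hosts, the wrappers fall back to a C11 fence only so that the sketch compiles; that fallback does not model device ordering. The function names are hypothetical, as is the reconfigure hook, which stands in for the privileged mtdcr.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static inline void msync_barrier(void)
{
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ volatile("msync" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst);   /* host fallback: compiles only */
#endif
}

static inline void mbar_barrier(void)
{
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ volatile("mbar" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst);   /* host fallback: compiles only */
#endif
}

/* First example: a non-storage operation (mtdcr) must not run before the
   store completes, so msync is required. 'reconfigure' is a hypothetical
   caller-supplied hook standing in for the mtdcr. */
static inline void store_then_reconfigure(volatile uint32_t *dev, uint32_t value,
                                          void (*reconfigure)(void))
{
    assert(reconfigure);
    *dev = value;      /* stw:   store data to the I/O device */
    (void)*dev;        /* lwz:   dummy load from the same 32-byte line */
    msync_barrier();   /* msync: wait for the store to actually complete */
    reconfigure();     /* mtdcr: reconfigure the device */
}

/* Second example: only storage-to-storage order matters, so mbar suffices. */
static inline uint8_t store_then_read_status(volatile uint8_t *x,
                                             const volatile uint8_t *y,
                                             uint8_t value)
{
    *x = value;        /* stb X: store that resets the status bit at Y */
    mbar_barrier();    /* mbar:  the store to X is performed before the load from Y */
    return *y;         /* lbz Y: load the device status */
}
```

The helpers are declared static inline, in the style of a header that driver code would include. The device pointers would normally come from a guarded, caching-inhibited mapping set up through the MMU.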
2.10.4 SPRs Requiring Context Synchronization

The following SPRs may require context synchronization when written by the mtspr instruction:

• Process ID (PID)
• Debug Control Register 0 - 2 (DBCR0 - DBCR2)
• Instruction Address Compare 1 - 4 (IAC1 - IAC4)
• Data Cache Address Compare 1 - 2 (DAC1 - DAC2)
• Data Cache Value Compare 1 - 2 (DVC1 - DVC2)
• MMU Configuration Register (MMUCR)
• Real Mode Page Description Register (RMPD)
• Supervisor Search Priority Configuration Register (SSPCR)
• User Search Priority Configuration Register (USPCR)
• Invalidate Search Priority Configuration Register (ISPCR)
• Core Configuration Register 0 - 2 (CCR0 - CCR2)
• Instruction Opcode Compare Control Register (IOCCR)
• Instruction Opcode Compare Register 1 - 2 (IOCR1 - IOCR2)

The following examples demonstrate the effects of context synchronization.

Example 1:
      mtPID     Change the PID (virtual addressing).
      isync     Context synchronize: wait for and ensure that the new PID is used next.
      XYZ       The XYZ instruction is based on the new PID.

Example 2:
      mtIAC1
      mtIAC2
      mtIAC3
      mtIAC4    IAC1 - IAC4 setup.
      mtDAC1
      mtDAC2    DAC1 - DAC2 setup.
      mtDVC1
      mtDVC2    DVC1 - DVC2 setup.
      mtDBCR1   IAC debug control setup.
      mtDBCR2   DAC, DVC debug control setup.
      mtDBCR0   Enable debug events.
      isync     Ensure that all debug control context is established.
      XYZ       All debug events set up by the preceding code are now in effect.

Example 3:
      mtCCR0
      mtCCR1
      mtCCR2    CCR0 - CCR2 changes.
      isync     Ensure that the new configuration context is established.
      XYZ       The new configuration is in effect.

Example 4:
      mtMMUCR
      mtRMPD
      mtSSPCR
      mtUSPCR
      mtISPCR   MMU configuration registers are updated.
      isync     Ensure that the new MMU configuration context is established.
      XYZ       The new MMU environment is in effect.

Example 5:
      mtIOCR1   IOCR1 update.
      mtIOCR2   IOCR2 update.
      mtIOCCR   IOCCR update.
      isync     Ensure that the new instruction trap control is established.
      XYZ       The new instruction trap is in effect.
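The "write, then isync" rule can be captured in small C helpers. This sketch assumes a GCC-compatible compiler targeting the 476, and assumes SPR number 48 for the PID (the usual Book E assignment; check the SPR table). It also wraps the self-modifying code sequence from Section 2.10.1. On any other host, the macros record the instruction sequence instead of executing it, so that the ordering can be inspected. The helper and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SPRN_PID 48   /* assumed Book E SPR number for PID; check the SPR table */

#if defined(__powerpc__) || defined(__PPC__)
/* 476 build: mtspr encodes the SPR number as an immediate, hence macros. */
#define MTSPR(sprn, v) __asm__ volatile("mtspr %0,%1" :: "i"(sprn), "r"(v) : "memory")
#define DCBST(p)       __asm__ volatile("dcbst 0,%0" :: "r"(p) : "memory")
#define ICBI(p)        __asm__ volatile("icbi 0,%0" :: "r"(p) : "memory")
#define MSYNC()        __asm__ volatile("msync" ::: "memory")
#define ISYNC()        __asm__ volatile("isync" ::: "memory")
#else
/* Any other host: record the instruction sequence instead of executing it. */
static char op_log[16][24];
static int op_count;

static void log_op(const char *text)
{
    assert(op_count < 16);
    strncpy(op_log[op_count], text, sizeof op_log[0] - 1);
    op_count++;
}

#define MTSPR(sprn, v) do { char buf_[24]; snprintf(buf_, sizeof buf_, "mtspr %d", (sprn)); \
                            log_op(buf_); (void)(v); } while (0)
#define DCBST(p)       ((void)(p), log_op("dcbst"))
#define ICBI(p)        ((void)(p), log_op("icbi"))
#define MSYNC()        log_op("msync")
#define ISYNC()        log_op("isync")
#endif

/* Section 2.10.4, Example 1: write the PID, then isync so that the
   instructions that follow use the new translation context. */
static void set_pid(uint32_t pid)
{
    MTSPR(SPRN_PID, pid);
    ISYNC();
}

/* Section 2.10.1, Example 2: store a new instruction word and make it
   visible to instruction fetch. */
static void patch_instruction(volatile uint32_t *insn, uint32_t word)
{
    *insn = word;   /* stw:   write the new instruction to the data cache */
    DCBST(insn);    /* dcbst: push it from the data cache to memory */
    MSYNC();        /* msync: order the copy before the invalidate */
    ICBI(insn);     /* icbi:  invalidate the old copy in the instruction cache */
    ISYNC();        /* isync: discard prefetched instructions and refetch */
}
```

When several related SPRs change together, as in Examples 2 through 5, the same pattern applies: issue all the mtspr writes (with the enabling register, such as DBCR0, last), and then a single isync.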
2.10.5 Instructions Requiring a Context Synchronization Instruction

The following instructions require a context synchronization instruction (CSI) to ensure that subsequent instructions see their effects:

tlbwe:
• Instruction fetch: A CSI (isync) is required after a tlbwe is executed.
• Operand (data) access: A CSI (isync) is required before and after a tlbwe is executed.

The recommended sequence for the tlbwe instruction is as follows:
1. isync     (if operand access is a concern)
2. tlbwe     Write all words, or only the necessary words.
3. isync     Ensure the new TLB mapping.

tlbivax:
Note: See Section 4.9 UTLB Coherency on page 130 for more information about tlbivax.
• Instruction fetch: A CSI (isync) is required after a tlbivax is executed.
• Operand (data) access: A CSI (isync) is required before and after a tlbivax is executed.

The recommended sequence for the tlbivax instruction is as follows:
1. isync     (if operand access is a concern)
2. tlbivax
3. tlbivax   (multiple tlbivax instructions if needed)
4. tlbsync
5. msync
6. isync

2.11 Storage Model

The PowerPC 476FP core and the PowerPC subsystem maintain memory coherency at all times, whether or not the M (memory coherence required) page attribute is set. The PowerPC 476FP storage model is weakly consistent; therefore, loads can generally be performed out of order, except in the following cases:
• lwarx operands are always accessed in order.
• Cache-inhibited operand accesses are performed in order, including the case in which G = I = ‘1’.
• Operand accesses within the coherency granule, which is the L2 cache line, are performed in order.
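Portable C code usually reaches the barriers described in this chapter through C11 atomics rather than hand-written msync or isync. The following sketch shows the flag-then-data pattern of the CPU 0 / CPU 1 examples in this section, using a release store and an acquire load. On PowerPC, compilers commonly implement the release store with lwsync (or msync) and the acquire load with a load, compare, branch, and isync sequence, which is the pattern of Example 5. This mapping is typical rather than guaranteed, so inspect the generated code for your toolchain. The variable and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Shared data (Y) and flag (X), named after the CPU 0 / CPU 1 examples. */
static uint32_t data_y;
static atomic_bool flag_x;   /* static storage: starts out false */

/* CPU 1 side: st Y; barrier; st X. The release store keeps the data store
   ordered before the flag store. */
static void publish(uint32_t value)
{
    data_y = value;
    atomic_store_explicit(&flag_x, true, memory_order_release);
}

/* CPU 0 side: lw X; test; barrier; lw Y. The acquire load prevents the data
   load from being performed before the flag has been observed as set. */
static bool try_consume(uint32_t *out)
{
    assert(out);
    if (!atomic_load_explicit(&flag_x, memory_order_acquire))
        return false;
    *out = data_y;
    return true;
}
```

In a real system, publish runs on one processor and try_consume is polled on another. Single-threaded calls are enough to exercise the logic.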
See the storage model chapter in Book II of Power ISA Version 2.05 for further details about the following topics:
• Atomicity
• Cache model
• Storage control attributes
• Shared storage

The PowerPC 476FP weakly-consistent model has the following characteristics:
• It allows load misses to bypass each other or return out of order, as long as they are not in the same cache line (the L1 cache line granule).
• It keeps stores in order (although some Power ISA designs allow out-of-order stores).
• It allows store data to be forwarded to subsequent loads on the same processor.
• It allows the first use of an operand even if the line has been snoop invalidated. Generally, the L2 sends the newest data.

Because of this storage model design, there might be an issue in the following data dependency scenario:

      CPU 0              CPU 1
      lwz   R3,X         st    addrY
      sth   R3,Y         msync
      lwz   R31,Y        st    addrX

In the previous code example, CPU 0 might miss on X but hit on Y, and get the new data for X. However, R31 can then be loaded with a mixture of new and old data. If instead CPU 0 has a cache miss on the sth to Y (store halfword, or likewise store byte), R31 receives the new data for Y, and there is no issue. Consult IBM PowerPC support for further details, or if such issues arise in your applications.

The following examples demonstrate the Power ISA specification and recommendations for ordering store operations. These examples avoid the data dependency issue described previously.

Note: msync is used to ensure the ordering of store operations.

Example 1: Ordering Type #1, Operand Boundary Matches
      CPU 0              CPU 1
      lwz   R3,X         st    addrY
      sync               msync
      stw   R3,Y         st    addrX
      lwz   R31,Y        ..

Example 2: Ordering Type #2, Memory Barrier Use
      CPU 0              CPU 1
      lwz   R3,X         st    addrY
      sync               msync
      sth   R3,Y         st    addrX
      lwz   R31,Y        ..
   or
      CPU 0              CPU 1
      lwz   R3,X         st    addrY
      sync               msync
      lwz   R31,Y        st    addrX

Example 3: Ordering Type #3, Test Flag for Store
      CPU 0              CPU 1
      lwz   R3,X         st    addrY
      cmp   X (test X)   msync
      bne   ..           st    addrX
      st    R31,Z

In the previous example, the store to Z on CPU 0 is not an issue because Z is updated only if X is updated.

Example 4: Ordering Type #4, Test Flag (see Example 5: Server Practice)
      CPU 0                      CPU 1
      loop A:  lwz  R3,X         st    addrY
               cmp  X (test X)   msync
               bne  loop A       st    addrX
               sth  R3,Y         ..
               lwz  R31,Y

Example 5: Server Practice
A barrier operation is needed between the following two loads (with or without compare and branch instructions).
      CPU 0                      CPU 1
      loop1:   lw   X            st    Y
               cmp               msync
               bne  loop1        st    X
               msync/mbar/isync
               lw   Y

3. Floating-Point Unit Programming Model

The programming model of the PowerPC 476FP core describes how the features and operations appear to programmers. The floating-point processor chapter in Book I of Power ISA Version 2.05 specifies that the floating-point unit (FPU) implements a floating-point system as defined in ANSI/IEEE Standard 754-1985, IEEE Standard for Binary Floating-Point Arithmetic (referred to as IEEE 754), but the architecture requires software support to conform fully with the standard. IEEE 754 defines certain required operations (addition, subtraction, and so on). The term floating-point operation refers to one of these required operations, or to the operation performed by one of the multiply-add or reciprocal estimate instructions. In the PowerPC 476FP core, all floating-point operations conform to the IEEE standard.

3.1 Floating-Point Exceptions

Each floating-point exception, and each category of invalid operation exception, is associated with an exception bit in the FPSCR.
The processor detects the following floating-point exceptions. The associated FPSCR field is listed with each exception and with each invalid operation exception category:
• Invalid operation exception (VX) (see Table 3-1)
• Zero divide exception (ZX)
• Overflow exception (OX)
• Underflow exception (UX)
• Inexact exception (XX)

Table 3-1. Invalid Operation Exception Categories
Category                   FPSCR Field
SNaN                       VXSNAN
Infinity – Infinity        VXISI
Infinity ÷ Infinity        VXIDI
Zero ÷ Zero                VXZDZ
Infinity × Zero            VXIMZ
Invalid Compare            VXVC
Software Request           VXSOFT
Invalid Square Root        VXSQRT
Invalid Integer Convert    VXCVI

Each floating-point exception also has a corresponding enable bit in the FPSCR. See Section 3.4.8 Floating-Point Status and Control Register Instructions on page 102 for descriptions of these exception and enable bits.

3.2 Floating-Point Registers

This section provides an overview of the register types implemented in the PowerPC 476FP core. Detailed descriptions of the floating-point registers are provided within the sections covering the functions with which they are associated. An alphabetical summary of all registers, including bit definitions, is provided in Appendix A Register Summary on page 263.

Certain bits in some registers are reserved and are not necessarily implemented. For all registers with fields marked as reserved, the reserved fields should be written as ‘0’ and read as undefined. The recommended coding practice is to perform the initial write to a register with reserved fields set to ‘0’, and to perform all subsequent writes to the register using a read-modify-write strategy: read the register; use logical instructions to alter defined fields, leaving reserved fields unmodified; and write the register.
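The read-modify-write practice can be expressed as a pair of C helpers that operate on a 32-bit register image. In this sketch, the actual read and write (mfspr and mtspr, or mffs and mtfsf for the FPSCR) are left to the caller, and the helper names are hypothetical. FPSCR[RN] serves as the example field: it occupies bits 62:63, which are the two least significant bits of a 32-bit image of bits 32:63. The value ‘01’ selects round toward zero, as defined by the Power ISA.

```c
#include <assert.h>
#include <stdint.h>

#define FPSCR_RN_MASK        0x3u   /* FPSCR[RN], bits 62:63 */
#define FPSCR_RN_TOWARD_ZERO 0x1u   /* RN = 01 */

/* Initial write: defined fields as required, reserved fields forced to 0. */
static uint32_t initial_value(uint32_t defined_mask, uint32_t fields)
{
    return fields & defined_mask;
}

/* Later writes: replace only the bits selected by field_mask and pass every
   other bit, reserved or not, through unchanged. */
static uint32_t rmw_field(uint32_t current, uint32_t field_mask, uint32_t new_bits)
{
    assert((new_bits & ~field_mask) == 0);   /* the new value must fit the field */
    return (current & ~field_mask) | new_bits;
}
```

Because rmw_field never touches bits outside field_mask, any value that the hardware returns in reserved fields is written back as read, which is exactly what the recommended practice requires.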
Each register is classified as a particular type, characterized by the specific instructions used to read and write registers of that type. The registers contained within the PowerPC 476FP processor are defined by the floating-point processor chapter in Book-I of Power ISA Version 2.05.

3.2.1 Register Types

The PowerPC 476FP processor provides two types of floating-point registers: Floating-Point Registers (FPRs) and the FPSCR. Each type is characterized by the instructions that are used to read and write the registers. The following subsections provide an overview of each register type and the instructions associated with it.

3.2.1.1 Floating-Point Registers (FPR0 - FPR31)

The PowerPC 476FP processor provides 32 FPRs, each 64 bits wide. In any cycle, the FPR file can either read the operands for a store instruction and an arithmetic instruction, or write the data from a load instruction and the result of an arithmetic instruction.

Bits   Field Name   Description
0:63   Data         Floating-point register data.

The FPRs are numbered FPR0 - FPR31. The floating-point instruction formats provide 5-bit fields to specify the FPRs used as operands in the execution of the associated instructions. Each FPR contains 64 bits that support the floating-point double format (see the floating-point processor chapter in Book-I of Power ISA Version 2.05 for details). All instructions that interpret the contents of an FPR as a floating-point value use the floating-point double format for this interpretation.

Although the FPRs are architecturally 64 bits wide, each FPR is implemented as 66 bits of encoded data plus 8 bits of parity protection (74 bits total).
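To illustrate the 5-bit FPR fields, the following C sketch extracts the register numbers from a 32-bit A-form instruction word. This chapter does not list the field positions. The sketch assumes the Power ISA A-form layout (primary opcode in bits 0:5, FRT in 6:10, FRA in 11:15, FRB in 16:20, FRC in 21:25, extended opcode in 26:30, Rc in bit 31), which should be checked against Power ISA Book I. The struct and function names are hypothetical. Under that layout, the example word 0xFC22182A encodes fadd f1,f2,f3 (primary opcode 63, extended opcode 21).

```c
#include <stdint.h>

/* Hypothetical decoded view of an A-form floating-point instruction. */
typedef struct {
    unsigned opcd; /* primary opcode, bits 0:5 (63 = double, 59 = single) */
    unsigned frt;  /* target FPR, bits 6:10 */
    unsigned fra;  /* source FPR, bits 11:15 */
    unsigned frb;  /* source FPR, bits 16:20 */
    unsigned frc;  /* source FPR, bits 21:25 */
    unsigned xo;   /* extended opcode, bits 26:30 */
    unsigned rc;   /* record bit, bit 31 */
} a_form;

/* Extract big-endian bit field [first:last] (bit 0 = MSB) of a 32-bit word. */
static unsigned insn_field(uint32_t insn, unsigned first, unsigned last)
{
    unsigned width = last - first + 1;
    return (unsigned)((insn >> (31 - last)) & ((UINT32_C(1) << width) - 1));
}

static a_form decode_a_form(uint32_t insn)
{
    a_form f;
    f.opcd = insn_field(insn, 0, 5);
    f.frt  = insn_field(insn, 6, 10);  /* each 5-bit field selects FPR0..FPR31 */
    f.fra  = insn_field(insn, 11, 15);
    f.frb  = insn_field(insn, 16, 20);
    f.frc  = insn_field(insn, 21, 25);
    f.xo   = insn_field(insn, 26, 30);
    f.rc   = insn_field(insn, 31, 31);
    return f;
}
```

Because each register field is 5 bits wide, it can name exactly 32 registers, which matches the FPR0 - FPR31 numbering.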
The computational instructions, and the move and select instructions, operate on data located in FPRs. With the exception of the compare instructions, they place the result value into an FPR and optionally place status information into the Condition Register (CR).

Load and store double instructions transfer 64 bits of data between storage and the FPRs with no conversion. Load single instructions transfer floating-point values in floating-point single format from storage and convert them to the same value in floating-point double format in the FPRs. Store single instructions transfer floating-point values in floating-point double format from the FPRs and convert them to the same value in floating-point single format in storage.

Some floating-point instructions update the FPSCR and CR explicitly. Some of these instructions move data from an FPR to the FPSCR, or from the FPSCR to an FPR.

The computational instructions and the select instruction accept values from the FPRs in double format. For single-precision arithmetic instructions, all input values must be representable in single format. If they are not, the result placed into the target FPR and the setting of status bits in the FPSCR are undefined.

3.2.1.2 Floating-Point Status and Control Register (FPSCR)

The FPSCR controls the handling of floating-point exceptions and records status resulting from floating-point operations. FPSCR bits 0:31 are reserved. The defined fields occupy bits 32:63.

Bits   Field     Description
32     FX        Floating-point exception summary.
                 0  No FPSCR exception bits changed from 0 to 1.
                 1  At least one FPSCR exception bit changed from 0 to 1.
                 All floating-point instructions, except mtfsfi and mtfsf, implicitly set this field to 1 if the instruction causes any floating-point exception bits in the FPSCR to change from 0 to 1. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 can alter this field explicitly.
33     FEX       Floating-point enabled exception summary. The OR of all the floating-point exception fields masked by their respective enable fields. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 cannot alter this field explicitly.
34     VX        Floating-point invalid operation exception summary. The OR of all the invalid operation exception fields. mcrfs, mtfsfi, mtfsf, mtfsb0, and mtfsb1 cannot alter this field explicitly.
35     OX        Floating-point overflow exception.
                 0  A floating-point overflow exception did not occur.
                 1  A floating-point overflow exception occurred.
36     UX        Floating-point underflow exception.
                 0  A floating-point underflow exception did not occur.
                 1  A floating-point underflow exception occurred.
37     ZX        Floating-point zero divide exception.
                 0  A floating-point zero divide exception did not occur.
                 1  A floating-point zero divide exception occurred.
38     XX        Floating-point inexact exception.
                 0  A floating-point inexact exception did not occur.
                 1  A floating-point inexact exception occurred.
                 This field is a sticky version of FPSCR[FI]. The following rules describe how a given instruction sets this field:
                 • If the instruction affects FPSCR[FI], the new value of this field is obtained by ORing the old value of this field with the new value of FPSCR[FI].
                 • If the instruction does not affect FPSCR[FI], the value of this field is unchanged.
39     VXSNAN    Floating-point invalid operation exception (SNaN).
                 0  A floating-point invalid operation exception (VXSNAN) did not occur.
                 1  A floating-point invalid operation exception (VXSNAN) occurred.
40     VXISI     Floating-point invalid operation exception (∞ – ∞).
                 0  A floating-point invalid operation exception (VXISI) did not occur.
                 1  A floating-point invalid operation exception (VXISI) occurred.
41     VXIDI     Floating-point invalid operation exception (∞ ÷ ∞).
                 0  A floating-point invalid operation exception (VXIDI) did not occur.
                 1  A floating-point invalid operation exception (VXIDI) occurred.
42     VXZDZ     Floating-point invalid operation exception (0 ÷ 0).
                 0  A floating-point invalid operation exception (VXZDZ) did not occur.
                 1  A floating-point invalid operation exception (VXZDZ) occurred.
43     VXIMZ     Floating-point invalid operation exception (∞ × 0).
                 0  A floating-point invalid operation exception (VXIMZ) did not occur.
                 1  A floating-point invalid operation exception (VXIMZ) occurred.
44     VXVC      Floating-point invalid operation exception (invalid compare).
                 0  A floating-point invalid operation exception (VXVC) did not occur.
                 1  A floating-point invalid operation exception (VXVC) occurred.
45     FR        Floating-point fraction rounded. The last arithmetic or rounding and conversion instruction incremented the fraction during rounding. See Section 3.3.6 Rounding on page 94. This bit is not sticky.
46     FI        Floating-point fraction inexact. The last arithmetic or rounding and conversion instruction either produced an inexact result during rounding or caused a disabled overflow exception. See Section 3.3.6 Rounding. This bit is not sticky. See the definition of FPSCR[XX] for the relationship between FPSCR[FI] and FPSCR[XX].
47     FPRF      Floating-point result flag (FPRF).
48     FL        Floating-point less than or negative.
49     FG        Floating-point greater than or positive.
50     FE        Floating-point equal to zero.
51     FU        Floating-point unordered or not-a-number (NaN).
52     Reserved  Reserved.
53     VXSOFT    Floating-point invalid operation exception (software request).
                 0  A floating-point invalid operation exception (software request) did not occur.
                 1  A floating-point invalid operation exception (software request) occurred.
54     VXSQRT    Floating-point invalid operation exception (invalid square root).
                 0  A floating-point invalid operation exception (invalid square root) did not occur.
                 1  A floating-point invalid operation exception (invalid square root) occurred.
55     VXCVI     Floating-point invalid operation exception (invalid integer convert).
                 0  A floating-point invalid operation exception (invalid integer convert) did not occur.
                 1  A floating-point invalid operation exception (invalid integer convert) occurred.
56     VE        Floating-point invalid operation exception enable.
                 0  Floating-point invalid operation exceptions are disabled.
                 1  Floating-point invalid operation exceptions are enabled.
57     OE        Floating-point overflow exception enable.
                 0  Floating-point overflow exceptions are disabled.
                 1  Floating-point overflow exceptions are enabled.
58     UE        Floating-point underflow exception enable.
                 0  Floating-point underflow exceptions are disabled.
                 1  Floating-point underflow exceptions are enabled.
59     ZE        Floating-point zero divide exception enable.
                 0  Floating-point zero divide exceptions are disabled.
                 1  Floating-point zero divide exceptions are enabled.
60     XE        Floating-point inexact exception enable.
                 0  Floating-point inexact exceptions are disabled.
                 1  Floating-point inexact exceptions are enabled.
61     Reserved  Reserved.
62:63  RN        Floating-point rounding control.
                 00  Round to nearest.
                 01  Round toward zero.
                 10  Round toward +∞.
                 11  Round toward –∞.
                 See Section 3.3.6 Rounding on page 94.

Note: Setting FPSCR[NI] = ‘1’ is intended to permit results to be approximate, and to make performance more predictable and less data-dependent than when FPSCR[NI] = ‘0’. For example, in non-IEEE mode, 0 is returned instead of a denormalized number, and a large number may be returned instead of an infinity.
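The two summary fields can be modeled directly from their definitions. VX is the OR of the invalid operation exception fields, and FEX is the OR of the exception fields masked by their enables. The following C sketch computes both from a 32-bit image of FPSCR[32:63]. It is an illustrative model of the definitions, not a description of the hardware logic, and the helper names are hypothetical. Bit numbers come from the FPSCR table above.

```c
#include <stdbool.h>
#include <stdint.h>

/* FPSCR bit b (32..63) as a mask in a 32-bit image of FPSCR[32:63]. */
#define FPSCR_B(b) (UINT32_C(1) << (63 - (b)))

/* Invalid operation exception fields: bits 39:44 and 53:55. */
#define FPSCR_VX_FIELDS (FPSCR_B(39) | FPSCR_B(40) | FPSCR_B(41) | FPSCR_B(42) | \
                         FPSCR_B(43) | FPSCR_B(44) | FPSCR_B(53) | FPSCR_B(54) | \
                         FPSCR_B(55))

static bool fpscr_test(uint32_t fpscr, unsigned b)
{
    return (fpscr & FPSCR_B(b)) != 0;
}

/* VX: OR of all invalid operation exception fields. */
static bool fpscr_vx(uint32_t fpscr)
{
    return (fpscr & FPSCR_VX_FIELDS) != 0;
}

/* FEX: OR of each exception field ANDed with its enable field. */
static bool fpscr_fex(uint32_t fpscr)
{
    return (fpscr_vx(fpscr)       && fpscr_test(fpscr, 56))  /* VX and VE */
        || (fpscr_test(fpscr, 35) && fpscr_test(fpscr, 57))  /* OX and OE */
        || (fpscr_test(fpscr, 36) && fpscr_test(fpscr, 58))  /* UX and UE */
        || (fpscr_test(fpscr, 37) && fpscr_test(fpscr, 59))  /* ZX and ZE */
        || (fpscr_test(fpscr, 38) && fpscr_test(fpscr, 60)); /* XX and XE */
}
```

For example, a 0 ÷ 0 result sets VXZDZ (bit 42). That makes VX true, but FEX becomes true only if VE (bit 56) is also set.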
The following sections describe floating-point data formats, the representation of floating-point values, data handling and precision, and rounding.

3.3 Floating-Point Data Formats

Floating-point values are represented in two binary fixed-length formats. Single-precision values are represented in the 32-bit single format, and double-precision values in the 64-bit double format. The single format can be used for data in storage but cannot be held in the FPRs. The double format can be used both for data in storage and for data in the FPRs. When a floating-point value is loaded from storage using a load single instruction, it is converted to double format and placed in the target FPR. Conversely, a floating-point value stored from an FPR into storage using a store single instruction is converted to single format before being placed in storage. See the FP load instructions and FP store instructions in the floating-point processor chapter of Book-I in Power ISA Version 2.05.

Values in floating-point format are composed of three fields, as shown in Table 3-2.

Table 3-2. Format Fields
Field      Description
S          Sign bit
EXP        Exponent + bias
FRACTION   Fraction

The lengths of the exponent and fraction fields differ between the single and double formats. See Table 3-3 for more information.

3.3.1 Value Representation

Numeric values in the floating-point formats are represented by a sign bit (S), a biased exponent (EXP), and the fraction portion (FRACTION) of the significand. The significand consists of a leading implied bit concatenated on the right with the FRACTION. This leading implied bit is ‘1’ for normalized numbers and ‘0’ for denormalized numbers, and it is located in the unit bit position (that is, the first bit to the left of the binary point).
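The three fields of Table 3-2 can be extracted from a value's bit pattern. The following C sketch does this for both formats. It assumes the host C compiler uses the IEEE 754 single and double formats for float and double, which is true of mainstream compilers but is still an assumption. The struct and function names are hypothetical. The field widths (1/8/23 bits for single, 1/11/52 bits for double) are the ones listed in Table 3-3.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the S, EXP, and FRACTION fields. */
typedef struct {
    unsigned sign;     /* S: 1 bit */
    unsigned exp;      /* EXP: biased exponent */
    uint64_t fraction; /* FRACTION: significand bits after the implied bit */
} fp_fields;

static fp_fields fields_of_double(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);                  /* reinterpret the 64-bit pattern */
    fp_fields f;
    f.sign = (unsigned)(u >> 63);
    f.exp = (unsigned)((u >> 52) & 0x7FF);     /* 11 bits, bias 1023 */
    f.fraction = u & UINT64_C(0xFFFFFFFFFFFFF); /* 52 bits */
    return f;
}

static fp_fields fields_of_single(float s)
{
    uint32_t u;
    memcpy(&u, &s, sizeof u);                  /* reinterpret the 32-bit pattern */
    fp_fields f;
    f.sign = (unsigned)(u >> 31);
    f.exp = (unsigned)((u >> 23) & 0xFF);      /* 8 bits, bias 127 */
    f.fraction = u & UINT32_C(0x7FFFFF);       /* 23 bits */
    return f;
}
```

For example, –2.5 = –1.01 (binary) × 2^1 decodes to S = 1, EXP = 1024 (1 + 1023), and FRACTION = 0x4000000000000 (binary .01 followed by zeros).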
The values representable within the two floating-point formats can be specified by the parameters listed in Table 3-3.

Table 3-3. IEEE 754 Floating-Point Fields
Parameter               Single   Double
Exponent bias           +127     +1023
Maximum exponent        +127     +1023
Minimum exponent        –126     –1022
Field widths (bits):
  Sign                  1        1
  Exponent              8        11
  Fraction              23       52
  Significand           24       53

The FPRs support the floating-point double format only.

The numeric and nonnumeric values representable within each of the two supported formats are approximations to the real numbers. The numeric values are the normalized numbers, the denormalized numbers, and the zero values. The nonnumeric values are the infinities and the not a numbers (NaNs). The infinities are adjoined to the real numbers but are not numbers themselves, and the standard rules of arithmetic do not hold when they are used in an operation. They are related to the real numbers by order alone. It is possible, however, to define restricted operations among numbers and infinities. Figure 3-1 shows the relative location of each of the defined entities on the real number line.

Figure 3-1. Approximation to Real Numbers (real number line, from left to right: –INF, –NOR, –DEN, –0, +0, +DEN, +NOR, +INF)

The NaNs are not related to the numeric values or infinities by order or value. They are encodings used to convey diagnostic information, such as the representation of uninitialized variables. The different floating-point values defined in the architecture are described in the following sections.

3.3.2 Binary Floating-Point Numbers

Binary floating-point numbers are machine-representable values used as approximations to real numbers. Three categories of numbers are supported: normalized numbers, denormalized numbers, and zero values.
3.3.2.1 Normalized Numbers

Normalized numbers (±NOR) have an unbiased exponent value in the range:
• –126 to 127 in single format
• –1022 to 1023 in double format

They are values in which the implied unit bit is 1. Normalized numbers are interpreted as follows:

$NOR = (-1)^{s} \times 2^{E} \times (1.\mathrm{fraction})$

where s is the sign, E is the unbiased exponent, and 1.fraction is the significand, which is composed of a leading unit bit (implied bit) and a fraction part. The magnitude (M) of a normalized floating-point number covers approximately the following ranges:
• Single format: $1.2 \times 10^{-38} \le M \le 3.4 \times 10^{38}$
• Double format: $2.2 \times 10^{-308} \le M \le 1.8 \times 10^{308}$

3.3.2.2 Denormalized Numbers

Denormalized numbers (±DEN) are values that have a biased exponent value of zero and a nonzero fraction value. They are nonzero numbers smaller in magnitude than the representable normalized numbers. They are values in which the implied unit bit is 0. Denormalized numbers are interpreted as follows:

$DEN = (-1)^{s} \times 2^{E_{min}} \times (0.\mathrm{fraction})$

where $E_{min}$ is the minimum representable exponent value (–126 for single-precision, –1022 for double-precision).

3.3.2.3 Zero Values

Zero values (±0) have a biased exponent value of zero and a fraction value of zero. Zeros can have a positive or negative sign. Comparison operations ignore the sign of zero and treat +0 as equal to –0.

3.3.3 Infinities

Infinities (±∞) are values that have the maximum biased exponent value:
• 255 in single format
• 2047 in double format
and a zero fraction value. They are used to approximate values greater in magnitude than the maximum normalized value. Infinity arithmetic is defined as the limiting case of real arithmetic, with restricted operations defined among numbers and infinities.
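The categories defined above are determined entirely by the biased exponent and fraction fields. The following C sketch classifies a single-format bit pattern on that basis. A maximum exponent with a nonzero fraction is reported as a NaN, which Section 3.3.3.1 describes. The enum, type, and function names are hypothetical, and the float reinterpretation assumes the host uses IEEE 754 single format.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical category names for a single-format bit pattern. */
typedef enum { SGL_ZERO, SGL_DEN, SGL_NOR, SGL_INF, SGL_NAN } sgl_class;

static sgl_class classify_single(uint32_t w)
{
    uint32_t exp = (w >> 23) & 0xFF; /* biased exponent */
    uint32_t frac = w & 0x7FFFFF;    /* 23-bit fraction */

    if (exp == 0)                    /* implied unit bit is 0 */
        return frac == 0 ? SGL_ZERO : SGL_DEN;
    if (exp == 255)                  /* maximum biased exponent */
        return frac == 0 ? SGL_INF : SGL_NAN;
    return SGL_NOR;                  /* implied unit bit is 1 */
}

/* Reinterpret a 32-bit pattern as a host float. */
static float single_from_bits(uint32_t w)
{
    float f;
    memcpy(&f, &w, sizeof f);
    return f;
}
```

As a check of the DEN formula, the smallest positive denormalized single value, bit pattern 0x00000001, has 0.fraction = 2^-23. Its value is therefore 2^-126 × 2^-23 = 2^-149, and single_from_bits(0x00000001) compares equal to the C hexadecimal literal 0x1p-149f.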
Infinities and the real numbers can be related by ordering in the affine sense:

–∞ < every finite number < +∞

Arithmetic on infinities is always exact and does not signal any exception, except when an invalid operation exception occurs (for example, ∞ – ∞ or ∞ × 0).

3.3.3.1 Not a Numbers

Not a numbers (NaNs) are values that have the maximum biased exponent value and a nonzero fraction value. The sign bit is ignored; that is, NaNs are neither positive nor negative. If the high-order bit of the fraction field is ‘0’, the NaN is a signaling NaN (SNaN); otherwise, it is a quiet NaN (QNaN).

Signaling NaNs are used to signal exceptions when they appear as operands of computational instructions.

Quiet NaNs are used to represent the results of certain invalid operations, such as invalid arithmetic operations on infinities or on NaNs, when the invalid operation exception is disabled (FPSCR[VE] = ‘0’). Quiet NaNs propagate through all floating-point instructions except fcmpo, frsp, and fctiw. Quiet NaNs do not signal exceptions, except for ordered comparison and conversion to integer operations. Specific encodings in QNaNs can thus be preserved through a sequence of floating-point operations and used to convey diagnostic information that helps identify results from invalid operations.

When a QNaN is the result of a floating-point operation, either because one of the operands is a NaN or because a QNaN was generated due to a disabled invalid operation exception, the following rule determines the NaN, with the high-order fraction bit set to 1, that is stored as the result. In this notation, ²⁹0 denotes 29 zero bits.

if FPR(FRA) is a NaN then
    FPR(FRT) ← FPR(FRA)
else if FPR(FRB) is a NaN then
    if instruction is frsp then
        FPR(FRT) ← FPR(FRB)[0:34] || ²⁹0
    else
        FPR(FRT) ← FPR(FRB)
else if FPR(FRC) is a NaN then
    FPR(FRT) ← FPR(FRC)
else if generated QNaN then
    FPR(FRT) ← generated QNaN

If the operand specified by FRA is a NaN, that NaN is stored as the result.
Otherwise, if the operand specified by FRB is a NaN (if the instruction specifies an FRB operand), that NaN is stored as the result, with the low-order 29 bits of the result set to ‘0’ if the instruction is frsp. Otherwise, if the operand specified by FRC is a NaN (if the instruction specifies an FRC operand), that NaN is stored as the result. Otherwise, if a QNaN was generated due to a disabled invalid operation exception, that QNaN is stored as the result. A generated QNaN has a sign bit of ‘0’, an exponent field of all ‘1’s, and a high-order fraction bit of ‘1’ with all other fraction bits ‘0’. Any instruction that generates a QNaN as the result of a disabled invalid operation must generate this QNaN (that is, x‘7FF8 0000 0000 0000’).

A double-precision NaN is representable in single format if and only if the low-order 29 bits of the double-precision NaN's fraction are zero.

3.3.4 Sign of Result

The following rules govern the sign of the result of an arithmetic, rounding, or conversion operation when the operation does not yield an exception. They apply even when the operands or results are zeros or infinities.
• The sign of the result of an add operation is the sign of the operand having the larger absolute value. The sign of the result of the subtract operation x – y is the same as the sign of the result of the add operation x + (–y). When the sum of two operands with opposite signs, or the difference of two operands with the same sign, is exactly zero, the sign of the result is positive in all rounding modes except round toward –Infinity, in which mode the sign is negative.
• The sign of the result of a multiply or divide operation is the exclusive OR of the signs of the operands.
• The sign of the result of an frsqrte instruction is always positive, except that the reciprocal square root of –0 is –Infinity.
• The sign of the result of an frsp[.] or fctiw operation is the sign of the operand being converted.

For the multiply-add instructions, the preceding rules are applied first to the multiply operation and then to the add or subtract operation (one of the inputs to the add or subtract operation is the result of the multiply operation).

3.3.5 Data Handling and Precision

Instructions are defined to move floating-point data between the FPRs and storage. For double format data, the data are not altered during the move. For single format data, a format conversion from single to double is performed when loading from storage into an FPR, and a format conversion from double to single is performed when storing from an FPR to storage. The load/store instructions do not cause floating-point exceptions.

All computational, move, and fsel instructions use the floating-point double format.

Floating-point single-precision values are obtained with the following types of instructions.
• Load floating-point single. This form of instruction accesses a single-precision operand in single format in storage, converts it to double format, and loads it into an FPR. These instructions cause no floating-point exceptions.
• Round to floating-point single-precision. The frsp instruction rounds a double-precision operand to single-precision, checking the exponent for single-precision range and handling any exceptions according to the respective enable bits, and places that operand into an FPR as a double-precision operand. For results produced by single-precision arithmetic instructions, single-precision loads, and other instances of the frsp instruction, this operation does not alter the value.
Note: The frsp instruction converts a value from double-precision to single-precision with appropriate exception checking and rounding.
This instruction should be used to convert double-precision floating-point values (produced by double-precision load and arithmetic instructions) to single-precision values before storing them into single format storage elements or using them as operands for single-precision arithmetic instructions. Values produced by single-precision load and arithmetic instructions are already single-precision values. They can be stored directly into single format storage elements, or used directly as operands for single-precision arithmetic instructions, without an frsp instruction before the store or the arithmetic instruction.
• Single-precision arithmetic instructions. This form of instruction takes operands from the FPRs in double format, performs the operation as if it produced an intermediate result having infinite precision and unbounded exponent range, and then coerces this intermediate result to fit in single format. Status bits in the FPSCR are set to reflect the single-precision result. The result is then converted to double format and placed into an FPR. The result lies in the range supported by the single format. All input values must be representable in single format. If they are not, the result placed into the target FPR and the setting of status bits in the FPSCR are undefined.
• Store floating-point single. This form of instruction converts a double-precision operand to single format and stores that operand into storage. These instructions cause no floating-point exceptions. (The value being stored is effectively assumed to be the result of an instruction of one of the preceding three types.)

When the result of a load floating-point single, frsp, or single-precision arithmetic instruction is stored in an FPR, the low-order 29 fraction bits are zero.
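The low-order-29-bit property can be checked on an IEEE 754 host. The following C sketch is a hypothetical model of frsp under round to nearest (FPSCR[RN] = 00). Numeric values are rounded by converting double to float and back, which on an IEEE 754 host uses the default round-to-nearest-even mode. NaN operands follow the frsp rule from Section 3.3.3.1: keep FPR(FRB)[0:34], clear the low-order 29 bits, and set the high-order fraction bit. The model does not cover FPSCR updates or enabled exceptions, and the function names are assumptions.

```c
#include <stdint.h>
#include <string.h>

#define EXP_MASK   UINT64_C(0x7FF0000000000000)
#define FRAC_MASK  UINT64_C(0x000FFFFFFFFFFFFF)
#define QUIET_BIT  UINT64_C(0x0008000000000000) /* high-order fraction bit */
#define LOW29_MASK UINT64_C(0x000000001FFFFFFF) /* low-order 29 fraction bits */

static uint64_t bits_of(double d)   { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double   double_of(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* Hypothetical model of frsp with FPSCR[RN] = 00; FPSCR effects not modeled. */
static double frsp_model(double frb)
{
    uint64_t u = bits_of(frb);
    if ((u & EXP_MASK) == EXP_MASK && (u & FRAC_MASK) != 0) {
        /* NaN: FPR(FRB)[0:34] || 29 zero bits, returned as a quiet NaN. */
        return double_of((u | QUIET_BIT) & ~LOW29_MASK);
    }
    /* Numeric value or infinity: round to single, then widen back to double. */
    return (double)(float)frb;
}
```

For example, 0.1 is not representable in single format, so frsp_model(0.1) differs from 0.1 but has its low-order 29 fraction bits clear. Applying frsp_model a second time leaves the value unchanged, which matches the statement that frsp does not alter values that are already single-precision.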
Note: A single-precision value can be used in double-precision arithmetic operations. The reverse is true only if the double-precision value is representable in single format.

3.3.6 Rounding

Rounding applies to operations that have numeric operands (operands that are not infinities or NaNs). Rounding the intermediate result of such operations might cause an overflow exception, an underflow exception, or an inexact exception. The following description assumes that the operations cause no exceptions and that the result is numeric. See Section 3.3.1 Value Representation on page 90 for the cases not covered here.

The arithmetic and rounding and conversion instructions produce intermediate results that can be regarded as having infinite precision and unbounded exponent range. Such intermediate results are normalized or denormalized if required, then rounded to the target format. The final result is then placed into the target FPR in double format or in integer format, depending on the instruction.

The arithmetic and rounding and conversion instructions, which round intermediate results, set FPSCR[FR, FI]. If the fraction was incremented during rounding, FPSCR[FR] = ‘1’; otherwise, FPSCR[FR] = ‘0’. If the rounded result is inexact, FPSCR[FI] = ‘1’; otherwise, FPSCR[FI] = ‘0’. The estimate instructions set FPSCR[FR, FI] to undefined values. The remaining floating-point instructions do not alter FPSCR[FR, FI].

FPSCR[RN] specifies one of four programmable rounding modes. Let z be the intermediate arithmetic result or the operand of a convert operation. If z can be represented exactly in the target format, the result in all rounding modes is z as represented in the target format. If z cannot be represented exactly in the target format, let z1 and z2 bound z as the next larger and next smaller numbers representable in the target format. Either z1 or z2 can then be used to approximate the result in the target format. Figure 3-2 shows the relation of z, z1, and z2 in this case.
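The rounding rules for the four FPSCR[RN] settings (Table 3-4), together with the FPSCR[FR] and FPSCR[FI] definitions above, can be illustrated on a simplified sign-magnitude value. The following C sketch is a hypothetical model, not the FPU datapath. In it, mag holds the magnitude of z with `drop` extra low-order bits beyond the target LSb. Truncating those bits gives the bound nearer zero, and incrementing the kept LSb gives the other bound. The model does not handle exponent adjustment after a carry out of the significand.

```c
#include <stdbool.h>
#include <stdint.h>

/* FPSCR[RN] encodings from Table 3-4. */
enum { RN_NEAREST = 0, RN_TOWARD_ZERO = 1, RN_TOWARD_POS_INF = 2, RN_TOWARD_NEG_INF = 3 };

typedef struct {
    uint64_t mag; /* rounded magnitude, in units of the target LSb */
    bool fr;      /* fraction incremented (models FPSCR[FR]) */
    bool fi;      /* result inexact (models FPSCR[FI]) */
} round_result;

/* Round the magnitude of z, given with `drop` (1..63) bits below the target LSb. */
static round_result round_magnitude(bool negative, uint64_t mag, unsigned drop, unsigned rn)
{
    uint64_t kept = mag >> drop;                        /* truncated: bound nearer 0 */
    uint64_t rest = mag & ((UINT64_C(1) << drop) - 1);  /* discarded bits */
    uint64_t half = UINT64_C(1) << (drop - 1);
    bool inc = false;

    if (rest != 0) {
        switch (rn) {
        case RN_NEAREST:        /* closest value; a tie goes to the even LSb */
            inc = rest > half || (rest == half && (kept & 1) != 0);
            break;
        case RN_TOWARD_ZERO:    /* the value smaller in magnitude */
            inc = false;
            break;
        case RN_TOWARD_POS_INF: /* z1: increment the magnitude only if z > 0 */
            inc = !negative;
            break;
        case RN_TOWARD_NEG_INF: /* z2: increment the magnitude only if z < 0 */
            inc = negative;
            break;
        }
    }
    round_result r = { kept + (inc ? 1u : 0u), inc, rest != 0 };
    return r;
}
```

With drop = 2, mag = 10 represents z = 2.5 units of the target LSb. Round to nearest returns 2 (the tie goes to the even value) with FR = 0 and FI = 1. By contrast, mag = 14 (z = 3.5) rounds up to 4 with FR = 1.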
The following rules specify the rounding in the four modes. LSb means least-significant bit.

Figure 3-2. Selection of z1 and z2 (number line for negative and positive values: the infinitely precise value z lies between z2 and z1, and the two bounds are obtained by truncating after the LSb of z and by incrementing the LSb of z)

Table 3-4 describes the rounding modes.

Table 3-4. Rounding Modes
FPSCR[RN]   Rounding Mode             Description
00          Round to nearest.         Choose the value that is closest to z, either z1 or z2. In case of a tie, choose the one that is even (the LSb is 0).
01          Round toward zero.        Choose the smaller in magnitude (z1 or z2).
10          Round toward +infinity.   Choose z1.
11          Round toward –infinity.   Choose z2.

3.4 Floating-Point Instructions

Primary opcode 63 is used for the double-precision arithmetic instructions and for miscellaneous instructions, such as the floating-point status and control register manipulation instructions. Primary opcode 59 is used for the single-precision arithmetic instructions. Single-precision instructions that have a corresponding double-precision instruction use the same format and extended opcode as that double-precision instruction.

Instructions are provided to perform arithmetic, rounding, conversion, comparison, and other operations in floating-point registers; to move floating-point data between storage and these registers; and to manipulate the FPSCR explicitly.

These instructions are divided into two categories.
• Computational instructions
The computational instructions perform addition, subtraction, multiplication, division, square root extraction, rounding, conversion, comparison, and combinations of these operations. These instructions provide the floating-point operations.
They place status information into the FPSCR. They are the instructions described in Section 3.4.5 Floating-Point Arithmetic Instructions on page 100, Section 3.4.6 Floating-Point Rounding and Conversion Instructions on page 101, and Section 3.4.7 Floating-Point Compare Instructions on page 101.
• Noncomputational instructions
The noncomputational instructions perform loads and stores, move the contents of a floating-point register to another floating-point register (possibly altering the sign), manipulate the FPSCR explicitly, and select a value from one of two floating-point registers based on the value in a third floating-point register. These operations are not considered floating-point operations. With the exception of the instructions that manipulate the FPSCR explicitly, they do not alter the FPSCR. Those instructions are described in Section 3.4.8 Floating-Point Status and Control Register Instructions on page 102.

A floating-point number consists of a signed exponent and a signed significand. The quantity expressed by this number is the product of the significand and the number $2^{\mathrm{exponent}}$. Encodings are provided in the data format to represent finite numeric values, ±infinity, and values that are not a number (NaN). Operations involving infinities produce results following traditional mathematical conventions. NaNs have no mathematical interpretation, but their encoding supports a variable diagnostic information field. NaNs can be used to indicate such things as uninitialized variables, and they can be produced by certain invalid operations.

One class of exceptions that occurs during floating-point instruction execution is unique to floating-point operations: the floating-point exception. Bits set in the FPSCR indicate floating-point exceptions. If the proper control bits are set, these exceptions can cause an enabled exception type program interrupt to be taken, either precisely or imprecisely.
3.4.1 Instructions By Category

The floating-point instructions can be classified into computational and noncomputational categories. The computational instructions include those that perform arithmetic operations or conversions on operands. Noncomputational instructions perform loads and stores, moves (with possible sign changes), or data selection. Additionally, some noncomputational instructions can write directly to the FPSCR. All instructions executed in the load/store pipeline are noncomputational, while most instructions executed in the arithmetic pipeline are computational.

All floating-point operands are stored internally in double-precision format. Arithmetic operations specified as single-precision require that the internal data be representable as single-precision (that is, having an unbiased exponent between –126 and 127 and a significand accurately representable in 24 bits). If the data cannot be represented in this way, the results stored in the FPR, and the status bits set in the FPSCR and CR (as appropriate), are undefined.

For consistency, to reduce the likelihood of a serious malfunction resulting from user error, and to enable random testing, single-precision operations are performed on double-precision operands. For all cases except fdivs, the operation is performed as if it were double-precision, and the result is then rounded to single-precision. For fdivs, the number of iterations needed for a single-precision result is performed (potentially with early out), and the quotient is then properly rounded.

In all cases, result exceptions (overflow, underflow, and inexact) are detected and reported based on the result, not on the source operands. Default (masked exception) results are the same as for the single-precision instructions.
In the case of masked overflow or underflow exceptions, the least significant 11 bits of the adjusted true exponent are returned. The results of all single-precision operations are rounded to single-precision. These results are stored in double-precision format but are restricted to single-precision range (exponent and fraction). All status bits are set based on the single-precision result.

3.4.2 Load and Store Instructions

The PowerPC 476FP processor instruction set includes instructions to load from memory to an FPR and to store from an FPR to memory. Data received from the PowerPC 476FP core can be single-precision or double-precision, can be in big-endian or little-endian format, and is word aligned. Data sent to the FPR must be in big-endian, double-precision format.

There are two basic forms of load instruction: single-precision and double-precision. Because the FPRs support only floating-point double format, single-precision load floating-point instructions convert single-precision data to double format before loading the operand into the target FPR. The conversion and loading steps are as follows. Let WORD[0:31] be the floating-point single-precision operand accessed from storage.
In the steps below, ²⁹0 denotes 29 zero bits.

Normalized Operand
if WORD[1:8] > 0 and WORD[1:8] < 255 then
    FPR(FRT)[0:1] ← WORD[0:1]
    FPR(FRT)[2] ← ¬WORD[1]
    FPR(FRT)[3] ← ¬WORD[1]
    FPR(FRT)[4] ← ¬WORD[1]
    FPR(FRT)[5:63] ← WORD[2:31] || ²⁹0

Denormalized Operand
if WORD[1:8] = 0 and WORD[9:31] ≠ 0 then
    sign ← WORD[0]
    exp ← -126
    frac[0:52] ← 0b0 || WORD[9:31] || ²⁹0
    normalize the operand
    do while frac[0] = 0
        frac ← frac[1:52] || 0b0
        exp ← exp - 1
    FPR(FRT)[0] ← sign
    FPR(FRT)[1:11] ← exp + 1023
    FPR(FRT)[12:63] ← frac[1:52]

Zero / Infinity / NaN
if WORD[1:8] = 255 or WORD[1:31] = 0 then
    FPR(FRT)[0:1] ← WORD[0:1]
    FPR(FRT)[2] ← WORD[1]
    FPR(FRT)[3] ← WORD[1]
    FPR(FRT)[4] ← WORD[1]
    FPR(FRT)[5:63] ← WORD[2:31] || ²⁹0

For double-precision load floating-point instructions, no conversion is required because the data from storage are copied directly into the FPR.

Some of the floating-point load instructions update GPR(RA) with the effective address. For these forms, if RA ≠ 0, the effective address is placed into GPR(RA), and the storage element (word or doubleword) addressed by EA is loaded into FPR(FRT). If RA = 0, the instruction form is invalid.

Floating-point load storage accesses cause data storage exceptions if the program is not allowed to read the storage location. Floating-point load storage accesses cause data TLB error exceptions if the program attempts to access storage that is unavailable.

Note: RA and RB denote GPRs, while FRT denotes an FPR. Both big-endian and little-endian byte orderings are supported.

Table 3-5. Floating-Point Load Instructions
Mnemonic | Operands | Instruction
lfd | FRT, D(RA) | Load floating-point double.
lfdu | FRT, D(RA) | Load floating-point double with update.
lfdux | FRT, RA, RB | Load floating-point double with update indexed.
lfdx | FRT, RA, RB | Load floating-point double indexed.
lfs | FRT, D(RA) | Load floating-point single.
lfsu | FRT, D(RA) | Load floating-point single with update.
lfsux | FRT, RA, RB | Load floating-point single with update indexed.
lfsx | FRT, RA, RB | Load floating-point single indexed.
lfiwax | FRT, RA, RB | Load floating-point as integer word algebraic indexed.

3.4.3 Floating-Point Store Instructions
There are three basic forms of store instruction: single-precision, double-precision, and integer. The integer form is provided by the stfiwx instruction. The FPRs support only floating-point double format for floating-point data. Therefore, single-precision store floating-point instructions convert double-precision data to single format before storing the operand. The conversion steps are as follows. Let WORD[0:31] be the word in storage written to.

No Denormalization Required (includes Zero / Infinity / NaN)
if FPR(FRS)[1:11] > 896 or FPR(FRS)[1:63] = 0 then
    WORD[0:1] ← FPR(FRS)[0:1]
    WORD[2:31] ← FPR(FRS)[5:34]

Denormalization Required
if 874 ≤ FPR(FRS)[1:11] ≤ 896 then
    sign ← FPR(FRS)[0]
    exp ← FPR(FRS)[1:11] - 1023
    frac[0:52] ← 0b1 || FPR(FRS)[12:63]
    denormalize operand
    do while exp < -126
        frac ← 0b0 || frac[0:51]
        exp ← exp + 1
    WORD[0] ← sign
    WORD[1:8] ← 0x00
    WORD[9:31] ← frac[1:23]
else
    WORD ← undefined

Suppose the value to be stored by a single-precision store floating-point instruction is larger in magnitude than the maximum number representable in single format. In that case, the first case (no denormalization required) applies. The result stored in WORD is then a well-defined value, but it is not numerically equal to the value in the source register. A single-precision load floating-point from WORD does not return a value that compares equal to the contents of the original source register.
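The two conversion procedures above can be transcribed into Python and checked against the host's IEEE 754 conversions. In this sketch:
- The function names are illustrative only.
- Integers hold the raw bit images, and bit 0 is the most significant bit, as in the pseudocode.
- `struct` is used only to produce reference bit patterns.

It models the data conversion only. It does not model addressing, byte order, or exceptions. The last lines reproduce the note above: a value beyond the single-precision range does not survive a store followed by a load.

```python
import struct
from typing import Optional

MASK30 = (1 << 30) - 1
MASK52 = (1 << 52) - 1

def lfs_convert(word: int) -> int:
    """Single-to-double conversion done by single-precision loads.
    word is WORD[0:31]; returns the 64-bit FPR(FRT) image."""
    sign = word >> 31                  # WORD[0]
    w1 = (word >> 30) & 1              # WORD[1]
    exp8 = (word >> 23) & 0xFF         # WORD[1:8]
    frac23 = word & 0x7FFFFF           # WORD[9:31]
    if 0 < exp8 < 255:                 # normalized operand: FPR[2:4] = not WORD[1]
        top = (sign << 4) | (w1 << 3) | ((w1 ^ 1) * 0b111)
        return (top << 59) | ((word & MASK30) << 29)   # FPR[5:63] = WORD[2:31] || 29 zeros
    if exp8 == 0 and frac23 != 0:      # denormalized operand: normalize it
        exp = -126
        frac = frac23 << 29            # frac[0:52] = 0b0 || WORD[9:31] || 29 zeros
        while not (frac >> 52) & 1:    # do while frac[0] = 0
            frac <<= 1
            exp -= 1
        return (sign << 63) | ((exp + 1023) << 52) | (frac & MASK52)
    top = (sign << 4) | (w1 << 3) | (w1 * 0b111)       # zero/infinity/NaN: FPR[2:4] = WORD[1]
    return (top << 59) | ((word & MASK30) << 29)

def stfs_convert(frs: int) -> Optional[int]:
    """Double-to-single conversion done by single-precision stores.
    frs is the 64-bit FPR(FRS) image; returns WORD[0:31], or None where the
    result is architecturally undefined."""
    exp11 = (frs >> 52) & 0x7FF                        # FPR(FRS)[1:11]
    if exp11 > 896 or (frs & ((1 << 63) - 1)) == 0:    # no denormalization required
        return ((frs >> 62) << 30) | ((frs >> 29) & MASK30)   # FRS[0:1] || FRS[5:34]
    if 874 <= exp11 <= 896:                            # denormalization required
        sign = frs >> 63
        exp = exp11 - 1023
        frac = (1 << 52) | (frs & MASK52)              # frac[0:52] = 0b1 || FRS[12:63]
        while exp < -126:
            frac >>= 1                                 # frac = 0b0 || frac[0:51]
            exp += 1
        return (sign << 31) | ((frac >> 29) & 0x7FFFFF)   # WORD[9:31] = frac[1:23] (truncated)
    return None

def bits32(x: float) -> int:
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits64(x: float) -> int:
    return struct.unpack(">Q", struct.pack(">d", x))[0]

x = -2.5   # representable in single precision: the round trip is exact
print(lfs_convert(stfs_convert(bits64(x))) == bits64(x))   # True
big = bits64(1e300)   # far outside single-precision range
print(lfs_convert(stfs_convert(big)) == big)               # False
```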
For double-precision store floating-point instructions and for the store floating-point as integer word instruction, no conversion is required because the data from the FPR are copied directly into storage.

Some of the floating-point store instructions update GPR(RA) with the effective address. For these forms, if RA ≠ 0, the effective address is placed into GPR(RA).

Floating-point store storage accesses cause a data storage interrupt if the program is not allowed to write to the storage location. Floating-point store storage accesses cause a data TLB error interrupt if the program attempts to access storage that is unavailable.

Note: RA and RB denote GPRs, and FRS denotes an FPR. Both big-endian and little-endian byte orderings are supported.

Table 3-6. Floating-Point Store Instructions
Mnemonic | Operands | Instruction
stfd | FRS, D(RA) | Store floating-point double.
stfdu | FRS, D(RA) | Store floating-point double with update.
stfdux | FRS, RA, RB | Store floating-point double with update indexed.
stfdx | FRS, RA, RB | Store floating-point double indexed.
stfiwx | FRS, RA, RB | Store floating-point as integer word indexed.
stfs | FRS, D(RA) | Store floating-point single.
stfsu | FRS, D(RA) | Store floating-point single with update.
stfsux | FRS, RA, RB | Store floating-point single with update indexed.
stfsx | FRS, RA, RB | Store floating-point single indexed.

3.4.4 Floating-Point Move Instructions
These instructions copy data from one floating-point register to another. The fneg, fabs, and fnabs instructions alter the sign bit (bit 0) as described in their instruction descriptions in the Power Instruction Set Architecture (ISA) Version 2.05 specification. These instructions treat NaNs just like any other kind of value; for example, fneg, fabs, and fnabs can alter the sign bit of a NaN. These instructions do not alter the FPSCR.

Table 3-7.
Floating-Point Move Instructions
Mnemonic | Operands | Instruction
fabs[.] | FRT, FRB | Floating absolute value.
fmr[.] | FRT, FRB | Floating move register.
fnabs[.] | FRT, FRB | Floating negative absolute value.
fneg[.] | FRT, FRB | Floating negate.

3.4.5 Floating-Point Arithmetic Instructions
These instructions perform elementary arithmetic operations.

Table 3-8. Floating-Point Elementary Arithmetic Instructions
Mnemonic | Operands | Instruction
fadd[.] | FRT, FRA, FRB | Floating add.
fadds[.] | FRT, FRA, FRB | Floating add single.
fcfid[.] | FRT, FRB | Floating convert from integer doubleword.
fcpsgn[.] | FRT, FRA, FRB | Floating copy sign.
fctid[.] | FRT, FRB | Floating convert to integer doubleword.
fctiw[.] | FRT, FRB | Floating convert to integer word.
fctiwz[.] | FRT, FRB | Floating convert to integer word with round toward zero.
fdiv[.] | FRT, FRA, FRB | Floating divide.
fdivs[.] | FRT, FRA, FRB | Floating divide single.
fmul[.] | FRT, FRA, FRC | Floating multiply.
fmuls[.] | FRT, FRA, FRC | Floating multiply single.
fre[.] | FRT, FRB | Floating reciprocal estimate.
fres[.] | FRT, FRB | Floating reciprocal estimate single.
frsqrte[.] | FRT, FRB | Floating reciprocal square root estimate.
frsqrtes[.] | FRT, FRB | Floating reciprocal square root estimate single.
fsqrt[.] | FRT, FRB | Floating square root.
fsqrts[.] | FRT, FRB | Floating square root single.
fsub[.] | FRT, FRA, FRB | Floating subtract.
fsubs[.] | FRT, FRA, FRB | Floating subtract single.

3.4.5.1 Floating-Point Multiply-Add Instructions
These instructions combine a multiply and an add operation without an intermediate rounding operation. The fraction part of the intermediate product is 106 bits wide (L bit, FRACTION), and all 106 bits take part in the add or subtract portion of the instruction.
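The effect of omitting the intermediate rounding can be shown with exact rational arithmetic in Python. The sketch assumes the Power ISA operand roles, in which fmadd computes (FRA × FRC) + FRB. It models the fused case by keeping the product exact with `fractions.Fraction` and rounding only the final sum to double. It models the two-instruction case (fmul followed by fadd) with ordinary double arithmetic, which rounds after each step. The function names are illustrative, and the single-precision forms and FPSCR updates are not modeled.

```python
from fractions import Fraction

def fmadd_model(fra: float, frc: float, frb: float) -> float:
    """(FRA * FRC) + FRB with one rounding: the product is kept exact and only
    the sum is rounded (Fraction -> float rounds to the nearest double)."""
    return float(Fraction(fra) * Fraction(frc) + Fraction(frb))

def fmul_then_fadd(fra: float, frc: float, frb: float) -> float:
    """The same computation as two instructions, each rounding its own result."""
    return fra * frc + frb

a, c, b = 1.0 + 2**-30, 1.0 - 2**-30, -1.0   # exact product is 1 - 2**-60
print(fmadd_model(a, c, b) == -2**-60)       # True: the tiny term survives
print(fmul_then_fadd(a, c, b))               # 0.0: the product rounded to 1.0 first
```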
FPSCR bits are set as follows:
• Overflow, underflow, and inexact exception bits, the FR and FI bits, and the FPRF field are set based on the final result of the operation, not on the result of the multiplication.
• Invalid operation exception bits are set as if the multiplication and the addition were performed using two separate instructions (fmul[s], followed by fadd[s] or fsub[s]). That is, multiplication of infinity by 0 or of anything by an SNaN, and addition of an SNaN, cause the corresponding exception bits to be set.

Table 3-9. Floating-Point Multiply-Add Instructions
Mnemonic | Operands | Instruction
fmadd[.] | FRT, FRA, FRC, FRB | Floating multiply-add.
fmadds[.] | FRT, FRA, FRC, FRB | Floating multiply-add single.
fmsub[.] | FRT, FRA, FRC, FRB | Floating multiply-subtract.
fmsubs[.] | FRT, FRA, FRC, FRB | Floating multiply-subtract single.
fnmadd[.] | FRT, FRA, FRC, FRB | Floating negative multiply-add.
fnmadds[.] | FRT, FRA, FRC, FRB | Floating negative multiply-add single.
fnmsub[.] | FRT, FRA, FRC, FRB | Floating negative multiply-subtract.
fnmsubs[.] | FRT, FRA, FRC, FRB | Floating negative multiply-subtract single.

3.4.6 Floating-Point Rounding and Conversion Instructions
The floating-point rounding instructions are shown in Table 3-10.

Table 3-10. Floating-Point Rounding and Conversion Instructions
Mnemonic | Operands | Instruction
frim[.] | FRT, FRB | Floating round to integer minus.
frin[.] | FRT, FRB | Floating round to integer nearest.
frip[.] | FRT, FRB | Floating round to integer plus.
friz[.] | FRT, FRB | Floating round to integer toward zero.
frsp[.] | FRT, FRB | Floating round to single-precision.

3.4.7 Floating-Point Compare Instructions
The floating-point compare instructions compare the contents of two floating-point registers. Comparison ignores the sign of zero (+0 is treated as equal to -0). The comparison result can be ordered or unordered.
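The four possible outcomes of a floating-point compare can be sketched in Python. The function returns exactly one of the FL, FG, FE, and FU bits listed in Table 3-11. It treats bit 0 as the most significant bit of the 4-bit CR field. It also reflects the rule that +0 and -0 compare equal.

The sketch does not model two things:
- The choice of CR field (BF).
- The FPSCR invalid-operation exception bits. These are where fcmpo and fcmpu differ.

The names are illustrative only.

```python
import math

# CR field bit masks; bit 0 is the most significant bit of the 4-bit field.
FL, FG, FE, FU = 0b1000, 0b0100, 0b0010, 0b0001

def fcmp_cr_field(fra: float, frb: float) -> int:
    """Return the 4-bit value written to the CR field (and FPSCR[FPCC])."""
    if math.isnan(fra) or math.isnan(frb):
        return FU          # unordered: at least one operand is a NaN
    if fra < frb:
        return FL
    if fra > frb:
        return FG
    return FE              # equal, including +0 compared with -0

print(fcmp_cr_field(0.0, -0.0) == FE)          # True: the sign of zero is ignored
print(fcmp_cr_field(float("nan"), 1.0) == FU)  # True
```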
The comparison sets one bit in the designated CR field to ‘1’ and the other three bits to ‘0’. FPSCR[FPCC] is set in the same way. The CR field and FPSCR[FPCC] are set as shown in Table 3-11 on page 102.

Table 3-11. Comparison Sets
Bit | Name | Description
0 | FL | (FRA) < (FRB)
1 | FG | (FRA) > (FRB)
2 | FE | (FRA) = (FRB)
3 | FU | (FRA) ? (FRB) (unordered)

Table 3-12. Floating-Point Compare and Select Instructions
Mnemonic | Operands | Instruction
fcmpo | BF, FRA, FRB | Floating compare ordered.
fcmpu | BF, FRA, FRB | Floating compare unordered.
fsel[.] | FRT, FRA, FRC, FRB | Floating select.

3.4.8 Floating-Point Status and Control Register Instructions
Every Floating-Point Status and Control Register instruction synchronizes the effects of all floating-point instructions executed by a given processor. Executing a Floating-Point Status and Control Register instruction has two effects:
• All floating-point instructions previously initiated by the given processor have completed before the Floating-Point Status and Control Register instruction is initiated.
• No subsequent floating-point instructions are initiated by the given processor until the Floating-Point Status and Control Register instruction has completed.

In particular:
• All exceptions that will be caused by the previously initiated instructions are recorded in the FPSCR before the Floating-Point Status and Control Register instruction is initiated.
• All invocations of the enabled exception type program interrupt that will be caused by the previously initiated instructions have occurred before the Floating-Point Status and Control Register instruction is initiated.
• No subsequent floating-point instruction that depends on or alters the settings of any FPSCR bits is initiated until the Floating-Point Status and Control Register instruction has completed.
Floating-point load and floating-point store instructions are not affected.

Table 3-13 lists the floating-point status and control register instructions.

Table 3-13. Floating-Point Status and Control Register Instructions
Mnemonic | Operands | Instruction
mcrfs | BF, BFA | Move to condition register from FPSCR.
mffs[.] | FRT | Move from FPSCR.
mtfsb0[.] | BT | Move to FPSCR bit 0.
mtfsb1[.] | BT | Move to FPSCR bit 1.
mtfsf[.] | FLM, FRB | Move to FPSCR fields.
mtfsfi[.] | BF, U | Move to FPSCR field immediate.

4. Memory Management Unit

4.1 Overview
The PowerPC 476FP memory management unit (MMU) provides cache control, access protection, and address translation. The MMU contains the unified translation lookaside buffer (UTLB), control logic, and registers that support the UTLB. The MMU interfaces with the execution unit (EU), the instruction cache unit (ICU), the data cache unit (DCU), and the TLB snoop interface.

The EU interface provides the ability to perform translation lookaside buffer (TLB) operation instructions: tlbre, tlbwe, tlbsx, and tlbivax. See Section 4.8 Software Considerations on page 127 for more information about these instructions. The instruction unit (IU) interface provides the translation space (TS) and DSIZ bits for a lookup request from the DCU, ICU, or TLB snoop. The ICU interface generates a lookup request to the UTLB on an instruction translation lookaside buffer (ITLB) miss. Similarly, the DCU interface generates a lookup request to the UTLB on a data translation lookaside buffer (DTLB) miss.

The MMU is a software-managed unit, with hardware assistance available for replacing entries. Software is responsible for writing entries into the UTLB so that they can be found by using the hash function described in Section 4.3.2 UTLB Index Address Hash on page 106. Freescale-style MMU operation is not supported.
Table 4-1 lists the MMU features of the PowerPC 476FP processor.

Table 4-1. PowerPC 476FP Processor MMU
Function | PowerPC 476FP
UTLB size | 1024 entries.
Memory array architecture | SRAM1P.
UTLB associativity | 4-way set associative; reads four entries at a time.
Data cache MMU access time | Variable, from 6 to 30 cycles, depending on simultaneous requests and hashes used.
Page sizes supported | 4 KB, 16 KB, 64 KB, 1 MB, 16 MB, 256 MB, 1 GB.
Page descriptors | WIMGE, U[0:3], IL1I, IL1D. See Table 4-5 on page 110 for a definition of these fields.
Translation ID (TID) field | 16 bits.
Extended real page number (ERPN) | 10 bits.
UTLB search mechanism | Search of up to seven hashes, optimized for page size, in an order set in a supervisor or user SPR.

4.2 Address Translation
The MMU address translation is shown in Figure 4-1 on page 104. The 49-bit virtual address (VA) is formed by prepending a 1-bit address space (AS) and a 16-bit process ID (PID) to the 32-bit effective address (EA). Using the AS, PID, and EA, the 10-bit extended real page number (ERPN) and the 20-bit real page number (RPN) are obtained from the UTLB. For a 4 KB page, these are concatenated with a 12-bit offset to form the 42-bit real address (RA). Larger pages use fewer RPN bits and a correspondingly longer offset. The 4 KB page size requires 20 bits of EA (and RPN), while the 1 GB page size requires only 2 bits of EA (and RPN), as shown in Figure 4-1 on page 104.

Figure 4-1.
Address Mapping for each Page Size. (Figure not reproduced. It shows the VA fields AS, PID[0:15], and EA[0:31]. For each page size from 4 KB to 1 GB, it shows how RA[0:41] is divided into ERPN, RPN, and offset. AS is the address space, taken from MSR[IS] or MSR[DS].)

4.3 MMU Implementation
Figure 4-2 MMU Block Diagram on page 105 shows the basic design of the MMU. It is a 4-way, set-associative memory structure with hashed addressing. The MMU performs request arbitration, and consists of a UTLB tag array, compare logic, and a UTLB data array. The PowerPC 476FP implementation supports a 1024-entry UTLB.

Figure 4-2. MMU Block Diagram. (Figure not reproduced. It shows requests from the EU, ICU, DCU, and snoop interface passing through the UTLB index address hash. They reach a 1024-entry UTLB tag array and data array, each 4-way with 256 sets. tlbwe, tlbsx, and tlbivax use the hashing function; tlbre supplies the index directly. The figure also shows the compare logic, and the Supervisor and User Search Priority Configuration Registers selected by MSR[PR]. The ITLB and DTLB maximum page size is 256 MB, so a 1 GB page must be converted into 256 MB granules.)
4.3.1 Translation Lookaside Buffer
The unified translation lookaside buffer (UTLB) is the hardware resource that controls translation, protection, and storage attributes. A single unified 1024-entry, 4-way set-associative TLB is used for both instruction and data accesses. In addition, the PowerPC 476FP core implements two separate, smaller shadow TLB arrays: one for instruction fetch accesses and one for data accesses. These shadow TLBs improve performance in two ways. They lower the latency for address translation. They also reduce contention for the main unified TLB between instruction fetching and data storage accesses.

Maintenance of TLB entries is under software control. System software determines the TLB entry replacement strategy (or chooses hardware-assisted replacement) and the use of any page table information. A TLB entry contains all of the information required to identify the page, specify the address translation, control the access permissions, and designate the storage attributes.

A TLB entry is written by copying information from a GPR and the MMUCR[STID] field, using a series of three tlbwe instructions. A TLB entry is read by copying the information into a GPR and the MMUCR[STID] field, using a series of three tlbre instructions. Software can also search for specific TLB entries using the tlbsx[.] instruction. The PowerPC 476FP core also allows software to invalidate each TLB entry using either the tlbivax or the tlbwe instruction.
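The matching rules of Section 4.3.5 and the real-address formation of Section 4.2 can be combined into a small Python model of a lookup against one UTLB entry. The sketch makes three assumptions:
- The EPN and EA comparison widths follow Table 4-4.
- An entry with TID = 0 matches any PID.
- For pages larger than 4 KB, the low-order EA bits below the compared bits form the page offset.

`UtlbEntry` and `translate` are hypothetical names. Parity, storage attributes, access permissions, hashing, and the shadow TLBs are not modeled.

```python
from dataclasses import dataclass
from typing import Optional

# EPN/EA bits compared for each DSIZ encoding (Table 4-4).
COMPARED_BITS = {
    0b000000: 20,  # 4 KB
    0b000001: 18,  # 16 KB
    0b000011: 16,  # 64 KB
    0b000111: 12,  # 1 MB
    0b001111: 8,   # 16 MB
    0b011111: 4,   # 256 MB
    0b111111: 2,   # 1 GB
}

@dataclass
class UtlbEntry:      # hypothetical model of one UTLB entry
    valid: bool
    ts: int           # translation space
    epn: int          # 20-bit effective page number
    dsiz: int         # 6-bit page size code
    tid: int          # 16-bit translation ID; 0 means a global page
    rpn: int          # 20-bit real page number (unused low bits must be 0)
    erpn: int         # 10-bit extended real page number

def translate(entry: UtlbEntry, ea: int, as_bit: int, pid: int) -> Optional[int]:
    """Return the 42-bit real address for a 32-bit EA, or None on no match."""
    if not entry.valid or entry.ts != as_bit:
        return None
    if entry.tid != 0 and entry.tid != pid:
        return None
    p = COMPARED_BITS[entry.dsiz]
    if (entry.epn ^ (ea >> 12)) >> (20 - p):   # EPN[0:p-1] must equal EA[0:p-1]
        return None
    offset_mask = (1 << (32 - p)) - 1          # EA bits below the page number
    low32 = ((entry.rpn << 12) & ~offset_mask) | (ea & offset_mask)
    return (entry.erpn << 32) | low32          # ERPN || RPN bits || offset

# A 1 MB page with TID 5: EA 0x123ABCDE maps to RA 0x3456ABCDE.
page = UtlbEntry(valid=True, ts=0, epn=0x12300, dsiz=0b000111, tid=5, rpn=0x45600, erpn=0x3)
print(hex(translate(page, 0x123ABCDE, as_bit=0, pid=5)))   # 0x3456abcde
```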
The UTLB access method, the look-up operation, and the attributes and access control information for ERPN, RPN, and storage are described in the following sections.

4.3.2 UTLB Index Address Hash
To improve UTLB utilization and distribute entries more evenly, the UTLB arrays are indexed by a hash function. This exclusive OR (XOR)-based hash function is used in two cases. The first is when an entry is searched by an instruction such as TLB search indexed (tlbsx). The second is when an entry is invalidated by local or remote tlbivax operations. The hash is bypassed on TLB read entry (tlbre) operations, because the tlbre instruction provides both the way and the index address.

For example, a 4 KB page with a 16-bit PID of x‘00D5’ and a 20-bit effective address (EA) of x‘E9B6C’ must be placed at an 8-bit UTLB index address of x‘E0’. This is calculated as follows:
UTLB index address bit 7 = PID[15] XOR EA[19] XOR EA[7].
UTLB index address bit 6 = PID[14] XOR EA[18] XOR EA[6].
UTLB index address bit 5 = PID[13] XOR EA[17] XOR EA[5].
UTLB index address bit 4 = PID[12] XOR EA[16] XOR EA[4].
UTLB index address bit 3 = PID[11] XOR EA[15] XOR EA[11] XOR EA[3].
UTLB index address bit 2 = PID[10] XOR EA[14] XOR EA[10] XOR EA[2].
UTLB index address bit 1 = PID[9] XOR EA[13] XOR EA[9] XOR EA[1].
UTLB index address bit 0 = PID[8] XOR EA[12] XOR EA[8] XOR EA[0].

Each UTLB index holds up to four entries, referred to as ways 0 through 3.

Different page size hashes with different, nonoverlapping effective addresses can arrive at the same UTLB index address. For example, a 4 KB hash used for an EA of x‘000F0’ translates to an index address of x‘F0’. A 1 MB hash used for a nonoverlapping EA of x‘F0000’ also translates to an index address of x‘F0’. This is not a problem, because way 0 can be used for the 4 KB page, and way 1 for the 1 MB page.
However, software must take this into account when setting up the entries in the UTLB to avoid unintentionally overwriting entries. Table 4-2 on page 107 shows the PID and EA address bits that are used for each page size.

Table 4-2. UTLB Set Address Generation Hashing Function
UTLB index bit | PID bit (PID ≠ ‘0’) | PID bit (PID = ‘0’) | 4 KB | 16 KB | 64 KB | 1 MB | 16 MB | 256 MB | 1 GB
7 | 31 | - | 19, -, 7 | 17, -, 7 | 15, 7 | 11, - | 7 | - | -
6 | 30 | - | 18, -, 6 | 16, -, 6 | 14, 6 | 10, - | 6 | - | -
5 | 29 | - | 17, -, 5 | 15, -, 5 | 13, 5 | 9, - | 5 | - | -
4 | 28 | - | 16, -, 4 | 14, -, 4 | 12, 4 | 8, - | 4 | - | -
3 | 27 | - | 15, 11, 3 | 13, -, 3 | 11, 3 | 7, 3 | 3 | 3 | -
2 | 26 | - | 14, 10, 2 | 12, -, 2 | 10, 2 | 6, 2 | 2 | 2 | -
1 | 25 | - | 13, 9, 1 | 11, 9, 1 | 9, 1 | 5, 1 | 1 | 1 | 1
0 | 24 | - | 12, 8, 0 | 10, 8, 0 | 8, 0 | 4, 0 | 0 | 0 | 0

Note: A dash (-) indicates that no PID bit or EA bit is used in the hash calculation for that case. The effective address bits are the same whether or not PID = ‘0’. PID bits 24-31 correspond to PID[8:15] in the 16-bit PID numbering used in Section 4.3.2.

4.3.3 Initialize a Single UTLB Entry
Hardware initializes one UTLB entry at reset. This entry is set up to access privileged, cache-inhibited, and guarded space at the top of the 4 GB address space. This entry corresponds to the reset vector of the processor at x‘FFFF FFFC’, and has the following characteristics:
Index address: x‘F0’ (4 KB hash equivalent to PID[0:15] = x‘0000’ and EA[0:19] = x‘FFFFF’)
Way: 3
EPN: x‘FFFFF’
TS: ‘0’
DSIZ: ‘000000’ (4 KB page)
RPN: x‘FFFFF’
WIMG: ‘0101’
IL1I, IL1D: ‘11’
UX, UW, UR: ‘000’
SX, SW, SR: ‘101’
ERPN: From chip implementation-specific configuration values.
U0-U3: From chip implementation-specific configuration values.
E: From chip implementation-specific configuration values.
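The index hash of Section 4.3.2 and Table 4-2 can be written as a table-driven Python function. The function can then be checked against the examples in this section, including the x‘F0’ index of the reset entry. Conventions and assumptions:
- The EA bit lists are transcribed from Table 4-2 and are numbered within EA[0:19], with bit 0 the most significant bit.
- PID[8+i] feeds index bit i (PID register bit 24+i).
- The 4 KB and 1 MB collision example in Section 4.3.2 does not state a PID, so PID = x‘0000’ is assumed for it.

`HASH_EA_BITS` and `utlb_index` are illustrative names only.

```python
# EA bits (within EA[0:19], bit 0 = MSB) XORed into UTLB index bits 0..7,
# transcribed from Table 4-2.
HASH_EA_BITS = {
    "4KB":   [(12, 8, 0), (13, 9, 1), (14, 10, 2), (15, 11, 3), (16, 4), (17, 5), (18, 6), (19, 7)],
    "16KB":  [(10, 8, 0), (11, 9, 1), (12, 2), (13, 3), (14, 4), (15, 5), (16, 6), (17, 7)],
    "64KB":  [(8, 0), (9, 1), (10, 2), (11, 3), (12, 4), (13, 5), (14, 6), (15, 7)],
    "1MB":   [(4, 0), (5, 1), (6, 2), (7, 3), (8,), (9,), (10,), (11,)],
    "16MB":  [(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,)],
    "256MB": [(0,), (1,), (2,), (3,), (), (), (), ()],
    "1GB":   [(0,), (1,), (), (), (), (), (), ()],
}

def utlb_index(pid: int, epn: int, page_size: str) -> int:
    """Return the 8-bit UTLB index for a 16-bit PID and the 20-bit EA[0:19]."""
    def ea(n: int) -> int:
        return (epn >> (19 - n)) & 1
    def pid_bit(n: int) -> int:
        return (pid >> (15 - n)) & 1
    index = 0
    for i, ea_bits in enumerate(HASH_EA_BITS[page_size]):
        bit = pid_bit(8 + i)                  # PID[8+i]
        for n in ea_bits:
            bit ^= ea(n)
        index |= bit << (7 - i)               # index bit 0 is the most significant
    return index

print(hex(utlb_index(0x00D5, 0xE9B6C, "4KB")))   # 0xe0: the worked example
print(hex(utlb_index(0x0000, 0xFFFFF, "4KB")))   # 0xf0: the reset entry index
# The 4 KB and 1 MB hashes collide at the same index, as described above.
print(utlb_index(0, 0x000F0, "4KB") == utlb_index(0, 0xF0000, "1MB"))   # True
```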
Initializing this single UTLB entry duplicates the function of driving the same values to the ICU and DCU during reset, which initializes the reset vector entry in the ITLB and DTLB. However, the entry in the ITLB and DTLB can be invalidated before software writes entries into the UTLB. This happens by snooping an msync, isync, rfi, or CSI (context switching instruction). Writing the reset vector entry into the UTLB ensures that an ITLB or DTLB miss finds a matching entry in this case.

4.3.4 Tag Array
The hashed index address is presented to the tag array; only the tlbre index address is not hashed. The tag array contains the information required to determine whether a UTLB request from the EU, ICU, DCU, or snoop interface matches a valid entry.

The tag array consists of two SRAM1Ps, each 256 × 94 bits. Each tag entry is 44 bits plus 3 bits of parity (47 bits). Therefore, one SRAM1P stores tag way 0 and way 1, and the other SRAM1P stores way 2 and way 3. The index address is presented to both SRAM1Ps in the same clock so that all four ways can be compared at the same time in a subsequent clock.

The information stored in the tag array is listed in Table 4-3. Odd parity is stored when CCR1[MMUTPEI] = ‘0’. Even parity is stored when CCR1[MMUTPEI] = ‘1’.

Table 4-3. UTLB Tag Field Description
WS | Bits | Name | Size | Description
0 | 0:19 | EPN | 20 | Effective page number. The EPN, with the DSIZ, defines the page (both size and starting address) that this UTLB entry represents. Unused bits in this field due to a larger than minimum DSIZ are ignored.
0 | 20 | EPNPar | 1 | The parity bit that covers the EPN.
0 | 21 | Valid | 1 | If set, the remaining fields describe a valid UTLB entry.
0 | 22 | TS | 1 | Translation space. This entry matches only if the translation space bit matches the request bit.
0 | 23:28 | DSIZ | 6 | This field describes the page size for this entry. Supported page sizes are 4 KB (minimum size), 16 KB, 64 KB, 1 MB, 16 MB, 256 MB, and 1 GB.
    DSIZ encoding (Entry | Page Size):
    ‘000000’ | 4 KB
    ‘000001’ | 16 KB
    ‘000011’ | 64 KB
    ‘000111’ | 1 MB
    ‘001111’ | 16 MB
    ‘011111’ | 256 MB
    ‘111111’ | 1 GB
    Note: This decoding is used so that logic can use dedicated bits to conditionally enable comparators on the appropriate address bits.
0 | 29 | DSIZPar | 1 | The parity bit that covers the Valid, TS, and DSIZ fields.
0 | 30:45 | TID | 16 | Translation ID (MMUCR[0:15]). This field describes the process ID for which this entry is valid. If TID = ‘0’, the entry is considered a match for any PID. If TID ≠ ‘0’, the TID field must match the PID for a search to return a positive match.
0 | 46 | TIDPar | 1 | The parity bit that covers the TID.

4.3.5 Comparison
A virtual address matches a TLB entry when the Valid, TS, EPN, and TID tags have the following values:
• Valid == 1.
• TS == the requested address space (AS).
• EPN == the requested EA, where the number of bits compared depends on the DSIZ tag.
• TID == 0 (global entry), or TID == the requested process identifier (PID).

Table 4-4 defines the number of EPN and EA bits that are compared, depending on the page size.

Table 4-4. EPN and EA Comparison
Page Size | DSIZ | Comparison
4 KB | ‘000000’ | EPN[0:19] == EA[0:19]
16 KB | ‘000001’ | EPN[0:17] == EA[0:17]
64 KB | ‘000011’ | EPN[0:15] == EA[0:15]
1 MB | ‘000111’ | EPN[0:11] == EA[0:11]
16 MB | ‘001111’ | EPN[0:7] == EA[0:7]
256 MB | ‘011111’ | EPN[0:3] == EA[0:3]
1 GB | ‘111111’ | EPN[0:1] == EA[0:1]

The ITLB and DTLB maximum page size is 256 MB. Therefore, a 1 GB page must be converted into 256 MB granules. This is done automatically by hardware.

4.3.6 Data Array
The data array contains the address translation information, storage control, and permission bits for the entry. The data array also consists of two SRAM1Ps, each 256 × 100 bits. Each data entry is 47 bits, plus 3 bits of parity.
Therefore, one SRAM1P stores data way 0 and way 1, and the other SRAM1P stores way 2 and way 3. The tag index address is latched and presented to both data SRAM1Ps in the same clock. As a result, the data is available to be latched back to the requesting unit if the tag comparison results in a match.

The information stored in the data array is listed in Table 4-5 on page 110. Odd parity is stored when CCR1[MMUDPEI] = ‘0’. Even parity is stored when CCR1[MMUDPEI] = ‘1’.

Table 4-5. UTLB Data Field Description
WS | Bits | Name | Size | Description
1 | 0:19 | RPN | 20 | Real page number. This field contains the real address bits that replace the EPN of the search address. The entire RPN is used for 4 KB pages. For larger pages, the least significant bits of the RPN are unused. However, the unused bits must be set to 0; otherwise, indeterminate results occur.
1 | 20 | RPNPar | 1 | The parity bit that covers the RPN.
1 | 21:30 | ERPN | 10 | Extended real page number. This field contains the 10 most significant bits of the real address. They are always used and prepended to the RPN.
1 | 31 | ERPNPar | 1 | The parity bit that covers the ERPN.
2 | 32:36 | WIMGE | 5 | This field describes the page type for this entry at the L2 cache level.
    W: Write-through (L1 is always write-through).
    I: Cache inhibited.
    M: Coherent page (all L1 pages are coherent).
    G: Guarded access.
    E: Endianness (this describes the L1 page).
2 | 37 | IL1I | 1 | Inhibit L1 instruction. If set, this page is treated as cache-inhibited for the L1 cache, regardless of the I bit used for the L2 attribute.
2 | 38 | IL1D | 1 | Inhibit L1 data. If set, this page is treated as cache-inhibited for the L1 cache, regardless of the I bit used for the L2 attribute.
2 | 39:42 | U | 4 | User-defined bits. This field can be used at the system level for any purpose.
2 | 43 | UX | 1 | User execute-permission bit.
2 | 44 | UW | 1 | User write-permission bit.
2 | 45 | UR | 1 | User read-permission bit.
2 | 46 | SX | 1 | Supervisor execute-permission bit.
2 | 47 | SW | 1 | Supervisor write-permission bit.
2 | 48 | SR | 1 | Supervisor read-permission bit.
2 | 49 | StorPermPar | 1 | The parity bit that covers storage attributes and permissions.

4.3.6.1 Hardware Enforced I = 1 = IL1I = IL1D
If software sets I = 1 on a tlbwe instruction (WS = 2), the MMU hardware forces IL1I = IL1D = 1.

4.3.7 Writing UTLB Entries
Software places entries in the UTLB by using the tlbwe instruction. To obtain the best utilization of the UTLB, the operating system must understand the details of the page size hashes when selecting process IDs and effective addresses. Each tlbwe instruction provides the page size, PID, and EA. The tlbwe can also specify one of the four ways, or allow the hardware to select the way to be written.

The processor is a 32-bit architecture, and UTLB tag and data entries require 91 bits. Therefore, a sequence of three tlbwe instructions (WS = 0, WS = 1, and WS = 2) is required to write a new UTLB entry. On the first tlbwe, with WS = 0, values are latched in the MMUCR for valid (V), TLB index, and TLB way. The two subsequent tlbwe instructions, with WS = 1 and WS = 2, use these latched index and way fields to select the TLB entry. In addition, the V bit is set to the latched value (LVALID) when WS = 2. In this way, the MMU supports atomic writes of UTLB entries.

Note: WS is a field that designates which word of the TLB entry is to be transferred (that is, WS = 0 specifies TLB word 0, and so on).

4.3.8 Bolted UTLB Entries
The MMU enables software to specify up to six bolted UTLB entries. These entries are typically used for pages containing the operating system kernel or interrupt handlers. Hardware-assisted way selection automatically avoids bolted entries. Bolted entries are also protected from both local and remote tlbivax instructions.
Software can overwrite a bolted entry by specifying its way explicitly in the tlbwe instruction. The location of a bolted entry within the MMU Bolted Entries Registers (MMUBE0 or MMUBE1) is specified with the tlbwe instruction, in RA[5:7]. After a bolted entry is written, its index address can be obtained by an mfspr instruction from MMUBE0 or MMUBE1. Only way 0 can hold a bolted entry, so software must ensure that no two bolted entries are placed at the same UTLB index address.

4.3.9 Hardware Assisted Way Selection
The hardware can select the way to be written by the next tlbwe. This reduces the software burden of tracking where entries are placed and automatically avoids overwriting bolted entries. Each UTLB index address has a corresponding 2-bit counter, which is updated as follows:
• A tlbwe, WS = 0 instruction with RA[0] = ‘0’ specifies that the counter value is used for way selection.
• The counter is incremented each time a tlbwe, WS = 2 is executed whose corresponding tlbwe, WS = 0 had V = 1.
• The counter is reset to a way when that way receives a tlbwe, WS = 0, V = 0. The next entry written to that index is then placed in the way that was just “vacated” by the tlbwe, WS = 0, V = 0.
• Similarly, the counter is reset to a way when that way is invalidated by a local or snooped tlbivax instruction.

The counter automatically skips a bolted entry. For example, if the counter points to way 3 and way 0 contains a bolted entry, the counter increments to way 1 after the next tlbwe, WS = 2.

4.3.10 Searching UTLB Entries
UTLB entries are searched by the ICU in response to an instruction-side TLB miss, by the DCU in response to a data-side TLB miss, and by the EU in response to the tlbsx instruction.

The PowerPC 476FP core implements two shadow TLB arrays: one for instruction fetches and one for data accesses. These arrays shadow the values of a subset of the entries in the main UTLB.
The shadow TLB arrays reduce the latency of address translation and avoid contention for the UTLB array between instruction fetches and data accesses. The two shadow TLBs (ITLB and DTLB) each contain eight entries. Accessing a shadow TLB array adds no latency: as long as the requested address is found in the shadow TLB, instruction execution continues in a pipelined fashion. If the requested address is not found in the shadow TLB, the instruction fetch or data storage access is automatically stalled while the address is looked up in the UTLB. If the address is found in the UTLB, the shadow TLB miss costs five cycles when there is no contention. If the address also misses in the UTLB, an instruction or data TLB miss exception is reported.

Hardware manages the replacement of shadow TLB entries in a round-robin fashion. On a shadow TLB miss that hits in the UTLB, the hardware casts out the oldest entry in the shadow TLB and replaces it with the new translation. The hardware also invalidates all entries in both shadow TLBs on any context synchronization. The context synchronizing operations are:
• Any interrupt (including machine check)
• Execution of isync
• Execution of rfi, rfci, or rfmci
• Execution of sc

Some other context-changing operations do not cause automatic context synchronization in hardware. For example, a tlbwe instruction changes the UTLB contents but does not cause a context synchronization, and therefore does not invalidate or otherwise update the shadow TLB entries.
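The shadow TLB behavior described above (eight entries, round-robin cast-out of the oldest entry, and invalidation of every entry on a context synchronizing operation) can be sketched as follows. This is a simplified, hypothetical model: it keys entries on a 4 KB effective page number and ignores PID, AS, and page size, all of which a real shadow TLB must also match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SHADOW_ENTRIES 8

/* Hypothetical shadow TLB entry keyed by a 4 KB effective page number (assumption). */
typedef struct {
    bool     valid;
    uint32_t epn;    /* effective page number */
    uint32_t rpn;    /* real page number */
} shadow_entry;

typedef struct {
    shadow_entry e[SHADOW_ENTRIES];
    unsigned     next;   /* round-robin pointer: the oldest fill is replaced next */
} shadow_tlb;

/* Shadow hit: return the cached translation. */
static bool shadow_lookup(const shadow_tlb *t, uint32_t epn, uint32_t *rpn)
{
    assert(rpn);
    for (unsigned i = 0; i < SHADOW_ENTRIES; i++) {
        if (t->e[i].valid && t->e[i].epn == epn) {
            *rpn = t->e[i].rpn;
            return true;
        }
    }
    return false;
}

/* Shadow miss that hit in the UTLB: cast out the oldest entry. */
static void shadow_fill(shadow_tlb *t, uint32_t epn, uint32_t rpn)
{
    t->e[t->next] = (shadow_entry){ .valid = true, .epn = epn, .rpn = rpn };
    t->next = (t->next + 1) % SHADOW_ENTRIES;
}

/* Context synchronizing operation: invalidate every entry. The text does not
   say where the round-robin pointer goes afterward; it is left unchanged here. */
static void shadow_context_sync(shadow_tlb *t)
{
    for (unsigned i = 0; i < SHADOW_ENTRIES; i++)
        t->e[i].valid = false;
}
```

Because shadow_fill runs only on a miss, a UTLB update that is not followed by shadow_context_sync leaves the old translation visible in this model, which mirrors why software must force a context synchronization after changing UTLB entries.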
For changes to the entries in the UTLB (or to other address-related resources such as the PID) to be reflected in the shadow TLBs, software must ensure that a context synchronizing operation occurs before any attempt to use an address associated with the updated UTLB entries (either their old or their new contents). By invalidating the shadow TLB arrays, a context synchronizing operation forces the hardware to refresh the shadow TLB entries from the updated UTLB as each memory page is accessed.

4.3.10.1 Instruction-Side and Data-Side TLB Miss Searches
When an instruction-side or data-side TLB miss occurs, the ICU or DCU presents the EA with the request. The AS bit is driven by the ICU for instruction-side requests and by the IU for DCU requests. The EA, together with a copy of the PID register latched by the ICU or DCU, is hashed to obtain the UTLB index address, using the page-size hash order specified in either the Supervisor Search Priority Configuration Register (SSPCR) or the User Search Priority Configuration Register (USPCR). Because it takes four cycles to determine whether the tag array holds a matching entry, the index addresses for the subsequent page-size hashes specified in the SSPCR or USPCR are pipelined into the tag array.

Entries placed in the UTLB with TID[0:15] = x‘0000’ are considered global pages; they match a request regardless of whether its PID[0:15] is x‘0000’. However, because part of the PID is used in the hash, a global page can be present in the UTLB at an index that does not match the EA-PID hash. This case is handled by going through the search order twice when the requested PID[0:15] is not x‘0000’: once using x‘0000’ in place of the actual PID, and again using the actual PID[0:15] value.

UTLB entries are also searched by the EU in response to a tlbsx instruction. The EU presents the effective address (EA) with the request.
The EA, along with the set translation ID (STID) field of the MMUCR, is hashed to obtain the UTLB index address, using the page-size hash order specified in the ISPCR. When a matching UTLB entry is found, its way and index address are returned to the EU. Software can compare the returned index address with the MMUBE0 and MMUBE1 SPRs to determine whether the entry is bolted (a bolted entry always resides in way 0), and can then use the tlbre instruction to read the contents of the entry.

4.3.11 Reading UTLB Entries
A UTLB entry can be read by the EU with the tlbre instruction. The way and index address specified in the tlbre instruction must first be obtained with a tlbsx instruction. Three tlbre instructions are required to read the entry: one for the tag array (WS = 0) and two for the data array (WS = 1 and WS = 2).

4.3.12 Invalidating UTLB Entries
A UTLB entry can be invalidated by the tlbwe instruction with WS = 0 and V = 0. A UTLB entry can also be invalidated by a local or remote (snooped) tlbivax instruction.

A local tlbivax instruction is received as an EU request with a corresponding effective address (EA). The EA, along with the STID field of the MMUCR, is hashed to obtain the UTLB index address, using the page-size hash order specified in the ISPCR. If a matching entry is found in the UTLB, the V bit in its tag is written to ‘0’ unless the entry is bolted; bolted entries are not invalidated.

A remote tlbivax instruction is received as a snoop request. The MMU holds the SnpAvail signal active until a second snoop request is sampled active, so two snoop requests can be serviced at the same time. Each snoop request is presented with the corresponding AS, PID, and EA fields. The EA and PID fields are hashed to obtain the UTLB index address, using the page-size hash order specified in the ISPCR.
If a matching entry is found in the UTLB, the V bit in its tag is written to ‘0’ unless the entry is bolted; bolted entries are not invalidated. If the index address of the matching entry equals the latched index address (LINDEX), the LVALID bit is also cleared, again unless the entry is bolted. This allows a partially written entry to be invalidated by a snoop.

4.4 Access Control
When a matching TLB entry has been identified and the address has been translated, the access control mechanism determines whether the program has execute, read, write, or read and write access to the page that the address refers to.

4.4.1 Execute Access
The UX or SX bit of a TLB entry controls execute access to a page of storage, depending on the operating mode (user or supervisor) of the processor.

User mode (MSR[PR] = ‘1’)
Instructions can be fetched and executed from a page in storage while in user mode if the UX access control bit for that page is ‘1’. If the UX access control bit is ‘0’, instructions from that page are not fetched in user mode, and a user-mode fetch request to that page does not place them into any cache. Furthermore, if the sequential execution model calls for the execution in user mode of an instruction from a page that is not enabled for execution in user mode (that is, UX = ‘0’ when MSR[PR] = ‘1’), an execute access control exception type instruction storage interrupt is taken (see Section 7 Processor Interrupts and Exceptions on page 167 for more information).

Supervisor mode (MSR[PR] = ‘0’)
Instructions can be fetched and executed from a page in storage while in supervisor mode if the SX access control bit for that page is ‘1’.
If the SX access control bit is ‘0’, instructions from that page are not fetched in supervisor mode, and a supervisor-mode fetch request to that page does not place them into any cache. Furthermore, if the sequential execution model calls for the execution in supervisor mode of an instruction from a page that is not enabled for execution in supervisor mode (that is, SX = ‘0’ when MSR[PR] = ‘0’), an execute access control exception type instruction storage interrupt is taken (see Section 7 Processor Interrupts and Exceptions on page 167 for more information).

4.4.2 Write Access
The UW or SW bit of a TLB entry controls write access to a page, depending on the operating mode (user or supervisor) of the processor.

User mode (MSR[PR] = ‘1’)
Store operations (including the store-class cache management instructions dcbz and dcbtst) are permitted to a page in storage while in user mode if the UW access control bit for that page is ‘1’. If execution of a store operation is attempted in user mode to a page for which the UW access control bit is ‘0’, a write access control exception occurs. If the instruction is an stswx with string length 0, no interrupt is taken and no operation is performed. For all other store operations, execution of the instruction is suppressed and a data storage interrupt is taken. The dcbi cache management instruction is privileged, so an attempt to execute it in user mode does not cause a data storage interrupt; a privileged instruction exception type program interrupt occurs instead.

Supervisor mode (MSR[PR] = ‘0’)
Store operations (including the store-class cache management instructions dcbz, dcbtst, and dcbtstls) are permitted to a page in storage while in supervisor mode if the SW access control bit for that page is ‘1’.
If execution of a store operation is attempted in supervisor mode to a page for which the SW access control bit is ‘0’, a write access control exception occurs. If the instruction is an stswx with string length 0, no interrupt is taken and no operation is performed. For all other store operations, execution of the instruction is suppressed and a data storage interrupt is taken.

4.4.3 Read Access
The UR or SR bit of a TLB entry controls read access to a page, depending on the operating mode (user or supervisor) of the processor.

User mode (MSR[PR] = ‘1’)
Load operations (including the load-class cache management instructions dcbst, dcbf, dcbt, icbi, and icbt) are permitted from a page in storage while in user mode if the UR access control bit for that page is ‘1’. If execution of a load operation is attempted in user mode from a page for which the UR access control bit is ‘0’, a read access control exception occurs. If the instruction is a load (other than an lswx with string length 0) or is a dcbst, dcbf, or icbi, execution of the instruction is suppressed and a data storage interrupt is taken. However, if the instruction is an lswx with string length 0, or is a dcbt or icbt, no interrupt is taken and no operation is performed.

Supervisor mode (MSR[PR] = ‘0’)
Load operations (including the load-class cache management instructions dcbst, dcbf, dcbt, dcbi, dcbtls, dcblc, icbi, icbt, and icblc) are permitted from a page in storage while in supervisor mode if the SR access control bit for that page is ‘1’. If execution of a load operation is attempted in supervisor mode from a page for which the SR access control bit is ‘0’, a read access control exception occurs.
If the instruction is a load (other than an lswx with string length 0) or is a dcbst, dcbf, dcbi, or icbi, execution of the instruction is suppressed and a data storage interrupt is taken. However, if the instruction is an lswx with string length 0, or is a dcbt or icbt, no interrupt is taken and no operation is performed.

4.4.4 Access Control Applied to Cache Management Instructions
This section summarizes how each of the cache management instructions is affected by the access control mechanism.

dcbz
This instruction is treated as a store with respect to access control because it changes the data in a cache block. As such, it can cause write access control exception type data storage interrupts.

dcbi
This instruction is treated as a load with respect to access control, even though it can change the value of a storage location by invalidating the current copy of the location in the data cache (effectively restoring the location to the former value contained in memory). As such, it can cause read access control exception type data storage interrupts.

dcba
This instruction is treated as a no-op under all circumstances, and thus cannot cause any form of data storage interrupt.

icbi
This instruction is treated as a load with respect to access control. As such, it can cause read access control exception type data storage interrupts. Note that icbi causes a data storage interrupt, not an instruction storage interrupt, even though it operates on the instruction cache: instruction storage interrupts are associated with exceptions that occur on the fetch of an instruction, whereas data storage interrupts are associated with exceptions that occur on the execution of a storage access or cache management instruction.

dcbt and icbt
These instructions are treated as loads with respect to access control. As such, they can cause read access control exceptions.
However, because these instructions act merely as hints that the specified cache block will likely be accessed by the processor in the near future, such exceptions do not result in a data storage interrupt. Instead, if a read access control exception occurs, the instruction is treated as a no-op.

dcbtst
This instruction is treated as a store with respect to access control. As such, it can cause write access control exceptions. However, because this instruction is intended to act merely as a hint that the specified cache block will likely be accessed by the processor in the near future, such exceptions do not result in a data storage interrupt. Instead, if a write access control exception occurs, the instruction is treated as a no-op.

dcbf and dcbst
These instructions are treated as loads with respect to access control. As such, they can cause read access control exception type data storage interrupts. Flushing or storing a dirty line from the cache is not considered a store, because an earlier store operation has already updated the cache line; the dcbf or dcbst instruction simply causes the results of that earlier store operation to be propagated to memory.

dci and ici
These instructions do not generate an address, so the access control mechanism does not affect them. They are privileged instructions and, if executed in supervisor mode, flash invalidate the entire associated cache.

4.5 Storage Attributes
Each TLB entry specifies a number of storage attributes for the memory page with which it is associated. Storage attributes affect the manner in which storage accesses to a given page are performed.
The storage attributes (and their corresponding TLB entry fields) are:
• Write-through (W)
• Caching inhibited (I)
• Memory coherence required (M)
• Guarded (G)
• Endianness (E)
• User-definable (U0, U1, U2, U3)

All combinations of these attributes are supported except combinations that simultaneously specify a region as write-through and caching inhibited.

4.5.1 Write-Through (W)
The PowerPC 476FP processor data cache ignores the write-through attribute: regardless of the W setting, the data for all store operations are written to memory, as opposed to only being written into the data cache. If the referenced line also exists in the data cache (that is, the store operation is a hit), the data is also written into the data cache.

An alignment exception occurs if a dcbz instruction targets a memory page that is either write-through required or caching inhibited. A data storage exception occurs if a lwarx, stwcx., or instruction targets a memory page that is either write-through required or caching inhibited. See Section 5 Instruction and Data Caches on page 133 for more information on the handling of accesses to write-through storage.

4.5.2 Caching Inhibited (I)
If a memory page is marked as caching inhibited (I = 1), all load, store, and instruction fetch operations perform their access in memory, as opposed to in the respective cache. If I = 0, the page is cacheable and the operations may be performed in the cache.

An alignment exception occurs if a dcbz instruction targets a memory page that is either write-through required or caching inhibited. A data storage exception occurs if a lwarx, stwcx., or instruction targets a memory page that is either write-through required or caching inhibited.

It is a programming error for the target location of a load, store, dcbz, or fetch access to caching inhibited storage to be in the respective cache; the results of such an access are undefined.
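The exception rules that the W and I attributes impose on dcbz, lwarx, and stwcx. (4.5.1 and 4.5.2) reduce to a small decision table, sketched here in C with hypothetical names. The sketch covers only those three instructions, the ones named explicitly in the text, and it flags the unsupported W = 1, I = 1 combination (4.5.8) instead of classifying it. These checks apply even though the data cache otherwise ignores W for ordinary stores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; covers only dcbz, lwarx, and stwcx. */
typedef enum { OP_DCBZ, OP_LWARX, OP_STWCX } wi_op;
typedef enum { WI_OK, WI_ALIGNMENT, WI_DATA_STORAGE, WI_UNSUPPORTED_MODE } wi_result;

static wi_result check_wi(wi_op op, bool w, bool i)
{
    if (w && i)
        return WI_UNSUPPORTED_MODE;   /* W = 1 with I = 1 is not a supported mode */
    if (!w && !i)
        return WI_OK;                 /* cacheable and not write-through */
    /* The page is write-through required or caching inhibited. */
    return (op == OP_DCBZ) ? WI_ALIGNMENT : WI_DATA_STORAGE;
}
```

For example, a dcbz to a caching-inhibited page yields an alignment exception, while a lwarx to the same page yields a data storage exception.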
It is not a programming error for the target locations of cache management instructions other than dcbz to be in the cache when the caching inhibited storage attribute is set; the behavior of these instructions is defined for both I = 0 and I = 1 storage. See Section 5 for more information about the handling of accesses to caching-inhibited storage.

4.5.3 Hardware Enforced IL1I and IL1D
The inhibit L1 instruction (IL1I) field and the inhibit L1 data (IL1D) field indicate that the page is treated as cache-inhibited for the L1 instruction cache and the L1 data cache, respectively, regardless of the I bit used for the L2 cache attribute.

Table 4-6. Access Control Applied to Cache Management Instructions

Instruction          Notes
dcbf, dcbst, icbi    Generates a read-protection permission violation interrupt if page UR = ‘0’.
dcbi                 Generates a read-protection permission violation interrupt if page UR = ‘0’. If MSR[PR] = ‘1’, a privileged or illegal instruction error exception occurs.
dcbt, icbt           These instructions are no-ops if UR = ‘0’.
dcbtst, dcbz         Generates a write-protection permission violation interrupt if page UW = ‘0’.
dcba                 This instruction is a no-op.

4.5.4 Memory Coherence Required (M)
The memory coherence required (M) storage attribute is defined by the architecture to support cache and memory coherency within multiprocessor shared-memory systems. If a TLB entry is created with M = 1, any storage access to the page associated with that TLB entry is indicated as memory coherence required on the corresponding transfer attribute interface signal, but the setting has no effect on operation within the PowerPC 476FP processor.

4.5.5 Guarded (G)
The guarded storage attribute is provided to control speculative access to memory locations that are not well behaved.
Storage is said to be well behaved if the corresponding real storage exists and is not defective, and if the effects of a single access to it are indistinguishable from the effects of multiple identical accesses to it. As such, data and instructions can be fetched out of order from well-behaved storage without causing undesired side effects.

In general, storage that is not well behaved should be marked as guarded. Because such storage might represent a control register on an I/O device or might include locations that do not exist, an out-of-order access to such storage might cause an I/O device to perform unintended operations or might result in a machine-check exception. For example, if the input buffer of a serial I/O device is memory-mapped, an out-of-order or speculative access to that location could result in the loss of an item of data from the input buffer if the instruction execution is interrupted and later reattempted.

A data access to a guarded storage location is performed only if either the access is caused by an instruction that is known to be required by the sequential execution model, or the access is a load and the storage location is already in the data cache. Once a guarded data storage access is initiated, if the storage is also caching inhibited, only the bytes specifically requested are accessed in memory, according to the operand size for the instruction type. Data storage accesses to guarded storage that is marked as cacheable can access the entire cache block, either in the cache itself or in memory. To avoid unintended results and maintain a well-behaved storage model, such storage should be marked both guarded and cache-inhibited.

Instruction fetch is not affected by guarded storage.
While the architecture does not prohibit instruction fetching from guarded storage, system software should generally prevent such fetching by marking all guarded pages as no-execute (UX/SX = 0). Then, if an instruction fetch is attempted from such a page, the memory access does not occur, and an execute access control exception type instruction storage interrupt results if and when execution is attempted for an instruction at any address within the page. See Section 5 Instruction and Data Caches on page 133 for more information about the handling of accesses to guarded storage.

4.5.6 Endian (E)
The endian (E) storage attribute controls the byte ordering with which load, store, and fetch operations are performed. Byte ordering refers to the order in which the individual bytes of a multiple-byte scalar operand are arranged in memory. The operands in a memory page with E = 0 are arranged with big-endian byte ordering, which means that the most-significant byte is at the lowest-numbered memory address. The operands in a memory page with E = 1 are arranged with little-endian byte ordering, which means that the least-significant byte is at the lowest-numbered address. For example, a word with the value x‘1122 3344’ stored at address A occupies addresses A through A + 3 as bytes x‘11’, x‘22’, x‘33’, x‘44’ in a page with E = 0, and as bytes x‘44’, x‘33’, x‘22’, x‘11’ in a page with E = 1.

4.5.7 User-Definable (U0 - U3)
The PowerPC 476FP core provides four user-definable (U0 - U3) storage attributes that can be used to control system-dependent behavior of the storage system. By default, these storage attributes have no effect on the operation of the PowerPC 476FP core, although all storage accesses indicate the values of U0 - U3 to the memory subsystem using the corresponding transfer attribute interface signals. A specific system design can then use these attributes to control system-level behavior.

4.5.8 Supported Storage Attribute Combinations
Storage modes where both W = 1 and I = 1 (which would represent write-through, caching-inhibited storage) are not supported.
For all supported combinations of the W and I storage attributes, the G, E, and U0 - U3 storage attributes can be used in any combination.

4.5.9 Aliasing
For multiple pages that are mapped to the same real address, the following rules apply:
1. If the multiple pages exist on a single processor:
   • The I bits (I, IL1I, IL1D) must match on all pages (see the note below).
   • The W bits do not need to match on all pages.
   • The M bits do not need to match on all pages. In such a case, it is the software’s responsibility to maintain data coherency.
2. If the multiple pages exist on multiple processors:
   • The I bits (I, IL1I, IL1D) do not need to match on all pages.
   • The W bit must match on all pages (a Book E requirement).
   • The M bits do not need to match on all pages. In such a case, it is the software’s responsibility to maintain data coherency.

Note: For multiple pages on a single processor that map to the same real address, the I bits (I, IL1I, IL1D) do not need to match if software guarantees the following conditions:
1. For those pages where the I bit is zero, the page must be marked as guarded and no-execute to prevent speculative accesses.
2. For those addresses where the cacheability attributes are different, software must ensure that only those pages where all I bits are the same access the overlapped real address. Alternatively, software can manage the cache appropriately between accesses with different cacheability to guarantee that an access to any I = 1 page is not found in the associated cache. When the I bit is a one, the data must not be in any level of cache.
For example consider a cacheable 64 KB page and a noncacheable 4 KB page (the smallest page size for the PowerPC 476FP core) that both map to the same real address (for example the 4 KB page maps to the last 4 KB of real addresses that the 64 KB page maps to). In this case the 64 KB page is marked as guarded and cacheable. In addition, software must ensure that when operating in the 64 KB page no accesses are performed to the last 4 KB addresses. 4.6 MMU Registers Table 4-7 summarizes the MMU Special Purpose Registers (SPRs) available to software. In the rest of this section, these SPRs are described in detail. Table 4-7. MMU SPR Summary Name Reset Value SPRN PID x‘XXXX XXXX’ x‘030’ Processor ID Register. RMPD x‘XXXX XXXX’ x‘339’ Real Mode Page Description Register. MMUBE0 x‘XXXX XXXX’, ‘000’ x‘334’ MMU Bolted Entries 0 Register. MMUBE1 x‘XXXX XXXX’, ‘000’ x‘335’ MMU Bolted Entries 1 Register. SSPCR x‘0000 000X’ x‘33E’ Supervisor Search Priority Configuration Register. USPCR x‘0000 000X’ x‘33F’ User Search Priority Configuration Register. ISPCR x‘0000 000X’ x‘33D’ Invalidate/Search Priority Configuration Register RSTCFG strapped x‘39B Reset Configuration Register. MMUCR ‘00000’, ‘X’, x‘000 0000’ x‘3B2’ MMU Configuration Register. Version 2.2 July 31, 2014 Description Memory Management Unit Page 119 of 322 User’s Manual PowerPC 476FP Embedded Processor Core 4.6.1 Process ID Register (PID) Reserved 0 1 2 3 4 5 6 Bits Field Name 0:15 Reserved 16:31 PID 7 8 PID 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Description Process ID. This field is used for trace broadcast by the MMU, with EA[0:19], to hash an index address into the UTLB. 4.6.2 Real Mode Page Description Register (RMPD) 0 1 2 3 4 5 6 7 8 9 Field Name 0:1 Reserved 2:11 ERPN 12 Reserved 13 W 14 I Cache inhibited. 15 M Memory coherency required. 16 G Guarded. 17 E Endian. 18 IL1I L1 instruction cache inhibit. 19 IL1D L1 data cache inhibit. 
20:21 Reserved 22 SX Supervisor execute permission. 23 SR Supervisor read permission. 24 SW Supervisor write permission. 25 UX User execute permission. 26 UR User read permission. 27 UW User write permission. 28:31 U Page 120 of 322 G E SX SR SW UX UR U 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Bits Memory Management Unit M UW I Reserved W IL1D ERPN IL1I Reserved Reserved Real mode paging is used in debugging only and must be used with care. Thus, this register is reserved for debugging by designers. Description Extended real page number. Write through. U0 - U3. User-defined bits. Version 2.2 July 31, 2014 User’s Manual PowerPC 476FP Embedded Processor Core 0 1 2 3 4 5 6 7 8 9 IBE2 Reserved VBE2 IBE1 VBE1 IBE0 VBE0 4.6.3 MMU Bolted Entries 0 Register (MMUBE0) 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Bits Field Name Description 0:7 IBE0 Read only. UTLB index address for bolted entry 0. 8:15 IBE1 Read only. UTLB index address for bolted entry 1. 16:23 IBE2 Read only. UTLB index address for bolted entry 2. 24:28 Reserved 29 VBE0 Valid bit for bolted entry 0. 0 There is no bolted entry at the index address in IBE0. 1 There is a bolted entry in way 0 at the index address in IBE0. 30 VBE1 Valid bit for bolted entry 1. 0 There is no bolted entry at the index address in IBE1. 1 There is a bolted entry in way 0 at the index address in IBE1. 31 VBE2 Valid bit for bolted entry 2. 0 There is no bolted entry at the index address in IBE2. 1 There is a bolted entry in way 0 at the index address in IBE2. 0 1 2 3 4 5 6 7 8 9 IBE5 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Bits Field Name 0:7 IBE3 Read only. UTLB index address for bolted entry 3. 8:15 IBE4 Read only. UTLB index address for bolted entry 4. 16:23 IBE5 Read only. UTLB index address for bolted entry 5. 24:28 Reserved Description 29 VBE3 Valid bit for bolted entry 3. 0 There is no bolted entry at the index address in IBE3. 
1 There is a bolted entry in way 0 at the index address in IBE3. 30 VBE4 Valid bit for bolted entry 4. 0 There is no bolted entry at the index address in IBE4. 1 There is a bolted entry in way 0 at the index address in IBE4. 31 VBE5 Valid bit for bolted entry 5. 0 There is no bolted entry at the index address in IBE5. 1 There is a bolted entry in way 0 at the index address in IBE5. Version 2.2 July 31, 2014 Reserved VBE5 IBE4 VBE4 IBE3 VBE3 4.6.4 MMU Bolted Entries 1 Register (MMUBE1) Memory Management Unit Page 121 of 322 User’s Manual PowerPC 476FP Embedded Processor Core 4.6.5 Search Priority Configuration Registers There are three sets of registers to control UTLB look-up/search priority. Two registers, SSPCR and USPCR, are used for instruction-side TLB and data-side TLB misses. The SSPCR is assigned for supervisor/privileged mode (MSR[PR] = ‘0’), and the USPCR is used for problem/user mode (MSR[PR] = ‘1’). Another register, the Invalidate/Search Priority Configuration Register (ISPCR), is used for local tlbsx and tlbivax operations, and for incoming snoops resulting from external tlbivax operations. Separating the registers reduces the number of pages searched to the minimum, improving performance by reducing search latency. All three sets of registers are written by software when the UTLB is set up. For example, in user mode, if there are many 4 KB pages, several 64 KB pages, and a few 256 MB pages, the USPCR first searches using the 4 KB hash, then the 64 KB hash, and finally, the 256 MB hash. 4.6.6 Supervisor Search Priority Configuration Register (SSPCR) This register is used when MSR[PR] = ‘0’. Figure 4-3 Supervisor Search Priority Configuration Registers on page 123 illustrates how this register works. ORD1 0 1 2 ORD2 3 4 5 6 ORD3 7 8 9 ORD4 ORD5 ORD6 ORD7 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Bits Field Name 0:3 ORD1 Order 1. See Figure 4-3 on page 123 for code values and page sizes for all these fields. 4:7 ORD2 Order 2. 
8:11 ORD3 Order 3. 12:15 ORD4 Order 4. 16:19 ORD5 Order 5. 20:23 ORD6 Order 6. 24:27 ORD7 Order 7. 28:31 Reserved Memory Management Unit Page 122 of 322 Reserved Description Version 2.2 July 31, 2014 User’s Manual PowerPC 476FP Embedded Processor Core Figure 4-3. Supervisor Search Priority Configuration Registers 0 34 Order 1 78 Order 2 Binary Order Code 11 12 Order 3 15 16 Order 4 19 20 Order 5 23 24 Order 6 27 28 Order 7 31 Reserved Page Size Searched 4 KB 16 KB 64 KB 1 MB 16 MB 256 MB 1 GB Use input PID If input PID ≠ 0, the first search uses PID = 0. Then repeat the search with input PID. No search, with the exception of ‘order 1’ where ‘0000’ indicates a 4 KB page. x001 x010 x011 x100 x101 x110 x111 0xxx 1xxx 0000 Search priority order: 1. Order 1 2. Order 2 3. Order 3 4. Order 4 5. Order 5 6. Order 6 7. Order 7 Any order code having ‘0000’ stops the search. For example, if order 1 = ‘0010’, order 2 = ‘0001’, and order 3 = ‘0000’, then hardware stops the search after order 2. A value of ‘0000’ for order 1 searches using 4 KB hash. 4.6.7 Invalidate Search Priority Configuration Register (ISPCR) 0 1 2 3 4 5 6 Bits Field Name 0 Reserved 1:3 ORD1 4 Reserved 5:7 ORD2 Version 2.2 July 31, 2014 7 8 9 ORD5 ORD6 Reserved ORD4 Reserved ORD3 Reserved ORD2 Reserved ORD1 Reserved Reserved Reserved The ISPCR is used for local tlbsx and tlbivax instructions, and for incoming snoops that result from external tlbivax instructions. Figure 4-4 on page 124 illustrates how this register works. ORD7 Reserved 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Description Order 1. Order 2. Memory Management Unit Page 123 of 322 User’s Manual PowerPC 476FP Embedded Processor Core Bits Field Name 8 Reserved 9:11 ORD3 12 Reserved 13:15 ORD4 16 Reserved 17:19 ORD5 20 Reserved 21:23 ORD6 24 Reserved 25:27 ORD7 28:31 Reserved Description Order 3. Order 4. Order 5. Order 6. Order 7. Figure 4-4. 
Invalidate Search Priority Configuration Register

Binary Order Code   Page Size Searched
001                 4 KB
010                 16 KB
011                 64 KB
100                 1 MB
101                 16 MB
110                 256 MB
111                 1 GB
000                 No search, with the exception of order 1, where ‘000’ indicates a 4 KB page.

Search priority order: Order 1, Order 2, Order 3, Order 4, Order 5, Order 6, Order 7.

An order code of ‘000’ in any field other than order 1 stops the search. For example, if order 1 = ‘010’, order 2 = ‘001’, and order 3 = ‘000’, hardware stops the search after order 2. A value of ‘000’ for order 1 searches using the 4 KB hash.

4.6.8 User Search Priority Configuration Register (USPCR)

This register is used when MSR[PR] = ‘1’. Figure 4-5 User Search Priority Configuration Registers (USPCR) on page 126 illustrates how this register works.

Bits    Field Name   Description
0:3     ORD1         Order 1. See Figure 4-5 on page 126 for code values and page sizes for all these fields.
4:7     ORD2         Order 2.
8:11    ORD3         Order 3.
12:15   ORD4         Order 4.
16:19   ORD5         Order 5.
20:23   ORD6         Order 6.
24:27   ORD7         Order 7.
28:31   Reserved

Figure 4-5. User Search Priority Configuration Registers (USPCR)

Field layout: Order 1 (bits 0:3), Order 2 (bits 4:7), Order 3 (bits 8:11), Order 4 (bits 12:15), Order 5 (bits 16:19), Order 6 (bits 20:23), Order 7 (bits 24:27), Reserved (bits 28:31).

Binary Order Code   Page Size Searched
x001                4 KB
x010                16 KB
x011                64 KB
x100                1 MB
x101                16 MB
x110                256 MB
x111                1 GB
0xxx                Use the input PID.
1xxx                If the input PID ≠ 0, the first search uses PID = 0; the search is then repeated with the input PID.
0000                No search, with the exception of order 1, where ‘0000’ indicates a 4 KB page.

Search priority order: Order 1, Order 2, Order 3, Order 4, Order 5, Order 6, Order 7.

An order code of ‘0000’ in any field other than order 1 stops the search. For example, if order 1 = ‘0000’, order 2 = ‘0001’, and order 3 = ‘0000’, hardware stops the search after order 2. A value of ‘0000’ for order 1 searches using the 4 KB hash.

4.6.9 Reset Configuration Register (RSTCFG)

See Section 2.7.7 Reset Configuration (RSTCFG) on page 74.

4.6.10 MMU Configuration Register (MMUCR)

Bits    Field Name   Description
0       REALE        Real mode enable.
                     0  Address translation, storage, and permission bits are sourced from the UTLB.
                     1  Address translation, storage, and permission bits are sourced from RMPD.
1:2     LWAY         Latched way. Read only. Set by tlbwe with word select (WS) = 0: set to RA[1:2] when RA[0] = ‘1’ and RA[4] = ‘0’; set to ‘00’ when RA[4] = ‘1’, to ensure that bolted entries are placed in way 0; set to the hardware-assist value when RA[0] = ‘0’. Used as the way by tlbwe with WS = 1 and WS = 2. (RA[0:4] here numbers the bits of the low-order word of RA; they correspond to (RA)32:36 in the tlbwe pseudocode in Section 4.8.3.)
3       LVALID       Latched valid. Read only. Set to the V bit value by tlbwe with WS = 0. Cleared by tlbwe with WS = 2, or by a tlbivax matching LINDEX before tlbwe with WS = 2.
4       DULXE        Data-side user locked line exception enable.
                     0  A user mode attempt to execute dcbf does not generate an exception.
                     1  A user mode attempt to execute dcbf generates an exception.
5       IULXE        Instruction-side user locked line exception enable.
                     0  A user mode attempt to execute icbi does not generate an exception.
                     1  A user mode attempt to execute icbi generates an exception.
6       Reserved
7:14    LINDEX       Latched index address. Read only. Set by tlbwe with WS = 0. Used as the index address by tlbwe with WS = 1 and WS = 2.
15      STS          Set translation space. Set by software before tlbwe.
16:31   STID         Set translation ID. Set by software before tlbwe. Written to the UTLB TID field during tlbwe with WS = 0. Used in the index address hash during tlbsx and local tlbivax.

4.7 UTLB Block Descriptions

The UTLB design consists of the following parts:
• Tag array
• Data array
• TLB coherency

The arrays are arranged so that four entries can be read at a time.

4.7.1 Tag Array

The tag array contains the information needed to determine whether a UTLB request from the EU, ICU, DCU, or snoop interface matches a valid entry.

4.8 Software Considerations

The PowerPC 476FP UTLB is managed by software. Initializing and invalidating entries (except for invalidations caused by external tlbivax operations) requires explicit software instructions. Typical UTLB searches, which result from misses in the ITLB or DTLB, are handled by hardware. Software uses four instructions to manage the UTLB: tlbsx, tlbre, tlbwe, and tlbivax.

4.8.1 TLB Search Indexed (tlbsx)

The tlbsx instruction searches the UTLB for a matching entry using a translation space (from MMUCR[STS]), a translation ID (from MMUCR[STID]), and an effective address (EA). The UTLB is searched in the order specified in the ISPCR. If a match is found, the way and index address are returned.

The format of the tlbsx instruction is shown here (the Rc field selects whether CR0 is updated):

    tlbsx RT, RA, RB

The effective address is computed as follows:

    if RA = 0 then b ← 0 else b ← (RA)
    EA ← b + (RB)
    if Rc = 1 then
        CR[CR0[0]] ← 0
        CR[CR0[1]] ← 0
        CR[CR0[3]] ← XER[SO]

The virtual address is MMUCR[STS] || MMUCR[STID] || EA[0:n], where n = 19 for a 4 KB page.
    If a valid entry that satisfies all of the following conditions is found in the UTLB:
        MMUCR[STS] matches tlbentry[TS], and
        if tlbentry[TID] ≠ ‘0’, MMUCR[STID] matches tlbentry[TID], and
        EA[0:19] matches tlbentry[EPN(0:19)]
    then
        RT[33:34] ← way hit, driven on MMU_iwbRdData[1:2]
        RT[40:47] ← index address of the matching TLB entry, driven on MMU_iwbRdData[8:15]
        if Rc = 1 then CR[CR0[2]] ← 1
    else
        (RT) ← undefined
        if Rc = 1 then CR[CR0[2]] ← 0

4.8.2 TLB Read Entry (tlbre)

The tlbre instruction reads the contents of a UTLB entry. A tlbsx must be performed before a tlbre instruction to obtain the UTLB way and index address of the entry.

The format of the tlbre instruction is shown here:

    tlbre RT, RA, WS

(RA)33:34 specifies the way of the tlbentry to be read, and (RA)40:47 specifies its index address. Software can determine whether the entry is bolted by comparing it with the contents of the MMUBE0 and MMUBE1 registers.

    if WS = 0 then
        RT[32:60] ← tlbentry[EPN(0:19), V, TS, DSIZ(0:5), BLTD]
        if CCR0[CRPE] = 0 then
            RT[61:63] ← ‘000’
        else
            RT[61:63] ← tlbentry[EPNPar, DSIZPar, TIDPar]
        MMUCR[STID] ← tlbentry[TID]
    else if WS = 1 then
        RT[32:51] ← tlbentry[RPN]
        RT[54:63] ← tlbentry[ERPN]
        if CCR0[CRPE] = 0 then
            RT[52] ← 0
            RT[53] ← 0
        else
            RT[52] ← tlbentry[RPNPar]
            RT[53] ← tlbentry[ERPNPar]
    else if WS = 2 then
        RT[46:56] ← tlbentry[IL1I, IL1D, U(0:3), W, I, M, G, E], driven on MMU_iwbRdData[14:24]
        RT[58:63] ← tlbentry[UX, UW, UR, SX, SW, SR]
        if CCR0[CRPE] = 0 then
            RT[32] ← 0
        else
            RT[32] ← storage/permission parity
    else
        (RT), MMUCR[STID] ← undefined

4.8.3 TLB Write Entry (tlbwe)

Software uses the tlbwe instruction to place entries in the UTLB. Software can either specify the way to be written or use the hardware-assist way counters. If software specifies the way, it can also bolt the entry, which protects the entry from subsequent writes that use hardware-assisted way selection and from tlbivax (local or remote).
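The tlbwe RA operand and the tlbsx result use the register bit numbering of the pseudocode in this section, where bits 32:63 are the low-order word and bit 0 is the most significant bit. As a sketch only (the helper names below are hypothetical and are not part of any IBM tool), the following Python code packs and decodes the tlbwe way-selection fields ((RA)32 selects a software way in (RA)33:34; (RA)36 bolts the entry into way 0 with the bolted entry number in (RA)37:39) and extracts the way (RT[33:34]) and index (RT[40:47]) returned by a successful tlbsx, which are the same positions tlbre reads from RA.

```python
"""Hypothetical helpers for tlbwe/tlbsx/tlbre register fields (sketch only)."""

REG_BITS = 64  # register modeled as 64 bits, IBM numbering (bit 0 = MSB)


def get_field(reg: int, first: int, last: int) -> int:
    """Return bits first..last of reg, numbered with bit 0 as the MSB."""
    width = last - first + 1
    return (reg >> (REG_BITS - 1 - last)) & ((1 << width) - 1)


def set_field(reg: int, first: int, last: int, value: int) -> int:
    """Return reg with bits first..last replaced by value."""
    width = last - first + 1
    if not 0 <= value < (1 << width):
        raise ValueError(f"value {value} does not fit in bits {first}:{last}")
    shift = REG_BITS - 1 - last
    mask = ((1 << width) - 1) << shift
    return (reg & ~mask) | (value << shift)


def tlbwe_ra(way=None, bolted_entry=None) -> int:
    """Build a tlbwe RA value for a bolted, software-way, or hardware-way write."""
    ra = 0
    if bolted_entry is not None:
        if not 0 <= bolted_entry <= 5:             # bolted entries are '000'-'101'
            raise ValueError("bolted entry must be 0-5")
        ra = set_field(ra, 36, 36, 1)              # (RA)36 = 1: bolted, way 0
        ra = set_field(ra, 37, 39, bolted_entry)
    elif way is not None:
        ra = set_field(ra, 32, 32, 1)              # (RA)32 = 1: software way
        ra = set_field(ra, 33, 34, way)
    return ra                                      # else: way counter chooses


def tlbwe_way_select(ra: int):
    """Decode RA as the tlbwe pseudocode does: (source, way, bolted entry)."""
    if get_field(ra, 36, 36):
        return ("bolted", 0, get_field(ra, 37, 39))
    if get_field(ra, 32, 32):
        return ("software", get_field(ra, 33, 34), None)
    return ("hardware", None, None)


def tlbsx_way_index(rt: int):
    """Return (way, index) from a successful tlbsx result (RT[33:34], RT[40:47])."""
    return get_field(rt, 33, 34), get_field(rt, 40, 47)
```

For example, tlbwe_ra(way=2) returns 0xC0000000 and tlbwe_ra(bolted_entry=5) returns 0x0D000000 in this model. Keeping the decode order identical to the pseudocode ((RA)36 checked before (RA)32) makes the bolted case take precedence, as the tlbwe description requires.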
Software can also invalidate entries by using tlbwe to clear the valid bit. To invalidate a bolted entry in this way, software must either specify that the entry is bolted (RA[36] = ‘1’) or explicitly specify way 0 (RA[32:34] = ‘100’).

The format of the instruction is as follows:

    tlbwe RS, RA, WS

    if (RA)36 = 1 then
        the tlbentry way to be written is 0, and the tlbentry is bolted;
        (RA)37:39 specifies the bolted entry (‘000’ to ‘101’)
    else if (RA)32 = 1 then
        the tlbentry way is specified by (RA)33:34
    else
        the way is provided by the way counter corresponding to the index address

    if WS = 0 then
        UTLB index address ← hash of input EPN[0:19] and DSIZ with MMUCR[STID] bits 8:15
        MMUCR[LWAY], MMUCR[LINDEX], MMUCR[LVALID] ← UTLB way, index address, and valid bit
            (latched for use with WS = 1 and WS = 2)
        if V bit = ‘1’ then
            tlbentry[EPN, V, TS, DSIZ] ← RS[32:59]
                (EPN, TS, and DSIZ are written to the UTLB; the V bit is not written until WS = 2)
            tlbentry[TID] ← MMUCR[STID]
        if V bit = ‘0’ then
            tlbentry[V] ← ‘0’
    else if WS = 1 then
        tlbentry[RPN] ← RS[32:51]
        tlbentry[ERPN] ← RS[54:63]
        (written to the UTLB way and index address specified by MMUCR[LWAY] and MMUCR[LINDEX])
    else if WS = 2 then
        tlbentry[IL1I, IL1D, U(0:3), W, I, M, G, E] ← RS[46:56]
        tlbentry[UX, UW, UR, SX, SW, SR] ← RS[58:63]
        tlbentry[V] ← MMUCR[LVALID]
        (written to the UTLB way and index address specified by MMUCR[LWAY] and MMUCR[LINDEX])
    else
        tlbentry ← undefined

Hardware automatically calculates the parity bits and writes them into the tag and data arrays of the UTLB.

4.8.4 TLB Invalidate, Virtual Address Indexed (tlbivax)

The tlbivax instruction searches the UTLB and, if a match is found, invalidates the first matching entry.
The format of the tlbivax instruction is shown here:

    tlbivax RA, RB

The effective address is computed as follows:

    if RA = 0 then b ← 0 else b ← (RA)
    EA ← b + (RB)

If a valid entry that satisfies all of the following conditions is found in the UTLB, and that entry is not listed as a bolted entry in the MMUBE0 or MMUBE1 register, the entry is invalidated by clearing its V bit to 0:
• MMUCR[STS] matches tlbentry[TS], and
• if tlbentry[TID] ≠ 0, MMUCR[STID] matches tlbentry[TID], and
• EA[0:19] matches tlbentry[EPN(0:19)]

The search uses the hash search order defined by the ISPCR. Because software sets MMUCR[STID], there is no need to force a nonzero PID to 0 for the first set of searches. MMUCR[STS], MMUCR[STID], and EA[0:19] are also broadcast from the core, through the L2 cache, and onto the PLB6 bus so that other coherent processors can invalidate a matching entry.

The tlbivax instruction does not invalidate the shadow TLBs (ITLB and DTLB). This can be accomplished with an additional tlbsync instruction.

4.9 UTLB Coherency

Support for UTLB coherency involves four operations:
• Local tlbivax
• Local tlbsync (handled by the execution unit, which generates an idle signal)
• Remote tlbivax
• Remote tlbsync (handled by the L2 cache, which generates an idle signal)

The MMU responds to a local tlbivax instruction by invalidating a matching unbolted UTLB entry. The SYNC unit responds to a local tlbivax instruction by broadcasting the corresponding MMUCR[STS], MMUCR[STID], and EA[0:19] through the L2 cache and onto the PLB6 bus.

Software (specifically, the kernel) manages the PowerPC 476FP UTLB. The tlbivax and tlbsync instructions must be executed one at a time in the system: software must ensure that multiple processors never execute these instructions simultaneously.
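To make the ISPCR search order and the tlbivax matching rules concrete, the following Python sketch models a local tlbivax. It decodes the ISPCR order fields (Section 4.6.7), then, for each page size in that order, invalidates the first valid, unbolted entry whose TS, TID (an entry TID of 0 matches any STID), and EPN match. This is a behavioral model under stated assumptions, not the hardware: the real UTLB uses a hashed index rather than a linear scan, the bolted flag stands in for the MMUBE0/MMUBE1 lookup, and comparing only the EPN bits above each page size is an assumption, because this section spells out the EPN comparison only for 4 KB pages. All names are hypothetical.

```python
"""Behavioral sketch of a local tlbivax search (not the hardware hash lookup)."""
from dataclasses import dataclass

# ORD1..ORD7 bit ranges of the 32-bit ISPCR (IBM numbering, bit 0 = MSB).
ISPCR_FIELDS = [(1, 3), (5, 7), (9, 11), (13, 15), (17, 19), (21, 23), (25, 27)]
# Order codes from Figure 4-4.
CODE_TO_SIZE = {1: "4KB", 2: "16KB", 3: "64KB", 4: "1MB",
                5: "16MB", 6: "256MB", 7: "1GB"}
SIZE_TO_CODE = {size: code for code, size in CODE_TO_SIZE.items()}
# Assumption: log2 of each page size; EPN bits below the page size are not compared.
PAGE_SHIFT = {"4KB": 12, "16KB": 14, "64KB": 16, "1MB": 20,
              "16MB": 24, "256MB": 28, "1GB": 30}


def ispcr_encode(sizes) -> int:
    """Pack one to seven page sizes into ORD1..ORD7; unused fields stay '000'."""
    if not 1 <= len(sizes) <= len(ISPCR_FIELDS):
        raise ValueError("between one and seven page sizes are required")
    value = 0
    for (_, last), size in zip(ISPCR_FIELDS, sizes):
        value |= SIZE_TO_CODE[size] << (31 - last)
    return value


def ispcr_search_order(ispcr: int) -> list:
    """Decode the ISPCR into the ordered list of page-size hashes to search."""
    order = []
    for n, (_, last) in enumerate(ISPCR_FIELDS):
        code = (ispcr >> (31 - last)) & 0b111
        if code == 0:
            if n == 0:
                order.append("4KB")  # '000' in order 1 means the 4 KB hash
                continue
            break                    # '000' in orders 2-7 stops the search
        order.append(CODE_TO_SIZE[code])
    return order


@dataclass
class TlbEntry:
    epn: int              # EPN[0:19] as a 20-bit number
    ts: int               # translation space
    tid: int              # translation ID; 0 matches any STID
    size: str             # page size label, for example "4KB"
    valid: bool = True
    bolted: bool = False  # stands in for "listed in MMUBE0/MMUBE1"


def tlbivax(entries, ispcr: int, sts: int, stid: int, ea: int):
    """Invalidate the first matching unbolted entry; return it, or None."""
    ea_epn = (ea >> 12) & 0xFFFFF      # EA[0:19] of a 32-bit EA
    for size in ispcr_search_order(ispcr):
        drop = PAGE_SHIFT[size] - 12   # low EPN bits ignored for this size
        for entry in entries:
            if (entry.valid and not entry.bolted and entry.size == size
                    and entry.ts == sts
                    and (entry.tid == 0 or entry.tid == stid)
                    and (entry.epn >> drop) == (ea_epn >> drop)):
                entry.valid = False    # clear the V bit
                return entry
    return None
```

Encoding the same priority as the user-mode example in Section 4.6.5 (4 KB, then 64 KB, then 256 MB) into the 3-bit ISPCR format gives ispcr_encode(["4KB", "64KB", "256MB"]) == 0x13600000 in this model.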
4.10 tlbsync Special Operations

The master processor executes the following instruction sequence to ensure invalidation of TLB entries and to synchronize the effects of tlbivax:

    mtmmucr RS       # Set the STS and STID fields for the tlbivax operation.
    isync            # Ensure that the MMUCR update is complete.
    tlbivax RA,RB    # Invalidate a TLB entry.
    ...              # A sequence of further mtmmucr and tlbivax instructions can follow.
    isync            # Flush the local ITLB and DTLB to force synchronization with the UTLB.
    mbar             # Ensure that the subsequent tlbsync does not bypass any of the tlbivax instructions.
    tlbsync          # Synchronize remote processors and ensure that all processors complete
                     # any pending tlbivax instructions.
    msync            # Ensure that all tlbivax and tlbsync instructions complete before the
                     # next instruction starts.

4.10.1 Remote tlbsync Operation

A remote tlbsync operation is heavyweight: the L2 cache ensures that the processor completes the tlbsync instruction before the L2 cache accepts the subsequent msync instruction, and the msync is retried until the tlbsync is complete. When the tlbsync is complete, all preceding storage operations, including instruction fetches, are visible to the system. During the remote tlbsync operation, the processor ensures the completion of all preceding operations, including instruction fetches, and ensures that the context synchronization operation is completed.

4.10.1.1 CPU Remote tlbsync Operation

The remote tlbsync operation sequence is as follows:
1. The IU detects the remote tlbsync through the SYNC unit.
2. A flush clears all uncommitted instructions and stops instruction fetching; the IU prevents the ICU from fetching. Both shadow TLBs (ITLB and DTLB) are invalidated (a context switch operation).
3. A pseudo remote tlbsync operation is issued to the LRACC to mark the pseudo operation through the EU and the DCU. Because this is a pseudo operation, there is no confirm and commit.
4. All read and write operations are completed.
5. A write command of x‘F’ is issued to the SYNC unit.
6. The SYNC unit sends the command to the L2 cache, indicating that the remote tlbsync operation is complete. When the SYNC unit receives an acknowledgment from the L2 cache (or becomes available), it indicates to the IU that the remote tlbsync operation is complete.
7. The IU sends a fetch request to the ICU to resume instruction execution.

4.10.1.2 L2 Cache Remote tlbsync Operations

The L2 cache must ensure that all preceding remote tlbivax and tlbsync operations are completed by the accompanying processor before it accepts subsequent msync instructions. The remote L2 cache does not guarantee that loads and stores that might have used the previous translations have been performed.

The remote tlbsync operation sequence is as follows:
1. The L2 cache detects the remote tlbsync instruction on the PLB6.
2. The L2 cache sets a flag to mark that a remote tlbsync is in progress. Any subsequent remote msync is retried until the accompanying processor completes the tlbsync.
3. The L2 cache completes all pending requests from the accompanying processor, including instruction fetches. The L2 cache continues all normal operations except msync instructions from the PLB6.
4. The L2 cache completes the tlbsync operation by resetting the tlbsync flag when the accompanying processor returns the lwsync, x‘F’ write command.
5. The L2 cache accepts and acknowledges the remote msync.

5. Instruction and Data Caches

The PowerPC 476FP core provides separate level 1 (L1) instruction cache (I-cache) and L1 data cache (D-cache) controllers and arrays, which allow concurrent access and minimize pipeline stalls. Each cache array is 32 KB, uses 32-byte lines, and is four-way set associative.
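The cache parameters above determine the address split described in Section 5.1. The following Python sketch (hypothetical helper names, for illustration only) computes the number of sets and the index and offset widths from cache size, associativity, and line size, and converts them to IBM bit ranges (bit 0 = MSB). For a 32 KB, four-way cache with 32-byte lines it gives 256 sets, an 8-bit set index, and a 5-bit byte offset. Assuming a 42-bit real address numbered RA[0:41] (as implied by RA[37:41]), those fields fall at RA[29:36] and RA[37:41], matching Table 5-2; for a 32-bit effective address the index is EA[19:26], consistent with the icread EA format in Table 5-3.

```python
"""Sketch: derive L1 cache set/index/offset geometry (illustrative only)."""


def cache_geometry(size_bytes: int, ways: int, line_bytes: int):
    """Return (sets, index_bits, offset_bits) for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    # sets and line_bytes are powers of two, so bit_length() - 1 is log2.
    return sets, sets.bit_length() - 1, line_bytes.bit_length() - 1


def ibm_field_ranges(addr_bits: int, index_bits: int, offset_bits: int):
    """Return IBM-numbered (bit 0 = MSB) ranges of tag, set index, and offset."""
    last = addr_bits - 1
    offset = (last - offset_bits + 1, last)
    index = (offset[0] - index_bits, offset[0] - 1)
    tag = (0, index[0] - 1)
    return {"tag": tag, "index": index, "offset": offset}


def split_address(addr: int, index_bits: int, offset_bits: int):
    """Split an address into (tag, set index, byte offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset
```

For example, split_address(0x12345ABC, 8, 5) selects set 0xD5 and byte offset 0x1C of the 32-byte line.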
The PowerPC 476FP core also provides special debug instructions that can directly read the data and tag arrays. Both the instruction cache and data cache controllers interface to the level 2 (L2) cache. The L2 cache interface consists of a 256-bit shared read bus (for reads from the L2 cache) and a 128-bit write bus (for writes to the L2 cache).

Both caches support symmetric multiprocessor (SMP) coherency through a processor local bus 6 (PLB6) interconnect, and support up to eight coherent masters and processors. Both caches comply with Power Instruction Set Architecture (ISA) Version 2.05, which eases programming. Both caches are parity-protected against soft errors. If such an error is detected, the processor vectors to the machine check interrupt handler, where software can take appropriate action.

The rest of this section provides more detailed information about the operation of the instruction and data cache controllers and arrays.

5.1 Cache Array Organization and Operation

The instruction and data cache arrays are organized identically. However, the fields of the tag and data portions of the arrays differ slightly because the functions of the arrays differ. Both the instruction cache and the data cache are real address (RA) tagged. Each cache is 4-way set-associative, has 256 sets, and has a 32-byte line size.

Table 5-1 generically illustrates the ways and sets of the cache arrays, and Table 5-2 on page 134 provides specific values for the parameters used in Table 5-1.

Table 5-1.
Instruction and Data Cache Array Organization

           Way 0         Way 1          Way 2                Way 3
Set 0      Line 0        Line n         Line 2n              Line (w – 1)n
Set 1      Line 1        Line n + 1     Line (w – 2)n + 1    Line (w – 1)n + 1
  •          •             •              •                    •
Set 254    Line n – 2    Line 2n – 2    Line (w – 1)n – 2    Line wn – 2
Set 255    Line n – 1    Line 2n – 1    Line (w – 1)n – 1    Line wn – 1

As shown in Table 5-2, the tag field for each line in each way holds the high-order address bits associated with the line that currently resides in that way. The middle-order address bits form an index that selects a specific set of the cache, and the five lowest-order address bits form a byte offset that selects a specific byte (or bytes, depending on the size of the operation) from the 32-byte cache line.

Table 5-2. Cache Size and Parameters

Cache Size   Ways (w)   Sets (n)   Tag Address Bits   Set Address Bits   Byte Offset Address Bits
32 KB        4          256        RA[0:18]           RA[29:36]          RA[37:41] or EA[27:31]

The tag address bits shown in Table 5-2 refer to the RA bits and are for illustrative purposes only. Because the instruction cache is tagged with the virtual address, and the data cache is tagged with the real address, the actual tag address bits contained within each array are different. See Section 5.2.2.9 Instruction Cache Debug Tag Register Low (ICDBTRL) on page 137 and Section 5.2.2.10 Instruction Cache Debug Tag Register High (ICDBTRH) on page 138 for instruction cache tag information. See Section 5.5.22 Data Cache Debug Tag Register Low (DCDBTRL) on page 152 and Section 5.5.23 Data Cache Debug Tag Register High (DCDBTRH) on page 153 for data cache tag information.

5.2 Instruction Cache Controller

The instruction cache unit (ICU) delivers four instructions per cycle to the instruction unit (IU) of the PowerPC 476FP core.
The ICU also executes the PowerPC instruction cache management instructions, which touch (prefetch) or invalidate individual cache lines or flash-invalidate the entire cache. Resources for controlling and debugging instruction cache operation are also provided.

5.2.1 I-Cache Operations

The instruction cache accepts four types of instruction cache operations:
• icbt (instruction cache block touch), including icbtls (instruction cache block touch and lock set) and icblc (instruction cache block lock clear)
• icbi (instruction cache block invalidate)
• ici (instruction cache invalidate)
• icread (instruction cache read)

The instruction cache also supports parity operations, which are described in Section 5.2.2.

5.2.2 Instruction Cache Parity Operations

The instruction cache contains parity bits and multihit detection hardware to protect against soft data errors. The instruction cache parity logic can detect two types of errors.

In the first type, the parity bits stored in the RAM array are checked against the corresponding data in the instruction cache line when the line is read for an instruction fetch. A parity error is not signaled as a result of an icread instruction.

The second type of error is a multihit. This type of error occurs when a tag address bit is corrupted, leaving two tags in the instruction cache array that match the same input address. Multihit errors can be detected on any instruction fetch.

No parity errors of any kind are detected on speculative fetch lookups or icbt lookups. Instead, such lookups are treated as cache hits and cause no further action until an instruction fetch lookup at the offending address causes the error to be detected.
If a parity error is detected and MSR[ME] is set (that is, machine check interrupts are enabled), the processor vectors to the machine check interrupt handler. As with any machine check interrupt, after the processor vectors to the handler, MCSRR0 contains the address of the oldest uncommitted instruction in the pipeline at the time of the exception, and MCSRR1 contains the old MSR context. The interrupt handler can query the Machine Check Status Register (MCSR) to determine whether it was called because of an instruction cache parity error; if so, it must invalidate the instruction cache using the iccci instruction. The handler returns to the interrupted process using the rfmci instruction. If parity checking and machine check interrupts are enabled, instruction cache parity errors are always recoverable.

The machine check interrupt is asynchronous; that is, the return address in MCSRR0 does not point to the instruction address that contains the parity error. Instead, the machine check interrupt is taken as soon as the parity error is detected. Some instructions in progress are flushed and re-executed after the interrupt, just as if the processor were responding to an external interrupt.

5.2.2.1 Instruction Cache Block Lock Clear (icblc)

The icblc instruction clears the lock bit for a given line in the instruction cache. If the CT field is set to 0 (L1 cache), icblc clears the lock bit in the least recently used (LRU)/valid/lock array of the instruction cache. If the CT field is set to 2, icblc sends a request to clear the lock in the L2 cache. The target line remains valid in the cache.
5.2.2.2 Instruction Cache Block Invalidate (icbi)

If the block containing the byte addressed by the EA is in the instruction cache of any processor, the block (cache line) is invalidated in that instruction cache. This instruction is broadcast to all processors on the PLB.

5.2.2.3 Instruction Cache Invalidate (ici)

This instruction invalidates the entire L1 instruction cache. The ici instruction is not sent to the L2 cache, and it generates an exception if it is not executed in supervisor mode. If the CT field of the instruction is 2, the instruction is treated as a no-op.

The Power Instruction Set Architecture (ISA) specifies that software must place an isync instruction after the ici instruction so that no instructions fetched from the previous contents of the instruction cache are executed after the isync.

5.2.2.4 icbt

The icbt instruction is a hint to establish a specified line in the cache; the operation is not guaranteed to establish the line. Even if the line is established, it can later be snooped out or flushed with an icbi instruction.

When an icbt is received, the ICU checks the line fill buffers, the fetch queue, and the tag to determine whether the requested line is already in the cache. If the requested line is not in the cache, the request drops into the fetch queue.

This operation uses the CT field of the instruction. If CT = 0 and the request misses in the instruction cache, a fill buffer is requested and the request is sent to the L2 cache; no action is required for a cache hit. If CT = 2, no action is taken for the instruction cache, but the request is made to the L2 cache. Because icbt does not guarantee that the data is written to the cache, the instruction cache controller does not require a response from the L2 cache controller.
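The CT-field handling of icbt described above amounts to a small decision table. The following Python sketch is a hypothetical model (the action names are invented for illustration) that returns the ICU actions for a given CT value and for whether the line was already found in the line fill buffers, fetch queue, or tag array. This section describes only CT = 0 and CT = 2, so the sketch rejects other values rather than guessing their behavior.

```python
"""Hypothetical model of the ICU's icbt handling by CT value (sketch only)."""

FILL_BUFFER = "allocate_fill_buffer"
L2_REQUEST = "send_request_to_l2"


def icbt_actions(ct: int, line_present: bool) -> tuple:
    """Return the ICU actions for an icbt with the given CT field.

    line_present is True when the line was found in the line fill buffers,
    the fetch queue, or the instruction cache tag array.
    """
    if ct == 0:
        # L1 target: a hit needs no action; a miss requests a fill buffer
        # and forwards the request to the L2 cache.
        return () if line_present else (FILL_BUFFER, L2_REQUEST)
    if ct == 2:
        # L2 target: nothing is done in the instruction cache itself.
        return (L2_REQUEST,)
    raise ValueError(f"CT = {ct} is not described for icbt")
```

In no case does the model wait for an L2 response, which mirrors the statement that icbt is only a hint.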
5.2.2.5 icbtls

The icbtls instruction operates like icbt, except that it also locks the line in the cache. That is, along with the read request sent on a fetch, a lock signal is sent to the L2 cache. For lines that are not already in the cache, the lock bit is set on the line fill.

This instruction also uses the CT field. When CT = 0, only the line in the instruction cache is locked; if the line is already in the instruction cache, it is locked and no request is made to the L2 cache. When CT = 2, a control signal is sent to the L2 cache controller indicating that the line is also to be locked in the L2 cache. Again, the instruction cache does not require a response from the L2 cache.

5.2.2.6 icread

This instruction reads the contents of a specified physical location in the instruction cache and stores the data in debug registers. The cache controller performs no address translation or exception processing for this instruction. Because only the contents of a specific cache location are accessed, the icread request uses a modified format for the EA. Table 5-3 describes the EA format for the icread instruction.

Table 5-3. EA Format for icread

Address Bits   Description
0:16           Unused.
17:18          Instruction cache way.
19:26          Instruction cache index.
27:29          Word address within the L1 instruction cache line.
30:31          Unused.

Note: The icread instruction is not sent to the L2 cache.

Note: The PowerPC 476FP core does not automatically synchronize context between an icread instruction and the subsequent mfspr instructions that read the results of the icread instruction into general purpose registers (GPRs). To guarantee that the mfspr instructions obtain the results of the icread instruction, use a sequence such as the following:

    icread    regA,regB   # Read cache information. The contents of GPR A and GPR B are added,
                          # and the result specifies the cache line index to be read.
    isync                 # Ensure that icread completes before attempting to read the results.
    mficdbdr0 regC        # Move instruction information into GPR C.
    mficdbdr1 regD        # Move instruction information into GPR D.
    mficdbtrh regE        # Move the high portion of the tag into GPR E.
    mficdbtrl regF        # Move the low portion of the tag into GPR F.

The icread operation writes the following special purpose registers (SPRs):
• ICDBDR0
• ICDBDR1
• ICDBTRL
• ICDBTRH

5.2.2.7 Instruction Cache Debug Data Register 0 (ICDBDR0)

Bits    Description
0:31    Instruction word.

5.2.2.8 Instruction Cache Debug Data Register 1 (ICDBDR1)

Bits    Description
0:7     Instruction predecode bits.
8:9     Parity.
10:31   Reserved.

5.2.2.9 Instruction Cache Debug Tag Register Low (ICDBTRL)

Bits    Field Name   Description
0:3     LRUV         LRU valid bits. One bit for each way in the set.
4:9     LRU          LRU value for the set.
10:13   LOCK         Lock bits. One bit for each way in the set.
14:15   Reserved
16:18   LRUP         LRU parity.
                     16  Even parity for array bits [0:5].
                     17  Even parity for array bits [6:9].
                     18  Even parity for array bits [10:14].
19:27   Reserved
28      CONF         Way conflict bit.
29:31   Reserved

5.2.2.10 Instruction Cache Debug Tag Register High (ICDBTRH)

Bits    Field Name   Description
0:18    ADDR         Tag address.
19      VALID        Valid bit for this entry. This is the tag valid bit, obtained from the LRU array.
20:21   TAGP         Tag parity bits.
22:31   EXTADDR      Extended tag address.

5.2.2.11 Instruction Cache Parity Operations

Instruction cache parity checking, multihit detection, and the machine check handling of instruction cache parity errors are described in Section 5.2.2 Instruction Cache Parity Operations.

5.2.3 Speculative Prefetch

In general, all instructions are fetched speculatively. The ICU fetches (or prefetches) two streams of instructions: the sequential stream and the branch-predicted stream. Because the ICU supplies four instructions at a time, it accesses the next sequential cache line or the branch-predicted target cache line before it can determine whether the program requires those instructions. Thus, the instructions are speculatively prefetched.

The processor limits speculative fetches because overly speculative fetching can reduce cache effectiveness by replacing useful cache lines with unnecessary prefetched lines. All fetches from the L2 cache are performed using real (physical) addresses.

5.2.4 Exceptions

The instruction cache generates three types of exceptions:
• Instruction storage interrupt
• Instruction-side unified translation lookaside buffer (UTLB) miss
• Instruction-side machine check

These exceptions are passed to the IU, where the instruction is tagged with a special instruction-side exception. The instruction is then propagated to the writeback stage of the instruction pipeline as a faulty commitment performed by the IU. These three exceptions are mutually exclusive.
The faulty commitment consists of four no-op instructions passed to the decode-and-issue (DISS) stage and marked with the error. These instructions remain valid until the flush. Subsequent fetches to the L2 cache are blocked, and any data that the L2 cache returns with machine check status is not written into the cache. Snoops, however, continue to be processed.

5.2.4.1 Instruction Storage Interrupt

The instruction cache generates an instruction storage interrupt (ISI) only on an execute protection violation. This occurs when a page requested from the memory management unit (MMU) is returned without execute permission.

5.2.4.2 Instruction-Side UTLB Miss

The instruction cache generates an instruction-side UTLB miss when an MMU request for an instruction-side TLB (ITLB) entry results in a UTLB miss.

5.2.4.3 Instruction-Side Machine Check

The instruction cache generates an instruction-side machine check whenever a hardware error in the ICU is detected. This can be a read error from the L2 cache or a parity error from any of several sources. The Instruction Cache Error Syndrome Register (ICESR), an SPR in the instruction cache, differentiates between the various errors encountered. The ICESR is accessible only in supervisor mode and is cleared on a write.

5.3 ICU Special Purpose Registers

Table 5-4 lists the SPRs used in the ICU.

Table 5-4. ICU Special Purpose Registers

Register   Name                                                 Address   Read/Write   Privileged
ICESR      Instruction Cache Error Syndrome Register            x‘851’    R/W          Yes
ICDBDR0    Instruction Cache Debug Data Register, Instruction   x‘979’    R            Yes
ICDBDR1    Instruction Cache Debug Data Register, Predecode     x‘980’    R            Yes
ICDBTRL    Instruction Cache Debug Tag Register Low             x‘926’    R            Yes
ICDBTRH    Instruction Cache Debug Tag Register High            x‘927’    R            Yes

5.3.1 Instruction Cache Error Syndrome Register (ICESR)

The ICESR provides a syndrome to differentiate between the different kinds of exceptions that can generate the same interrupt type. When one of these interrupt types is generated, the bit or bits corresponding to the specific exception that caused the interrupt are set, and all other ICESR bits are cleared.

Bits    Field Name   Description
0:3     ICRDPE       Instruction cache read interface parity error. Each bit indicates which word on the data bus contains the error. Multiple bits can be set.
4:7     ICTESPE      Tag even set parity error.
8:11    ICTOSPE      Tag odd set parity error.
12      ICTAPE       Parity error in tag SRAM.
13:20   ICINDXPE     Index of the parity error in the cache. Represents bits 19:26 of the real address.
21:24   ICDAPE       Parity error in ISD.
25      ICLESPE      Parity error in LRU/valid SRAM, even set.
26      ICLOSPE      Parity error in LRU/valid SRAM, odd set.
27      ICSNPPE      Instruction cache snoop parity error. A parity error exists on the snoop request received from the L2 cache.
28      ICDAHIT      Instruction cache data array hit. This bit qualifies ICDAPE. If both ICDAPE and ICDAHIT are set, a data parity error occurred on a request that hit in the instruction cache. If only ICDAPE is set, the parity error is from a request that was serviced from the line fill buffers. If ICDAPE is not set, this bit should be ignored.
29:31   Reserved

Core Configuration Registers CCR0, CCR1, and CCR2 are provided to assist debug and function control.
See Section 2 Programming Model on page 33 for further details.

5.4 Self-Modifying Code

This example of self-modifying code illustrates the use of cache management instructions to enforce instruction cache coherency. In this example, the program executing on the PowerPC 476FP core stores new data to memory so that it can later branch to and execute that data, which consists of instructions.

The following code example shows the sequence that software must use when writing self-modifying code. The example assumes that addr1 references a cacheable memory page. It also assumes that register r3 holds the new instruction word (regN) and that register r4 holds addr1.

    stw    r3, 0(r4)   # Store the data (an instruction) in regN to addr1 in the data cache.
    dcbst  0, r4       # Write the new instruction from the data cache to memory.
    msync              # Wait until the data reaches memory.
    icbi   0, r4       # Invalidate addr1 in the instruction cache, if it exists.
    isync              # Flush any prefetched instructions within the ICU and instruction
                       # unit and refetch them. An older copy of the instruction at addr1
                       # might already have been fetched.

At this point, software can begin executing the instruction at addr1 and is guaranteed that the new instruction is recognized.

5.5 Data Cache Controller

The data cache uses a write-through policy, so all store data is written into both the data cache and the L2 cache at the same time. In addition, the data cache implements a weakly consistent storage model. It allows out-of-order loads and store-data forwarding within the processor. In general, stores are performed in order.

The PowerPC 476FP storage hierarchy has two cache levels: the L1 cache and the L2 cache. Some systems also have additional storage levels, such as an L3 cache and system memory. The L1 and L2 cache line sizes are different. The L1 cache line size is 32 bytes, and the L2 cache line size is 128 bytes.
Therefore, each L2 cache line spans four L1 cache lines (128 / 32 = 4). Cache operations are handled so that the L1 cache operates on these four lines independently of the L2 cache. These independent cache operations are transparent to users and programmers.

The data cache unit (DCU) consists primarily of three subunits: the data cache arrays, the data cache control (DCC), and the data translation lookaside buffer (DTLB).

The array subunit contains three arrays: the LRU, tag, and data arrays. The tag and data arrays are standard SRAMs. The LRU array is a smaller, dual-port register array. Both the tag and data arrays are 4-way set associative and have a pipelined, 2-cycle access. The DCU uses an LRU replacement algorithm to choose the best candidate for replacement. This algorithm combines a 6-bit age vector with way locking. The DCU can receive 256 bits of read data at once from the bus and can send up to 128 bits of write data. The data cache is nonblocking. Cache coherency is supported through the L2 interface in write-through mode.

The DCC subunit includes all of the DCU pipeline controls, cache arbitration, the SPRs, and the snoop pipe. It manages most of the data path flow and operation.

The DTLB is an 8-entry, fully associative cache that uses the EA to quickly determine the real address. It is accessed in parallel with the other three arrays. If a DTLB miss occurs, a request is made to the UTLB to calculate the real address.

The DCC also executes the PowerPC data cache management instructions. These instructions touch (prefetch), flush, invalidate, or zero cache lines, or flash-invalidate the entire cache. Resources for controlling and debugging data cache operation are also provided.

5.5.1 DCU Operations

The data cache is a nonblocking cache and operates in order.
However, load misses can be serviced out of order because L2 cache and memory latencies differ. Load completion is kept in order because all instructions are committed (allowed to complete) in order.

All loads and stores are processed based on operand alignment, in pieces that do not cross a two-word boundary. Therefore, the operation is replicated for any operand that crosses that boundary. For example, an operand that crosses a two-word boundary creates an extra pipeline cycle.

5.5.1.1 Load Operations

Load instructions that reference cacheable memory pages and miss in the data cache cause cache line read requests to be presented to the data-read PLB interface. Load operations to caching-inhibited memory pages, however, access only the bytes specifically requested, according to the type of load instruction. The architecture requires this behavior only in certain cases, but the DCU enforces it on every load to a caching-inhibited memory page. Subsequent load operations to the same caching-inhibited locations cause new requests to be sent to the data-read PLB interface. Data from caching-inhibited locations is not reused from the data cache line fill data (DCLFD) buffer.

The DCU includes four DCLFD buffers, so up to four independent data cache line fill requests can be in progress at one time. The DCU can continue to process subsequent load and store accesses while these line fills are in progress.

The DCU also includes an 8-entry load miss queue (LMQ). The LMQ holds up to eight outstanding load instructions that have either missed in the data cache or accessed caching-inhibited memory pages. A load instruction remains in the LMQ until the requested data arrives in the DCLFD buffer. The data is then delivered to the register file, and the instruction is removed from the LMQ.
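The two-word boundary rule in Section 5.5.1 can be illustrated with a small C model that counts how many pieces an access is split into. This sketch assumes that a "two-word boundary" is an 8-byte-aligned doubleword boundary and that each piece of a split access costs one pass through the DCU pipeline. The function name dcu_passes is hypothetical and is not part of any IBM software interface.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a "two-word boundary" is an 8-byte-aligned doubleword boundary. */
enum { DCU_CHUNK_BYTES = 8 };

/* Hypothetical model: number of DCU pipeline passes for an access of
 * `size` bytes starting at effective address `ea`. An access that stays
 * within one doubleword needs one pass; each boundary crossed adds one. */
unsigned dcu_passes(uint32_t ea, uint32_t size)
{
    assert(size > 0);
    uint32_t first = ea / DCU_CHUNK_BYTES;              /* doubleword of first byte */
    uint32_t last  = (ea + size - 1) / DCU_CHUNK_BYTES; /* doubleword of last byte  */
    return (unsigned)(last - first + 1);
}
```

For example, an aligned word load at 0x1004 needs one pass. A word load at 0x1006 touches bytes 0x1006 through 0x1009, crosses the boundary at 0x1008, and needs two passes.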
5.5.1.2 Store Operations

Several factors affect how the DCU processes store instructions. These include the caching-inhibited (I), write-through (W), and guarded (G) storage attributes, and whether allocation of data cache lines is enabled for cacheable store misses. There are three behaviors to consider:
• Whether a data cache line is allocated (if the line is not already in the data cache)
• Whether the data is written directly to memory or only into the data cache
• Whether the store data can be gathered with store data from previous or subsequent store instructions before being written to memory (store data is kept in order)

5.5.2 Store Gathering

In general, memory write operations caused by separate store instructions can be gathered into a single memory access of up to 16 bytes (one aligned quadword). This applies to stores that specify locations in either write-through or caching-inhibited storage. Store gathering for caching-inhibited or write-through storage can be disabled through CCR2. See CCR2[DCSTGW, DISTG] in Section 2.7.6 Core Configuration Register 2 (CCR2) on page 73.

Two store operations can be gathered together only if two conditions hold. The targeted bytes must be contained within the same aligned quadword of memory, and they must be contiguous with respect to each other. Subsequent store operations might continue to be gathered with the previously gathered sequence, subject to the same two rules: same aligned quadword, and contiguous with the previously gathered bytes. For example, three store word operations to addresses 4, 8, and 0 can all be gathered together. The first two are contiguous with each other, and the third (store word to address 0) is contiguous with the gathered combination of the previous two.
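The two gathering rules above (same aligned quadword, contiguous bytes) can be sketched as a C model of a single gathering buffer. The structure and function names are hypothetical. The model only checks whether a new store may join the bytes gathered so far. It deliberately omits the caching-inhibited nonoverlap rule, the msync/mbar separation rule, and the CCR2 controls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one gathering buffer: the byte range [lo, hi)
 * gathered so far, always inside a single 16-byte-aligned quadword. */
struct gather_buf {
    bool     valid;
    uint32_t lo;   /* first gathered byte address     */
    uint32_t hi;   /* one past the last gathered byte */
};

/* Try to merge a store of `size` bytes at `ea`; returns true if gathered. */
bool gather_store(struct gather_buf *g, uint32_t ea, uint32_t size)
{
    assert(size > 0 && size <= 16);
    uint32_t end = ea + size;                  /* one past the last stored byte */
    if ((ea >> 4) != ((end - 1) >> 4))
        return false;                          /* store spans two quadwords */
    if (!g->valid) {                           /* start a new gathered sequence */
        *g = (struct gather_buf){ true, ea, end };
        return true;
    }
    if ((ea >> 4) != (g->lo >> 4))
        return false;                          /* rule 1: same aligned quadword */
    if (end < g->lo || ea > g->hi)
        return false;                          /* rule 2: must touch the gathered bytes */
    if (ea < g->lo)
        g->lo = ea;                            /* extend the gathered range */
    if (end > g->hi)
        g->hi = end;
    return true;
}
```

With this model, the sequence 4, 8, 0 from the example is accepted and leaves bytes 0 through 11 gathered. A store word to address 12 after a lone store word to address 4 is rejected, because bytes 8 through 11 are missing.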
An additional requirement for store gathering applies to stores that target caching-inhibited memory pages. A store to a caching-inhibited page can be gathered with previous store operations only if its targeted bytes do not overlap any of the previously gathered bytes. In other words, a store to a caching-inhibited page must be both contiguous with and nonoverlapping with the previous store operations it is gathered with. This rule covers a sequence of store instructions that each target a common caching-inhibited location. It ensures that each of the resulting write operations is performed independently on that target location.

Finally, a store operation is not gathered with an earlier store operation if both of the following are true: it is separated from the earlier store by an msync or mbar instruction, and either of the two store operations references a memory page that is both guarded and caching inhibited.

5.5.3 Line Flush Operations

Because the data cache is write-through, the L1 cache and the L2 cache are always updated at the same time. Therefore, no dirty bits exist in the data cache, and flush operations are not required for the data cache itself. The dcbi, dcbf, and dcbz operations are sent to the L2 cache with the operand specified by RA. Each cache then performs the corresponding operation on its own lines. The following list describes the data cache flush operations:

dcbf   The L2 cache flushes the dirty line specified by RA to memory and invalidates the cache line. The L1 cache invalidates up to four hit cache lines within the same L2 cache line.
dcbi   The L2 cache invalidates the cache line specified by RA. The L1 cache invalidates up to four hit cache lines within the same L2 cache line.
dcbz   The L2 cache writes zeros to the entire cache line specified by RA. The L1 cache writes zeros to up to four hit cache lines within the same L2 cache line.
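Each L2 line spans four L1 lines. The replicated dcbf, dcbi, and dcbz operations can therefore be pictured as one operation per L1 line in the enclosing 128-byte L2 line. The C sketch below computes those L1 line addresses. The function name is hypothetical. The sketch also assumes that the replicated operations are issued in ascending address order, so that element 0 is the "first" operation sent to the L2 cache. This manual does not state that ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    L1_LINE_BYTES = 32,
    L2_LINE_BYTES = 128,
    L1_PER_L2     = L2_LINE_BYTES / L1_LINE_BYTES   /* 4 L1 lines per L2 line */
};

/* Fill out[] with the addresses of the L1 lines covered by the L2 line
 * that contains `ea`. Assumption: out[0] models the first replicated
 * operation, the only one forwarded to the L2 cache. */
void l1_lines_in_l2_line(uint32_t ea, uint32_t out[L1_PER_L2])
{
    assert(out != NULL);
    uint32_t l2_base = ea & ~(uint32_t)(L2_LINE_BYTES - 1);  /* 128-byte aligned */
    for (int i = 0; i < L1_PER_L2; i++)
        out[i] = l2_base + (uint32_t)i * L1_LINE_BYTES;
}
```

For example, a dcbf to 0x1234 maps to the L2 line at 0x1200 and is replicated for the L1 lines at 0x1200, 0x1220, 0x1240, and 0x1260. Only the replicas that hit in the L1 cache change L1 state.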
5.5.4 Storage Access Ordering

In general, the DCU can perform load and store operations out of order with respect to the instruction stream. That is, the memory accesses for a sequence of load and store instructions can be performed in an order different from the instruction order. For example, loads can be processed ahead of earlier stores, or stores can be processed ahead of earlier loads. Also, later loads and stores that hit in the data cache can be processed before earlier loads and stores that miss in the data cache.

The DCU enforces the requirements of the ISA sequential execution model. As a result, the net result of a sequence of load and store operations is the same as that implied by the instruction order. For example, if a later load reads the same address written by an earlier store, the DCU guarantees that the load uses the data written by the store and not the older, pre-store data. However, the memory subsystem might still detect the read access of an even later load before it detects the write access of the earlier store.

Suppose the DCU must make a read request to the data-read L2 cache interface. Suppose also that this request conflicts with an earlier write request being made to the data-write L2 cache interface, meaning the two reference one or more of the same bytes. In that case, the DCU withholds the read request from the data-read interface until the write request has been acknowledged on the data-write interface. After the write request has been acknowledged, the read request is presented. The L2 cache subsystem must then ensure that the data returned for the read request reflects the value written by the write operation.
Conversely, if a write request conflicts with an earlier read request, the DCU withholds the write request until the read request has been acknowledged.

The PowerPC architecture provides storage synchronization instructions. Software uses them to control the order in which the memory accesses of a sequence of instructions are performed. Also, if an I/O device writes data over the PLB6, the affected cache lines in the L1 and L2 caches should be snoop-invalidated.

5.5.5 Data Cache Coherency

The PowerPC 476FP data cache is write-through, and the L2 cache is write-back and inclusive of the data cache. Therefore, the L2 cache monitors all PLB6 data transactions and filters them, forwarding only the snoop invalidations that the L1 data cache needs. This reduces snoop traffic to the L1 data cache and improves data cache performance. Because the L2 cache keeps the coherency states, the data cache does not have to take part in most coherency protocol activity for the system. As a result, the data cache design emphasizes processor performance rather than system-level coherency maintenance.

5.5.6 Data Cache Control and Debug

The PowerPC 476FP core provides various registers and instructions to control data cache operation and to help debug data cache problems. For more information, see Section 2.7.4 Core Configuration Register 0 (CCR0) on page 69, Section 2.7.5 Core Configuration Register 1 (CCR1) on page 70, and Section 2.7.6 Core Configuration Register 2 (CCR2) on page 73.

5.5.7 Data Cache Management and Debug Instruction Summary

For detailed descriptions of the instructions summarized in this section, see the Power ISA Version 2.05 specification. In the instruction descriptions, the term block describes the unit of storage operated on by the cache block instructions. For the PowerPC 476FP core, this is the same as a cache line.

Software uses the following instructions to manage the data cache.
5.5.7.1 Data Cache Block Zero (dcbz)

The data cache block zero (dcbz) instruction writes zeros to the specified cache line in both the L1 and L2 caches. The L2 cache line is larger than the L1 cache line. Therefore, the DCU replicates this instruction, producing one dcbz request for each L1 cache line within the L2 cache line. Each replicated operation is treated as an independent instruction within the L-pipe. Only the first dcbz of the replicated group is broadcast to the L2 cache. All of the replicated operations search the L1 data cache, and each one that hits writes zeros to its L1 cache line. An operation that misses updates no resources in the L1 cache. It is still sent to the L2 cache if it is the first replicated operation.

The dcbz is broadcast to the L2 cache using the DCU store interface. It allocates an entry in the store buffer queue (SBQ) and generates a dcbz request to the L2 cache when it reaches the head of the queue. A dcbz generates a write access control exception if write permission does not exist.

5.5.8 Data Cache Block Lock Clear (dcblc)

This instruction unlocks a line in the L1 or L2 cache. The DCU verifies that a valid address translation exists. It then searches the tag array to determine whether that address exists in the cache. If it does, the lock bit is cleared for the matching way, but only after the instruction is committed. A lock clear operation does not invalidate the line from the cache.

The line might exist in both the L1 and L2 caches, but the dcblc unlocks it only in the cache specified by the CT field. If CT specifies the L2 cache only, the dcblc is broadcast to the L2 cache using the DCU read interface. It requires allocation of a line fill buffer and generates a no-data request to the L2 cache.
When the request is accepted, the line fill buffer is deallocated. A dcblc generates a read access control exception if read permission does not exist.

5.5.9 Data Cache Block Store (dcbst)

The dcbst instruction writes dirty data in the specified L2 cache line to memory. This instruction is effectively a no-op for the L1 cache, which is write-through only and cannot contain dirty data. The dcbst is broadcast to the L2 cache using the DCU store interface. A dcbst generates a read access control exception if read permission does not exist.

5.5.10 Data Cache Block Flush (dcbf)

In local mode, which is described in the Power ISA Version 2.05 specification, the dcbf instruction invalidates the line in processors in the system. Note that only L = 0 (local) mode is supported.

The dcbf instruction forces dirty data in the specified L2 cache line to be written to memory and then invalidates the line. The L1 cache is strictly write-through, so it contains no dirty data. Thus, in the L1 cache a dcbf only invalidates the line, and no data flush is required. The specified line might be in a caching-inhibited page. Even so, it must still be searched for and flushed from the cache if it exists.

This instruction operates on an L2 cache line, which is larger than an L1 cache line. Therefore, the DCU replicates this instruction, producing one dcbf request for each L1 cache line within the L2 cache line. Each replicated operation is treated as an independent instruction within the L-pipe. Only the first operation of the replicated group is sent to the L2 cache. All of the replicated operations search the L1 data cache, and each one that hits invalidates the cache line that is hit. An operation that misses updates no resources in the L1 cache. It is still sent to the L2 cache if it is the first replicated operation. A dcbf cannot invalidate a line in the L1 cache until it is committed.
A dcbf to a locked line clears the lock in both the L1 and L2 caches. The dcbf is broadcast to the L2 cache using the DCU store interface. It allocates an entry in the SBQ and generates a no-data request to the L2 cache.

A dcbf instruction generates an exception if any of the following conditions occur:
• The dcbf instruction is executed in user mode, and the DULXE bit (MMUCR[4]) is set.
• The target block of the dcbf does not have read permission.

5.5.11 Data Cache Block Invalidate (dcbi)

The dcbi and dcbf instructions perform the same function for both the L1 and L2 caches. That is, the dcbi also flushes dirty data in the L2 cache out to memory. The dcbi does not have an L field, but it operates in local mode, just like the dcbf instruction.

The dcbi instruction invalidates a line in the L1 and L2 caches. The L2 cache treats the dcbi as a dcbf; that is, it flushes the data and invalidates the line. The specified line might be in a caching-inhibited page. Even so, it must still be searched for and invalidated in the cache if it exists.

This instruction operates on an L2 cache line, which is larger than an L1 cache line. Therefore, the DCU replicates this instruction, producing one dcbi request for each L1 cache line within the L2 cache line. Each replicated operation is treated as an independent instruction within the L-pipe. Only the first operation of the replicated group is sent to the L2 cache. All of the replicated operations search the L1 data cache, and each one that hits invalidates the cache line that is hit. An operation that misses updates no resources in the L1 cache. It is still sent to the L2 cache if it is the first replicated operation. A dcbi cannot invalidate a line in the L1 cache until it is committed.
A dcbi to a locked line clears the lock in both the L1 and L2 caches. The dcbi is broadcast to the L2 cache using the DCU store interface. It allocates an entry in the SBQ and generates a no-data request to the L2 cache.

The dcbi generates an exception if any of the following conditions are true:
• The dcbi instruction is executed in user mode.
• The target block of the dcbi does not have read permission.

5.5.12 Data Cache Invalidate (dci)

This instruction invalidates the entire L1 or L2 data cache, based on the CT field. If the target is the L2 cache, the DCU broadcasts one dci instruction to the L2 cache. If the target is the L1 cache, the DCU generates replicated dci requests until the entire data cache is invalidated. Each dci request can invalidate two sets in the data cache. Nonfinal replicated pieces require virtual commitment before they invalidate any cache lines. The final piece requires regular commitment.

A dci instruction targets either the L1 cache or the L2 cache, not both. Performing a dci only to the L2 cache can therefore violate the inclusion of the L1 cache in the L2 cache. For this reason, the only permissible sequence is a dci to the L1 cache and then a dci to the L2 cache, in that order, followed by an isync.

The Power ISA requires software to place an msync instruction after a dci instruction. This guarantees that the dci completes before any subsequent data storage accesses are performed. For a dci to the L1 cache, however, an isync is sufficient in place of the msync. If the L2 cache is unified, invalidating the L2 cache requires software to invalidate both the instruction-side and data-side L1 caches.

The dci is broadcast to the L2 cache using the DCU store interface. It allocates an entry in the SBQ and generates a no-data request to the L2 cache. The dci generates an exception if it is not executed in supervisor mode.
5.5.13 Data Cache Block Touch (dcbt)

This instruction gives a hint that the specified line might be accessed in the future. It is considered a speculative operation, and it affects either the L1 or the L2 cache, based on the CT field. Both types of dcbt instructions send a request to the L2 cache. The difference is that an L2-only dcbt request does not require data to be returned to the L1 cache. A request can be sent to the L2 cache before commitment because the instruction is only a hint.

If CT is set to L1, the DCU searches the L1 cache for the appropriate cache line. If the search hits, the dcbt does nothing and is not broadcast to the L2 cache. If it misses, a line fill buffer is allocated, and a request for data is sent to the L2 cache. If CT is set to L2, the L1 cache is not searched. A request is sent to the L2 cache as a dcbt with no data returned.

The dcbt is broadcast to the L2 cache using the DCU read interface. It allocates a line fill buffer and generates either a data or a no-data request, based on the CT field and the L1 hit status. A dcbt does not generate an exception if read permission does not exist. Instead, the dcbt becomes a no-op and has no effect on the L1 cache. It is also a no-op if it references a caching-inhibited or guarded page.

5.5.14 Data Cache Block Touch with Lock Set (dcbtls)

This instruction can either load and lock a line in the cache or lock a line that already exists in the cache. It is not a speculative operation, and it affects either the L1 or the L2 cache, based on the CT field. The dcbtls instruction allocates a cache line if a miss occurs and then locks the line in either the L1 or L2 cache. There are several possible actions for the dcbtls:
• If CT = L1 and the dcbtls misses in the L1 cache, a line fill buffer is allocated, and the line eventually goes into the L1 cache as a locked line.
• If CT = L1, the dcbtls hits in the cache, and the line is currently unlocked, the line is locked.
• If CT = L1, the dcbtls hits in the cache, and the line is currently locked, the dcbtls is treated as a no-op.
• If CT = L2, a line fill buffer is allocated, and a no-data dcbtls request is sent to the L2 cache. This does not change anything in the L1 cache.

The dcbtls is broadcast to the L2 cache using the DCU read interface. It allocates a line fill buffer and generates either a data or a no-data request, based on the CT field and the L1 hit status.

A dcbtls generates an exception if any of the following conditions occur:
• The dcbtls instruction is executed in user mode.
• The target block of the dcbtls does not have read permission.

No exception is generated if an overlocked condition exists in the cache.

Note: A unique scenario can occur in the L1 cache in which a locked line is replaced even though the lock was recently set. When a line fill buffer is allocated, the destination way in the set is selected based on the current LRU and lock status. A subsequent dcbtls can match the cache line that the line fill buffer will eventually replace. In this scenario, the dcbtls might set the lock bit before the line fill occurs. The line fill then overwrites the entry, and the locked line is removed. To prevent this scenario, an lwsync must be placed before the dcbtls. The scenario requires all of the following conditions:
• A cacheable line fill buffer exists that will replace way X of set Y in the L1 cache.
• A dcbtls instruction hits an unlocked location in the L1 cache, way X of set Y.
• The dcbtls sets the lock bit for way X of set Y before the line fill occurs.
• When the cacheable line fill buffer is ready to perform the line fill, it replaces the locked line in way X of set Y.

5.5.15 Data Cache Block Touch for Store (dcbtst)

This instruction gives a hint that the specified line might be written in the near future.
It is considered a speculative operation, and it affects either the L1 or the L2 cache, based on the CT field. Both types of dcbtst instructions send a request to the L2 cache. The difference is that an L2-only dcbtst request does not require data to be returned to the L1 cache. A request can be sent to the L2 cache before commitment because the instruction is only a hint.

If CT is set to L1, the DCU searches the L1 cache for the appropriate cache line. If the search hits, the dcbtst instruction does nothing and is not broadcast to the L2 cache. If it misses, a line fill buffer is allocated, and a request for data is sent to the L2 cache. If CT is set to L2, the L1 cache is not searched. A request is sent to the L2 cache as a dcbtst with no data returned.

The dcbtst instruction is broadcast to the L2 cache using the DCU read interface. It allocates a line fill buffer and generates either a data or a no-data request, based on the CT field and the L1 hit status. A dcbtst instruction does not generate an exception if write permission does not exist. Instead, the dcbtst becomes a no-op and has no effect on the L1 cache. It is also a no-op if it references a caching-inhibited or guarded page.

5.5.16 Data Cache Block Touch for Store with Lock Set (dcbtstls)

This instruction can either load and lock a line in the cache or lock a line that already exists in the cache. Loading and locking is done with the expectation that a store to the specified line will occur in the near future. It is not a speculative operation, and it affects either the L1 or the L2 cache, based on the CT field. The dcbtstls instruction allocates a cache line if a miss occurs and then locks the line in either the L1 or L2 cache. There are several possible actions for the dcbtstls instruction:
• If CT = L1 and the dcbtstls misses in the L1 cache, a line fill buffer is allocated, and the line eventually goes into the L1 cache as a locked line.
• If CT = L1, the dcbtstls hits in the cache, and the line is currently unlocked, the line is locked.
• If CT = L1, the dcbtstls hits in the cache, and the line is currently locked, the dcbtstls is treated as a no-op.
• If CT = L2, a line fill buffer is allocated, and a no-data dcbtstls request is sent to the L2 cache. This does not change anything in the L1 cache.

A dcbtstls generates an exception if any of the following conditions occur:
• The dcbtstls instruction is executed in user mode.
• The target block of the dcbtstls does not have read permission.

No exception is generated if an overlocked condition exists in the cache.

The dcbtstls is broadcast to the L2 cache using the DCU read interface. It allocates a line fill buffer and generates either a data or a no-data request, based on the CT field and the L1 hit status.

Note: A unique scenario can occur in the L1 cache in which a locked line is replaced even though the lock was recently set. When a line fill buffer is allocated, the destination way in the set is selected based on the current LRU and lock status. A subsequent dcbtstls can match the cache line that the line fill buffer will eventually replace. In this scenario, the dcbtstls might set the lock bit before the line fill occurs. The line fill then overwrites the entry, and the locked line is removed. To prevent this scenario, an lwsync must be placed before the dcbtstls. The scenario requires all of the following conditions:
• A cacheable line fill buffer exists that will replace way X of set Y in the L1 cache.
• A dcbtstls instruction hits an unlocked location in the L1 cache, way X of set Y.
• The dcbtstls sets the lock bit for way X of set Y before the line fill occurs.
• When the cacheable line fill buffer is ready to perform the line fill, it replaces the locked line in way X of set Y.
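The action lists for dcbtls and dcbtstls share the same decision structure, which the C function below sketches. The enum and function names are hypothetical. The model covers only the choice of action. It does not capture the line fill timing behind the lwsync scenario in the notes, and it does not model exceptions.

```c
#include <assert.h>
#include <stdbool.h>

enum cache_target { CT_L1, CT_L2 };

enum lock_action {
    FILL_AND_LOCK_L1,    /* L1 miss: allocate a line fill buffer; line enters L1 locked */
    LOCK_EXISTING_L1,    /* L1 hit on an unlocked line: set its lock bit */
    NO_OP,               /* L1 hit on a line that is already locked */
    SEND_NODATA_TO_L2    /* allocate a line fill buffer; no-data request to L2 */
};

/* Action taken by dcbtls or dcbtstls. `l1_hit` and `l1_locked` describe
 * the L1 lookup and are ignored when ct == CT_L2 (the L1 is not changed). */
enum lock_action touch_and_lock(enum cache_target ct, bool l1_hit, bool l1_locked)
{
    assert(ct == CT_L1 || ct == CT_L2);
    if (ct == CT_L2)
        return SEND_NODATA_TO_L2;
    if (!l1_hit)
        return FILL_AND_LOCK_L1;
    return l1_locked ? NO_OP : LOCK_EXISTING_L1;
}
```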
5.5.17 Data Cache Read (dcread)

The dcread instruction reads the content of a specific physical location in the data cache and stores the data in debug registers. The DCU performs neither address translation nor exception processing for this instruction. Because only the content of a specific cache location is accessed, a dcread request uses a modified format for the EA:

EA Bits  Description
0:16     Unused.
17:18    Data cache way.
19:26    Data cache index.
27:29    Word address within the L1 data cache line.
30:31    Unused.

The dcread instruction is not broadcast to the L2 cache. The dcread instruction generates an exception if it is not executed in supervisor mode.

5.5.18 Memory Barrier Instructions

The PowerPC 476FP processor provides three memory barrier (storage barrier) instructions: msync, mbar, and lwsync. See Power ISA Version 2.05 for more details. The msync instruction is equivalent to sync L = 0, and lwsync is equivalent to sync L = 1. The mbar instruction is intended to be similar to eieio. In the PowerPC 476FP implementation, however, mbar behaves like msync. For future compatibility, use mbar wherever mbar functionality is intended. In other words, do not substitute msync for mbar.

5.5.18.1 Memory Synchronization (msync)

The msync instruction is the same as sync L = 0 in the Power ISA Version 2.05 specification. The msync instruction blocks the issue of all further instructions until the IU receives a signal from the L2 cache indicating that the operation has completed. The instruction must proceed down the L-pipe to be broadcast to the L2 cache. The DCU is responsible only for confirmation, commitment, and broadcasting the instruction to the L2 cache using the store interface.

The msync must ensure that all load and store operations have completed before it sends the request to the L2 cache.
Thus, it cannot allocate an entry in the SBQ until the LMQ, the line fill buffers, and the store hit queue are empty. It does not have to wait for the SBQ to be empty. Because the SBQ is used to send the msync to the L2 and is a FIFO queue, all store operations are guaranteed to leave the DCU before the msync. The msync must be committed before leaving the SBQ, but it can allocate an entry in the SBQ before receiving a commitment. The msync instruction also has a system synchronization function: it guarantees the completion of all preceding operations.
5.5.18.2 Memory Barrier (mbar)
As implemented in the PowerPC 476FP, the mbar instruction is the same as sync L = 0 in the Power ISA Version 2.05 specification. The mbar instruction blocks the issue of all further instructions until the IU receives a signal from the L2 indicating that the operation has completed. The instruction must proceed down the L-pipe to be broadcast to the L2. The DCU is responsible only for confirming and committing the instruction and broadcasting it to the L2 using the store interface. The mbar must ensure that all load and store operations have completed before sending the request to the L2. Thus, it cannot allocate an entry in the SBQ until the LMQ, the line fill buffers, and the store hit queue are empty. It does not have to wait for the SBQ to be empty. Because the SBQ is used to send the mbar to the L2 and is a FIFO queue, all store operations are guaranteed to leave the DCU before the mbar. The mbar must be committed before leaving the SBQ, but it can allocate an entry in the SBQ before receiving a commitment.
5.5.18.3 Lightweight Sync (lwsync)
The lwsync instruction is the same as sync L = 1 in the Power ISA Version 2.05 specification. The lwsync instruction blocks the issue of all further instructions until the operation has completed. The operation is considered complete when the request is acknowledged by the L2 cache.
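The ordering argument for msync and mbar rests on two rules. First, the barrier cannot take an SBQ entry until the LMQ, the line fill buffers, and the store hit queue are empty. Second, the SBQ is a FIFO. The following Python sketch is a deliberately simplified, hypothetical model of just those two rules. The class and method names are illustrative, and commitment and the L2 acknowledgment are not modeled.

```python
from collections import deque


class DcuBarrierModel:
    """Hypothetical, highly simplified model of the msync/mbar SBQ rules.

    Queue names follow the text (LMQ, line fill buffers, store hit queue, SBQ);
    entries are plain labels.
    """

    def __init__(self):
        self.lmq = deque()
        self.line_fill_buffers = deque()
        self.store_hit_queue = deque()
        self.sbq = deque()  # FIFO that carries stores and barriers to the L2

    def barrier_can_allocate(self):
        # LMQ, line fill buffers, and store hit queue must be empty;
        # the SBQ itself does not need to be empty.
        return not (self.lmq or self.line_fill_buffers or self.store_hit_queue)

    def issue_barrier(self, name="msync"):
        """Try to allocate an SBQ entry for msync/mbar; False means it must wait."""
        if not self.barrier_can_allocate():
            return False
        self.sbq.append(name)
        return True

    def drain_sbq(self):
        """Send SBQ entries to the L2 in FIFO order and return that order."""
        sent = []
        while self.sbq:
            sent.append(self.sbq.popleft())
        return sent
```

Because the barrier enters the same FIFO behind the pending stores, the drain order shows every earlier store leaving the DCU before the msync.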
5.5.19 Core Configuration Registers (CCR0, CCR1, and CCR2)
Core Configuration Registers CCR0, CCR1, and CCR2 are provided to assist with debug and function control. See Section 2 Programming Model on page 33 for further details.
5.5.20 dcbt and dcbtst Operation
The dcbt instruction is typically used as a hint to the processor that a particular block of data is likely to be referenced by the executing program in the near future. The processor can then begin filling that block into the data cache. When the executing program eventually performs a load from the block, the block is already present in the cache, which improves performance.
The dcbtst instruction is typically used for a similar purpose, but specifically when the executing program is likely to store to the referenced block in the near future. The difference between dcbtst and dcbt is relevant only in shared-memory systems with hardware-enforced cache coherency. In such systems, the dcbtst instruction attempts to establish the block in the data cache in such a way that the processor can most readily write to it afterward.
By default, the dcbt and dcbtst instructions are ignored if the filling of a requested cache block cannot be started immediately and waiting for it to start might stall the DCU execution pipeline. For example, the dcbt instruction is ignored if all three DCLFD buffers are already in use and execution of subsequent storage access instructions is pending. However, the dcbt and dcbtst instructions can also be used as a convenient mechanism for setting up a fixed, known environment within the data cache.
This is useful for establishing contents for cache line locking, for obtaining deterministic performance on a particular sequence of code, or for debugging low-level hardware and software problems. Because the PowerPC 476FP core supports hardware coherency, these touch instructions are not guaranteed operations. Under certain conditions, such as snooped cases, the target cache line might not be prefetched. As mentioned previously, the DCU pipeline might also be stalled.
5.5.21 dcread Operation
The dcread instruction can be used to directly read both the tag information and a specified data word in a specified entry of the data cache. The data word is read into the target GPR specified in the instruction encoding. The tag information is read into a pair of SPRs, DCDBTRH and DCDBTRL, and can subsequently be moved into GPRs using mfspr instructions.
The execution of the dcread instruction generates the equivalent of an EA, which is then used to select a specific data word from a specific cache line, as shown in Table 5-5 on page 151.
Table 5-5. Effective Address Format for icread and dcread

Address Bits   Description
0:16           Unused.
17:18          Cache way.
19:26          Cache index.
27:29          Word address within the L1 cache line.
30:31          Unused.

The EA generated by the dcread instruction must be word-aligned (that is, EA[30:31] must be 0). Otherwise, it is a programming error and the result is undefined.
If the CCR0[CRPE] bit is set, execution of the dcread instruction also loads parity information into the DCDBTRL Register. Unlike all the other parity fields, the DCDBTRL[DATAP] field loads the check values of the parity instead of the raw parity values. That is, the DATAP field always loads with zeros unless a parity error has occurred or has been inserted intentionally using the appropriate bits in CCR1.
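A debug routine that walks the data cache with dcread must build EAs in the Table 5-5 format. The Python sketch below packs a way, set index, and word number into that layout; the helper names are hypothetical. The counts of 4 ways, 256 sets, and 8 words per line follow from the 2-, 8-, and 3-bit field widths. Because dcread adds GPR A and GPR B, one plausible use is to pass 0 in one register and the packed EA in the other.

```python
WAYS, SETS, WORDS_PER_LINE = 4, 256, 8  # from the 2-, 8-, and 3-bit fields of Table 5-5


def dcread_ea(way, index, word):
    """Pack way/index/word into the Table 5-5 EA (IBM numbering: bit 0 = MSB)."""
    if not (0 <= way < WAYS and 0 <= index < SETS and 0 <= word < WORDS_PER_LINE):
        raise ValueError("field out of range")
    # EA[17:18] = way, EA[19:26] = index, EA[27:29] = word, EA[30:31] = 0.
    # IBM bit b of a 32-bit word has weight 2**(31 - b).
    return (way << 13) | (index << 5) | (word << 2)


def split_dcread_ea(ea):
    """Inverse of dcread_ea; rejects EAs that are not word-aligned."""
    if ea & 0x3:
        raise ValueError("EA[30:31] must be 0")
    return (ea >> 13) & 0x3, (ea >> 5) & 0xFF, (ea >> 2) & 0x7
```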
Execution of the dcread instruction is privileged and is intended for debugging purposes only.
Note: The dcread instruction might not provide correct information while the DCU is still performing cache operations associated with previously executed instructions, such as line fills and line flushes. Also, the PowerPC 476FP core does not automatically synchronize context between a dcread instruction and the subsequent mfspr instructions that read the results of the dcread instruction into GPRs. To guarantee that the dcread instruction operates correctly and that the mfspr instructions obtain its results, a sequence such as the following must be used:

    msync                     # Ensure that all previous cache operations have completed.
    dcread    regT,regA,regB  # Read cache information; the contents of GPR A and GPR B are
                              # added and the result is used to specify a cache line index
                              # to be read. The data word is moved into GPR T and the tag
                              # information is read into DCDBTRH and DCDBTRL.
    isync                     # Ensure dcread completes before attempting to read the results.
    mfdcdbtrh regD            # Move the high portion of the tag into GPR D.
    mfdcdbtrl regE            # Move the low portion of the tag into GPR E.

5.5.22 Data Cache Debug Tag Register Low (DCDBTRL)

Bits    Field Name   Description
0:3     LRUV         LRU valid bits. One bit for each way in the set.
4:9     LRU          The LRU value for the set.
10:13   LOCK         Lock bits. One bit for each way in the set.
14:15   Reserved
16:17   LRUP         LRU parity.
18:27   Reserved
28:31   DATAP        Data parity for the word being accessed.
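After the mfdcdbtrh/mfdcdbtrl sequence above, the two tag registers are usually split into fields. The following Python sketch uses hypothetical function names. It takes the bit positions from the DCDBTRL table above and from the DCDBTRH table in Section 5.5.23, with IBM bit numbering (bit 0 is the most significant bit). It also rebuilds the 29-bit tag RA[0:28] from EXTADDR (RA[0:9]) and ADDR (RA[10:28]).

```python
def ibm_field(value, first, last, width=32):
    """Return bits first..last of a register, using IBM numbering (bit 0 = MSB)."""
    nbits = last - first + 1
    shift = width - 1 - last
    return (value >> shift) & ((1 << nbits) - 1)


def decode_dcdbtrl(value):
    """Split a DCDBTRL value into its fields (reserved bits are dropped)."""
    return {
        "LRUV": ibm_field(value, 0, 3),     # one valid bit per way
        "LRU": ibm_field(value, 4, 9),      # LRU value for the set
        "LOCK": ibm_field(value, 10, 13),   # one lock bit per way
        "LRUP": ibm_field(value, 16, 17),   # LRU parity
        "DATAP": ibm_field(value, 28, 31),  # parity check values for the word read
    }


def decode_dcdbtrh(value):
    """Split a DCDBTRH value and rebuild the 29-bit tag RA[0:28]."""
    addr = ibm_field(value, 0, 18)      # RA[10:28]
    extaddr = ibm_field(value, 22, 31)  # RA[0:9]
    return {
        "ADDR": addr,
        "VALID": ibm_field(value, 19, 19),
        "TAGP": ibm_field(value, 20, 21),
        "EXTADDR": extaddr,
        "TAG_RA_0_28": (extaddr << 19) | addr,  # RA[0:9] followed by RA[10:28]
    }
```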
5.5.23 Data Cache Debug Tag Register High (DCDBTRH)

Bits    Field Name   Description
0:18    ADDR         Tag address. RA[10:28].
19      VALID        Valid bit for this entry.
20:21   TAGP         Tag parity bits. Parity of RA[0:28].
22:31   EXTADDR      Extended tag address. RA[0:9].

5.5.24 Data Cache Parity Operations
The data cache contains parity bits and multihit detection hardware to protect against soft data errors. Both the data cache tags and the data are protected. The data parity is byte-based: each cache line has 256 bits of data and 32 parity bits. Because the cache is real-address tagged, the tag parity is computed over the stored real address, RA[0:28]. In addition, each set has one parity bit for its six LRU bits and one parity bit for its four valid bits and four lock bits.
If a parity error is detected and MSR[ME] is set (that is, machine check interrupts are enabled), the processor vectors to the machine check interrupt handler. As for any machine check interrupt, after vectoring to the machine check handler, MCSRR0 contains the address of the oldest uncommitted instruction in the pipeline at the time of the exception, and MCSRR1 contains the old MSR context. The interrupt handler can check whether MCSR[DC] is set to determine whether it was called because of a data cache parity error. The handler is then expected either to invalidate the data cache (using dci) or to invoke the operating system to end the process or reset the processor, as appropriate. The handler returns to the interrupted process using the rfmci instruction.
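Byte-based parity can be illustrated with a short Python sketch: one even-parity bit per byte gives the 32 parity bits of a 32-byte line. The even-parity convention is taken from Section 5.5.25. The function names are hypothetical, and the sketch models only the arithmetic, not how the hardware stores or checks the bits.

```python
from typing import List

LINE_BYTES = 32  # 256 data bits per line, one parity bit per byte


def even_parity_bit(byte: int) -> int:
    """Parity bit that makes the total count of 1s (byte plus parity) even."""
    return bin(byte & 0xFF).count("1") & 1


def line_parity(line: bytes) -> List[int]:
    """Compute the 32 per-byte parity bits for one cache line."""
    if len(line) != LINE_BYTES:
        raise ValueError("expected a 32-byte cache line")
    return [even_parity_bit(b) for b in line]


def bytes_with_parity_errors(line: bytes, stored: List[int]) -> List[int]:
    """Indexes of bytes whose recomputed parity disagrees with the stored bit."""
    return [i for i, (b, p) in enumerate(zip(line, stored)) if even_parity_bit(b) != p]
```

A single flipped bit in byte 5 is reported as an error in byte 5. Two flipped bits within one byte go undetected, which is the inherent limit of per-byte parity.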
5.5.24.1 Data Cache Exception Status Register (DCESR)
The Data Cache Exception Status Register (DCESR) provides further details about data cache parity errors. It indicates which operation caused an error, which interface detected the error, and which cache line index is affected. It also reports the multihit error case.

Bits    Field Name   Description
0:3     DCRDPE       Data cache read interface parity error. The bit number indicates which word on the data bus contains the error. Multiple bits can be set.
4:7     Reserved
8:11    DCESPE       Data cache even set parity error. This field is used for tag array parity errors and can have multiple bits set. Bits 8:11 correspond to ways 0:3 of the set for which (addr[19] XOR addr[26]) = ‘0’. Errors can be reported even though the request is for an odd set.
12:15   DCOSPE       Data cache odd set parity error. This field is used for tag array parity errors and can have multiple bits set. Bits 12:15 correspond to ways 0:3 of the set for which (addr[19] XOR addr[26]) = ‘1’. Errors can be reported even though the request is for an even set.
16:22   DCINDXPE     Index of the parity error in the cache. Represents bits 20:26 of the real address. Bit 19 can be inferred from the DCESPE and DCOSPE fields.
23      DCDAPE       Data cache data array parity error. If set, the requested data has a parity error. If the request is a miss, no error is reported.
24      DCTAPE       Data cache tag array parity error. If set, at least one of the tags associated with a way in either the even or odd set has a parity error; the DCESPE and DCOSPE fields specify which way.
25      Reserved
26      DCLRUPE      Data cache LRU/valid/lock parity error. A parity error exists in either the even or odd LRU/valid/lock field for the requested set.
27      DCSNPPE      Data cache snoop parity error. A parity error exists on the snoop request received from the L2 cache.
28      DCDAHIT      Data cache data array hit. This bit qualifies DCDAPE. If both DCDAPE and DCDAHIT are set, there is a data parity error on a load request that hits in the data cache. If only DCDAPE is set, the parity error is from a request that was serviced from the line fill buffers. If DCDAPE is not set, ignore this bit.
29      DCDAAPU      Data cache data array APU. This bit qualifies DCDAPE. If both DCDAPE and DCDAAPU are set, there is a data parity error on a load request for the APU. If only DCDAPE is set, the parity error is from a CPU request. If DCDAPE is not set, ignore this bit.
30:31   Reserved

If the interrupt handler is executed before a parity error can corrupt the state of the machine, the executing process is recoverable, and the interrupt handler can invalidate the data cache and resume the process. To guarantee that all parity errors are recoverable, user code must have two characteristics. First, it must mark all cacheable data pages as write-through instead of copy-back. Second, the software-settable bit CCR0[PRE] must be set. This bit forces all load instructions to stall in the last stage of the load/store pipeline for one cycle, but only if required to ensure that parity errors are recoverable. The pipeline stall guarantees that any parity error is detected. As a result, the machine check interrupt is taken before the load instruction completes and corrupts the target GPR. Setting CCR0[PRE] degrades overall application performance. However, if a load instruction is already stalled in the last stage of the load/store pipeline for some reason unrelated to parity recoverability, CCR0[PRE] does not cause an additional stall cycle.
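A machine check handler typically decodes DCESR before deciding how to recover. The Python sketch below uses hypothetical names. It extracts the fields listed in the DCESR table and shows how address bit 19 can be inferred. The even/odd set is selected by (addr[19] XOR addr[26]), and DCINDXPE supplies bits 20:26, so bit 19 equals the set parity XOR bit 26. The sketch assumes that exactly one of DCESPE or DCOSPE is nonzero. Because errors can be reported in both fields, it returns None when both or neither are set.

```python
def ibm_field(value, first, last, width=32):
    """Return bits first..last of a register, using IBM numbering (bit 0 = MSB)."""
    nbits = last - first + 1
    return (value >> (width - 1 - last)) & ((1 << nbits) - 1)


def decode_dcesr(value):
    """Split a DCESR value into its named fields (reserved bits are dropped)."""
    return {
        "DCRDPE": ibm_field(value, 0, 3),
        "DCESPE": ibm_field(value, 8, 11),
        "DCOSPE": ibm_field(value, 12, 15),
        "DCINDXPE": ibm_field(value, 16, 22),  # address bits 20:26
        "DCDAPE": ibm_field(value, 23, 23),
        "DCTAPE": ibm_field(value, 24, 24),
        "DCLRUPE": ibm_field(value, 26, 26),
        "DCSNPPE": ibm_field(value, 27, 27),
        "DCDAHIT": ibm_field(value, 28, 28),
        "DCDAAPU": ibm_field(value, 29, 29),
    }


def error_set_index(fields):
    """Rebuild the 8-bit index (address bits 19:26), or None if ambiguous."""
    even, odd = fields["DCESPE"] != 0, fields["DCOSPE"] != 0
    if even == odd:
        return None  # both or neither reported: bit 19 cannot be inferred
    set_parity = 1 if odd else 0           # value of addr[19] XOR addr[26]
    addr26 = fields["DCINDXPE"] & 1        # LSB of DCINDXPE is address bit 26
    addr19 = set_parity ^ addr26
    return (addr19 << 7) | fields["DCINDXPE"]
```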
Note that the parity-exception machine check interrupt is asynchronous. That is, the return address in MCSRR0 does not necessarily point to the instruction that detected the parity error in the data cache. Rather, the machine check interrupt is taken as soon as the parity error is detected. Some instructions in progress can be flushed and re-executed after the interrupt, as if the machine were responding to an external interrupt.
5.5.25 Simulating Data Cache Parity Errors for Software Testing
Parity errors occur in the cache infrequently and unpredictably. Therefore, the CCR1[DCDPEI], CCR1[DCTPEI], CCR1[DCUPEI], CCR1[DCMPEI], and CCR1[FCOM] fields can be used to simulate the effect of a data cache parity error so that interrupt handling software can be exercised.
The 39 data cache parity bits in each cache line consist of one parity bit per data byte (that is, 32 parity bits per 32-byte line) plus the following parity bits:
• Two parity bits for the address tag (the valid (V) bit is not included in the parity bit calculation for the tag)
• One parity bit for the 4-bit U field on the line
• One parity bit for each of the four modified (dirty) bits on the line
There are two parity bits for the tag because the parity is calculated over alternating bits of the tag field. This guards against a single particle strike that upsets two adjacent bits. The other data bits are physically interleaved in such a way as to allow the use of a single parity bit per data byte or other field.
All parity bits are calculated and stored as the line is initially filled into the cache. In addition, the data and modified (dirty) parity bits (but not the tag and user parity bits) are updated whenever the line is updated by a store instruction or dcbz.
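The reason for two interleaved tag parity bits can be checked numerically. In this Python sketch, one parity bit covers the even-numbered tag bits and the other covers the odd-numbered bits. The exact bit-to-parity assignment used by the hardware is an assumption here; only the alternating pattern comes from the text. A strike that flips two adjacent bits leaves a single whole-field parity bit unchanged but flips both interleaved parity bits.

```python
TAG_BITS = 29  # the tag covers RA[0:28]


def interleaved_tag_parity(tag):
    """Return (parity of even-position bits, parity of odd-position bits)."""
    even = odd = 0
    for pos in range(TAG_BITS):
        bit = (tag >> pos) & 1
        if pos % 2 == 0:
            even ^= bit
        else:
            odd ^= bit
    return even, odd


def single_parity(tag):
    """One parity bit over the whole tag, for comparison."""
    return bin(tag & ((1 << TAG_BITS) - 1)).count("1") & 1
```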
Usually, parity is calculated as even parity for each set of bits to be protected, which is what the checking hardware expects. However, if any of the CCR1[DCTPEI] bits are set, the calculated parity for the corresponding bits of the tag is inverted and stored as odd parity. Likewise, if the CCR1[DCUPEI] bit is set, the calculated parity for the user bits is inverted and stored as odd parity. Similarly, if the CCR1[DCDPEI] bit is set, the parity for any data bytes that are written, either during a line fill or by execution of a store instruction, is set to odd parity. When the data stored with odd parity is subsequently loaded, it causes a parity-exception machine check interrupt and exercises the interrupt handling software. The following pseudocode is an example that uses the CCR1[DCDPEI] field to simulate a parity error on byte 0 of a target cache line:

    dcbt   <target line address>    # Get the target line into the cache.
    msync                           # Wait for the dcbt.
    mtspr  CCR1, Rx                 # Set CCR1[DCDPEI].
    isync                           # Wait for the CCR1 context to update.
    stb    <target byte address>    # Store some data at byte 0 of the target line.
    msync                           # Wait for the store to finish.
    mtspr  CCR1, Rz                 # Reset CCR1[DCDPEI].
    isync                           # Wait for the CCR1 context to update.
    lbz    <byte 0 of target line>  # Load byte causes interrupt.

If the CCR1[DCMPEI] bit is set, the parity for any modified (dirty) bits that are written, either during a line fill or by execution of a store instruction or dcbz, is set to odd parity. If the CCR1[FFF] bit is also set in addition to CCR1[DCMPEI], the parity for all four modified (dirty) bits is set to odd parity. A store access to a cache line that is already in the cache and in a memory page for which the write-through storage attribute is set does not update the modified (dirty) bits or the modified (dirty) parity bits. Thus, for these accesses, the CCR1[DCMPEI] setting has no effect.
The CCR1[FCOM] bit enables the simulation of a multihit parity error.
When set, it causes a dcbt to appear to miss, initiating a line fill even if the line is already in the cache. This bit therefore allows the same line to be filled into the cache multiple times, which generates a multihit parity error when an attempt is made to read data from those cache lines. The following pseudocode is an example that uses the CCR1[FCOM] field to simulate a multihit parity error in the data cache:

    mtspr  CCR0, Rx                 # Set CCR0[GDCBT].
    dcbt   <target line address>    # This dcbt fills a first copy of the target line, if necessary.
    msync                           # Wait for the fill to finish.
    mtspr  CCR1, Ry                 # Set CCR1[FCOM].
    isync                           # Wait for the CCR1 context to update.
    dcbt   <target line address>    # Fill a second copy of the target line.
    msync                           # Wait for the fill to finish.
    mtspr  CCR1, Rz                 # Reset CCR1[FCOM].
    isync                           # Wait for the CCR1 context to update.
    lbz    <byte 0 of target line>  # Load byte causes interrupt.

6. Timer Facilities
The PowerPC 476FP provides four timer facilities as described in the Power ISA Specification V2.05: a time base, a decrementer (DEC), a fixed-interval timer (FIT), and a watchdog timer. These facilities share the same source clock frequency and can support the following functions:
• Time of day
• General software timing
• Periodic service of peripherals
• General system maintenance
• System error recovery
Figure 6-1 shows the relationship between these facilities and the clock source.
Figure 6-1.
Relationship of Timer Facilities to the Time Base
[Figure: the time base (incrementer) is clocked from either the CPU clock (with divide-by-4, -8, and -16 paths) or an external timer clock, selected through CCR1[TSS] and CCR1[TCS]; the watchdog timer period taps TBU[31], TBL[3], TBL[7], or TBL[11]; the fixed-interval timer period taps TBL[7], TBL[11], TBL[15], or TBL[19]; DEC zero detection raises the decrementer exception.]

Table 6-1 summarizes the timer registers in the PowerPC 476FP processor.
Table 6-1. Timer Register Summary

Register Name            Short Name   Read Address   Write Address                  See Page
Time Base Lower          TBL          x‘10C’         x‘11C’                         158
Time Base Upper          TBU          x‘10D’         x‘11D’                         158
Decrementer              DEC          x‘016’         x‘016’                         159
Decrementer Autoreload   DECAR        Write only     x‘036’                         159
Timer Control Register   TCR          x‘154’         x‘154’                         163
Timer Status Register    TSR          x‘150’         x‘150’ (clear), x‘350’ (set)   164

6.1 Time Base
The time base is a 64-bit register that increments once during each period of the source clock and provides a time reference. Access to the time base is through two Special Purpose Registers (SPRs). The Time Base Upper (TBU) SPR contains the high-order 32 bits of the time base, and the Time Base Lower (TBL) SPR contains the low-order 32 bits. Software access to TBU and TBL is nonprivileged for reads but privileged for writes, so different SPR numbers are used for reading than for writing. TBU and TBL are written using the mtspr instruction and are read using the mfspr instruction.
The period of the 64-bit time base is approximately 1462 years for a 400 MHz clock source. The time base value itself does not generate any exceptions, even when it wraps. For most applications, the time base is set once at system reset and only read thereafter.
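The following Python sketch uses hypothetical names; FakeTimeBase is a stand-in for hardware, not a real API. It does three things. It combines TBU and TBL into the 64-bit time base. It checks the 1462-year figure (2^64 clocks at 400 MHz, using 365-day years). It also models the TBU, TBL, TBU retry technique used in Section 6.1.1.

```python
MASK32 = 0xFFFFFFFF
SECONDS_PER_YEAR = 365 * 24 * 3600  # 365-day years


def tb_value(tbu, tbl):
    """Combine TBU (high word) and TBL (low word) into the 64-bit time base."""
    return ((tbu & MASK32) << 32) | (tbl & MASK32)


def wrap_period_years(clock_hz):
    """Time for the 64-bit time base to wrap, in 365-day years."""
    return (1 << 64) / clock_hz / SECONDS_PER_YEAR


def read_time_base(read_tbu, read_tbl):
    """Read TBU, TBL, then TBU again; retry if TBU changed (carry out of TBL)."""
    while True:
        upper = read_tbu()
        lower = read_tbl()
        if read_tbu() == upper:
            return tb_value(upper, lower)


class FakeTimeBase:
    """Hypothetical stand-in for hardware: the count advances by 1 after every read."""

    def __init__(self, start):
        self.count = start

    def _advance(self):
        self.count = (self.count + 1) & ((1 << 64) - 1)

    def tbu(self):
        value = self.count >> 32
        self._advance()
        return value

    def tbl(self):
        value = self.count & MASK32
        self._advance()
        return value
```

Starting the fake counter two clocks below a TBL rollover forces one retry. The second pass then returns a consistent TBU:TBL pair.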
Note that fixed-interval timer and watchdog timer exceptions (discussed in Section 6.3 Fixed-Interval Timer on page 160 and Section 6.4 Watchdog Timer on page 161) are caused by ‘0’ to ‘1’ transitions of selected bits from the time base. Transitions of these bits caused by software alteration of the time base have the same effect as transitions caused by normal incrementing of the time base.
The TBL and TBU Registers are shown here.

Time Base Lower (TBL)
Bits    Field Name        Description
32:63   Time Base Lower   Low-order 32 bits of the time base.

Time Base Upper (TBU)
Bits    Field Name        Description
32:63   Time Base Upper   High-order 32 bits of the time base.

6.1.1 Reading the Time Base
The following code provides an example of reading the time base. TBU and TBL are the symbolic names for the TBU and TBL registers.

    loop:
        mfspr  Rx,TBU   # Read TBU into general purpose register (GPR) Rx.
        mfspr  Ry,TBL   # Read TBL into GPR Ry.
        mfspr  Rz,TBU   # Read TBU again, this time into GPR Rz.
        cmpw   Rz,Rx    # See if old = new.
        bne    loop     # Loop/reread if rollover occurred.

The comparison and loop ensure that a consistent pair of values is obtained.
6.1.2 Writing the Time Base
The following code provides an example of writing the time base.

        lwz    Rx,upper # Load 64-bit time base value into GPRs Rx and Ry.
        lwz    Ry,lower
        li     Rz,0     # Set GPR Rz to 0.
        mtspr  TBL,Rz   # Force TBL to 0 (thereby preventing wrap into TBU).
        mtspr  TBU,Rx   # Set TBU to initial value.
        mtspr  TBL,Ry   # Set TBL to initial value.

6.2 Decrementer and Decrementer Autoreload Registers
The Decrementer Register (DEC) is a 32-bit privileged SPR that decrements at the same rate that the time base increments. The DEC is read using mfspr and is written using mtspr.
When a nonzero value is written to the DEC, it begins to decrement with the next time base clock. A decrementer exception is signaled when a decrement occurs on a DEC count of 1, and the decrementer interrupt status (DIS) field of the Timer Status Register (TSR[DIS]; see Section 6.6 Timer Status Register on page 164) is set. A decrementer interrupt occurs if it is enabled by both the decrementer interrupt enable (DIE) field of the Timer Control Register (TCR[DIE]; see Section 6.5 Timer Control Register on page 163) and the external interrupt enable (EE) field of the Machine State Register (MSR[EE]; see Section 7.4.1 Machine State Register (MSR) on page 173). Section 7 Processor Interrupts and Exceptions on page 167 provides more information about the handling of decrementer interrupts. The decrementer interrupt handler software should clear TSR[DIS] before re-enabling MSR[EE] to avoid another decrementer interrupt caused by the same exception (unless TCR[DIE] is cleared instead).
The behavior of the DEC itself upon a decrement from a DEC value of 1 depends on which of two modes it is operating in: normal or autoreload. The mode is controlled by the autoreload enable (ARE) field of the TCR. In normal mode (TCR[ARE] = ‘0’), the DEC decrements to the value 0 and then stops decrementing until it is reinitialized by software. In autoreload mode (TCR[ARE] = ‘1’), instead of decrementing to the value 0, the DEC is reloaded with the value in the Decrementer Autoreload Register (DECAR). It then continues to decrement with the next time base clock, provided that the DECAR value was nonzero. DECAR is a 32-bit privileged, write-only SPR and is written using mtspr. The autoreload feature of the DEC is disabled upon reset and must be enabled by software.
The Decrementer Register and the Decrementer Autoreload Register (DECAR) are shown here.
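The normal and autoreload behaviors described above can be condensed into a small behavioral model. This Python sketch uses a hypothetical class and is not cycle accurate. Each tick() call applies one time base clock. The model omits the case in which an mtspr to DEC coincides with a decrement from 1.

```python
class DecrementerModel:
    """Behavioral sketch of DEC, DECAR, TCR[ARE], and TSR[DIS] (not cycle accurate)."""

    MASK = 0xFFFFFFFF

    def __init__(self):
        self.dec = 0
        self.decar = 0
        self.are = False  # TCR[ARE]; autoreload is disabled by reset
        self.dis = False  # TSR[DIS]

    def write_dec(self, value):
        # An mtspr to DEC does not by itself signal a decrementer exception.
        self.dec = value & self.MASK

    def tick(self):
        """Apply one time base clock."""
        if self.dec == 0:
            return  # stopped until software writes a nonzero value
        if self.dec == 1:
            self.dis = True  # a decrement from 1 signals the exception
            self.dec = self.decar if self.are else 0
        else:
            self.dec -= 1

    def interrupt_pending(self, tcr_die, msr_ee):
        # An interrupt requires TSR[DIS], TCR[DIE], and MSR[EE] all set.
        return bool(self.dis and tcr_die and msr_ee)
```

In autoreload mode the DEC never rests at 0: the decrement from 1 both sets TSR[DIS] and reloads DECAR, so the exception repeats with a period of DECAR clocks.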
Decrementer (DEC)
Bits    Field Name         Description
0:31    Decrementer        The decrementer holds a value that decrements with each time base clock cycle and, when autoreload is enabled, is reloaded from the Decrementer Autoreload Register.

Decrementer Autoreload (DECAR)
Bits    Field Name         Description
0:31    Autoreload value   The value in this register is copied to DEC at the next time base clock when DEC = 1 and autoreload is enabled (TCR[ARE] = ‘1’).

When an mtspr instruction forces the DEC count to 0, a decrementer exception does not occur, and thus TSR[DIS] is not set. However, a time base clock might cause a decrement from a DEC value of 1 at the same time as an mtspr instruction writes the DEC. In that case, the decrementer exception does occur, TSR[DIS] is set, and the DEC is written with the value from the mtspr.
For software to quiesce the activity of the DEC and eliminate all DEC exceptions, the following procedure should be performed:
1. Write ‘0’ to TCR[DIE]. This prevents a decrementer exception from causing a decrementer interrupt.
2. Write ‘0’ to TCR[ARE]. This disables the DEC autoreload feature.
3. Write x‘0000 0000’ to the DEC to halt decrementing. Although this action does not itself cause a decrementer exception, a decrement from a DEC value of 1 might have occurred since TSR[DIS] was last cleared.
4. Write ‘1’ to TSR[DIS] (DEC interrupt status bit). This clears the decrementer exception by setting TSR[DIS] to ‘0’. Because the DEC is no longer decrementing (it was written with x‘0000 0000’ in step 3), no further decrementer exceptions are possible.
6.3 Fixed-Interval Timer
The FIT provides a mechanism for causing regular periodic exceptions.
The FIT is typically used by system software to start a periodic system maintenance function, executed by the FIT interrupt handler.
A FIT exception occurs on a ‘0’ to ‘1’ transition of a selected bit from the time base. A FIT exception also occurs when an mtspr instruction writes a ‘1’ to the selected time base bit while that bit is ‘0’. The fixed-interval timer period (FP) field of the TCR selects one of four bits from the TBL Register, as shown in Table 6-2 on page 161.
Table 6-2. Fixed-Interval Timer Period Selection

TCR[FP]   Time Base Bit   Period (time base clocks)   Period (400 MHz clock)
00        TBL[19]         2^13 clocks                 20.48 μs
01        TBL[15]         2^17 clocks                 327.68 μs
10        TBL[11]         2^21 clocks                 5.2 ms
11        TBL[7]          2^25 clocks                 83.9 ms

When a fixed-interval timer exception occurs, the exception status is recorded by setting the fixed-interval timer interrupt status (FIS) bit of the TSR to ‘1’. A fixed-interval timer interrupt occurs if it is enabled by both the fixed-interval timer interrupt enable (FIE) field of the TCR and MSR[EE]. Section 7.5.11 Fixed-Interval Timer Interrupt on page 198 provides more information about the handling of fixed-interval timer interrupts. The fixed-interval timer interrupt handler software should clear TSR[FIS] before re-enabling MSR[EE] to avoid another fixed-interval timer interrupt caused by the same exception (unless TCR[FIE] is cleared instead).
6.4 Watchdog Timer
The watchdog timer provides a mechanism for system error recovery in case the program running on the PowerPC 476FP processor has stalled and cannot be interrupted by the normal interrupt mechanism. The watchdog timer can be configured to cause a critical-class watchdog timer interrupt upon the expiration of a single period of the watchdog timer.
It can also be configured to start a processor-initiated reset upon the expiration of a second period of the watchdog timer.
A watchdog timer exception occurs on a ‘0’ to ‘1’ transition of a selected bit from the time base. A watchdog timer exception also occurs when an mtspr instruction writes a ‘1’ to the selected time base bit while that bit is ‘0’. The watchdog timer period (WP) field of the TCR selects one of four bits from the time base, as shown in Table 6-3 on page 161.
Table 6-3. Watchdog Timer Period Selection

TCR[WP]   Time Base Bit   Period (time base clocks)   Period (400 MHz clock)
00        TBL[11]         2^21 clocks                 5.2 ms
01        TBL[7]          2^25 clocks                 83.9 ms
10        TBL[3]          2^29 clocks                 1.34 s
11        TBU[31]         2^33 clocks                 21.47 s

The action taken upon a watchdog timer exception depends on the enable next watchdog (ENW) and watchdog timer interrupt status (WIS) fields of the TSR at the time of the exception. When TSR[ENW] = ‘0’, the next watchdog timer exception is disabled, and the only action taken upon the exception is to set TSR[ENW] to ‘1’. By clearing TSR[ENW], software can guarantee that the time until the next enabled watchdog timer exception is at least one full watchdog timer period and at most two full watchdog timer periods.
When TSR[ENW] = ‘1’, the next watchdog timer exception is enabled, and the action taken upon the exception depends on the value of TSR[WIS] at the time of the exception. If TSR[WIS] = ‘0’, the action is to set TSR[WIS] to ‘1’. A watchdog timer interrupt then occurs if it is enabled by both the watchdog timer interrupt enable (WIE) field of the TCR and the critical interrupt enable (CE) field of the MSR.
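The periods in Tables 6-2 and 6-3 follow from which time base bit is tapped. With bit 0 as the most significant bit, TBL[b] has weight 2^(31-b) and TBU[b] has weight 2^(63-b). A bit of weight 2^k rises from 0 to 1 once every 2^(k+1) clocks. The Python sketch below (hypothetical names) reproduces the table entries for a 400 MHz clock.

```python
CLOCK_HZ = 400e6  # the 400 MHz clock used in Tables 6-2 and 6-3

# TCR field code -> (time base register, bit); bits numbered 0..31 with bit 0 = MSB.
FIT_TAPS = {0b00: ("TBL", 19), 0b01: ("TBL", 15), 0b10: ("TBL", 11), 0b11: ("TBL", 7)}
WDT_TAPS = {0b00: ("TBL", 11), 0b01: ("TBL", 7), 0b10: ("TBL", 3), 0b11: ("TBU", 31)}


def period_clocks(reg, bit):
    """Time base clocks between successive 0->1 transitions of TBL[bit] or TBU[bit]."""
    weight_exp = (31 - bit) + (32 if reg == "TBU" else 0)  # bit weight is 2**weight_exp
    # A bit of weight 2**k rises from 0 to 1 once every 2**(k + 1) increments.
    return 1 << (weight_exp + 1)


def period_seconds(taps, code, clock_hz=CLOCK_HZ):
    """Period in seconds for a TCR[FP] or TCR[WP] code."""
    reg, bit = taps[code]
    return period_clocks(reg, bit) / clock_hz
```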
The watchdog timer interrupt handler software should clear TSR[WIS] before re-enabling MSR[CE] to avoid another watchdog timer interrupt caused by the same exception (unless TCR[WIE] is cleared instead). Section 7.5.12 Watchdog Timer Interrupt on page 199 provides more information about the handling of watchdog timer interrupts.
If TSR[WIS] is already ‘1’ at the time of the next watchdog timer exception, the action taken depends on the value of the watchdog timer reset control (WRC) field of the TCR. If TCR[WRC] is nonzero and a watchdog timer exception occurs, the value of the TCR[WRC] field is copied into the watchdog timer reset status (WRS) field of the TSR, TCR[WRC] is cleared, and a core reset occurs. See Section 9.1 Processor Core State after Reset on page 243 for more information about core behavior when reset.
Note: After software has set TCR[WRC] to a nonzero value, software cannot clear it. This feature prevents errant software from disabling the watchdog timer reset capability.
Table 6-4 summarizes the action taken upon a watchdog timer exception according to the values of TSR[ENW] and TSR[WIS].
Table 6-4. Watchdog Timer Exception Behavior

TSR[ENW]   TSR[WIS]   Action upon Watchdog Timer Exception
0          0          Set TSR[ENW] to ‘1’.
0          1          Set TSR[ENW] to ‘1’.
1          0          Set TSR[WIS] to ‘1’. If watchdog timer interrupts are enabled (TCR[WIE] = ‘1’ and MSR[CE] = ‘1’), an interrupt occurs.
1          1          Cause the watchdog timer reset action specified by TCR[WRC]. A reset causes TCR[WRC] to be copied into TSR[WRS] and then clears TCR[WRC].

A typical system use of the watchdog timer function is to enable the watchdog timer interrupt and the watchdog timer reset function in the TCR (and MSR), and to start with both TSR[ENW] and TSR[WIS] cleared. A recurring software loop of reliable duration (or, alternatively, the interrupt handler for a periodic interrupt such as the fixed-interval timer interrupt) can perform a periodic check of system integrity.
Upon successful completion of the system check, software clears TSR[ENW], thereby ensuring that a minimum of one full watchdog timer period and a maximum of two full watchdog timer periods must expire before an enabled watchdog timer exception occurs. If for some reason the recurring software loop is not successfully completed (and TSR[ENW] does not get cleared) during this period of time, an enabled watchdog timer exception occurs. The exception sets TSR[WIS], and a watchdog timer interrupt occurs (if enabled by both TCR[WIE] and MSR[CE]).

The occurrence of a watchdog timer interrupt in this software-serviced system is interpreted as a system error, because the system was unable to complete the periodic system integrity check in time to avoid the watchdog timer exception. The action taken by the watchdog timer interrupt handler is system-dependent, but typically the software attempts to determine the nature of the problem and correct it if possible. If and when the system attempts to resume operation, the software typically clears both TSR[WIS] and TSR[ENW], thus providing a minimum of another full watchdog timer period for a new system integrity check to occur.

Finally, if the watchdog timer interrupt is disabled, or the watchdog timer interrupt handler fails to clear TSR[WIS] and TSR[ENW] before another watchdog timer exception occurs, the next exception causes a processor reset operation according to the value of TCR[WRC].

Figure 6-2 illustrates the sequence of watchdog timer events that typically occurs in a system.

Figure 6-2. Watchdog State Machine
• TSR[ENW,WIS] = ‘00’: The watchdog timer exception is disabled. The next exception sets TSR[ENW] so that a subsequent exception can set TSR[WIS].
• TSR[ENW,WIS] = ‘10’: The watchdog timer exception is enabled. The next exception sets TSR[WIS] and causes an interrupt if enabled by TCR[WIE] and MSR[CE]. The software loop returns to ‘00’ by clearing TSR[ENW].
• TSR[ENW,WIS] = ‘11’: The watchdog timer exception is enabled, and the first exception status is still set. The next exception causes a reset if enabled by TCR[WRC]: if TCR[WRC] ≠ ‘00’, a reset is performed; otherwise, nothing happens. The watchdog timer interrupt handler returns to ‘00’ by clearing TSR[ENW,WIS].
• TSR[ENW,WIS] = ‘01’: The watchdog timer exception is disabled, but TSR[WIS] is already set. This state should not occur. The next exception sets TSR[ENW].

6.5 Timer Control Register
The Timer Control Register (TCR) is a privileged SPR that controls DEC, FIT, and watchdog timer operation. The TCR is read into a General Purpose Register (GPR) by using mfspr, and is written from a GPR by using mtspr.

The WRC field of the TCR is cleared to zero by a processor reset (see Section 9.1 Processor Core State after Reset on page 243). Each bit of this 2-bit field is set only by software and is cleared only by hardware. For each bit of the field, after software has written it to a ‘1’, that bit remains ‘1’ until a processor reset occurs. This prevents errant code from disabling the watchdog timer reset function.

The ARE bit of the TCR is also cleared to ‘0’ by a processor reset. This disables the autoreload feature of the DEC.

Bits    Field Name   Description
32:33   WP           Watchdog timer period.
                     00 2^21 time base clocks.
                     01 2^25 time base clocks.
                     10 2^29 time base clocks.
                     11 2^33 time base clocks.
34:35   WRC          Watchdog timer reset control. This field is reset to ‘00’. This field specifies the type of reset that is generated when a watchdog timer exception occurs with TSR[ENW,WIS] = ‘11’. This field can be set by software, but cannot be cleared by software, except by a software-induced reset.
                     00 No watchdog timer reset.
                     01 Processor core reset.
                     10 Chip reset.
                     11 System reset.
36      WIE          Watchdog timer interrupt enable.
                     0 Disable the watchdog timer interrupt.
                     1 Enable the watchdog timer interrupt.
37      DIE          Decrementer interrupt enable.
                     0 Disable decrementer interrupt.
                     1 Enable decrementer interrupt.
38:39   FP           FIT period.
                     00 2^13 time base clocks.
                     01 2^17 time base clocks.
                     10 2^21 time base clocks.
                     11 2^25 time base clocks.
40      FIE          FIT interrupt enable.
                     0 Disable FIT interrupt.
                     1 Enable FIT interrupt.
41      ARE          Autoreload enable. This bit is reset to ‘0’.
                     0 Disable autoreload.
                     1 Enable autoreload.
42:63   Reserved

6.6 Timer Status Register
The Timer Status Register (TSR) is a privileged SPR that records the status of DEC, FIT, and watchdog timer events. The fields of the TSR are generally set to ‘1’ only by hardware and are cleared to ‘0’ only by software. Hardware cannot clear any fields in the TSR, nor can software set any fields. Software can read the TSR into a GPR by using the mfspr instruction. Software clears TSR bits by using the mtspr instruction with a ‘1’ in the GPR source register in each bit position to be cleared in the TSR, and a ‘0’ in all other bit positions. The data written from the GPR to the TSR is therefore not direct data, but a mask: a ‘1’ clears the corresponding TSR bit, and a ‘0’ leaves it unchanged.

Bits    Field Name   Description
32      ENW          Enable next watchdog timer exception.
                     0 The action on the next watchdog timer exception is to set TSR[ENW] = ‘1’.
                     1 The action on the next watchdog timer exception is governed by TSR[WIS].
                     See Table 6-4 on page 162 for information about the action taken.
33      WIS          Watchdog timer interrupt status.
                     0 The watchdog timer exception has not occurred.
                     1 The watchdog timer exception has occurred.
34:35   WRS          Watchdog timer reset status.
                     00 No watchdog timer reset has occurred.
                     01 A core reset was forced by the watchdog timer.
                     10 A chip reset was forced by the watchdog timer.
                     11 A system reset was forced by the watchdog timer.
36      DIS          Decrementer interrupt status.
                     0 A decrementer exception has not occurred.
                     1 A decrementer exception has occurred.
37      FIS          Fixed-interval timer interrupt status.
                     0 A fixed-interval timer exception has not occurred.
                     1 A fixed-interval timer exception has occurred.
38:63   Reserved     Reserved.

6.7 Halting the Timer Facilities
The debug mechanism provides a means for temporarily halting the timers upon a debug exception. Whenever a debug exception is recorded in the Debug Status Register (DBSR), the time base can be prevented from incrementing, and the decrementer can be prevented from decrementing. This allows a debugger to simulate the appearance of real-time operation, even though the application has been temporarily stopped to service the debug event. Section 8.5.1 Debug Control Register 0 (DBCR0) on page 235 describes the use of the freeze timers (FT) bit.

6.8 Selection of the Timer Clock Source
The timer clock source is selected by CCR1[TSS], which chooses either the CPU clock or CPMC476TIMERCLOCK as the timer source.

The timer clock select (TCS) field of Core Configuration Register 1 (CCR1) determines the frequency at which the timers run. When CCR1[TCS] = ‘00’, the timers run at the full frequency of the source clock selected by CCR1[TSS]; this is the highest-frequency timer clock setting. When CCR1[TCS] = ‘01’, a quarter-rate processor clock is selected as the timer clock. Other TCS settings select other submultiples of the processor clock. See Section 2.7.5 Core Configuration Register 1 (CCR1) on page 70 for more information about the TSS and TCS fields.
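The following C sketch is a software model, not a driver: it mirrors the watchdog behavior in Table 6-4, the write-1-to-clear semantics of an mtspr to the TSR, and the sticky TCR[WRC] bits. The structure and function names are hypothetical, bits are numbered 32 to 63 as in the register tables above (architected bit n maps to mask 1u << (63 - n)), and the other effects of a reset are not modeled. On real hardware, these registers are accessed with mfspr and mtspr.

```c
#include <assert.h>
#include <stdint.h>

/* Field masks for 32-bit registers whose bits are numbered 32..63. */
#define TSR_ENW       (1u << 31)   /* TSR bit 32 */
#define TSR_WIS       (1u << 30)   /* TSR bit 33 */
#define TSR_WRS_SHIFT 28           /* TSR bits 34:35 */
#define TSR_WRS       (3u << TSR_WRS_SHIFT)
#define TCR_WRC_SHIFT 28           /* TCR bits 34:35 */
#define TCR_WRC       (3u << TCR_WRC_SHIFT)
#define TCR_WIE       (1u << 27)   /* TCR bit 36 */
#define MSR_CE        (1u << 17)   /* MSR bit 46 */

typedef enum {
    WDT_SET_ENW,    /* ENW was 0: only TSR[ENW] is set */
    WDT_SET_WIS,    /* WIS set, interrupt masked by TCR[WIE] or MSR[CE] */
    WDT_INTERRUPT,  /* WIS set and a watchdog timer interrupt is taken */
    WDT_RESET,      /* WRC copied to WRS, WRC cleared, reset requested */
    WDT_NO_ACTION   /* ENW = WIS = 1 but TCR[WRC] = 00 */
} wdt_action;

typedef struct {
    uint32_t tsr, tcr, msr;
} timer_model;

/* mtspr to TSR: each 1 in the source clears that TSR bit; each 0 leaves it alone. */
static void tsr_write(timer_model *m, uint32_t clear_mask)
{
    m->tsr &= ~clear_mask;
}

/* mtspr to TCR: any WRC bit that is already 1 stays 1. */
static void tcr_write(timer_model *m, uint32_t value)
{
    m->tcr = value | (m->tcr & TCR_WRC);
}

/* Hardware action on one watchdog timer exception, per Table 6-4. */
static wdt_action wdt_exception(timer_model *m)
{
    assert(m);
    if (!(m->tsr & TSR_ENW)) {              /* ENW,WIS = 00 or 01 */
        m->tsr |= TSR_ENW;
        return WDT_SET_ENW;
    }
    if (!(m->tsr & TSR_WIS)) {              /* ENW,WIS = 10 */
        m->tsr |= TSR_WIS;
        return ((m->tcr & TCR_WIE) && (m->msr & MSR_CE)) ? WDT_INTERRUPT
                                                          : WDT_SET_WIS;
    }
    /* ENW,WIS = 11: reset only if TCR[WRC] is nonzero */
    uint32_t wrc = (m->tcr & TCR_WRC) >> TCR_WRC_SHIFT;
    if (wrc == 0)
        return WDT_NO_ACTION;
    m->tsr = (m->tsr & ~TSR_WRS) | (wrc << TSR_WRS_SHIFT);
    m->tcr &= ~TCR_WRC;                     /* cleared by hardware only */
    return WDT_RESET;
}
```

Calling wdt_exception repeatedly on a model with TCR[WIE] and MSR[CE] set walks through the states of Figure 6-2: the first exception sets ENW, the second sets WIS and requests an interrupt, and a third (if the TSR has not been cleared in between) copies WRC into WRS and requests the reset.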
7. Processor Interrupts and Exceptions
This section begins by defining the terminology and classification of interrupts and exceptions in Section 7.1 Overview on page 167 and Section 7.2 Interrupt Classes on page 167. Section 7.3 Interrupt Processing on page 170 explains in general how interrupts are processed, including the requirements for partial execution of instructions. Several registers support interrupt handling and control; Section 7.4 Interrupt Processing Registers on page 173 describes these registers. Table 7-2 on page 183 lists the interrupts and exceptions handled by the PowerPC 476FP core, in the order of Interrupt Vector Offset Register (IVOR) usage. Detailed descriptions of each interrupt type follow, in the same order. Finally, Section 7.6 Interrupt Ordering and Masking on page 207 and Section 7.7 Exception Priorities on page 210 define the priority order for the processing of simultaneous interrupts and exceptions.

7.1 Overview
An interrupt is the action in which the processor saves its old context (Machine State Register [MSR] and next instruction address) and begins execution at a predetermined interrupt-handler address with a modified MSR. Exceptions are the events that will, if enabled, cause the processor to take an interrupt. Exceptions are generated by signals from internal and external peripheral devices, instructions, the internal timer facility, debug events, or error conditions.

Interrupts are divided into four classes, and no program state is lost when they are processed. However, because the Save/Restore Register pairs SRR0/SRR1, CSRR0/CSRR1, and MCSRR0/MCSRR1 are serially reusable resources used by base (noncritical), critical, and machine check interrupts, respectively, program state might be lost when an unordered interrupt is taken.
All interrupts, except machine check, are context synchronizing. A machine check interrupt acts like a context synchronizing operation with respect to subsequent instructions. Exceptions might be generated by the execution of instructions or by signals from devices external to the PowerPC 476FP core, the internal timer facilities, debug events, or error conditions.

7.2 Interrupt Classes
All interrupts, except for machine check, can be categorized according to two independent characteristics of the interrupt:
• Asynchronous or synchronous
• Critical or noncritical

7.2.1 Asynchronous Interrupts
Asynchronous interrupts are caused by events that are independent of instruction execution. For asynchronous interrupts, the address reported to the interrupt handling routine is the address of the instruction that would have executed next, had the asynchronous interrupt not occurred.

7.2.2 Synchronous Interrupts
Synchronous interrupts are caused directly by the execution (or attempted execution) of instructions, and are further divided into two classes: precise and imprecise. Synchronous, precise interrupts precisely indicate the address of the instruction causing the exception that generated the interrupt or, for certain synchronous, precise interrupt types, the address of the immediately following instruction. Synchronous, imprecise interrupts can indicate either the address of the instruction that caused the exception that generated the interrupt or the address of some instruction after the one that caused the exception.
7.2.2.1 Synchronous, Precise Interrupts When the execution or attempted execution of an instruction causes a synchronous, precise interrupt, the following conditions exist when the associated interrupt handler begins execution: • SRR0 (see Section 7.4.2 Save/Restore Register 0 (SRR0) on page 174) or CSRR0 (see Section 7.4.4 Critical Save/Restore Register 0 (CSRR0) on page 175) addresses either the instruction that caused the exception that generated the interrupt or the instruction immediately following this instruction. Which instruction is addressed can be determined from a combination of the interrupt type and the setting of certain fields of the ESR (see Section 7.4.11 Exception Syndrome Register (ESR) on page 179). • The interrupt is generated such that all instructions preceding the instruction that caused the exception appear to have completed with respect to the executing processor. However, some storage accesses associated with these preceding instructions might not have been performed with respect to other processors and mechanisms. • The instruction that caused the exception might appear not to have begun execution (except for having caused the exception), might have been partially executed, or might have completed, depending on the interrupt type (see Section 7.3.1 Partially Executed Instructions on page 172). • Architecturally, no instruction beyond the one that caused the exception has executed. 7.2.2.2 Synchronous, Imprecise Interrupts When the execution or attempted execution of an instruction causes a synchronous, imprecise interrupt, the following conditions exist when the associated interrupt handler begins execution: • SRR0 or CSRR0 addresses either the instruction that caused the exception that generated the interrupt, or some instruction following this instruction. • The interrupt is generated such that all instructions preceding the instruction addressed by SRR0 or CSRR0 appear to have completed with respect to the executing processor. 
• If the imprecise interrupt is forced by the context synchronizing mechanism due to an instruction that causes another exception that generates an interrupt (for example, alignment or data storage), SRR0 addresses the interrupt-forcing instruction, and the interrupt-forcing instruction might have been partially executed (see Section 7.3.1 Partially Executed Instructions on page 172).
• If the imprecise interrupt is forced by the execution synchronizing mechanism due to executing an execution synchronizing instruction other than msync or isync, SRR0 or CSRR0 addresses the interrupt-forcing instruction, and the interrupt-forcing instruction appears not to have begun execution (except for its forcing the imprecise interrupt). If the imprecise interrupt is forced by an msync or isync instruction, SRR0 or CSRR0 can address either the msync or isync instruction, or the following instruction.
• If the imprecise interrupt is not forced by either the context synchronizing mechanism or the execution synchronizing mechanism, the instruction addressed by SRR0 or CSRR0 might have been partially executed (see Section 7.3.1 Partially Executed Instructions on page 172).
• No instruction following the instruction addressed by SRR0 or CSRR0 has executed.

The only synchronous, imprecise interrupts in the PowerPC 476FP core are the special cases of delayed interrupts, which can result when certain kinds of exceptions occur while the corresponding interrupt type is disabled. The first of these is the floating-point enabled exception type program interrupt.
For this type of interrupt to occur, a floating-point unit must be attached to the auxiliary processor interface of the PowerPC 476FP core, and the floating-point enabled exception summary bit of the Floating-Point Status and Control Register (FPSCR[FEX]) must be set while floating-point enabled exception type program interrupts are disabled due to MSR[FE0,FE1] both being ‘0’. When such interrupts are subsequently enabled by setting both of MSR[FE0,FE1] to ‘1’ while FPSCR[FEX] is still ‘1’, a synchronous, imprecise form of floating-point enabled exception type program interrupt occurs, and SRR0 is set to the address of the instruction that would have executed next (that is, the instruction after the one that updated MSR[FE0,FE1]). If the MSR was updated by an rfi, rfci, or rfmci instruction, SRR0 will be set to the address to which the rfi, rfci, or rfmci was returning and not to the instruction address that is sequentially after the rfi, rfci, or rfmci. The second type of delayed interrupt that is handled as a synchronous, imprecise interrupt is the debug interrupt. Similar to the floating-point enabled exception type program interrupt, the debug interrupt can be temporarily disabled by an MSR bit, MSR[DE]. Accordingly, certain kinds of debug exceptions can occur and be recorded in the DBSR while MSR[DE] = ‘0’, and later lead to a delayed debug interrupt if MSR[DE] is set to ‘1’ while a debug exception is still set in the DBSR. When this occurs, the interrupt will either be synchronous and imprecise, or it will be asynchronous, depending on the type of debug exception causing the interrupt. In either case, CSRR0 is set to the address of the instruction that would have executed next (that is, the instruction after the one that set MSR[DE] to ‘1’). If MSR[DE] is set to ‘1’ by rfi, rfci, or rfmci, CSRR0 is set to the address to which the rfi, rfci, or rfmci was returning, and not to the address of the instruction that was sequentially after the rfi, rfci, or rfmci. 
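The address rule for a delayed debug interrupt can be sketched in C as follows. This is a hypothetical model (the function and enumeration names are illustrative, not part of any IBM API) that assumes dbsr_events holds only DBSR event bits and that instructions are 4 bytes long, so the instruction after an mtmsr is at its address plus 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_DE (1u << 9)   /* MSR bit 54 (bits numbered 32..63) */

/* How MSR[DE] was written: by an ordinary instruction such as mtmsr,
   or by a return-from-interrupt instruction (rfi, rfci, rfmci). */
typedef enum { MSR_WRITE_SEQUENTIAL, MSR_WRITE_RETURN } msr_write_kind;

/* Returns true if setting MSR[DE] triggers a delayed debug interrupt, and
   stores in *csrr0 the address of the instruction that would execute next. */
static bool delayed_debug_interrupt(uint32_t old_msr, uint32_t new_msr,
                                    uint32_t dbsr_events, msr_write_kind kind,
                                    uint32_t writer_addr, uint32_t return_target,
                                    uint32_t *csrr0)
{
    assert(csrr0);
    bool enabling = !(old_msr & MSR_DE) && (new_msr & MSR_DE);
    if (!enabling || dbsr_events == 0)
        return false;                      /* nothing pending or DE already set */
    /* After mtmsr, the next instruction is the sequential one; after
       rfi/rfci/rfmci, it is the instruction being returned to. */
    *csrr0 = (kind == MSR_WRITE_RETURN) ? return_target : writer_addr + 4u;
    return true;
}
```

The same "next instruction" rule applies to SRR0 for the delayed floating-point enabled exception type program interrupt described above.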
Besides these special cases of program and debug interrupts, all other synchronous interrupts are handled precisely by the PowerPC 476FP core, including floating-point enabled exception type program interrupts even when the processor is operating in one of the architecturally defined imprecise modes (MSR[FE0,FE1] = ‘01’ or ‘10’); in these modes, the core generates a precise interrupt. See Section 7.5.7 Program Interrupt on page 193 and Section 7.5.15 Debug Interrupt on page 201 for a more detailed description of these interrupt types, including both the precise and imprecise cases.

7.2.3 Critical and Noncritical Interrupts
Interrupts can also be classified as critical or noncritical interrupts. Certain interrupt types demand immediate attention, even if other interrupt types are currently being processed and have not yet saved the state of the machine (that is, the return address and captured state of the MSR). To enable taking a critical interrupt immediately after a noncritical interrupt has occurred (that is, before the state of the machine has been saved), two sets of Save/Restore Register pairs are provided. Critical interrupts use the Save/Restore Register pair CSRR0/CSRR1. Noncritical interrupts use the Save/Restore Register pair SRR0/SRR1.

7.2.4 Machine Check Interrupts
Machine check interrupts are a special case. They are typically caused by some kind of hardware or storage subsystem failure or by an attempt to access an invalid address. A machine check can be caused indirectly by the execution of an instruction but not be recognized or reported until long after the processor has executed past the instruction that caused the machine check. As such, machine check interrupts cannot properly be classified as either synchronous or asynchronous, nor as precise or imprecise.
They also do not belong to either the critical or the noncritical interrupt class, but instead have their own pair of save/restore registers: Machine Check Save/Restore Register 0 (MCSRR0) and Machine Check Save/Restore Register 1 (MCSRR1).

Architecturally, the following general rules apply for machine check interrupts:
1. No instruction after the one whose address is reported to the machine check interrupt handler in MCSRR0 has begun execution.
2. The instruction whose address is reported to the machine check interrupt handler in MCSRR0, and all prior instructions, might or might not have completed successfully. All those instructions that are ever going to complete appear to have done so already, and have done so within the context existing before the machine check interrupt. No further interrupt other than possible additional machine check interrupts occurs as a result of those instructions.

With the PowerPC 476FP core, machine check interrupts can be caused by machine check exceptions on a memory access for an instruction fetch, a data access, or a TLB access. Some of the resulting interrupts behave as synchronous, precise interrupts, and others are handled in an asynchronous fashion.

In the case of an instruction synchronous machine check exception, the PowerPC 476FP core handles the interrupt as a synchronous, precise interrupt, assuming machine check interrupts are enabled (MSR[ME] = ‘1’). That is, if a machine check exception is detected during an instruction fetch, the exception is not reported to the interrupt mechanism unless execution is attempted for the instruction address at which the machine check exception occurred. For example, if the direction of the instruction stream is changed (perhaps due to a branch instruction) such that the instruction at the address associated with the machine check exception will not be executed, the exception is not reported and no interrupt occurs.
If an instruction machine check exception is reported, and if machine check interrupts are enabled at the time of the reporting of the exception, the interrupt is synchronous and precise, and MCSRR0 is set to the instruction address that led to the exception. If machine check interrupts are not enabled at the time of the reporting of an instruction machine check exception, a machine check interrupt is not generated (even if MSR[ME] is subsequently set to ‘1’), although the ESR[ISMC] field is set to ‘1’ to indicate that the exception has occurred and that the instruction associated with the exception has been executed.

Instruction asynchronous machine check, data asynchronous machine check, and TLB asynchronous machine check exceptions, however, are handled in an asynchronous fashion. That is, the address reported in MCSRR0 might not be related to the instruction that prompted the access that led directly or indirectly to the machine check exception. The address can be that of an instruction before or after the exception-causing instruction, or it can reference the exception-causing instruction, depending on the nature of the access, the type of error encountered, and the circumstances of the instruction execution within the processor pipeline. If MSR[ME] = ‘0’ at the time of a machine check exception that is handled in this asynchronous way, a machine check interrupt subsequently occurs if MSR[ME] is set to ‘1’. See Section 7.5.2 Machine Check Interrupt on page 186 for more detailed information about machine check interrupts.

7.3 Interrupt Processing
Associated with each kind of interrupt is an interrupt vector, that is, the address of the initial instruction that is executed when the corresponding interrupt occurs.
Interrupt processing consists of saving a small part of the processor state in certain registers, identifying the cause of the interrupt in another register, and continuing execution at the corresponding interrupt vector location. When an exception exists and the corresponding interrupt type is enabled, the following actions are performed in order:
1. SRR0 (for noncritical class interrupts), CSRR0 (for critical class interrupts), or MCSRR0 (for machine check interrupts) is loaded with an instruction address that depends on the type of interrupt; see the specific interrupt description for details.
2. The ESR is loaded with information specific to the exception type. Note that many interrupt types can only be caused by a single type of exception and thus neither need nor use an ESR setting to indicate the cause of the interrupt. Machine check interrupts load the Machine Check Syndrome Register (MCSR).
3. SRR1 (for noncritical class interrupts), CSRR1 (for critical class interrupts), or MCSRR1 (for machine check interrupts) is loaded with a copy of the contents of the MSR.
4. The MSR is updated as follows. The new values take effect beginning with the first instruction following the interrupt:
   • MSR[WE,EE,PR,FP,FE0,DWE,FE1,IS,DS] are set to ‘0’ by all interrupts.
   • MSR[CE,DE] are set to ‘0’ by all critical class interrupts and left unchanged by all noncritical class interrupts.
   • MSR[ME] is set to ‘0’ by machine check interrupts and left unchanged by all other interrupts.
   See Section 7.4.1 Machine State Register (MSR) on page 173 for more detail on the definition of the MSR.
5. Instruction fetching and execution resume using the new MSR value at the interrupt vector address, which is specific to the interrupt type and is determined as follows:
   IVPR[0:15] || IVORn[16:27] || 0b0000
   where n specifies the IVOR register to be used for a particular interrupt type (see Section 7.4.9 Interrupt Vector Offset Registers (IVOR0 - IVOR15) on page 178).

At the end of a noncritical interrupt handling routine, execution of an rfi causes the MSR to be restored from the contents of SRR1 and instruction execution to resume at the address contained in SRR0. Likewise, execution of an rfci performs the same function at the end of a critical interrupt handling routine using CSRR0 instead of SRR0 and CSRR1 instead of SRR1. The rfmci instruction uses MCSRR0 and MCSRR1 in the same manner.

Note: In general, at a process switch, because of possible process interlocks and data availability requirements, the operating system must consider executing the following instructions:
• stwcx., to clear any outstanding reservation, ensuring that an lwarx in the old process is not paired with a stwcx. in the new process.
• msync, to ensure that all storage operations of an interrupted process are complete with respect to other processors before that process begins executing on another processor.
• isync, rfi, rfci, or rfmci, to ensure that the instructions in the new process execute in the new context.

7.3.1 Partially Executed Instructions
In general, the architecture permits load and store instructions to be partially executed, interrupted, and then restarted from the beginning upon return from the interrupt.
To guarantee that a particular load or store instruction will complete without being interrupted and restarted, software must mark the storage being referred to as Guarded, and must use an elementary (not a string or multiple) load or store that is aligned on an operand-sized boundary.

To guarantee that load and store instructions can, in general, be restarted and completed correctly without software intervention, the following rules apply when an instruction is partially executed and then interrupted:
• For an elementary load, no part of the target register (GPR(RT), FPR(FRT), or auxiliary processor register) will have been altered.
• For the update forms of load and store instructions, the update register, GPR(RA), will not have been altered.

However, the following effects are permissible when certain instructions are partially executed and then restarted:
• For any store instruction, some of the bytes at the addressed storage location might have been accessed or updated (if write access to the page in which bytes were altered is permitted by the access control mechanism). In addition, for the stwcx. instruction, if the address is not aligned on a word boundary, the value in CR[CR0] is undefined, as is whether the reservation (if one existed) has been cleared.
• For any load, some of the bytes at the addressed storage location might have been accessed (if read access to the page in which bytes were accessed is permitted by the access control mechanism). In addition, for the lwarx instruction, if the address is not aligned on a word boundary, it is undefined whether a reservation has been set.
• For load multiple and load string instructions, some of the registers in the range to be loaded might have been altered.
  Including the addressing registers (GPR(RA), and possibly GPR(RB)) in the range to be loaded is an invalid form of these instructions (and a programming error), and thus the rules for partial execution do not protect against overwriting of these registers. Such possible overwriting of the addressing registers makes these invalid forms of load multiple and load string inherently nonrestartable.

In no case is access control violated.

As previously stated, the only load or store instructions that are guaranteed not to be interrupted after being partially executed are elementary, aligned, and guarded loads and stores. All others can be interrupted after being partially executed. The following list identifies the specific instruction types for which interruption after partial execution can occur and the specific interrupt types that can cause the interruption:
• Any load or store (except elementary, aligned, guarded):
  – Critical input
  – Machine check
  – External input
  – Program (imprecise mode floating-point enabled)
    Note: This type of interrupt can lead to partial execution of a load or store instruction under the architectural definition only; the PowerPC 476FP core handles the imprecise modes of the floating-point enabled exceptions precisely, and hence this type of interrupt does not lead to partial execution.
  – Decrementer
  – Fixed-interval timer
  – Watchdog timer
  – Debug (unconditional debug event)
• Unaligned elementary load or store, or any load or store multiple or string:
  All of those listed previously plus the following items:
  – Alignment
  – Data storage (if the access crosses a memory page boundary)
  – Debug (data address compare, data value compare)

7.4 Interrupt Processing Registers
The interrupt processing registers include the Save/Restore Registers (SRR0 - SRR1), Critical Save/Restore Registers (CSRR0 - CSRR1), Data Exception Address Register (DEAR), Interrupt Vector Offset Registers (IVOR0 - IVOR15), Interrupt Vector Prefix Register (IVPR), and Exception Syndrome Register (ESR). Also described in this section is the Machine State Register (MSR), which belongs to the category of processor control registers.

7.4.1 Machine State Register (MSR)
The MSR is a register of its own unique type that controls important chip functions such as the enabling or disabling of various interrupt types.

The MSR can be written from a GPR using the mtmsr instruction. The contents of the MSR can be read into a GPR using the mfmsr instruction. The MSR[EE] bit can be set or cleared atomically using the wrtee or wrteei instructions. The MSR contents are also saved, altered, and restored by the interrupt-handling mechanism.

Bits    Field Name   Description
32:44   Reserved
45      WE           Wait state enable.
                     0 The processor is not in the wait state.
                     1 The processor is in the wait state.
                     If MSR[WE] = ‘1’, the processor remains in the wait state until an interrupt is taken, a reset occurs, or an external debug tool clears WE.
46      CE           Critical interrupt enable.
                     0 Critical input and watchdog timer interrupts are disabled.
                     1 Critical input and watchdog timer interrupts are enabled.
47      Reserved
48      EE           External interrupt enable.
                     0 External input, decrementer, and fixed-interval timer interrupts are disabled.
                     1 External input, decrementer, and fixed-interval timer interrupts are enabled.
49      PR           Problem state.
                     0 Supervisor state (the processor is in privileged state).
                     1 Problem state (the processor is in problem state).
50      FP           Floating-point available.
                     0 The processor cannot execute floating-point instructions.
                     1 The processor can execute floating-point instructions.
51      ME           Machine check enable.
                     0 Machine check interrupts are disabled.
                     1 Machine check interrupts are enabled.
52      FE0          Floating-point exception mode 0.
                     0 If MSR[FE1] = ‘0’, ignore exceptions mode; if MSR[FE1] = ‘1’, imprecise nonrecoverable mode.
                     1 If MSR[FE1] = ‘0’, imprecise recoverable mode; if MSR[FE1] = ‘1’, precise mode.
53      DWE          Debug wait enable.
                     0 Disable debug wait mode.
                     1 Enable debug wait mode.
54      DE           Debug interrupt enable.
                     0 Debug interrupts are disabled.
                     1 Debug interrupts are enabled.
55      FE1          Floating-point exception mode 1.
                     0 If MSR[FE0] = ‘0’, ignore exceptions mode; if MSR[FE0] = ‘1’, imprecise recoverable mode.
                     1 If MSR[FE0] = ‘0’, imprecise nonrecoverable mode; if MSR[FE0] = ‘1’, precise mode.
56:57   Reserved
58      IS           Instruction address space.
                     0 All instruction storage accesses are directed to address space 0 (TS = ‘0’ in the relevant TLB entry).
                     1 All instruction storage accesses are directed to address space 1 (TS = ‘1’ in the relevant TLB entry).
59      DS           Data address space.
                     0 All data storage accesses are directed to address space 0 (TS = ‘0’ in the relevant TLB entry).
                     1 All data storage accesses are directed to address space 1 (TS = ‘1’ in the relevant TLB entry).
60            Reserved.
61     PMM    Performance monitor mark.
              0  Disable gathering statistics for marked processes.
              1  Enable gathering statistics for marked processes.
62:63         Reserved.

7.4.2 Save/Restore Register 0 (SRR0)

The SRR0 is an SPR that is used to save machine state on noncritical interrupts and to restore machine state when an rfi is executed. When a noncritical interrupt occurs, SRR0 is set to an address associated with the process that was executing at the time. When rfi is executed, instruction execution returns to the address in SRR0.

In general, SRR0 contains the address of the instruction that caused the noncritical interrupt or the address of the instruction to return to after a noncritical interrupt is serviced. See the individual descriptions under Section 7.5 Interrupt Definitions on page 182 for an explanation of the precise address recorded in SRR0 for each noncritical interrupt type.

SRR0 can be written from a GPR using mtspr and can be read into a GPR using mfspr.

Bits   Field  Description
32:61  ADDR   The return address for noncritical interrupts.
62:63         Reserved.

7.4.3 Save/Restore Register 1 (SRR1)

The SRR1 is an SPR that is used to save machine state on noncritical interrupts and to restore machine state when an rfi is executed. When a noncritical interrupt is taken, the contents of the MSR (before the MSR was cleared by the interrupt) are placed into SRR1. When rfi is executed, the MSR is restored with the contents of SRR1.

Bits of SRR1 that correspond to reserved bits in the MSR are also reserved.

Note: An MSR bit that is reserved can be altered by rfi, consistent with the value being restored from SRR1.
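As a minimal sketch (not taken from any IBM-supplied header), the MSR fields in the Section 7.4.1 table can be expressed as C masks. The manual numbers the bits 32:63 with bit 32 as the most significant bit of the 32-bit register, so field bit n corresponds to the mask 1 << (63 - n). The macro, enum, and function names below are hypothetical; the FE0/FE1 decoding follows the floating-point exception mode descriptions in the table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: MSR bits are numbered 32..63 with bit 32 as the
 * most significant bit, so bit n maps to the mask 1 << (63 - n). */
#define MSR_BIT(n) (UINT32_C(1) << (63 - (n)))

#define MSR_WE  MSR_BIT(45) /* Wait state enable */
#define MSR_CE  MSR_BIT(46) /* Critical interrupt enable */
#define MSR_EE  MSR_BIT(48) /* External interrupt enable */
#define MSR_PR  MSR_BIT(49) /* Problem state */
#define MSR_FP  MSR_BIT(50) /* Floating point available */
#define MSR_ME  MSR_BIT(51) /* Machine check enable */
#define MSR_FE0 MSR_BIT(52) /* Floating-point exception mode 0 */
#define MSR_DWE MSR_BIT(53) /* Debug wait enable */
#define MSR_DE  MSR_BIT(54) /* Debug interrupt enable */
#define MSR_FE1 MSR_BIT(55) /* Floating-point exception mode 1 */
#define MSR_IS  MSR_BIT(58) /* Instruction address space */
#define MSR_DS  MSR_BIT(59) /* Data address space */
#define MSR_PMM MSR_BIT(61) /* Performance monitor mark */

/* Floating-point exception modes, encoded as (FE0 << 1) | FE1. */
typedef enum {
    FP_EXC_IGNORE = 0,             /* FE0 = 0, FE1 = 0 */
    FP_EXC_IMPRECISE_NONRECOV = 1, /* FE0 = 0, FE1 = 1 */
    FP_EXC_IMPRECISE_RECOV = 2,    /* FE0 = 1, FE1 = 0 */
    FP_EXC_PRECISE = 3             /* FE0 = 1, FE1 = 1 */
} fp_exc_mode;

/* Decode the floating-point exception mode selected by MSR[FE0] and MSR[FE1]. */
static fp_exc_mode msr_fp_exc_mode(uint32_t msr)
{
    unsigned fe0 = (msr & MSR_FE0) != 0;
    unsigned fe1 = (msr & MSR_FE1) != 0;
    return (fp_exc_mode)((fe0 << 1) | fe1);
}

/* Return nonzero if the single MSR enable bit mask_bit is set in msr. */
static int msr_enables(uint32_t msr, uint32_t mask_bit)
{
    assert(mask_bit != 0 && (mask_bit & (mask_bit - 1)) == 0); /* exactly one bit */
    return (msr & mask_bit) != 0;
}
```

For example, msr_enables(msr, MSR_EE) reports whether external input, decrementer, and fixed-interval timer interrupts are enabled in a saved MSR image such as an SRR1 value.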
SRR1 can be written from a GPR using mtspr and can be read into a GPR using mfspr. SRR1 has the same field layout as the MSR (see Section 7.4.1).

Bits   Field  Description
32:63         A copy of the MSR at the time of a noncritical interrupt.

7.4.4 Critical Save/Restore Register 0 (CSRR0)

The CSRR0 is an SPR that is used to save machine state on critical interrupts and to restore machine state when an rfci is executed. When a critical interrupt occurs, CSRR0 is set to an address associated with the process that was executing at the time. When rfci is executed, instruction execution returns to the address in CSRR0.

In general, CSRR0 contains the address of the instruction that caused the critical interrupt or the address of the instruction to return to after a critical interrupt is serviced. See the individual descriptions under Section 7.5 Interrupt Definitions on page 182 for an explanation of the precise address recorded in CSRR0 for each critical interrupt type.

CSRR0 can be written from a GPR using mtspr and can be read into a GPR using mfspr.

Bits   Field  Description
32:61  ADDR   The return address for critical interrupts.
62:63         Reserved.

7.4.5 Critical Save/Restore Register 1 (CSRR1)

The CSRR1 is an SPR that is used to save machine state on critical interrupts and to restore machine state when an rfci is executed. When a critical interrupt is taken, the contents of the MSR (before the MSR was cleared by the interrupt) are placed into CSRR1. When rfci is executed, the MSR is restored with the contents of CSRR1.

Bits of CSRR1 that correspond to reserved bits in the MSR are also reserved.
Because CSRR1 is a 32-bit register, CSRR1[0:31] corresponds to MSR[32:63].

Note: An MSR bit that is reserved can be altered by rfci, consistent with the value being restored from CSRR1.

CSRR1 can be written from a GPR using mtspr and can be read into a GPR using mfspr. CSRR1 has the same field layout as the MSR (see Section 7.4.1).

Bits   Field  Description
32:63         A copy of the MSR when a critical interrupt is taken.

7.4.6 Machine Check Save/Restore Register 0 (MCSRR0)

The MCSRR0 is an SPR that is used to save machine state on machine check interrupts and to restore machine state when an rfmci is executed. When a machine check interrupt occurs, MCSRR0 is set to an address associated with the process that was executing at the time. When rfmci is executed, instruction execution returns to the address in MCSRR0.

In general, MCSRR0 contains the address of the instruction that caused the machine check interrupt or the address of the instruction to return to after a machine check interrupt is serviced. See the individual descriptions under Section 7.5 Interrupt Definitions on page 182 for an explanation of the precise address recorded in MCSRR0 for each machine check interrupt type.

MCSRR0 can be written from a GPR using mtspr and can be read into a GPR using mfspr.

Bits   Field  Description
32:61  ADDR   The return address for machine check interrupts.
62:63         Reserved.

7.4.7 Machine Check Save/Restore Register 1 (MCSRR1)

The MCSRR1 is an SPR that is used to save machine state on machine check interrupts and to restore machine state when an rfmci is executed.
When a machine check interrupt is taken, the contents of the MSR (before the MSR was cleared by the interrupt) are placed into MCSRR1. When rfmci is executed, the MSR is restored with the contents of MCSRR1.

Bits of MCSRR1 that correspond to reserved bits in the MSR are also reserved. Because MCSRR1 is a 32-bit register, MCSRR1[0:31] corresponds to MSR[32:63].

Note: An MSR bit that is reserved can be altered by rfmci, consistent with the value being restored from MCSRR1.

MCSRR1 can be written from a GPR using mtspr and can be read into a GPR using mfspr. MCSRR1 has the same field layout as the MSR (see Section 7.4.1).

Bits   Field  Description
32:63         A copy of the Machine State Register (MSR) at the time of a machine check interrupt.

7.4.8 Data Exception Address Register (DEAR)

The DEAR contains the address that was referenced by a load, store, or cache management instruction that caused an alignment, data TLB miss, or data storage exception.

The DEAR can be written from a GPR using mtspr and can be read into a GPR using mfspr.

Bits   Field  Description
32:63  DEAR   Data Cache Effective Address Register. For exceptions that are detected in the data cache unit, such as a UTLB miss or a data storage interrupt (DSI), the DEAR is written with the effective address of the data request when the faulting request is committed in the LWB. The DEAR can also be written with an mtspr instruction.
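The three save/restore pairs described in Sections 7.4.2 through 7.4.7 follow one pattern: interrupt entry copies a return address into SRR0, CSRR0, or MCSRR0 and the pre-interrupt MSR into SRR1, CSRR1, or MCSRR1, and the matching return instruction (rfi, rfci, or rfmci) restores both. The C model below is a hypothetical illustration of that data flow only. It does not model which MSR bits each interrupt type clears (the caller passes the handler's MSR), and it assumes that the reserved address bits 62:63 are kept at zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model; none of these names come from an IBM header. */
typedef enum { NONCRITICAL = 0, CRITICAL = 1, MACHINE_CHECK = 2 } irq_class;

typedef struct {
    uint32_t addr; /* SRR0 / CSRR0 / MCSRR0: return address (ADDR, bits 32:61) */
    uint32_t msr;  /* SRR1 / CSRR1 / MCSRR1: MSR before the interrupt */
} save_pair;

typedef struct {
    uint32_t pc;
    uint32_t msr;
    save_pair pair[3]; /* indexed by irq_class */
} cpu_model;

/* Interrupt entry: save state in the pair owned by the interrupt class,
 * then load the handler's MSR and vector address. */
static void take_interrupt(cpu_model *cpu, irq_class cls, uint32_t return_addr,
                           uint32_t new_msr, uint32_t vector)
{
    assert((unsigned)cls <= MACHINE_CHECK);
    cpu->pair[cls].addr = return_addr & ~UINT32_C(3); /* bits 62:63 modeled as 0 */
    cpu->pair[cls].msr = cpu->msr;
    cpu->msr = new_msr;
    cpu->pc = vector;
}

/* rfi, rfci, or rfmci: resume at the saved address with the saved MSR. */
static void return_from_interrupt(cpu_model *cpu, irq_class cls)
{
    assert((unsigned)cls <= MACHINE_CHECK);
    cpu->pc = cpu->pair[cls].addr;
    cpu->msr = cpu->pair[cls].msr;
}

/* Name of the return instruction that uses each save/restore pair. */
static const char *return_instruction(irq_class cls)
{
    static const char *const names[] = { "rfi", "rfci", "rfmci" };
    assert((unsigned)cls <= MACHINE_CHECK);
    return names[cls];
}
```

In this model, a critical interrupt taken inside a noncritical handler writes only pair[CRITICAL], so the values held in the SRR0/SRR1 pair are not disturbed.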
7.4.9 Interrupt Vector Offset Registers (IVOR0 - IVOR15)

An IVOR specifies the quadword-aligned (16-byte-aligned) interrupt vector offset from the base address provided by the IVPR (see Section 7.4.10 Interrupt Vector Prefix Register (IVPR) on page 179) for its respective interrupt type. IVOR0 - IVOR15 are provided for the defined interrupt types. The interrupt vector effective address is formed as follows:

IVPR[32:47] || IVORn[48:59] || 0b0000

where n specifies the IVOR register to be used for the particular interrupt type. Any IVOR can be written from a GPR using mtspr and can be read into a GPR using mfspr.

Bits   Field   Description
32:47          Reserved.
48:59  OFFSET  The interrupt vector offset. It supplies bits 48:59 of the interrupt vector effective address.
60:63          Reserved.

Table 7-1 on page 178 identifies the specific IVOR register associated with each interrupt type.

Table 7-1. Interrupt Types Associated with each IVOR

IVOR    Interrupt Type
IVOR0   Critical input
IVOR1   Machine check
IVOR2   Data storage
IVOR3   Instruction storage
IVOR4   External input
IVOR5   Alignment
IVOR6   Program
IVOR7   Floating point unavailable
IVOR8   System call
IVOR9   Auxiliary processor unavailable
IVOR10  Decrementer
IVOR11  Fixed interval timer
IVOR12  Watchdog timer
IVOR13  Data translation lookaside buffer (TLB) error
IVOR14  Instruction TLB error
IVOR15  Debug

7.4.10 Interrupt Vector Prefix Register (IVPR)

The IVPR provides the high-order 16 bits of the effective address of the interrupt vectors for all interrupt types.
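The vector formula IVPR[32:47] || IVORn[48:59] || 0b0000 from Section 7.4.9 reduces to two masks on 32-bit values: bits 32:47 are the upper halfword (0xFFFF0000), and bits 48:59 are 0x0000FFF0. The sketch below assumes that the IVPR and IVOR contents have already been read (for example, with mfspr); the function and enum names are hypothetical, with the enum order taken from Table 7-1.

```c
#include <assert.h>
#include <stdint.h>

#define IVPR_ADDR_MASK   UINT32_C(0xFFFF0000) /* IVPR[32:47], address prefix */
#define IVOR_OFFSET_MASK UINT32_C(0x0000FFF0) /* IVORn[48:59], vector offset */

/* IVOR numbers in Table 7-1 order (hypothetical names). */
enum ivor_index {
    IVOR_CRITICAL_INPUT, IVOR_MACHINE_CHECK, IVOR_DATA_STORAGE,
    IVOR_INSTRUCTION_STORAGE, IVOR_EXTERNAL_INPUT, IVOR_ALIGNMENT,
    IVOR_PROGRAM, IVOR_FP_UNAVAILABLE, IVOR_SYSTEM_CALL,
    IVOR_AP_UNAVAILABLE, IVOR_DECREMENTER, IVOR_FIXED_INTERVAL_TIMER,
    IVOR_WATCHDOG_TIMER, IVOR_DATA_TLB_ERROR, IVOR_INSTRUCTION_TLB_ERROR,
    IVOR_DEBUG, IVOR_COUNT
};

/* IVPR[32:47] || IVORn[48:59] || 0b0000; reserved bits are ignored. */
static uint32_t interrupt_vector(uint32_t ivpr, uint32_t ivor)
{
    return (ivpr & IVPR_ADDR_MASK) | (ivor & IVOR_OFFSET_MASK);
}

/* Look up the vector for one interrupt type from a snapshot of IVOR0-IVOR15. */
static uint32_t vector_for(uint32_t ivpr, const uint32_t ivor[IVOR_COUNT],
                           enum ivor_index n)
{
    assert((unsigned)n < IVOR_COUNT);
    return interrupt_vector(ivpr, ivor[n]);
}
```

Because the low four bits are always 0b0000 and the IVPR contributes only the upper 16 bits, every vector is quadword aligned, and all sixteen vectors lie within the single 64 KB region selected by the IVPR.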
The interrupt vector effective address is formed as follows:

IVPR[32:47] || IVORn[48:59] || 0b0000

where n specifies the IVOR register to be used for the particular interrupt type. The IVPR can be written from a GPR using mtspr and can be read into a GPR using mfspr.

Bits   Field  Description
32:47  ADDR   Address prefix.
48:63         Reserved.

7.4.11 Exception Syndrome Register (ESR)

The ESR provides a syndrome to differentiate between the different kinds of exceptions that can generate the same interrupt type. When one of these interrupt types is generated, the bit or bits corresponding to the specific exception that generated the interrupt are set, and all other ESR bits are cleared. Other interrupt types do not affect the contents of the ESR. See the individual interrupt descriptions under Section 7.5 Interrupt Definitions on page 182 for an explanation of the ESR settings for each interrupt type and for a more detailed explanation of the function of certain ESR fields.

The ESR can be written from a GPR using mtspr and can be read into a GPR using mfspr.

Bits   Field  Description
32     ISMC   Instruction side machine check.
              0  An instruction did not cause an exception.
              1  An instruction caused an exception.
33     SS     Storage synchronization.
              0  A storage synchronization exception did not occur.
              1  A storage synchronization exception occurred.
              This exception occurs when the lwarx or stwcx. instructions are issued and both the write-through (W) and caching-inhibited (I) storage attributes are enabled. See Section 4.5 Storage Attributes on page 116 for more information.
34     POT1   Program interrupt: opcode trap 1. This bit is set if an interrupt occurs and the opcode matches the opcode specified in IOCR1.
35     POT2   Program interrupt: opcode trap 2. This bit is set if an interrupt occurs and the opcode matches the opcode specified in IOCR2.
36     PIL    Program interrupt: illegal instruction exception.
              0  An illegal instruction exception did not occur.
              1  An illegal instruction exception occurred.
37     PPR    Program interrupt: privileged instruction exception.
              0  A privileged instruction exception did not occur.
              1  A privileged instruction exception occurred.
38     PTR    Program interrupt: trap exception.
              0  A trap exception did not occur.
              1  A trap exception occurred.
39     FP     Floating-point operation.
              0  The exception was not caused by a floating-point instruction.
              1  The exception was caused by a floating-point instruction.
40     ST     Store operation.
              0  The exception was not caused by a store-type storage access or cache management instruction.
              1  The exception was caused by a store-type storage access or cache management instruction.
41            Reserved.
42:43  DLK    Data storage interrupt: locking exception.
              00  A locking exception did not occur.
              01  A dcbf instruction issued in user mode caused the locking exception.
              10  An icbi instruction issued in user mode caused the locking exception.
              11  Reserved.
44     AP     Auxiliary processor operation.
              0  The exception was not caused by an auxiliary processor instruction.
              1  The exception was caused by an auxiliary processor instruction.
              This bit is used with program, alignment, data storage interrupt (DSI), and data-side TLB miss interrupt types.
45     PUO    Program interrupt: unimplemented operation exception.
              0  An unimplemented operation exception did not occur.
              1  An unimplemented operation exception occurred.
46     BO     Byte ordering exception.
              0  A byte ordering exception did not occur.
              1  A byte ordering exception occurred.
47     PIE    Program interrupt: imprecise exception.
              0  An exception occurred precisely. Save/Restore Register 0 (SRR0) contains the address of the instruction that caused the exception.
              1  An exception occurred imprecisely. SRR0 contains the address of an instruction after the one that caused the exception.
              This field is only set for a floating-point enabled exception type program interrupt when the interrupt occurs imprecisely because MSR[FE0,FE1] is set to a nonzero value while an attached floating-point unit is already signaling the floating-point enabled exception (FPSCR[FEX] is already ‘1’).
48:58         Reserved.
59     PCRE   Program interrupt: condition register enable (see Note 1).
              0  The instruction that caused the exception is not a floating-point CR-updating instruction.
              1  The instruction that caused the exception is a floating-point CR-updating instruction.
              This field is only defined for a floating-point enabled exception type program interrupt, and then only when ESR[PIE] = ‘0’.
60     PCMP   Program interrupt: compare (see Note 1).
              0  The instruction that caused the exception is not a floating-point compare type instruction.
              1  The instruction that caused the exception is a floating-point compare type instruction.
              This field is only defined for a floating-point enabled exception type program interrupt, and then only when ESR[PIE] = ‘0’.
61:63  PCRF   Program interrupt: condition register field (see Note 1).
              If ESR[PCRE] = ‘1’, this field indicates which CR field was to be updated by the floating-point instruction that caused the exception. This field is only defined for a floating-point enabled exception type program interrupt, and then only when ESR[PIE] = ‘0’.

Note 1: The PCRE, PCMP, and PCRF fields are implementation-dependent fields of the ESR and are not part of the Power Instruction Set Architecture (ISA) Version 2.05.

7.4.12 Machine Check Syndrome Register (MCSR)

The MCSR contains status that allows the machine check interrupt handler software to determine the cause of a machine check exception. Any machine check exception that is handled as an asynchronous interrupt sets MCSR[MCS] and other appropriate bits of the MCSR. If MSR[ME] and MCSR[MCS] are both set, the machine takes a machine check interrupt. See Section 7.5.2 Machine Check Interrupt on page 186.

The MCSR is read into a GPR using mfspr and written using mtspr at SPR address x‘23C’. See Table A-1 on page 263.

Clearing the MCSR is performed using mtspr by placing a ‘1’ in the GPR source register in all bit positions that are to be cleared in the MCSR, and a ‘0’ in all other bit positions. The data written from the GPR to the MCSR is not direct data, but a mask: a ‘1’ clears the corresponding MCSR bit, and a ‘0’ leaves it unchanged. Note that the SPR address for this clearing operation is x‘33C’. See Table A-1 on page 263.

Bits   Field  Description
32     MCS    Machine check summary.
33:35         Reserved.
36     TLB    UTLB parity error.
37     IC     I-cache asynchronous error.
38     DC     D-cache error.
39     GPR    GPR parity error.
40     FPR    FPR parity error.
41     IMP    Imprecise machine check.
42     L2     Error or system error reported through the L2 cache.
43     DCR    DCR timeout (enabled by CCR2[MCDTO]).
44:63         Reserved.

7.5 Interrupt Definitions

Table 7-2 on page 183 provides a summary of each interrupt type, in the order of their associated IVOR registers.
The table also summarizes the various exception types that might cause each interrupt type; the classification of the interrupt; which ESR bits can be set, if any; and which mask bits can mask the interrupt type, if any.

Table 7-2. Interrupt and Exception Types

For each interrupt type (listed by IVOR), every exception type is given with its classification, the ESR bits that can be set (see Note 4), the MSR and DBCR mask bits, and the applicable notes.

IVOR0  Critical input
  – Critical input. Asynchronous; critical. MSR mask: CE. Notes: 1.
IVOR1  Machine check
  – Instruction machine check. Classification: see Note 2. MSR mask: ME. Notes: 2.
  – Data machine check. Classification: see Note 2. MSR mask: ME. Notes: 2.
IVOR2  Data storage
  – Read access control. Synchronous, precise. ESR: [FP,AP].
  – Write access control. Synchronous, precise. ESR: ST, [FP,AP].
  – Cache locking. Synchronous, precise. ESR: DLK, [ST].
  – lwarx/stwcx. to W = 1 or IL1 = 1. Synchronous, precise. ESR: SS, [ST].
IVOR3  Instruction storage
  – Execute access control. Synchronous, precise.
IVOR4  External input
  – External input. Asynchronous. MSR mask: EE. Notes: 1.
IVOR5  Alignment
  – Load/store alignment. Synchronous, precise. ESR: [ST], [FP].
  – Load/store multiple. Synchronous, precise. ESR: [ST].
  – dcbz to W = 1 or I = 1. Synchronous, precise.
IVOR6  Program
  – Illegal instruction. Synchronous, precise. ESR: PIL.
  – mtspr/mfspr to undefined user-mode SPR. Synchronous, precise. ESR: PIL.
  – Privileged instruction. Synchronous, precise. ESR: PPR, [AP].
  – Trap. Synchronous, precise. ESR: PTR.
  – IOC enabled trap. Synchronous, precise. ESR: [POT1], [POT2].
  – FP enabled. Synchronous (see ESR[PIE]). ESR: FP, [PIE], [PCRE], [PCMP], [PCRF]. Notes: 6.
  – AP enabled. Synchronous. ESR: AP. Notes: 6.
  – Unimplemented operation. Synchronous, precise. ESR: PUO, [FP,AP]. Notes: 5.
IVOR7  FP unavailable
  – FP unavailable. Synchronous, precise. Notes: 6.
IVOR8  System call
  – System call. Synchronous, precise.
IVOR9  AP unavailable
  – AP unavailable. Synchronous, precise. Notes: 6.
IVOR10  Decrementer
  – Decrementer. Asynchronous. MSR mask: EE.
IVOR11  FIT
  – FIT. Asynchronous. MSR mask: EE.
IVOR12  Watchdog timer
  – Watchdog timer. Asynchronous; critical. MSR mask: CE.
IVOR13  DTLB miss
  – DTLB miss. Synchronous, precise. ESR: [ST], [FP,AP].
IVOR14  ITLB miss
  – ITLB miss. Synchronous, precise.
IVOR15  Debug
  – Trap, IAC, DAC, ICMP, Branch taken, Return, Interrupt, and Unconditional. Critical; synchronous or asynchronous, precise or imprecise as described in Note 3. MSR mask: DE. DBCR mask: IDM. Notes: 3.

Notes:
1. Although it is not specified as part of Book E, it is common for system implementations to provide, as part of the interrupt controller, independent mask and status bits for the various sources of critical input and external input interrupts.
2. Machine check interrupts are classified as neither asynchronous nor synchronous. They are also not classified as critical or noncritical, because they use their own unique set of save/restore registers, MCSRR0 and MCSRR1. See Section 7.2.4 Machine Check Interrupts on page 169 and Section 7.5.2 Machine Check Interrupt on page 186.
3. Debug exceptions have special rules regarding their interrupt classification (synchronous or asynchronous, and precise or imprecise), depending on the particular debug mode being used and other conditions (see Section 7.5.15 Debug Interrupt on page 201).
4. In general, when an interrupt causes a particular ESR bit or bits to be set as indicated in the table, it also causes all other ESR bits to be cleared. Special rules apply to the ESR[ISMC] field; see Section 7.5.2 Machine Check Interrupt on page 186. If no ESR setting is indicated for any of the exception types within a given interrupt type, the ESR is unchanged for that interrupt type. The syntax for the ESR setting indication is as follows:
• [xxx] means ESR[xxx] might be set.
• [xxx,yyy,zzz] means any one (or none) of ESR[xxx], ESR[yyy], or ESR[zzz] might be set, but never more than one.
• {xxx,yyy,zzz} means that any combination of ESR[xxx], ESR[yyy], and ESR[zzz] might be set, including all or none.
• xxx means ESR[xxx] will be set.
5. Unimplemented operation exception type program interrupts can only occur when the PowerPC 476FP core is connected to a floating-point unit or auxiliary processor, and then only when executing instruction opcodes that are recognized by the floating-point unit or auxiliary processor but are not implemented within the hardware.
6. Floating-point unavailable and auxiliary processor unavailable interrupts, and floating-point enabled and auxiliary processor enabled exception type program interrupts, can only occur when the PowerPC 476FP core is connected to a floating-point unit or auxiliary processor, and then only when executing instruction opcodes that are recognized by the floating-point unit or auxiliary processor.
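For the program interrupt (IVOR6), Table 7-2 shows that the ESR identifies which exception type occurred, so a handler can classify the cause from the ESR value alone. The sketch below is one possible decoding. The masks follow the ESR field table in Section 7.4.11, using the same mapping of bit n to 1 << (63 - n). The names and the checking order are assumptions of this sketch: PUO is tested before FP and AP because Table 7-2 allows [FP,AP] alongside PUO, and PPR may be accompanied by [AP].

```c
#include <assert.h>
#include <stdint.h>

/* ESR bit n (numbered 32..63, bit 32 = MSB) maps to 1 << (63 - n). */
#define ESR_BIT(n) (UINT32_C(1) << (63 - (n)))

#define ESR_POT1 ESR_BIT(34) /* opcode trap 1 (IOCR1) */
#define ESR_POT2 ESR_BIT(35) /* opcode trap 2 (IOCR2) */
#define ESR_PIL  ESR_BIT(36) /* illegal instruction */
#define ESR_PPR  ESR_BIT(37) /* privileged instruction */
#define ESR_PTR  ESR_BIT(38) /* trap */
#define ESR_FP   ESR_BIT(39) /* floating-point operation */
#define ESR_AP   ESR_BIT(44) /* auxiliary processor operation */
#define ESR_PUO  ESR_BIT(45) /* unimplemented operation */
#define ESR_PIE  ESR_BIT(47) /* imprecise exception */
#define ESR_PCRE ESR_BIT(59) /* FP CR-updating instruction */
#define ESR_PCRF_MASK UINT32_C(0x7) /* bits 61:63, CR field number */

typedef enum {
    PROG_UNKNOWN, PROG_ILLEGAL, PROG_PRIVILEGED, PROG_TRAP, PROG_IOC_TRAP,
    PROG_UNIMPLEMENTED, PROG_FP_ENABLED, PROG_AP_ENABLED
} program_cause;

/* Classify a program interrupt from the ESR value captured in the handler. */
static program_cause decode_program_esr(uint32_t esr)
{
    if (esr & ESR_PIL) return PROG_ILLEGAL;
    if (esr & ESR_PPR) return PROG_PRIVILEGED;    /* [AP] may also be set */
    if (esr & ESR_PTR) return PROG_TRAP;
    if (esr & (ESR_POT1 | ESR_POT2)) return PROG_IOC_TRAP;
    if (esr & ESR_PUO) return PROG_UNIMPLEMENTED; /* [FP,AP] may also be set */
    if (esr & ESR_FP)  return PROG_FP_ENABLED;
    if (esr & ESR_AP)  return PROG_AP_ENABLED;
    return PROG_UNKNOWN;
}

/* CR field to be updated by the faulting FP instruction, or -1 when ESR[PCRF]
 * is undefined (not FP enabled, ESR[PCRE] = 0, or ESR[PIE] = 1). */
static int esr_pcrf(uint32_t esr)
{
    if (decode_program_esr(esr) != PROG_FP_ENABLED) return -1;
    if (!(esr & ESR_PCRE) || (esr & ESR_PIE)) return -1;
    return (int)(esr & ESR_PCRF_MASK);
}
```

The ESR value passed in would be read with mfspr at the start of the handler, before any other interrupt type that sets the ESR can overwrite it.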
7.5.1 Critical Input Interrupt

A critical input interrupt occurs when no higher priority exception exists, a critical input exception is presented to the interrupt mechanism, and MSR[CE] = ‘1’. A critical input exception is caused by the activation of an asynchronous input to the PowerPC 476FP core. Although the only mask for this interrupt type within the core is the MSR[CE] bit, system implementations typically provide an alternative means for independently masking the interrupt requests from the various devices that can collectively activate the PowerPC 476FP core critical input interrupt request input.

Note: MSR[CE] also enables the watchdog timer interrupt.

When a critical input interrupt occurs, the interrupt processing registers are updated as follows (all registers not listed are unchanged), and instruction execution resumes at address IVPR[ADDR] || IVOR0[OFFSET] || 0b0000.

Critical Save/Restore Register 0 (CSRR0)
  Set to the effective address of the next instruction to be executed.
Critical Save/Restore Register 1 (CSRR1)
  Set to the contents of the MSR at the time of the interrupt.
Machine State Register (MSR)
  ME  Unchanged.
  All other MSR bits are set to ‘0’.

Note: Software is responsible for taking any actions that are required by the implementation to clear any critical input exception status, so that the critical input interrupt request input signal is deasserted before MSR[CE] is reenabled. This avoids another, redundant critical input interrupt.

7.5.2 Machine Check Interrupt

A machine check interrupt occurs when no higher priority exception exists, a machine check exception is presented to the interrupt mechanism, and MSR[ME] = ‘1’. The PowerPC architecture specifies machine check interrupts as neither synchronous nor asynchronous, and the exact causes and details of handling such interrupts are implementation dependent.
Regardless, for the PowerPC 476FP core, it is useful to describe the handling of interrupts caused by various types of machine check exceptions in those terms. The PowerPC 476FP core includes six types of machine check exceptions, as follows:

• Instruction synchronous machine check exception
An instruction synchronous machine check exception is caused when a timeout or read error is signaled on the instruction read PLB interface during an instruction fetch operation. Such an exception is not presented to the interrupt handling mechanism, however, until execution is attempted of an instruction at an address associated with the instruction fetch for which the instruction machine check exception was asserted. When the exception is presented, the ESR[ISMC] bit is set to indicate the type of exception, regardless of the state of the MSR[ME] bit.
If MSR[ME] = ‘1’ when the instruction machine check exception is presented to the interrupt mechanism, execution of the instruction associated with the exception is suppressed, a machine check interrupt occurs, and the interrupt processing registers are updated as described in Machine Check Save/Restore Register 0 (MCSRR0) on page 187. If MSR[ME] = ‘0’, however, the instruction associated with the exception is processed as though the exception did not exist, and a machine check interrupt does not occur (even if MSR[ME] is subsequently set to ‘1’), although the ESR is still updated as described in Machine Check Save/Restore Register 0 (MCSRR0).
• Instruction asynchronous machine check exception
An instruction asynchronous machine check exception is caused when one of the following events occurs:
– An instruction cache parity error is detected.
– The read interrupt request is asserted on the instruction read PLB interface.
• Data asynchronous machine check exception
A data asynchronous machine check exception is caused when one of the following occurs:
– A timeout, read error, or read interrupt request is signaled on the data read PLB interface during a data read operation.
– A timeout, write error, or write interrupt request is signaled on the data write PLB interface during a data write operation.
– A parity error is detected on an access to the data cache.
• TLB asynchronous machine check exception
A TLB asynchronous machine check exception is caused when a parity error is detected on an access to the TLB.
• GPR asynchronous machine check exception
A GPR asynchronous machine check exception is caused when a parity error is detected in one of the GPRs.
• FPR asynchronous machine check exception
An FPR asynchronous machine check exception is caused when a parity error is detected in one of the FPRs.

When any machine check exception that is handled as an asynchronous interrupt occurs, it is immediately presented to the interrupt handling mechanism, and MCSR[MCS] and other bits of the MCSR, as appropriate, are set. A machine check interrupt occurs immediately if MSR[ME] = ‘1’, and the interrupt processing registers are updated as described below. If MSR[ME] = ‘0’, however, the exception is recorded by the setting of the MCSR[MCS] bit and deferred until MSR[ME] is subsequently set to ‘1’. When MCSR[MCS] and MSR[ME] are both set to ‘1’, the machine check interrupt is taken. Therefore, software must clear MCSR[MCS] in the machine check interrupt handler before executing an rfmci to return to processing with MSR[ME] set to ‘1’.

When a machine check interrupt occurs, the interrupt processing registers are updated as follows. All registers not listed are unchanged, and instruction execution resumes at address IVPR[ADDR] || IVOR1[OFFSET] || 0b0000.
Machine Check Save/Restore Register 0 (MCSRR0)
For an instruction synchronous machine check exception, set to the effective address of the instruction presenting the exception. For an instruction asynchronous machine check, data asynchronous machine check, or TLB asynchronous machine check exception, set to the effective address of the next instruction to be executed.
Machine Check Save/Restore Register 1 (MCSRR1)
Set to the contents of the MSR at the time of the interrupt.
Machine State Register (MSR)
All MSR bits are set to ‘0’.
Exception Syndrome Register (ESR)
ISMC Set to ‘1’ for an instruction synchronous machine check exception; otherwise left unchanged.
All other defined ESR bits are set to ‘0’ for an instruction synchronous machine check exception; otherwise, they are left unchanged.
Note: If an instruction synchronous machine check exception is associated with an instruction, and execution of that instruction is attempted while MSR[ME] = ‘0’, no machine check interrupt occurs, but ESR[ISMC] is still set to ‘1’ when the instruction executes. Once set, ESR[ISMC] can be cleared only by software, using the mtspr instruction. In the machine check interrupt handler, software should query ESR[ISMC] to determine the type of machine check exception and then clear ESR[ISMC]. Then, before reenabling machine check interrupts by setting MSR[ME] to ‘1’, software should query ESR[ISMC] again to determine whether any additional instruction synchronous machine check exceptions occurred while MSR[ME] was ‘0’.
Machine Check Syndrome Register (MCSR)
The MCSR collects status for the machine check exceptions that are handled as asynchronous interrupts. MCSR[SUM] is set by any instruction asynchronous machine check exception, data asynchronous machine check exception, or TLB asynchronous machine check exception.
Other bits in the MCSR are set to indicate the exact type of machine check exception. See Section 7.4.12 Machine Check Syndrome Register (MCSR) on page 181 for more information about the handling of machine check interrupts within the PowerPC 476FP core.

7.5.3 Data Storage Interrupt
A data storage interrupt can occur when no higher priority exception exists and a data storage exception is presented to the interrupt mechanism. The PowerPC 476FP core includes four types of data storage exceptions, as follows:
• Read access control exception
A read access control exception is caused by one of the following occurrences:
– While in user mode (MSR[PR] = ‘1’), a load, icbi, dcbst, dcbf, dcbi, dcbtls, dcblc, icbtls, or icblc instruction attempts to access a location in storage that is not enabled for read access in user mode (that is, the TLB entry associated with the memory page being accessed has UR = ‘0’).
– While in supervisor mode (MSR[PR] = ‘0’), a load, icbi, dcbst, dcbf, dcbi, dcbtls, dcblc, icbtls, or icblc instruction attempts to access a location in storage that is not enabled for read access in supervisor mode (that is, the TLB entry associated with the memory page being accessed has SR = ‘0’).
Note: The instruction cache management instructions icbi and icbt are treated as loads from the addressed byte with respect to address translation and protection. These instructions use MSR[DS] rather than MSR[IS] to determine translation for their target effective address. Similarly, they use the read access control field (UR or SR) rather than the execute access control field (UX or SX) of the TLB entry to determine whether a data storage exception should occur. Instruction storage exceptions and instruction TLB miss exceptions are associated with the fetching of instructions, not with the execution of instructions.
Data storage exceptions and data TLB miss exceptions are associated with the execution of instruction cache management instructions, and with the execution of load, store, and data cache management instructions.
• Write access control exception
A write access control exception is caused by one of the following:
– While in user mode (MSR[PR] = ‘1’), a store, dcbz, dcbtst, or dcbtstls instruction attempts to access a location in storage that is not enabled for write access in user mode (that is, the TLB entry associated with the memory page being accessed has UW = ‘0’).
– While in supervisor mode (MSR[PR] = ‘0’), a store, dcbz, dcbtst, or dcbtstls instruction attempts to access a location in storage that is not enabled for write access in supervisor mode (that is, the TLB entry associated with the memory page being accessed has SW = ‘0’).
• Cache locking exception
A cache locking exception is caused by one of the following:
– While in user mode (MSR[PR] = ‘1’) with MMUCR[IULXE] = ‘1’, execution of an icbi instruction is attempted. The exception occurs whether or not the cache line targeted by the icbi instruction is actually locked in the instruction cache.
– While in user mode (MSR[PR] = ‘1’) with MMUCR[DULXE] = ‘1’, execution of a dcbf instruction is attempted. The exception occurs whether or not the cache line targeted by the dcbf instruction is actually locked in the data cache.
See Section 5 Instruction and Data Caches on page 133 and Section 4.6.10 MMU Configuration Register (MMUCR) on page 126 for more information about cache locking and cache locking exceptions, respectively.

If an stwcx. instruction causes a write access control exception, but the processor does not have the reservation from an lwarx instruction, a data storage interrupt does not occur, and the instruction completes, updating CR[CR0] to indicate the failure of the store due to the lost reservation.

If a data storage exception occurs on any of the following instructions, the instruction is treated as a no-op, and a data storage interrupt does not occur:
• lswx or stswx with a length of zero (although the target register of lswx is still undefined, as it is regardless of whether a data storage exception occurs)
• icbt
• dcbt
For all other instructions, if a data storage exception occurs, execution of the instruction causing the exception is suppressed, a data storage interrupt is generated, the interrupt processing registers are updated as follows (all registers not listed are unchanged), and instruction execution resumes at address IVPR[IVP] || IVOR2[IVO] || 0b0000.
Save/Restore Register 0 (SRR0)
Set to the effective address of the instruction causing the data storage interrupt.
Save/Restore Register 1 (SRR1)
Set to the contents of the MSR at the time of the interrupt.
Machine State Register (MSR)
CE, ME, DE Unchanged.
All other MSR bits are set to ‘0’.
Data Exception Address Register (DEAR)
If the data storage exception occurs with respect to the memory page targeted by the initial effective address calculated by the instruction, the DEAR is set to this calculated effective address. However, if the exception occurs only because the instruction crosses a memory page boundary (that is, the exception is with respect to the attributes of the page accessed after crossing the boundary), the DEAR is set to the address of the first byte within that page.
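The DEAR selection rule just described can be expressed as a short function. This is a minimal Python sketch under stated assumptions: `dear_value` and the `page_faults` predicate are hypothetical names, addresses are 32-bit effective addresses, and the access is assumed to span at most two pages (true for any access smaller than the page size). The same rule is stated later for the data TLB error interrupt, where the predicate would report a missing TLB entry instead of a failed access control check.

```python
from typing import Callable, Optional

EA_MASK = 0xFFFF_FFFF  # 32-bit effective addresses


def dear_value(ea: int, length: int, page_size: int,
               page_faults: Callable[[int], bool]) -> Optional[int]:
    """Return the DEAR value for an access of `length` bytes at `ea`, or None if no exception.

    `page_faults(base)` is a hypothetical predicate that reports whether the page
    starting at `base` raises the exception (for example, its UR or SR bit is 0).
    """
    page_mask = ~(page_size - 1) & EA_MASK
    first_page = ea & page_mask
    if page_faults(first_page):
        # Exception on the page targeted by the calculated EA: DEAR = that EA.
        return ea
    last_page = ((ea + length - 1) & EA_MASK) & page_mask
    if last_page != first_page and page_faults(last_page):
        # Exception only after crossing the boundary: DEAR = first byte of the second page.
        return last_page
    return None


# The misaligned load word from the example that follows (4 KB pages):
print(hex(dear_value(0x0000_0FFF, 4, 4096, lambda base: base == 0x0000_0000)))  # 0xfff
print(hex(dear_value(0x0000_0FFF, 4, 4096, lambda base: base == 0x0000_1000)))  # 0x1000
```

Checking the first page before the second mirrors the rule's precedence: if both pages would fault, the DEAR still receives the calculated effective address.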
For example, consider a misaligned load word instruction that targets effective address x‘0000 0FFF’ in a 4 KB page. The load word thus crosses the page boundary and accesses the next page, starting at address x‘0000 1000’. If a read access control exception exists within the first page because the read access control field for that page is ‘0’, the DEAR is set to x‘0000 0FFF’. However, if the read access control field of the first page is ‘1’ but the same field is ‘0’ for the next page, the read access control exception exists only for the second page, and the DEAR is set to x‘0000 1000’. Furthermore, the load word instruction in this latter scenario will have been partially executed (see Section 7.3.1 Partially Executed Instructions on page 172).
Exception Syndrome Register (ESR)
See Section 7.4.11 Exception Syndrome Register (ESR) on page 179.

7.5.4 Instruction Storage Interrupt
An instruction storage interrupt occurs when no higher priority exception exists and an instruction storage exception is presented to the interrupt mechanism. Although an instruction storage exception can occur during an attempt to fetch an instruction, such an exception is not actually presented to the interrupt mechanism until an attempt is made to execute that instruction. The PowerPC 476FP core includes one type of instruction storage exception, the execute access control exception.
Execute Access Control Exception
An execute access control exception is caused by one of the following:
• While in user mode (MSR[PR] = ‘1’), an instruction fetch attempts to access a location in storage that is not enabled for execute access in user mode (that is, the TLB entry associated with the memory page being accessed has UX = ‘0’).
• While in supervisor mode (MSR[PR] = ‘0’), an instruction fetch attempts to access a location in storage that is not enabled for execute access in supervisor mode (that is, the TLB entry associated with the memory page being accessed has SX = ‘0’).
Note: Book-III E of Power ISA Version 2.05 defines an additional instruction storage exception: the byte ordering exception. This exception is defined to assist implementations that cannot support dynamically switching byte ordering between consecutive instruction fetches, or that cannot support a given byte order at all. The PowerPC 476FP core, however, supports instruction fetching from both big-endian and little-endian memory pages, so this exception cannot occur.
When an instruction storage interrupt occurs, the processor suppresses the execution of the instruction causing the instruction storage exception, the interrupt processing registers are updated as follows (all registers not listed are unchanged), and instruction execution resumes at address IVPR[IVP] || IVOR3[IVO] || 0b0000.
Save/Restore Register 0 (SRR0)
Set to the effective address of the instruction causing the instruction storage interrupt.
Save/Restore Register 1 (SRR1)
Set to the contents of the MSR at the time of the interrupt.
Machine State Register (MSR)
CE, ME, DE Unchanged.
All other MSR bits are set to ‘0’.
Exception Syndrome Register (ESR)
See Section 7.4.11 Exception Syndrome Register (ESR) on page 179.

7.5.5 External Input Interrupt
An external input interrupt occurs when no higher priority exception exists, an external input exception is presented to the interrupt mechanism, and MSR[EE] = ‘1’. An external input exception is caused by the activation of an asynchronous input to the PowerPC 476FP core.
Although MSR[EE] is the only mask for this interrupt type within the core, system implementations typically provide another means of independently masking the interrupt requests from the various devices that can collectively activate the core’s external input interrupt request input.
Note: MSR[EE] also enables the decrementer and fixed interval timer interrupts.
When an external input interrupt occurs, the interrupt processing registers are updated as follows (all registers not listed are unchanged), and instruction execution resumes at address IVPR[IVP] || IVOR4[IVO] || 0b0000.
Save/Restore Register 0 (SRR0)
Set to the effective address of the next instruction to be executed.
Save/Restore Register 1 (SRR1)
Set to the contents of the MSR at the time of the interrupt.
Machine State Register (MSR)
CE, ME, DE Unchanged.
All other MSR bits are set to ‘0’.
Note: Software is responsible for taking any actions that the implementation requires to clear the external input exception status (so that the external input interrupt request input signal is deasserted) before reenabling MSR[EE], to avoid another, redundant external input interrupt.

7.5.6 Alignment Interrupt
An alignment interrupt occurs when no higher priority exception exists and an alignment exception is presented to the interrupt mechanism. An alignment exception occurs in any of the following cases:
• CCR0[23] (FLSTA) is set, and a load or store instruction is not operand aligned. That is, halfword requests must be aligned on a halfword boundary, word requests on a word boundary, and doubleword requests (APU/FPU only) on a doubleword boundary. Load and store multiple requests are considered to be word requests.
This case does not apply to string instructions, which are considered byte operations and are thus always operand aligned.
• An FPU load or store operation is not operand aligned.
• A dcbz instruction targets a memory page that is either write-through required or caching inhibited.
• An lwarx or stwcx. request is not aligned on a word boundary.
If an stwcx. instruction causes an alignment exception, and the processor does not have the reservation from an lwarx instruction, an alignment interrupt still occurs.
Note: The architecture does not support the use of an unaligned effective address by the lwarx and stwcx. instructions. If an alignment interrupt occurs due to the attempted execution of one of these instructions, the alignment interrupt handler must not attempt to emulate the instruction, but instead should treat the instruction as a programming error.
When an alignment interrupt occurs, the processor suppresses the execution of the instruction causing the alignment exception, the interrupt processing registers are updated as follows (all registers not listed are unchanged), and instruction execution resumes at address IVPR[IVP] || IVOR5[IVO] || 0b0000.
Save/Restore Register 0 (SRR0)
Set to the effective address of the instruction causing the alignment interrupt.
Save/Restore Register 1 (SRR1)
Set to the contents of the MSR at the time of the interrupt.
Machine State Register (MSR)
CE, ME, DE Unchanged.
All other MSR bits are set to ‘0’.
Data Exception Address Register (DEAR)
Set to the effective address of the target data operand as calculated by the instruction causing the alignment exception.
Note that for dcbz, this effective address is not necessarily the address of the first byte of the targeted cache block; it could be the address of any byte within the block (it is the address calculated by the dcbz instruction).
Exception Syndrome Register (ESR)
FP Set to ‘1’ if the instruction causing the interrupt is a floating-point load or store; otherwise set to ‘0’.
ST Set to ‘1’ if the instruction causing the interrupt is a store, dcbz, or dcbi instruction; otherwise set to ‘0’.
AP Set to ‘1’ if the instruction causing the interrupt is an auxiliary processor load or store; otherwise set to ‘0’.
All other defined ESR bits are set to ‘0’.

7.5.7 Program Interrupt
A program interrupt occurs when no higher priority exception exists, a program exception is presented to the interrupt mechanism, and, for the floating-point enabled form of program exception only, MSR[FE0,FE1] is nonzero. The PowerPC 476FP core includes the following types of program exception:
• Illegal instruction exception
An illegal instruction exception occurs when execution is attempted of any of the following kinds of instructions:
– A reserved-illegal instruction.
– When MSR[PR] = ‘1’ (user mode), an mtspr or mfspr that specifies an SPRN value with SPRN[5] = ‘0’ (user-mode accessible) that represents an unimplemented Special Purpose Register. For mtspr, this includes any SPR number other than those of the XER, LR, CTR, and USPRG0. For mfspr, this includes any SPR number other than those listed for mtspr and those of SPRG4-7, TBU, and TBL.
– A defined instruction that is not implemented within the PowerPC 476FP core and that is not a floating-point instruction. This includes all instructions that are defined for 64-bit implementations only, and mfapidi (see the PowerPC Book-E specification).
– A defined floating-point instruction that is not recognized by an attached floating-point unit, or any defined floating-point instruction when no floating-point unit is attached.
– An allocated instruction that is not implemented within the PowerPC 476FP core and that is not recognized by an attached auxiliary processor (or any such instruction when no auxiliary processor is attached).
See Section 2.3 Instruction Classes on page 47 for more information about the PowerPC 476FP core’s support for defined and allocated instructions.
• Privileged instruction exception
A privileged instruction exception occurs when MSR[PR] = ‘1’ and execution is attempted of any of the following kinds of instructions:
– A privileged instruction.
– An mtspr or mfspr instruction that specifies an SPRN value with SPRN[5] = ‘1’ (a privileged instruction exception occurs regardless of whether the SPR referenced by the SPRN value is defined).
• IOC enabled trap exception
An IOC enabled trap exception occurs when an opcode match is made according to the IOCR1 and IOCR2 registers and the match is enabled in the IOCCR. The operation is converted to a special exception and issued to IRACC with an indication that the operation is illegal. The DISS logic faulty confirms the operation when it is equal to the CS tail. ESR[POT1] or ESR[POT2] is set as a result, and a program interrupt is taken.
• Trap exception
A trap exception occurs when any of the conditions specified in a tw or twi instruction are met. However, if trap debug events are enabled (DBCR0[TRAP] = ‘1’), internal debug mode is enabled (DBCR0[IDM] = ‘1’), and debug interrupts are enabled (MSR[DE] = ‘1’), a trap exception causes a debug interrupt to occur rather than a program interrupt. See Section 8 Debug Facilities on page 217 for more information about trap debug events.
• Unimplemented operation exception
An unimplemented operation exception occurs when execution is attempted of any of the following kinds of instructions:
– A defined floating-point instruction that is recognized but not supported by an attached floating-point unit, when floating-point instruction processing is enabled (MSR[FP] = ‘1’).
– An allocated instruction that is not implemented within the PowerPC 476FP core, and is recognized but not supported by an attached auxiliary processor, when auxiliary processor instruction processing is enabled. The enabling of auxiliary processor instruction processing is implementation-dependent.
• Floating-point enabled exception
A floating-point enabled exception occurs when the execution or attempted execution of a defined floating-point instruction causes FPSCR[FEX] to be set to ‘1’ in an attached floating-point unit. FPSCR[FEX] is the Floating-Point Enabled Exception Summary bit of the Floating-Point Status and Control Register.
If MSR[FE0,FE1] is nonzero when the floating-point enabled exception is presented to the interrupt mechanism, a program interrupt occurs, and the interrupt processing registers are updated as described below. If MSR[FE0,FE1] are both ‘0’, however, a program interrupt does not occur, and the instruction associated with the exception executes according to the definition of the floating-point unit (see the user’s manual for the floating-point unit implementation). If MSR[FE0,FE1] are subsequently set to a nonzero value while the floating-point enabled exception is still being presented to the interrupt mechanism (that is, FPSCR[FEX] is still set), a delayed program interrupt occurs, updating the interrupt processing registers as described below. See Section 7.2.2.2 Synchronous, Imprecise Interrupts on page 168 for more information about this special form of delayed floating-point enabled exception.
• Auxiliary processor enabled exception
An auxiliary processor enabled exception might occur due to the execution or attempted execution of an allocated instruction that is not implemented within the PowerPC 476FP core, but is recognized and supported by an attached auxiliary processor. The cause of such an exception is implementation-dependent.
When a program interrupt occurs, the processor suppresses the execution of the instruction causing the program exception (for all cases except the delayed form of floating-point enabled exception described previously), the interrupt processing registers are updated as follows (all registers not listed are unchanged), and instruction execution resumes at address IVPR[IVP] || IVOR6[IVO] || 0b0000.
Save/Restore Register 0 (SRR0)
Set to the effective address of the instruction causing the program interrupt, for all cases except the delayed form of floating-point enabled exception described previously. For the special case of the delayed floating-point enabled exception, where the exception was already being presented to the interrupt mechanism at the time MSR[FE0,FE1] was changed from 0 to a nonzero value, SRR0 is set to the address of the instruction that would have executed after the MSR-changing instruction. If the instruction that set MSR[FE0,FE1] was rfi, rfci, or rfmci, SRR0 is set to the address to which the rfi, rfci, or rfmci was returning, and not to the address of the instruction sequentially after the rfi, rfci, or rfmci.
Save/Restore Register 1 (SRR1)
Set to the contents of the MSR at the time of the interrupt.
Machine State Register (MSR)
CE, ME, DE Unchanged.
All other MSR bits are set to ‘0’.
Exception Syndrome Register (ESR)
ISMC Instruction machine check.
SS Data storage interrupt (DSI), storage synchronization (lwarx/stwcx. to W/I = ‘1’).
POT1 Program interrupt: opcode trap 1. This bit is set if an interrupt occurs and the opcode matches the opcode specified in IOCR1.
POT2 Program interrupt: opcode trap 2. This bit is set if an interrupt occurs and the opcode matches the opcode specified in IOCR2.
PIL Program interrupt, illegal instruction exception.
PPR Program interrupt, privileged instruction exception.
PTR Program interrupt, trap instruction exception.
FP FP operation. This is used with program, alignment, DSI, and data-side TLB miss interrupt types.
ST Store operation.
DLK[0:1] Data storage interrupt (DSI), locking exception.
00 No locking exception.
01 dcbf in user mode to a locked line.
10 icbi in user mode to a locked line.
11 Reserved.
AP AP operation. This is used with program, alignment, DSI, and data-side TLB miss interrupt types.
PUO Program interrupt, unimplemented operation exception.
BO Byte order error.
PIE Program interrupt, imprecise exception.
PCRE Program interrupt, condition register (CR) enable.
PCMP Program interrupt, compare.
PCRF Program interrupt, condition register (CR) field.
All other defined ESR bits are set to ‘0’.
Note: The ESR[PCRE,PCMP,PCRF] fields are provided to assist the program interrupt handler in emulating part of the function of the various floating-point CR-updating instructions when any of these instructions causes a precise (nondelayed) floating-point enabled exception type program interrupt. The Power ISA Version 2.05, Book-III E floating-point architecture specifies that when such exceptions occur, the CR is to be updated even though the rest of the instruction execution can be suppressed.
The PowerPC 476FP core, however, does not support such CR updates when the instruction that is supposed to cause the update is being suppressed due to the occurrence of a synchronous, precise interrupt. Instead, the PowerPC 476FP core records information about the instruction causing the interrupt in the ESR[PCRE,PCMP,PCRF] fields, to assist the program interrupt handler software in performing the appropriate CR update manually.

7.5.8 Floating-Point Unavailable Interrupt
A floating-point unavailable interrupt occurs when no higher priority exception exists, an attempt is made to execute a floating-point instruction that is recognized by an attached floating-point unit, and MSR[FP] = ‘0’.
When a floating-point unavailable interrupt occurs, the processor suppresses the execution of the instruction causing the floating-point unavailable exception, the interrupt processing registers are updated as follows (all registers not listed are unchanged), and instruction execution resumes at address IVPR[IVP] || IVOR7[IVO] || 0b0000.
Save/Restore Register 0 (SRR0)
Set to the effective address of the instruction causing the floating-point unavailable interrupt.
Save/Restore Register 1 (SRR1)
Set to the contents of the MSR at the time of the interrupt.
Machine State Register (MSR)
CE, ME, DE Unchanged.
All other MSR bits are set to ‘0’.

7.5.9 System Call Interrupt
A system call interrupt occurs when no higher priority exception exists and a system call (sc) instruction is executed.
When a system call interrupt occurs, the interrupt processing registers are updated as follows (all registers not listed are unchanged), and instruction execution resumes at address IVPR[IVP] || IVOR8[IVO] || 0b0000.
Save/Restore Register 0 (SRR0)
Set to the effective address of the instruction after the system call instruction.
Save/Restore Register 1 (SRR1)
Set to the contents of the MSR at the time of the interrupt.
Machine State Register (MSR)
CE, ME, DE Unchanged.
All other MSR bits are set to ‘0’.

7.5.10 Decrementer Interrupt
A decrementer interrupt occurs when no higher priority exception exists, a decrementer exception exists (TSR[DIS] = ‘1’), and the interrupt is enabled (TCR[DIE] = ‘1’ and MSR[EE] = ‘1’).
Note: MSR[EE] also enables the external input and fixed interval timer interrupts.
When a decrementer interrupt occurs, the interrupt processing registers are updated as follows (all registers not listed are unchanged), and instruction execution resumes at address IVPR[IVP] || IVOR10[IVO] || 0b0000.
Save/Restore Register 0 (SRR0)
Set to the effective address of the next instruction to be executed.
Save/Restore Register 1 (SRR1)
Set to the contents of the MSR at the time of the interrupt.
Machine State Register (MSR)
CE, ME, DE Unchanged.
All other MSR bits are set to ‘0’.
Note: Software is responsible for clearing the decrementer exception status by writing to TSR[DIS] before reenabling MSR[EE], to avoid another, redundant decrementer interrupt.

7.5.11 Fixed-Interval Timer Interrupt
A fixed-interval timer interrupt occurs when no higher priority exception exists, a fixed interval timer exception exists (TSR[FIS] = ‘1’), and the interrupt is enabled (TCR[FIE] = ‘1’ and MSR[EE] = ‘1’). See Section 6 Timer Facilities on page 157 for more information about fixed interval timer exceptions.
Note: MSR[EE] also enables the external input and decrementer interrupts.
When a fixed interval timer interrupt occurs, the interrupt processing registers are updated as follows (all registers not listed are unchanged), and instruction execution resumes at address IVPR[IVP] || IVOR11[IVO] || 0b0000.
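The decrementer and fixed-interval timer enabling conditions above, together with the note on clearing TSR[DIS] before reenabling MSR[EE], can be modeled as follows. This is a minimal Python sketch: the function names are hypothetical, the flag values are symbolic rather than the real TSR and TCR bit positions, and `write_tsr` assumes that TSR uses write-1-to-clear semantics (writing a ‘1’ to a status bit clears it), as defined for TSR in Book E; see Section 6 Timer Facilities for the authoritative register definition.

```python
from __future__ import annotations

from enum import Flag


class TSR(Flag):
    """Timer Status Register bits used here (symbolic values, not hardware bit positions)."""
    DIS = 1  # decrementer interrupt status
    FIS = 2  # fixed-interval timer interrupt status


class TCR(Flag):
    """Timer Control Register enable bits used here (symbolic values)."""
    DIE = 1  # decrementer interrupt enable
    FIE = 2  # fixed-interval timer interrupt enable


def pending_timer_interrupts(tsr: TSR, tcr: TCR, msr_ee: bool) -> list[str]:
    """Return the MSR[EE]-gated timer interrupts whose three conditions are all met."""
    pending = []
    if TSR.DIS in tsr and TCR.DIE in tcr and msr_ee:
        pending.append("decrementer")
    if TSR.FIS in tsr and TCR.FIE in tcr and msr_ee:
        pending.append("fixed-interval timer")
    return pending


def write_tsr(tsr: TSR, value: TSR) -> TSR:
    """Model an mtspr to TSR, assuming write-1-to-clear: each 1 bit clears that status bit."""
    return tsr & ~value


tsr = TSR.DIS | TSR.FIS
tcr = TCR.DIE | TCR.FIE
print(pending_timer_interrupts(tsr, tcr, msr_ee=True))  # ['decrementer', 'fixed-interval timer']
tsr = write_tsr(tsr, TSR.DIS)  # decrementer handler clears TSR[DIS] before reenabling MSR[EE]
print(pending_timer_interrupts(tsr, tcr, msr_ee=True))  # ['fixed-interval timer']
```

Writing only TSR.DIS leaves TSR[FIS] untouched, which is why a write-1-to-clear register lets each handler clear its own status without disturbing the other timer's pending exception.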
Save/Restore Register 0 (SRR0)
Set to the effective address of the next instruction to be executed.
Save/Restore Register 1 (SRR1)
Set to the contents of the MSR at the time of the interrupt.
Machine State Register (MSR)
CE, ME, DE Unchanged.
All other MSR bits are set to ‘0’.
Note: Software is responsible for clearing the fixed interval timer exception status by writing to TSR[FIS] before reenabling MSR[EE], to avoid another, redundant fixed interval timer interrupt.

7.5.12 Watchdog Timer Interrupt
A watchdog timer interrupt occurs when no higher priority exception exists, a watchdog timer exception exists (TSR[WIS] = ‘1’), and the interrupt is enabled (TCR[WIE] = ‘1’ and MSR[CE] = ‘1’). See Section 6 Timer Facilities on page 157 for more information about watchdog timer exceptions.
Note: MSR[CE] also enables the critical input interrupt.
When a watchdog timer interrupt occurs, the interrupt processing registers are updated as follows (all registers not listed are unchanged), and instruction execution resumes at address IVPR[IVP] || IVOR12[IVO] || 0b0000.
Critical Save/Restore Register 0 (CSRR0)
Set to the effective address of the next instruction to be executed.
Critical Save/Restore Register 1 (CSRR1)
Set to the contents of the MSR at the time of the interrupt.
Machine State Register (MSR)
ME Unchanged.
All other MSR bits are set to ‘0’.
Note: Software is responsible for clearing the watchdog timer exception status by writing to TSR[WIS] before reenabling MSR[CE], to avoid another, redundant watchdog timer interrupt.

7.5.13 Data TLB Error Interrupt
A data TLB error interrupt can occur when no higher priority exception exists and a data TLB miss exception is presented to the interrupt mechanism.
A data TLB miss exception occurs when a load, store, icbi, icbt, dcbst, dcbf, dcbz, dcbi, dcbt, or dcbtst instruction attempts to access a virtual address for which a valid TLB entry does not exist. See Section 4 Memory Management Unit on page 103 for more information about the TLB. The data TLB error interrupt also covers an operand access to a page with W = I = ‘1’, that is, a page that is both write-through and caching inhibited.
Note: The instruction cache management instructions icbi and icbt are treated as loads from the addressed byte with respect to address translation and protection, and therefore use MSR[DS] rather than MSR[IS] as part of the calculated virtual address when searching the TLB to determine translation for their target storage address. Instruction TLB miss exceptions are associated with the fetching of instructions, not with the execution of instructions. Data TLB miss exceptions are associated with the execution of instruction cache management instructions and with the execution of load, store, and data cache management instructions.
If an stwcx. instruction causes a data TLB miss exception, and the processor does not have the reservation from an lwarx instruction, a data TLB error interrupt still occurs.
If a data TLB miss exception occurs on any of the following instructions, the instruction is treated as a no-op, and a data TLB error interrupt does not occur:
• lswx or stswx with a length of zero (although the target register of lswx will be undefined)
• icbt
• dcbt
• dcbtst
For all other instructions, if a data TLB miss exception occurs, execution of the instruction causing the exception is suppressed, a data TLB error interrupt is generated, the interrupt processing registers are updated as follows (all registers not listed are unchanged), and instruction execution resumes at address IVPR[IVP] || IVOR13[IVO] || 0b0000.
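Every interrupt described in this section resumes execution at an address of the form IVPR[IVP] || IVORn[IVO] || 0b0000, as in the sentence above for IVOR13. The following minimal Python sketch computes that address. It assumes the Book E register layout, with IVP in IVPR[0:15] and IVO in IVORn[16:27] (big-endian bit numbering), so the concatenation reduces to two masks; the dictionary only collects the IVOR numbers given in this section, and the IVPR and IVOR contents in the usage lines are made-up example values.

```python
IVP_MASK = 0xFFFF_0000  # assumed: IVPR[IVP] = IVPR[0:15]
IVO_MASK = 0x0000_FFF0  # assumed: IVORn[IVO] = IVORn[16:27]; the low 4 bits form 0b0000

# IVOR numbers used by the interrupts described in this section.
IVOR_FOR = {
    "machine check": 1,
    "data storage": 2,
    "instruction storage": 3,
    "external input": 4,
    "alignment": 5,
    "program": 6,
    "floating-point unavailable": 7,
    "system call": 8,
    "decrementer": 10,
    "fixed-interval timer": 11,
    "watchdog timer": 12,
    "data TLB error": 13,
}


def vector_address(ivpr: int, ivor: int) -> int:
    """Return IVPR[IVP] || IVOR[IVO] || 0b0000 as a 32-bit effective address."""
    return (ivpr & IVP_MASK) | (ivor & IVO_MASK)


# Hypothetical register contents, keyed by IVOR number.
ivor_values = {13: 0x0000_01A0}
n = IVOR_FOR["data TLB error"]
print(hex(vector_address(0xFFF0_0000, ivor_values[n])))  # 0xfff001a0
```

Because the low four address bits are always 0b0000, every handler entry point is 16-byte aligned, and bits outside the IVP and IVO fields have no effect on the result.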
Save/Restore Register 0 (SRR0)
Set to the effective address of the instruction causing the data TLB error interrupt.

Save/Restore Register 1 (SRR1)
Set to the contents of the MSR at the time of the interrupt.

Machine State Register (MSR)
CE, ME, DE Unchanged.
All other MSR bits are set to ‘0’.

Data Exception Address Register (DEAR)
If the instruction causes the data TLB miss exception with respect to the memory page targeted by the initial effective address that the instruction calculates, the DEAR is set to this calculated effective address. However, if the data TLB miss exception occurs only because the instruction crosses a memory page boundary, so that the missing TLB entry is for the page accessed after the boundary, the DEAR is set to the address of the first byte within that page.

For example, consider a misaligned load word instruction that targets effective address x‘0000 0FFF’, where the page containing that address is a 4 KB page. The load word therefore crosses the page boundary and attempts to access the next page, which starts at address x‘0000 1000’. If a valid TLB entry does not exist for the first page, the DEAR is set to x‘0000 0FFF’. However, if a valid TLB entry exists for the first page but not for the second, the DEAR is set to x‘0000 1000’. Furthermore, in this latter scenario the load word instruction will have been partially executed (see Section 7.3.1 Partially Executed Instructions on page 172).

Exception Syndrome Register (ESR)
FP Set to ‘1’ if the instruction causing the interrupt is a floating-point load or store; otherwise set to ‘0’.
ST Set to ‘1’ if the instruction causing the interrupt is a store, dcbz, or dcbi instruction; otherwise set to ‘0’.
AP Set to ‘1’ if the instruction causing the interrupt is an auxiliary processor load or store; otherwise set to ‘0’.
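The DEAR rule reduces to two checks. First check the page of the initial effective address, and only if that page hits, check the page entered after the boundary. The following is a minimal sketch under stated assumptions. `dear_on_tlb_miss` and the `is_mapped` callback are hypothetical stand-ins for a TLB lookup. Both pages are assumed to have the same size, and an access is assumed to cross at most one boundary.

```python
from typing import Callable, Optional


def dear_on_tlb_miss(ea: int, size: int, page_size: int,
                     is_mapped: Callable[[int], bool]) -> Optional[int]:
    """Return the DEAR value for a `size`-byte access at `ea`, or None if
    neither page touched by the access misses in the TLB."""
    if not is_mapped(ea):
        # Miss on the page targeted by the initial effective address.
        return ea
    first_page = ea // page_size
    last_page = (ea + size - 1) // page_size
    if last_page != first_page:
        second_page_start = last_page * page_size
        if not is_mapped(second_page_start):
            # Miss only after crossing the boundary: first byte of that page.
            return second_page_start
    return None


PAGE_4K = 4 * 1024


def only_first_page(addr: int) -> bool:
    # Hypothetical TLB state: only the 4 KB page at x'0000 0000' is mapped.
    return addr // PAGE_4K == 0


def nothing_mapped(addr: int) -> bool:
    # Hypothetical TLB state: no valid entries at all.
    return False


print(hex(dear_on_tlb_miss(0x0FFF, 4, PAGE_4K, nothing_mapped)))   # 0xfff
print(hex(dear_on_tlb_miss(0x0FFF, 4, PAGE_4K, only_first_page)))  # 0x1000
```

The two calls reproduce the load word example in the text. The second case is also the one in which the instruction is partially executed.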
ISMC Unchanged.
All other defined ESR bits are set to ‘0’.

7.5.14 Instruction TLB Error Interrupt

An instruction TLB error interrupt occurs when no higher priority exception exists and an instruction TLB miss exception is presented to the interrupt mechanism. Although an instruction TLB miss exception might occur during an attempt to fetch an instruction, such an exception is not actually presented to the interrupt mechanism until an attempt is made to execute that instruction.

An instruction TLB miss exception occurs when an instruction fetch attempts to access a virtual address for which a valid TLB entry does not exist. See Section 4 Memory Management Unit on page 103 for more information about the TLB. The instruction TLB error interrupt is also generated when an instruction is fetched from a page with W = I = ‘1’, that is, a page that is write-through and cache-inhibited at the same time.

When an instruction TLB error interrupt occurs, the processor suppresses the execution of the instruction causing the instruction TLB miss exception, the interrupt processing registers are updated as follows (all registers not listed are unchanged), and instruction execution resumes at address IVPR[IVP] || IVOR14[IVO] || 0b0000.

Save/Restore Register 0 (SRR0)
Set to the effective address of the instruction causing the instruction TLB error interrupt.

Save/Restore Register 1 (SRR1)
Set to the contents of the MSR at the time of the interrupt.

Machine State Register (MSR)
CE, ME, DE Unchanged.
All other MSR bits are set to ‘0’.

7.5.15 Debug Interrupt

A debug interrupt occurs when no higher priority exception exists, a debug exception exists in the Debug Status Register (DBSR), the processor is in internal debug mode (DBCR0[IDM] = ‘1’), and debug interrupts are enabled (MSR[DE] = ‘1’). A debug exception occurs when a debug event causes a corresponding bit in the DBSR to be set. Both DBCR0[IDM] and MSR[DE] must be set to enable debug interrupts.
However, if DBCR0[IDM] is set but MSR[DE] is not set, the processor operates in trace mode. In trace mode, no debug interrupts occur, but the DBSR is still set. Additional register settings are required to enable the core to broadcast instruction trace data. See Section 8.2.3 Trace Mode on page 218.

There are several types of debug exceptions, as follows:

• Instruction address compare (IAC) exception
An IAC debug exception occurs when execution is attempted of an instruction whose address matches the IAC conditions specified by the various debug facility registers. This exception can occur regardless of debug mode and regardless of the value of MSR[DE].

• Data address compare (DAC) exception
A DAC debug exception occurs when the DVC mechanism is not enabled and execution is attempted of a load, store, icbi, icbt, dcbst, dcbf, dcbz, dcbi, dcbt, or dcbtst instruction whose target storage operand address matches the DAC conditions specified by the various debug facility registers. This exception can occur if MSR[DE] and DBCR0[IDM] are set.
Note: The instruction cache management instructions icbi and icbt are treated as loads from the addressed byte with respect to debug exceptions. IAC debug exceptions are associated with the fetching of instructions, not with the execution of instructions. DAC debug exceptions are associated with the execution of instruction cache management instructions and with the execution of load, store, and data cache management instructions.

• Data value compare (DVC) exception
A DVC debug exception occurs when execution is attempted of a load, store, or dcbz instruction whose target storage operand address matches the DAC and DVC conditions specified by the various debug facility registers. This exception can occur if MSR[DE] and DBCR0[IDM] are set.
• Branch taken (BRT) exception
A BRT debug exception occurs when BRT debug events are enabled (DBCR0[BRT] = ‘1’) and execution is attempted of a branch instruction for which the branch conditions are met. This exception cannot occur in internal debug mode when MSR[DE] = ‘0’ unless external debug mode or debug wait mode is also enabled. Table 7-3 on page 202 lists BRT debug event actions.

Table 7-3. BRT Debug Event Actions
| DBCR0[BRT] | DBCR0[IDM] | DBCR0[EDM] | MSR[DE] | MSR[DWE] and JDCR[DWE] | Action If Event Occurs |
|---|---|---|---|---|---|
| 0 | – | – | – | – | No action. |
| 1 | 0 | 0 | – | 0 | DBSR[BRT] is set through a normal commit. |
| 1 | – | 1 | – | – | DBSR[BRT] is set through a faulty commit. Transition to the STOP state. |
| 1 | – | 0 | – | 1 | DBSR[BRT] is set through a faulty commit. Transition to the STOP state. |
| 1 | 1 | 0 | 0 | 0 | No action. |
| 1 | 1 | 0 | 1 | 0 | DBSR[BRT] is set through a faulty commit. A debug interrupt is taken. CSRR0 is set to the address of the branch instruction. |

• Trap (TRAP) exception
A TRAP debug exception occurs when TRAP debug events are enabled (DBCR0[TRAP] = ‘1’) and execution is attempted of a tw or twi instruction that matches any of the specified trap conditions. This exception can occur regardless of debug mode and regardless of the value of MSR[DE]. Table 7-4 lists TRAP debug event actions.

Table 7-4. TRAP Debug Event Actions
| DBCR0[TRAP] | DBCR0[IDM] | DBCR0[EDM] | MSR[DE] | MSR[DWE] and JDCR[DWE] | Action If Event Occurs |
|---|---|---|---|---|---|
| 0 | – | – | – | – | A program interrupt is taken. |
| 1 | 0 | 0 | – | 0 | DBSR[TRAP] is set. |
| 1 | – | 1 | – | – | DBSR[TRAP] is set. Transition to the STOP state. |
| 1 | – | 0 | – | 1 | DBSR[TRAP] is set. Transition to the STOP state. |
| 1 | 1 | 0 | 0 | 0 | DBSR[TRAP] is set. DBSR[IDE] is set. A program interrupt is taken. SRR0 is set to the address of the trap instruction. |
| 1 | 1 | 0 | 1 | 0 | DBSR[TRAP] is set. A debug interrupt is taken. CSRR0 is set to the address of the trap instruction. |

• Return (RET) exception
An RET debug exception occurs when RET debug events are enabled (DBCR0[RET] = ‘1’) and execution is attempted of an rfi, rfci, or rfmci instruction. For rfi, the RET debug exception can occur regardless of debug mode and regardless of the value of MSR[DE]. For rfci or rfmci, the RET debug exception cannot occur in internal debug mode when MSR[DE] = ‘0’ unless external debug mode or debug wait mode is also enabled. Table 7-5 on page 203 lists RET debug event actions.

Table 7-5. RET Debug Event Actions
| DBCR0[RET] | DBCR0[IDM] | DBCR0[EDM] | MSR[DE] | MSR[DWE] and JDCR[DWE] | Action If Event Occurs |
|---|---|---|---|---|---|
| 0 | – | – | – | – | No action. |
| 1 | 0 | 0 | – | 0 | DBSR[RET] is set through a normal commit. |
| 1 | – | 1 | – | – | The rfi is committed through a faulty commit. DBSR[RET] is set. Transition to the STOP state. |
| 1 | – | 0 | – | 1 | The rfi is committed through a faulty commit. DBSR[RET] is set. Transition to the STOP state. |
| 1 | 1 | 0 | 0 | 0 | DBSR[RET] is set through a normal commit. DBSR[IDE] is set. |
| 1 | 1 | 0 | 1 | 0 | The rfi is committed through a faulty commit. DBSR[RET] is set. A debug interrupt is taken. CSRR0 is set to the address of the rfi instruction. |

• Instruction complete (ICMP) exception
An ICMP debug exception occurs when ICMP debug events are enabled (DBCR0[ICMP] = ‘1’) and execution of any instruction is completed. This exception cannot occur in internal debug mode when MSR[DE] = ‘0’ unless external debug mode or debug wait mode is also enabled. Table 7-6 lists ICMP debug event actions.

Table 7-6. ICMP Debug Event Actions
| DBCR0[ICMP] | DBCR0[IDM] | DBCR0[EDM] | MSR[DE] | MSR[DWE] and JDCR[DWE] | Action If Event Occurs |
|---|---|---|---|---|---|
| 0 | – | – | – | – | No action. |
| 1 | 0 | 0 | – | 0 | No action. |
| 1 | – | 1 | – | – | DBSR[ICMP] is set. Transition to the STOP state. |
| 1 | – | 0 | – | 1 | DBSR[ICMP] is set. Transition to the STOP state. |
| 1 | 1 | 0 | 0 | 0 | No action. |
| 1 | 1 | 0 | 1 | 0 | DBSR[ICMP] is set. A debug interrupt is taken. CSRR0 is set to the address of the next instruction to be executed after the instruction whose completion caused the ICMP event. |

• Interrupt (IRPT) exception
An IRPT debug exception occurs when IRPT debug events are enabled (DBCR0[IRPT] = ‘1’) and an interrupt occurs. For noncritical class interrupt types, the IRPT debug exception can occur regardless of debug mode and regardless of the value of MSR[DE]. For critical class interrupt types, the IRPT debug exception cannot occur in internal debug mode (regardless of the value of MSR[DE]) unless external debug mode or debug wait mode is also enabled. Table 7-7 on page 204 lists IRPT debug event actions.

Table 7-7. IRPT Debug Event Actions
| DBCR0[IRPT] | DBCR0[IDM] | DBCR0[EDM] | MSR[DE] | MSR[DWE] and JDCR[DWE] | Action If Event Occurs |
|---|---|---|---|---|---|
| 0 | – | – | – | – | No action. |
| 1 | 0 | 0 | – | 0 | DBSR[IRPT] is set. |
| 1 | – | 1 | – | – | DBSR[IRPT] is set. Transition to the STOP state. |
| 1 | – | 0 | – | 1 | DBSR[IRPT] is set. Transition to the STOP state. |
| 1 | 1 | 0 | 0 | 0 | DBSR[IRPT] is set. DBSR[IDE] is set. |
| 1 | 1 | 0 | 1 | 0 | DBSR[IRPT] is set. A debug interrupt is taken. CSRR0 is set to the address of the first instruction in the base class interrupt handler. |

• Unconditional debug event (UDE) exception
A UDE debug exception occurs when an unconditional debug event is signaled over the JTAG interface to the PowerPC 476FP core. This exception can occur regardless of debug mode and regardless of the value of MSR[DE]. Table 7-8 lists UDE debug event actions.

Table 7-8. UDE Debug Event Actions
| DBCR0[IDM] | DBCR0[EDM] | MSR[DE] | MSR[DWE] and JDCR[DWE] | Action If Event Occurs |
|---|---|---|---|---|
| 0 | 0 | – | 0 | DBSR[UDE] is set. |
| – | 1 | – | – | DBSR[UDE] is set. Transition to the STOP state. |
| – | 0 | – | 1 | DBSR[UDE] is set. Transition to the STOP state. |
| 1 | 0 | 0 | 0 | DBSR[UDE] is set. |
| 1 | 0 | 1 | 0 | DBSR[UDE] is set. A debug interrupt is taken. CSRR0 is set to the address of the CS trail at the time of the interrupt flush. |

The PowerPC 476FP core supports the following four debug modes:
• Internal debug mode
• External debug mode
• Debug wait mode
• Trace mode

Debug exceptions and interrupts are affected by the debug modes that are enabled at the time of the debug exception. Debug interrupts occur only when internal debug mode is enabled, although external debug mode or debug wait mode can be enabled as well. The remainder of this section assumes that, at the time of a debug exception, internal debug mode is enabled and external debug mode and debug wait mode are not enabled. See Section 8 Debug Facilities on page 217 for more information about the different debug modes and the behavior of each of the debug exception types when operating in each of the modes.

Note: It is a programming error for software to enable internal debug mode (by setting DBCR0[IDM] to ‘1’) while debug exceptions are already present in the DBSR. Software must first clear all DBSR debug exception status (that is, all fields except IDE, MRR, IAC12ATS, and IAC34ATS) before setting DBCR0[IDM] to ‘1’.

If an stwcx. instruction causes a DAC or DVC debug exception but the processor does not have the reservation from an lwarx instruction, the debug exception is not recorded in the DBSR, and a debug interrupt does not occur. Instead, the instruction completes and updates CR[CR0] to indicate that the store failed because the reservation was lost.

If a DAC exception occurs on an lswx or stswx with a length of zero, the instruction is treated as a no-op, the debug exception is not recorded in the DBSR, and a debug interrupt does not occur.
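Table 7-3 amounts to a lookup over five control bits. The following is a minimal sketch that encodes its rows as data and returns the matching action. `None` stands for the table's "–" entries, which are assumed here to mean "don't care". A single `dwe` argument stands in for the combined MSR[DWE] and JDCR[DWE] column. The action strings are abbreviations of the table text, and `brt_action` is a hypothetical name. Tables 7-4 through 7-7 use the same column layout, so swapping in their rows adapts the sketch.

```python
from typing import Optional

# (BRT, IDM, EDM, DE, DWE); None means the table shows "–" for that column.
Row = tuple[Optional[int], Optional[int], Optional[int], Optional[int], Optional[int]]

BRT_TABLE: list[tuple[Row, str]] = [
    ((0, None, None, None, None), "no action"),
    ((1, 0, 0, None, 0), "DBSR[BRT] set (normal commit)"),
    ((1, None, 1, None, None), "DBSR[BRT] set (faulty commit); STOP state"),
    ((1, None, 0, None, 1), "DBSR[BRT] set (faulty commit); STOP state"),
    ((1, 1, 0, 0, 0), "no action"),
    ((1, 1, 0, 1, 0),
     "DBSR[BRT] set (faulty commit); debug interrupt, CSRR0 = branch address"),
]


def brt_action(brt: int, idm: int, edm: int, de: int, dwe: int) -> str:
    """Return the Table 7-3 action for one combination of control bits."""
    state = (brt, idm, edm, de, dwe)
    # The rows are mutually exclusive, so at most one pattern matches.
    for pattern, action in BRT_TABLE:
        if all(p is None or p == s for p, s in zip(pattern, state)):
            return action
    return "combination not listed in Table 7-3"


# Internal debug mode with MSR[DE] = 0: the branch event is ignored.
print(brt_action(brt=1, idm=1, edm=0, de=0, dwe=0))  # no action
```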
If a DAC exception occurs on an icbt, dcbt, or dcbtst instruction that is being no-op’ed for some other reason (either the referenced cache block is in a caching-inhibited memory page, or a data storage or data TLB miss exception occurs), the debug exception is not recorded in the DBSR, and a debug interrupt does not occur. However, if the icbt, dcbt, or dcbtst instruction is not being no-op’ed for one of these other reasons, the DAC debug exception does occur and is handled in the same fashion as other DAC debug exceptions.

In all other cases, when a debug exception occurs, it is immediately presented to the interrupt handling mechanism. A debug interrupt occurs immediately if MSR[DE] = ‘1’, and the interrupt processing registers are updated as described below. If MSR[DE] = ‘0’, however, the exception condition remains set in the DBSR. When MSR[DE] is subsequently set to ‘1’ and the exception condition is still present in the DBSR, a delayed debug interrupt then occurs either as a synchronous, imprecise interrupt or as an asynchronous interrupt, depending on the type of debug exception.

When a debug interrupt occurs, the interrupt processing registers are updated as follows (all registers not listed are unchanged), and instruction execution resumes at address IVPR[IVP] || IVOR15[IVO] || 0b0000.

Critical Save/Restore Register 0 (CSRR0)
For debug exceptions that occur while debug interrupts are enabled (MSR[DE] = ‘1’), CSRR0 is set as follows:
• For IAC, BRT, TRAP, and RET debug exceptions, set to the address of the instruction causing the debug interrupt. Execution of the instruction causing the debug exception is suppressed, and the interrupt is synchronous and precise.
• For DAC and DVC debug exceptions, if DBCR2[DAC12A] = ‘0’, set to the address of the instruction causing the debug interrupt.
Execution of the instruction causing the debug exception is suppressed, and the interrupt is synchronous and precise. If DBCR2[DAC12A] = ‘1’, however, DAC and DVC debug exceptions are handled asynchronously, and CSRR0 is set to the address of the instruction that would have executed next had the debug interrupt not occurred. This could be either the address of the instruction causing the DAC or DVC debug exception or the address of a subsequent instruction.
• For ICMP debug exceptions, set to the address of the next instruction to be executed (the instruction after the one whose completion caused the ICMP debug exception). The interrupt is synchronous and precise.
Because the ICMP debug exception does not suppress the execution of the instruction causing the exception, but instead allows it to complete before causing the interrupt, the interrupt behaves differently in the special case where the instruction causing the ICMP debug exception itself sets MSR[DE] to ‘0’. In this case, the interrupt is delayed and occurs if MSR[DE] is again set to ‘1’, assuming DBSR[ICMP] is still set. If the debug interrupt occurs in this fashion, it is synchronous and imprecise, and CSRR0 is set to the address of the instruction after the one that set MSR[DE] to ‘1’ (not the one that originally caused the ICMP debug exception and in so doing set MSR[DE] to ‘0’). If the instruction that set MSR[DE] to ‘1’ was rfi, rfci, or rfmci, CSRR0 is set to the address to which the rfi, rfci, or rfmci was returning, not to the address of the instruction sequentially after the rfi, rfci, or rfmci.
• For IRPT debug exceptions, set to the address of the first instruction in the interrupt handler associated with the interrupt type that caused the IRPT debug exception. The interrupt is asynchronous.
• For UDE debug exceptions, set to the address of the instruction that would have executed next if the debug interrupt had not occurred. The interrupt is asynchronous.
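The CSRR0 rules for debug exceptions taken while MSR[DE] = ‘1’ can be read as a dispatch on the exception type. The following is a minimal sketch with a hypothetical `debug_csrr0` helper. The caller supplies the relevant addresses. The ICMP special case, in which the completing instruction itself clears MSR[DE], is not modeled.

```python
def debug_csrr0(kind: str, insn_addr: int, next_addr: int,
                handler_addr: int = 0, dac12a: int = 0) -> tuple[int, str]:
    """Return (CSRR0, timing) for a debug exception taken with MSR[DE] = 1.

    insn_addr    -- address of the instruction causing the exception
    next_addr    -- address that would have executed next (for async DAC/DVC
                    this may be the causing instruction or a later one)
    handler_addr -- first instruction of the handler for an IRPT event
    dac12a       -- value of DBCR2[DAC12A]
    """
    if kind in ("IAC", "BRT", "TRAP", "RET"):
        return insn_addr, "synchronous, precise"      # instruction suppressed
    if kind in ("DAC", "DVC"):
        if dac12a == 0:
            return insn_addr, "synchronous, precise"  # instruction suppressed
        return next_addr, "asynchronous"
    if kind == "ICMP":
        return next_addr, "synchronous, precise"      # instruction completed
    if kind == "IRPT":
        return handler_addr, "asynchronous"
    if kind == "UDE":
        return next_addr, "asynchronous"
    raise ValueError(f"unknown debug exception type: {kind}")


# A DAC hit with DBCR2[DAC12A] = 1 is reported asynchronously.
print(debug_csrr0("DAC", 0x200, 0x208, dac12a=1))
```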
For all debug exceptions that occur while debug interrupts are disabled (MSR[DE] = ‘0’), the debug interrupt is delayed and occurs if and when MSR[DE] is again set to ‘1’, assuming the debug exception status is still set in the DBSR. If the debug interrupt occurs in this fashion, CSRR0 is set to the address of the instruction after the one that set MSR[DE] to ‘1’. If the instruction that set MSR[DE] was rfi, rfci, or rfmci, CSRR0 is set to the address to which the rfi, rfci, or rfmci was returning, not to the address of the instruction sequentially after the rfi, rfci, or rfmci. The interrupt is either synchronous and imprecise or asynchronous, depending on the type of debug exception, as follows:
• For IAC and RET debug exceptions, the interrupt is synchronous and imprecise.
• For BRT debug exceptions, this scenario cannot occur. BRT debug exceptions are not recognized when MSR[DE] = ‘0’ if operating in internal debug mode.
• For TRAP debug exceptions, the debug interrupt is synchronous and imprecise. However, under these conditions (a TRAP debug exception occurring while MSR[DE] = ‘0’), the attempted execution of the trap instruction for which one or more of the trap conditions is met itself leads to a trap exception type program interrupt. The corresponding debug interrupt that occurs later, if debug interrupts are enabled, is in addition to the program interrupt.
• For DAC and DVC debug exceptions, if DBCR2[DAC12A] = ‘0’, the interrupt is synchronous and imprecise. If DBCR2[DAC12A] = ‘1’, the interrupt is asynchronous.
• For ICMP debug exceptions, this scenario cannot occur in this fashion. ICMP debug exceptions are not recognized when MSR[DE] = ‘0’ if operating in internal debug mode.
However, a similar scenario can occur when MSR[DE] = ‘1’ at the time of the ICMP debug exception but the instruction whose completion causes the exception itself sets MSR[DE] to ‘0’. This scenario is described above, under the ICMP debug exception for the case MSR[DE] = ‘1’. In that scenario, the interrupt is synchronous and imprecise.
• For IRPT and UDE debug exceptions, the interrupt is asynchronous.

Critical Save/Restore Register 1 (CSRR1)
Set to the contents of the MSR at the time of the interrupt.

Machine State Register (MSR)
CE, ME, DE Unchanged.
All other MSR bits are set to ‘0’.

7.6 Interrupt Ordering and Masking

Multiple exceptions can exist simultaneously, and each of them can cause an interrupt to be generated. However, the Power ISA architecture does not provide for the generation of more than one interrupt of the same class (critical or noncritical) at a time. Therefore, the architecture defines how interrupts are ordered with respect to each other, and it provides a masking mechanism for certain persistent interrupt types.

When an interrupt type is masked (disabled) and an event causes an exception that would normally generate an interrupt of that type, the exception persists as a status bit in a register (which register depends on the exception type), but no interrupt is generated. Later, if the interrupt type is enabled (unmasked) and software has not cleared the exception status, the interrupt due to the original exception event is finally generated.

All asynchronous interrupt types can be masked. Machine check interrupts can be masked as well. In addition, certain synchronous interrupt types can be masked. The two synchronous interrupt types that can be masked are the floating-point enabled exception type program interrupt (masked by MSR[FE0,FE1]) and the IAC, DAC, DVC, RET, and ICMP exception type debug interrupts (masked by MSR[DE]).
Note: When an otherwise synchronous, precise interrupt type is delayed in this fashion through masking and the interrupt type is later enabled, the resulting interrupt, caused by the exception event that occurred while the interrupt type was disabled, is considered a synchronous, imprecise class of interrupt.

The PowerPC 476FP core performs certain functions to prevent a subsequent interrupt from overwriting the state information that a previous interrupt saved in SRR0/SRR1, CSRR0/CSRR1, or MCSRR0/MCSRR1. As a first step, upon any noncritical class interrupt, the processor automatically disables any further asynchronous, noncritical class interrupts (external input, decrementer, and fixed-interval timer) by clearing MSR[EE]. Likewise, upon any critical class interrupt, hardware automatically disables any further asynchronous interrupts of either class (critical and noncritical) by clearing MSR[CE] and MSR[DE], in addition to MSR[EE]. The additional interrupt types that are disabled by the clearing of MSR[CE,DE] are the critical input, watchdog timer, and debug interrupts. For machine check interrupts, the processor automatically disables all maskable interrupts by clearing MSR[ME] and MSR[EE,CE,DE].

This first step of clearing MSR[EE] (and MSR[CE,DE] for critical class interrupts, and MSR[ME] for machine checks) prevents any subsequent asynchronous interrupts from overwriting the relevant save/restore registers (SRR0/SRR1, CSRR0/CSRR1, or MCSRR0/MCSRR1) before software can save their contents. The processor also automatically clears MSR[WE,PR,FP,FE0,FE1,IS,DS] on any interrupt. Clearing these bits helps avoid subsequent interrupts of certain other types.
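The class-dependent MSR clearing described above can be modeled as a mask operation. In this sketch, the `Msr` flag values are placeholders generated with `enum.auto()`. They are not the architected MSR bit positions. Only the bits named in this section are modeled, and `msr_on_interrupt` is a hypothetical name.

```python
from enum import Flag, auto


class Msr(Flag):
    # Placeholder bit values for illustration only (not architected positions).
    WE = auto()
    CE = auto()
    EE = auto()
    PR = auto()
    FP = auto()
    ME = auto()
    FE0 = auto()
    DE = auto()
    FE1 = auto()
    IS = auto()
    DS = auto()


# Cleared on every interrupt: EE, plus WE, PR, FP, FE0, FE1, IS, and DS.
CLEARED_ALWAYS = (Msr.EE | Msr.WE | Msr.PR | Msr.FP
                  | Msr.FE0 | Msr.FE1 | Msr.IS | Msr.DS)

# Additional bits cleared depending on the interrupt class.
EXTRA_CLEARED = {
    "noncritical": Msr(0),
    "critical": Msr.CE | Msr.DE,
    "machine_check": Msr.CE | Msr.DE | Msr.ME,
}


def msr_on_interrupt(msr: Msr, klass: str) -> Msr:
    """Return the MSR value in effect at the first handler instruction."""
    return msr & ~(CLEARED_ALWAYS | EXTRA_CLEARED[klass])


before = Msr.EE | Msr.CE | Msr.PR | Msr.DS
after = msr_on_interrupt(before, "noncritical")
assert after == Msr.CE  # EE, PR, and DS are cleared; CE survives a noncritical interrupt
```

A noncritical interrupt leaves MSR[CE,ME,DE] untouched. This is why, as the text explains, a critical interrupt can still follow immediately.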
However, guaranteeing that these interrupt types do not occur, and therefore do not overwrite the save/restore registers, also requires the cooperation of system software. Specifically, system software must not execute instructions that could cause (or enable) a subsequent interrupt until the contents of the save/restore registers have been saved.

7.6.1 Interrupt Ordering Software Requirements

The following list identifies the actions that system software must avoid before saving the contents of the save/restore registers:
• Reenabling MSR[EE] (or MSR[CE,DE] in critical class interrupt handlers). This prevents any asynchronous interrupts and, in the case of MSR[DE], any debug interrupts, which include both synchronous and asynchronous types.
• Branching (or sequential execution) to addresses not mapped by the TLB or mapped without execute access permission. This prevents instruction storage and instruction TLB error interrupts.
• Load, store, or cache management instructions to addresses not mapped by the TLB or not having the necessary access permission (read or write). This prevents data storage and data TLB error interrupts.
• Execution of system call (sc) or trap (tw, twi) instructions. This prevents system call and trap exception type program interrupts.
• Execution of any floating-point instructions. This prevents floating-point unavailable interrupts. Because MSR[FP] is automatically cleared, this interrupt would occur upon the execution of any floating-point instruction. Even if software were to reenable MSR[FP], floating-point instructions must still be avoided to prevent program interrupts due to possible floating-point enabled or unimplemented operation exceptions.
• Reenabling MSR[PR]. This prevents privileged instruction exception type program interrupts. Alternatively, software can reenable MSR[PR] but avoid executing any privileged instructions.
• Execution of any auxiliary processor instructions that are not implemented in the PowerPC 476FP core. This prevents auxiliary processor unavailable interrupts, as well as auxiliary processor enabled and unimplemented operation exception type program interrupts. The auxiliary processor instructions that are implemented within the PowerPC 476FP core do not cause any of these types of exceptions. They can therefore be executed before software saves the save/restore register contents.
• Execution of any illegal instructions or any defined instructions not implemented within the PowerPC 476FP core (64-bit instructions, mfapidi). This prevents illegal instruction exception type program interrupts.
• Execution of any instruction that could cause an alignment interrupt. This prevents alignment interrupts. See Section 7.5.6 Alignment Interrupt on page 192 for a complete list of instructions that might cause alignment interrupts.
• In the machine check handler, use of the caches and TLBs until any detected parity errors have been corrected. This avoids additional parity errors.

Hardware and software do not need to avoid critical class interrupts from within noncritical class interrupt handlers, because the two classes of interrupts use different pairs of save/restore registers to save the instruction address and MSR. For this reason, the processor does not automatically clear MSR[CE,ME,DE] upon a noncritical interrupt. The converse, however, is not true. Hardware and software must cooperate to avoid both critical and noncritical class interrupts from within critical class interrupt handlers, even though the two classes of interrupts use different save/restore register pairs.
This is because the critical class interrupt might have occurred from within a noncritical class interrupt handler before that handler saved SRR0 and SRR1. Therefore, within the critical class interrupt handler, both pairs of save/restore registers might contain data that the system software needs. Similarly, the machine check handler must avoid further machine checks and both critical and noncritical interrupts, because it might have been called from within a critical or noncritical interrupt handler.

7.6.2 Interrupt Order

The following is a prioritized listing of the various enabled interrupt types for which exceptions might exist simultaneously:
1. Synchronous (nondebug) interrupts:
   a. Data storage
   b. Instruction storage
   c. Alignment
   d. Program
   e. Floating-point unavailable
   f. System call
   g. Auxiliary processor unavailable
   h. Data TLB error
   i. Data TLB miss exception
   j. Instruction TLB error
   k. Instruction TLB miss exception
   Only one of these types of synchronous interrupts can have an existing exception generating it at any given time. This is guaranteed by the exception priority mechanism (see Section 7.7 Exception Priorities on page 210) and the requirements of the sequential execution model defined by the Power ISA architecture.
2. Machine check
3. Debug
4. Critical input
5. Watchdog timer
6. External input
7. Fixed-interval timer
8. Decrementer

As indicated previously, the noncritical, synchronous exception types listed under item 1 are generated with higher priority than the critical interrupt types listed in items 2 through 5. Nevertheless, these noncritical interrupts are immediately followed by the highest priority existing critical interrupt type, without any instructions being executed in the noncritical interrupt handler.
This is because the noncritical interrupt types do not automatically clear MSR[ME,DE,CE] and therefore do not automatically disable the critical interrupt types. In all other cases, a particular interrupt type from the preceding list automatically disables any subsequent interrupts of the same type and all other interrupt types that are listed after it in the priority order.

7.7 Exception Priorities

Power ISA requires all synchronous (precise and imprecise) interrupts to be reported in program order, as implied by the sequential execution model. The one exception to this rule is the case of multiple synchronous imprecise interrupts. Upon a synchronizing event, all previously executed instructions are required to report any synchronous imprecise interrupt-generating exceptions. The interrupts are then generated according to the general interrupt ordering rules outlined in Section 7.6.2 Interrupt Order on page 209. For example, if an mtmsr instruction sets MSR[FE0,FE1,DE], a previous floating-point enabled exception (in the FPSCR) and a previous debug exception (in the DBSR) might both still be presented. In such a scenario, a floating-point enabled exception type program interrupt occurs first, followed immediately by a debug interrupt.

For any single instruction that attempts to cause multiple exceptions for which the corresponding synchronous interrupt types are enabled, this section defines the priority order by which the instruction is permitted to cause a single enabled exception, thus generating a particular synchronous interrupt. This exception priority mechanism, together with the requirement that synchronous interrupts be generated in program order, guarantees that at any given time only one of the synchronous interrupt types listed in item 1 of Section 7.6.2 Interrupt Order on page 209 exists for consideration.
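The ordering rule of Section 7.6.2, applied to the mtmsr example above, can be sketched as a ranking. The type names are shorthand chosen for this illustration. The eleven synchronous (nondebug) types of item 1 share a single rank, because only one of them can be pending at a time. The sketch only ranks pending, enabled interrupt types. It does not model the MSR masking that each interrupt applies once it is taken.

```python
# Section 7.6.2 order, highest priority first (shorthand names).
PRIORITY = [
    "synchronous",           # item 1: data storage ... instruction TLB miss
    "machine_check",         # item 2
    "debug",                 # item 3
    "critical_input",        # item 4
    "watchdog_timer",        # item 5
    "external_input",        # item 6
    "fixed_interval_timer",  # item 7
    "decrementer",           # item 8
]
RANK = {name: i for i, name in enumerate(PRIORITY)}


def delivery_order(pending: set[str]) -> list[str]:
    """Return pending, enabled interrupt types, highest priority first."""
    return sorted(pending, key=RANK.__getitem__)


# mtmsr sets MSR[FE0,FE1,DE]: the floating-point enabled program interrupt
# (an item 1 type) precedes the debug interrupt.
print(delivery_order({"debug", "synchronous"}))  # ['synchronous', 'debug']
```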
The exception priority mechanism also prevents certain debug exceptions from existing in combination with certain other synchronous interrupt-generating exceptions.

This section does not define the permitted setting of multiple exceptions for which the corresponding interrupt types are disabled. Exceptions whose corresponding interrupt types are disabled have no effect on the generation of other exceptions whose corresponding interrupt types are enabled. Conversely, suppose the following sections show that a particular exception, whose corresponding interrupt type is enabled, has a higher priority than another exception. In that case, the occurrence of the enabled higher priority exception prevents the setting of the other exception, regardless of whether the other exception’s corresponding interrupt type is enabled or disabled.

Except as specifically noted in the following subsections, only one of the exception types listed for a given instruction type is permitted to be generated at any given time, assuming the corresponding interrupt type is enabled. The priorities of the exception types are listed in the following sections, from highest to lowest within each instruction type.

Finally, the PowerPC architecture defines machine check exceptions to be neither synchronous nor asynchronous. Therefore, machine check exceptions are not considered in the remainder of this section, which specifically addresses the priority of synchronous interrupts.

7.7.1 Exception Priorities for Integer Load, Store, and Cache Management Instructions

The following list identifies the priority order of the exception types that might occur within the PowerPC 476FP core as the result of the attempted execution of any integer load, store, or cache management instruction.
Included in this category is the former opcode for the icbt instruction, which is an allocated opcode still supported by the PowerPC 476FP core.
1. Debug (IAC exception)
2. Instruction TLB error (instruction TLB miss exception)
3. Instruction storage (execute access control exception)
4. Program (illegal instruction exception)
   Only applies to the defined 64-bit load, store, and cache management instructions, which are not recognized by the PowerPC 476FP core.
5. Program (privileged instruction exception)
   Only applies to the dcbi instruction, and only occurs if MSR[PR] = ‘1’.
6. Data TLB error (data TLB miss exception)
7. Data storage (all exception types except byte ordering exception)
8. Alignment (alignment exception)
9. Debug (DAC or DVC exception)
10. Debug (ICMP exception)

7.7.2 Exception Priorities for Floating-Point Load and Store Instructions

The following list identifies the priority order of the exception types that might occur within the PowerPC 476FP core as the result of the attempted execution of any floating-point load or store instruction.
1. Debug (IAC exception)
2. Instruction TLB error (instruction TLB miss exception)
3. Instruction storage (execute access control exception)
4. Program (illegal instruction exception)
   This exception occurs if no floating-point unit is attached to the PowerPC 476FP core or if the particular floating-point load or store instruction is not recognized by the attached floating-point unit.
5. Floating-point unavailable (floating-point unavailable exception)
   This exception occurs if an attached floating-point unit recognizes the instruction, but floating-point instruction processing is disabled (MSR[FP] = ‘0’).
6. Program (unimplemented operation exception)
   This exception occurs if an attached floating-point unit recognizes but does not support the instruction, and floating-point instruction processing is enabled (MSR[FP] = ‘1’).
7. Data TLB error (data TLB miss exception)
8. Data storage (all exception types except cache locking exception)
9. Alignment (alignment exception)
10. Debug (DAC or DVC exception)
11. Debug (ICMP exception)

7.7.3 Exception Priorities for Allocated Load and Store Instructions

The following list identifies the priority order of the exception types that might occur within the PowerPC 476FP core as the result of the attempted execution of any allocated load or store instruction.
1. Debug (IAC exception)
2. Instruction TLB error (instruction TLB miss exception)
3. Instruction storage (execute access control exception)
4. Program (illegal instruction exception)
   This exception occurs if no auxiliary processor unit is attached to the PowerPC 476FP core, or if the particular allocated load or store instruction is not recognized by the attached auxiliary processor.
5. Program (privileged instruction exception)
   This exception occurs if an attached auxiliary processor unit recognizes the instruction and indicates that the instruction is privileged, but MSR[PR] = ‘1’.
6. Auxiliary processor unavailable (auxiliary processor unavailable exception)
   This exception occurs if an attached auxiliary processor recognizes the instruction but indicates that auxiliary processor instruction processing is disabled (whether auxiliary processor instruction processing is enabled is implementation-dependent).
7. Program (unimplemented operation exception)
   This exception occurs if an attached auxiliary processor recognizes but does not support the instruction, and also indicates that auxiliary processor instruction processing is enabled (whether auxiliary processor instruction processing is enabled is implementation-dependent).
8. Data TLB error (data TLB miss exception)
9. Data storage (all exception types except cache locking exception)
10. Alignment (alignment exception)
11. Debug (DAC or DVC exception)
12. Debug (ICMP exception)

7.7.4 Exception Priorities for Floating-Point Instructions (Other)

The following list identifies the priority order of the exception types that might occur within the PowerPC 476FP core as the result of the attempted execution of any floating-point instruction other than a load or store.
1. Debug (IAC exception)
2. Instruction TLB error (instruction TLB miss exception)
3. Instruction storage (execute access control exception)
4. Program (illegal instruction exception)
   This exception occurs if no floating-point unit is attached to the PowerPC 476FP core or if the particular floating-point instruction is not recognized by the attached floating-point unit.
5. Floating-point unavailable (floating-point unavailable exception)
   This exception occurs if an attached floating-point unit recognizes the instruction but floating-point instruction processing is disabled (MSR[FP] = ‘0’).
6. Program (unimplemented operation exception)
   This exception occurs if an attached floating-point unit recognizes but does not support the instruction, and floating-point instruction processing is enabled (MSR[FP] = ‘1’).
7. Program (floating-point enabled exception)
   This exception occurs if an attached floating-point unit recognizes and supports the instruction, floating-point instruction processing is enabled (MSR[FP] = ‘1’), and the instruction sets FPSCR[FEX] to ‘1’.
8. Debug (ICMP exception)

7.7.5 Exception Priorities for Allocated Instructions (Other)

The following list identifies the priority order of the exception types that might occur within the PowerPC 476FP core as the result of the attempted execution of any allocated instruction that is not a load or store and is not one of the allocated instructions implemented within the PowerPC 476FP core.
1. Debug (IAC exception)
2. Instruction TLB error (instruction TLB miss exception)
3. Instruction storage (execute access control exception)
4. Program (illegal instruction exception)
   This exception occurs if no auxiliary processor unit is attached to the PowerPC 476FP core, or if the particular allocated instruction is not recognized by the attached auxiliary processor and is not one of the allocated instructions implemented within the PowerPC 476FP core.
5. Program (privileged instruction exception)
   This exception occurs if an attached auxiliary processor unit recognizes the instruction and indicates that the instruction is privileged, but MSR[PR] = ‘1’.
6. Auxiliary processor unavailable (auxiliary processor unavailable exception)
   This exception occurs if an attached auxiliary processor recognizes the instruction but indicates that auxiliary processor instruction processing is disabled (whether auxiliary processor instruction processing is enabled is implementation-dependent).
7. Program (unimplemented operation exception)
   This exception occurs if an attached auxiliary processor recognizes but does not support the instruction, and also indicates that auxiliary processor instruction processing is enabled (whether auxiliary processor instruction processing is enabled is implementation-dependent).
8. Program (auxiliary processor enabled exception)
   This exception occurs if an attached auxiliary processor recognizes and supports the instruction, indicates that auxiliary processor instruction processing is enabled, and the instruction execution results in an auxiliary processor enabled exception. Whether auxiliary processor instruction processing is enabled is implementation-dependent, as is whether a given auxiliary processor instruction results in an auxiliary processor enabled exception.
9. Debug (ICMP exception)

7.7.6 Exception Priorities for Privileged Instructions

The following list identifies the priority order of the exception types that might occur within the PowerPC 476FP core as the result of the attempted execution of any privileged instruction other than dcbi, rfi, rfci, rfmci, or any allocated instruction not implemented within the PowerPC 476FP core (all of which are covered elsewhere). This list does, however, cover the dci, dcread, ici, and icread instructions, which are privileged, allocated instructions that are implemented within the PowerPC 476FP core. This list also covers the defined 64-bit privileged instructions and the mfapidi instruction, neither of which is implemented by the PowerPC 476FP core.
1. Debug (IAC exception)
2. Instruction TLB error (instruction TLB miss exception)
3. Instruction storage (execute access control exception)
4. Program (illegal instruction exception)
   Only applies to the defined 64-bit privileged instructions and the mfapidi instruction.
5. Program (privileged instruction exception)
   Does not apply to the defined 64-bit privileged instructions or the mfapidi instruction.
6. Debug (ICMP exception)
   Does not apply to the defined 64-bit privileged instructions or the mfapidi instruction.

7.7.7 Exception Priorities for Trap Instructions

The following list identifies the priority order of the exception types that might occur within the PowerPC 476FP core as the result of the attempted execution of a trap (tw, twi) instruction.
1. Debug (IAC exception)
2. Instruction TLB error (instruction TLB miss exception)
3. Instruction storage (execute access control exception)
4. Debug (trap exception)
5. Program (trap exception)
6. Debug (ICMP exception)

7.7.8 Exception Priorities for System Call Instruction

The following list identifies the priority order of the exception types that might occur within the PowerPC 476FP core as the result of the attempted execution of a system call (sc) instruction:
1. Debug (IAC exception)
2. Instruction TLB error (instruction TLB miss exception)
3. Instruction storage (execute access control exception)
4. System call (system call exception)
5. Debug (ICMP exception)

Because the system call exception does not suppress the execution of the sc instruction, but instead occurs when the instruction has completed, an sc instruction can cause both a system call exception and an ICMP debug exception at the same time. In such a case, the associated interrupts occur in the order indicated in Section 7.6.2 Interrupt Order on page 209.

7.7.9 Exception Priorities for Branch Instructions

The following list identifies the priority order of the exception types that might occur within the PowerPC 476FP core as the result of the attempted execution of a branch instruction:
1. Debug (IAC exception)
2. Instruction TLB error (instruction TLB miss exception)
3. Instruction storage (execute access control exception)
4. Debug (BRT exception)
5. Debug (ICMP exception)

7.7.10 Exception Priorities for Return From Interrupt Instructions

The following list identifies the priority order of the exception types that might occur within the PowerPC 476FP core as the result of the attempted execution of an rfi or rfci instruction:
1. Debug (IAC exception)
2. Instruction TLB error (instruction TLB miss exception)
3. Instruction storage (execute access control exception)
4. Debug (RET exception)
5. Debug (ICMP exception)

7.7.11 Exception Priorities for Preserved Instructions

The following list identifies the priority order of the exception types that might occur within the PowerPC 476FP core as the result of the attempted execution of a preserved instruction:
1. Debug (IAC exception)
2. Instruction TLB error (instruction TLB miss exception)
3. Instruction storage (execute access control exception)
4. Program (illegal instruction exception)
   Applies to all preserved instructions except the mftb instruction, which is the only preserved class instruction implemented within the PowerPC 476FP core.
5. Debug (ICMP exception)
   Only applies to the mftb instruction, which is the only preserved class instruction implemented within the PowerPC 476FP core.

7.7.12 Exception Priorities for Reserved Instructions

The following list identifies the priority order of the exception types that might occur within the PowerPC 476FP core as the result of the attempted execution of a reserved instruction:
1. Debug (IAC exception)
2. Instruction TLB error (instruction TLB miss exception)
3. Instruction storage (execute access control exception)
4. Program (illegal instruction exception)
   Applies to all reserved instruction opcodes except the reserved-no-op instruction opcodes.
5. Debug (ICMP exception)
   Only applies to the reserved-no-op instruction opcodes.

7.7.13 Exception Priorities for All Other Instructions

The following list identifies the priority order of the exception types that might occur within the PowerPC 476FP core as the result of the attempted execution of all other instructions (that is, those not covered in Section 7.7.1 Exception Priorities for Integer Load, Store, and Cache Management Instructions on page 211 through Section 7.7.12 Exception Priorities for Reserved Instructions on page 216). This includes both defined instructions and allocated instructions implemented within the PowerPC 476FP core.
1. Debug (IAC exception)
2. Instruction TLB error (instruction TLB miss exception)
3. Instruction storage (execute access control exception)
4. Program (illegal instruction exception)
   Applies only to the defined 64-bit instructions, because these are not implemented within the PowerPC 476FP core.
5. Debug (ICMP exception)
   Does not apply to the defined 64-bit instructions, because these are not implemented by the PowerPC 476FP core.

8. Debug Facilities

The PowerPC 476FP Embedded Processor Core includes facilities for debugging during hardware and software development. Debug registers control the debug modes and events that are provided by the debug facilities. Developers can control the debug process using these debug modes and debug events. The debug registers can be accessed through software running on the processor. Also, the Joint Test Action Group (JTAG) debug port of the PowerPC 476FP core provides access to the debug registers.

Because the PowerPC 476FP debug facility is core-centric, the following methods are recommended for multiprocessor (MP) debug:
• Use an external trace logic unit or tool to trigger and gather information, and the RISCWatch tool to provide readable information. Consult the IBM PowerPC support team for further details.
• Target one processor of interest and debug that processor.

8.1 Development Tool Support

The RISCWatch product is a hardware and software development tool that uses external debug mode, debug events, and the JTAG debug port. The RISCTrace feature of RISCWatch uses the real-time instruction trace capability of the PowerPC 476FP core.

8.2 Debug Modes

The PowerPC 476FP core provides debug modes for use with particular types of debug tools or operations that are typically used in embedded systems development. When these debug modes are enabled, debug events are enabled by setting the corresponding bits in Debug Control Register 0 (DBCR0). These debug events are recorded in the Debug Status Register (DBSR).
The PowerPC 476FP core supports four debug modes:
• Internal debug mode
• External debug mode
• Debug wait enable mode
• Trace mode

The Power ISA Book III-E architecture specification focuses only on internal debug mode and the relationship of debug interrupts to the rest of the interrupt architecture. Internal debug mode is the mode that involves debug software running on the processor itself, typically in the form of the debug interrupt handler. The other debug modes are outside the scope of the Power ISA architecture. They involve special-purpose debug hardware external to the PowerPC 476FP core, connected either to the JTAG interface (for external debug mode and debug wait mode) or to the trace interface (for trace debug mode). Details of these interfaces and their operation are beyond the scope of this manual. See the PowerPC 476FP Core Support Manual and consult the PowerPC support team for further details.

8.2.1 Internal Debug Mode

When internal debug mode is enabled (DBCR0[IDM] = ‘1’), a debug event that sets the DBSR also causes a debug interrupt if debug exceptions are enabled in the Machine State Register (MSR[DE] = ‘1’). See Section 7.4.1 on page 173 for information about the MSR. Software at the debug interrupt vector location is given control when a debug event occurs. Using normal instructions, software can then access all architected processor resources. This way, debug software can control the processor, gather status, and interact with debugging hardware connected to the processor.

However, if internal debug mode is enabled and debug exceptions are not enabled (MSR[DE] = ‘0’), the processor is operating in trace debug mode. A debug event sets the DBSR, but no debug interrupts occur. To enable the core to broadcast instruction trace data, additional register settings are required. See Section 8.2.3 Trace Mode on page 218.
8.2.2 External Debug Mode

External debug mode is enabled by setting DBCR0[EDM]. External debug mode provides access to architected processor resources. It supports stopping, starting, and stepping the processor; setting hardware and software breakpoints; and monitoring processor status. In this mode, debug events (including a move to debug status register [mtdbsr] instruction) are recorded in the DBSR. The debug events then cause a transition to the stop state. In stop state, normal instruction execution stops to allow an external mechanism to handle the debug event. In stop state, architected processor resources and memory can be accessed and altered using the JTAG interface. Also, interrupts are temporarily disabled.

This stop is considered to be a hard stop. It is caused by the XXC476DEBUGHALT signal being asserted, by JDCR[STOP] being asserted, or by any DBSR debug event bit (any except for IDM or RST) being set while DBCR0[EDM] is set. This stop state is exited when the DBSR or DBCR0[EDM] is cleared. A hard stop overrides a weak stop. The CPU must be in a stop (hard or weak) state without MSR[WE] set to execute a step or stuff request from JTAG. A weak stop accepts interrupts: the processor transitions back to the run state, services the interrupts, and then returns to the stop state.

8.2.3 Trace Mode

Trace mode is the absence of each of the other modes. That is, if internal debug mode, external debug mode, and debug wait mode are all disabled, the processor is in trace debug mode. While in trace mode, all debug events are simply recorded in the DBSR and are indicated over the trace interface from the PowerPC 476FP core. The processor does not enter the stop state, and a debug interrupt does not occur. See the PowerPC 476FP Core Support Manual for trace event and trace event trigger information.

Trace mode is an execution mode only. To allow the core to emit instruction trace data to be collected by an external trace module, CCR0[ITE] must be set and CCR0[DTB] must be cleared.
8.2.4 Debug Wait Enable Mode

Debug wait enable mode is similar to external debug mode. It is set up by either MSR[DWE] = ‘1’ or JDCR[DWE] = ‘1’. Any event (including an mtdbsr instruction) that sets the DBSR causes a transition to the stop state. Unlike in external debug mode, this is a weak stop request and can be exited by an interrupt or by clearing the DBSR, MSR[DWE], or JDCR[DWE].

8.3 Debug Events

Debug events are used to cause debug exceptions, which are recorded in the DBSR. Setting the corresponding bit in DBCR0 and Debug Control Register 1 (DBCR1) enables a debug event to set a DBSR bit. When the DBSR bit is set, a debug exception occurs. Furthermore, when a DBSR bit is set, a debug interrupt is generated if debug interrupts are enabled (MSR[DE] = ‘1’). Certain debug events cannot occur when debug interrupts are disabled (MSR[DE] = ‘0’). In such situations, no debug exception occurs, and no DBSR bit is set. Other debug events can cause debug exceptions and set DBSR bits regardless of the state of MSR[DE]. The associated debug interrupts that result from such debug exceptions are delayed until MSR[DE] is set to ‘1’, provided the exceptions have not been cleared from the DBSR in the meantime.

Any time a DBSR bit is set while MSR[DE] = ‘0’, the imprecise debug event bit (DBSR[IDE]) is also set. DBSR[IDE] indicates that the associated debug exception bit in the DBSR was set while debug interrupts were disabled by MSR[DE]. Debug interrupt handler software can use this bit to determine how to interpret the address recorded in Critical Save/Restore Register 0 (CSRR0). It is either the address associated with the instruction causing the debug exception, or the address of the instruction after the one that set MSR[DE] and thereby enabled the delayed debug interrupt.

All debug registers are privileged; therefore, debug setup is part of the software kernel.
To access the debug registers, call the debug utility. The PowerPC 476FP core supports the following debug events:
• Instruction address comparison (IAC)
• Data address comparison (DAC)
• Trap
• Branch taken (BT)
• Instruction completed (ICMP)
• Interrupt (IRPT)
• Return (RET)
• Unconditional (UDE)

8.3.1 Broadcast of Debug Events

Debug events are enabled using DBCR0 and one of the previous modes. All events (including an mtdbsr instruction) that set the DBSR are broadcast using the trace trigger event bus. How the trace trigger bus serves user debug facilities depends on the chip-specific implementation. It is controlled solely by the debug modes. The broadcast is done at a 4:1 clock ratio, provided by the system-on-a-chip (SoC). See the documentation for your specific chip implementation for more information.

8.3.2 Exceptions

In general, a debug event causes an exception (sets the DBSR) when the corresponding DBCR0 bit is enabled, MSR[DE] is set, and the processor is not in stuff state. However, branch taken (BT) and ICMP events do not cause exceptions if all of the following settings are true:
• DBCR0[IDM] = ‘1’
• MSR[DE] = ‘0’
• DBCR0[EDM] = ‘0’
• MSR[DWE] = ‘0’
Also, an IRPT event that results from a critical interrupt or machine check does not cause an exception if all of the following settings are true:
• DBCR0[IDM] = ‘1’
• DBCR0[EDM] = ‘0’
• MSR[DWE] = ‘0’
Furthermore, an exception might cause the following actions:
• Hard stop (if DBCR0[EDM] = ‘1’)
• Weak stop (if MSR[DWE] = ‘1’ and JDCR[DWE])
• Interrupt (if DBCR0[IDM] = ‘1’ and MSR[DE] = ‘1’)
Notes:
• Stuff state overrides all DBSR settings because debug events are not allowed in stuff state.
• DBSR[IDE] must be set to ‘1’ when setting any DBSR event. Both DBCR0[IDM] and MSR[DE] must also be set to ‘1’.
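The suppression cases and resulting actions listed above can be summarized as two C predicates. This is a sketch under stated assumptions, not a model of the hardware:
• The structure, enum, and function names are hypothetical.
• Each event is assumed to have its DBCR0 enable bit already set.
• Only the suppression cases spelled out in this section and the stuff-state note are modeled. No other MSR[DE] gating is applied.
• Debug wait enable mode is passed in as a single flag rather than derived from MSR[DWE] and JDCR[DWE].
The rule that a hard stop overrides a weak stop comes from Section 8.2.2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the control state named in Section 8.3.2. */
struct dbg_state {
    bool dbcr0_idm;   /* DBCR0[IDM]: internal debug mode */
    bool dbcr0_edm;   /* DBCR0[EDM]: external debug mode */
    bool msr_de;      /* MSR[DE]: debug interrupts enabled */
    bool msr_dwe;     /* MSR[DWE] */
    bool wait_mode;   /* debug wait enable mode active (Section 8.2.4) */
    bool stuff_state; /* processor is in stuff state */
};

enum dbg_event { EV_BT, EV_ICMP, EV_IRPT_CRIT_MC, EV_OTHER };

struct dbg_action { bool hard_stop, weak_stop, interrupt; };

/* Would an enabled debug event of this kind set the DBSR? */
bool dbg_sets_dbsr(const struct dbg_state *s, enum dbg_event ev)
{
    assert(s != NULL);
    if (s->stuff_state)
        return false;  /* debug events are not allowed in stuff state */
    bool quiet = s->dbcr0_idm && !s->dbcr0_edm && !s->msr_dwe;
    if ((ev == EV_BT || ev == EV_ICMP) && quiet && !s->msr_de)
        return false;  /* BT and ICMP events suppressed */
    if (ev == EV_IRPT_CRIT_MC && quiet)
        return false;  /* IRPT from critical interrupt or machine check */
    return true;
}

/* What a debug exception then causes. */
struct dbg_action dbg_exception_action(const struct dbg_state *s)
{
    struct dbg_action a;
    a.hard_stop = s->dbcr0_edm;
    a.weak_stop = s->wait_mode && !s->dbcr0_edm;  /* hard stop wins */
    a.interrupt = s->dbcr0_idm && s->msr_de;
    return a;
}
```

Returning a small struct of flags, rather than a single mode, keeps the sketch from implying a precedence that this section does not state. The exception is the hard-stop-over-weak-stop rule.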
See Section 7 Processor Interrupts and Exceptions on page 167 for more information about exceptions and machine states.

8.3.3 Instruction Address Comparison

IAC debug events occur when execution of an instruction is attempted for which the instruction address and other parameters match the IAC conditions specified by DBCR0, DBCR1, and the four IAC registers (IAC1 - IAC4). Depending on the IAC mode specified by DBCR1, these IAC registers can be used to specify four independent, exact IAC addresses. Alternatively, they can be configured in pairs to specify ranges of instruction addresses for which IAC debug events occur. The IAC registers can be paired as follows:
• IAC1 and IAC2
• IAC3 and IAC4

8.3.3.1 IAC Debug Events

For a given IAC event to occur, the corresponding IAC event enable bit in DBCR0 must be set. DBCR0 and DBCR1 are used to specify the IAC conditions. The four IAC events, IAC1, IAC2, IAC3, and IAC4, are enabled by setting the following bits:
• DBCR0[IAC1]
• DBCR0[IAC2]
• DBCR0[IAC3]
• DBCR0[IAC4]
When a given IAC event occurs, the corresponding DBSR[IAC1], DBSR[IAC2], DBSR[IAC3], or DBSR[IAC4] bit is set.

IAC events can be enabled to operate in three modes. DBCR1[IAC12M] controls the comparison mode for the IAC1/IAC2 pair, and DBCR1[IAC34M] controls the comparison mode for the IAC3/IAC4 pair. The three comparison modes are described in the following sections.

8.3.3.2 Exact Comparison Mode

This mode is enabled for the IAC1/IAC2 pair by setting DBCR1[IAC12M] = ‘00’, and for the IAC3/IAC4 pair by setting DBCR1[IAC34M] = ‘00’. In this mode, the instruction address is compared to the value in the corresponding IAC register. The IAC event occurs only if the comparison is an exact match.

8.3.3.3 Range Inclusive Comparison Mode

This mode is enabled for the IAC1/IAC2 pair by setting DBCR1[IAC12M] = ‘10’, and for the IAC3/IAC4 pair by setting DBCR1[IAC34M] = ‘10’.
In this mode, the IAC1 or IAC2 event occurs only if the instruction address is within the range defined by the IAC1/IAC2 register values as follows: IAC1 ≤ address < IAC2. Similarly, the IAC3 or IAC4 event occurs only if the instruction address is within the range defined by the IAC3/IAC4 register values as follows: IAC3 ≤ address < IAC4. For a given IAC1/IAC2 or IAC3/IAC4 pair, when the instruction address falls within the specified range, either one or both of the corresponding IAC debug event bits are set in the DBSR, as determined by which of the two corresponding IAC event enable bits are set in DBCR0. For example, when the IAC1/IAC2 pair is set to range inclusive comparison mode and the instruction address falls within the defined range, DBCR0[IAC1] and DBCR0[IAC2] determine whether DBSR[IAC1], DBSR[IAC2], or both are set. It is a programming error to set either of the IAC pairs to a range comparison mode (either inclusive or exclusive) without also enabling at least one of the corresponding IAC event enable bits in DBCR0. The IAC range autotoggle mechanism can switch the IAC range mode from inclusive to exclusive or from exclusive to inclusive. See IAC Range Mode Autotoggle Field on page 222.

8.3.3.4 Range Exclusive Comparison Mode

This mode is enabled for the IAC1/IAC2 pair by setting DBCR1[IAC12M] = ‘11’, and for the IAC3/IAC4 pair by setting DBCR1[IAC34M] = ‘11’. In this mode, the IAC1 or IAC2 event occurs only if the instruction address is outside the range defined by the IAC1/IAC2 register values, as follows: address < IAC1 or address ≥ IAC2. Similarly, the IAC3 or IAC4 event occurs only if the instruction address is outside the range defined by the IAC3/IAC4 register values, as follows: address < IAC3 or address ≥ IAC4. For a given IAC1/IAC2 or IAC3/IAC4 pair, when the instruction address falls outside the specified range, either one or both of the corresponding IAC debug event bits are set in the DBSR, as determined by which of the two corresponding IAC event enable bits are set in DBCR0.
For example, when the IAC1/IAC2 pair is set to range exclusive comparison mode and the instruction address falls outside the defined range, DBCR0[IAC1] and DBCR0[IAC2] determine whether DBSR[IAC1], DBSR[IAC2], or both are set. It is a programming error to set either of the IAC pairs to a range comparison mode (either inclusive or exclusive) without also enabling at least one of the corresponding IAC event enable bits in DBCR0. The IAC range autotoggle mechanism can switch the IAC range mode from inclusive to exclusive, or from exclusive to inclusive. See IAC Range Mode Autotoggle Field on page 222.

8.3.3.5 IAC User/Supervisor Field

DBCR1[IAC1US], DBCR1[IAC2US], DBCR1[IAC3US], and DBCR1[IAC4US] are the individual IAC user/supervisor fields for each of the four IAC events. The IAC user/supervisor fields specify the operating mode that the processor must be in for the corresponding IAC event to occur. The operating mode is determined by the problem state field of the Machine State Register (MSR[PR]). See Section 7.4.1 Machine State Register (MSR) on page 173. When the IAC user/supervisor field is ‘00’, the operating mode does not matter; the IAC debug event can occur independent of the state of MSR[PR]. When this field is ‘10’, the processor must be operating in supervisor mode (MSR[PR] = ‘0’). When this field is ‘11’, the processor must be operating in user mode (MSR[PR] = ‘1’). The IAC user/supervisor field value of ‘01’ is reserved.

If a pair of IAC events (IAC1/IAC2 or IAC3/IAC4) is operating in range inclusive or range exclusive mode, it is a programming error (and the results of any instruction address comparison are undefined) if the corresponding pair of IAC user/supervisor fields are not set to the same value. For example, if IAC1/IAC2 are operating in one of the range modes, both DBCR1[IAC1US] and DBCR1[IAC2US] must be set to the same value.
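The comparison rules in Sections 8.3.3.2 through 8.3.3.5 can be sketched in C. The sketch evaluates one IAC pair (IAC1/IAC2 or IAC3/IAC4) for a single instruction address and reports which of the pair's DBSR bits would be set. It makes these assumptions:
• The structure and function names are hypothetical.
• The mode encodings follow Table 8-1, with ‘00’ for exact comparison.
• Addresses are 32-bit effective addresses.
• The effective/real address fields (Section 8.3.3.6) and the autotoggle mechanism are not modeled.
• The reserved user/supervisor value ‘01’ is treated as "no match".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* DBCR1[IAC12M]/[IAC34M] encodings (Section 8.3.3.2 and Table 8-1). */
enum { IAC_EXACT = 0x0, IAC_RANGE_INCL = 0x2, IAC_RANGE_EXCL = 0x3 };

/* Hypothetical description of one IAC pair. */
struct iac_pair {
    unsigned mode;          /* DBCR1[IAC12M] or DBCR1[IAC34M] */
    uint32_t lo, hi;        /* IAC1/IAC2 or IAC3/IAC4 register values */
    bool en_lo, en_hi;      /* DBCR0 event enable bits */
    unsigned us_lo, us_hi;  /* DBCR1 user/supervisor fields */
};

/* '00' = either mode, '10' = supervisor only, '11' = user only. */
static bool iac_us_ok(unsigned us, bool msr_pr)
{
    switch (us) {
    case 0x0: return true;
    case 0x2: return !msr_pr;
    case 0x3: return msr_pr;
    default:  return false;  /* '01' is reserved: treated as no match */
    }
}

/* Bit 0: first register's DBSR bit; bit 1: second register's DBSR bit. */
unsigned iac_pair_events(const struct iac_pair *p, uint32_t addr, bool msr_pr)
{
    unsigned ev = 0;
    if (p->mode == IAC_EXACT) {
        if (p->en_lo && iac_us_ok(p->us_lo, msr_pr) && addr == p->lo) ev |= 1u;
        if (p->en_hi && iac_us_ok(p->us_hi, msr_pr) && addr == p->hi) ev |= 2u;
    } else if (p->mode == IAC_RANGE_INCL || p->mode == IAC_RANGE_EXCL) {
        /* Range modes: US fields must match and one enable must be set. */
        assert(p->us_lo == p->us_hi && (p->en_lo || p->en_hi));
        bool inside = p->lo <= addr && addr < p->hi;
        bool hit = (p->mode == IAC_RANGE_INCL) ? inside : !inside;
        if (hit && iac_us_ok(p->us_lo, msr_pr)) {
            if (p->en_lo) ev |= 1u;
            if (p->en_hi) ev |= 2u;
        }
    }
    return ev;
}
```

The half-open interval (lo <= addr < hi) mirrors the IAC1 ≤ address < IAC2 definition. As a result, an address equal to the second register value is outside an inclusive range and inside an exclusive one.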
8.3.3.6 IAC Effective/Real Address Field

DBCR1[IAC1ER], DBCR1[IAC2ER], DBCR1[IAC3ER], and DBCR1[IAC4ER] are the individual IAC effective/real address fields for each of the four IAC events. The IAC effective/real address fields specify whether the instruction address comparison is performed using the effective, virtual, or real address. When the IAC effective/real address field is ‘00’, the comparison is performed using the effective address only; the IAC debug event can occur independent of the instruction address space (MSR[IS]). When this field is ‘10’, the IAC debug event occurs only if the effective address matches the IAC conditions and is in virtual address space 0 (MSR[IS] = ‘0’). Similarly, when this field is ‘11’, the IAC debug event occurs only if the effective address matches the IAC conditions and is in virtual address space 1 (MSR[IS] = ‘1’). In these latter two modes, only the address space of the instruction is considered, not the entire virtual address. The process identifier, which forms the final part of the virtual address, is not considered. Finally, the IAC effective/real address field value of ‘01’ is reserved.

If a pair of IAC events (IAC1/IAC2 or IAC3/IAC4) is operating in range inclusive or range exclusive mode, it is a programming error if the corresponding pair of IAC effective/real address fields are not set to the same value. If this occurs, the results of any instruction address comparison are undefined. For example, if IAC1/IAC2 are operating in one of the range modes, both DBCR1[IAC1ER] and DBCR1[IAC2ER] must be set to the same value.

8.3.3.7 IAC Range Mode Autotoggle Field

DBCR1[IAC12AT] controls the toggling mechanism for the IAC1/IAC2 events. DBCR1[IAC34AT] controls the toggling mechanism for the IAC3/IAC4 events.
When the IAC mode for one of the pairs of IAC debug events is set to one of the range modes (either range inclusive or range exclusive), the IAC range mode autotoggle field for that pair controls whether the range mode automatically toggles from inclusive to exclusive, and from exclusive to inclusive. When the IAC range mode autotoggle field is set to ‘1’, toggling is enabled; otherwise, it is disabled. It is a programming error if an IAC range mode autotoggle field is set to ‘1’ without the corresponding IAC mode field being set to one of the range modes. If this occurs, the results of any instruction address comparison are undefined.

When toggling is enabled for a pair of IAC debug events, each occurrence of an IAC debug event within that pair reverses the value of the corresponding autotoggle status field in the DBSR (DBSR[IAC12ATS] or DBSR[IAC34ATS]). That is, if the autotoggle status field is set to ‘0’ before the occurrence of the IAC debug event, it is changed to ‘1’ at the same time that the IAC debug event is recorded in the DBSR. Conversely, if the autotoggle status field is set to ‘1’ before the occurrence of the IAC debug event, it is changed to ‘0’ at the same time that the IAC debug event is recorded in the DBSR.

Furthermore, when autotoggle is enabled, the autotoggle status field of the DBSR affects the interpretation of the IAC mode field of DBCR1. If the autotoggle status field is set to ‘0’, the IAC mode field value of ‘10’ selects range inclusive mode, whereas the value of ‘11’ selects range exclusive mode. However, when the autotoggle status field is set to ‘1’, the interpretation of the IAC mode field is reversed. That is, the IAC mode field value of ‘10’ selects range exclusive mode, whereas the value of ‘11’ selects range inclusive mode.
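The interaction between the IAC mode field, the autotoggle enable, and the autotoggle status bit can be sketched as two small C functions. One returns the range sense currently in effect; the other returns the status bit after an IAC event in the pair is recorded. The function and enum names are hypothetical. The encodings are the ones given in the preceding paragraphs and summarized in Table 8-1.

```c
#include <assert.h>
#include <stdbool.h>

enum iac_range { IAC_INCLUSIVE, IAC_EXCLUSIVE };

/* mode: DBCR1[IAC12M]/[IAC34M], must be '10' or '11' (a range mode)
 * at:   DBCR1[IAC12AT]/[IAC34AT], autotoggle enable
 * ats:  DBSR[IAC12ATS]/[IAC34ATS], autotoggle status (ignored if !at) */
enum iac_range iac_range_in_effect(unsigned mode, bool at, bool ats)
{
    assert(mode == 0x2 || mode == 0x3);  /* autotoggle needs a range mode */
    bool inclusive = (mode == 0x2);
    if (at && ats)
        inclusive = !inclusive;          /* status '1' reverses the field */
    return inclusive ? IAC_INCLUSIVE : IAC_EXCLUSIVE;
}

/* Status bit after an IAC event in the pair is recorded in the DBSR. */
bool iac_ats_after_event(bool at, bool ats)
{
    return at ? !ats : ats;
}
```

With autotoggle enabled, each IAC event flips the status bit. The range sense in effect therefore alternates between inclusive and exclusive on successive events.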
The relationship of the IAC mode, IAC range mode autotoggle, and IAC range mode autotoggle status fields is summarized in Table 8-1.
Table 8-1. IAC Range Mode Toggle Summary
DBCR1 IAC12M/IAC34M   DBCR1 IAC12AT/IAC34AT   DBSR IAC12ATS/IAC34ATS   IAC Mode
‘10’                  ‘0’                     N/A                      Range Inclusive
‘10’                  ‘1’                     ‘0’                      Range Inclusive
‘10’                  ‘1’                     ‘1’                      Range Exclusive
‘11’                  ‘0’                     N/A                      Range Exclusive
‘11’                  ‘1’                     ‘0’                      Range Exclusive
‘11’                  ‘1’                     ‘1’                      Range Inclusive
8.3.4 Data Address Comparison
DAC debug events occur when execution is attempted of a load, store, or cache management instruction for which the data storage address and other parameters match the DAC conditions specified by DBCR0, DBCR2, and the DAC registers. DAC events are written to the DBSR based on the following criteria:
• The DAC and data value comparison (DVC) exceptions that are enabled.
• Whether internal debug mode, external debug mode, or debug wait mode is enabled.
• A DAC address comparison match.
• A DVC data comparison match.
The four DAC events are as follows:
• Data address comparison 1 read (DAC1R)
• Data address comparison 1 write (DAC1W)
• Data address comparison 2 read (DAC2R)
• Data address comparison 2 write (DAC2W)
If these debug events occur in trace mode, the events are recorded in the DBSR when the instruction that caused the event is committed. However, if these debug events occur when the processor is in debug interrupt mode, the data cache unit (DCU) does not perform a normal commit of the instruction; instead, it performs a faulty commit of the operation when the operation is in load write-back (LWB). If the instruction is faulty committed, the debug event is recorded in the DBSR.
8.3.4.1 DAC Debug Event Fields
The following fields in DBCR0 and DBCR2 are used to specify the DAC conditions.
DAC Event Enable Field
DBCR0[DAC1R, DAC1W, DAC2R, DAC2W] are the individual DAC event enables for the two DAC events, DAC1 and DAC2.
For each of the two DAC events, one enable is for DAC read events, and the other is for DAC write events. Load, dcbt, dcbtst, icbi, and icbt instructions might cause DAC read events, while store, dcbst, dcbf, dcbi, and dcbz instructions might cause DAC write events. For a given DAC event to occur, the corresponding DAC event enable bit in DBCR0 for the particular operation type must be set. When a DAC event occurs, the corresponding DBSR[DAC1R, DAC1W, DAC2R, DAC2W] bit is set. These same DBSR bits are shared by DVC debug events.
DAC Mode Field
DBCR2[DAC12M] controls the comparison mode for the DAC1 and DAC2 events. The PowerPC 476FP core supports four comparison modes:
• Exact comparison mode (DBCR2[DAC12M] = ‘00’)
In this mode, the data address is compared to the value in the corresponding DAC register, and the DAC event occurs only if the comparison is an exact match.
• Address bit mask mode (DBCR2[DAC12M] = ‘01’)
In this mode, the DAC1 or DAC2 event occurs only if the data address matches the value in the DAC1 register, as masked by the value in the DAC2 register. That is, the DAC1 register specifies an address value, and the DAC2 register specifies an address bit mask that determines which bits of the data address participate in the comparison to the DAC1 value. For every bit set to 1 in the DAC2 register, the corresponding data address bit must match the value of the same bit position in the DAC1 register. For every bit set to 0 in the DAC2 register, the corresponding address bit comparison does not affect the result of the DAC event determination. This comparison mode is useful for detecting accesses to a particular byte address when the accesses might be of various sizes.
For example, if the debugger is interested in detecting accesses to byte address x‘00000003’, these accesses might occur because of a byte access to that specific address, a halfword access to address x‘00000002’, or a word access to address x‘00000000’. By using address bit mask mode and specifying that the low-order two bits of the address be ignored (that is, setting the address bit mask in DAC2 to x‘FFFFFFFC’), the debugger can detect each of these types of access to byte address x‘00000003’. When the data address matches the address bit mask mode conditions, either one or both of the DAC debug event bits corresponding to the operation type (read or write) are set in the DBSR, as determined by which of the corresponding two DAC event enable bits are set in DBCR0. That is, when an address bit mask mode DAC debug event occurs, the setting of DBCR0[DAC1R, DAC1W, DAC2R, DAC2W] determines whether one, the other, or both of the DBSR[DAC1R, DAC1W, DAC2R, DAC2W] bits corresponding to the operation type are set. It is a programming error to set the DAC mode field to address bit mask mode without also enabling at least one of the four DAC event enable bits in DBCR0.
• Range inclusive comparison mode (DBCR2[DAC12M] = ‘10’)
In this mode, the DAC1 or DAC2 event occurs only if the data address is within the range defined by the DAC1 and DAC2 register values, as follows: DAC1 ≤ address < DAC2. When the data address falls within the specified range, either one or both of the DAC debug event bits corresponding to the operation type (read or write) are set in the DBSR, as determined by which of the corresponding two DAC event enable bits are set in DBCR0.
That is, when a range inclusive mode DAC debug event occurs, the setting of DBCR0[DAC1R, DAC1W, DAC2R, DAC2W] determines whether one, the other, or both of the DBSR[DAC1R, DAC1W, DAC2R, DAC2W] bits corresponding to the operation type are set. It is a programming error to set the DAC mode field to a range comparison mode (either inclusive or exclusive) without also enabling at least one of the four DAC event enable bits in DBCR0.
• Range exclusive comparison mode (DBCR2[DAC12M] = ‘11’)
In this mode, the DAC1 or DAC2 event occurs only if the data address is outside the range defined by the DAC1 and DAC2 register values, as follows: address < DAC1 or address ≥ DAC2. When the data address falls outside the specified range, either one or both of the DAC debug event bits corresponding to the operation type (read or write) are set in the DBSR, as determined by which of the corresponding two DAC event enable bits are set in DBCR0. That is, when a range exclusive mode DAC debug event occurs, the setting of DBCR0[DAC1R, DAC1W, DAC2R, DAC2W] determines whether one, the other, or both of the DBSR[DAC1R, DAC1W, DAC2R, DAC2W] bits corresponding to the operation type are set. As with range inclusive mode, it is a programming error to select this mode without also enabling at least one of the four DAC event enable bits in DBCR0.
DAC User/Supervisor Field
DBCR2[DAC1US, DAC2US] are the individual DAC user/supervisor fields for the two DAC events. The DAC user/supervisor fields specify the operating mode that the processor must be in for the corresponding DAC event to occur. The operating mode is determined by the Problem State field of the Machine State Register (MSR[PR]). When the DAC user/supervisor field is ‘00’, the operating mode does not matter: the DAC debug event can occur independent of the state of MSR[PR]. When this field is ‘10’, the processor must be operating in supervisor mode (MSR[PR] = ‘0’).
When this field is ‘11’, the processor must be operating in user mode (MSR[PR] = ‘1’). The DAC user/supervisor field value of ‘01’ is reserved. If the DAC mode is set to one of the paired modes (address bit mask mode or one of the two range modes), it is a programming error (and the results of any data address comparison are undefined) if DBCR2[DAC1US] and DBCR2[DAC2US] are not set to the same value.
DAC Effective/Real Address Field
DBCR2[DAC1ER, DAC2ER] are the individual DAC effective/real address fields for the two DAC events. The DAC effective/real address fields specify whether the data address comparison is performed using the effective, virtual, or real address. When the DAC effective/real address field is ‘00’, the comparison is performed using the effective address only; the DAC debug event can occur independent of the data address space (MSR[DS]). When this field is ‘10’, the DAC debug event occurs only if the effective address matches the DAC conditions and is in virtual address space 0 (MSR[DS] = ‘0’). Similarly, when this field is ‘11’, the DAC debug event occurs only if the effective address matches the DAC conditions and is in virtual address space 1 (MSR[DS] = ‘1’). In these latter two modes, only the virtual address space of the data is considered, not the entire virtual address: the process ID, which forms the final part of the virtual address, is not considered. Finally, the DAC effective/real address field value of ‘01’ is reserved; it corresponds to the PowerPC Book-E architected real address comparison mode, which is not supported by the PowerPC 476FP core.
If the DAC mode is set to one of the paired modes (address bit mask mode or one of the two range modes), it is a programming error (and the results of any data address comparison are undefined) if DBCR2[DAC1ER] and DBCR2[DAC2ER] are not set to the same value.
DVC Byte Enable Field
DBCR2[DVC1BE, DVC2BE] are the individual data value compare (DVC) byte enable fields for the two DVC events. These fields must be disabled (set to ‘0000’) for the corresponding DAC debug event to be enabled. In other words, when any bit of the DVC byte enable field for a given DVC event is set to ‘1’, the corresponding DAC event is disabled, and the DAC field conditions are used together with the DVC field conditions to determine whether a DVC event occurs.
8.3.4.2 DAC Debug Events Applied to Instructions that Result in Multiple Storage Accesses
Certain misaligned load and store instructions are handled by making multiple, independent storage accesses. Similarly, load and store multiple and string instructions that access more than one register result in more than one storage access. The Load and Store Alignment section provides a detailed description of the circumstances that lead to such multiple storage accesses being made as the result of the execution of a single instruction. Whenever the execution of a given instruction results in multiple storage accesses, the data address of each access is independently considered for whether it causes a DAC debug event.
8.3.4.3 DAC Debug Events Applied to Various Instruction Types
Various special cases apply to the cache management instructions, the store word conditional indexed (stwcx.) instruction, and the load and store string indexed (lswx, stswx) instructions with regard to DAC debug events. These special cases are as follows:
dcbz
The dcbz instruction is considered a store with respect to both storage access control and DAC debug events. The dcbz instruction directly changes the contents of a given storage location.
As a store operation, it might cause DAC write debug events.
dcbst, dcbf, dcbi
The dcbst, dcbf, and dcbi instructions are considered loads with respect to storage access control because they do not change the contents of a given storage location; they might merely cause the data at that storage location to be moved from the data cache out to memory. However, in a debug environment, the fact that these instructions might lead to write operations on the external interface is typically the event of interest. Therefore, these instructions are considered stores with respect to DAC debug events, and might cause DAC write debug events.
dcbt, dcbtst, icbt
The touch instructions are considered loads, except for dcbtst, which is a store with respect to both storage access control and DAC debug events. However, these instructions are treated as no-ops if they refer to caching inhibited storage locations or if they cause data storage or data TLB miss exceptions. Consequently, a touch instruction that is treated as a no-op for one of these reasons does not cause a DAC read debug event, whereas a touch instruction that is not treated as a no-op might cause a DAC read debug event.
dcba
The dcba instruction is treated as a no-op, and thus does not cause a DAC debug event.
icbi
The icbi instruction is considered a load with respect to both storage access control and DAC debug events, and thus might cause a DAC read debug event.
dci, dcread, ici, icread
The dci and ici instructions do not generate an address; rather, the dci instruction affects the entire data cache, and the ici instruction affects the entire instruction cache.
Similarly, the dcread and icread instructions do not generate an address, but rather an index that is used to select a particular location in the respective cache, without regard to the storage address represented by that location. Therefore, none of these instructions cause DAC debug events.
stwcx.
If the execution of a stwcx. instruction would otherwise have caused a DAC write debug event, but the processor does not have the reservation from a lwarx instruction, the DAC write debug event does not occur because the storage location is not written.
lswx, stswx
DAC debug events do not occur for lswx or stswx instructions with a length of 0 (XER[TBC] = ‘0’) because these instructions do not access storage.
8.3.4.4 Data Value Compare (DVC) Debug Event
DVC debug events occur when execution is attempted of a load, store, or dcbz instruction for which the data storage address and other parameters match the DAC conditions specified by DBCR0, DBCR2, and the DAC registers, and for which the data accessed matches the DVC conditions specified by DBCR2 and the DVC registers. In other words, for a DVC debug event to occur, the conditions for a DAC debug event must first be met, and then the data must also match the DVC conditions. In addition to the DAC conditions, two DVC registers, DVC1 and DVC2, can be used to specify two independent 4-byte data values, which are selectively compared against the data being accessed by a given load, store, or cache management instruction. When a DVC event occurs, the corresponding DBSR[DAC1R, DAC1W, DAC2R, DAC2W] bit is set. These same DBSR bits are shared by DAC debug events.
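Because a DVC event first requires the DAC conditions to be met, it may help to see the DAC address comparison modes of Section 8.3.4.1 written out. The following C sketch models only the address comparison and the user/supervisor and effective/real qualifiers. It assumes 32-bit effective addresses, and all function and enum names are hypothetical, not part of any IBM-provided API; the DBCR0 event enables and the read/write classification of instructions are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* DBCR2[DAC12M] encodings from Section 8.3.4.1 (names are hypothetical). */
enum dac_mode { DAC_EXACT = 0, DAC_BITMASK = 1, DAC_RANGE_INCL = 2, DAC_RANGE_EXCL = 3 };

/* Exact mode: each DAC register is compared with the address on its own. */
bool dac_exact_match(uint32_t ea, uint32_t dac)
{
    return ea == dac;
}

/* Paired modes: DAC1 and DAC2 together define one condition. */
bool dac_paired_match(enum dac_mode mode, uint32_t ea, uint32_t dac1, uint32_t dac2)
{
    switch (mode) {
    case DAC_BITMASK:     /* address bits selected by 1s in DAC2 must equal DAC1 */
        return (ea & dac2) == (dac1 & dac2);
    case DAC_RANGE_INCL:  /* DAC1 <= address < DAC2 */
        return dac1 <= ea && ea < dac2;
    case DAC_RANGE_EXCL:  /* address < DAC1 or address >= DAC2 */
        return ea < dac1 || ea >= dac2;
    case DAC_EXACT:
        break;            /* not a paired mode; use dac_exact_match() */
    }
    assert(!"exact mode is not a paired mode");
    return false;
}

/*
 * The US and ER fields share one encoding: '00' = don't care, '10' = MSR bit
 * must be 0, '11' = MSR bit must be 1, '01' = reserved. Pass MSR[PR] for the
 * US field and MSR[DS] for the ER field.
 */
bool dac_qualifier_ok(unsigned field, bool msr_bit)
{
    assert(field <= 3u && field != 1u);
    if (field == 0u)
        return true;
    return msr_bit == (field == 3u);
}
```

With DAC1 = x‘00000003’ and DAC2 = x‘FFFFFFFC’ in address bit mask mode, dac_paired_match() returns true for starting addresses x‘00000000’ through x‘00000003’, which includes the byte, halfword, and word accesses in the address bit mask example.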
DVC Debug Event Fields
In addition to the DAC debug event fields described in Section 8.3.4.1 DAC Debug Event Fields on page 223 and the DVC registers themselves, two fields in DBCR2 are used to specify the DVC conditions, as follows:
• DVC byte enable field
DBCR2[DVC1BE, DVC2BE] are the individual DVC byte enable fields for the two DVC events. When either or both of these fields is disabled (set to ‘0000’), the corresponding DVC debug event is disabled (the corresponding DAC debug event can still be enabled, as determined by the DAC debug event enable field of DBCR0). When either or both of these fields is enabled (set to a nonzero value), the corresponding DVC debug event is enabled.
Each bit of a given DVC byte enable field corresponds to a byte position within an aligned word of memory. For a given aligned word of memory, the byte offsets (or byte lanes) within that word are numbered 0, 1, 2, and 3, starting from the leftmost (most significant) byte of the word. Accordingly, bits 0:3 of a given DVC byte enable field correspond to bytes 0:3 of an aligned word of memory being accessed. For an access to match the DVC conditions for a given byte, the access must be transferring data on that byte position, and the data must match the corresponding byte value within the DVC register.
For each storage access, the DVC comparison is made against the bytes that are being accessed within the aligned word of memory containing the starting byte of the transfer. For example, consider a load word instruction with a starting data address of x‘01’. The four bytes from memory are located at addresses x‘01’ - x‘04’, but the aligned word of memory containing the starting byte consists of addresses x‘00’ - x‘03’.
Thus, the only bytes being accessed within the aligned word of memory containing the starting byte are the bytes at addresses x‘01’ - x‘03’, and only these bytes are considered in the DVC comparison. The byte transferred from address x‘04’ is not considered.
• DVC mode field
DBCR2[DVC1M, DVC2M] are the individual DVC mode fields for the two DVC events. Each of these fields specifies the data value comparison mode for the corresponding DVC debug event. The PowerPC 476FP core supports three comparison modes:
– AND comparison mode (DBCR2[DVC1M, DVC2M] = ‘01’)
In this mode, every data byte lane enabled by the DVC byte enable field must be accessed and must match the corresponding byte data value in the corresponding DVC1 or DVC2 register.
– OR comparison mode (DBCR2[DVC1M, DVC2M] = ‘10’)
In this mode, at least one data byte lane enabled by the DVC byte enable field must be accessed and must match the corresponding byte data value in the corresponding DVC1 or DVC2 register.
– AND-OR comparison mode (DBCR2[DVC1M, DVC2M] = ‘11’)
In this mode, the four byte lanes of an aligned word are divided into two pairs, with byte lanes 0 and 1 in one pair and byte lanes 2 and 3 in the other. The DVC comparison for each pair of byte lanes operates in AND mode, and the results of these two AND mode comparisons are then ORed together to determine whether a DVC debug event occurs. In other words, a DVC debug event occurs if either or both of the pairs of byte lanes satisfy the AND mode comparison requirements. This mode can be used to cause a DVC debug event upon an access of a particular halfword data value in either of the two halfwords of a word in memory.
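The three DVC comparison modes can be expressed as a small C function. This is an illustrative sketch, not an IBM-provided API. It assumes the byte enable field and the set of accessed byte lanes are each passed as a 4-bit value in which bit 0 of the field (lane 0, the most significant byte) has the value x‘8’, and that the data and DVC values are already arranged by byte lane. In AND-OR mode, the sketch treats a pair with no enabled lanes as not matching; the text above does not address that case, so this is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* DBCR2[DVC1M, DVC2M] encodings (names are hypothetical). */
enum dvc_mode { DVC_AND = 1, DVC_OR = 2, DVC_AND_OR = 3 };

/* Mask bit for byte lane n (0..3); lane 0 is the leftmost bit, x'8'. */
static unsigned lane_bit(unsigned n) { return 1u << (3u - n); }

/* Byte in lane n of a 32-bit value; lane 0 is the most significant byte. */
static uint8_t lane_byte(uint32_t w, unsigned n) { return (uint8_t)(w >> (8u * (3u - n))); }

/* Enabled lanes that are both accessed and equal to the DVC byte. */
static unsigned dvc_lane_hits(unsigned be, unsigned accessed, uint32_t data, uint32_t dvc)
{
    unsigned hits = 0u;
    for (unsigned n = 0u; n < 4u; n++) {
        unsigned b = lane_bit(n);
        if ((be & b) && (accessed & b) && lane_byte(data, n) == lane_byte(dvc, n))
            hits |= b;
    }
    return hits;
}

bool dvc_match(enum dvc_mode mode, unsigned be, unsigned accessed, uint32_t data, uint32_t dvc)
{
    assert(be != 0u && be <= 0xFu);   /* '0000' disables the DVC event */
    unsigned hits = dvc_lane_hits(be, accessed, data, dvc);
    switch (mode) {
    case DVC_AND:                      /* every enabled lane must hit */
        return hits == be;
    case DVC_OR:                       /* at least one enabled lane must hit */
        return hits != 0u;
    case DVC_AND_OR: {                 /* AND within lanes 0:1 and 2:3, then OR */
        const unsigned hi = 0xCu, lo = 0x3u;
        bool hi_ok = (be & hi) != 0u && (hits & hi) == (be & hi);
        bool lo_ok = (be & lo) != 0u && (hits & lo) == (be & lo);
        return hi_ok || lo_ok;
    }
    }
    return false;
}
```

For example, with DVC1 = x‘12341234’, all four byte enables set, and AND-OR mode, a word access matches whether the halfword x‘1234’ appears in lanes 0:1 or in lanes 2:3.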
DVC Debug Events Applied to Instructions that Result in Multiple Storage Accesses
Certain misaligned load and store instructions are handled by making multiple, independent storage accesses. Similarly, load and store multiple and string instructions that access more than one register result in more than one storage access. Whenever the execution of a given instruction results in multiple storage accesses, the address and data of each access are independently considered for whether they cause a DVC debug event.
Data Matching
There are three modes of data matching: all bytes, any bytes, or halfword match. The bytes that are compared are determined by the DVC byte enable bits (DBCR2[DVC1BE, DVC2BE]), which also enable the DVC. The modes are set by the DVC mode fields (DBCR2[DVC1M, DVC2M]).
The DVC comparison matches against the data as it is read out of memory. Endianness and byte reversal are not accounted for in this comparison; therefore, the data bytes must be in the same order as they are stored in the memory location. The DVC comparison is performed on byte lanes. This means DVC[0:7] can only match against bytes 0 or 4 of the double word, DVC[8:15] can only match against bytes 1 or 5 of the double word, and so on.
All memory accesses are segmented, or replicated, based on the double word boundary. If an access is not word-aligned and crosses the double word boundary, only the valid data from the first double word is available for comparison. If an access does not cross the double word boundary, the entire word is available for comparison. Because the data is placed in the appropriate byte lanes, an unaligned access might appear to have its data wrapped around in the DVC register. For example, suppose the data at address x‘0’ is x‘01234567_89abcdef’. An access to the word at location x‘0’ yields x‘01234567’, and that is the value that should be stored in the DVC registers for comparison. An access to the word at location x‘2’ yields x‘456789ab’.
However, this address is unaligned, so the value as written above is not in the appropriate byte lanes. Aligning this data to the appropriate byte lanes yields a DVC value of x‘89ab4567’ if a match is expected.
If a word access to the second half of the double word crosses the double word boundary, only the valid data from the first double word is available for comparison. For example, if an access is for a word at address x‘5’, only the bytes at addresses x‘5’, x‘6’, and x‘7’ are available for comparison. The last byte of the word (address x‘8’) is fetched by a second operation. As a result, for an unaligned access, the DVC must be configured to match only on the bytes that are accessed from the first double word.
DVC Debug Events Applied to Various Instruction Types
Various special cases apply to the cache management instructions, the store word conditional indexed (stwcx.) instruction, and the load and store string indexed (lswx, stswx) instructions with regard to DVC debug events. These special cases are as follows:
dcbz
The dcbz instruction is the only cache management instruction that can cause a DVC debug event, because it is the only such instruction that actually writes new data to a storage location (in this case, an entire 128-byte data L2 cache line is written with zeros).
stwcx.
If the execution of a stwcx. instruction would otherwise have caused a DVC write debug event, but the processor does not have the reservation from a lwarx instruction, the DVC write debug event does not occur because the storage location is not written.
lswx, stswx
DVC debug events do not occur for lswx or stswx instructions with a length of 0 (XER[TBC] = ‘0’) because these instructions do not access storage.
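The byte-lane placement in the Data Matching example can be reproduced with a short C sketch. It is illustrative only: the function, structure, and array names are hypothetical, the access is assumed to start inside a single aligned double word, and the rule that the byte at offset a lands in lane a mod 4 is taken from the byte-lane description (DVC[0:7] matches bytes 0 or 4, and so on).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical data from the Data Matching example: x'01234567_89abcdef' at x'0'. */
const uint8_t example_dw[8] = {0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef};

/* How one access appears on the four DVC byte lanes. */
struct dvc_view {
    uint32_t lanes;   /* data arranged by byte lane; lane 0 is the most significant byte */
    unsigned valid;   /* 4-bit lane mask (lane 0 = x'8') of bytes available for comparison */
};

/*
 * dw   : the eight bytes of one aligned double word, in memory order
 * off  : starting byte offset of the access within that double word (0..7)
 * size : access size in bytes (1, 2, or 4)
 * Each byte at offset a lands in lane a % 4. Bytes beyond the double word
 * are fetched by a second operation and are left out of this view.
 */
struct dvc_view dvc_lane_view(const uint8_t dw[8], unsigned off, unsigned size)
{
    assert(off < 8u && (size == 1u || size == 2u || size == 4u));
    struct dvc_view v = {0u, 0u};
    for (unsigned i = 0u; i < size && off + i < 8u; i++) {
        unsigned a = off + i;
        unsigned lane = a % 4u;
        v.lanes |= (uint32_t)dw[a] << (8u * (3u - lane));
        v.valid |= 1u << (3u - lane);
    }
    return v;
}
```

Under these assumptions, the word access at x‘2’ yields a lane value of x‘89ab4567’, and the word access at x‘5’ yields only lanes 1:3 as valid, which is why the byte enables must be limited to the bytes fetched from the first double word.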
8.3.5 Trap
A trap debug event occurs if trap debug events are enabled (DBCR0[TRAP] = ‘1’), a trap instruction (trap word [tw] or trap word immediate [twi]) is executed, and the conditions specified by the instruction for the trap are met. Table 8-2 summarizes the behavior and actions.
Table 8-2. Trap Debug Event Actions
DBCR0[TRAP]  DBCR0[IDM]  DBCR0[EDM]  MSR[DE]  MSR[DWE] and JDCR[DWE]  Action If Event Occurs
0            –           –           –        –                       Program interrupt taken.
1            0           0           –        0                       DBSR[TRAP] is set.
1            –           1           –        –                       DBSR[TRAP] is set. Transition to the STOP state.
1            –           0           –        1                       DBSR[TRAP] is set. Transition to the STOP state.
1            1           0           0        0                       DBSR[TRAP] is set. DBSR[IDE] is set. A program interrupt is taken. SRR0 is set to the address of the trap instruction.
1            1           0           1        0                       DBSR[TRAP] is set. A debug interrupt is taken. CSRR0 is set to the address of the trap instruction.
Note: This debug trap is different from opcode traps based on the IOCCR (Instruction Opcode Compare Control Register).
8.3.6 Branch Taken
A branch taken (BRT) debug event occurs if the processor has internal debug mode, external debug mode, or debug wait mode enabled; BRT debug events are enabled (DBCR0[BRT] = ‘1’); and a branch instruction whose direction is taken (that is, either an unconditional branch or a conditional branch whose branch condition is met) is executed. In internal debug mode, MSR[DE] must be set to ‘1’ for BRT events to be recorded in the DBSR. This is because branch instructions occur frequently: allowing these common events to be recorded as exceptions in the DBSR while debug interrupts are disabled (MSR[DE] = ‘0’) would result in an inordinate number of imprecise debug interrupts. Therefore, BRT debug events are not recognized if MSR[DE] = ‘0’ at the time of the execution of the branch instruction, and DBSR[IDE] cannot be set by a branch taken debug event. Table 8-3 summarizes the debug register setting and the actions.
Table 8-3. BRT Debug Event Actions
DBCR0[BRT]  DBCR0[IDM]  DBCR0[EDM]  MSR[DE]  MSR[DWE] and JDCR[DWE]  Action If Event Occurs
0           –           –           –        –                       No action.
1           0           0           –        0                       DBSR[BRT] is set through a normal commit.
1           –           1           –        –                       DBSR[BRT] is set through a faulty commit. Transition to the STOP state.
1           –           0           –        1                       DBSR[BRT] is set through a faulty commit. Transition to the STOP state.
1           1           0           0        0                       No action.
1           1           0           1        0                       DBSR[BRT] is set through a faulty commit. A debug interrupt is taken. CSRR0 is set to the address of the branch instruction.
8.3.7 Instruction Completed
An instruction completed (ICMP) debug event occurs if DBCR0[ICMP] = ‘1’, execution of any instruction is completed, and MSR[DE] = ‘1’. The IU handles an ICMP debug event as a context synchronizing operation (rsync, which is similar to csync [or CSI]). However, the operation associated with the ICMP debug event is issued to its normal pipe. When enabled as a trace event, the ICMP debug event sets the DBSR on a normal commit using the tag in the central scrutinizer (CS). For ICMP debug events in internal debug mode, external debug mode, or debug wait mode, the commit of the csync operation (or a normal commit) can occur at any time to set the DBSR and cause the interrupt or stop.
If execution of an instruction is suppressed because the instruction is causing another exception that is enabled to generate an interrupt, the attempted execution of that instruction does not cause an ICMP debug event. The system call (sc) instruction does not fall into this category: its execution is not suppressed, because it completes execution and then generates a system call interrupt. In this case, the ICMP debug exception is also set.
ICMP debug events are not recognized if MSR[DE] = ‘0’ at the time the instruction is executed. Also, DBSR[IDE] cannot be set by an ICMP debug event.
This is because if the common event of instruction completion were recorded as an exception in the DBSR while debug interrupts are disabled through MSR[DE], the debug interrupt handler software would receive an inordinate number of imprecise debug interrupts every time debug interrupts were re-enabled using MSR[DE]. When an ICMP debug event occurs, DBSR[ICMP] is set to ‘1’ to record the debug exception, a debug interrupt occurs immediately (provided no higher priority exception is enabled to cause an interrupt), and CSRR0 is set to the address of the instruction after the one causing the ICMP debug exception. Table 8-4 summarizes the debug register setting and the actions.
Table 8-4. ICMP Debug Event Actions
DBCR0[ICMP]  DBCR0[IDM]  DBCR0[EDM]  MSR[DE]  MSR[DWE] and JDCR[DWE]  Action If Event Occurs
0            –           –           –        –                       No action.
1            0           0           –        0                       No action.
1            –           1           –        –                       DBSR[ICMP] is set. Transition to the STOP state.
1            –           0           –        1                       DBSR[ICMP] is set. Transition to the STOP state.
1            1           0           0        0                       No action.
1            1           0           1        0                       DBSR[ICMP] is set. A debug interrupt is taken. CSRR0 is set to the address of the next instruction to be executed after the instruction that caused the ICMP debug event.
8.3.8 Return Debug Events
RET debug events occur if DBCR0[RET] = ‘1’ and an attempt is made to execute any of the following instructions:
• Return from interrupt (rfi)
• Return from critical interrupt (rfci)
• Return from machine-check interrupt (rfmci)
When a RET debug event occurs, DBSR[RET] is set to ‘1’ to record the debug exception. A RET debug event operates similarly to BRT events in that RET debug events occur before the rfi, rfci, or rfmci instruction is executed. That is, CSRR0 points to the rfi, rfci, or rfmci instruction, not to the instruction to which the rfi, rfci, or rfmci instruction is returning.
If an rfci or rfmci instruction is executed, a RET debug event does not occur if MSR[DE] = ‘0’, DBCR0[IDM] = ‘1’, and MSR[DWE] = ‘0’. In other words, RET debug events do not occur imprecisely in internal debug mode if an rfci or rfmci instruction is executed. However, if DBCR0[EDM] = ‘1’ or MSR[DWE] = ‘1’, RET debug events can occur imprecisely (because MSR[DE] = ‘0’) for the rfci or rfmci instructions; setting DBCR0[IDM] = ‘1’ does not affect this. For the rfi instruction, imprecise RET debug events can occur regardless of debug mode. Table 8-5 describes the debug register setting and the actions.
Table 8-5. RET Debug Event Actions
DBCR0[RET]  DBCR0[IDM]  DBCR0[EDM]  MSR[DE]  MSR[DWE] and JDCR[DWE]  Action If Event Occurs
0           –           –           –        –                       None.
1           0           0           –        0                       DBSR[RET] is set through a normal commit.
1           –           1           –        –                       rfi faulty committed. DBSR[RET] is set. Transition to the STOP state.
1           –           0           –        1                       rfi faulty committed. DBSR[RET] is set. Transition to the STOP state.
1           1           0           0        0                       DBSR[RET] is set through a normal commit. DBSR[IDE] is set.
1           1           0           1        0                       rfi faulty committed. DBSR[RET] is set. A debug interrupt is taken. CSRR0 is set to the address of the rfi instruction.
8.3.9 Interrupt Debug Events
IRPT debug events occur when IRPT debug events are enabled (DBCR0[IRPT] = ‘1’) and an interrupt occurs. When operating in external debug mode or debug wait mode, the occurrence of an IRPT debug event is recorded in DBSR[IRPT] and causes the processor to enter the stop state and cease processing instructions. The program counter contains the address of the instruction that would have executed next had the IRPT debug event not occurred. Because the IRPT debug event is caused by the occurrence of an interrupt, by definition this address is that of the first instruction of the interrupt handler for the interrupt type that caused the IRPT debug event.
When operating in internal debug mode with external debug mode and debug wait mode both disabled (and regardless of the value of MSR[DE]), an IRPT debug event can only occur because of a noncritical class interrupt. Critical class interrupts (machine check, critical input, watchdog timer, and debug interrupts) cannot cause IRPT debug events in internal debug mode unless external debug mode or debug wait mode is also enabled. Otherwise, the debug interrupt that would occur as the result of the IRPT debug event would necessarily always be imprecise, because the critical class interrupt causing the IRPT debug event would itself set MSR[DE] to ‘0’.
For a noncritical class interrupt that causes an IRPT debug event while internal debug mode is enabled and external debug mode and debug wait mode are both disabled, the occurrence of the IRPT debug event is recorded in DBSR[IRPT]. If MSR[DE] is ‘1’ at the time of the IRPT debug event, a debug interrupt occurs with CSRR0 set to the address of the instruction that would have executed next had the IRPT debug event not occurred. Because the IRPT debug event is caused by the occurrence of some other interrupt, by definition this address is that of the first instruction of the interrupt handler for the interrupt type that caused the IRPT debug event.
If MSR[DE] is ‘0’ at the time of the IRPT debug event, the imprecise debug event (IDE) field of the DBSR is also set, and a debug interrupt does not occur immediately. Instead, instruction execution continues, and a debug interrupt occurs when MSR[DE] is set to ‘1’ (enabling debug interrupts), provided that software has not cleared the IRPT debug event status from the DBSR in the meantime. Upon such a delayed interrupt, the debug interrupt handler software can query DBSR[IDE] to determine that the debug interrupt occurred imprecisely.
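The internal-debug-mode behavior just described reduces to a few conditions. This C sketch is illustrative only: it assumes DBCR0[IRPT] = ‘1’, DBCR0[IDM] = ‘1’, and external debug mode and debug wait mode both disabled, and all names are hypothetical. It models which status bits are set and whether the debug interrupt is immediate; it does not model CSRR0 or the trace interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of an interrupt in internal debug mode (hypothetical names). */
struct irpt_outcome {
    bool dbsr_irpt;       /* DBSR[IRPT] set */
    bool dbsr_ide;        /* DBSR[IDE] set: the debug interrupt is deferred */
    bool debug_irq_now;   /* debug interrupt taken immediately */
};

struct irpt_outcome irpt_internal_debug(bool critical_class, bool msr_de)
{
    struct irpt_outcome o = {false, false, false};
    if (critical_class)          /* machine check, critical input, watchdog, debug */
        return o;                /* no IRPT debug event in internal debug mode */
    o.dbsr_irpt = true;
    if (msr_de)
        o.debug_irq_now = true;  /* CSRR0 = first instruction of the other handler */
    else
        o.dbsr_ide = true;       /* taken later when MSR[DE] = 1, unless DBSR is cleared */
    return o;
}
```

The MSR[DE] = ‘0’ case corresponds to the DBCR0[IDM] = ‘1’, MSR[DE] = ‘0’ row of Table 8-6, where both DBSR[IRPT] and DBSR[IDE] are set.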
When operating in trace mode, the occurrence of an IRPT debug event is recorded in DBSR[IRPT] and indicated over the trace interface, and instruction execution continues. Table 8-6 summarizes the debug register settings and the resulting actions.
Table 8-6. IRPT Debug Event Actions
DBCR0[IRPT]  DBCR0[IDM]  DBCR0[EDM]  MSR[DE]  MSR[DWE] and JDCR[DWE]  Action If Event Occurs
0            –           –           –        –                       None.
1            0           0           –        0                       DBSR[IRPT] is set.
1            –           1           –        –                       DBSR[IRPT] is set. Transition to the STOP state.
1            –           0           –        1                       DBSR[IRPT] is set. Transition to the STOP state.
1            1           0           0        0                       DBSR[IRPT] is set. DBSR[IDE] is set.
1            1           0           1        0                       DBSR[IRPT] is set. A debug interrupt is taken. CSRR0 is set to the address of the first instruction in the base class interrupt handler.
8.3.10 Unconditional Debug Events
A UDE occurs immediately upon being set using the JTAG debug port. When a UDE occurs, DBSR[UDE] is set to ‘1’ to record the debug exception. If MSR[DE] = ‘0’, DBSR[IDE] is also set to ‘1’ to record the imprecise debug event.
If MSR[DE] = ‘1’ at the time of the UDE, a debug interrupt occurs immediately (provided no higher-priority exception is enabled to cause an interrupt). CSRR0 is set to the address of the instruction that would have executed next had the interrupt not occurred.
If MSR[DE] = ‘0’ at the time of the UDE, a debug interrupt does not occur. Later, if DBSR[UDE] has not been cleared and MSR[DE] is set to ‘1’, a delayed debug interrupt occurs. In this case, CSRR0 contains the address of the instruction after the one that enabled the debug interrupt by setting MSR[DE] to ‘1’. Software in the debug interrupt handler can examine DBSR[IDE] to determine how to interpret the value in CSRR0. Table 8-7 summarizes the debug register settings and the resulting actions.
Table 8-7. UDE Debug Event Actions
DBCR0[IDM]  DBCR0[EDM]  MSR[DE]  MSR[DWE] and JDCR[DWE]  Action If Event Occurs
0           0           –        0                       DBSR[UDE] is set.
–           1           –        –                       DBSR[UDE] is set. Transition to the STOP state.
–           0           –        1                       DBSR[UDE] is set. Transition to the STOP state.
1           0           0        0                       DBSR[UDE] is set.
1           0           1        0                       DBSR[UDE] is set. A debug interrupt is taken. CSRR0 is set to the address of the CS tail at the time of the interrupt flush.
8.4 Debug Timer Freeze
To maintain the semblance of real-time operation while a system is being debugged, DBCR0[FT] can be set to ‘1’. This causes all of the timers within the PowerPC 476FP core to stop incrementing or decrementing for as long as a debug event bit is set in the DBSR, or until DBCR0[FT] is set to ‘0’. See Section 6 Timer Facilities on page 157 for more information about the operation of the PowerPC 476FP core timers.
8.5 Debug Special Purpose Registers
This section lists all debug-related registers and describes their bits.
8.5.1 Debug Control Register 0 (DBCR0)
Bits    Field     Description
32      EDM       External debug mode.
                  0 External debug mode is disabled.
                  1 External debug mode is enabled.
33      IDM       Internal debug mode.
                  0 Internal debug mode is disabled.
                  1 Internal debug mode is enabled.
34:35   RST       Reset. Setting this field starts a software-initiated reset.
                  00 No action.
                  01 Core reset.
                  10 Chip reset.
                  11 System reset.
                  Note: Writing ‘01’, ‘10’, or ‘11’ to these bits resets the processor.
36      ICMP      Instruction completed debug event; software single-step.
                  0 Instruction completed debug events are disabled.
                  1 Instruction completed debug events are enabled.
37      BRT       Branch taken debug event.
                  0 Branch taken debug events are disabled.
                  1 Branch taken debug events are enabled.
38      IRPT      Interrupt debug event.
                  0 Interrupt debug events are disabled.
                  1 Interrupt debug events are enabled.
39      TRAP      Trap debug event.
                  0 Trap debug events are disabled.
                  1 Trap debug events are enabled.
40      IAC1      Instruction address comparison 1 debug event.
                  0 IAC 1 debug events are disabled.
                  1 IAC 1 debug events are enabled.
41      IAC2      Instruction address comparison 2 debug event.
                  0 IAC 2 debug events are disabled.
                  1 IAC 2 debug events are enabled.
42      IAC3      Instruction address comparison 3 debug event.
                  0 IAC 3 debug events are disabled.
                  1 IAC 3 debug events are enabled.
43      IAC4      Instruction address comparison 4 debug event.
                  0 IAC 4 debug events are disabled.
                  1 IAC 4 debug events are enabled.
44      DAC1R     Data address comparison 1 read debug event.
                  0 DAC 1 read debug events are disabled.
                  1 DAC 1 read debug events are enabled.
45      DAC1W     Data address comparison 1 write debug event.
                  0 DAC 1 write debug events are disabled.
                  1 DAC 1 write debug events are enabled.
46      DAC2R     Data address comparison 2 read debug event.
                  0 DAC 2 read debug events are disabled.
                  1 DAC 2 read debug events are enabled.
47      DAC2W     Data address comparison 2 write debug event.
                  0 DAC 2 write debug events are disabled.
                  1 DAC 2 write debug events are enabled.
48      RET       Return debug event.
                  0 Return debug events are disabled.
                  1 Return debug events are enabled.
49:62   Reserved
63      FT        Freeze timers.
                  0 Freeze timers are disabled.
                  1 Freeze timers are enabled.
8.5.2 Debug Control Register 1 (DBCR1)
Bits    Field     Description
32:33   IAC1US    Instruction address comparison 1 user/supervisor.
                  00 Both.
                  01 Reserved.
                  10 Supervisor-only (MSR[PR] = ‘0’).
                  11 User-only (MSR[PR] = ‘1’).
                  The IAC1US field (and not the IAC2US field) is used when IAC12M is in range mode.
34:35   IAC1ER    Instruction address comparison 1 effective/real.
                  00 Effective (MSR[IS] = don’t care).
                  01 Reserved.
                  10 Effective (MSR[IS] = ‘0’).
                  11 Effective (MSR[IS] = ‘1’).
                  The IAC1ER field (and not the IAC2ER field) is used when IAC12M is in range mode.
36:37   IAC2US    Instruction address comparison 2 user/supervisor. See IAC1US for field values.
38:39   IAC2ER    Instruction address comparison 2 effective/real. See IAC1ER for field values.
40:41   IAC12M    Instruction address comparison 1/2 mode.
                  00 Exact match. Match if address[0:29] is the same as IAC1[0:29] or IAC2[0:29]. These are two independent comparisons.
                  01 Reserved.
                  10 Range inclusive. Match if IAC1 ≤ address < IAC2.
                  11 Range exclusive. Match if (address < IAC1) or (IAC2 ≤ address).
42:46   Reserved
47      IAC12AT   Instruction address comparison 1/2 auto-toggle.
                  0 Automatic toggling for IAC1 and IAC2 events is disabled.
                  1 Automatic toggling for IAC1 and IAC2 events is enabled.
48:49   IAC3US    Instruction address comparison 3 user/supervisor. See IAC1US for field values.
                  The IAC3US field (and not the IAC4US field) is used when IAC34M is in range mode.
50:51   IAC3ER    Instruction address comparison 3 effective/real. See IAC1ER for field values.
                  The IAC3ER field (and not the IAC4ER field) is used when IAC34M is in range mode.
52:53   IAC4US    Instruction address comparison 4 user/supervisor. See IAC1US for field values.
54:55   IAC4ER    Instruction address comparison 4 effective/real. See IAC1ER for field values.
56:57   IAC34M    Instruction address comparison 3/4 mode. See IAC12M for field values.
58:62   Reserved
63      IAC34AT   Instruction address comparison 3/4 auto-toggle.
                  0 Automatic toggling for IAC3 and IAC4 events is disabled.
                  1 Automatic toggling for IAC3 and IAC4 events is enabled.
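The IAC12M encodings can be illustrated with a small C model. This is a sketch under stated assumptions: a register value is held in a 32-bit integer in which IBM bit 32 is the most significant bit and bit 63 the least significant, so field bits first:last are extracted by shifting right by 63 - last. The helper names are hypothetical. The model ignores the US/ER qualifiers, the DBCR0 enables, and auto-toggle, and in exact mode it reports a match against either register even though the hardware treats IAC1 and IAC2 as two independent comparisons.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract field bits first..last (IBM numbering 32..63, bit 63 = LSB)
   from a 32-bit register value. */
static uint32_t spr_field(uint32_t reg, unsigned first, unsigned last)
{
    unsigned width = last - first + 1;
    uint32_t mask = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (reg >> (63 - last)) & mask;
}

/* DBCR1[IAC12M] (bits 40:41) encodings. */
enum { IAC_EXACT = 0, IAC_RESERVED = 1, IAC_RANGE_INCL = 2, IAC_RANGE_EXCL = 3 };

/* Hypothetical model of IAC1/IAC2 address matching for effective address ea. */
bool iac12_match(uint32_t dbcr1, uint32_t iac1, uint32_t iac2, uint32_t ea)
{
    const uint32_t word = 0xFFFFFFFCu;  /* IAC ADDR is bits 32:61 (word address) */
    ea &= word;
    iac1 &= word;
    iac2 &= word;
    switch (spr_field(dbcr1, 40, 41)) {
    case IAC_EXACT:      return ea == iac1 || ea == iac2;   /* either register */
    case IAC_RANGE_INCL: return iac1 <= ea && ea < iac2;
    case IAC_RANGE_EXCL: return ea < iac1 || iac2 <= ea;
    default:             return false;                       /* '01' is reserved */
    }
}
```

For example, range-inclusive mode corresponds to DBCR1 = 0x00800000 ('10' in bits 40:41); with IAC1 = 0x1000 and IAC2 = 0x2000, address 0x1000 matches and 0x2000 does not.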
8.5.3 Debug Control Register 2 (DBCR2)
This register controls the operating modes of the data address compare and data value compare registers.
Bits    Field     Description
32:33   DAC1US    Data address compare 1 user or supervisor.
                  00 Both user and supervisor.
                  01 Reserved.
                  10 Supervisor only (MSR[PR] = ‘0’).
                  11 User only (MSR[PR] = ‘1’).
34:35   DAC1ER    Data address compare 1 effective or real.
                  00 Effective (MSR[DS] = don’t care).
                  01 Reserved.
                  10 Effective (MSR[DS] = ‘0’).
                  11 Effective (MSR[DS] = ‘1’).
36:37   DAC2US    Data address compare 2 user or supervisor.
                  00 Both user and supervisor.
                  01 Reserved.
                  10 Supervisor only (MSR[PR] = ‘0’).
                  11 User only (MSR[PR] = ‘1’).
38:39   DAC2ER    Data address compare 2 effective or real.
                  00 Effective (MSR[DS] = don’t care).
                  01 Reserved.
                  10 Effective (MSR[DS] = ‘0’).
                  11 Effective (MSR[DS] = ‘1’).
40:41   DAC12M    Data address compare 1 and 2 mode.
                  00 Exact match. A match occurs if address[0:31] equals either DAC1[0:31] or DAC2[0:31]. Two independent comparisons are performed.
                  01 Address bit mask. A match occurs if the data address bits selected by DAC2 equal the bit values in DAC1.
                  10 Range inclusive. A match occurs if DAC1 ≤ address < DAC2.
                  11 Range exclusive. A match occurs if either address < DAC1 or DAC2 ≤ address.
42:43   Reserved
44:45   DVC1M     Data value compare 1 mode.
                  00 Reserved.
                  01 AND all bytes enabled by DVC1BE.
                  10 OR all bytes enabled by DVC1BE.
                  11 AND-OR pairs of bytes enabled by DVC1BE: (0 AND 1) OR (2 AND 3).
46:47   DVC2M     Data value compare 2 mode.
                  00 Reserved.
                  01 AND all bytes enabled by DVC2BE.
                  10 OR all bytes enabled by DVC2BE.
                  11 AND-OR pairs of bytes enabled by DVC2BE: (0 AND 1) OR (2 AND 3).
48:51   Reserved
52:55   DVC1BE    DVC 1 byte enables 0:3 (see DVC1M on page 238).
56:59   Reserved
60:63   DVC2BE    DVC 2 byte enables 0:3 (see DVC2M on page 238).
8.5.4 Debug Status Register (DBSR)
Bits    Field     Description
32      IDE       Imprecise debug event.
33      UDE       Unconditional debug event.
34:35   MRR       Most recent reset type. These two bits are set to one of three values when a reset occurs and are undefined at power-up.
                  00 No reset has occurred since these bits were last cleared.
                  01 Core reset.
                  10 Chip reset.
                  11 System reset.
36      ICMP      Instruction completed debug event.
37      BRT       Branch taken debug event.
38      IRPT      Interrupt debug event.
39      TRAP      Trap debug event.
40      IAC1      Instruction address comparison 1 debug event.
41      IAC2      Instruction address comparison 2 debug event.
42      IAC3      Instruction address comparison 3 debug event.
43      IAC4      Instruction address comparison 4 debug event.
44      DAC1R     Data address comparison 1 read debug event.
45      DAC1W     Data address comparison 1 write debug event.
46      DAC2R     Data address comparison 2 read debug event.
47      DAC2W     Data address comparison 2 write debug event.
48      RET       Return debug event.
49:61   Reserved
62      IAC12ATS  Instruction address comparison 1/2 auto-toggle status.
63      IAC34ATS  Instruction address comparison 3/4 auto-toggle status.
8.5.5 Setting the DBSR Based on MSR[DE] and DBCR0[IDM]
Table 8-8 summarizes how the DBSR bits are set depending on the settings of MSR[DE] and DBCR0[IDM].
Table 8-8. Setting the DBSR Based on MSR[DE] and DBCR0[IDM]
DE  IDM  UDE       IRPT                        ICMP  BRT  IAC       DAC       TRAP      RET
0   0    Set       Set                         Set   Set  Set       Set       Set       Set
0   1    Set, IDE  Noncritical only. Set, IDE  –     –    Set, IDE  Set, IDE  Set, IDE  rfi only. Set, IDE
1   0    Set       Set                         Set   Set  Set       Set       Set       Set
1   1    Set       Noncritical only. Set       Set   Set  Set       Set       Set       Set
8.5.6 Instruction Address Comparison 1 - 4 (IAC1 - IAC4)
Bits    Field     Description
32:61   ADDR      The address for matching.
62:63   Reserved
8.5.7 Setup Order for IACs, DACs, and DVCs
The following setup order is required for an IAC, DAC, or DVC to ensure that all signals are initialized in simulation. In this sequence, rA through rF are placeholder GPRs that hold the values to be written, and mtiacX, mtdacX, and mtdvcX stand for the move-to instruction of each IAC, DAC, or DVC register being used.
    li      r3, 0
    mtdbcr0 r3          # clear DBCR0
    li      r3, -1      # 16-bit immediate 0xFFFF sign-extends to 0xFFFF_FFFF
    mtdbsr  r3          # write ones to clear the DBSR status bits
    mtiacX  rA          # load each IAC register that is used
    mtdacX  rB          # load each DAC register that is used
    mtdvcX  rC          # load each DVC register that is used
    mtdbcr1 rD          # IAC qualifiers and modes
    mtdbcr2 rE          # DAC and DVC qualifiers and modes
    isync
    mtdbcr0 rF          # enable debug mode and the IAC, DAC, DVC events
    isync
8.6 JTAG and Debug Capabilities in a Multiprocessor SoC Environment
The PowerPC 476FP core provides RISCWatch and JTAG interfaces that can be used to:
• Reset the core
• Stop, halt, and start the core
• Debug
• Trace the status of the processor and other devices
The PowerPC 476FP core also provides debug and status-observation capabilities to debug and monitor processor cores and other cores in a multiprocessor (MP) SoC environment. The following capabilities, which are defined by the Power.org Common Debug Interface Technical Committee, are implemented:
• Observe individual DBSR status bits under JTAG control (using the Debug Bus Out Mask Register)
• Stop and run processors under JTAG control (using the Debug Bus Input Mask Register)
8.6.1 Debug Bus Out Mask Register (DBOMask)
Figure 8-1 illustrates how the DBOMask Register traces the status of processors (bits are individually observed).
Figure 8-1.
JTAG-Controlled MP DBSR Monitor Capability (diagram not reproduced; labels: Debug Status Register (32 bits), Mask (32 bits), DBO (JTAG pinout), JTAG input)
The DBOMask Register allows observation of an individual bit of a trace trigger event type.
Bits    Field     Description
0:13    DBOMask   This 14-bit field corresponds to C476_TRCTRGGEREVENTTYPE[0:13]; setting a bit enables observation of the corresponding bit.
14:31   Reserved
8.6.2 Debug Input Mask Register (DBIMask)
Figure 8-2 illustrates the JTAG-controlled stop and run capability.
Figure 8-2. JTAG-Controlled MP Stop and Run Control Capability (diagram not reproduced; labels: Debug Interface, DBIMask Register, JTAG Inputs, JTAG Halt, Other Halts, OR, Processor Halt)
This register is used to stop and start the processor through JTAG control. When a DBIMask Register bit is set and the corresponding debug event bit is set, the processor is stopped. When either the debug event bit or the DBIMask bit is cleared, the processor starts running again.
Bits    Field     Description
0:4     SSMASK    When a bit of this field is set to ‘1’, the corresponding bit of the DBGC476SYSTEMSTATUS0-4 input bus can stop the processor.
5       UCEMASK   When this bit is set to ‘1’, an asserted XXC476UNCONDEVENT input signal can stop the processor.
6       TEIMASK   When this bit is set to ‘1’, an asserted XXC476TRIGGEREVENTIN input signal can stop the processor.
7:31    Reserved
9. Initialization
This section describes the initial state of the PowerPC 476FP processor core after a hardware reset and the initialization software that is required to complete the initialization so that the core can begin executing application code. Initialization of other on-chip or off-chip system components might also be required.
9.1 Processor Core State after Reset
In general, the contents of registers and other facilities within the PowerPC 476FP processor core are undefined after a hardware reset. Reset initializes only the minimal resources required so that instructions can be fetched and run from the initial program-memory page, and so that repeatable, deterministic behavior can be guaranteed if the correct software initialization sequence is followed. System software must fully configure the remaining PowerPC 476FP processor core resources and the other facilities within the chip or system.
The following list summarizes the processor state immediately after reset.
• All fields of the Machine State Register (MSR) are set to ‘0’, disabling all asynchronous interrupts, placing the processor in supervisor mode, and specifying that instruction and data accesses are to the system (as opposed to the application) address space.
• DBCR0[RST] is set to ‘0’, thereby ending any previous software-initiated reset operation.
• The Debug Status Register (DBSR) most-recent reset (MRR) field records the type of the just-ended reset operation (core, chip, or system; see Reset Types on page 249).
• The Timer Control Register (TCR) watchdog-timer reset control (WRC) field is set to ‘00’, thereby disabling the watchdog timer reset operation.
• The Timer Status Register (TSR) watchdog timer reset status (WRS) field records the type of the just-ended reset operation if the reset was initiated by the watchdog timer. Otherwise, this field is unchanged from its prereset value.
• The Processor Version Register (PVR) contains, after reset and otherwise, a value that indicates the specific processor version number.
• The program counter (PC) is set to x‘FFFF FFFC’, the effective address (EA) of the last word of the address space.
The memory management resources are set to values that allow the processor to fetch and execute instructions and to read (but not write) data within the 4 KB program-memory page located at the end of the 32-bit effective address space. Exactly how this is accomplished is implementation-dependent. For example, a translation lookaside buffer (TLB) entry might be established in a manner that is visible to software that uses the TLB management instructions. Regardless of how the implementation enables access to the initial program-memory page, instruction execution starts at effective address x‘FFFF FFFC’. The instruction at this address must be an unconditional branch backwards to the start of the initialization sequence, which must lie within the initial 4 KB program-memory page. The real address to which the initial effective address is translated is also implementation- or system-dependent, as are the storage attributes of the initial program-memory page, such as the caching-inhibited and endian attributes.
Note: In the PowerPC 476FP processor core, a single entry is established in the instruction shadow TLB (ITLB) and data shadow TLB (DTLB) at reset with the properties described in Table 9-1 Reset Values of Registers and Other PowerPC 476FP Facilities on page 244.
Initialization software must insert an entry into the unified translation lookaside buffer (UTLB) to cover this same memory region before performing any context-synchronizing operation (including causing any exception that might lead to an interrupt), because a context-synchronizing operation invalidates the shadow TLB entries.
Initialization software should consider all other resources within the PowerPC 476FP processor core to be undefined after reset, so that the initialization sequence is compatible with other PowerPC implementations. However, additional resources are initialized by reset to guarantee correct and deterministic operation of the processor during the initialization sequence. Table 9-1 shows the reset state of all PowerPC 476FP processor core resources that are defined to be initialized by reset. Although certain other register fields and facilities within the PowerPC 476FP processor core are affected by reset, this is neither an architectural nor a hardware requirement, and software must treat those resources as undefined. Likewise, resources that are included in Table 9-1 but are not identified in the previous list as architecturally required should be treated as undefined by the initialization software.
During chip initialization, some chip control registers must be initialized to ensure correct chip operation. Peripheral devices can also be initialized as appropriate for the system design.
Table 9-1. Reset Values of Registers and Other PowerPC 476FP Facilities
Resource  Field         Reset Value       Comment
MSR       WE            0                 The wait state is disabled.
          CE            0                 Asynchronous critical interrupts are disabled.
          EE            0                 Asynchronous noncritical interrupts are disabled.
          PR            0                 The processor is in supervisor mode.
          FP            0                 The processor cannot execute floating-point instructions.
          ME            0                 Machine-check interrupts are disabled.
          FE0           0                 Floating-point enabled interrupts are disabled.
          DWE           0                 Debug wait mode is disabled.
          DE            0                 Debug interrupts are disabled.
          FE1           0                 Floating-point enabled interrupts are disabled.
          IS            0                 Instruction fetch access is to system-level virtual address space.
          DS            0                 Data access is to system-level virtual address space.
          PMM           0                 Gathering statistics for marked processes is disabled.
Note: TLB entry refers to an entry in the shadow instruction and data TLB arrays that is automatically configured by the PowerPC 476FP processor core to enable fetching and reading (but not writing) from the initial program memory page. The PowerPC 476FP processor core also automatically initializes one entry in the UTLB with the same data at reset to avoid accidental erasure of the initial TLB entry by MP initialization coding.
CCR0/1/2  PRE           0                 Semirecoverable parity mode enabled for the data cache.
          CRPE          0                 Disable parity information reads.
          ICS           0                 The icbi request size is set to 32 bytes.
          DAPUIB        0                 Enable broadcast of instruction data to auxiliary processor interface.
          ICWRIDX[0:3]  0000              The index value to write to the instruction cache.
          DTB           0                 Enable broadcast of trace information.
          FLSTA         0                 No alignment exception occurs on integer storage access instructions, regardless of alignment.
          DQWPM[0:1]    00                Data cache quadword prediction is disabled.
          IQWPM[0:1]    00                Instruction cache quadword prediction mode is set to use EA[19].
          GPRPEI        00                Does not record GPR parity errors.
          FPRPEI        00                Does not record FPR parity errors.
          ICDPEI        00                Records odd data parity errors in the instruction cache.
          ICLPEI        00                Records odd LRU parity errors in the instruction cache.
          ICTPEI        00                Records odd tag parity errors in the instruction cache.
          DCTPEI        00                Records odd tag parity errors in the data cache.
          DCDPEI        00                Records odd data parity errors in the data cache.
          DCLPEI        00                Records odd LRU parity errors in the data cache.
          MMUTPEI       0                 Records odd tag parity errors in the memory management unit (MMU).
          MMUDPEI       0                 Records odd data parity errors in the MMU.
          TSS           0                 Selects the timer clock source.
          DPC           0                 Disables or enables parity checking in the L1 cache core.
          TCS           00                Determines what clock frequency runs the timers.
          DSTG          00                Stores to all bytes within an L1 halfline can be gathered into a single transfer.
          DLFPD         0                 Line fill match prediction is enabled.
          DSTI          0                 When context synchronization occurs, invalidate shadow TLBs (ITLB, DTLB).
          PMUD          0                 Enables or disables performance monitor unit (PMU) counting.
          DCSTGW        0                 Disables or enables cacheable store gathering with write-through.
          STGCTR        00                Determines how long a store request remains in the store buffer queue (SBQ) before a write request is sent.
          DISTG         0                 Disables or enables cache inhibited store gathering.
          SPC5C1        0                 Enables or disables AT field static branch predict.
          MCDTO         0                 Enables or disables DCR timeout machine check.
DBCR0     EDM           0                 External debug mode is disabled.
          IDM           0                 Internal debug mode is disabled.
          RST           00                Software-initiated debug reset is disabled.
          ICMP          0                 Instruction-completion debug events are disabled.
          BRT           0                 Branch-taken debug events are disabled.
          IRPT          0                 Interrupt debug events are disabled.
          TRAP          0                 Trap debug events are disabled.
          IAC1          0                 Instruction address compare 1 (IAC1) debug events are disabled.
          IAC2          0                 IAC2 debug events are disabled.
          IAC3          0                 IAC3 debug events are disabled.
          IAC4          0                 IAC4 debug events are disabled.
          DAC1R         0                 DAC 1 read debug events are disabled.
          DAC1W         0                 DAC 1 write debug events are disabled.
          DAC2R         0                 DAC 2 read debug events are disabled.
          DAC2W         0                 DAC 2 write debug events are disabled.
          RET           0                 Return debug events are disabled.
          FT            0                 Freeze timers are disabled.
DBSR      IDE           0                 An imprecise debug event has not occurred.
          UDE           0                 An unconditional debug event has not occurred.
          MRR           Reset-dependent   The MRR indicates the most recent type of reset:
                                          00 No reset since this field was last cleared by software.
                                          01 Core reset.
                                          10 Chip reset.
                                          11 System reset.
          ICMP          0                 The instruction completion debug event has not occurred.
          BRT           0                 The branch taken debug event has not occurred.
          IRPT          0                 The interrupt debug event has not occurred.
          TRAP          0                 The trap debug event has not occurred.
          IAC1          0                 The IAC1 debug event has not occurred.
          IAC2          0                 The IAC2 debug event has not occurred.
          IAC3          0                 The IAC3 debug event has not occurred.
          IAC4          0                 The IAC4 debug event has not occurred.
          DAC1R         0                 The data address compare 1 (DAC1) read debug event has not occurred.
          DAC1W         0                 The DAC1 write debug event has not occurred.
          DAC2R         0                 The DAC2 read debug event has not occurred.
          DAC2W         0                 The DAC2 write debug event has not occurred.
          RET           0                 The return debug event has not occurred.
          IAC12ATS      0                 Instruction address comparison 1/2 auto toggle status is disabled.
          IAC34ATS      0                 Instruction address comparison 3/4 auto toggle status is disabled.
ESR       ISMC          0                 The synchronous instruction machine-check exception has not occurred.
MCSR      MCS           0                 The asynchronous instruction machine-check exception has not occurred.
MMUBE0/1  VE0           0                 There is no bolted entry at the index address in IBE0.
          VE1           0                 There is no bolted entry at the index address in IBE1.
          VE2           0                 There is no bolted entry at the index address in IBE2.
          VE3           0                 There is no bolted entry at the index address in IBE3.
          VE4           0                 There is no bolted entry at the index address in IBE4.
          VE5           0                 There is no bolted entry at the index address in IBE5.
SSPCR     ORD1–ORD7     0000 (each)       Only 4KB pages are searched.
USPCR     ORD1–ORD7     0000 (each)       Only 4KB pages are searched.
ISPCR     ORD1–ORD7     0000 (each)       Only 4KB pages are searched.
PC        –             x‘FFFF FFFC’      After reset, the first instruction is fetched from the last word of the effective-address space.
PVR       OWN           System-dependent  PVR[OWN] value (after reset and otherwise) is specified by core input signals.
          PVN           System-dependent  PVR[PVN] value (after reset and otherwise) is specified by core input signals.
RSTCFG    U0            System-dependent  All Reset Configuration Register (RSTCFG) fields are specified by core input signals.
          U1            System-dependent
          U2            System-dependent
          U3            System-dependent
          E             System-dependent
          ERPN          System-dependent
TLB entry EPN[0:19]     x‘FFFFF’          Matches the effective address of the initial reset instruction. EPN[20:21] are undefined; they are not compared to the EA because the page size is 4 KB.
          V             1                 The translation table entry for the initial program memory page is valid.
          TS            0                 The translation space (TS) is reset to ‘0’. The initial program-memory page is in system-level virtual address space.
          SIZE          x‘1’              The initial program-memory page size is 4 KB.
          TID           x‘0000’           The translation identifier (TID) is set to zero. The initial program-memory page is globally shared; no match against the Process Identifier Register (PID) is required.
          RPN[0:21]     x‘FFFFF’ || ‘00’  The initial program-memory page is mapped effective = real.
          ERPN          System-dependent  The extended real-page number of the initial program memory page is specified by core input signals.
          U[0:3]        System-dependent  The reset values of the user-definable storage attributes are specified by core input signals.
          W             0                 The write-through storage attribute is disabled.
          I             1                 The caching-inhibited storage attribute is enabled.
          M             0                 The memory-coherent storage attribute is disabled.
          G             1                 The guarded-storage attribute is enabled.
          E             System-dependent  The reset value of the endian-storage attribute is specified by a core input signal.
          SX            1                 The supervisor execute (SX) access is enabled.
          SW            0                 The supervisor write (SW) access is disabled.
          SR            1                 The supervisor read (SR) access is enabled.
TCR       WRC           00                The watchdog-timer reset control is disabled.
TSR       WRS           TCR[WRC]          TCR[WRC] is copied into WRS if the reset is caused by a watchdog-timer timeout.
                        Unchanged         WRS is not changed if the reset is caused by other than a watchdog-timer timeout.
                        Undefined         WRS is undefined after a power-on event.
9.2 Reset Types
The PowerPC 476FP processor core supports three reset types: core, chip, and system. The type of reset is indicated by a set of core input signals. For each type of reset, the core resources are initialized as indicated in Table 9-1 on page 244. Core reset is intended to reset the PowerPC 476FP processor core without necessarily resetting the rest of the on-chip logic. Chip reset is intended to reset the entire chip, but off-chip hardware in the system is not informed of the reset operation. System reset is intended to reset the entire chip and also to signal the rest of the off-chip system that the chip is being reset. Whether the system reset operation is used to reset the rest of the system board is determined by the board designer.
9.3 Reset Sources
A reset operation can be initiated on the PowerPC 476FP processor core by any of four separate mechanisms:
• A set of three input signals to the core, one for each of the three reset types. These signals can be asserted asynchronously by hardware outside the core to initiate a reset operation.
• The TCR[WRC] field, which software can set up to initiate a reset operation upon certain watchdog timer expiration events.
• The DBCR0[RST] field, which software can write to immediately initiate a reset operation.
• The Joint Test Action Group (JTAG) interface, which a JTAG-attached debug tool can use to initiate a reset operation asynchronously to program execution on the PowerPC 476FP processor core.
9.4 Initialization Software Requirements
After a reset operation occurs, the PowerPC 476FP processor core is initialized to a minimum configuration that enables the fetching and execution of the software initialization code. The initialization also guarantees deterministic behavior of the core during the execution of this code. Initialization software is necessary to complete the configuration of the processor core and the rest of the on-chip and off-chip system.
The system must provide nonvolatile memory (or memory initialized by some mechanism other than the PowerPC 476FP processor core) at the real address corresponding to effective address x‘FFFF FFFC’ and at the rest of the initial program-memory page. The instruction at the initial address must be an unconditional branch backwards to the beginning of the initialization software sequence.
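The branch at x‘FFFF FFFC’ is an ordinary relative unconditional branch (b). As a hedged illustration, the following C sketch computes its encoding, assuming the standard Power ISA I-form layout (primary opcode 18 in bits 0:5, a 24-bit LI word-displacement field, AA = 0, LK = 0). The function and macro names are hypothetical and are not part of any IBM tool.

```c
#include <stdbool.h>
#include <stdint.h>

#define RESET_VECTOR_EA 0xFFFFFFFCu  /* first instruction fetched after reset */
#define INIT_PAGE_BASE  0xFFFFF000u  /* start of the 4 KB initial program-memory page */

/* Encode "b target" (I-form: opcode 18, AA = 0, LK = 0) located at address from.
   Returns false unless target is word aligned, inside the initial page,
   and below from (a backward branch). */
bool encode_reset_branch(uint32_t from, uint32_t target, uint32_t *insn)
{
    if ((target & 3u) != 0 || target < INIT_PAGE_BASE || target >= from)
        return false;
    /* Unsigned subtraction wraps, giving the two's-complement bit pattern
       of the negative displacement without a signed conversion. */
    uint32_t disp = target - from;
    *insn = (18u << 26) | (disp & 0x03FFFFFCu);  /* keep LI (bits 6:29) only */
    return true;
}
```

For example, a branch from x‘FFFF FFFC’ to the start of the page at x‘FFFF F000’ encodes as 0x4BFFF004. Because the LI field reaches ±32 MB, any target within the 4 KB initial page is always in range; in practice an assembler produces this encoding from a plain b instruction to a label.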
The initialization software functions described in this section perform the configuration tasks required to prepare the PowerPC 476FP processor core to start an operating system and subsequently execute an application program. The initialization software must also perform functions associated with hardware resources outside the PowerPC 476FP processor core; that additional initialization is beyond the scope of this document. This section refers to some of these functions, but their full scope should be described in the user’s manual for the specific chip or system implementation.

Initialization software should perform the following tasks to fully configure the PowerPC 476FP processor core. For more information about the functions referenced in the initialization sequence, see the corresponding sections of this document.

1. Branch backwards from effective address x‘FFFF FFFC’ to the start of the initialization sequence.
2. Clear the DBCR0 Register to disable all debug events. Although the PowerPC 476FP processor core resets some of the debug-event enables during the reset operation (as specified in Table 9-1 on page 244), the architecture does not require this, so initialization software should not rely on it. Disabling all debug events prevents nondeterministic behavior on the trace interface to the core.
3. Clear the DBSR to initialize all debug event status. Although the PowerPC 476FP processor core resets the DBSR debug event status bits during the reset operation (as specified in Table 9-1), the architecture does not require this, so initialization software should not rely on it. Clearing all such status prevents nondeterministic behavior on the JTAG interface to the core.
4. Initialize the core configuration registers CCR0, CCR1, and CCR2 as necessary. In most cases, these registers can be left in their reset state. Do not change any of their fields without a thorough understanding of the implications. Reserved fields must be left in the reset state.
5. Configure the memory management unit control registers (MMUBE0, MMUBE1, SSPCR, USPCR, ISPCR) as appropriate.
6. Set up a TLB entry to cover the initial program memory page. During the reset operation, the PowerPC 476FP processor core initializes only an architecturally invisible shadow TLB entry and one entry in the UTLB. All other shadow TLB entries are invalidated upon any context synchronization, and all other UTLB entries except index-address-0 and one entry are undefined. Therefore, take special care during the initialization sequence until this step is complete and an architected TLB entry has been established in the TLB.
   a. Initialize the MMU Configuration Register (MMUCR):
      (1) Specify the TID field to be written to TLB entries.
      (2) Specify the TS field to be used for TLB searches.
      (3) Specify the store miss allocation behavior.
   b. Write a TLB entry for the initial program memory page:
      (1) Specify the effective page number (EPN), the real page number (RPN), the extended real page number (ERPN), and the SIZE as appropriate for the system.
      (2) Set the valid bit.
      (3) Specify TID = ‘0’ (disables comparison to the PID), or initialize the PID to a matching value.
      (4) Specify TS = ‘0’ (system address space), or set MSR[IS,DS] to correspond to TS = ‘1’.
      (5) Specify storage attributes (W, I, M, G, E, U0 - U3) as appropriate for the system.
      (6) Enable supervisor execute (SX), supervisor read (SR), and supervisor write (SW) access.
   c. Initialize the PID to match the TID field of the TLB entry (unless TID = ‘0’).
   d. Prepare the subsequent MSR[IS,DS] initialization to correspond to the TS field of the TLB entry. This is necessary only if the TS field of the TLB entry is being set to ‘1’ (MSR[IS,DS] is already reset to ‘0’):
      (1) Write the new MSR value into SRR1.
      (2) Write the address from which to continue execution into SRR0.
   e. Prepare the subsequent change in the instruction fetch address. This is necessary only if the EPN field of the TLB entry differs from its initial value (EPN[0:19] ≠ x‘FFFFF’):
      (1) Write the initial or new MSR value into SRR1.
      (2) Write the address from which to continue execution into SRR0.
   f. Fully initialize the TLB. Issue a tlbwe to all three words of each TLB entry; issuing tlbre to TLB entries that are not fully initialized can result in parity exceptions. Invalidate all unused TLB entries by setting the V bit to ‘0’.
   g. Perform a context synchronization to invalidate the shadow TLB contents and cause the new TLB contents to take effect:
      • Use the isync instruction if neither the MSR contents nor the effective address of the rest of the initialization sequence is being changed.
      • Use the rfi instruction if the MSR is being changed to match the new TS field of the TLB entry. SRR1 is copied into the MSR, and program execution resumes at the address saved in SRR0.
      • Use the rfi instruction if the next instruction fetch address is being changed to correspond to the new EPN field of the TLB entry. SRR1 is copied into the MSR, and program execution resumes at the address saved in SRR0.
   At this point, if the corresponding TLB entry has been set up with the caching-inhibited (I) storage attribute set to ‘0’, the instruction and data caches begin to be used. Initialization software can now branch outside the initial 4 KB memory region, to any address covered by the new TLB entry or by any other TLB entries that have been set up.
7. Initialize the interrupt resources:
   a. Initialize the Interrupt Vector Prefix Register (IVPR) to specify the high-order address of the interrupt handling routines. Ensure that the corresponding address region is covered by one or more TLB entries.
   b. Initialize the IVOR0 - IVOR15 Registers to set their individual interrupt vector addresses. Ensure that the corresponding addresses are covered by one or more TLB entries. Because the low-order 4 bits of IVOR0 - IVOR15 are reserved, those bits are ignored when the registers are written and are read as zeros when an interrupt uses the register address values. Therefore, all interrupt vector offsets are implicitly aligned on quadword (16-byte) boundaries, and software must ensure that all interrupt handlers are quadword aligned.
   c. Load the interrupt handling routines into program memory.
   d. Synchronize any program memory changes as required. See Section 5.4 Self-Modifying Code on page 140 for the instruction sequence necessary to synchronize changes to program memory before executing the new instructions.
8. Configure the debug facilities as required:
   a. Write DBCR1 and DBCR2 to specify the instruction address compare (IAC) and data address compare (DAC) event conditions.
   b. Clear the DBSR to initialize the IAC auto-toggle status.
   c. Initialize the IAC1 - IAC4, DAC1 - DAC2, and DVC1 - DVC2 Registers to the required values.
   d. If required, write to MSR[DWE] to enable the debug wait mode.
   e. Write to DBCR0 to enable the required debug modes and events.
   f. Perform a context synchronization (isync) to establish the new debug facility context.
9. Configure the timer facilities as required; this step also clears the TSR:
   a. Write zeros to the Time Base Lower (TBL) Register. This prevents fixed-interval timer and watchdog timer exceptions when the TSR is cleared, and prevents a carry into the Time Base Upper (TBU) Register before initialization is complete.
   b. Initialize the TCS field of CCR1, either at this point or earlier with the rest of CCR1.
   c. Write zeros to the TSR to clear all timer exception status.
   d. Write the TCR to configure and enable the timers as required. Software can enable the watchdog timer reset function, but only a reset can disable it.
   e. Initialize the TBU value as required.
   f. Initialize the TBL value as required.
   g. If the decrementer auto-reload function is required, initialize the Decrementer Auto-Reload Register (DECAR) to the required value.
   h. Initialize the DEC Register to the required value.
10. Initialize the L2 cache using Device Control Registers (DCRs). See the PowerPC 476FP L2 Cache Core Databook.
11. Initialize the PLB6 bus using DCRs. See the PLB6 Bus Controller Core Databook.
12. Initialize the MSR to enable interrupts as required:
   a. Set MSR critical interrupt enable (CE) to enable or disable critical-input and watchdog-timer interrupts.
   b. Set MSR external interrupt enable (EE) to enable or disable external-input, decrementer, and fixed-interval timer interrupts.
   c. Set MSR debug interrupt enable (DE) to enable or disable debug interrupts.
   d. Set MSR machine-check enable (ME) to enable or disable machine-check interrupts. Software should first check the Exception Syndrome Register (ESR) machine-check interrupt (ISMC) field and the Machine Check Syndrome Register (MCSR) machine check summary (MCS) field to determine whether any machine-check exceptions occurred after reset cleared these fields but before machine-check interrupts were enabled (by this step). Any such exception would have set ESR[ISMC] or MCSR[MCS] to ‘1’, and this status can only be cleared explicitly by software. After the MCSR[MCS] field is known to be clear, software should clear the MCSR status bits (MCSR[1:8]) to avoid confusion when a later machine-check interrupt is serviced. After MSR[ME] has been set to ‘1’, subsequent machine-check exceptions result in a machine-check interrupt.
   e. Perform a context synchronization with an isync instruction to establish the new MSR context.
13. Initialize any other processor core resources required by the system (General Purpose Registers [GPRs], Special Purpose Registers for general use [SPRGs], and so on). Failure to initialize GPRs might result in parity errors.
14. Initialize any other facilities outside the processor core as required by the system, and initialize system memory as required by the system software.
15. Synchronize any program memory changes as required. See Section 5.4 Self-Modifying Code on page 140 for the instruction sequence necessary to synchronize changes to program memory before executing the new instructions.
16. Start the system software. System software is generally responsible for initializing or managing the remaining MSR fields, including:
   • MSR floating-point enable (FP), to enable or disable the execution of floating-point instructions.
   • MSR[FE0,FE1], to enable or disable floating-point-enabled exception-type program interrupts.
   • MSR problem state (PR), to specify user mode or supervisor mode.
   • MSR[IS,DS], to specify the application address space or the system address space for instructions and data.
   • MSR wait state enable (WE), to place the processor into the wait state (halting execution pending an interrupt).

10.
L2 Cache and UTLB Synchronous Interfaces

This chapter describes the level-2 (L2) cache and UTLB synchronous interfaces. See the PowerPC 476FP L2 Cache Core User’s Guide for more information, and the PowerPC 476FP Core Support Manual for further details.

10.1 L2 Cache Interface

The PowerPC 476FP core interfaces directly to an L2 cache. The PowerPC 476FP core and the L2 cache synthesizable core are implemented in different clock domains. Figure 10-1 illustrates the L2 cache interface.

Figure 10-1. L2 Cache and Interface Block Diagram
[Block diagram showing the 476FP core (1.6 - 2.0 GHz) with L1 I-cache, L1 D-cache, and UTLB; the L2 cache (400 - 800 MHz; 256 KB, 512 KB, or 1 MB; 4-way set-associative with ECC); the PLB6 bus (400 - 800 MHz); the DCR bus and DCR arbiter; and the performance monitor. Interfaces shown: I-Read, I-Snoop, D-Read, D-Write, D-Snoop, TLB-Snoop. Frequency ratios shown: 2:1, 3:1, 4:1, and N:1. Control signals shown: MSR[PMM], PMUCC0[FAC].]
Key: DCR = Device control register; ECC = Error checking and correction; L1 = Level 1; L2 = Level 2; LRU = Least recently used; PM = Performance monitor; PLB6 = Processor local bus 6; UTLB = Unified translation lookaside buffer.
Note: 1. Performance monitor events include L1 cache hits, shadow TLB misses, and commitments.

10.2 L2 Cache Features

For detailed information about the features and capabilities of the L2 cache controller, see the PowerPC 476FP L2 Cache Controller User’s Guide.

10.2.1 L2 Cache Storage Reservation Management

The PowerPC 476FP core requires memory coherence for all memory pages, regardless of the state of the page M attribute. The L2 cache manages lwarx/stwcx. storage reservations to improve the performance of multiprocessor (MP) reservation handling. See Section 4.5.4 Memory Coherence Required (M) on page 117.

The L2 cache tracks the reservation status, which is set by the lwarx instruction and cleared by the stwcx. instruction and by snoop operations. The lwarx instruction is broadcast to the L2 cache over the read interface. If the lwarx hits in the L1 cache, the L1 cache requests no data. If the lwarx misses in the L1 cache, the L2 cache returns data to the L1 cache, as for a normal load instruction. In both cases, the L2 cache updates the reservation granule with the lwarx address and sets the reservation bit. The notes to Table 10-1 describe special cases in which an incoming snoop does or does not cause the reservation set by lwarx to be lost. Snoops matter here because the reservation flag is held in the L2 cache, and the L2 cache handles all snoops with regard to the reservation.

The stwcx. instruction is broadcast to the L2 cache over the write interface. The L1 pipeline guarantees that the line referenced by the stwcx. is invalidated in the L1 cache and in the line fill buffers; the stwcx. is then sent to the L2 cache. The processor core does not update the appropriate CR field until it receives a completion signal from the L2 cache. CR0[EQ] is set to ‘1’ if the stwcx. completes successfully and to ‘0’ if it fails.

The L1 cache does not send to the L2 cache any lwarx or stwcx. instruction that faults because of an alignment or data storage interrupt (DSI) exception. The L2 cache can handle multiple lwarx/stwcx. instructions; in particular, the processor core receives a completion signal for each stwcx.. Table 10-1 shows the lwarx/stwcx. process.

Table 10-1. lwarx and stwcx. Actions in the L2 Cache and Processor Core
Instruction | L1 D-Cache | L1 D-Cache to L2 Cache | L2 Cache | Reservation Unit or Processor Core
lwarx | Hit | Read with data, L1 D-cache invalidated; lwarx request | Hit | The L1 D-cache invalidates the cache line, and the L2 cache returns data to the L1 D-cache. The reservation is set in the L2 cache.
lwarx | Hit | Read with data, L1 D-cache invalidated; lwarx request | Miss | The L1 D-cache invalidates the cache line, and the L2 cache returns data to the L1 D-cache when the L2 cache returns the cache line. The reservation is set in the L2 cache after the L2 cache gains ownership.
lwarx | Miss | Read with data; lwarx request | Hit | The reservation is set with the address. Data returns to the L1 cache.
lwarx | Miss | Read with data; lwarx request | Miss | The reservation is set with the address after the L2 cache gets the data with shared or exclusive use. Data returns to the L1 cache.
stwcx. | Hit | Write with stwcx. data | Hit | The L1 D-cache invalidates the line. If the reservation is set and the address matches, the stwcx. succeeds, the L2 cache performs the write operation, and L1 CR[EQ] is set.
stwcx. | Hit | Write with stwcx. data | Miss | The L1 D-cache invalidates the line. If the stwcx. succeeds, the L2 cache performs the write after owning the line, and L1 CR[EQ] is set. Most likely, the L2 line is snooped and the reservation is lost.
stwcx. | Miss | Write with stwcx. data | Hit | If the stwcx. succeeds, the L2 cache performs the write, and L1 CR[EQ] is set.
stwcx. | Miss | Write with stwcx. data | Miss | If the stwcx. succeeds, the L2 cache performs the write after owning the line, and L1 CR[EQ] is set.
Note: Effects of cache operations by a remote processor on the reservation:
1. A dcbtst by a remote processor to a line that matches the reservation granule causes the reservation to be lost.
2. Any store or dcbz by a remote processor to a line that matches the reservation granule causes the reservation to be lost.
3. A dcbst by a remote processor to a line that matches the reservation granule does not affect the reservation.
4. A dcbf or dcbt by a remote processor to a line that matches the reservation granule does not affect the reservation.
5. A processor converts a dcbi to a dcbf; therefore, a dcbi (a dcbf in the L2 cache) by a remote processor to a line that matches the reservation granule does not affect the reservation.
6. An icbt or icbi by a remote processor to a line that matches the reservation granule does not affect the reservation.

10.2.2 Performance Monitor

The PowerPC 476FP L2 cache core provides a performance monitor that counts occurrences of a number of L2 cache events. See the PowerPC 476FP L2 Cache Core User’s Guide for additional information regarding these events. The performance monitor can be controlled from the PowerPC 476FP core through the Performance Monitor Unit Core Control Register 0 (PMUCC0), described in the following subsection.

10.2.2.1 Performance Monitor Unit Core Control Register 0 (PMUCC0)

Bit    Name      Description
32     FAC       0  Performance monitor enabled.
                 1  Freeze all performance monitor unit (PMU) counters.
33:63  Reserved

10.2.3 Cache Operations Handling

Some cache management instructions contain a CT field that specifies the cache level within a cache hierarchy, or the portion of a cache structure, to which the instruction is applied. Table 10-2 shows the correspondence between the CT value specified and the cache level.

Table 10-2. CT Field Value and Cache Level
CT Field Value  Cache Level
0               Primary cache
2               Secondary cache
Note: CT values that are not shown can be used to specify implementation-dependent cache levels or implementation-dependent portions of a cache structure.
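Table 10-2 amounts to a small lookup. The C sketch below (the enum and function names are illustrative, not part of any IBM API) makes explicit that only CT = 0 and CT = 2 have defined meanings here and that every other value falls through to an implementation-dependent case.

```c
#include <assert.h>

/* Cache levels selectable through the CT field, per Table 10-2. */
typedef enum {
    CT_PRIMARY_CACHE,       /* CT = 0: primary (L1) cache */
    CT_SECONDARY_CACHE,     /* CT = 2: secondary (L2) cache */
    CT_IMPLEMENTATION_DEP   /* any other value */
} ct_level;

/* Decode a CT field value into the cache level it names. */
static ct_level ct_decode(unsigned ct) {
    switch (ct) {
    case 0:  return CT_PRIMARY_CACHE;
    case 2:  return CT_SECONDARY_CACHE;
    default: return CT_IMPLEMENTATION_DEP;
    }
}

/* Human-readable name, matching the wording of Table 10-2. */
static const char *ct_level_name(ct_level level) {
    switch (level) {
    case CT_PRIMARY_CACHE:   return "primary cache";
    case CT_SECONDARY_CACHE: return "secondary cache";
    default:                 return "implementation-dependent";
    }
}
```

A decoder written this way keeps the undefined CT values in one explicit branch, so code that walks Table 10-3 cannot silently treat, for example, CT = 1 as either the L1 or the L2 cache.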
Cache operations that generate exception conditions detected before commitment are not broadcast to the L2 cache. For example, a dcbz to a page that is marked write-through or caching-inhibited generates an alignment exception and never reaches the L2 cache. Table 10-3 on page 259 shows how cache operations are handled.

Table 10-3. Cache Operations (Page 1 of 3)
Cache Operation CT Field icbi icbt CT = 0 L1 Cache dcbf (L = 0) PLB6 Remote L2 Cache Remote L1 Cache Invalid L1 I-cache. Pass through. iKill. See the PowerPC PLB6 User’s Guide. Pass-through. L1 I-cache touch. L2 cache touch. Touch if L2 cache miss. L2 cache touch. Touch if L2 cache miss. Write with flush if modified. Write with flush if modified. CT = 2 dcba L2 Cache Invalid L1 I-cache. No-op. L1 D-cache line invalidate four lines. Write with flush. Possible snoop invalid.
Note:
1. For dcbt, the TH field has a similar function to the CT field of other cache operations.
2. For dcbtst, the TH field has a similar function to the CT field of other cache operations.

Table 10-3. Cache Operations (Page 2 of 3)
Cache Operation CT Field dcbi (privileged) L1 Cache L1 D-cache line invalidate four lines if L1 hit. dcbst dcbt1 dcbtst2 L1 D-cache touch. Touch, mark D-side. Touch if L2 cache miss. TH = 2 Touch, mark D-side. Touch if L2 cache miss. Write with clean. Possible snoop invalid. TH = 2 Store-touch (exclusive or modified), mark D-side. RWITM (allocate). Write with flush. Possible snoop invalid. L1 D-cache zeros four lines if L1 hit. Zero the L2 cache line. D-claim if W = I = 0 Snoop invalid. Snoop invalid. L1 I-cache touch and lock. Touch if miss. Touch if L2 miss. L2 touch and lock. Touch if L2 miss. CT = 0 CT = 0 L1 I-cache unlock a line. CT = 0 CT = 0 CT = 0 N/A L2 cache unlock a line. L1 D-cache touch and lock. L1 D-cache touch and lock. Touch if miss. Touch if L2 cache miss. L2 touch and lock mark with D-side. Touch if L2 cache miss. Touch if miss. Touch if L2 cache miss. L2 store-touch and lock mark with D-side. Store-touch if L2 cache miss. L1 D-cache unlock line. No action. CT = 2 ici (privileged) Possible snoop invalid. Read with intent to modify (RWITM) (allocate). Write with flush. CT = 2 dcblc (privileged) Flush. L1 D-cache touch. Store-touch (exclusive or modified), mark D-side. CT = 2 dcbtstls (privileged) Remote L2 Cache Remote L1 Cache TH = 0 CT = 2 dcbtls (privileged) Flush if modified. TH = 0 CT = 2 icblc (privileged) Flush if modified. PLB6 Write with clean if modified. Clean if modified. dcbz icbtls (privileged) L2 Cache L2 unlock line. CT = 0 Invalidate L1 I-cache. CT <> 0 No-op.

Table 10-3. Cache Operations (Page 3 of 3)
Cache Operation CT Field dci (privileged) CT = 0 Invalidate L1 D-cache. CT = 2 Invalidate L1 D-cache. L1 Cache icread (privileged) L1 I-cache debug read. dcread (privileged) L1 D-cache debug read. L2 Cache PLB6 Remote L2 Cache Remote L1 Cache Invalidate L2 cache. No action. The local L2 cache must invalidate all entries.

10.2.4 tlbivax, tlbsync, msync, mbar Handling

See Section 4.8.4 TLB Invalidate, Virtual Address Indexed (tlbivax) on page 130 and Section 4.9 UTLB Coherency on page 130 for information about tlbivax and tlbsync. See Section 5.5.18 Memory Barrier Instructions on page 149 for information about msync, mbar, and lwsync.
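The reservation handling in Section 10.2.1 and the barrier instructions referenced above are rarely used directly from C. The sketch below assumes a C11 compiler and builds a test-and-test-and-set spinlock and an atomic counter on <stdatomic.h>. The comments describe the instruction sequences that PowerPC compilers typically generate (a lwarx/stwcx. retry loop that branches back while stwcx. leaves CR0[EQ] = ‘0’, plus isync or lwsync for the acquire and release orderings); the exact sequences vary with the compiler and target options, so inspect the generated code before relying on a specific one. The type and function names are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Test-and-test-and-set spinlock built on C11 atomics. */
typedef struct {
    atomic_int locked;   /* 0 = free, 1 = held */
} spinlock;

static void spin_init(spinlock *l) {
    atomic_init(&l->locked, 0);
}

/* One atomic read-modify-write attempt. On PowerPC this exchange is
   typically a lwarx/stwcx. loop; acquire ordering keeps later accesses
   from being performed before the lock is taken. */
static bool spin_trylock(spinlock *l) {
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

static void spin_lock(spinlock *l) {
    while (!spin_trylock(l)) {
        /* Wait with plain loads until the lock looks free, instead of
           repeatedly issuing the reservation-based exchange. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed) != 0) {
        }
    }
}

/* Release ordering keeps earlier accesses from being performed after
   the lock is seen as free (typically an lwsync before the store). */
static void spin_unlock(spinlock *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Atomic add returning the new value; typically a lwarx/add/stwcx. loop. */
static int counter_add(atomic_int *c, int n) {
    return atomic_fetch_add_explicit(c, n, memory_order_relaxed) + n;
}
```

The inner wait loop is a design choice: a waiting processor spins on ordinary loads rather than repeatedly issuing lwarx/stwcx. pairs against a contended line, and only retries the reservation when the lock appears free.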
10.3 L1 Cache UTLB Snoop Interface

The L1 cache unified translation lookaside buffer (UTLB) snoop interface maintains TLB coherency between processors; the PowerPC 476FP core is designed with hardware-supported coherency. The tlbivax instruction invalidates the target entry in all processors and is treated as a tlbie instruction on the PLB6. The tlbsync instruction is fully implemented to provide an ordering function for tlbivax instructions. Both instructions are privileged. See Section 4.8.4 TLB Invalidate, Virtual Address Indexed (tlbivax) on page 130 for information about tlbivax operations, and Section 4.9 UTLB Coherency on page 130 for information about tlbsync operations.

Software must ensure any required ordering, for example by using the mbar and isync instructions, including around updates of the AS or PID fields. The hardware is not required to order tlbivax operations.

Appendix A. Register Summary

This appendix provides an alphabetic listing of the Special Purpose Registers (SPRs) contained in the PowerPC 476FP processor core and their bit definitions. These SPRs are accessed by the mfspr and mtspr instructions. The Access column in Table A-1 uses the following terms:
R/W    Readable and writable.
Read   Read only.
Write  Write only.
R/C    Read and clear.
Clear  ‘1’ bits in the register are cleared when that SPR is accessed by an mtspr instruction.

Table A-1. Register Categories
Name | Short Name | Privileged | Access | Address | Page

Branch Control
Counter Register¹ | CTR | No | R/W | x‘009’ | 60
Link Register¹ | LR | No | R/W | x‘008’ | 59

Cache Debug
DCR Immediate Prefix Register | DCRIPR | Yes | R/W | x‘37B’ | 74
Data Cache Debug Tag Register High | DCDBTRH | Yes | Read | x‘39D’ | 153
Data Cache Debug Tag Register Low | DCDBTRL | Yes | Read | x‘39C’ | 152
Data Cache Exception Syndrome Register | DCESR | Yes | R/W | x‘352’ | 268
Instruction Cache Debug Data Register 0 | ICDBDR0 | Yes | Read | x‘3D3’ | 137
Instruction Cache Debug Data Register 1 | ICDBDR1 | Yes | Read | x‘3D4’ | 137
Instruction Cache Debug Tag Register High | ICDBTRH | Yes | Read | x‘39F’ | 138
Instruction Cache Debug Tag Register Low | ICDBTRL | Yes | Read | x‘39E’ | 137
Instruction Cache Exception Syndrome Register | ICESR | Yes | R/W | x‘353’ | 140
Instruction Opcode Compare Control Register | IOCCR | Yes | R/W | x‘35C’ | 269
Instruction Opcode Compare Register 1 | IOCR1 | Yes | R/W | x‘35D’ | 269
Instruction Opcode Compare Register 2 | IOCR2 | Yes | R/W | x‘35E’ | 270

Debug
Data Cache Address Compare 1 Register | DAC1 | Yes | R/W | x‘13C’ | 266
Data Cache Address Compare 2 Register | DAC2 | Yes | R/W | x‘13D’ | 266
Data Value Compare 1 Register | DVC1 | Yes | R/W | x‘13E’ | 266
Data Value Compare 2 Register | DVC2 | Yes | R/W | x‘13F’ | 266
Debug Control Register 0 | DBCR0 | Yes | R/W | x‘134’ | 235
Debug Control Register 1 | DBCR1 | Yes | R/W | x‘135’ | 236
Debug Control Register 2 | DBCR2 | Yes | R/W | x‘136’ | 237
Debug Data Register | DBDR | Yes | R/W | x‘3F3’ | 266
Debug Status Register | DBSR | Yes | R/C | x‘130’ | 239
 | DBSR | | Write | x‘330’ |
Instruction Address Compare 1 | IAC1 | Yes | R/W | x‘138’ | 240
Instruction Address Compare 2 | IAC2 | Yes | R/W | x‘139’ | 240
Instruction Address Compare 3 | IAC3 | Yes | R/W | x‘13A’ | 240
Instruction Address Compare 4 | IAC4 | Yes | R/W | x‘13B’ | 240

Integer Processing
Fixed-Point Exception Register¹ | XER | No | R/W | x‘001’ | 64

Interrupts and Exceptions
Critical Save and Restore Register 0 | CSRR0 | Yes | R/W | x‘03A’ | 175
Critical Save and Restore Register 1 | CSRR1 | Yes | R/W | x‘03B’ | 176
Data Cache Exception Address Register | DEAR | Yes | R/W | x‘03D’ | 177
Exception Syndrome Register | ESR | Yes | R/W | x‘03E’ | 179
Interrupt Vector Offset Register 0 through 15 | IVOR0 through IVOR15 | Yes | R/W | x‘190’ through x‘19F’ | 178
Interrupt Vector Prefix Register | IVPR | Yes | R/W | x‘03F’ | 179
Machine Check Save and Restore Register 0 | MCSRR0 | Yes | R/W | x‘23A’ | 176
Machine Check Save and Restore Register 1 | MCSRR1 | Yes | R/W | x‘23B’ | 177
Machine Check Syndrome Register | MCSR | Yes | R/W | x‘23C’ | 181
 | MCSR | Yes | Clear | x‘33C’ | 181
Machine State Register | MSR | Yes | Read | mfmsr | 173
 | MSR | | Write | mtmsr |
Save and Restore Register 0 | SRR0 | Yes | R/W | x‘01A’ | 174
Save and Restore Register 1 | SRR1 | Yes | R/W | x‘01B’ | 175

L2 Cache Performance Monitor
PMU Core Control Register | PMUCC0 | Yes | R/W | x‘35A’ | 259
 | PMUCC0 | No | Read | x‘34A’ | 259

Memory Management
Invalidate Search Priority Configuration Register | ISPCR | Yes | R/W | x‘33D’ | 123
MMU Bolted Entries 0 Register | MMUBE0 | Yes | | x‘334’ | 121
MMU Bolted Entries 1 Register | MMUBE1 | Yes | | x‘335’ | 121
MMU Configuration Register | MMUCR | Yes | R/W | x‘3B2’ | 126
Process ID Register | PID | Yes | R/W | x‘030’ | 120
Real Mode Page Description Register | RMPD | Yes | R/W | x‘339’ | 120
Reset Configuration Register | RSTCFG | Yes | Read | x‘39B’ | 126
Supervisor Search Priority Configuration Register | SSPCR | Yes | R/W | x‘33E’ | 122
User Search Priority Configuration Register | USPCR | Yes | R/W | x‘33F’ | 125

Processor Control
Core Configuration Register 0 | CCR0 | Yes | R/W | x‘3B3’ | 69
Core Configuration Register 1 | CCR1 | Yes | R/W | x‘378’ | 70
Core Configuration Register 2 | CCR2 | Yes | R/W | x‘379’ | 73
Processor ID Register | PIR | Yes | Read | x‘11E’ | 68
Processor Version Register | PVR | Yes | Read | x‘11F’ | 68
SPR General 0 | SPRG0 | Yes | R/W | x‘110’ | 67
SPR General 1 | SPRG1 | Yes | R/W | x‘111’ | 67
SPR General 2 | SPRG2 | Yes | R/W | x‘112’ | 67
SPR General 3¹ | SPRG3 | Yes | R/W | x‘113’ | 67
 | SPRG3 | No | Read | x‘103’ | 67
SPR General 4 | SPRG4 | Yes | R/W | x‘104’ | 67
SPR General 5 | SPRG5 | Yes | R/W | x‘105’ | 67
SPR General 6 | SPRG6 | Yes | R/W | x‘106’ | 67
SPR General 7 | SPRG7 | Yes | R/W | x‘107’ | 67
SPR General 8¹ | SPRG8 | Yes | R/W | x‘25C’ | N/A
User SPR General 0² | USPRG0 | No | R/W | x‘100’ | 67

Timers
Decrementer Auto-Reload Register | DECAR | Yes | R/W | x‘036’ | 159
Decrementer Register | DEC | Yes | R/W | x‘016’ | 159
Time Base Lower Register | TBL | Yes | Write | x‘11C’ | 158
 | TBL | No | Read | x‘10C’ |
Time Base Upper Register | TBU | Yes | Write | x‘11D’ | 158
 | TBU | No | Read | x‘10D’ |
Timer Control Register | TCR | Yes | R/W | x‘154’ | 163
Timer Status Register | TSR | Yes | R/C | x‘150’ | 164
 | TSR | | Write | x‘350’ |

1. See the Power ISA V2.05 Specification for details.
2. This register is renamed to VRSAVE in the Power ISA V2.05 Specification.
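Table A-1 lists each SPR by number, and it can help to see how that number is placed in an mfspr or mtspr instruction word. In the Power ISA, both instructions use primary opcode 31 (extended opcodes 339 and 467), and the 10-bit SPR number is stored with its two 5-bit halves swapped. The C sketch below, with illustrative helper names, assembles such instruction words on a host; it is a teaching aid, not an assembler.

```c
#include <assert.h>
#include <stdint.h>

/* Field values for the mfspr/mtspr instruction formats (Power ISA). */
#define PPC_OPCD_31  31u   /* primary opcode shared by mfspr and mtspr */
#define PPC_XO_MFSPR 339u  /* extended opcode, instruction bits 21:30 */
#define PPC_XO_MTSPR 467u

/* The 10-bit SPR number is stored with its 5-bit halves swapped:
   instruction bits 11:15 hold SPR[5:9] and bits 16:20 hold SPR[0:4]. */
static uint32_t spr_split_field(uint32_t sprn) {
    assert(sprn < 1024u);
    return ((sprn & 0x1Fu) << 5) | ((sprn >> 5) & 0x1Fu);
}

static uint32_t encode_spr_access(uint32_t xo, uint32_t gpr, uint32_t sprn) {
    assert(gpr < 32u);
    return (PPC_OPCD_31 << 26) | (gpr << 21) |
           (spr_split_field(sprn) << 11) | (xo << 1);
}

/* mfspr RT,SPRN: copy the SPR into general purpose register RT. */
static uint32_t encode_mfspr(uint32_t rt, uint32_t sprn) {
    return encode_spr_access(PPC_XO_MFSPR, rt, sprn);
}

/* mtspr SPRN,RS: copy general purpose register RS into the SPR. */
static uint32_t encode_mtspr(uint32_t sprn, uint32_t rs) {
    return encode_spr_access(PPC_XO_MTSPR, rs, sprn);
}
```

For example, reading PVR (x‘11F’) into r3 gives the instruction word x‘7C7F 42A6’, and writing r3 to SPRG0 (x‘110’) gives x‘7C70 43A6’. The same helper covers the split read and write addresses in Table A-1 (such as TBL at x‘10C’ and x‘11C’), since each is just a different SPR number.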
A.1 Data Cache Address Compare 1 Register (DAC1)

Bits   Field Name  Description
32:63  DAC1        Data cache address compare 1. This register holds the compare address for a data address compare event.

A.2 Data Cache Address Compare 2 Register (DAC2)

Bits   Field Name  Description
32:63  DAC2        Data cache address compare 2. This register holds the compare address for a data address compare event.

A.3 Data Cache Value Compare 1 Register (DVC1)

Bits   Field Name  Description
32:63  DVC1        Data cache value compare 1. This register holds the compare value for a data value compare event.

A.4 Data Cache Value Compare 2 Register (DVC2)

Bits   Field Name  Description
32:63  DVC2        Data cache value compare 2. This register holds the compare value for a data value compare event.

A.5 Debug Data Register (DBDR)

Bits   Field Name  Description
32:63  DBDR        Debug information.

A.6 Data Cache Exception Syndrome Register (DCESR)

The DCESR is written upon a parity error in the data cache. Once it has been written, it is not updated on any subsequent error until it is written by an mtspr instruction.
| Bits | Field Name | Description |
|---|---|---|
| 32:35 | DCRDPE | Data cache read interface parity error. |
| 36:39 | Reserved | |
| 40:43 | DCESPE | Data cache even set parity error. Bits 0:3 of this field indicate the ways of the set for which (address[19] XOR address[26]) = ‘0’. The field reports tag array parity errors, and multiple bits can be set. Errors can be reported even when the request is for an odd set. |
| 44:47 | DCOSPE | Data cache odd set parity error. Bits 0:3 of this field indicate the ways of the set for which (address[19] XOR address[26]) = ‘1’. The field reports tag array parity errors, and multiple bits can be set. Errors can be reported even when the request is for an even set. |
| 48:54 | DCINDXPE | Index of the parity error in the cache. This field represents bits [20:26] of the real address. Bit 19 can be inferred from the DCESPE and DCOSPE fields. |
| 55 | DCDAPE | Data cache data array parity error. If set, the requested data has a parity error. If the request is a miss, no error is reported. |
| 56 | DCTAPE | Data cache tag array parity error. If set, at least one of the tags associated with a way in either the even or the odd set has a parity error. The DCESPE and DCOSPE fields identify the affected ways. |
| 57 | Reserved | |
| 58 | DCLRUPE | Data cache LRU, valid, or lock parity error. A parity error exists in the even or odd LRU, valid, or lock field for the requested set. |
| 59 | DCSNPPE | Data cache snoop parity error. A parity error exists on the snoop request received from the L2 cache. |
| 60 | DCDAHIT | Data cache data array hit. When set, this bit qualifies the DCDAPE field. If both DCDAPE and DCDAHIT are set, there is a data parity error on a load request that hits in the data cache. If only DCDAPE is set, the parity error comes from a request that was serviced from the line fill buffers. If DCDAPE is not set, this bit is ignored. |
| 61 | DCDAAPU | Data cache data array APU. When set, this bit qualifies the DCDAPE field. If both DCDAPE and DCDAAPU are set, there is a data parity error on a load request for the APU. If only DCDAPE is set, the parity error comes from a processor core request. If DCDAPE is not set, this bit is ignored. |
| 62:63 | Reserved | |

A.7 Instruction Opcode Compare Control Register (IOCCR)

| Bits | Field Name | Description |
|---|---|---|
| 0 | IOCR1EN | IOCR1 enabled. |
| 1 | IOCR1M | IOCR1 mode. 0: Match primary and secondary opcodes. 1: Match primary opcode only. |
| 2 | IOCR2EN | IOCR2 enabled. |
| 3 | IOCR2M | IOCR2 mode. 0: Match primary and secondary opcodes. 1: Match primary opcode only. |
| 4 | IOCR1ME | IOCR1 mask enable. 0: Compare the instruction to IOCR1 without a mask. 1: Mask the instruction and IOCR1 for comparison. |
| 5 | IOCR2ME | IOCR2 mask enable. 0: Compare the instruction to IOCR2 without a mask. 1: Mask the instruction and IOCR2 for comparison. |
| 6 | IOCR1U | IOCR1 user mode. 0: Trap on a match in user or privileged mode. 1: Trap on a match in user mode only. |
| 7 | IOCR2U | IOCR2 user mode. 0: Trap on a match in user or privileged mode. 1: Trap on a match in user mode only. |
| 8:31 | Reserved | |

A.8 Instruction Opcode Compare Register 1 (IOCR1)

| Bits | Field Name | Description |
|---|---|---|
| 0:5 | PRI | Primary opcode to trap. |
| 6:15 | SEC | Secondary opcode to trap. |
| 16:21 | PRIMASK | Primary opcode mask. |
| 22:31 | SECMASK | Secondary opcode mask. |

A.9 Instruction Opcode Compare Register 2 (IOCR2)

| Bits | Field Name | Description |
|---|---|---|
| 0:5 | PRI | Primary opcode to trap. |
| 6:15 | SEC | Secondary opcode to trap. |
| 16:21 | PRIMASK | Primary opcode mask. |
| 22:31 | SECMASK | Secondary opcode mask. |
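A.7 through A.9 describe the opcode-compare controls only at the field level. The C sketch below models how one IOCCR slot and its IOCR might be evaluated against an instruction word. It rests on assumptions this appendix does not state: the secondary opcode is taken from instruction bits 21:30 (the X-form and XO-form extended opcode position), a mask bit of 1 means that bit participates in the comparison, and the structure and function names are hypothetical. Treat it as an illustration of the field semantics, not as the hardware's exact matching logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of one IOCCR slot and its IOCRn register. */
struct iocr_cfg {
    bool enabled;       /* IOCRnEN */
    bool primary_only;  /* IOCRnM = 1: match primary opcode only */
    bool mask_enable;   /* IOCRnME = 1: apply PRIMASK and SECMASK */
    bool user_only;     /* IOCRnU = 1: trap only in user mode */
    uint32_t iocr;      /* PRI 0:5, SEC 6:15, PRIMASK 16:21, SECMASK 22:31 */
};

/* Extract bits first:last of a word, IBM numbering (bit 0 = MSB). */
static uint32_t bits(uint32_t w, unsigned first, unsigned last)
{
    assert(first <= last && last <= 31);
    unsigned width = last - first + 1;
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (w >> (31 - last)) & mask;
}

/* Would this modeled compare slot trap the given instruction word? */
bool iocr_traps(const struct iocr_cfg *c, uint32_t insn, bool user_mode)
{
    if (!c->enabled || (c->user_only && !user_mode))
        return false;

    /* IOCRnME = 0: compare without a mask, so every opcode bit counts. */
    uint32_t pri_mask = c->mask_enable ? bits(c->iocr, 16, 21) : 0x3Fu;
    uint32_t sec_mask = c->mask_enable ? bits(c->iocr, 22, 31) : 0x3FFu;

    bool pri_hit = (bits(insn, 0, 5) & pri_mask) == (bits(c->iocr, 0, 5) & pri_mask);
    if (c->primary_only)
        return pri_hit;

    /* Assumption: the secondary opcode is instruction bits 21:30. */
    bool sec_hit = (bits(insn, 21, 30) & sec_mask) == (bits(c->iocr, 6, 15) & sec_mask);
    return pri_hit && sec_hit;
}
```

With PRI = 31 and SEC = 339 (IOCR1 = 0x7D530000), the model traps mfspr (0x7C7F42A6) but not mtspr (0x7C7043A6). Enabling the mask and clearing the single SECMASK bit in which 339 and 467 differ (IOCR1 = 0x7D53FF7F) makes both instructions match.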
Appendix B. Instruction Summary

This appendix lists and describes the instructions that are unique to the PowerPC 476FP processor. Standard instructions that are already covered in the PowerPC User Instruction Set Architecture Book I are not described in this appendix.

Some instruction mnemonics optionally end in a period (dot). The undotted forms do not update the Condition Register; the dotted forms do.

B.1 Instructions That Behave Differently from the Power ISA Specification

Table B-1 lists the PowerPC 476FP core instructions that differ from the Power Instruction Set Architecture (ISA) V2.05 specification.

Table B-1. New Instructions in the PowerPC 476FP Core

| Mnemonic | Instruction | See Page |
|---|---|---|
| dcbt | Data cache block touch. | 147 |
| dcbtst | Data cache block touch for store. | 148 |
| dci | Data cache invalidate. | 146 |

B.2 Unsupported Power ISA Instructions

The wait instruction (wait category) is not supported by the PowerPC 476FP processor. PowerPC 4xx embedded processors provide a different method for putting the processor into wait mode.

B.3 Integer Instructions in the PowerPC 476FP Processor

Table B-2 lists all of the integer instructions that are implemented in the PowerPC 476FP processor. Opcodes and extended opcodes are shown in decimal.

Table B-2. Power ISA V2.05 Integer Instructions

| Mnemonic | Opcode | Extended Opcode | Instruction |
|---|---|---|---|
| add[o][.] | 31 | 266 | Add. |
| addc[o][.] | 31 | 10 | Add carrying. |
| adde[o][.] | 31 | 138 | Add extended. |
| addi | 14 | - | Add immediate. |
| addic | 12 | - | Add immediate carrying. |
| addic. | 13 | - | Add immediate carrying and record. |
| addis | 15 | - | Add immediate shifted. |
| addme[o][.] | 31 | 234 | Add to minus one extended. |
| addze[o][.] | 31 | 202 | Add to zero extended. |
| and[.] | 31 | 28 | AND. |
| andc[.] | 31 | 60 | AND with complement. |
| andi. | 28 | - | AND immediate. |
| andis. | 29 | - | AND immediate shifted. |
| b[a][l] | 18 | - | Branch. |
| bc[a][l] | 16 | - | Branch conditional. |
| bcctr[l] | 19 | 528 | Branch conditional to Count Register. |
| bclr[l] | 19 | 16 | Branch conditional to Link Register. |
| cmp | 31 | 0 | Compare. |
| cmpb | 31 | 508 | Compare bytes. |
| cmpi | 11 | - | Compare immediate. |
| cmpl | 31 | 32 | Compare logical. |
| cmpli | 10 | - | Compare logical immediate. |
| cntlzw[.] | 31 | 26 | Count leading zeros word. |
| crand | 19 | 257 | Condition register AND. |
| crandc | 19 | 129 | Condition register AND with complement. |
| creqv | 19 | 289 | Condition register equivalent. |
| crnand | 19 | 225 | Condition register NAND. |
| crnor | 19 | 33 | Condition register NOR. |
| cror | 19 | 449 | Condition register OR. |
| crorc | 19 | 417 | Condition register OR with complement. |
| crxor | 19 | 193 | Condition register XOR. |
| dcba | 31 | 758 | No-op. (In the Power ISA, this instruction is data cache block allocate.) |
| dcbf | 31 | 86 | Data cache block flush. |
| dcbi | 31 | 470 | Data cache block invalidate. |
| dcblc | 31 | 390 | Data cache block lock clear. |
| dcbst | 31 | 54 | Data cache block store. |
| dcbt | 31 | 278 | Data cache block touch. |
| dcbtls | 31 | 166 | Data cache block touch and lock set. |
| dcbtst | 31 | 246 | Data cache block touch for store. |
| dcbtstls | 31 | 134 | Data cache block touch for store and lock set. |
| dcbz | 31 | 1014 | Data cache block set to zero. |
| dci | 31 | 454 | Data cache invalidate. |
| dcread | 31 | 326 | Data cache read (alternate encoding). |
| divw[o][.] | 31 | 491 | Divide word. |
| divwu[o][.] | 31 | 459 | Divide word unsigned. |
| dlmzb[.] | 31 | 78 | Determine leftmost zero byte. |
| eqv[.] | 31 | 284 | Equivalent. |
| extsb[.] | 31 | 954 | Extend sign byte. |
| extsh[.] | 31 | 922 | Extend sign halfword. |
| icbi | 31 | 982 | Instruction cache block invalidate. |
| icblc | 31 | 230 | Instruction cache block lock clear. |
| icbt | 31 | 22 | Instruction cache block touch. |
| icbtls | 31 | 486 | Instruction cache block touch and lock set. |
| ici | 31 | 966 | Instruction cache invalidate. |
| icread | 31 | 998 | Instruction cache read. |
| isel | 31 | 15 | Integer select. |
| isync | 19 | 150 | Instruction synchronize. |
| lbz | 34 | - | Load byte and zero. |
| lbzu | 35 | - | Load byte and zero with update. |
| lbzux | 31 | 119 | Load byte and zero with update indexed. |
| lbzx | 31 | 87 | Load byte and zero indexed. |
| lha | 42 | - | Load halfword algebraic. |
| lhau | 43 | - | Load halfword algebraic with update. |
| lhaux | 31 | 375 | Load halfword algebraic with update indexed. |
| lhax | 31 | 343 | Load halfword algebraic indexed. |
| lhbrx | 31 | 790 | Load halfword byte-reversed indexed. |
| lhz | 40 | - | Load halfword and zero. |
| lhzu | 41 | - | Load halfword and zero with update. |
| lhzux | 31 | 311 | Load halfword and zero with update indexed. |
| lhzx | 31 | 279 | Load halfword and zero indexed. |
| lmw | 46 | - | Load multiple word. |
| lswi | 31 | 597 | Load string word immediate. |
| lswx | 31 | 533 | Load string word indexed. |
| lwarx | 31 | 20 | Load word and reserve indexed. |
| lwbrx | 31 | 534 | Load word byte-reversed indexed. |
| lwz | 32 | - | Load word and zero. |
| lwzu | 33 | - | Load word and zero with update. |
| lwzux | 31 | 55 | Load word and zero with update indexed. |
| lwzx | 31 | 23 | Load word and zero indexed. |
| macchw[o][.] | 4 | 172 | Multiply accumulate cross halfword to word modulo signed. |
| macchws[o][.] | 4 | 236 | Multiply accumulate cross halfword to word saturate signed. |
| macchwsu[o][.] | 4 | 204 | Multiply accumulate cross halfword to word saturate unsigned. |
| macchwu[o][.] | 4 | 140 | Multiply accumulate cross halfword to word modulo unsigned. |
| machhw[o][.] | 4 | 44 | Multiply accumulate high halfword to word modulo signed. |
| machhws[o][.] | 4 | 108 | Multiply accumulate high halfword to word saturate signed. |
| machhwsu[o][.] | 4 | 76 | Multiply accumulate high halfword to word saturate unsigned. |
| machhwu[o][.] | 4 | 12 | Multiply accumulate high halfword to word modulo unsigned. |
| maclhw[o][.] | 4 | 428 | Multiply accumulate low halfword to word modulo signed. |
| maclhws[o][.] | 4 | 492 | Multiply accumulate low halfword to word saturate signed. |
| maclhwsu[o][.] | 4 | 460 | Multiply accumulate low halfword to word saturate unsigned. |
| maclhwu[o][.] | 4 | 396 | Multiply accumulate low halfword to word modulo unsigned. |
| mbar | 31 | 854 | Memory barrier. |
| mcrf | 19 | 0 | Move condition register field. |
| mcrxr | 31 | 512 | Move to Condition Register from Integer Exception Register (XER). |
| mfcr | 31 | 19 | Move from Condition Register. |
| mfdcr | 31 | 323 | Move from Device Control Register. |
| mfdcrux | 31 | 291 | Move from Device Control Register user-mode indexed. |
| mfdcrx | 31 | 259 | Move from Device Control Register indexed. |
| mfmsr | 31 | 83 | Move from Machine State Register. |
| mfocrf | 31 | 19 | Move from one Condition Register field. |
| mfspr | 31 | 339 | Move from Special Purpose Register. |
| mftb | 31 | 371 | Move from Time Base. A special instruction, provided in lieu of mftbu and mftbl. |
| msync | 31 | 598 | Synchronize. |
| mtcrf | 31 | 144 | Move to Condition Register fields. |
| mtdcr | 31 | 451 | Move to Device Control Register. |
| mtdcrux | 31 | 419 | Move to Device Control Register user-mode indexed. |
| mtdcrx | 31 | 387 | Move to Device Control Register indexed. |
| mtmsr | 31 | 146 | Move to Machine State Register. |
| mtocrf | 31 | 144 | Move to one Condition Register field. |
| mtspr | 31 | 467 | Move to Special Purpose Register. |
| mulchw[.] | 4 | 168 | Multiply cross halfword to word signed. |
| mulchwu[.] | 4 | 136 | Multiply cross halfword to word unsigned. |
| mulhhw[.] | 4 | 40 | Multiply high halfword to word signed. |
| mulhhwu[.] | 4 | 8 | Multiply high halfword to word unsigned. |
| mulhw[.] | 31 | 75 | Multiply high word. |
| mulhwu[.] | 31 | 11 | Multiply high word unsigned. |
| mullhw[.] | 4 | 424 | Multiply low halfword to word signed. |
| mullhwu[.] | 4 | 392 | Multiply low halfword to word unsigned. |
| mulli | 7 | - | Multiply low immediate. |
| mullw[o][.] | 31 | 235 | Multiply low word. |
| nand[.] | 31 | 476 | NAND. |
| neg[o][.] | 31 | 104 | Negate. |
| nmacchw[o][.] | 4 | 174 | Negative multiply accumulate cross halfword to word modulo signed. |
| nmacchws[o][.] | 4 | 238 | Negative multiply accumulate cross halfword to word saturate signed. |
| nmachhw[o][.] | 4 | 46 | Negative multiply accumulate high halfword to word modulo signed. |
| nmachhws[o][.] | 4 | 110 | Negative multiply accumulate high halfword to word saturate signed. |
| nmaclhw[o][.] | 4 | 430 | Negative multiply accumulate low halfword to word modulo signed. |
| nmaclhws[o][.] | 4 | 494 | Negative multiply accumulate low halfword to word saturate signed. |
| nor[.] | 31 | 124 | NOR. |
| or[.] | 31 | 444 | OR. |
| orc[.] | 31 | 412 | OR with complement. |
| ori | 24 | - | OR immediate. |
| oris | 25 | - | OR immediate shifted. |
| popcntb | 31 | 122 | Population count bytes. |
| prtyw | 31 | 154 | Parity word. |
| rfci | 19 | 51 | Return from critical interrupt. |
| rfi | 19 | 50 | Return from interrupt. |
| rfmci | 19 | 38 | Return from machine check interrupt. |
| rlwimi[.] | 20 | - | Rotate left word immediate then mask insert. |
| rlwinm[.] | 21 | - | Rotate left word immediate then AND with mask. |
| rlwnm[.] | 23 | - | Rotate left then AND with mask. |
| sc | 17 | - | System call. |
| slw[.] | 31 | 24 | Shift left word. |
| sraw[.] | 31 | 792 | Shift right algebraic word. |
| srawi[.] | 31 | 824 | Shift right algebraic word immediate. |
| srw[.] | 31 | 536 | Shift right word. |
| stb | 38 | - | Store byte. |
| stbu | 39 | - | Store byte with update. |
| stbux | 31 | 247 | Store byte with update indexed. |
| stbx | 31 | 215 | Store byte indexed. |
| sth | 44 | - | Store halfword. |
| sthbrx | 31 | 918 | Store halfword byte-reversed indexed. |
| sthu | 45 | - | Store halfword with update. |
| sthux | 31 | 439 | Store halfword with update indexed. |
| sthx | 31 | 407 | Store halfword indexed. |
| stmw | 47 | - | Store multiple word. |
| stswi | 31 | 725 | Store string word immediate. |
| stswx | 31 | 661 | Store string word indexed. |
| stw | 36 | - | Store word. |
| stwbrx | 31 | 662 | Store word byte-reversed indexed. |
| stwcx. | 31 | 150 | Store word conditional indexed. |
| stwu | 37 | - | Store word with update. |
| stwux | 31 | 183 | Store word with update indexed. |
| stwx | 31 | 151 | Store word indexed. |
| subf[o][.] | 31 | 40 | Subtract from. |
| subfc[o][.] | 31 | 8 | Subtract from carrying. |
| subfe[o][.] | 31 | 136 | Subtract from extended. |
| subfic | 8 | - | Subtract from immediate carrying. |
| subfme[o][.] | 31 | 232 | Subtract from minus one extended. |
| subfze[o][.] | 31 | 200 | Subtract from zero extended. |
| tlbivax | 31 | 786 | TLB invalidate virtual address indexed. |
| tlbre | 31 | 946 | TLB read entry. |
| tlbsx[.] | 31 | 914 | TLB search indexed. |
| tlbsync | 31 | 566 | TLB synchronize. |
| tlbwe | 31 | 978 | TLB write entry. |
| tw | 31 | 4 | Trap word. |
| twi | 3 | - | Trap word immediate. |
| wrtee | 31 | 131 | Write Machine State Register (MSR) external enable. |
| wrteei | 31 | 163 | Write MSR external enable immediate. |
| xor[.] | 31 | 316 | XOR. |
| xori | 26 | - | XOR immediate. |
| xoris | 27 | - | XOR immediate shifted. |

B.4 Floating-Point Instructions

Table B-3 lists all of the floating-point instructions that are implemented in the PowerPC 476FP processor.

Table B-3. Floating-Point Instructions

| Mnemonic | Opcode | Extended Opcode | Instruction |
|---|---|---|---|
| fabs[.] | 63 | 264 | Floating absolute value. |
| fadd[.] | 63 | 21 | Floating add. |
| fadds[.] | 59 | 21 | Floating add single. |
| fcfid[.] | 63 | 846 | Floating convert from integer doubleword. |
| fcmpo | 63 | 32 | Floating compare ordered. |
| fcmpu | 63 | 0 | Floating compare unordered. |
| fcpsgn[.] | 63 | 8 | Floating copy sign. |
| fctid[.] | 63 | 814 | Floating convert to integer doubleword. |
| fctidz[.] | 63 | 815 | Floating convert to integer doubleword with round toward zero. |
| fctiw[.] | 63 | 14 | Floating convert to integer word. |
| fctiwz[.] | 63 | 15 | Floating convert to integer word with round toward zero. |
| fdiv[.] | 63 | 18 | Floating divide. |
| fdivs[.] | 59 | 18 | Floating divide single. |
| fmadd[.] | 63 | 29 | Floating multiply-add. |
| fmadds[.] | 59 | 29 | Floating multiply-add single. |
| fmr[.] | 63 | 72 | Floating move register. |
| fmsub[.] | 63 | 28 | Floating multiply-subtract. |
| fmsubs[.] | 59 | 28 | Floating multiply-subtract single. |
| fmul[.] | 63 | 25 | Floating multiply. |
| fmuls[.] | 59 | 25 | Floating multiply single. |
| fnabs[.] | 63 | 136 | Floating negative absolute value. |
| fneg[.] | 63 | 40 | Floating negate. |
| fnmadd[.] | 63 | 31 | Floating negative multiply-add. |
| fnmadds[.] | 59 | 31 | Floating negative multiply-add single. |
| fnmsub[.] | 63 | 30 | Floating negative multiply-subtract. |
| fnmsubs[.] | 59 | 30 | Floating negative multiply-subtract single. |
| fre[.] | 63 | 24 | Floating reciprocal estimate. |
| fres[.] | 59 | 24 | Floating reciprocal estimate single. |
| frim[.] | 63 | 488 | Floating round to integer minus. |
| frin[.] | 63 | 392 | Floating round to integer nearest. |
| frip[.] | 63 | 456 | Floating round to integer plus. |
| friz[.] | 63 | 424 | Floating round to integer toward zero. |
| frsp[.] | 63 | 12 | Floating round to single precision. |
| frsqrte[.] | 63 | 26 | Floating reciprocal square root estimate. |
| frsqrtes[.] | 59 | 26 | Floating reciprocal square root estimate single. |
| fsel[.] | 63 | 23 | Floating select. |
| fsqrt[.] | 63 | 22 | Floating square root. |
| fsqrts[.] | 59 | 22 | Floating square root single. |
| fsub[.] | 63 | 20 | Floating subtract. |
| fsubs[.] | 59 | 20 | Floating subtract single. |
| lfd | 50 | - | Load floating-point double. |
| lfdu | 51 | - | Load floating-point double with update. |
| lfdux | 31 | 631 | Load floating-point double with update indexed. |
| lfdx | 31 | 599 | Load floating-point double indexed. |
| lfiwax | 31 | 855 | Load floating-point as integer word algebraic indexed. |
| lfs | 48 | - | Load floating-point single. |
| lfsu | 49 | - | Load floating-point single with update. |
| lfsux | 31 | 567 | Load floating-point single with update indexed. |
| lfsx | 31 | 535 | Load floating-point single indexed. |
| mcrfs | 63 | 64 | Move to Condition Register from FPSCR. |
| mffs[.] | 63 | 583 | Move from FPSCR. |
| mtfsb0[.] | 63 | 70 | Move to FPSCR bit 0. |
| mtfsb1[.] | 63 | 38 | Move to FPSCR bit 1. |
| mtfsf[.] | 63 | 711 | Move to FPSCR fields. |
| mtfsfi[.] | 63 | 134 | Move to FPSCR field immediate. |
| stfd | 54 | - | Store floating-point double. |
| stfdu | 55 | - | Store floating-point double with update. |
| stfdux | 31 | 759 | Store floating-point double with update indexed. |
| stfdx | 31 | 727 | Store floating-point double indexed. |
| stfiwx | 31 | 983 | Store floating-point as integer word indexed. |
| stfs | 52 | - | Store floating-point single. |
| stfsu | 53 | - | Store floating-point single with update. |
| stfsux | 31 | 695 | Store floating-point single with update indexed. |
| stfsx | 31 | 663 | Store floating-point single indexed. |

Appendix C. Instruction Execution Performance for Code Optimization

This appendix describes how the PowerPC 476FP processor core fetches, dispatches, issues, and executes instructions. The instruction timing information it provides is intended to help compiler developers and application programmers optimize their code. Although this appendix does not identify every microarchitectural characteristic that can affect instruction execution time in the PowerPC 476FP core, it provides a high-level overview of basic instruction operation and pipeline performance that is sufficient to analyze the performance of code sequences with a high degree of accuracy.

The PowerPC 476FP core has the following overall design characteristics:
• Instruction predecode is performed outside the nine-stage pipeline, before instructions enter the instruction cache (I-cache).
• Two-cycle access for the I-cache and the data cache (D-cache).
• Fetch of four instructions (a fetch group) at a time.
• Issue of up to four instructions simultaneously.
• Out-of-order instruction issue, execution, and completion, but in-order instruction commit.
• Six-pipeline structure.
• Superpipelined floating-point (FP) execution.
• Hardware symmetric multiprocessor (SMP) support.

C.1 PowerPC 476FP Pipeline Overview

The PowerPC 476FP is a superscalar processor core capable of issuing four instructions (three integer instructions and one FP instruction) per cycle. The three integer instructions are issued to the branch execution pipeline, the simple integer execution pipeline, and the complex integer pipeline. The FP instruction is issued to the FP execution pipeline.

C.1.1 PowerPC 476FP Integer Pipelines

The PowerPC 476FP integer unit has a nine-stage pipeline structure. Figure C-1 on page 280 illustrates the integer pipeline structure.

[Figure C-1. PowerPC 476FP Integer Pipeline Structure. Stages 1-3: fetch and instruction capture (ICRD, IST; the I-cache holds instructions and predecoded instruction controls). Stage 4: decode and issue (ISD[0:3], DISS[0:7]); four instructions per cycle (three integer, one FP), with FP instructions sent to the FP instruction queue and three instructions per cycle sent to the integer pipelines. Stage 5: register access and queues. Stages 6-9: execution stages in the branch pipeline (B-pipe), the simple integer pipeline (L-pipe and J-pipe), and the complex integer pipeline (I-pipe and M-pipe).]

C.1.1.1 ICRD, IST, and ISD Pipeline Stages

In Figure C-1, the first three pipeline stages provide the fetch (instruction cache read [ICRD] and instruction steering [IST]) and instruction decode (ISD). These three stages are common to the integer pipelines and the floating-point pipeline. The fetch and ISD stages access four instructions, or half an I-cache line (a fetch group), at a time.
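The fetch-group size also fixes the I-cache line size implied by this description: four 4-byte instructions are 16 bytes, and a fetch group is half a line, so a line is 32 bytes. The C sketch below works through that arithmetic and one consequence for code layout, under the assumption (not stated in this section) that fetch groups are naturally aligned 16-byte halves of a line: a branch to the last word of a fetch group would deliver only one useful instruction in that fetch. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    INSN_BYTES        = 4,                               /* fixed 32-bit instructions */
    FETCH_GROUP_INSNS = 4,                               /* four instructions per fetch */
    FETCH_GROUP_BYTES = INSN_BYTES * FETCH_GROUP_INSNS,  /* 16 bytes */
    ICACHE_LINE_BYTES = 2 * FETCH_GROUP_BYTES            /* fetch group = half a line */
};

/* Start address of the fetch group containing 'pc'.
   Assumption: fetch groups are naturally aligned 16-byte half-lines. */
uint32_t fetch_group_base(uint32_t pc)
{
    assert(pc % INSN_BYTES == 0);
    return pc & ~(uint32_t)(FETCH_GROUP_BYTES - 1);
}

/* Instructions at or after 'target' within its fetch group, that is, the
   useful instructions in the first fetch after a branch to 'target'. */
unsigned insns_in_first_fetch(uint32_t target)
{
    uint32_t offset = target - fetch_group_base(target);
    return FETCH_GROUP_INSNS - (unsigned)(offset / INSN_BYTES);
}
```

Under this assumption, aligning hot branch targets to 16-byte boundaries lets the first fetch after the branch deliver a full group of four instructions.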
The instructions of the fetch group are transmitted simultaneously to both the integer dispatch/issue queue (the decode and issue [DISS] queue) and the FP dispatch/issue queue (the instruction queue [INSTQ]). See Figure C-2 on page 283 for more information about the INSTQ.

The I-cache contains both instructions and their associated predecoded instruction controls and designators. These controls are also transmitted to the DISS and the FP INSTQ to simplify decoding in those queues.

C.1.1.2 DISS Stage

The DISS has eight entries and can hold up to eight instructions. It can issue up to three instructions per cycle. In the DISS stage, the three positions DISS[0:2] are eligible for issue to the register access (RACC) stage. Typically, DISS[2] is the oldest instruction of the three. However, if the RACC stage has available resources, DISS[2] can be issued to the RACC stage out of order.

The DISS stage and the predecoded designators determine which pipeline executes each instruction. The details of this algorithm are proprietary to the PowerPC 476FP core and are not exposed to the user.

C.1.1.3 RACC Stage

The next pipeline stage is the register access (RACC) stage. It consists of branch execute 0 (BE0) in the branch pipe, load/store register access (LRACC) and J-pipe (simple integer) register access (JRACC), and integer register access (IRACC). The RACC stage provides register access and dispatch of up to three instructions per cycle. The IRACC stage is dedicated to the complex integer pipeline (I-pipe). The LRACC/JRACC stage is shared between the simple integer (J-pipe) and load/store (L-pipe) pipelines. Within the RACC stage are queuing stages: the branch pipe queue (BPQ), the J-pipe queue (JPQ), and the I-pipe queue (IPQ).
These queues relax the timing constraints that the lower execution stages place on the RACC stages. The remaining pipeline stages execute instructions.

The BE0, LRACC, and IRACC pipeline stages are generally where instructions are held until their source operands become ready. This applies to all source operands, including Condition Register resources. However, the data operands for store instructions are held in the address generation (AGEN) stage, and the accumulate operands for integer multiply-accumulate instructions are held in the IEXE1 stage. When its source operands become ready, an instruction is dispatched to the first execution stage of the corresponding pipeline. Dispatch refers specifically to moving from one of the RACC stages to the first execution stage of a pipeline: BE1, J-pipe execute 1 (JEXE1), AGEN, or IEXE1.

Instructions in the RACC stages can be dispatched out of order with respect to each other if the pipeline required by the newer instruction becomes available before the pipeline required by the older instruction. Because the DISS stage can also issue out of order, there is no guaranteed relative order between two instructions in the RACC stage. However, the instructions within a given execution pipeline (the branch pipe, L-pipe, J-pipe, I-pipe, and the multiply and divide pipe [M-pipe]) are always kept in order with respect to each other, which simplifies resource ordering.

C.1.1.4 Execution Pipeline Stages

The final four stages are the execution pipeline stages. At these stages, the I-pipe is divided into the I-pipe and the M-pipe, making five pipelines. The last four stages are unique to each of the five pipelines.
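The dispatch-ordering rules in C.1.1.3 (in order within each execution pipeline, but no guaranteed order between pipelines) can be illustrated with a small timing model. The C sketch below is a simplification under stated assumptions: each pipeline accepts at most one dispatch per cycle, an instruction dispatches in the first cycle in which its operands are ready and its pipeline is free, and the shared LRACC/JRACC stage, the M-pipe split, and the DISS issue limits are ignored. The names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pipeline identifiers (the M-pipe is folded into the I-pipe). */
enum pipe { PIPE_B, PIPE_L, PIPE_J, PIPE_I, PIPE_COUNT };

/* For n instructions in program order, compute the cycle in which each one
   is dispatched from its RACC stage. ready[i] is the first cycle in which
   instruction i's source operands are available. */
void dispatch_cycles(const enum pipe *pipe_of, const unsigned *ready,
                     unsigned *dispatch, size_t n)
{
    unsigned next_free[PIPE_COUNT] = {0};  /* earliest cycle each pipe can accept */

    for (size_t i = 0; i < n; i++) {
        enum pipe p = pipe_of[i];
        assert(p < PIPE_COUNT);
        /* Wait for operands and for older instructions in the same pipe. */
        unsigned t = ready[i] > next_free[p] ? ready[i] : next_free[p];
        dispatch[i] = t;
        next_free[p] = t + 1;  /* one dispatch per pipe per cycle, in order */
    }
}
```

For an I-pipe instruction that waits until cycle 5, followed by a J-pipe instruction ready in cycle 1 and another I-pipe instruction ready in cycle 2, the model dispatches them in cycles 5, 1, and 6: the J-pipe instruction passes the older I-pipe instruction, but the second I-pipe instruction cannot.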
B-Pipe Execution Stages

In the B-pipe, the BE1 stage tests the CR bits, determines whether the branch instruction was predicted correctly, and reports whether a branch correction is required. The remaining stages track the branch instruction until it is allowed to complete.

L-Pipe Execution Stages

In the L-pipe, addresses are generated in the AGEN stage. Store instructions are also held in the AGEN stage until their store data operands become ready. The D-cache read (CRD) and data steering (DST) stages access the D-cache to determine whether the target location is present in the D-cache and to obtain load data. The load write-back (LWB) stage is where load hit data is written back to the General Purpose Register (GPR) file and where store hit data is written back to the D-cache.

J-Pipe Execution Stages

In the J-pipe, the JEXE1 stage is the first cycle of instruction execution. For most operations, the result can be forwarded from the end of this stage to subsequent instructions that require it as a source operand. The J-pipe pre-GPR (JPGPR) data holding stage holds up to four results, similar to the I-pipe. This simplifies the pipeline and improves its execution bandwidth when subsequent resource hold conditions exist. The JPGPR contents can be written back to the GPR file when the instructions are committed (allowed to complete).

I-Pipe Execution Stages

In the I-pipe, the IEXE1 stage is the first cycle of instruction execution. For most operations, the result can be forwarded from the end of this stage to subsequent instructions that require it as a source operand. The I-pipe pre-GPR (IPGPR) data holding stage holds up to four results.
This simplifies the pipeline and improves its execution bandwidth when subsequent resource hold conditions exist. The IEXE2, IEXE3, and integer write-back (IWB) stages track the conditions of dot-form (.) instructions and other miscellaneous instructions. The contents of both IPGPR and IWB can be written back to the GPR file when instructions are committed (allowed to complete).

M-Pipe Execution Stages

MEXE1 is also where integer multiply-accumulate instructions are held until the accumulate source operand becomes ready. Some operations (such as multiply and divide instructions) must continue to execute in MEXE2, MEXE3, and IWB to fully calculate their results. Divide instructions remain in IWB for a variable number of cycles while they iteratively calculate their result, at which point they write the result back to the GPR file. The divider is based on a radix-2 SRT division algorithm. The first two stages, MEXE1 and MEXE2, compute the leading zeros of the dividend to reduce the number of iterations. The MEXE3 stage performs the division computation.

C.1.2 PowerPC 476FP Floating-Point Pipelines

Floating-point instructions are issued from the ICRD, IST, and ISD pipeline stages of the integer pipeline (see Section C.1.1 PowerPC 476FP Integer Pipelines on page 279) to the instruction queue (INSTQ) in the floating-point execution unit. Figure C-2 on page 283 illustrates the floating-point pipelines.

[Figure C-2. PowerPC 476FP Floating-Point Pipeline Structure. Subunits shown: IDI (INSTQ entries 0-7, FA and FL decode logic), RPM (FPR logic and macro), LSC (load data queue; load data from the DCU, store data to the DCU), and ASE. FA pipe stages: FARACC, FARACCQ, FAEXE1-FAEXE6, FAWB, FAWFPR. FL pipe stages: FLRACC, FLRACCQ, FLEXE1, FLST, FLWB, FLWFPR.]

The PowerPC 476FP FP pipeline structure consists of two pipelines: one for FP execution and the other for FP loads and stores. The INSTQ is similar to the DISS of the integer unit and has eight entries. However, only INSTQ[0:1] are eligible to issue instructions. The oldest instruction is always in position 0, the second oldest in position 1, and so on. The floating-point unit (FPU) has resources for only two instructions to be completed; thus, only INSTQ[0] and INSTQ[1] can issue an instruction in a given cycle. The FPU INSTQ does not need to be synchronized with the PowerPC 476FP CPU queue because the FPU executes only FP instructions, and the I-cache ISD sends only FP instructions with designators.

The FA-pipe RACC (FARACC) stage is the RACC stage of the FP execution pipe that accesses the Floating-Point Register (FPR) file; it is similar to the IRACC of the integer unit. The FARACCQ is functionally similar to the BPQ, JPQ, and IPQ stages of the integer unit in that it holds RACC-stage instructions in case of resource conflicts in later stages.

The next seven stages execute FP instructions. They are superpipelined for higher frequency, with extended division stages and extended operation stages for denormal operands. The FA pipe execution stages (FAEXE[1:6]) and the FLRACC stage function as follows:
• FAEXE1 and FAEXE2 perform both recoding and operand alignment.
• FAEXE3 and FAEXE4 perform alignment and addition.
• FAEXE5 performs normalization.
• FAEXE6 performs rounding, and FAWB performs the CR and FPSCR updates.
• The FLRACC stage is the RACC stage for the FP load/store pipe. FLRACCQ is similar to FARACCQ.
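The load queue (LDQ) that FLEXE1 allocates for FP loads is described in the next paragraph as an eight-entry first-in first-out queue that decouples the CPU, which delivers FP load data, from the FPU, which consumes it. The C sketch below shows that behavior as a ring buffer. It is a hypothetical software model: it ignores the asynchronous CPU/FPU interface and the hardware's exact allocation policy, assumes one doubleword of load data per entry, and simply reports a full or empty queue to the caller, which would have to stall.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LDQ_ENTRIES 8u   /* the LDQ has eight entries */

/* Hypothetical software model of the FP load queue (LDQ). */
struct ldq {
    uint64_t data[LDQ_ENTRIES]; /* one doubleword of load data per entry */
    unsigned head;              /* index of the oldest valid entry */
    unsigned count;             /* number of valid entries */
};

/* CPU side: deliver load data. Returns false when the queue is full. */
bool ldq_push(struct ldq *q, uint64_t value)
{
    assert(q->count <= LDQ_ENTRIES);
    if (q->count == LDQ_ENTRIES)
        return false;
    q->data[(q->head + q->count) % LDQ_ENTRIES] = value;
    q->count++;
    return true;
}

/* FPU side: consume the oldest load data. Returns false when empty. */
bool ldq_pop(struct ldq *q, uint64_t *value)
{
    if (q->count == 0)
        return false;
    *value = q->data[q->head];
    q->head = (q->head + 1) % LDQ_ENTRIES;
    q->count--;
    return true;
}
```

Because entries leave in arrival order, the FPU sees load data in the order the CPU delivered it, regardless of how far ahead the CPU runs (up to eight entries).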
FP load data is fetched by the CPU L-pipe, and FP store data is written to memory by the CPU L-pipe; thus, the FPU has only queues for the load/store pipe. The FLEXE1 stage allocates an entry in the load queue (LDQ) for an FP load instruction. Because the CPU fetches the load operand, the LDQ has eight entries to accommodate load data from the CPU. It is a first-in first-out (FIFO) queue and operates asynchronously between the CPU and the FPU. FP store data is transmitted to the CPU from FLEXE1 if the CPU L-pipe is ready to accept the data. Otherwise, the FP L-pipe store stage holds the data for an extra cycle.

C.2 Instruction Execution Latency and Penalty

The term latency refers to the number of execution cycles required for a given instruction to produce its result, typically the value to be written to the target GPR specified by the instruction. Most integer instructions (such as the standard arithmetic and logical instructions) have a one-cycle latency: their results are ready at the end of the first execution stage of the pipeline, so they can be forwarded (delivered) to any subsequent instruction that requires the result as a source operand. One significant exception is the load instruction category. These instructions have a four-cycle latency (assuming that the target memory location is found in the D-cache); their results become available at the end of the fourth execution stage of the pipeline.

The term penalty refers to the number of processor cycles for which a given instruction cannot proceed down the processor pipeline because of a dependency between itself and an immediately preceding instruction.
In other words, if a source operand of a given instruction is the same as the target operand of the preceding instruction, the given instruction might have to hold in the operand access pipeline stage for some number of cycles, waiting for its source operand to become ready. The length of the wait depends on the latency of the preceding instruction. For example, assume that a source operand of a given instruction is the same as the target operand of an immediately preceding load instruction, which has four-cycle latency. The given instruction then incurs a three-cycle penalty: it waits at the operand access stage for three extra cycles until the load instruction reaches the fourth execution stage and forwards its result. In contrast, if the earlier instruction has one-cycle latency, the dependent instruction incurs a zero-cycle penalty (no penalty), because it can proceed down the pipeline immediately behind the earlier instruction, which forwards its result from the first execution stage of the pipeline.
In the PowerPC 476FP core, the integer execution unit has a special data forwarding mechanism that minimizes the penalty between a dependent instruction pair N and N + 1 (ideally to no penalty). Because the PowerPC 476FP core has a four-issue microarchitecture, certain instruction sequences can execute at a rate of more than one instruction per cycle, up to a maximum of five instructions per cycle, depending on instruction latencies. Thus, the penalty associated with such an instruction stream can be viewed as being less than zero, relative to a single-issue microarchitecture. Figure C-3 on page 286 through Figure C-5 on page 289 illustrate these instruction sequences.
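For an instruction that immediately follows its producer, the penalty definition above reduces to simple arithmetic: the consumer waits the producer's latency minus the one cycle it would spend anyway. The sketch below encodes that, plus a hypothetical `gap` parameter that counts independent instructions scheduled in between. The `gap` handling assumes a simplified single-issue view in which each intervening instruction fills exactly one cycle; on the four-issue 476FP several independent instructions can share a cycle, so the real savings can be smaller.

```python
def dependency_penalty(producer_latency: int, gap: int = 0) -> int:
    """Extra cycles a dependent instruction holds at operand access.

    producer_latency: cycles until the producer's result can be forwarded
        (1 for most integer instructions, 4 for a D-cache-hit load).
    gap: independent instructions placed between producer and consumer,
        assuming (simplification) that each one fills exactly one cycle.
    """
    if producer_latency < 1 or gap < 0:
        raise ValueError("latency must be >= 1 and gap must be >= 0")
    # An immediately following consumer waits latency - 1 cycles;
    # each intervening independent instruction hides one of them.
    return max(0, producer_latency - 1 - gap)


# Load (4-cycle latency) followed immediately by a dependent add.
print(dependency_penalty(4))  # 3
# One-cycle integer producer: no penalty.
print(dependency_penalty(1))  # 0
```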
Example One: Instruction Sequence Without a Dependency
An example of an instruction sequence without a dependency is shown in Figure C-3 on page 286. In this example, add, sub, a conditional branch (predicted correctly), and fadd are issued simultaneously, and there are no dependency penalties.
Figure C-3. Instruction Sequence Without a Dependency
[Timing diagram, clocks 1 to 9: add1 in the J-pipe (JRACC, JEXE1, JPGPR, write to GPR); sub1 in the I-pipe (IRACC, IEXE1, IPGPR, write to GPR); bc1 in the B-pipe (BE1 to BE4); fadd1 in the FP-pipe (RACC, FEXE1 to FEXE6, FWB, write to FPR).]
Example Two: Instruction Sequence with a Dependency
The same instruction sequence is used; however, a sub1 operand now depends on the add1 result. All instructions are issued at the same time, but sub1 is held at the EXE1 stage until add1 executes and its result is put into I-pipe pre-GPR Register 0 (IPGPR0). Because multiple instructions are dispatched at the same time, the relative penalty on sub1 is zero cycles. In the PowerPC 476FP integer execution unit, if the I-pipe is busy with other operations, the sub1 instruction can be held at DISS instead of being issued to the IRACC; the relative penalty on sub1 is still zero cycles. This is one of the advantages of the PowerPC 476FP core. Figure C-4 on page 288 illustrates an instruction sequence with a dependency.
Figure C-4. Instruction Sequence with a Dependency
[Timing diagram, clocks 1 to 9: same as Figure C-3, except that sub1 is held in IEXE1 before it proceeds to IPGPR and the GPR write.]
Example Three: Load Instruction Followed by an add with a Dependency on the Load
To simplify this example, only the CPU integer unit is shown in Figure C-5 on page 289. The FPU operates independently and in parallel with the CPU, but asynchronously.
Because the J-pipe and the L-pipe share the JRACC/LRACC stage, the J-pipe is not used in this example. Thus, the add is issued to the I-pipe, the branch to the B-pipe, and the load to the L-pipe, all simultaneously. In this example, one of the operands of the add instruction depends on the load, so the add must wait until the operand is fetched from the D-cache and returned in the LWB stage. The operand is then forwarded to the IEXE1 stage of the I-pipe, as shown in Figure C-5.
Figure C-5. Load Instruction Followed by an add with a Dependency on the Load
[Timing diagram, clocks 1 to 9: lwz1 in the L-pipe (LRACC, AGEN, CRD, DST, LWB); add1 in the I-pipe (IRACC, then held in IEXE1 until the load data is forwarded, then IPGPR and write to GPR); bc1 in the B-pipe (BE1 to BE4).]
C.3 Instruction Fetch and Decode
An I-cache access consists of the ICRD, IST, and ISD stages (see Figure C-1 on page 280). In addition, just before the ICRD stage, a pseudo stage arbitrates among instruction fetch addresses such as interrupt vectors, branch correction addresses, branch target addresses, and subsequent sequential fetch group addresses. This section describes all of these stages and the instruction predecoding stage that resides before the I-cache.
C.3.1 Instruction Fetch Address Arbitration and Fetch Process
The fetch unit contains the following functions:
• The instruction unit (IU) interface, which provides fetch vectors for reset, refetch, interrupts, and branch mispredicts.
• The line-fill interface, which provides addresses for reload dumps (line fills).
• Snoop addresses for remote I-cache entry invalidate (icbi) operations.
• Cache operation control address and control (icbt, ici, and so on).
• Branch-predict-taken target addresses.
• Next instruction fetch addresses.
The fetch unit manages instruction fetch addressing before the ICRD stage and sets the priority of instruction fetching. The following list shows the PowerPC 476FP processor fetcher priority:
1. Snoop addressing from the L2 cache. This is a remote icbi (see Power ISA Version 2.05 for the icbi function). Four-entry snoop queues are provided to meet I-cache and L2 snoop interface timing.
2. Reload dump addressing, which transfers the instructions in a line-fill buffer (eight instructions) to the I-cache after an I-cache miss. Two line-fill buffers are provided.
3. I-cache operation addressing for icbi (local), icbt, ici, and icread.
4. Instruction fetch addressing from the IU, such as reset vector addresses, interrupt vector addresses, and branch correction addresses, and instruction refetch addresses, such as for context switches and synchronization refetches.
5. Branch-predict-taken target fetch addressing.
6. Instruction translation lookaside buffer (ITLB) miss refetch addressing.
7. Snoop-induced and reload-dump-induced refetch addressing.
8. Next instruction fetch group addressing.
In the PowerPC 476FP core, the instruction fetch address is normally generated in increments of 16 bytes, because four instructions (16 bytes) are fetched and submitted every clock cycle.
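The priority list above behaves like a fixed-priority arbiter. The sketch below is a hypothetical model, not the hardware design: each source gets its number from the list, the lowest-numbered pending request wins, and the next sequential group address advances in the 16-byte steps just described. The enum and function names are made up for illustration.

```python
from __future__ import annotations

from enum import IntEnum


class FetchSource(IntEnum):
    """Fetch-address sources numbered as in the priority list
    (a smaller value wins arbitration)."""
    L2_SNOOP = 1              # remote icbi from the L2 cache
    RELOAD_DUMP = 2           # line-fill buffer -> I-cache
    ICACHE_OP = 3             # local icbi, icbt, ici, icread
    IU_FETCH = 4              # reset/interrupt vectors, branch correction, refetch
    BRANCH_TARGET = 5         # branch-predict-taken target
    ITLB_MISS_REFETCH = 6
    SNOOP_RELOAD_REFETCH = 7
    NEXT_GROUP = 8            # next sequential fetch group


FETCH_GROUP_BYTES = 16  # four 4-byte instructions per fetch group


def arbitrate(pending: dict[FetchSource, int]) -> tuple[FetchSource, int]:
    """Return (source, address) of the highest-priority pending request."""
    if not pending:
        raise ValueError("no pending fetch requests")
    source = min(pending)  # IntEnum members compare by their priority number
    return source, pending[source]


def next_group_address(addr: int) -> int:
    """Address of the next sequential 16-byte fetch group."""
    return (addr & ~(FETCH_GROUP_BYTES - 1)) + FETCH_GROUP_BYTES


pending = {
    FetchSource.NEXT_GROUP: next_group_address(0x1000),
    FetchSource.BRANCH_TARGET: 0x2008,
}
source, address = arbitrate(pending)
print(source.name, hex(address))  # BRANCH_TARGET 0x2008
```

A predicted-taken target outranks the sequential group, so the fetcher redirects; any snoop or reload dump request would in turn take precedence over both.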
Because branch targets are not always on 16-byte boundaries, the arbiter has to adjust the instruction fetch address to a 16-byte boundary. The instruction fetcher output is directly connected to the I-cache tag array, the instruction data array, the branch history table (BHT), the ITLB, the branch target address cache (BTAC), and the least recently used (LRU) array, which together fetch a group of instructions. This is the ICRD stage. See also Appendix C.4 Branch Prediction and Branch Instruction Processing on page 292 for branch prediction addressing.
The next stage, IST, is mainly occupied by instruction group steering: one of the I-cache hit ways is selected, and the instructions are left-aligned (in the case of branch target fetches). Up to four instructions (an instruction fetch group) are fetched simultaneously, and each instruction is accompanied by its predecoded designators. This stage also steers instructions from the line-fill buffers if the fetch group misses in the I-cache.
The ISD stage transfers instructions to DISS and to the FP INSTQ. This stage is mainly concerned with transfer distance and loading, because both DISS and the FP INSTQ are far away, especially the FPU. This stage is considered a submit stage; therefore, an instruction fetch group is sometimes called a submit group.
C.3.2 Instruction Predecode, Instruction Field Adjust, and Endian Adjust
In the PowerPC 476FP core, little-endian byte swapping, instruction field swapping based on instruction type, and instruction predecoding are performed at the L2 interface, before instructions are stored into the I-cache, to reduce execution-stage logic and ease timing. For little-endian operation, refer to Power ISA Book III-E (embedded). The instructions stored in the I-cache are all big-endian.
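As a rough illustration of the endian adjustment, the sketch below assumes that converting a little-endian instruction stream into the big-endian form kept in the I-cache amounts to reversing the bytes of each 4-byte instruction word. The function name is hypothetical, and the field swapping and predecoding that the ICC also performs at the L2 interface are not modeled.

```python
def le_words_to_be(image: bytes) -> bytes:
    """Byte-reverse every 4-byte instruction word (little-endian -> big-endian)."""
    if len(image) % 4:
        raise ValueError("instruction image length must be a multiple of 4")
    out = bytearray()
    for offset in range(0, len(image), 4):
        out += image[offset:offset + 4][::-1]  # reverse one word's bytes
    return bytes(out)


# nop (ori r0,r0,0 = 0x60000000) as stored in a little-endian page
le_nop = bytes([0x00, 0x00, 0x00, 0x60])
print(le_words_to_be(le_nop).hex())  # 60000000
```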
C.3.2.1 Instruction Field Adjust
To improve GPR accesses, the PowerPC 476FP core assigns GPR ports according to the hardware instruction operation rather than according to the fields of the software encoding. The GPR is designed as a 3-read/3-write array that supports three reads and three writes simultaneously. The instruction cache controller (ICC) preswaps these GPR address fields before the I-cache to reduce the logic levels needed at the DISS issue point. The RA, RB, RS, RT, and IMMED fields of the following instruction classes are swapped:
• Fixed-point compare instructions
• Fixed-point trap instructions
• Fixed-point logical instructions
• Fixed-point shift/rotate instructions
• Move-to-SPR class instructions
• Some TLB instructions
• DST class instructions
• Special class instructions
C.3.3 Instruction Predecode
Certain attributes of instructions are predecoded before the instructions are written into the I-cache and are stored along with the instructions. The ICC decodes this information from the instructions it receives from the L2 cache. These bits are held in the I-cache and are transmitted to DISS and the FP INSTQ with the corresponding instructions. Eight bits per instruction are stored in the I-cache. Table C-1 on page 292 describes the bit assignments.
Table C-1. Instruction Predecode Bit Definition
The selector isdExtData[0:1] (1) determines how isdExtData[2:7] are interpreted:
• 00: Special CPU (must issue from DISS[0]). Fields: Synchronize Type[0:1] (00: other, 01: str/multiple, 10: stwcx, 11: tlbivax); Sync; ApRdEn (2); BpRdEn (3); I-pipe; L-pipe; Store/Multiple (0: load, 1: store).
• 01: CPU Normal. isdExtData[2]: TarWrEn (4); isdExtData[3]: ApRdEn; isdExtData[4]: BpRdEn; isdExtData[5]: SpRdEn (5); isdExtData[6:7]: PipeCtrl[0:1] (00: can use I-pipe or J-pipe, 01: L-pipe only, 10: I-pipe only, 11: L-pipe only with RA update).
• 10: FPU. isdExtData[2]: FPU TarWrEn; isdExtData[3]: FPU ApRdEn (non-L-pipe) or CPU ApRdEn (L-pipe); isdExtData[4]: FPU BpRdEn (non-L-pipe) or CPU BpRdEn (L-pipe); isdExtData[5]: FPU CpRdEn (non-L-pipe) or FPU SpRdEn (L-pipe); isdExtData[6:7]: PipeCtrl[0:1] (00: FPU pipe only, 01: needs L-pipe, 10: needs I-pipe, 11: needs L-pipe with RA update).
• 11: Branch instructions. isdExtData[2]: Unconditional Branch; isdExtData[3]: Static Predict Taken; isdExtData[4]: Dynamic; isdExtData[5]: 'at'; isdExtData[6:7]: Branch Type[0:1] (00: other, 01: blr, 10: bcctr, 11: bdnz).
Notes: (1) ISD extended data. (2) A-port read enable. (3) B-port read enable. (4) Target GPR write enable. (5) S-port read enable.
C.4 Branch Prediction and Branch Instruction Processing
The PowerPC 476FP branch prediction mechanism includes a 4 K × 2-bit BHT, a 32-entry branch target address content-addressable memory (CAM) (BTAC), a link-stack, and a Global History Register (GHR). This section describes these mechanisms and branch instruction processing in detail. Table C-2 on page 293 summarizes the branch-predict operations of the branch instructions and their use of the BHT, BTAC, GHR, and link-stack.
Table C-2. Branch Prediction and BHT, GHR, and BTAC Use
Branch Instruction Type | Operation | BHT, GHR, BTAC, and Link-Stack Used
b or ba type | Branch always | BTAC
bc with BO = 1z1zz | Branch-predict-taken always | BTAC
bc with BO = 1z1at | Dynamic branch-predict-taken | BHT/GHR and BTAC (1)
bc with BO = 10000 (bdnz type) | Branch-predict-taken if CTR > 1 | BTAC
bc with other BO cases | Dynamic branch-predict-taken | BHT/GHR and BTAC
bclr with BO = 1z1zz | Static branch-predict-taken | Link-stack
bclr with other than BO = 1z1zz | Dynamic branch-predict-taken | BHT/GHR and link-stack
bcctr type | Dynamic branch-predict-taken | BHT/GHR and BTAC
1. When the CCR2[SPC51] bit is set, the 'at' bit is honored.
The timing diagrams in Figure C-6, Figure C-7 on page 294, and Figure C-8 on page 294 illustrate the advantages of BTAC/BHT and link-stack use.
Figure C-6. Typical Branch-Predict-Taken Timing Diagram (Branch Target Address Is Computed at ISD)
[Timing diagram, clocks 1 to 9: fetch group groupA, which contains branch bc, and target group tgtX flow through the ICRD, IST, ISD, and DISS stages; bc proceeds through BE0, BE1, BE3, and BE4.]
Figure C-7. BTAC and BHT Based Branch-Predict-Taken Timing Diagram (BTAC Hit and BTAC Contains the Branch Target Address)
[Timing diagram, clocks 1 to 9, with the same stages and signals as Figure C-6.]
Figure C-8. Link-Stack Based Branch-Predict-Taken Timing Diagram (Link-Stack Pops the Branch Target Address at Clock 3)
[Timing diagram, clocks 1 to 9, with the same stages and signals as Figure C-6.]
C.4.1 Branch History Table Operation
The BHT maintains the dynamic prediction of branches; it records the history of the branches actually executed. It is implemented as a direct-mapped, shared 512 × (8 × 2-bit) array and is indexed using a combination of the branch address and a hash with the 6-bit GHR.
The Gshare method of indexing is used to reduce branch aliasing by keeping a history of branch activity. The history consists of a stream of taken/not-taken bits based on the branches in the instruction stream. A predetermined length of history is XORed with the higher-order bits of the BHT address index. The address index consists of enough lower-order bits of the fetch address to fully index into the chosen BHT size, down to the individual instruction. That is, the lower 3 bits of the branch instruction word address select, within a BHT entry, the 2-bit predictor (the branch history) of the corresponding branch instruction.
Each prediction entry in the BHT contains a saturating 2-bit counter that indicates whether the branch was recently taken or not taken. This prediction is read from the BHT to speculatively determine whether to begin fetching instructions from the branch target address. The most significant bit of the corresponding 2-bit counter decides whether the branch is predicted taken: when that bit is 1, the branch is predicted taken, and when the counter value is less than 0b10, the branch is predicted not taken. During branch pipeline operation, when a branch outcome is determined, the 2-bit counter is incremented if the branch was taken or decremented if it was not taken. When the counter saturates at 0b00 or 0b11, it remains at the saturation value until a branch correction takes place, at which point it moves toward the opposite direction. This allows branches to absorb some of the aliasing that occurs when different branches access the same counter in the BHT. Thus, it takes two iterations of a branch to change the prediction of an already saturated counter.
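The indexing and counter rules above can be sketched as follows. This is an illustrative model, not the hardware logic. It assumes that the 12-bit word index is instruction EA bits 18:29 (IBM numbering, bit 0 = most significant), as labeled in Figure C-9, that the 6-bit history is XORed into the upper six index bits, and that the newest outcome occupies the least significant GHR bit. The GHR shift follows the one-shift-per-branch-group rule described in C.4.2. All names are hypothetical.

```python
from __future__ import annotations

GHR_BITS = 6
INDEX_BITS = 12                   # 4 K two-bit counters
GHR_RESET = (1 << GHR_BITS) - 1   # the GHR is set to all 1s on reset


def bht_index(ea: int, ghr: int) -> int:
    """Gshare index: word-address bits XORed with the global history."""
    # EA bits 18:29 of a 32-bit address are bits 13..2 counting from the LSB.
    word_index = (ea >> 2) & ((1 << INDEX_BITS) - 1)
    # XOR the history into the higher-order index bits [0:5].
    return word_index ^ ((ghr & GHR_RESET) << (INDEX_BITS - GHR_BITS))


def split_index(index: int) -> tuple[int, int]:
    """(BHT entry 0..511 from bits [0:8], counter 0..7 from bits [9:11])."""
    return index >> 3, index & 0x7


def predict_taken(counter: int) -> bool:
    """The counter MSB decides: 0b10 and 0b11 predict taken."""
    return bool(counter & 0b10)


def update_counter(counter: int, taken: bool) -> int:
    """Saturating 2-bit update: +1 when taken, -1 when not taken."""
    return min(counter + 1, 3) if taken else max(counter - 1, 0)


def shift_ghr(ghr: int, group_has_branch: bool, taken: bool) -> int:
    """One shift per fetch group that contains a branch; otherwise unchanged."""
    if not group_has_branch:
        return ghr
    return ((ghr << 1) | int(taken)) & GHR_RESET


# Two not-taken outcomes are needed to flip a saturated taken counter.
counter = 0b11
for _ in range(2):
    counter = update_counter(counter, taken=False)
    print(counter, predict_taken(counter))  # "2 True", then "1 False"
```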
Note that the 2-bit prediction counter value is recorded in the branch information queue (BIQ) entry, together with an indication of whether the value read from the BHT was actually used for the prediction. When the BHT is read while it is being written, the BHT output is assumed to be all 1s (that is, it is forced to predict taken). Because branch corrections, and therefore BHT updates, are expected to be infrequent, the impact of this is expected to be low. The counter is forced to 0b11, a saturated branch-predict-taken value, so that the BHT does not need to be written and its contents are preserved for the next read access. Static prediction is based on the BO field of the branch instruction.
C.4.2 Global History Register Operation
The Gshare branch prediction scheme uses the recent global branch outcome history and the branch instruction address to index into the BHT. The GHR captures the outcome history of branch activity (a series of taken/not-taken bits) for the branches in the instruction stream. The GHR uses 6 bits for Gshare branch prediction; they contain the results (taken/not taken) of the last six determined branches. The 6-bit history is XORed with the higher-order bits of the address index, which consists of the lower-order bits of the fetch address needed to fully index into the 4 K-entry BHT. Indexing the BHT with the XOR of the branch history reduces hot spots in the BHT and improves BHT usage. The calculation of the BHT index is shown in Figure C-9 on page 296. This BHT index value is also recorded in the BIQ entry so that it can be used to index the BHT write.
In an ideal design, the GHR requires a minimum of 17 bits: 6 bits for the most recent valid determined branches, as previously discussed, and 11 bits to track possible speculative (predicted) branch conditions in the eight DISS entries and the BE0, BPQ, and BE1 stages.
The speculative bits hold the possible prediction for each branch that is currently undetermined in the pipe. Typically, these 11 bits are not used, but they are necessary to cover the case in which an undetermined branch resides in every possible stage of the pipeline. In the PowerPC 476FP implementation, however, the GHR contains 6 bits of branch history for the last six fetch groups that contain branches (determined branches). If a branch is predicted taken, a 1 is shifted into the GHR; if a branch is predicted not taken, a 0 is shifted into the GHR. If a fetch group of four instructions does not contain any branch instructions, the GHR is not changed or shifted. The GHR additionally holds 4 bits of branch history to track possible speculative or predicted branch conditions (because the BIQ has a maximum of four entries) and 4 bits of branch-determined-taken history, which are shifted back (shifted left) for a branch correction. If the speculative bits are not eventually confirmed, a branch correction restores the GHR (shifts it back to the previously determined point) before the next fetch. The GHR therefore has a minimum of 14 bits in total. On reset, the GHR is set to all 1s.
Figure C-9. GHR Use for BHT Lookup
[Diagram: the 14-bit GHR (reset to all 1s; shifted once per fetch group that contains at least one branch) holds the six Gshare bits, which reflect the most recently determined branches, plus bits kept for branch correction. The Gshare bits are XORed with instruction EA bits 18 to 23 to form index bits [0:5], and EA bits 24 to 29 supply index bits [6:11]. Index bits [0:8] select one of the 512 BHT entries, bits [9:11] select the word within the entry, and bits [0:9] are used for the BIQ entry because a BIQ entry covers four words.]
C.4.3 Branch Target Address CAM (BTAC) Operation
The BTAC provides another way to access branch target addresses and improves the target instruction access latency by two cycles.
Because of this advantage, many branch predictions in the PowerPC 476FP core are performed using both the BTAC and BHT dynamic prediction (see Table C-2 on page 293). The BTAC and BHT are accessed when the I-cache is looked up in the ICRD stage. If the branch instruction address matches a valid BTAC entry and the BHT entry indicates branch-predict-taken, the target address in that entry is used to fetch the target instructions. Because this lookup is based strictly on the instruction EA, all valid BTAC entries are invalidated by the execution of any context synchronizing instruction or operation. The BTAC-entry-valid flags are also cleared at core reset and at POR to ensure the integrity of, and the correlation between, the BTAC entries and the branch instructions. The PowerPC 476FP core implements a 32-entry BTAC, and entries are replaced using a round-robin method.
C.4.4 Branch Link-Stack Operation
When code makes a significant number of procedure, subroutine, or function calls, bclr-type branch instructions are used for the returns. To handle these nested calls and returns efficiently and to improve instruction access latency, the PowerPC 476FP core implements a link-stack. The link-stack is a last-in first-out (LIFO) buffer that maintains the ordering of consecutive subroutine calls and returns. On a subroutine call (an instruction with the branch-and-link form), the address of the next instruction (CIA + 4) is pushed onto the stack. On a subroutine return (an instruction with the branch-to-link form), the entry at the top of the stack, which is expected to contain the address of the instruction following the original subroutine call, is popped from the stack and used as the branch target fetch address.
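The call/return behavior described above can be modeled with a small bounded stack. This is a minimal sketch, not the hardware design. The depth of four, the fallback to a copy of LR when the stack is empty, and the bclrl pop-then-push order follow the 476FP description and Table C-3 in this section. What happens when a call is pushed onto a full stack is not stated, so the sketch assumes the oldest entry is discarded. The class and method names are hypothetical.

```python
from collections import deque

# Call-type branches that only push (per Table C-3; bclrl is handled separately).
PUSH_ONLY = {"bl", "bla", "bcl", "bcla", "bcctrl"}


class LinkStack:
    """Hypothetical model of a four-entry return-address link-stack."""

    def __init__(self, depth: int = 4):
        # Assumption: when full, a push discards the oldest entry
        # (a deque with maxlen drops items from the opposite end).
        self.entries = deque(maxlen=depth)

    def on_branch(self, mnemonic: str, cia: int, lr: int):
        """Update the stack for one branch and return a predicted return
        target, or None for branches that do not pop the stack."""
        if mnemonic in PUSH_ONLY:
            self.entries.append(cia + 4)          # push CIA + 4
            return None
        if mnemonic in ("bclr", "bclrl"):
            # Pop the top entry; fall back to a copy of LR when empty.
            target = self.entries.pop() if self.entries else lr
            if mnemonic == "bclrl":
                self.entries.append(cia + 4)      # pop, then push
            return target
        return None


ls = LinkStack()
ls.on_branch("bl", 0x1000, lr=0)                 # call from 0x1000 pushes 0x1004
ls.on_branch("bl", 0x2000, lr=0)                 # nested call pushes 0x2004
print(hex(ls.on_branch("bclr", 0x3000, lr=0)))   # 0x2004
print(hex(ls.on_branch("bclr", 0x2010, lr=0)))   # 0x1004
```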
In the PowerPC 476FP core, a four-entry link-stack is implemented to improve call and return code. In some cases, the link-stack can become misaligned or corrupted during speculative instruction fetching and execution; that is, the link-stack pointer is moved to the wrong position or in the wrong direction as a result of a misprediction. Generally, whenever one or more bclr instructions (branch conditional to link register) are followed by one or more branch-and-link instructions in the speculated path, the link-stack becomes corrupted if the speculation turns out to be incorrect. However, a corrupted link-stack can be corrected after the first branch correction. When the link-stack is empty, a copy of the LR value is used instead of popping the link-stack to calculate the branch target address. Table C-3 illustrates the link-stack operation for each branch instruction type.
Table C-3. Link-Stack Operations
Instruction Type | Stack Entry | Stack Valid | Operation
bl, bla, bcl, bcla, bcctrl, bclrl | CIA + 4 (next IAR) | Validate | Push
bclr | (none) | Invalidate | Pop
bclrl | CIA + 4 (next IAR) | Invalidate, then validate | Pop, then push
C.4.5 Branch Instruction Process
When a branch instruction is issued to the B-pipe (BE0 stage), the branch instruction type (from instruction decode), a branch-predict-taken or not-taken indicator, and the branch target address from the BIQ are also sent to the execution unit. In the BE1 stage, the execution unit computes the branch target address according to the branch instruction type (Link Register based, Count Register based, absolute address based, or relative address based), compares it against the given branch target address, and checks whether the predicted address is correct. If both the predicted direction and the predicted address are correct, no branch correction flush is issued.
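The BE1 check described above can be sketched as follows. This is a simplified model under stated assumptions: addresses are 32 bits; `disp` is the byte displacement from the LI or BD field, already sign-extended; LR and CTR targets have their two low-order bits cleared, as the Power ISA specifies; and a correctly predicted not-taken branch is treated as needing no address comparison. The function names and the `kind` strings are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def branch_target(kind: str, cia: int, *, disp: int = 0, aa: bool = False,
                  lr: int = 0, ctr: int = 0) -> int:
    """Actual branch target for each target-address source."""
    if kind == "lr":                      # bclr family: Link Register based
        return lr & ~0x3 & MASK32
    if kind == "ctr":                     # bcctr family: Count Register based
        return ctr & ~0x3 & MASK32
    if kind == "disp":                    # b/bc: relative, or absolute if AA=1
        base = 0 if aa else cia
        return (base + disp) & MASK32
    raise ValueError(f"unknown target kind: {kind}")


def needs_flush(pred_taken: bool, pred_target: int,
                taken: bool, target: int) -> bool:
    """True when a branch correction flush must be requested."""
    if pred_taken != taken:
        return True                       # wrong direction
    return taken and pred_target != target  # right direction, wrong address


actual = branch_target("disp", 0x1000, disp=-8)             # backward loop branch
print(hex(actual), needs_flush(True, 0xFF8, True, actual))  # 0xff8 False
```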
If either prediction is incorrect, the execution unit sends a branch correction flush request to the IU, and the IU broadcasts the request to the entire CPU and FPU.
C.4.6 Branch Information Queue Operation
The BIQ is a 4-entry FIFO queue that maintains information about branch instructions. It contains the following information:
• Branch instruction address
• Branch target address
• Branch instruction location in a submit group
• BHT index used
• BHT used indicator
• BTAC used indicator
• Branch-predict-taken indicator
The queue is split between the ICC and the IU: the ICC portion contains the BHT data, and the IU portion contains the addresses. The ICC queue is the master in that it controls queue movement. The queue is split to reduce the wiring between units and to improve timing by keeping addresses physically closer to the execution unit (EU).
Each entry in the BIQ represents an instruction submit group: a group of instructions submitted from the ISD to DISS. This group can contain from one to four instructions, with any combination of branches. Only one predicted-taken branch can be present in an instruction submit group, because all instructions younger than the branch-predict-taken branch are invalidated when the instruction stream is redirected. An entry is written into the BIQ when the submission from the ISD to DISS occurs. If the BIQ is full, the ICC must block all ISD valid bits. An entry is removed from the BIQ when the last branch tracked in that entry is determined.
C.5 Instruction Issue Operation
The PowerPC 476FP core can generally issue four instructions in any given cycle to the RACC stages of the pipeline.
The four oldest instructions in the issue queue stage of the pipeline (DISS0, DISS1, and DISS2 of the CPU integer unit and INSTQ0 and INSTQ1 of the FPU) are examined to determine which RACC stage each requires (LRACC for the L-pipe or the J-pipe, IRACC for the I-pipe, BE0 for the B-pipe, FARACC for the FP FA-pipe, and FLRACC for the FP FL-pipe). If they require (or can use) different RACC stages, they can issue together. Conversely, if two instructions require the same RACC stage, they must issue in separate cycles, generally with the older instruction issuing first, although instructions can be issued out of order (bypassed).
Certain instruction types must use the LRACC and the L-pipe (such as storage access instructions). Certain instructions must use the IRACC and the I-pipe (such as multiply, divide, and SPR instructions). Branch instructions must be issued to the B-pipe, FP arithmetic instructions must be issued to the FA-pipe, and the remaining instructions can use either the LRACC/J-pipe or the IRACC/I-pipe. FP load/store instructions are issued separately and share their operations between the integer unit and the FPU, because the GPRs used for operand address generation are in the integer unit and the FPRs holding the FP operands are in the FPU. This section summarizes the pipelines that can or must be used by each instruction category and the rules for issuing instructions simultaneously.
C.5.1 L-Pipe Instructions
The following categories of instructions must use the LRACC dispatch stage and be executed by the L-pipe:
• Storage access, including integer load/store and load/store string operations.
• Floating-point load/store instructions.
Note: Update forms of load/store instructions, which update the base address register used for the load/store operation, are executed simultaneously by both the L-pipe and the J-pipe. For such instructions, the storage access operation uses the L-pipe, and the base address update operation uses the J-pipe.
• Cache management instructions, including I-cache management.
• Storage synchronization instructions, such as msync, mbar, and lwsync, which are shared with the I-pipe.
• The stwcx. instruction, which is shared with the I-pipe.
• TLB management operations, such as tlbivax and tlbsync, which are shared with the I-pipe.
• Allocated cache management instructions.
• Allocated D-cache debug instructions.
C.5.2 I-Pipe Instructions
The following categories of instructions must use the IRACC dispatch stage and be executed by the I-pipe:
• Integer multiply
• Integer divide
• Allocated arithmetic (includes multiply-accumulate, negative multiply-accumulate, and multiply halfword)
• Integer trap
• Integer count leading zeros
• CR manipulating instructions
• Allocated logical (dlmzb)
• popcntb instruction
• TLB management operations, such as tlbivax and tlbsync, shared with the L-pipe
• Processor control (includes register management and system linkage)
• Storage synchronization, such as msync, mbar, and lwsync, shared with the L-pipe
• All instructions that use or update the CR (including integer and floating-point instructions)
Note: The stwcx. instruction is both a storage access and a CR-updating instruction, so it is executed by both the L-pipe and the I-pipe. It issues simultaneously from DISS0 to both LRACC and IRACC.
• All instructions that use or update the XER
Note: The load/store string indexed instructions (lswx and stswx) are exceptions to this rule: they use the XER[TBC] field but are executed by the L-pipe.
C.5.3 I-Pipe and J-Pipe Instructions
All other integer instructions can either use the IRACC dispatch stage and be executed by the I-pipe, or use the LRACC dispatch stage and be executed by the J-pipe.
These instructions are as follows:
• Integer arithmetic instructions that do not use the CR or the XER, except for multiply and divide instructions (which must be executed by the I-pipe)
• Integer logical instructions that do not update the CR, except for count leading zeros instructions (which must be executed by the I-pipe)
• Integer rotate instructions that do not update the CR
• Integer shift instructions that do not update the CR or the XER
C.5.4 B-Pipe Instructions
The following categories of instructions must use the BE0 dispatch stage and be executed by the B-pipe:
• Unconditional branch instructions
• Conditional branch instructions, including bdnz
• Branch to LR instructions
• Branch to CTR instructions
C.5.5 FA-Pipe Instructions
All FP arithmetic instructions must use the FPU and be executed by the FP FA-pipe.
C.5.6 FP FL-Pipe Instructions
All FP load/store instructions must use the FPU and be executed by the FP FL-pipe. They must also be shared with the integer L-pipe, because the operand address is generated in the L-pipe and the operand is accessed by the L-pipe.
C.5.7 Special Issue Rules for System Synchronizing Instructions
Because of various architectural requirements regarding context synchronization, interrupt ordering, and system synchronization, the following instructions are treated uniquely with regard to issuing:
• isync, mtmsr, rfi, rfci, rfmci, sc, wrtee, and wrteei
These instructions must all wait until they occupy DISS0 before being issued to the IRACC stage (they occupy both the I-pipe and the L-pipe).
Furthermore, each of these special instructions blocks all subsequent instructions from issuing until it has completed execution. At that time, all subsequent instructions are flushed from the pipelines and refetched at the appropriate address according to the functional definition of the particular instruction. This behavior effectively increases the latency of these instructions.
• msync, mbar, lwsync, and tlbsync
These instructions must all wait until they occupy DISS0 before being issued to the IRACC or LRACC stage. Furthermore, each of these special instructions blocks all subsequent instructions from issuing until it has completed execution. The msync, mbar, and lwsync instructions are also memory barrier instructions and ensure that a memory barrier is created. This behavior effectively increases the latency of these instructions.
C.6 Instruction Execution and Penalties
As described previously, the PowerPC 476FP core is a superscalar processor core capable of issuing four instructions per cycle: three integer instructions and one FP instruction. The integer execution unit has a five-pipeline structure, and the FP execution unit has a two-pipeline structure. The PowerPC 476FP core allows out-of-order issue, execution, and completion, but requires instructions to commit in order. In general, most sequences of four nondependent instructions that do not require the same RACC stage can be issued, dispatched, executed, and completed simultaneously. This results in a net execution performance of four instructions in one cycle, which corresponds to a penalty of -3 cycles (-2 cycles for the three instructions in the integer unit alone) relative to the single-issue microarchitecture model of four instructions in four cycles. See the definition of penalty in Section C.2 Instruction Execution Latency and Penalty on page 284.
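The RACC-stage rules in C.5 and the first paragraph of C.6 suggest a rough throughput estimate for independent integer instructions. This is a hypothetical greedy model, not the 476FP selection logic (which is not described here). Each cycle, the model walks the unissued instructions oldest first and gives each one the first free RACC stage it may use, so younger instructions can bypass held ones. The category names and the stage preference order are assumptions. Dependencies, queue depth, and the FPU are ignored.

```python
from __future__ import annotations

LRACC, IRACC, BE0 = "LRACC", "IRACC", "BE0"

# Hypothetical instruction categories -> RACC stages they may use,
# in the order this sketch tries them (after C.5.1 to C.5.4).
ALLOWED = {
    "load": (LRACC,),            # storage access: L-pipe only
    "store": (LRACC,),
    "mul": (IRACC,),             # multiply/divide/SPR/CR-updating: I-pipe only
    "add": (LRACC, IRACC),       # simple integer: J-pipe (via LRACC) or I-pipe
    "branch": (BE0,),            # B-pipe only
}


def issue_cycles(program: list[str]) -> list[list[str]]:
    """Group independent integer instructions into issue cycles."""
    remaining = list(program)
    cycles = []
    while remaining:
        busy, issued, held = set(), [], []
        for op in remaining:                      # oldest first
            stage = next((s for s in ALLOWED[op] if s not in busy), None)
            if stage is None:
                held.append(op)                   # RACC conflict: wait a cycle
            else:
                busy.add(stage)
                issued.append(op)
        cycles.append(issued)
        remaining = held
    return cycles


# Two loads, two adds, and two branches: two cycles, not six.
print(issue_cycles(["load", "load", "add", "add", "branch", "branch"]))
# [['load', 'add', 'branch'], ['load', 'add', 'branch']]
```

The result matches the load/add/branch example discussed below: three integer instructions per cycle, one per RACC stage.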
There are, however, many instruction sequences for which such parallel processing is not possible because of factors such as dependencies between the instructions, contention for the same RACC stage or execution pipeline (or both), and load miss penalties.
This section summarizes these exception cases: the instruction sequences for which simultaneous processing of one form or another is not possible, so that the number of cycles required to process the instructions increases (and the instructions-per-cycle metric decreases). These penalty cycles generally arise because one of the four instructions must hold in a given pipeline stage for more than one cycle while a dependency or pipeline resource conflict is resolved. For any given sequence of four instructions that is not covered by one of the rules listed in this section, it can be assumed that the four instructions can be processed simultaneously at the rate of four instructions per cycle.
Exception: The total number of cycles to execute a sequence of more than four instructions is not simply the sum of the cycles identified by these rules for each consecutive instruction pair in the sequence. Rather, the out-of-order issue, dispatch, execution, and completion capabilities of the PowerPC 476FP core make it possible, in most cases, for the cycles associated with one instruction pair to overlap to varying degrees with the cycles associated with other pairs. For example, consider a sequence of two nondependent load instructions, followed by two nondependent add instructions, followed in turn by two branch instructions.
Each pair (the loads, the adds, and the branches) is subject to the contention rule: the two instructions in the pair require the same execution pipeline, so each pair effectively requires two cycles. However, because the first add instruction can be issued together with the first load instruction and the first branch instruction, the net throughput for these six instructions is two cycles, not six. This example shows how the theoretical maximum execution rate of three instructions per cycle in the integer unit can be maintained. Figure C-10 on page 302 illustrates this example.
Figure C-10. Instruction Sequence Example with no Dependency on the Integer Unit
[Pipeline diagram over clocks 1 to 9, showing ld1/ld2 in the L-pipe stages LRACC, AGEN, CRD, DST, and LWB; add1/add2 in the I-pipe stages IRACC, IEXE1, and IPGPR; and bc1/bc2 in the B-pipe stages BE1 to BE4.]
Also note that when considering the penalty associated with any given pair of instructions in which the second instruction depends on the immediately preceding instruction, the penalty can generally be reduced or avoided altogether by inserting other, nondependent instructions between the pair. For example, consider the previously mentioned case of a load instruction followed immediately by an instruction dependent on the load result. These two instructions take four cycles to execute, for a penalty of two cycles.
As described previously, if the load instruction can be issued with the instruction preceding it, and the second (dependent) instruction can be issued with the instruction following it, the net execution time in this example is four cycles for the four instructions, a zero-cycle penalty (a net of two cycles per two instructions, the same as the single-issue microarchitecture default). If, however, the software sequence is changed so that four more nondependent instructions are inserted between the load and the dependent instruction, the net execution rate in this example improves to eight instructions in four cycles (the load and its preceding instruction, the four instructions after the load, and the dependent instruction and its successor). Generally speaking, compilers should attempt to eliminate the penalties associated with the instruction pairings described in the following sections by inserting nondependent but useful instructions between the penalty-inducing pair.
Note: Some of the execution performance rules described in the following subsections are related to CR dependencies, and one special case must be noted. Condition Register logical instructions specify two bits of the CR as source operands and a third bit of the CR as the target operand. However, because the PowerPC 476FP core updates the CR on a field (rather than a bit) basis, the field containing the target bit is also considered a source operand for the purposes of the CR-related dependency rules described in the following subsections. This is necessary to source the old values of the three bits of the target field that are not updated by the Condition Register logical instruction.
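The field-granularity rule in the note above can be sketched in Python. The sketch computes which CR fields a Condition Register logical instruction such as crand BT,BA,BB is treated as reading, using the 0 to 31 numbering of the instruction's 5-bit operand fields (field = bit // 4). The function names are hypothetical.

```python
def cr_logical_source_fields(bt: int, ba: int, bb: int) -> set:
    """CR fields treated as sources by a CR logical instruction BT,BA,BB.

    BA and BB are true source bits. Because the CR is updated one whole 4-bit
    field at a time, the field holding target bit BT is also a source: its
    three untouched bits must be read and written back unchanged.
    """
    for bit in (bt, ba, bb):
        if not 0 <= bit <= 31:
            raise ValueError(f"CR bit {bit} out of range 0-31")
    return {ba // 4, bb // 4, bt // 4}


def depends_on_cr_field(written_field: int, bt: int, ba: int, bb: int) -> bool:
    """True if the CR logical instruction must wait for an update of written_field."""
    return written_field in cr_logical_source_fields(bt, ba, bb)


# crand 9,2,5 reads bits in CR0 and CR1 but writes bit 9 (in CR2), so a
# preceding compare into CR2 still creates a dependency.
print(cr_logical_source_fields(9, 2, 5))  # {0, 1, 2}
print(depends_on_cr_field(2, 9, 2, 5))    # True
```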
Note: Some of the execution performance rules described in the following subsections are related to XER dependencies. Specifically, various “o” form instructions update XER[SO,OV], and various other instructions read XER[SO], XER[OV], or both as a source operand. The instructions that use XER[SO] or XER[OV] as a source operand are: mfspr (with the XER specified as the source SPR), mcrxr, compare instructions (which copy XER[SO] into CR[CR0][35]), and all record form instructions (which copy XER[SO] into CR[CR0][35]).

C.6.1 Contention for the Same RACC Stage
If two instructions require the same RACC stage, they must be issued in separate cycles, so their effective throughput is two cycles for the two instructions. This corresponds to a penalty of zero cycles, one cycle worse than the negative one-cycle penalty for the default, noncontention case in which the two instructions can be issued together.

C.6.2 GPR Operand Dependency
If the second instruction has a GPR source operand that is the same as one of the GPR target operands of the first instruction (that is, a GPR read-after-write [RaW] hazard), the second instruction must in general be executed at least one cycle after the first instruction. Executing the two instructions therefore requires at least two cycles (a zero-cycle penalty). Depending on the type of the first instruction (and the pipeline stage at which it finishes calculating its result), the second instruction might have to wait more than one cycle for its source operand to become ready, which increases the penalty for the two-instruction sequence further. The circumstances that cause such additional delay are described in rules listed later in this section.
There are two exceptions to this general rule, involving integer store instructions and the allocated integer multiply-accumulate (MAC) instructions.
The exceptions are as follows:
• Exception 1: The second instruction of the sequence is a store instruction, and the source GPR operand of the second instruction that matches the target GPR operand of the first instruction is specifically the store data operand (that is, the RS operand shown in the store instruction description).
• Exception 2: The second instruction of the sequence is a MAC instruction, and the source GPR operand of the second instruction that matches the target GPR operand of the first instruction is specifically the MAC accumulate operand (that is, the RT operand shown in the MAC instruction description, which is both a source and a target for MAC instructions).
For both of these exceptions, the two instructions can generally still be executed and completed in parallel, so the effective throughput is still generally one cycle for the two instructions, equivalent to the nondependent case. This parallel execution is generally possible because the store data operand and the MAC accumulate operand are both accessed one cycle later in the execution pipeline than other GPR source operands.
As with the general GPR operand dependency rule, however, the first instruction might take additional cycles to calculate its result, depending on its type. In such cases, the two-instruction sequence incurs additional penalty even when the second instruction is one of these two types. Again, the circumstances under which this additional penalty applies are described in rules listed later in this section.
In general, though, the GPR dependency-related penalty for the special case in which the dependency is on the store data or MAC accumulate operand is one cycle less than the standard GPR dependency-related penalty.

C.6.3 General CR Operand Dependency
No separate general rule is needed for the execution penalty associated with CR operand dependencies, corresponding to the general rule for GPR dependencies. This is because all instructions that use the CR (either as a source or as a target operand) must issue to the IRACC pipeline stage and be executed in the I-pipe. Therefore, the RACC contention rule described in Section C.6.1 Contention for the Same RACC Stage on page 303 applies to all instruction sequences involving such CR dependencies, leading to a default base execution rate of two cycles for two instructions with such a dependency. For example, the sequence of a compare instruction (which writes a field of the CR as a target) followed by a conditional branch (which reads a bit of the CR as a source) takes two cycles to execute. This is true whether the branch is conditional upon a CR bit that was updated by the compare instruction or upon some other CR bit. For branch instructions, there are other considerations related to the predicted outcome of the branch and the latency with which instructions after the branch can be executed.
Also, as with GPR dependencies, there are special cases involving instructions that do not calculate their CR results in the first cycle of execution (the IEXE1 pipeline stage) and therefore introduce additional cycles of penalty when the subsequent instruction depends on those CR results. These special cases are covered in the rules listed later in this section.
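The base pairing rules of Sections C.6.1 to C.6.3 can be sketched in Python as follows. This simplified model assumes that the first instruction produces its result in its first execution cycle (not a multiply, load, mfspr, or other late-result producer). The `Insn` record, its field names, and `pair_penalty` are hypothetical. CR dependencies need no separate check here, because every CR user issues to IRACC and is therefore caught by the RACC comparison.

```python
from typing import FrozenSet, NamedTuple


class Insn(NamedTuple):
    racc: str                                   # e.g. "IRACC" or "LRACC"
    targets: FrozenSet[str] = frozenset()       # GPRs written
    sources: FrozenSet[str] = frozenset()       # GPRs read at the start of execution
    late_sources: FrozenSet[str] = frozenset()  # store data RS or MAC accumulate RT


def pair_penalty(first: Insn, second: Insn) -> int:
    """Penalty for `second` issued right after a single-cycle `first`.

    -1: both issue in the same cycle (the default, noncontention case).
     0: RACC contention (C.6.1) or a RaW hazard on an ordinary source (C.6.2).
    A hazard only on a late-read operand (store data or MAC accumulate) does
    not delay the pair, per the two exceptions in C.6.2.
    """
    if first.racc == second.racc:
        return 0
    if first.targets & second.sources:
        return 0
    return -1


add_r3 = Insn("IRACC", frozenset({"r3"}), frozenset({"r4", "r5"}))   # add r3,r4,r5
lwz_r6 = Insn("LRACC", frozenset({"r6"}), frozenset({"r3"}))         # lwz r6,0(r3)
stw_r3 = Insn("LRACC", sources=frozenset({"r1"}),
              late_sources=frozenset({"r3"}))                        # stw r3,0(r1)
print(pair_penalty(add_r3, lwz_r6), pair_penalty(add_r3, stw_r3))    # 0 -1
```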
C.6.4 Multiply Dependency
Multiply instructions (including the Power ISA 32-bit × 32-bit multiply instructions and the allocated 16-bit × 16-bit multiply-halfword instructions) calculate their results in the IWB pipeline stage. This includes the GPR result, the CR result for record forms of multiply that update the CR, and the XER result for “o” forms of multiply that update XER[SO,OV]. Therefore, an instruction sequence consisting of a multiply followed immediately by an instruction that uses the multiply result (GPR, CR, or XER) as an input operand takes five cycles to complete. This corresponds to a three-cycle penalty, three cycles more than the penalty for the general GPR dependency rule described in Section C.6.2 GPR Operand Dependency on page 303.
If the dependency involved in the sequence is specifically on a store data GPR operand, the penalty is one cycle less, for a total execution time of four cycles, not five. The same is true if the first instruction is a Power ISA 32-bit × 32-bit multiply instruction and the second instruction is a MAC instruction, with the only dependency between the two being the MAC accumulate GPR operand (the total execution time is four cycles). However, unlike what is described in Section C.6.2 GPR Operand Dependency, if the first instruction in the sequence is specifically a multiply-halfword instruction and the second instruction is a MAC instruction using the GPR result of the multiply-halfword instruction as the accumulate operand, the penalty associated with the sequence is two cycles, for a total execution time of four cycles for the two instructions, not five.
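The multiply rule can be expressed in Python as a check that a scheduler might run on an adjacent instruction pair. The mnemonic set is an illustrative subset (32-bit × 32-bit forms plus two multiply-halfword forms), the operand classification is a simplifying assumption, and `multiply_pair_cycles` is a hypothetical name. Record and “o” forms (the CR and XER results) are not modeled.

```python
from typing import FrozenSet, NamedTuple, Optional

# Illustrative subset of multiply mnemonics (not exhaustive).
MULTIPLY_OPS = {"mullw", "mulhw", "mulhwu", "mulli", "mullhw", "mulhhw"}


class Insn(NamedTuple):
    op: str
    targets: FrozenSet[str] = frozenset()
    sources: FrozenSet[str] = frozenset()       # ordinary GPR sources
    late_sources: FrozenSet[str] = frozenset()  # store data RS or MAC accumulate RT


def multiply_pair_cycles(first: Insn, second: Insn) -> Optional[int]:
    """Total cycles for a multiply immediately followed by a user of its GPR result.

    Returns None when the multiply dependency rule does not apply.
    """
    if first.op not in MULTIPLY_OPS:
        return None
    if first.targets & second.sources:
        return 5  # result ready only in IWB: three-cycle penalty
    if first.targets & second.late_sources:
        return 4  # store data or MAC accumulate operand: one cycle less
    return None


mul = Insn("mullw", frozenset({"r3"}), frozenset({"r4", "r5"}))
add = Insn("add", frozenset({"r6"}), frozenset({"r3", "r7"}))
stw = Insn("stw", sources=frozenset({"r1"}), late_sources=frozenset({"r3"}))
print(multiply_pair_cycles(mul, add))  # 5
print(multiply_pair_cycles(mul, stw))  # 4
```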
C.6.5 Multiply-Accumulate (MAC) Dependency
MAC instructions calculate their results in the IWB (or MWB) pipeline stage, including the GPR result, the CR result for record forms of MAC that update the CR, and the XER result for “o” forms of MAC that update XER[SO,OV]. Therefore, an instruction sequence consisting of a MAC instruction followed immediately by an instruction that uses the MAC result (GPR, CR, or XER) as an input operand generally takes five cycles to complete. This corresponds to a three-cycle penalty, three cycles more than the penalty for the general GPR dependency rule described in Section C.6.2 GPR Operand Dependency on page 303. If the dependency involved in the sequence is specifically on a store data GPR operand, the penalty is one cycle less, for a total execution time of four cycles, not five.
Unlike what is described in Section C.6.2 GPR Operand Dependency, if the second instruction in the sequence is another MAC instruction using the same GPR accumulate operand (and there is no XER[SO] dependency between the instructions), the penalty associated with the sequence is one cycle, for a total execution time of three cycles for the two instructions, not five. However, MAC instructions whose only dependency is the GPR accumulate operand can be executed with single-cycle throughput because of a special forwarding path within the execution pipeline.
Lastly, because of a write-after-read (WaR) hazard, an instruction sequence consisting of a MAC instruction preceded immediately by an instruction that updates the same GPR as the MAC instruction generally takes three cycles to complete, which corresponds to a one-cycle penalty.

C.6.6 Divide Dependency
The divider is based on a radix-2 SRT division algorithm.
The first two stages, MEXE1 and MEXE2, are used to compute the leading zeros of the dividend in order to reduce the number of iterations. The MEXE3 stage is used for the divide computation. Divide instructions reside in IWB for a variable number of cycles as they iteratively calculate their results, including the GPR result, the CR result for “record” forms of divide that update the CR, and the XER result for “o” forms of divide that update XER[SO,OV]. Therefore, an instruction sequence consisting of a divide followed immediately by an instruction that uses the divide result (GPR, CR, or XER) as an input operand takes a variable number of cycles to complete. The average penalty is expected to be 10+ cycles, that is, 10+ cycles more than the penalty for the general GPR dependency rule described in Section C.6.2 GPR Operand Dependency. As described in that section, if the dependency involved in the sequence is specifically on a store data or MAC accumulate GPR operand (and there is no XER[SO] dependency between the instructions), the penalty is one cycle less.
Furthermore, because divide instructions occupy the IWB pipeline stage for a total of 10+ cycles (instead of the standard one cycle), they impose an additional 10+ cycle penalty on any immediately succeeding instruction that also uses the I-pipe IWB stage. Otherwise, the I-pipe can still be used to execute dot instructions; this is one of the advantages of the IPGPR (temporary buffer) implementation in the PowerPC 476FP core. Instructions following the divide that do not depend on its result and that use the L-pipe, J-pipe, or I-pipe (many I-pipe instructions complete in IEXE1 and IEXE2) can be executed and completed while the divide is iterating in the IWB pipeline stage.
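The effect of counting leading zeros before iterating can be illustrated in Python. Note that this sketch substitutes a plain radix-2 shift-and-subtract divider for SRT. SRT uses a redundant quotient-digit set, and this code is not the 476FP divider. It shows only why the number of iterations, and hence the divide latency, depends on the operand values. `divide_unsigned` is a hypothetical name.

```python
def divide_unsigned(n: int, d: int) -> tuple:
    """Radix-2 shift-and-subtract division of nonnegative integers.

    Uses the operands' bit lengths (equivalently, their leading-zero counts)
    to skip iterations that could only produce leading zero quotient bits,
    then produces one quotient bit per iteration.
    Returns (quotient, remainder, iterations).
    """
    if d <= 0 or n < 0:
        raise ValueError("expects n >= 0 and d > 0")
    shift = max(0, n.bit_length() - d.bit_length())
    q, r, iterations = 0, n, 0
    for s in range(shift, -1, -1):
        iterations += 1
        q <<= 1
        if r >= d << s:      # shifted divisor fits: quotient bit is 1
            r -= d << s
            q |= 1
    return q, r, iterations


print(divide_unsigned(100, 7))          # (14, 2, 5)
print(divide_unsigned(0xFFFFFFFF, 1))   # (4294967295, 0, 32)
print(divide_unsigned(5, 7))            # (0, 5, 1)
```

A small dividend relative to the divisor finishes in one iteration, while a full 32-bit dividend divided by 1 needs 32 iterations.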
C.6.7 Move to Condition Register Fields (mtcrf) Instruction Dependency
Because the mtcrf instruction can update any combination of the eight 4-bit CR fields at once, subsequent instructions that use any bit or field of the CR as a source must wait for the preceding mtcrf instruction to complete before dispatching from the IRACC stage. Therefore, the total execution time for an mtcrf instruction followed by an instruction using the CR as a source operand is five cycles, a penalty of three cycles. Note that this penalty applies whether or not the mtcrf instruction actually updates any of the CR bits or fields used as source operands by the subsequent instruction. The following instructions use the CR as a source operand and are therefore subject to this three-cycle penalty when they immediately follow an mtcrf instruction:
• bc, bclr, bcctr (with BO[0] = ‘0’)
• mfcr
• mcrf
• Condition Register logical instructions (crand, cror, crnand, crnor, crandc, crorc, crxor, creqv)

C.6.8 Store Word Conditional Indexed (stwcx.) Instruction Dependency
Because the stwcx. instruction conditionally performs a storage access in addition to updating CR[CR0], subsequent instructions that use any bit of CR[CR0] as a source operand must wait for the preceding stwcx. instruction to complete before dispatching from the IRACC stage. Therefore, the total execution time for a stwcx. instruction followed by an instruction using any bit of CR[CR0] as a source operand is more than 20 cycles, a penalty of about 20 cycles. This large latency occurs because storage reservations are handled in the L2 cache, so at least the latency of an L2 cache access is added. The PowerPC 476FP storage reservation implementation is targeted at MP system performance.
Also see Section C.6.19 lwarx and stwcx. Operations on page 311. The following instructions potentially use CR[CR0] (either the whole field or a single bit of the field) as a source operand and, if so, are subject to this 20-cycle penalty when they immediately follow a stwcx. instruction:
• bc, bclr, bcctr (with BO[0] = ‘0’)
• mfcr
• mcrf
• Condition Register logical instructions (crand, cror, crnand, crnor, crandc, crorc, crxor, creqv)

C.6.9 Move from Condition Register (mfcr) Instruction Dependency
Because the mfcr instruction reads all eight CR fields at once, and because multiple CR-updating instructions can be in execution at one time, the mfcr instruction must wait until all preceding CR updates have completed before beginning execution. Therefore, any two-instruction sequence consisting of a CR-updating instruction followed immediately by an mfcr instruction takes four cycles to execute, a penalty of two cycles. See Section B Instruction Summary on page 271 for the CR-updating dot form instructions. Note that the actual penalty for the sequence of mtcrf followed immediately by mfcr is three cycles, not two, as described in Section C.6.7 Move to Condition Register Fields (mtcrf) Instruction Dependency on page 305. Similarly, the penalty for the sequence of stwcx. followed immediately by mfcr is about 20 cycles, not two, as described in Section C.6.8 Store Word Conditional Indexed (stwcx.) Instruction Dependency on page 306.

C.6.10 Move from Special Purpose Register (mfspr) Dependency
The mfspr instruction provides its result in the IEXE2 pipeline stage.
Therefore, an instruction sequence consisting of an mfspr followed immediately by an instruction that uses the target GPR of the mfspr instruction as an input operand generally takes six cycles to complete. This corresponds to a four-cycle penalty, four cycles more than the penalty for the general GPR dependency rule described in Section C.6.2 GPR Operand Dependency on page 303. As described in that section, if the dependency involved in the sequence is specifically on a store data or MAC accumulate operand, the penalty is one cycle less, for a total execution time of five cycles, not six. In the PowerPC 476FP core, mfspr instructions were given low priority in the design because they are used infrequently in general coding practice. However, this rule applies only to SPRs other than the LR, CTR, and XER. For these three SPRs, the results of mfspr instructions are available in the IEXE1 stage, and therefore the general GPR dependency rule of Section C.6.2 GPR Operand Dependency applies.

C.6.11 Move from Machine State Register (mfmsr) Dependency
The mfmsr instruction provides its result in the IEXE2 pipeline stage. Therefore, the rule described for mfspr in Section C.6.10 Move from Special Purpose Register (mfspr) Dependency applies to the mfmsr instruction as well.

C.6.12 Move to Special Purpose Register (mtspr) Dependency
mtspr instructions occupy the IWB stage for a total of four cycles and do not write the target SPR until the fourth cycle, to enforce various architectural rules regarding instruction ordering. Therefore, an instruction sequence consisting of an mtspr followed immediately by an mfspr that references the same SPR takes seven cycles to complete, which corresponds to a five-cycle penalty. However, this penalty does not apply to the LR, CTR, or XER.
Special handling within the execution pipeline allows an mtspr/mfspr sequence that involves one of these three registers to execute in two cycles, incurring only the zero-cycle penalty that results from both instructions requiring the I-pipe. Similarly, when an mtspr instruction that specifically targets the MMUCR is followed immediately by a tlbsx instruction (which uses some fields of the MMUCR as input operands), the sequence also takes seven cycles to complete.
Furthermore, because mtspr instructions occupy the IWB pipeline stage for a total of four cycles (instead of the standard one cycle), they impose an additional three-cycle penalty on any immediately succeeding instruction that also uses the I-pipe, regardless of any dependency that might exist. That is, any instruction sequence consisting of an mtspr instruction followed immediately by another instruction that uses the I-pipe takes a minimum of five cycles to execute, a total penalty of three cycles. Again, this penalty does not apply to the LR, CTR, or XER, nor to the SPRG registers (SPRG0 through SPRG7 and USPRG0). Special handling within the execution pipeline allows an mtspr instruction that targets one of these registers to move through the pipeline in the normal fashion, occupying the IWB stage for only one cycle. In the PowerPC 476FP core, mtspr instructions were given low priority in the design because they are used infrequently in general coding practice. Also, instructions after the mtspr that use the L-pipe or J-pipe can be executed and completed while the mtspr continues to occupy the IWB pipeline stage.
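The SPR special cases in Sections C.6.10 to C.6.12 can be collected into a small Python lookup. The SPR names are plain string labels, the function names are hypothetical, and only the adjacent-pair cases stated in the text are modeled. Other SPRs, such as SRR0 and MMUCR, take the slow paths.

```python
# Registers given special handling per Sections C.6.10 and C.6.12.
FAST_MFSPR = {"LR", "CTR", "XER"}
FAST_MTSPR = FAST_MFSPR | {f"SPRG{i}" for i in range(8)} | {"USPRG0"}


def mfspr_use_cycles(spr: str, late_operand: bool = False) -> int:
    """mfspr immediately followed by a user of its target GPR.

    late_operand: the use is a store data or MAC accumulate operand,
    which costs one cycle less.
    """
    cycles = 2 if spr in FAST_MFSPR else 6
    return cycles - 1 if late_operand else cycles


def mtspr_then_mfspr_cycles(spr: str) -> int:
    """mtspr immediately followed by mfspr of the same SPR."""
    return 2 if spr in FAST_MFSPR else 7


def mtspr_then_ipipe_cycles(spr: str) -> int:
    """Minimum cycles for mtspr followed by any other I-pipe instruction."""
    return 2 if spr in FAST_MTSPR else 5


print(mfspr_use_cycles("SRR0"), mtspr_then_mfspr_cycles("MMUCR"))  # 6 7
print(mtspr_then_ipipe_cycles("SPRG3"), mtspr_then_ipipe_cycles("MMUCR"))  # 2 5
```

Under these rules, saving and restoring scratch values through SPRG registers avoids the IWB occupancy penalty, whereas an mtspr to a general SPR should be followed by L-pipe or J-pipe work where possible.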
C.6.13 TLB Management Instruction Dependency
In addition to the dependency between an mtspr that targets the MMUCR and a subsequent tlbsx instruction, for which the penalty is described in Section C.6.12 Move to Special Purpose Register (mtspr) Dependency on page 307, there are four other special-case dependencies involving TLB management instructions that lead to execution penalties.
First, the tlbwe instruction occupies the IWB pipeline stage for a total of four cycles (similar to the mtspr instruction). Therefore, any instruction sequence consisting of a tlbwe instruction followed immediately by another instruction that uses the I-pipe takes a minimum of five cycles to execute, a total penalty of three cycles. However, instructions after the tlbwe that use the L-pipe or J-pipe can be executed and completed while the tlbwe continues to occupy the IWB pipeline stage.
Second, instruction sequences consisting of a tlbre or tlbsx instruction followed immediately by an mfspr instruction (that targets any SPR except the LR, CTR, or XER) take five cycles to complete, corresponding to a penalty of three cycles. This penalty results from conflicting use of pipeline resources between the two instructions.
Third, instruction sequences consisting of a tlbwe instruction followed immediately by a tlbre or tlbsx instruction also take five cycles to complete, corresponding to a penalty of three cycles. This penalty results from conflicting use of the TLB array between the two instructions.
Fourth, instruction sequences consisting of a tlbre or tlbsx instruction followed immediately by a load, store, cache management (except dcba, which is treated as a no-op on the PowerPC 476FP core), cache debug, or storage synchronization instruction take five cycles to complete, corresponding to a penalty of three cycles.
Similarly, if the first instruction is instead a tlbwe, the two-instruction sequence takes eight cycles to complete because the tlbwe instruction is held in the IWB pipeline stage for one extra cycle. Conversely, if the order of the two instructions is reversed, with the TLB management instruction immediately following a load, store, cache management, or cache debug instruction, the two-instruction sequence takes either four or ten cycles to complete (ten cycles if the first instruction is icbi, icbt, iccci, or icread, and four cycles otherwise). These penalties all result from potentially conflicting use of the TLB array or other pipeline resources between the two instructions.

C.6.14 DCR Register Managing Instruction Operation Dependency
Because the DCR managing instructions (mtdcr, mtdcrx, mtdcrux, mfdcr, mfdcrx, and mfdcrux) must interact with the asynchronous, slower-clocked DCR interface of the PowerPC 476FP core, they stall temporarily within the I-pipe. Specifically, these instructions are held in the IEXE1 pipeline stage while they participate in the asynchronous handshake protocol of the DCR interface. The number of cycles for which these instructions remain in the IEXE1 pipeline stage depends on how quickly the DCR device responds to the transaction. In general, a DCR managing instruction occupies the IEXE1 pipeline stage for two cycles, plus the number of CPU clock cycles associated with DCR interface clock synchronization and with the transaction itself. The number of these extra cycles beyond the base two cycles depends on the relative frequencies of the CPU clock and the DCR interface clock, and on the number of cycles of the DCR transaction itself. Assume a CPU:DCR clock ratio R = C/D, where C and D are the CPU and DCR clock frequencies in MHz, C is always greater than D, and R is an integer.
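Under the clock-ratio assumption just stated, the base IEXE1 occupancy can be roughly estimated in Python. Splitting the extra time into a synchronization cost of `sync_dcr_cycles` DCR clocks plus `dcr_transaction_cycles` DCR clocks is an assumption made for illustration only. The function name and its defaults are hypothetical. Arbitration and slave-bus latency, which the following paragraph identifies as potentially dominant in MP systems, are not modeled.

```python
def dcr_iexe1_cycles(cpu_mhz: int, dcr_mhz: int, dcr_transaction_cycles: int,
                     sync_dcr_cycles: int = 1) -> int:
    """Rough estimate of CPU cycles a DCR instruction holds the IEXE1 stage.

    Two base cycles, plus synchronization and transaction time measured in
    DCR clocks and converted to CPU clocks with R = C / D.
    """
    if cpu_mhz <= dcr_mhz or cpu_mhz % dcr_mhz != 0:
        raise ValueError("expects C > D with an integer ratio R = C / D")
    r = cpu_mhz // dcr_mhz
    return 2 + r * (sync_dcr_cycles + dcr_transaction_cycles)


print(dcr_iexe1_cycles(1600, 200, dcr_transaction_cycles=3))  # 34
```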
Because DCR arbitration latency and DCR slave bus latency add to the DCR device transaction latency, the actual number can vary greatly, especially in an MP system. In 45 nm technology, in an MP system with many DCR devices, a latency of well over 100 cycles (X factor or XR) is expected. The DCR managing instructions occupy the IEXE1 pipeline stage for XR cycles, leading to a penalty of XR cycles for any immediately subsequent instruction that must use the I-pipe. On the other hand, instructions after a DCR managing instruction that use the L-pipe or J-pipe can be executed and completed while the DCR managing instruction continues to occupy the IEXE1 pipeline stage.
Furthermore, the mfdcr instruction cannot forward its GPR result to a subsequent instruction until the IEXE2 pipeline stage. Therefore, an instruction sequence consisting of an mfdcr, mfdcrx, or mfdcrux followed immediately by an instruction that uses its target register as an input operand generally takes XR cycles to complete, which corresponds to an XR-cycle penalty.

C.6.15 Processor Control Instruction Operation
Various processor control instructions require special handling within the PowerPC 476FP core because of the context synchronization requirements of the Power ISA Version 2.05 Book III-E architecture. These instructions include:
• sc
• mtmsr
• isync
• rfi
• rfci
• rfmci
Each of these instructions is issued from DISS0 and requires that the instruction stream be flushed and refetched immediately after the instruction executes: at the next sequential address (for mtmsr and isync), at the System Call interrupt vector location (for sc), or at the interrupt return address (for rfi, rfci, and rfmci).
Because of the instruction refetching requirement and other instruction processing requirements, the minimum execution time for a two-instruction sequence in which the first instruction is one of these instructions (mtmsr, isync, sc, rfi, rfci, or rfmci) is thirteen cycles. Furthermore, none of these instructions can be issued together with any preceding instruction, so the minimum execution time is two cycles (zero-cycle penalty) for any two-instruction sequence in which the second instruction is one of these instructions.
The wrtee and wrteei instructions are also issued from DISS0 and prevent any subsequent instructions from issuing until they complete. These instructions are not context synchronizing, so they do not flush any fetched instructions. The minimum execution time for a two-instruction sequence in which the first instruction is wrtee or wrteei is four cycles: the instruction is confirmed at the IRACC stage, committed at the IEXE1 stage, and completed in the next cycle, and subsequent instructions can be issued in the following cycle.

C.6.16 Load Instruction Dependency
Load instructions that obtain their data from the data cache generally provide their result in the LWB pipeline stage. Therefore, an instruction sequence consisting of a load instruction followed immediately by an instruction that uses the target GPR of the load instruction as an input operand generally takes five cycles to complete. This corresponds to a three-cycle penalty, three cycles more than the penalty for the general GPR dependency rule described in Section C.6.2 GPR Operand Dependency on page 303.
As described in that section, if the dependency involved in the sequence is specifically on a store data or MAC accumulate operand, the penalty is one cycle less, for a total execution time of four cycles, not five. The dependency described in this section applies only to the target data operand of a load instruction. It does not apply to the target address operand of a load with update instruction: that result is available from the JEXE1 pipeline stage, so only the general GPR dependency rule applies. Note that many other factors affect the performance of load and other storage access instructions (such as whether their target location is in the data cache).

C.6.17 Load/Store Operations
A load that depends on the result of a previous store must obtain the store data as required by the sequential execution model (SEM). To handle this type of read-after-write (RaW) hazard, a read instruction must hold in RACC until all of its operands are available (that is, until the results of all previous writes to the read operands are known). These earlier writes need not have been performed in the GPR file, or even committed by the CS. It is only required that their results be known and available so that the read operation can proceed into execution with the correct values.
Similarly, a WaR hazard must be handled so that the results of later writes are not erroneously forwarded to earlier reads. Unlike the RaW hazard described previously, a write can leave a given RACC stage even if the read is holding in the other RACC stage. Note that this behavior is identical to the LRACC WaR hazard against a MAC in IRACC (described in Section C.6.5 Multiply-Accumulate (MAC) Dependency on page 305), which has a zero-cycle penalty. The stalls described previously are required to handle RaW and WaR hazards based on operand dependencies between reads and writes.
These stalls do not, however, handle the various resource dependencies that can exist lower in the pipeline. One special case is described here. Load and store hits can generally flow through the pipeline with an overall throughput of one instruction per cycle, provided that a load hit does not immediately follow a store hit to the same address (a typical compiler does not emit such a sequence). When a load hit immediately follows a store hit to the same address, the load incurs a five-cycle penalty: three cycles for the store to write the RAM array and two cycles for the load to reaccess the cache. In the special case of a load hit that follows two store hits and matches both stores, the load incurs a six-cycle penalty: four cycles of RAM writes plus two cycles for the load to reaccess the cache.

C.6.18 String and Multiple Operations

String load/store and multiple load/store instructions are issued from DISS0 and are executed by replicating loads or stores, which repeats the LRACC, AGEN, CRD, DST, and LWB stages. To simplify the hazard logic for string/multiple operations, every load string/multiple operation is assumed to update all registers. This simplification is necessary because the registers a string/multiple will access are not known until its final piece is in AGEN. Thus, a load string or multiple in LRACC must hold until all older register reads are either past IRACC or leaving IRACC in the current cycle (except for a MAC, which must be leaving IEXE1). Conversely, a newer register write in IRACC or LRACC must wait until a store string or multiple has finished replicating and its last piece is leaving AGEN before it can proceed.
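The conservative hold rule for a load string or multiple in LRACC can be expressed as a predicate over the older register reads that are still in flight. This Python sketch models only the I-pipe reads described in the text. The stage encoding (`IRACC`, `IEXE1`, and `PAST_IEXE1` as ordered integers) and the `OlderRead` record are hypothetical simplifications. They do not describe the actual hazard logic.

```python
from dataclasses import dataclass

# Hypothetical stage encoding: larger numbers are later in the I-pipe.
# Only IRACC and IEXE1 are named in the text; PAST_IEXE1 is a stand-in.
IRACC, IEXE1, PAST_IEXE1 = 0, 1, 2


@dataclass
class OlderRead:
    stage: int            # current I-pipe stage of the older register read
    leaving: bool         # True if it advances out of `stage` this cycle
    is_mac: bool = False  # MAC reads are released later (leaving IEXE1)


def read_is_clear(r: OlderRead) -> bool:
    """True once an older read no longer blocks a load string/multiple."""
    if r.is_mac:
        # A MAC must be leaving IEXE1 (or already beyond it).
        return r.stage > IEXE1 or (r.stage == IEXE1 and r.leaving)
    # Any other read must be past IRACC or leaving IRACC this cycle.
    return r.stage > IRACC or (r.stage == IRACC and r.leaving)


def load_multiple_must_hold(older_reads: list) -> bool:
    """A load string/multiple in LRACC is assumed to write every GPR, so it
    checks every older read, whichever registers that read actually uses."""
    return not all(read_is_clear(r) for r in older_reads)
```

Because every GPR is treated as a destination, the predicate never inspects register numbers. This reflects the simplification that removes the need to know the target registers before the final piece reaches AGEN.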
A further simplification in the PowerPC 476FP core avoids GPR hazards by making newer load string/multiple operations hold in DISS0 until all older writes have completely left all pipelines. The penalty for any load or store string/multiple therefore depends on the operations that are in the pipeline while the load or store is in LRACC.

C.6.19 lwarx and stwcx. Operations

Because the PowerPC 476FP core is designed for SMP support, its storage reservation instructions, lwarx and stwcx., behave differently from those of previous PowerPC 4xx processors. The lwarx instruction is issued to the L-pipe as a normal load, but it forces a data cache miss by invalidating the cache entry, even when the addressed data is present in the L1 cache. The stwcx. instruction is issued through DISS0 to both the I-pipe and the L-pipe and also invalidates the cache entry. In other words, both instructions are performed in the L2 cache. The stwcx. instruction updates CR0 only after the L2 cache responds, whether or not the reservation succeeds, so the CR0 update has a long latency of about 20 cycles. In addition, a storage reservation operation is a system-level, in-order operation. Use storage reservation operations with care in performance-critical code.

C.6.20 Storage Ordering and Synchronizing Operations

The msync, mbar, and lwsync instructions go down the L-pipe and are confirmed in LRACC. However, because they are storage-ordering and synchronization instructions, all preceding instructions complete before msync, mbar, or lwsync completes, and no subsequent instructions are initiated until then. The IU enforces this after msync, mbar, or lwsync is issued from DISS0. The msync instruction is a heavyweight instruction that waits until all other processors in the system acknowledge that they have processed or completed all preceding instructions. The mbar instruction is handled similarly.
The lwsync instruction is a lighter version of msync and waits for the L2 cache to acknowledge that all preceding operations are complete. These instructions, especially msync and mbar, limit performance and should be used with care.

C.6.21 Special TLB Managing Operations

The tlbivax and tlbsync instructions are system-level instructions. The tlbivax instruction is broadcast to all processors in the system and invalidates a matching TLB entry in each processor. The tlbsync instruction provides both storage ordering and synchronization. The tlbivax instruction is issued from DISS0 to both the L-pipe and the I-pipe and is broadcast to all processors through the PLB. However, it does not prevent any subsequent instruction from issuing. In contrast, the tlbsync instruction is issued to the L-pipe only. It prevents all subsequent instructions from issuing until all processors acknowledge that they have completed all visible instructions and are context synchronized. The tlbsync instruction is the most expensive of these instructions, even more so than msync.

C.7 Interrupt Handling

In the PowerPC 476FP core, taking an interrupt spans three cycles. This is required for timing reasons and also because any outstanding, committed SPR updates must reach the SPRs before any subsequent interrupt vector is taken. In the first interrupt cycle, the interrupt logic detects the exception and latches the flush operation. In the second cycle, the flush is sent to each unit, and the appropriate address is steered into SRR0, CSRR0, or MCSRR0 if allowable (rfi, rfci, and rfmci do not update the save/restore registers). In the third cycle, the MSR context is swapped and a fetch request for the interrupt vector is made.
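The three-cycle interrupt sequence can be sketched as a per-cycle list of actions. In this Python sketch, the mapping from interrupt class to save/restore register (SRR0 for non-critical interrupts, CSRR0 for critical interrupts, and MCSRR0 for machine check interrupts) follows the Book E interrupt model. The function name and action strings are hypothetical. The rfi/rfci/rfmci case, which does not update the save/restore registers, is not modeled.

```python
# Book E interrupt classes and the save/restore register that receives the
# return address for each. The companion registers that receive the old MSR
# (SRR1, CSRR1, MCSRR1) are not modeled in this sketch.
SAVE_REGISTER = {
    "noncritical": "SRR0",
    "critical": "CSRR0",
    "machine_check": "MCSRR0",
}


def interrupt_schedule(interrupt_class: str) -> list:
    """Return the actions taken in each of the three interrupt cycles."""
    save_reg = SAVE_REGISTER[interrupt_class]  # KeyError for unknown classes
    return [
        # Cycle 1: detect the exception and latch the flush.
        ["detect exception", "latch flush"],
        # Cycle 2: send the flush to each unit and capture the return address.
        ["send flush to each unit", f"steer return address into {save_reg}"],
        # Cycle 3: swap the MSR context and start fetching the vector.
        ["swap MSR context", "request fetch of interrupt vector"],
    ]


for cycle, actions in enumerate(interrupt_schedule("critical"), start=1):
    print(cycle, "; ".join(actions))
```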
Refetch and stop requests are handled in the same three-cycle manner, with the next fetch request occurring in the third cycle. Because interrupts span three cycles, all new interrupt, refetch, and stop requests are blocked during the second and third cycles of processing. If another interrupt request exists and is not disabled by the new MSR value, the three-cycle interrupt sequence is repeated one cycle later, and a new Instruction Address Register (IAR) value is captured in SRR0, CSRR0, or MCSRR0.

Glossary

BD Branch displacement
BHT Branch history table
BI Branch index
BO Branch option
BT Branch taken
BWB Branch write-back
CR Condition Register
CRD Cache read
CRPE Cache read parity enable
CS Central scrutinizer
CTR Count Register
DBDR Debug Data Register
DBSR Debug Status Register
DCC Data cache controller
DCDBTRH Data Cache Debug Tag Register High
DCDBTRL Data Cache Debug Tag Register Low
DCDTRH Data Cache Debug Tag Register High
DCDTRL Data Cache Debug Tag Register Low
DCESR Data Cache Exception Syndrome Register
DCLFD Data cache line fill data
DCR Device Control Register
DCRIPR Device Control Register Immediate Prefix Register
DCU Data cache unit
DEC Decrementer
DS Data space
DSI Data storage interrupt
DTLB Data TLB
DVC Data value comparison
ENW Enable next watchdog
EPN Effective page number
ERPN Extended real page number
ESR Exception Syndrome Register
EU Execution unit
EXP Exponent
FIT Fixed-interval timer
FP Floating-point
FPR Floating-Point Register
FPSCR Floating-Point Status and Control Register
FPU Floating-point unit
FT Freeze timers
GPR General Purpose Register
ICC Instruction cache controller
ICDBTRH Instruction Cache Debug Tag Register High
ICDBTRL Instruction Cache Debug Tag Register Low
ICESR Instruction Cache Error Syndrome Register
ICMP Instruction complete
ICU Instruction cache unit
IDE Imprecise debug event
IOCCR Instruction Opcode Compare Control Register
IR Intermediate result
IRPT Interrupt
ITLB Instruction translation lookaside buffer
IU Instruction unit
IVOR Interrupt Vector Offset Register
IVPR Interrupt Vector Prefix Register
IWB Integer write-back unit
LFB Line-fill buffers
LK Link bit
LMQ Load miss queue
LR Link Register
LRU Least recently used
LWB Load write-back unit
MCSR Machine Check Syndrome Register
MIPS Million instructions per second
MMU Memory management unit
MMUCR Memory Management Unit Configuration Register
MP Multiprocessor
MRR Most-recent reset
MSB Most significant byte
MSR Machine State Register
NH Next higher in magnitude
NL Next lower in magnitude
OV Overflow
OX Overflow exception
PC Program counter
PD Physical design
PGM Program exception
PGPR Pre-General Purpose Register
PID Process ID Register
PIR Processor Identification Register
PLB Processor local bus
PVR Processor Version Register
RET Return
RLD Reload dump
RMPD Real Mode Page Description Register
RPN Real page number
RSTCFG Reset Configuration
SBQ Store buffer queue
SEM Sequential execution model
SO Summary Overflow
SPR Special Purpose Register
SPRG Special Purpose Register General
SR Supervisor read
SSPCR Supervisor Search Priority Configuration Register
STID Set translation ID
SW Supervisor write
SX Supervisor execute
TBC Transfer Byte Count
TBL Time Base Lower
TBU Time Base Upper
TCR Timer Control Register
TCS Timer clock select
TID Translation ID
TLB Translation lookaside buffer
UDE Unconditional debug event
USPCR User Search Priority Configuration Register
UTLB Unified translation lookaside buffer
UX Underflow exception
VX Invalid operation exception
WIE Watchdog timer interrupt enable
WP Watchdog timer period
WRC Watchdog timer reset control
WRS Watchdog timer reset status
WS Word select
XER Integer Exception Register
XI Inexact exception
ZX Zero divide exception

Index

A
addressing, 33 addressing modes, 35 data storage, 35 instruction storage, 35 Alignment interrupt, 192 alignment interrupts, 192 allocated instruction summary, 55 ANSI/IEEE Standard 754-1985, 85 arithmetic compare, 63 asynchronous interrupt class, 167

B
BI field on conditional branches, 57 big endian defined, 36 structure mapping, 37 big endian mapping, 37 BO field on conditional branches, 57 branch instruction summary, 52 branch instructions, exception priorities for, 215 branch prediction, 58 branch processing, 56 branching control BI field on conditional branches, 57 BO field on conditional branches, 57 branch addressing, 56 branch prediction, 58 registers, 59 byte ordering, 36 big endian, defined, 36 instructions, 38, 39 little endian, defined, 36 structure mapping big-endian mapping, 37 little endian mapping, 37

C
cache management instructions summary data cache, 144 caching inhibited, 116 CCR0, 69, 150 code self-modifying, 140 coherence data cache, 144 compare arithmetic, 63 logical, 63 computational instructions, 96 Condition Register. See also CR context synchronization, 76 control data cache, 144 CR, 60 defined CR updating instructions, 61 instructions integer CR, 62 Critical Input interrupt, 185 critical interrupts, 169 Critical Save/Restore Register 0, 175, 176 Critical Save/Restore Register 1, 176, 177 CSRR0, 175, 176 CSRR1, 176, 177 CTR, 60

D
data addressing modes, 35 data cache coherency, 144 data cache array organization and operation, 133 data cache controller.
See DCC Data Cache Unit Overview, 31 data storage addressing modes, 35 Data Storage interrupt, 188 data storage interrupts, 188 Data TLB Error interrupt, 199 data TLB error interrupts, 199 dcbt functional description, 151 dcbt and dcbtst operation, 151 dcbtst functional description, 151 DCC (data cache controller) control, 144 debug, 144 features, 141 operations, 142 DCDBTRH, 151 DCDBTRL, 151 dcread functional description, 151 DCRs defined, 46 debug debug cache, 144 Debug Interrupt, 201 debug interrupts, 201 Decrementer Interrupt, 197 decrementer interrupts, 197 device control registers, 46 Device Control Registers. See also DCRs

E
E storage attribute, 36, 118 effective address calculation, 34 endianness, 36, 118 ESR, 179 exception alignment exception, 192 critical input exception, 185 data storage exception, 188 external input exception, 191 floating-point, 85 illegal instruction exception, 193 inexact, 85, 101 instruction storage exception, 190 instruction TLB miss exception, 201 machine check exception, 186 overflow, 85, 101 privileged instruction exception, 193 program exception, 193 system call exception, 197 trap exception, 196 underflow, 85, 101 zero divide, 85 exception priorities, 210 exception priorities for all other instructions, 216 allocated load and store instructions, 212 branch instructions, 215 floating-point load and store instructions, 211 integer load, store, and cache management instructions, 211 other allocated instructions, 213 other floating-point instructions, 212 preserved instructions, 215 privileged instructions, 214 reserved instructions, 216 return from interrupt instructions, 215 system call instruction, 214 trap instructions, 214 Exception Syndrome Register, 179 exception syndrome register, 179 Exceptions, 167 execution synchronization, 78 External Input interrupt, 191 external input interrupts, 191

F
Facilities, Debug, 29
Facilities, Test, 29 features DCC, 141 ICC, 134 Features, General, 26 Features, Power Control, 26 FEX, 87 fixed interval timer interrupt, 198 Fixed-Interval Timer interrupt, 198 floating point interrupt unavailable interrupts, 196 floating-point denormalized number, 91 infinity, 91 Not a Number, 92 sign, 93 zero, 91 floating-point compare and select instruction set index, 102 floating-point compare instructions comparison sets, 102 floating-point load and store instructions, exception priorities for, 211 floating-point multiply-add instructions, 100 floating-point operands, 96 double precision format, 96 single format, 96 floating-point rounding and conversion instruction set index, 101 floating-point status and control register, 102 instruction set index, 102 Floating-Point Unavailable interrupt, 196 Floating-Point Unit Overview, 30 freezing the timer facilities, 165

G
G storage attribute, 117 General, 26 General Purpose Registers. See also GPRs GPRs defined, 45 GPRs, illustrated, 63 guarded, 117

H, I, J, K
I storage attribute, 116 ICC (instruction cache controller) features, 134 operations, 134 implemented instruction set summary, 49 implicit update, 62 imprecise interrupts, 168 instruction partially executed, 172 instruction addressing modes, 35 instruction cache array organization and operation, 133 instruction cache controller.
See ICC Instruction Cache Overview, 30 instruction classes, 47 Instruction Set, 29 instruction set brief summaries by category, 87 classes, 47 summary allocated instructions, 55 branch, 52 cache management, 54 CR logical, 53 integer arithmetic, 50 integer compare, 51 integer logical, 51 integer rotate, 51 integer shift, 52 integer storage access, 50 integer trap, 51 processor synchronization, 54 register management, 53 system linkage, 53 TLB management, 54 instruction set summary, 49 instruction storage addressing modes, 35 Instruction Storage interrupt, 190 instruction storage interrupts, 190 Instruction TLB Error Interrupt, 201 instruction TLB error interrupts, 201 instructions all other, exception priorities for, 216 allocated (other), exception priorities for, 213 allocated load and store, exception priorities for, 212 branch, exception priorities for, 215 by category, 96 byte ordering, 38, 39 byte-reverse, 40 categories allocated instruction summary, 55 branch, 52 integer, 49 processor control, 52 storage control, 54 storage synchronization, 55 classes defined, 47, 48 preserved, 48 computational, 96 CR updating, 61 data cache management instruction summary, 144 floating-point (other), exception priorities for, 212 floating-point load and store, exception priorities for, 211 implemented instruction set summary, 49 integer compare CR update, 63 integer load, store, and cache management, exception priorities for, 211 mfmsr, 173 mtmsr, 173 noncomputational, 96 partially executed, 172 preserved, exception priorities for, 215 privileged, 75 privileged instructions, exception priorities for, 214 reserved, exception priorities for, 216 return from interrupt, exception priorities for, 215 rfi, 174 system call, exception priorities for, 214 trap, exception priorities for, 214 integer instructions arithmetic, 50 compare, 51 logical, 51 rotate, 51 shift, 52 storage access, 50 trap, 51 integer load, store, and cache management instructions,
exception priorities for, 211 integer processing, 63 interrupt alignment interrupt, 192 data storage interrupt, 188 external input interrupt, 191 instruction partially executed, 172 Instruction Storage, 190 instruction storage interrupt, 190 instruction TLB miss interrupt, 201 machine check interrupt, 186 masking, 207 guidelines for system software, 209 ordering, 207, 209 guidelines for system software, 209 program interrupt, 193 illegal instruction exception, 193 privileged instruction exception, 193 trap exception, 196 system call interrupt, 197 type Alignment, 192 Critical Input, 185 Data Storage, 188 Data TLB Error, 199 Debug, 201 Decrementer, 197 External Input, 191 Fixed-Interval Timer, 198 Floating-Point Unavailable, 196 Instruction TLB Error, 201 Machine Check, 186 Program interrupt, 193 System Call, 197 Watchdog Timer, 199 interrupt and exception handling registers ESR, 179 interrupt classes asynchronous, 167 critical and non-critical, 169 machine check, 169 synchronous, 168 interrupt vector, 170 Interrupts, 167 interrupts definitions, 182 imprecise, 168 order, 209 ordering and masking, 207 ordering and software, 208 partially executed instructions, 172 precise, 168 registers, processing, 173 synchronous and imprecise, 168 synchronous and precise, 168 types alignment, 192 data storage, 188 data TLB error, 199 debug, 201 decrementer, 197 definitions, 182 external inputs, 191 fixed interval timer, 198 floating point unavailable, 196 instruction storage, 190 instruction TLB error, 201 machine check, 186 program, 193 watchdog timer, 199 vectors, 170 invalid operation exception bit, 101

L
little endian structure mapping, 37 little endian mapping, 37 little endian, defined, 36 load operations, 142 logical compare, 63 LR, 59

M
M storage attribute, 117 Machine Check, 169 Machine Check interrupt, 186 machine check interrupts, 169, 186 Machine State Register.
See also MSR masking and ordering interrupts, 207 memory coherence required, 117 memory map, 33 memory organization, 33 mfmsr, 173 MSR, 173 defined, 46 mtmsr, 173

N
noncomputational instructions, 96 non-critical interrupts, 169

O
operands storage, 33 operations DCC, 142 ICC, 134 line flush, 143 load, 142 store, 142 ordering storage access, 143 ordering and masking interrupts, 207 Overview, 25 Overview, Instruction Cache, 30

P
partially executed instructions, 172 PIR, 68 precise interrupts, 168 preserved instructions, exception priorities for, 215 priorities, exception, 210 privileged instructions, 75 privileged mode, 75 privileged operation, 75 privileged SPRs, 75 problem state, 75 processor control instruction summary, 52 processor control instructions CR logical, 53 register management, 53 synchronization, 54 system linkage, 53 processor control registers, 66 Program interrupt, 193 program interrupts, 193 PVR, 68

R
register CSRR0, 175, 176 CSRR1, 176, 177 ESR, 179 SRR0, 174 SRR1, 175 registers, 40, 86 branching control, 59 CCR0, 69, 150 CR, 46, 60 CTR, 60 DCDBTRH, 151 DCDBTRL, 151 ESR, 179 GPRs, 63 interrupt processing, 173 LR, 59 MSR, 46, 173 PIR, 68 processor control, 66 PVR, 68 RSTCFG, 74 SPRG0-SPRG7, 67 SPRG0-SPRG3, 67 TCR, 161 TSR, 162 types, 45, 86 CR, 46 DCR, 46 GPR, 45 MSR, 46 SPR, 46 USPRG0, 67 XER, 64 registers, device control, 46 registers, summary, 40, 86 requirements software interrupt ordering, 208 reserved instructions, exception priorities for, 216 return from interrupt instructions, exception priorities for, 215 rfi, 174 RSTCFG, 74

S
Save/Restore Register 0, 174 Save/Restore Register 1, 175 self-modifying code, 140 software interrupt ordering requirements, 208 Special Purpose Registers.
See also SPRs speculative fetching, 76 SPRG0-SPRG7, 67 SPRG0-SPRG3, 67 SPRs defined, 46 SRR0, 174 SRR1, 175 storage access ordering, 143 storage attributes caching inhibited, 116 endian, 118 guarded, 117 Memory Coherence Required, 117 supported combinations, 118 user-definable (U0–U3), 118 write-through required, 116 storage control instruction summary, 54 storage control instructions cache management, 54 TLB management, 54 storage operands, 33 storage synchronization, 78 storage synchronization instruction summary, 55 store gathering, 142 store operations, 142 structure mapping big endian, 37 little endian, 37 supervisor state, 75 synchronization architectural references, 76 context, 76 execution, 78 storage, 78 synchronous interrupt class, 168 system call instruction, exception priorities for, 214 System Call interrupt, 197

T
TCR, 161 Test, 29 Test and Debug Facilities, 29 time base writing, 159 timers freezing the timer facilities, 165 watchdog timer, 161 watchdog timer state machine, 163 trap instructions exception priorities for, 214 TSR, 162

U, V, W
U0–U3 storage attributes, 118 user mode, 75 USPRG0, 67 W storage attribute, 116 Watchdog Timer interrupt, 199 watchdog timer interrupts, 199 write-through required, 116 writing the time base, 159

X
XER, 64 carry (CA) field, 66 overflow (OV) field, 65 summary overflow (SO) field, 65 transfer byte count (TBC) field, 66