Gameboy Advance / Nintendo DS - Technical Info - Extracted from no$gba version 2.6a
General ARM7TDMI Information
- CPU Overview
- CPU Register Set
- CPU Flags
- CPU Exceptions
- CPU Memory Alignments
The ARM7TDMI Instruction Sets
- THUMB Instruction Set
- ARM Instruction Set
- Pseudo Instructions and Directives
Further Information
- ARM CP15 System Control Coprocessor
- CPU Instruction Cycle Times
- CPU Versions
- CPU Data Sheet
CPU Overview

The ARM7TDMI is a 32bit RISC (Reduced Instruction Set Computer) CPU, designed by ARM (Advanced RISC Machines) for both high performance and low power consumption.

Fast Execution
Depending on the CPU state, all opcodes are either 32bit or 16bit wide (counting both the opcode bits and its parameter bits), which allows fast decoding and execution. Additionally, pipelining allows (a) one instruction to be executed while (b) the next instruction is decoded and (c) the instruction after that is fetched from memory, all at the same time.

Data Formats
The CPU handles 8bit, 16bit, and 32bit data, which are called:
- 8bit: Byte
- 16bit: Halfword
- 32bit: Word

The two CPU states
As mentioned above, two CPU states exist:
- ARM state: uses the full 32bit instruction set (32bit opcodes)
- THUMB state: uses a cut-down 16bit instruction set (16bit opcodes)
Regardless of the opcode width, both states use 32bit registers, allowing 32bit memory addressing as well as 32bit arithmetic/logical operations.

When to use ARM state
Basically, there are two advantages in ARM state:
- Each single opcode provides more functionality, resulting in faster execution when a 32bit bus memory system is used.
- All registers R0-R15 can be accessed directly.
The downsides are:
- Not as fast when using a 16bit memory system.
- Program code occupies more memory space.

When to use THUMB state
There are two major advantages in THUMB state:
- Faster execution, up to approx 160%, when using a 16bit bus memory system.
- Smaller code size.
The disadvantages are:
- The opcodes are less versatile than in ARM state, so it will sometimes be necessary to use more than one opcode to achieve the result of a single ARM opcode.
- Most opcodes can only use registers R0-R7 directly.

Combining ARM and THUMB state
Switching between ARM and THUMB state is done by a Branch and Exchange (BX) instruction, which takes only a handful of cycles to execute, so the state can be changed as often as desired with almost no overhead. Also, as ARM and THUMB use the same register set, data can be passed between ARM and THUMB code very easily.
The best memory and execution performance can be gained by combining both states: THUMB for normal program code, and ARM code for timing-critical subroutines (such as interrupt handlers or complicated algorithms).
Note: ARM and THUMB code cannot be executed simultaneously.

Automatic state changes
Besides the manual state switching by BX instructions described above, the following situations involve automatic state changes:
- The CPU switches to ARM state when entering an exception.
- The return instruction that leaves an exception handler restores the old state (ARM or THUMB).
CPU Register Set

Overview
The following table shows the ARM7TDMI register set available in each mode. There is a total of 37 registers (32bit each): 31 general registers (Rxx) and 6 status registers (xPSR).
Note that only some registers are 'banked'; for example, each mode has its own R14 register, called R14, R14_fiq, R14_svc, etc. Other registers are not banked; for example, every mode uses the same R0 register, so writing to R0 always affects the content of R0 in the other modes as well.

  System/User  FIQ        Supervisor Abort      IRQ        Undefined
  -----------------------------------------------------------------
  R0           R0         R0         R0         R0         R0
  R1           R1         R1         R1         R1         R1
  R2           R2         R2         R2         R2         R2
  R3           R3         R3         R3         R3         R3
  R4           R4         R4         R4         R4         R4
  R5           R5         R5         R5         R5         R5
  R6           R6         R6         R6         R6         R6
  R7           R7         R7         R7         R7         R7
  -----------------------------------------------------------------
  R8           R8_fiq     R8         R8         R8         R8
  R9           R9_fiq     R9         R9         R9         R9
  R10          R10_fiq    R10        R10        R10        R10
  R11          R11_fiq    R11        R11        R11        R11
  R12          R12_fiq    R12        R12        R12        R12
  R13 (SP)     R13_fiq    R13_svc    R13_abt    R13_irq    R13_und
  R14 (LR)     R14_fiq    R14_svc    R14_abt    R14_irq    R14_und
  R15 (PC)     R15        R15        R15        R15        R15
  -----------------------------------------------------------------
  CPSR         CPSR       CPSR       CPSR       CPSR       CPSR
  --           SPSR_fiq   SPSR_svc   SPSR_abt   SPSR_irq   SPSR_und
  -----------------------------------------------------------------

R0-R12 Registers (General Purpose Registers)
These thirteen registers may be used for any general purpose. Basically, each has the same functionality and performance, ie. there is no 'fast accumulator' for arithmetic operations, and no 'special pointer register' for memory addressing.
However, in THUMB mode only R0-R7 (Lo registers) may be accessed freely, while R8-R15 (Hi registers) can be accessed only by some instructions.

R13 Register (SP)
This register is used as Stack Pointer (SP) in THUMB state. In ARM state the user may decide to use R13 and/or other register(s) as stack pointer(s), or as a general purpose register.
As shown in the table above, there is a separate R13 register in each mode, and (when used as SP) each exception handler may (and MUST!) use its own stack.

R14 Register (LR)
This register is used as Link Register (LR). That is, when calling a subroutine by a Branch with Link (BL) instruction, the return address (ie. the address of the opcode following the BL) is saved in this register.
Storing the return address in LR is faster than pushing it to memory; however, as there is only one LR register per mode, the user must manually push its content before calling 'nested' subroutines. The same applies when an exception occurs: the return address is saved in the LR of the new mode.
Note: In ARM mode, R14 may also be used as a general purpose register, provided that the above usage as LR isn't required.

R15 Register (PC)
R15 is always used as program counter (PC). Note that reading R15 usually returns PC+nn because of read-ahead (pipelining), where 'nn' depends on the instruction and on the CPU state: typically PC+8 in ARM state and PC+4 in THUMB state.

CPSR and SPSR (Program Status Registers) (ARMv3 and up)
The current condition codes (flags) and CPU control bits are stored in the CPSR register. When an exception arises, the old CPSR is saved in the SPSR of the respective exception mode (much like PC is saved in LR).
For details refer to the chapter about CPU Flags.
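The banking rules above can be expressed as a small lookup from (mode, register number) to the physical register that is actually used. The following is a minimal Python sketch; the mode abbreviations ("usr", "sys", "fiq", "svc", "abt", "irq", "und") and the helper name `physical_register` are illustrative assumptions, not part of any emulator API.

```python
# Modes that have their own R13/R14 (and SPSR); User and System share the base set.
BANKED_SP_LR = {"fiq", "svc", "abt", "irq", "und"}


def physical_register(mode: str, n: int) -> str:
    """Return the name of the physical register behind Rn in the given mode."""
    if not 0 <= n <= 15:
        raise ValueError("register number must be 0..15")
    if n <= 7 or n == 15:
        return f"R{n}"                     # R0-R7 and PC are never banked
    if n <= 12:
        # R8-R12 are banked only in FIQ mode
        return f"R{n}_fiq" if mode == "fiq" else f"R{n}"
    # R13 (SP) and R14 (LR) are banked in every exception mode
    return f"R{n}_{mode}" if mode in BANKED_SP_LR else f"R{n}"
```

Enumerating all seven modes and all sixteen register numbers with this helper yields exactly 31 distinct names, matching the 31 general registers counted in the overview.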
CPU Flags

Current Program Status Register (CPSR)
  Bit    Expl.
  31     N - Sign Flag        (0=Not Signed, 1=Signed)
  30     Z - Zero Flag        (0=Not Zero, 1=Zero)
  29     C - Carry Flag       (0=Borrow/No Carry, 1=Carry/No Borrow)
  28     V - Overflow Flag    (0=No Overflow, 1=Overflow)
  27     Q - Sticky Overflow  (ARMv5TE and up only)
  26-8   Reserved
  7      I - IRQ disable      (0=Enable, 1=Disable)
  6      F - FIQ disable      (0=Enable, 1=Disable)
  5      T - State Bit        (0=ARM, 1=THUMB)
  4-0    M4-M0 - Mode Bits

Bit 31-28: Condition Code Flags (N,Z,C,V)
These bits reflect the results of logical or arithmetic instructions. In ARM mode, it is often optional whether an instruction modifies the flags; for example, a SUB instruction can be executed without modifying the condition flags.
In ARM state, all instructions can be executed conditionally depending on the flags, such as MOVEQ (Move if Z=1). In THUMB state, only branch instructions (jumps) can be conditional.

Bit 27: Sticky Overflow Flag (Q) - ARMv5TE and ARMv5TExP and up only
Used only by QADD, QSUB, QDADD, QDSUB, SMLAxy, and SMLAWy. These opcodes set the Q flag on overflow, but leave it unchanged otherwise. The Q flag can be tested/reset only by MRS/MSR opcodes.

Bit 27-8: Reserved Bits (except Bit 27 on ARMv5TE and up, see above)
These bits are reserved for possible future implementations. For best forward compatibility, the user should never change the state of these bits, and should not expect them to be set to a specific value.

Bit 7-0: Control Bits (I,F,T,M4-M0)
These bits may change when an exception occurs. In privileged (non-user) modes they may also be changed manually.
The interrupt bits I and F disable IRQ and FIQ interrupts respectively (a setting of "1" means disabled).
The T bit indicates the current CPU state (0=ARM, 1=THUMB). This bit should never be changed manually; switching between ARM and THUMB state must be done by BX instructions.
The mode bits M4-M0 contain the current operating mode:
  Binary Hex Dec  Expl.
  10000b 10h 16 - User (non-privileged)
  10001b 11h 17 - FIQ
  10010b 12h 18 - IRQ
  10011b 13h 19 - Supervisor (SWI)
  10111b 17h 23 - Abort
  11011b 1Bh 27 - Undefined
  11111b 1Fh 31 - System (privileged 'User' mode) (ARMv4 and up)
Writing any other values into the mode bits is not allowed.

Saved Program Status Registers (SPSR_<mode>)
In addition to the CPSR above, five Saved Program Status Registers exist: SPSR_fiq, SPSR_svc, SPSR_abt, SPSR_irq, SPSR_und.
Whenever the CPU enters an exception, the current status register (CPSR) is copied to the respective SPSR_<mode> register.
Note that there is only one SPSR for each mode, so nested exceptions inside of the same mode are allowed only if the exception handler saves the content of SPSR in memory. For example, for an IRQ exception: IRQ-mode is entered, and CPSR is copied to SPSR_irq. If the interrupt handler wants to enable nested IRQs, then it must first push SPSR_irq before doing so.
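As an illustration of the bit layout above, the following minimal Python sketch splits a 32bit CPSR/SPSR value into its fields. The function name `decode_psr` and the short mode names are illustrative assumptions; only the seven 32bit modes from the mode table are recognized.

```python
MODES = {0x10: "usr", 0x11: "fiq", 0x12: "irq", 0x13: "svc",
         0x17: "abt", 0x1B: "und", 0x1F: "sys"}


def decode_psr(psr: int) -> dict:
    """Split a CPSR/SPSR value into condition flags, control bits and mode."""
    return {
        "N": (psr >> 31) & 1,   # sign
        "Z": (psr >> 30) & 1,   # zero
        "C": (psr >> 29) & 1,   # carry / no borrow
        "V": (psr >> 28) & 1,   # overflow
        "Q": (psr >> 27) & 1,   # sticky overflow (ARMv5TE and up only)
        "I": (psr >> 7) & 1,    # 1 = IRQs disabled
        "F": (psr >> 6) & 1,    # 1 = FIQs disabled
        "T": (psr >> 5) & 1,    # 0 = ARM, 1 = THUMB
        "mode": MODES.get(psr & 0x1F, "invalid"),
    }
```

For example, the value 000000D3h (the state forced by Reset: I=1, F=1, T=0, Supervisor mode) decodes to mode "svc" with both interrupt disable bits set.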
CPU Exceptions

Exceptions are caused by interrupts or errors. In the ARM7TDMI the following exceptions may arise, sorted by priority, starting with the highest:
- Reset
- Data Abort
- FIQ
- IRQ
- Prefetch Abort
- Software Interrupt
- Undefined Instruction

Exception Vectors
The following are the exception vectors in memory. When an exception arises, the CPU is switched into ARM state, and the program counter (PC) is loaded with the respective address.
  Address   Exception                 Mode on Entry  Interrupt Flags
  BASE+00h  Reset                     Supervisor     I=1, F=1
  BASE+04h  Undefined Instruction     Undefined      I=1, F=unchanged
  BASE+08h  Software Interrupt (SWI)  Supervisor     I=1, F=unchanged
  BASE+0Ch  Prefetch Abort            Abort          I=1, F=unchanged
  BASE+10h  Data Abort                Abort          I=1, F=unchanged
  BASE+14h  (Reserved)                -              -
  BASE+18h  Normal Interrupt (IRQ)    IRQ            I=1, F=unchanged
  BASE+1Ch  Fast Interrupt (FIQ)      FIQ            I=1, F=1
BASE is normally 00000000h, but may optionally be FFFF0000h in some ARM CPUs. As there is only space for one ARM opcode at each of the above addresses, it is usually recommended to place a branch opcode into each vector, which then redirects to the actual exception handler's address.

Actions performed by CPU when entering an exception
  - R14_<new mode>=PC+nn   ;save old PC, ie. return address
  - SPSR_<new mode>=CPSR   ;save old flags
  - CPSR new T,M bits      ;set to T=0 (ARM state), and M4-0=new mode
  - CPSR new I bit         ;IRQs disabled (I=1), done by ALL exceptions
  - CPSR new F bit         ;FIQs disabled (F=1), done by Reset and FIQ only
  - PC=exception_vector    ;see table above
The above "PC+nn" depends on the type of exception. Basically, in ARM state the nn offset is caused by pipelining, and in THUMB state an identical ARM-style offset is generated (even though the base address may be only halfword-aligned).

Required user-handler actions when returning from an exception
Restore any general registers (R0-R14) that might have been modified by the exception handler. Use the return instruction listed in the respective description below; it restores both PC and CPSR, which automatically restores the old CPU state (THUMB or ARM) as well as the old state of the FIQ and IRQ disable flags.
As mentioned above (see actions on entering...), the return address is always saved in ARM-style format, so the exception handler may use the same return instruction regardless of whether the exception was generated in ARM or THUMB state.

FIQ (Fast Interrupt Request)
This interrupt is generated by a LOW level on the nFIQ input. It is intended to process timing-critical interrupts at a high priority, as fast as possible.
In addition to the common banked registers (R13_fiq, R14_fiq), five extra banked registers (R8_fiq-R12_fiq) are available in FIQ mode. The exception handler may freely access these registers without modifying the main program's R8-R12 registers (and without having to save those registers on the stack).
In privileged (non-user) modes, FIQs may also be disabled manually by setting the F bit in CPSR.
To return from FIQ mode (continuing at the following opcode):
  SUBS PC,R14,#4   ;both PC=R14_fiq-4, and CPSR=SPSR_fiq

IRQ (Normal Interrupt Request)
This interrupt is generated by a LOW level on the nIRQ input. Unlike FIQ, IRQ mode does not have its own banked R8-R12 registers.
IRQ has lower priority than FIQ, and IRQs are automatically disabled when an FIQ exception is executed.
In privileged (non-user) modes, IRQs may also be disabled manually by setting the I bit in CPSR.
To return from IRQ mode (continuing at the following opcode):
  SUBS PC,R14,#4   ;both PC=R14_irq-4, and CPSR=SPSR_irq

Software Interrupt
Generated by a software interrupt instruction (SWI). Intended for requesting a supervisor (operating system) function. The SWI instruction may also contain a parameter in the 'comment field' of the opcode: 24bit in ARM opcodes, 8bit in THUMB opcodes.
If your main program issues SWIs from both ARM and THUMB state, the exception handler must distinguish between 24bit comment fields in ARM opcodes and 8bit comment fields in THUMB opcodes (if necessary, determine the old state by examining the T bit in SPSR_svc).
However, in little endian mode, you could use only the most significant 8 bits of the 24bit ARM comment field (as done in the GBA, for example); the exception handler could then process the BYTE at [R14-2], regardless of whether it was called from ARM or THUMB state.
To return from Supervisor mode (continuing at the following opcode):
  MOVS PC,R14   ;both PC=R14_svc, and CPSR=SPSR_svc
Note: Like all other exceptions, SWIs are always executed in ARM state, no matter whether they were caused by an ARM or a THUMB state SWI instruction.
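The comment-field handling described above can be sketched as follows. This is a minimal Python illustration, assuming a little endian `bytes` object stands in for memory and that R14_svc and SPSR_svc are passed in as plain integers; the function names are hypothetical.

```python
def swi_number(mem: bytes, r14_svc: int, spsr_svc: int) -> int:
    """Return the SWI comment field, checking the T bit of SPSR_svc."""
    if (spsr_svc >> 5) & 1:
        # THUMB SWI: 16bit opcode at [R14_svc-2], 8bit comment in bits 0-7
        op = int.from_bytes(mem[r14_svc - 2:r14_svc], "little")
        return op & 0xFF
    # ARM SWI: 32bit opcode at [R14_svc-4], 24bit comment in bits 0-23
    op = int.from_bytes(mem[r14_svc - 4:r14_svc], "little")
    return op & 0xFFFFFF


def swi_number_gba_style(mem: bytes, r14_svc: int) -> int:
    """GBA-style convention: the function number is the BYTE at [R14-2].

    Works for both states if ARM callers put the number into the upper
    8 bits of the 24bit comment field (eg. SWI 060000h for function 6).
    """
    return mem[r14_svc - 2]
```

With an ARM "SWI 060000h" (EF060000h) at address 0 and a THUMB "SWI 06h" (DF06h) at address 4, both functions agree that function 6 was requested, which illustrates why the [R14-2] trick avoids checking the T bit.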
Undefined Instruction Exception (supported by ARMv3 and up)
This exception is generated when the CPU comes across an instruction which it cannot handle. Most likely this signals that the program has locked up, and that an error message should be displayed.
However, it might also be used to emulate custom functions, ie. as an additional 'SWI' instruction (which would use R14_und and SPSR_und, and would thus allow the Undefined Instruction handler to be executed from inside Supervisor mode without having to save R14_svc and SPSR_svc).
To return from Undefined mode (continuing at the following opcode):
  MOVS PC,R14   ;both PC=R14_und, and CPSR=SPSR_und
Note that not all unused opcodes necessarily produce an exception; for example, an ARM state Multiply instruction with Bit 6 set to "1" would be blindly accepted as a 'legal' opcode.

Abort (supported by ARMv3 and up)
Aborts (page faults) are mostly intended for virtual memory systems (ie. not used in the GBA, as far as I know); otherwise they might be used just to display an error message. Two types of aborts exist:
- Prefetch Abort (occurs during an instruction prefetch)
- Prefetch Abort (also occurs on BKPT opcodes, ARMv5 and up)
- Data Abort (occurs during a data access)
A virtual memory system's abort handler would then most likely determine the fault address: for a prefetch abort that is just "R14_abt-4". For a data abort, the THUMB or ARM instruction at "R14_abt-8" needs to be 'disassembled' in order to determine the addressed data in memory.
The handler would then fix the error by loading the respective memory page into physical memory, and then retry the SAME instruction by returning as follows:
  prefetch abort: SUBS PC,R14,#4   ;PC=R14_abt-4, and CPSR=SPSR_abt
  data abort:     SUBS PC,R14,#8   ;PC=R14_abt-8, and CPSR=SPSR_abt
Separate exception vectors exist for prefetch and data aborts; each handler should use the respective return instruction shown above.
Reset
Forces PC=BASE+00h (ie. 00000000h, or FFFF0000h on CPUs using the high vector base), and forces the control bits of CPSR to T=0 (ARM state), F=1 and I=1 (disable FIQ and IRQ), and M4-0=10011b (Supervisor mode).
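The entry steps and the vector table above can be combined into one small model. The following Python sketch assumes a toy CPU state held in a dict keyed by register names such as "CPSR", "PC", "R14_irq" and "SPSR_irq"; `enter_exception` and the exception keys are hypothetical names. The caller supplies the "PC+nn" return address, because nn depends on the exception type.

```python
# exception -> (vector offset, new mode M4-M0, does it also set F?)
VECTORS = {
    "reset":          (0x00, 0x13, True),
    "undefined":      (0x04, 0x1B, False),
    "swi":            (0x08, 0x13, False),
    "prefetch_abort": (0x0C, 0x17, False),
    "data_abort":     (0x10, 0x17, False),
    "irq":            (0x18, 0x12, False),
    "fiq":            (0x1C, 0x11, True),
}
SUFFIX = {0x11: "fiq", 0x12: "irq", 0x13: "svc", 0x17: "abt", 0x1B: "und"}


def enter_exception(cpu: dict, kind: str, return_addr: int, base: int = 0) -> None:
    """Apply the exception entry steps to the toy CPU state in place."""
    offset, mode, sets_f = VECTORS[kind]
    suffix = SUFFIX[mode]
    cpu["R14_" + suffix] = return_addr      # R14_<new mode> = PC+nn
    cpu["SPSR_" + suffix] = cpu["CPSR"]     # SPSR_<new mode> = CPSR
    cpsr = cpu["CPSR"] & ~0x3F              # clear T bit and M4-M0
    cpsr |= mode | 0x80                     # ARM state, new mode, I=1
    if sets_f:
        cpsr |= 0x40                        # F=1 for Reset and FIQ only
    cpu["CPSR"] = cpsr
    cpu["PC"] = base + offset               # jump to the exception vector
```

An IRQ taken while in THUMB System mode (CPSR=3Fh) leaves SPSR_irq=3Fh and switches to CPSR=92h (ARM state, IRQ mode, I=1), so the later SUBS PC,R14,#4 can restore THUMB state from SPSR_irq.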
CPU Memory Alignments

The CPU does NOT support accessing mis-aligned addresses (which would be rather slow, because the data would have to be merged/split into two accesses). When reading/writing code/data to/from memory, Words and Halfwords must be located at well-aligned memory addresses, ie. 32bit words aligned by 4, and 16bit halfwords aligned by 2.

Mis-aligned STR,STRH,STM,LDM,LDRD,STRD,PUSH,POP (forced align)
The mis-aligned low bit(s) are ignored; the memory access goes to a forcibly aligned (rounded-down) memory address.
For LDRD/STRD, it isn't clearly defined whether the address must be aligned by 8 (on the NDS, align-4 seems to be okay) (align-8 may be required on other CPUs with a 64bit data bus).

Mis-aligned LDR,SWP (rotated read)
Reads from the forcibly aligned address "addr AND (NOT 3)", and then rotates the data as "ROR (addr AND 3)*8". That effect is internally used by LDRB and LDRH opcodes (which then mask out the unused bits).
The SWP opcode works like a combination of LDR and STR, that means, it reads rotated, but writes unrotated.

Mis-aligned LDRH,LDRSH (does or does not do strange things)
  On ARM9 aka ARMv5 aka NDS9:
    LDRH Rd,[odd]   -->  LDRH Rd,[odd-1]         ;forced align
    LDRSH Rd,[odd]  -->  LDRSH Rd,[odd-1]        ;forced align
  On ARM7 aka ARMv4 aka NDS7/GBA:
    LDRH Rd,[odd]   -->  LDRH Rd,[odd-1] ROR 8   ;read to bit0-7 and bit24-31
    LDRSH Rd,[odd]  -->  LDRSB Rd,[odd]          ;sign-expand BYTE value

Mis-aligned PC/R15 (branch opcodes, or MOV/ALU/LDR with Rd=R15)
For ARM code, the low bits of the target address should usually be zero; otherwise, R15 is forcibly aligned by clearing the lower two bits.
For THUMB code, the low bit of the target address may/should/must be set; the bit is (or is not) interpreted as thumb bit (depending on the opcode), and R15 is then forcibly aligned by clearing the lower bit.
In short, R15 will always be forcibly aligned, so mis-aligned branches won't affect subsequent opcodes that use R15, or [R15+disp], as operand.
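The rotated-read rules above can be checked with a few lines of code. This is a minimal Python sketch, assuming a little endian `bytes` object as memory; the function names `ldr`, `ldrh_arm7` and `ldrh_arm9` are illustrative and model only the data returned, not timing or bus behaviour.

```python
MASK = 0xFFFFFFFF


def ror32(value: int, amount: int) -> int:
    """Rotate a 32bit value right by amount bits."""
    amount &= 31
    return ((value >> amount) | (value << (32 - amount))) & MASK


def ldr(mem: bytes, addr: int) -> int:
    """LDR: read the word at addr AND NOT 3, then ROR (addr AND 3)*8."""
    aligned = addr & ~3
    word = int.from_bytes(mem[aligned:aligned + 4], "little")
    return ror32(word, (addr & 3) * 8)


def ldrh_arm7(mem: bytes, addr: int) -> int:
    """ARM7/GBA LDRH: odd address reads [odd-1] and rotates right by 8."""
    aligned = addr & ~1
    half = int.from_bytes(mem[aligned:aligned + 2], "little")
    return ror32(half, (addr & 1) * 8)


def ldrh_arm9(mem: bytes, addr: int) -> int:
    """ARM9/NDS9 LDRH: odd address is simply forced-aligned."""
    aligned = addr & ~1
    return int.from_bytes(mem[aligned:aligned + 2], "little")
```

With memory bytes 11h,22h,33h,44h, an LDR from address 1 returns 11443322h, and an ARM7 LDRH from address 1 returns 11000022h, ie. the halfword's bytes land in bits 0-7 and 24-31 as described.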
THUMB Instruction Set

When operating in THUMB state, cut-down 16bit opcodes are used.
THUMB is supported on T-variants of ARMv4 and up, ie. ARMv4T, ARMv5T, etc.

Summary
- THUMB Instruction Summary
Register Operations
- THUMB.1: move shifted register
- THUMB.2: add/subtract
- THUMB.3: move/compare/add/subtract immediate
- THUMB.4: ALU operations
- THUMB.5: Hi register operations/branch exchange
Memory Addressing Operations
- THUMB.6: load PC-relative
- THUMB.7: load/store with register offset
- THUMB.8: load/store sign-extended byte/halfword
- THUMB.9: load/store with immediate offset
- THUMB.10: load/store halfword
- THUMB.11: load/store SP-relative
- THUMB.12: get relative address
- THUMB.13: add offset to stack pointer
- THUMB.14: push/pop registers
- THUMB.15: multiple load/store
Jumps and Calls
- THUMB.16: conditional branch
- THUMB.17: software interrupt and breakpoint
- THUMB.18: unconditional branch
- THUMB.19: long branch with link
(See also THUMB.5: BX Rs, and ADD/MOV PC,Rs.)
Note: Switching between ARM and THUMB state can be done by using the Branch and Exchange (BX) instruction.
THUMB Instruction Summary
The table below lists all THUMB mode instructions with clock cycles, affected CPSR flags, Format/chapter number, and description. Only registers R0..R7 can be used in THUMB mode (unless R8-R15, SP, PC are explicitly mentioned).
[Summary tables not preserved; columns: Instruction | Cycles | Flags | Format | Expl.]
Logical Operations
  Carry flag affected only if shift amount is non-zero.
Arithmetic Operations and Multiply
Jumps and Calls
  The THUMB BL instruction occupies two 16bit opcodes, 32bit in total.
Memory Load/Store

THUMB Binary Opcode Format
This table summarizes the position of opcode/parameter bits for THUMB mode instructions, Format 1-19.
[Bit-layout table not preserved; columns: Form | bits 15..0 | Further]
Further UNDEFs (??? ARM9?): 1011 0001 xxxxxxxx (reserved)
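Since the bit-layout table is not reproduced here, the following Python sketch shows one way to classify a 16bit THUMB opcode into Formats 1-19 by its leading bits. The (mask, value) pairs are written from the standard THUMB encoding and are an assumption of this sketch, not a transcription of the original table; entries are checked in order, so Format 2 must precede Format 1. E800h-EFFFh (the ARMv5 BLX suffix) is reported as None here.

```python
# (mask, value, format) checked in order; None marks a reserved/undefined range.
THUMB_FORMATS = [
    (0xF800, 0x1800, 2),      # add/subtract (must be tested before format 1)
    (0xE000, 0x0000, 1),      # move shifted register
    (0xE000, 0x2000, 3),      # move/compare/add/subtract immediate
    (0xFC00, 0x4000, 4),      # ALU operations
    (0xFC00, 0x4400, 5),      # Hi register operations/branch exchange
    (0xF800, 0x4800, 6),      # load PC-relative
    (0xF200, 0x5000, 7),      # load/store with register offset
    (0xF200, 0x5200, 8),      # load/store sign-extended byte/halfword
    (0xE000, 0x6000, 9),      # load/store with immediate offset
    (0xF000, 0x8000, 10),     # load/store halfword
    (0xF000, 0x9000, 11),     # load/store SP-relative
    (0xF000, 0xA000, 12),     # get relative address
    (0xFF00, 0xB000, 13),     # add offset to stack pointer
    (0xF600, 0xB400, 14),     # push/pop registers
    (0xFF00, 0xBE00, 17),     # BKPT (ARMv5 and up)
    (0xF000, 0xC000, 15),     # multiple load/store
    (0xFF00, 0xDF00, 17),     # SWI
    (0xFF00, 0xDE00, None),   # condition 1110b: undefined
    (0xF000, 0xD000, 16),     # conditional branch
    (0xF800, 0xE000, 18),     # unconditional branch
    (0xF000, 0xF000, 19),     # long branch with link
]


def thumb_format(op: int):
    """Return the THUMB format number (1..19) of a 16bit opcode, or None."""
    for mask, value, fmt in THUMB_FORMATS:
        if op & mask == value:
            return fmt
    return None               # eg. 1011 0001 xxxxxxxx (reserved)
```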
THUMB.1: move shifted register
Example: LSL Rd,Rs,#nn   ;Rd = Rs << nn   ;ARM equivalent: MOVS Rd,Rs,LSL #nn
A zero shift amount has a special meaning (same as for ARM shifts): LSL#0 performs no shift (the carry flag remains unchanged), and LSR/ASR#0 are interpreted as LSR/ASR#32. Attempts to specify LSR/ASR#0 in source code are automatically redirected as LSL#0, and source LSR/ASR#32 is redirected as opcode LSR/ASR#0.
Execution Time: 1S
Flags: Z=zeroflag, N=sign, C=carry (except LSL#0: C=unchanged), V=unchanged.
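The zero-shift rules above are easy to get wrong in an emulator, so here is a minimal Python sketch of the result and carry-out for the three THUMB.1 shifts. The function name and its parameters are illustrative; `nn` is the encoded 5bit field (0..31) and `carry` the incoming C flag.

```python
MASK = 0xFFFFFFFF


def thumb_shift_imm(kind: str, rs: int, nn: int, carry: int):
    """Return (result, carry_out) for LSL/LSR/ASR Rd,Rs,#nn (nn = encoded 0..31).

    Encoded nn=0 means: LSL#0 -> no shift, carry unchanged;
    LSR#0 and ASR#0 -> shift by 32.
    """
    rs &= MASK
    if kind == "LSL":
        if nn == 0:
            return rs, carry
        return (rs << nn) & MASK, (rs >> (32 - nn)) & 1   # last bit shifted out
    if kind == "LSR":
        if nn == 0:                                       # LSR#32
            return 0, rs >> 31
        return rs >> nn, (rs >> (nn - 1)) & 1
    if kind == "ASR":
        sign = rs >> 31
        if nn == 0:                                       # ASR#32
            return (MASK if sign else 0), sign
        signed = rs - (1 << 32) if sign else rs           # reinterpret as signed
        return (signed >> nn) & MASK, (rs >> (nn - 1)) & 1
    raise ValueError("kind must be LSL, LSR or ASR")
```

The N and Z flags then follow directly from the returned result (bit 31, and result==0).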
THUMB.2: add/subtract
Return: Rd contains result, N,Z,C,V affected (including MOV).
Execution Time: 1S
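How N,Z,C,V come out of an addition or subtraction can be shown with a short Python sketch. The names `add_flags` and `sub_flags` are illustrative. Subtraction is modelled as a + NOT b + 1, which is why C=1 means "no borrow" on ARM (see the Carry Flag entry in the CPSR table).

```python
MASK = 0xFFFFFFFF


def add_flags(a: int, b: int, carry_in: int = 0):
    """Return (result, N, Z, C, V) for a + b + carry_in (ADD, ADC, CMN)."""
    full = a + b + carry_in
    res = full & MASK
    n = res >> 31
    z = int(res == 0)
    c = int(full > MASK)                        # unsigned carry out of bit 31
    v = ((~(a ^ b) & (a ^ res)) >> 31) & 1      # same-sign inputs, other-sign result
    return res, n, z, c, v


def sub_flags(a: int, b: int, carry_in: int = 1):
    """Return (result, N, Z, C, V) for a - b - (1 - carry_in) (SUB, SBC, CMP)."""
    return add_flags(a, (~b) & MASK, carry_in)
```

For example, 3-5 yields FFFFFFFEh with N=1 and C=0 (a borrow occurred), while 80000000h-1 yields 7FFFFFFFh with C=1 and V=1 (signed overflow).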
THUMB.3: move/compare/add/subtract immediate
ARM equivalents for MOV/CMP/ADD/SUB are MOVS/CMP/ADDS/SUBS with the same format.
Execution Time: 1S
Return: Rd contains result (except CMP), N,Z,C,V affected (for MOV only N,Z).
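THUMB.4: ALU operations
ARM equivalent for NEG would be RSBS.
Return: Rd contains result (except TST,CMP,CMN).
Affected Flags:
  N,Z,C,V for ADC,SBC,NEG,CMP,CMN
  N,Z,C   for LSL,LSR,ASR,ROR (carry flag unchanged if zero shift amount)
  N,Z     for AND,EOR,TST,ORR,BIC,MVN
  N,Z     for MUL (carry flag destroyed on ARMv4 and below, unchanged on ARMv5 and up)
Execution Time:
  1S      for AND,EOR,ADC,SBC,TST,NEG,CMP,CMN,ORR,BIC,MVN
  1S+1I   for LSL,LSR,ASR,ROR
  1S+mI   for MUL (m=1..4, depending on the MSBs of the incoming Rd value, see ARM.7)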
THUMB.5: Hi register operations/branch exchange
Restrictions: For ADD/CMP/MOV, MSBs and/or MSBd must be set, ie. it is not allowed that both are cleared.
When using R15 (PC) as an operand, the value will be the address of the instruction plus 4 (ie. $+4). Exception: for BX R15, the CPU switches to ARM state, and PC is auto-aligned as (($+4) AND NOT 2).
For BX, MSBs may be 0 or 1, MSBd must be zero, Rd is not used/zero.
For BLX, MSBs may be 0 or 1, MSBd must be set, Rd is not used/zero.
For BX/BLX, when Bit 0 of the value in Rs is zero: the processor will be switched into ARM mode!
BLX may not use R15. BLX saves the return address as LR=$+3 (ie. the address of the next opcode, with the THUMB bit set).
Assemblers/Disassemblers should use MOV R8,R8 as NOP (in THUMB mode).
Return: Only CMP affects CPSR condition flags!
Execution Time: 1S for ADD/MOV/CMP; 2S+1N for ADD/MOV with Rd=R15, and for BX/BLX
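The BX/BLX rules above, including the BX R15 special case, can be summarized in a few lines. This minimal Python sketch uses the hypothetical name `thumb_bx`; `instr_addr` is the address of the BX/BLX opcode ($), and the ARM target is force-aligned by clearing the lower two bits as described under CPU Memory Alignments.

```python
MASK = 0xFFFFFFFF


def thumb_bx(instr_addr: int, rs: int, rs_value: int, link: bool = False):
    """Model THUMB BX Rs (link=False) or BLX Rs (link=True, ARMv5 and up).

    rs_value is the content of Rs; it is ignored for Rs=R15, which reads as $+4.
    Returns (new_pc, thumb_state, new_lr); new_lr is None for plain BX.
    """
    if rs == 15:
        if link:
            raise ValueError("BLX may not use R15")
        return ((instr_addr + 4) & ~2) & MASK, False, None    # BX R15 -> ARM
    lr = ((instr_addr + 2) | 1) if link else None            # $+3, THUMB bit set
    if rs_value & 1:
        return rs_value & ~1 & MASK, True, lr                # bit 0 set: THUMB
    return rs_value & ~3 & MASK, False, lr                   # bit 0 clear: ARM
```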
THUMB.6: load PC-relative
The value of PC will be interpreted as (($+4) AND NOT 2).
Return: No flags affected, data loaded into Rd.
Execution Time: 1S+1N+1I
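The effective address of this load follows directly from the rule above. In this minimal Python sketch (hypothetical name `thumb_ldr_pc_address`), `nn` is assumed to be the raw 8bit field of the opcode, which is scaled by 4 to give a word offset of 0..1020.

```python
def thumb_ldr_pc_address(instr_addr: int, nn: int) -> int:
    """Effective address of THUMB LDR Rd,[PC,#nn*4] (nn = 8bit field, 0..255)."""
    pc = (instr_addr + 4) & ~2      # ($+4) AND NOT 2: always word-aligned
    return pc + nn * 4
```

Two adjacent opcodes at 08000100h and 08000102h both see the same base 08000104h, which is why literal pools addressed this way are word-aligned.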
THUMB.7: load/store with register offset
Return: No flags affected, data loaded into Rd (LDR) or stored to memory (STR).
Execution Time: 1S+1N+1I for LDR, or 2N for STR
THUMB.8: load/store sign-extended byte/halfword
Return: No flags affected, data loaded into Rd (LDR) or stored to memory (STR).
Execution Time: 1S+1N+1I for LDR, or 2N for STR
THUMB.9: load/store with immediate offset
Return: No flags affected, data loaded into Rd (LDR) or stored to memory (STR).
Execution Time: 1S+1N+1I for LDR, or 2N for STR
THUMB.10: load/store halfword
Return: No flags affected, data loaded into Rd (LDR) or stored to memory (STR).
Execution Time: 1S+1N+1I for LDR, or 2N for STR
THUMB.11: load/store SP-relative
Return: No flags affected, data loaded into Rd (LDR) or stored to memory (STR).
Execution Time: 1S+1N+1I for LDR, or 2N for STR
THUMB.12: get relative address
Return: No flags affected, result in Rd.
Execution Time: 1S
THUMB.13: add offset to stack pointer
Return: No flags affected, SP adjusted.
Execution Time: 1S
THUMB.14: push/pop registers
In THUMB mode the stack is always 'full descending', ie. PUSH is equivalent to STMFD/STMDB and POP to LDMFD/LDMIA in ARM mode.
Examples:
  PUSH {R0-R3}   ;push R0,R1,R2,R3
Note: When calling a subroutine, the return address is stored in the LR register. If that subroutine calls further subroutines, PUSH {LR} must be used to save the return address on the stack; POP {PC} can then later be used to return from the subroutine.
POP {PC} ignores the least significant bit of the return address (the processor remains in THUMB state even if bit 0 was cleared). When intending to return with an optional mode switch, use a POP/BX combination (eg. POP {R3} / BX R3).
ARM9: POP {PC} copies the LSB to the thumb bit (switches to ARM if bit 0 = 0).
Return: No flags affected, SP adjusted, registers loaded/stored.
Execution Time: nS+1N+1I (POP), (n+1)S+2N+1I (POP PC), or (n-1)S+2N (PUSH).
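The full-descending convention can be illustrated with a tiny model. In this Python sketch, `push` and `pop` are hypothetical helpers; memory is simplified to a dict of word addresses, and registers are a dict keyed by register number (14 = LR, 15 = PC). The lowest-numbered register always occupies the lowest address.

```python
def push(mem: dict, sp: int, regs: dict, rlist: list) -> int:
    """PUSH {rlist} as STMDB SP!: store registers, return the new SP."""
    sp -= 4 * len(rlist)                       # full descending: decrement first
    for i, r in enumerate(sorted(rlist)):      # lowest register at lowest address
        mem[sp + 4 * i] = regs[r]
    return sp


def pop(mem: dict, sp: int, regs: dict, rlist: list) -> int:
    """POP {rlist} as LDMIA SP!: load registers, return the new SP."""
    for i, r in enumerate(sorted(rlist)):
        regs[r] = mem[sp + 4 * i]
    return sp + 4 * len(rlist)
```

A PUSH {R0,R1,LR} followed by POP {R0,R1,PC} moves the saved LR value straight into PC; on ARMv4 the CPU would then ignore bit 0 of that value, as noted above.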
THUMB.15: multiple load/store
Both STM and LDM increment the base register.
The lowest register in the list (ie. R0, if it's in the list) is stored/loaded at the lowest memory address.
Examples:
  STMIA R7!,{R0-R2}   ;store R0,R1,R2
Return: No flags affected, Rb adjusted, registers loaded/stored.
Execution Time: nS+1N+1I for LDM, or (n-1)S+2N for STM.
Strange Effects on Invalid Rlists
Empty Rlist: R15 loaded/stored (ARMv4 only), and Rb=Rb+40h (ARMv4-v5).
Writeback with Rb included in Rlist: store OLD base if Rb is the FIRST entry in Rlist, otherwise store NEW base (STM/ARMv4); always store OLD base (STM/ARMv5); no writeback (LDM/ARMv4/ARMv5; at this point, THUMB opcodes work differently than ARM opcodes).
THUMB.16: conditional branch
Destination address must be halfword aligned (ie. bit 0 cleared).
Return: No flags affected, PC adjusted if condition true.
Execution Time: 2S+1N if condition true (jump executed); 1S if condition false.
THUMB.17: software interrupt and breakpoint
SWI is intended for calls to the operating system: it enters Supervisor mode (SVC) in ARM state.
BKPT is intended for debugging: it enters Abort mode in ARM state via the Prefetch Abort vector.
Execution SWI/BKPT:
  R14_svc=PC+2     R14_abt=PC+4     ;save return address
  SPSR_svc=CPSR    SPSR_abt=CPSR    ;save CPSR flags
  CPSR=<changed>   CPSR=<changed>   ;enter svc/abt, ARM state, IRQs disabled
  PC=BASE+08h      PC=BASE+0Ch      ;jump to SWI/Prefetch Abort vector
Execution Time: 2S+1N

Interpreting the Comment Field
The immediate parameter is ignored by the processor; the user interrupt handler may read out this number by examining the lower 8 bits of the 16bit opcode at [R14_svc-2].
If your program also executes SWIs from ARM mode, the SWI handler must examine the T bit in SPSR_svc to determine whether it was an ARM SWI; if so, it examines the lower 24 bits of the 32bit opcode at [R14_svc-4].
For returning from SWI use this instruction:
  MOVS PC,R14
That instruction restores both PC and CPSR, ie. PC=R14_svc, and CPSR=SPSR_svc. In this case (as called from THUMB mode), this also includes restoring THUMB mode.
Nesting SWIs: SPSR_svc and R14_svc should be saved on the stack before either invoking nested SWIs, or (if the IRQ handler uses SWIs) before enabling IRQs.
THUMB.18: unconditional branch
Return: No flags affected, PC adjusted.
Execution Time: 2S+1N
THUMB.19: long branch with link
This may be used to call (or jump to) a subroutine; the return address is saved in LR (R14).
Unlike all other THUMB mode instructions, this instruction occupies 32bit of memory, split into two 16bit THUMB opcodes.
First Instruction - LR = PC+4+(nn SHL 12)   ;PC = address of the first opcode
Second Instruction - PC = LR + (nn SHL 1), and LR = PC+2 OR 1 (and BLX: T=0)   ;PC = address of the second opcode
The destination address range is (PC+4)-400000h..+3FFFFEh, ie. PC+/-4M. Target must be halfword-aligned.
As Bit 0 in LR is set, it may be used to return by a BX LR instruction (keeping the CPU in THUMB mode).
Return: No flags affected, PC adjusted, return address in LR.
Execution Time: 3S+1N (first opcode 1S, second opcode 2S+1N).
Note: Exceptions may or may not occur between the first and second opcode; this is "implementation defined" ???
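The two-opcode arithmetic above can be sketched as follows. This minimal Python example assumes the 11bit field of the first opcode is signed (which gives the +/-4M range stated above) and that both opcodes are given as raw 16bit values; `thumb_bl` and `sign_extend` are illustrative names. The BLX variant is not modelled.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def thumb_bl(first_addr: int, op1: int, op2: int):
    """Return (target, return_lr) for a THUMB BL pair located at first_addr.

    op1 is the first opcode (F000h-F7FFh), op2 the second (F800h-FFFFh).
    """
    # First opcode: LR = PC+4+(nn SHL 12), with PC = address of the first opcode
    lr = first_addr + 4 + (sign_extend(op1 & 0x7FF, 11) << 12)
    # Second opcode: PC = LR + (nn SHL 1)
    target = (lr + ((op2 & 0x7FF) << 1)) & 0xFFFFFFFF
    # ... and LR = (address of the second opcode)+2, with bit 0 set
    return_lr = (first_addr + 2 + 2) | 1
    return target, return_lr
```

For a BL at 08001000h encoded as F7FEh,FFFEh, the first half yields LR=07FFF004h and the second half adds FFCh, giving the target 08000000h and the return address 08001005h (08001004h with the THUMB bit set).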
ARM Instruction Set

When operating in ARM state, full 32bit opcodes are used.

Summaries
- ARM Instruction Summary
- ARM Condition Field
Jumps and Calls
- ARM.3: Branch and Exchange (BX, BLX)
- ARM.4: Branch and Branch with Link (B, BL, BLX)
(Also, most ALU, LDR, and LDM opcodes can change PC.)
Register Operations
- ARM.5: Data Processing
- ARM.6: PSR Transfer (MRS, MSR)
- ARM.7: Multiply and Multiply-Accumulate (MUL,MLA)
Memory Addressing Operations
- ARM.9: Single Data Transfer (LDR, STR, PLD)
- ARM.10: Halfword, Doubleword, and Signed Data Transfer
- ARM.11: Block Data Transfer (LDM,STM)
- ARM.12: Single Data Swap (SWP)
Exception Calls and Coprocessor
- ARM.13: Software Interrupt (SWI,BKPT)
- ARM.14: Coprocessor Data Operations (CDP)
- ARM.15: Coprocessor Data Transfers (LDC,STC)
- ARM.16: Coprocessor Register Transfers (MRC, MCR)
- ARM.X: Coprocessor Double-Register Transfer (MCRR,MRRC)
- ARM.17: Undefined Instruction
- ARM.X: Count Leading Zeros
- ARM.X: QADD/QSUB
- ARM 26bit Memory Interface
Note: Switching between ARM and THUMB state can be done by using the Branch and Exchange (BX) instruction.
ARM Instruction Summary
Modification of CPSR flags is optional for all {S} instructions.
[Summary tables not preserved; columns: Instruction | Cycles | Flags | Format | Expl.]
Logical Operations
  Add x=1I cycles if Op2 is shifted-by-register. Add y=1S+1N cycles if Rd=R15.
  Carry flag affected only if Op2 contains a non-zero shift amount.
Arithmetic Operations
  Add x=1I cycles if Op2 is shifted-by-register. Add y=1S+1N cycles if Rd=R15.
Multiply
Memory Load/Store
  For LDR/LDM, add y=1S+1N if Rd=R15, or if R15 is in Rlist.
Jumps, Calls, CPSR Mode, and others
Coprocessor Functions (if any)
Note that no sections ARM.1 and ARM.2 exist, because the section numbers comply with the chapter numbers of the official ARM docs, which described ARM opcodes in chapters 3-17.

ARM Binary Opcode Format
[Bit-layout table, bits 31..0, not preserved]
ARM Condition Field
In ARM mode, all instructions can be conditionally executed depending on the state of the CPSR flags (C,N,Z,V). The respective suffix {cond} must be appended to the mnemonic. For example: BEQ = Branch if Equal, MOVMI = Move if Minus (negative).
  Code Suffix Flags         Meaning
  0:   EQ     Z=1           equal (zero) (same)
  1:   NE     Z=0           not equal (nonzero) (not same)
  2:   CS/HS  C=1           unsigned higher or same (carry set)
  3:   CC/LO  C=0           unsigned lower (carry cleared)
  4:   MI     N=1           negative (minus)
  5:   PL     N=0           positive or zero (plus)
  6:   VS     V=1           overflow (V set)
  7:   VC     V=0           no overflow (V cleared)
  8:   HI     C=1 and Z=0   unsigned higher
  9:   LS     C=0 or Z=1    unsigned lower or same
  A:   GE     N=V           greater or equal
  B:   LT     N<>V          less than
  C:   GT     Z=0 and N=V   greater than
  D:   LE     Z=1 or N<>V   less or equal
  E:   AL     -             always
  F:   NV     -             never (ARMv1,v2 only) (Reserved ARMv3 and up)
To define a non-conditional instruction which is always executed (regardless of any flags), the AL suffix may be used; that is the same as if no suffix is specified. For example, MOVAL would usually be abbreviated to MOV.
ARMv5 and up include a few additional opcodes without a condition field, which cannot be made conditional: BKPT, PLD, CDP2, LDC2, MCR2, MRC2, STC2, and BLX_imm (however, BLX_reg can be conditional).
Execution Time: If condition=false: 1S cycle. Otherwise as specified for the respective opcode.
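The condition table above translates directly into code. This minimal Python sketch (hypothetical name `condition_passed`) evaluates the 4bit condition field, bits 31-28 of an ARM opcode, against the current flags; code F is treated as "never" here, which matches only ARMv1/v2.

```python
def condition_passed(cond: int, n: int, z: int, c: int, v: int) -> bool:
    """Evaluate the ARM condition field (bits 31-28) against the CPSR flags."""
    checks = (
        z == 1,                # 0: EQ
        z == 0,                # 1: NE
        c == 1,                # 2: CS/HS
        c == 0,                # 3: CC/LO
        n == 1,                # 4: MI
        n == 0,                # 5: PL
        v == 1,                # 6: VS
        v == 0,                # 7: VC
        c == 1 and z == 0,     # 8: HI
        c == 0 or z == 1,      # 9: LS
        n == v,                # A: GE
        n != v,                # B: LT
        z == 0 and n == v,     # C: GT
        z == 1 or n != v,      # D: LE
        True,                  # E: AL
        False,                 # F: NV (never; reserved on ARMv3 and up)
    )
    return checks[cond & 0xF]
```

After CMP R0,R1 with R0=3 and R1=5, the flags are N=1,Z=0,C=0,V=0, so LT, CC and LS pass while GE and HI fail.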
ARM.3: Branch and Exchange (BX, BLX)
Switching to THUMB mode: set Bit 0 of the value in Rn to 1; the program then continues at Rn-1 in THUMB mode.
Using R15 (PC+8 itself) as operand results in undefined behaviour.
Execution Time: 2S+1N
Return: No flags affected.
ARM.4: Branch and Branch with Link (B, BL, BLX)
Branch (B) performs a plain jump. Branch with Link (BL) is meant to be used to call a subroutine; the return address is then saved in R14.
The subroutine may then 'return' by MOV PC,R14, for example.
Execution Time: 2S+1N
Return: No flags affected.
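For reference, the B/BL target computation can be sketched as below. This assumes the standard ARM encoding (cond,101b,L bit 24, signed 24bit word offset in bits 23-0) and the PC+8 read-ahead mentioned in the register chapter; `arm_branch` is a hypothetical name, and the BLX_imm form is not modelled.

```python
def arm_branch(instr_addr: int, opcode: int):
    """Decode ARM B/BL (bits 27-25 = 101b). Returns (target, link, lr)."""
    imm24 = opcode & 0xFFFFFF
    offset = (imm24 ^ 0x800000) - 0x800000          # sign-extend 24 bits
    target = (instr_addr + 8 + offset * 4) & 0xFFFFFFFF
    link = bool(opcode & (1 << 24))                 # L bit: BL instead of B
    lr = instr_addr + 4 if link else None           # return to the next opcode
    return target, link, lr
```

The classic endless loop "B $" assembles to EAFFFFFEh: the offset is -2 words, which cancels the +8 read-ahead and branches back to the same opcode.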
ARM.5: Data Processing
Second Operand (Op2)
This may be a shifted register, or a shifted immediate. See Bit 25 and 11-0.
Unshifted Register: Specify Op2 as "Rm"; the assembler converts it to "Rm,LSL#0".
Shifted Register: Specify as "Rm,SSS#Is" or "Rm,SSS Rs" (SSS=LSL/LSR/ASR/ROR).
Immediate: Specify as 32bit value, for example "#000NN000h"; the assembler should automatically convert it into "#0NNh,ROR#0ssh" as far as possible (ie. as long as all non-zero bits of the immediate fit into an 8bit field rotated right by an even amount).

Zero Shift Amount (Shift Register by Immediate, with Immediate=0)
  LSL#0: No shift performed, ie. directly Op2=Rm, the C flag is NOT affected.
  LSR#0: Interpreted as LSR#32, ie. Op2 becomes zero, C becomes Bit 31 of Rm.
  ASR#0: Interpreted as ASR#32, ie. Op2 and C are filled by Bit 31 of Rm.
  ROR#0: Interpreted as RRX#1 (RCR), like ROR#1, but Op2 Bit 31 set to old C.
In source code, LSR#32, ASR#32, and RRX#1 should be specified as such; attempts to specify LSR#0, ASR#0, or ROR#0 will be internally converted to LSL#0 by the assembler.

Using R15 (PC)
When using R15 as destination (Rd), note the CPSR description and the execution time description below.
When using R15 as operand (Rm or Rn), the returned value depends on the instruction: PC+12 if I=0,R=1 (shift by register), otherwise PC+8 (shift by immediate).

Returned CPSR Flags
  If S=1, Rd<>R15, logical operations (AND,EOR,TST,TEQ,ORR,MOV,BIC,MVN):
    V=not affected
    C=carryflag of shift operation (not affected if LSL#0 or unrotated Imm)
    Z=zeroflag of result
    N=signflag of result (result bit 31)
  If S=1, Rd<>R15, arithmetic operations (SUB,RSB,ADD,ADC,SBC,RSC,CMP,CMN):
    V=overflowflag of result
    C=carryflag of result
    Z=zeroflag of result
    N=signflag of result (result bit 31)
  If S=1, with unused Rd bits=1111b, {P} opcodes (CMPP/CMNP/TSTP/TEQP):
    R15=result   ;modify PSR bits in R15, ARMv2 and below only.
  If S=1, Rd=R15; should not be used in user mode:
    CPSR = SPSR_<current mode>
    PC = result
  If S=0: Flags are not affected (not allowed for CMP,CMN,TEQ,TST).
The instruction "MOV R0,R0" is used as "NOP" opcode in 32bit ARM state.
Execution Time: (1+p)S+rI+pN, where r=1 if I=0 and R=1 (ie. shift by register), otherwise r=0; and p=1 if Rd=R15, otherwise p=0.
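The immediate conversion an assembler performs for Op2 can be sketched as a search over the 16 possible even rotations. In this minimal Python example, `encode_arm_imm` and `decode_arm_imm` are illustrative names; where several encodings exist, this sketch simply returns the one with the smallest rotation, which is a choice of the sketch and not necessarily what every assembler does.

```python
MASK = 0xFFFFFFFF


def decode_arm_imm(imm8: int, rot: int) -> int:
    """Op2 value of an immediate: imm8 rotated right by 2*rot."""
    shift = 2 * rot
    return ((imm8 >> shift) | (imm8 << (32 - shift))) & MASK


def encode_arm_imm(value: int):
    """Find (imm8, rot) with value == imm8 ROR (2*rot), or None if impossible."""
    value &= MASK
    for rot in range(16):
        shift = 2 * rot
        # rotating left by 2*rot undoes a right rotation by 2*rot
        imm = ((value << shift) | (value >> (32 - shift))) & MASK
        if imm <= 0xFF:
            return imm, rot
    return None
```

For example, 0003F000h encodes as 3Fh with rot=10 (ROR 20), while 101h cannot be encoded, because its two set bits are 9 bits apart and never fit into one 8bit field.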
ARM.6: PSR Transfer (MRS, MSR)
These instructions occupy an unused area (TEQ,TST,CMP,CMN with S=0) of Data Processing opcodes (ARM.5).
MSR/MRS and CPSR/SPSR are supported by ARMv3 and up. ARMv2 and below contained the PSR flags in R15, accessed by CMP/CMN/TST/TEQ{P}.
The field mask bits specify which bits of the destination Psr are writable (or write-protected); one or more of these bits should be set. For example, CPSR_fsxc (aka CPSR aka CPSR_all) unlocks all bits (see the user mode restriction below, though).
Restrictions:
In non-privileged mode (user mode): only the condition code bits of CPSR can be changed; control bits can't.
Only the SPSR of the current mode can be accessed; in User and System modes no SPSR exists.
The T bit may not be changed; for THUMB/ARM switching use the BX instruction.
Unused bits in CPSR are reserved for future use and should never be changed (except for unused bits in the flags field).
Execution Time: 1S.
Note: The A22i assembler recognizes MOV as an alias for both MSR and MRS, because it is practically impossible to remember whether MSR or MRS is the load or store opcode, and/or whether it loads to or from the Psr register.
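The effect of the field mask can be modelled as a simple bit merge. This Python sketch assumes the usual byte-wide field letters (f = bits 31-24, s = bits 23-16, x = bits 15-8, c = bits 7-0); `msr_write` is a hypothetical helper, and it does not enforce the rule that the T bit must not be changed.

```python
FIELD_BITS = {"f": 0xFF000000, "s": 0x00FF0000, "x": 0x0000FF00, "c": 0x000000FF}


def msr_write(old_psr: int, value: int, fields: str, user_mode: bool) -> int:
    """Return the PSR after MSR <psr>_<fields>,value (eg. fields="fsxc" or "f")."""
    mask = 0
    for letter in fields:
        mask |= FIELD_BITS[letter]
    if user_mode:
        mask &= FIELD_BITS["f"]     # user mode: only the condition flag byte
    return (old_psr & ~mask & 0xFFFFFFFF) | (value & mask)
```

Writing F00000D3h with CPSR_fsxc replaces everything in a privileged mode, while the same write from User mode only changes the flags and leaves the mode bits at 10h.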
ARM.7: Multiply and Multiply-Accumulate (MUL,MLA)
Restrictions: Rd may not be the same as Rm. Rd,Rn,Rs,Rm may not be R15.
Note: Only the lower 32 bits of the internal 64bit result are stored in Rd, thus no sign/zero extension is required, and MUL and MLA can be used for both signed and unsigned calculations!
Execution Time: 1S+mI for MUL, and 1S+(m+1)I for MLA, where 'm' depends on whether/how many of the most significant bits of Rs are all zero or all one: m=1 if Bit 31-8 are all zero/one, m=2 for Bit 31-16, m=3 for Bit 31-24, and m=4 otherwise.
Flags (if S=1): Z=zeroflag, N=signflag, C=destroyed (ARMv4 and below) or C=not affected (ARMv5 and up), V=not affected.
MUL/MLA supported by ARMv2 and up.

Multiply Long and Multiply-Accumulate Long (MULL, MLAL)
Optionally supported: INCLUDED in ARMv3M, EXCLUDED in ARMv4xM/ARMv5xM.
Restrictions: RdHi, RdLo, Rm must be different registers. R15 may not be used.
Execution Time: 1S+(m+1)I for MULL, and 1S+(m+2)I for MLAL, where 'm' depends on whether/how many of the most significant bits of Rs are "all zero" (UMULL/UMLAL) or "all zero or all one" (SMULL,SMLAL): m=1 for Bit 31-8, m=2 for Bit 31-16, m=3 for Bit 31-24, and m=4 otherwise.
Flags (if S=1): Z=zeroflag, N=signflag, C=destroyed (ARMv4 and below) or C=not affected (ARMv5 and up), V=destroyed??? (ARMv4 and below???) or V=not affected (ARMv5 and up).

Signed Halfword Multiply (SMLAxy,SMLAWy,SMLALxy,SMULxy,SMULWy)
Supported by E variants of ARMv5 and up, ie. ARMv5TE(xP).
The Q flag gets set on 32bit SMLAxy/SMLAWy addition overflows; however, the result is NOT saturated (as it would be by QADD opcodes). The Q flag is NOT affected by (rare) 64bit SMLALxy addition overflows. SMULxy/SMULWy cannot overflow, and thus leave the Q flag unchanged as well. NZCV flags are not affected by halfword multiplies.
Execution Time: 1S+Interlock (SMULxy,SMLAxy,SMULWy,SMLAWy)
Execution Time: 1S+1I+Interlock (SMLALxy)
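The 'm' factor in the timings above depends only on the upper bits of Rs. A minimal Python sketch (hypothetical name `mul_m`) with `signed=True` for MUL/MLA/SMULL/SMLAL ("all zero or all one") and `signed=False` for UMULL/UMLAL ("all zero" only):

```python
def mul_m(rs: int, signed: bool = True) -> int:
    """Cycle factor m (1..4) for a multiply whose Rs operand is rs."""
    rs &= 0xFFFFFFFF
    for m, low_bit in ((1, 8), (2, 16), (3, 24)):
        upper = rs >> low_bit                      # bits 31..low_bit
        all_ones = (1 << (32 - low_bit)) - 1
        if upper == 0 or (signed and upper == all_ones):
            return m
    return 4
```

So a MUL by FFFFFF80h (-128) takes 1S+1I, whereas a UMULL with the same Rs value takes 1S+5I (m=4).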
Opcode Format
Instruction Formats for <Address>
An expression which generates an address:
  <expression>                  ;an immediate used as address
Pre-indexed addressing specification:
  [Rn]                          ;offset = zero
Post-indexed addressing specification:
  [Rn], <#{+/-}expression>      ;offset = immediate
Whereas...
  <shift>  immediate shift such like LSL#4, ROR#2, etc. (see ARM.5).
Notes
Shift amount 0 has a special meaning, as described in ARM.5 Data Processing.
When writing a word (32bit) to memory, the address should be word-aligned.
When reading a byte from memory, the upper 24 bits of Rd are zero-extended.
LDR PC,<op> on ARMv4 leaves CPSR.T unchanged.
LDR PC,<op> on ARMv5 sets CPSR.T to <op> Bit0 (1=Switch to Thumb).
When reading a word from a halfword-aligned address (located in the middle between two word-aligned addresses), the lower 16bit of Rd will contain [address], ie. the addressed halfword, and the upper 16bit of Rd will contain [address-2], ie. more or less unwanted garbage. However, by isolating the lower bits this may be used to read a halfword from memory. (The above applies to little endian mode, as used in the GBA.)
In a virtual memory based environment (ie. not in the GBA), aborts (ie. page faults) may take place during execution; if so, Rm and Rn should not specify the same register when post-indexing is used, as the abort handler might have problems reconstructing the original value of the register.
Return: CPSR flags are not affected.
Execution Time: For normal LDR: 1S+1N+1I. For LDR PC: 2S+2N+1I. For STR: 2N.
PLD <Address>  ;Prepare Cache for Load
PLD must use the following settings: cond=1111b, P=1, B=1, W=0, L=1, Rd=1111b; the address may not use post-indexing, and may not use writeback. The opcode is encoded identically to LDRNVB R15,<Address>.
PLD signals to the memory system that a specific memory address will soon be accessed; the memory system may use this hint to prepare caching/pipelining. Aside from that, PLD does not have any effect on the program logic, and behaves identically to NOP.
PLD supported by ARMv5TE only, not ARMv5, not ARMv5TExP.
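A sketch of the misaligned word read described above, assuming ARMv4 behaviour as on the GBA: the word-aligned word is read and rotated right by 8 times the two low address bits. The halfword-offset case matches the text; that odd offsets (1 and 3) follow the same rotation is stated here as an assumption. Memory is modelled as a little-endian byte array, and `ldr_word` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit read of the word-aligned address containing addr. */
static uint32_t read_word_aligned(const uint8_t *mem, uint32_t addr)
{
    addr &= ~3u;
    return (uint32_t)mem[addr] | (uint32_t)mem[addr + 1] << 8 |
           (uint32_t)mem[addr + 2] << 16 | (uint32_t)mem[addr + 3] << 24;
}

/* LDR Rd,[addr] for a possibly misaligned addr: rotate right by 8*(addr&3). */
uint32_t ldr_word(const uint8_t *mem, size_t size, uint32_t addr)
{
    assert((size_t)(addr | 3u) < size);
    uint32_t w = read_word_aligned(mem, addr);
    unsigned rot = 8 * (addr & 3u);
    return rot ? (w >> rot) | (w << (32 - rot)) : w;
}
```

With bytes 11h,22h,33h,44h at addresses 0..3, a read from address 2 gives 22114433h: the low halfword 4433h is [address], the high halfword 2211h is [address-2], and masking with FFFFh isolates the wanted halfword.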
Opcode Format
STRH,LDRH,LDRSB,LDRSH supported on ARMv4 and up.
STRD/LDRD supported on ARMv5TE only, not ARMv5, not ARMv5TExP.
STRD/LDRD: base writeback: Rn should not be the same as R(d) or R(d+1).
STRD: index register: Rm should not be the same as R(d) or R(d+1).
STRD/LDRD: Rd must be an even numbered register (R0,R2,R4,R6,R8,R10,R12).
STRD/LDRD: Address must be double-word aligned (multiple of eight).
Instruction Formats for <Address>
An expression which generates an address:
  <expression>                  ;an immediate used as address
Pre-indexed addressing specification:
  [Rn]                          ;offset = zero
Post-indexed addressing specification:
  [Rn], <#{+/-}expression>      ;offset = immediate
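The STRD/LDRD operand rules listed above can be collected into a single check. A hypothetical C helper, passing `rm = -1` when no index register is used (immediate offset); it encodes only the rules stated in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if an LDRD/STRD with these operands obeys the rules above.
   rm = -1 means "no index register" (immediate offset). */
bool ldrd_strd_operands_ok(unsigned rd, unsigned rn, int rm, bool writeback,
                           bool is_store, uint32_t address)
{
    assert(rd <= 15 && rn <= 15 && rm >= -1 && rm <= 15);
    if ((rd & 1u) || rd > 12)
        return false;                      /* Rd must be R0,R2,...,R12 */
    if (address & 7u)
        return false;                      /* must be a multiple of eight */
    if (writeback && (rn == rd || rn == rd + 1))
        return false;                      /* base writeback conflict */
    if (is_store && rm >= 0 && ((unsigned)rm == rd || (unsigned)rm == rd + 1))
        return false;                      /* STRD index register conflict */
    return true;
}
```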
Whereas... {!} exclamation mark ("!") indicates write-back (Rn will be updated).
Return: No Flags affected.
Execution Time: For Normal LDR, 1S+1N+1I. For LDR PC, 2S+2N+1I. For STRH 2N.
Opcode Format
Addressing Modes {amod}
The IB,IA,DB,DA suffixes directly specify the desired U and P bits:
  IB  increment before          ;P=1, U=1
Alternately, FD,ED,FA,EA could be used, mostly to simplify mnemonics for stack transfers.
  ED  empty stack, descending   ;LDM: P=1, U=1  ;STM: P=0, U=0
Ie. the following expressions are aliases for each other:
  STMFD=STMDB=PUSH   STMED=STMDA   STMFA=STMIB   STMEA=STMIA
Note: The equivalent THUMB functions use a fixed organization:
  PUSH/POP: full descending     ;base register SP (R13)
Descending is the common stack organization as used in 80x86 and Z80 CPUs: SP is decremented when pushing/storing data, and incremented when popping/loading data.
When S Bit is set (S=1)
If the instruction is LDM and R15 is in the list: (Mode Changes)
  While R15 is loaded, additionally: CPSR=SPSR_<current mode>
Otherwise: (User bank transfer)
  Rlist is referring to User Bank Registers, R0-R15 (rather than the register bank of the current mode).
Notes
The lowest register in Rlist (R0 if it's in the list) will be loaded/stored to/from the lowest memory address.
The base address should usually be word-aligned.
LDM Rn,...,PC on ARMv4 leaves CPSR.T unchanged.
LDM Rn,...,PC on ARMv5 sets CPSR.T to Bit0 of the value loaded into PC (1=Switch to Thumb).
Return: No Flags affected.
Execution Time: For normal LDM, nS+1N+1I. For LDM PC, (n+1)S+2N+1I. For STM (n-1)S+2N. Where n is the number of words transferred.
Strange Effects on Invalid Rlists
Empty Rlist: R15 loaded/stored (ARMv4 only), and Rb=Rb+/-40h (ARMv4-v5).
Writeback with Rb included in Rlist: Store OLD base if Rb is FIRST entry in Rlist, otherwise store NEW base (STM/ARMv4), always store OLD base (STM/ARMv5), no writeback (LDM/ARMv4), writeback if Rb is "the ONLY register, or NOT the LAST register" in Rlist (LDM/ARMv5).
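The P/U bits determine which addresses an LDM/STM touches. A minimal C sketch assuming the usual ARM meaning of IA (P=0,U=1), IB (P=1,U=1), DA (P=0,U=0) and DB (P=1,U=0); because the lowest register always goes to the lowest address, the lowest address plus the register count fully describe the transfer. `block_transfer_range` is a hypothetical name, and the empty-Rlist quirk is deliberately excluded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Count the registers in a 16-bit Rlist. */
static unsigned popcount16(uint16_t rlist)
{
    unsigned n = 0;
    while (rlist) {
        rlist &= (uint16_t)(rlist - 1u);   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Lowest address used and the writeback value for LDM/STM with the given
   P (pre) and U (up) bits. Registers then occupy consecutive words upward
   from *lowest, lowest-numbered register first. */
void block_transfer_range(uint32_t base, uint16_t rlist, bool p, bool u,
                          uint32_t *lowest, uint32_t *writeback)
{
    unsigned n = popcount16(rlist);
    assert(n > 0);                     /* empty Rlist behaves strangely */
    if (u) {
        *lowest    = p ? base + 4 : base;                   /* IB : IA */
        *writeback = base + 4 * n;
    } else {
        *lowest    = p ? base - 4 * n : base - 4 * n + 4;   /* DB : DA */
        *writeback = base - 4 * n;
    }
}
```

For example, STMFD SP!,{R0-R3} (=STMDB) with SP=03008000h stores R0 at 03007FF0h and writes back SP=03007FF0h.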
Opcode Format
SWP/SWPB supported by ARMv2a and up.
Swap works properly even if Rm and Rn specify the same register.
R15 may not be used for any of Rn,Rd,Rm. (Rn=R15 would be the MRS opcode).
The upper bits of Rd are zero-expanded when using Byte quantity. For info about byte and word data memory addressing, read the LDR and STR opcode description.
Execution Time: 1S+2N+1I. That is, 2N data cycles, 1S code cycle, plus 1I.
Opcode Format
SWI is intended for calls to the operating system; it enters Supervisor mode (SVC) in ARM state. BKPT is intended for debugging; it enters Abort mode in ARM state via the Prefetch Abort vector.
Execution
  SWI/BKPT: R14_svc=PC+4  R14_abt=PC+4   ;save return address
Execution Time: 2S+1N
Interpreting the Comment Field
The immediate parameter is ignored by the processor; the user interrupt handler may read out this number by examining the lower 24bit of the 32bit opcode at [R14_svc-4]. In case your program also executes SWIs from inside THUMB mode, your SWI handler must examine the T Bit in SPSR_svc in order to determine whether it's been a THUMB SWI; if so, it must examine the lower 8bit of the 16bit opcode at [R14_svc-2].
Returning from SWI
Use this instruction: MOVS PC,R14
That instruction restores both PC and CPSR, ie. PC=R14_svc, and CPSR=SPSR_svc.
Nesting SWIs
SPSR_svc and R14_svc should be saved on the stack before either invoking nested SWIs, or (if the IRQ handler uses SWIs) before enabling IRQs.
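The comment-field lookup described above, as a C sketch of what an SWI handler (or an emulator's high-level BIOS) might do. Assumptions: the T bit is bit 5 of the PSR, and memory is a flat little-endian byte array; `swi_comment` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define PSR_T_BIT (1u << 5)   /* THUMB state bit in CPSR/SPSR (assumed bit 5) */

/* Little-endian reads from a flat memory image (a stand-in for the bus). */
static uint32_t rd16(const uint8_t *mem, uint32_t a)
{
    return (uint32_t)mem[a] | (uint32_t)mem[a + 1] << 8;
}
static uint32_t rd32(const uint8_t *mem, uint32_t a)
{
    return rd16(mem, a) | rd16(mem, a + 2) << 16;
}

/* Comment field of the SWI that just trapped, from R14_svc and SPSR_svc. */
uint32_t swi_comment(const uint8_t *mem, uint32_t r14_svc, uint32_t spsr_svc)
{
    if (spsr_svc & PSR_T_BIT) {
        assert(r14_svc >= 2);
        return rd16(mem, r14_svc - 2) & 0xFFu;        /* THUMB: 8-bit field */
    }
    assert(r14_svc >= 4);
    return rd32(mem, r14_svc - 4) & 0x00FFFFFFu;      /* ARM: 24-bit field  */
}
```

For example, an ARM "SWI 05h" (opcode EF000005h) at address 0 leaves R14_svc=4, and the helper returns 05h.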
Opcode Format
CDP supported by ARMv2 and up, CDP2 by ARMv5 and up.
Execution time: 1S+bI, b=number of cycles in coprocessor busy-wait loop.
Return: No flags affected, no ARM registers used/modified.
For details refer to the original ARM docs; irrelevant for the GBA because no coprocessor exists.
Opcode Format
LDC/STC supported by ARMv2 and up, LDC2/STC2 by ARMv5 and up.
Execution time: (n-1)S+2N+bI, n=number of words transferred.
For details refer to the original ARM docs; irrelevant for the GBA because no coprocessor exists.
Opcode Format
MCR/MRC supported by ARMv2 and up, MCR2/MRC2 by ARMv5 and up.
A22i syntax allows MOV to be used with Rd specified as first (dest) or last (source) operand. Native MCR/MRC syntax uses Rd as the middle operand; <cp> can be omitted if <cp> is zero.
When using MCR with R15: the coprocessor will receive a data value of PC+12.
When using MRC with R15: Bit 31-28 of data are copied to Bit 31-28 of CPSR (ie. N,Z,C,V flags), other data bits are ignored, CPSR Bit 27-0 are not affected, R15 (PC) is not affected.
Execution time: 1S+bI+1C for MCR, 1S+(b+1)I+1C for MRC.
Return: For MRC only: Either R0-R14 modified, or flags affected (see above).
For details refer to the original ARM docs. The opcodes are irrelevant for GBA/NDS7 because no coprocessor exists (except for a dummy CP14 unit). However, the NDS9 includes a working CP15 unit.
ARM CP14 ICEbreaker Debug Communications Channel
ARM CP15 System Control Coprocessor
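For reference, an MCR/MRC opcode can be assembled from its operands in native syntax order. A C sketch assuming the standard ARM coprocessor register transfer layout (cond, 1110b, cpopc, L, Cn, Rd, Pn, cp, 1, Cm), where L=1 selects MRC; `encode_mcr_mrc` is a hypothetical helper, not an A22i feature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assemble MCR (to_arm=false) or MRC (to_arm=true):
   cond:4 | 1110 | cpopc:3 | L:1 | Cn:4 | Rd:4 | Pn:4 | cp:3 | 1 | Cm:4 */
uint32_t encode_mcr_mrc(bool to_arm, unsigned cond, unsigned pn, unsigned cpopc,
                        unsigned rd, unsigned cn, unsigned cm, unsigned cp)
{
    assert(cond <= 15 && pn <= 15 && cpopc <= 7 && rd <= 15 &&
           cn <= 15 && cm <= 15 && cp <= 7);
    return (uint32_t)cond << 28 | 0x0E000000u | (uint32_t)cpopc << 21 |
           (uint32_t)to_arm << 20 | (uint32_t)cn << 16 | (uint32_t)rd << 12 |
           (uint32_t)pn << 8 | (uint32_t)cp << 5 | 0x10u | cm;
}
```

Under these assumptions, MRC P15,0,R0,C0,C0,0 (read the CP15 Main ID Register, cond=always) assembles to EE100F10h, and MCR P15,0,R0,C7,C5,0 to EE070F15h.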
Opcode Format
Supported by ARMv5TE only, not ARMv5, not ARMv5TExP.
Opcode Format
No assembler mnemonic exists; the following bitstreams are (not) reserved.
  cond011xxxxxxxxxxxxxxxxxxxx1xxxx - reserved for future use (except below).
Execution time: 2S+1I+1N.
Opcode Format
CLZ supported by ARMv5 and up.
Execution time: 1S.
Return: No Flags affected. Rd=0..32.
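CLZ semantics (result 0..32) as a portable C sketch. GCC/Clang's `__builtin_clz` is undefined for an input of zero, which is why the zero case is handled explicitly here; `clz32` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zero bits of x, returning 32 for x=0 (like CLZ Rd,Rm). */
unsigned clz32(uint32_t x)
{
    if (x == 0)
        return 32;
    unsigned n = 0;
    while (!(x & 0x80000000u)) {    /* shift until the top bit is set */
        x <<= 1;
        n++;
    }
    assert(n < 32);
    return n;
}
```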
Opcode Format
Supported by E variants of ARMv5 and up, ie. ARMv5TE(xP).
Execution time: 1S+Interlock.
Results are saturated (truncated) to the signed 32bit range in case of overflows, with the Q-flag being set (and left unchanged otherwise). NZCV flags are not affected.
Note: Rn*2 is internally processed first, and may get truncated, even if the final result would fit into range.
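The saturation rules, including the QDADD/QDSUB detail that Rn*2 saturates before the addition, as a C sketch using a 64-bit intermediate. The Q flag is modelled as a sticky `bool` that is only ever set; all helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Saturate to the signed 32bit range, setting *q on overflow (sticky). */
static int32_t sat32(int64_t v, bool *q)
{
    assert(q);
    if (v > INT32_MAX) { *q = true; return INT32_MAX; }
    if (v < INT32_MIN) { *q = true; return INT32_MIN; }
    return (int32_t)v;
}

/* QADD/QSUB Rd,Rm,Rn: Rd = sat(Rm +/- Rn). */
int32_t qadd(int32_t rm, int32_t rn, bool *q) { return sat32((int64_t)rm + rn, q); }
int32_t qsub(int32_t rm, int32_t rn, bool *q) { return sat32((int64_t)rm - rn, q); }

/* QDADD/QDSUB Rd,Rm,Rn: Rd = sat(Rm +/- sat(Rn*2)); the doubling saturates first. */
int32_t qdadd(int32_t rm, int32_t rn, bool *q)
{
    int32_t d = sat32((int64_t)rn * 2, q);
    return sat32((int64_t)rm + d, q);
}
int32_t qdsub(int32_t rm, int32_t rn, bool *q)
{
    int32_t d = sat32((int64_t)rn * 2, q);
    return sat32((int64_t)rm - d, q);
}
```

For example, QDADD with Rm=-2 and Rn=40000000h first saturates Rn*2 to 7FFFFFFFh and returns 7FFFFFFDh with Q set, although -2+80000000h would have fit into range.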
The 26bit Memory Interface was used by ARMv1 and ARMv2. The 32bit interface is used by ARMv3 and newer; however, 26bit backward compatibility was included in all ARMv3 (except ARMv3G), and optionally in some non-T variants of ARMv4.
Format of R15 in 26bit Mode (Program Counter Register)
Branches with +/-32M range wrap the PC register, and can reach all 64M of memory.
Reading from R15
If R15 is specified in bit16-19 of an opcode, then NZCVIF and M0,1 are masked (zero); otherwise the full 32bits are used.
Writing to R15
Data Processing opcodes with S=1, and LDM opcodes with PSR=1 can write to all 32bits in R15 (in 26bit mode, that is allowed even in user mode, though it does then affect only NZCF, not the write protected IFMM bits ???); other opcodes which write to R15 will modify only the program counter bits. Also, special CMP/CMN/TST/TEQ{P} opcodes can be used to write to the PSR bits in R15 without modifying the PC bits.
Exceptions
SWIs, Reset, Data/Prefetch Aborts and Undefined instructions enter Supervisor mode. Interrupts enter IRQ and FIQ mode. Additionally, a special 26bit Address Exception exists, which enters Supervisor mode on accesses to memory addresses>=64M as follows:
  R14_svc = PC ($+8, including old PSR bits)
To continue at the fault location, return by SUBS PC,LR,8.
26bit Backwards Compatibility on 32bit ARMv3 and up
  CPSR M4=0 = 26bit mode (with USR,FIQ,IRQ,SVC modes in M1,M0)
32bit CPUs with 26bit compatibility mode can be configured to switch into 32bit mode when encountering exceptions.
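The reading/writing rules for R15 in 26bit mode, as a C sketch. The bit layout used here (N,Z,C,V,I,F in bits 31-26, PC in bits 25-2, M1,M0 in bits 1-0) is an assumption, since the format table above is not reproduced; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed 26bit R15 layout. */
#define R15_PC_MASK  0x03FFFFFCu   /* bits 25-2: word-aligned program counter */
#define R15_PSR_MASK 0xFC000003u   /* bits 31-26 NZCVIF, bits 1-0 M1,M0       */

/* R15 as seen when used in opcode bits 16-19: the PSR bits read as zero. */
uint32_t r15_read_as_rn(uint32_t r15)
{
    return r15 & R15_PC_MASK;
}

/* Plain write to R15 (eg. a branch): only the PC bits change, and the PC
   wraps within the 64M address space. */
uint32_t r15_branch(uint32_t r15, int32_t offset)
{
    assert((offset & 3) == 0);
    uint32_t pc = (r15 & R15_PC_MASK) + (uint32_t)offset;
    return (r15 & R15_PSR_MASK) | (pc & R15_PC_MASK);
}

/* Accesses at or above 64M raise the 26bit Address Exception. */
bool r26_address_exception(uint32_t addr)
{
    return addr >= 0x04000000u;
}
```

For example, branching backwards by 200h from PC=100h wraps to PC=3FFFF00h, while the PSR bits in R15 are kept.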
ARM Pseudo Instructions
  nop      mov r0,r0
All above opcodes may be made conditional by specifying a {cond} field.
THUMB Pseudo Instructions
  nop      mov r8,r8
A22i Directives
  org adr  assume following code from this address on
A22i Alias Directives (for compatibility with other assemblers)
  align    .align 4
  code16   .thumb
Alias Conditions, Opcodes, Operands
  hs       cs   ;condition higher or same = carry set
A22i Numeric Formats & Dialects
  Type     Normal   Alias
Note: The default numeric format can be changed by the .radix directive (usually 10=decimal). For example, with radix 16, values like "85" and "0101b" are treated as hexadecimal numbers (in that case, decimal and binary numbers can still be defined with the prefixes &d and &b).
A22i Numeric Operators Priority
  Prio     Operator   Aliases
Operators of the same priority are processed from left to right. Boolean operators (priority 4) return 1=TRUE, 0=FALSE.
A22i Nocash Syntax
Even though A22i does recognize the official ARM syntax, it also allows friendly code:
  mov r0,0ffh   ;no C64-style "#", and no C-style "0x" required
[A22i is the no$gba debug version's built-in source code assembler.]
The ICEbreaker aka EmbeddedICE module may be found in the ARM7TDMI and possibly also in other ARM processors. The main functionality of the module relies on external inputs (BREAKPT signal, etc.) being controlled by external debugging hardware. On the software side, ICEbreaker contains a Debug Communications Channel (again to access external hardware), which can be accessed as coprocessor 14 via the following opcodes:
  MRC{cond} P14,0,Rd,C0,C0,0 ;Read Debug Comms Control Register
The Control register consists of Bit31-28=ICEbreaker version (0001b for ARM7TDMI), Bit27-2=Not specified, Bit0/Bit1=Data Read/Write Status Flags.
The NDS7 and GBA allow access to CP14 (unlike CP0..CP13 and CP15, access to CP14 doesn't generate any exceptions); however, the ICEbreaker module appears to be disabled (or completely unimplemented): any reads from P14,0,Rd,C0,C0,0 through P14,7,Rd,C15,C15,7 simply return the prefetched opcode value from [$+8]. ICEbreaker might possibly be used and enabled in Nintendo's hardware debuggers, although external breakpoints are reportedly implemented via the /FIQ input rather than via ICEbreaker hardware. The NDS9 doesn't include a CP14 unit (or it is fully disabled); any attempt to access it causes invalid instruction exceptions.
ARM CP15 Overview ARM CP15 ID Codes ARM CP15 Control Register ARM CP15 Memory Managment Unit (MMU) ARM CP15 Protection Unit (PU) ARM CP15 Cache Control ARM CP15 Tightly Coupled Memory (TCM) ARM CP15 Misc
CP15
In many ARM CPUs, particularly those with memory control facilities, coprocessor number 15 (CP15) is used as a built-in System Control Coprocessor. CPUs without memory control functions typically do not include a CP15 at all; in that case even an attempt to read the Main ID register will cause an Undefined Instruction exception.
CP15 Opcodes
CP15 can be accessed via MCR and MRC opcodes, with Pn=P15, and <cpopc>=0.
  MCR{cond} P15,0,Rd,Cn,Cm,<cp> ;move from ARM to CP15
Rd can be any ARM register in range R0-R14; R15 should not be used with P15.
Cn,Cm,<cp> are used to select a CP15 register, eg. C0,C0,0 = Main ID Register. Other coprocessor opcodes (CDP, LDC, STC) cannot be used with P15.
CP15 Register List
Data/Unified Registers
Some Cache/PU/TCM registers are declared as "Data/Unified". These registers are used for Data accesses in case the CPU contains separate Data and Instruction registers; otherwise the registers are used for both (unified) Data and Instruction accesses.
C0,C0,0 - Main ID Register (R)
  12-15  ARM Era (0=Pre-ARM7, 7=ARM7, other=Post-ARM7)
Post-ARM7 Processors
  0-3    Revision Number
ARM7 Processors
  0-3    Revision Number
Pre-ARM7 Processors
  0-3    Revision Number
Note: On the NDS9, this register is 41059461h. NDS7 and GBA don't have CP15s.
C0,C0,1 - Cache Type Register (R)
  0-11   Instruction Cache (bits 0-1=len, 2=m, 3-5=assoc, 6-8=size, 9-11=zero)
The 12bit Instruction/Data values are decoded as shown below,
  Cache Absent = (ASSOC=0 and M=1)   ;in that case overriding below
For Unified cache (Bit 24=0), Instruction and Data values are identical.
C0,C0,2 - Tightly Coupled Memory (TCM) Size Register (R)
  0-1    Reserved (0)
C0,C0,3..7 - Reserved (R)
Unused/Reserved registers, containing the same value as C0,C0,0.
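Decoding the NDS9 Main ID value quoted above, as a C sketch. Only the ARM Era and Revision fields are given in the text; the remaining Post-ARM7 fields (implementer, variant, architecture, part number) follow ARM's usual layout and are an assumption here. `cache_absent` applies the Cache Absent rule to one 12bit Cache Type field; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    unsigned era;          /* bits 12-15: 0=Pre-ARM7, 7=ARM7, other=Post-ARM7 */
    unsigned revision;     /* bits 0-3 */
    /* Post-ARM7 fields below follow ARM's usual layout (assumption). */
    unsigned implementer;  /* bits 24-31, 41h = 'A'                  */
    unsigned variant;      /* bits 20-23                             */
    unsigned architecture; /* bits 16-19                             */
    unsigned part;         /* bits 4-15 (the era is its top nibble)  */
} MainId;

MainId decode_main_id(uint32_t v)
{
    MainId id;
    id.era          = (v >> 12) & 0xFu;
    id.revision     = v & 0xFu;
    id.implementer  = v >> 24;
    id.variant      = (v >> 20) & 0xFu;
    id.architecture = (v >> 16) & 0xFu;
    id.part         = (v >> 4) & 0xFFFu;
    return id;
}

/* Cache Type Register: a 12bit cache field reports "absent" when
   ASSOC (bits 3-5) is 0 and M (bit 2) is 1. */
bool cache_absent(uint32_t field12)
{
    assert(field12 <= 0xFFFu);
    unsigned m = (field12 >> 2) & 1u;
    unsigned assoc = (field12 >> 3) & 7u;
    return assoc == 0 && m == 1;
}
```

With these assumptions, 41059461h decodes to era 9 (Post-ARM7), implementer 41h, part number 946h, and revision 1.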
C1,C0,0 - Control Register (R/W, or R=Fixed)
  0      MMU/PU Enable (0=Disable, 1=Enable) (Fixed 0 if none)
Various bits in this register may be read-only (fixed 0 if unsupported, or fixed 1 if always activated). On the NDS, bits 0,2,7,12..19 are R/W, bits 3..6 are always set, and all other bits are always zero.
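The NDS-specific read-back behaviour described above reduces to two masks. A minimal C sketch; the mask values are derived directly from the bit list in the text, and `nds9_control_readback` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define NDS9_CR_RW_MASK 0x000FF085u  /* bits 0,2,7,12-19 are R/W  */
#define NDS9_CR_FIXED1  0x00000078u  /* bits 3-6 are always set   */

/* Value read back from C1,C0,0 after writing 'written' (NDS9 only). */
uint32_t nds9_control_readback(uint32_t written)
{
    assert((NDS9_CR_RW_MASK & NDS9_CR_FIXED1) == 0);
    return (written & NDS9_CR_RW_MASK) | NDS9_CR_FIXED1;
}
```

So writing 0 reads back as 00000078h, and writing FFFFFFFFh reads back as 000FF0FDh.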
The function of some registers depends on whether the CPU contains an MMU or a PU. The MMU handles virtual addressing tables.
  C2,Cm,Op2  MMU Translation Table Base
The GBA and Nintendo DS do not have an MMU.
The Protection Unit can be enabled in Bit0 of C1,C0,0 (Control Register).
C2,C0,0 - Cachability Bits for Data/Unified Protection Region (R/W)
C2,C0,1 - Cachability Bits for Instruction Protection Region (if any) (R/W)
  0-7    Cachable (C) bits for region 0-7
C3,C0,0 - Write-Bufferability Bits for Data Protection Regions (R/W)
  0-7    Bufferable (B) bits for region 0-7
Instruction fetches are, obviously, always read operations, so there are no write-bufferability bits for Instruction Protection Regions.
C5,C0,0 - Access Permission Data/Unified Protection Region (R/W)
C5,C0,1 - Access Permission Instruction Protection Region (if any) (R/W)
C5,C0,2 - Extended Access Permission Data/Unified Protection Region (R/W)
C5,C0,3 - Extended Access Permission Instruction Protection Region (if any) (R/W)
For C5,C0,0 and C5,C0,1:
  0-15   Access Permission (AP) bits for region 0-7 (Bits 0-1=AP0, 2-3=AP1, etc)
For C5,C0,2 and C5,C0,3 (Extended):
  0-31   Access Permission (AP) bits for region 0-7 (Bits 0-3=AP0, 4-7=AP1, etc)
The possible AP settings (0-3 for C5,C0,0..1, or 0-15 for C5,C0,2..3) are:
  AP  Privileged  User
Settings 5,6 only for Extended Registers, settings 4,7..15 are Reserved.
C6,C0..C7,0 - Protection Unit Data/Unified Region 0..7 (R/W)
C6,C0..C7,1 - Protection Unit Instruction Region 0..7 (R/W) (if any)
  0      Protection Region Enable (0=Disable, 1=Enable)
Overlapping Regions are allowed; Region 7 has the highest priority, region 0 the lowest priority.
Background Region
Additionally, any memory areas outside of the eight Protection Regions are handled as the Background Region; this region has neither Read nor Write access.
Unified Region Note
On the NDS, the Region registers are unified (C6,C0..C7,1 are read/write-able mirrors of C6,C0..C7,0). Nevertheless, the Cachability and Permission registers are NOT unified (separate registers exist for code and data settings).
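A C sketch of the region priority rule (the highest-numbered enabled region containing the address wins, otherwise the Background Region applies) and of extracting AP fields. The `Region` struct is hypothetical: the base/size fields of the C6 registers are not reproduced above, so the caller supplies them in decoded form (sizes below 4GB in this sketch).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded region: enable bit plus base and size in bytes. */
typedef struct { bool enabled; uint32_t base; uint32_t size; } Region;

/* Highest-numbered enabled region containing addr, or -1 for the
   Background Region (no Read/Write access). */
int pu_region_for(const Region r[8], uint32_t addr)
{
    for (int i = 7; i >= 0; i--)
        if (r[i].enabled && addr - r[i].base < r[i].size)
            return i;
    return -1;
}

/* AP setting of region i: 2 bits from C5,C0,0/1, or 4 bits from the
   Extended registers C5,C0,2/3. */
unsigned pu_ap(uint32_t c5, int i, bool extended)
{
    assert(i >= 0 && i < 8);
    return extended ? (c5 >> (4 * i)) & 0xFu : (c5 >> (2 * i)) & 0x3u;
}
```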
The cache is enabled/controlled by Bits 2,3,12,14 in the Control Register. The cache type is detected in the Cache Type Register.
C7,C0..C15,0..7 - Cache Commands (W)
Write-only Cache Command Register. The Cm,Op2 operands are used to select a specific command, with the parameter value in Rd.
  Cn,Cm,Op2  Rd  ARM9 Command
Parameter values (Rd) formats:
  0      Not used, should be zero
C9,C0,0 - Data Cache Lockdown
C9,C0,1 - Instruction Cache Lockdown
(Width (W) of the index field depends on cache ASSOCIATIVITY.)
Format A:
  0..(31-W)  Reserved/zero
Format B:
  0..(W-1)   Lockdown Block Index
Cache/Write-buffer should not be enabled for the whole 4GB memory area: high-speed TCM memory doesn't require caching, and caching would have fatal results on I/O ports. So, the cache can be used only in combination with the Protection Unit, which allows caching to be enabled/disabled in specified regions.
Note
The ARMv5 instruction set supports a Cache Prepare for Load opcode (PLD), see ARM.9: Single Data Transfer (LDR, STR, PLD)
TCM is high-speed memory, directly contained in the ARM CPU core.
TCM and DMA
TCM doesn't use the ARM bus. A minor disadvantage is that TCM cannot be accessed by DMA. However, the main advantage is that, when using TCM, the CPU can be kept running without any waitstates even while the bus is used for DMA transfers. Operation during DMA works only if all code/data is located in TCM; waitstates are generated if any code/data outside TCM is accessed. In the worst case (if there are no gaps in the DMA), the CPU is halted until the DMA finishes.
TCM and DMA and IRQ
No idea if/how IRQs are handled during DMA. Possibly (unlikely) code in TCM keeps being executed until the DMA finishes (ie. until the IRQ vector can be accessed). Possibly the IRQ vector is instantly accessed (causing the CPU to halt until the DMA finishes). Both cases assume that IRQs are enabled, and that the IRQ vector and/or IRQ handler are located outside TCM.
Separate Instruction (ITCM) and Data (DTCM) Memory
DTCM can be used only for Data accesses, typically for stacks and other frequently accessed data. ITCM is primarily intended for instruction accesses, but it can also be used for Data accesses (among others allowing code to be copied to ITCM); however, performance isn't optimal when simultaneously accessing ITCM for code and data (such as opcodes in ITCM that use literal pool values in ITCM).
TCM Enable, TCM Load Mode
The CP15 Control Register allows ITCM and DTCM to be enabled, and ITCM/DTCM to be switched into Load Mode. In Load Mode (when TCM is enabled), TCM becomes write-only; this allows data to be read from source addresses in main memory and written to destination addresses in TCM using the same addresses, which is useful for initializing TCM with overlapping source/dest addresses. Load Mode works with all Load/Store opcodes; it does NOT work with SWP/SWPB opcodes.
TCM Physical Size can be detected in the 3rd ID Code Register (C0,C0,2).
C9,C1,0 - Data TCM Size/Base (R/W)
C9,C1,1 - Instruction TCM Size/Base (R/W)
  0      Reserved (0)
The Virtual size settings should normally be the same as the Physical sizes (see C0,C0,2). However, smaller sizes are allowed (using only the first few KB), as well as bigger sizes (the TCM area is then filled with mirrors of physical TCM). The ITCM region base may be fixed (read-only); for example, on the NDS, the ITCM base is always 00000000h; nevertheless the virtual size may be changed (allowing ITCM to be mirrored to higher addresses). If DTCM and ITCM overlap, then ITCM appears to have priority.
TCM and PU
TCM can be used without the Protection Unit. When the Protection Unit is enabled, TCM is controlled by the PU just like normal memory; the PU should provide R/W Access Permission for TCM regions. Cache and write-buffer are not required for high-speed TCM (so both should be disabled for TCM regions).
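How an address inside the virtual TCM window maps to physical TCM, including the mirroring that happens when the virtual size exceeds the physical size, as a C sketch. It assumes power-of-two sizes; `tcm_offset` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Physical TCM offset for addr, or -1 if addr lies outside the virtual
   TCM window [base, base+virt_size). Larger virtual sizes wrap around
   (mirrors of physical TCM). */
int64_t tcm_offset(uint32_t addr, uint32_t base, uint32_t virt_size,
                   uint32_t phys_size)
{
    assert(phys_size != 0 && (phys_size & (phys_size - 1u)) == 0);
    uint32_t off = addr - base;
    if (off >= virt_size)
        return -1;
    return off & (phys_size - 1u);
}
```

For instance, with an illustrative 32KB physical ITCM at 00000000h and a 32MB virtual size, address 00008010h would be a mirror of physical offset 10h.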
C13,C0,0 - Process ID for Fast Context Switch Extension (FCSE) (R/W)
  0-24   Reserved/zero
The FCSE allows different processes (each assembled with ORG 0) to be located at virtual addresses in the first 32MB area. The FCSE splits the total 4GB address space into blocks of 32MB; accesses to Block(0) are redirected to Block(PID):
  IF addr<32M then addr=addr+PID*32M
The CPU-to-Memory address handling is shown below:
  1. CPU outputs a virtual address (VA)
The FCSE allows limited virtual addressing even if no MMU is present. If an MMU is present, then either the FCSE and/or the MMU can be used for virtual addressing; the advantage of using the FCSE (a single write to C13,C0,0) is less overhead: using the MMU for the same purpose would require changing the virtual address translation table in memory, and flushing the cache. The NDS doesn't have an FCSE (the FCSE register is read-only, always zero).
C13,C0,1 - Trace Process ID (R/W)
C13,C1,1 - Trace Process ID (Mirror) (R/W)
This value is output to the ETMPROCID pins (if any), allowing external hardware to be notified about the currently executed process within multi-tasking programs.
  0-31   Process ID
C13,C1,1 is a mirror of C13,C0,1 (for compatibility with other ARM processors). Both registers are read/write-able on the NDS9, but there are no external pin-outs.
<cpopc>
Unlike for all other CP15 registers, the <cpopc> operand of the MRC/MCR opcodes isn't always zero for the registers below, so these registers use "cpopc,Cn,Cm,op2" notation (instead of the normal "Cn,Cm,op2" notation).
Built-In-Self-Test (BIST)
Allows internal memory (ie. TCM, Cache Memory, and Cache TAGs) to be tested. The tests fill (and verify) the selected memory region three times (once with the fillvalue, then with the inverted fillvalue, and then again with the fillvalue). The BIST functions are intended for diagnostic purposes only, not for use in normal program code (ARM doesn't guarantee that future processors have backwards compatible BIST functions).
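The FCSE redirection rule given above (IF addr<32M then addr=addr+PID*32M), as a C sketch. It assumes the PID occupies bits 25-31 of C13,C0,0, which follows from bits 0-24 being reserved; `fcse_translate` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define FCSE_BLOCK 0x02000000u   /* 32MB */

/* Apply FCSE redirection to a CPU virtual address. The PID is assumed to
   be bits 25-31 of C13,C0,0 (bits 0-24 reserved/zero). */
uint32_t fcse_translate(uint32_t va, uint32_t c13_pid_reg)
{
    assert((c13_pid_reg & 0x01FFFFFFu) == 0);
    uint32_t pid = c13_pid_reg >> 25;
    return va < FCSE_BLOCK ? va + pid * FCSE_BLOCK : va;
}
```

For example, with PID=3, address 100h becomes 6000100h, while addresses at or above 32M pass through unchanged.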
0,C15,C0,1 - BIST TAG Control Register (R/W)
1,C15,C0,1 - BIST TCM Control Register (R/W)
2,C15,C0,1 - BIST Cache Control Register (R/W)
  0-15   Data Control (see below)
The above 16bit control values are:
  0      Start bit (Write: 1=Start) (Read: 1=Busy)
Size and Pause are not supported in all implementations.
Caution: While (and as long as) the Enable bit is set, the corresponding memory region(s) will be disabled. Eg. when testing <either> DTCM <and/or> ITCM, <both> DTCM <and> ITCM are forcefully disabled in C1,C0,0 (Control Register); after the test the software must first clear the BIST enable bit, and then restore the DTCM/ITCM bits in C1,C0,0. And of course, the content of the tested memory region must be restored when needed.
0,C15,C0,2 - BIST Instruction TAG Address (R/W)
1,C15,C0,2 - BIST Instruction TCM Address (R/W)
2,C15,C0,2 - BIST Instruction Cache Address (R/W)
0,C15,C0,6 - BIST Data TAG Address (R/W)
1,C15,C0,6 - BIST Data TCM Address (R/W)
2,C15,C0,6 - BIST Data Cache Address (R/W)
  0-31   Word-aligned Destination Address within Memory Block (eg. within ITCM)
On the NDS9, bit0-1 and bit21-31 are always zero.
0,C15,C0,3 - BIST Instruction TAG Fillvalue (R/W)
1,C15,C0,3 - BIST Instruction TCM Fillvalue (R/W)
2,C15,C0,3 - BIST Instruction Cache Fillvalue (R/W)
0,C15,C0,7 - BIST Data TAG Fillvalue (R/W)
1,C15,C0,7 - BIST Data TCM Fillvalue (R/W)
2,C15,C0,7 - BIST Data Cache Fillvalue (R/W)
  0-31   Fillvalue for BIST
After BIST, the selected memory region is filled with that value. That is, on the NDS9 at least, all words will be filled with the SAME value (ie. NOT with increasing or randomly generated numbers).
0,C15,C0,0 - Cache Debug Test State Register (R/W)
  0-8    Reserved (zero)
3,C15,C0,0 - Cache Debug Index Register (R/W)
  0..1   Reserved (zero)
3,C15,C0,1 - Cache Debug Instruction TAG (R/W)
3,C15,C0,2 - Cache Debug Data TAG (R/W)
3,C15,C0,3 - Cache Debug Instruction Cache (R/W)
3,C15,C0,4 - Cache Debug Data Cache (R/W)
  0..1   Set
Instruction Cycle Summary Instruction Cycles Additional
---------------------------------------------------------------------
Data Processing 1S +1S+1N if R15 loaded, +1I if SHIFT(Rs)
MSR,MRS 1S
LDR 1S+1N+1I +1S+1N if R15 loaded
STR 2N
LDM nS+1N+1I +1S+1N if R15 loaded
STM (n-1)S+2N
SWP 1S+2N+1I
BL (THUMB) 3S+1N
B,BL 2S+1N
SWI,trap 2S+1N
MUL 1S+mI
MLA 1S+(m+1)I
MULL 1S+(m+1)I
MLAL 1S+(m+2)I
CDP 1S+bI
LDC,STC (n-1)S+2N+bI
MCR 1N+bI+1C
MRC 1S+(b+1)I+1C
{cond} false 1S
ARM9: Q{D}ADD/SUB 1S+Interlock.
Execution Time: 1S+Interlock (SMULxy,SMLAxy,SMULWy,SMLAWy)
Execution Time: 1S+1I+Interlock (SMLALxy)
Whereas, n = number of words transferred
Above, 'trap' is meant to be the execution time for exceptions, and '{cond} false' is meant to be the execution time for conditional instructions which haven't actually been executed because the condition was false.
The separate meaning of the N,S,I,C cycles is:
N - Non-sequential cycle
  Requests a transfer to/from an address which is NOT related to the address used in the previous cycle. (Called 1st Access in GBA language).
  The execution time for 1N is 1 clock cycle (plus non-sequential access waitstates).
S - Sequential cycle
  Requests a transfer to/from an address which is located directly after the address used in the previous cycle. Ie. for 16bit or 32bit accesses at incrementing addresses, the first access is Non-sequential, the following accesses are sequential. (Called 2nd Access in GBA language).
  The execution time for 1S is 1 clock cycle (plus sequential access waitstates).
I - Internal Cycle
  CPU is just too busy, not even requesting a memory transfer for now.
  The execution time for 1I is 1 clock cycle (without any waitstates).
C - Coprocessor Cycle
  The CPU uses the data bus to communicate with the coprocessor (if any), but no memory transfers are requested.
Memory Waitstates
Ideally, memory may be accessed free of waitstates (1N and 1S are then equal to 1 clock cycle each). However, a memory system may generate waitstates for several reasons: the memory may be just too slow; memory may currently be accessed by DMA (eg. sound, video, memory transfers, etc.); or data may be squeezed through a 16bit data bus (in that special case, 32bit accesses may have more waitstates than 8bit and 16bit accesses). Also, the memory system may distinguish between S and N cycles (if so, S cycles would typically be faster than N cycles).
Memory Waitstates for Different Memory Areas
Different memory areas (eg. ROM and RAM) may have different waitstates. When executing code in one area which accesses data in another area, the S+N cycles must be split into code and data accesses: 1N is used for data access, plus (n-1)S for LDM/STM; the remaining S+N are code accesses. If an instruction jumps to a different memory area, then all code cycles for that opcode have the waitstate characteristics of the NEW memory area (except Thumb BL, which still executes 1S in the OLD area).
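The code/data split above can be turned into a small cycle calculator. A C sketch for LDM with n registers (nS+1N+1I): 1N+(n-1)S use the data area's waitstates, the remaining 1S uses the code area's, and the 1I never has waitstates. The `Area` struct and any waitstate numbers passed to it are hypothetical inputs, not actual GBA/NDS timings.

```c
#include <assert.h>

/* Hypothetical extra waitstates of one memory area. */
typedef struct { unsigned n_wait; unsigned s_wait; } Area;

/* Clock cycles for LDM with n registers, code in 'code', data in 'data'. */
unsigned ldm_clocks(unsigned n, Area code, Area data)
{
    assert(n >= 1);
    unsigned data_cycles = (1 + data.n_wait)              /* 1N data       */
                         + (n - 1) * (1 + data.s_wait);   /* (n-1)S data   */
    unsigned code_cycles = 1 + code.s_wait;               /* remaining 1S  */
    return data_cycles + code_cycles + 1;                 /* + 1I          */
}
```

With n=1 the same split gives the LDR timing (1S+1N+1I); with zero waitstates everywhere, a 4-register LDM takes 6 clocks (4S+1N+1I).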
Version Numbers
ARM CPUs are distributed by name ARM#, and are described as ARMv# in specifications, whereas "#" is NOT the same as "v#"; for example, ARM7TDMI is ARMv4TM. That is so confusing that ARM didn't even attempt to clarify the relationship between the various "#" and "v#" values.
Version Variants
Suffixes like "M" (long multiply), "T" (Thumb support), "E" (Enhanced DSP) indicate the presence of special features in addition to the standard instruction set of a given version, or, when preceded by an "x", indicate the absence of those features.
ARMv1 aka ARM1
  Some sort of a beta version, according to ARM never used in any commercial products.
ARMv2 and up
  MUL,MLA
  CDP,LDC,MCR,MRC,STC
  SWP/SWPB (ARMv2a and up only)
  Two new FIQ registers
ARMv3 and up
  MRS,MSR opcodes (instead of CMP/CMN/TST/TEQ{P} opcodes)
  CPSR,SPSR registers (instead of PSR bits in R15)
  Removed never condition, cond=NV no longer valid
  32bit addressing (instead of 26bit addressing in older versions)
  26bit addressing backwards compatibility mode (except v3G)
  Abt and Und modes (instead of handling aborts/undefined in Svc mode)
  SMLAL,SMULL,UMLAL,UMULL (optionally, INCLUDED in v3M, EXCLUDED in v4xM/v5xM)
ARMv4 aka ARM7 and up
  LDRH,LDRSB,LDRSH,STRH
  Sys mode (privileged user mode)
  BX (only ARMv4T, and any ARMv5 or ARMv5T and up)
  THUMB code (only T variants, ie. ARMv4T, ARMv5T)
ARMv5 aka ARM9 and up
  BKPT,BLX,CLZ (BKPT,BLX also in THUMB mode)
  LDM/LDR/POP PC with mode switch (POP PC also in THUMB mode)
  CDP2,LDC2,MCR2,MRC2,STC2 (new coprocessor opcodes)
  C-flag unchanged by MUL (instead of undefined flag value)
  changed instruction cycle timings / interlock ??? or not ???
  QADD,QDADD,QDSUB,QSUB opcodes, CPSR.Q flag (v5TE and v5TExP only)
  SMLAxy,SMLALxy,SMLAWy,SMULxy,SMULWy (v5TE and v5TExP only)
  LDRD,STRD,PLD,MCRR,MRRC (v5TE only, not v5, not v5TExP)
ARMv6
  No public specifications available.
A Milestone in Computer History
The original ARMv2 was used in the relatively rare and expensive Archimedes deluxe home computers in the late eighties. The Archimedes caught a lot of attention, particularly for being the first home computer with a BIOS programmed in BASIC, which was an absolutely revolutionary decadence at that time. Inspired, programmers all over the world have successfully developed even slower and much more inefficient programming languages, which are nowadays consequently used by nearly all ARM programmers, and by most non-ARM programmers as well.
This present document is an attempt to supply a brief ARM7TDMI reference, hopefully including all information which is relevant for programmers. Some details that I have treated as meaningless for GBA programming aren't included, such as Big Endian format, Virtual Memory data aborts, and most of the chapters listed below. Have a look at the complete data sheet (URL see below) for more detailed verbose information about ARM7TDMI instructions. That document also includes:
  - Signal Description              Pins of the original CPU, probably different for GBA.
  - Memory Interface                Optional virtual memory circuits, etc. not for GBA.
  - Coprocessor Interface           As far as I know, none such in GBA.
  - Debug Interface                 For external hardware-based debugging.
  - ICEBreaker Module               For external hardware-based debugging also.
  - Instruction Cycle Operations    Detailed: What happens during each cycle of each instruction.
  - DC Parameters                   (Power supply)
  - AC Parameters                   (Signal timings)
The official ARM7TDMI data sheet can be downloaded from ARM's webpage,
  http://www.arm.com/Documentation/UserMans/PDF/ARM7TDMI.html
Be prepared for bloated PDF format, approx 1.3 MB, about 200 pages.