GBATEK
Gameboy Advance / Nintendo DS - Technical Info - Extracted from no$gba version 2.6a

 DS Reference

Overview
DS Technical Data
DS I/O Maps
DS Memory Maps

Hardware Programming
DS Memory Control
DS Video
DS 3D Video
DS Sound
DS System and Built-in Peripherals
DS Cartridges,Encryption,Firmware
DS Xboo
DS Wireless Communications

Other
DS Backwards-compatible GBA-Mode
BIOS Functions
External Connectors


 DS Technical Data

Processors
  1x ARM946E-S 32bit RISC CPU, 66MHz (NDS9 video) (not used in GBA mode)
  1x ARM7TDMI 32bit RISC CPU, 33MHz (NDS7 sound) (16MHz in GBA mode)
Internal Memory
  4096KB Main RAM (8192KB in debug version)
  96KB WRAM (64K mapped to NDS7, plus 32K mappable to NDS7 or NDS9)
  60KB TCM/Cache (TCM: 16K Data, 32K Code) (Cache: 4K Data, 8K Code)
  656KB VRAM (allocatable as BG/OBJ/2D/3D/Palette/Texture/WRAM memory)
  4KB OAM/PAL (2K OBJ Attribute Memory, 2K Standard Palette RAM)
  248KB Internal 3D Memory (104K Polygon RAM, 144K Vertex RAM)
  ?KB Matrix Stack, 48 scanline cache
  8KB Wifi RAM
  256KB Firmware FLASH (512KB in iQue variant, with Chinese charset)
  36KB BIOS ROM (4K NDS9, 16K NDS7, 16K GBA)
Video
  2x LCD screens (each 256x192 pixel, 3 inch, 18bit color depth, backlight)
  2x 2D video engines (extended variants of the GBA's video controller)
  1x 3D video engine (can be assigned to upper or lower screen)
  1x video capture (for effects, or for forwarding 3D to the 2nd 2D engine)
Sound
  16 sound channels (16x PCM8/PCM16/IMA-ADPCM, 6x PSG-Wave, 2x PSG-Noise)
  2 sound capture units (for echo effects, etc.)
  Output: Two built-in stereo speakers, and headphones socket
  Input: One built-in microphone, and microphone socket
Controls
  Gamepad      4 Direction Keys, 8 Buttons
  Touchscreen  (on lower LCD screen)
Communication Ports
  Wifi IEEE802.11b
Specials
  Built-in Real Time Clock
  Power Management Device
  Hardware divide and square root functions
  CP15 System Control Coprocessor (cache, tcm, pu, bist, etc.)
External Memory
  NDS Slot (for NDS games) (encrypted 8bit data bus, and serial 1bit bus)
  GBA Slot (for NDS expansions, or for GBA games) (but not for DMG/CGB games)
Manufactured Cartridges
  ROM: 16MB, 32MB, or 64MB
  EEPROM/FLASH/FRAM: 0.5KB, 8KB, 64KB, 256KB, or 512KB
Can be booted from
  NDS Cartridge (NDS mode)
  Firmware FLASH (NDS mode) (eg. by patching firmware via ds-xboo cable)
  Wifi (NDS mode)
  GBA Cartridge (GBA mode) (without DMG/CGB support) (without SIO support)
Power Supply
  Built-in rechargeable Lithium ion battery, 3.7V 1000mAh (DS-Lite)
  External Supply: 5.2V DC
-----------------------------------------------------------------------------------
  [Diagram: front views of the original NDS (left) and the DS Lite (right). Both have two 3" TFT screens (256x192 pixels, 61x46mm, backlit), the lower one with a touch screen; direction pad; X, Y, A, B, Select and Start buttons; L and R buttons; microphone and LEDs; PWR button (original NDS); VOL control, SLOT2 (GBA) and MIC/PHONES sockets along the bottom edge.]

-----------------------------------------------------------------------------------
Nintendo DS (Dual Screen) Notes
New handheld with two screens. It is backwards compatible with GBA games, but it is NOT backwards compatible with older 8bit games (mono/color Gameboys).
Also, the DS has no link port, so GBA games work only in single player mode, link-port accessories like printers cannot be used, and, most unfortunately, multiboot won't work (trying to press Select+Start at power-up will just lock up the DS).

iQue Notes
iQue is a brand name used by Nintendo in China; the iQue GBA and iQue DS are essentially the same as the Nintendo GBA and Nintendo DS.
The iQue DS contains a larger firmware chip (the charset additionally contains about 6700 simplified Chinese characters); the boot menu still offers only six languages (Japanese has been replaced by Chinese). The iQue DS can play normal international NDS games, plus dedicated Chinese games. The latter won't work on normal NDS consoles, reportedly simply due to a firmware-version check contained in the dedicated Chinese games; aside from that check, the games should be fully compatible with NDS consoles.

NDS-Lite
Slightly smaller than the original NDS, with a more elegant case. The LCDs are much more colorful (and thus not backwards compatible with any older NDS or GBA games), and the LCDs support wider viewing angles. Slightly different power management device (with selectable backlight brightness, a new external power source flag, and without the audio amplifier mute flag). Slightly different Wifi controller (different chip ID, different garbage ("dirt") effects when accessing invalid wifi ports and unused wifi memory regions, different behaviour of the GAPDISP registers, RF/BB chips replaced by a single chip). Slightly different touch screen controller (with a new unused input, and slightly different powerdown bits).

Notice
NDS9 means the ARM9 processor and its memory and I/O ports in NDS mode
NDS7 means the ARM7 processor and its memory and I/O ports in NDS mode
GBA means the ARM7 processor and its memory and I/O ports in GBA mode

The two Processors
Most game code is executed on the ARM9 processor (in fact, Nintendo reportedly doesn't allow developers to use the ARM7 processor except through predefined API functions; even with that most likely inefficient API code, most of the ARM7's 33MHz horsepower is left unused).
The ARM9's 66MHz "horsepower" is a different tale: it seems Nintendo thought that a 33MHz processor would be too "slow" for 3D games, and so they (tried to) bolt an additional CPU onto the original GBA hardware.
However, the full 66MHz can be used only with cache and TCM; all other memory and I/O accesses are delayed to the 33MHz bus clock. That would still be quite fast, but there seems to be a hardware glitch that adds 3 waitcycles to all nonsequential accesses on the NDS9 side, which effectively drops its bus clock to about 8MHz, making it much slower than the 33MHz NDS7 processor, and even slower than the original 16MHz GBA processor.
Altogether, with the bugged 66MHz and the unused 33MHz, Nintendo could have reached almost the same power by staying with the GBA's 16MHz processor :-)
That said, when cache/TCM are used properly, the 66MHz processor <can> be very fast. Still, the NDS should have worked as well with a single processor, though using only an ARM9 might cause a lot of compatibility problems with GBA games, so there is at least one reason for keeping the ARM7.
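The "about 8MHz" figure can be checked with a little arithmetic. A minimal C sketch, assuming the effective rate is simply the bus clock (33.513982 MHz, see DS Memory Timings) divided by the bus cycles per access, where a nonsequential 32bit NDS9 fetch from WRAM/BIOS/I/O costs 1 cycle plus the 3 waitcycle penalty. The function name is hypothetical.

```c
/* NDS bus clock: 33.513982 MHz (1FF61FEh Hz). */
#define NDS_BUS_CLOCK_HZ 33513982.0

/* Hypothetical helper: effective access rate in MHz when each access
   takes 'cycles' bus cycles (counted in 33MHz units). */
static double effective_mhz(double cycles)
{
    return NDS_BUS_CLOCK_HZ / cycles / 1e6;
}
```

Eg. effective_mhz(1) gives about 33.5MHz (1 cycle per access, as on the NDS7), and effective_mhz(4) gives about 8.4MHz (1 cycle plus 3 waits, as on the NDS9).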


 DS I/O Maps

ARM9 I/O Map:

Display Engine A
  4000000h  4    2D Engine A - DISPCNT - LCD Control (Read/Write)
  4000004h  2    2D Engine A+B - DISPSTAT - General LCD Status (Read/Write)
  4000006h  2    2D Engine A+B - VCOUNT - Vertical Counter (Read only)
  4000008h  50h  2D Engine A (same registers as GBA, some changed bits)
  4000060h  2    DISP3DCNT - 3D Display Control Register (R/W)
  4000064h  4    DISPCAPCNT - Display Capture Control Register (R/W)
  4000068h  4    DISP_MMEM_FIFO - Main Memory Display FIFO (R?/W)
  400006Ch  2    2D Engine A - MASTER_BRIGHT - Master Brightness Up/Down
DMA, Timers, and Keypad
  40000B0h  30h  DMA Channel 0..3
  40000E0h  10h  DMA FILL Registers for Channel 0..3
  4000100h  10h  Timers 0..3
  4000130h  2    KEYINPUT
  4000132h  2    KEYCNT
IPC/ROM
  4000180h  2    IPCSYNC - IPC Synchronize Register (R/W)
  4000184h  2    IPCFIFOCNT - IPC Fifo Control Register (R/W)
  4000188h  4    IPCFIFOSEND - IPC Send Fifo (W)
  40001A0h  2    AUXSPICNT - Gamecard ROM and SPI Control
  40001A2h  2    AUXSPIDATA - Gamecard SPI Bus Data/Strobe
  40001A4h  4    Gamecard bus timing/control
  40001A8h  8    Gamecard bus 8-byte command out
  40001B0h  4    Gamecard Encryption Seed 0 Lower 32bit
  40001B4h  4    Gamecard Encryption Seed 1 Lower 32bit
  40001B8h  2    Gamecard Encryption Seed 0 Upper 7bit (bit7-15 unused)
  40001BAh  2    Gamecard Encryption Seed 1 Upper 7bit (bit7-15 unused)
Memory and IRQ Control
  4000204h  2    EXMEMCNT - External Memory Control (R/W)
  4000208h  2    IME - Interrupt Master Enable (R/W)
  4000210h  4    IE - Interrupt Enable (R/W)
  4000214h  4    IF - Interrupt Request Flags (R/W)
  4000240h  1    VRAMCNT_A - VRAM-A (128K) Bank Control (W)
  4000241h  1    VRAMCNT_B - VRAM-B (128K) Bank Control (W)
  4000242h  1    VRAMCNT_C - VRAM-C (128K) Bank Control (W)
  4000243h  1    VRAMCNT_D - VRAM-D (128K) Bank Control (W)
  4000244h  1    VRAMCNT_E - VRAM-E (64K) Bank Control (W)
  4000245h  1    VRAMCNT_F - VRAM-F (16K) Bank Control (W)
  4000246h  1    VRAMCNT_G - VRAM-G (16K) Bank Control (W)
  4000247h  1    WRAMCNT   - WRAM Bank Control (W)
  4000248h  1    VRAMCNT_H - VRAM-H (32K) Bank Control (W)
  4000249h  1    VRAMCNT_I - VRAM-I (16K) Bank Control (W)
Maths
  4000280h  2    DIVCNT - Division Control (R/W)
  4000290h  8    DIV_NUMER - Division Numerator (R/W)
  4000298h  8    DIV_DENOM - Division Denominator (R/W)
  40002A0h  8    DIV_RESULT - Division Quotient (=Numer/Denom) (R)
  40002A8h  8    DIVREM_RESULT - Division Remainder (=Numer MOD Denom) (R)
  40002B0h  2    SQRTCNT - Square Root Control (R/W)
  40002B4h  4    SQRT_RESULT - Square Root Result (R)
  40002B8h  8    SQRT_PARAM - Square Root Parameter Input (R/W)
  4000300h  4    POSTFLG - Undoc
  4000304h  2    POWCNT1 - Graphics Power Control Register (R/W)
3D Display Engine
  4000320h..6A3h
Display Engine B
  4001000h  4    2D Engine B - DISPCNT - LCD Control (Read/Write)
  4001008h  50h  2D Engine B (same registers as GBA, some changed bits)
  400106Ch  2    2D Engine B - MASTER_BRIGHT - 16bit - Brightness Up/Down
IPC/ROM
  4100000h  4    IPCFIFORECV - IPC Receive Fifo (R)
  4100010h  4    Gamecard bus 4-byte data in, for manual or dma read
Hardcoded RAM Addresses for Exception Handling
  27FFD9Ch    ..   NDS9 Debug Stacktop / Debug Vector (0=None)
  DTCM+3FF8h  4    NDS9 IRQ Check Bits (hardcoded RAM address)
  DTCM+3FFCh  4    NDS9 IRQ Handler (hardcoded RAM address)
Main Memory Control
  27FFFFEh  2    Main Memory Control
Further Memory Control Registers
  (see ARM CP15 System Control Coprocessor)


ARM7 I/O Map:
  4000004h  2    DISPSTAT
  4000006h  2    VCOUNT
  40000B0h  30h  DMA Channels 0..3
  4000100h  10h  Timers 0..3
  4000120h  4    Debug SIODATA32
  4000128h  4    Debug SIOCNT
  4000130h  2    KEYINPUT
  4000132h  2    KEYCNT
  4000134h  2    Debug RCNT
  4000136h  2    EXTKEYIN
  4000138h  1    RTC Realtime Clock Bus
  4000180h  2    IPCSYNC - IPC Synchronize Register (R/W)
  4000184h  2    IPCFIFOCNT - IPC Fifo Control Register (R/W)
  4000188h  4    IPCFIFOSEND - IPC Send Fifo (W)
  40001A0h  2    AUXSPICNT - Gamecard ROM and SPI Control
  40001A2h  2    AUXSPIDATA - Gamecard SPI Bus Data/Strobe
  40001A4h  4    Gamecard bus timing/control
  40001A8h  8    Gamecard bus 8-byte command out
  40001B0h  4    Gamecard Encryption Seed 0 Lower 32bit
  40001B4h  4    Gamecard Encryption Seed 1 Lower 32bit
  40001B8h  2    Gamecard Encryption Seed 0 Upper 7bit (bit7-15 unused)
  40001BAh  2    Gamecard Encryption Seed 1 Upper 7bit (bit7-15 unused)
  40001C0h  2    SPI bus Control (Firmware, Touchscreen, Powerman)
  40001C2h  2    SPI bus Data
Memory and IRQ Control
  4000204h  2    EXMEMSTAT - External Memory Status
  4000206h  2    WIFIWAITCNT
  4000208h  4    IME
  4000210h  4    IE
  4000214h  4    IF
  4000240h  1    VRAMSTAT - VRAM-C,D Bank Status (R)
  4000241h  1    WRAMSTAT - WRAM Bank Status (R)
  4000300h  1    POSTFLG
  4000301h  1    HALTCNT (different bits than on GBA) (plus NOP delay)
  4000304h  2    POWCNT2 - Sound/Wifi Power Control Register (R/W)
  4000308h  4    BIOSPROT - Bios-data-read-protection address
Sound Registers
  4000400h  100h Sound Channel 0..15 (10h bytes each)
  40004x0h  4    SOUNDxCNT - Sound Channel X Control Register (R/W)
  40004x4h  4    SOUNDxSAD - Sound Channel X Data Source Register (W)
  40004x8h  2    SOUNDxTMR - Sound Channel X Timer Register (W)
  40004xAh  2    SOUNDxPNT - Sound Channel X Loopstart Register (W)
  40004xCh  4    SOUNDxLEN - Sound Channel X Length Register (W)
  4000500h  2    SOUNDCNT - Sound Control Register (R/W)
  4000504h  2    SOUNDBIAS - Sound Bias Register (R/W)
  4000508h  1    SNDCAP0CNT - Sound Capture 0 Control Register (R/W)
  4000509h  1    SNDCAP1CNT - Sound Capture 1 Control Register (R/W)
  4000510h  4    SNDCAP0DAD - Sound Capture 0 Destination Address (R/W)
  4000514h  2    SNDCAP0LEN - Sound Capture 0 Length (W)
  4000518h  4    SNDCAP1DAD - Sound Capture 1 Destination Address (R/W)
  400051Ch  2    SNDCAP1LEN - Sound Capture 1 Length (W)
IPC/ROM
  4100000h  4    IPCFIFORECV - IPC Receive Fifo (R)
  4100010h  4    Gamecard bus 4-byte data in, for manual or dma read
WLAN Registers
  4800000h  ..   Wifi WS0 Region (32K) (Wifi Ports, and 8K Wifi RAM)
  4808000h  ..   Wifi WS1 Region (32K) (mirror of above, other waitstates)
Hardcoded RAM Addresses for Exception Handling
  380FFDCh  ..   NDS7 Debug Stacktop / Debug Vector (0=None)
  380FFF8h  4    NDS7 IRQ Check Bits (hardcoded RAM address)
  380FFFCh  4    NDS7 IRQ Handler (hardcoded RAM address)

 DS Memory Maps

NDS9 Memory Map
  00000000h  Instruction TCM (32KB) (not moveable) (mirror-able to 1000000h)
  0xxxx000h  Data TCM (16KB) (moveable)
  02000000h  Main Memory (4MB)
  03000000h  Shared WRAM (0KB, 16KB, or 32KB can be allocated to ARM9)
  04000000h  ARM9-I/O Ports
  05000000h  Standard Palettes (2KB) (Engine A BG/OBJ, Engine B BG/OBJ)
  06000000h  VRAM - Engine A, BG VRAM  (max 512KB)
  06200000h  VRAM - Engine B, BG VRAM  (max 128KB)
  06400000h  VRAM - Engine A, OBJ VRAM (max 256KB)
  06600000h  VRAM - Engine B, OBJ VRAM (max 128KB)
  06800000h  VRAM - "LCDC"-allocated (max 656KB)
  07000000h  OAM (2KB) (Engine A, Engine B)
  08000000h  GBA Slot ROM (max. 32MB)
  0A000000h  GBA Slot RAM (max. 64KB)
  FFFF0000h  ARM9-BIOS (32KB) (only 3K used)
The ARM9 Exception Vectors are located at FFFF0000h. The IRQ handler redirects to [DTCM+3FFCh].

NDS7 Memory Map
  00000000h  ARM7-BIOS (16KB)
  02000000h  Main Memory (4MB)
  03000000h  Shared WRAM (0KB, 16KB, or 32KB can be allocated to ARM7)
  03800000h  ARM7-WRAM (64KB)
  04000000h  ARM7-I/O Ports
  04800000h  Wireless Communications Wait State 0 (8KB RAM at 4804000h)
  04808000h  Wireless Communications Wait State 1 (I/O Ports at 4808000h)
  06000000h  VRAM allocated as Work RAM to ARM7 (max. 256K)
  08000000h  GBA Slot ROM (max. 32MB)
  0A000000h  GBA Slot RAM (max. 64KB)
The ARM7 Exception Vectors are located at 00000000h. The IRQ handler redirects to [3FFFFFCh aka 380FFFCh].

Further Memory (not mapped to ARM9/ARM7 bus)
  3D Engine Polygon RAM (52KBx2)
  3D Engine Vertex RAM (72KBx2)
  Firmware (256KB) (built-in serial flash memory)
  GBA-BIOS (16KB) (not used in NDS mode)
  NDS Slot ROM (serial 8bit-bus, max. 4GB with default protocol)
  NDS Slot FLASH/EEPROM/FRAM (serial 1bit-bus)
Shared-RAM
Even though Shared WRAM begins at 3000000h, programs commonly use mirrors at 37F8000h (on both ARM9 and ARM7). On the ARM7 side, this allows using the 32K Shared WRAM and the 64K ARM7-WRAM as one contiguous 96K RAM block.

Undefined I/O Ports
On the NDS (on the ARM9 side at least), undefined I/O ports always read as zero.

Undefined Memory Regions
16MB blocks that do not contain any defined memory regions (or that contain only mapped TCM regions) are typically completely undefined.
16MB blocks that do contain valid memory regions typically contain mirrors of that memory in the unused upper part of the 16MB area (the only exceptions are TCM and BIOS, which are not mirrored).
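For a region that starts at the bottom of its 16MB block and has a power-of-two size, the mirror rule reduces to a simple mask. A minimal C sketch under that assumption; the helper name is hypothetical. It does not apply to TCM and BIOS (not mirrored), nor to regions like ARM7-WRAM at 3800000h that start in the middle of a block.

```c
#include <stdint.h>

/* Hypothetical helper: fold a mirrored address back to the canonical
   address of a memory region of 'size' bytes that starts at the bottom
   of its 16MB block. Assumes 'size' is a power of two. */
static uint32_t canonical_address(uint32_t addr, uint32_t size)
{
    uint32_t block_base = addr & 0xFF000000u; /* start of the 16MB block */
    uint32_t offset     = addr & (size - 1u); /* offset within the memory */
    return block_base | offset;
}
```

Eg. with 4MB Main Memory, canonical_address(27FF000h, 400000h) yields 23FF000h, and with the 2KB palette memory, 5000800h folds back to 5000000h.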


 DS Memory Control

Memory Control
DS Memory Control - Cache and TCM
DS Memory Control - Cartridges and Main RAM
DS Memory Control - WRAM
DS Memory Control - VRAM
DS Memory Control - BIOS

Memory Access Time
DS Memory Timings


 DS Memory Control - Cache and TCM

TCM and Cache are controlled by the System Control Coprocessor (see ARM CP15 System Control Coprocessor).

The specifications for the NDS9 are:

Tightly Coupled Memory (TCM)
  ITCM 32K, base=00000000h (fixed, not move-able)
  DTCM 16K, base=moveable (default base=27C0000h)
Note: Although ITCM is NOT moveable, the NDS Firmware configures the ITCM size to 32MB, and so produces ITCM mirrors at 0..1FFFFFFh. Furthermore, the PU can be used to lock/unlock memory in that region. Together, these tricks allow ITCM to be effectively moved anywhere within the lower 32MB of memory.

Cache
  Data Cache 4KB, Instruction Cache 8KB
  4-way set associative method
  Cache line 8 words (32 bytes)
  Read-allocate method (ie. writes do not allocate cache lines)
  Round-robin and Pseudo-random replacement algorithms selectable
  Cache Lockdown, Instruction Prefetch, Data Preload
  Data write-through and write-back modes selectable
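With 32-byte lines and 4 ways, the 4KB data cache has 4096/(32*4) = 32 sets, and the 8KB instruction cache has 64 sets. A minimal C sketch of the usual set-index calculation, assuming the standard layout for such a cache (low 5 bits = byte within the line, next bits = set index, remaining bits = tag); the function names are hypothetical.

```c
#include <stdint.h>

#define CACHE_LINE_BYTES 32u /* 8 words per cache line */
#define CACHE_WAYS       4u  /* 4-way set associative */

/* Number of sets for a cache of the given total size in bytes. */
static uint32_t cache_sets(uint32_t cache_bytes)
{
    return cache_bytes / (CACHE_LINE_BYTES * CACHE_WAYS);
}

/* Set selected by 'addr': line number modulo number of sets
   (the set count is a power of two, so masking works). */
static uint32_t cache_set_index(uint32_t addr, uint32_t cache_bytes)
{
    return (addr / CACHE_LINE_BYTES) & (cache_sets(cache_bytes) - 1u);
}
```

Eg. for the data cache, addresses 400h bytes apart (such as 2000000h and 2000400h) select the same set, so more than four frequently used lines at that stride keep evicting each other.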
Protection Unit (PU)
Recommended/default settings are:
  Region  Name            Address   Size   Cache WBuf Code Data
  -       Background      00000000h 4GB    -     -    -    -
  0       I/O and VRAM    04000000h 64MB   -     -    R/W  R/W
  1       Main Memory     02000000h 4MB    On    On   R/W  R/W
  2       ARM7-dedicated  027C0000h 256KB  -     -    -    -
  3       GBA Slot        08000000h 128MB  -     -    -    R/W
  4       DTCM            027C0000h 16KB   -     -    -    R/W
  5       ITCM            01000000h 32KB   -     -    R/W  R/W
  6       BIOS            FFFF0000h 32KB   On    -    R    R
  7       Shared Work     027FF000h 4KB    -     -    -    R/W
Notes: In Nintendo's hardware-debugger, Main Memory is expanded to 8MB (for that reason, some addresses are at 27NN000h instead of 23NN000h) (some of the extra memory is reserved for the debugger, some can be used for game development). The purpose of regions 2 and 7 is not understood. The GBA Slot should need at most 32MB+64KB, rounded up to 64MB; it is unclear why the region is 128MB. DTCM and ITCM do not use Cache and Write-Buffer because TCM is fast. The above settings apparently do not allow access to Shared Memory at 37F8000h. Do not use cache/write buffer for I/O; doing so might suppress writes and/or return outdated values on reads.
The main purpose of the Protection Unit is debugging. A major problem with GBA programs was faulty accesses to memory address 00000000h and up (due to [base+offset] addressing with uninitialized (zero) base values). This problem has been fixed in the NDS, for the ARM9 processor at least, but there are still various leaks: for example, the 64MB I/O and VRAM area contains only about 660KB of valid addresses, and the ARM7 probably doesn't have a Protection Unit at all. Altogether, the protection is better than on the GBA, but it's still pretty crude compared with software debugging tools.
Region address/size are unified (same for code and data); however, cacheability and access rights are non-unified (and may be defined separately for code and data).
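Since several of the recommended regions overlap, finding the region that governs an address requires a priority rule. A minimal C sketch, assuming the ARM946E-S rule that a higher-numbered region takes priority over a lower-numbered one; the table copies the base/size values listed above, and the names are hypothetical.

```c
#include <stdint.h>

struct pu_region { uint32_t base; uint32_t size; };

/* Recommended/default regions 0..7 (base, size), copied from the table. */
static const struct pu_region pu_default[8] = {
    {0x04000000u, 64u << 20},  /* 0 I/O and VRAM   */
    {0x02000000u, 4u << 20},   /* 1 Main Memory    */
    {0x027C0000u, 256u << 10}, /* 2 ARM7-dedicated */
    {0x08000000u, 128u << 20}, /* 3 GBA Slot       */
    {0x027C0000u, 16u << 10},  /* 4 DTCM           */
    {0x01000000u, 32u << 10},  /* 5 ITCM           */
    {0xFFFF0000u, 32u << 10},  /* 6 BIOS           */
    {0x027FF000u, 4u << 10},   /* 7 Shared Work    */
};

/* Returns the governing region number, or -1 for the background region.
   Regions are checked from 7 down to 0, so higher numbers win. */
static int pu_region_for(uint32_t addr)
{
    for (int i = 7; i >= 0; i--) {
        /* Unsigned subtraction wraps for addr < base, so one compare
           covers both bounds (also for the BIOS region near 4GB). */
        if (addr - pu_default[i].base < pu_default[i].size)
            return i;
    }
    return -1;
}
```

Eg. 27C0100h lies in regions 1, 2, and 4 and is governed by region 4 (DTCM), while 37F8000h is not covered by any region, consistent with the note that these settings don't allow access to Shared Memory at 37F8000h.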

Note: The NDS7 doesn't have any TCM, Cache, or CP15.


 DS Memory Control - Cartridges and Main RAM

4000204h - NDS9 - EXMEMCNT - 16bit - External Memory Control (R/W)
4000204h - NDS7 - EXMEMSTAT - 16bit - External Memory Status (R/W..R)
  0-1   32-pin GBA Slot SRAM Access Time    (0-3 = 10, 8, 6, 18 cycles)
  2-3   32-pin GBA Slot ROM 1st Access Time (0-3 = 10, 8, 6, 18 cycles)
  4     32-pin GBA Slot ROM 2nd Access Time (0-1 = 6, 4 cycles)
  5-6   32-pin GBA Slot PHI-pin out         (0-3 = Low, 4.19MHz, 8.38MHz, 16.76MHz)
  7     32-pin GBA Slot Access Rights       (0=ARM9, 1=ARM7)
  8-10  Not used (always zero)
  11    17-pin NDS Slot Access Rights       (0=ARM9, 1=ARM7)
  12    Not used (always zero)
  13    Not used (always set ?)
  14    Main Memory Interface Mode Switch   (0=Async/GBA/Reserved, 1=Synchronous)
  15    Main Memory Access Priority         (0=ARM9 Priority, 1=ARM7 Priority)
Bit0-6 can be changed by both NDS9 and NDS7; changing these bits affects only the local EXMEM register, not that of the other CPU.
Bit7-15 can be changed by NDS9 only; changing these bits affects both EXMEM registers, ie. both NDS9 and NDS7 can read the current NDS9 setting.
Bit14=0 is intended for GBA mode, however, writes to this bit appear to be ignored?
See also: DS Main Memory Control
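A minimal C sketch decoding a 16bit EXMEMCNT/EXMEMSTAT value into the fields listed above; the struct, field, and function names are hypothetical, and the cycle counts are simply the table values.

```c
#include <stdint.h>
#include <stdbool.h>

struct exmem {
    int  sram_cycles;            /* bit0-1  */
    int  rom_first_cycles;       /* bit2-3  */
    int  rom_second_cycles;      /* bit4    */
    int  phi;                    /* bit5-6: 0=Low, 1=4.19, 2=8.38, 3=16.76MHz */
    bool gba_slot_arm7;          /* bit7:  GBA Slot owned by ARM7 */
    bool nds_slot_arm7;          /* bit11: NDS Slot owned by ARM7 */
    bool main_ram_sync;          /* bit14: synchronous interface mode */
    bool main_ram_arm7_priority; /* bit15: ARM7 has Main Memory priority */
};

static struct exmem exmem_decode(uint16_t v)
{
    static const int first[4]  = {10, 8, 6, 18}; /* cycles for 0-3 */
    static const int second[2] = {6, 4};         /* cycles for 0-1 */
    struct exmem e;
    e.sram_cycles            = first[v & 3];
    e.rom_first_cycles       = first[(v >> 2) & 3];
    e.rom_second_cycles      = second[(v >> 4) & 1];
    e.phi                    = (v >> 5) & 3;
    e.gba_slot_arm7          = (v >> 7) & 1;
    e.nds_slot_arm7          = (v >> 11) & 1;
    e.main_ram_sync          = (v >> 14) & 1;
    e.main_ram_arm7_priority = (v >> 15) & 1;
    return e;
}
```

Eg. a value of 0017h decodes as SRAM 18 cycles, ROM 1st access 8 cycles, ROM 2nd access 4 cycles.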


 DS Memory Control - WRAM

4000247h - NDS9 - WRAMCNT - 8bit - WRAM Bank Control (R/W)
4000241h - NDS7 - WRAMSTAT - 8bit - WRAM Bank Status (R)
Should not be changed when using Nintendo's API.
  0-1   ARM9/ARM7 (0-3 = 32K/0K, 2nd 16K/1st 16K, 1st 16K/2nd 16K, 0K/32K)
  2-7   Not used
The ARM9 WRAM area is 3000000h-3FFFFFFh (16MB range).
The ARM7 WRAM area is 3000000h-37FFFFFh (8MB range).
The allocated 16K or 32K are mirrored everywhere in the above areas.
De-allocation (0K) is a special case: on the ARM9 side, the WRAM area is then empty (containing undefined data). On the ARM7 side, the WRAM area then contains mirrors of the 64KB ARM7-WRAM (the memory at 3800000h and up).
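A minimal C sketch of the allocation and mirroring rules, assuming that "1st 16K" and "2nd 16K" refer to offsets 0000h and 4000h of the 32K Shared WRAM. The function name is hypothetical, the caller is assumed to pass an address inside the CPU's WRAM area, and the 0K case is only flagged, not emulated.

```c
#include <stdint.h>

/* Returns the offset into the 32K Shared WRAM that 'addr' maps to for
   the given CPU, or -1 if that CPU has no Shared WRAM allocated (0K). */
static int32_t shared_wram_offset(uint8_t wramcnt, int is_arm7, uint32_t addr)
{
    /* Per mode: {ARM9 start, ARM9 size, ARM7 start, ARM7 size} */
    static const uint32_t map[4][4] = {
        {0x0000, 0x8000, 0x0000, 0x0000}, /* 0: 32K / 0K          */
        {0x4000, 0x4000, 0x0000, 0x4000}, /* 1: 2nd 16K / 1st 16K */
        {0x0000, 0x4000, 0x4000, 0x4000}, /* 2: 1st 16K / 2nd 16K */
        {0x0000, 0x0000, 0x0000, 0x8000}, /* 3: 0K / 32K          */
    };
    const uint32_t *m = map[wramcnt & 3];
    uint32_t start = is_arm7 ? m[2] : m[0];
    uint32_t size  = is_arm7 ? m[3] : m[1];
    if (size == 0)
        return -1; /* ARM9: undefined data; ARM7: ARM7-WRAM mirrors instead */
    /* The allocated block is mirrored throughout the WRAM area. */
    return (int32_t)(start + ((addr - 0x3000000u) & (size - 1u)));
}
```

Eg. with WRAMCNT=3, the ARM7 addresses 37F8000h and 37FFFFCh map to offsets 0000h and 7FFCh, so the 32K ends right where ARM7-WRAM begins at 3800000h.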


 DS Memory Control - VRAM

4000240h - NDS7 - VRAMSTAT - 8bit - VRAM Bank Status (R)
  0     VRAM C enabled and allocated to NDS7  (0=No, 1=Yes)
  1     VRAM D enabled and allocated to NDS7  (0=No, 1=Yes)
  2-7   Not used (always zero)
The register indicates whether VRAM C/D are allocated to NDS7 (as Work RAM), ie. whether VRAMCNT_C/D are enabled (Bit7=1) with MST=2 (Bit0-2). However, it does not reflect the OFS value.

4000240h - NDS9 - VRAMCNT_A - 8bit - VRAM-A (128K) Bank Control (W)
4000241h - NDS9 - VRAMCNT_B - 8bit - VRAM-B (128K) Bank Control (W)
4000242h - NDS9 - VRAMCNT_C - 8bit - VRAM-C (128K) Bank Control (W)
4000243h - NDS9 - VRAMCNT_D - 8bit - VRAM-D (128K) Bank Control (W)
4000244h - NDS9 - VRAMCNT_E - 8bit - VRAM-E (64K) Bank Control (W)
4000245h - NDS9 - VRAMCNT_F - 8bit - VRAM-F (16K) Bank Control (W)
4000246h - NDS9 - VRAMCNT_G - 8bit - VRAM-G (16K) Bank Control (W)
4000248h - NDS9 - VRAMCNT_H - 8bit - VRAM-H (32K) Bank Control (W)
4000249h - NDS9 - VRAMCNT_I - 8bit - VRAM-I (16K) Bank Control (W)
  0-2   VRAM MST              ;Bit2 not used by VRAM-A,B,H,I
  3-4   VRAM Offset (0-3)     ;Offset not used by VRAM-E,H,I
  5-6   Not used
  7     VRAM Enable (0=Disable, 1=Enable)
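A minimal C sketch for composing a VRAMCNT_x value from MST and OFS (always with the enable bit set), plus a check following the VRAMSTAT rule described above for VRAM-C/D (enabled, MST=2, OFS ignored); the function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Build a VRAMCNT_x value: bit0-2 MST, bit3-4 OFS, bit7 enable. */
static uint8_t vramcnt(unsigned mst, unsigned ofs)
{
    return (uint8_t)(0x80u | ((ofs & 3u) << 3) | (mst & 7u));
}

/* True if a VRAMCNT_C/D value would show up as set in VRAMSTAT:
   bank enabled and MST=2; the OFS field does not matter. */
static bool vramstat_bit(uint8_t cnt)
{
    return (cnt & 0x80u) && (cnt & 7u) == 2u;
}
```

Eg. vramcnt(0,0) gives 80h (LCDC mode) and vramcnt(1,3) gives 99h. Keep in mind that MST bit2 is not used by VRAM-A,B,H,I.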
There is a total of 656KB of VRAM in Blocks A-I.
Table below shows the possible configurations.
  VRAM    SIZE  MST  OFS   ARM9, Plain ARM9-CPU Access (so-called LCDC mode)
  A       128K  0    -     6800000h-681FFFFh
  B       128K  0    -     6820000h-683FFFFh
  C       128K  0    -     6840000h-685FFFFh
  D       128K  0    -     6860000h-687FFFFh
  E       64K   0    -     6880000h-688FFFFh
  F       16K   0    -     6890000h-6893FFFh
  G       16K   0    -     6894000h-6897FFFh
  H       32K   0    -     6898000h-689FFFFh
  I       16K   0    -     68A0000h-68A3FFFh
  VRAM    SIZE  MST  OFS   ARM9, 2D Graphics Engine A, BG-VRAM (max 512K)
  A,B,C,D 128K  1    0..3  6000000h+(20000h*OFS)
  E       64K   1    -     6000000h
  F,G     16K   1    0..3  6000000h+(4000h*OFS.0)+(10000h*OFS.1)
  VRAM    SIZE  MST  OFS   ARM9, 2D Graphics Engine A, OBJ-VRAM (max 256K)
  A,B     128K  2    0..1  6400000h+(20000h*OFS.0)  ;(OFS.1 must be zero)
  E       64K   2    -     6400000h
  F,G     16K   2    0..3  6400000h+(4000h*OFS.0)+(10000h*OFS.1)
  VRAM    SIZE  MST  OFS   2D Graphics Engine A, BG Extended Palette
  E       64K   4    -     Slot 0-3  ;only lower 32K used
  F,G     16K   4    0..1  Slot 0-1 (OFS=0), Slot 2-3 (OFS=1)
  VRAM    SIZE  MST  OFS   2D Graphics Engine A, OBJ Extended Palette
  F,G     16K   5    -     Slot 0  ;16K each (only lower 8K used)
  VRAM    SIZE  MST  OFS   Texture/Rear-plane Image
  A,B,C,D 128K  3    0..3  Slot OFS(0-3)   ;(Slot2-3: Texture, or Rear-plane)
  VRAM    SIZE  MST  OFS   Texture Palette
  E       64K   3    -     Slots 0-3                 ;OFS=don't care
  F,G     16K   3    0..3  Slot (OFS.0*1)+(OFS.1*4)  ;ie. Slot 0, 1, 4, or 5
  VRAM    SIZE  MST  OFS   ARM9, 2D Graphics Engine B, BG-VRAM (max 128K)
  C       128K  4    -     6200000h
  H       32K   1    -     6200000h
  I       16K   1    -     6208000h
  VRAM    SIZE  MST  OFS   ARM9, 2D Graphics Engine B, OBJ-VRAM (max 128K)
  D       128K  4    -     6600000h
  I       16K   2    -     6600000h
  VRAM    SIZE  MST  OFS   2D Graphics Engine B, BG Extended Palette
  H       32K   2    -     Slot 0-3
  VRAM    SIZE  MST  OFS   2D Graphics Engine B, OBJ Extended Palette
  I       16K   3    -     Slot 0  ;(only lower 8K used)
  VRAM    SIZE  MST  OFS   <ARM7>, Plain <ARM7>-CPU Access
  C,D     128K  2    0..1  6000000h+(20000h*OFS.0)  ;OFS.1 must be zero
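Two of the address formulas above can be reproduced in C: the LCDC addresses are just banks A..I placed back to back from 6800000h, and the Engine A BG-VRAM address is derived from OFS. A minimal sketch; bank numbers 0..8 stand for A..I, the function names are hypothetical, and invalid bank/MST combinations are not checked.

```c
#include <stdint.h>

/* Bank sizes A..I in KB, in LCDC order. */
static const uint32_t vram_kb[9] = {128, 128, 128, 128, 64, 16, 16, 32, 16};

/* LCDC (MST=0) base address of bank 0..8 (A..I); bank 9 yields the
   end address, ie. 6800000h plus the total VRAM size. */
static uint32_t vram_lcdc_base(int bank)
{
    uint32_t addr = 0x6800000u;
    for (int i = 0; i < bank; i++)
        addr += vram_kb[i] * 1024u;
    return addr;
}

/* Engine A BG-VRAM (MST=1) base for banks A-D (0-3), E (4), F/G (5/6). */
static uint32_t vram_bg_a_base(int bank, unsigned ofs)
{
    if (bank <= 3)
        return 0x6000000u + 0x20000u * (ofs & 3u);
    if (bank == 4)
        return 0x6000000u;
    /* F,G: 4000h*OFS.0 + 10000h*OFS.1 */
    return 0x6000000u + 0x4000u * (ofs & 1u) + 0x10000u * ((ofs >> 1) & 1u);
}
```

Eg. vram_lcdc_base(8) gives 68A0000h (VRAM-I), vram_lcdc_base(9) gives 68A4000h (6800000h plus 656KB), and vram_bg_a_base(6,3) gives 6014000h for VRAM-G with OFS=3.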
Notes
In Plain-CPU modes, VRAM can be accessed only by the CPU (and by the Capture Unit, and by VRAM Display mode). In "Plain <ARM7>-CPU Access" mode, the VRAM blocks are allocated as Work RAM to the NDS7 CPU.
In BG/OBJ VRAM modes, VRAM can be accessed by the CPU at specified addresses, and by the display controller.
In Extended Palette and Texture Image/Palette modes, VRAM is not mapped to CPU address space, and can be accessed only by the display controller (so, to initialize or change the memory, it should be temporarily switched to Plain-CPU mode).
All VRAM (and Palette, and OAM) can be written only in 16bit and 32bit units (STRH, STR opcodes); 8bit writes (STRB opcode) are ignored. The only exception is "Plain <ARM7>-CPU Access" mode: the ARM7 CPU can use STRB to write to VRAM (the reason for this special feature is that, in GBA mode, two 128K VRAM blocks are used to emulate the GBA's 256K Work RAM).

Other Video RAM
Aside from the map-able VRAM blocks, there are also some video-related memory regions at fixed addresses:
  5000000h  Engine A Standard BG Palette (512 bytes)
  5000200h  Engine A Standard OBJ Palette (512 bytes)
  5000400h  Engine B Standard BG Palette (512 bytes)
  5000600h  Engine B Standard OBJ Palette (512 bytes)
  7000000h  Engine A OAM (1024 bytes)
  7000400h  Engine B OAM (1024 bytes)

 DS Memory Control - BIOS

4000308h - NDS7 - BIOSPROT - Bios-data-read-protection address
Used to double-protect the first few KBytes of the NDS7 BIOS. The BIOS is split into two protection regions, one always active, one controlled by the BIOSPROT register. The overall idea is that only the BIOS can read from itself; any other attempt to read from those regions returns FFh bytes.
  Opcodes at...      Can read from      Expl.
  0..[BIOSPROT]-1    0..3FFFh           Double-protected (when BIOSPROT is set)
  [BIOSPROT]..3FFFh  [BIOSPROT]..3FFFh  Normal-protected (always active)
The initial BIOSPROT setting on power-up is zero (disabled). Before starting the cartridge, the BIOS boot code sets the register to 1204h (actually 1205h, but the misaligned low bit is ignored). Once initialized, further writes to the register are ignored.
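The two protection regions can be modeled as a simple predicate. A minimal C sketch following the table above; the function name is hypothetical, and it only covers reads from 0..3FFFh, where a disallowed read would return FFh.

```c
#include <stdint.h>
#include <stdbool.h>

/* Returns true if an opcode at 'pc' may read NDS7 BIOS address 'addr'
   (addr < 4000h), following the BIOSPROT table. */
static bool bios_read_allowed(uint32_t pc, uint32_t addr, uint32_t biosprot)
{
    uint32_t prot = biosprot & ~1u; /* misaligned low bit is ignored */
    if (addr >= 0x4000u)
        return true;                /* outside the BIOS: not modeled here */
    if (pc < prot)
        return true;                /* double-protected code reads all */
    if (pc < 0x4000u)
        return addr >= prot;        /* normal-protected code */
    return false;                   /* code outside the BIOS reads FFh */
}
```

Eg. with BIOSPROT=1205h (treated as 1204h), code at 2000h can read 1204h..3FFFh but not 0..1203h, and code outside the BIOS cannot read the BIOS at all.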

The double-protected region contains the exception vectors, some bytes of code, and the cartridge KEY1 encryption seed (about 4KBytes). As far as I know, it is impossible to unlock the memory once it is locked; however, with some trickery, it is possible to execute code before it gets locked. Also, the two THUMB opcodes at 05ECh can be used to read all memory at 0..3FFFh:
  05ECh  ldrb r3,[r3,12h]      ;requires incoming r3=src-12h
  05EEh  pop r2,r4,r6,r7,r15   ;requires dummy values & THUMB retadr on stack
Additionally, most BIOS functions (eg. CpuSet) include a software-based protection which rejects source addresses in the BIOS area (the only exception is GetCRC16, though it still cannot bypass the BIOSPROT setting).

Note
The NDS9 BIOS doesn't include any software or hardware based read protection.


 DS Memory Timings

System Clock
  Bus clock  = 33MHz (33.513982 MHz) (1FF61FEh Hertz)
  NDS7 clock = 33MHz (same as bus clock)
  NDS9 clock = 66MHz (internally twice bus clock; for cache/tcm)
Most timings in this document are specified for the 33MHz clock (not for the 66MHz clock). Accordingly, a single 66MHz NDS9 cycle counts as a "half" cycle.

Memory Access Times
The tables below show the access times for code/data fetches on the ARM7/ARM9 CPUs, measured for sequential/nonsequential 32bit/16bit accesses.
  NDS7/CODE             NDS9/CODE
  N32 S32 N16 S16 Bus   N32 S32 N16 S16 Bus
  9   2   8   1   16    9   9   4.5 4.5 16   Main RAM (read) (cache off)
  1   1   1   1   32    4   4   2   2   32   WRAM,BIOS,I/O,OAM
  2   2   1   1   16    5   5   2.5 2.5 16   VRAM,Palette RAM
  16  12  10  6   16    19  19  9.5 9.5 16   GBA ROM (example 10,6 access)
  -   -   -   -   -     0.5 0.5 0.5 0.5 32   TCM, Cache_Hit
  -   -   -   -   -     (--Load 8 words--)   Cache_Miss
  NDS7/DATA             NDS9/DATA
  N32 S32 N16 S16 Bus   N32 S32 N16 S16 Bus
  10  2   9   1   16    10  2   9   1   16   Main RAM (read) (cache off)
  1   1   1   1   32    4   1   4   1   32   WRAM,BIOS,I/O,OAM
  1?  2   1   1   16    5   2   4   1   16   VRAM,Palette RAM
  15  12  9   6   16    19  12  13  6   16   GBA ROM (example 10,6 access)
  9   10  9   10  8     13  10  13  10  8    GBA RAM (example 10 access)
  -   -   -   -   -     0.5 0.5 0.5 -   32   TCM, Cache_Hit
  -   -   -   -   -     (--Load 8 words--)   Cache_Miss
  -   -   -   -   -     11  11  11  -   32   Cache_Miss (BIOS)
  -   -   -   -   -     23  23  23  -   16   Cache_Miss (Main RAM)
All timings are counted in 33MHz units (so "half" cycles can occur on NDS9).
Note: 8bit data accesses have the same timings as 16bit data accesses.

*** DS Memory Timing Notes ***

The NDS timings are altogether pretty messed up, with different timings for CODE and DATA fetches, and different timings for NDS7 and NDS9...

NDS7/CODE
Timings for this region can be considered as "should be" timings.

NDS7/DATA
Quite the same as NDS7/CODE, except that nonsequential Main RAM accesses are 1 cycle slower and, stranger still, nonsequential GBA Slot accesses are 1 cycle faster.

NDS9/CODE
This is the messiest timing. An infamous PENALTY of 3 cycles is added to all nonsequential accesses (except cache, TCM, and main RAM). And, all opcode fetches are forcefully made nonsequential 32bit (the NDS9 simply doesn't support fast sequential opcode fetches). That also applies to THUMB code: two 16bit opcodes are fetched by a single nonsequential 32bit access, so the time per 16bit opcode is one half of the 32bit fetch (unless a branch causes only one of the two 16bit opcodes to be executed, in which case that opcode takes the full 32bit access time).

NDS9/DATA
Allows both sequential and nonsequential access, and both 16bit and 32bit access, so it's faster than NDS9/CODE. Nevertheless, it still has the 3 cycle PENALTY on nonsequential accesses. And, similar to NDS7/DATA, it also adds 1 cycle to nonsequential Main RAM accesses.

*** More Timing Notes / Lots of unsorted Info ***

Actual CPU Performance
The 33MHz NDS7 runs more or less nicely at 33MHz. However, the so-called "66MHz" NDS9 has <much> higher waitstates, and its effective bus speed is barely about 8..16MHz; the only exception is code/data in cache/TCM, which may reach a real 66MHz (assuming cache HITS; in case of cache MISSES, the cached memory timing might even drop to 1.4MHz or so?).

*********************

ARM9 opcode fetches are always N32 + 3 waits.
  S16 and N16 do not exist (because of thumb-double-fetching) (see there).
  S32 becomes N32 (ie. the ARM9 does NOT support fast sequential timing).
  That N32 has the same timing as a normal N32 access on NDS7, plus 3 waits.
  Eg. an ARM9 N32 or S32 to 16bit bus will take: N16 + S16 + 3 waits.
  Eg. an ARM9 N32 or S32 to 32bit bus will take: N32 + 3 waits.
Main Memory ALWAYS has the nonsequential 3 wait PENALTY (even on ARM7).
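The rules above can be turned into a small estimator. A minimal C sketch, assuming that the NDS9 fetch time is the NDS7/CODE timing of the region (N16+S16 on a 16bit bus, N32 on a 32bit bus) plus 3 waits, except for Main RAM, where the penalty is treated as already included in the NDS7 value. The names are hypothetical, and the inputs are copied from the NDS7/CODE table.

```c
/* NDS7/CODE base timings (33MHz cycles) for one memory region. */
struct region_timing {
    int n32, n16, s16; /* values from the NDS7/CODE table       */
    int bus16;         /* 1 if the region has a 16bit bus       */
    int main_ram;      /* 1 for Main RAM (penalty already paid) */
};

static const struct region_timing T_WRAM    = {1, 1, 1, 0, 0};
static const struct region_timing T_VRAM    = {2, 1, 1, 1, 0};
static const struct region_timing T_GBA_ROM = {16, 10, 6, 1, 0};
static const struct region_timing T_MAIN    = {9, 8, 1, 1, 1};

/* Estimated NDS9 opcode fetch time: always a nonsequential 32bit
   access (split into N16+S16 on a 16bit bus), plus the 3 wait penalty. */
static int arm9_fetch_cycles(struct region_timing r)
{
    int base = r.bus16 ? r.n16 + r.s16 : r.n32;
    return r.main_ram ? base : base + 3;
}
```

With these inputs the estimator returns 4 (WRAM), 5 (VRAM), 19 (GBA ROM), and 9 (Main RAM), matching the NDS9/CODE N32 column. In THUMB mode, each of the two opcodes in the fetched word costs half of that.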

*********************

ARM9 Data fetches, however, are allowed to use sequential timing, as well as raw 16bit accesses (which aren't forcefully expanded to slow 32bit accesses).
Nevertheless, the 3 wait PENALTY is added to any NONSEQUENTIAL accesses.
The only exceptions are cache and TCM, which do not have that penalty.
  Eg. LDRH on 16bit-data-bus is N16+3waits.
  Eg. LDR on 16bit-data-bus is N16+S16+3waits.
  Eg. LDM on 16bit-data-bus is N16+(n*2-1)*S16+3waits.
In some cases, data fetches can take place in parallel with opcode fetches.
  That is NOT true for LDM (works only for LDR/LDRB/LDRH).
  That is NOT true for DATA in the SAME memory region as CODE.
  That is NOT true for DATA in ITCM (no matter if CODE is in ITCM).
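The three 16bit-bus examples can be written as formulas. A minimal C sketch; n16/s16 are the 16bit timings of the memory region (assumed here to be taken from the NDS7/CODE table), 'words' is the number of words transferred by LDM, and the function names are hypothetical.

```c
/* NDS9 data access times on a 16bit bus, including the 3 wait penalty
   for the nonsequential start of the access. */
static int arm9_ldrh16(int n16)         { return n16 + 3; }
static int arm9_ldr16(int n16, int s16) { return n16 + s16 + 3; }

/* LDM of 'words' 32bit words = 2*words halfword transfers:
   one nonsequential, the rest sequential. */
static int arm9_ldm16(int n16, int s16, int words)
{
    return n16 + (words * 2 - 1) * s16 + 3;
}
```

For VRAM (N16=1, S16=1) this gives 4 for LDRH, 5 for LDR, and 11 for a 4-word LDM, consistent with the NDS9/DATA table (N16=4, N32=5, and N32+3*S32 = 5+3*2 = 11).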
*********************

NDS9 Busses
Unlike ARM7, the ARM9 has separate code and data busses, allowing it to perform code and data fetches simultaneously (provided that both are in different memory regions).
Normally, opcode execution times are calculated as "(codetime+datatime)"; with the two busses, it can (ideally) be "MAX(codetime,datatime)", so the data access may effectively take zero clock cycles.
In practice, DTCM and data cache accesses can take zero cycles (however, data accesses to ITCM can't).
When executing code in cache/ITCM, data access to non-cache/TCM memory won't be any faster than with only one bus (at best, it could subtract 0.5 cycles from datatime, but the access must be "aligned" to the bus clock, so "datatime-0.5" is rounded back to the original "datatime").
When executing code in uncached main RAM, and accessing data elsewhere (neither in main memory nor in cache/TCM), execution time is typically "codetime+datatime-2".
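The three bus-overlap cases above can be compared side by side. This is a rough estimator under the stated rules only; `arm9_exec_estimate` and its enum are hypothetical names, and internal cycles are ignored.

```c
/* Which of the overlap situations described above applies. */
enum arm9_overlap { SERIAL, IDEAL_PARALLEL, UNCACHED_MAIN_CODE };

/* Rough per-opcode time from separate code-bus and data-bus times. */
unsigned arm9_exec_estimate(enum arm9_overlap mode,
                            unsigned codetime, unsigned datatime)
{
    switch (mode) {
    case SERIAL:             /* one bus, or code in cache/ITCM with data outside */
        return codetime + datatime;
    case IDEAL_PARALLEL:     /* e.g. data in DTCM or data cache */
        return codetime > datatime ? codetime : datatime;
    case UNCACHED_MAIN_CODE: /* code in uncached main RAM, data elsewhere */
        return codetime + datatime >= 2 ? codetime + datatime - 2 : 0;
    }
    return 0;
}
```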

NDS9 Internal Cycles
In addition to codetime+datatime, some opcodes include one or more internal cycles. Compared with ARM7, the behaviour of those internal cycles is slightly different on ARM9. First, on the NDS9, the internal cycles are of course "half" cycles (ie. counted in 66MHz units, not in 33MHz units) (although they may get rounded to "full" cycles upon the next memory access outside TCM/cache). And, in some cases the ARM9 "skips" the internal cycles, often depending on whether or not the next opcode uses the result of the current opcode.
Another big difference is that the ARM9 has lost the fast-multiply feature for small numbers; in some cases that may result in faster execution, but it may also result in slower execution (one workaround would be to manually replace MUL opcodes by the new ARM9 halfword multiply opcodes); the slowest cases are MUL opcodes that update flags (eg. MULS, MLAS, SMULLS, etc. in ARM mode, and all multiply opcodes in THUMB mode).

NDS9 Thumb Code
In thumb mode, the NDS9 fetches two 16bit opcodes by a single 32bit read. On a 32bit bus, this reduces the amount of memory traffic and may result in faster execution; of course that works only if the two opcodes are within a word-aligned region (eg. loops at word-aligned addresses will be faster than non-aligned loops). However, the double-opcode-fetching is also done on 16bit bus memory, including unnecessary fetches, such as opcodes after branch commands, so the feature may cause heavy slowdowns.
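The alignment effect can be quantified by counting how many 32bit reads a straight run of Thumb opcodes needs. A minimal sketch, assuming branch-free code; `thumb_word_fetches` is a hypothetical helper.

```c
#include <stdint.h>

/* Number of 32bit reads needed to fetch `count` consecutive 16bit Thumb
   opcodes starting at halfword-aligned `addr`. Two opcodes share one
   read only when both lie in the same word. */
unsigned thumb_word_fetches(uint32_t addr, unsigned count)
{
    if (count == 0)
        return 0;
    uint32_t first_word = addr >> 2;
    uint32_t last_word  = (addr + 2 * (count - 1)) >> 2;
    return (unsigned)(last_word - first_word + 1);
}
```

For a 4-opcode loop, a word-aligned start needs 2 reads while a start at a halfword offset needs 3.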

Main Memory
Reportedly, the main memory access times would be 5 cycles (nonsequential read), 4 cycles (nonsequential write), and 1 cycle (sequential read or write). Plus whatever termination cycles. Plus 3 cycles on nonsequential access to the last 2-bytes of a 32-byte block.
That's of course all wrong. Reads are much slower than 5 cycles. It hasn't been tested yet whether writes are faster. And, I haven't been able to reproduce the 3-cycle effect on the last 2 bytes; actually, it looks more as if those 3 cycles are accidentally added to ALL nonsequential accesses, at ALL main memory addresses, and even to most OTHER memory regions... which might be the source of the PENALTY which occurs on VRAM/WRAM/OAM/Palette and I/O accesses.

DMA
In some cases DMA main memory read cycles are reportedly performed simultaneously with DMA write cycles to other memory.

NDS9
On the NDS9, all external memory access (and I/O) is delayed to bus clock (or actually MUCH slower due to the massive waitstates), so the full 66MHz can be used only internally in the NDS9 CPU core, ie. with cache and TCM.

Bus Clock
The exact bus clock is specified as 33.513982 MHz (1FF61FEh Hertz). However, on my own NDS, measured against the RTC seconds IRQ, it appears to be more like 1FF6231h; that inaccuracy of 1 cycle per 657138 cycles (about one second per week) on either oscillator isn't too significant, though.
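The "about one second per week" figure follows directly from the two clock values. A small sketch of that arithmetic; the macro and function names are illustrative only.

```c
/* Nominal and measured bus clocks quoted above. */
#define NDS_CLOCK_NOMINAL  0x1FF61FEu  /* 33513982 Hz */
#define NDS_CLOCK_MEASURED 0x1FF6231u  /* 33514033 Hz, one unit vs. RTC */

/* Clock error expressed as seconds of drift per week. */
double clock_drift_seconds_per_week(unsigned nominal, unsigned measured)
{
    double diff = (double)measured - (double)nominal;   /* Hz of error */
    return diff / (double)nominal * 7.0 * 24.0 * 3600.0;
}
```

The difference is 33h (51) Hz, which gives roughly 0.92 seconds per week.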

GBA Slot
The access time for GBA slot can be configured via EXMEMCNT register.

VRAM Waitstates
Additionally, on NDS9, a one cycle wait can be added to VRAM accesses (when the video controller simultaneously accesses it) (that can be disabled by Forced Blank, see DISPCNT.Bit7). Moreover, additional VRAM waitstates occur when using the video capture function.
Note: VRAM being mapped to NDS7 is always free of additional waits.


 DS Video

The NDS has two 2D Video Engines, each basically the same as in GBA, see
GBA LCD Video Controller

NDS Specific 2D Video Features
DS Video Stuff
DS Video BG Modes / Control
DS Video OBJs
DS Video Extended Palettes
DS Video Capture and Main Memory Display Mode
DS Video Display System Block Diagram

For Display Power Control (and Display Swap), and VRAM Allocation, see
DS Power Management
DS Memory Control - VRAM


 DS Video Stuff

DS Display Dimensions / Timings
Dot clock = 5.585664 MHz (=33.513982 MHz / 6)
H-Timing: 256 dots visible, 99 dots blanking, 355 dots total (15.7343 kHz)
V-Timing: 192 lines visible, 71 lines blanking, 263 lines total (59.8261 Hz)
The V-Blank cycle for the 3D Engine consists of the 23 lines, 191..213.
Screen size 62.5mm x 47.0mm (each) (256x192 pixels)
Vertical space between screens 22mm (equivalent to 90 pixels)
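The dot clock, line rate and refresh rate above all derive from the 33.513982 MHz bus clock. A sketch of that derivation; the names are illustrative.

```c
#define NDS_BUS_CLOCK_HZ 33513982.0
#define DOTS_PER_LINE    355.0   /* 256 visible + 99 blanking */
#define LINES_PER_FRAME  263.0   /* 192 visible + 71 blanking */

double dot_clock_hz(void) { return NDS_BUS_CLOCK_HZ / 6.0; }      /* 6 cycles per dot */
double hsync_hz(void)     { return dot_clock_hz() / DOTS_PER_LINE; }
double refresh_hz(void)   { return hsync_hz() / LINES_PER_FRAME; }

/* Bus cycles per scanline: 355 dots * 6 cycles per dot. */
unsigned cycles_per_line(void) { return 355u * 6u; }
```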

400006Ch - NDS9 - MASTER_BRIGHT - 16bit - Master Brightness Up/Down
  0-4   Factor used for 6bit R,G,B Intensities (0-16, values >16 same as 16)
          Brightness up:   New = Old + (63-Old) * Factor/16
          Brightness down: New = Old - Old * Factor/16
  5-13  Not used
  14-15 Mode (0=Disable, 1=Up, 2=Down, 3=Reserved)
  16-31 Not used
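The two brightness formulas can be applied per 6bit channel as follows. A sketch only: `master_bright` is a hypothetical helper, truncating integer division is an assumption (the exact hardware rounding isn't described above), and the reserved mode 3 is treated like "disabled".

```c
#include <stdint.h>

/* Apply MASTER_BRIGHT (reg) to one 6bit intensity c (0..63). */
unsigned master_bright(uint16_t reg, unsigned c)
{
    unsigned factor = reg & 0x1F;          /* bits 0-4 */
    unsigned mode   = (reg >> 14) & 3;     /* bits 14-15 */
    if (factor > 16)
        factor = 16;                       /* values >16 act as 16 */
    switch (mode) {
    case 1:  return c + (63 - c) * factor / 16;  /* brightness up */
    case 2:  return c - c * factor / 16;         /* brightness down */
    default: return c;                           /* disabled (or reserved) */
    }
}
```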
DISPSTAT/VCOUNT
The LY and LYC values are in range 0..262, so LY/LYC values have been expanded to 9bit values: LY = VCOUNT Bit 0..8, and LYC = DISPSTAT Bit 8..15,7 (Bit 8..15 = LYC bits 0..7, Bit 7 = LYC bit 8).
The VCOUNT register is writeable, allowing linked DS consoles to be synchronized.
For proper synchronization:
  write new LY values only in range of 202..212
  write only while old LY values are in range of 202..212
DISPSTAT/VCOUNT are supported by the NDS9 (Engine A ports, without separate Engine B ports), and by the NDS7 (allowing the NDS7 to synchronize with display timings).
As on the GBA, the VBlank flag isn't set in the last line (ie. only in lines 192..261, but not in line 262).
Although the drawing time is only 1536 cycles (256*6), the NDS9 H-Blank flag is "0" for a total of 1606 cycles (and, for whatever reason, a bit longer, 1613 cycles in total, on the NDS7).
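The split 9bit LYC field and the VCOUNT write rule can be handled with a few bit operations. A sketch with hypothetical helper names; it only packs and unpacks register values and does not access hardware.

```c
#include <stdint.h>
#include <stdbool.h>

/* 9bit current line from VCOUNT (bits 0-8). */
unsigned nds_ly(uint16_t vcount) { return vcount & 0x1FF; }

/* 9bit compare line: DISPSTAT bits 8-15 = LYC bits 0-7, bit 7 = LYC bit 8. */
unsigned nds_lyc(uint16_t dispstat)
{
    return ((dispstat >> 8) & 0xFF) | (((dispstat >> 7) & 1u) << 8);
}

/* Store a 9bit LYC into DISPSTAT, keeping bits 0-6 unchanged. */
uint16_t nds_set_lyc(uint16_t dispstat, unsigned lyc)
{
    dispstat &= (uint16_t)~0xFF80u;              /* clear bits 7..15 */
    return (uint16_t)(dispstat | ((lyc & 0xFFu) << 8) | (((lyc >> 8) & 1u) << 7));
}

/* Synchronization rule above: old and new LY both within 202..212. */
bool vcount_write_allowed(unsigned old_ly, unsigned new_ly)
{
    return old_ly >= 202 && old_ly <= 212 && new_ly >= 202 && new_ly <= 212;
}
```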

VRAM Waitstates
The display controller performs VRAM-reads once every 6 clock cycles, a 1 cycle waitstate is generated if the CPU simultaneously accesses VRAM. With capture enabled, additionally VRAM-writes take place once every 6 cycles, so the total VRAM-read/write access rate is then once every 3 cycles.

DS Window Glitches
The DS counts scanlines in range 0..262 (0..106h), of which only the lower 8 bits are compared with the WIN0V/WIN1V register settings. Consequently, Y1 coordinates 00h..06h are also triggered, by mistake, in scanlines 100h..106h. That means the window gets activated within the VBlank period, and will be active from scanline 0 on (that is no problem with Y1=0, but Y1=1..6 will appear as if Y1 were 0). Workarounds are to disable the window during VBlank, or to change Y1 during VBlank (to a value that does not occur during the VBlank period, ie. 7..191).
Also, there's a problem fitting the 256 pixel horizontal screen resolution into 8bit values: X1=00h is treated as 0 (left-most), X2=00h is treated as 100h (right-most). However, the window is not displayed if X1=X2=00h, so the window width can be at most 255 pixels.
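The Y1 glitch follows from comparing only the low 8 bits of the 9bit line counter. A sketch of that comparison; the helper names are hypothetical.

```c
#include <stdbool.h>

/* WINxV top-edge test: only the low 8 bits of the scanline are compared. */
bool window_y1_hit(unsigned scanline, unsigned y1)
{
    return (scanline & 0xFF) == y1;
}

/* Y1 values 0..6 are also reached at lines 256..262 (during VBlank);
   for Y1=0 this is harmless, 7..191 avoid it entirely. */
bool window_y1_is_glitchy(unsigned y1)
{
    return y1 <= 6;
}
```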

2D Engines
The NDS includes two 2D Engines, called A and B. Both engines are accessed by the ARM9 processor, each using different memory and register addresses:
  Region______Engine A______________Engine B___________
  I/O Ports   4000000h              4001000h
  Palette     5000000h (1K)         5000400h (1K)
  BG VRAM     6000000h (max 512K)   6200000h (max 128K)
  OBJ VRAM    6400000h (max 256K)   6600000h (max 128K)
  OAM         7000000h (1K)         7000400h (1K)
Engine A additionally supports 3D and large-screen 256-color Bitmaps, plus main-memory-display and vram-display modes, plus capture unit.

Viewing Angles
The LCD screens are best viewed at viewing angles of 90 degrees. Colors may appear distorted, and may even become invisible at other viewing angles.
When the console is handheld, both screens can be turned in the preferred direction. When the console is resting on a table, only the upper screen can be tilted; the lower screen is stuck in a horizontal position, which results in rather poor visibility (unless the user moves his/her head directly above it).

4000070h - NDS9 - TVOUTCNT - Unknown (W)
  Bit0-3  "COMMAND"  (?)
  Bit4-7  "COMMAND2" (?)
  Bit8-11 "COMMAND3" (?)
This register was mentioned in an early I/O map from Nintendo. As far as I know, it isn't used by any games/firmware/BIOS; not sure if it really exists in the release version, or if it was prototype stuff...?

DS-Lite Screens
The screens in the DS-Lite seem to allow a wider range of vertical viewing angles.
The bad news is that the colors of the DS-Lite are (no surprise) not backwards compatible with older NDS and GBA displays. The good news is that Nintendo has finally reached near-CRT quality (without blurred colors), so one could hope that they won't come up with further displays with different colors in the future.
Don't know if there's an official/recommended way to detect DS-Lite displays (?); possible methods would be whatever values in the Firmware header, the functionality of the Power Management device, or (not too LCD-related) the Wifi Chip ID.


 DS Video BG Modes / Control

4000000h - NDS9 - DISPCNT
  Bit   Engine Expl.
  0-2   A+B   BG Mode
  3     A     BG0 2D/3D Selection (instead CGB Mode) (0=2D, 1=3D)
  4     A+B   Tile OBJ Mapping        (0=2D; max 32KB, 1=1D; max 32KB..256KB)
  5     A+B   Bitmap OBJ 2D-Dimension (0=128x512 dots, 1=256x256 dots)
  6     A+B   Bitmap OBJ Mapping      (0=2D; max 128KB, 1=1D; max 128KB..256KB)
  7-15  A+B   Same as GBA
  16-17 A+B   Display Mode (Engine A: 0..3, Engine B: 0..1, GBA: Green Swap)
  18-19 A     VRAM block (0..3=VRAM A..D) (For Capture & above Display Mode=2)
  20-21 A+B   Tile OBJ 1D-Boundary   (see Bit4)
  22    A     Bitmap OBJ 1D-Boundary (see Bit5-6)
  23    A+B   OBJ Processing during H-Blank (was located in Bit5 on GBA)
  24-26 A     Character Base (in 64K steps) (merged with 16K step in BGxCNT)
  27-29 A     Screen Base (in 64K steps) (merged with 2K step in BGxCNT)
  30    A+B   BG Extended Palettes   (0=Disable, 1=Enable)
  31    A+B   OBJ Extended Palettes  (0=Disable, 1=Enable)
BG Mode
Engine A BG Mode (DISPCNT LSBs) (0-6, 7=Reserved)
  Mode  BG0      BG1      BG2      BG3
  0     Text/3D  Text     Text     Text
  1     Text/3D  Text     Text     Affine
  2     Text/3D  Text     Affine   Affine
  3     Text/3D  Text     Text     Extended
  4     Text/3D  Text     Affine   Extended
  5     Text/3D  Text     Extended Extended
  6     3D       -        Large    -
Of which, the "Extended" modes are sub-selected by BGxCNT bits:
  BGxCNT.Bit7 BGxCNT.Bit2 Extended Affine Mode Selection
  0           CharBaseLsb rot/scal with 16bit bgmap entries (Text+Affine mixup)
  1           0           rot/scal 256 color bitmap
  1           1           rot/scal direct color bitmap
Engine B: Same as above, except that: Mode 6 is reserved (no Large screen bitmap), and BG0 is always Text (no 3D support).
Affine = formerly Rot/Scal mode (with 8bit BG Map entries)
Large Screen Bitmap = rot/scal 256 color bitmap (using all 512K of 2D VRAM)
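The Extended sub-mode selection table maps onto a simple classifier. A sketch; the enum and `ext_bg_kind_of` are hypothetical names.

```c
#include <stdint.h>

enum ext_bg_kind { EXT_AFFINE_16BIT_MAP, EXT_BITMAP_256, EXT_BITMAP_DIRECT };

/* Bit7=0: tiled rot/scal with 16bit map entries (bit2 stays a char base
   bit). Bit7=1: bitmap, with bit2 choosing 256-color (0) or direct color (1). */
enum ext_bg_kind ext_bg_kind_of(uint16_t bgcnt)
{
    if (!(bgcnt & (1u << 7)))
        return EXT_AFFINE_16BIT_MAP;
    return (bgcnt & (1u << 2)) ? EXT_BITMAP_DIRECT : EXT_BITMAP_256;
}
```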

Display Mode (DISPCNT.16-17):
  0  Display off (screen becomes white)
  1  Graphics Display (normal BG and OBJ layers)
  2  Engine A only: VRAM Display (Bitmap from block selected in DISPCNT.18-19)
  3  Engine A only: Main Memory Display (Bitmap DMA transfer from Main RAM)
Modes 2-3 display a raw direct color bitmap (15bit RGB values, the upper bit in each halfword is unused), without any further BG, OBJ, or 3D layers; these modes completely bypass the 2D/3D engines as well as any 2D effects, however the Master Brightness effect can be applied to these modes. Mode 2 is particularly useful to display captured 2D/3D images (in that case it can indirectly use the 2D/3D engine).

BGxCNT
Character base extended from bit2-3 to bit2-5 (bit4-5 formerly unused)
  engine A screen base: BGxCNT.bits*2K + DISPCNT.bits*64K
  engine B screen base: BGxCNT.bits*2K + 0
  engine A char base: BGxCNT.bits*16K + DISPCNT.bits*64K
  engine B char base: BGxCNT.bits*16K + 0
  char base is used only in tile/map modes (not bitmap modes)
  screen base is used in tile/map modes,
  screen base is used in bitmap modes as BGxCNT.bits*16K, without DISPCNT.bits*64K
  screen base is however NOT used at all for Large screen bitmap mode
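The base address rules above, written out as offsets relative to the engine's BG VRAM start. A sketch under one assumption not stated here: the BGxCNT screen base field sits in bits 8-12, as on the GBA. Function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Char base: BGxCNT bits 2-5 in 16K steps, plus DISPCNT bits 24-26 in
   64K steps (engine A only). */
uint32_t bg_char_base(bool engine_a, uint32_t dispcnt, uint16_t bgcnt)
{
    uint32_t base = ((bgcnt >> 2) & 0xFu) * 0x4000u;
    if (engine_a)
        base += ((dispcnt >> 24) & 7u) * 0x10000u;
    return base;
}

/* Screen base (tile/map modes): BGxCNT bits 8-12 (assumed GBA layout) in
   2K steps, plus DISPCNT bits 27-29 in 64K steps (engine A only). */
uint32_t bg_screen_base(bool engine_a, uint32_t dispcnt, uint16_t bgcnt)
{
    uint32_t base = ((bgcnt >> 8) & 0x1Fu) * 0x800u;
    if (engine_a)
        base += ((dispcnt >> 27) & 7u) * 0x10000u;
    return base;
}

/* Bitmap modes (not Large bitmap): same field in 16K steps, no DISPCNT part. */
uint32_t bg_bitmap_base(uint16_t bgcnt)
{
    return ((bgcnt >> 8) & 0x1Fu) * 0x4000u;
}
```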
  bgcnt size  text     rotscal    bitmap   large bmp
  0           256x256  128x128    128x128  512x1024
  1           512x256  256x256    256x256  1024x512
  2           256x512  512x512    512x256  -
  3           512x512  1024x1024  512x512  -
Bitmaps that require more than 128K VRAM are supported on engine A only.
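Whether a bitmap fits engine B follows from its byte size: 1 byte per pixel for 256-color bitmaps, 2 bytes per pixel for direct color. A sketch with hypothetical helper names.

```c
#include <stdbool.h>

/* VRAM bytes used by a BG bitmap of the given size. */
unsigned long bg_bitmap_bytes(unsigned width, unsigned height, bool direct_color)
{
    return (unsigned long)width * height * (direct_color ? 2u : 1u);
}

/* Engine B has at most 128K of BG VRAM. */
bool bitmap_fits_engine_b(unsigned long bytes)
{
    return bytes <= 128ul * 1024ul;
}
```

For instance, a 256x256 direct color bitmap needs exactly 128K and still fits, while the 512x1024 large bitmap uses all 512K of engine A's BG VRAM.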

For BGxCNT.Bit7 and BGxCNT.Bit2 in Extended Affine modes, see above BG Mode description (extended affine doesn't include 16-color modes, so color depth bit can be used for mode selection. Also, bitmap modes do not use charbase, so charbase.0 can be used for mode selection as well).

For BG0, BG1 only: bit13 selects extended palette slot
  (BG0: 0=Slot0, 1=Slot2, BG1: 0=Slot1, 1=Slot3)
Direct Color Bitmap BG, and Direct Color Bitmap OBJ
BG/OBJ support 32K colors (15bit RGB value), so far the same as the GBA's BG.
However, the upper bit (Bit15) is used as Alpha flag. That is, Alpha=0=Transparent, Alpha=1=Normal (ie. on the NDS, Direct Color values 0..7FFFh are NOT displayed).
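Building a displayable direct color pixel therefore means setting bit 15. A sketch, assuming the GBA's 15bit layout (red in bits 0-4, green in bits 5-9, blue in bits 10-14); helper names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Direct color pixel with the bit15 alpha flag (0 = transparent). */
uint16_t nds_rgb15(unsigned r, unsigned g, unsigned b, bool opaque)
{
    return (uint16_t)((opaque ? 0x8000u : 0u) |
                      ((b & 31u) << 10) | ((g & 31u) << 5) | (r & 31u));
}

bool nds_pixel_visible(uint16_t px) { return (px & 0x8000u) != 0; }
```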

Unlike GBA bitmap modes, NDS bitmap modes are supporting the Area Overflow bit (BG2CNT and BG3CNT, Bit 13).


 DS Video OBJs

DS OBJ Priority
The GBA assigned OBJ priority with respect to the 7bit OAM entry number, regardless of the OBJ's 2bit BG-priority attribute (which allowed invalid priority orders to be specified). That problem has been fixed in DS mode by combining the above two values into a 9bit priority value.

OBJ Tile Mapping (DISPCNT.4,20-21):
  Bit4  Bit20-21  Dimension Boundary Total ;Notes
  0     x         2D        32       32K   ;Same as GBA 2D Mapping
  1     0         1D        32       32K   ;Same as GBA 1D Mapping
  1     1         1D        64       64K
  1     2         1D        128      128K
  1     3         1D        256      256K  ;Engine B: 128K max
TileVramAddress = TileNumber * BoundaryValue
Even if the boundary gets changed, OBJs are kept composed of 8x8 tiles.
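The tile mapping table and the TileVramAddress formula combine as follows. A sketch; `obj_tile_offset` is a hypothetical name returning a byte offset into OBJ VRAM.

```c
#include <stdint.h>

/* DISPCNT bit 4 selects 1D mapping; bits 20-21 then choose a 32/64/128/256
   byte boundary. In 2D mode the boundary is always 32 bytes. */
uint32_t obj_tile_offset(uint32_t dispcnt, unsigned tile_number)
{
    unsigned boundary = 32;
    if (dispcnt & (1u << 4))
        boundary = 32u << ((dispcnt >> 20) & 3u);
    return (uint32_t)tile_number * boundary;
}
```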

Bitmap OBJ Mapping (DISPCNT.6,5,22):
Bitmap OBJs are 15bit Direct Color data, plus 1bit Alpha flag (in bit15).
  Bit6 Bit5 Bit22 Dimension    Boundary   Total ;Notes
  0    0    x     2D/128 dots  8x8 dots   128K  ;Source Bitmap width 128 dots
  0    1    x     2D/256 dots  8x8 dots   128K  ;Source Bitmap width 256 dots
  1    0    0     1D           128 bytes  128K  ;Source Width = Target Width
  1    0    1     1D           256 bytes  256K  ;Engine A only
  1    1    x     Reserved
In 1D mapping mode, the Tile Number is simply multiplied by the boundary value.
  1D_BitmapVramAddress = TileNumber(0..3FFh) * BoundaryValue(128..256)
  2D_BitmapVramAddress = (TileNo AND MaskX)*10h + (TileNo AND NOT MaskX)*80h
In 2D mode, the Tile Number is split into X and Y indices, the X index is located in the LSBs (ie. MaskX=0Fh, or MaskX=1Fh, depending on DISPCNT.5).
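Both bitmap OBJ address formulas, as C. A sketch with hypothetical names; `width256` mirrors DISPCNT.5. In 2D mode each X step is one 8-pixel row of 16bit pixels (10h bytes), and the remaining tile number bits times 80h select the 8-line band.

```c
#include <stdint.h>

/* 1D: tile number (0..3FFh) times a 128 or 256 byte boundary. */
uint32_t bmp_obj_offset_1d(unsigned tile, unsigned boundary)
{
    return (uint32_t)(tile & 0x3FFu) * boundary;
}

/* 2D: MaskX=0Fh for a 128-dot source width, 1Fh for 256 dots. */
uint32_t bmp_obj_offset_2d(unsigned tile, int width256)
{
    unsigned mask_x = width256 ? 0x1Fu : 0x0Fu;
    return (uint32_t)(tile & mask_x) * 0x10u + (uint32_t)(tile & ~mask_x) * 0x80u;
}
```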

OBJ Attribute 0 and 2
Setting the OBJ Mode bits (Attr 0, Bit10-11) to a value of 3 was prohibited on the GBA; on the NDS, however, it selects the new Bitmap OBJ mode. In that mode, the Color depth bit (Attr 0, Bit13) should be set to zero, and the color bits (Attr 2, Bit 12-15) are used as an Alpha-OAM value (instead of as palette setting).
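Switching an OAM entry to bitmap mode only touches the bits named above. A sketch with hypothetical helper names operating on raw attribute halfwords.

```c
#include <stdint.h>

/* Attr0: OBJ mode (bits 10-11) = 3, color depth (bit 13) = 0. */
uint16_t obj_attr0_bitmap(uint16_t attr0)
{
    attr0 = (uint16_t)(attr0 & ~(1u << 13));
    return (uint16_t)(attr0 | (3u << 10));
}

/* Attr2: bits 12-15 hold the alpha value instead of a palette number. */
uint16_t obj_attr2_alpha(uint16_t attr2, unsigned alpha)
{
    return (uint16_t)((attr2 & 0x0FFFu) | ((alpha & 0xFu) << 12));
}
```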

OBJ Vertical Wrap
On the GBA, a large OBJ (with 64pix height, scaled into double-size region of 128pix height) located near the bottom of the screen has been wrapped to the top of the screen (and was NOT displayed at the bottom of the screen).
This problem has been "corrected" in the NDS (except in GBA mode), that is, on the NDS, the OBJ appears BOTH at the top and bottom of the screen. That isn't necessarily better - the advantage is that one can manually enable/disable the OBJ in the desired screen-half on IRQ level; that'd be required only if the wrapped portion is non-transparent.


 DS Video Extended Palettes

Extended Palettes
When extended palettes are allocated, the allocated memory is not mapped to the CPU bus, so the CPU can access extended palettes only by temporarily de-allocating them.

Color 0 of all standard/extended palettes is transparent; color 0 of BG standard palette 0 is used as the backdrop. Extended palette memory must be allocated to VRAM.

BG Extended Palette enabled in DISPCNT Bit 30, when enabled,
  standard palette --> 16-color tiles  (with 16bit bgmap entries) (text)
                       256-color tiles (with 8bit bgmap entries)  (rot/scal)
                       256-color bitmaps
                       backdrop-color (color 0)
  extended palette --> 256-color tiles (with 16bit bgmap entries) (text,rot/scal)
Allocated VRAM is split into 4 slots of 8K each (32K used in total), normally BG0..3 are using Slot 0..3, however BG0 and BG1 can be optionally changed to BG0=Slot2, and BG1=Slot3 via BG0CNT and BG1CNT.
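Slot selection and color addressing within the 32K area can be sketched as below. The slot rule is from the text above; the layout inside a slot (16 palettes of 256 colors, which is what 8K of 16bit entries allows) is an assumption, and both function names are hypothetical.

```c
#include <stdint.h>

/* Slot (0..3) used by BG n; only BG0/BG1 honor BGxCNT bit 13. */
unsigned bg_ext_palette_slot(unsigned bg, uint16_t bgcnt)
{
    if (bg <= 1 && (bgcnt & (1u << 13)))
        return bg + 2;               /* BG0 -> slot 2, BG1 -> slot 3 */
    return bg;
}

/* Assumed layout: 8K per slot, 16 palettes x 256 colors x 2 bytes. */
uint32_t bg_ext_palette_offset(unsigned slot, unsigned palette, unsigned color)
{
    return slot * 0x2000u + (palette & 15u) * 0x200u + (color & 255u) * 2u;
}
```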

OBJ Extended Palette enabled in DISPCNT Bit 31, when enabled,
  16 colors x 16 palettes  --> standard palette memory (=256 colors)
  256 colors x 16 palettes --> extended palette memory (=4096 colors)
Extended OBJ palette memory must be allocated to VRAM F, G, or I (which are 16K) of which only the first 8K are used for extended palettes (=1000h 16bit entries).
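A color's byte offset within the first 8K follows from 16 palettes of 256 16bit entries. A sketch; `obj_ext_palette_offset` is a hypothetical name.

```c
#include <stdint.h>

/* Byte offset of an OBJ extended palette color (1000h entries total). */
uint32_t obj_ext_palette_offset(unsigned palette, unsigned color)
{
    return ((palette & 15u) * 256u + (color & 255u)) * 2u;
}
```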


 DS Video Capture and Main Memory Display Mode

4000064h - NDS9 - DISPCAPCNT - 32bit - Display Capture Control Register (R/W)
Capture is supported for Display Engine A only.
  0-4   EVA               (0..16 = Blending Factor for Source A)
  5-7   Not used
  8-12  EVB               (0..16 = Blending Factor for Source B)
  13-15 Not used
  16-17 VRAM Write Block  (0..3 = VRAM A..D) (VRAM must be allocated to LCDC)
  18-19 VRAM Write Offset (0=00000h, 1=08000h, 2=10000h, 3=18000h)
  20-21 Capture Size      (0=128x128, 1=256x64, 2=256x128, 3=256x192 dots)
  22-23 Not used
  24    Source A          (0=Graphics Screen BG+3D+OBJ, 1=3D Screen)
  25    Source B          (0=VRAM, 1=Main Memory Display FIFO)
  26-27 VRAM Read Offset  (0=00000h, 1=08000h, 2=10000h, 3=18000h)
  28    Not used
  29-30 Capture Source    (0=Source A, 1=Source B, 2/3=Sources A+B blended)
  31    Capture Enable    (0=Disable/Ready, 1=Enable/Busy)
Notes:
VRAM Read Block (VRAM A..D) is selected in DISPCNT Bits 18-19.
VRAM Read Block can be (or must be ?) allocated to LCDC (MST=0).
VRAM Read Offset is ignored (zero) in VRAM Display Mode (DISPCNT.16-17).
VRAM Read/Write Offsets wrap to 00000h when exceeding 1FFFFh (max 128K).
Capture Sizes less than 256x192 capture the upper-left portion of the screen.
Blending factors EVA and EVB are used only if "Source A+B blended" selected.
After setting the Capture Enable bit, capture starts at next line 0, and the capture enable/busy bit is then automatically cleared (in line 192, regardless of the capture size).
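A helper that assembles DISPCAPCNT from raw field values keeps the bit positions in one place. A sketch; `dispcapcnt` is a hypothetical function, and writing the result to 4000064h is left to the caller.

```c
#include <stdint.h>

/* Pack the DISPCAPCNT fields listed above (raw field values). */
uint32_t dispcapcnt(unsigned eva, unsigned evb,
                    unsigned write_block, unsigned write_offset,
                    unsigned size, unsigned src_a, unsigned src_b,
                    unsigned read_offset, unsigned capture_source,
                    unsigned enable)
{
    return  (eva & 0x1Fu)
         | ((evb & 0x1Fu) << 8)
         | ((write_block & 3u) << 16)
         | ((write_offset & 3u) << 18)
         | ((size & 3u) << 20)
         | ((src_a & 1u) << 24)
         | ((src_b & 1u) << 25)
         | ((read_offset & 3u) << 26)
         | ((capture_source & 3u) << 29)
         | ((enable & 1u) << 31);
}
```

For example, capturing the full 256x192 3D screen (Source A = 3D) into VRAM B at offset 0 gives `dispcapcnt(0, 0, 1, 0, 3, 1, 0, 0, 0, 1)`, ie. 81310000h.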

Capture data is 15bit color depth (even when capturing 18bit 3D-images).
Capture A: Dest_Intensity = SrcA_Intensity ; Dest_Alpha = SrcA_Alpha.
Capture B: Dest_Intensity = SrcB_Intensity ; Dest_Alpha = SrcB_Alpha.
Capture A+B (blending):
  Dest_Intensity = (  (SrcA_Intensity * SrcA_Alpha * EVA)
                    + (SrcB_Intensity * SrcB_Alpha * EVB) ) / 16
  Dest_Alpha = (SrcA_Alpha AND (EVA>0)) OR (SrcB_Alpha AND (EVB>0))
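The A+B blending formulas for one 5bit channel and the resulting alpha bit. A sketch: truncating division and clamping results to 31 are assumptions, since the formula above does not say how sums above 31 are handled. Function names are hypothetical.

```c
/* Blend one 5bit channel. alpha_a/alpha_b are 0 or 1; eva/evb are 0..16. */
unsigned capture_blend_channel(unsigned a, unsigned alpha_a, unsigned eva,
                               unsigned b, unsigned alpha_b, unsigned evb)
{
    unsigned v = (a * alpha_a * eva + b * alpha_b * evb) / 16;
    return v > 31 ? 31 : v;          /* clamp: assumption */
}

/* Dest_Alpha = (SrcA_Alpha AND EVA>0) OR (SrcB_Alpha AND EVB>0). */
unsigned capture_blend_alpha(unsigned alpha_a, unsigned eva,
                             unsigned alpha_b, unsigned evb)
{
    return (alpha_a && eva > 0) || (alpha_b && evb > 0);
}
```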
Capture provides a couple of interesting effects.
For example, 3D Engine output can be captured via source A (to LCDC-allocated VRAM); in the next frame, either Graphics Engine A or B can display the captured 3D image from VRAM as BG2, BG3, or OBJ (from BG/OBJ-allocated VRAM); this method requires switching between LCDC- and BG/OBJ-allocation.
Another example would be to capture Engine A output; the captured image can be displayed (via VRAM Display mode) in the following frames, while the new Engine A output is simultaneously captured, blended with the old captured image; in that mode, moving objects will leave traces on the screen; this method works with a single LCDC-allocated VRAM block.
DS Video Display System Block Diagram

4000068h - NDS9 - DISP_MMEM_FIFO - 32bit - Main Memory Display FIFO (R?/W)
Intended to send 256x192 pixel 32K color bitmaps by DMA directly
  - to Screen A             (set DISPCNT to Main Memory Display mode), or
  - to Display Capture unit (set DISPCAPCNT to Main Memory Source).
The FIFO can receive 4 words (8 pixels) at a time, each pixel is a 15bit RGB value (the upper bit, bit15, is unused).
Set DMA to Main Memory mode, 32bit transfer width, word count set to 4, destination address to DISP_MMEM_FIFO, source address must be in Main Memory.
Transfer starts at next frame.
Main Memory Display/Capture is supported for Display Engine A only.
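The amount of DMA work per frame follows from the frame size and the 4-word burst size. A sketch with hypothetical names; the pixel order inside a word (left pixel in the low halfword) is an assumption.

```c
#include <stdint.h>

enum {
    MMEM_WIDTH           = 256,
    MMEM_HEIGHT          = 192,
    MMEM_PIXELS_PER_WORD = 2,
    MMEM_WORDS_PER_BURST = 4
};

/* 256*192 pixels / 2 pixels per word = 24576 words per frame. */
unsigned mmem_words_per_frame(void)
{
    return MMEM_WIDTH * MMEM_HEIGHT / MMEM_PIXELS_PER_WORD;
}

/* 24576 words / 4 words per DMA request = 6144 requests per frame. */
unsigned mmem_bursts_per_frame(void)
{
    return mmem_words_per_frame() / MMEM_WORDS_PER_BURST;
}

/* Pack two pixels into one FIFO word (order is an assumption);
   bit 15 of each pixel is unused, so it is cleared here. */
uint32_t mmem_pack(uint16_t left, uint16_t right)
{
    return (uint32_t)(left & 0x7FFFu) | ((uint32_t)(right & 0x7FFFu) << 16);
}
```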


 DS Video Display System Block Diagram
             _____________               __________
  VRAM A -->| 2D Graphics |--------OBJ->|          |
  VRAM B -->| Engine A    |--------BG3->| Layering |
  VRAM C -->|             |--------BG2->| and      |
  VRAM D -->|             |--------BG1->| Special  |
  VRAM E -->|             |   ___       | Effects  |               ______
  VRAM F -->|             |->|SEL|      |          |              |      |
  VRAM G -->| - - - - - - |  |BG0|-BG0->|          |----+-------->|Select|
            | 3D Graphics |->|___|      |__________|    |         |Video |
            | Engine      |                             |         |Input |
            |_____________|--------3D----------------+  |         |      |
             _______      _______              ___   |  |         |and   |-->
            |       |    |       |<-----------|SEL|<-+  |         |      |
            |       |    |       |    _____   |A  |     |         |Master|
  VRAM A <--|Select |    |Select |   |     |<-|___|<----+         |Bright|
  VRAM B <--|Capture|<---|Capture|<--|Blend|   ___                |A     |
  VRAM C <--|Dest.  |    |Source |   |_____|<-|SEL|<----+         |      |
  VRAM D <--|       |    |       |            |B  |     |         |      |
            |_______|    |_______|<-----------|___|<-+  |         |      |
             _______                                 |  |         |      |
  VRAM A -->|Select |                                |  |         |      |
  VRAM B -->|Display|-----------------------------------+-------->|      |
  VRAM C -->|VRAM   |                                |            |      |
  VRAM D -->|_______|    _____________               |            |      |
                        |Main Memory  |              |            |      |
  Main ------DMA------->|Display FIFO |--------------+----------->|______|
  Memory                |_____________|
             _____________               __________                ______
  VRAM C -->| 2D Graphics |--------OBJ->| Layering |              |      |
  VRAM D -->| Engine B    |--------BG3->| and      |              |Master|
  VRAM H -->|             |--------BG2->| Special  |------------->|Bright|-->
  VRAM I -->|             |--------BG1->| Effects  |              |B     |
            |_____________|--------BG0->|__________|              |______|

 DS 3D Video

DS 3D Overview
DS 3D I/O Map
DS 3D Display Control
DS 3D Geometry Commands
DS 3D Matrix Load/Multiply
DS 3D Matrix Types
DS 3D Matrix Stack
DS 3D Matrix Examples (Projection)
DS 3D Matrix Examples (Rotate/Scale/Translate)
DS 3D Matrix Examples (Maths Basics)
DS 3D Polygon Attributes
DS 3D Polygon Definitions by Vertices
DS 3D Polygon Light Parameters
DS 3D Shadow Polygons
DS 3D Texture Attributes
DS 3D Texture Formats
DS 3D Texture Coordinates
DS 3D Texture Blending
DS 3D Toon, Edge, Fog, Anti-Aliasing
DS 3D Status
DS 3D Tests
DS 3D Rear-Plane
DS 3D Final 2D Output

3D is more or less (about 92%) understood and described.


 DS 3D Overview

The NDS 3D hardware consists of a Geometry Engine, and a Rendering Engine.

Geometry Engine (Precalculate coordinates & assign polygon attributes)
Geometry commands can be sent via Ports 4000440h and up (or alternately, written directly to Port 4000400h).
The commands include matrix and vector multiplications, the purpose is to rotate/scale/translate coordinates (vertices), the resulting coordinates are stored in Vertex RAM.
Moreover, it allows attributes to be assigned to the polygons and vertices; these include vertex colors (or automatically calculated light colors), texture attributes, number of vertices per polygon (three or four), and a number of flags, and are stored in Polygon RAM. Polygon RAM also contains pointers to the corresponding vertices in Vertex RAM.

Swap Buffers (Pass data from the Geometry Engine to the Rendering Engine)
The hardware includes two sets of Vertex/Polygon RAM, one used by the Geometry Engine, one by the Rendering Engine. The SwapBuffers command simply exchanges these buffers (so the new Geometry Data is passed to the Rendering Engine) (and the old buffer is emptied, so the Geometry engine can write new data to it). Additionally, the two parameter bits from the <previous> SwapBuffers command are copied to the Geometry Engine.
Data that is NOT swapped: SwapBuffers obviously can't swap Texture memory (so software must make sure that Texture memory stays mapped throughout rendering). Moreover, the rendering control registers (ports 4000060h, and 4000330h..40003BFh) are not swapped (so those values must be kept intact during rendering, too).

Rendering Engine (Display Output)
The Rendering Engine draws the various Polygons, and outputs them as BG0 layer to the 2D Video controller (which may then output them to the screen, or to the video capture unit). The Rendering part is done automatically by hardware, so the software has little influence on it.
Rendering is done scanline-by-scanline, so there's only a limited number of clock cycles per scanline, which is limiting the maximum number of polygons per scanline. However, due to the 48-line cache (see below), some scanlines are allowed to exceed that maximum.
Rendering starts 48 lines in advance (while still in the Vblank period) (and does then continue throughout the whole display period), the rendered data is written to a small cache that can hold up to 48 scanlines.

Scanline Cache vs Framebuffer
Note: There's only the 48-line cache (not a full 192-line framebuffer to store the whole rendered image). That is perfectly reasonable since animated data is normally drawn only once (so there would be no need to store it). That is, assuming that the Geometry Engine presents new data every frame (otherwise, if the Geometry software is too slow, or if the image isn't animated, the hardware automatically renders the same image again and again).


 DS 3D I/O Map

3D I/O Map
  Address  Siz Name            Expl.
  Rendering Engine (per Frame settings)
  4000060h 2   DISP3DCNT       3D Display Control Register (R/W)
  4000320h 1   RDLINES_COUNT   Rendered Line Count Register (R)
  4000330h 10h EDGE_COLOR      Edge Colors 0..7 (W)
  4000340h 1   ALPHA_TEST_REF  Alpha-Test Comparison Value (W)
  4000350h 4   CLEAR_COLOR     Clear Color Attribute Register (W)
  4000354h 2   CLEAR_DEPTH     Clear Depth Register (W)
  4000356h 2   CLRIMAGE_OFFSET Rear-plane Bitmap Scroll Offsets (W)
  4000358h 4   FOG_COLOR       Fog Color (W)
  400035Ch 2   FOG_OFFSET      Fog Offset (W)
  4000360h 20h FOG_TABLE       Fog Density Table, 32 entries (W)
  4000380h 40h TOON_TABLE      Toon Table, 32 colors (W)
  Geometry Engine (per Polygon/Vertex settings)
  4000400h 40h GXFIFO          Geometry Command FIFO (W)
  4000440h ... ...             Geometry Command Ports (see below)
  4000600h 4   GXSTAT          Geometry Engine Status Register (R and R/W)
  4000604h 4   RAM_COUNT       Polygon List & Vertex RAM Count Register (R)
  4000610h 2   DISP_1DOT_DEPTH 1-Dot Polygon Display Boundary Depth (W)
  4000620h 10h POS_RESULT      Position Test Results (R)
  4000630h 6   VEC_RESULT      Vector Test Results (R)
  4000640h 40h CLIPMTX_RESULT  Read Current Clip Coordinates Matrix (R)
  4000680h 24h VECMTX_RESULT   Read Current Directional Vector Matrix (R)
Geometry Commands (can be invoked by Port Address, or by Command ID)
Table shows Port Address, Command ID, Number of Parameters, and Clock Cycles.
  Address  Cmd Pa Cy   Name - Explanation
  N/A      00h -  -    NOP - No Operation (for padding packed GXFIFO commands)
  4000440h 10h 1  1    MTX_MODE - Set Matrix Mode (W)
  4000444h 11h -  17   MTX_PUSH - Push Current Matrix on Stack (W)
  4000448h 12h 1  36   MTX_POP - Pop Current Matrix from Stack (W)
  400044Ch 13h 1  17   MTX_STORE - Store Current Matrix on Stack (W)
  4000450h 14h 1  36   MTX_RESTORE - Restore Current Matrix from Stack (W)
  4000454h 15h -  19   MTX_IDENTITY - Load Unit Matrix to Current Matrix (W)
  4000458h 16h 16 34   MTX_LOAD_4x4 - Load 4x4 Matrix to Current Matrix (W)
  400045Ch 17h 12 30   MTX_LOAD_4x3 - Load 4x3 Matrix to Current Matrix (W)
  4000460h 18h 16 35*  MTX_MULT_4x4 - Multiply Current Matrix by 4x4 Matrix (W)
  4000464h 19h 12 31*  MTX_MULT_4x3 - Multiply Current Matrix by 4x3 Matrix (W)
  4000468h 1Ah 9  28*  MTX_MULT_3x3 - Multiply Current Matrix by 3x3 Matrix (W)
  400046Ch 1Bh 3  22   MTX_SCALE - Multiply Current Matrix by Scale Matrix (W)
  4000470h 1Ch 3  22*  MTX_TRANS - Mult. Curr. Matrix by Translation Matrix (W)
  4000480h 20h 1  1    COLOR - Directly Set Vertex Color (W)
  4000484h 21h 1  9*   NORMAL - Set Normal Vector (W)
  4000488h 22h 1  1    TEXCOORD - Set Texture Coordinates (W)
  400048Ch 23h 2  9    VTX_16 - Set Vertex XYZ Coordinates (W)
  4000490h 24h 1  8    VTX_10 - Set Vertex XYZ Coordinates (W)
  4000494h 25h 1  8    VTX_XY - Set Vertex XY Coordinates (W)
  4000498h 26h 1  8    VTX_XZ - Set Vertex XZ Coordinates (W)
  400049Ch 27h 1  8    VTX_YZ - Set Vertex YZ Coordinates (W)
  40004A0h 28h 1  8    VTX_DIFF - Set Relative Vertex Coordinates (W)
  40004A4h 29h 1  1    POLYGON_ATTR - Set Polygon Attributes (W)
  40004A8h 2Ah 1  1    TEXIMAGE_PARAM - Set Texture Parameters (W)
  40004ACh 2Bh 1  1    PLTT_BASE - Set Texture Palette Base Address (W)
  40004C0h 30h 1  4    DIF_AMB - MaterialColor0 - Diffuse/Ambient Reflect. (W)
  40004C4h 31h 1  4    SPE_EMI - MaterialColor1 - Specular Ref. & Emission (W)
  40004C8h 32h 1  6    LIGHT_VECTOR - Set Light's Directional Vector (W)
  40004CCh 33h 1  1    LIGHT_COLOR - Set Light Color (W)
  40004D0h 34h 32 32   SHININESS - Specular Reflection Shininess Table (W)
  4000500h 40h 1  1    BEGIN_VTXS - Start of Vertex List (W)
  4000504h 41h -  1    END_VTXS - End of Vertex List (W)
  4000540h 50h 1  392  SWAP_BUFFERS - Swap Rendering Engine Buffer (W)
  4000580h 60h 1  1    VIEWPORT - Set Viewport (W)
  40005C0h 70h 3  103  BOX_TEST - Test if Cuboid Sits inside View Volume (W)
  40005C4h 71h 2  9    POS_TEST - Set Position Coordinates for Test (W)
  40005C8h 72h 1  5    VEC_TEST - Set Directional Vector for Test (W)
All cycle timings are counted in 33.51MHz units. NORMAL commands take 9..12 cycles, depending on the number of enabled lights in PolyAttr (Huh, 9..12 (four timings) cycles for 0..4 (five settings) lights?). The total execution time of SwapBuffers is the duration until VBlank, plus 392 cycles.
In MTX_MODE=2 (Simultaneous Set), MTX_MULT/TRANS take an additional 30 cycles.
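In the table, every port address equals 4000400h + 4*CommandID, so either form can be derived from the other. A sketch; `gx_port_for_command` is a hypothetical helper.

```c
#include <stdint.h>

/* Port address for a geometry command ID (eg. 10h -> 4000440h).
   Command 00h (NOP) has no port, so 0 is returned for it. */
uint32_t gx_port_for_command(uint8_t cmd)
{
    return cmd ? 0x04000400u + 4u * cmd : 0u;
}
```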


 DS 3D Display Control

4000060h - DISP3DCNT - 3D Display Control Register (R/W)
  0     Texture Mapping     (0=Disable, 1=Enable)
  1     PolygonAttr Shading (0=Toon Shading, 1=Highlight Shading)
  2     Alpha-Test          (0=Disable, 1=Enable) (see ALPHA_TEST_REF)
  3     Alpha-Blending      (0=Disable, 1=Enable) (see various Alpha values)
  4     Anti-Aliasing       (0=Disable, 1=Enable)
  5     Edge-Marking        (0=Disable, 1=Enable) (see EDGE_COLOR)
  6     Fog Mode            (0=Alpha and Color, 1=Only Alpha) (see FOG_COLOR)
  7     Fog Master Enable   (0=Disable, 1=Enable)
  8-11  Fog Shift           (0..10; SHR-Divider) (see FOG_OFFSET)
  12    Color Buffer RDLINES Underflow (0=None, 1=Underflow/Acknowledge)
  13    Polygon/Vertex RAM Overflow    (0=None, 1=Overflow/Acknowledge)
  14    Rear-Plane Mode     (0=Blank, 1=Bitmap)
  15-31 Not used
4000540h - Cmd 50h - SWAP_BUFFERS - Swap Rendering Engine Buffer (W)
SwapBuffers exchanges the two sets of Polygon/Vertex RAM buffers, that is, the newly defined polygons/vertices are passed to the rendering engine (and will be displayed in following frame(s)). The other buffer is emptied, and passed to the Geometry Engine (to be filled with new polygons/vertices by Geometry Commands).
  0     Translucent polygon Y-sorting (0=Auto-sort, 1=Manual-sort)
  1     Depth Buffering   (0=With Z-value, 1=With W-value)
          (mode 1 does not function properly with orthogonal projections)
  2-31  Not used
SwapBuffers isn't executed until next VBlank (Scanline 192) (the Geometry Engine is halted for that duration). SwapBuffers should not be issued within Begin/End. The two parameter bits of the SwapBuffers command are used for the following gxcommands (ie. not for the old gxcommands prior to SwapBuffers).
SwapBuffers locks up the 3D hardware if an incomplete polygon list has been defined (eg. a triangle with only 2 vertices). On lock-up, only 2D video keeps working, and any wait-loops for GXSTAT.27 will hang the program. Once a lock-up has occurred, there seems to be no way to recover by software, neither by sending the missing vertex (or vertices), nor even by pulsing POWCNT1.Bit2-3.

4000580h - Cmd 60h - VIEWPORT - Set Viewport (W)
  0-7   Screen/BG0 Coordinate X1 (0..255) (For Fullscreen: 0=Left-most)
  8-15  Screen/BG0 Coordinate Y1 (0..191) (For Fullscreen: 0=Bottom-most)
  16-23 Screen/BG0 Coordinate X2 (0..255) (For Fullscreen: 255=Right-most)
  24-31 Screen/BG0 Coordinate Y2 (0..191) (For Fullscreen: 191=Top-most)
Coordinate 0,0 is the lower-left (unlike for 2D where it'd be upper-left).
The 3D view-volume (size as defined by the Projection Matrix) is automatically scaled to fit into the Viewport area. Although polygon vertices are clipped to the view-volume, some vertices may still exceed the X2,Y1 (lower-right) boundary by one pixel, due to some sort of rounding errors. The Viewport settings don't affect the size or position of the 3D Rear-Plane. Viewport should not be issued within Begin/End.
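Packing the VIEWPORT parameter, plus a conversion for the bottom-left origin. A sketch; both function names are hypothetical.

```c
#include <stdint.h>

/* VIEWPORT parameter: X1, Y1, X2, Y2 in bytes 0..3.
   Y coordinates count from the BOTTOM of the screen (0 = bottom-most). */
uint32_t gx_viewport(unsigned x1, unsigned y1, unsigned x2, unsigned y2)
{
    return (x1 & 0xFFu) | ((y1 & 0xFFu) << 8) |
           ((x2 & 0xFFu) << 16) | ((y2 & 0xFFu) << 24);
}

/* Convert a 2D-style line number (0 = top) to a viewport Y coordinate. */
unsigned viewport_y_from_top(unsigned line_from_top)
{
    return 191u - line_from_top;
}
```

The fullscreen viewport is `gx_viewport(0, 0, 255, 191)`, ie. BFFF0000h.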

4000610h - DISP_1DOT_DEPTH - 1-Dot Polygon Display Boundary Depth (W)
1-Dot Polygons are very small, or very distant polygons, which would be rendered as a single pixel on screen. Polygons with a depth value greater (more distant) than DISP_1DOT_DEPTH can be automatically hidden; in order to reduce memory consumption, or to reduce dirt on the screen.
  0-14  W-Coordinate (Unsigned, 12bit integer, 3bit fractional part)
          (0000h=Closest, 7FFFh=Most Distant)
  15-31 Not used
The DISP_1DOT_DEPTH comparison can be enabled/disabled per polygon (via POLYGON_ATTR.Bit13), so "important" polygons can be displayed regardless of their size and distance.
Caution: Although DISP_1DOT_DEPTH is a Geometry Engine parameter, it is NOT routed through GXFIFO, ie. changes will take place immediately, and will affect all following polygons, including such that are still in GXFIFO. Workaround: ensure that GXFIFO is empty before changing this parameter.
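DISP_1DOT_DEPTH expects W in unsigned 12.3 fixed point. The sketch below converts a floating-point W value, assuming the boundary is available in that form. `disp_1dot_depth` is a hypothetical name. Per the caution above, the result would still have to be written while GXFIFO is empty:

```c
#include <stdint.h>

/* Hypothetical helper: convert a W value to the DISP_1DOT_DEPTH format
   (bit0-14: unsigned, 12bit integer + 3bit fraction), clamped to
   0000h (closest) .. 7FFFh (most distant). */
static uint32_t disp_1dot_depth(double w)
{
    double scaled = w * 8.0;              /* 3 fractional bits */
    if (scaled <= 0.0)
        return 0;
    if (scaled >= 32767.0)
        return 0x7FFF;
    return (uint32_t)scaled;              /* truncates the remaining fraction */
}
```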

4000340h - ALPHA_TEST_REF - Alpha-Test Comparison Value (W)
  0-4   Alpha-Test Comparison Value (0..31) (Hide pixels if Alpha<AlphaRef)
  5-31  Not used
Alpha Test can be enabled in DISP3DCNT.Bit2. When enabled, pixels with Alpha values less than ALPHA_TEST_REF are not rendered (ie. their alpha value is forced to zero). Alpha Test is performed on the final polygon pixels (ie. after texture blending).


 DS 3D Geometry Commands < ^

4000400h - GXFIFO - Geometry Command FIFO (W) (mirrored up to 400043Fh?)
Used to send packed commands, unpacked commands,
  0-7   First  Packed Command (or Unpacked Command)
  8-15  Second Packed Command (or 00h=None)
  16-23 Third Packed Command (or 00h=None)
  24-31 Fourth Packed Command (or 00h=None)
and parameters,
  0-31  Parameter data for the previously sent (packed) command(s)
to the Geometry engine.

FIFO / PIPE Number of Entries
The FIFO has 256 entries; additionally, there is a PIPE with four entries (giving a total of 260 entries). If the FIFO is empty and the PIPE isn't full, data is moved directly into the PIPE; otherwise it is moved into the FIFO. If the PIPE runs half empty (less than 3 entries), then 2 entries are moved from the FIFO to the PIPE. The state of the FIFO can be obtained in GXSTAT.Bit16-26. Note that the PIPE may still contain data even if the FIFO is empty. Check the busy flag in GXSTAT.Bit27 to see if the PIPE or FIFO contains data (or if a command is still executing).
Each PIPE/FIFO entry consists of 40bits of data (8bit command code, plus 32bit parameter value). Commands without parameters occupy 1 entry, and Commands with N parameters occupy N entries.
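The entry counting rule above is simple enough to model directly. This sketch uses hypothetical helper names. It totals the FIFO/PIPE entries for a list of commands, given each command's parameter count:

```c
#include <stddef.h>

/* Entries occupied by one command: one per parameter, or a single
   entry if the command has no parameters. */
static unsigned fifo_entries(unsigned num_params)
{
    return num_params == 0 ? 1u : num_params;
}

/* Total entries for n commands with the given parameter counts. */
static unsigned fifo_entries_total(const unsigned *params, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += fifo_entries(params[i]);
    return total;
}
```

For example, MTX_MODE (1 parameter), MTX_IDENTITY (none) and VTX_16 (2 parameters) occupy 1+1+2 = 4 entries.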

Sending Commands by Ports 4000440h..40005FFh
Geometry commands can be indirectly sent to the FIFO via ports 4000440h and up.
For a command with N parameters: issue N writes to the port.
For a command without parameters: issue one dummy-write to the port.
That mechanism puts the 8bit command + 32bit parameter into the FIFO/PIPE.
If the FIFO is full, a wait is generated until data is removed from the FIFO, ie. the STR opcode is stalled. During the wait, the bus cannot be used at all, not even by DMA, interrupts, or the NDS7 CPU.
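Below is a sketch of the port mechanism for three matrix commands. On hardware, each write would be a volatile 32bit store to the listed port address. Here `gx_port_write` is a hypothetical stand-in that only records the writes, so the example can run on a PC. The translation values are assumed to be in the 12bit fixed-point matrix format:

```c
#include <stdint.h>

#define GX_LOG_MAX 32

static uint32_t gx_log_addr[GX_LOG_MAX], gx_log_val[GX_LOG_MAX];
static unsigned gx_log_len;

/* Hypothetical stand-in: on hardware this would be
   *(volatile uint32_t *)addr = value;  here the write is only logged. */
static void gx_port_write(uint32_t addr, uint32_t value)
{
    if (gx_log_len < GX_LOG_MAX) {
        gx_log_addr[gx_log_len] = addr;
        gx_log_val[gx_log_len] = value;
        gx_log_len++;
    }
}

#define GX_MTX_MODE     0x04000440u  /* cmd 10h, 1 parameter   */
#define GX_MTX_IDENTITY 0x04000454u  /* cmd 15h, no parameters */
#define GX_MTX_TRANS    0x04000470u  /* cmd 1Ch, 3 parameters  */

/* Select the Position Matrix, reset it, and translate by (x,y,z). */
static void load_translation(int32_t x, int32_t y, int32_t z)
{
    gx_port_write(GX_MTX_MODE, 1);             /* 1 = Position Matrix      */
    gx_port_write(GX_MTX_IDENTITY, 0);         /* dummy write, value unused */
    gx_port_write(GX_MTX_TRANS, (uint32_t)x);  /* N parameters = N writes  */
    gx_port_write(GX_MTX_TRANS, (uint32_t)y);
    gx_port_write(GX_MTX_TRANS, (uint32_t)z);
}
```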

GXFIFO Access via DMA
Larger pre-calculated data blocks can be sent directly to the FIFO. This is usually done via DMA (use DMA in Geometry Command Mode, 32bit units, Dest=4000400h/fixed, Length=NumWords, Repeat=0). The timings are handled automatically, ie. the system should not freeze when the FIFO is full (but see the Overkill note below). The DMA starts when the FIFO becomes less than half full, and then writes 112 words to the GXFIFO register (or fewer, if the remaining DMA transfer length reaches zero).
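The DMA setup can be expressed as a control word. This sketch assumes the NDS9 DMAxCNT bit layout from the DS DMA chapter, which is not reproduced in this section:

- bit0-20: word count
- bit21-22: destination control (2=fixed)
- bit26: 32bit units
- bit27-29: start timing (7=Geometry Command FIFO)
- bit31: enable

The helper names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper: DMAxCNT value for a GXFIFO transfer
   (assumed NDS9 DMA control layout, see lead-in). */
static uint32_t gxfifo_dma_cnt(uint32_t num_words)
{
    return (num_words & 0x1FFFFFu)   /* length in 32bit words          */
         | (2u << 21)                /* destination fixed (4000400h)   */
         | (1u << 26)                /* 32bit units                    */
         | (7u << 27)                /* start: Geometry Command FIFO   */
         | (1u << 31);               /* enable; repeat bit25 left at 0 */
}

/* Hypothetical helper: number of 112-word bursts the DMA needs. */
static uint32_t gxfifo_dma_bursts(uint32_t num_words)
{
    return (num_words + 111u) / 112u;
}
```

`gxfifo_dma_bursts` only counts how many 112-word bursts a transfer takes. It does not model how long each burst waits for the FIFO.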

GXFIFO Access via STR,STRD,STM
If desired, STR,STRD,STM opcodes can be used to write to the FIFO.
Opcodes that write more than one 32bit value (ie. STRD and STM) can be used to send ONE UNPACKED command, plus any parameters which belong to that command. After that, there must be a 1 cycle delay before sending the next command (ie. one cannot send more than one command at once with a single opcode; each command must be issued by a new opcode). STRD and STM can be used because the GXFIFO register is mirrored to 4000400h..43Fh (16 words).
As with Ports 4000440h and up, the CPU gets stopped if (and as long as) the FIFO is full.

GXFIFO / Unpacked Commands
  - command1 (upper 24bit zero)
  - parameter(s) for command1 (if any)
  - command2 (upper 24bit zero)
  - parameter(s) for command2 (if any)
  - command3 (upper 24bit zero)
  - parameter(s) for command3 (if any)
GXFIFO / Packed Commands
  - command1,2,3,4 packed into one 32bit value (all bits used)
  - parameter(s) for command1 (if any)
  - parameter(s) for command2 (if any)
  - parameter(s) for command3 (if any)
  - parameter(s) for command4 (top-most packed command MUST have parameters)
  - command5,6 packed into one 32bit value (upper 16bit zero)
  - parameter(s) for command5 (if any)
  - parameter(s) for command6 (top-most packed command MUST have parameters)
  - command7,8,9 packed into one 32bit value (upper 8bit zero)
  - parameter(s) for command7 (if any)
  - parameter(s) for command8 (if any)
  - parameter(s) for command9 (top-most packed command MUST have parameters)
Packed commands are first decompressed and then stored in the command FIFO.
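Packing is plain byte placement, with the first command in bits 0-7. The sketch below uses a hypothetical `gx_pack` helper. It does not check the rule that the top-most packed command must have parameters; that is left to the caller:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: pack up to four 8bit command codes into one
   GXFIFO word, first command in bits 0-7, unused slots 00h.
   Returns 0 if more than four commands are given. */
static uint32_t gx_pack(const uint8_t *cmds, size_t n)
{
    uint32_t word = 0;
    if (n > 4)
        return 0;
    for (size_t i = 0; i < n; i++)
        word |= (uint32_t)cmds[i] << (8 * i);
    return word;
}
```

For example, MTX_MODE (10h), MTX_IDENTITY (15h) and VTX_16 (23h) pack to 00231510h. That word would be followed by one MTX_MODE parameter and the two VTX_16 parameters.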

GXFIFO DMA Overkill on Packed Commands Without Parameters
Normally, the 112 word limit ensures that the FIFO (256 entries) doesn't get full. However, this limit is much too high when sending a lot of "Packed Commands Without Parameters" (ie. PUSH, IDENTITY, or END). For example, sending 112 x Packed(00151515h) to GXFIFO would write 336 x Cmd(15h) to the FIFO. That fills the FIFO and pauses the DMA (and CPU) until enough FIFO commands have been processed to allow the DMA to finish the 112 word transfer; in the WORST case, this takes several seconds.
It is not clear how likely Overkills are in practice. Normally most commands DO have parameters, so usually even FEWER than 112 FIFO entries are occupied (since 8bit commands with 32bit parameters are merged into single 40bit FIFO entries).


 DS 3D Matrix Load/Multiply < ^

4000440h - Cmd 10h - MTX_MODE - Set Matrix Mode (W)
  0-1   Matrix Mode (0..3)
        0  Projection Matrix
        1  Position Matrix (aka Modelview Matrix)
        2  Position & Vector Simultaneous Set mode (used for Light+VEC_TEST)
        3  Texture Matrix (see DS 3D Texture Coordinates chapter)
  2-31  Not used
Selects the current Matrix, all following MTX commands (load, multiply, push, pop, etc.) are applied to that matrix. In Mode 2, all MTX commands are applied to both the Position and Vector matrices (except for MTX_SCALE which doesn't change the Vector Matrix, even in Mode 2).

4000454h - Cmd 15h - MTX_IDENTITY - Load Unit Matrix to Current Matrix (W)
Sets C=I. Parameters: None
The Identity Matrix (I), aka Unit Matrix, consists of all zeroes, with ones on the main diagonal. A matrix multiplied by the Unit Matrix is left unchanged.

4000458h - Cmd 16h - MTX_LOAD_4x4 - Load 4x4 Matrix to Current Matrix (W)
Sets C=M. Parameters: 16, m[0..15]

400045Ch - Cmd 17h - MTX_LOAD_4x3 - Load 4x3 Matrix to Current Matrix (W)
Sets C=M. Parameters: 12, m[0..11]

4000460h - Cmd 18h - MTX_MULT_4x4 - Multiply Current Matrix by 4x4 Matrix (W)
Sets C=M*C. Parameters: 16, m[0..15]

4000464h - Cmd 19h - MTX_MULT_4x3 - Multiply Current Matrix by 4x3 Matrix (W)
Sets C=M*C. Parameters: 12, m[0..11]

4000468h - Cmd 1Ah - MTX_MULT_3x3 - Multiply Current Matrix by 3x3 Matrix (W)
Sets C=M*C. Parameters: 9, m[0..8]

400046Ch - Cmd 1Bh - MTX_SCALE - Multiply Current Matrix by Scale Matrix (W)
Sets C=M*C. Parameters: 3, m[0..2] (MTX_SCALE doesn't change Vector Matrix)

4000470h - Cmd 1Ch - MTX_TRANS - Mult. Curr. Matrix by Translation Matrix (W)
Sets C=M*C. Parameters: 3, m[0..2] (x,y,z position)
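Matrix parameters, including the MTX_SCALE factors and MTX_TRANS positions, are signed 32bit values with 12 fractional bits. The sketch below converts from floating point, rounding to nearest. `to_fx12` is a hypothetical name, and no range check is done:

```c
#include <stdint.h>

/* Hypothetical helper: real number -> signed 32bit fixed point with
   12 fractional bits, rounded to nearest (no overflow check). */
static int32_t to_fx12(double v)
{
    return (int32_t)(v * 4096.0 + (v >= 0 ? 0.5 : -0.5));
}
```

For example, MTX_SCALE by 2.0 on all axes would take three parameters of `to_fx12(2.0)`, ie. 2000h each.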

4000640h..67Fh - CLIPMTX_RESULT - Read Current Clip Coordinates Matrix (R)
This 64-byte region (16 words) contains the m[0..15] values of the Current Clip Coordinates Matrix, arranged in 4x4 Matrix format. Make sure that the Geometry Engine is stopped (GXSTAT.27) before reading from these registers.
The Clip Matrix is internally re-calculated whenever the Position or Projection matrix is changed: ClipMatrix=PositionMatrix*ProjectionMatrix. This matrix is internally used to convert vertices to screen coordinates.
To read only the Position Matrix, or only the Projection Matrix: Use Load Identity on the OTHER matrix, so the ClipMatrix becomes equal to the DESIRED matrix (multiplied by the Identity Matrix, which has no effect on the result).

4000680h..6A3h - VECMTX_RESULT - Read Current Directional Vector Matrix (R)
This 36-byte region (9 words) contains the m[0..8] values of the Current Directional Vector Matrix, arranged in 3x3 Matrix format (the fourth row/column may contain any values).
Make sure that the Geometry Engine is stopped (GXSTAT.27) before reading from these registers.


 DS 3D Matrix Types < ^

Essentially, all matrices in the NDS are 4x4 Matrices, consisting of 16 values, m[0..15]. Each element is a signed fixed-point 32bit number, with a fractional part in the lower 12bits.
The other Matrix Types are used to reduce the number of parameters being transferred. For example, a 3x3 Matrix requires only nine parameters; the other seven elements are automatically set to 0 or 1.0 (where "1.0" means "1 SHL 12" in 12bit fixed-point notation).
   _      4x4 Matrix       _     _    Identity Matrix    _
  | m[0]  m[1]  m[2]  m[3]  |   | 1.0   0     0     0     |
  | m[4]  m[5]  m[6]  m[7]  |   | 0     1.0   0     0     |
  | m[8]  m[9]  m[10] m[11] |   | 0     0     1.0   0     |
  |_m[12] m[13] m[14] m[15]_|   |_0     0     0     1.0  _|
   _      4x3 Matrix       _     _  Translation Matrix   _
  | m[0]  m[1]  m[2]  0     |   | 1.0   0     0     0     |
  | m[3]  m[4]  m[5]  0     |   | 0     1.0   0     0     |
  | m[6]  m[7]  m[8]  0     |   | 0     0     1.0   0     |
  |_m[9]  m[10] m[11] 1.0  _|   |_m[0]  m[1]  m[2]  1.0  _|
   _      3x3 Matrix       _     _     Scale Matrix      _
  | m[0]  m[1]  m[2]  0     |   | m[0]  0     0     0     |
  | m[3]  m[4]  m[5]  0     |   | 0     m[1]  0     0     |
  | m[6]  m[7]  m[8]  0     |   | 0     0     m[2]  0     |
  |_0     0     0     1.0  _|   |_0     0     0     1.0  _|

 DS 3D Matrix Stack < ^

Matrix Stack
The NDS has three Matrix Stacks, and two Matrix Stack Pointers (the Coordinate Matrix stack pointer is shared with the Directional Matrix Stack).
  Matrix Stack        Valid Stack Area    Stack Pointer
  Projection Stack    0..0  (1 entry)     0..1  (1bit) (GXSTAT: 1bit)
  Coordinate Stack    0..30 (31 entries)  0..63 (6bit) (GXSTAT: 5bit only)
  Directional Stack   0..30 (31 entries)  (uses Coordinate Stack Pointer)
  Texture Stack       One..None?          0..1  (1bit) (GXSTAT: N/A)
The initial value of the Stack Pointers is zero. The current value of the pointers can be read from GXSTAT (read-only). That register also indicates stack overflows: the error flag gets set on read/write to invalid entries, ie. entries 1 or 1Fh..3Fh. For all stacks, the upper half (ie. 1 or 20h..3Fh) is a mirror of the lower half (ie. 0 or 0..1Fh).
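The mirroring and error rules can be modelled with a few bit operations. This sketch covers the Coordinate/Directional and Projection stacks, using hypothetical helper names:

```c
#include <stdbool.h>

/* Coordinate/Directional stacks: 6bit pointer. Slots 20h..3Fh mirror
   00h..1Fh, and accessing 1Fh..3Fh sets the overflow flag. */
static unsigned coord_stack_slot(unsigned sp)  { return sp & 0x1Fu; }
static bool     coord_stack_error(unsigned sp) { return (sp & 0x3Fu) >= 0x1Fu; }

/* Projection stack: 1bit pointer. Entry 1 mirrors entry 0 and is an error. */
static unsigned proj_stack_slot(unsigned sp)   { (void)sp; return 0; }
static bool     proj_stack_error(unsigned sp)  { return (sp & 1u) != 0; }
```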

4000444h - Cmd 11h - MTX_PUSH - Push Current Matrix on Stack (W)
Parameters: None. Sets [S]=C, and then S=S+1.

4000448h - Cmd 12h - MTX_POP - Pop Current Matrix from Stack (W)
Sets S=S-N, and then C=[S].
  Parameter Bit0-5:  Stack Offset (signed value, -30..+31) (usually +1)
  Parameter Bit6-31: Not used
Offset N=(+1) pops the most recently pushed value, larger offsets of N>1 will "deallocate" N values (and load the Nth value into C). Zero or negative values can be used to pop previously "deallocated" values.
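The POP offset is a 6bit two's complement field. Here is a sketch of the encoding, with a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper: encode the MTX_POP offset N (-30..+31) into
   parameter bit0-5 as 6bit two's complement; bit6-31 stay zero. */
static uint32_t mtx_pop_param(int n)
{
    return (uint32_t)n & 0x3Fu;
}
```

For example, N=+1 encodes as 01h, and N=-1 as 3Fh.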
In projection mode the stack has only one level (at address 0). In that mode the parameter value is ignored, and the offset is always +1.

400044Ch - Cmd 13h - MTX_STORE - Store Current Matrix on Stack (W)
Sets [N]=C. The stack pointer S is not used, and is left unchanged.
  Parameter Bit0-4:  Stack Address (0..30) (31 causes overflow in GXSTAT.15)
  Parameter Bit5-31: Not used
In projection mode the stack has only one level (at address 0), and the parameter value is ignored.

4000450h - Cmd 14h - MTX_RESTORE - Restore Current Matrix from Stack (W)
Sets C=[N]. The stack pointer S is not used, and is left unchanged.
  Parameter Bit0-4:  Stack Address (0..30) (31 causes overflow in GXSTAT.15)
  Parameter Bit5-31: Not used
In projection mode the stack has only one level (at address 0), and the parameter value is ignored.

In Projection mode, the parameter for POP, STORE, and RESTORE is unused. It is not known whether the parameter (ie. a dummy value) must still be written to the command FIFO.
There actually appear to be 32 entries in the Coordinate & Directional Stacks: entry 31 appears to exist, and appears to be read/write-able (although the stack overflow flag gets set when accessing it).


 DS 3D Matrix Examples (Projection) < ^

The most important matrix is the Projection Matrix, to be initialized with MTX_MODE=0 via the MTX_LOAD_4x4 command. It specifies the dimensions of the view volume.

With Perspective Projections more distant objects appear smaller; with Orthogonal Projections the size of the objects is always the same regardless of their distance.

  Perspective Projection     Orthogonal Projection
                  ___                  __________
       top ___----   |            top |          |
          |   view   |                |   view   |
  Eye ----|--------->|        Eye ----|--------->|
          |__volume  |                |  volume  |
     bottom   ----___|          bottom|__________|
        near        far             near        far
Correctly initializing the projection matrix (as shown in the examples below) can be quite difficult. Mind that fixed-point multiply/divide requires adjusting the fixed-point width before/after the calculation. For beginners, it may be recommended to start with a simple Identity Matrix (MTX_IDENTITY command) used as the Projection Matrix (ie. Ortho with t,b,l,r set to +/-1).

Orthogonal Projections (Ortho)
  | (2.0)/(r-l)   0             0             0    |
  | 0             (2.0)/(t-b)   0             0    |
  | 0             0             (2.0)/(n-f)   0    |
  | (l+r)/(l-r)   (b+t)/(b-t)   (n+f)/(n-f)   1.0  |
n,f specify the distance from the eye to the near and far clip planes. t,b,l,r are the coordinates of the near clip plane (top,bottom,left,right). For a symmetrical view (ie. the straight-ahead view line centered in the middle of the viewport), t,b,l,r should usually be t=+ysiz/2, b=-ysiz/2, r=+xsiz/2, l=-xsiz/2, and the (xsiz/ysiz) ratio should usually equal the viewport's (width/height) ratio. Examples of an asymmetrical view would be b=0 (frog's view), or t=0 (bird's view).
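One way to avoid fixed-point divide adjustments is to compute the matrix in floating point and convert each element afterwards. The sketch below does this for the Ortho matrix above. It fills m[0..15] in the row order of the 4x4 Matrix layout (as used by MTX_LOAD_4x4). The helper names are hypothetical:

```c
#include <stdint.h>

/* Real number -> 12bit fixed point, rounded to nearest. */
static int32_t fx12(double v)
{
    return (int32_t)(v * 4096.0 + (v >= 0 ? 0.5 : -0.5));
}

/* Hypothetical helper: Ortho projection matrix for MTX_LOAD_4x4. */
static void ortho_matrix(int32_t m[16], double l, double r, double b,
                         double t, double n, double f)
{
    const double d[16] = {
        2.0 / (r - l),     0,                 0,                 0,
        0,                 2.0 / (t - b),     0,                 0,
        0,                 0,                 2.0 / (n - f),     0,
        (l + r) / (l - r), (b + t) / (b - t), (n + f) / (n - f), 1.0
    };
    for (int i = 0; i < 16; i++)
        m[i] = fx12(d[i]);
}
```

With l=-1, r=+1, b=-1, t=+1, n=+1, f=-1, the result is exactly the Identity Matrix. That matches the beginner suggestion above.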

Left-Right Asymmetrical Perspective Projections (Frustum)
  | (2*n)/(r-l)   0             0             0     |
  | 0             (2*n)/(t-b)   0             0     |
  | (r+l)/(r-l)   (t+b)/(t-b)   (n+f)/(n-f)   -1.0  |
  | 0             0             (2*n*f)/(n-f) 0     |
n,f,t,b,l,r have the same meanings as above (Ortho). The difference is that more distant objects appear smaller with Perspective Projection (unlike Orthogonal Projection, where the size isn't affected by the distance).

Left-Right Symmetrical Perspective Projections (Perspective)
  | cos/(asp*sin) 0             0             0     |
  | 0             cos/sin       0             0     |
  | 0             0             (n+f)/(n-f)   -1.0  |
  | 0             0             (2*n*f)/(n-f) 0     |
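This is the same as above (Frustum), but with symmetrical t,b values, which are in this case obtained from a vertical view range specified in degrees (cos and sin are taken of half that angle). l,r are matched to the aspect ratio of the viewport (asp=width/height, so that the horizontal scale m[0] is smaller than the vertical scale on a wide viewport).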
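The same floating-point approach works for the symmetrical perspective matrix. This sketch takes cos and sin of half the vertical view angle as inputs. The caller could obtain them via `cos()`/`sin()` from `<math.h>`; they are passed in here to keep the example free of libm. The sketch assumes asp=width/height, so that m[0] is smaller than m[5] on a wide viewport. Helper names are hypothetical:

```c
#include <stdint.h>

/* Real number -> 12bit fixed point, rounded to nearest. */
static int32_t fx12(double v)
{
    return (int32_t)(v * 4096.0 + (v >= 0 ? 0.5 : -0.5));
}

/* Hypothetical helper: symmetrical perspective matrix for MTX_LOAD_4x4.
   c, s = cos and sin of half the vertical view angle,
   asp  = viewport width/height, n, f = near/far distances. */
static void perspective_matrix(int32_t m[16], double c, double s,
                               double asp, double n, double f)
{
    const double d[16] = {
        c / (asp * s), 0,     0,                      0,
        0,             c / s, 0,                      0,
        0,             0,     (n + f) / (n - f),     -1.0,
        0,             0,     (2 * n * f) / (n - f),  0
    };
    for (int i = 0; i < 16; i++)
        m[i] = fx12(d[i]);
}
```

For example, a 90 degree vertical view range (cos=sin) on the 256x192 screen gives m[0]=0.75 (C00h) and m[5]=1.0 (1000h).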

Moving the Camera
After initializing the Projection Matrix, you may multiply it with Rotate and/or Translation Matrices to change the camera's position and view direction.


 DS 3D Matrix Examples (Rotate/Scale/Translate) < ^

Identity Matrix
The MTX_IDENTITY command can be used to initialize the Position Matrix before doing any Translation/Scaling/Rotation, for example:
  Load(Identity)                           ;no rotation/scaling used
  Load(Identity), Mul(Rotate), Mul(Scale)  ;rotation/scaling (not so efficient)
  Load(Rotate), Mul(Scale)                 ;rotation/scaling (more efficient)
Rotation Matrices
Rotation can be performed with MTX_MULT_3x3 command, simple examples are:
  Around X-Axis          Around Y-Axis          Around Z-Axis
  | 1.0  0    0   |      | cos  0    sin |      | cos  sin  0   |
  | 0    cos  sin |      | 0    1.0  0   |      | -sin cos  0   |
  | 0    -sin cos |      | -sin 0    cos |      | 0    0    1.0 |
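The rotation matrices above map directly onto the nine MTX_MULT_3x3 parameters, row by row. This sketch assumes sin and cos are already available in 12bit fixed point (eg. from a lookup table, which is not shown). The names are hypothetical:

```c
#include <stdint.h>

enum rot_axis { ROT_X, ROT_Y, ROT_Z };

/* Hypothetical helper: fill m[0..8] (MTX_MULT_3x3 parameter order)
   with the rotation matrix for the given axis; s, c = sin/cos of the
   angle in 12bit fixed point. */
static void rotation_3x3(int32_t m[9], enum rot_axis axis, int32_t s, int32_t c)
{
    const int32_t one = 1 << 12;                   /* 1.0 */
    const int32_t rx[9] = { one, 0, 0,   0, c, s,    0, -s, c   };
    const int32_t ry[9] = { c, 0, s,     0, one, 0,  -s, 0, c   };
    const int32_t rz[9] = { c, s, 0,    -s, c, 0,    0, 0, one  };
    const int32_t *src = axis == ROT_X ? rx : axis == ROT_Y ? ry : rz;
    for (int i = 0; i < 9; i++)
        m[i] = src[i];
}
```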
Scale Matrix
The MTX_SCALE command allows adjusting the size of the polygon. The x,y,z parameters should normally all have the same value, x=y=z (unless you want to change only the height of the object, for example). Identical results can be obtained with MTX_MULT commands. However, when using lighting (MTX_MODE=2), scaling should be done ONLY with MTX_SCALE (which keeps the length of the light's directional vector intact).

Translation Matrix
The MTX_TRANS command allows moving polygons to the desired position. The polygon VTX commands cover only a small range of coordinates (near the zero-coordinate), so translation is required to move the polygons to other locations in world coordinates. Aside from that, translation is useful for moving objects (at variable coordinates), and for re-using an object at various locations (eg. you can create a forest by translating a tree to different coordinates).

Matrix Multiply Order
The Matrix must be set up BEFORE sending the Vertices (which are then automatically multiplied by the matrix). When multiplying several matrices with each other, mind that in matrix maths A*B is NOT the same as B*A. For example, if you combine Rotate and Translate Matrices, the object will be rotated either around its own zero-coordinate or around the world-space zero-coordinate, depending on the multiply order.


 DS 3D Matrix Examples (Maths Basics) < ^

Below is a crash-course on matrix maths. Most of it is carried out automatically by the hardware. So this chapter is relevant only if you are interested in details about what happens inside of the 3D engine.

Matrix-by-Matrix Multiplication
Matrix multiplication, C = A * B, is possible only if the number of columns in A is equal to the number of rows in B, so it works fine with the 4x4 matrices which are used in the NDS. For the multiplication, assume matrix C to consist of elements cyx, and respectively, matrices A and B to consist of elements ayx and byx. So C = A * B looks like:
  | c11 c12 c13 c14 |     | a11 a12 a13 a14 |     | b11 b12 b13 b14 |
  | c21 c22 c23 c24 |  =  | a21 a22 a23 a24 |  *  | b21 b22 b23 b24 |
  | c31 c32 c33 c34 |     | a31 a32 a33 a34 |     | b31 b32 b33 b34 |
  | c41 c42 c43 c44 |     | a41 a42 a43 a44 |     | b41 b42 b43 b44 |
Each element in C is calculated by multiplying the elements from one row in A by the elements from the corresponding column in B, and then taking the sum of the products, ie.
  cyx = ay1*b1x + ay2*b2x + ay3*b3x + ay4*b4x
In total, that requires 64 multiplications (four multiplications for each of the 16 cyx elements) and 48 additions (three per cyx element). The hardware carries out that operation at a relatively decent speed of 30..35 clock cycles, possibly by performing several multiplications simultaneously with separate multiply units.
Observe that for matrix multiplication, A*B is NOT the same as B*A.

Matrix-by-Vector & Vector-by-Matrix Multiplication
Vectors are Matrices with only one row, or only one column. Multiplication works as for normal matrices: the number of columns of the left operand must match the number of rows of the right operand. Ie. row-vectors can be multiplied by matrices, and matrices can be multiplied by column-vectors (but not vice-versa). Eg. C = A * B:
                                                  | b11 b12 b13 b14 |
  | c11 c12 c13 c14 |  =  | a11 a12 a13 a14 |  *  | b21 b22 b23 b24 |
                                                  | b31 b32 b33 b34 |
                                                  | b41 b42 b43 b44 |
The formula for calculating the separate elements is same as above,
  cyx = ay1*b1x + ay2*b2x + ay3*b3x + ay4*b4x
Of which, C and A have only one y-index, so one may replace "cyx and ayx" by "c1x and a1x", or completely leave out the y-index, ie. "cx and ax".

Matrix-by-Number Multiplication
Simply multiply all elements of the Matrix by the number, C = A * n:
  cyx = ayx*n
Of course, works also with vectors (matrices with only one row/column).

Matrix-to-Matrix Addition/Subtraction
Both matrices must have the same number of rows & columns, add/subtract all elements with corresponding elements in other matrix, C = A +/- B:
  cyx = ayx +/- byx
Of course, works also with vectors (two matrices with only one row/column).

Vectors
A vector, for example (x,y,z), consists of offsets along the x-, y-, and z-axis. The line from the origin to origin-plus-offset has two characteristics: a direction, and a length.
The length (aka magnitude) can be calculated as L=sqrt(x^2+y^2+z^2).

Vector-by-Vector Multiplication
This can be processed as RowVector*ColumnVector, so the result is a number (aka scalar, ie. a matrix with only 1x1 elements). Multiplying two (normalized) vectors results in "cos(angle)=vec1*vec2", ie. the cosine of the angle between the two vectors (eg. used for light vectors). Multiplying a vector with itself, and taking the square root of the result, gives its length, ie. "length=sqrt(vec^2)".
These operations should be done with 3-dimensional vectors (not 4-dimensional ones).

Normalized Vectors
Normalized Vectors (aka Unit Vectors) are vectors with length=1.0. To normalize a vector, divide its coordinates by its length, ie. x=x/L, y=y/L, z=z/L, the direction remains the same, but the length is now 1.0.
On the NDS, normalized vectors should have a length slightly less than 1.0 (eg. something like 0.99). Several NDS registers are limited to 1bit sign, 0bit integer, Nbit fraction part, so vectors that are parallel with the x,y,z axes (or that become parallel to them after rotation) cannot have a length of 1.0.

Fixed-Point Numbers
The NDS uses fixed-point numbers (rather than floating point numbers). Addition and Subtraction works as with normal integers, provided that the fractional part is the same for both numbers. If it is not the same: Shift-left the value with the smaller fractional part.
For multiplication, the fractional part of the result is the sum of the fractional parts (eg. 12bit fraction * 12bit fraction = 24bit fraction; shift-right the result by 12 to convert it to a 12bit fraction). The NDS matrix multiply unit maintains the full 24bit fraction when processing the formula
  cyx = ay1*b1x + ay2*b2x + ay3*b3x + ay4*b4x
ie. the three additions use full 24bit fractions (with carry-outs to upper bits), and the final result of the additions is then shifted-right by 12.
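The multiply rules above can be combined into a reference 4x4 multiply. This sketch keeps the 24bit-fraction products in 64bit sums and shifts right by 12 only at the end, as described. It is a software model, and may not match the hardware's rounding in every case. Names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper: C = A * B for 4x4 matrices stored as m[0..15]
   row by row (as in the 4x4 Matrix layout), 12 fractional bits each.
   Products carry 24bit fractions; the sum is shifted right by 12 once.
   out must not alias a or b. Right-shifting a negative int64_t is
   implementation-defined in C; mainstream compilers shift arithmetically. */
static void mtx_mul_4x4(int32_t out[16], const int32_t a[16], const int32_t b[16])
{
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            int64_t sum = 0;                         /* 24bit fraction */
            for (int k = 0; k < 4; k++)
                sum += (int64_t)a[y * 4 + k] * b[k * 4 + x];
            out[y * 4 + x] = (int32_t)(sum >> 12);   /* back to 12bit  */
        }
    }
}
```

The hardware computes C=M*C, so an MTX_MULT_4x4 would correspond to `mtx_mul_4x4(new_c, m, c)`.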
For division it's vice versa: the fractions of the operands are subtracted, 24bit fraction / 12bit fraction = 12bit fraction. When dividing two 12bit numbers, shift-left the first number by 12 before division to get a result with a 12bit fractional part.

Four-Dimensional Matrices
The NDS uses four-dimensional matrices and vectors, ie. matrices with 4x4 elements, and vectors with 4 elements. The first three elements are associated with the X,Y,Z-axes of the three-dimensional space. The fourth element is somewhat a "W-axis".
With 4-dimensional matrices, the Translate matrix can be used to move an object to another position. Ie. once you have set up a matrix (which may consist of pre-multiplied scaling, rotation, and translation matrices), that matrix can be used on vertices to perform the rotation, scaling, and translation all at once, by a single Vector*Matrix operation.
With 3-dimensional matrices, translation would require a separate addition, additionally to the multiply operation.


 DS 3D Polygon Attributes < ^

40004A4h - Cmd 29h - POLYGON_ATTR - Set Polygon Attributes (W)
  0-3   Light 0..3 Enable Flags (each bit: 0=Disable, 1=Enable)
  4-5   Polygon Mode (0=Modulation,1=Decal,2=Toon/Highlight Shading,3=Shadow)
  6     Polygon Back Surface   (0=Hide, 1=Render)  ;Line-segments are always
  7     Polygon Front Surface  (0=Hide, 1=Render)  ;rendered (no front/back)
  8-10  Not used
  11    Depth-value for Translucent Pixels    (0=Keep Old, 1=Set New Depth)
  12    Far-plane intersecting polygons       (0=Hide, 1=Render/clipped)
  13    1-Dot polygons behind DISP_1DOT_DEPTH (0=Hide, 1=Render)
  14    Depth Test, Draw Pixels with Depth    (0=Less, 1=Equal) (usually 0)
  15    Fog Enable                            (0=Disable, 1=Enable)
  16-20 Alpha      (0=Wire-Frame, 1..30=Translucent, 31=Solid)
  21-23 Not used
  24-29 Polygon ID (00h..3Fh, used for translucent, shadow, and edge-marking)
  30-31 Not used
Writes to POLYGON_ATTR have no effect until next BEGIN_VTXS command.
Changes to the Light bits have no effect until lighting is re-calculated by the Normal command. The interior of Wire-frame polygons is transparent (Alpha=0), and only the lines at the polygon edges are rendered, using a fixed Alpha value of 31.
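Below is a sketch that assembles a POLYGON_ATTR parameter from the most commonly used fields. The remaining flag bits 11-15 are left at zero. `polygon_attr` is a hypothetical name:

```c
#include <stdint.h>

/* Hypothetical helper: build a POLYGON_ATTR (cmd 29h) parameter. */
static uint32_t polygon_attr(unsigned lights,      /* bit0-3, one bit per light   */
                             unsigned mode,        /* 0..3                        */
                             int render_back,      /* bit6                        */
                             int render_front,     /* bit7                        */
                             unsigned alpha,       /* 0=wire-frame .. 31=solid    */
                             unsigned poly_id)     /* 00h..3Fh                    */
{
    return (lights & 0xFu)
         | ((mode & 3u) << 4)
         | ((render_back ? 1u : 0u) << 6)
         | ((render_front ? 1u : 0u) << 7)
         | ((alpha & 0x1Fu) << 16)
         | ((poly_id & 0x3Fu) << 24);
}
```

For example, light 0 enabled, front surface only, solid alpha and polygon ID 0 gives 001F0081h.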

4000480h - Cmd 20h - COLOR - Directly Set Vertex Color (W)
  Parameter 1, Bit 0-4    Red
  Parameter 1, Bit 5-9    Green
  Parameter 1, Bit 10-14  Blue
  Parameter 1, Bit 15-31  Not used
The 5bit RGB values are internally expanded to 6bit RGB as follows: X=X*2+(X+31)/32, ie. zero remains zero, all other values are X=X*2+1.
Aside from the Color command, the color can also be changed by the MaterialColor0 command (if MaterialColor0.Bit15 is set, it acts identically to the Color command), and by the Normal command (which calculates the color based on light/material parameters).
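Here is a short sketch of the 5bit-to-6bit expansion and the COLOR parameter layout, using hypothetical helper names:

```c
#include <stdint.h>

/* Expand a 5bit color component to 6bit: X*2+(X+31)/32,
   ie. 0 stays 0 and every other value becomes X*2+1. */
static unsigned expand5to6(unsigned x)
{
    x &= 0x1Fu;
    return x * 2 + (x + 31) / 32;
}

/* Hypothetical helper: pack a COLOR (cmd 20h) parameter. */
static uint32_t color_param(unsigned r, unsigned g, unsigned b)
{
    return (r & 0x1Fu) | ((g & 0x1Fu) << 5) | ((b & 0x1Fu) << 10);
}
```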

Depth Test (aka DepthFunc in OpenGL)
The Depth Test compares the depth of the pixels of the polygon with the depth of previously rendered polygons (or of the rear plane if none have been rendered yet). The new pixels are drawn if the new depth is Less (closer to the camera), or if it is Equal, as selected by POLYGON_ATTR.Bit14. The latter comparison mode draws pixels only on exact matches, so the results may be unpredictable due to rounding errors. One situation that does work reliably is when both polygons use the same vertices; eg. it can be used to put a graffiti texture (consisting of solid and transparent pixels) onto a wall.


 DS 3D Polygon Definitions by Vertices < ^

The DS supports polygons with 3 or 4 edges, ie. triangles and quadrilaterals.
The positions of the corners are defined by vertices, each consisting of (x,y,z) values.

For Line Segments, use Triangles with the same vertex specified twice. Line Segments are always rendered because they do not have front and back sides.
The Prohibited Quad shapes (ie. Quads with crossed sides, and Quads with angles greater than 180 degrees) may produce unintended results.
  Separate Tri.    Triangle Strips     Line Segment
   v0                 v2__v4____v6
   |\      v3        /|\  |\    /\      v0      v1
   | \     /\     v0( | \ | \  /  \       ------
   |__\   /__\       \|__\|__\/____\            v2
   v1 v2  v4 v5       v1 v3  v5    v7
  Separate Quads              Quadrilateral Strips          Prohibited Quads
     v0__v3                     v0__v2_____v4        v10__  v0__v3     v4
    /      \    v4____v7       /      \    |\_______/ /v11    \/       |\
   /        \   |       \     /        \   | |v6 v8| /        /\     v5| \
  /__________\  |________\   /__________\__|_|_____|/        /__\     /___\
 v1          v2 v5       v6 v1         v3 v5 v7    v9       v2  v1   v6   v7
The vertices are normally arranged anti-clockwise, with two exceptions: in triangle strips every second polygon uses clockwise-arranged vertices, and quad strips use an "up-down" arrangement (where "up" and "down" may be anywhere due to rotation). Other arrangements may result in quads with crossed lines, or may swap the front and back sides of the polygon (the above examples show the front sides).

4000500h - Cmd 40h - BEGIN_VTXS - Start of Vertex List (W)
  Parameter 1, Bit 0-1    Primitive Type (0..3, see below)
  Parameter 1, Bit 2-31   Not used
Indicates the Start of a Vertex List, and its Primitive Type:
  0  Separate Triangle(s)       ;3*N vertices per N triangles
  1  Separate Quadrilateral(s)  ;4*N vertices per N quads
  2  Triangle Strips            ;3+(N-1) vertices per N triangles
  3  Quadrilateral Strips       ;4+(N-1)*2 vertices per N quads
The BEGIN_VTXS command should be followed by VTX_ commands that define the Vertices of the list, and should then be terminated by the END_VTXS command.
BEGIN_VTXS additionally applies changes to POLYGON_ATTR.
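The vertex counts listed above can be captured in a small helper (hypothetical name), which may be handy for sizing vertex buffers:

```c
/* Hypothetical helper: vertices needed for n primitives of the given
   BEGIN_VTXS primitive type (0..3). */
static unsigned vertices_for(unsigned type, unsigned n)
{
    if (n == 0)
        return 0;
    switch (type & 3u) {
    case 0:  return 3 * n;            /* separate triangles */
    case 1:  return 4 * n;            /* separate quads     */
    case 2:  return 3 + (n - 1);      /* triangle strip     */
    default: return 4 + (n - 1) * 2;  /* quad strip         */
    }
}
```

For example, the triangle strip figure (v0..v7) is 6 triangles from 8 vertices, and the quad strip figure (v0..v11) is 5 quads from 12 vertices.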

4000504h - Cmd 41h - END_VTXS - End of Vertex List (W)
Parameters: None. This is a Dummy command for OpenGL compatibility. It should be used to terminate a BEGIN_VTXS, VTX_<values> sequence. END_VTXS is possibly required for Nintendo's software emulator? On real NDS consoles (and in no$gba) it has no effect. It can be left out, or can be issued multiple times inside of a vertex list, without disturbing the display.

400048Ch - Cmd 23h - VTX_16 - Set Vertex XYZ Coordinates (W)
  Parameter 1, Bit 0-15   X-Coordinate (signed, with 12bit fractional part)
  Parameter 1, Bit 16-31  Y-Coordinate (signed, with 12bit fractional part)
  Parameter 2, Bit 0-15   Z-Coordinate (signed, with 12bit fractional part)
  Parameter 2, Bit 16-31  Not used

4000490h - Cmd 24h - VTX_10 - Set Vertex XYZ Coordinates (W)
  Parameter 1, Bit 0-9    X-Coordinate (signed, with 6bit fractional part)
  Parameter 1, Bit 10-19  Y-Coordinate (signed, with 6bit fractional part)
  Parameter 1, Bit 20-29  Z-Coordinate (signed, with 6bit fractional part)
  Parameter 1, Bit 30-31  Not used
Same as VTX_16, with only one parameter, with smaller fractional part.
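Below is a sketch of the VTX_16 and VTX_10 parameter packing. It assumes the input coordinates are 16bit values with 12 fractional bits (1bit sign, 3bit integer), so VTX_10 simply drops the lowest 6 fraction bits. Names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper: VTX_16 parameters (two words) from 4.12 values. */
static void vtx16_params(uint32_t p[2], int16_t x, int16_t y, int16_t z)
{
    p[0] = (uint16_t)x | ((uint32_t)(uint16_t)y << 16);
    p[1] = (uint16_t)z;                     /* bit16-31 unused (zero) */
}

/* Hypothetical helper: VTX_10 parameter (one word). Each 4.12 input is
   shifted right by 6, leaving a 10bit value with 6 fractional bits. */
static uint32_t vtx10_param(int16_t x, int16_t y, int16_t z)
{
    return ((uint32_t)(x >> 6) & 0x3FFu)
         | (((uint32_t)(y >> 6) & 0x3FFu) << 10)
         | (((uint32_t)(z >> 6) & 0x3FFu) << 20);
}
```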

4000494h - Cmd 25h - VTX_XY - Set Vertex XY Coordinates (W)
  Parameter 1, Bit 0-15   X-Coordinate (signed, with 12bit fractional part)
  Parameter 1, Bit 16-31  Y-Coordinate (signed, with 12bit fractional part)
The Z-Coordinate is kept unchanged, and re-uses the value from previous VTX.

4000498h - Cmd 26h - VTX_XZ - Set Vertex XZ Coordinates (W)
  Parameter 1, Bit 0-15   X-Coordinate (signed, with 12bit fractional part)
  Parameter 1, Bit 16-31  Z-Coordinate (signed, with 12bit fractional part)
The Y-Coordinate is kept unchanged, and re-uses the value from previous VTX.

400049Ch - Cmd 27h - VTX_YZ - Set Vertex YZ Coordinates (W)
  Parameter 1, Bit 0-15   Y-Coordinate (signed, with 12bit fractional part)
  Parameter 1, Bit 16-31  Z-Coordinate (signed, with 12bit fractional part)
The X-Coordinate is kept unchanged, and re-uses the value from previous VTX.

40004A0h - Cmd 28h - VTX_DIFF - Set Relative Vertex Coordinates (W)
  Parameter 1, Bit 0-9    X-Difference (signed, with 9/12bit fractional part)
  Parameter 1, Bit 10-19  Y-Difference (signed, with 9/12bit fractional part)
  Parameter 1, Bit 20-29  Z-Difference (signed, with 9/12bit fractional part)
  Parameter 1, Bit 30-31  Not used
Sets the XYZ-Coordinates relative to the XYZ-Coordinates from the previous VTX. In detail: the 9bit fractional values are divided by 8 (sign-expanded to 12bit fractions, in range +/-0.125), and that 12bit fraction is then added to the old vtx coordinates. The result of the addition should not overflow the 16bit vertex coordinate range (1bit sign, 3bit integer, 12bit fraction).
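The 9bit-fraction field is divided by 8, so one step of the raw 10bit field equals 1/4096, ie. one unit of the 12bit vertex fraction. Each difference can therefore be given directly in 12bit-fraction units. Below is a sketch (hypothetical name) that rejects differences outside -512..+511:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: pack a VTX_DIFF parameter. dx,dy,dz are
   differences in 1/4096 units; returns false if one is out of range. */
static bool vtx_diff_param(uint32_t *out, int dx, int dy, int dz)
{
    if (dx < -512 || dx > 511 || dy < -512 || dy > 511 ||
        dz < -512 || dz > 511)
        return false;                     /* exceeds about +/-0.125 */
    *out = ((uint32_t)dx & 0x3FFu)
         | (((uint32_t)dy & 0x3FFu) << 10)
         | (((uint32_t)dz & 0x3FFu) << 20);
    return true;
}
```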

Notes on VTX commands
On each VTX command, the viewport coordinates of the vertex are calculated and stored in Vertex RAM,
  ( xx, yy, zz, ww ) = ( x, y, z, 1.0 ) * ClipMatrix
The actual screen position (in pixels) is then,
  screen_x = (xx+ww)*viewport_width / (2*ww) + viewport_x1
  screen_y = (yy+ww)*viewport_height / (2*ww) + viewport_y1
Each VTX command that completes the definition of a polygon (ie. each 3rd for Separate Triangles) additionally stores data in Polygon List RAM.
VTX commands may be issued only between Begin and End commands.
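Below are the screen-position formulas above as a floating-point sketch. It assumes viewport_width = X2-X1+1 and viewport_height = Y2-Y1+1 (ie. 256 and 192 for fullscreen). It does not model the hardware's internal rounding. The resulting screen_y counts upwards from the bottom, as for the Viewport. The helper name is hypothetical:

```c
/* Hypothetical helper: clip coordinates (xx,yy,ww) -> screen position,
   using the viewport's lower-left corner (vx1,vy1) and size (vw,vh). */
static void clip_to_screen(double xx, double yy, double ww,
                           int vx1, int vy1, int vw, int vh,
                           double *sx, double *sy)
{
    *sx = (xx + ww) * vw / (2 * ww) + vx1;
    *sy = (yy + ww) * vh / (2 * ww) + vy1;
}
```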

Clipping
Polygons are clipped to the 6 sides of the view volume (ie. to the left, right, top, bottom, near, and far edges). If one or more vertices exceed one of these sides, then these vertices are replaced by two newly created vertices (which are located on the intersections of the polygon edges and the view volume edge).
Depending on the number of clipped vertices, this may increase or decrease the number of entries in Vertex RAM (ie. minus N clipped vertices, plus 2 new vertices). Also, clipped polygons which are part of polygon strips are converted to separate polygons (which increases the number of entries in Vertex RAM). Polygons that are fully outside of the View Volume aren't stored in Vertex RAM, nor in Polygon RAM. The only exceptions are polygons located exactly one pixel below, or right of, the lower/right edges, which appear to be accidentally stored in memory.


 DS 3D Polygon Light Parameters < ^

The lighting operation is performed by executing the Normal command, which sets the VertexColor based on the Light/Material parameters. To the rest of the hardware it doesn't matter whether the VertexColor was set by the Color command or by the Normal command. Light is calculated only for the Front side of the polygon (assuming that the Normal matches that side), so the Back side will (incorrectly) use the same color.

40004C8h - Cmd 32h - LIGHT_VECTOR - Set Light's Directional Vector (W)
Sets direction of the specified light (ie. the light selected in Bit30-31).
  0-9   Directional Vector's X component (1bit sign + 9bit fractional part)
  10-19 Directional Vector's Y component (1bit sign + 9bit fractional part)
  20-29 Directional Vector's Z component (1bit sign + 9bit fractional part)
  30-31 Light Number                     (0..3)
Upon executing this command, the incoming vector is multiplied by the current Directional Matrix, and the result is then applied as LightVector. This allows the light direction to be rotated. Normally, however, the light should stay unrotated; to keep it unrotated, be sure to use LoadIdentity (in MtxMode=2) before setting the LightVector.
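Below is a sketch of the LIGHT_VECTOR packing. It assumes the caller passes an already normalized vector as floating-point values. Each component is converted to 1bit sign + 9bit fraction and clamped to -512..+511. A component of +1.0 therefore becomes 511/512, in line with the note on normalized vectors. Names are hypothetical:

```c
#include <stdint.h>

/* Convert one normalized component to 1bit sign + 9bit fraction,
   rounding to nearest and clamping to -512..+511 (-1.0 .. +511/512). */
static int to_fx9(double v)
{
    double s = v * 512.0;
    int i = (int)(s + (s >= 0 ? 0.5 : -0.5));
    if (i > 511)  i = 511;
    if (i < -512) i = -512;
    return i;
}

/* Hypothetical helper: LIGHT_VECTOR (cmd 32h) parameter for light 0..3. */
static uint32_t light_vector_param(double x, double y, double z, unsigned light)
{
    return ((uint32_t)to_fx9(x) & 0x3FFu)
         | (((uint32_t)to_fx9(y) & 0x3FFu) << 10)
         | (((uint32_t)to_fx9(z) & 0x3FFu) << 20)
         | ((uint32_t)(light & 3u) << 30);
}
```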

40004CCh - Cmd 33h - LIGHT_COLOR - Set Light Color (W)
Sets the color of the specified light (ie. the light selected in Bit30-31).
  0-4   Red          (0..1Fh)      ;\light color; this will be combined with
  5-9   Green        (0..1Fh)      ; the diffuse, specular, and ambient colors
  10-14 Blue         (0..1Fh)      ;/upon execution of the Normal command
  15-29 Not used
  30-31 Light Number (0..3)

40004C0h - Cmd 30h - DIF_AMB - MaterialColor0 - Diffuse/Ambient Reflect. (W)
  0-4   Diffuse Reflection Red     ;\light(s) that directly hits the polygon,
  5-9   Diffuse Reflection Green   ; ie. max when NormalVector has opposite
  10-14 Diffuse Reflection Blue    ;/direction of LightVector
  15    Set Vertex Color (0=No, 1=Set Diffuse Reflection Color as Vertex Color)
  16-20 Ambient Reflection Red     ;\light(s) that indirectly hits the polygon,
  21-25 Ambient Reflection Green   ; ie. assuming that light is reflected by
  26-30 Ambient Reflection Blue    ;/walls/floor, regardless of LightVector
  31    Not used
With Bit15 set, the lower 15 bits are applied as VertexColor (exactly as when executing the Color command). The purpose is to provide a default color (eg. when the Normal command is commented out). Normally, when using lighting, this color setting gets overwritten as soon as the Normal command is executed.

40004C4h - Cmd 31h - SPE_EMI - MaterialColor1 - Specular Ref. & Emission (W)
  0-4   Specular Reflection Red    ;\light(s) reflected towards the camera,
  5-9   Specular Reflection Green  ; ie. max when NormalVector is in middle of
  10-14 Specular Reflection Blue   ;/LightVector and ViewDirection
  15    Specular Reflection Shininess Table (0=Disable, 1=Enable)
  16-20 Emission Red               ;\light emitted by the polygon itself,
  21-25 Emission Green             ; ie. regardless of light colors/vectors,
  26-30 Emission Blue              ;/and no matter if any lights are enabled
  31    Not used
Caution: Specular Reflection WON'T WORK when the ProjectionMatrix is rotated.

40004D0h - Cmd 34h - SHININESS - Specular Reflection Shininess Table (W)
Write 32 parameter words (each 32bit word containing four 8bit entries), entries 0..3 in the first word, through entries 124..127 in the last word:
  0-7   Shininess 0 (unsigned fixed-point, 0bit integer, 8bit fractional part)
  8-15  Shininess 1 ("")
  16-23 Shininess 2 ("")
  24-31 Shininess 3 ("")
If the table is disabled (by MaterialColor1.Bit15), then reflection will act as if the table would be filled with linear increasing numbers.
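The 128 table entries are sent as 32 parameter words, four entries per word with the lowest-numbered entry in bits 0-7. The sketch below packs a table that way. The linear ramp in `make_linear_shininess` is only illustrative content; it is not claimed to match the exact values the hardware uses when the table is disabled. Both function names are hypothetical.

```c
#include <stdint.h>

/* Pack 128 8bit shininess entries into the 32 parameter words of Cmd 34h:
   entry 4*i goes to bits 0-7 of word i, entry 4*i+3 to bits 24-31. */
static void pack_shininess(const uint8_t table[128], uint32_t words[32])
{
    for (int i = 0; i < 32; i++) {
        words[i] = (uint32_t)table[4 * i]
                 | ((uint32_t)table[4 * i + 1] << 8)
                 | ((uint32_t)table[4 * i + 2] << 16)
                 | ((uint32_t)table[4 * i + 3] << 24);
    }
}

/* Illustrative content: a linearly increasing table 0,2,4,...,254. */
static void make_linear_shininess(uint8_t table[128])
{
    for (int i = 0; i < 128; i++)
        table[i] = (uint8_t)(i * 2);
}
```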

4000484h - Cmd 21h - NORMAL - Set Normal Vector (W)
In short, this command calculates the VertexColor based on the various light parameters.
In detail, upon executing this command, the incoming vector is multiplied by the current Directional Matrix, the result is then applied as NormalVector (giving it the same rotation as used for the following polygon vertices).
  0-9   X-Component of Normal Vector (1bit sign + 9bit fractional part)
  10-19 Y-Component of Normal Vector (1bit sign + 9bit fractional part)
  20-29 Z-Component of Normal Vector (1bit sign + 9bit fractional part)
  30-31 Not used
Defines the polygon's Normal, and then updates the Vertex Color based on the View Direction, the NormalVector, the LightVector(s), and the Light/Material Colors. The execution time of the Normal command varies depending on the number of enabled lights.

Additional Light Registers
In addition to the above registers, the light(s) must be enabled in PolygonAttr (mind that changes to PolygonAttr aren't applied until the next Begin command). Also, the Directional Matrix must be set up correctly (in MtxMode=2) for the LightVector and NormalVector commands.

Normal Vector
The Normal vector must point "away from the polygon surface" (eg. for the floor, the Normal should point upwards). That direction is implied by the polygon vertices, however, the hardware cannot automatically calculate it, so it must be set manually with the Normal command (prior to the VTX-commands).
When using lighting, the Normal command must be re-executed after switching Lighting on/off, or after changing light/material parameters. And, of course, also before defining polygons with different orientation. Polygons with same orientation (eg. horizontal polygon surfaces) and same material color can use the same Normal. Changing the Normal per polygon gives differently colored polygons with flat surfaces, changing the Normal per vertex gives the illusion of curved surfaces.

Light Vector
Each light consists of parallel beams, similar to sunlight, which (due to the great distance) appears to us to consist of parallel beams, all emitted in the same direction: towards Earth.
In reality, light is emitted in ALL directions from the light source (eg. a candle); the hardware doesn't support that type of non-parallel light. However, the light vectors can be changed per polygon, so a polygon located north of the light source may use a different light direction than a polygon located east of it.
And, of course, Light 0..3 may (and should) have different directions.

Normalized Vectors
The Normal Vector and the Light Vectors should be normalized (ie. their length should be 1.0) (in practice: something like 0.99, since the registers have only fractional parts) (a length of 1.0 can cause overflows).
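A vector can be normalized and scaled to a length slightly below 1.0 before packing it for the NORMAL command, as sketched here. Assumptions: the same two's-complement 10bit encoding as for LIGHT_VECTOR, a caller-chosen target length such as 0.99, and a few Newton-Raphson steps for the square root to keep the sketch free of libm (on the DS itself the hardware square root unit would be a natural alternative). All names are hypothetical.

```c
#include <stdint.h>

/* Two's-complement 10bit "1bit sign + 9bit fraction" (1.0 = 512), clamped. */
static uint32_t to_v10(float f)
{
    float s = f * 512.0f;
    long v = (long)(s < 0.0f ? s - 0.5f : s + 0.5f);
    if (v > 511)  v = 511;
    if (v < -512) v = -512;
    return (uint32_t)v & 0x3FFu;
}

/* Newton-Raphson square root for x > 0, starting at or above the root. */
static float nr_sqrt(float x)
{
    float r = x > 1.0f ? x : 1.0f;
    for (int i = 0; i < 20; i++)
        r = 0.5f * (r + x / r);
    return r;
}

/* Scale (x,y,z) to length 'len' (eg. 0.99 instead of 1.0, to avoid the
   overflow mentioned above) and pack it for Cmd 21h (NORMAL). */
static uint32_t normal_param(float x, float y, float z, float len)
{
    float l2 = x * x + y * y + z * z;
    if (l2 <= 0.0f)
        return 0;                       /* zero vector: nothing to scale */
    float k = len / nr_sqrt(l2);
    return to_v10(x * k) | (to_v10(y * k) << 10) | (to_v10(z * k) << 20);
}
```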

Lighting Limitations
The light feature is limited to reflecting light towards the camera (light is not reflected onto other polygons, nor does it cast shadows on other polygons). However, independently of the lighting feature, the DS hardware does allow shadows to be created, see:
DS 3D Shadow Polygons

Internal Operation on Normal Command
  IF TexCoordTransformMode=2 THEN TexCoord=NormalVector*Matrix (see TexCoord)
  NormalVector=NormalVector*DirectionalMatrix
  VertexColor = EmissionColor
  FOR i=0 to 3
    IF PolygonAttrLight[i]=enabled THEN
      DiffuseLevel = max(0,-(LightVector[i]*NormalVector))
      ShininessLevel = max(0,(-HalfVector[i])*(NormalVector))^2
      IF TableEnabled THEN ShininessLevel = ShininessTable[ShininessLevel]
      ;note: below processed separately for the R,G,B color components...
      VertexColor = VertexColor + SpecularColor*LightColor[i]*ShininessLevel
      VertexColor = VertexColor + DiffuseColor*LightColor[i]*DiffuseLevel
      VertexColor = VertexColor + AmbientColor*LightColor[i]
    ENDIF
  NEXT i

Internal Operation on Light_Vector Command (for Light i)
  LightVector[i] = (LightVector*DirectionalMatrix)
  HalfVector[i] = (LightVector[i]+LineOfSightVector)/2

LineOfSightVector (how it SHOULD work)
Ideally, the LineOfSightVector should point from the camera to the vertic(es), however, the vertic(es) are still unknown at time of normal command, so it is just pointing from the camera to the screen, ie.
  LineOfSightVector = (0,0,-1.0)
Moreover, the LineOfSightVector should be multiplied by the Projection Matrix (so the vector would get rotated accordingly when the camera gets rotated), and, after multiplication by a scaled matrix, it'd be required to normalize the resulting vector.

LineOfSightVector (how it DOES actually work)
However, the NDS cannot normalize vectors in hardware, and therefore it completely leaves out the LineOfSightVector*ProjectionMatrix multiplication. So, the LineOfSightVector is always (0,0,-1.0), regardless of any camera rotation. That means,
  Specular Reflection WON'T WORK when the ProjectionMatrix is rotated (!)
So, if you want to rotate the "camera" (in MTX_MODE=0), then you must instead rotate the "world" in the opposite direction (in MTX_MODE=2).
That problem applies only to Specular Reflection, ie. only if Lighting is used, and only if the Specular Material Color is nonzero.
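The Normal-command pseudocode above, together with the fixed LineOfSightVector (0,0,-1.0), can be modelled in floating point as follows. This is a sketch, not the hardware's fixed-point arithmetic. Assumptions: 5bit color values are represented as 0.0..1.0, the per-component products are simply multiplied and summed, the result is clamped at 1.0, the shininess table lookup is left out (table disabled), and the normal and light vectors passed in have already been multiplied by the Directional Matrix. All type and function names are hypothetical.

```c
typedef struct { float x, y, z; } vec3;
typedef struct { float r, g, b; } rgb;   /* 0.0..1.0 stands for 0..31 */

typedef struct {
    rgb diffuse, ambient, specular, emission;
} material;

static float dot3(vec3 a, vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static float max0(float v) { return v > 0.0f ? v : 0.0f; }
static float sat1(float v) { return v > 1.0f ? 1.0f : v; }

/* Float model of the color calculation done by the NORMAL command.
   'enabled' holds the per-light enable bits (bit i = light i). */
static rgb light_vertex(vec3 n, const vec3 light_dir[4],
                        const rgb light_col[4], unsigned enabled,
                        const material *m)
{
    const vec3 los = {0.0f, 0.0f, -1.0f};   /* fixed LineOfSightVector */
    rgb c = m->emission;
    for (int i = 0; i < 4; i++) {
        if (!(enabled & (1u << i)))
            continue;
        vec3 l = light_dir[i];
        vec3 h = {(l.x + los.x) / 2, (l.y + los.y) / 2, (l.z + los.z) / 2};
        float diff = max0(-dot3(l, n));      /* DiffuseLevel */
        float s = max0(-dot3(h, n));
        float shin = s * s;                  /* ShininessLevel (no table) */
        rgb lc = light_col[i];
        c.r += m->specular.r * lc.r * shin + m->diffuse.r * lc.r * diff + m->ambient.r * lc.r;
        c.g += m->specular.g * lc.g * shin + m->diffuse.g * lc.g * diff + m->ambient.g * lc.g;
        c.b += m->specular.b * lc.b * shin + m->diffuse.b * lc.b * diff + m->ambient.b * lc.b;
    }
    c.r = sat1(c.r);
    c.g = sat1(c.g);
    c.b = sat1(c.b);
    return c;
}
```

Because `los` is a constant rather than a vector transformed by the Projection Matrix, rotating the camera has no effect on the half vector in this model, which mirrors the specular limitation described above.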

Maths Notes
Note on Vector*Vector multiplication: it is processed as RowVector*ColumnVector (ie. a dot product), so the result is a number (aka scalar, aka a matrix with only 1x1 elements). Multiplying two (normalized) vectors results in "cos(angle)=vec1*vec2", ie. the cosine of the angle between the two vectors.
The various Normal/Light/Half/Sight vectors are only 3-dimensional (x,y,z), ie. only the upper-left 3x3 matrix elements are used on multiplications with the 4x4 DirectionalMatrix.


 DS 3D Shadow Polygons < ^

The DS hardware's Light function reflects light towards the camera; it does not reflect light onto other polygons, and it does not cast any shadows. For shadows at fixed locations it'd be best to pre-calculate their shape and position, and to change the vertex color of the shaded polygons.
Additionally, the Shadow Polygon feature can be used to create animated shadows, ie. for moving objects and variable light sources.

Shadow Polygons and Shadow Volume
The software must define a Shadow Volume (ie. the region which doesn't contain light), the hardware does then automatically draw the shadow on all pixels whose x/y/z-coordinates are inside of that region.
The Shadow Volume must be defined by several Shadow Polygons which enclose the shaded region. The 'top' of the shadow volume should usually be translated to the position of the object that casts the shadow; if the light direction changes, then the shadow volume should also be rotated to match the light direction. The 'length' of the shadow volume should be (at least) long enough to reach from the object to the walls/floor where the shadow is to be drawn. The shadow volume must be passed TWICE to the hardware:

Step 1 - Shadow Volume for Mask
Set Polygon_Attr Mode=Shadow, PolygonID=00h, Back=Render, Front=Hide, Alpha=01h..1Eh, and pass the shadow volume (ie. the shadow polygons) to the geometry engine.
The Back=Render / Front=Hide setting causes the 'rear-side' of the shadow volume to be rendered, of course only as far as it is in front of other polygons. The Mode=Shadow / ID=00h setting causes the polygon NOT to be drawn to the Color Buffer - instead, flags are set in the Stencil Buffer (to be used in Step 2).

Step 2 - Shadow Volume for Rendering
Simply repeat step 1, but with Polygon_Attr Mode=Shadow, PolygonID=01h..3Fh, Back=Render(what/why?), Front=Render, Alpha=01h..1Eh.
The Front=Render setting causes the 'front-side' of the shadow volume to be rendered, again, only as far as it is in front of other polygons. The Mode=Shadow / ID>00h setting causes the polygon to be drawn to the Color Buffer as usual, but only if the Stencil Buffer bits are zero (ie. the portion from Step 1 is excluded) (additionally, Step 2 resets the stencil bits after checking them). Moreover, the shadow is rendered only if its Polygon ID differs from the ID in the Attribute Buffer.
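The two Polygon_Attr settings from Step 1 and Step 2 can be built as constants like this. The bit positions are an assumption taken from the Polygon Attributes description elsewhere in this document, not from this section: bit 4-5 polygon mode (3=Shadow), bit 6 render back, bit 7 render front, bit 16-20 alpha, bit 24-29 polygon ID. Macro and function names are hypothetical, and the caller is expected to pass an alpha in the 01h..1Eh range given above.

```c
#include <stdint.h>

/* ASSUMED POLYGON_ATTR layout (see Polygon Attributes chapter). */
#define PA_MODE_SHADOW   (3u << 4)
#define PA_RENDER_BACK   (1u << 6)
#define PA_RENDER_FRONT  (1u << 7)
#define PA_ALPHA(a)      (((uint32_t)(a) & 0x1Fu) << 16)
#define PA_ID(id)        (((uint32_t)(id) & 0x3Fu) << 24)

/* Step 1: mask volume; ID 00h, back faces only. */
static uint32_t shadow_mask_attr(unsigned alpha)
{
    return PA_MODE_SHADOW | PA_RENDER_BACK | PA_ALPHA(alpha) | PA_ID(0);
}

/* Step 2: rendering volume; ID 01h..3Fh, back and front faces. */
static uint32_t shadow_render_attr(unsigned alpha, unsigned id)
{
    return PA_MODE_SHADOW | PA_RENDER_BACK | PA_RENDER_FRONT
         | PA_ALPHA(alpha) | PA_ID(id);
}
```

The same vertex list would then be submitted once after `shadow_mask_attr()` and once after `shadow_render_attr()`, in that order.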

Shadow Alpha and Shadow Color
The Alpha=Translucent setting in Steps 1 and 2 ensures that the Shadow is drawn AFTER the normal (opaque) polygons have been rendered. In Step 2 it additionally specifies the 'intensity' of the shadow. For normal shadows, the Vertex Color should usually be black; however, the shadow volume may also be used as a 'spotlight volume' when using other colors.

Rendering Order
The Mask Volume must be rendered prior to the Rendering Volume, ie. Step 1 and 2 must be performed in that order, and, to keep that order intact, Auto-sorting must have been disabled in the previous Swap_Buffers command.
The shadow volume must be rendered after the 'target' polygons have been rendered, for opaque targets this is done automatically (due to the translucent alpha setting; translucent polygons are always rendered last, even with auto-sort disabled).

Translucent Targets
To cast shadows on Translucent Polygons, first draw the translucent target (with depth buffer update enabled, which is required for the shadow z-coordinates), then draw the Shadow Mask/Rendering volumes.
Due to the updated depth buffer, the shadow will be cast only on the translucent target (not on any other polygons underneath the translucent polygon). If you want the shadow to appear on both, draw the Shadow Mask/Rendering volumes TWICE (once before, and once after drawing the translucent target).

Polygon ID and Fog Enable
The "Render only if Polygon ID differs" feature (see Step 2) prevents the shadow from being cast on the object that casts it (ie. the object and its shadow should have the same IDs). The feature also allows selecting whether overlapping shadows (with same/different IDs) are shaded once or twice.
The old Fog Enable flag in the Attribute Buffer is ANDed with the Fog Enable flag of the Shadow Polygons, this allows to exclude Fog in shaded regions.

Shadow Volume Open/Closed Shapes
Normally, the shadow volume should have a closed shape, ie. it should have rear-sides (step 1) and corresponding front-sides (step 2) for all possible viewing angles. That is required for the shadow to be drawn correctly, and also for the Stencil Buffer to be reset to zero (in step 2, so that the stencil bits won't disturb other shadow volumes).
Due to that, drawing errors may occur if the shadow volume's front or rear side gets clipped by near/far clip plane.
One exception is that the volume doesn't need a bottom side (with a suitable volume length, the bottom may be left open, since it vanishes in the floor/walls anyway).


 DS 3D Texture Attributes < ^

4000488h - Cmd 22h - TEXCOORD - Set Texture Coordinates (W)
Specifies the texture source coordinates within the texture bitmap which are to be associated with the next vertex.
  Parameter 1, Bit 0-15   S-Coordinate (X-Coordinate in Texture Source)
  Parameter 1, Bit 16-31  T-Coordinate (Y-Coordinate in Texture Source)
Both values are 1bit sign + 11bit integer + 4bit fractional part.
A value of 1.0 (=1 SHL 4) equals one Texel.
With Position 0.0 , 0.0 drawing starts from upperleft of the Texture.
With positive offsets, drawing origin starts more "within" the texture.
With negative offsets, drawing starts "before" the texture.
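Converting texel positions to the 1bit sign + 11bit integer + 4bit fraction format amounts to a multiplication by 16, as sketched below. The sketch assumes two's-complement 16bit values and rounds to the nearest 1/16 texel; `to_t16` and `texcoord_param` are hypothetical names.

```c
#include <stdint.h>

/* Texel units -> 16bit 1.11.4 fixed point (1.0 = 16), clamped to int16. */
static uint16_t to_t16(float texels)
{
    float s = texels * 16.0f;
    long v = (long)(s < 0.0f ? s - 0.5f : s + 0.5f);
    if (v > 32767)  v = 32767;
    if (v < -32768) v = -32768;
    return (uint16_t)(v & 0xFFFF);
}

/* Parameter word for Cmd 22h (TEXCOORD): S in bits 0-15, T in 16-31. */
static uint32_t texcoord_param(float s, float t)
{
    return (uint32_t)to_t16(s) | ((uint32_t)to_t16(t) << 16);
}
```

For a 64x64 texture, the lower-right corner would be `texcoord_param(64.0f, 64.0f)`.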
"When texture mapping, the Geometry Engine works faster if you issue commands in the order TexCoord -> Normal -> Vertex."

40004A8h - Cmd 2Ah - TEXIMAGE_PARAM - Set Texture Parameters (W)
  0-15  Texture VRAM Offset div 8 (0..FFFFh -> 512K RAM in Slot 0,1,2,3)
        (VRAM must be allocated as Texture data, see Memory Control chapter)
  16    Repeat in S Direction (0=Clamp Texture, 1=Repeat Texture)
  17    Repeat in T Direction (0=Clamp Texture, 1=Repeat Texture)
  18    Flip in S Direction   (0=No, 1=Flip each 2nd Texture) (requires Repeat)
  19    Flip in T Direction   (0=No, 1=Flip each 2nd Texture) (requires Repeat)
  20-22 Texture S-Size        (for N=0..7: Size=(8 SHL N); ie. 8..1024 texels)
  23-25 Texture T-Size        (for N=0..7: Size=(8 SHL N); ie. 8..1024 texels)
  26-28 Texture Format        (0..7, see below)
  29    Color 0 of 4/16/256-Color Palettes (0=Displayed, 1=Made Transparent)
  30-31 Texture Coordinates Transformation Mode (0..3, see below)
Texture Formats:
  0  No Texture
  1  A3I5 Translucent Texture
  2  4-Color Palette Texture
  3  16-Color Palette Texture
  4  256-Color Palette Texture
  5  4x4-Texel Compressed Texture
  6  A5I3 Translucent Texture
  7  Direct Texture
Texture Coordinates Transformation Modes:
  0  Do not Transform texture coordinates
  1  TexCoord source
  2  Normal source
  3  Vertex source
The S-Direction corresponds to the horizontal direction of the source bitmap.
The T-Direction, T-repeat, and T-flip are the same in the vertical direction.
For a texture shape ">" the S-clamp, S-repeat, and S-flip look like so:
  Clamp        Repeat      Repeat+Flip
  ====>----    >>>>>>>>>   ><><><><>
With "Clamp", the texture coordinates are clipped to MinMax(0,Size-1), so the texels at the edges of the texture bitmap are repeated (to avoid that effect, fill the bitmap edges by texels with alpha=0, so they become invisible).
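The TEXIMAGE_PARAM word and the per-axis clamp/repeat/flip behaviour can be sketched as follows. Assumptions: `vram_offset` is a byte offset into texture VRAM that is a multiple of 8, sizes are powers of two from 8 to 1024, and texel coordinates are integers after dropping the 4bit fraction. The repeat+flip mapping mirrors every second copy, as in the ">" diagram above. All names are hypothetical.

```c
#include <stdint.h>

#define TEX_REPEAT_S            (1u << 16)
#define TEX_REPEAT_T            (1u << 17)
#define TEX_FLIP_S              (1u << 18)
#define TEX_FLIP_T              (1u << 19)
#define TEX_COLOR0_TRANSPARENT  (1u << 29)

/* Size field N (Size = 8 SHL N): smallest N with 8 SHL N >= size, max 7. */
static uint32_t tex_size_field(unsigned size)
{
    uint32_t n = 0;
    while ((8u << n) < size && n < 7)
        n++;
    return n;
}

/* Parameter word for Cmd 2Ah (TEXIMAGE_PARAM). */
static uint32_t teximage_param(uint32_t vram_offset, unsigned width,
                               unsigned height, unsigned format,
                               unsigned transform_mode, uint32_t flags)
{
    return ((vram_offset >> 3) & 0xFFFFu)
         | (tex_size_field(width) << 20)
         | (tex_size_field(height) << 23)
         | ((format & 7u) << 26)
         | flags
         | ((transform_mode & 3u) << 30);
}

/* Map an integer texel coordinate onto 0..size-1 for one axis. */
static int wrap_texel(int coord, int size, int repeat, int flip)
{
    if (!repeat)                /* clamp: edge texels are repeated */
        return coord < 0 ? 0 : (coord > size - 1 ? size - 1 : coord);
    if (!flip)                  /* plain repeat */
        return coord & (size - 1);
    int c = coord & (2 * size - 1);          /* repeat + flip */
    return c < size ? c : (2 * size - 1 - c);
}
```

Note that flip without repeat falls back to clamping here, following the "(requires Repeat)" remark in the register table.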

40004ACh - Cmd 2Bh - PLTT_BASE - Set Texture Palette Base Address (W)
  0-12   Palette Base Address (div8 or div10h, see below)
         (Not used for Texture Format 7: Direct Color Texture)
         (0..FFF8h/8 for Texture Format 2: ie. 4-color-palette Texture)
         (0..17FF0h/10h for all other Texture formats)
  13-31  Not used
The palette data occupies 16bit per color, Bit0-4: Red, Bit5-9: Green, Bit10-14: Blue, Bit15: Not used.
(VRAM must be allocated as Texture Palette, there can be up to 6 Slots allocated, ie. the addressable 18000h bytes, see Memory Control chapter)
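The unit of the PLTT_BASE value depends on the texture format, as listed above. A one-line sketch of that conversion follows, assuming `addr` is a byte offset within texture palette memory (0..17FFFh); the function name is hypothetical.

```c
#include <stdint.h>

/* PLTT_BASE parameter: 8-byte units for format 2, 16-byte units otherwise. */
static uint32_t pltt_base_param(uint32_t addr, unsigned format)
{
    return (format == 2 ? addr >> 3 : addr >> 4) & 0x1FFFu;
}
```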

TexImageParam and TexPlttBase
Can be issued per polygon (except within polygon strips).


 DS 3D Texture Formats < ^

Format 2: 4-Color Palette Texture
Each Texel occupies 2bit, the first Texel is located in LSBs of 1st byte.
In this format, the Palette Base is specified in 8-byte steps; all other formats use 16-byte steps (see PLTT_BASE register).

Format 3: 16-Color Palette Texture
Each Texel occupies 4bit, the 1st Texel is located in LSBs of 1st byte.

Format 4: 256-Color Palette Texture
Each Texel occupies 8bit, the 1st Texel is located in 1st byte.

Format 7: Direct Color Texture
Each Texel occupies 16bit, the 1st Texel is located in 1st halfword.
Bit0-4: Red, Bit5-9: Green, Bit10-14: Blue, Bit15: Alpha
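Fetching a single texel from formats 2, 3, 4 and 7 follows directly from the packing rules above (first texel in the LSBs). The sketch additionally assumes that rows are stored one after another, so texel (s,t) is texel number t*width+s. `read_texel` is a hypothetical name.

```c
#include <stdint.h>

/* Returns a palette index (formats 2,3,4) or the raw 16bit color (7). */
static unsigned read_texel(const uint8_t *tex, unsigned w, unsigned s,
                           unsigned t, unsigned format)
{
    unsigned i = t * w + s;                               /* texel number */
    switch (format) {
    case 2: return (tex[i >> 2] >> ((i & 3) * 2)) & 3;   /* 2bit */
    case 3: return (tex[i >> 1] >> ((i & 1) * 4)) & 15;  /* 4bit */
    case 4: return tex[i];                               /* 8bit */
    case 7: return tex[2 * i] | (tex[2 * i + 1] << 8);   /* 16bit, LSB first */
    default: return 0;          /* other formats are not handled here */
    }
}
```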

Format 1: A3I5 Translucent Texture (3bit Alpha, 5bit Color Index)
Each Texel occupies 8bit, the 1st Texel is located in 1st byte.
  Bit0-4: Color Index (0..31) of a 32-color Palette
  Bit5-7: Alpha       (0..7; 0=Transparent, 7=Solid)
The 3bit Alpha value (0..7) is internally expanded into a 5bit Alpha value (0..31) as follows: Alpha=(Alpha*4)+(Alpha/2).
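Splitting an A3I5 texel and applying the alpha expansion formula gives the sketch below; it maps 0 to 0 and 7 to 31. `decode_a3i5` is a hypothetical name.

```c
#include <stdint.h>

/* A3I5 texel -> 5bit palette index and 5bit alpha (Alpha*4 + Alpha/2). */
static void decode_a3i5(uint8_t texel, unsigned *index, unsigned *alpha5)
{
    unsigned a3 = texel >> 5;          /* bits 5-7 */
    *index  = texel & 0x1Fu;           /* bits 0-4 */
    *alpha5 = a3 * 4 + a3 / 2;
}
```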

Format 6: A5I3 Translucent Texture (5bit Alpha, 3bit Color Index)
Each Texel occupies 8bit, the 1st Texel is located in 1st byte.
  Bit0-2: Color Index (0..7) of an 8-color Palette
  Bit3-7: Alpha       (0..31; 0=Transparent, 31=Solid)

Format 5: 4x4-Texel Compressed Texture
Consists of 4x4 Texel blocks in Slot 0 or 2, 32bit per block, 2bit per Texel,
  Bit0-7   Upper 4-Texel row (LSB=first/left-most Texel)
  Bit8-15  Next 4-Texel row  ("")
  Bit16-23 Next 4-Texel row  ("")
  Bit24-31 Lower 4-Texel row ("")
Additional Palette Index Data for each 4x4 Texel Block is located in Slot 1,
  Bit0-13  Palette Offset in 4-byte steps; Addr=(PLTT_BASE*10h)+(Offset*4)
  Bit14-15 Transparent/Interpolation Mode (0..3, see below)
whereas the Slot 1 offset is related to the above Slot 0 or 2 offset,
  slot1_addr = slot0_addr / 2           ;lower 64K of Slot1 assoc to Slot0
  slot1_addr = slot2_addr / 2 + 10000h  ;upper 64K of Slot1 assoc to Slot2
The 2bit Texel values (0..3) are interpreted depending on the Mode (0..3),
  Texel  Mode 0       Mode 1             Mode 2         Mode 3
  0      Color 0      Color 0            Color 0        Color 0
  1      Color 1      Color 1            Color 1        Color 1
  2      Color 2      (Color0+Color1)/2  Color 2        (Color0*5+Color1*3)/8
  3      Transparent  Transparent        Color 3        (Color0*3+Color1*5)/8
Mode 1 and 3 are using only 2 Palette Colors (which requires only half as much Palette memory), the 3rd (and 4th) Texel Colors are automatically set to above values (eg. to gray-shades if color 0 and 1 are black and white).
Note: The maximum size for 4x4-Texel Compressed Textures is 1024x512 or 512x1024 (which are both occupying the whole 128K in slot 0 or 2, plus 64K in slot1), a larger size of 1024x1024 cannot be used because of the gap between slot 0 and 2.
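The Slot 1 address rule, the per-block color table and the 2bit texel lookup can be combined as sketched below. Assumptions: the interpolated colors are computed separately on each 5bit R/G/B component of the two RGB15 palette entries with truncating division, `pal` already points at PLTT_BASE*10h in 16bit units, and transparent entries are reported as -1. All names are hypothetical.

```c
#include <stdint.h>

/* Slot 1 byte address for a block word at byte offset 'addr' in slot 0/2. */
static uint32_t slot1_addr(uint32_t addr, int in_slot2)
{
    return addr / 2 + (in_slot2 ? 0x10000u : 0u);
}

/* Blend two RGB15 colors per 5bit component: (c0*w0 + c1*w1) / div. */
static uint16_t blend15(uint16_t c0, uint16_t c1, unsigned w0, unsigned w1,
                        unsigned div)
{
    uint16_t out = 0;
    for (unsigned sh = 0; sh < 15; sh += 5) {
        unsigned a = (c0 >> sh) & 31u, b = (c1 >> sh) & 31u;
        out |= (uint16_t)(((a * w0 + b * w1) / div) << sh);
    }
    return out;
}

/* Fill colors[0..3] for one block from its Slot 1 halfword 'extra'. */
static void block_colors(const uint16_t *pal, uint16_t extra, int colors[4])
{
    const uint16_t *p = pal + (extra & 0x3FFFu) * 2;   /* 4-byte steps */
    colors[0] = p[0];
    colors[1] = p[1];
    switch (extra >> 14) {
    case 0:  colors[2] = p[2]; colors[3] = -1; break;
    case 1:  colors[2] = blend15(p[0], p[1], 1, 1, 2); colors[3] = -1; break;
    case 2:  colors[2] = p[2]; colors[3] = p[3]; break;
    default: colors[2] = blend15(p[0], p[1], 5, 3, 8);
             colors[3] = blend15(p[0], p[1], 3, 5, 8); break;
    }
}

/* 2bit texel (x,y), both 0..3, from the block's Slot 0/2 word. */
static unsigned block_texel(uint32_t word, unsigned x, unsigned y)
{
    return (word >> (y * 8 + x * 2)) & 3u;
}
```

A decoded pixel is then `colors[block_texel(word, x, y)]`, with -1 meaning transparent.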


 DS 3D Texture Coordinates < ^

For textured polygons, a texture coordinate must be associated with each vertex of the polygon. The coordinates (S,T) are defined by TEXCOORD command (typically issued prior to each VTX command), and can be optionally automatically transformed, by the Transformation Mode selected in TEXIMAGE_PARAM register.

Texture Matrix
Although the texture matrix is 4x4, with values m[0..15], only the left two columns of this matrix are actually used. In Mode 2 and 3, the bottom row of the matrix is replaced by S and T values from most recent TEXCOORD command.

Texture Coordinates Transformation Mode 0 - No Transform
The values are set upon executing the TEXCOORD command,
  ( S' T' )  =  ( S  T )
Simple coordinate association, without using the Texture Matrix at all.

Texture Coordinates Transformation Mode 1 - TexCoord source
The values are calculated upon executing the TEXCOORD command,
                                       | m[0]  m[1]  |
  ( S' T' )  =  ( S  T  1/16  1/16 ) * | m[4]  m[5]  |
                                       | m[8]  m[9]  |
                                       |_m[12] m[13]_|
Can be used to produce a simple texture scrolling, rotation, or scaling, by setting a translate, rotate, or scale matrix for the texture matrix.

Texture Coordinates Transformation Mode 2 - Normal source
The values are calculated upon executing the NORMAL command,
                                      | m[0]  m[1]  |
  ( S' T' )  =  ( Nx  Ny  Nz  1.0 ) * | m[4]  m[5]  |
                                      | m[8]  m[9]  |
                                      |_S     T    _|
Can be used to produce spherical reflection mapping by setting the texture matrix to the current directional vector matrix, multiplied by a scaling matrix that expands the directional vector space from -1.0..+1.0 to one half of the texture size. For that purpose, translate the origin of the texture coordinate to the center of the spherical texture by using the TexCoord command.

Texture Coordinates Transformation Mode 3 - Vertex source
The values are calculated upon executing any VTX commands,
                                      | m[0]  m[1]  |
  ( S' T' )  =  ( Vx  Vy  Vz  1.0 ) * | m[4]  m[5]  |
                                      | m[8]  m[9]  |
                                      |_S     T    _|
Can be used to produce texture scrolls dependent on the View coordinates by copying the current position coordinate matrix into the texture matrix.
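The three transformation formulas above reduce to two small vector-by-matrix products. This is a floating-point sketch only; the hardware uses fixed-point arithmetic. It assumes `m[]` is laid out as drawn above (row r, column c = m[r*4+c]) and that S,T are given in the raw TEXCOORD units. The type and function names are hypothetical.

```c
typedef struct { float s, t; } texst;

/* Mode 1 (TexCoord source): (S T 1/16 1/16) * columns 0-1 of the matrix. */
static texst texcoord_mode1(float s, float t, const float m[16])
{
    texst r;
    r.s = s * m[0] + t * m[4] + m[8] / 16.0f + m[12] / 16.0f;
    r.t = s * m[1] + t * m[5] + m[9] / 16.0f + m[13] / 16.0f;
    return r;
}

/* Modes 2 and 3: (X Y Z 1.0) * columns 0-1, with the bottom row replaced
   by S,T from the most recent TEXCOORD command. (X,Y,Z) is the normal
   (mode 2) or the vertex position (mode 3). */
static texst texcoord_mode23(float x, float y, float z, float s, float t,
                             const float m[16])
{
    texst r;
    r.s = x * m[0] + y * m[4] + z * m[8] + s;
    r.t = x * m[1] + y * m[5] + z * m[9] + t;
    return r;
}
```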


 DS 3D Texture Blending < ^

Polygon pixels consist of a Vertex Color, and of Texture Colors.
These colors can be blended as described below. Or, to use only either one:
To use only the Vertex Color: Select No Texture in TEXIMAGE_PARAM.
To use only the Texture Color: Select Modulation Mode and Alpha=31 in POLYGON_ATTR, and set COLOR to 7FFFh (white), or to gray values (to decrease brightness of the texture color).

Vertex Color (Rv,Gv,Bv,Av)
The Vertex Color (Rv,Gv,Bv) can be changed per Vertex (either by the Color, Normal, or Material0 command); pixels between vertices are shaded with values interpolated from the surrounding vertices. The Vertex Alpha (Av) can be changed only per polygon (by the PolygonAttr command).

Texture Colors (Rt,Gt,Bt,At)
The Texture Colors (Rt,Gt,Bt), and Alpha value (At), are defined by the Texture Bitmap. For formats without Alpha value, assume At=31 (solid), and for formats with 1bit Alpha assume At=A*31.

Shading Table Colors (Rs,Gs,Bs)
In Toon/Highlight Shading Mode, the red component of the Vertex Color (Rv) is mis-used as an index into the Shading Table, to read the Shading Colors (Rs,Gs,Bs) from the table; the green and blue components of the Vertex Color (Gv,Bv) are unused in this mode. The Vertex Alpha (Av) is still used.
Shading is used in Polygon Mode 2, whether it is Toon or Highlight Shading is selected in DISP3DCNT; this is a per-frame selection, so only either one can be used.

Texture Blending - Modulation Mode (Polygon Attr Mode 0)
  R = ((Rt+1)*(Rv+1)-1)/64
  G = ((Gt+1)*(Gv+1)-1)/64
  B = ((Bt+1)*(Bv+1)-1)/64
  A = ((At+1)*(Av+1)-1)/64
The multiplication result has decreased intensity (unless both factors are 63).

Texture Blending - Decal Mode (Polygon Attr Mode 1)
  R = (Rt*At + Rv*(63-At))/64  ;except, when At=0: R=Rv, when At=31: R=Rt
  G = (Gt*At + Gv*(63-At))/64  ;except, when At=0: G=Gv, when At=31: G=Gt
  B = (Bt*At + Bv*(63-At))/64  ;except, when At=0: B=Bv, when At=31: B=Bt
  A = Av
The At value is used (only) as ratio for Texture color vs Vertex Color.

Texture Blending - Toon Shading (Polygon Mode 2, DISP3DCNT=Toon)
The vertex color Red component (Rv) is used as an index in the toon table.
  R = ((Rt+1)*(Rs+1)-1)/64   ;Rs=ToonTableRed[Rv]
  G = ((Gt+1)*(Gs+1)-1)/64   ;Gs=ToonTableGreen[Rv]
  B = ((Bt+1)*(Bs+1)-1)/64   ;Bs=ToonTableBlue[Rv]
  A = ((At+1)*(Av+1)-1)/64
This is the same as Modulation Mode, but using Rs,Gs,Bs instead of Rv,Gv,Bv.

Texture Blending - Highlight Shading (Polygon Mode 2, DISP3DCNT=Highlight)
  R = ((Rt+1)*(Rs+1)-1)/64+Rs  ;truncated to MAX=63
  G = ((Gt+1)*(Gs+1)-1)/64+Gs  ;truncated to MAX=63
  B = ((Bt+1)*(Bs+1)-1)/64+Bs  ;truncated to MAX=63
  A = ((At+1)*(Av+1)-1)/64
Same as Toon Shading, with an additional offset. The addition may increase the intensity; however, it may also change the hue of the color.

Above formulas are for 6bit RGBA values, ie. 5bit values internally expanded to 6bit as such: IF X>0 THEN X=X*2+1.
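The 5bit-to-6bit expansion and the Modulation/Toon and Highlight formulas translate directly into integer code, as sketched below for one color component (alpha in Modulation mode uses the same `modulate6` formula). Decal mode is not included. The function names are hypothetical.

```c
/* 5bit -> 6bit expansion: IF X>0 THEN X=X*2+1 (0 stays 0). */
static unsigned expand6(unsigned x)
{
    return x ? x * 2 + 1 : 0;
}

/* Modulation mode, or Toon shading with the toon table color in 'v'.
   All inputs and the result are 6bit (0..63). */
static unsigned modulate6(unsigned t, unsigned v)
{
    return ((t + 1) * (v + 1) - 1) / 64;
}

/* Highlight shading: toon result plus the table color, limited to 63. */
static unsigned highlight6(unsigned t, unsigned s)
{
    unsigned r = modulate6(t, s) + s;
    return r > 63 ? 63 : r;
}
```

For example, a full-intensity texel and vertex color, `modulate6(expand6(31), expand6(31))`, stays at 63, while any factor of 0 yields 0.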

Uni-Colored Textures
Although textures are normally containing "pictures", in some cases it makes sense to use "blank" textures that are filled with a single color:
Wire-frame polygons are always having Av=31, however, they can be made transparent by using Translucent Textures (ie. A5I3 or A3I5 formats) with At<31.
In Toon/Highlight shading modes, the Vertex Color is mis-used as table index, however, Toon/Highlight shading can be used on uni-colored textures, which is more or less the same as using Toon/Highlight shading on uni-colored Vertex-colors.


 DS 3D Toon, Edge, Fog, Anti-Aliasing < ^

4000380h..3BFh - TOON_TABLE - Toon Table (W)
This 64-byte region contains the 32 toon colors (16bit per color), used for both Toon and Highlight Shading. In both modes, the Red (R) component of the RGBA vertex color is mis-used as an index to obtain the new RGB value from the toon table; the vertex Alpha (A) is used unchanged.
  Bit0-4: Red, Bit5-9: Green, Bit10-14: Blue, Bit15: Not Used
Shading can be enabled (per polygon) in Polygon_Attr, whether it is Toon or Highlight Shading is set (per frame) in DISP3DCNT. For more info on shading, see:
DS 3D Texture Blending

4000330h..33Fh - EDGE_COLOR - Edge Colors 0..7 (W)
This 16-byte region contains the 8 edge colors (16bit per color), Edge Color 0 is used for Polygon ID 00h..07h, Color 1 for ID 08h..0Fh, and so on.
  Bit0-4: Red, Bit5-9: Green, Bit10-14: Blue, Bit15: Not Used
Edge Marking allows the edges of an object (whose polygons all have the same ID) to be marked in a wire-frame style. Edge Marking can be enabled (per frame) in DISP3DCNT. When enabled, the polygon edges are drawn in the edge color, but only if the old ID value in the Attribute Buffer is different from the Polygon ID of the new polygon, so no edges are drawn between connected or overlapping polygons with the same ID values.
Edge Marking is applied ONLY to opaque polygons (including wire-frames).
Edge Marking increases the size of opaque polygons (see notes below).
Edge Marking doesn't work very well with Anti-Aliasing (see Anti-Aliasing).
Technically, when rendering a polygon, its edges (ie. the wire-frame region) are flagged as possible edges (but the polygon is still rendered normally, without using the edge color). Once all opaque polygons have been rendered, the edge color is applied to these flagged pixels under the following conditions: at least one of the four surrounding pixels (up, down, left, right) must have a different polygon_id than the edge, and the edge depth must be LESS than the depth of that surrounding pixel (ie. no edges are rendered if the depth is GREATER or EQUAL, even if the polygon_id differs). At the screen borders, edges seem to be rendered with respect to the rear-plane's polygon_id entry (see Port 4000350h).
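The per-pixel edge condition and the Edge Color selection can be expressed as two small functions. This is a sketch of the rule as described above; it assumes the caller supplies the four neighbour IDs and depths (using the rear-plane values at screen borders) and does not model the initial possible-edge flagging. Names are hypothetical.

```c
/* Does a flagged possible-edge pixel receive the edge color?
   nid/ndepth: polygon IDs and depths of the up/down/left/right pixels. */
static int edge_marked(unsigned id, unsigned depth,
                       const unsigned nid[4], const unsigned ndepth[4])
{
    for (int i = 0; i < 4; i++)
        if (nid[i] != id && depth < ndepth[i])
            return 1;
    return 0;
}

/* EDGE_COLOR entry: color 0 for IDs 00h..07h, color 1 for 08h..0Fh, ... */
static unsigned edge_color_index(unsigned id)
{
    return (id >> 3) & 7u;
}
```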

4000358h - FOG_COLOR - Fog Color (W)
Fog can be used to let more distant polygons disappear in foggy grayness (or in darkness, or another color). This is particularly useful to "hide" the far clip plane. Fog can be enabled in DISP3DCNT.Bit7; moreover, when enabled, it can be activated or deactivated per polygon (POLYGON_ATTR.Bit15), and per Rear-plane (see there).
  0-4    Fog Color, Red     ;\
  5-9    Fog Color, Green   ; used only when DISP3DCNT.Bit6 is zero
  10-14  Fog Color, Blue    ;/
  15     Not used
  16-20  Fog Alpha          ;-used regardless of DISP3DCNT.Bit6
  21-31  Not used
Whether or not fog is applied to a pixel depends on the Fog flag in the framebuffer, the initial value of that flag can be defined in the rear-plane. When rendering opaque pixels, the framebuffer's fog flag gets replaced by PolygonAttr.Bit15. When rendering translucent pixels, the old flag in the framebuffer gets ANDed with PolygonAttr.Bit15.

400035Ch - FOG_OFFSET - Fog Offset (W)
  0-14   Fog Offset (Unsigned) (0..7FFFh=Depth) (and FOG_SHIFT in DISP3DCNT)
  15-31  Not used
The meaning of the depth values depends on whether z-values or w-values are stored in the framebuffer (see SwapBuffers.Bit1).
For translucent polygons, the depth value (and therefore: the amount of fog) depends on the depth update bit (see PolygonAttr.Bit11).

4000360h..37Fh - FOG_TABLE - Fog Density Table (W)
This 32-byte region contains the 32 Fog Densities (one byte per entry),
  0-6    Fog Density (00h..7Fh = None..Full) (usually increasing values)
  7      Not used
For n=0..1Fh, FOG_OFFSET=Port 400035Ch, FOG_SHIFT=DISP3DCNT.Bit8..11:
  Density[n] --> used at Depth = FOG_OFFSET+(400h SHR FOG_SHIFT)*(n+1)
Density[0] is used for all pixels that are closer than the Density[0] depth boundary, and Density[31] for all pixels that are more distant than the Density[31] boundary. Density is linearly interpolated for pixels that are between two Density depth boundaries.
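The boundary formula and the interpolation rule can be combined into a density lookup, as sketched below. Assumptions: interpolation uses integer arithmetic with truncation (the exact hardware precision is not stated here), and the guard for shift values that make the step zero is a hypothetical choice, not documented behaviour. `fog_density` is a hypothetical name.

```c
#include <stdint.h>

/* Fog density (00h..7Fh) at a given framebuffer depth.
   Entry n applies at depth FOG_OFFSET + (400h SHR FOG_SHIFT)*(n+1). */
static unsigned fog_density(const uint8_t table[32], unsigned fog_offset,
                            unsigned fog_shift, unsigned depth)
{
    unsigned step = 0x400u >> fog_shift;
    if (step == 0)
        step = 1;                           /* hypothetical guard */
    unsigned first = fog_offset + step;     /* boundary of entry 0 */
    if (depth <= first)
        return table[0] & 0x7Fu;
    unsigned n = (depth - first) / step;    /* depth lies past entry n */
    if (n >= 31)
        return table[31] & 0x7Fu;
    unsigned d0 = table[n] & 0x7Fu, d1 = table[n + 1] & 0x7Fu;
    unsigned frac = (depth - first) - n * step;          /* 0..step-1 */
    return (d0 * (step - frac) + d1 * frac) / step;
}
```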

Anti-Aliasing
Anti-Aliasing can be enabled in DISP3DCNT, when enabled, the edges of opaque polygons will be anti-aliased (ie. the pixels at the edges may become translucent).
Anti-Aliasing is not applied to translucent polygons. Also, Anti-Aliasing is not applied to the interiors of the polygons (eg. an 8x8 chessboard texture will be anti-aliased only at the board edges, not at the edges of the 64 fields).
Anti-Aliasing is (accidentally) applied to opaque 1dot polygons, line-segments and wire-frames (which results in dirty lines with missing pixels; 1dot polys become totally invisible). A workaround is to use translucent dots, lines and wires with alpha=30.
Anti-Aliasing is (correctly) not applied to edges of Edge-Marked polygons; in that special case even opaque line-segments and wire-frames work with anti-aliasing enabled (if they are edge-marked, ie. if their polygon ID differs from the framebuffer's ID).
Anti-Aliasing (accidentally) makes the edges of Edge-Marked polygons translucent (with alpha=16 or so?), which reduces the contrast of the edge colors. Moreover, if two of these translucent edges overlap, then they are blended twice (even if they have the same polygon_id, and even if the depth_update bit in polygon_attr is set; both should normally prevent double-blending), which scatters the brightness of such edges.

Polygon Size
In some cases, the NDS hardware doesn't render the lower/right edges of certain polygons. That feature reduces rendering load, and, when rendering connected polygons (eg. strips), it'd be unnecessary to render those edges (since they'd overlap with the upper/left edges of the other polygon). On the other hand, if no connected polygon is displayed, then the polygon may appear smaller than expected. Small polygons with excluded edges are:
  Opaque polygons (except wire-frames) without Edge-Marking and Anti-Aliasing,
  and, all polygons with vertical right-edges (except line-segments)
All other polygons are rendered at full size with all edges included (except vertical right edges). Note: To disable the small-polygon feature, you can enable edge-marking (which does increase the polygon size, even if no edges are drawn, ie. even if all polys do have the same ID).


 DS 3D Status < ^

4000600h - GXSTAT - Geometry Engine Status Register (R and R/W)
Bit 30-31 are R/W. Writing "1" to Bit15 does reset the Error Flag (Bit15), and additionally resets the Projection Stack Pointer (Bit13), and probably (?) also the Texture Stack Pointer. All other GXSTAT bits are read-only.
  0     BoxTest,PositionTest,VectorTest Busy (0=Ready, 1=Busy)
  1     BoxTest Result (0=All Outside View, 1=Parts or Fully Inside View)
  2-7   Not used
  8-12  Position & Vector Matrix Stack Level (0..31) (lower 5bit of 6bit value)
  13    Projection Matrix Stack Level (0..1)
  14    Matrix Stack Busy (0=No, 1=Yes; Currently executing a Push/Pop command)
  15    Matrix Stack Overflow/Underflow Error (0=No, 1=Error/Acknowledge/Reset)
  16-24 Number of 40bit-entries in Command FIFO (0..256)
  (24)  Command FIFO Full (MSB of above) (0=No, 1=Yes; Full)
  25    Command FIFO Less Than Half Full (0=No, 1=Yes; Less than Half-full)
  26    Command FIFO Empty (0=No, 1=Yes; Empty)
  27    Geometry Engine Busy (0=No, 1=Yes; Busy; Commands are executing)
  28-29 Not used
  30-31 Command FIFO IRQ (0=Never, 1=Less than half full, 2=Empty, 3=Reserved)
When the GXFIFO IRQ is enabled (setting 1 or 2), the IRQ flag (IF.Bit21) remains set for as long as the IRQ condition is true (and attempts to acknowledge the IRQ by writing to IF.Bit21 have no effect). Therefore, the IRQ handler must either fill the FIFO, or disable the IRQ (setting 0), BEFORE trying to acknowledge the IRQ.
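A decoder for the GXSTAT fields, plus the value to write back when acknowledging a matrix stack error, might look like this. The sketch only manipulates 32bit values; reading and writing the register at 4000600h is left to the caller. Preserving bits 30-31 on write is a design choice so that the FIFO IRQ setting is not changed by the acknowledge. All names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    unsigned test_busy, box_result, posvec_level, proj_level;
    unsigned stack_busy, stack_error, fifo_count, fifo_less_half;
    unsigned fifo_empty, engine_busy, fifo_irq;
} gxstat_t;

static gxstat_t gxstat_decode(uint32_t v)
{
    gxstat_t s;
    s.test_busy      = v & 1u;
    s.box_result     = (v >> 1) & 1u;
    s.posvec_level   = (v >> 8) & 0x1Fu;
    s.proj_level     = (v >> 13) & 1u;
    s.stack_busy     = (v >> 14) & 1u;
    s.stack_error    = (v >> 15) & 1u;
    s.fifo_count     = (v >> 16) & 0x1FFu;   /* 0..256; bit24 = full */
    s.fifo_less_half = (v >> 25) & 1u;
    s.fifo_empty     = (v >> 26) & 1u;
    s.engine_busy    = (v >> 27) & 1u;
    s.fifo_irq       = (v >> 30) & 3u;
    return s;
}

/* Write value that resets the error flag (Bit15) while keeping the
   current FIFO IRQ setting (R/W bits 30-31). */
static uint32_t gxstat_ack_error(uint32_t current)
{
    return (current & 0xC0000000u) | (1u << 15);
}
```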

4000604h - RAM_COUNT - Polygon List & Vertex RAM Count Register (R)
  0-11   Number of Polygons currently stored in Polygon List RAM (0..2048)
  12-15  Not used
  16-28  Number of Vertices currently stored in Vertex RAM (0..6144)
  29-31  Not used
If a SwapBuffers command has been sent, then the counters are reset 10 cycles (at 33.51MHz clock) after next VBlank.

4000320h - RDLINES_COUNT - Rendered Line Count Register (R)
Rendering starts in scanline 214; the rendered lines are stored in a buffer that can hold up to 48 scanlines. The actual screen output begins after scanline 262; the lines are then read from the buffer and sent to the display. Simultaneously, the rendering engine keeps writing new lines to the buffer (ideally at the same speed as the display output, so the buffer would always contain 48 pre-calculated lines).
  0-5    Minimum Number (minus 2) of buffered lines in previous frame (0..46)
  6-31   Not used
If rendering becomes slower than the display output, then the number of buffered lines decreases. Smaller values in RDLINES indicate that additional load on the rendering engine may cause buffer underflows in subsequent frames; if so, the program should reduce the number of polygons to avoid display glitches.
Even a value of zero in RDLINES doesn't indicate whether actual buffer underflows have occurred or not (underflows are indicated in DISP3DCNT Bit12).
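This C sketch decodes RDLINES and uses it for a crude form of load control. The decoding follows the table above. The polygon-budget policy, its thresholds and the function names are hypothetical choices, not documented behaviour.

```c
#include <stdint.h>

/* RDLINES_COUNT (4000320h) bits 0-5 hold the minimum number of buffered
   lines in the previous frame, minus 2. */
static inline int rdlines_min_buffered(uint32_t rdlines)
{
    return (int)(rdlines & 0x3Fu) + 2;   /* 2..48 lines */
}

/* Hypothetical policy: shrink the polygon budget when the buffer margin
   gets small, grow it slowly when there is headroom. The thresholds
   (8 and 32 lines) and step sizes are arbitrary. */
int adjust_polygon_budget(int budget, uint32_t rdlines)
{
    int margin = rdlines_min_buffered(rdlines);
    if (margin < 8)
        budget -= budget / 8;            /* back off by 12.5% */
    else if (margin > 32 && budget < 2048)
        budget += 16;                    /* cautiously allow more polygons */
    return budget;
}
```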


 DS 3D Tests

40005C0h - Cmd 70h - BOX_TEST - Test if Cuboid Sits inside View Volume (W)
The BoxTest result indicates whether one or more of the 6 faces of the box are fully or partially inside the view volume. It can be used to reduce unnecessary rendering load: if the result is false, the program can skip drawing objects that are inside the box.
BoxTest only checks whether the faces of the box are inside the view volume, so it returns false if the whole view volume is located inside the box (even though objects inside the box may still be in view).
  Parameter 1, Bit 0-15   X-Coordinate
  Parameter 1, Bit 16-31  Y-Coordinate
  Parameter 2, Bit 0-15   Z-Coordinate
  Parameter 2, Bit 16-31  Width  (presumably: X-Offset?)
  Parameter 3, Bit 0-15   Height (presumably: Y-Offset?)
  Parameter 3, Bit 16-31  Depth  (presumably: Z-Offset?)
All values are 1bit sign, 3bit integer, 12bit fractional part
The result of the "coordinate+offset" additions should not overflow 16bit vertex coordinate range (1bit sign, 3bit integer, 12bit fraction).
Before using BoxTest, be sure that far-plane-intersecting & 1-dot polygons are enabled, if they aren't: Send the PolygonAttr command (with bit12,13 set to enable them), followed by dummy Begin and End commands (required to apply the new PolygonAttr settings). BoxTest should not be issued within Begin/End.
After sending the BoxTest command, wait until GXSTAT.Bit0 indicates Ready, then read the result from GXSTAT.Bit1.
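The following C sketch builds the three BOX_TEST parameter words. It assumes, as presumed above, that Width/Height/Depth are offsets added to X/Y/Z. The helper names are hypothetical. The range check implements the "coordinate+offset should not overflow" rule.

```c
#include <stdint.h>

/* Convert a value to 16bit 1.3.12 fixed point (truncating toward zero).
   The caller must keep v within -8.0 .. +7.99975 (not range-checked). */
static int16_t to_fx16(double v)
{
    return (int16_t)(v * 4096.0);
}

/* Pack the three BOX_TEST parameter words. Returns 0 without packing if any
   coordinate+offset sum leaves the 16bit vertex range. */
int pack_box_test(double x, double y, double z, double w, double h, double d,
                  uint32_t params[3])
{
    int16_t v[6] = { to_fx16(x), to_fx16(y), to_fx16(z),
                     to_fx16(w), to_fx16(h), to_fx16(d) };

    for (int i = 0; i < 3; i++) {
        int32_t sum = (int32_t)v[i] + v[i + 3];
        if (sum < -0x8000 || sum > 0x7FFF)
            return 0;
    }
    /* params[0]=X|Y<<16, params[1]=Z|W<<16, params[2]=H|D<<16 */
    for (int i = 0; i < 3; i++)
        params[i] = (uint16_t)v[2 * i] | ((uint32_t)(uint16_t)v[2 * i + 1] << 16);
    return 1;
}
```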

40005C4h - Cmd 71h - POS_TEST - Set Position Coordinates for Test (W)
  Parameter 1, Bit 0-15   X-Coordinate
  Parameter 1, Bit 16-31  Y-Coordinate
  Parameter 2, Bit 0-15   Z-Coordinate
  Parameter 2, Bit 16-31  Not used
All values are 1bit sign, 3bit integer, 12bit fractional part.
Multiplies the specified line-vector (x,y,z,1) by the clip coordinate matrix.
After sending the command, wait until GXSTAT.Bit0 indicates Ready, then read the result from POS_RESULT registers. POS_TEST can be issued anywhere (except within polygon strips, huh?).
Caution: POS_TEST overwrites the internal VTX registers, so the next vertex should be <fully> defined by VTX_10 or VTX_16, otherwise, when using VTX_XY, VTX_XZ, VTX_YZ, or VTX_DIFF, then the new vertex will be relative to the POS_TEST coordinates (rather than to the previous vertex).

4000620h..62Fh - POS_RESULT - Position Test Results (R)
This 16-byte region (4 words) contains the resulting clip coordinates (x,y,z,w) from the POS_TEST command. Each value is 1bit sign, 19bit integer, 12bit fractional part.
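This small C sketch packs POS_TEST parameters and converts a POS_RESULT word to a real number. The function names are hypothetical, and the inputs are assumed to already be raw 1.3.12 values.

```c
#include <stdint.h>

/* POS_TEST parameter words from raw 1.3.12 fixed-point coordinates. */
void pack_pos_test(int16_t x, int16_t y, int16_t z, uint32_t params[2])
{
    params[0] = (uint16_t)x | ((uint32_t)(uint16_t)y << 16);
    params[1] = (uint16_t)z;                  /* bits 16-31 not used */
}

/* Convert one POS_RESULT word (signed 1.19.12 fixed point) to a double. */
double pos_result_to_double(uint32_t word)
{
    int64_t v = (word & 0x80000000u) ? (int64_t)word - 0x100000000LL
                                     : (int64_t)word;
    return (double)v / 4096.0;
}
```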

40005C8h - Cmd 72h - VEC_TEST - Set Directional Vector for Test (W)
  Parameter 1, Bit 0-9    X-Component
  Parameter 1, Bit 10-19  Y-Component
  Parameter 1, Bit 20-29  Z-Component
  Parameter 1, Bit 30-31  Not used
All values are 1bit sign, 9bit fractional part.
Multiplies the specified line-vector (x,y,z,0) by the directional vector matrix. As with the NORMAL command, it requires Matrix Mode 2 (ie. Position & Vector Simultaneous Set mode).
After sending the command, wait until GXSTAT.Bit0 indicates Ready, then read the result ("the directional vector in the View coordinate space") from VEC_RESULT registers.

4000630h..635h - VEC_RESULT - Vector Test Results (R)
This 6-byte region (3 halfwords) contains the resulting vector (x,y,z) from the VEC_TEST command. Each value is 4bit sign, 0bit integer, 12bit fractional part. The 4bit sign is either 0000b (positive) or 1111b (negative).
There is no integer part, so values >=1.0 or <-1.0 will cause overflows.
(Eg. +1.0 aka 1000h will be returned as -1.0 aka F000h due to overflow and sign-expansion).
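This C sketch shows the VEC_TEST packing and VEC_RESULT decoding described above. The names are hypothetical. The decoder also shows why +1.0 comes back as -1.0: with a 4bit sign field, the halfword is simply a signed 16bit value scaled by 1/4096.

```c
#include <stdint.h>

/* Pack VEC_TEST: three 10bit components (1bit sign, 9bit fraction).
   Inputs are raw values in -512..511 (-1.0 .. +0.998046875). */
uint32_t pack_vec_test(int x, int y, int z)
{
    return ((uint32_t)x & 0x3FFu)
         | (((uint32_t)y & 0x3FFu) << 10)
         | (((uint32_t)z & 0x3FFu) << 20);
}

/* Decode one VEC_RESULT halfword (4bit sign, 12bit fraction). */
double vec_result_to_double(uint16_t half)
{
    int32_t v = (half & 0x8000u) ? (int32_t)half - 0x10000 : (int32_t)half;
    return v / 4096.0;
}
```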


 DS 3D Rear-Plane

Other docs seem to refer to this as the Clear-plane rather than the Rear-plane; however, the plane can be an image, so it isn't always "cleared".
The view order is as such:
  --> 2D Layers --> 3D Polygons --> 3D Rear-plane --> 2D Layers --> 2D Backdrop
The rear-plane can be disabled (by making it transparent; alpha=0), so that the 2D layers become visible as background.
2D layers can be moved in front of, or behind, the 3D layer-group (which is represented as BG0 to the 2D Engine); 2D layers behind BG0 can be used instead of, or in addition to, the rear-plane.

The rear-plane can be initialized via the two registers below (so all pixels in the plane have the same color and attributes); this method is used when DISP3DCNT.14 is zero:

4000350h - CLEAR_COLOR - Clear Color Attribute Register (W)
  0-4    Clear Color, Red
  5-9    Clear Color, Green
  10-14  Clear Color, Blue
  15     Fog (enables Fog to the rear-plane) (doesn't affect Fog of polygons)
  16-20  Alpha
  21-23  Not used
  24-29  Clear Polygon ID (affects edge-marking, at the screen-edges?)
  30-31  Not used
4000354h - CLEAR_DEPTH - Clear Depth Register (W)
  0-14   Clear Depth (0..7FFFh) (usually 7FFFh = most distant)
  15     Not used
  16-31  See Port 4000356h, CLRIMAGE_OFFSET
The 15bit Depth is expanded to 24bit as "X=(X*200h)+((X+1)/8000h)*1FFh".
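This C sketch composes CLEAR_COLOR and performs the 15bit-to-24bit depth expansion. The function names are hypothetical. The expansion formula is the one given above: its second term only contributes for X=7FFFh, which maps the farthest depth to FFFFFFh.

```c
#include <stdint.h>

/* Build a CLEAR_COLOR (4000350h) value from its fields. */
uint32_t make_clear_color(unsigned r, unsigned g, unsigned b, int fog,
                          unsigned alpha, unsigned poly_id)
{
    return (r & 31u) | ((g & 31u) << 5) | ((b & 31u) << 10)
         | ((fog ? 1u : 0u) << 15)
         | ((alpha & 31u) << 16)
         | ((poly_id & 63u) << 24);
}

/* Expand a 15bit clear depth to 24bit: X*200h + ((X+1)/8000h)*1FFh. */
uint32_t expand_clear_depth(uint32_t x)
{
    x &= 0x7FFFu;
    return x * 0x200u + ((x + 1u) / 0x8000u) * 0x1FFu;
}
```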

Rear Color/Depth Bitmaps
Alternately, the rear-plane can be initialized by bitmap data (allowing to assign different colors & attributes to each pixel), this method is used when DISP3DCNT.14 is set:
The plane then consists of two bitmaps (one with color data, one with depth data), each containing 256x256 16bit entries, so each occupies a whole 128K slot:
  Rear Color Bitmap (located in Texture Slot 2)
    0-4    Clear Color, Red
    5-9    Clear Color, Green
    10-14  Clear Color, Blue
    15     Alpha (0=Transparent, 1=Solid) (equivalent to 5bit-alpha 0 and 31)
  Rear Depth Bitmap (located in Texture Slot 3)
    0-14   Clear Depth, expanded to 24bit as X=(X*200h)+((X+1)/8000h)*1FFh
    15     Clear Fog (Initial fog enable value)
This method requires VRAM to be allocated to Texture Slot 2 and 3 (see Memory Control chapter). Of course, in that case the VRAM is used as Rear-plane, and cannot be used for Textures.
The bitmap method is restricted to 1bit alpha values (the register method allows a 5bit alpha value).
The Clear Polygon ID is kept defined in the CLEAR_COLOR register, even in bitmap mode.

4000356h - CLRIMAGE_OFFSET - Rear-plane Bitmap Scroll Offsets (W)
The visible portion of the bitmap is 256x192 pixels (regardless of the viewport setting, which is used only for polygon clipping). Internally, the bitmap is 256x256 pixels, so the bottom-most 64 rows are usually offscreen, unless scrolling is used to move them into view.
  Bit0-7   X-Offset (0..255; 0=left column of bitmap)
  Bit8-15  Y-Offset (0..255; 0=upper row of bitmap)
The bitmap wraps to the upper/left edges when exceeding the lower/right edges.
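This C sketch shows the scroll-and-wrap lookup. It assumes that the offsets are simply added to the screen position and that the Y-Offset occupies bits 8-15, since 8 bits are needed for the stated 0..255 range. The names are hypothetical. The returned index selects a 16bit entry, ie. byte offset index*2 within Texture Slot 2 or 3.

```c
#include <stdint.h>

/* Pack CLRIMAGE_OFFSET (4000356h): X in bits 0-7, Y in bits 8-15. */
uint16_t make_clrimage_offset(unsigned xoff, unsigned yoff)
{
    return (uint16_t)((xoff & 0xFFu) | ((yoff & 0xFFu) << 8));
}

/* Index of the 256x256 bitmap entry shown at screen pixel (sx, sy),
   sy in 0..191; both directions wrap at 256. */
unsigned rear_bitmap_index(unsigned sx, unsigned sy, uint16_t offset)
{
    unsigned bx = (sx + (offset & 0xFFu)) & 0xFFu;
    unsigned by = (sy + (offset >> 8)) & 0xFFu;
    return by * 256u + bx;
}
```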


 DS 3D Final 2D Output

The final 3D image (consisting of polygons and rear-plane) is passed to 2D Engine A as BG0 layer (provided that DISPCNT is configured to use 3D as BG0).

Scrolling
The BG0HOFS register (4000010h) can be used to scroll the 3D layer horizontally. The scroll region is 512 pixels, consisting of 256 pixels for the 3D image, followed by 256 transparent pixels, and then wrapped to the 3D image again. Vertical scrolling (and rotation/scaling) cannot be used on the 3D layer.
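This C sketch shows the horizontal wraparound. It assumes the usual GBA convention that screen column x shows layer pixel (x+HOFS). The function name is hypothetical.

```c
/* Map a screen column to a 3D-image column under BG0HOFS scrolling:
   the 512-pixel region is the 3D image followed by 256 transparent pixels.
   Returns -1 for a transparent pixel. */
int bg0_3d_column(unsigned screen_x, unsigned hofs)
{
    unsigned pos = (screen_x + hofs) & 511u;   /* 9bit wraparound */
    return pos < 256u ? (int)pos : -1;
}
```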

BG Priority Order
The lower 2bit of the BG0CNT register (4000008h) control the priority relative to other BGs and OBJs, so the 3D layer can be in front of or behind 2D layers. All other bits in BG0CNT have no effect on 3D, namely, mosaic cannot be used on the 3D layer.

Special Effects
Special Effects Registers (4000050h..54h) can be used as such:
  Brightness up/down with BG0 as 1st Target via EVY   (as for 2D)
  Blending with BG0 as 2nd Target via EVA/EVB         (as for 2D)
  Blending with BG0 as 1st Target via 3D Alpha-values (unlike as for 2D)
The latter method probably (?) uses per-pixel 3D alpha values as such: EVA=A/2, and EVB=16-A/2, without using the EVA/EVB settings in 4000052h.

Window Feature
Window Feature (4000040h..4Bh) can be used as for 2D.
"If the 3D screen has highest priority, then alpha-blending is always enabled, regardless of the Window Control register's color effect enable flag [ie. regardless of Bit5 of WIN0IN, WIN1IN, WINOBJ, WINOUT registers]"... not sure if that is true, and whether it supersedes the effect selection in Port 4000050h...?


 DS Sound

The DS contains 16 hardware sound channels.
The console contains two speakers, arranged left and right of the upper screen, and so, provides stereo sound even without using the headphone socket.

DS Sound Channels 0..15
DS Sound Control Registers
DS Sound Capture
DS Sound Block Diagrams
DS Sound Notes

Power control
When restoring power supply to the sound circuit, do not output any sound during the first 15 milliseconds.


 DS Sound Channels 0..15

Each of the 16 sound channels occupies 16 bytes in the I/O region, starting with channel 0 at 4000400h..400040Fh, up to channel 15 at 40004F0h..40004FFh.

40004x0h - NDS7 - SOUNDxCNT - Sound Channel X Control Register (R/W)
  Bit0-6    Volume Mul   (0..127=silent..loud)
  Bit7      Not used (always zero)
  Bit8-9    Volume Div   (0=Normal, 1=Div2, 2=Div4, 3=Div16)
  Bit10-14  Not used (always zero)
  Bit15     Hold         (0=Normal, 1=Hold last sample after one-shot sound)
  Bit16-22  Panning      (0..127=left..right) (64=half volume on both speakers)
  Bit23     Not used (always zero)
  Bit24-26  Wave Duty    (0..7) ;HIGH=(N+1)*12.5%, LOW=(7-N)*12.5% (PSG only)
  Bit27-28  Repeat Mode  (0=Manual, 1=Loop Infinite, 2=One-Shot, 3=Prohibited)
  Bit29-30  Format       (0=PCM8, 1=PCM16, 2=IMA-ADPCM, 3=PSG/Noise)
  Bit31     Start/Status (0=Stop, 1=Start/Busy)
All channels support ADPCM/PCM formats, PSG rectangular wave can be used only on channels 8..13, and white noise only on channels 14..15.
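This C sketch composes a SOUNDxCNT value and checks the per-channel PSG restriction. The enum and function names are hypothetical.

```c
#include <stdint.h>

enum snd_repeat { SND_MANUAL = 0, SND_LOOP = 1, SND_ONESHOT = 2 };
enum snd_format { SND_PCM8 = 0, SND_PCM16 = 1, SND_ADPCM = 2, SND_PSG = 3 };

/* vol 0..127, div_sel 0..3 (Div1/2/4/16), pan 0..127 (64=center),
   duty 0..7 (only used by PSG channels 8..13). */
uint32_t make_soundcnt(unsigned vol, unsigned div_sel, int hold, unsigned pan,
                       unsigned duty, enum snd_repeat rep, enum snd_format fmt,
                       int start)
{
    return (vol & 0x7Fu)
         | ((div_sel & 3u) << 8)
         | ((hold ? 1u : 0u) << 15)
         | ((pan & 0x7Fu) << 16)
         | ((duty & 7u) << 24)
         | (((unsigned)rep & 3u) << 27)
         | (((unsigned)fmt & 3u) << 29)
         | ((start ? 1u : 0u) << 31);
}

/* PSG rectangular waves only on channels 8..13, noise only on 14..15. */
int psg_format_allowed(int channel, int noise)
{
    return noise ? (channel >= 14 && channel <= 15)
                 : (channel >= 8 && channel <= 13);
}
```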

40004x4h - NDS7 - SOUNDxSAD - Sound Channel X Data Source Register (W)
  Bit0-26  Source Address (must be word aligned, bit0-1 are always zero)
  Bit27-31 Not used
40004x8h - NDS7 - SOUNDxTMR - Sound Channel X Timer Register (W)
  Bit0-15  Timer Value, Sample frequency, timerval=-(33513982/2)/freq
The PSG Duty Cycles are composed of eight "samples", and so, the frequency for Rectangular Wave is 1/8th of the selected sample frequency.
For PSG Noise, the noise frequency is equal to the sample frequency.
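This C sketch implements the timer formula (using integer division, which truncates) and the 8x factor for PSG rectangular waves. The names are hypothetical.

```c
#include <stdint.h>

#define DS_ARM7_CLOCK 33513982u   /* Hz, as in the timer formula above */

/* SOUNDxTMR for a sample rate: -(33513982/2)/freq as a 16bit value.
   Rates below about 256Hz (16756991/65536) do not fit in 16 bits. */
uint16_t sound_timer_for_rate(uint32_t freq)
{
    uint32_t ticks = (DS_ARM7_CLOCK / 2u) / freq;
    return (uint16_t)(0x10000u - ticks);
}

/* PSG rectangular wave: eight samples per duty cycle, so the sample
   rate must be eight times the tone frequency. */
uint16_t psg_timer_for_tone(uint32_t tone_hz)
{
    return sound_timer_for_rate(tone_hz * 8u);
}
```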

40004xAh - NDS7 - SOUNDxPNT - Sound Channel X Loopstart Register (W)
  Bit0-15  Loop Start, Sample loop start position
           (counted in words, ie. N*4 bytes)
40004xCh - NDS7 - SOUNDxLEN - Sound Channel X Length Register (W)
The number of samples for N words is 4*N PCM8 samples, 2*N PCM16 samples, or 8*(N-1) ADPCM samples (the first word containing the ADPCM header). The Sound Length is not used in PSG mode.
  Bit0-21  Sound length (counted in words, ie. N*4 bytes)
  Bit22-31 Not used
Minimum length (the sum of PNT+LEN) is 4 words (16 bytes); smaller values (0..3 words) cause hang-ups (the busy bit remains set indefinitely, but no sound is output). However, at least on the NDS Lite, playing (looping) samples of 4 bytes and samples of 2 halfwords (so in both cases PNT=0 and LEN=1) was successful.

In One-shot mode, the sound length is the sum of (PNT+LEN).
In Looped mode, the length is (1*PNT+Infinite*LEN), ie. the first part (PNT) is played once, the second part (LEN) is repeated infinitely.
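This C sketch implements the length arithmetic above. It assumes that for looped IMA-ADPCM the header word lies in the PNT part (since the loop start must point at the data), so every LEN word yields eight samples. The names are hypothetical.

```c
#include <stdint.h>

enum sample_fmt { FMT_PCM8, FMT_PCM16, FMT_ADPCM };

/* Number of samples in n words; for ADPCM the first word is the header. */
uint32_t samples_in_words(enum sample_fmt fmt, uint32_t n)
{
    switch (fmt) {
    case FMT_PCM8:  return 4u * n;
    case FMT_PCM16: return 2u * n;
    case FMT_ADPCM: return n ? 8u * (n - 1u) : 0u;
    }
    return 0;
}

/* Looped sound: PNT words are played once, LEN words repeat forever. */
void looped_sample_counts(enum sample_fmt fmt, uint32_t pnt, uint32_t len,
                          uint32_t *intro, uint32_t *loop)
{
    *intro = samples_in_words(fmt, pnt);
    /* The LEN part holds plain data only, even for ADPCM. */
    *loop = (fmt == FMT_ADPCM) ? 8u * len : samples_in_words(fmt, len);
}
```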


 DS Sound Control Registers

4000500h - NDS7 - SOUNDCNT - Sound Control Register (R/W)
  Bit0-6   Master Volume       (0..127=silent..loud)
  Bit7     Not used (always zero)
  Bit8-9   Left Output from    (0=Left Mixer, 1=Ch1, 2=Ch3, 3=Ch1+Ch3)
  Bit10-11 Right Output from   (0=Right Mixer, 1=Ch1, 2=Ch3, 3=Ch1+Ch3)
  Bit12    Output Ch1 to Mixer (0=Yes, 1=No) (both Left/Right)
  Bit13    Output Ch3 to Mixer (0=Yes, 1=No) (both Left/Right)
  Bit14    Not used (always zero)
  Bit15    Master Enable       (0=Disable, 1=Enable)
  Bit16-31 Not used (always zero)
4000504h - NDS7 - SOUNDBIAS - Sound Bias Register (R/W)
  Bit0-9   Sound Bias    (0..3FFh, usually 200h)
  Bit10-31 Not used (always zero)
After applying the master volume, the signed left/right audio signals are in range -200h..+1FFh (with medium level zero), the Bias value is then added to convert the signed numbers into unsigned values (with medium level 200h).
BIAS output is always enabled, even when Master Enable (SOUNDCNT.15) is off.

The sampling frequency of the mixer is 1.04876 MHz with an amplitude resolution of 24 bits, but the sampling frequency after mixing with PWM modulation is 32.768 kHz with an amplitude resolution of 10 bits.


 DS Sound Capture

The DS contains 2 built-in sound capture devices that can capture output waveform data to memory.
Sound capture 0 can capture output from left-mixer or output from channel 0.
Sound capture 1 can capture output from right-mixer or output from channel 2.

4000508h - NDS7 - SNDCAP0CNT - Sound Capture 0 Control Register (R/W)
4000509h - NDS7 - SNDCAP1CNT - Sound Capture 1 Control Register (R/W)
  Bit0     Control of Associated Sound Channels (ANDed with Bit7)
           SNDCAP0CNT: Output Sound Channel 1 (0=As such, 1=Add to Channel 0)
           SNDCAP1CNT: Output Sound Channel 3 (0=As such, 1=Add to Channel 2)
           Caution: Addition mode works only if BOTH Bit0 and Bit7 are set.
  Bit1     Capture Source Selection
           SNDCAP0CNT: Capture 0 Source (0=Left Mixer, 1=Channel 0/Bugged)
           SNDCAP1CNT: Capture 1 Source (0=Right Mixer, 1=Channel 2/Bugged)
  Bit2     Capture Repeat        (0=Loop, 1=One-shot)
  Bit3     Capture Format        (0=PCM16, 1=PCM8)
  Bit4-6   Not used (always zero)
  Bit7     Capture Start/Status  (0=Stop, 1=Start/Busy)
4000510h - NDS7 - SNDCAP0DAD - Sound Capture 0 Destination Address (R/W)
4000518h - NDS7 - SNDCAP1DAD - Sound Capture 1 Destination Address (R/W)
  Bit0-26  Destination address (word aligned, bit0-1 are always zero)
  Bit27-31 Not used (always zero)
Capture start address (also used as re-start address for looped capture).

4000514h - NDS7 - SNDCAP0LEN - Sound Capture 0 Length (W)
400051Ch - NDS7 - SNDCAP1LEN - Sound Capture 1 Length (W)
  Bit0-15  Buffer length (1..FFFFh words) (ie. N*4 bytes)
  Bit16-31 Not used
Minimum length is 1 word (attempts to use 0 words are interpreted as 1 word).

SOUND1TMR - NDS7 - Sound Channel 1 Timer shared as Capture 0 Timer
SOUND3TMR - NDS7 - Sound Channel 3 Timer shared as Capture 1 Timer
There are no separate capture frequency registers, instead, the sample frequency of Channel 1/3 is shared for Capture 0/1. These channels are intended to output the captured data, so it makes sense that both capture and sound output use the same frequency.

For Capture 0, a=0, b=1, x=0.
For Capture 1, a=2, b=3, x=1.

Capture Bugs
The NDS contains two hardware bugs that occur when capturing data from ch(a) (SNDCAPxCNT.Bit1=1). Which of the two occurs depends on whether ch(a)+ch(b) addition is enabled or disabled (SNDCAPxCNT.Bit0).
  1) Both Negative Bug - SNDCAPxCNT Bit1=1, Bit0=0 (addition disabled)
     Capture data is accidentally set to -8000h if ch(a) and ch(b) are both <0.
     Otherwise the correct capture result is returned, ie. plain ch(a) data,
     not being affected by ch(b) (since addition is disabled).
     Workaround: Ensure that ch(a) and/or ch(b) are >=0 (or disabled).
  2) Overflow Bug - SNDCAPxCNT Bit1=1, Bit0=1 (addition enabled)
     In this mode, Capture data isn't clipped to MinMax(-8000h,+7FFFh),
     instead, it is ANDed with FFFFh, so the sign bit is lost if the
     addition result ch(a)+ch(b) is less/greater than -8000h/+7FFFh.
     Workaround: Reduce ch(a)/ch(b) volume or data to avoid overflows.
These bugs occur only for capture (speaker output remains intact), and only when capturing ch(a) (capturing mixer output works flawlessly).
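This C model makes the two bugs concrete by computing the value that Capture(x) stores when its source is ch(a). It works on integer sample values only (fractions are ignored) and is a hypothetical emulation aid, not a description of the actual circuit.

```c
#include <stdint.h>

/* Value stored by Capture(x) with SNDCAPxCNT.Bit1=1 (source=ch(a)).
   Pass a=0 for a disabled ch(a). */
int32_t capture_from_ch_a(int32_t a, int32_t b, int addition_enabled)
{
    if (!addition_enabled) {
        /* Bug 1: both inputs negative -> forced to -8000h. */
        if (a < 0 && b < 0)
            return -0x8000;
        return a;
    }
    /* Bug 2: the sum is not clipped but truncated to 16 bits. */
    uint32_t r = (uint32_t)(a + b) & 0xFFFFu;
    return r >= 0x8000u ? (int32_t)r - 0x10000 : (int32_t)r;
}
```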

ch(a)+ch(b) Channel Addition
The ch(a)+ch(b) addition unit has 2 outputs, with slightly different results:
  1) Addition Result for Capture(x) when using capture source=ch(a):
     Addition is always performed, regardless of SOUNDCNT.Bit12/13,
     and regardless of whether ch(a) is enabled (the result is plain
     ch(b) if ch(a) is disabled).
     Result is 16bit (plus fraction) with overflow error (see Capture Bugs).
  2) Addition Result for Mixer (towards speakers, and capture source=mixer):
     Ch(b) is muted if ch(a) is disabled.
     Ch(b) is muted if ch(b) SOUNDCNT.Bit12/13 is set to "Ch(b) not to mixer".
     Result is 17bit (plus fraction) without overflow error.
Addition mode can be used only if the <corresponding> capture unit is enabled, ie. if SNDCAPxCNT (Bit0 AND Bit7)=1. If so, addition affects both mixers (and so, may also affect the <other> capture unit if it reads from mixer).


 DS Sound Block Diagrams

Left Mixer with Capture 0
(Right Mixer with Capture 1, respectively)
                       _____
  Ch0.L ------------->|     |  +------------------------------> to Capture 0
               ___    |     |  |                     ___
  Ch1.L ---+->|Sel|-->|     |  |     Ch0..Ch15      |   |
           |  |___|   |Left |--+------------------->|   |
  Ch2.L ---|--------->|Mixer|                       |Sel|   ______    ____
           |   ___    |     |           Ch1         |   |  |Master|  |Add |
  Ch3.L -+-|->|Sel|-->|     |  +------------------->|   |->|Volume|->|Bias|-> L
         | |  |___|   |     |  |                    |   |  |______|  |____|
  Ch4.L -|-|--------->|     |  |        Ch3         |   |
  ...   -|-|--------->|     |  | +----------------->|   |
  Ch15.L-|-|--------->|_____|  | |   ___            |   |
         | +-------------------+-|->|Add| Ch1+Ch3   |   |
         +-----------------------+->|___|---------->|___|
Channel 0 and 1, Capture 0 with input from Left Mixer
(Channel 2 and 3, Capture 1 with input from Right Mixer, respectively)
   ____     _________     ___     ___      ___
  |FIFO|-->|Channel 0|-->|Vol|-->|Add|-+->|Pan|--> Ch0.L
  |____|   |_________|   |___|   |___| |  |___|--> Ch0.R
   ____     _________     ___      ^   |
  |FIFO|<--|Capture 0|<--|Sel|<----|---+
  |____|   |_ _____ _|   |___|<----|-------------- Left Mixer
   ____     _:Timer:_     ___     _|_      ___
  |FIFO|-->|Channel 1|-->|Vol|-->|Sel|--->|Pan|--> Ch1.L
  |____|   |_________|   |___|   |___|    |___|--> Ch1.R
Channel 4 (Channel 5..15, respectively)
   ____     _________     ___              ___
  |FIFO|-->|Channel 4|-->|Vol|----------->|Pan|--> Ch4.L
  |____|   |_________|   |___|            |___|--> Ch4.R
The FIFO isn't used in PSG/Noise modes (supported on channel 8..15).


 DS Sound Notes

Sound delayed Start/Restart (timing glitch)
A sound will be started/restarted when changing its start bit from 0 to 1, however, the sound won't start immediately: PSG/Noise starts after 1 sample, PCM starts after 3 samples, and ADPCM starts after 11 samples (3 dummy samples as for PCM, plus 8 dummy samples for the ADPCM header).

Sound Stop (timing note)
In one-shot mode, the Busy bit gets cleared automatically at the BEGINNING of the last sample period. Nevertheless (despite the cleared Busy bit), the last sample keeps being output until the END of the last sample period (or, if the Hold flag is set, it is output indefinitely, that is, until Hold gets cleared, or until the sound gets restarted).

Hold Flag (appears useless/bugged)
The Hold flag keeps the last sample being output indefinitely after the end of one-shot sounds. This feature is probably intended to allow playing two continuous one-shot sound blocks (without producing any scratch noise upon small delays between both blocks, which would occur if the output level dropped to zero).
However, the feature doesn't work as intended. As described above, PCM8/PCM16 sound starts are delayed by 3 samples. With the Hold flag set, the old output level is actually kept intact during the 1st sample, but the output level drops to zero during the 2nd-3rd sample, before the new sound starts in the 4th sample.

7bit Volume and Panning Values
  data.vol   = data*N/128
  pan.left   = data*(128-N)/128
  pan.right  = data*N/128
  master.vol = data*N/128/64
Register settings of 0..126,127 are interpreted as N=0..126,128.

Max Output Levels
When configured to max volume (and left-most or right-most panning), each channel can span the full 10bit output range (-200h..1FFh) on one speaker, as well as the full 16bit input range (-8000h..7FFFh) on one capture unit.
(It needs 2 channels to span the whole range on BOTH speakers/capture units.)
Together, all sixteen channels could thus reach levels up to -1E00h..21F0h (with default BIAS=200h) on one speaker, and -80000h..+7FFF0h on one capture unit. However, to avoid overflows, speaker outputs are clipped to MinMax(0,3FFh), and capture inputs to MinMax(-8000h..+7FFFh).

Channel/Mixer Bit-Widths
  Step                           Bits  Min        Max
  0 Incoming PCM16 Data          16.0  -8000h     +7FFFh
  1 Volume Divider (div 1..16)   16.4  -8000h     +7FFFh
  2 Volume Factor (mul N/128)    16.11 -8000h     +7FFFh
  3 Panning (mul N/128)          16.18 -8000h     +7FFFh
  4 Rounding Down (strip 10bit)  16.8  -8000h     +7FFFh
  5 Mixer (add channel 0..15)    20.8  -80000h    +7FFF0h
  6 Master Volume (mul N/128/64) 14.21 -2000h     +1FF0h
  7 Strip fraction               14.0  -2000h     +1FF0h
  8 Add Bias (0..3FFh, def=200h) 15.0  -2000h+0   +1FF0h+3FFh
  9 Clip (min/max 0h..3FFh)      10.0  0          +3FFh
Table shows integer.fractional bits, and min/max values (without fraction).
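This C sketch traces steps 0..9 for the left speaker. It assumes the 7bit N mapping from "7bit Volume and Panning Values", and that stripping fraction bits always rounds down (floor), as stated for step 4. For the right speaker, multiply by N instead of 128-N. All names are hypothetical.

```c
#include <stdint.h>

/* 7bit register values 0..126,127 act as N=0..126,128. */
static int vol_n(unsigned reg)
{
    reg &= 0x7Fu;
    return reg == 127u ? 128 : (int)reg;
}

/* floor(v / 2^s), written out to avoid relying on >> for negative values. */
static int64_t floor_shr(int64_t v, int s)
{
    int64_t d = (int64_t)1 << s;
    return v >= 0 ? v / d : -((-v + d - 1) / d);
}

/* Steps 0..4 for the left speaker: PCM16 sample in, 16.8 value out. */
int64_t channel_left_16_8(int16_t pcm, unsigned div_sel, unsigned vol_reg,
                          unsigned pan_reg)
{
    static const int div_shift[4] = { 0, 1, 2, 4 };  /* Div1, Div2, Div4, Div16 */
    int64_t v = (int64_t)pcm * 16;                    /* 16.4 */
    v /= (int64_t)1 << div_shift[div_sel & 3u];       /* exact, fraction kept */
    v *= vol_n(vol_reg);                              /* 16.11 */
    v *= 128 - vol_n(pan_reg);                        /* 16.18 (left = 128-N) */
    return floor_shr(v, 10);                          /* 16.8, rounded down */
}

/* Steps 5..9: mix, master volume, strip fraction, add bias, clip to 10 bits. */
int speaker_output(const int64_t *ch_16_8, int count, unsigned master_reg,
                   unsigned bias)
{
    int64_t mix = 0;                                  /* 20.8 */
    for (int i = 0; i < count; i++)
        mix += ch_16_8[i];
    /* mul N/128/64 adds 13 fraction bits (14.21); strip all 21 of them. */
    int64_t out = floor_shr(mix * vol_n(master_reg), 21);
    out += (int64_t)(bias & 0x3FFu);                  /* 15.0 */
    if (out < 0) out = 0;
    if (out > 0x3FF) out = 0x3FF;
    return (int)out;
}
```

With one channel at +7FFFh, full volume and left-most panning, this yields 3FFh. With -8000h it yields 0. Both match the 10bit range stated under "Max Output Levels".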

Capture Clipping/Rounding
Incoming ch(a) is NOT clipped, ch(a)+ch(b) may overflow (see Capture Bugs).
Incoming mixer data (20.8bits) is clipped to 16.8bits (MinMax -8000h..7FFFh).
For PCM8 capture format, the 16.8 bits are divided by 100h (=8.16 bits).
If the MSB of the fractional part is set, then data is rounded towards zero.
(Positive values are rounded down, negative values are rounded up.)
The fractional part is then discarded, and plain integer data is captured.

PSG Sound
The output volume equals PCM16 values +7FFFh (HIGH) and -7FFFh (LOW).
PSG sound is always Infinite (the SOUNDxLEN Register, and the SOUNDxCNT Repeat Mode bits have no effect). The PSG hardware doesn't support sound length, sweep, or volume envelopes, however, these effects can be produced by software with little overload (or, more typically, with enormous overload, depending on the programming language used).

PSG Wave Duty (channel 8..13 in PSG mode)
Each duty cycle consists of eight HIGH or LOW samples, so the sound frequency is 1/8th of the selected sample rate. The duty cycle always starts at the beginning of the LOW period when the sound gets (re-)started.
  0  12.5% "_______-_______-_______-"
  1  25.0% "______--______--______--"
  2  37.5% "_____---_____---_____---"
  3  50.0% "____----____----____----"
  4  62.5% "___-----___-----___-----"
  5  75.0% "__------__------__------"
  6  87.5% "_-------_-------_-------"
  7   0.0% "________________________"
The Wave Duty bits exist and are read/write-able on all channels (although they are actually used only in PSG mode on channels 8-13).
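This C sketch generates one duty cycle following the table. For N=7 the table differs from the HIGH=(N+1)*12.5% formula given under SOUNDxCNT: setting 7 is silent. The function name is hypothetical.

```c
/* Output level of sample i (0..7) of PSG duty setting N (0..7): the cycle
   starts in the LOW part, N=0..6 ends with N+1 HIGH samples, N=7 is all LOW. */
int psg_duty_sample(unsigned duty, unsigned i)
{
    duty &= 7u;
    i &= 7u;
    if (duty == 7u)
        return -0x7FFF;
    return (i >= 7u - duty) ? 0x7FFF : -0x7FFF;
}
```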

PSG Noise (channel 14..15 in PSG mode)
Noise randomly switches between HIGH and LOW samples, the output levels are calculated, at the selected sample rate, as such:
  X=X SHR 1, IF carry THEN Out=LOW, X=X XOR 6000h ELSE Out=HIGH
The initial value when (re-)starting the sound is X=7FFFh. The formula is more or less the same as the "15bit polynomial counter" used on the 8bit Gameboy and GBA.
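The noise formula in C, as a direct transcription of the pseudocode above (the names are hypothetical):

```c
#include <stdint.h>

typedef struct { uint16_t x; } psg_noise;

/* State when the sound is (re-)started. */
void psg_noise_restart(psg_noise *n) { n->x = 0x7FFF; }

/* One step per sample: X=X SHR 1, IF carry THEN Out=LOW, X=X XOR 6000h
   ELSE Out=HIGH. Returns the PCM16-equivalent output level. */
int psg_noise_step(psg_noise *n)
{
    int carry = n->x & 1;
    n->x >>= 1;
    if (carry) {
        n->x ^= 0x6000;
        return -0x7FFF;   /* LOW */
    }
    return 0x7FFF;        /* HIGH */
}
```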

PCM8 and PCM16
Signed samples in range -80h..+7Fh (PCM8), or -8000h..+7FFFh (PCM16).
The output volume of PCM8=NNh is equal to PCM16=NN00h.

IMA-ADPCM Format
IMA-ADPCM is an Adaptive Differential Pulse Code Modulation (ADPCM) variant, designed by the Interactive Multimedia Association (IMA). The format is used, among others, in IMA-ADPCM compressed Windows .WAV files.
The NDS data consist of a 32bit header, followed by 4bit values (so each byte contains two values, the first value in the lower 4bits, the second in upper 4 bits). The 32bit header contains initial values:
  Bit0-15   Initial PCM16 Value (Pcm16bit = -7FFFh..+7FFFh) (not -8000h)
  Bit16-22  Initial Table Index Value (Index = 0..88)
  Bit23-31  Not used (zero)
In theory, the 4bit values are decoded into PCM16 values, as such:
  Diff = ((Data4bit AND 7)*2+1)*AdpcmTable[Index]/8      ;see rounding-error
  IF (Data4bit AND 8)=0 THEN Pcm16bit = Max(Pcm16bit+Diff,+7FFFh)
  IF (Data4bit AND 8)=8 THEN Pcm16bit = Min(Pcm16bit-Diff,-7FFFh)
  Index = MinMax (Index+IndexTable[Data4bit AND 7],0,88)
In practice, the first line works like so (with rounding-error):
  Diff = AdpcmTable[Index]/8
  IF (data4bit AND 1) THEN Diff = Diff + AdpcmTable[Index]/4
  IF (data4bit AND 2) THEN Diff = Diff + AdpcmTable[Index]/2
  IF (data4bit AND 4) THEN Diff = Diff + AdpcmTable[Index]/1
And, a note on the second/third lines (with clipping-error):
  Max(+7FFFh) leaves -8000h unclipped (can happen if initial PCM16 was -8000h)
  Min(-7FFFh) clips -8000h to -7FFFh (possibly unlike windows .WAV files?)
Whereas, IndexTable[0..7] = -1,-1,-1,-1,2,4,6,8. And AdpcmTable [0..88] =
  0007h,0008h,0009h,000Ah,000Bh,000Ch,000Dh,000Eh,0010h,0011h,0013h,0015h
  0017h,0019h,001Ch,001Fh,0022h,0025h,0029h,002Dh,0032h,0037h,003Ch,0042h
  0049h,0050h,0058h,0061h,006Bh,0076h,0082h,008Fh,009Dh,00ADh,00BEh,00D1h
  00E6h,00FDh,0117h,0133h,0151h,0173h,0198h,01C1h,01EEh,0220h,0256h,0292h
  02D4h,031Ch,036Ch,03C3h,0424h,048Eh,0502h,0583h,0610h,06ABh,0756h,0812h
  08E0h,09C3h,0ABDh,0BD0h,0CFFh,0E4Ch,0FBAh,114Ch,1307h,14EEh,1706h,1954h
  1BDCh,1EA5h,21B6h,2515h,28CAh,2CDFh,315Bh,364Bh,3BB9h,41B2h,4844h,4F7Eh
  5771h,602Fh,69CEh,7462h,7FFFh
The closest way to reproduce the AdpcmTable with 32bit integer maths appears:
  X=000776d2h, FOR I=0 TO 88, Table[I]=X SHR 16, X=X+(X/10), NEXT I
  Table[3]=000Ah, Table[4]=000Bh, Table[88]=7FFFh, Table[89..127]=0000h
When using ADPCM and loops, set the loopstart position to the data part, rather than the header. At the loop end, the SAD value is reloaded to the loop start location, additionally index and pcm16 values are reloaded to the values that have originally appeared at that location. Do not change the ADPCM loop start position during playback.
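This C sketch decodes one NDS IMA-ADPCM block, following the "in practice" Diff calculation and the asymmetric clipping described above. It assumes that each decoded nibble produces one output sample (the header's initial value is not emitted). It also assumes that initial index values 89..127 read a zero step, per the Table[89..127]=0000h note. The names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static const uint16_t adpcm_table[89] = {
    0x0007,0x0008,0x0009,0x000A,0x000B,0x000C,0x000D,0x000E,0x0010,0x0011,0x0013,0x0015,
    0x0017,0x0019,0x001C,0x001F,0x0022,0x0025,0x0029,0x002D,0x0032,0x0037,0x003C,0x0042,
    0x0049,0x0050,0x0058,0x0061,0x006B,0x0076,0x0082,0x008F,0x009D,0x00AD,0x00BE,0x00D1,
    0x00E6,0x00FD,0x0117,0x0133,0x0151,0x0173,0x0198,0x01C1,0x01EE,0x0220,0x0256,0x0292,
    0x02D4,0x031C,0x036C,0x03C3,0x0424,0x048E,0x0502,0x0583,0x0610,0x06AB,0x0756,0x0812,
    0x08E0,0x09C3,0x0ABD,0x0BD0,0x0CFF,0x0E4C,0x0FBA,0x114C,0x1307,0x14EE,0x1706,0x1954,
    0x1BDC,0x1EA5,0x21B6,0x2515,0x28CA,0x2CDF,0x315B,0x364B,0x3BB9,0x41B2,0x4844,0x4F7E,
    0x5771,0x602F,0x69CE,0x7462,0x7FFF
};
static const int8_t adpcm_index_table[8] = { -1, -1, -1, -1, 2, 4, 6, 8 };

/* Decode a block (4-byte header + 4bit values, low nibble first) into
   2*(len-4) PCM16 samples; returns the number of samples written. */
size_t ds_adpcm_decode(const uint8_t *src, size_t len, int16_t *out)
{
    if (len < 4)
        return 0;

    int32_t pcm = src[0] | (src[1] << 8);      /* header bits 0-15 */
    if (pcm >= 0x8000)
        pcm -= 0x10000;                        /* sign-extend */
    int index = src[2] & 0x7F;                 /* header bits 16-22 */
    size_t n = 0;

    for (size_t i = 4; i < len; i++) {
        for (int shift = 0; shift <= 4; shift += 4) {
            unsigned nib = (src[i] >> shift) & 0xFu;
            int32_t step = (index <= 88) ? adpcm_table[index] : 0;
            int32_t diff = step >> 3;          /* "in practice" rounding */
            if (nib & 1u) diff += step >> 2;
            if (nib & 2u) diff += step >> 1;
            if (nib & 4u) diff += step;
            if (nib & 8u) {
                pcm -= diff;
                if (pcm < -0x7FFF) pcm = -0x7FFF;  /* also clips -8000h */
            } else {
                pcm += diff;
                if (pcm > 0x7FFF) pcm = 0x7FFF;    /* -8000h may survive */
            }
            index += adpcm_index_table[nib & 7u];
            if (index < 0) index = 0;
            if (index > 88) index = 88;
            out[n++] = (int16_t)pcm;
        }
    }
    return n;
}
```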

Microphone Input
For Microphone (and Touchscreen) inputs, see
DS Touch Screen Controller (TSC)


 DS System and Built-in Peripherals

DS DMA Transfers
DS Timers
DS Interrupts
DS Maths
DS Inter Process Communication (IPC)
DS Keypad
DS Absent Link Port
DS Real-Time Clock (RTC)
DS Serial Peripheral Interface Bus (SPI)
DS Touch Screen Controller (TSC)
DS Power Management


 DS DMA Transfers

The DS includes four DMA channels for each CPU (ie. eight channels in total), which are working more or less the same as on GBA:
GBA DMA Transfers
All NDS9 and NDS7 DMA Registers are R/W. The gamepak bit (Bit 27) has been removed (on the NDS9 the bit is used to expand the mode setting to 3bits).

NDS9 DMA
Word count of all channels is expanded to 21bits (max 1..1FFFFFh units, or 0=200000h units), and SAD/DAD registers for all channels support ranges of 0..0FFFFFFEh. The transfer modes (DMACNT Bit27-29) are:
  0  Start Immediately
  1  Start at V-Blank
  2  Start at H-Blank (paused during V-Blank)
  3  Synchronize to start of display
  4  Main memory display
  5  DS Cartridge Slot
  6  GBA Cartridge Slot
  7  Geometry Command FIFO
NDS7 DMA
Word Count, SAD, and DAD are R/W, aside from that they do have the same restrictions as on GBA (max 4000h or 10000h units, some addresses limited to 0..07FFFFFEh). DMACNT Bit27 is unused on NDS7. The transfer modes (DMACNT Bit28-29) are:
  0  Start Immediately
  1  Start at V-Blank
  2  DS Cartridge Slot
  3  DMA0/DMA2: Wireless interrupt, DMA1/DMA3: GBA Cartridge Slot
40000E0h - NDS9 only - DMA0FILL - DMA 0 Filldata (R/W)
40000E4h - NDS9 only - DMA1FILL - DMA 1 Filldata (R/W)
40000E8h - NDS9 only - DMA2FILL - DMA 2 Filldata (R/W)
40000ECh - NDS9 only - DMA3FILL - DMA 3 Filldata (R/W)
  Bit0-31 Filldata
The DMA Filldata registers contain 16 bytes of general purpose WRAM, intended to be used as fixed source addresses for DMA memfill operations.
This is useful because DMA cannot read from TCM, and reading from Main RAM would require taking care of the cache and write buffer.

NDS7 Sound DMA
The NDS additionally includes 16 Sound DMA channels, plus 2 Sound Capture DMA channels (see Sound chapter). The priority of these channels is unknown.

NDS9 Cache, Writebuffer, DTCM, and ITCM
Cache and tightly coupled memory are connected directly to the NDS9 CPU, without using the system bus. Consequently, DMA cannot access DTCM/ITCM, and access to cached memory regions must be handled with care: Drain the writebuffer before DMA-reads, and invalidate the cache after DMA-writes. See:
ARM CP15 System Control Coprocessor

The CPU can be kept running during DMA, provided that it is accessing only TCM (or cached memory), otherwise the CPU is halted until DMA finishes.
Likewise, interrupts executed during DMA will usually halt the CPU (unless the IRQ handler uses only TCM and cache; the IRQ vector at FFFF00xxh must be cached, or relocated to ITCM at 000000xxh, and the IRQ handler must not access IE, IF, or other I/O ports).

NDS Sequential Main Memory DMA
Main RAM has different access time for sequential and non-sequential access. Normally DMA uses sequential access (except for the first word), however, if the source and destination addresses are both in Main RAM, then all accesses become non-sequential. In that case it would be faster to use two DMA transfers, one from Main RAM to a scratch buffer in WRAM, and one from WRAM to Main RAM.


 DS Timers

Same as GBA, except F = 33.513982 MHz (for both NDS9 and NDS7).
GBA Timers
Both NDS9 and NDS7 have four Timers each, eight Timers in total.


 DS Interrupts

4000208h - NDS9/NDS7 - IME - Interrupt Master Enable (R/W)
  0     Disable all interrupts  (0=Disable All, 1=See IE register)
  1-31  Not used
4000210h - NDS9/NDS7 - IE - 32bit - Interrupt Enable (R/W)
4000214h - NDS9/NDS7 - IF - 32bit - Interrupt Request Flags (R/W)
Bits in the IE register are 0=Disable, 1=Enable.
Reading IF returns 0=No request, 1=Interrupt Request.
Writing IF acts as 0=No change, 1=Acknowledge (clears that bit).
  0     LCD V-Blank
  1     LCD H-Blank
  2     LCD V-Counter Match
  3     Timer 0 Overflow
  4     Timer 1 Overflow
  5     Timer 2 Overflow
  6     Timer 3 Overflow
  7     NDS7 only: SIO/RCNT/RTC (Real Time Clock)
  8     DMA 0
  9     DMA 1
  10    DMA 2
  11    DMA 3
  12    Keypad
  13    GBA-Slot (external IRQ source)
  14-15 Not used
  16    IPC Sync
  17    IPC Send FIFO Empty
  18    IPC Recv FIFO Not Empty
  19    NDS-Slot Game Card Data Transfer Completion
  20    NDS-Slot Game Card IREQ_MC
  21    NDS9 only: Geometry Command FIFO
  22    NDS7 only: Screens unfolding
  23    NDS7 only: SPI bus
  24    NDS7 only: Wifi
  25-31 Not used
Raw TCM-only IRQs can be processed even during DMA ?

DTCM+3FFCh - NDS9 - IRQ Handler (hardcoded DTCM address)
380FFFCh - NDS7 - IRQ Handler (hardcoded RAM address)
  Bit 0-31  Pointer to IRQ Handler
NDS7 Handler must use ARM code, NDS9 Handler can be ARM/THUMB (Bit0=Thumb).

DTCM+3FF8h - NDS9 - IRQ Check Bits (hardcoded DTCM address)
380FFF8h - NDS7 - IRQ Check Bits (hardcoded RAM address)
  Bit 0-31  IRQ Flags (same format as IE/IF registers)
When processing & acknowledging interrupts via the IF register, the user interrupt handler should also set the corresponding bits of the IRQ Check value (required for BIOS IntrWait and VBlankIntrWait SWI functions).
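This C sketch shows the acknowledge-and-report step. The register pointers are passed in to keep it SDK-independent: on hardware, IF is at 4000214h, IE at 4000210h, and the check word at DTCM+3FF8h or 380FFF8h. The function name is hypothetical. For sources that stay asserted, such as the GXFIFO IRQ (bit 21), the condition must be cleared before this runs.

```c
#include <stdint.h>

/* Acknowledge pending, enabled IRQs and record them in the IRQ Check Bits
   word used by IntrWait/VBlankIntrWait (same bit layout as IE/IF). */
uint32_t ack_and_flag_irqs(volatile uint32_t *if_reg, const volatile uint32_t *ie_reg,
                           volatile uint32_t *check_bits)
{
    uint32_t pending = *if_reg & *ie_reg;
    *if_reg = pending;         /* writing 1 clears those IF bits */
    *check_bits |= pending;    /* report to the BIOS wait functions */
    return pending;            /* caller dispatches the individual handlers */
}
```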

--- Below for other (non-IRQ) exceptions ---

27FFD9Ch - RAM - NDS9 Debug Stacktop / Debug Vector (0=None)
380FFDCh - RAM - NDS7 Debug Stacktop / Debug Vector (0=None)
These addresses contain a 32bit pointer to the Debug Handler, and the memory below these addresses is used as Debug Stack. The debug handler is called on undefined instruction exceptions, on data/prefetch aborts (caused by the protection unit), and on FIQ (possibly caused by hardware debuggers). It is also called by accidental software-jumps to the reset vector, and by unused SWI numbers within range 0..1Fh.


 DS Maths < ^

4000280h - NDS9 - DIVCNT - Division Control (R/W)
  0-1   Division Mode    (0-2=See below) (3=Reserved; same as Mode 1)
  2-13  Not used
  14    Division by zero (0=Okay, 1=Division by zero error; 64bit Denom=0)
  15    Busy             (0=Ready, 1=Busy) (Execution time see below)
  16-31 Not used
Division Modes and Busy Execution Times
  Mode  Numer / Denom = Result, Remainder ; Cycles
  0     32bit / 32bit = 32bit , 32bit     ; 18 clks
  1     64bit / 32bit = 64bit , 32bit     ; 34 clks
  2     64bit / 64bit = 64bit , 64bit     ; 34 clks
Division is started when writing to any of the DIVCNT/NUMER/DENOM registers.

4000290h - NDS9 - DIV_NUMER - 64bit Division Numerator (R/W)
4000298h - NDS9 - DIV_DENOM - 64bit Division Denominator (R/W)
Signed 64bit values (or signed 32bit values in 32bit modes, the upper 32bits are then unused, with one exception: the DIV0 flag in DIVCNT is set only if the full 64bit DIV_DENOM value is zero, even in 32bit mode).

40002A0h - NDS9 - DIV_RESULT - 64bit Division Quotient (=Numer/Denom) (R)
40002A8h - NDS9 - DIVREM_RESULT - 64bit Remainder (=Numer MOD Denom) (R)
Signed 64bit values (in 32bit modes, the values are sign-expanded to 64bit).

Division Overflows
Overflows occur on "DIV0" and "-MAX/-1" (eg. -80000000h/-1 in 32bit mode):
  DIV0     -->  REMAIN=NUMER, RESULT=+/-1 (with sign opposite of NUMER)
  -MAX/-1  -->  RESULT=-MAX (instead of +MAX)
On overflows in 32bit/32bit=32bit mode: the upper 32bit of the sign-expanded 32bit result are inverted. This feature produces a correct 64bit (+MAX) result in case of the incorrect 32bit (-MAX) result. The feature also applies on DIV0 errors (which makes the sign-expanded 64bit result even more messed-up than the normal 32bit result).
The DIV0 flag in DIVCNT.14 indicates DENOM=0 errors (it does not indicate "-MAX/-1" errors). The DENOM=0 check relies on the full 64bit value (so, in 32bit mode, the flag works only if the unused upper 32bit of DENOM are zero).
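A software model of mode 0 can be handy for checking code against the overflow rules above. This is a sketch under stated assumptions: the upper 32 bits of DIV_DENOM are zero, normal results truncate toward zero (as C division does), a zero numerator counts as positive for the DIV0 sign rule, and the -MAX/-1 remainder is 0. The function name `div32_model` is hypothetical.

```c
#include <stdint.h>

/*
 * Model of DIVCNT mode 0 (32bit / 32bit). Outputs are the sign-expanded
 * 64bit DIV_RESULT and DIVREM_RESULT values plus the DIVCNT.14 flag.
 * Assumes the unused upper 32 bits of DIV_DENOM are zero.
 */
void div32_model(int32_t numer, int32_t denom,
                 int64_t *result, int64_t *remainder, int *div0_flag)
{
    *div0_flag = (denom == 0);

    if (denom == 0) {
        /* DIV0: RESULT=+/-1 with sign opposite of NUMER
           (NUMER=0 treated as positive: an assumption). */
        int32_t q = (numer < 0) ? 1 : -1;
        /* Overflow: upper 32bit of the sign-expanded result are inverted. */
        uint64_t u = (uint64_t)(int64_t)q ^ 0xFFFFFFFF00000000ull;
        *result = (int64_t)u;
        *remainder = numer;            /* REMAIN=NUMER, sign-expanded */
    } else if (numer == INT32_MIN && denom == -1) {
        /* -MAX/-1: 32bit result is -MAX; inverting the upper half of its
           sign expansion gives the correct +2^31 as 64bit value. */
        uint64_t u = (uint64_t)(int64_t)INT32_MIN ^ 0xFFFFFFFF00000000ull;
        *result = (int64_t)u;
        *remainder = 0;                /* assumed */
    } else {
        *result = numer / denom;       /* safe: overflow cases handled above */
        *remainder = numer % denom;
    }
}
```

Handling both overflow cases before dividing also matters for the model itself, since `INT32_MIN / -1` and division by zero are undefined behaviour in C.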

40002B0h - NDS9 - SQRTCNT - Square Root Control (R/W)
  0     Mode (0=32bit input, 1=64bit input)
  1-14  Not used
  15    Busy (0=Ready, 1=Busy) (Execution time is 13 clks, in either Mode)
  16-31 Not used
Calculation is started when writing to any of the SQRTCNT/PARAM registers.

40002B4h - NDS9 - SQRT_RESULT - 32bit - Square Root Result (R)
40002B8h - NDS9 - SQRT_PARAM - 64bit - Square Root Parameter Input (R/W)
Unsigned 64bit parameter, and unsigned 32bit result.
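For testing code that uses SQRT_RESULT, a software reference can help. The sketch below assumes that the hardware returns the floor of the square root, and that 32bit mode simply ignores the upper 32 bits of SQRT_PARAM; both are assumptions, and `sqrt_model` is a hypothetical name.

```c
#include <stdint.h>

/* Bit-by-bit integer square root: returns floor(sqrt(x)). */
static uint32_t isqrt64(uint64_t x)
{
    uint64_t res = 0;
    uint64_t bit = 1ull << 62;        /* highest power of four in 64 bits */

    while (bit > x)
        bit >>= 2;
    while (bit != 0) {
        if (x >= res + bit) {
            x -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)res;
}

/* Model of SQRTCNT.0 / SQRT_PARAM / SQRT_RESULT (assumed behaviour). */
uint32_t sqrt_model(uint64_t param, int mode64)
{
    if (!mode64)
        param &= 0xFFFFFFFFull;       /* assumed: 32bit mode uses low word */
    return isqrt64(param);
}
```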

Notes
Push all DIV/SQRT values (parameters and control registers) when using DIV/SQRT registers on interrupt level, and, after restoring them, be sure to wait until the busy flag goes off, before leaving the IRQ handler.
The NDS9 and NDS7 BIOSes additionally contain software based division and square root functions, which are NOT using above hardware registers (even the NDS9 functions are raw software).
The Div/Sqrt timings are counted in 33.51MHz units.


 DS Inter Process Communication (IPC) < ^

Allows the ARM7 and ARM9 CPUs to exchange status information.
The register can be accessed simultaneously by both CPUs (without violating access permissions, and without generating waitstates at either side).

4000180h - NDS9/NDS7 - IPCSYNC - IPC Synchronize Register (R/W)
  Bit   Dir  Expl.
  0-3   R    Data input from IPCSYNC Bit8-11 of remote CPU (00h..0Fh)
  4-7   -    Not used
  8-11  R/W  Data output to IPCSYNC Bit0-3 of remote CPU   (00h..0Fh)
  12    -    Not used
  13    W    Send IRQ to remote CPU      (0=None, 1=Send IRQ)
  14    R/W  Enable IRQ from remote CPU  (0=Disable, 1=Enable)
  15-31 -    Not used
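A small sketch of packing and unpacking IPCSYNC values, using the bit positions from the table above; the helper and macro names are hypothetical. Since a write updates all writable bits at once, bit 14 has to be included in every write if the remote IRQ is meant to stay enabled.

```c
#include <stdint.h>

#define IPCSYNC_SEND_IRQ    (1u << 13)
#define IPCSYNC_IRQ_ENABLE  (1u << 14)

/* Value to write to IPCSYNC: the 4bit output nibble goes to bits 8-11. */
uint16_t ipcsync_value(unsigned out_nibble, int send_irq, int irq_enable)
{
    uint16_t v = (uint16_t)((out_nibble & 0xFu) << 8);
    if (send_irq)
        v |= IPCSYNC_SEND_IRQ;
    if (irq_enable)
        v |= IPCSYNC_IRQ_ENABLE;
    return v;
}

/* Nibble most recently written by the remote CPU (bits 0-3). */
unsigned ipcsync_remote(uint16_t ipcsync)
{
    return ipcsync & 0xFu;
}
```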
4000184h - NDS9/NDS7 - IPCFIFOCNT - IPC Fifo Control Register (R/W)
  Bit   Dir  Expl.
  0     R    Send Fifo Empty Status      (0=Not Empty, 1=Empty)
  1     R    Send Fifo Full Status       (0=Not Full, 1=Full)
  2     R/W  Send Fifo Empty IRQ         (0=Disable, 1=Enable)
  3     W    Send Fifo Clear             (0=Nothing, 1=Flush Send Fifo)
  4-7   -    Not used
  8     R    Receive Fifo Empty          (0=Not Empty, 1=Empty)
  9     R    Receive Fifo Full           (0=Not Full, 1=Full)
  10    R/W  Receive Fifo Not Empty IRQ  (0=Disable, 1=Enable)
  11-13 -    Not used
  14    R/W  Error, Read Empty/Send Full (0=No Error, 1=Error/Acknowledge)
  15    R/W  Enable Send/Receive Fifo    (0=Disable, 1=Enable)
  16-31 -    Not used
4000188h - NDS9/NDS7 - IPCFIFOSEND - IPC Send Fifo (W)
  Bit0-31  Send Fifo Data (max 16 words; 64bytes)
4100000h - NDS9/NDS7 - IPCFIFORECV - IPC Receive Fifo (R)
  Bit0-31  Receive Fifo Data (max 16 words; 64bytes)
IPCFIFO Notes
When IPCFIFOCNT.15 is disabled: Writes to IPCFIFOSEND are ignored (no data is stored in the FIFO, though the error bit doesn't get set), and reads from IPCFIFORECV return the oldest FIFO word (as usual) (but without removing the word from the FIFO).
When the Receive FIFO is empty: Reading from IPCFIFORECV returns the most recently received word (if any), or ZERO (if there was no data, or if the FIFO was cleared via IPCFIFOCNT.3), and, in either case the error bit gets set.
The Fifo IRQs are edge triggered: IF.17 gets set when the condition "(IPCFIFOCNT.2 AND IPCFIFOCNT.0)" changes from 0 to 1, and IF.18 gets set when "(IPCFIFOCNT.10 AND NOT IPCFIFOCNT.8)" changes from 0 to 1. The IRQ flags can be acknowledged even while these conditions are still true.
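The edge rule can be modelled by evaluating both conditions before and after a change of IPCFIFOCNT, and keeping only the bits that went from 0 to 1. The function names are hypothetical. Taking the formulas literally, setting an enable bit while the FIFO condition already holds also counts as a 0-to-1 change.

```c
#include <stdint.h>

#define IF_IPC_SEND_EMPTY     (1u << 17)
#define IF_IPC_RECV_NOT_EMPTY (1u << 18)

/* Level of the two FIFO IRQ conditions for a given IPCFIFOCNT value. */
static uint32_t fifo_irq_level(uint16_t cnt)
{
    uint32_t lvl = 0;
    if ((cnt & (1u << 2)) && (cnt & (1u << 0)))    /* IRQ enabled AND send empty */
        lvl |= IF_IPC_SEND_EMPTY;
    if ((cnt & (1u << 10)) && !(cnt & (1u << 8)))  /* IRQ enabled AND recv not empty */
        lvl |= IF_IPC_RECV_NOT_EMPTY;
    return lvl;
}

/* IF bits raised by a transition from prev_cnt to new_cnt (0-to-1 edges). */
uint32_t fifo_irq_edges(uint16_t prev_cnt, uint16_t new_cnt)
{
    return fifo_irq_level(new_cnt) & ~fifo_irq_level(prev_cnt);
}
```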


 DS Keypad < ^

For the GBA buttons: same as on GBA; both ARM7 and ARM9 have keypad input registers, and each has its own keypad IRQ control register.
GBA Keypad Input

For Touchscreen (and Microphone) inputs, see
DS Touch Screen Controller (TSC)

4000136h - NDS7 - EXTKEYIN - Key X/Y Input (R)
  0      Button X     (0=Pressed, 1=Released)
  1      Button Y     (0=Pressed, 1=Released)
  3      DEBUG button (0=Pressed, 1=Released/None such)
  6      Pen down     (0=Pressed, 1=Released/Disabled)
  7      Hinge/folded (0=Open, 1=Closed)
  2,4,5  Unknown / set
  8..15  Unknown / zero
The Hinge is a magnetic sensor somewhere underneath the Start/Select buttons; it is triggered by the magnetic field of the right speaker when the console is closed. The hinge generates an interrupt request (there seems to be no way to disable this, unlike all other IRQ sources); however, the interrupt execution can be disabled in the IE register (as for other IRQ sources).
The Pen Down bit is the /PENIRQ signal from the Touch Screen Controller (TSC). If /PENIRQ is enabled in the TSC control register, it notifies the program when the screen is pressed, and the program should then read data from the TSC (without /PENIRQ, unnecessary TSC reads would just waste CPU power). However, the user may release the screen before the program performs the TSC read, so treat the screen as not pressed if you get invalid TSC values (even if /PENIRQ was LOW).
Not sure if the TSC /PENIRQ actually triggers an IRQ in the NDS?
The Debug Button should be connected to R03 and GND (on original NDS, R03 is the large soldering point between the SL1 jumper and the VR1 potentiometer) (there is no R03 signal visible on the NDS-Lite board).
Interrupts are reportedly not supported for X,Y buttons.
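A decoding sketch for EXTKEYIN, with hypothetical type and function names. The button bits are active-low, while the hinge bit is 1 when closed. Note that RCNT should be set to 80xxh before reading this register (see RCNT below).

```c
#include <stdint.h>

typedef struct {
    int x, y, debug, pen_down, lid_closed;   /* 1 = pressed / down / closed */
} ExtKeys;

/* Decode an EXTKEYIN (4000136h) value. */
ExtKeys extkeyin_decode(uint16_t v)
{
    ExtKeys k;
    k.x          = !(v & (1u << 0));         /* 0=Pressed */
    k.y          = !(v & (1u << 1));
    k.debug      = !(v & (1u << 3));
    k.pen_down   = !(v & (1u << 6));
    k.lid_closed =  (v & (1u << 7)) != 0;    /* hinge: 1=Closed */
    return k;
}
```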


 DS Absent Link Port < ^

The DS doesn't have a Serial Link Port Socket, however, internally, the NDS7 contains the complete set of Serial I/O Ports, as contained in the GBA:
GBA Communication Ports

In GBA mode, the ports work as on a real GBA (as when no cable is connected). In NDS mode, the ports contain some additional bits:

NDS7 SIO Bits (according to an early I/O map from Nintendo)
  NDS7 4000128h SIOCNT   Bit15 "CKUP"  New Bit in NORMAL/MULTI/UART mode (R/W)
  NDS7 4000128h SIOCNT   Bit14 "N/A"   Removed IRQ Bit in UART mode (?)
  NDS7 400012Ah SIOCNT_H Bit14 "TFEMP" New Bit (R/W)
  NDS7 400012Ah SIOCNT_H Bit15 "RFFUL" New Bit (always zero?)
  NDS7 400012Ch SIOSEL   Bit0  "SEL"   New Bit (always zero?)
  NDS7 4000140h JOYCNT   Bit7  "MOD"   New Bit (R/W)
The "CKUP" bit doubles the internal clock transfer rate (selected in SIOCNT.1) (tested in normal mode) (probably works also in multi/uart mode?).

NDS7 DS-Lite 4001080h (W) (?)
The DS-Lite firmware writes FFFFh to this address (prior to accessing SIOCNT), so it's probably SIO or debugging related (or it might just be a bug). Reading from the port always returns 0000h on both DS and DS-Lite.

NDS9 SIO Bits (according to an early I/O map from Nintendo)
  NDS9 4000120h SIODATA32 Bit0-31 Data            (always zero?)
  NDS9 4000128h SIOCNT    Bit2    "TRECV" New Bit (always zero?)
  NDS9 4000128h SIOCNT    Bit3    "TSEND" New Bit (always zero?)
  NDS9 400012Ch SIOSEL    Bit0    "SEL"   New Bit (always zero?)
Not sure if these ports really exist in the release version, or if they were prototype stuff?

RCNT
RCNT (4000134h) should be set to 80xxh (general purpose mode) before accessing EXTKEYIN (4000136h) or RTC (4000138h). No idea why (except when using RTC/SI-interrupt).

DS Serial Port
The SI line is labeled "INT" on the NDS mainboard, it is connected to Pin 1 of the RTC chip (ie. the /INT interrupt pin).
I have no idea where to find SO, SC, and SD. I've written a test proggy that pulsed all four RCNT bits - but all I could find was the SI signal. However, the BIOS contains some code that uses SIO normal mode transfers (for the debug version), so at least SI, SO, SC should exist...?


 DS Real-Time Clock (RTC) < ^

Seiko Instruments Inc. S-35180 (compatible with S-35190A)
Miniature 8pin RTC with 3-wire serial bus

4000138h - NDS7 - Real Time Clock Register
  Bit      Expl.
  0        Data I/O         (0=Low, 1=High)
  1        Clock Out        (0=Low, 1=High)
  2        Select Out       (0=Low, 1=High/Select)
  4        Data Direction   (0=Read, 1=Write)
  5        Clock Direction  (should be 1=Write)
  6        Select Direction (should be 1=Write)
  3,8-11   Unused I/O Lines
  7,12-15  Direction for Bit3,8-11 (usually 0)
  16-31    Not used
Serial Transfer Flowchart
Chipselect and Command/Parameter Sequence:
  Init CS=LOW and /SCK=HIGH, and wait at least 1us
  Switch CS=HIGH, and wait at least 1us
  Send the Command byte (see bit-transfer below)
  Send/receive Parameter byte(s) associated with the command (see below)
  Switch CS to LOW
Bit transfer (repeat 8 times per cmd/param byte) (bits transferred LSB first):
  Output /SCK=LOW and SIO=databit (when writing), then wait at least 5us
  Output /SCK=HIGH, wait at least 5us, then read SIO=databit (when reading)
In either direction, data is output on (or immediately after) falling edge.
Ideally, <both> commands and parameters should be transmitted LSB-first (unlike the original Seiko document, which recommends LSB-first for data, and MSB-first for commands).
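The write half of the bit transfer can be expressed as a list of values for the RTC register at 4000138h. This sketch only computes the values (CS held HIGH, all three direction bits set to output); the required 5us delays between the writes are left to the caller, and all names are hypothetical.

```c
#include <stdint.h>

#define RTC_SIO      (1u << 0)   /* Data I/O         */
#define RTC_SCK      (1u << 1)   /* Clock Out        */
#define RTC_CS       (1u << 2)   /* Select Out       */
#define RTC_DIR_SIO  (1u << 4)   /* 1=Data is output */
#define RTC_DIR_SCK  (1u << 5)
#define RTC_DIR_CS   (1u << 6)

/*
 * Fill out[16] with the register values for sending one byte LSB-first:
 * for each bit, first /SCK=LOW with the data bit, then /SCK=HIGH.
 * Each write must be followed by a delay of at least 5us (not shown).
 */
void rtc_byte_write_sequence(uint8_t byte, uint16_t out[16])
{
    const uint16_t base = RTC_CS | RTC_DIR_SIO | RTC_DIR_SCK | RTC_DIR_CS;
    for (int i = 0; i < 8; i++) {
        uint16_t bit = (byte >> i) & 1u;         /* RTC_SIO position */
        out[2 * i]     = base | bit;             /* clock low, data output */
        out[2 * i + 1] = base | bit | RTC_SCK;   /* clock high */
    }
}
```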

Command Register
  Command Register
Fwd Rev
0-3 7-4 Fixed Code (must be 06h = 0110b) (same for Fwd and Rev)
4-6 3-1 Command
Fwd Rev Parameter bytes (read/write access)
0 0 1 byte, status register 1
4 1 1 byte, status register 2
2 2 7 bytes, date&time (year,month,day,day_of_week,hour,min,sec)
6 3 3 bytes, time (hour,minute,second)
1* 4* 1 byte, int1, frequency duty setting
1* 4* 3 bytes, int1, alarm time 1 (day_of_week, hour, minute)
5 5 3 bytes, int2, alarm time 2 (day_of_week, hour, minute)
3 6 1 byte, clock adjustment register
7 7 1 byte, free register
7 0 Parameter Read/Write Access (0=Write, 1=Read)
* INT1: Type and number of parameters depend on INT1 setting in stat reg2.
The "Fwd" bit numbers and command values are for LSB-first command transfers (ie. commands and parameters use the same bit-order).
The "Rev" numbers/values are for MSB-first command transfers (ie. commands use the opposite bit-order to parameters, as suggested by Seiko).
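Since the Fwd and Rev command bytes are bit-reversed versions of each other, both can be derived from the "Fwd" command number. A sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Reverse the bit order of a byte. */
static uint8_t rev8(uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++)
        r |= (uint8_t)(((b >> i) & 1u) << (7 - i));
    return r;
}

/* Command byte for LSB-first command transfers ("Fwd" column). */
uint8_t rtc_cmd_fwd(unsigned fwd_cmd, int read)
{
    return (uint8_t)(0x06u | ((fwd_cmd & 7u) << 4) | (read ? 0x80u : 0u));
}

/* The same command, bit-reversed for MSB-first transfers ("Rev" column). */
uint8_t rtc_cmd_rev(unsigned fwd_cmd, int read)
{
    return rev8(rtc_cmd_fwd(fwd_cmd, read));
}
```

For example, reading the 7-byte date&time uses A6h when sent LSB-first, or 65h when sent MSB-first.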

Control and Status Registers
  Status Register 1
  0   W   Reset             (0=Normal, 1=Reset)
  1   R/W 12/24 hour mode   (0=12 hour, 1=24 hour)
  2-3 R/W General purpose bits
  4   R   Interrupt 1 Flag  (1=Yes)                      ;auto-cleared on read
  5   R   Interrupt 2 Flag  (1=Yes)                      ;auto-cleared on read
  6   R   Power Low Flag    (0=Normal, 1=Power is/was low) ;auto-cleared on read
  7   R   Power Off Flag    (0=Normal, 1=Power was off)  ;auto-cleared on read
          Power off indicates that the battery was removed or fully discharged,
          all registers are reset to 00h (or 01h), and must be re-initialized.
  Status Register 2
  0-3 R/W INT1 Mode/Enable
          0000b Disable
          0x01b Selected Frequency steady interrupt
          0x10b Per-minute edge interrupt
          0011b Per-minute steady interrupt 1 (duty 30.0 seconds)
          0100b Alarm 1 interrupt
          0111b Per-minute steady interrupt 2 (duty 0.0079 seconds)
          1xxxb 32kHz output
  4-5 R/W General purpose bits
  6   R/W INT2 Enable
          0b    Disable
          1b    Alarm 2 interrupt
  7   R/W Test Mode (0=Normal, 1=Test, don't use) (cleared on Reset)
  Clock Adjustment Register (to compensate oscillator inaccuracy)
  0-7 R/W Adjustment (00h=Normal, no adjustment)
  Free Register
  0-7 R/W General purpose bits
Date Registers
  Year Register
  0-7 R/W Year (BCD 00h..99h = 2000..2099)
  Month Register
  0-4 R/W Month (BCD 01h..12h = January..December)
  5-7 -   Not used (always zero)
  Day Register
  0-5 R/W Day (BCD 01h..28h,29h,30h,31h, range depending on month/year)
  6-7 -   Not used (always zero)
  Day of Week Register (septenary counter)
  0-2 R/W Day of Week (00h..06h, custom assignment, usually 0=Monday?)
  3-7 -   Not used (always zero)
Time Registers
  Hour Register
  0-5 R/W Hour (BCD 00h..23h in 24h mode, or 00h..11h in 12h mode)
  6   *   AM/PM (0=AM before noon, 1=PM after noon)
          * 24h mode: AM/PM flag is read only (PM=1 if hour = 12h..23h)
          * 12h mode: AM/PM flag is read/write-able
          * 12h mode: Observe that 12 o'clock is defined as 00h (not 12h)
  7   -   Not used (always zero)
  Minute Register
  0-6 R/W Minute (BCD 00h..59h)
  7   -   Not used (always zero)
  Second Register
  0-6 R/W Second (BCD 00h..59h)
  7   -   Not used (always zero)
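The date/time values are packed BCD, and in 12h mode the hour needs the PM flag plus the "12 o'clock = 00h" rule. A decoding sketch with hypothetical names; `mode24` is assumed to be taken from Status Register 1 bit 1.

```c
#include <stdint.h>

/* Convert a packed BCD byte (e.g. 59h) to binary (59). */
unsigned bcd_to_bin(uint8_t bcd)
{
    return (bcd >> 4) * 10u + (bcd & 0x0Fu);
}

/*
 * Decode the Hour register to 0..23.
 * In 12h mode, 12 o'clock is stored as 00h, so 00h+PM is noon and
 * 00h+AM is midnight. In 24h mode the read-only PM flag is ignored.
 */
unsigned rtc_hour_to_24h(uint8_t reg, int mode24)
{
    unsigned h = bcd_to_bin(reg & 0x3Fu);    /* bits 0-5 */
    if (!mode24 && (reg & 0x40u))            /* bit 6: PM */
        h += 12;
    return h;
}
```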
Alarm 1 and Alarm 2 Registers
  Alarm1 and Alarm2 Day of Week Registers (INT1 and INT2 each)
  0-2 R/W Day of Week (00h..06h)
  3-6 -   Not used (always zero)
  7   R/W Compare Enable (0=Alarm every day, 1=Alarm only at specified day)
  Alarm1 and Alarm2 Hour Registers (INT1 and INT2 each)
  0-5 R/W Hour (BCD 00h..23h in 24h mode, or 00h..11h in 12h mode)
  6   R/W AM/PM (0=AM, 1=PM) (must be correct even in 24h mode?)
  7   R/W Compare Enable (0=Alarm every hour, 1=Alarm only at specified hour)
  Alarm1 and Alarm2 Minute Registers (INT1 and INT2 each)
  0-6 R/W Minute (BCD 00h..59h)
  7   R/W Compare Enable (0=Alarm every min, 1=Alarm only at specified min)
  Selected Frequency Steady Interrupt Register (INT1 only) (when Stat2/Bit2=0)
  0   R/W Enable 1Hz Frequency  (0=Disable, 1=Enable)
  1   R/W Enable 2Hz Frequency  (0=Disable, 1=Enable)
  2   R/W Enable 4Hz Frequency  (0=Disable, 1=Enable)
  3   R/W Enable 8Hz Frequency  (0=Disable, 1=Enable)
  4   R/W Enable 16Hz Frequency (0=Disable, 1=Enable)
          The signals are ANDed when two or more frequencies are enabled,
          ie. the /INT signal gets LOW when either of the signals is LOW.
  5-7 R/W General purpose bits
Note: There is only one register shared as "Selected Frequency Steady Interrupt" (accessed as single byte parameter when Stat2/Bit2=0) and as "Alarm1 Minute" (accessed as 3rd byte of 3-byte parameter when Stat2/Bit2=1), changing either value will also change the other value.

Interrupt
There's only one /INT signal, shared for both INT1 and INT2.
In the NDS, it is connected to the SI-input of the SIO unit (and so, also shared with SIO interrupts). To enable the interrupt, RCNT should be set to 8144h (Bit14-15=General Purpose mode, Bit8=SI Interrupt Enable, Bit6,2=SI Output/High).
The Output/High settings seem to be used as pullup (giving faster reactions on low-to-high transitions) (nevertheless, in most cases it also seems to work okay as Input, ie. with RCNT=8100h).
The RCNT interrupt is generated on high-to-low transitions on the SI line (but only if the IRQ is enabled in RCNT.8, and only if RCNT is set to general purpose mode) (note: changing RCNT.8 from off-to-on does NOT generate IRQs, even when SI is LOW).

Pin-Outs
  1 /INT      8 VDD
  2 XOUT      7 SIO
  3 XIN       6 /SCK
  4 GND       5 CS

 DS Serial Peripheral Interface Bus (SPI) < ^

Serial Peripheral Interface Bus
SPI Bus is a 4-wire (Data In, Data Out, Clock, and Chipselect) serial bus.
The NDS supports the following SPI devices (each with its own chipselect).
DS Firmware Serial Flash Memory
DS Touch Screen Controller (TSC)
DS Power Management

40001C0h - NDS7 - SPICNT - SPI Bus Control/Status Register
  0-1   Baudrate (0=4MHz/Firmware, 1=2MHz/Touchscr, 2=1MHz/Powerman., 3=512KHz)
  2-6   Not used            (Zero)
  7     Busy Flag           (0=Ready, 1=Busy) (presumably Read-only)
  8-9   Device Select       (0=Powerman., 1=Firmware, 2=Touchscr, 3=Reserved)
  10    Transfer Size       (0=8bit/Normal, 1=16bit/Bugged)
  11    Chipselect Hold     (0=Deselect after transfer, 1=Keep selected)
  12-13 Not used            (Zero)
  14    Interrupt Request   (0=Disable, 1=Enable)
  15    SPI Bus Enable      (0=Disable, 1=Enable)
The "Hold" flag should be cleared BEFORE transferring the LAST data unit; the chipselect is then automatically cleared after the transfer. The program should issue a WaitByLoop(3) manually AFTER the LAST transfer.

40001C2h - NDS7 - SPIDATA - SPI Bus Data/Strobe Register (R/W)
The SPI transfer is started on writing to this register, so one must <write> a dummy value (should be zero) even when intending to <read> from SPI bus.
  0-7   Data
  8-15  Not used (always zero, even in bugged-16bit mode)
During transfer, the Busy flag in SPICNT is set, and the written SPIDATA value is transferred to the device (via output line), simultaneously data is received (via input line). Upon transfer completion, the Busy flag goes off (with optional IRQ), and the received value can be then read from SPIDATA, if desired.
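A sketch of one 8bit transfer following the description above, polling the Busy flag rather than using the IRQ. The register addresses come from this section; the macro, enum and function names are hypothetical. The registers are passed as pointers so the sketch can also run against plain memory (where SPIDATA simply reads back the written byte).

```c
#include <stdint.h>

#define SPICNT_BUSY    (1u << 7)
#define SPICNT_HOLD    (1u << 11)
#define SPICNT_ENABLE  (1u << 15)

/* Device Select (bits 8-9) and Baudrate (bits 0-1) values from the table. */
enum { SPI_DEV_POWERMAN = 0, SPI_DEV_FIRMWARE = 1, SPI_DEV_TOUCH = 2 };
enum { SPI_BAUD_4MHZ = 0, SPI_BAUD_2MHZ = 1, SPI_BAUD_1MHZ = 2, SPI_BAUD_512KHZ = 3 };

/* SPICNT value for an 8bit transfer to the given device. */
uint16_t spicnt_value(unsigned device, unsigned baud, int hold)
{
    return (uint16_t)(SPICNT_ENABLE | ((device & 3u) << 8) | (baud & 3u) |
                      (hold ? SPICNT_HOLD : 0u));
}

/* cnt/data would point at SPICNT (40001C0h) and SPIDATA (40001C2h). */
uint8_t spi_transfer(volatile uint16_t *cnt, volatile uint16_t *data,
                     unsigned device, unsigned baud, int hold, uint8_t out)
{
    while (*cnt & SPICNT_BUSY)
        ;                                /* previous transfer still running */
    *cnt = spicnt_value(device, baud, hold);
    *data = out;                         /* writing SPIDATA starts the transfer */
    while (*cnt & SPICNT_BUSY)
        ;                                /* wait until Busy goes off */
    return (uint8_t)(*data & 0xFFu);     /* byte received during the transfer */
}
```

Multi-byte commands would call this with `hold=1` for every byte except the last, matching the Hold rule above.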

Notes/Glitches
SPICNT Bits 12,13 appear to be unused (always zero), although the BIOS (attempts to) set Bit13=1, and Bit12=Bit11 when accessing the firmware.
The SPIDATA register is restricted to 8bit, so that only each 2nd byte will appear in SPIDATA when attempting to use the bugged-16bit mode.

Cartridge Backup Auxiliary SPI Bus
The NDS Cartridge Slot uses a separate SPI bus (with other I/O Ports), see
DS Cartridge Backup


 DS Touch Screen Controller (TSC) < ^

Texas Instruments TSC2046 (NDS)
Asahi Kasei Microsystems AK4148AVT (NDS-Lite)
The Touch Screen Controller (for lower LCD screen) is accessed via SPI bus,
DS Serial Peripheral Interface Bus (SPI)

Control Byte (transferred MSB first)
  0-1  Power Down Mode Select
  2    Reference Select (0=Differential, 1=Single-Ended)
  3    Conversion Mode  (0=12bit, max CLK=2MHz, 1=8bit, max CLK=3MHz)
  4-6  Channel Select   (0-7, see below)
  7    Start Bit        (Must be set to access Control Byte)
Channel
  0 Temperature 0           (requires calibration, step 2.1mV per 1'C accuracy)
  1 Touchscreen Y-Position  (roughly 0B0h..F20h, or FFFh=released)
  2 Battery Voltage         (not used, connected to GND in NDS, always 000h)
  3 Touchscreen Z1-Position (diagonal position for pressure measurement)
  4 Touchscreen Z2-Position (diagonal position for pressure measurement)
  5 Touchscreen X-Position  (roughly 100h..ED0h, or 000h=released)
  6 AUX Input               (connected to Microphone in the NDS)
  7 Temperature 1           (difference to Temp 0, without calibration, 2'C accuracy)
All channels can be accessed in Single-Ended mode.
In differential mode, only channels 1,3,4,5 (Y,Z1,Z2,X) can be accessed.
On AK4148AVT, channel 6 (AUX) is split into two separate channels, IN1 and IN2, selected by Bit2 (Reference Select). IN1 is selected when Bit2=1, IN2 is selected when Bit2=0 (regardless of the Bit2 setting, both IN1 and IN2 use single-ended mode). On the NDS-Lite, IN1 connects to the microphone (as on the original NDS), and the new IN2 input is simply wired to VDD3.3 (which equals the external VREF voltage, so IN2 always reads FFFh).
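A sketch for assembling the control byte from the fields above. For example, an X-position read in 12bit differential mode with Power Down Mode 0 gives D0h, and a microphone (AUX) read in 8bit single-ended mode with Power Down Mode 1 gives EDh. The function and enum names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names for the channel numbers listed above. */
enum {
    TSC_CH_TEMP0 = 0, TSC_CH_Y = 1, TSC_CH_BATTERY = 2, TSC_CH_Z1 = 3,
    TSC_CH_Z2 = 4, TSC_CH_X = 5, TSC_CH_AUX = 6, TSC_CH_TEMP1 = 7
};

/* Start bit, channel (4-6), conversion mode (3), reference (2), power down (0-1). */
uint8_t tsc_control_byte(unsigned channel, int mode8bit, int single_ended,
                         unsigned power_down)
{
    return (uint8_t)(0x80u | ((channel & 7u) << 4) |
                     (mode8bit ? 0x08u : 0u) |
                     (single_ended ? 0x04u : 0u) |
                     (power_down & 3u));
}
```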

Power Down Mode
  Mode  /PENIRQ   VREF  ADC   Recommended use
  0     Enabled   Auto  Auto  Differential Mode (Touchscreen, Penirq)
  1     Disabled  Off   On    Single-Ended Mode (Temperature, Microphone)
  2     Enabled   On    Off   Don't use
  3     Disabled  On    On    Don't use
These modes enable/disable the /PENIRQ output, the internal reference voltage (VREF), and the Analogue-Digital Converter.
For AK4148AVT, Power Down modes are slightly different (among others, /PENIRQ is enabled in Mode 0..2).

Reference Voltage (VREF)
VREF is used as reference voltage in single ended mode; at 12bit resolution one ADC step equals VREF/4096. The TSC generates an internal VREF of 2.5V (+/-0.05V); however, the NDS uses an external VREF of 3.33V (sinks to 3.31V at low battery charge), and the external VREF is always enabled, no matter if the internal VREF is on or off. Power Down Mode 1 disables the internal VREF, which may reduce power consumption in single ended mode. After conversion, Power Down Mode 0 should be restored to re-enable the Penirq signal.

Sending the first Command after Chip-Select
Switch chipselect low, then output the command byte (MSB first).

Reply Data
The following reply data is received (via Input line) after the Command byte has been transferred: One dummy bit (zero), followed by the 8bit or 12bit conversion result (MSB first), followed by endless padding (zero).
Note: The returned ADC value may become unreliable if there are longer delays between sending the command, and receiving the reply byte(s).

Sending further Commands during/after receiving Reply Data
In general, the Output line should be LOW during the reply period; however, once data bit 6 has been received (or anytime later), a new Command can be invoked (started by sending the HIGH start bit, ie. Command bit7), while the remaining reply-data bits (bit5..0) are received simultaneously.
In other words, the new command can be output after receiving 3 bits in 8bit mode (the dummy bit, and data bits 7..6), or after receiving 7 bits in 12bit mode (the dummy bit, and data bits 11..6).
In practice, the NDS SPI register always transfers 8 bits at once, so that one would usually receive 8 bits (rather than above 3 or 7 bits), before outputting a new command.
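With byte-wide SPI transfers, a 12bit reply arrives split across two bytes: the first holds the dummy bit followed by data bits 11..5, the second holds bits 4..0 followed by three padding zeros (8bit replies split likewise into bits 7..1 and bit 0). A reassembly sketch with hypothetical helper names:

```c
#include <stdint.h>

/* 12bit mode: b1 = dummy bit + bits 11..5, b2 = bits 4..0 + 3 padding bits. */
uint16_t tsc_reply12(uint8_t b1, uint8_t b2)
{
    return (uint16_t)(((b1 & 0x7Fu) << 5) | (b2 >> 3));
}

/* 8bit mode: b1 = dummy bit + bits 7..1, b2 = bit 0 + 7 padding bits. */
uint8_t tsc_reply8(uint8_t b1, uint8_t b2)
{
    return (uint8_t)(((b1 & 0x7Fu) << 1) | (b2 >> 7));
}
```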

Touchscreen Position
Read the X and Y positions in 12bit differential mode, then convert the touchscreen values (adc) to screen/pixel positions (scr), as such:
  scr.x = (adc.x-adc.x1) * (scr.x2-scr.x1) / (adc.x2-adc.x1) + (scr.x1-1)
  scr.y = (adc.y-adc.y1) * (scr.y2-scr.y1) / (adc.y2-adc.y1) + (scr.y1-1)
The X1,Y1 and X2,Y2 calibration points are found in Firmware User Settings,
DS Firmware User Settings
scr.x1,y1,x2,y2 are originated at 1,1 (converted to 0,0 by above formula).
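The same formula per axis, as an integer sketch (hypothetical function name). With 12bit ADC values and at most 256 pixels per axis, the products stay well within 32 bits.

```c
/*
 * Convert one touchscreen axis from ADC units to pixels using the two
 * calibration points from the firmware user settings (adc1/scr1, adc2/scr2).
 * scr1/scr2 are 1-based, so the result is 0-based.
 */
int tsc_adc_to_pixel(int adc, int adc1, int adc2, int scr1, int scr2)
{
    return (adc - adc1) * (scr2 - scr1) / (adc2 - adc1) + (scr1 - 1);
}
```

Touches outside the calibration points can map slightly outside 0..255 (or 0..191), so clamping the result to the screen size is a sensible extra step.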

Touchscreen Pressure
To calculate the touch resistance, with respect to the X/Y/Z positions and the X/Y plate resistances, either of the formulas below can be used:
  Rtouch = (Rx_plate*Xpos*(Z2pos/Z1pos-1))/4096
  Rtouch = (Rx_plate*Xpos*(4096/Z1pos-1)-Ry_plate*(4096-Ypos))/4096
The second formula requires less CPU load (as it doesn't require measuring Z2); the downside is that one must know both X and Y plate resistance (or at least their ratio). The first formula doesn't require that ratio, so Rx_plate can be set to any value; setting it to 4096 results in
  touchval = Xpos*(Z2pos/Z1pos-1)
Of course, in that case, touchval is just a number, not a resistance in Ohms.
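In integer arithmetic, computing the Z2/Z1 ratio first would truncate it, so a sketch multiplies before dividing. The function name and the -1 error value for Z1=0 are hypothetical choices.

```c
#include <stdint.h>

/*
 * touchval = Xpos*(Z2pos/Z1pos-1), rearranged as Xpos*(Z2pos-Z1pos)/Z1pos
 * so the fractional Z2/Z1 ratio isn't lost. Returns -1 if Z1 is zero.
 */
int32_t tsc_touchval(uint16_t xpos, uint16_t z1, uint16_t z2)
{
    if (z1 == 0)
        return -1;
    return (int32_t)xpos * ((int32_t)z2 - (int32_t)z1) / (int32_t)z1;
}
```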

Touchscreen Notes
It may be impossible to press locations close to the screen borders.
When pressing two or more locations the TSC values will be somewhere in the middle of these locations.
The TSC values may be garbage if the screen becomes newly pressed or released, to avoid invalid inputs: read TSC values at least two times, and ignore BOTH positions if ONE position was invalid.

Microphone / AUX Channel
Observe that the microphone amplifier is switched off after power up, see:
DS Power Management

Temperature Calculation
TP0 decreases by circa 2.1mV per degree Kelvin. The voltage difference between TP1 minus TP0 increases by circa 0.39mV (1/2573 V) per degree Kelvin. At VREF=3.33V, one 12bit ADC step equals to circa 0.8mV (VREF/4096).
Temperature can be calculated at best resolution when using the current TP0 value, and two calibration values (an ADC value, and the corresponding temperature in degrees kelvin):
  K = (CAL.TP0-ADC.TP0) * 0.4 + CAL.KELVIN
Alternately, temperature can be calculated at rather bad resolution, but without calibration, by using the difference between TP1 and TP0:
  K = (ADC.TP1-ADC.TP0) * 8568 / 4096
To convert Kelvin to other formats,
  Celsius:     C = (K-273.15)
  Fahrenheit:  F = (K-273.15)*9/5+32
  Reaumur:     R = (K-273.15)*4/5
  Rankine:     X = (K)*9/5
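A direct transcription of the two temperature formulas, as a sketch. The ADC values are assumed to be 12bit readings at VREF=3.33V, and the calibration pair (cal_tp0, cal_kelvin) is assumed to come from a stored calibration such as the one described next; the function names are hypothetical.

```c
/* K = (CAL.TP0-ADC.TP0) * 0.4 + CAL.KELVIN */
double tsc_kelvin_calibrated(int adc_tp0, int cal_tp0, double cal_kelvin)
{
    return (cal_tp0 - adc_tp0) * 0.4 + cal_kelvin;
}

/* K = (ADC.TP1-ADC.TP0) * 8568 / 4096 (no calibration, lower resolution) */
double tsc_kelvin_uncalibrated(int adc_tp0, int adc_tp1)
{
    return (adc_tp1 - adc_tp0) * 8568.0 / 4096.0;
}

/* C = (K-273.15) */
double kelvin_to_celsius(double k)
{
    return k - 273.15;
}
```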
The Temperature Range for the TSC 2046 chip is -40'C..+85'C (for AK4181AVT only -20'C..+70'C). According to Nintendo, the DS should not be exposed to "extreme" heat or cold, the optimal battery charging temperature is specified as +10'C..+40'C.
The original firmware does not support temperature calibration, calibration is supported by nocash firmware (if present). See Extended Settings,
DS Firmware Extended Settings

Pin-Outs
         ________
   VCC 1|o       |16 DCLK
    X+ 2|        |15 /CS
    Y+ 3|  TSC   |14 DIN
    X- 4|  2046  |13 BUSY
    Y- 5|        |12 DOUT
   GND 6|        |11 /PENIRQ
  VBAT 7|        |10 IOVDD
   AUX 8|________|9  VREF
For AK4181AVT, the pins are the same as above, except that IOVDD is replaced by the new IN2 input; that pin is wired to VDD3.3 (so IN2 is always equal to VREF, which is wired to VDD3.3, too) (and AUX is renamed to IN1, and is still used for MIC input).


 DS Power Management < ^

The DS contains several Power Management functions, some accessed via I/O ports, some accessed via SPI bus (described further below).

4000304h - NDS9 - POWCNT1 - Graphics Power Control Register (R/W)
  0     Enable Flag for both LCDs (0=Disable) (Prohibited, see notes)
  1     2D Graphics Engine A      (0=Disable) (Ports 008h-05Fh, Pal 5000000h)
  2     3D Rendering Engine       (0=Disable) (Ports 320h-3FFh)
  3     3D Geometry Engine        (0=Disable) (Ports 400h-6FFh)
  4-8   Not used
  9     2D Graphics Engine B      (0=Disable) (Ports 1008h-105Fh, Pal 5000400h)
  10-14 Not used
  15    Display Swap (0=Send Display A to Lower Screen, 1=To Upper Screen)
  16-31 Not used
Use SwapBuffers command once after enabling Rendering/Geometry Engine.
Improper use of Bit0 may damage the hardware?
When disabled, the corresponding Ports become read-only, and the corresponding (palette) memory becomes read-only and zero-filled.

4000304h - NDS7 - POWCNT2 - Sound/Wifi Power Control Register (R/W)
  Bit   Expl.
  0     Sound Speakers (0=Disable, 1=Enable) (Initial setting = 1)
  1     Wifi           (0=Disable, 1=Enable) (Initial setting = 0)
  2-31  Not used
Note: Bit0 disables the internal Speaker only, headphones are not disabled.
Bit1 disables Port 4000206h, and Ports 4800000h-480FFFFh.

4000206h - NDS7 - WIFIWAITCNT - Wifi Waitstate Control
  Bit   Expl.
  0-2   Wifi WS0 Control (0-7) (Ports 4800000h-4807FFFh)
  3-5   Wifi WS1 Control (0-7) (Ports 4808000h-480FFFFh)
  6-15  Not used (zero)
This register is initialized by firmware on power-up, don't change.
Note: WIFIWAITCNT can be accessed only when enabled in POWCNT2.

4000301h - NDS7 - HALTCNT - Low Power Mode Control (R/W)
In Halt mode, the CPU is paused as long as (IE AND IF)=0.
In Sleep mode, most of the hardware including sound and video are paused, this very-low-power mode could be used much like a screensaver.
  Bit   Expl.
  0-5   Not used (zero)
  6-7   Power Down Mode  (0=No function, 1=Enter GBA Mode, 2=Halt, 3=Sleep)
The HALTCNT register should not be accessed directly. Instead, the BIOS Halt, Sleep, CustomHalt, IntrWait, or VBlankIntrWait SWI functions should be used.
BIOS Halt Functions
ARM CP15 System Control Coprocessor
The NDS9 does not have a HALTCNT register, instead, the Halt function uses the co-processor opcode "mcr p15,0,r0,c7,c0,4" - this opcode locks up if interrupts are disabled via IME=0 (unlike NDS7 HALTCNT method which doesn't check IME).

4000300h - NDS7/NDS9 - POSTFLG - BYTE - Post Boot Flag (R/W)
The NDS7 and NDS9 post boot flags are usually set upon BIOS/Firmware boot completion; once set, the reset vector is redirected to the debug handler of Nintendo's hardware debugger. That allows the NDS7 debugger to capture accidental jumps to address 0, which appear to be a common problem with HLL programmers (asm coders know that, and why, they should not jump to 0).
  Bit   Expl.
  0     Post Boot Flag (0=Boot in progress, 1=Boot completed)
  1     NDS7: Not used (always zero), NDS9: Bit1 is read-writeable
  2-7   Not used (always zero)
There are some write restrictions: the NDS7 register can be written only from code executed in BIOS. Bit0 of both NDS7 and NDS9 registers cannot be cleared (except by Reset) once it is set.

Power Management Device - Mitsumi 3152A (NDS) / Mitsumi 3205B (NDS-LITE)
The Power Management Device is accessed via SPI bus,
DS Serial Peripheral Interface Bus (SPI)
To access the device, write the Index Register, then read or write the data register, and release the chipselect line when finished.
  Index Register
  Bit0-2 Register Select          (0..3) (0..4 for DS-Lite)
  Bit3-6 Not used
  Bit7   Register Direction       (0=Write, 1=Read)
  Register 0 - Powermanagement Control (R/W)
  Bit0   Sound Amplifier Enable   (0=Disable, 1=Enable)
         (Old-DS:  Disabled: Sound is very silent, but still audible)
         (DS-Lite: Disabled: Sound is NOT audible)
  Bit1   Sound Amplifier Mute     (0=Normal, 1=Mute) (Old-DS Only, not DS-Lite)
         (Old-DS:  Muted: Sound is NOT audible, that works only if Bit0=1)
         (DS-Lite: Not used, Bit1 is always zero)
  Bit2   Lower Backlight          (0=Disable, 1=Enable)
  Bit3   Upper Backlight          (0=Disable, 1=Enable)
  Bit4   Power LED Blink Enable   (0=Always ON, 1=Blinking OFF/ON)
  Bit5   Power LED Blink Speed    (0=Slow, 1=Fast) (only if Blink enabled)
  Bit6   DS System Power          (0=Normal, 1=Shut Down)
  Bit7   Not used                 (always 0)
  Register 1 - Battery Status (R)
  Bit0   Battery Power LED Status (0=Power Good/Green, 1=Power Low/Red)
  Bit1-7 Not used
  Register 2 - Microphone Amplifier Control (R/W)
  Bit0   Amplifier                (0=Disable, 1=Enable)
  Bit1-7 Not used                 (always 0)
  Register 3 - Microphone Amplifier Gain Control (R/W)
  Bit0-1 Gain                     (0..3=Gain 20, 40, 80, 160)
  Bit2-7 Not used                 (always 0)
  Register 4 - DS-Lite Only - Backlight Levels/Power Source (R/W)
  Bit0-1 Backlight Brightness (0..3=Low,Med,High,Max)   (R/W)
         (when bit2+3 are both set, then reading bit0-1 always returns 3)
  Bit2   Force Max Brightness when Bit3=1 (0=No, 1=Yes) (R/W)
  Bit3   External Power Present           (0=No, 1=Yes) (Read-Only)
  Bit4-7 Unknown (Always 4) (Read-Only)
On Old-DS, registers 4..7Fh are mirrors of 0..3. On DS-Lite, registers 5,6,7 are mirrors of 4, register 8..7Fh are mirrors of 0-7.
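A sketch of the byte values involved in a register access. The index byte would be sent with Chipselect Hold set (SPICNT device 0); then the data byte is sent for a write, or a dummy byte is sent and the reply read for a read, with Hold cleared. The helper and macro names are hypothetical; only the byte values are computed here.

```c
#include <stdint.h>

/* Register 0 (Powermanagement Control) bits. */
#define PM_SOUND_AMP     (1u << 0)
#define PM_SOUND_MUTE    (1u << 1)
#define PM_BACKLIGHT_BOT (1u << 2)
#define PM_BACKLIGHT_TOP (1u << 3)
#define PM_LED_BLINK     (1u << 4)
#define PM_LED_FAST      (1u << 5)
#define PM_SHUTDOWN      (1u << 6)

/* Index byte: register select in bits 0-2, bit 7 = 1 for read. */
uint8_t pm_index(unsigned reg, int read)
{
    return (uint8_t)((reg & 7u) | (read ? 0x80u : 0u));
}

/*
 * Return register 0 with the lower/upper backlight bits replaced.
 * As a read-modify-write, all other control bits are kept unchanged.
 */
uint8_t pm_set_backlights(uint8_t reg0, int lower_on, int upper_on)
{
    reg0 &= (uint8_t)~(PM_BACKLIGHT_BOT | PM_BACKLIGHT_TOP);
    if (lower_on)
        reg0 |= PM_BACKLIGHT_BOT;
    if (upper_on)
        reg0 |= PM_BACKLIGHT_TOP;
    return reg0;
}
```

For example, switching off only the lower backlight would read register 0 (index 80h), clear bit 2 with the helper, and write the result back using index 00h.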

Backlight Dimming / Backlight caused Shut-Down(s)
The above bits are essentially used to switch the backlights on or off. However, there are a number of strange effects. Backlight dimming is possible by pulse width modulation, ie. by using a timer interrupt to issue pulses of N% ON and 100-N% OFF. Pulses that are too long result in flickering. Pulses that are too short are ignored: the backlights remain OFF, even if the ON and OFF pulses have the same length. Much too short pulses cause the power supply to shut down; after changing the backlight state, further changes must not occur within the next (circa) 2500 clock cycles. The mainboard can be operated without screens & backlights connected; however, if so, the power supply will shut down as soon as the backlights are enabled.
That method also works on the DS-Lite, allowing smoother fade in/out effects than the five "hardware" levels (Off,Low,Med,High,Max).

Memory Power Down Functions
DS Main Memory Control
DS Firmware Serial Flash Memory


 DS Main Memory Control < ^

Main Memory
The DS Main Memory is 2Mx16bit (4MByte), 1.8V Pseudo SRAM (PSRAM); all Dynamic RAM refresh is handled internally, the chip doesn't require any external refresh signals, and altogether it behaves like Static RAM. Non-sequential access time is 70ns, sequential (burst) access time is 12ns.

Main Memory Control
The memory chips contain built-in Control functions, which can be accessed via Port 27FFFFEh and/or by EXMEMCNT Bit 14. Nintendo is using at least two different types of memory chips in DS consoles, Fujitsu 82DBS02163C-70L and ST M69AB048BL70ZA8, which appear to have different control mechanisms. Other chips (with 8MB size) are used in the semi-professional DS hardware debuggers, and further chips may be used in future, so using the memory control functions may lead to compatibility problems.

Power Consumption / Power Control
Power consumption during operation (read/write access) is roughly 30mA; in standby mode (no read/write access) consumption is reduced to 100uA.
Furthermore, a number of power-down modes are supported: in "Deep" Power Down mode the refresh is fully disabled and consumption is 10uA (and all data will be lost); in "Partial" Power Down modes only a fragment of memory is refreshed, and for the smallest fragments consumption goes down to circa 50uA. The chip cannot be accessed while it is in Deep or Partial Power Down mode.

Fujitsu 82DBS02163C-70L
The Configuration Register (CR) can be written to by the following sequence:
  LDRH R0,[27FFFFEh]       ;read one value
  STRH R0,[27FFFFEh]       ;write should be same value as above
  STRH R0,[27FFFFEh]       ;write should be same value as above
  STRH R0,[27FFFFEh]       ;write any value
  STRH R0,[27FFFFEh]       ;write any value
  LDRH R0,[2400000h+CR*2]  ;read, address-bits are defining new CR value
Do not access any other Main Memory addresses during above sequence (ie. disable interrupts, and do not execute the sequence by code located in Main Memory). The CR value is write-only. The CR bits are:
  Bit    Expl.
  0-6    Reserved (Must be 7Fh)
  7      Write Control
           0=WE Single Clock Pulse Control without Write Suspend Function
           1=WE Level Control with Write Suspend Function
           Burst Read/Single Write is not supported in WE Single Clock Mode.
  8      Reserved (Must be 1)
  9      Valid Clock Edge (0=Falling Edge, 1=Rising Edge)
  10     Single Write (0=Burst Read/Burst Write, 1=Burst Read/Single Write)
  11     Burst Sequence (0=Reserved, 1=Sequential)
  12-14  Read Latency (1=3 clocks, 2=4 clocks, 3=5 clocks, other=Reserved)
  15     Mode
           0=Synchronous: Burst Read, Burst Write
           1=Asynchronous: Page Read, Normal Write
           In Mode 1 (Async), only the Partial Size bits are used,
           all other bits, CR bits 0..18, must be "1".
  16-18  Burst Length (2=8 Words, 3=16 Words, 7=Continuous, other=Reserved)
  19-20  Partial Size (0=1MB, 1=512KB, 2=Reserved, 3=Deep/0 bytes)
The Power Down mode is entered by setting CE2=LOW; this can probably be done by setting EXMEMCNT Bit14 to zero.
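The final LDRH of the Fujitsu sequence encodes the new CR value in its address bits. The sketch below, assuming the 21-bit CR layout from the table above, builds an Asynchronous-mode CR (bits 0..18 all set, Partial Size in bits 19-20) and computes the address to read. The helper names are hypothetical, and the sketch does not perform the unlock sequence itself, which must run with interrupts disabled from code located outside Main Memory.

```c
#include <stdint.h>

#define FUJITSU_CR_READ_BASE 0x02400000u  /* LDRH R0,[2400000h+CR*2] */
#define FUJITSU_CR_MASK      0x001FFFFFu  /* CR bits 0..20 */

/* Mode 1 (Async): bits 0..18 must be "1"; only Partial Size (bits 19-20) varies.
   partial_size: 0=1MB, 1=512KB, 2=Reserved, 3=Deep/0 bytes. */
static uint32_t fujitsu_cr_async(unsigned partial_size) {
    return 0x0007FFFFu | ((uint32_t)(partial_size & 3u) << 19);
}

/* Address whose halfword read (last step of the sequence) selects CR value `cr`. */
static uint32_t fujitsu_cr_read_address(uint32_t cr) {
    return FUJITSU_CR_READ_BASE + (cr & FUJITSU_CR_MASK) * 2u;
}
```

Note that the Deep setting (CR=1FFFFFh) maps to 27FFFFEh, the same halfword that is used for the preceding reads and writes.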

ST Microelectronics M69AB048BL70ZA8
The chip name decodes as PSRAM (M69), Asynchronous (A), 1.8V Burst (B), 2Mx16 (048), Two Chip Enables (B), Low Leakage (L), 70ns (70), Package (ZA), -30..+85'C (8).
There are three data sheets for different PSRAM chips available at www.st.com (unfortunately none for M69AB048BL70ZA8), each using different memory control mechanisms.

NDS9 BIOS
The NDS9 BIOS contains the following Main Memory initialization code. That method doesn't match any ST (or Fujitsu) data sheets that I've seen; at best, it looks like a strange (and presumably non-functional) mix-up of different ST control methods.
  STRH 2000h,[4000204h]
  LDRH R0,[27FFFFEh]
  STRH R0,[27FFFFEh]
  STRH R0,[27FFFFEh]
  STRH FFDFh,[27FFFFEh]
  STRH E732h,[27FFFFEh]
  LDRH R0,[27E57FEh]
  STRH 6000h,[4000204h]
In the above BIOS code, EXMEMCNT.14 appears to be used to unlock the control register. However, the NDS Firmware appears to use EXMEMCNT.14 to switch Main Memory into Power Down mode before entering GBA mode.


 DS Cartridges, Encryption, Firmware < ^

Cartridges
DS Cartridge Header
DS Cartridge Secure Area
DS Cartridge Icon/Title
DS Cartridge Protocol
DS Cartridge Backup
DS Cartridge I/O Ports
DS Cartridge NitroROM File System
DS Cartridge PassMe/PassThrough
DS Cartridge GBA Slot

Add-Ons
DS Cart Rumble Pak
DS Cart Slider with Rumble
DS Cart Unknown Add-Ons

Special Cartridges
DS Cart Cheat Action Replay DS
DS Cart Cheat Codebreaker DS

Encryption
DS Encryption by Gamecode/Idcode (KEY1)
DS Encryption by Random Seed (KEY2)

Firmware
DS Firmware Serial Flash Memory
DS Firmware Header
DS Firmware Wifi Calibration Data
DS Firmware Wifi Internet Access Points
DS Firmware User Settings
DS Firmware Extended Settings


 DS Cartridge Header < ^

Header Overview (loaded from ROM Addr 0 to Main RAM 27FFE00h on Power-up)
  Address Bytes Expl.
  000h    12    Game Title (Uppercase ASCII, padded with 00h)
  00Ch    4     Gamecode (Uppercase ASCII, NTR-<code>) (0=homebrew)
  010h    2     Makercode (Uppercase ASCII, eg. "01"=Nintendo) (0=homebrew)
  012h    1     Unitcode (00h=Nintendo DS)
  013h    1     Encryption Seed Select (00..07h, usually 00h)
  014h    1     Device capacity (Chipsize = 128KB SHL nn) (eg. 7 = 16MB)
  015h    9     Reserved (zero filled)
  01Eh    1     ROM Version (usually 00h)
  01Fh    1     Autostart (Bit2: Skip "Press Button" after Health and Safety)
                (Also skips bootmenu, even in Manual mode & even Start pressed)
  020h    4     ARM9 rom_offset (4000h and up, align 1000h)
  024h    4     ARM9 entry_address (2000000h..23BFE00h)
  028h    4     ARM9 ram_address (2000000h..23BFE00h)
  02Ch    4     ARM9 size (max 3BFE00h) (3839.5KB)
  030h    4     ARM7 rom_offset (8000h and up)
  034h    4     ARM7 entry_address (2000000h..23BFE00h, or 37F8000h..3807E00h)
  038h    4     ARM7 ram_address (2000000h..23BFE00h, or 37F8000h..3807E00h)
  03Ch    4     ARM7 size (max 3BFE00h, or FE00h) (3839.5KB, 63.5KB)
  040h    4     File Name Table (FNT) offset
  044h    4     File Name Table (FNT) size
  048h    4     File Allocation Table (FAT) offset
  04Ch    4     File Allocation Table (FAT) size
  050h    4     File ARM9 overlay_offset
  054h    4     File ARM9 overlay_size
  058h    4     File ARM7 overlay_offset
  05Ch    4     File ARM7 overlay_size
  060h    4     Port 40001A4h setting for normal commands (usually 00586000h)
  064h    4     Port 40001A4h setting for KEY1 commands (usually 001808F8h)
  068h    4     Icon_title_offset (0=None) (8000h and up)
  06Ch    2     Secure Area Checksum, CRC-16 of [ [20h]..7FFFh]
  06Eh    2     Secure Area Loading Timeout (usually 051Eh)
  070h    4     ARM9 Auto Load List RAM Address (?)
  074h    4     ARM7 Auto Load List RAM Address (?)
  078h    8     Secure Area Disable (by encrypted "NmMdOnly") (usually zero)
  080h    4     Total Used ROM size (remaining/unused bytes usually FFh-padded)
  084h    4     ROM Header Size (4000h)
  088h    38h   Reserved (zero filled)
  0C0h    9Ch   Nintendo Logo (compressed bitmap, same as in GBA Headers)
  15Ch    2     Nintendo Logo Checksum, CRC-16 of [0C0h-15Bh], fixed CF56h
  15Eh    2     Header Checksum, CRC-16 of [000h-15Dh]
  160h    4     Debug rom_offset (0=none) (8000h and up)        ;only if debug
  164h    4     Debug size (0=none) (max 3BFE00h)               ;version with
  168h    4     Debug ram_address (0=none) (2400000h..27BFE00h) ;SIO and 8MB
  16Ch    4     Reserved (zero filled) (transferred, and stored, but not used)
  170h    90h   Reserved (zero filled) (transferred, but not stored in RAM)
For more info about CRC-16, see description of GetCRC16 BIOS function,
BIOS Misc Functions
For the Logo checksum, the BIOS verifies only [15Ch]=CF56h; it does NOT verify the actual data at [0C0h-15Bh] (nor its checksum). However, the data is verified by the firmware.
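A minimal CRC-16 sketch for checking the header checksum at [15Eh] follows. It assumes the reflected polynomial A001h with initial value FFFFh, which is what common homebrew tools use for this field; the GetCRC16 description remains the reference for the BIOS's own definition. The function names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-16 (polynomial A001h), continuing from `crc`. */
static uint16_t crc16(uint16_t crc, const uint8_t *data, size_t len) {
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 1u) ? (uint16_t)((crc >> 1) ^ 0xA001u) : (uint16_t)(crc >> 1);
        }
    }
    return crc;
}

static uint16_t read_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Header Checksum: CRC-16 of [000h-15Dh], stored little-endian at [15Eh]. */
static int header_crc_ok(const uint8_t *header) {
    return crc16(0xFFFFu, header, 0x15E) == read_le16(header + 0x15E);
}

/* Logo Checksum field: the BIOS only compares [15Ch] against CF56h. */
static int logo_crc_field_ok(const uint8_t *header) {
    return read_le16(header + 0x15C) == 0xCF56u;
}
```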

The Secure Area Loading Timeout (usually X=051Eh) is used to initialize a timer counter to (0-((X AND 3FFFh)+2)). (In case of a Header checksum error, it is ANDed with 1FFFh instead of 3FFFh, for unknown reasons.)
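The timeout arithmetic can be written out as follows, assuming the result is loaded into a 16-bit hardware timer counter (so the subtraction wraps modulo 10000h). The helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Timer start value derived from header [06Eh]; header_crc_ok selects the mask. */
static uint16_t secure_area_timeout_counter(uint16_t x, bool header_crc_ok) {
    uint32_t mask = header_crc_ok ? 0x3FFFu : 0x1FFFu;
    return (uint16_t)(0u - ((x & mask) + 2u));
}
```

For the usual value 051Eh this yields FAE0h, ie. the timer overflows after 520h ticks.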


 DS Cartridge Secure Area < ^

The Secure Area is located in ROM at 4000h..7FFFh. It can contain normal program code and data; however, it can be used only for ARM9 boot code, not for ARM7 boot code, icon/title, filesystem, or other data.

Secure Area Size
The Secure Area exists if the ARM9 boot code ROM source address (src) is located within 4000h..7FFFh. If so, it is loaded (by the BIOS via KEY1 encrypted commands) in 4K portions, starting at src, aligned by 1000h, up to address 7FFFh. The secure area size is thus 8000h-src, regardless of the ARM9 boot code size entry in the header.
Note: The BIOS silently skips any NDS9 bootcode at src<4000h.
Cartridges with src>=8000h do not have a secure area.
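The size rule above can be sketched as follows, assuming arm9_rom_offset (header [020h]) is 1000h-aligned as the header requires. The function names are hypothetical.

```c
#include <stdint.h>

/* Secure Area size for a given ARM9 rom_offset; 0 if there is none. */
static uint32_t secure_area_size(uint32_t arm9_rom_offset) {
    if (arm9_rom_offset < 0x4000u || arm9_rom_offset >= 0x8000u) {
        return 0;  /* outside 4000h..7FFFh: no Secure Area */
    }
    return 0x8000u - arm9_rom_offset;
}

/* Number of 4K Secure Area blocks (block numbers src>>12 .. 7). */
static unsigned secure_area_block_count(uint32_t arm9_rom_offset) {
    return (unsigned)(secure_area_size(arm9_rom_offset) / 0x1000u);
}
```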

Secure Area ID
The first 8 bytes of the secure area contain the Secure Area ID. The ID is required (verified by BIOS boot code), and its value changes during the boot process:
  Value                Expl.
  "encryObj"           raw ID before encryption (raw ROM-image)
  (encrypted)          encrypted ID after encryption (encrypted ROM-image)
  "encryObj"           raw ID after decryption (verified by BIOS boot code)
  E7FFDEFFh,E7FFDEFFh  destroyed ID (overwritten by BIOS after verify)
If the decrypted ID does match, then the BIOS overwrites the first 8 bytes by E7FFDEFFh-values (ie. only the ID is destroyed). If the ID doesn't match, then the first 800h bytes (2K) are overwritten by E7FFDEFFh-values.
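The ID handling described above can be mimicked like this, assuming the Secure Area buffer (at least 800h bytes) has already been KEY1-decrypted; the decryption itself is not shown, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill with the 32-bit value E7FFDEFFh, stored little-endian (FFh,DEh,FFh,E7h). */
static void fill_e7ffdeff(uint8_t *p, size_t len) {
    static const uint8_t pattern[4] = {0xFF, 0xDE, 0xFF, 0xE7};
    for (size_t i = 0; i < len; i++) {
        p[i] = pattern[i & 3u];
    }
}

/* Verify the ID, then destroy 8 bytes (match) or the first 800h bytes (mismatch).
   Returns true if the ID matched. */
static bool secure_area_verify_id(uint8_t *secure_area) {
    bool ok = memcmp(secure_area, "encryObj", 8) == 0;
    fill_e7ffdeff(secure_area, ok ? 8u : 0x800u);
    return ok;
}
```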

Secure Area First 2K Encryption/Content
The first 2K of the Secure Area (if it exists) are KEY1 encrypted. In official games, this 2K region contains data like so (in decrypted form):
  000h..007h  Secure Area ID (see above)
  008h..00Dh  Fixed (FFh,DEh,FFh,E7h,FFh,DEh)
  00Eh..00Fh  CRC16 across following 7E0h bytes, ie. [010h..7FFh]
  010h..7FDh  Unknown/random values, mixed with some THUMB SWI calls
  7FEh..7FFh  Fixed (00h,00h)
Of these, only the ID in the first 8 bytes is verified. Neither the BIOS nor (current) firmware versions verify the data at 008h..7FFh, so those 7F8h bytes may also be used for normal program code/data.

Avoiding Secure Area Encryption
WLAN files reportedly use the same format as cartridges, but without a Secure Area, so games with a Secure Area cannot be booted via WLAN. No$gba can encrypt and decrypt Secure Areas only if the NDS BIOS images are present. And Nintendo's devkit doesn't seem to support Secure Area encryption of unreleased games.
So, unencrypted cartridges are more flexible to use. Ways to avoid encryption (which still work on real hardware) are:
1) Set the NDS9 ROM offset to 4000h, and leave the first 800h bytes of the Secure Area 00h-filled; due to the missing "encryObj" ID, these bytes can be (and will be) safely destroyed during loading. That method is used by Nintendo's devkit.
2) Set the NDS9 ROM offset to 8000h or higher (the cartridge has no Secure Area at all).
3) Set the NDS9 ROM offset, RAM address, and size to zero, set the NDS7 ROM offset to 200h, and point both the NDS9 and NDS7 entrypoints to the loaded NDS7 region. That method avoids wasting unused memory at 200h..3FFFh, and it should be compatible with the NDS console; however, it is not compatible with commercial cartridges, which silently redirect addresses below 4000h to 8000h+(addr AND 1FFh). Still, it should work with unofficial flashcards, which do not do that redirection. No$gba emulates the redirection for regular official cartridges, but disables redirection for homebrew carts if NDS7 rom offset<8000h and NDS7 size>0.
[One possible problem: Newer "anti-passme" firmware versions reportedly check that the entrypoint isn't set to 80000C0h; those firmwares might also reject NDS9 entrypoints within the NDS7 bootcode region?]


 DS Cartridge Icon/Title < ^

The ROM offset of the Icon/Title is defined in Cartridge Header [68h].
If it is present (nonzero), then Icon/Title are displayed in the bootmenu.
  Addr  Siz  Expl.
  000h  2    Version (0001h=Original, 0002h=With Chinese)
  002h  2    CRC16 across entries 020h..83Fh
  004h  2    CRC16 across entries 020h..93Fh (Version 2 only)
  006h  1Ah  Reserved (zero-filled)
  020h  200h Icon Bitmap (32x32 pix) (4x4 tiles, each 4x8 bytes, 4bit depth)
  220h  20h  Icon Palette (16 colors, 16bit, range 0000h-7FFFh)
             (Color 0 is transparent, so the 1st palette entry is ignored)
  240h  100h Title 0 Japanese (128 characters, 16bit Unicode)
  340h  100h Title 1 English  ("")
  440h  100h Title 2 French   ("")
  540h  100h Title 3 German   ("")
  640h  100h Title 4 Italian  ("")
  740h  100h Title 5 Spanish  ("")
  840h  100h Title 6 Chinese  ("") (Version 2 only)
  A00h  -    End of Icon/Title structure (unused bytes usually FFh-filled)
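To illustrate the icon bitmap layout, the sketch below reads one pixel and converts a palette entry to 24-bit RGB. It assumes the usual DS 4bpp tile format (8x8 pixel tiles of 4 bytes per row, tiles stored row-major, low nibble = left pixel) and the usual 15-bit color layout (bits 0-4 red, 5-9 green, 10-14 blue); the function names are hypothetical.

```c
#include <stdint.h>

/* 4-bit palette index of icon pixel (x,y), with x,y in 0..31. */
static uint8_t icon_pixel(const uint8_t *bitmap, unsigned x, unsigned y) {
    unsigned tile = (y / 8u) * 4u + (x / 8u);                     /* 4x4 tiles */
    unsigned offset = tile * 32u + (y % 8u) * 4u + (x % 8u) / 2u; /* 32 bytes per tile */
    uint8_t b = bitmap[offset];
    return (x & 1u) ? (uint8_t)(b >> 4) : (uint8_t)(b & 0x0Fu);
}

/* Expand a 15-bit palette entry to 0xRRGGBB (5-bit channels scaled to 8 bits). */
static uint32_t icon_color_rgb888(uint16_t c) {
    uint32_t r = c & 0x1Fu, g = (c >> 5) & 0x1Fu, b = (c >> 10) & 0x1Fu;
    r = (r << 3) | (r >> 2);
    g = (g << 3) | (g >> 2);
    b = (b << 3) | (b >> 2);
    return (r << 16) | (g << 8) | b;
}
```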
Usually, for non-multilanguage games, the same (english) title is stored in all title entries. The title may consist of ASCII characters 0020h-007Fh, character 000Ah (linefeed), and should be terminated/padded by 0000h. The whole text should not exceed the dimensions of the DS cart field in the bootmenu. The title is usually split into a primary title, optional sub-title, and manufacturer, each separated by 000Ah character(s). For example, "America",000Ah,"The Axis of War",000Ah,"Cynicware",0000h
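Splitting a title entry into its lines could look like this. The sketch assumes the entry has already been converted to host-order 16-bit values, and replaces characters outside 0020h-007Fh by '?'; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy line `line` (0-based, lines separated by 000Ah) of a 128-character
   title entry into `out` as ASCII. Stops at 0000h. Returns the length. */
static size_t title_line(const uint16_t *title, unsigned line, char *out, size_t out_size) {
    unsigned current = 0;
    size_t n = 0;
    for (size_t i = 0; i < 128 && title[i] != 0x0000; i++) {
        uint16_t ch = title[i];
        if (ch == 0x000A) {
            if (current++ == line) {
                break;  /* end of the requested line */
            }
            continue;
        }
        if (current == line && n + 1 < out_size) {
            out[n++] = (ch >= 0x20 && ch <= 0x7F) ? (char)ch : '?';
        }
    }
    if (out_size > 0) {
        out[n] = '\0';
    }
    return n;
}
```

For the example above, line 0 is the primary title, line 1 the sub-title, and line 2 the manufacturer.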


 DS Cartridge Protocol < ^

Communication with Cartridge ROM relies on sending 8-byte commands to the cartridge. After sending the command, a data stream can be received from the cartridge (the length of the data stream isn't fixed; the descriptions below show the default length in brackets, but one may receive more or fewer bytes, if desired).

Cartridge Memory Map
  0000h-0FFFh Header (unencrypted)
  1000h-3FFFh Not readable (zero filled in ROM-images)
  4000h-7FFFh Secure Area, 16KBytes (first 2Kbytes with extra encryption)
  8000h-...   Main Data Area
Cartridge memory must be copied into Work RAM (the CPU cannot execute code in ROM).

Command Summary, Cmd/Reply-Encryption Type, Default Length
  Command/Params    Expl.                             Cmd  Reply Len
  -- Unencrypted Load --
  9F00000000000000h Dummy (read HIGH-Z bytes)         RAW  RAW   2000h
  0000000000000000h Get Cartridge Header              RAW  RAW   200h
  9000000000000000h 1st Get ROM Chip ID               RAW  RAW   4
  00aaaaaaaa000000h Unencrypted Data (debug ver only) RAW  RAW   200h
  3Ciiijjjxkkkkkxxh Activate KEY1 Encryption Mode     RAW  RAW   0
  -- Secure Area Load --
  4llllmmmnnnkkkkkh Activate KEY2 Encryption Mode     KEY1 FIX   910h+0
  1lllliiijjjkkkkkh 2nd Get ROM Chip ID               KEY1 KEY2  910h+4
  xxxxxxxxxxxxxxxxh Invalid - Get KEY2 Stream XOR 00h KEY1 KEY2  910h+...
  2bbbbiiijjjkkkkkh Get Secure Area Block (4Kbytes)   KEY1 KEY2  910h+10A8h
  6lllliiijjjkkkkkh Optional KEY2 Disable             KEY1 KEY2  910h+?
  Alllliiijjjkkkkkh Enter Main Data Mode              KEY1 KEY2  910h+0
  -- Main Data Load --
  B7aaaaaaaa000000h Encrypted Data Read               KEY2 KEY2  200h
  B800000000000000h 3rd Get ROM Chip ID               KEY2 KEY2  4
  xxxxxxxxxxxxxxxxh Invalid - Get KEY2 Stream XOR 00h KEY2 KEY2  ...
The parameter digits contained in the above commands are:
  aaaaaaaa     32bit ROM address (command B7 can access only 8000h and up)
  bbbb         Secure Area Block number (0004h..0007h for addr 4000h..7000h)
  x,xx         Random, not used in further commands
  iii,jjj,llll Random, must be SAME value in further commands
  kkkkk        Random, must be INCREMENTED after FURTHER commands
  mmm,nnn      Random, used as KEY2-encryption seed
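The command layouts above can be packed from their parameter digits as sketched below. The values are the plaintext commands; KEY1 encryption is not applied here. Field widths follow the hex digit counts (llll/bbbb = 16 bits, iii/jjj/mmm/nnn = 12 bits, kkkkk = 20 bits), and all function names are hypothetical.

```c
#include <stdint.h>

/* 3Ciiijjjxkkkkkxxh: Activate KEY1 Encryption Mode. */
static uint64_t cmd_activate_key1(uint32_t iii, uint32_t jjj, uint32_t x,
                                  uint32_t kkkkk, uint32_t xx) {
    return (UINT64_C(0x3C) << 56)
         | ((uint64_t)(iii & 0xFFFu) << 44)
         | ((uint64_t)(jjj & 0xFFFu) << 32)
         | ((uint64_t)(x & 0xFu) << 28)
         | ((uint64_t)(kkkkk & 0xFFFFFu) << 8)
         | (uint64_t)(xx & 0xFFu);
}

/* Layout shared by the KEY1-phase commands (1,2,4,6,A): opcode nibble,
   16-bit field (llll or bbbb), two 12-bit fields (iii,jjj or mmm,nnn), kkkkk. */
static uint64_t cmd_key1_phase(uint32_t op, uint32_t field16, uint32_t a12,
                               uint32_t b12, uint32_t kkkkk) {
    return ((uint64_t)(op & 0xFu) << 60)
         | ((uint64_t)(field16 & 0xFFFFu) << 44)
         | ((uint64_t)(a12 & 0xFFFu) << 32)
         | ((uint64_t)(b12 & 0xFFFu) << 20)
         | (uint64_t)(kkkkk & 0xFFFFFu);
}

/* kkkkk is a 20-bit counter, incremented after each further command. */
static uint32_t next_kkkkk(uint32_t kkkkk) {
    return (kkkkk + 1u) & 0xFFFFFu;
}

/* Command bytes in transfer order (MSB first, as written to 40001A8h..40001AFh). */
static void cmd_to_bytes(uint64_t cmd, uint8_t out[8]) {
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(cmd >> (56 - 8 * i));
    }
}
```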
++++ Unencrypted Commands (First Part of Boot Procedure) ++++

Cartridge Reset
The /RES Pin switches the cartridge into unencrypted mode. After reset, the first two commands (9Fh and 00h) are transferred at 4MB/s CLK rate.

9F00000000000000h (2000h) - Dummy
Dummy command sent after reset; returns an endless stream of HIGH-Z bytes (ie. usually FFh; immediately after sending the command, the first 1-2 received bytes may be equal to the last command byte).

0000000000000000h (200h) - Get Header
Returns the RAW unencrypted cartridge header, repeated every 1000h bytes. The relevant area is the first 200h bytes; the rest is typically zero filled.
The Gamecode header entry is used later on to initialize the encryption. Also, the ROM Control entries define the length of the KEY1 dummy periods (typically 910h clocks), and the CLK transfer rate for further commands (typically faster than the initial 4MB/s after power up).

9000000000000000h (4) - 1st Get ROM Chip ID
Returns RAW unencrypted Chip ID (eg. C2h,0Fh,00h,00h), repeated every 4 bytes.
  1st byte - Manufacturer (C2h = Macronix)
  2nd byte - Chip size in megabytes minus 1 (eg. 0Fh = 16MB)
  3rd byte - Reserved/zero (probably upper bits of chip size)
  4th byte - Bit7: Secure Area Block transfer mode (1=8x200h, 0=1000h)
Existing/known NDS chip sizes are 16MB (eg. Metroid Demo), 32MB (eg. Over the Hedge), and 64MB (eg. Ultimate Spiderman).
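Decoding the Chip ID could look like this. The sketch ignores the 3rd byte (assumed zero), takes the bit7 meaning from the Get Secure Area Block description below, and uses hypothetical names.

```c
#include <stdint.h>

typedef struct {
    uint8_t manufacturer;   /* eg. C2h = Macronix */
    unsigned size_mb;       /* 2nd byte + 1 */
    unsigned secure_unit;   /* 200h (8x200h mode) or 1000h (single block) */
} ChipId;

static ChipId decode_chip_id(const uint8_t id[4]) {
    ChipId c;
    c.manufacturer = id[0];
    c.size_mb = (unsigned)id[1] + 1u;
    c.secure_unit = (id[3] & 0x80u) ? 0x200u : 0x1000u;
    return c;
}
```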

3Ciiijjjxkkkkkxxh (0) - Activate KEY1 Encryption Mode
The 3Ch command returns an endless stream of HIGH-Z bytes; all following commands, and their return values, are encrypted. The random parameters iii,jjj,kkkkk must be re-used in further commands; the 20bit kkkkk value is to be incremented by one after each <further> command (it is <not> incremented after the 3Ch command).

++++ KEY1 Encrypted Commands (2nd Part of Boot procedure) ++++

4llllmmmnnnkkkkkh (910h) - Activate KEY2 Encryption Mode
KEY1 encrypted command; parameter mmmnnn is used to initialize the KEY2 encryption stream. Returns 910h dummy bytes (which are still subject to old KEY2 settings; at pre-initialization time, this is fixed: HIGH-Z, C5h, 3Ah, 81h, etc.). The new KEY2 seeds are then applied, and the first KEY2 byte is then precomputed. The 910h dummy stream is followed by that precomputed byte value, endlessly repeated (this is the same value as the one "underneath" the first HIGH-Z dummy byte of the next command).

1lllliiijjjkkkkkh (914h) - 2nd Get ROM Chip ID / Get KEY2 Stream
KEY1 encrypted command. Returns 910h dummy bytes, followed by the KEY2 encrypted Chip ID repeated every 4 bytes, which must be identical to that of the 1st Get ID command. The BIOS randomly executes this command once or twice. Changing the first command byte to any other value returns an endless KEY2 encrypted stream of 00h bytes; that is the easiest way to retrieve encryption values and to bypass the copy protection.

2bbbbiiijjjkkkkkh (19B8h) - Get Secure Area Block
KEY1 encrypted command. Used to read a secure area block (bbbb in range 0004h..0007h for addr 4000h..7000h), each block is 4K, so it requires four Get Secure Area commands to receive the whole Secure Area (ROM locations 4000h-7FFFh), the BIOS is reading these blocks in random order.
Normally (if the upper bit of the Chip ID is set): Returns 910h dummy bytes, followed by 200h KEY2 encrypted Secure Area bytes, followed by 18h KEY2 encrypted 00h bytes, then the next 200h KEY2 encrypted Secure Area bytes, again followed by 18h KEY2 encrypted 00h bytes, and so on. That stream is repeated every 10C0h bytes (8x200h data bytes, plus 8x18h zero bytes).
Alternately (if the upper bit of the Chip ID is zero): Returns 910h dummy bytes, followed by 1000h KEY2 encrypted Secure Area bytes, presumably followed by 18h bytes, too.
Aside from above KEY2 encryption (which is done by hardware), the first 2K of the Secure Area is additionally KEY1 encrypted (which must be resolved after transfer by software).

6lllliiijjjkkkkkh (0) - Optional KEY2 Disable
KEY1 encrypted command. Returns 910h dummy bytes (which are still KEY2 affected), followed by endless stream of RAW 00h bytes. KEY2 encryption is disabled for all following commands.
This command is sent only if firmware[18h] matches the encrypted string "enPngOFF", and ONLY if the firmware's get_crypt_keys had completed BEFORE completion of secure area loading; this timing issue may cause unstable results.

Alllliiijjjkkkkkh (910h) - Enter Main Data Mode
KEY1 encrypted command. Returns 910h dummy bytes, followed by endless KEY2 encrypted stream of 00h bytes. All following commands are KEY2 encrypted.

++++ KEY2 Encrypted Commands (Main Data Transfer) ++++

B7aaaaaaaa000000h (200h) - Get Data
KEY2 encrypted command. The desired ROM address is specified, MSB first, in parameter bytes (a). It can be used only for addresses 8000h and up; smaller addresses are silently redirected to address "8000h+(addr AND 1FFh)". There is no alignment restriction for the address. However, the data stream wraps to the beginning of the current 4K block when address+length crosses a 4K boundary (1000h bytes). Returned data is KEY2 encrypted.
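The address rules of the B7 command can be sketched as follows. The code builds the plaintext command bytes only (KEY2 encryption is applied by hardware according to ROMCTRL), and the helper names are hypothetical.

```c
#include <stdint.h>

/* Address actually read: below 8000h is redirected to 8000h+(addr AND 1FFh). */
static uint32_t b7_effective_address(uint32_t addr) {
    return (addr < 0x8000u) ? 0x8000u + (addr & 0x1FFu) : addr;
}

/* ROM address of the i-th returned byte: the stream wraps within the 4K block. */
static uint32_t b7_stream_address(uint32_t start, uint32_t i) {
    return (start & ~0xFFFu) | ((start + i) & 0xFFFu);
}

/* B7aaaaaaaa000000h as command bytes, MSB first (order for 40001A8h..40001AFh). */
static void b7_command(uint32_t addr, uint8_t out[8]) {
    out[0] = 0xB7;
    out[1] = (uint8_t)(addr >> 24);
    out[2] = (uint8_t)(addr >> 16);
    out[3] = (uint8_t)(addr >> 8);
    out[4] = (uint8_t)addr;
    out[5] = out[6] = out[7] = 0x00;
}
```

For example, a 200h-byte read starting at 8F00h returns 8F00h..8FFFh followed by 8000h..80FFh, so reads should be kept within one 4K block.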

B800000000000000h (4) - 3rd Get ROM Chip ID
KEY2 encrypted command. Returns KEY2 encrypted Chip ID repeated every 4 bytes.

xxxxxxxxxxxxxxxxh - Invalid Command
Any other command (anything other than the above B7h and B8h) in KEY2 command mode causes communication failures. The invalid command returns an endless KEY2 encrypted stream of 00h bytes. After the invalid command, the KEY2 stream is NOT advanced for further command bytes; further commands seem to return KEY2 encrypted 00h bytes, of which the first returned byte appears to be HIGH-Z.
Ie. the cartridge seems to have switched back to a state similar to the KEY1-phase, although it doesn't seem to be possible to send KEY1 commands.

++++ Notes ++++

KEY1 Command Encryption / 910h Dummy Bytes
All KEY1 encrypted commands are followed by 910h dummy byte transfers, these 910h clock cycles are probably used to decrypt the command at the cartridge side; communication will fail when transferring less than 910h bytes.
The return values for the dummy transfer are: A single HIGH-Z byte, followed by 90Fh KEY2-encrypted 00h bytes. The KEY2 encryption stream is advanced for all 910h bytes, including for the HIGH-Z byte.
Note: Current cartridges use 910h bytes; however, other carts might use other amounts of dummy bytes. The 910h value can be calculated based on the ROM Control entries in the cartridge header. For the KEY1 formulas, see:
DS Encryption by Gamecode/Idcode (KEY1)

KEY2 Command/Data Encryption
DS Encryption by Random Seed (KEY2)


 DS Cartridge Backup < ^

SPI Bus Backup Memory
  Type   Total Size  Page Size  Chip/Example      Game/Example
  EEPROM 0.5K bytes  16 bytes   ST M95040-W       (eg. Metroid Demo)
  EEPROM 8K bytes    32 bytes   ST M95640-W       (eg. Super Mario DS)
  EEPROM 64K bytes   128 bytes  ST M95512-W       (eg. Downhill Jam)
  FLASH  256K bytes  256 bytes  ST M45PE20        (eg. Skateland)
  FLASH  256K bytes             Sanyo LE25FW203T  (eg. Mariokart)
  FLASH  512K bytes  256 bytes  ST M25PE40?       (eg. which/any games?)
  FRAM   8K bytes    No limit   ?                 (eg. which/any games?)
  FRAM   32K bytes   No limit   Ramtron FM25L256? (eg. which/any games?)
Lifetime Stats
  Type      Max Writes per Page    Data Retention
  EEPROM    100,000                40 years
  FLASH     100,000                20 years
  FRAM      No limit               10 years
SPI Bus Backup Memory is accessed via Ports 40001A0h and 40001A2h, see
DS Cartridge I/O Ports

Commands
For all EEPROM and FRAM types:
  06h WREN  Write Enable                Cmd, no parameters
  04h WRDI  Write Disable               Cmd, no parameters
  05h RDSR  Read Status Register        Cmd, read repeated status value(s)
  01h WRSR  Write Status Register       Cmd, write one-byte value
  9Fh RDID  Read JEDEC ID (not supported on EEPROM/FLASH, returns FFh-bytes)
For 0.5K EEPROM (8+1bit Address):
  03h RDLO  Read from Memory 000h-0FFh  Cmd, addr lsb, read byte(s)
  0Bh RDHI  Read from Memory 100h-1FFh  Cmd, addr lsb, read byte(s)
  02h WRLO  Write to Memory 000h-0FFh   Cmd, addr lsb, write 1..MAX byte(s)
  0Ah WRHI  Write to Memory 100h-1FFh   Cmd, addr lsb, write 1..MAX byte(s)
For 8K..64K EEPROM and for FRAM (16bit Address):
  03h RD    Read from Memory            Cmd, addr msb,lsb, read byte(s)
  02h WR    Write to Memory             Cmd, addr msb,lsb, write 1..MAX byte(s)
Note: MAX = Page Size (see above chip list) (no limit for FRAM).
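The command/address prefix for the RD/WR commands above can be built as sketched below. The page-chunk helper assumes (as is usual for such SPI EEPROMs) that one WR command must not cross a page boundary; for FRAM a large page size can be passed. A write must still be preceded by WREN, and completion sensed via the WIP bit. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum {
    EEPROM_ADDR_9BIT,   /* 0.5K EEPROM: 8+1bit address */
    EEPROM_ADDR_16BIT   /* 8K..64K EEPROM and FRAM */
} EepromAddrType;

/* Build the command+address bytes preceding the data of a read or write.
   Returns the number of prefix bytes (2 or 3). */
static size_t eeprom_prefix(EepromAddrType type, int is_write, uint32_t addr, uint8_t out[3]) {
    uint8_t base = is_write ? 0x02u : 0x03u;   /* WR/WRLO or RD/RDLO */
    if (type == EEPROM_ADDR_9BIT) {
        /* Address bit 8 selects the HI variant: 0Ah/0Bh = 02h/03h OR 08h. */
        out[0] = (uint8_t)(base | ((addr & 0x100u) ? 0x08u : 0x00u));
        out[1] = (uint8_t)(addr & 0xFFu);
        return 2;
    }
    out[0] = base;
    out[1] = (uint8_t)(addr >> 8);
    out[2] = (uint8_t)(addr & 0xFFu);
    return 3;
}

/* Bytes that one WR command may write starting at addr (page-limited). */
static uint32_t eeprom_write_chunk(uint32_t addr, uint32_t len, uint32_t page_size) {
    uint32_t room = page_size - (addr % page_size);
    return (len < room) ? len : room;
}
```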

For FLASH backup, commands should be same as for Firmware FLASH memory:
DS Firmware Serial Flash Memory

Status Register
  0   WIP  Write in Progress (1=Busy) (Read only) (always 0 for FRAM chips)
  1   WEL  Write Enable Latch (1=Enable) (Read only, except by WREN,WRDI)
  2-3 WP   Write Protect (0=None, 1=Upper quarter, 2=Upper Half, 3=All memory)
For 0.5K EEPROM:
  4-7 ONEs Not used (all four bits are always set to "1" each)
For 8K..64K EEPROM and for FRAM:
  4-6 ZERO Not used (all three bits are always set to "0" each)
  7   SRWD Status Register Write Disable (0=Normal, 1=Lock) (Only if /W=LOW)
WEL gets reset on Power-up, WRDI, WRSR, WRITE/LO/HI, and on /W=LOW.
The WRSR command can change ONLY the two WP bits and the SRWD bit (if any). These bits are non-volatile (they remain intact during power-down); accordingly, the WIP bit must be checked to sense WRSR completion.

Detection (by examining hardware responses)
The overall memory type and bus-width can be detected by RDSR/RDID commands:
  RDSR  RDID          Type         (bus-width)
  FFh,  FFh,FFh,FFh   None         (none)
  F0h,  FFh,FFh,FFh   EEPROM       (with 8+1bit address bus)
  00h,  FFh,FFh,FFh   EEPROM/FRAM  (with 16bit address bus)
  00h,  xxh,xxh,xxh   FLASH        (usually with 24bit address bus)
And, the RD commands can be used to detect the memory size/mirrors (though that won't work if the memory is empty).
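The detection table can be expressed as a small classifier. It assumes the chip is idle and unprotected when probed (otherwise WIP/WEL/WP bits would alter the RDSR value); the type and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    BACKUP_NONE,
    BACKUP_EEPROM_9BIT,           /* 8+1bit address bus */
    BACKUP_EEPROM_OR_FRAM_16BIT,  /* 16bit address bus */
    BACKUP_FLASH,                 /* usually 24bit address bus */
    BACKUP_UNKNOWN
} BackupType;

/* rdsr: one RDSR byte; rdid: first three RDID bytes. */
static BackupType classify_backup(uint8_t rdsr, const uint8_t rdid[3]) {
    int id_all_ff = rdid[0] == 0xFF && rdid[1] == 0xFF && rdid[2] == 0xFF;
    if (rdsr == 0xFF && id_all_ff) return BACKUP_NONE;
    if (rdsr == 0xF0 && id_all_ff) return BACKUP_EEPROM_9BIT;
    if (rdsr == 0x00 && id_all_ff) return BACKUP_EEPROM_OR_FRAM_16BIT;
    if (rdsr == 0x00) return BACKUP_FLASH;
    return BACKUP_UNKNOWN;
}
```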

Pin-Outs for EEPROM and FRAM chips
  Pin Name  Expl.
  1   /S    Chip Select
  2   Q     Data Out
  3   /W    Write-Protect (not used in NDS, wired to VCC)
  4   VSS   Ground
  5   D     Data In
  6   C     Clock
  7   /HOLD Transfer-pause (not used in NDS, wired to VCC)
  8   VCC   Supply 2.5 to 5.5V for M95xx0-W
FRAM (Ferroelectric Nonvolatile RAM) is fully backwards compatible with normal EEPROMs, but offers faster write/erase times (no delays), lower power consumption, and an unlimited number of write/erase cycles. Unlike normal RAM, as far as I understand, the data remains intact without needing any battery.


 DS Cartridge I/O Ports < ^

The Gamecard bus registers can be mapped to NDS7 or NDS9 via EXMEMCNT, see
DS Memory Control

40001A0h - NDS7/NDS9 - AUXSPICNT - Gamecard ROM and SPI Control
  0-1   SPI Baudrate        (0=4MHz/Default, 1=2MHz, 2=1MHz, 3=512KHz)
  2-5   Not used            (always zero)
  6     SPI Hold Chipselect (0=Deselect after transfer, 1=Keep selected)
  7     SPI Busy            (0=Ready, 1=Busy) (presumably Read-only)
  8-12  Not used            (always zero)
  13    NDS Slot Mode       (0=Parallel/ROM, 1=Serial/SPI-Backup)
  14    Transfer Ready IRQ  (0=Disable, 1=Enable) (for ROM, not for AUXSPI)
  15    NDS Slot Enable     (0=Disable, 1=Enable) (for both ROM and AUXSPI)
The "Hold" flag should be cleared BEFORE transferring the LAST data unit; the chipselect will then be automatically released after that transfer. The program should manually issue a WaitByLoop(12) (on NDS7, or longer on NDS9) AFTER the LAST transfer.

40001A2h - NDS7/NDS9 - AUXSPIDATA - Gamecard SPI Bus Data/Strobe (R/W)
The SPI transfer is started on writing to this register, so one must <write> a dummy value (should be zero) even when intending to <read> from SPI bus.
  0-7   Data
  8-15  Not used (always zero)
During transfer, the Busy flag in AUXSPICNT is set, and the written DATA value is transferred to the device (via output line), simultaneously data is received (via input line). Upon transfer completion, the Busy flag goes off, and the received value can be then read from AUXSPIDATA, if desired.
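A byte-level AUXSPI transfer following the above rules might look like this. Register access goes through pointers so the sketch can run against plain variables; on hardware they would point to 40001A0h and 40001A2h, with the gamecard bus mapped to the executing CPU via EXMEMCNT. The names and the per-byte rewrite of AUXSPICNT are assumptions of this sketch, and the final WaitByLoop delay is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define AUXSPICNT_HOLD     0x0040u  /* bit 6: keep chipselect after transfer */
#define AUXSPICNT_BUSY     0x0080u  /* bit 7: transfer in progress */
#define AUXSPICNT_SPI_MODE 0x2000u  /* bit 13: Serial/SPI-Backup mode */
#define AUXSPICNT_ENABLE   0x8000u  /* bit 15: NDS Slot Enable */

typedef struct {
    volatile uint16_t *cnt;   /* AUXSPICNT  */
    volatile uint16_t *data;  /* AUXSPIDATA */
} AuxSpi;

/* One byte at the default 4MHz baudrate (bits 0-1 = 0). keep_selected must
   be false for the LAST byte, so the chipselect is released after it. */
static uint8_t auxspi_transfer(const AuxSpi *spi, uint8_t out, bool keep_selected) {
    *spi->cnt = (uint16_t)(AUXSPICNT_ENABLE | AUXSPICNT_SPI_MODE |
                           (keep_selected ? AUXSPICNT_HOLD : 0u));
    *spi->data = out;                       /* writing starts the transfer */
    while (*spi->cnt & AUXSPICNT_BUSY) {
        /* wait for completion */
    }
    return (uint8_t)(*spi->data & 0xFFu);   /* byte received meanwhile */
}

/* RDSR (05h): command with HOLD, then one status byte without HOLD. */
static uint8_t auxspi_read_status(const AuxSpi *spi) {
    auxspi_transfer(spi, 0x05, true);
    return auxspi_transfer(spi, 0x00, false);   /* dummy write clocks in data */
}
```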

40001A4h - NDS7/NDS9 - ROMCTRL - Gamecard Bus ROMCTRL (R/W)
  Bit   Expl.
  0-12  KEY1 length part1 (0-1FFFh) (forced min 08F8h by BIOS)
  13    KEY2 encrypt data (0=Disable, 1=Enable KEY2 Encryption for Data)
  14    "SE" Unknown? (usually same as Bit13)
  15    KEY2 Apply Seed   (0=No change, 1=Apply Encryption Seed) (Write only)
  16-21 KEY1 length part2 (0-3Fh) (forced min 18h by BIOS)
  22    KEY2 encrypt cmd  (0=Disable, 1=Enable KEY2 Encryption for Commands)
  23    Data-Word Status  (0=Busy, 1=Ready/DRQ) (Read-only)
  24-26 Data Block size   (0=None, 1..6=100h SHL (1..6) bytes, 7=4 bytes)
  27    Transfer CLK rate (0=6.7MHz=33.51MHz/5, 1=4.2MHz=33.51MHz/8)
  28    Secure Area Mode  (0=Normal, 1=Other)
  29    "RESB" Unknown (always 1 ?) (not read/write-able)
  30    "WR" Unknown (always 0 ?) (read/write-able)
  31    Block Start/Status (0=Ready, 1=Start/Busy) (IRQ See 40001A0h/Bit14)
The cartridge header is loaded at 4.2MHz CLK rate; following transfers then use the ROMCTRL settings specified in cartridge header entries [060h] and [064h], which usually select the 6.7MHz CLK rate. When using another CLK rate, the Timeout in header [06Eh] probably must be adjusted accordingly.
Transfer lengths of zero, four, and 200h..4000h bytes are supported by the console; however, regular cartridges support only max 1000h bytes.
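The ROMCTRL fields can be decoded as sketched below. The KEY1 dummy count is assumed to be the sum of the two KEY1 length fields, which gives the 910h figure from the Protocol chapter for the usual header [064h] value 001808F8h (8F8h+18h); this relationship is an assumption, not a confirmed formula. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumed KEY1 dummy length: part1 (bits 0-12) + part2 (bits 16-21). */
static uint32_t romctrl_key1_gap(uint32_t romctrl) {
    return (romctrl & 0x1FFFu) + ((romctrl >> 16) & 0x3Fu);
}

/* Data Block size field (bits 24-26) in bytes. */
static uint32_t romctrl_block_bytes(uint32_t romctrl) {
    uint32_t n = (romctrl >> 24) & 7u;
    if (n == 0u) return 0u;
    if (n == 7u) return 4u;
    return 0x100u << n;
}

/* CLK divider from bit 27: 33.51MHz/5 (6.7MHz) or 33.51MHz/8 (4.2MHz). */
static unsigned romctrl_clk_divider(uint32_t romctrl) {
    return ((romctrl >> 27) & 1u) ? 8u : 5u;
}
```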

40001A8h - NDS7/NDS9 - Gamecard bus 8-byte Command Out
The separate commands are described in the Cartridge Protocol chapter; however, once the BIOS boot procedure has completed, one would usually only need command "B7aaaaaaaa000000h", for reading data (usually 200h bytes) from address aaaaaaaah (which should usually be aligned by 200h).
  0-7   1st Command Byte (at 40001A8h) (eg. B7h) (MSB)
  8-15  2nd Command Byte (at 40001A9h) (eg. addr bit 24-31)
  16-23 3rd Command Byte (at 40001AAh) (eg. addr bit 16-23)
  24-31 4th Command Byte (at 40001ABh) (eg. addr bit 8-15) (when aligned=even)
  32-39 5th Command Byte (at 40001ACh) (eg. addr bit 0-7) (when aligned=00h)
  40-47 6th Command Byte (at 40001ADh) (eg. 00h)
  48-55 7th Command Byte (at 40001AEh) (eg. 00h)
  56-63 8th Command Byte (at 40001AFh) (eg. 00h) (LSB)
Observe that the command/parameter MSB is located at the smallest memory location (40001A8h), ie. compared with the CPU, the byte-order is reversed.

4100010h - NDS7/NDS9 - Gamecard bus 4-byte Data In (R)
  0-7   1st received Data Byte (at 4100010h)
  8-15  2nd received Data Byte (at 4100011h)
  16-23 3rd received Data Byte (at 4100012h)
  24-31 4th received Data Byte (at 4100013h)
After sending a command, data can be read from this register manually (when the DRQ bit is set), or by DMA (with DMASAD=4100010h, Fixed Source Address, Length=1, Size=32bit, Repeat=On, Mode=DS Gamecard).

40001B0h - 32bit - NDS7/NDS9 - Encryption Seed 0 Lower 32bit (W)
40001B4h - 32bit - NDS7/NDS9 - Encryption Seed 1 Lower 32bit (W)
40001B8h - 16bit - NDS7/NDS9 - Encryption Seed 0 Upper 7bit (bit7-15 unused)
40001BAh - 16bit - NDS7/NDS9 - Encryption Seed 1 Upper 7bit (bit7-15 unused)
These registers are used by the NDS7 BIOS to initialize KEY2 encryption (and there's normally no need to change the initial settings). Writes to the Seed registers do not have direct effect on the internal encryption registers, until the Seed gets applied by writing "1" to ROMCTRL.Bit15.

For more info:
DS Encryption by Random Seed (KEY2)

Note: There are <separate> Seed registers for both NDS7 and NDS9, which can be applied by ROMCTRL on NDS7 and NDS9 respectively (however, once applied to the internal registers, the new internal setting is used for <both> CPUs).


 DS Cartridge NitroROM File System < ^

NitroROM
The NitroROM Filesystem is used by many commercial games, at least those that have been developed with Nintendo's tools. The filesystem allows loading data from cartridge ROM by filenames and/or by Overlay IDs.
However, the DS hardware, BIOS, and Firmware do NOT contain any built-in filesystem functions. The ARM9/ARM7 boot code (together max 3903KB), and Icon/Title information are automatically loaded on power-up.
Programs that need to load additional data from cartridge ROM may do so either by implementing their own functions to translate filenames to ROM addresses, or by reading from ROM directly.

File Allocation Table (FAT) (base/size defined in cart header)
Contains ROM addresses for up to 61440 files (File IDs 0000h and up).
  Addr Size Expl.
  00h  4    Start address of file in ROM (8000h and up) (0=Unused Entry)
  04h  4    End address of file in ROM (Start+Len...-1?) (0=Unused Entry)
Directories are fully defined in FNT area, and do not require FAT entries.

File Name Table (FNT) (base/size defined in cart header)
Consists of the FNT Directory Table, followed by one or more FNT Sub-Tables.
To interpret the directory tree: Start at the 1st Main-Table entry, which references a Sub-Table; any directories in the Sub-Table reference Main-Table entries, which reference further Sub-Tables, and so on.

FNT Directory Main-Table (base=FNT+0, size=[FNT+06h]*8)
Consists of a list of up to 4096 directories (Directory IDs F000h and up).
  Addr Size Expl.
  00h  4    Offset to Sub-table             (originated at FNT base)
  04h  2    ID of first file in Sub-table   (0000h..EFFFh)
For first entry (ID F000h, root directory):
  06h  2    Total Number of directories     (1..4096)
Further entries (ID F001h..FFFFh, sub-directories):
  06h  2    ID of parent directory          (F000h..FFFEh)
FNT Sub-tables (base=FNT+offset, ends at Type/Length=00h)
Contains ASCII names for all files and sub-directories within a directory.
  Addr  Size Expl.
  00h   1    Type/Length
               01h..7Fh  File Entry          (Length=1..127, without ID field)
               81h..FFh  Sub-Directory Entry (Length=1..127, plus ID field)
               00h       End of Sub-Table
               80h       Reserved
  01h   LEN  File or Sub-Directory Name, case-sensitive, without any ending
             zero, ASCII 20h..7Eh, except for characters \/?"<>*:;|
Below for Sub-Directory Entries only:
  LEN+1 2    Sub-Directory ID (F001h..FFFFh) ;see FNT+(ID AND FFFh)*8
File Entries do not have above ID field. Instead, File IDs are assigned in incrementing order (starting at the "First ID" value specified in the Directory Table).
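A name lookup within one directory can be sketched as below: it walks the directory's Sub-Table, counting File IDs upwards from the "First ID" value, and returns either a File ID or a Sub-Directory ID. It assumes a well-formed FNT already loaded into memory (no bounds checks); the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t fnt_rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t fnt_rd16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Look up `name` in directory dir_id (F000h = root). Returns the File ID
   (0000h..EFFFh) or Sub-Directory ID (F001h..FFFFh), or -1 if not found. */
static int32_t fnt_find(const uint8_t *fnt, uint16_t dir_id, const char *name) {
    const uint8_t *dir = fnt + (size_t)(dir_id & 0x0FFFu) * 8u;
    const uint8_t *p = fnt + fnt_rd32(dir);    /* Sub-table offset */
    uint16_t file_id = fnt_rd16(dir + 4);      /* ID of first file */
    size_t name_len = strlen(name);
    for (;;) {
        uint8_t type = *p++;
        if (type == 0x00u || type == 0x80u) {
            return -1;                         /* end of Sub-Table (or reserved) */
        }
        size_t len = type & 0x7Fu;
        int match = (len == name_len) && memcmp(p, name, len) == 0;  /* case-sensitive */
        p += len;
        if (type & 0x80u) {                    /* Sub-Directory Entry */
            uint16_t sub_id = fnt_rd16(p);
            p += 2;
            if (match) return sub_id;
        } else {                               /* File Entry: IDs are sequential */
            if (match) return file_id;
            file_id++;
        }
    }
}
```

A path like "sub/x" is resolved by looking up "sub" in F000h, then "x" in the returned directory ID.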

ARM9 and ARM7 Overlay Tables (OVT) (base/size defined in cart header)
Apparently related to Nintendo's compiler; allows assigning compiler Overlay IDs to filesystem File IDs, and defining additional information such as load addresses.
  Addr Size Expl.
  00h  4    Overlay ID
  04h  4    RAM Address ;Point at which to load
  08h  4    RAM Size    ;Amount to load
  0Ch  4    BSS Size    ;Size of BSS data region
  10h  4    Static initialiser start address
  14h  4    Static initialiser end address
  18h  4    File ID (0000h..EFFFh)
  1Ch  4    Reserved (zero)
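Reading an OVT entry and resolving its File ID through the FAT could look like this. The sketch returns the raw FAT start/end values, since the exact end-address convention is uncertain (see the FAT table); names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t ovt_rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

typedef struct {
    uint32_t overlay_id, ram_address, ram_size, bss_size;
    uint32_t sinit_start, sinit_end, file_id;
} OverlayEntry;

/* OVT entries are 20h bytes each; the table size comes from the header. */
static size_t ovt_count(uint32_t ovt_size) {
    return ovt_size / 0x20u;
}

static OverlayEntry ovt_entry(const uint8_t *ovt, size_t index) {
    const uint8_t *p = ovt + index * 0x20u;
    OverlayEntry e = {
        ovt_rd32(p + 0x00), ovt_rd32(p + 0x04), ovt_rd32(p + 0x08), ovt_rd32(p + 0x0C),
        ovt_rd32(p + 0x10), ovt_rd32(p + 0x14), ovt_rd32(p + 0x18)
    };
    return e;
}

/* FAT entry for a File ID: raw start/end ROM addresses (8 bytes per entry). */
static void fat_lookup(const uint8_t *fat, uint32_t file_id, uint32_t *start, uint32_t *end) {
    *start = ovt_rd32(fat + (size_t)file_id * 8u);
    *end = ovt_rd32(fat + (size_t)file_id * 8u + 4u);
}
```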
Cartridge Header
The base/size of FAT, FNT, OVT areas is defined in cartridge header,
DS Cartridge Header


 DS Cartridge PassMe/PassThrough < ^

PassMe is an adapter connected between the DS and an original NDS cartridge, used to boot unencrypted code from a flash cartridge in the GBA slot. It replaces the following entries in the original NDS cartridge header:
  Addr  Siz Patch
  004h  4   E59FF018h ;opcode LDR PC,[027FFE24h] at 27FFE04h
  01Fh  1   04h       ;set autostart bit
  022h  1   01h       ;set ARM9 rom offset to nn01nnnnh (above secure area)
  024h  4   027FFE04h ;patch ARM9 entry address to endless loop
  034h  4   080000C0h ;patch ARM7 entry address in GBA slot
  15Eh  2   nnnnh     ;adjust header crc16
After having verified the encrypted chip IDs (from the original cartridge), the console thinks that it has successfully loaded an NDS cartridge, and then jumps to the (patched) entrypoints.
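The header patch from the table above can be sketched in Python. `passme_patch` is a hypothetical helper, not taken from any existing tool. It assumes the header CRC16 at 15Eh covers [000h..15Dh] and uses the CRC16 variant with reflected polynomial A001h and initial value FFFFh (ie. CRC-16/MODBUS); treat that CRC choice as an assumption.

```python
import struct


def crc16(data, crc=0xFFFF):
    """CRC16 with reflected polynomial A001h (assumed to match the DS BIOS CRC16)."""
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def passme_patch(header):
    """Return a copy of a 200h-byte NDS header with the PassMe entries applied."""
    h = bytearray(header)
    struct.pack_into("<I", h, 0x004, 0xE59FF018)  # LDR PC,[027FFE24h] when executed at 27FFE04h
    h[0x01F] = 0x04                               # autostart bit
    h[0x022] = 0x01                               # ARM9 ROM offset becomes nn01nnnnh
    struct.pack_into("<I", h, 0x024, 0x027FFE04)  # ARM9 entry: the patched opcode in the RAM header copy
    struct.pack_into("<I", h, 0x034, 0x080000C0)  # ARM7 entry: GBA slot
    struct.pack_into("<H", h, 0x15E, crc16(h[:0x15E]))
    return bytes(h)
```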

GBA Flashcard Format
Although the original PassMe requires only the entrypoint, PassMe programs should additionally contain one (or both) of the ID values below, allowing firmware patches to identify & start PassMe games without real PassMe hardware.
  0A0h  GBA-style Title    ("DSBooter")
0ACh GBA-style Gamecode ("PASS")
0C0h ARM7 Entrypoint (32bit ARM code)
Of course, that applies only to early homebrew programs; newer games should use normal NDS cartridge headers.

ARM9 Entrypoint
The GBA-slot access rights in the EXMEMCNT register are initially assigned to the ARM7 CPU, so the ARM9 cannot boot from the flashcard. Instead, it is switched into an endless loop in Main RAM (which contains a copy of the cartridge header at 27FFE00h and up). The ARM7 must thus copy the ARM9 code to Main RAM, and then set the ARM9 entry address by writing to [027FFE24h].


 DS Cartridge GBA Slot

Aside from the 17-pin NDS slot, the DS also includes a 32-pin GBA slot, which is used for the backwards-compatible GBA mode. Additionally, in DS mode, it can be used as an expansion port, or for importing data from GBA games.
In DS mode, ROM, SRAM, FLASH backup, and whatever peripherals are contained in older GBA cartridges can be accessed (almost) identically to GBA mode, see:
GBA Cartridges

Addressing
In DS mode, only one ROM-region is present at 8000000h-9FFFFFFh (ie. without the GBA's mirrored WS1 and WS2 regions at A000000h-DFFFFFFh). The expansion region (for SRAM/FLASH/etc) has been moved from E000000h-E00FFFFh (GBA-mode) to A000000h-A00FFFFh (DS-mode).

Timings
GBA timings are specified as "waitstates" (excluding 1 access cycle); NDS timings are specified as (total) "access time". Also, the NDS bus clock is twice as fast as for GBA. So, for "N" GBA waitstates, the NDS access time should be "(N+1)*2". Timings are controlled via NDS EXMEMCNT instead of GBA WAITCNT, see:
DS Memory Control - Cartridges and Main RAM

GBA EEPROM
EEPROMs in GBA carts cannot be accessed in DS mode. The EEPROMs should be accessed with 8 waits on GBA, ie. 18 cycles on NDS, for both 1st and 2nd access. But the 2nd access is restricted to max 6 cycles in NDS mode, which is way too fast.


 DS Cart Rumble Pak

DS Rumble Option Pak
The Rumble Pak comes bundled with Metroid Prime Pinball. It contains a small actuator made by ALPS to make it rumble. The original device (NTR-008) is sized like a normal GBA cartridge, and there's also a shorter variant for the DS-Lite (USG-006).
The rumble pak is pretty simple internally, it only wires up to a few pins on the GBA Cartridge Port:
  VCC, GND, /WR, AD1, and IRQ (grounded)
AD1 runs into a little 8 pin chip, which is probably just a latch on the rising edge of /WR. A line runs from this chip to a transistor that is directly connected to the actuator. The only other chip on the board is a 5 pin jobber, probably a power component.
For detection: AD1 seems to be pulled low when reading from it, and the other AD lines are open bus (containing the halfword address), so one can do:
  for i=0 to 0FFFh
    if halfword[8000000h+i*2]<>(i and FFFDh) then <not_a_ds_rumble_pak>
  next i
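The detection loop above, as a Python sketch. `read16(addr)` is a hypothetical callback standing in for a halfword read from the GBA slot; the test feeds it two simple bus models (open bus with AD1 pulled low, and plain open bus), which are assumptions rather than measured hardware behaviour.

```python
def is_ds_rumble_pak(read16):
    """True if the GBA-slot open-bus pattern matches the DS Rumble Pak (AD1 reads as 0)."""
    for i in range(0x1000):
        # Open bus returns the halfword address; the Rumble Pak clears bit1 (AD1).
        if read16(0x8000000 + i * 2) != (i & 0xFFFD):
            return False
    return True
```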
The actuator doesn't have an on/off setting like a motor; it rumbles when you switch it between the two settings. Switch frequently for a fast rumble, and fairly slowly for more of a 'tick' effect. That should be done via timer irq:
  rumble_state = rumble_state xor 0002h
  halfword[8000000h]=rumble_state
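A sketch of the timer-irq approach: toggle AD1 every `period` timer ticks, where a small period gives a fast rumble and a large period a 'tick' effect. `RumbleToggler` and the `write16(addr, value)` callback (standing in for a halfword store to the GBA slot) are hypothetical.

```python
class RumbleToggler:
    """Flip AD1 (bit1 of the halfword written to 8000000h) every `period` timer ticks."""

    def __init__(self, write16, period):
        self.write16, self.period = write16, period
        self.state = 0
        self.ticks = 0

    def on_timer_irq(self):
        self.ticks += 1
        if self.ticks >= self.period:
            self.ticks = 0
            self.state ^= 0x0002                  # rumble_state = rumble_state xor 0002h
            self.write16(0x8000000, self.state)   # halfword[8000000h] = rumble_state
```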
It is unknown whether one of the two states has higher power consumption than the other (ie. whether it's a "pull/release" mechanism). If so, rumble should be disabled by leaving it in the "release" state, which would be either AD1=0 or AD1=1...?
Note: The v3 firmware can detect the Rumble Pak as an option pak, but it does not provide an enable/disable rumble option in the alarm menu.

Other DS Rumble device
There's also another DS add-on with rumble. That device uses AD8 (instead of AD1) to control rumble, and it uses a classic motor (ie. it rumbles for as long as the latched AD8 value is "1"), see:
DS Cart Slider with Rumble

GBA Rumble Carts
There are also a few GBA games that contain built-in rumble, which could be used by NDS games as well. To be user-friendly, it's best to support both types, see:
GBA Cart Rumble


 DS Cart Slider with Rumble

Add-on device for the Japanese title Magukiddo. The optical sensor is attached underneath the console (connected to the GBA slot).
The sensor is an Agilent ADNS-2030 Low Power Optical Mouse Sensor (16pin DIP chip with built-in optical sensor, and external LED light source) with two-wire serial bus (CLK and DTA).

ADNS-2030 Registers (write 1 byte index, then read/write 1 byte data)
Index (Bit7=Direction; 0=Read, 1=Write):
  00h Product_ID (R) (03h)
01h Revision_ID (R) (10h=Rev. 1.0) (20h=Used in DS-option-pak)
02h Motion/Status Flags (R)
03h Delta_X (R) (signed 8bit) (automatically reset to 00h after reading)
04h Delta_Y (R) (signed 8bit) (automatically reset to 00h after reading)
05h SQUAL (R) (surface quality) (unsigned 8bit)
06h Average_Pixel (R) (unsigned 6bit, upper 2bit unused)
07h Maximum_Pixel (R) (unsigned 6bit, upper 2bit unused)
08h Reserved
09h Reserved
0Ah Configuration_bits (R/W)
0Bh Reserved
0Ch Data_Out_Lower (R)
0Dh Data_Out_Upper (R)
0Eh Shutter_Lower (R)
0Fh Shutter_Upper (R)
10h Frame_Period_Lower (R/W)
11h Frame_Period_Upper (R/W)
Motion/Status Flags:
  7 Motion since last report or PD (0=None, 1=Motion occurred)
6 Reserved
5 LED Fault detected (0=No fault, 1=Fault detected)
4 Delta Y Overflow (0=No overflow, 1=Overflow occurred)
3 Delta X Overflow (0=No overflow, 1=Overflow occurred)
2 Reserved
1 Reserved
0 Resolution in counts per inch (0=400, 1=800)
Configuration_bits:
  7 Reset Power up defaults (W) (0=No, 1=Reset)
6 LED Shutter Mode (0=LED always on, 1=LED only on when shutter is open)
5 Self Test (W) (0=No, 1=Perform all self tests)
4 Resolution in counts per inch (0=400, 1=800)
3 Dump 16x16 Pixel bitmap (0=No, 1=Dump via Data_Out ports)
2 Reserved
1 Reserved
0 Sleep Mode (0=Normal/Sleep after 1 second, 1=Always awake)
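Building the index byte and interpreting the register values can be sketched as follows. `adns_index` and `decode_motion` are hypothetical helpers; they only build the index byte (Bit7=Direction) and decode values already read over the serial bus, without modelling the CLK/DTA bit transfers.

```python
def adns_index(register, write=False):
    """Index byte sent before the data byte; Bit7 selects the direction (1=Write)."""
    return (register & 0x7F) | (0x80 if write else 0x00)


def to_signed8(value):
    """Delta_X/Delta_Y are signed 8bit values."""
    return value - 0x100 if value & 0x80 else value


def decode_motion(status, delta_x, delta_y):
    """Interpret raw reads of Motion/Status (02h), Delta_X (03h) and Delta_Y (04h)."""
    return {
        "motion": bool(status & 0x80),
        "led_fault": bool(status & 0x20),
        "overflow_y": bool(status & 0x10),
        "overflow_x": bool(status & 0x08),
        "cpi": 800 if status & 0x01 else 400,
        "dx": to_signed8(delta_x),
        "dy": to_signed8(delta_y),
    }
```

Since Delta_X/Delta_Y are reset to 00h after reading, the caller would accumulate successive `dx`/`dy` values itself.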
                         _______
                        |74273  |
  /WR ----------------->|CLK    |                          _____
  AD1/SIO CLK --------->|D1   Q1|------------------> CLK  |74125|
  AD2 power control --->|D2   Q2|--->     ____            |     |
  AD3/SIO DIR --------->|D3   Q3|------+-|7400\___________|/EN  |
  AD8 rumble on/off --->|D?   Q?|--->  +-|____/           |     |
  AD0/SIO DTA ----+---->|D5   Q5|-------------------------|A   Y|--+--DTA
                  |     |_______|                         |- - -|  |
          ____    +---------------------------------------|Y   A|--+
  /RD ---|7400\______ ____                                |     |
  /RD ---|____/      |7400\_______________________________|/EN  |
  A19 _______________|____/                               |_____|
  7400 Quad NAND Gate, 74273 8bit Latch, 74125 Quad Tri-State Buffer

AD0 Optical Sensor Serial Data (0=Low, 1=High)
AD1 Optical Sensor Serial Clock (0=Low, 1=High)
AD2 Optical Sensor Power (0=Off, 1=On)
AD3 Optical Sensor Serial Direction (0=Read, 1=Write)
AD8 Rumble Motor (0=Off, 1=On)

Thanks: Daniel Palmer


 DS Cart Unknown Add-Ons

DS Memory Expansion Pak (NTR-011 or USG-007)
Connects to the GBA slot. Reportedly used for an NDS browser. Might also be used by other bloated programs. Up to now, access times, memory region, bus-width, chip types, and even the memory size seem to be totally unknown.


 DS Cart Cheat Action Replay DS

The first commercial DS cheat code solution, this device was developed by Datel. It supports swapping out cartridges after loading the AR software. For updating, the user may either manually enter codes or use the proprietary USB cable that comes with the device. The user has been able to manually update codes since firmware version 1.52.

Action Replay DS Codes
  ABCD-NNNNNNNN       Game ID ;ASCII Gamecode [00Ch] and CRC32 across [0..1FFh]
00000000 XXXXXXXX manual hook codes (rarely used) (default is auto hook)
0XXXXXXX YYYYYYYY word[XXXXXXX+offset] = YYYYYYYY
1XXXXXXX 0000YYYY half[XXXXXXX+offset] = YYYY
2XXXXXXX 000000YY byte[XXXXXXX+offset] = YY
3XXXXXXX YYYYYYYY IF YYYYYYYY > word[XXXXXXX] ;unsigned ;\
4XXXXXXX YYYYYYYY IF YYYYYYYY < word[XXXXXXX] ;unsigned ; for v1.54,
5XXXXXXX YYYYYYYY IF YYYYYYYY = word[XXXXXXX] ; when X=0,
6XXXXXXX YYYYYYYY IF YYYYYYYY <> word[XXXXXXX] ; uses
7XXXXXXX ZZZZYYYY IF YYYY > ((not ZZZZ) AND half[XXXXXXX]) ; [offset]
8XXXXXXX ZZZZYYYY IF YYYY < ((not ZZZZ) AND half[XXXXXXX]) ; instead of
9XXXXXXX ZZZZYYYY IF YYYY = ((not ZZZZ) AND half[XXXXXXX]) ; [XXXXXXX]
AXXXXXXX ZZZZYYYY IF YYYY <> ((not ZZZZ) AND half[XXXXXXX]) ;/
BXXXXXXX 00000000 offset = word[XXXXXXX+offset]
C0000000 YYYYYYYY FOR loopcount=0 to YYYYYYYY ;execute Y+1 times
C4000000 00000000 offset = address of the C4000000 code ;v1.54
C5000000 XXXXYYYY counter=counter+1, IF (counter AND YYYY) = XXXX ;v1.54
C6000000 XXXXXXXX [XXXXXXXX]=offset ;v1.54
D0000000 00000000 ENDIF
D1000000 00000000 NEXT loopcount
D2000000 00000000 NEXT loopcount, and then FLUSH everything
D3000000 XXXXXXXX offset = XXXXXXXX
D4000000 XXXXXXXX datareg = datareg + XXXXXXXX
D5000000 XXXXXXXX datareg = XXXXXXXX
D6000000 XXXXXXXX word[XXXXXXXX+offset]=datareg, offset=offset+4
D7000000 XXXXXXXX half[XXXXXXXX+offset]=datareg, offset=offset+2
D8000000 XXXXXXXX byte[XXXXXXXX+offset]=datareg, offset=offset+1
D9000000 XXXXXXXX datareg = word[XXXXXXXX+offset]
DA000000 XXXXXXXX datareg = half[XXXXXXXX+offset]
DB000000 XXXXXXXX datareg = byte[XXXXXXXX+offset] ;bugged on pre-v1.54
DC000000 XXXXXXXX offset = offset + XXXXXXXX
EXXXXXXX YYYYYYYY Copy YYYYYYYY parameter bytes to [XXXXXXX+offset...]
44332211 88776655 parameter bytes 1..8 for above code (example)
0000AA99 00000000 parameter bytes 9..10 for above code (padded with 00s)
FXXXXXXX YYYYYYYY Copy YYYYYYYY bytes from [offset..] to [XXXXXXX...]
IF/ENDIF can be nested up to 32 times. FOR/NEXT cannot be nested; any FOR statement forcefully terminates any prior loop. FOR backs up the current IF condition flags, and NEXT restores these flags, so ENDIF(s) aren't required inside of the loop. The NEXT+FLUSH command (after finishing the loop) resets offset=0 and datareg=0, and clears all condition flags, so further ENDIF(s) aren't required after the loop.
Before v1.54, the DB000000 code accidentally set offset=offset+XXXXXXXX after execution of the code. For all word/halfword accesses, the address should be aligned accordingly. For the COPY commands, addresses should be aligned by four (all data is copied with ldr/str, except that, on odd lengths, the last 1..3 bytes use ldrb/strb).
offset, datareg, loopcount, and counter are internal registers in the action replay software.

Note: The condition flags are checked for all code types except D0, D1 and D2. For the C5 code type, they are checked AFTER the counter has been incremented (so the counter is always incremented, even when the condition is false).
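The register model described above (offset, datareg, condition flags) can be illustrated with a minimal interpreter. This is a hypothetical sketch, not Datel's code: it covers only code types 0/1/2, 5/6 (word IF = / IF <>, without the v1.54 X=0 variant), D0, D3, D5, D6 and DC, ignores loops, copies and all other types, and models memory as a little-endian byte array (`Memory` is an illustrative class).

```python
class Memory:
    """Hypothetical flat little-endian RAM model starting at address `base`."""

    def __init__(self, base, size):
        self.base, self.data = base, bytearray(size)

    def read(self, addr, size):
        i = addr - self.base
        return int.from_bytes(self.data[i:i + size], "little")

    def write(self, addr, size, value):
        i = addr - self.base
        self.data[i:i + size] = (value & ((1 << (8 * size)) - 1)).to_bytes(size, "little")


def run_ards(codes, mem):
    """Execute (left, right) code pairs; returns the final (offset, datareg)."""
    offset = datareg = 0
    conds = []  # one flag per open IF; a code runs only while all flags are True
    for left, right in codes:
        op = left >> 28
        addr = left & 0x0FFFFFFF
        if left >> 24 == 0xD0:  # ENDIF ignores the condition flags
            if conds:
                conds.pop()
            continue
        active = all(conds)
        if op in (5, 6):  # IF YYYYYYYY = / <> word[XXXXXXX] (no offset added)
            if active:
                word = mem.read(addr, 4)
                conds.append(right == word if op == 5 else right != word)
            else:
                conds.append(False)  # keep IF/ENDIF nesting balanced
            continue
        if not active:
            continue
        if op in (0, 1, 2):  # word/half/byte[XXXXXXX+offset] = Y
            mem.write(addr + offset, {0: 4, 1: 2, 2: 1}[op], right)
        elif left >> 24 == 0xD3:
            offset = right
        elif left >> 24 == 0xD5:
            datareg = right
        elif left >> 24 == 0xD6:
            mem.write(right + offset, 4, datareg)
            offset = (offset + 4) & 0xFFFFFFFF
        elif left >> 24 == 0xDC:
            offset = (offset + right) & 0xFFFFFFFF
        else:
            raise NotImplementedError(f"code {left:08X} is not covered by this sketch")
    return offset, datareg
```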

Hook
The hook codes consist of a series of nine 00000000 XXXXXXXX codes, and must be marked as (M) code (so as not to be confused with normal 0XXXXXXX YYYYYYYY codes). For all nine codes, the left 32 bits are don't care (but should be zero); the meaning of the right 32 bits varies from the 1st to the 9th code.
  1st: Address used prior to launching game (eg. 23xxxxxh)
2nd: Address to write the hook at (inside the ARM7 executable)
3rd: Hook final address (huh?)
4th: Hook mode selection (0=auto, 1=mode1, 2=mode2)
5th: Opcode that replaces the hooked one (eg. E51DE004h)
6th: Address to store important stuff (default 23FE000h)
7th: Address to store the code handler (default 23FE074h)
8th: Address to store the code list (default 23FE564h)
9th: Must be 1 (00000001h)
For most games, the AR does automatically hook code on the ARM7. Doing that automatically is nice, but hooking ARM7 means that there is no access to VRAM, TCM and Cache, which <might> cause problems since efficient games <should> store all important data in TCM or Cache (though, in practice, I'd doubt that any existing NDS games are that efficient).

Thanks
To Kenobi and Dualscreenman from Kodewerx for above ARDS cheat info.


 DS Cart Cheat Codebreaker DS

This is Pelican's entry into the DS cheat-device industry. It supports swapping out the cartridges, and alternatively also gives the user the option of connecting another gamecard onto it. For updating, the user may either manually enter codes, or use Wifi to connect to the Codebreaker update site (that updating will overwrite all manually entered codes though).

Codebreaker DS Codes
  ---Initialization---
0000CR16 GAMECODE Specify Game ID, use Encrypted codes
8000CR16 GAMECODE Specify Game ID, use Unencrypted codes
BEEFC0DE XXXXXXXX Change Encryption Keys
A0XXXXXX YYYYYYYY Bootup-Hook 1, X=Address, Y=Value
A8XXXXXX YYYYYYYY Bootup-Hook 2, X=Address, Y=Value
F0XXXXXX TYYYYYYY Code-Hook 1 (T=Type,Y=CheatEngineAddr,X=HookAddr)
F8XXXXXX TPPPPPPP Code-Hook 2 (T=Type,X=CheatEngineHookAddr,P=Params)
---General codes---
00XXXXXX 000000YY [X]=YY
10XXXXXX 0000YYYY [X]=YYYY
20XXXXXX YYYYYYYY [X]=YYYYYYYY
60XXXXXX 000000YY ZZZZZZZZ 00000000 [[X]+Z]=YY
60XXXXXX 0000YYYY ZZZZZZZZ 10000000 [[X]+Z]=YYYY
60XXXXXX YYYYYYYY ZZZZZZZZ 20000000 [[X]+Z]=YYYYYYYY
30XXXXXX 000000YY [X]=[X] + YY
30XXXXXX 0001YYYY [X]=[X] + YYYY
38XXXXXX YYYYYYYY [X]=[X] + YYYYYYYY
70XXXXXX 000000YY [X]=[X] OR YY
70XXXXXX 001000YY [X]=[X] AND YY
70XXXXXX 002000YY [X]=[X] XOR YY
70XXXXXX 0001YYYY [X]=[X] OR YYYY
70XXXXXX 0011YYYY [X]=[X] AND YYYY
70XXXXXX 0021YYYY [X]=[X] XOR YYYY
---Memory fill/copy---
40XXXXXX 2NUMSTEP 000000YY 000000ZZ byte[X+(0..NUM-1)*STEP*1]=Y+(0..NUM-1)*Z
40XXXXXX 1NUMSTEP 0000YYYY 0000ZZZZ half[X+(0..NUM-1)*STEP*2]=Y+(0..NUM-1)*Z
40XXXXXX 0NUMSTEP YYYYYYYY ZZZZZZZZ word[X+(0..NUM-1)*STEP*4]=Y+(0..NUM-1)*Z
50XXXXXX YYYYYYYY ZZZZZZZZ 00000000 copy Y bytes from [X] to [Z]
---Conditional codes (bugged)---
60XXXXXX 000000YY ZZZZZZZZ 01c100VV IF [[X]+Z] .. VV THEN [[X]+Z]=YY
60XXXXXX 000000YY ZZZZZZZZ 01c0VVVV IF [[X]+Z] .. VVVV THEN [[X]+Z]=YY
60XXXXXX 0000YYYY ZZZZZZZZ 11c100VV IF [[X]+Z] .. VV THEN [[X]+Z]=YYYY
60XXXXXX 0000YYYY ZZZZZZZZ 11c0VVVV IF [[X]+Z] .. VVVV THEN [[X]+Z]=YYYY
60XXXXXX YYYYYYYY ZZZZZZZZ 21c100VV IF [[X]+Z] .. VV THEN [[X]+Z]=YYYYYYYY
60XXXXXX YYYYYYYY ZZZZZZZZ 21c0VVVV IF [[X]+Z] .. VVVV THEN [[X]+Z]=YYYYYYYY
---Conditional codes (working)---
D0XXXXXX NNc100YY IF [X] .. YY THEN exec max(1,NN) lines
D0XXXXXX NNc0YYYY IF [X] .. YYYY THEN exec max(1,NN) lines
The condition digits (c=0..7) have the following functions:
  0 IF [mem] =  imm THEN ...             4 IF ([mem] AND imm) =  0   THEN ...
  1 IF [mem] <> imm THEN ...             5 IF ([mem] AND imm) <> 0   THEN ...
  2 IF [mem] <  imm THEN ... (unsigned)  6 IF ([mem] AND imm) =  imm THEN ...
  3 IF [mem] >  imm THEN ... (unsigned)  7 IF ([mem] AND imm) <> imm THEN ...
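A sketch of evaluating the condition digit and splitting a working D0XXXXXX line into its fields. Both helpers are hypothetical; the 27bit address mask follows the Notes below (XXXXXX is a 27bit address overlapping the 5bit code number), and the byte/halfword choice follows the NNc100YY / NNc0YYYY forms.

```python
def cb_condition(c, mem, imm):
    """Evaluate condition digit c (0..7) for a memory value and an immediate (unsigned)."""
    if c == 0: return mem == imm
    if c == 1: return mem != imm
    if c == 2: return mem < imm
    if c == 3: return mem > imm
    if c == 4: return (mem & imm) == 0
    if c == 5: return (mem & imm) != 0
    if c == 6: return (mem & imm) == imm
    if c == 7: return (mem & imm) != imm
    raise ValueError("condition digit must be 0..7")


def decode_d0(left, right):
    """Return (address, lines, c, size_in_bytes, imm) for a D0XXXXXX NNc... line."""
    address = left & 0x07FFFFFF           # 27bit address overlapping the code number
    lines = max(1, right >> 24)           # NN, executes at least one line
    cond = (right >> 20) & 0xF            # condition digit c
    if (right >> 16) & 0xF == 1:          # NNc100YY: byte compare
        return address, lines, cond, 1, right & 0xFF
    return address, lines, cond, 2, right & 0xFFFF   # NNc0YYYY: halfword compare
```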
Notes
  GAMECODE  Cartridge Header[00Ch] (32bit in reversed byte-order)
CR16 Cartridge Header[15Eh] (16bit in normal byte-order)
XXXXXX 27bit addr (actually 7 digits, XXXXXXX, overlaps 5bit code number)
The "bugged" conditional codes (60XXXXXX) accidentally skip NN lines when the condition is false, where NN is taken from the upper 8bit of the code's last 32bit value (ie. exactly as for the D0XXXXXX codes). For byte-writes, that would be NN=01h, which can possibly be dealt with, although there may be compatibility problems with future versions that might fix that bug. For halfword/word writes, NN would be 11h or 21h, so those codes are almost totally unusable.

Codebreaker DS / Encrypted Codes
The overall "address value" decryption works like so:
  for i=4Fh to 00h
y=77628ECFh
if i>13h then y=59E5DC8Ah
if i>27h then y=054A7818h
if i>3Bh then y=B1BF0855h
address = (Key0-value) xor address
value = value - Key1 - (address ror 1Bh)
address = (address xor (value + y)) ror 13h
if (i>13h) then
if (i<=27h) or (i>3Bh) then x=Key2 xor Key1 xor Key0
else x=((Key2 xor Key1) and Key0) xor (Key1 and Key2)
value=value xor (x+y+address)
x = Secure[((i*4+00h) and FCh)+000h]
x = Secure[((i*4+34h) and FCh)+100h] xor x
x = Secure[((i*4+20h) and FCh)+200h] xor x
x = Secure[((i*4+08h) and FCh)+300h] xor x
address = address - (x ror 19h)
next i
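A Python sketch of the decryption loop above. Assumptions, labeled loudly: `Secure[n]` is read as the 32bit little-endian word at byte offset n of the 800h-byte buffer, and the `value=value xor (x+y+address)` step is only executed when i>13h (ie. inside that IF block). `cb_encrypt` is not from the source; it is the step-by-step inverse of `cb_decrypt`, derived here only so the round trip can be tested. The keys and Secure buffer would come from the key setup described below.

```python
MASK = 0xFFFFFFFF


def ror(v, n):
    """Rotate a 32bit value right by n (1..31) bits."""
    return ((v >> n) | (v << (32 - n))) & MASK


def rol(v, n):
    return ror(v, 32 - n)


def secure_word(secure, offs):
    """32bit little-endian word at byte offset `offs` of the 800h-byte Secure buffer."""
    return int.from_bytes(secure[offs:offs + 4], "little")


def _round_consts(i, keys, secure):
    k0, k1, k2 = keys[0], keys[1], keys[2]
    y = 0x77628ECF
    if i > 0x13: y = 0x59E5DC8A
    if i > 0x27: y = 0x054A7818
    if i > 0x3B: y = 0xB1BF0855
    if i <= 0x27 or i > 0x3B:
        x = k2 ^ k1 ^ k0
    else:
        x = ((k2 ^ k1) & k0) ^ (k1 & k2)
    s = secure_word(secure, ((i * 4 + 0x00) & 0xFC) + 0x000)
    s ^= secure_word(secure, ((i * 4 + 0x34) & 0xFC) + 0x100)
    s ^= secure_word(secure, ((i * 4 + 0x20) & 0xFC) + 0x200)
    s ^= secure_word(secure, ((i * 4 + 0x08) & 0xFC) + 0x300)
    return y, x, s


def cb_decrypt(address, value, keys, secure):
    """Decrypt one 'address value' code pair."""
    k0, k1 = keys[0], keys[1]
    for i in range(0x4F, -1, -1):
        y, x, s = _round_consts(i, keys, secure)
        address = ((k0 - value) & MASK) ^ address
        value = (value - k1 - ror(address, 0x1B)) & MASK
        address = ror(address ^ ((value + y) & MASK), 0x13)
        if i > 0x13:  # assumed to belong to the i>13h branch
            value ^= (x + y + address) & MASK
        address = (address - ror(s, 0x19)) & MASK
    return address, value


def cb_encrypt(address, value, keys, secure):
    """Inverse of cb_decrypt (derived for testing, not documented device behaviour)."""
    k0, k1 = keys[0], keys[1]
    for i in range(0x50):
        y, x, s = _round_consts(i, keys, secure)
        address = (address + ror(s, 0x19)) & MASK
        if i > 0x13:
            value ^= (x + y + address) & MASK
        address = rol(address, 0x13) ^ ((value + y) & MASK)
        value = (value + k1 + ror(address, 0x1B)) & MASK
        address = ((k0 - value) & MASK) ^ address
    return address, value
```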
Upon startup, the initial key settings are:
  Secure[0..7FFh] = Copy of the ENCRYPTED 1st 2Kbytes of the game's Secure Area
  Key0 = 0C2EAB3Eh, Key1 = E2AE295Dh, Key2 = E1ACC3FFh, Key3 = 70D3AF46h
  scramble_keys
Upon BEEFC0DE XXXXXXXX, the keys get changed like so:
  Key0 = Key0 + (XXXXXXXX ror 1Dh)
  Key1 = Key1 - (XXXXXXXX ror 05h)
  Key2 = Key2 xor (Key3 xor Key0)
  Key3 = Key3 xor (Key2 - Key1)
  scramble_keys
The above scramble_keys function works like so:
  for i=0 to FFh
    y = byte(xlat_table[i])
    Secure[i*4+000h] = (Secure[i*4+000h] xor Secure[y*4]) + Secure[y*4+100h]
    Secure[i*4+400h] = (Secure[i*4+400h] xor Secure[y*4]) - Secure[y*4+200h]
  next i
  for i=0 to 63h
    Key0 = Key0 xor (Secure[i*4] + Secure[i*4+190h])
    Key1 = Key1 xor (Secure[i*4] + Secure[i*4+320h])
    Key2 = Key2 xor (Secure[i*4] + Secure[i*4+4B0h])
    Key3 = Key3 xor (Secure[i*4] + Secure[i*4+640h])
  next i
  Key0 = Key0 - Secure[7D0h]
  Key1 = Key1 xor Secure[7E0h]
  Key2 = Key2 + Secure[7F0h]
  Key3 = Key3 xor Secure[7D0h] xor Secure[7F0h]
The xlat_table consists of 256 fixed 8bit values:
  34h,59h,00h,32h,7Bh,D3h,32h,C9h,9Bh,77h,75h,44h,E0h,73h,46h,06h
0Bh,88h,B3h,3Eh,ACh,F2h,BAh,FBh,2Bh,56h,FEh,7Ah,90h,F7h,8Dh,BCh
8Bh,86h,9Ch,89h,00h,19h,CDh,4Ch,54h,30h,01h,93h,30h,01h,FCh,36h
4Dh,9Fh,FDh,D7h,32h,94h,AEh,BCh,2Bh,61h,DFh,B3h,44h,EAh,8Bh,A3h
2Bh,53h,33h,54h,42h,27h,21h,DFh,A9h,DDh,C0h,35h,58h,EFh,8Bh,33h
B4h,D3h,1Bh,C7h,93h,AEh,32h,30h,F1h,CDh,A8h,8Ah,47h,8Ch,70h,0Ch
17h,4Eh,0Eh,A2h,85h,0Dh,6Eh,37h,4Ch,39h,1Fh,44h,98h,26h,D8h,A1h
B6h,54h,F3h,AFh,98h,83h,74h,0Eh,13h,6Eh,F4h,F7h,86h,80h,ECh,8Eh
EEh,4Ah,05h,A1h,F1h,EAh,B4h,D6h,B8h,65h,8Ah,39h,B3h,59h,11h,20h
B6h,BBh,4Dh,88h,68h,24h,12h,9Bh,59h,38h,06h,FAh,15h,1Dh,40h,F0h
01h,77h,57h,F5h,5Dh,76h,E5h,F1h,51h,7Dh,B4h,FAh,7Eh,D6h,32h,4Fh
0Eh,C8h,61h,C1h,EEh,FBh,2Ah,FCh,ABh,EAh,97h,D5h,5Dh,E8h,FAh,2Ch
06h,CCh,86h,D2h,8Ch,10h,D7h,4Ah,CEh,8Fh,EBh,03h,16h,ADh,84h,98h
F5h,88h,2Ah,18h,ACh,7Fh,F6h,94h,FBh,3Fh,00h,B6h,32h,A2h,ABh,28h
64h,5Ch,0Fh,C6h,23h,12h,0Ch,D2h,BAh,4Dh,A3h,F2h,C9h,86h,31h,57h
0Eh,F8h,ECh,E1h,A0h,9Ah,3Ch,65h,17h,18h,A0h,81h,D0h,DBh,D5h,AEh
All operations are unsigned 32bit integer operations.
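The key setup, key change and scramble_keys steps can be sketched in Python, with XLAT copied from the table above. As for the decryption sketch, `Secure[n]` is assumed to be the 32bit little-endian word at byte offset n; the function names are hypothetical. The test only checks internal consistency (eg. an all-zero Secure buffer leaves the keys unchanged), not real Codebreaker output.

```python
MASK = 0xFFFFFFFF

XLAT = bytes.fromhex(
    "34 59 00 32 7B D3 32 C9 9B 77 75 44 E0 73 46 06 "
    "0B 88 B3 3E AC F2 BA FB 2B 56 FE 7A 90 F7 8D BC "
    "8B 86 9C 89 00 19 CD 4C 54 30 01 93 30 01 FC 36 "
    "4D 9F FD D7 32 94 AE BC 2B 61 DF B3 44 EA 8B A3 "
    "2B 53 33 54 42 27 21 DF A9 DD C0 35 58 EF 8B 33 "
    "B4 D3 1B C7 93 AE 32 30 F1 CD A8 8A 47 8C 70 0C "
    "17 4E 0E A2 85 0D 6E 37 4C 39 1F 44 98 26 D8 A1 "
    "B6 54 F3 AF 98 83 74 0E 13 6E F4 F7 86 80 EC 8E "
    "EE 4A 05 A1 F1 EA B4 D6 B8 65 8A 39 B3 59 11 20 "
    "B6 BB 4D 88 68 24 12 9B 59 38 06 FA 15 1D 40 F0 "
    "01 77 57 F5 5D 76 E5 F1 51 7D B4 FA 7E D6 32 4F "
    "0E C8 61 C1 EE FB 2A FC AB EA 97 D5 5D E8 FA 2C "
    "06 CC 86 D2 8C 10 D7 4A CE 8F EB 03 16 AD 84 98 "
    "F5 88 2A 18 AC 7F F6 94 FB 3F 00 B6 32 A2 AB 28 "
    "64 5C 0F C6 23 12 0C D2 BA 4D A3 F2 C9 86 31 57 "
    "0E F8 EC E1 A0 9A 3C 65 17 18 A0 81 D0 DB D5 AE"
)


def ror(v, n):
    return ((v >> n) | (v << (32 - n))) & MASK


def _rd(buf, offs):
    return int.from_bytes(buf[offs:offs + 4], "little")


def _wr(buf, offs, value):
    buf[offs:offs + 4] = (value & MASK).to_bytes(4, "little")


def scramble_keys(keys, secure):
    """In-place scramble of keys (list of 4 ints) and secure (bytearray of 800h bytes)."""
    for i in range(0x100):
        y = XLAT[i]
        _wr(secure, i * 4, (_rd(secure, i * 4) ^ _rd(secure, y * 4)) + _rd(secure, y * 4 + 0x100))
        _wr(secure, i * 4 + 0x400,
            (_rd(secure, i * 4 + 0x400) ^ _rd(secure, y * 4)) - _rd(secure, y * 4 + 0x200))
    for i in range(0x64):
        s = _rd(secure, i * 4)
        keys[0] ^= (s + _rd(secure, i * 4 + 0x190)) & MASK
        keys[1] ^= (s + _rd(secure, i * 4 + 0x320)) & MASK
        keys[2] ^= (s + _rd(secure, i * 4 + 0x4B0)) & MASK
        keys[3] ^= (s + _rd(secure, i * 4 + 0x640)) & MASK
    keys[0] = (keys[0] - _rd(secure, 0x7D0)) & MASK
    keys[1] ^= _rd(secure, 0x7E0)
    keys[2] = (keys[2] + _rd(secure, 0x7F0)) & MASK
    keys[3] ^= _rd(secure, 0x7D0) ^ _rd(secure, 0x7F0)


def init_keys(encrypted_secure_2k):
    """Startup state from a copy of the ENCRYPTED first 2K of the game's Secure Area."""
    secure = bytearray(encrypted_secure_2k[:0x800])
    keys = [0x0C2EAB3E, 0xE2AE295D, 0xE1ACC3FF, 0x70D3AF46]
    scramble_keys(keys, secure)
    return keys, secure


def change_keys(keys, secure, x):
    """Handle a BEEFC0DE XXXXXXXX code (x = XXXXXXXX)."""
    keys[0] = (keys[0] + ror(x, 0x1D)) & MASK
    keys[1] = (keys[1] - ror(x, 0x05)) & MASK
    keys[2] ^= keys[3] ^ keys[0]
    keys[3] ^= (keys[2] - keys[1]) & MASK
    scramble_keys(keys, secure)
```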

Thanks
To Kenobi and Dualscreenman from Kodewerx for above CBDS cheat info.


 DS Encryption by Gamecode/Idcode (KEY1)

KEY1 - Gamecode / Idcode Encryption
The KEY1 encryption relies only on the gamecode (or firmware idcode), it does not contain any random components. The fact that KEY1 encrypted commands appear random is just because the <unencrypted> commands contain random values, so the encryption result looks random.

KEY1 encryption is used for KEY1 encrypted gamecart commands (ie. for loading the secure area). It is also used for resolving the extra decryption of the first 2K of the secure area, for firmware decryption, and to decode some encrypted values in the gamecart/firmware header.

Below are KEY1 encryption formulas. The formulas can be used only with a copy of the key table (1048h bytes) from the NDS ARM7 BIOS (address 0030h..1077h).

encrypt_64bit(ptr) / decrypt_64bit(ptr)
  Y=[ptr+0]
  X=[ptr+4]
  FOR I=0 TO 0Fh (encrypt), or FOR I=11h TO 02h (decrypt)
    Z=[keybuf+I*4] XOR X
    X=[keybuf+048h+((Z SHR 24) AND FFh)*4]
    X=[keybuf+448h+((Z SHR 16) AND FFh)*4] + X
    X=[keybuf+848h+((Z SHR 8) AND FFh)*4] XOR X
    X=[keybuf+C48h+((Z SHR 0) AND FFh)*4] + X
    X=Y XOR X
    Y=Z
  NEXT I
  [ptr+0]=X XOR [keybuf+40h] (encrypt), or [ptr+0]=X XOR [keybuf+4h] (decrypt)
  [ptr+4]=Y XOR [keybuf+44h] (encrypt), or [ptr+4]=Y XOR [keybuf+0h] (decrypt)
apply_keycode(modulo)
  encrypt_64bit(keycode+4)
  encrypt_64bit(keycode+0)
  [scratch]=0000000000000000h          ;S=0 (64bit)
  FOR I=0 TO 44h STEP 4                ;xor with reversed byte-order (bswap)
    [keybuf+I]=[keybuf+I] XOR bswap_32bit([keycode+(I MOD modulo)])
  NEXT I
  FOR I=0 TO 1040h STEP 8
    encrypt_64bit(scratch)             ;encrypt S (64bit) by keybuf
    [keybuf+I+0]=[scratch+4]           ;write S to keybuf (first upper 32bit)
    [keybuf+I+4]=[scratch+0]           ;write S to keybuf (then lower 32bit)
  NEXT I
init_keycode(idcode,level,modulo)
  copy [arm7bios+0030h..1077h] to [keybuf+0..1047h]
  [keycode+0]=[idcode]
  [keycode+4]=[idcode]/2
  [keycode+8]=[idcode]*2
  IF level>=1 THEN apply_keycode(modulo)  ;first apply (always)
  IF level>=2 THEN apply_keycode(modulo)  ;second apply (optional)
  [keycode+4]=[keycode+4]*2
  [keycode+8]=[keycode+8]/2
  IF level>=3 THEN apply_keycode(modulo)  ;third apply (optional)
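A Python sketch of the three functions above. The real 1048h-byte key table from the ARM7 BIOS is not reproduced here, so the functions take it as a `bios_table` parameter; with a synthetic table (as in the test), this only demonstrates that `decrypt_64bit` undoes `encrypt_64bit`, not that the output matches real hardware. Assumptions: words are little-endian (native ARM order), and `idcode` is the 32bit value read from the given header address.

```python
import struct

MASK = 0xFFFFFFFF


def _rd(buf, offs):
    """32bit little-endian word at byte offset `offs`."""
    return struct.unpack_from("<I", buf, offs)[0]


def _f(keybuf, z):
    """Round function built from the four 256-entry tables at 048h, 448h, 848h, C48h."""
    x = _rd(keybuf, 0x048 + ((z >> 24) & 0xFF) * 4)
    x = (_rd(keybuf, 0x448 + ((z >> 16) & 0xFF) * 4) + x) & MASK
    x = _rd(keybuf, 0x848 + ((z >> 8) & 0xFF) * 4) ^ x
    return (_rd(keybuf, 0xC48 + (z & 0xFF) * 4) + x) & MASK


def encrypt_64bit(keybuf, block):
    y, x = struct.unpack("<II", block)          # Y=[ptr+0], X=[ptr+4]
    for i in range(0x00, 0x10):
        z = _rd(keybuf, i * 4) ^ x
        x, y = _f(keybuf, z) ^ y, z
    return struct.pack("<II", x ^ _rd(keybuf, 0x40), y ^ _rd(keybuf, 0x44))


def decrypt_64bit(keybuf, block):
    y, x = struct.unpack("<II", block)
    for i in range(0x11, 0x01, -1):             # I=11h down to 02h
        z = _rd(keybuf, i * 4) ^ x
        x, y = _f(keybuf, z) ^ y, z
    return struct.pack("<II", x ^ _rd(keybuf, 0x04), y ^ _rd(keybuf, 0x00))


def apply_keycode(keybuf, keycode, modulo):
    keycode[4:12] = encrypt_64bit(keybuf, bytes(keycode[4:12]))
    keycode[0:8] = encrypt_64bit(keybuf, bytes(keycode[0:8]))
    for i in range(0, 0x48, 4):
        # ">I" reads the keycode word with reversed byte order (bswap_32bit)
        k = struct.unpack_from(">I", keycode, i % modulo)[0]
        struct.pack_into("<I", keybuf, i, _rd(keybuf, i) ^ k)
    scratch = bytes(8)
    for i in range(0, 0x1048, 8):
        scratch = encrypt_64bit(keybuf, scratch)
        keybuf[i:i + 4] = scratch[4:8]          # upper 32bit first
        keybuf[i + 4:i + 8] = scratch[0:4]      # then lower 32bit


def init_keycode(bios_table, idcode, level, modulo):
    """bios_table = the 1048h bytes from ARM7 BIOS 0030h..1077h (not included here)."""
    keybuf = bytearray(bios_table[:0x1048])
    keycode = bytearray(struct.pack("<III", idcode, idcode >> 1, (idcode << 1) & MASK))
    if level >= 1:
        apply_keycode(keybuf, keycode, modulo)
    if level >= 2:
        apply_keycode(keybuf, keycode, modulo)
    struct.pack_into("<I", keycode, 4, (_rd(keycode, 4) << 1) & MASK)
    struct.pack_into("<I", keycode, 8, _rd(keycode, 8) >> 1)
    if level >= 3:
        apply_keycode(keybuf, keycode, modulo)
    return keybuf
```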
firmware_decryption
  init_keycode(firmware_header+08h,1,0Ch) ;idcode (usually "MACP"), level 1
  decrypt_64bit(firmware_header+18h)       ;rominfo
  init_keycode(firmware_header+08h,2,0Ch)  ;idcode (usually "MACP"), level 2
  decrypt ARM9 and ARM7 bootcode by decrypt_64bit (each 8 bytes)
  decompress ARM9 and ARM7 bootcode by LZ77 function (swi)
  calc CRC16 on decrypted/decompressed ARM9 bootcode followed by ARM7 bootcode
Note: The sizes of the compressed/encrypted bootcode areas are unknown (until they are fully decompressed), one way to solve that problem is to decrypt the next 8 bytes each time when the decompression function requires more data.

gamecart_decryption
  init_keycode(cart_header+0Ch,1,08h)   ;gamecode, level 1, modulo 8
  decrypt_64bit(cart_header+78h)        ;rominfo (secure area disable)
  init_keycode(cart_header+0Ch,2,08h)   ;gamecode, level 2, modulo 8
  encrypt_64bit all KEY1 commands (1st command byte in MSB of 64bit value)
  after loading the secure_area, calculate secure_area crc, then
  decrypt_64bit(secure_area+0)          ;first 8 bytes of secure area
  init_keycode(cart_header+0Ch,3,08h)   ;gamecode, level 3, modulo 8
  decrypt_64bit(secure_area+0..7F8h)    ;each 8 bytes in first 2K of secure
After decryption, the ID field in the first 8 bytes should be "encryObj". If it matches, then the first 8 bytes are filled with E7FFDEFFh; otherwise the whole 2K are filled with that value.
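The "encryObj" check can be sketched as below. The helper name is hypothetical; it expects the first 2K after the final decryption step, and it assumes the fill value is stored as a little-endian 32bit word (FFh,DEh,FFh,E7h in memory).

```python
import struct

FILL = struct.pack("<I", 0xE7FFDEFF)  # E7FFDEFFh as stored in little-endian memory


def finish_secure_area(decrypted_2k):
    """Apply the "encryObj" check to the already-decrypted first 2K of the secure area."""
    area = bytearray(decrypted_2k[:0x800])
    if area[0:8] == b"encryObj":
        area[0:8] = FILL * 2              # ID matched: only the ID field is overwritten
    else:
        area[:] = FILL * (0x800 // 4)     # mismatch: the whole 2K are overwritten
    return bytes(area)
```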

Gamecart Command Register
Observe that the byte order of the command register [40001A8h] is reversed. The CPU stores 64bit data in memory (and the "encrypt_64bit" function for KEY1-encrypted commands expects data in memory) with the LSB at [addr+0] and the MSB at [addr+7]. This value is to be transferred MSB first. However, the DS hardware transfers [40001A8h+0] first, and [40001A8h+7] last. So, the byte order must be reversed when copying the value from memory to the command register.
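The byte reversal can be expressed in two equivalent ways, sketched below with hypothetical helper names: from the 64bit value itself (MSB goes to [40001A8h+0]), or from the little-endian copy of that value in RAM.

```python
def command_register_bytes(command):
    """Bytes for 40001A8h+0..7: +0 is transferred first, so it must hold the MSB."""
    return command.to_bytes(8, "big")


def reverse_for_register(mem8):
    """Same result, starting from the value as stored in RAM (LSB at [addr+0])."""
    return bytes(reversed(mem8))
```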

Note
The KEY1 encryption is based on Bruce Schneier's "Blowfish Encryption Algorithm".


 DS Encryption by Random Seed (KEY2)

KEY2 39bit Seed Values
The pre-initialization settings at cartridge-side (after reset) are:
  Seed0 = 58C56DE0E8h
Seed1 = 5C879B9B05h
The post-initialization settings (after sending command 4llllmmmnnnkkkkkh to the cartridge, and after writing the Seed values to Port 40001Bxh) are:
  Seed0 = (mmmnnn SHL 15)+6000h+Seedbyte
Seed1 = 5C879B9B05h
The seedbyte is selected by Cartridge Header [013h].Bit0-2, this index value should be usually in range 0..5, however, possible values for index 0..7 are: E8h,4Dh,5Ah,B1h,17h,8Fh,99h,D5h.
The 24bit random value (mmmnnn) is derived from the real time clock setting, and is also scattered by KEY1 encryption; anyway, it's just random, and it doesn't really matter where it comes from.

KEY2 Encryption
Relies on two 39bit registers (x and y), which are initialized as such:
  x = reversed_bit_order(seed0)  ;ie. LSB(bit0) exchanged with MSB(bit38), etc.
  y = reversed_bit_order(seed1)
During transfer, x, y, and transferred data are modified as such:
  x = (((x shr 5)xor(x shr 17)xor(x shr 18)xor(x shr 31)) and 0FFh)+(x shl 8)
  y = (((y shr 5)xor(y shr 23)xor(y shr 18)xor(y shr 31)) and 0FFh)+(y shl 8)
  data = (data xor x xor y) and 0FFh
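A Python sketch of the seed setup and the per-byte KEY2 step. `key2_seed0` and `Key2` are hypothetical names. The registers are masked to 39 bits after each shift (the formulas only ever use bits 0..38, so the masking doesn't change the output). Since the same x/y stream is XORed into the data, a second instance with the same seeds undoes the first, which is what the test checks.

```python
MASK39 = (1 << 39) - 1
SEEDBYTES = (0xE8, 0x4D, 0x5A, 0xB1, 0x17, 0x8F, 0x99, 0xD5)  # index = header[013h].Bit0-2


def key2_seed0(mmmnnn, header_byte_013h):
    """Post-initialization Seed0 from the 24bit random value and the header seed index."""
    return ((mmmnnn & 0xFFFFFF) << 15) + 0x6000 + SEEDBYTES[header_byte_013h & 7]


def reverse_bits39(v):
    """Exchange bit0 with bit38, bit1 with bit37, etc."""
    return int(format(v & MASK39, "039b")[::-1], 2)


class Key2:
    def __init__(self, seed0, seed1=0x5C879B9B05):
        self.x = reverse_bits39(seed0)
        self.y = reverse_bits39(seed1)

    def step(self, data):
        """Advance x and y by one byte and return the encrypted/decrypted data byte."""
        x, y = self.x, self.y
        x = ((((x >> 5) ^ (x >> 17) ^ (x >> 18) ^ (x >> 31)) & 0xFF) + (x << 8)) & MASK39
        y = ((((y >> 5) ^ (y >> 23) ^ (y >> 18) ^ (y >> 31)) & 0xFF) + (y << 8)) & MASK39
        self.x, self.y = x, y
        return (data ^ x ^ y) & 0xFF
```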

 DS Firmware Serial Flash Memory

ST Microelectronics SPI Bus Compatible Serial FLASH Memory
  ST M45PE20 - ID 20h,40h,12h - 256 KBytes (Nintendo DS) (in my old DS)
ST M35PE20 - ID 20h,50h,12h - 256 KBytes (Nintendo DS) (in my DS-Lite)
ST M25PE40 - ID 20h,80h,13h - 512 KBytes (iQue DS, with chinese charset)
Sanyo LE25FW203T - ID 62h,16h,00h - 256 KBytes (Mariokart backup)
More than 100,000 Write Cycles, more than 20 Year Data Retention
The Firmware Flash Memory is accessed via the SPI bus, see:
DS Serial Peripheral Interface Bus (SPI)

Instruction Codes
  06h  WREN Write Enable (No Parameters)
04h WRDI Write Disable (No Parameters)
9Fh RDID Read JEDEC Identification (Read 1..3 ID Bytes)
(Manufacturer, Device Type, Capacity)
05h RDSR Read Status Register (Read Status Register, endless repeated)
Bit7-2 Not used (zero)
Bit1 WEL Write Enable Latch (0=No, 1=Enable)
Bit0 WIP Write/Program/Erase in Progress (0=No, 1=Busy)
03h READ Read Data Bytes (Write 3-Byte-Address, read endless data stream)
0Bh FAST Read Data Bytes at Higher Speed (Write 3-Byte-Address, write 1
dummy-byte, read endless data stream) (max 25Mbit/s)
0Ah PW Page Write (Write 3-Byte-Address, write 1..256 data bytes)
(changing bits to 0 or 1) (reads unchanged data, erases the page,
then writes new & unchanged data) (11ms typ, 25ms max)
02h PP Page Program (Write 3-Byte-Address, write 1..256 data bytes)
(changing bits from 1 to 0) (1.2ms typ, 5ms max)
DBh PE Page Erase 100h bytes (Write 3-Byte-Address) (10ms typ, 20ms max)
D8h SE Sector Erase 10000h bytes (Write 3-Byte-Address) (1s typ, 5s max)
B9h DP Deep Power-down (No Parameters) (consumption 1uA typ, 10uA max)
(3us) (ignores all further instructions, except RDP)
ABh RDP Release from Deep Power-down (No Parameters) (30us)
Write/Program may not cross page boundaries. Write/Program/Erase are rejected during the first 1..10ms after power-up. The WEL bit is automatically cleared on Power-Up, on /Reset, and on completion of WRDI/PW/PP/PE/SE instructions. WEL is set by the WREN instruction (which must be issued before any write/program/erase instruction). It is unknown how RDSR behaves when trying to write to the write-protected region.

Communication Protocol
  Set Chip Select LOW to invoke the command
Transmit the instruction byte
Transmit any parameter bytes
Transmit/receive any data bytes
Set Chip Select HIGH to finish the command
All bytes (and 3-byte addresses) transferred most significant bit/byte first.
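The protocol above, together with the rule that Write/Program may not cross page boundaries, can be sketched as a hardware-independent transaction planner. The helpers are hypothetical: each returned bytes object is one command sent while /S is LOW, each PP is preceded by WREN (since WEL is cleared after every PP), and the caller is assumed to poll RDSR until WIP=0 between page programs (not modelled here).

```python
WREN, RDSR, READ, PP = 0x06, 0x05, 0x03, 0x02
PAGE_SIZE = 0x100


def frame(instr, address=None, payload=b""):
    """One transaction: instruction byte, optional 3-byte address (MSB first), data."""
    out = bytearray([instr])
    if address is not None:
        out += (address & 0xFFFFFF).to_bytes(3, "big")
    out += payload
    return bytes(out)


def program_plan(address, data):
    """Split `data` into WREN+PP pairs that never cross a 256-byte page boundary."""
    plan = []
    pos = 0
    while pos < len(data):
        room = PAGE_SIZE - ((address + pos) % PAGE_SIZE)
        chunk = data[pos:pos + room]
        plan.append(frame(WREN))
        plan.append(frame(PP, address + pos, chunk))
        pos += len(chunk)
    return plan
```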

Pin-Outs
  1   D    Serial Data In (latched at rising clock edge)            _________
  2   C    Serial Clock (max 25MHz)                                /|o        |
  3   /RES Reset                                              1 -| |         |- 8
  4   /S   Chip Select (instructions start at falling edge)   2 -| |         |- 7
  5   /W   Write Protect (makes first 256 pages read-only)    3 -| |_________|- 6
  6   VCC  Supply (2.7V..3.6V typ) (4V max) (DS:VDD3.3)       4 -|/          |- 5
  7   VSS  Ground                                                |___________|
  8   Q    Serial Data Out (changes at falling clock edge)

 DS Firmware Header

Firmware Memory Map
  00000h-00029h  Firmware Header
0002Ah-001FFh Wifi Settings
00200h-3F9FFh Firmware Code/Data
3FA00h-3FAFFh Wifi Access Point 1
3FB00h-3FBFFh Wifi Access Point 2
3FC00h-3FCFFh Wifi Access Point 3
3FD00h-3FDFFh Not used
3FE00h-3FEFFh User Settings Area 1
3FF00h-3FFFFh User Settings Area 2
On iQue DS (with 512K flash memory), user settings are moved to 7FE00h and up, and, there seems to be some unknown stuff at 200h..27Fh.

Firmware Header (00000h-001FFh)
  Addr Size Expl.
000h 2 part3 romaddr/8 (arm9 gui code) (LZ/huffman compression)
002h 2 part4 romaddr/8 (arm7 wifi code) (LZ/huffman compression)
004h 2 part3/4 CRC16 arm9/7 gui/wifi code
006h 2 part1/2 CRC16 arm9/7 boot code
008h 4 firmware identifier (usually nintendo "MAC",nn) (or nocash "XBOO")
the 4th byte (nn) occasionally changes in different versions
00Ch 2 part1 arm9 boot code romaddr/2^(2+shift1) (LZSS compressed)
00Eh 2 part1 arm9 boot code 2800000h-ramaddr/2^(2+shift2)
010h 2 part2 arm7 boot code romaddr/2^(2+shift3) (LZSS compressed)
012h 2 part2 arm7 boot code 3810000h-ramaddr/2^(2+shift4)
014h 2 shift amounts, bit0-2=shift1, bit3-5=shift2, bit6-8=shift3,
bit9-11=shift4, bit12-15=firmware_chipsize/128K
016h 2 part5 data/gfx romaddr/8 (LZ/huffman compression)
018h 8 Optional KEY1-encrypted "enPngOFF"=Cartridge KEY2 Disable
(feature isn't used in any consoles, instead contains timestamp)
018h 5 Firmware version built timestamp (BCD minute,hour,day,month,year)
01Dh 1 Console type (FFh=NDS, 20h=NDS-lite, 43h=iQueDS, 63h=iQueDS-lite)
(The entry was unused (FFh) in older NDS, ie. replace FFh by 00h)
Bit0,1 seems to be iQue related (see also: Bit6)
Bit5 seems to be DS-Lite related
Bit6 indicates presence of "extended" iQue user settings
01Eh 2 Unused (FFh-filled)
020h 2 User Settings Offset (div8) (usually last 200h flash bytes)
022h 2 Unknown
024h 2 Unknown
026h 2 part5 CRC16 data/gfx
028h 2 unused (FFh-filled)
02Ah-1FFh Wifi Calibration Data (see next chapter)
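The packed boot-code offsets at 00Ch..015h can be decoded as sketched below. `decode_boot_offsets` is hypothetical, and it reads "2800000h-ramaddr/2^(2+shift2)" as (2800000h-ramaddr) divided by 2^(2+shift2), so that ramaddr = 2800000h - value*2^(2+shift2); treat that reading as an assumption.

```python
import struct


def decode_boot_offsets(header):
    """Decode firmware header fields into byte addresses and the flash chip size."""
    p1_rom, p1_ram, p2_rom, p2_ram, shifts = struct.unpack_from("<5H", header, 0x0C)
    s1, s2, s3, s4 = shifts & 7, (shifts >> 3) & 7, (shifts >> 6) & 7, (shifts >> 9) & 7
    return {
        "arm9_rom": p1_rom << (2 + s1),
        "arm9_ram": 0x2800000 - (p1_ram << (2 + s2)),
        "arm7_rom": p2_rom << (2 + s3),
        "arm7_ram": 0x3810000 - (p2_ram << (2 + s4)),
        "chip_size": ((shifts >> 12) & 0xF) * 0x20000,       # bit12-15 = size/128K
        "part3_rom": struct.unpack_from("<H", header, 0x000)[0] * 8,
        "part4_rom": struct.unpack_from("<H", header, 0x002)[0] * 8,
        "part5_rom": struct.unpack_from("<H", header, 0x016)[0] * 8,
    }
```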

 DS Firmware Wifi Calibration Data

Wifi Calibration/Settings (located directly after Firmware Header)
  Addr Size Expl.
000h-029h Firmware Header (see previous chapter)
02Ah 2 CRC16 (with initial value 0) of [2Ch..2Ch+config_length-1]
02Ch 2 config_length (usually 0138h, ie. entries 2Ch..163h)
02Eh 1 Unused (00h)
02Fh 1 Wifi version (00h=v1..v4, 03h=v5, 05h=v6..v7)
030h 6 Unused (00h-filled)
036h 6 48bit MAC address (v1-v5: 0009BFxxxxxx, v6-v7: 001656xxxxxx)
03Ch 2 list of enabled channels ANDed with 7FFEh (Bit1..14 = Channel 1..14)
(usually 3FFEh, ie. only channel 1..13 enabled)
03Eh 2 Whatever Flags (usually FFFFh)
040h 1 RF Chip Type (usually 02h)
041h 1 RF Bits per entry at 0CEh (usually 18h=24bit=3byte) (Bit7=?)
042h 1 RF Number of entries at 0CEh (usually 0Ch)
043h 1 Unknown (usually 01h)
044h 2 Initial Value for [4808146h]
046h 2 Initial Value for [4808148h]
048h 2 Initial Value for [480814Ah]
04Ah 2 Initial Value for [480814Ch]
04Ch 2 Initial Value for [4808120h]
04Eh 2 Initial Value for [4808122h]
050h 2 Initial Value for [4808154h]
052h 2 Initial Value for [4808144h]
054h 2 Initial Value for [4808130h]
056h 2 Initial Value for [4808132h]
058h 2 Initial Value for [4808140h]
05Ah 2 Initial Value for [4808142h]
05Ch 2 Initial Value for [4808038h]
05Eh 2 Initial Value for [4808124h]
060h 2 Initial Value for [4808128h]
062h 2 Initial Value for [4808150h]
064h 69h Initial 8bit values for BB[0..68h]
0CDh 1 Unused (00h)
Below for Type2 (ie. when [040h]=2) (Mitsumi MM3155 and RF9008):
  0CEh 24h  Initial 24bit values for RF[0,4,5,6,7,8,9,0Ah,0Bh,1,2,3]
0F2h 54h Channel 1..14 2x24bit values for RF[5,6]
146h 0Eh Channel 1..14 8bit values for BB[1Eh] (usually somewhat B1h..B7h)
154h 0Eh Channel 1..14 8bit values for RF[9].Bit10..14 (usually 10h-filled)
Below for Type3 (ie. when [040h]=3) (Mitsumi MM3218):
  --- Type3 values are originated at 0CEh, following addresses depend on:  ---
1) number of initial values, found at [042h] ;usually 29h
2) number of BB indices, found at [0CEh+[042h]] ;usually 02h
3) number of RF indices, found at [043h] ;usually 02h
--- Below example addresses assume above values to be set to 29h,02h,02h ---
0CEh 29h Initial 8bit values for RF[0..28h]
0F7h 1 Number of BB indices per channel
0F8h 1 1st BB index
0F9h 14 1st BB data for channel 1..14
107h 1 2nd BB index
108h 14 2nd BB data for channel 1..14
116h 1 1st RF index
117h 14 1st RF data for channel 1..14
125h 1 2nd RF index
126h 14 2nd RF data for channel 1..14
134h 46 Unused (FFh-filled)
Below for both Type2 and Type3:
  162h 1    Unknown (usually 19h..1Ch)
163h 1 Unused (FFh) (Inside CRC16 region, with config_length=138h)
164h 9Ch Unused (FFh-filled) (Outside CRC16 region, with config_length=138h)
Most of the Wifi settings seem to have the same values on all currently existing consoles, with the following exceptions:
Values that are (obviously) different are the CRC16 and the 4th-6th bytes of the MAC address. Also, the initial values for BB[01h] and BB[1Eh], the channel 1..14 values for BB[1Eh], and the unknown entry [162h] contain different calibration settings on all consoles.
Firmware v5 has a new wifi ID [2Fh]=03h, and a different RF[9] setting.
Firmware v6 (dslite) has wifi ID [2Fh]=05h, and same RF[9] setting as v5, additionally, v6 and up have different 2nd-3rd bytes of the MAC address.

Moreover, a LOT of values are different with Type3 chips (ie. when [040h]=3).

Note
Unlike for Firmware User Settings, the Firmware Header (and Wifi Settings) aren't stored in RAM upon boot. So the data must be retrieved via SPI bus by software.
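Since these settings must be read via SPI, software would typically validate them before use. A sketch of the [02Ah] check, assuming the CRC16 uses the reflected polynomial A001h with the stated initial value 0 (ie. CRC-16/ARC); `flash` is a hypothetical bytes copy of the first 200h bytes of firmware.

```python
def crc16(data, crc):
    """CRC16 with reflected polynomial A001h and the given initial value."""
    for b in data:
        crc ^= b
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def wifi_settings_crc_ok(flash):
    """Compare [02Ah] against CRC16 (initial value 0) of [2Ch..2Ch+config_length-1]."""
    stored = int.from_bytes(flash[0x2A:0x2C], "little")
    length = int.from_bytes(flash[0x2C:0x2E], "little")
    return crc16(flash[0x2C:0x2C + length], 0x0000) == stored
```

The same `crc16(entry[0x00:0xFE], 0)` comparison would apply to the FEh field of each 100h-byte access point entry.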


 DS Firmware Wifi Internet Access Points

These three 100h-byte regions store known internet access points. The firmware itself doesn't use them, but games with internet support appear to be allowed to read (and write) them.

03FA00-03FAFF: connection data 1
03FB00-03FBFF: connection data 2
03FC00-03FCFF: connection data 3
(07Fxxx for iQue DS)
  Addr Siz Expl.
  00h  64  Unknown
  40h  32  SSID
  60h  32  SSID for WEP64 on AOSS router (each security level has its own SSID)
  80h  16  WEP Key 1 (for type/size, see entry E6h)
  90h  16  WEP Key 2
  A0h  16  WEP Key 3
  B0h  16  WEP Key 4
  C0h  4   IP Address           (0=Auto/DHCP)
  C4h  4   Gateway              (0=Auto/DHCP)
  C8h  4   Primary DNS Server   (0=Auto/DHCP)
  CCh  4   Secondary DNS Server (0=Auto/DHCP)
  D0h  1   Subnet Mask (0=Auto/DHCP, 1..1Ch=Leading Ones) (eg. 6 = FC.00.00.00)
  D1h  ..  Unknown
  E6h  1   WEP Mode (0=None, 1/2/3=5/13/16 byte hex, 5/6/7=5/13/16 byte ascii)
  E7h  1   Status (00h=Normal, 01h=AOSS, FFh=connection not configured/deleted)
  E8h  ..  Unknown
  EFh  1   bit0/1/2 - connection 1/2/3 (1=Configured, 0=Not configured)
  F0h  6   Nintendo Wifi Connection 43bit User ID
           (ID=([F0h] AND 7FFFFFFFFFFh)*1000, ie. the lower 43 bits times 1000,
           shown as decimal string NNNN-NNNN-NNNN-N000) (the upper 5 bits of
           the last byte contain additional/unknown nonzero data)
  F6h  8   Unknown
  FEh  2   CRC16 for Entries 00h..FDh (with initial value 0000h)
For connection 3: entries [EFh..FDh] - always zero-filled?
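A small C sketch decoding two of the less obvious fields above: the subnet mask at D0h, and the 43bit user ID at F0h. It assumes the 6-byte ID field is little-endian (like the other multi-byte values here); all function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Entry D0h: number of leading one bits (1..1Ch), or 0 for Auto/DHCP. */
static uint32_t ap_subnet_mask(uint8_t leading_ones)
{
    if (leading_ones == 0)
        return 0;                                 /* auto (DHCP) */
    if (leading_ones >= 32)
        return 0xFFFFFFFFu;                       /* guard against bad data */
    return 0xFFFFFFFFu << (32 - leading_ones);    /* eg. 6 -> FC000000h */
}

/* Entry F0h..F5h (assumed little-endian): lower 43 bits times 1000. */
static uint64_t ap_user_id(const uint8_t *entry)
{
    uint64_t raw = 0;
    for (int i = 5; i >= 0; i--)
        raw = (raw << 8) | entry[0xF0 + i];
    return (raw & ((1ULL << 43) - 1)) * 1000;
}

/* Formats the ID as "NNNN-NNNN-NNNN-NNNN" (16 digits, zero-padded). */
static void ap_user_id_string(uint64_t id, char out[20])
{
    char digits[17];
    snprintf(digits, sizeof digits, "%016llu", (unsigned long long)id);
    int o = 0;
    for (int i = 0; i < 16; i++) {
        if (i > 0 && i % 4 == 0)
            out[o++] = '-';
        out[o++] = digits[i];
    }
    out[o] = '\0';
}
```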
The location of the first data block seems to be at the User Settings address (see Firmware Header [020h]) minus 400h.
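The CRC16 at FEh can be checked with a routine like the one below. This sketch assumes the reflected polynomial A001h (the CRC-16/ARC family; with an initial value of FFFFh it becomes CRC-16/MODBUS) and a little-endian stored CRC; the function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC16, reflected polynomial A001h, LSB first. The initial value
   is passed in (0000h for the access point entries). */
static uint16_t fw_crc16(uint16_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001)
                            : (uint16_t)(crc >> 1);
    }
    return crc;
}

/* Checks one 100h-byte access point entry: CRC of 00h..FDh vs. [FEh]. */
static int ap_entry_crc_ok(const uint8_t entry[0x100])
{
    uint16_t stored = (uint16_t)(entry[0xFE] | (entry[0xFF] << 8));
    return fw_crc16(0x0000, entry, 0xFE) == stored;
}
```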


 DS Firmware User Settings

Current Settings (RAM 27FFC80h-27FFCEFh)
User Settings 0 (Firmware 3FE00h-3FEFFh) ;(iQue uses different address,
User Settings 1 (Firmware 3FF00h-3FFFFh) ;see Firmware Header [020h])
  Addr Size Expl.
  000h 2    Version (5) (Always 5, for all Firmware versions)
  002h 1    Favorite color (0..15) (0=Gray, 1=Brown, etc.)
  003h 1    Birthday month (1..12) (Binary, non-BCD)
  004h 1    Birthday day (1..31) (Binary, non-BCD)
  005h 1    Not used (zero)
  006h 20   Nickname string in UTF-16 format
  01Ah 2    Nickname length in characters (0..10)
  01Ch 52   Message string in UTF-16 format
  050h 2    Message length in characters (0..26)
  052h 1    Alarm hour (0..23) (Binary, non-BCD)
  053h 1    Alarm minute (0..59) (Binary, non-BCD)
  054h 2
  056h 1    80h=enable alarm (huh?), bit 0..6=enable?
  057h 1    Zero (1 byte)
  058h 2x2  Touch-screen calibration point (adc.x1,y1) 12bit ADC-position
  05Ch 2x1  Touch-screen calibration point (scr.x1,y1) 8bit pixel-position
  05Eh 2x2  Touch-screen calibration point (adc.x2,y2) 12bit ADC-position
  062h 2x1  Touch-screen calibration point (scr.x2,y2) 8bit pixel-position
  064h 2    Language and Flags (see below)
  066h 1    Year (2000..2255) (when the date was entered in the boot menu)
  067h 1    Unknown (usually 00h...08h or 78h..7Fh or so)
  068h 4    RTC Offset (difference in seconds when RTC time/date was changed)
  06Ch 4    Not used (FFh-filled, sometimes 00h-filled)
Below not stored in RAM (found only in FLASH memory)...
  070h 2    Update counter (used to check latest) (must be 0000h..007Fh)
  072h 2    CRC16 of entries 00h..6Fh
  074h 8Ch  Not used (FFh-filled) (except iQue, see below)
Additional data in Chinese iQue DS only: (see Firmware Header entry [1Dh].Bit6)
  074h 1    Unknown (01h) (maybe version?)
  075h 1    Extended Language (0..5=Same as Entry 064h, plus 6=Chinese)
            (for language 6, entry 064h defaults to English, for compatibility)
            (for language 0..5, both entries 064h and 075h have same value)
  076h 2    Bitmask for Supported Languages (Bit0..6) (usually 007Eh)
            (for whatever reason, the iQue firmware doesn't support Japanese)
  078h 86h  Not used (FFh-filled)
  0FEh 2    CRC16 of entries 74h..FDh
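Converting raw touchscreen ADC readings to pixels with the two calibration points at 058h..063h is a two-point linear mapping, sketched here in C. The struct and function names are hypothetical, the 16bit values are assumed little-endian, and the sketch neither clamps the result to the 256x192 screen nor applies any 0- vs 1-based pixel correction.

```c
#include <stdint.h>

/* Calibration points from user settings entries 058h..063h. */
typedef struct {
    uint16_t adc_x1, adc_y1; uint8_t scr_x1, scr_y1;
    uint16_t adc_x2, adc_y2; uint8_t scr_x2, scr_y2;
} ts_calib;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

static ts_calib ts_load(const uint8_t *us)
{
    ts_calib c;
    c.adc_x1 = rd16(us + 0x58) & 0xFFF;   /* 12bit ADC positions */
    c.adc_y1 = rd16(us + 0x5A) & 0xFFF;
    c.scr_x1 = us[0x5C];                  /* 8bit pixel positions */
    c.scr_y1 = us[0x5D];
    c.adc_x2 = rd16(us + 0x5E) & 0xFFF;
    c.adc_y2 = rd16(us + 0x60) & 0xFFF;
    c.scr_x2 = us[0x62];
    c.scr_y2 = us[0x63];
    return c;
}

/* Linear interpolation through (adc1,scr1) and (adc2,scr2). */
static int ts_to_pixel(int adc, int adc1, int adc2, int scr1, int scr2)
{
    return (adc - adc1) * (scr2 - scr1) / (adc2 - adc1) + scr1;
}

static void ts_adc_to_screen(const ts_calib *c, int adc_x, int adc_y,
                             int *px, int *py)
{
    *px = ts_to_pixel(adc_x, c->adc_x1, c->adc_x2, c->scr_x1, c->scr_x2);
    *py = ts_to_pixel(adc_y, c->adc_y1, c->adc_y2, c->scr_y1, c->scr_y2);
}
```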
Language and Flags (Entry 064h)
  Bit
  0..2 Language (0=Japanese, 1=English, 2=French, 3=German,
         4=Italian, 5=Spanish, 6..7=Reserved) (for Chinese see Entry 075h)
         (the language setting also implies the time/date format)
  3    GBA mode screen selection (0=Upper, 1=Lower)
  4-5  Backlight Level (0..3=Low,Med,High,Max) (DS-Lite only)
  6    Bootmenu Disable (0=Manual/bootmenu, 1=Autostart Cartridge)
  9    Settings Lost (1=Prompt for User Info, and Language, and Calibration)
  10   Settings Okay (0=Prompt for User Info)
  11   Settings Okay (0=Prompt for User Info) (Same as Bit10)
  12   No function
  13   Settings Okay (0=Prompt for User Info, and Language)
  14   Settings Okay (0=Prompt for User Info) (Same as Bit10)
  15   Settings Okay (0=Prompt for User Info) (Same as Bit10)
The Health and Safety message is skipped if Bit9=1, or if one or more of the following bits is zero: Bits 10,11,13,14,15. However, the penalty prompt still occurs as soon as the boot menu is entered.
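Decoding entry 064h is plain bit extraction. The C sketch below (hypothetical helper names) also expresses the Health and Safety skip condition from the paragraph above as a single test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field accessors for the Language and Flags halfword (entry 064h). */
static unsigned us_language(uint16_t f)         { return f & 7; }          /* bit0-2 */
static bool     us_gba_lower_screen(uint16_t f) { return (f >> 3) & 1; }   /* bit3 */
static unsigned us_backlight(uint16_t f)        { return (f >> 4) & 3; }   /* bit4-5 */
static bool     us_autostart(uint16_t f)        { return (f >> 6) & 1; }   /* bit6 */

/* Skipped if bit9 is set, or if any of bits 10,11,13,14,15 is zero. */
static bool us_skips_health_safety(uint16_t f)
{
    const uint16_t okay_bits =
        (1u << 10) | (1u << 11) | (1u << 13) | (1u << 14) | (1u << 15);
    return ((f >> 9) & 1) || ((f & okay_bits) != okay_bits);
}
```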

Note: There are two User Settings areas in the firmware chip, at offsets 3FE00h and 3FF00h. If both areas have valid CRCs, the current (newest) area is the one whose Update Counter is one bigger than that of the other (older) area:
 IF count1=((count0+1) AND 7Fh) THEN area1=newer ELSE area0=newer
When changing settings, the older area is overwritten with the new data (and an incremented Update Counter). Keeping two areas allows the previous settings to be recovered in case of a write error (eg. on a battery failure during a write).
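The selection rule above, as a C sketch. The caller is assumed to have verified each area's CRC16 already. How to handle the case where only one area (or neither) has a valid CRC is an assumption here, since the text above only covers the case where both are valid; the function names are hypothetical.

```c
#include <stdint.h>

/* Returns 0 (area at 3FE00h), 1 (area at 3FF00h), or -1 if neither
   area has a valid CRC. count0/count1 are the update counters (070h). */
static int us_newest_area(int crc_ok0, uint16_t count0,
                          int crc_ok1, uint16_t count1)
{
    if (crc_ok0 && crc_ok1)
        return (count1 == ((count0 + 1) & 0x7F)) ? 1 : 0;
    if (crc_ok0) return 0;     /* assumption: fall back to the valid one */
    if (crc_ok1) return 1;
    return -1;
}

/* Counter value to store when overwriting the older area. */
static uint16_t us_next_counter(uint16_t newest_count)
{
    return (uint16_t)((newest_count + 1) & 0x7F);
}
```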

Battery Removal
Even though the battery is required only for the RTC (not for the firmware FLASH memory), most of the firmware user settings are reset when the battery is removed. This appears to be an odd bug (or feature) of the DS BIOS; fortunately, the rest of the firmware is kept intact.


 DS Firmware Extended Settings

Extended Settings contain some additional information that isn't supported by the original firmware (current century, date/time formats, temperature calibration, etc.). These settings are supported by the Nocash Firmware and by the no$gba emulator, and may eventually be supported by other emulators as well. If present, games can use these values; otherwise, games should fall back to default settings, or provide their own configuration menu.

Extended Settings - loaded to 23FEE00h (an area that otherwise contains fragments of NDS9 boot code)
  Addr Siz Expl.
00h 8 ID "XbooInfo"
08h 2 CRC16 Value [0Ch..0Ch+Length-1]
0Ah 2 CRC16 Length (from 0Ch and up)
0Ch 1 Version (currently 01h)
0Dh 1 Update Count (newer = (older+1) AND FFh)
0Eh 1 Bootmenu Flags
Bit6 Important Info (0=Disable, 1=Enable)
Bit7 Bootmenu Screen (0=Upper, 1=Lower)
0Fh 1 GBA Border (0=Black, 1=Gray Line)
10h 2 Temperature Calibration TP0 ADC value (x16) (sum of 16 ADC values)
12h 2 Temperature Calibration TP1 ADC value (x16) (sum of 16 ADC values)
14h 2 Temperature Calibration Degrees Kelvin (x100) (0=none)
16h 1 Temperature Flags
Bit0-1 Format (0=Celsius, 1=Fahrenheit, 2=Reaumur, 3=Kelvin)
17h 1 Backlight Intensity (0=Off .. FFh=Full)
18h 4 Date Century Offset (currently 20, for years 2000..2099)
1Ch 1 Date Month Recovery Value (1..12)
1Dh 1 Date Day Recovery Value (1..31)
1Eh 1 Date Year Recovery Value (0..99)
1Fh 1 Date/Time Flags
Bit0-1 Date Format (0=YYYY-MM-DD, 1=MM-DD-YYYY, 2=DD-MM-YYYY)
Bit2 Friendly Date (0=Raw Numeric, 1=With Day/Month Names)
Bit5 Time DST (0=Hide DST, 1=Show DST=On/Off)
Bit6 Time Seconds (0=Hide Seconds, 1=Show Seconds)
Bit7 Time Format (0=24 hour, 1=12 hour)
20h 1 Date Separator (Ascii, usually Slash, or Dot)
21h 1 Time Separator (Ascii, usually Colon, or Dot)
22h 1 Decimal Separator (Ascii, usually Comma, or Dot)
23h 1 Thousands Separator (Ascii, usually Comma, or Dot)
24h 1 Daylight Saving Time (Nth)
Bit 0-3 Activate on (0..4 = Last,1st,2nd,3rd,4th)
Bit 4-7 Deactivate on (0..4 = Last,1st,2nd,3rd,4th)
25h 1 Daylight Saving Time (Day)
Bit 0-3 Activate on (0..7 = Mon,Tue,Wed,Thu,Fri,Sat,Sun,AnyDay)
Bit 4-7 Deactivate on (0..7 = Mon,Tue,Wed,Thu,Fri,Sat,Sun,AnyDay)
26h 1 Daylight Saving Time (of Month)
Bit 0-3 Activate DST in Month (1..12)
Bit 4-7 Deactivate DST in Month (1..12)
27h 1 Daylight Saving Time (Flags)
Bit 0 Current DST State (0=Off, 1=On)
Bit 1 Adjust DST Enable (0=Disable, 1=Enable)
Note: With the original firmware, the memory region at 23FEE00h and up contains un-initialized, non-zero-filled data (fragments of boot code).
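As an example of using these values, the C sketch below checks for the "XbooInfo" ID and formats a date from the century offset (18h), the date format (1Fh.Bit0-1) and the date separator (20h). Treating the undefined format value 3 like format 0 is an assumption, the CRC16 check is omitted for brevity, and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Nonzero if the block at 23FEE00h carries the "XbooInfo" ID. */
static int xs_present(const uint8_t *xs)
{
    return memcmp(xs, "XbooInfo", 8) == 0;
}

/* yy is the 2-digit year (0..99); mm/dd are 1-based. */
static void xs_format_date(const uint8_t *xs, int yy, int mm, int dd,
                           char *out, size_t outlen)
{
    uint32_t century = (uint32_t)xs[0x18] | (uint32_t)xs[0x19] << 8
                     | (uint32_t)xs[0x1A] << 16 | (uint32_t)xs[0x1B] << 24;
    int year = (int)century * 100 + yy;      /* eg. 20 -> 2000..2099 */
    char sep = (char)xs[0x20];
    switch (xs[0x1F] & 3) {
    case 1:  snprintf(out, outlen, "%02d%c%02d%c%04d", mm, sep, dd, sep, year); break;
    case 2:  snprintf(out, outlen, "%02d%c%02d%c%04d", dd, sep, mm, sep, year); break;
    default: snprintf(out, outlen, "%04d%c%02d%c%02d", year, sep, mm, sep, dd); break;
    }
}
```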


 DS Wireless Communications

DS Wifi I/O Map
DS Wifi Control
DS Wifi Interrupts
DS Wifi Power-Down Registers
DS Wifi Receive Control
DS Wifi Receive Buffer
DS Wifi Receive Statistics
DS Wifi Transmit Control
DS Wifi Transmit Buffers
DS Wifi Transmit Errors
DS Wifi Status
DS Wifi Timers
DS Wifi Multiplay Master
DS Wifi Multiplay Slave
DS Wifi Configuration Ports
DS Wifi Baseband Chip (BB)
DS Wifi RF Chip
DS Wifi RF9008 Registers
DS Wifi Unknown Registers
DS Wifi Unused Registers
DS Wifi Initialization
DS Wifi Flowcharts
DS Wifi Hardware Headers
DS Wifi Multiboot
DS Wifi IEEE802.11 Frames
DS Wifi IEEE802.11 Management Frames (Type=0)
DS Wifi IEEE802.11 Control and Data Frames (Type=1/2)

2.4GHz band, Wireless LAN (WLAN) IEEE802.11b protocol

Credits
A very large part of the DS Wifi chapters is based on Stephen Stair's great DS Wifi document; many thanks to him.


 DS Wifi I/O Map

Notice
Wifi registers and Wifi RAM cannot be written with STRB opcodes (8bit writes are ignored).
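Because byte stores are dropped, a byte-sized update has to be done as a 16bit read-modify-write. A minimal C sketch, assuming that 16bit (STRH) accesses work normally and that the ARM's little-endian byte order applies; wifi_ram_write8 is a hypothetical helper, and on hardware wifi_ram would point to 4804000h.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes one byte by merging it into its halfword and storing 16 bits. */
static void wifi_ram_write8(volatile uint16_t *wifi_ram, size_t byte_ofs,
                            uint8_t value)
{
    volatile uint16_t *hw = &wifi_ram[byte_ofs >> 1];
    uint16_t old = *hw;                                    /* 16bit read */
    if (byte_ofs & 1)
        *hw = (uint16_t)((old & 0x00FF) | (value << 8));   /* upper byte */
    else
        *hw = (uint16_t)((old & 0xFF00) | value);          /* lower byte */
}
```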

Registers - NDS7 - 4808000h..4808FFFh
  Addr Dir   Name            r/w  [Init] Description
000h R W_ID ---- [1440] Chip ID (1440h=DS, C340h=DS-Lite)
004h R/W W_MODE_RST 9fff [0000] Mode/Reset
006h R/W W_MODE_WEP --7f [0000] Mode/Wep modes
008h R/W W_TXSTATCNT ffff [0000] Beacon Status Request
00Ah R/W W_X_00Ah ffff [0000] [bit7 - ignore rx duplicates]
010h R/W W_IF ackk [0000] Wifi Interrupt Request Flags
012h R/W W_IE ffff [0000] Wifi Interrupt Enable
018h R/W W_MACADDR_0 ffff [0000] Hardware MAC Address (first 2 bytes)
01Ah R/W W_MACADDR_1 ffff [0000] Hardware MAC Address (next 2 bytes)
01Ch R/W W_MACADDR_2 ffff [0000] Hardware MAC Address (last 2 bytes)
020h R/W W_BSSID_0 ffff [0000] BSSID (first 2 bytes)
022h R/W W_BSSID_1 ffff [0000] BSSID (next 2 bytes)
024h R/W W_BSSID_2 ffff [0000] BSSID (last 2 bytes)
028h R/W W_AID_LOW ---f [0000] usually as lower 4bit of AID value
02Ah R/W W_AID_FULL -7ff [0000] AID value assigned by a BSS.
02Ch R/W W_TX_RETRYLIMIT ffff [0707] Tx Retry Limit (set from 0x00-0xFF)
02Eh R/W W_INTERNAL ---1 [0000]
030h R/W W_RXCNT ff0e [0000] Receive control
032h R/W W_WEP_CNT ffff [0000] WEP engine enable
034h R? W_INTERNAL 0000 [0000] bit0,1 (see port 004h, 040h, and 1A0h)
Power-Down Registers (and Random Generator)
  036h R/W   W_POWER_US      ---3 [0001]
038h R/W W_POWER_TX ---7 [0003]
03Ch R/W W_POWERSTATE -r-2 [0200]
040h R/W W_POWERFORCE 8--1 [0000]
044h R W_RANDOM 0xxx [0xxx]
048h R/W W_POWER_? ---3 [0000]
WLAN Memory Ports
  050h R/W   W_RXBUF_BEGIN   ffff [4000]
052h R/W W_RXBUF_END ffff [4800]
054h R W_RXBUF_WRCSR 0rrr [0000]
056h R/W W_RXBUF_WR_ADDR -fff [0000]
058h R/W W_RXBUF_RD_ADDR 1ffe [0000]
05Ah R/W W_RXBUF_READCSR -fff [0000]
05Ch R/W W_RXBUF_COUNT -fff [0000]
060h R W_RXBUF_RD_DATA rrrr [xxxx]
062h R/W W_RXBUF_GAP 1ffe [0000]
064h R/W W_RXBUF_GAPDISP -fff [0000]
068h R/W W_TXBUF_WR_ADDR 1ffe [0000]
06Ch R/W W_TXBUF_COUNT -fff [0000]
070h W W_TXBUF_WR_DATA xxxx [xxxx]
074h R/W W_TXBUF_GAP 1ffe [0000]
076h R/W W_TXBUF_GAPDISP 0fff [0000]
xxx
  078h W     W_INTERNAL      mirr [mirr] Read: Mirror of 068h
080h R/W W_TXBUF_BEACON ffff [0000] Beacon Transmit Location
084h R/W W_TXBUF_TIM --ff [0000] Beacon TIM Index in Frame Body
088h R/W W_LISTENCOUNT --ff [0000] Listen Count
08Ch R/W W_BEACONINT -3ff [0064] Beacon Interval
08Eh R/W W_LISTENINT --ff [0000] Listen Interval
090h R/W W_TXBUF_CMD ffff [0000] (used by firmware part4)
094h R/W W_TXBUF_REPLY1 ffff [0000] (used by firmware part4)
098h R W_TXBUF_REPLY2 0000 [0000] (used by firmware part4)
09Ch R/W W_INTERNAL ffff [0050] value 4x00h --> preamble+x*12h us?
0A0h R/W W_TXBUF_LOC1 ffff [0000]
0A4h R/W W_TXBUF_LOC2 ffff [0000]
0A8h R/W W_TXBUF_LOC3 ffff [0000]
0ACh W W_TXREQ_RESET fixx [0050]
0AEh W W_TXREQ_SET fixx [0050]
0B0h R W_TXREQ_READ --1f [0010]
0B4h W W_TXBUF_RESET 0000 [0000] (used by firmware part4)
0B6h R W_TXBUSY 0000 [0000] (used by firmware part4)
0B8h R W_TXSTAT 0000 [0000]
0BAh ? W_INTERNAL 0000 [0000]
0BCh R/W W_PREAMBLE ---3 [0001]
0C0h R/W x W_CMD_TOTALTIME ffff [0000] (used by firmware part4)
0C4h R/W x W_CMD_REPLYTIME ffff [0000] (used by firmware part4)
0C8h ? W_INTERNAL 0000 [0000]
0D0h R/W W_RXFILTER 1fff [0401]
0D4h R/W W_CONFIG_0D4h ---3 [0001]
0D8h R/W W_CONFIG_0D8h -fff [0004]
0DAh R/W W_CONFIG_0DAh ffff [0602]
0E0h R/W W_RXFILTER2 ---f [0008]
Wifi Timers
  0E8h R/W   W_US_COUNTCNT   ---1 [0000] Microsecond counter enable
0EAh R/W W_US_COMPARECNT ---1 [0000] Microsecond compare enable
0ECh R/W W_CONFIG_0ECh 3f1f [3F03]
0EEh R/W W_CMD_COUNTCNT ---1 [0001]
0F0h R/W W_US_COMPARE0 fc-- [FC00] Microsecond compare, bits 0-15
0F2h R/W W_US_COMPARE1 ffff [FFFF] Microsecond compare, bits 16-31
0F4h R/W W_US_COMPARE2 ffff [FFFF] Microsecond compare, bits 32-47
0F6h R/W W_US_COMPARE3 ffff [FFFF] Microsecond compare, bits 48-63
0F8h R/W W_US_COUNT0 ffff [0000] Microsecond counter, bits 0-15
0FAh R/W W_US_COUNT1 ffff [0000] Microsecond counter, bits 16-31
0FCh R/W W_US_COUNT2 ffff [0000] Microsecond counter, bits 32-47
0FEh R/W W_US_COUNT3 ffff [0000] Microsecond counter, bits 48-63
100h ? W_INTERNAL 0000 [0000]
102h ? W_INTERNAL 0000 [0000]
104h ? W_INTERNAL 0000 [0000]
106h ? W_INTERNAL 0000 [0000]
10Ch R/W W_CONTENTFREE ffff [0000] ...
110h R/W W_PRE_BEACON ffff [0000]
118h R/W W_CMD_COUNT ffff [0000]
11Ch R/W W_BEACONCOUNT1 ffff [0000] decreases; reloaded with W_BEACONINT
Configuration Ports (and some other Registers)
  120h R/W   W_CONFIG_120h   81ff [0048] ...init from firmware[04Ch]
122h R/W W_CONFIG_122h ffff [4840] ...init from firmware[04Eh]
124h R/W W_CONFIG_124h ffff [0000] ...init from firmware[05Eh], or 00C8h
126h ? W_INTERNAL fixx [ 0080]
128h R/W W_CONFIG_128h ffff [0000] ...init from firmware[060h], or 07D0h
12Ah ? W_INTERNAL fixx [1000] lower 12bit are same as W_CONFIG_128h
130h R/W W_CONFIG_130h -fff [0142] ...init from firmware[054h]
132h R/W W_CONFIG_132h 8fff [8064] ...init from firmware[056h]
134h R/W W_BEACONCOUNT2 ffff [FFFF] ...
140h R/W W_CONFIG_140h ffff [0000] ...init from firmware[058h], or xx
142h R/W W_CONFIG_142h ffff [2443] ...init from firmware[05Ah]
144h R/W W_CONFIG_144h --ff [0042] ...init from firmware[052h]
146h R/W W_CONFIG_146h --ff [0016] ...init from firmware[044h]
148h R/W W_CONFIG_148h --ff [0016] ...init from firmware[046h]
14Ah R/W W_CONFIG_14Ah --ff [0016] ...init from firmware[048h]
14Ch R/W W_CONFIG_14Ch ffff [162C] ...init from firmware[04Ah]
150h R/W W_CONFIG_150h ff3f [0204] ...init from firmware[062h], or 202h
154h R/W W_CONFIG_154h 7a7f [0058] ...init from firmware[050h]
Baseband Chip Ports
  158h W     W_BB_CNT        mirr [00B5] BB Access Start/Direction/Index
15Ah W W_BB_WRITE ???? [0000] BB Access data byte to write
15Ch R W_BB_READ 00rr [00B5] BB Access data byte read
15Eh R W_BB_BUSY 000r [0000] BB Access Busy flag
160h R/W W_BB_MODE 41-- [0100] BB Access Mode
168h R/W W_BB_POWER 8--f [800D] BB Access Powerdown
Internal Stuff
  16Ah ?     W_INTERNAL      0000 [0001] (or 0000h?)
170h ? W_INTERNAL 0000 [0000]
172h ? W_INTERNAL 0000 [0000]
174h ? W_INTERNAL 0000 [0000]
176h ? W_INTERNAL 0000 [0000]
178h W W_INTERNAL fixx [0800] Read: mirror of 17Ch
RF Chip Ports
  17Ch R/W   W_RF_DATA2      ffff [0800]
17Eh R/W W_RF_DATA1 ffff [C008]
180h R W_RF_BUSY 000r [0000]
184h R/W W_RF_CNT 413f [0018]
xxx
  190h R/W   W_INTERNAL      ffff [0000]
194h R/W W_TX_HDR_CNT ---7 [0000] (used by firmware part4) (0 or 6)
198h R/W W_INTERNAL ---f [0000]
19Ch R W_RF_PINS fixx [0004]
1A0h R/W x -933 [0000] (used by firmware part4) (0 or 823h)
1A2h R/W x ---3 [0001] (used by firmware part4)
1A4h R/W x ffff [0000] "Rate used when signal test..."
Wifi Statistics
  1A8h R     W_RXSTAT_INC_IF rrrr [0000] Statistics Increment Flags
1AAh R/W W_RXSTAT_INC_IE ffff [0000] Statistics Increment IRQ Enable
1ACh R W_RXSTAT_OVF_IF rrrr [0000] Statistics Half-Overflow Flags
1AEh R/W W_RXSTAT_OVF_IE ffff [0000] Statistics Half-Overflow IRQ Enable
1B0h R/W W_RXSTAT --ff [0000]
1B2h R/W W_RXSTAT ffff [0000] RX_LengthErrorCount RX_RateErrorCount
1B4h R/W W_RXSTAT rrff [0000] ... firmware uses also MSB ... ?
1B6h R/W W_RXSTAT ffff [0000]
1B8h R/W W_RXSTAT --ff [0000]
1BAh R/W W_RXSTAT --ff [0000]
1BCh R/W W_RXSTAT ffff [0000]
1BEh R/W W_RXSTAT ffff [0000]
1C0h R/W W_TX_ERR_COUNT --ff [0000] TransmitErrorCount
1C4h R W_RX_COUNT fixx [0000]
[1D0 - 1DE are 15 entries related to multiplayer response errors]
  1D0h R/W   W_CMD_STAT      ff-- [0000]
1D2h R/W W_CMD_STAT ffff [0000]
1D4h R/W W_CMD_STAT ffff [0000]
1D6h R/W W_CMD_STAT ffff [0000]
1D8h R/W W_CMD_STAT ffff [0000]
1DAh R/W W_CMD_STAT ffff [0000]
1DCh R/W W_CMD_STAT ffff [0000]
1DEh R/W W_CMD_STAT ffff [0000]
Internal Diagnostics Registers (usually not used for anything)
  1F0h R/W   W_INTERNAL      ---3 [0000]
204h ? W_INTERNAL fixx [0000]
208h ? W_INTERNAL fixx [0000]
20Ch W W_INTERNAL fixx [0050]
210h R W_TX_SEQNO fixx [0000]
214h R W_RF_STATUS XXXX [0009] (used by firmware part4)
21Ch W W_IF_SET fbff [0000] Force Interrupt (set bits in W_IF).
220h R/W W_INTERNAL ffff [0000] "Has something to do with whether the
packet is ignored or allowed by the
packet filtering system"
Bit0-1: Enable/Disable WifiRAM
(nothing to do with filtering, just
locks memory at 4000h-5FFFh)
224h R/W W_INTERNAL ---3 [0003]
228h W x fixx [0000] (used by firmware part4) (bit3)
230h R/W W_INTERNAL --ff [0047]
234h R/W W_INTERNAL -eff [0EFF]
238h R/W W_INTERNAL ffff [0000] ;rx_seq_no-60h+/-x ;why that?
;other day: fixed value, not seq_no related?
23Ch ? W_INTERNAL fixx [0000] like W_TXSTAT, but ONLY for beacons?
244h R/W x ffff [0000] (used by firmware part4)
248h R/W W_INTERNAL ffff [0000]
24Ch R W_INTERNAL fixx [0000] ;rx_mac_addr_0
24Eh R W_INTERNAL fixx [0000] ;rx_mac_addr_1
250h R W_INTERNAL fixx [0000] ;rx_mac_addr_2
254h ? W_CONFIG_254h fixx [0000] (read: FFFFh on DS, EEEEh on DS-Lite)
258h ? W_INTERNAL fixx [0000]
25Ch ? W_INTERNAL fixx [0000]
260h ? W_INTERNAL fixx [ 0FEF]
264h R W_INTERNAL fixx [0000] ;rx_addr_1 (usually "rxtx_addr-x")
268h R W_RXTX_ADDR fixx [0005] ;rxtx_addr
270h R W_INTERNAL fixx [0000] ;rx_addr_2 (usually "rx_addr_1-1")
274h ? W_INTERNAL fixx [ 0001]
278h R/W W_INTERNAL ffff [000F]
27Ch ? W_INTERNAL fixx [ 000A]
290h (R/W) x fixx [FFFF] bit 0 = ? (used by firmware part4)
298h W W_INTERNAL fixx [0000]
2A0h R/W W_INTERNAL ffff [0000]
2A2h R W_INTERNAL XXXX [7FFF] 15bit shift reg (used during tx...?)
2A4h R W_INTERNAL fixx [0000] ;rx_rate_1 (not ALWAYS same as 2C4h)
2A8h W W_INTERNAL fixx [0000]
2ACh ? W_INTERNAL fixx [ 0038]
2B0h W W_INTERNAL fixx [0000]
2B4h R/W W_INTERNAL -1-3 [0000]
2B8h ? W_INTERNAL fixx [0000]
2C0h R/W W_INTERNAL ---1 [0000]
2C4h R W_INTERNAL fixx [000A] ;rx_rate_2 (0Ah or 14h; 1 or 2 Mbit/s)
2C8h R W_INTERNAL fixx [0000] ;rx_duration/length/rate (or so?)
2CCh R W_INTERNAL fixx [0000] ;rx_framecontrol (from ieee header)
2D0h DIS W_INTERNAL ;"W_POWERACK" (not used by firmware)
;normally DISABLED (except on FORCE)
2F0h R/W W_INTERNAL ffff [0000]
2F2h R/W W_INTERNAL ffff [0000]
2F4h R/W W_INTERNAL ffff [0000]
2F6h R/W W_INTERNAL ffff [0000]
All other ports in range 000h..FFFh are unused.
All registers marked as "W_INTERNAL" aren't used by Firmware part4, and are probably unimportant, except for whatever special diagnostics purposes.
Reading from write-only ports (W) often returns data mirrored from other ports.

Additionally, there are 69h Baseband Chip Registers (BB), and 0Fh RF Chip Registers (see BB and RF chapters).

For Wifi Power Management (POWCNT2), for Wifi Waitstates (WIFIWAITCNT), and for the Power LED Blink Feature (conventionally used to indicate Wifi activity) see:
DS Power Management

For Wifi Configuration and Calibration data in Firmware Header, see:
DS Cartridges, Encryption, Firmware

Wifi RAM - NDS7 - Memory (4804000h..4805FFFh)
  4804000h W_MACMEM RX/TX Buffers (2000h bytes) (excluding below specials)
4805F60h Used for something, not included in the rx circular buffer.
4805F80h W_WEPKEY_0 (32 bytes)
4805FA0h W_WEPKEY_1 (32 bytes)
4805FC0h W_WEPKEY_2 (32 bytes)
4805FE0h W_WEPKEY_3 (32 bytes)
Unlike all other NDS memory, Wifi RAM is left uninitialized after boot.

5F80h - W_WEPKEY_0 thru W_WEPKEY_3 - Wifi WEP keys (R/W)
These WEP key slots store the WEP keys used for encryption with 802.11 key IDs 0-3.


 DS Wifi Control

000h - W_ID - Wifi Chip ID (R)
  0-15   Chip ID (1440h on NDS, C340h on NDS-lite)
The NDS-lite is largely backwards compatible with the original NDS (W_RXBUF_GAPDISP and W_TXBUF_GAPDISP behave differently, and most of the garbage effects on unused/mirrored ports differ, too).

004h - W_MODE_RST - Wifi Hardware mode / reset (R/W)
  0     Adjust some ports (0/1=see lists below) (R/W)
TX Master Enable for LOC1..3 and Beacon (0=Disable, 1=Enable)
1-12 Unknown (R/W)
13 Reset some ports (0=No change, 1=Reset/see list below) (Write-Only)
14 Reset some ports (0=No change, 1=Reset/see list below) (Write-Only)
15 Unknown (R/W)
006h - W_MODE_WEP - Wifi Software mode / Wep mode (R/W)
  0-2   specify a software mode for wifi operation
(may be related to hardware but a correlation has not yet been found)
3-5 specify the hardware WEP mode
0=no WEP, 1=64bit WEP (48bit key), and 3=128bit WEP.
(Values 2 and 4 exist too, but are nonstandard)
6 Unknown
8-15 Always zero
018h - W_MACADDR_0 - MAC Address (R/W)
01Ah - W_MACADDR_1 - MAC Address (R/W)
01Ch - W_MACADDR_2 - MAC Address (R/W)
48bit MAC Address of the console. Should be initialized from firmware[036h]. The hardware receives only packets that are sent to this address (or to group addresses, like FF:FF:FF:FF:FF:FF).

020h - W_BSSID_0 - BSSID (R/W)
022h - W_BSSID_1 - BSSID (R/W)
024h - W_BSSID_2 - BSSID (R/W)
48bit BSSID, ie. the MAC address of the host, obtained from Beacon frames (on the host itself, this should be the same as W_MACADDR). See W_RXFILTER.
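Loading a 6-byte address (eg. from firmware[036h]) into three 16bit ports such as W_MACADDR_0..2 or W_BSSID_0..2 can be sketched as follows. The byte order (first address byte in the lower half of the first port) is an assumption based on the little-endian layout; the function name is hypothetical.

```c
#include <stdint.h>

/* Packs mac[0..5] into ports[0..2], two bytes per port, low byte first. */
static void mac_to_ports(const uint8_t mac[6], uint16_t ports[3])
{
    for (int i = 0; i < 3; i++)
        ports[i] = (uint16_t)(mac[2 * i] | (mac[2 * i + 1] << 8));
}
```

For example, mac_to_ports(fw + 0x36, ports) would produce the values to write to 4808018h, 480801Ah and 480801Ch, assuming fw holds a copy of the firmware header.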

028h - W_AID_LOW (R/W)
  Bit0-3   Maybe player-number, assuming that HW supports such? (1..15, or 0)
Bit4-15 Not used
Usually set equal to the lower 4bit of the W_AID_FULL value.

02Ah - W_AID_FULL - Association ID (R/W)
  Bit0-10  Association ID (AID) (1..2007, or zero)
Bit11-15 Not used
032h - W_WEP_CNT - WEP Engine Enable (R/W)
  0-14  Unknown (usually zero)
15 WEP Engine Enable (0=Disable, 1=Enable)
[expl. I - bit15 enables/disables WEP processing of sent/received packets]
[expl. II - bit15 enables wep processing on packets which bear the WEP flag in the 802.11 header]
[expl. III - bit15 seems to react on 0-to-1 transitions]

044h - W_RANDOM - Random Generator (R)
  0-10  Random
11-15 Not used (zero)
The random generator is updated at 33.51MHz rate, as such:
  X = (X AND 1) XOR (X ROL 1)  ;(rotation within 11bit range)
That random sequence goes through 5FDh different values before it restarts.
When the random register is read, the old latched value is returned to the CPU, and then the current random value is latched, so each read returns the value from the time of the previous read.
Occasionally, about once every few thousand reads, the latching appears to occur 4 cycles earlier than normal, so the value on the next read will be 4 cycles older than expected.
The random register has ACTIVE mirrors.
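The update formula can be modelled in C for experimentation. This sketch covers the formula only, not the latching behaviour or the occasional 4-cycle skew; wifi_random_period is a hypothetical helper for comparing the model's cycle length from a given seed against the 5FDh reported above.

```c
#include <stdint.h>

/* X = (X AND 1) XOR (X ROL 1), with the rotate done within 11 bits. */
static uint16_t wifi_random_step(uint16_t x)
{
    x &= 0x7FF;
    uint16_t rol = (uint16_t)(((x << 1) | (x >> 10)) & 0x7FF);
    return (uint16_t)((x & 1) ^ rol);
}

/* Steps until the value returns to the seed. The step is invertible,
   so every seed lies on a cycle; the loop is capped at 800h steps. */
static unsigned wifi_random_period(uint16_t seed)
{
    uint16_t start = seed & 0x7FF;
    uint16_t x = wifi_random_step(start);
    unsigned n = 1;
    while (x != start && n <= 0x800) {
        x = wifi_random_step(x);
        n++;
    }
    return n;
}
```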

0BCh - W_PREAMBLE - Preamble Control (R/W)
  Bit   Dir  Expl.
0 R/W Unknown (this does NOT affect TX)
1 R/W Preamble (0=Long, 1=Short) (this does NOT affect TX)
2 W Preamble (0=Long, 1=Short) (this does affect TX) (only at 2Mbit/s)
3-15 - Always zero
Short preamble works only with the 2Mbit/s transfer rate (ie. when that rate is selected in the TX hardware header). The 1Mbit/s rate always uses the long preamble.
  Type   Carrier Signal  SFD Value     PLCP Header     Data
Long 128bit, 1Mbit 16bit, 1Mbit 48bit, 1Mbit N bits, 1Mbit or 2Mbit
Short 56bit, 1Mbit 16bit, 1Mbit 48bit, 2Mbit N bits, 2Mbit
Length of the Carrier+SFD+PLCP part is thus 192us (long) or 96us (short).
Note: The Carrier+SFD+PLCP part is sent between IRQ14 and IRQ07 (not between IRQ07 and IRQ01).
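The preamble table translates into a simple airtime estimate. The C sketch below (hypothetical name) adds the 192us or 96us Carrier+SFD+PLCP time to the data time at 1 or 2 Mbit/s. It assumes rate_mbit is 1 or 2 and rounds the data time up to whole microseconds.

```c
#include <stdbool.h>

/* Short preamble only takes effect at 2Mbit/s, as described above. */
static unsigned frame_airtime_us(unsigned data_bytes, unsigned rate_mbit,
                                 bool short_preamble)
{
    bool use_short = short_preamble && rate_mbit == 2;
    unsigned header_us = use_short ? 96 : 192;   /* Carrier+SFD+PLCP */
    unsigned data_bits = data_bytes * 8;
    /* at N Mbit/s one bit takes 1/N us; round up */
    return header_us + (data_bits + rate_mbit - 1) / rate_mbit;
}
```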

Writing "0-then-1" to W_MODE_RST.Bit0 does reset following ports:
  [034h]=0002h ;W_INTERNAL
[19Ch]=0046h ;W_RF_PINS
[214h]=0009h ;W_RF_STATUS
[27Ch]=0005h ;W_INTERNAL
[2A2h]=? ;...unstable?
Writing "1-then-0" to W_MODE_RST.Bit0 does reset following ports:
  [27Ch]=000Ah ;W_INTERNAL
Writing "1" to W_MODE_RST.Bit13 does reset following ports:
  [056h]=0000h ;W_RXBUF_WR_ADDR
[0C0h]=0000h ;W_CMD_TOTALTIME
[0C4h]=0000h ;W_CMD_REPLYTIME
[1A4h]=0000h ; x
[278h]=000Fh ;W_INTERNAL
...Also, following may be affected (results are unstable though)...
[0AEh]=? ;or rather the actual port (which it is a mirror of)
[0BAh]=? ;W_INTERNAL (occasionally unstable)
[204h]=? ;W_INTERNAL
[25Ch]=? ;W_INTERNAL
[268h]=? ;W_RXTX_ADDR
[274h]=? ;W_INTERNAL
Writing "1" to W_MODE_RST.Bit14 does reset following ports:
  [006h]=0000h ;W_MODE_WEP
[008h]=0000h ;W_TXSTATCNT
[00Ah]=0000h ;W_X_00Ah
[018h]=0000h ;W_MACADDR_0
[01Ah]=0000h ;W_MACADDR_1
[01Ch]=0000h ;W_MACADDR_2
[020h]=0000h ;W_BSSID_0
[022h]=0000h ;W_BSSID_1
[024h]=0000h ;W_BSSID_2
[028h]=0000h ;W_AID_LOW
[02Ah]=0000h ;W_AID_FULL
[02Ch]=0707h ;W_TX_RETRYLIMIT
[02Eh]=0000h ;W_INTERNAL
[050h]=4000h ;W_RXBUF_BEGIN
[052h]=4800h ;W_RXBUF_END
[084h]=0000h ;W_TXBUF_TIM
[0BCh]=0001h ;W_PREAMBLE
[0D0h]=0401h ;W_RXFILTER
[0D4h]=0001h ;W_CONFIG_0D4h
[0E0h]=0008h ;W_RXFILTER2
[0ECh]=3F03h ;W_CONFIG_0ECh
[194h]=0000h ;W_TX_HDR_CNT
[198h]=0000h ;W_INTERNAL
[1A2h]=0001h ; x
[224h]=0003h ;W_INTERNAL
[230h]=0047h ;W_INTERNAL

 DS Wifi Interrupts

010h - W_IF - Wifi Interrupt Request Flags (R/W)
  0   Receive Complete  (packet received and stored in the RX fifo)
1 Transmit Complete (packet is done being transmitted) (no matter if error)
2 Receive Event Increment (IRQ02, see W_RXSTAT_INC_IE)
3 Transmit Error Increment (IRQ03, see W_TX_ERR_COUNT)
4 Receive Event Half-Overflow (IRQ04, see W_RXSTAT_OVF_IE)
5 Transmit Error Half-Overflow (IRQ05, see W_TX_ERR_COUNT.Bit7)
6 Start Receive (IRQ06, a packet has just started to be received)
7 Start Transmit (IRQ07, a packet has just started to be transmitted)
8 Txbuf Count Expired (IRQ08, see W_TXBUF_COUNT)
9 Rxbuf Count Expired (IRQ09, see W_RXBUF_COUNT)
10 Not used (always zero, even when trying to set it with W_IF_SET)
11 RF Wakeup (IRQ11, see W_POWERSTATE)
12 Multiplay ...? (IRQ12, see W_CMD_COUNT)
13 Post-Beacon Timeslot (IRQ13, see W_BEACONCOUNT2)
14 Beacon Timeslot (IRQ14, see W_BEACONCOUNT1/W_US_COMPARE)
15 Pre-Beacon Timeslot (IRQ15, see W_BEACONCOUNT1/W_PRE_BEACON)
Write a '1' to a bit to clear it.
The Transmit Start/Complete bits (Bit7,1) are set for EACH packet (including beacons, and including retries).

012h - W_IE - Wifi Interrupt Enable Flags (R/W)
  0-15  Enable Flags, same bits as W_IF  (0=Disable, 1=Enable)
In W_IE, Bit10 is R/W, but seems to have no function since IRQ10 doesn't exist.

21Ch - W_IF_SET (W_INTERNAL) - Force Wifi Interrupt Flags (W)
  0-15  Set corresponding bits in W_IF  (0=No change, 1=Set Bit)
Notes: Bit10 cannot be set since no IRQ10 exists. This register only sets IRQ flags; it does not perform the special actions that occur on real IRQ14s (such as the W_BEACONCOUNT1 and W_BEACONCOUNT2 reloads).

Wifi Primary IRQ Flag (IF.Bit24, Port 4000214h)
IF.Bit24 gets set <only> when (W_IF AND W_IE) changes from 0000h to non-zero.
IF.Bit24 can be reset (ack) <even> when (W_IF AND W_IE) is still non-zero.
  Caution  Caution  Caution  Caution  Caution
  That means, when acknowledging IF.Bit24, then NO FURTHER wifi IRQs
  will be executed whilst and as long as (W_IF AND W_IE) is non-zero.
One work-around is to process/acknowledge ALL wifi IRQs in a loop, including further IRQs that may occur inside of that loop, until (W_IF AND W_IE) becomes 0000h.
Another work-around (for single IRQs) would be to acknowledge IF and W_IF, and then to set W_IE temporarily to 0000h, and then back to the old W_IE setting.
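The first work-around (servicing until (W_IF AND W_IE) is zero) is sketched below in C. To keep it testable on a host, the two registers are replaced by a plain struct, and the write-1-to-clear behaviour of W_IF is modelled by wifi_ack(). On hardware these would be volatile accesses to 4808010h/4808012h, and IF.Bit24 would still be acknowledged separately. All names, including the demo handler, are hypothetical.

```c
#include <stdint.h>

/* Host-side stand-in for W_IF (write 1 to clear) and W_IE. */
typedef struct { uint16_t w_if, w_ie; } wifi_irq_regs;

static void wifi_ack(wifi_irq_regs *r, uint16_t bits) { r->w_if &= (uint16_t)~bits; }

typedef void (*wifi_irq_handler)(wifi_irq_regs *r, int irq);

/* Keep servicing until no enabled flag is pending, so IRQs raised while
   handling are not lost (IF.Bit24 won't fire again until this is 0). */
static void wifi_service_all(wifi_irq_regs *r, wifi_irq_handler handler)
{
    uint16_t pending;
    while ((pending = (uint16_t)(r->w_if & r->w_ie)) != 0) {
        wifi_ack(r, pending);              /* acknowledge before handling */
        for (int irq = 0; irq < 16; irq++)
            if (pending & (1u << irq))
                handler(r, irq);
    }
}

/* Demo handler for testing: IRQ00 triggers a follow-up IRQ01 once. */
static int handled_mask;
static void demo_handler(wifi_irq_regs *r, int irq)
{
    handled_mask |= 1 << irq;
    if (irq == 0)
        r->w_if |= 1u << 1;
}
```

Acknowledging the pending bits before calling the handlers means that an IRQ which recurs during handling sets its flag again and is picked up by the next loop iteration.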


 DS Wifi Power-Down Registers

036h - W_POWER_US (R/W)
  0     Disable W_US_COUNT and W_BB_ports  (0=Enable, 1=Disable)
1 Unknown (usually 0)
2-15 Always zero
Bit0=0 enables RFU by setting RFU.Pin11=HIGH, which activates the 22.000MHz oscillator on the RFU board, the 22MHz clock is then output to RFU.Pin26.

038h - W_POWER_TX (R/W)
Transmit-related power saving (exact purpose unknown).
Initialized from firmware[05Ch].
  0     Auto Wakeup (1=Leave Idle Mode a while after IRQ15)
1 Auto Sleep (0=Enter Idle Mode on IRQ13)
2 Unknown
3 Unknown (Write-only) (used by firmware)
4-15 Always zero
03Ch - W_POWERSTATE (R/W)/(R)
  0     Unknown (usually 0)                         (R/W)
1 Request Power Enable (0=No, 1=Yes/queued) (R/W, but not always)
2-7 Always zero
8 Indicates that Bit9 is about to be cleared (Read only)
9 Current power state (0=Enabled, 1=Disabled) (Read only)
10-15 Always zero
[value =1: queue disable power state] ;<-- seems to be incorrect
[value =2: queue enable power state] ;<-- seems to be correct
Enabling causes wakeup interrupt (IRQ11).
Note: That queue stuff seems to work only if W_POWER_US=0 and W_MODE_RST=1.

040h - W_POWERFORCE - Force Power State (R/W)
  0     New value for W_POWERSTATE.Bit9  (0=Clear/Delayed, 1=Set/Immediately)
1-14 Always zero
15 Apply Bit0 to W_POWERSTATE.Bit9 (0=No, 1=Yes)
Setting W_POWERFORCE=8001h whilst W_POWERSTATE.Bit9=0 acts immediately:
  (Doing this is okay. Switches to power down mode. Similar to IRQ13.)
[034h]=0002h ;W_INTERNAL
[03Ch]=02xxh ;W_POWERSTATE
[0B0h]=0000h ;W_TXREQ_READ
[19Ch]=0046h ;W_RF_PINS
[214h]=0009h ;W_RF_STATUS (idle)
Setting W_POWERFORCE=8000h whilst W_POWERSTATE.Bit9=1 acts delayed:
  (Don't do this. After that sequence, the hardware seems to be messed up)
W_POWERSTATE.Bit8 gets set to indicate the pending operation,
while pending, changes to W_POWERFORCE aren't applied to W_POWERSTATE,
while pending, W_POWERACK becomes Read/Write-able,
writing 0000h to W_POWERACK does clear W_POWERSTATE.Bit8,
and does apply POWERFORCE.Bit0 to W_POWERSTATE.Bit9
and does deactivate Port W_POWERACK again.
048h - W_POWER_? (R/W)
  0     Unknown
1 Unknown
2-15 Always zero
At some point (during transmit or so) it gets set to 0003h by hardware.

See also: POWCNT2, W_BB_POWER.


 DS Wifi Receive Control

030h - W_RXCNT - Wifi Receive Control (R/W)
  0     Copy W_RXBUF_WR_ADDR to W_RXBUF_WRCSR             (Write-only)
1-3 Unknown (R/W)
4-6 Always zero
7 Copy [094h] to [098h], and reset [094h] to 0000h (Write-only)
Ie. Copy W_TXBUF_REPLY1 to W_TXBUF_REPLY2,
and reset W_TXBUF_REPLY1 to 0000h
8-14 Unknown (R/W)
15 Enable Queuing received data to RX FIFO (R/W)
0D0h - W_RXFILTER - (R/W)
  0     (0=Insist on W_BSSID, 1=Accept no matter of W_BSSID)
1-6 Unknown (usually zero)
7 Unknown (0 or 1)
8 Unknown (0 or 1)
9 Unknown (0 or 1)
10 Unknown (0 or 1) (when set, receives beacons, and maybe others)
11 Unknown (usually zero)
12 (0=Normal, 1=Accept even whatever garbage)
13-15 Not used (always zero)
Specifies what packets to allow.
0000h = Disable receive.
FFFFh = Enable receive.
0400h = Receives management frames (and possibly others, too)

0E0h - W_RXFILTER2 - (R/W)
  0     Unknown (0=Receive Data Frames, 1=Ignore Data Frames) (?)
1 Unknown
2 Unknown
3 Unknown (usually set)
4-15 Not used (always zero)
Firmware writes values 08h, 0Bh, 0Dh (aka 1000b, 1011b, 1101b).
Firmware usually has bit0 set, even when receiving data frames, so in some situations data frames seem to pass through even when bit0 is set...? Possibly that happens when W_BSSID matches...?
Control/PS-Poll frames seem to be passed always (even if W_RXFILTER2=0Fh).


 DS Wifi Receive Buffer

The dimensions of the circular buffer are set with the BEGIN/END values; the hardware automatically wraps to BEGIN when an incremented pointer reaches the END address.

Write Area
Memory between WRCSR and READCSR is free for receiving data. The hardware writes incoming packets to this region (starting at WRCSR, without passing READCSR). Once it has successfully received a complete packet, the hardware moves WRCSR past the packet (aligned to a 4-byte boundary).

Read Area
Memory between READCSR and WRCSR contains received data, which can be read by the CPU via the RD_ADDR and RD_DATA registers (or directly from memory). After processing that data, the CPU must set READCSR to the end of it.

050h - W_RXBUF_BEGIN - Wifi RX Fifo start location (R/W)
052h - W_RXBUF_END - Wifi RX Fifo end location (R/W)
  0-15  Byte-offset in Wifi Memory (usually 4000h..5FFEh)
Although all 16 bits are R/W, only the 12-bit halfword offset in Bit1-12 is actually used; the other bits seem to have no effect.
Some or all (?) of the incrementing registers below are automatically matched against begin/end; that is, after incrementing, IF adr=end THEN adr=begin.
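
A minimal C sketch of how the byte-offset format (BEGIN/END) relates to the 12-bit halfword addresses used by the cursor registers. It assumes, as the usual 4000h..5FFEh range suggests, that Wifi RAM starts at byte offset 4000h of the Wifi register space; the macro and function names are hypothetical:

```c
#include <stdint.h>

/* Assumption: Wifi RAM starts at byte offset 4000h, so a 12-bit halfword
   address indexes its 4096 halfwords. */
#define WIFI_RAM_BASE 0x4000u

/* Byte offset (as in W_RXBUF_BEGIN/END) -> 12-bit halfword address.
   Only Bit1-12 of the byte offset are significant. */
uint16_t wifi_byte_to_hw(uint16_t byte_ofs)
{
    return (uint16_t)((byte_ofs >> 1) & 0x0FFF);
}

/* 12-bit halfword address (as in W_RXBUF_WRCSR) -> byte offset. */
uint16_t wifi_hw_to_byte(uint16_t hw_addr)
{
    return (uint16_t)(WIFI_RAM_BASE + (hw_addr & 0x0FFF) * 2u);
}
```

For example, an END value of 5F5Ch corresponds to halfword address FAEh.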

054h - W_RXBUF_WRCSR - Wifi RX Fifo Write or "end" cursor (R)
  0-11  Halfword Address in RAM
  12-15 Always zero
This is a hardware-controlled write location; it shows where the next packet will be written.

056h - W_RXBUF_WR_ADDR - Wifi RX Fifo Write Cursor Latch value (R/W)
  0-11  Halfword Address in RAM
  12-15 Always zero
This value is latched into W_RXBUF_WRCSR when the W_RXCNT latch bit (W_RXCNT.Bit0) is written.

058h - W_RXBUF_RD_ADDR - Wifi CircBuf Read Address (R/W)
  0     Always zero
  1-12  Halfword Address in RAM for reading via W_RXBUF_RD_DATA
  13-15 Always zero
The circular buffer limits are the same as the range specified for the receive FIFO. However, the address can be set outside of that range; it is only affected by the FIFO boundary if reading from the circular buffer makes it reach the FIFO end location.

05Ah - W_RXBUF_READCSR - Wifi RX Fifo Read or "start" cursor (R/W)
  0-11  Halfword Address in RAM
  12-15 Always zero
This value has the same format as W_RXBUF_WRCSR, but it is purely software-controlled, so it is up to the programmer to move the start cursor after loading a packet. If W_RXBUF_READCSR != W_RXBUF_WRCSR, then one or more packets in the FIFO still need to be processed (see the section on HW RX Headers for information on calculating packet lengths). Once a packet has been processed, the software should advance the read cursor to the beginning of the next packet.
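
The following C sketch computes how much unread data lies between READCSR and WRCSR, taking the BEGIN/END wrap into account. It is a hypothetical helper that works on plain 12-bit halfword addresses (treating END as exclusive), not a hardware access routine:

```c
#include <stdint.h>

/* Unread halfwords between READCSR and WRCSR in the circular RX buffer.
   All arguments are 12-bit halfword addresses; begin_hw/end_hw would be
   derived from W_RXBUF_BEGIN/END as (byte_ofs >> 1) & 0x0FFF. */
uint16_t rxbuf_pending_hw(uint16_t readcsr, uint16_t wrcsr,
                          uint16_t begin_hw, uint16_t end_hw)
{
    if (wrcsr >= readcsr)
        return (uint16_t)(wrcsr - readcsr);   /* no wrap, 0 = nothing pending */
    /* WRCSR has wrapped from END back to BEGIN */
    return (uint16_t)((end_hw - readcsr) + (wrcsr - begin_hw));
}
```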

060h - W_RXBUF_RD_DATA - Wifi CircBuf Read Data (R)
  0-15  Data
Returns the 16bit value at the address specified by W_RXBUF_RD_ADDR, and increments W_RXBUF_RD_ADDR by 2. If the increment causes W_RXBUF_RD_ADDR to equal the address specified in W_RXBUF_END, W_RXBUF_RD_ADDR is reset to the address specified in W_RXBUF_BEGIN.
Ports 1060h, 6060h, 7060h are PASSIVE mirrors of 0060h: reading from these mirrors returns the value latched by the previous read from 0060h, without reading a new value from RAM and without incrementing the address.

062h - W_RXBUF_GAP - Wifi RX Gap Address (R/W)
  0     Always zero
  1-12  Halfword Address in RAM
  13-15 Always zero
Seems to be intended to define a "gap" in the circular buffer, done like so:
  Addr=Addr+2 and 1FFEh  ;address increment (by W_RXBUF_RD_DATA read)
  if Addr=RXBUF_END then ;normal begin/end wrapping (done before gap wraps)
    Addr=RXBUF_BEGIN
  if Addr=RXBUF_GAP then ;now gap-wrap (may include further begin/end wrap)
    Addr=RXBUF_GAP+RXBUF_GAPDISP*2
    if Addr>=RXBUF_END then Addr=Addr+RXBUF_BEGIN-RXBUF_END ;wrap more
To disable the gap feature, set both W_RXBUF_GAP and W_RXBUF_GAPDISP to zero.
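
The gap-wrap pseudocode above can be modelled in C roughly as follows, for example in an emulator or to predict where the next W_RXBUF_RD_DATA read will come from. This sketch assumes that all values are Wifi RAM byte offsets with Bit0 clear, and that BEGIN/END are masked to their used Bit1-12 (as described for W_RXBUF_BEGIN/END). The function name is hypothetical:

```c
#include <stdint.h>

/* Model of the W_RXBUF_RD_ADDR auto-increment, following the pseudocode. */
uint16_t rxbuf_next_addr(uint16_t addr, uint16_t begin, uint16_t end,
                         uint16_t gap, uint16_t gapdisp)
{
    begin &= 0x1FFE;                      /* only Bit1-12 are used */
    end   &= 0x1FFE;
    addr = (uint16_t)((addr + 2) & 0x1FFE);
    if (addr == end)                      /* normal begin/end wrap */
        addr = begin;
    if (addr == gap) {                    /* gap-wrap */
        addr = (uint16_t)(gap + gapdisp * 2);
        if (addr >= end)                  /* further begin/end wrap */
            addr = (uint16_t)(addr + begin - end);
    }
    return addr;
}
```

With gap and gapdisp both zero, only the normal begin/end wrap has a visible effect, which matches the "disable" note above.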

064h - W_RXBUF_GAPDISP - Wifi RX Gap Displacement Offset (R/W)
  0-11  Halfword Offset, used with W_RXBUF_GAP (see there)
  12-15 Always zero
Caution: On the DS-Lite, after adding it to W_RXBUF_RD_ADDR, the W_RXBUF_GAPDISP setting is destroyed (reset to 0000h) by hardware. The original DS leaves W_RXBUF_GAPDISP intact.

05Ch - W_RXBUF_COUNT (R/W)
  0-11  Decremented on reads from W_RXBUF_RD_DATA
  12-15 Always zero
Triggers IRQ09 when it reaches zero, and then stays at zero (without further decrementing, and without generating further IRQs).
Note: Also decremented on (accidental) writes to read-only W_RXBUF_RD_DATA.


 DS Wifi Receive Statistics < ^

1A8h - W_RXSTAT_INC_IF - Statistics Increment Flags (R)
  0-12   Increment Flags (see Port 1B0h..1BFh)
  13-15  Always zero
Bitmask of the statistics that have been incremented at least once.
It is unknown how to reset/acknowledge these bits; possibly by reading from 1A8h, by reading from 1B0h..1BFh, or (more obscurely) by writing to 1ACh.

1AAh - W_RXSTAT_INC_IE - Statistics Increment Interrupt Enable (R/W)
  0-12   Counter Increment Interrupt Enable (see Port 1B0h..1BFh) (1=Enable)
  13-15  Unknown (usually zero)
Statistics interrupt enable for counter increments (count up).
Note: This seems to trigger IRQ02 (?)

1ACh - W_RXSTAT_OVF_IF - Statistics Half-Overflow Flags (R)
  0-12   Half-Overflow Flags (see Port 1B0h..1BFh)
  13-15  Always zero
The W_RXSTAT_OVF_IF bits simply contain the current bit7 value of the corresponding counters; setting or clearing those counter bits is directly reflected in W_RXSTAT_OVF_IF.
The recommended way to acknowledge W_RXSTAT_OVF_IF is to read the corresponding counters (which are reset to 00h after reading). For some reason, the firmware additionally writes FFFFh to W_RXSTAT_OVF_IF (that is possibly a bug, or it acknowledges something internally?).

1AEh - W_RXSTAT_OVF_IE - Statistics Half-Overflow Interrupt Enable (R/W)
  0-12   Half-Overflow Interrupt Enable (see Port 1B0h..1BFh) (1=Enable)
  13-15  Unknown (usually zero)
Statistics interrupt enable for half-overflows; bits are the same as in W_RXSTAT_INC_IE.
Note: This seems to trigger IRQ04 (?)

1B0h..1BFh - W_RXSTAT - Receive Statistics (R/W, except 1B5h: Read-only)
W_RXSTAT is a collection of 8bit counters, which are incremented upon certain events. These entries are automatically reset to 0000h after reading. They should be accessed with LDRH opcodes (LDRB reads do work, but the read is internally expanded to 16 bits, so the whole 16bit value is reset to 0000h).
  Port  Dir  Bit  Expl.
  1B0h  R/W  0    W_RXSTAT ?
  1B1h  -    -    Always 0   -
  1B2h  R/W  1    W_RXSTAT ? "RX_LengthErrorCount or RX_RateErrorCount"
  1B3h  R/W  2    W_RXSTAT Length>2348 error
  1B4h  R/W  3    W_RXSTAT RXBUF Full error
  1B5h  R    4?   W_RXSTAT ? (R) (but seems to exist; used by firmware)
  1B6h  R/W  5    W_RXSTAT Length=0 or Wrong FCS Error
  1B7h  R/W  6    W_RXSTAT Packet Received Okay
                    (also increments on W_MACADDR mismatch)
                    (also increments on internal ACK packets)
                    (also increments on invalid IEEE type=3)
                    (also increments TOGETHER with 1BCh and 1BEh)
                    (not incremented on RXBUF_FULL error)
  1B8h  R/W  7    W_RXSTAT ?
  1B9h  -    -    Always 0   -
  1BAh  R/W  8    W_RXSTAT ?
  1BBh  -    -    Always 0   -
  1BCh  R/W  9    W_RXSTAT WEP Error (when FC.Bit14 is set)
  1BDh  R/W  10   W_RXSTAT ?
  1BEh  R/W  11   W_RXSTAT (duplicated sequence control) ;"0x800 = DUPE, ?"
  1BFh  R/W  12   W_RXSTAT ?
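
Since each LDRH read covers two counters, software has to split the result. The C sketch below does that, and also models W_RXSTAT_OVF_IF as described above (flag n mirrors bit7 of the counter for bit n). The bit-to-port mapping is taken from the table, including the uncertain bit4/1B5h entry; the function names are hypothetical:

```c
#include <stdint.h>

/* Counter offset (relative to 1B0h) for each flag bit 0..12, per the table.
   The bit4 -> 1B5h entry is marked as uncertain ("4?") in the table. */
static const uint8_t rxstat_offset[13] = {
    0x0, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0xA, 0xC, 0xD, 0xE, 0xF
};

/* Split one LDRH read of an even port (e.g. 1B2h) into its two counters:
   the even port is the low byte, the odd port the high byte. */
void rxstat_split(uint16_t halfword, uint8_t *even_port, uint8_t *odd_port)
{
    *even_port = (uint8_t)(halfword & 0xFF);
    *odd_port  = (uint8_t)(halfword >> 8);
}

/* Model of W_RXSTAT_OVF_IF: bit n mirrors bit7 of the counter for flag n.
   counters[] holds the 16 bytes of ports 1B0h..1BFh. */
uint16_t rxstat_ovf_flags(const uint8_t counters[16])
{
    uint16_t flags = 0;
    for (int n = 0; n < 13; n++)
        if (counters[rxstat_offset[n]] & 0x80)
            flags = (uint16_t)(flags | (1u << n));
    return flags;
}
```
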
1C4h - W_RX_COUNT (W_INTERNAL) (R)
  0-?   Receive Okay Count  (increments together with ports 1B4h + 1B7h)
  8-?   Receive Error Count (increments together with ports 1B3h + 1B6h)
Increments when receiving a packet. Automatically reset to zero after reading.

1D0h..1DFh - W_CMD_STAT - Multiplay Response Error Counters (R/W)
The multiplay error counters are only used when sending a multiplay command (via W_TXBUF_CMD) to any connected slaves (which must be indicated by flags located in the second halfword of the multiplay command's frame body).
  1D0h        Not used (always zero)
  1D1h..1DFh  Client 1..15 Response Error (increments on missing replies)
If one or more of those slaves fail to respond, then the corresponding error counters get incremented (on the master side). Automatically reset to zero after reading.
It is unknown whether these counters also increment on the slave side.


 DS Wifi Transmit Control < ^

0ACh - W_TXREQ_RESET - Reset Transfer Request Bits (W)
  0-3   Reset corresponding bits in W_TXREQ_READ (0=No change, 1=Reset)
  4-15  Unknown (if any)
Firmware writes values 01h,02h,08h,0Dh, and FFFFh.

0AEh - W_TXREQ_SET - Set Transfer Request Bits (W)
  0-3   Set corresponding bits in W_TXREQ_READ (0=No change, 1=Set)
  4-15  Unknown (if any)
Firmware writes values 01h,02h,05h,08h,0Dh.

0B0h - W_TXREQ_READ - Get Transfer Request Bits (R)
  0     Send W_TXBUF_LOC1  (1=Transfer, if enabled in W_TXBUF_LOC1.Bit15)
  1     Send W_TXBUF_CMD   (1=Transfer, if enabled in W_TXBUF_CMD.Bit15)
  2     Send W_TXBUF_LOC2  (1=Transfer, if enabled in W_TXBUF_LOC2.Bit15)
  3     Send W_TXBUF_LOC3  (1=Transfer, if enabled in W_TXBUF_LOC3.Bit15)
  4     Unknown (seems to be always 1) (never used by firmware part4)
          (except that Bit4 can be cleared via W_POWERFORCE)
  5-15  Unknown/Not used
Bit0-3 can be set/reset via W_TXREQ_SET/W_TXREQ_RESET. The setting in W_TXREQ_READ remains intact even after the transfer(s) have completed.
If more than one of the LOC1,2,3 bits is set, then LOC3 is transferred first, LOC1 last. Beacons are transferred in every Beacon Timeslot (if enabled in W_TXBUF_BEACON.Bit15).
Bit0,2,3 are automatically reset upon IRQ14 (by hardware).
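
A hypothetical C helper illustrating the documented LOC priority (LOC3 first, LOC1 last) and the IRQ14 side effect on W_TXREQ_READ (AND FFF2h, see IRQ14 Notes). It only models the bit logic and ignores the Bit15 enables in W_TXBUF_LOCn:

```c
#include <stdint.h>

/* W_TXREQ_READ bits, per the table above. */
enum {
    TXREQ_LOC1 = 0x01,
    TXREQ_CMD  = 0x02,
    TXREQ_LOC2 = 0x04,
    TXREQ_LOC3 = 0x08
};

/* Which LOC slot (1..3) would be sent next, or 0 if none is requested. */
int txreq_next_loc(uint16_t txreq)
{
    if (txreq & TXREQ_LOC3) return 3;
    if (txreq & TXREQ_LOC2) return 2;
    if (txreq & TXREQ_LOC1) return 1;
    return 0;
}

/* IRQ14 clears Bit0,2,3 and leaves CMD (Bit1) and the other bits intact. */
uint16_t txreq_after_irq14(uint16_t txreq)
{
    return (uint16_t)(txreq & 0xFFF2);
}
```

This order matches the W_TXBUSY sequence 0Dh,05h,01h,00h described below.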

0B6h - W_TXBUSY (R)
  0     W_TXBUF_LOC1  (1=Requested Transfer busy, or not yet started at all)
  1     W_TXBUF_CMD     (1=Requested Transfer busy, or not yet started at all)
  2     W_TXBUF_LOC2    (1=Requested Transfer busy, or not yet started at all)
  3     W_TXBUF_LOC3    (1=Requested Transfer busy, or not yet started at all)
  4     W_TXBUF_BEACON  (1=Beacon Transfer busy)
  5-15  Unknown (if any)
Busy bits. If all three W_TXBUF_LOCs are sent, the register goes through values 0Dh,05h,01h,00h; ie. LOC3 is transferred first, LOC1 last. The register is updated upon IRQ01 (by hardware).
Bit4 is set only in Beacon Timeslots.

0B8h - W_TXSTAT - RESULT - Status of transmitted frame (R)
For LOC1-3, this register is updated at the end of a transfer (upon the IRQ01 request), if retries occur then it is updated only after the final retry.
For BEACON, this register is updated only if enabled in W_TXSTATCNT.Bit15, and only after successful transfers (since beacon errors result in infinite retries).
For CMD, this register is updated only if enabled in W_TXSTATCNT.Bit13,14.
Bit0/1 act similarly to W_IF Bit1/3; however, the W_IF bits are set after each transmit (including retries).
  0     One (or more) Packet has Completed (1=Yes)
          (No matter if successful, for that info see Bit1)
          (No matter if ALL packets are done, for that info see Bit12-13)
  1     Packet Failed (1=Error)
  2-7   Unknown/Not used
  8-11  Usually 0, ...but firmware is checking for values 03h,08h,0Bh
          (gets set to 07h when the transferred W_TXBUF_LOC1/2/3 had Bit12 set)
          (gets set to 00h otherwise)
          (gets set to 03h after beacons; if enabled in W_TXSTATCNT.Bit15)
          (gets set to 08h or 0Bh after CMD; depending on W_TXSTATCNT.Bit13,14)
  12-13 Packet which has updated W_TXSTAT (0=LOC1/BEACON/CMD, 1=LOC2, 2=LOC3)
  14-15 Unknown/Not used
It is unknown how to reset bit0/1 once they are set.
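
A small C sketch that splits a W_TXSTAT value into the fields listed above. The struct and function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of W_TXSTAT (0B8h), per the bit table above. */
struct txstat {
    bool    completed;  /* Bit0: one or more packets completed */
    bool    failed;     /* Bit1: packet failed */
    uint8_t code;       /* Bit8-11: 00h, 03h, 07h, 08h or 0Bh */
    uint8_t which;      /* Bit12-13: 0=LOC1/BEACON/CMD, 1=LOC2, 2=LOC3 */
};

struct txstat txstat_decode(uint16_t v)
{
    struct txstat s;
    s.completed = (v & 0x0001) != 0;
    s.failed    = (v & 0x0002) != 0;
    s.code      = (uint8_t)((v >> 8) & 0x0F);
    s.which     = (uint8_t)((v >> 12) & 0x03);
    return s;
}
```

For example, the value 0B01h written after a CMD transmit (with W_TXSTATCNT.Bit13 set) decodes as completed, not failed, code 0Bh.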

008h - W_TXSTATCNT (R/W)
  0-12  Unknown (usually zero)
  13    Update W_TXSTAT=0B01h and trigger IRQ01 after CMD transmits (1=Yes)
  14    Update W_TXSTAT=0800h and trigger IRQ01 after CMD transmits (1=Yes)
  15    Update W_TXSTAT and trigger IRQ01 after BEACON transmits (0=No, 1=Yes)
If both Bit13 and Bit14 are set, Bit13 has priority.
Note: LOC1..3 transmits always update W_TXSTAT and trigger IRQ01.

194h - W_TX_HDR_CNT - Disable Transmit Header Adjustments (R/W)
  0     IEEE FC.Bit12 and Duration (0=Auto/whatever, 1=Manual/Wifi RAM)
  1     IEEE Frame Check Sequence (0=Auto/FCS/CRC32, 1=Manual/Wifi RAM)
  2     IEEE Sequence Control     (0=Auto/W_TX_SEQNO, 1=Manual/Wifi RAM)
  3-15  Always zero
Allows disabling the automatic adjustment of the IEEE header and checksum.
Note: W_TX_SEQNO can also be disabled by W_TXBUF_LOCn.Bit13 and by TXHDR[04h].

210h - W_TX_SEQNO - Transmit Sequence Number (R)
  0-11   Increments on IRQ07 (Transmit Start Interrupt)
  12-15  Always zero
Also incremented shortly after IRQ12.
When W_TXBUF_LOCn.Bit13 is zero, this value replaces the upper 12 bits of the IEEE Frame Header's Sequence Control value (otherwise, when Bit13 is set, the original value in Wifi RAM is used, and in that case W_TX_SEQNO is NOT incremented).
Aside from W_TXBUF_LOCn.Bit13, other ways to disable W_TX_SEQNO are: Transmit Hardware Header entry TXHDR[04h], and W_TX_HDR_CNT.Bit2.


 DS Wifi Transmit Buffers < ^

068h - W_TXBUF_WR_ADDR - Wifi CircBuf Write Address (R/W)
  0     Always zero
  1-12  Halfword Address in RAM for Writes via W_TXBUF_WR_DATA
  13-15 Always zero
070h - W_TXBUF_WR_DATA - Wifi CircBuf Write Data (W)
  0-15  Data to be written to address specified in W_TXBUF_WR_ADDR
After writing to this register, W_TXBUF_WR_ADDR is automatically incremented by 2, and, if it becomes equal to W_TXBUF_GAP, it is additionally incremented by W_TXBUF_GAPDISP*2.

074h - W_TXBUF_GAP - Wifi CircBuf Write Top (R/W)
  0     Always zero
  1-12  Halfword Address
  13-15 Always zero
076h - W_TXBUF_GAPDISP - Wifi CircBuf Write Offset from Top to Bottom (R/W)
  0-11  Halfword Offset (added to W_TXBUF_WR_ADDR when it equals W_TXBUF_GAP)
  12-15 Always zero
Should be "0-write_buffer_size" (wrap from end to begin), or zero (no wrapping).
Caution: On the DS-Lite, after adding it to W_TXBUF_WR_ADDR, the W_TXBUF_GAPDISP setting is destroyed (reset to 0000h) by hardware. The original DS leaves W_TXBUF_GAPDISP intact.
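
Assuming that "0-write_buffer_size" means the buffer size in halfwords, truncated to the 12-bit field, and that W_TXBUF_WR_ADDR wraps within the 8K Wifi RAM (a 1FFEh mask, mirroring the RX side; this is not confirmed above), the end-to-begin wrap can be sketched in C as follows. Addresses are byte offsets relative to the start of Wifi RAM, and the function names are hypothetical:

```c
#include <stdint.h>

/* GAPDISP value that wraps the write pointer from `end` back to `begin`:
   "0 - write_buffer_size" in halfwords, truncated to 12 bits. */
uint16_t txbuf_gapdisp_for(uint16_t begin, uint16_t end)
{
    uint16_t size_hw = (uint16_t)((end - begin) / 2);
    return (uint16_t)((0u - size_hw) & 0x0FFF);
}

/* Model of the W_TXBUF_WR_ADDR auto-increment after a W_TXBUF_WR_DATA write. */
uint16_t txbuf_next_addr(uint16_t addr, uint16_t gap, uint16_t gapdisp)
{
    addr = (uint16_t)((addr + 2) & 0x1FFE);
    if (addr == gap)
        addr = (uint16_t)((addr + gapdisp * 2) & 0x1FFE);  /* assumed wrap */
    return addr;
}
```

With begin=0000h and end=0800h this gives GAPDISP=0C00h, and a write at 07FEh moves the pointer to 0800h, which then wraps back to 0000h.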

Note: W_TXBUF_GAP and W_TXBUF_GAPDISP might (though not very probably) also be used by transmits via W_TXBUF_LOCn and W_TXBUF_BEACON (not tested).

080h - W_TXBUF_BEACON - Beacon Transmit Location (R/W)
090h - W_TXBUF_CMD - Multiplay Command Transmit Location (R/W)
0A0h - W_TXBUF_LOC1 - Transmit location 1 (R/W)
0A4h - W_TXBUF_LOC2 - Transmit location 2 (R/W)
0A8h - W_TXBUF_LOC3 - Transmit location 3 (R/W)
  0-11  Halfword Address of TX Frame Header in RAM
  12    For LOC1-3: When set, W_TXSTAT.bit8-10 are set to 07h after transfer
                    (and, when set, the transferred frame body seems to get corrupted?)
        For BEACON: Unknown, no effect on W_TXSTAT
        For CMD:    Unknown, no effect on W_TXSTAT
  13    IEEE Sequence Control (0=From W_TX_SEQNO, 1=Value in Wifi RAM)
        For BEACON: Unknown (always uses W_TX_SEQNO, regardless of bit13)
  14    Unknown
  15    Transfer Request (1=Request/Pending)
For LOC1..3 and CMD, Bit15 is automatically cleared after (or rather: during?) transfer, regardless of whether the transfer was successful. For Beacons, bit15 is kept unchanged, since beacons are intended to be transferred repeatedly.
The purpose of W_TXBUF_CMD is unknown (maybe automatic replies?). Pictochat seems to use it for host-to-client data frames. W_TXBUF_CMD.Bit15 can be set ONLY while W_CMD_COUNT is non-zero.
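
A C sketch of composing a W_TXBUF_LOC1..3 request value from the Wifi RAM byte offset of a TX frame header. It sets only Bit0-11, Bit13 and Bit15, and the function name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit0-11 = halfword address of the TX frame header (Bit1-12 of the byte
   offset), Bit13 = take Sequence Control from Wifi RAM, Bit15 = request.
   Bit12 and Bit14 are left zero. */
uint16_t txbuf_loc_value(uint16_t tx_hdr_byte_ofs, bool seq_from_ram)
{
    uint16_t v = (uint16_t)((tx_hdr_byte_ofs >> 1) & 0x0FFF);
    if (seq_from_ram)
        v = (uint16_t)(v | 0x2000);
    v = (uint16_t)(v | 0x8000);   /* Transfer Request */
    return v;
}
```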

0B4h - W_TXBUF_RESET (W)
  0     Disable LOC1    (0=No change, 1=Reset W_TXBUF_LOC1.Bit15)
  1     Disable CMD     (0=No change, 1=Reset W_TXBUF_CMD.Bit15)
  2     Disable LOC2    (0=No change, 1=Reset W_TXBUF_LOC2.Bit15)
  3     Disable LOC3    (0=No change, 1=Reset W_TXBUF_LOC3.Bit15)
  4-5   Unknown/Not used
  6     Disable REPLY2  (0=No change, 1=Reset W_TXBUF_REPLY2.Bit15)
  7     Disable REPLY1  (0=No change, 1=Reset W_TXBUF_REPLY1.Bit15)
  8-15  Unknown/Not used
Firmware writes values FFFFh, 40h, 02h, xxxx, 09h, 01h, 02h, C0h.

084h - W_TXBUF_TIM - Beacon TIM Index in Frame Body (R/W)
  0-7   Location of TIM parameters within Beacon Frame Body
  8-15  Not used/zero
Usually set to 15h, which assumes that the preceding Frame Body content is: Timestamp(8), BeaconInterval(2), Capability(2), SuppRatesTagLenParams(4), ChannelTagLenParam(3), TimTagLen(2); so the value points to TimParams (ie. after TimTagLen).
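
The 15h value follows from adding up the listed field sizes, as this small C check illustrates (the enum and function names are hypothetical):

```c
/* Byte sizes of the beacon frame body fields preceding the TIM parameters. */
enum {
    BCN_TIMESTAMP     = 8,
    BCN_INTERVAL      = 2,
    BCN_CAPABILITY    = 2,
    BCN_SUPPRATES_TLV = 4,  /* tag, length, two rate bytes */
    BCN_CHANNEL_TLV   = 3,  /* tag, length, channel */
    BCN_TIM_TAGLEN    = 2   /* tag, length */
};

/* Value for W_TXBUF_TIM: offset of the TIM parameters in the frame body. */
unsigned beacon_tim_index(void)
{
    return BCN_TIMESTAMP + BCN_INTERVAL + BCN_CAPABILITY +
           BCN_SUPPRATES_TLV + BCN_CHANNEL_TLV + BCN_TIM_TAGLEN;
}
```

If the beacon carried more supported rates, the index would grow by one per extra rate byte.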

06Ch - W_TXBUF_COUNT (R/W)
  0-11  Decremented on writes to W_TXBUF_WR_DATA
  12-15 Always zero
Triggers IRQ08 when it reaches zero, and then stays at zero (without further decrementing, and without generating further IRQs).
Note: Not affected by (accidental) reads from write-only W_TXBUF_WR_DATA.


 DS Wifi Transmit Errors < ^

Automatic ACKs
Transmit errors occur on missing ACKs. The NDS hardware automatically responds with an ACK when receiving a packet (if the packet is addressed to the recipient's W_MACADDR setting). And, when sending a packet, the NDS hardware automatically checks for ACK responses.
The only exception is packets sent to group addresses (ie. with Bit0 of the 48bit MAC address set to "1", eg. Beacons sent to FF:FF:FF:FF:FF:FF): the recipient(s) don't need to respond to such packets, and the sender always passes okay without checking for ACKs.
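
In C, the group-address test is a single bit check on the first byte of the address, in the usual written order (the function name is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the 48-bit MAC address (mac[0] = first byte, as in the
   FF:FF:FF:FF:FF:FF notation) is a group address, ie. Bit0 of its first
   byte is set. Packets to such addresses are not ACKed. */
bool mac_is_group(const uint8_t mac[6])
{
    return (mac[0] & 0x01) != 0;
}
```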

02Ch - W_TX_RETRYLIMIT (R/W)
Specifies the maximum number of retries on Transmit Errors (eg. 07h means one initial transmit, plus up to 7 retries, ie. max 8 transmits in total).
  0-7   Retry Count (usually 07h)
  8-15  Unknown (usually 07h)
The Retry Count value is decremented on each error (unless it is already 00h). There's no automatic reload, so W_TX_RETRYLIMIT should be reinitialized by software prior to each transmit (or perhaps there IS a reload?).
When sending multiple packets (by setting more than one bit with W_TXREQ_SET), the first packet may use up all retries, leaving only a single try for the other packet(s).

1C0h - W_TX_ERR_COUNT - TransmitErrorCount (R/W)
  0-7   TransmitErrorCount
  8-15  Always zero
Increments on Transmit Errors. Automatically reset to zero after reading.
IRQ03 triggered when W_TX_ERR_COUNT is incremented (for NON-beacons ONLY).
IRQ05 triggered when W_TX_ERR_COUNT > 7Fh (happens INCLUDING for beacons).

Error Notification
Transmit Errors can be sensed via W_TX_ERR_COUNT, IRQ03, IRQ05, TX Hardware Header entry [00h], and W_TXSTAT.Bit1.

W_TXBUF_BEACON Errors
As the name says, W_TXBUF_BEACON is intended for sending Beacons to group addresses (which do not require ACK responses). So, transmit errors would occur only when misusing W_TXBUF_BEACON to send packets to individual addresses, but the W_TXBUF_BEACON error handling isn't fully implemented:
First, W_TX_RETRYLIMIT isn't used; instead, W_TXBUF_BEACON errors result in infinite retries.
Moreover, W_TXBUF_BEACON errors seem to increment W_TX_ERR_COUNT, but without generating IRQ03; however, IRQ05 is generated when W_TX_ERR_COUNT>7Fh.

Other Errors
The NDS transmit hardware seems to do little error checking on the packet headers. The only known error-checked part is byte [04h] in the TX hardware header (which must be 00h, 01h, or 02h). Aside from that, when sent to a group address, a packet passes okay even with invalid IEEE type/subtypes, and even with Length/Rate entries set to zero. However, when such data is sent to an individual address, the receiving NDS won't respond with ACKs.

Note
Received ACKs aren't stored in WifiRAM (or, possibly, they ARE stored, but without advancing W_RXBUF_WRCSR, so that the software won't see them, and so that they will be overwritten by the next packet).


 DS Wifi Status < ^

19Ch - W_RF_PINS - Status of RF-Chip Control Signals (R)
  0    Reportedly "carrier sense" (maybe 1 during RX.DTA?) (usually 0)
  1    TX.MAIN (RFU.Pin17) Transmit Data Phase (0=No, 1=Active)
  2    Unknown (RFU.Pin3)  Seems to be always high (Always 1=high?)
  3-5  Not used (Always zero)
  6    TX.ON   (RFU.Pin14) Transmit Preamble+Data Phase (0=No, 1=Active)
         (this description seems to be wrong: Bit6 is often set,
         even when nothing is being transmitted)
  7    RX.ON   (RFU.Pin15) Receive Mode (0=No, 1=Enabled)
  8-15 Not used (Always zero)
Physical state of the RFU board's RX/TX pins. Similar to W_RF_STATUS.

214h - W_RF_STATUS - Current Transmit/Receive State (R)
  0-3  Current Transmit/Receive State:
         0 = Initial Value on power-up (before raising W_MODE_RST.Bit0)
         1 = RX Mode enabled (waiting for incoming data)
         2 = Switching from RX to TX (takes a few clock cycles)
         3 = TX Mode active (sending preamble and data)
         4 = Switching from TX to RX (takes a few clock cycles)
         5 = Unknown, firmware checks for that value (maybe RX busy)
         6 = Unknown, firmware checks for that value (maybe RX busy)
         9 = Idle (upon IRQ13, and upon raising W_MODE_RST.Bit0)
         ----
         5 = Receive ACK phase ?
         6 =
         7 =
         8 = Multiplay related ? (when sending through W_TXBUF_CMD ?)
  4-15 Always zero?
Numeric Status Code. Similar to W_RF_PINS.

268h - W_RXTX_ADDR - Current Receive/Transmit Address (R)
  0-11   Halfword address
  12-15  Always zero
Indicates the halfword that is currently being transmitted or received. Can be used by the Start Receive IRQ06 handler to determine how many halfwords of the packet have already been received (allowing portions of the packet header to be examined before the whole packet is fully received). Can also be used in the Transmit Start IRQ07 handler to determine which packet is currently being transmitted.


 DS Wifi Timers < ^

0E8h - W_US_COUNTCNT - Microsecond counter enable (R/W)
  0     Counter Enable (0=Disable, 1=Enable)
  1-15  Always zero
Activates W_US_COUNT, and also W_BEACONCOUNT1 and W_BEACONCOUNT2 (which are decremented when lower 10bit of W_US_COUNT wrap from 3FFh to 000h). Note: W_POWER_US must be enabled, too.

0F8h - W_US_COUNT0 - Microsecond counter, bits 0-15 (R/W)
0FAh - W_US_COUNT1 - Microsecond counter, bits 16-31 (R/W)
0FCh - W_US_COUNT2 - Microsecond counter, bits 32-47 (R/W)
0FEh - W_US_COUNT3 - Microsecond counter, bits 48-63 (R/W)
  0-63  Counter Value in microseconds (incrementing)
Clocked by the 22.00MHz oscillator on the RFU board (ie. not by the 33.51MHz system clock). The 22.00MHz are divided by a 22-step prescaler.
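
A C sketch for combining the four halfword ports into one 64-bit value (22.00MHz divided by 22 gives the 1MHz count rate, ie. one count per microsecond). The function name is hypothetical, and a carry between halfwords during the four separate reads is not handled here:

```c
#include <stdint.h>

/* Assemble W_US_COUNT0..3 (0F8h..0FEh), given lowest halfword first. */
uint64_t us_count_from_halfwords(const uint16_t hw[4])
{
    return  (uint64_t)hw[0]
          | ((uint64_t)hw[1] << 16)
          | ((uint64_t)hw[2] << 32)
          | ((uint64_t)hw[3] << 48);
}
```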

0EAh - W_US_COMPARECNT - Microsecond compare enable (R/W)
  0     Compare Enable (0=Disable, 1=Enable) (IRQ14/IRQ15)
  1     Force IRQ14 (0=No, 1=Force Now) (Write-only)
  2-15  Always zero
Activates IRQ14 on W_US_COMPARE matches, and IRQ14/IRQ15 on W_BEACONCOUNT1.

0F0h - W_US_COMPARE0 - Microsecond compare, bits 0-15 (R/W)
0F2h - W_US_COMPARE1 - Microsecond compare, bits 16-31 (R/W)
0F4h - W_US_COMPARE2 - Microsecond compare, bits 32-47 (R/W)
0F6h - W_US_COMPARE3 - Microsecond compare, bits 48-63 (R/W)
  0     Always zero... firmware writes 1 though (maybe write-only flag?)
  1-9   Always zero
  10-63 Compare Value in milliseconds (aka microseconds/1024)
Triggers IRQ14 (see IRQ14 notes below) when W_US_COMPARE matches W_US_COUNT.
Usually set to FFFFFFFFFFFFFC00h (ie. almost/practically never). Instead, IRQ14 is usually derived via W_BEACONCOUNT1.
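
Because only Bit10-63 are compared, a target time has to be rounded down to a 1024us boundary and then split across the four ports. A hypothetical C sketch (it ignores the Bit0 value that the firmware writes):

```c
#include <stdint.h>

/* Round a target time down to the 1024us granularity of Bit10-63. */
uint64_t us_compare_value(uint64_t target_us)
{
    return target_us & ~(uint64_t)0x3FF;
}

/* Split a 64-bit compare value into W_US_COMPARE0..3, lowest first. */
void us_compare_split(uint64_t v, uint16_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint16_t)(v >> (16 * i));
}
```

Splitting the "practically never" value FFFFFFFFFFFFFC00h gives FC00h, FFFFh, FFFFh, FFFFh.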

11Ch - W_BEACONCOUNT1 (R/W)
Triggers IRQ14 and IRQ15 (see IRQ14/IRQ15 notes below) when it reaches 0000h (if W_PRE_BEACON is non-zero, then IRQ15 occurs that many microseconds in advance).
  0-15  Decrementing Millisecond Counter (reloaded with W_BEACONINT upon IRQ14)
Set to W_BEACONINT upon IRQ14 events (unlike the other W_US_COMPARE related actions, this is done always, even if W_US_COMPARECNT is zero).
When reaching 0000h, it is immediately reloaded (as for US_COUNT matches), so the counting sequence is ..,3,2,1,BEACONINT,.. (not 3,2,1,ZERO,BEACONINT).

134h - W_BEACONCOUNT2 - Post-Beacon Counter (R/W)
  0-15  Decrementing Millisecond Counter (reloaded with FFFFh upon IRQ14)
Triggers IRQ13 when it reaches 0000h (regardless of W_US_COMPARECNT), and then stays fixed at 0000h (without any further decrement/wrapping to FFFFh).
Set to FFFFh upon IRQ14 (by hardware); the IRQ14 handler should then adjust the register (by software) by adding the Tag DDh Beacon header's Stepping value (usually 000Ah) to it.
Seems to be used to indicate the beacon transmission time (possibly including additional time reserved for responses)?

08Ch - W_BEACONINT - Beacon Interval (R/W)
Reload value for W_BEACONCOUNT1.
  0-9   Frequency in milliseconds of beacon transmission
  10-15 Always zero
Should be initialized randomly to 0CEh..0DEh or so. The random setting reduces risk of repeated overlaps with beacons from other hosts.
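
One way to derive such a randomized interval in C, assuming some random number source is available. The function name is hypothetical; the modulo mapping is slightly biased, which should be harmless here:

```c
#include <stdint.h>

/* Map an arbitrary random number into the suggested W_BEACONINT range
   0CEh..0DEh (17 possible values). */
uint16_t beacon_interval_from_random(uint32_t r)
{
    return (uint16_t)(0xCE + r % (0xDE - 0xCE + 1));
}
```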

110h - W_PRE_BEACON - Pre-Beacon Time (R/W)
  0-15  Pre-Beacon Time in microseconds (static value, ie. NOT decrementing)
Defines the distance between IRQ15 and IRQ14. The setting doesn't affect the IRQ14 timing (which occurs at the W_BEACONCOUNT1'th millisecond boundary), but IRQ15 occurs in advance (at the W_BEACONCOUNT1'th millisecond boundary minus W_PRE_BEACON microseconds). If W_PRE_BEACON is zero, then both IRQ14 and IRQ15 occur at exactly the same time.

088h - W_LISTENCOUNT - Listen Count (R/W)
  0-7   Decremented by hardware at IRQ14 events (ie. once every beacon)
  8-15  Always zero
Reload occurs immediately BEFORE decrement, ie. with W_LISTENINT=04h, it will go through values 03h,02h,01h,00h,03h,02h,01h,00h,etc.

08Eh - W_LISTENINT - Listen Interval (R/W)
  0-7   Listen Interval, counted in beacons (usually 02h)
  8-15  Always zero
Reload value for W_LISTENCOUNT.
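
The W_LISTENCOUNT update performed at each IRQ14 (see IRQ14 Notes) can be modelled in C like this; the function name is hypothetical:

```c
#include <stdint.h>

/* Reload happens immediately BEFORE the decrement. */
uint8_t listencount_on_irq14(uint8_t count, uint8_t listenint)
{
    if (count == 0)
        count = listenint;
    return (uint8_t)(count - 1);
}
```

Starting from 00h with W_LISTENINT=04h, repeated calls yield 03h,02h,01h,00h,03h, as described for W_LISTENCOUNT.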

10Ch - W_CONTENTFREE (R/W)
  0-15  Decrementing microsecond counter
Always running (regardless of W_US_COUNTCNT).
Once it has reached 0000h, it seems to stay fixed at 0000h.
"[Set to the remaining duration of contention-free period when
receiving beacons - only *really* necessary for powersaving mode]"

IRQ13 Notes (Post-Beacon Interrupt)
IRQ13 is generated by W_BEACONCOUNT2. It's simply doing:
  W_IF.Bit13=1      ;interrupt request
If W_POWER_TX.Bit1=0, then additionally enter sleep mode:
  [034h]=0002h  ;W_INTERNAL        ;(similar to W_POWERFORCE=8001h)
  [03Ch]=02xxh  ;W_POWERSTATE      ;(W_TXREQ_READ.Bit4 is kept intact though)
  [19Ch]=0046h  ;W_RF_PINS.Bit7=0  ;disable receive (enter idle mode) (RX.ON=Low)
  [214h]=0009h  ;W_RF_STATUS=9     ;indicate idle mode
Unlike IRQ14/IRQ15, this is done regardless of W_US_COMPARECNT.

IRQ14 Notes (Beacon Interrupt)
IRQ14 is generated by W_US_COMPARE, and by W_BEACONCOUNT1.
Aside from just setting the IRQ flag in W_IF, the hardware does:
  W_BEACONCOUNT1=W_BEACONINT                             ;next IRQ15/IRQ14
  (Above is NOT done when IRQ14 was forced via W_US_COMPARECNT.Bit1)
If W_US_COMPARECNT is 1, then the hardware additionally does:
  (Below IS ALSO DONE when IRQ14 was forced via W_US_COMPARECNT.Bit1)
  W_IF.Bit14=1
  W_BEACONCOUNT2=FFFFh  ;about 64 secs (ie. almost never)  ;next IRQ13 ("never")
  W_TXREQ_READ=W_TXREQ_READ AND FFF2h
  if W_TXBUF_BEACON.15 then W_TXBUSY.Bit4=1
  if W_LISTENCOUNT=00h then W_LISTENCOUNT=W_LISTENINT
  W_LISTENCOUNT=W_LISTENCOUNT-1
If W_TXBUF_BEACON.Bit15=1, then the following is done shortly after IRQ14:
  W_RF_PINS.Bit7=0  ;disable receive (RX.ON=Low)
  W_RF_STATUS=2     ;indicate switching from RX to TX mode
If W_TXBUF_BEACON.Bit15=1, then the following is done a bit later:
  W_RF_PINS.Bit6=1  ;transmit preamble start (TX.ON=High)
  W_RF_STATUS=3     ;indicate TX mode
The IRQ14 handler should then do the following (by software):
  W_BEACONCOUNT2 = W_BEACONCOUNT2 + TagDDhSteppingValue  ;next IRQ13
To use only ONE of the two IRQ14 sources: W_BEACONCOUNT1 can be disabled by setting both W_BEACONCOUNT1 and W_BEACONINT to zero. W_US_COMPARE can be effectively "disabled" by setting it to a value distant from W_US_COUNT, such as compare=count-400h.

IRQ07 Notes (Transmit Start Data; occurs after preamble)
  W_IF.Bit7=1       ;interrupt request
  W_RF_PINS.Bit1=1  ;start data transfer (preamble finished now) (TX.MAIN=High)
The following is done only if the packet was sent through W_TXBUF_BEACON, or if it was sent via W_TXBUF_LOCn with W_TXBUF_LOCn.Bit13 being zero:
  [TXBUF...] = W_TX_SEQNO*10h   ;auto-adjust IEEE Sequence Control
  W_TX_SEQNO=W_TX_SEQNO+1       ;increase sequence number
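
A C sketch of the Sequence Control value written above, assuming W_TX_SEQNO wraps within its 12 bits (Bit12-15 are documented as always zero). The function names are hypothetical:

```c
#include <stdint.h>

/* W_TX_SEQNO*10h: sequence number in Bit4-15, fragment number 0 in Bit0-3. */
uint16_t seqctl_from_seqno(uint16_t seqno)
{
    return (uint16_t)((seqno & 0x0FFF) << 4);
}

/* Assumed 12-bit wraparound of W_TX_SEQNO. */
uint16_t seqno_next(uint16_t seqno)
{
    return (uint16_t)((seqno + 1) & 0x0FFF);
}
```
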
IRQ01 Notes (Transmit Done)
The following happens shortly before IRQ01:
  W_RF_PINS.Bit6=0  ;disable TX (TX.ON=Low)
  W_RF_STATUS=4     ;indicate switching from TX to RX mode
Then, upon IRQ01, the following happens:
  W_IF.Bit1=1       ;interrupt request
  W_RF_PINS.Bit1=0  ;disable TX (TX.MAIN=Low)
  W_RF_PINS.Bit7=1  ;enable RX (RX.ON=High)
  W_RF_STATUS=1     ;indicate RX mode

IRQ15 Notes (Pre-Beacon Interrupt)
IRQ15 is generated via W_BEACONCOUNT1 and W_PRE_BEACON. It simply does:
  if W_US_COMPARECNT=1 then W_IF.Bit15=1
If W_POWER_TX.Bit0=1, then additionally wake up from sleep mode:
  W_RF_PINS.Bit7=1  ;enable RX (RX.ON=High)  ;\gets set like so a good while
  W_RF_STATUS=1     ;indicate RX mode        ;/after IRQ15 (but not immediately)

Beacon IRQ Sequence
  IRQ15  Pre-Beacon   (beacon will be transferred soon)
  IRQ14  Beacon       (beacon will be transferred very soon) (carrier starts)
  IRQ07  Tx Start     (beacon transfer starts) (if enabled in W_TXBUF_BEACON.15)
  IRQ01  Tx End       (beacon transfer done) (if enabled in W_TXSTATCNT.15)
  IRQ13  Post-Beacon  (beacon transferred) (unless next IRQ14 occurs earlier)
This sequence applies to transmitting beacons. (For receiving, IRQ07/IRQ01 would be replaced by Rx IRQs, provided that a remote unit is sending beacons.)


 DS Wifi Multiplay Master < ^

These registers are used for multiplay host-to-client (aka master to slave) commands.

0EEh - W_CMD_COUNTCNT (R/W)
  0     Enable W_CMD_COUNT (0=Disable, 1=Enable)
  1-15  Always Zero
118h - W_CMD_COUNT (R/W)
  0-15  Decremented once every 10 microseconds (Stopped at 0000h)
Written by firmware. The firmware's IRQ14 handler checks for read value<=0Ah.
When it reaches zero, W_TXBUF_CMD is transferred (if enabled in W_TXBUF_CMD.Bit15, and in W_TXREQ_READ.Bit1); this then triggers two (!) transfer start interrupts (IRQ07), and the transfer end is indicated by a single IRQ12. Optionally (when enabled in W_TXSTATCNT), IRQ01 (transfer done) is additionally generated (simultaneously with the above IRQ12).
However, the above isn't quite right: when W_CMD_COUNT is set to a very small value, ONLY IRQ12 is triggered (so it might specify the duration during which the IRQ07s for W_TXBUF_CMD are allowed?).

0C0h - W_CMD_TOTALTIME - (R/W)
  0-15  Total duration of ALL slave response packet(s) in microseconds
Before sending a MASTER packet, this port should be set to the same value as the MASTER packet's IEEE header's Duration/ID entry.

0C4h - W_CMD_REPLYTIME - (R/W)
  0-15  Duration per SINGLE slave response packet in microseconds
Before sending a MASTER packet, this port should be set to the expected per slave response time.
Note: Nintendo's multiboot/pictochat code also puts this value in the 1st halfword of the MASTER packet's frame body.

At 2Mbit/s transfer rate, the values should be set up roughly like so:
  master_time    = (master_bytes*4)+(60h)     ;60h = 96 decimal = short preamble
  slave_time     = (slave_bytes*4)+(0D0h..0D2h)
  all_slave_time = (EAh..F0h)+(slave_time+0Ah)*num_slaves
  txhdr[2]  = slave_bits                  ;hardware header (*)
  ieee[2]   = all_slave_time              ;ieee header (duration/id)
  body[0]   = slave_time                  ;duration per slave (for multiboot/pictochat)
  body[2]   = slave_bits                  ;frame body -- required (*)
  reg[0C0h] = all_slave_time              ;
  reg[0C4h] = slave_time                  ;duration per slave
  reg[118h] = (388h+(num_slaves*slave_time)+master_time+32h)/10
  reg[090h] = 8000h+master_packet_address ;start transmit
The byte values count the IEEE frame header+body+FCS.
(*) The hardware doesn't actually seem to use the "slave_bits" entry in the hardware header; instead, it uses the "slave_bits" entry in the frame body(!)
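
The timing formulas above can be collected into a C sketch. The constants 0D0h and 0EAh are picked from the low ends of the documented ranges 0D0h..0D2h and EAh..F0h, which is an assumption; the struct and function names are hypothetical:

```c
#include <stdint.h>

struct mp_timing {
    uint16_t master_time;     /* us, master packet incl. short preamble */
    uint16_t slave_time;      /* us per slave reply -> reg[0C4h], body[0] */
    uint16_t all_slave_time;  /* us for all replies -> reg[0C0h], ieee[2] */
    uint16_t cmd_count;       /* -> reg[118h] (W_CMD_COUNT, 10us units) */
};

/* Byte counts include IEEE frame header + body + FCS (2Mbit/s rate). */
struct mp_timing mp_compute_timing(unsigned master_bytes, unsigned slave_bytes,
                                   unsigned num_slaves)
{
    struct mp_timing t;
    t.master_time    = (uint16_t)(master_bytes * 4 + 0x60);
    t.slave_time     = (uint16_t)(slave_bytes * 4 + 0xD0);   /* 0D0h..0D2h */
    t.all_slave_time = (uint16_t)(0xEA + (t.slave_time + 0x0A) * num_slaves);
    t.cmd_count      = (uint16_t)((0x388 + num_slaves * t.slave_time
                                   + t.master_time + 0x32) / 10);
    return t;
}
```

For example, a 40h-byte master packet with two slaves replying with 10h bytes each gives master_time=160h, slave_time=110h, all_slave_time=31Eh, and W_CMD_COUNT=185 (decimal).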


 DS Wifi Multiplay Slave < ^

These registers are used for multiplay client-to-host (aka slave to master) responses.

094h - W_TXBUF_REPLY1 - Multiplay Response Transmit Location 1 (R/W)
  0-11  Halfword address
  12-14 Unknown (the bits can be set, ie. they DO exist)
  15    Enable
Response packet address. The register setting probably doesn't directly affect the hardware; its sole purpose seems to be initializing 098h (see there).

098h - W_TXBUF_REPLY2 - Multiplay Response Transmit Location 2 (R)
  0-11  Halfword address
  12-14 Unknown (the bits can be set, ie. they DO exist)
  15    Enable
This register seems to contain the actual response packet address. However, since it's read-only, software cannot set it directly. Instead, software must write the address to 094h, and then latch it from 094h to 098h (via W_RXCNT.Bit7).

Notes
It is unknown whether there is also auto-latching (similar to manual W_RXCNT.Bit7).
It is unknown whether W_TXBUF_REPLY2.Bit15 is automatically reset after transfer.
It is also unclear if/how the hardware determines WHEN to send reply packets (eg. it should NOT send them after receiving Beacons). Possibly the Start Receive IRQ handler must examine the incoming packet, and software must then decide whether to respond by sending the reply. If there are multiple slaves, the response order is probably handled automatically based on the local W_AID_LOW setting. (Although, if, for example, ONLY slave 5 exists, it ought to know that slave 5 is the <first> slave; that might happen if slaves 1..4 have left the communication, unless the slaves are automatically renumbered by software (?), so that slave 5 would become slave 1.) Some of the Unknown Registers (namely Ports 244h and 228h) are probably also related to the REPLY function.


 DS Wifi Configuration Ports < ^

120h - W_CONFIG_120h (R/W) ;81ff 0048->SAME ...init from firmware[04Ch]
122h - W_CONFIG_122h (R/W) ;ffff 4840->SAME ...init from firmware[04Eh]
124h - W_CONFIG_124h (R/W) ;ffff 0000->0032 ...init from firmware[05Eh]
128h - W_CONFIG_128h (R/W) ;ffff 0000->01F4 ...init from firmware[060h]
130h - W_CONFIG_130h (R/W) ;0fff 0142->0140 ...init from firmware[054h]
132h - W_CONFIG_132h (R/W) ;8fff 8064->SAME ...init from firmware[056h]
140h - W_CONFIG_140h (R/W) ;ffff 0000->E0E0 ...init from firmware[058h]
142h - W_CONFIG_142h (R/W) ;ffff 2443->SAME ...init from firmware[05Ah]
144h - W_CONFIG_144h (R/W) ;00ff 0042->SAME ...init from firmware[052h]
146h - W_CONFIG_146h (R/W) ;00ff 0016->0002 ...init from firmware[044h]
148h - W_CONFIG_148h (R/W) ;00ff 0016->0017 ...init from firmware[046h]
14Ah - W_CONFIG_14Ah (R/W) ;00ff 0016->0026 ...init from firmware[048h]
14Ch - W_CONFIG_14Ch (R/W) ;ffff 162C->1818 ...init from firmware[04Ah]
150h - W_CONFIG_150h (R/W) ;ff3f 0204->0101 ...init from firmware[062h]
154h - W_CONFIG_154h (R/W) ;7a7f 0058->SAME ...init from firmware[050h]
These ports are to be initialized from firmware settings.
Above comments show the R/W bits (eg. 81FFh means bit15 and bit8-0 are R/W, bit14-9 are always zero), followed by the initial value on Reset (eg. 0048h), followed by the new value after initialization from firmware settings (eg. 0032h for Port 124h, or SAME if the Firmware value is equal to the Reset value), followed by the location in firmware where the new value comes from (these values seem to be identical in all currently existing consoles).
Note: Firmware part4 changes W_CONFIG_124h to C8h, W_CONFIG_128h to 7D0h, W_CONFIG_150h to 202h, and W_CONFIG_140h depending on tx rate and preamble:
  W_CONFIG_140h = firmware[058h]+0202h               ;1Mbit/s
  W_CONFIG_140h = firmware[058h]+0202h-6161h         ;2Mbit/s with long preamble
  W_CONFIG_140h = firmware[058h]+0202h-6161h-6060h   ;2Mbit/s with short preamble
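The W_CONFIG_140h formulas above are plain 16bit arithmetic. A small C helper (hypothetical name `config_140h`) makes the wrap-around explicit; with the usual firmware[058h]=E0E0h it yields E2E2h, 8181h and 2121h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value that firmware part4 writes to W_CONFIG_140h, derived from the
   halfword at firmware[058h]. All arithmetic wraps modulo 10000h. */
static uint16_t config_140h(uint16_t fw_058h, bool rate_2mbit, bool short_preamble)
{
    uint16_t v = (uint16_t)(fw_058h + 0x0202u);      /* 1Mbit/s */
    if (rate_2mbit) {
        v = (uint16_t)(v - 0x6161u);                 /* 2Mbit/s, long preamble */
        if (short_preamble)
            v = (uint16_t)(v - 0x6060u);             /* 2Mbit/s, short preamble */
    }
    return v;
}
```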
0ECh - W_CONFIG_0ECh (R/W) ;firmware writes 3F03h (same as on power-up)
0D4h - W_CONFIG_0D4h (R/W) ;firmware writes 0003h (affected by W_MODE_RST)
0D8h - W_CONFIG_0D8h (R/W) ;firmware writes 0004h (same as on power-up)
0DAh - W_CONFIG_0DAh (R/W) ;firmware writes 0602h (same as on power-up)
254h - W_CONFIG_254h (?) ;firmware writes 0000h (read: EEEEh on DS-Lite)
Firmware just initializes these ports with fixed values, without further using them after initialization.


 DS Wifi Baseband Chip (BB)

BB-Chip Mitsumi MM3155 (DS), or BB/RF-Chip Mitsumi MM3218 (DS-Lite)

158h - W_BB_CNT - Baseband serial transfer control (W)
  0-7   Index     (00h-68h)
  8-11  Not used  (should be zero)
  12-15 Direction (5=Write BB_WRITE to Chip, 6=Read from Chip to BB_READ)
Transfer is started after writing to this register.

15Ah - W_BB_WRITE - Baseband serial write data (W)
  0-7   Data to be sent to chip (by following W_BB_CNT transfer)
  8-15  Not used (should be zero)
15Ch - W_BB_READ - Baseband serial read data (R)
  0-7   Data received from chip (from previous W_BB_CNT transfer)
  8-15  Not used (always zero)
15Eh - W_BB_BUSY - Baseband serial busy flag (R)
  0     Transfer Busy (0=Ready, 1=Busy)
  1-15  Always zero
Used to sense transfer completion after writes to W_BB_CNT.
Not sure if I am doing something wrong... but the busy flag doesn't seem to get set immediately after W_BB_CNT writes, and works only after waiting a good number of clock cycles?

160h - W_BB_MODE (R/W)
  0-7   Always zero
  8     Unknown (usually 1) (no effect no matter what setting?)
  9-13  Always zero
  14    Unknown (usually 0) (W_BB_READ gets unstable when set)
  15    Always zero
This register is initialized by firmware bootcode - don't change.

168h - W_BB_POWER (R/W)
  0-3   Disable whatever   (usually 0Dh=disable)
  4-14  Always zero
  15    Disable W_BB_ports (usually 1=Disable)
Must be set to 0000h before accessing BB registers.
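A polling sketch of BB register access in C, assuming `base` points at the Wifi register block and W_BB_POWER (168h) has already been set to 0000h. Because of the note above that the busy flag may not get set immediately after writing W_BB_CNT, the sketch also waits for any previous transfer before starting a new one; on real hardware an extra short delay after the W_BB_CNT write may still be needed. `bb_read` and `bb_write` are hypothetical names.

```c
#include <stdint.h>

#define W_BB_CNT    0x158
#define W_BB_WRITE  0x15A
#define W_BB_READ   0x15C
#define W_BB_BUSY   0x15E

/* Spin until W_BB_BUSY.Bit0 reads 0 (Ready). */
static void bb_wait(volatile uint16_t *base)
{
    while (base[W_BB_BUSY / 2] & 1u)
        ;
}

/* Write one 8bit BB register (index 00h..68h). */
static void bb_write(volatile uint16_t *base, uint8_t index, uint8_t value)
{
    bb_wait(base);                                     /* previous transfer done */
    base[W_BB_WRITE / 2] = value;                      /* data first...          */
    base[W_BB_CNT / 2] = (uint16_t)(0x5000u | index);  /* ...then start, 5=Write */
    bb_wait(base);
}

/* Read one 8bit BB register. */
static uint8_t bb_read(volatile uint16_t *base, uint8_t index)
{
    bb_wait(base);
    base[W_BB_CNT / 2] = (uint16_t)(0x6000u | index);  /* start, 6=Read */
    bb_wait(base);
    return (uint8_t)(base[W_BB_READ / 2] & 0xFFu);
}
```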

Read-Write-Ability of the BB-Chip Mitsumi MM3155 registers (DS)
  Index    Num Dir Expl.
  00h      1   R   always 6Dh (R) (Chip ID)
  01h..0Ch 12  R/W 8bit R/W
  0Dh..12h 6   -   always 00h
  13h..15h 3   R/W 8bit R/W
  16h..1Ah 5   -   always 00h
  1Bh..26h 12  R/W 8bit R/W
  27h      1   -   always 00h
  28h..4Ch 37  R/W 8bit R/W
  4Dh      1   R   always 00h or BFh (depending on other regs)
  4Eh..5Ch 15  R/W 8bit R/W
  5Dh      1   R   always 01h (R)
  5Eh..61h 4   -   always 00h
  62h..63h 2   R/W 8bit R/W
  64h      1   R   always FFh or 3Fh (depending on other regs)
  65h      1   R/W 8bit R/W
  66h      1   -   always 00h
  67h..68h 2   R/W 8bit R/W
  69h..FFh 151 -   always 00h
Read-Write-Ability of the BB/RF-Chip Mitsumi MM3218 (DS-Lite)
Same as above, except that reading always seems to return [5Dh]=00h. And, for whatever reason, Nintendo initializes DS-Lite registers by writing [00h]=03h and [66h]=12h. Nevertheless, the registers always read as [00h]=6Dh and [66h]=00h, ie. same as on the original DS.

Important BB Registers
Registers 0..68h are initialized by firmware bootcode, and most of these settings do not need to be changed by other programs, except for:
  Addr Initial Meaning
  01h  0x9E    [unsetting/resetting bit 7 initializes/resets the system?]
  02h          unknown (firmware is messing with this register)
  06h          unknown (firmware is messing with this register, too)
  13h  0x00    CCA operation - criteria for receiving
                 0=only use Carrier Sense (CS)
                 1=only use Energy Detection (ED)
                 2=receive if CS OR ED
                 3=receive only if CS AND ED
  1Eh  0xBB    see change channels flowchart (Ext. Gain when RF[09h].bit16=0)
  35h  0x1F    Energy Detection (ED) criteria
                 value 0..61 (representing energy levels of -60dBm to -80dBm)

 DS Wifi RF Chip

RF-Chip RF9008 (compatible to RF2958 from RF Micro Devices, Inc.) (Original DS)
BB/RF-Chip Mitsumi MM3218 (DS-Lite)

17Ch - W_RF_DATA2 - RF chip serial data/transfer enable (R/W)
For Type2 (ie. firmware[040h]<>3):
  0-1   Upper 2bit of 18bit data
  2-6   Index (00h..1Fh) (firmware uses only 00h..0Bh)
  7     Command (0=Write data, 1=Read data)
  8-15  Should be zero (not used with 24bit transfer)
For Type3 (ie. firmware[040h]=3):
  0-3   Command (5=Write data, 6=Read data)
  4-15  Should be zero (not used with 20bit transfer)
Writing to this register starts the transfer.

17Eh - W_RF_DATA1 - RF chip serial data (R/W)
For Type2 (ie. firmware[040h]<>3):
  0-15  Lower 16bit of 18bit data
For Type3 (ie. firmware[040h]=3):
  0-7   Data (to be written to chip) (or being received from chip)
  8-15  Index (usually 00h..28h) (index 40h..FFh are mirrors of 00h..3Fh)
This value should be set before setting W_RF_DATA2.

180h - W_RF_BUSY - RF chip serial busy flag (R)
  0     Transfer Busy (0=Ready, 1=Busy)
  1-15  Always zero
Used to sense transfer completion after writes to W_RF_DATA2.

184h - W_RF_CNT - RF chip serial control (R/W)
  0-5   Transfer length (init from firmware[041h].Bit0-5)
  6-7   Always zero
  8     Unknown (init from firmware[041h].Bit7)
  9-13  Always zero
  14    Unknown (usually 0)
  15    Always zero
This register is initialized by firmware bootcode - don't change.
Usually, Type2 has length=24bit and flag=0. Type3 uses length=20bit and flag=1.

Caution For Type2 (ie. firmware[040h]<>3)
Before accessing Type2 RF Registers, first BB[01h] must have been properly initialized (ie. BB[01h].Bit7 must have been toggled from 0-to-1).
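A C sketch of single RF register writes for both chip types, following the register descriptions above: W_RF_DATA1 is written first, and writing W_RF_DATA2 starts the transfer. It assumes `base` points at the Wifi register block and that W_RF_CNT is already initialized; the function names are hypothetical. For Type2, the 24bit word is sent as-is, so bits 16-23 (upper data bits, index, command) land in W_RF_DATA2.Bit0-7.

```c
#include <stdint.h>

#define W_RF_DATA2  0x17C
#define W_RF_DATA1  0x17E
#define W_RF_BUSY   0x180

/* Spin until W_RF_BUSY.Bit0 reads 0 (Ready). */
static void rf_wait(volatile uint16_t *base)
{
    while (base[W_RF_BUSY / 2] & 1u)
        ;
}

/* Type2 (firmware[040h]<>3): send a 24bit word
   (bit23=command, bits22-18=index, bits17-0=data). */
static void rf_write_type2(volatile uint16_t *base, uint32_t word24)
{
    rf_wait(base);
    base[W_RF_DATA1 / 2] = (uint16_t)(word24 & 0xFFFFu);        /* lower 16 data bits */
    base[W_RF_DATA2 / 2] = (uint16_t)((word24 >> 16) & 0xFFu);  /* starts transfer */
    rf_wait(base);
}

/* Type3 (firmware[040h]=3): write 8bit data to an RF register index. */
static void rf_write_type3(volatile uint16_t *base, uint8_t index, uint8_t data)
{
    rf_wait(base);
    base[W_RF_DATA1 / 2] = (uint16_t)((index << 8) | data);
    base[W_RF_DATA2 / 2] = 5;                                    /* 5=Write, starts transfer */
    rf_wait(base);
}
```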


 DS Wifi RF9008 Registers

RF9008 (RF2958 compatible)
2.4GHz Spread-Spectrum Transceiver - RF Micro Devices, Inc.

RF chip data (Type2) (initial NDS settings from firmware, example)
  Firmware   Index    Data
  (24bit)    (5bit)   (18bit)
  00C007h  = 00h    + 0C007h   ;-also set to 0C008h for power-down
  129C03h  = 04h    + 29C03h
  141728h  = 05h    + 01728h   ;\these are also written when changing channels
  1AE8BAh  = 06h    + 2E8BAh   ;/
  1D456Fh  = 07h    + 1456Fh
  23FFFAh  = 08h    + 3FFFAh
  241D30h  = 09h    + 01D30h   ;-bit10..14 should be also changed per channel?
  241D50h  = 09h    + 01D50h   ;-used instead by firmware v5 and up (narrower tx filter)
  280001h  = 0Ah    + 00001h
  2C0000h  = 0Bh    + 00000h
  069C03h  = 01h    + 29C03h
  080022h  = 02h    + 00022h
  0DFF6Fh  = 03h    + 1FF6Fh
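Each 24bit firmware word above is the register index shifted left by 18, ORed with the 18bit data (bit23, the read/write command, is zero for writes). A C sketch of the split and join, with hypothetical helper names:

```c
#include <stdint.h>

/* Fields of a 24bit Type2 RF word. */
typedef struct {
    unsigned read;   /* bit23: 0=write, 1=read */
    unsigned index;  /* bits22-18: RF register 00h..1Fh */
    uint32_t data;   /* bits17-0: 18bit register value */
} rf_word;

static rf_word rf_decode(uint32_t w)
{
    rf_word r;
    r.read  = (unsigned)((w >> 23) & 1u);
    r.index = (unsigned)((w >> 18) & 0x1Fu);
    r.data  = w & 0x3FFFFu;
    return r;
}

/* Build a write word (command bit = 0). */
static uint32_t rf_encode(unsigned index, uint32_t data)
{
    return ((uint32_t)(index & 0x1Fu) << 18) | (data & 0x3FFFFu);
}
```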
RF[00h] - Configuration Register 1 (CFG1) (Power on: 00007h)
  17-16 Reserved, program to zero (0)
  15-14 Reference Divider Value (0=Div2, 1=Div3, 2=Div4, 3=Div1)
  3     Sleep Mode Current      (0=Normal, 1=Very Low)
  2     RF VCO Regulator Enable (0=Disable, 1=Enable)
  1     IF VCO Regulator Enable (0=Disable, 1=Enable)
  0     IF VGA Regulator Enable (0=Disable, 1=Enable)
RF[01h] - IF PLL Register 1 (IFPLL1) (Power on: 09003h)
  17    IF PLL Enable                      (0=Disable, 1=Enable)
  16    IF PLL KV Calibration Enable       (0=Disable, 1=Enable)
  15    IF PLL Coarse Tuning Enable        (0=Disable, 1=Enable)
  14    IF PLL Loop Filter Select          (0=Internal, 1=External)
  13    IF PLL Charge Pump Leakage Current (0=Minimum value, 1=2*Minimum value)
  12    IF PLL Phase Detector Polarity     (0=Positive, 1=Negative)
  11    IF PLL Auto Calibration Enable     (0=Disable, 1=Enable)
  10    IF PLL Lock Detect Enable          (0=Disable, 1=Enable)
  9     IF PLL Prescaler Modulus           (0=4/5 Mode, 1=8/9 Mode)
  8-4   Reserved, program to zero (0)
  3-0   IF VCO Coarse Tuning Voltage       (N=Voltage*16/VDD)
RF[02h] - IF PLL Register 2 (IFPLL2) (Power on: 00022h)
  17-16 Reserved, program to zero (0)
  15-0  IF PLL divide-by-N value
RF[03h] - IF PLL Register 3 (IFPLL3) (Power on: 1FF78h)
  17    Reserved, program to zero (0)
  16-8  IF VCO KV Calibration, delta N value (signed) ;DeltaF=(DN/Fr)
  7-4   IF VCO Coarse Tuning Default Value
  3-0   IF VCO KV Calibration Default Value
RF[04h] - RF PLL Register 1 (RFPLL1) (Power on: 09003h)
  17-10 Same as for RF[01h] (but for RF, not for IF)
  9     RF PLL Prescaler Modulus (0=8/9 Mode, 1=8/10 Mode)
  8-0   Same as for RF[01h] (but for RF, not for IF)
RF[05h] - RF PLL Register 2 (RFPLL2) (Power on: 01780h)
  17-6  RF PLL Divide By N Value
  5-0   RF PLL Numerator Value (Bits 23-18)
RF[06h] - RF PLL Register 3 (RFPLL3) (Power on: 00000h)
  17-0  RF PLL Numerator Value (Bits 17-0)
RF[07h] - RF PLL Register 4 (RFPLL4) (Power on: 14578h)
  17-10 Same as for RF[03h] (but for RF, not for IF) ;and, DN=(deltaF/Fr)*256
RF[08h] - Calibration Register 1 (CAL1) (Power on: 1E742h)
  17-13 VCO1 Warm-up Time                      ;TVCO1=(approximate warm-up time)*(Fr/32)
  12-8  VCO1 Tuning Gain Calibration           ;TLOCK1=(approximate lock time)*(Fr/128)
  7-3   VCO1 Coarse Tune Calibration Reference ;VALUE=(average time)*(Fr/32)
  2-0   Lock Detect Resolution (0..7)
RF[09h] - TXRX Register 1 (TXRX1) (Power on: 00120h)
  17    Receiver DC Removal Loop          (0=Enable DC Removal Loop, 1=Disable)
  16    Internal Variable Gain for VGA    (0=Disable/External, 1=Enable/Internal)
  15    Internal Variable Gain Source     (0=From TXVGC Bits, 1=From Power Control)
  14-10 Transmit Variable Gain Select (TXVGC) (0..1Fh = High..low gain)
  9-7   Receive Baseband Low Pass Filter  (0=Wide Bandwidth, 7=Narrow)
  6-4   Transmit Baseband Low Pass Filter (0=Wide Bandwidth, 7=Narrow)
  3     Mode Switch                       (0=Single-ended mode, 1=Differential mode)
  2     Input Buffer Enable TX            (0=Input Buffer Controlled by TXEN, 1=By BBEN)
  1     Internal Bias Enable              (0=Disable/External, 1=Enable/Internal)
  0     TX Baseband Filters Bypass        (0=Not Bypassed, 1=Bypassed)
RF[0Ah] - Power Control Register 1 (PCNT1) (Power on: 00000h)
  17-15 Select MID_BIAS Level                           (1.6V through 2.6V)
  14-9  Desired output power at antenna                 (N*0.5dBm)
  8-3   Power Control loop-variation-adjustment Offset  (signed, N*0.5dB)
  2-0   Desired delay for using a single TX_PE line     (N*0.5us)
RF[0Bh] - Power Control Register 2 (PCNT2) (Power on: 00000h)
  17-12 Desired MAX output power when PABIAS=MAX=2.6V   (N*0.5dBm)
  11-6  Desired MAX output power when PABIAS=MID_BIAS   (N*0.5dBm)
  5-0   Desired MAX output power when PABIAS=MIN=1.6V   (N*0.5dBm)
RF[0Ch] - VCOT Register 1 (VCOT1) (Power on: 00000h)
  17    IF VCO Band Current Compensation (0=Disable, 1=Enable)
  16    RF VCO Band Current Compensation (0=Disable, 1=Enable)
  15-0  Reserved, program to zero (0)
RF[0Dh..1Ah] - N/A (Power on: 00000h)
  Not used.
RF[1Bh] - Test Register 1 (TEST) (Power on: 0000Fh)
  17-0  This is a test register for internal use only.
RF[1Ch..1Eh] - N/A (Power on: 00000h)
  Not used.
RF[1Fh] - Reset Register (Power on: 00001h)
  17-0  Don't care (writing any value resets the chip)

 DS Wifi Unknown Registers

00Ah - W_X_00Ah (R/W)
  0-15  Unknown (usually zero)
"[bit7 - ingore rx duplicates]" <--- that is NOT correct (no effect).
Firmware writes 0000h to it, and does so many times. So, possibly some bits in this register are automatically set by hardware in some situations; otherwise, repeatedly writing 0000h to it would be kinda useless...?

---

Below Ports 244h and 228h might be related to deciding when to send multiplay replies...?

244h (R/W) x ffff [0000] (used by firmware part4)
Unknown. Seems to be W_IF/W_IE related. Firmware sets Port 244h bits 6,7,12 to 1-then-0 upon IRQ06, IRQ07, IRQ12 respectively.

228h (W) x fixx [0000] (used by firmware part4) (bit3)
Unknown. Firmware writes 8-then-0 (done in IRQ06 handler, after Port 244h access).

---

Below Ports 1A0h, 1A2h, 1A4h are somehow related to BB[02h]...

1A0h (R/W) x -933 [0000]
  0-1   Unknown
  2-3   Always zero
  4-5   Unknown
  6-7   Always zero
  8     Unknown
  9-10  Always zero
  11    Unknown
  12-15 Always zero
Firmware writes values 000h, 823h. Seems to be power-related. The following experimental code toggles RXTX.ON (RFU.Pin4): "x=0 / @@lop: / [1A0h]=x / [036h]=0 / x=x XOR 3 / wait_by_loop(1000h) / b @@lop".
Also, writing to port 1A0h affects ports 034h, 19Ch, 21Ch, and 2A2h.
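The experimental RXTX.ON toggle loop quoted above, rewritten as C for readability. Assumptions: `base` points at the Wifi register block; `wait_by_loop` is a crude busy-wait stand-in whose real duration depends on CPU speed; and, while the original loops forever, this sketch takes an iteration count so that it terminates. All names are hypothetical.

```c
#include <stdint.h>

#define W_PORT_1A0h  0x1A0   /* unknown power-related port */
#define W_POWER_US   0x036

/* Crude delay loop standing in for wait_by_loop(). */
static void wait_by_loop(volatile uint32_t n)
{
    while (n--)
        ;
}

/* Alternately write 0 and 3 to port 1A0h, clearing W_POWER_US each time. */
static void toggle_rxtx_on(volatile uint16_t *base, unsigned iterations)
{
    uint16_t x = 0;
    for (unsigned i = 0; i < iterations; i++) {
        base[W_PORT_1A0h / 2] = x;
        base[W_POWER_US / 2] = 0;
        x ^= 3;
        wait_by_loop(0x1000);
    }
}
```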

1A2h (R/W) x ---3 [0001] (used by firmware part4)
  0-1   Unknown. Firmware writes values 03h, 01h, and VAR.
  2-15  Always zero
Used in combination with Port 1A0h, so it's probably power-related, too.

1A4h (R/W) x ffff [0000]
"Rate used when signal test is enabled (0x0A or 0x14 for 1 or 2 mbit)"
(Not too sure if that's correct, there is no visible relation to any rate.)
(This register seems to be R/W only on certain port 1A0h settings.)
Unknown. Firmware writes whatever.

---

290h (R/W or Disabled)
Reportedly, this is the "antenna" register, which should exist on official devkits, allowing to switch between wired Ethernet, and wireless Wifi mode.
  0     Unknown (R/W) (if present)
  1-15  Not used
On normal NDS release versions, this register seems to be disabled (if it is implemented at all), and trying to read from it acts as for unused registers, ie. reads return FFFFh (or probably 0000h on NDS-lite). The NDS firmware contains code for accessing this port, even in release versions.

W_INTERNAL
All registers marked as "W_INTERNAL" aren't used by Firmware part4, and are probably unimportant, except for whatever special diagnostics purposes.

Wifi DMA
Wifi RAM can be accessed with normal "Start Immediately" DMA transfers (typically by reading through W_RXBUF_RD_DATA, so the DMA automatically wraps from END to BEGIN).
Additionally, DMA0 and DMA2 can reportedly be synchronized to "Wireless Interrupt" (rather than using "Start Immediately" timing); no idea if/how that works though... and whether it gets started on any Wifi IRQ, or only on specific IRQs...?
Possibly some of the above unknown registers, or some unknown bits in other registers, are DMA related...?
Reportedly, early firmwares did use "Wireless Interrupt" DMAs (that'd be firmware v1/v2... or, only earlier unreleased prototype versions?).


 DS Wifi Unused Registers

Wifi WS0 and WS1 Regions in NDS7 I/O Space
Wifi hardware occupies two 32K slots, but most of it is filled with unused or duplicated regions. The timings (waitstates) for WS0 and WS1 are initialized in WIFIWAITCNT (by firmware).
  4800000h-4807FFFh  Wifi WS0 Region (32K)
  4808000h-480FFFFh  Wifi WS1 Region (32K)
  4810000h-4FFFFFFh  Not used (00h-filled)
Structure of the 32K Wifi Regions (WS0 and WS1)
  Wifi-WS0-Region    Wifi-WS1-Region    Content
  4800000h-4800FFFh  4808000h-4808FFFh  Registers
  4801000h-4801FFFh  4809000h-4809FFFh  Registers (mirror)
  4802000h-4803FFFh  480A000h-480BFFFh  Unused
  4804000h-4805FFFh  480C000h-480DFFFh  Wifi RAM (8K)
  4806000h-4806FFFh  480E000h-480EFFFh  Registers (mirror)
  4807000h-4807FFFh  480F000h-480FFFFh  Registers (mirror)
Wifi Registers (recommended 4808000h-4808FFFh) appear more stable in WS1?
Wifi RAM (recommended 4804000h-4805FFFh) appears more stable in WS0?
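The region layout above can be expressed as a small classification function. This C sketch (hypothetical enum and function names) is based only on the table; it does not model how unused areas read back on NDS vs. NDS-Lite.

```c
#include <stdint.h>

typedef enum {
    WIFI_OUTSIDE,      /* not in 4800000h-480FFFFh (eg. 4810000h and up) */
    WIFI_REGS,         /* registers at offset 0000h-0FFFh */
    WIFI_REGS_MIRROR,  /* 1000h-1FFFh, 6000h-6FFFh, 7000h-7FFFh */
    WIFI_UNUSED,       /* 2000h-3FFFh */
    WIFI_RAM           /* 4000h-5FFFh (8K) */
} wifi_area;

/* Classify an NDS7 address inside the WS0 or WS1 Wifi region. */
static wifi_area wifi_classify(uint32_t addr)
{
    if (addr < 0x04800000u || addr > 0x0480FFFFu)
        return WIFI_OUTSIDE;
    uint32_t off = addr & 0x7FFFu;          /* offset within the 32K region */
    switch (off >> 12) {                    /* 4K granularity: 0..7 */
    case 0x0:                     return WIFI_REGS;
    case 0x1: case 0x6: case 0x7: return WIFI_REGS_MIRROR;
    case 0x2: case 0x3:           return WIFI_UNUSED;
    default:                      return WIFI_RAM;   /* 4 and 5 */
    }
}
```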

Unused Ports (Original NDS)
Aside from those ports listed in the Wifi I/O Map, all other ports in range 000h..FFFh are unused. On the original DS, reading from these ports returns FFFFh.

Unused Ports (NDS-Lite)
Reading from unused I/O ports acts as PASSIVE mirror of W_RXBUF_RD_DATA. Exceptions are: Ports 188h, and 2D8h..2E6h; which always return 0000h.

Unused Memory (Original NDS)
Unused Wifi Memory is at 2000h..3FFFh. On the original DS, reading from that region returns FFFFh.

Unused Memory (NDS-Lite)
Reading from unused memory acts as a PASSIVE mirror of WifiRAM, ie. reading from it returns the value most recently read from 4000h..5FFFh. That value is not affected by indirect WifiRAM reads via W_RXBUF_RD_DATA, nor by writes to wifi memory (including writes that overwrite the most recently read value), and the mirror works only if WifiRAM is properly enabled (ie. Port 220h.Bits0-1 should be 0).
Moreover, certain addresses are additionally ORed with mirrored I/O Ports. These addresses are:
  2030h, 2044h, 2056h, 2080h, 2090h, 2094h, 2098h, 209Ch, 20A0h, 20A4h,
  20A8h, 20AAh, 20B0h, 20B6h, 20BAh, 21C0h, 2208h, 2210h, 2244h, 31D0h,
  31D2h, 31D4h, 31D6h, 31D8h, 31DAh, 31DCh, 31DEh.
For example, 2044h is a PASSIVE mirror of WifiRAM, ORed with an ACTIVE mirror of W_RANDOM (Port 044h). Note that some mirrors are at 2000h-2FFFh, and some at 3000h-3FFFh. The W_CMD_STAT mirrors are PASSIVE (only in the unused memory region; in normal port-mirror regions like 1000h-1FFFh, W_CMD_STAT mirrors are ACTIVE).

Known (W) Mirrors (when reading from Write-only ports)
  Read from (W)         Mirrors to (NDS)      Or to (NDS-Lite)
  070h W_TXBUF_WR_DATA  060h W_RXBUF_RD_DATA  074h W_TXBUF_GAP
  078h W_INTERNAL       068h W_TXBUF_WR_ADDR  074h W_TXBUF_GAP
  0ACh W_TXREQ_RESET    09Ch W_INTERNAL       ? (zero)
  0AEh W_TXREQ_SET      09Ch W_INTERNAL       ? (zero)
  0B4h W_TXBUF_RESET    0B6h W_TXBUSY         ? (zero)
  158h W_BB_CNT         15Ch W_BB_READ        ? (zero)
  15Ah W_BB_WRITE       ? (zero)              ? (zero)
  178h W_INTERNAL       17Ch W_RF_DATA2       ? (zero)
  20Ch W_INTERNAL       09Ch W_INTERNAL       ? (zero)
  21Ch W_IF_SET         010h W_IF             010h-OR-05Ch-OR-more?
  228h x                ? (zero)              ? (zero)
  298h W_INTERNAL       084h W_TXBUF_TIM      084h W_TXBUF_TIM
  2A8h W_INTERNAL       238h W_INTERNAL       238h W_INTERNAL
  2B0h W_INTERNAL       084h W_TXBUF_TIM      084h W_TXBUF_TIM
Notes: The mirror to W_RXBUF_RD_DATA is a passive mirror.
The DS-Lite mirror at 21Ch consists of several ports ORed with each other (known components are Ports 010h and 05Ch, but there seem to be even more values ORed with it).

Port Mirror Regions
The Wifi Port region at 000h..FFFh is mirrored to 1000h..1FFFh, 6000h..6FFFh, and 7000h..7FFFh. Many of these mirrored ports are PASSIVE mirrors. Eg. reading from 1060h (mirror of Port 060h, W_RXBUF_RD_DATA) returns the old W_RXBUF_RD_DATA value (but without loading a new value from Wifi RAM, and without incrementing W_RXBUF_RD_ADDR). However, other registers, like W_RANDOM, do have ACTIVE mirrors.


 DS Wifi Initialization

Initialization sequence
These events must be done somewhat in sequence. There is some flexibility as to how they can be ordered, but it's best to follow this order:
  [4000304h].Bit1 = 1       ;POWCNT2    ;-Enable power to the wifi system
  W_MACADDR = firmware[036h]            ;-Set 48bit Mac address
  reg[012h] = 0000h         ;W_IE       ;-Disable interrupts
Wake Up the wireless system:
  reg[036h] = 0000h         ;W_POWER_US ;\clear all powerdown bits
  delay 8 ms                            ; (works without that killer-delay ?)
  reg[168h] = 0000h         ;W_BB_POWER ;/
  IF firmware[040h]=02h                 ;\
    temp=BB[01h]                        ; for wifitype=02h only:
    BB[01h]=temp AND 7Fh                ; reset BB[01h].Bit7, then restore old BB[01h]
    BB[01h]=temp                        ; (that BB setting enables the RF9008 chip)
  ENDIF                                 ;/
  delay 30 ms                           ;-(more killer-delay now getting REALLY slow)
  call init_sub_functions               ;- same as "Init 16 registers by firmware[..]"
                                        ;  and "Init RF registers", below.
                                        ;  this or the other one probably not necessary
Init the Mac system:
  reg[004h] = 0000h - W_MODE_RST       ;set hardware mode
  reg[008h] = 0000h - W_TXSTATCNT      ;
  reg[00Ah] = 0000h - ? W_X_00Ah       ;(related to rx filter)
  reg[012h] = 0000h - W_IE             ;disable interrupts (again)
  reg[010h] = FFFFh - W_IF             ;acknowledge/clear any interrupts
  reg[254h] = 0000h - W_CONFIG_254h    ;
  reg[0B4h] = FFFFh - W_TXBUF_RESET    ;--reset all TXBUF_LOC's
  reg[080h] = 0000h - W_TXBUF_BEACON   ;disable automatic beacon transmission
  reg[02Ah] = 0000h - W_AID_FULL       ;\clear AID
  reg[028h] = 0000h - W_AID_LOW        ;/
  reg[0E8h] = 0000h - W_US_COUNTCNT    ;disable microsecond counter
  reg[0EAh] = 0000h - W_US_COMPARECNT  ;disable microsecond compare
  reg[0EEh] = 0001h - W_CMD_COUNTCNT   ;(is 0001h on reset anyways)
  reg[0ECh] = 3F03h - W_CONFIG_0ECh    ;
  reg[1A2h] = 0001h - ?                ;
  reg[1A0h] = 0000h - ?                ;
  reg[110h] = 0800h - W_PRE_BEACON     ;
  reg[0BCh] = 0001h - W_PREAMBLE       ;disable short preamble
  reg[0D4h] = 0003h - W_CONFIG_0D4h    ;
  reg[0D8h] = 0004h - W_CONFIG_0D8h    ;
  reg[0DAh] = 0602h - W_CONFIG_0DAh    ;
  reg[076h] = 0000h - W_TXBUF_GAPDISP  ;disable gap/skip (offset=zero)
Init 16 registers by firmware[044h..063h]
  reg[146h] = firmware[044h]  ;W_CONFIG_146h
  reg[148h] = firmware[046h]  ;W_CONFIG_148h
  reg[14Ah] = firmware[048h]  ;W_CONFIG_14Ah
  reg[14Ch] = firmware[04Ah]  ;W_CONFIG_14Ch
  reg[120h] = firmware[04Ch]  ;W_CONFIG_120h
  reg[122h] = firmware[04Eh]  ;W_CONFIG_122h
  reg[154h] = firmware[050h]  ;W_CONFIG_154h
  reg[144h] = firmware[052h]  ;W_CONFIG_144h
  reg[130h] = firmware[054h]  ;W_CONFIG_130h
  reg[132h] = firmware[056h]  ;W_CONFIG_132h
  reg[140h] = firmware[058h]  ;W_CONFIG_140h
  reg[142h] = firmware[05Ah]  ;W_CONFIG_142h
  reg[038h] = firmware[05Ch]  ;W_POWER_TX
  reg[124h] = firmware[05Eh]  ;W_CONFIG_124h
  reg[128h] = firmware[060h]  ;W_CONFIG_128h
  reg[150h] = firmware[062h]  ;W_CONFIG_150h
Init RF registers
  numbits = BYTE firmware[041h]            ;usually 18h
  numbytes = (numbits+7)/8                 ;usually 3
  reg[0x184] = (numbits+80h) AND 017Fh     ;W_RF_CNT
  for i=0 to BYTE firmware[042h]-1         ;number of entries (usually 0Ch) (0..0Bh)
    if BYTE firmware[040h]=3
      RF[i]=firmware[0CEh+i]
    else
      RF_Write(numbytes at firmware[0CEh+i*numbytes])
    endif
  next i
Init the BaseBand System
  (this should not be required, already set by firmware bootcode)
  reg[160h] = 0100h                        ;W_BB_MODE
  BB[0..68h] = firmware[64h+(0..68h)]
Set Mac address
  copy 6 bytes from firmware[036h] to mac address at 0x04800018  (why again ?)
Now just set some default variables
  reg[02Ch]=0007h  ;W_TX_RETRYLIMIT - XXX needs to be set for every transmit?
  Set channel (see section on changing channels)
  Set Mode 2 -- sets bottom 3 bits of W_MODE_WEP to 2
  Set Wep Mode / key -- Wep mode is bits 3..5 of W_MODE_WEP
  BB[13h] = 00h    ;CCA operation (use only carrier sense, without ED)
  BB[35h] = 1Fh    ;Energy Detection Threshold (ED)
-- To further init wifi to the point that you can properly send
-- and receive data, there are some more variables that need to be set.
  reg[032h] = 8000h -- W_WEP_CNT       ;Enable WEP processing
  reg[134h] = FFFFh -- W_BEACONCOUNT2  ;reset post-beacon counter to LONG time
  reg[028h] = 0000h -- W_AID_LOW       ;\clear W_AID value, again?!
  reg[02Ah] = 0000h -- W_AID_FULL      ;/
  reg[0E8h] = 0001h -- W_US_COUNTCNT   ;enable microsecond counter
  reg[038h] = 0000h -- W_POWER_TX      ;disable transmit power save
  reg[020h] = 0000h -- W_BSSID_0       ;\
  reg[022h] = 0000h -- W_BSSID_1       ; clear BSSID
  reg[024h] = 0000h -- W_BSSID_2       ;/
-- TX prepare
  reg[0AEh] = 000Dh -- W_TXREQ_SET     ;flush all pending transmits (uh?)
-- RX prepare
  reg[030h] = 8000h    W_RXCNT         ;enable RX system (done again below)
  reg[050h] = 4C00h    W_RXBUF_BEGIN   ;(example values)
  reg[052h] = 5F60h    W_RXBUF_END     ;(length = 4960 bytes)
  reg[056h] = 0C00h/2  W_RXBUF_WR_ADDR ;fifo begin latch address
  reg[05Ah] = 0C00h/2  W_RXBUF_READCSR ;fifo end, same as begin at start.
  reg[062h] = 5F60h-2  W_RXBUF_GAP     ;(set gap<end) (zero should work, too)
  reg[030h] = 8001h    W_RXCNT         ;enable, and latch new fifo values to hardware
--
  reg[030h] = 8000h    W_RXCNT          ;enable receive (again?)
  reg[010h] = FFFFh    W_IF             ;clear interrupt flags
  reg[012h] = whatever W_IE             ;set enabled interrupts
  reg[1AEh] = 1FFFh    W_RXSTAT_OVF_IE  ;desired STAT Overflow interrupts
  reg[1AAh] = 0000h    W_RXSTAT_INC_IE  ;desired STAT Increase interrupts
  reg[0D0h] = 0181h    W_RXFILTER       ;set to 0x581 when you successfully connect
                                        ;to an access point and fill W_BSSID with a mac
                                        ;address for it [not sure on the values
                                        ;for this yet]
  reg[0E0h] = 000Bh -- W_RXFILTER2      ;
  reg[008h] = 0000h -- ? W_TXSTATCNT    ;(again?)
  reg[00Ah] = 0000h -- ? W_X_00Ah       ;(related to rx filter) (again?)
  reg[004h] = 0001h -- W_MODE_RST       ;hardware mode
  reg[0E8h] = 0001h -- W_US_COUNTCNT    ;enable microsecond counter (again?)
  reg[0EAh] = 0001h -- W_US_COMPARECNT  ;enable microsecond compare
  reg[048h] = 0000h -- W_POWER_?        ;[disabling a power saving technique]
  reg[038h].Bit1 = 0 -- W_POWER_TX      ;[this too]
  reg[048h] = 0000h -- W_POWER_?        ;[umm, it's done again. necessary?]
  reg[0AEh] = 0002h -- W_TXREQ_SET      ;
  reg[03Ch].Bit1 = 1 -- W_POWERSTATE    ;queue enable power (RX power, we believe)
  reg[0ACh] = FFFFh -- W_TXREQ_RESET    ;reset LOC1..3
That's it, the DS should be now happy to send and receive packets.
It's very possible that there are some unnecessary registers set in here.
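The "Init RF registers" step of the sequence above can be written in C roughly as follows. This is a sketch under assumptions: `base` is the Wifi register block, `fw` is a copy of the firmware header bytes, Type2 entries are stored little-endian, and the helper names are hypothetical. The W_RF_CNT expression reuses the (numbits+80h) AND 17Fh trick, which keeps bits 0-6 and moves firmware[041h].Bit7 into W_RF_CNT.Bit8.

```c
#include <stdint.h>

#define W_RF_DATA2  0x17C
#define W_RF_DATA1  0x17E
#define W_RF_BUSY   0x180
#define W_RF_CNT    0x184

/* W_RF_CNT value from firmware[041h]. */
static uint16_t rf_cnt_from_fw(uint8_t fw041)
{
    return (uint16_t)((fw041 + 0x80u) & 0x17Fu);
}

static void rf_wait(volatile uint16_t *base)
{
    while (base[W_RF_BUSY / 2] & 1u)
        ;
}

/* Write the initial RF register values listed in the firmware. */
static void rf_init(volatile uint16_t *base, const uint8_t *fw)
{
    unsigned numbits  = fw[0x41];              /* usually 18h */
    unsigned numbytes = (numbits + 7) / 8;     /* usually 3 (Type2 only) */
    unsigned count    = fw[0x42];              /* usually 0Ch */

    base[W_RF_CNT / 2] = rf_cnt_from_fw(fw[0x41]);
    for (unsigned i = 0; i < count; i++) {
        rf_wait(base);
        if (fw[0x40] == 3) {
            /* Type3: one data byte per register, register index = i */
            base[W_RF_DATA1 / 2] = (uint16_t)((i << 8) | fw[0xCE + i]);
            base[W_RF_DATA2 / 2] = 5;          /* 5=Write, starts transfer */
        } else {
            /* Type2: 'numbytes' little-endian bytes form one 24bit word */
            uint32_t w = 0;
            for (unsigned b = 0; b < numbytes; b++)
                w |= (uint32_t)fw[0xCE + i * numbytes + b] << (8 * b);
            base[W_RF_DATA1 / 2] = (uint16_t)(w & 0xFFFFu);
            base[W_RF_DATA2 / 2] = (uint16_t)((w >> 16) & 0xFFu);
        }
    }
    rf_wait(base);
}
```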


 DS Wifi Flowcharts

Wifi Transmit Procedure
To transmit data via wifi (Assuming you've already initialized wifi and changed channels to the channel you want):
  (1) Copy the TX Header followed by the 802.11 packet to send anywhere it
      will fit in MAC memory (halfword-aligned)
  (2) Take the offset from the start of MAC memory where you put the packet,
      divide it by 2, and OR it with 0x8000 - store this in one of the
      W_TXBUF_LOC registers
  (3) Set W_TX_RETRYLIMIT, to allow your packet to be retried until an ACK is
      received (set it to 7, or something similar)
  (4) Store the bit associated with the W_TXBUF_LOC register you used
      into W_TXREQ_SET - this will send the packet.
  (5) You can then read the result data in W_TXSTAT when the TX is over
      (you can tell either by polling or interrupt) to find out how many
      retries were used, and if the packet was ACK'd
Of course, this is just the simplest approach, you can be a lot more clever about it.
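A C sketch of steps (1) to (4). Assumptions not stated in this section: W_TXBUF_LOC1..3 are at 0A0h/0A4h/0A8h, and their W_TXREQ_SET bits are bit0/bit2/bit3 (consistent with the 000Dh "all LOCs" value used in the initialization sequence); `base` points at a 32K Wifi region so that Wifi RAM starts at byte offset 4000h. Step (5), reading W_TXSTAT, is left out. Function and table names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define W_TX_RETRYLIMIT  0x02C
#define W_TXREQ_SET      0x0AE
#define WIFI_RAM         0x4000   /* byte offset of Wifi RAM in the region */

/* ASSUMED register offsets and TXREQ bits for LOC1..LOC3. */
static const uint16_t txbuf_loc_reg[3] = { 0x0A0, 0x0A4, 0x0A8 };
static const uint16_t txreq_bit[3]     = { 0x0001, 0x0004, 0x0008 };

/* Send 'pkt' (12byte TX header + IEEE frame, as halfwords), placing it at
   byte offset 'ram_off' in Wifi RAM, using slot 0..2 (= LOC1..LOC3). */
static void wifi_tx(volatile uint16_t *base, uint16_t ram_off,
                    const uint16_t *pkt, size_t halfwords, unsigned slot)
{
    /* (1) copy to Wifi RAM using 16bit accesses (halfword-aligned) */
    for (size_t i = 0; i < halfwords; i++)
        base[(WIFI_RAM + ram_off) / 2 + i] = pkt[i];
    /* (2) halfword address ORed with 8000h into W_TXBUF_LOCn */
    base[txbuf_loc_reg[slot] / 2] = (uint16_t)(0x8000u | (ram_off / 2));
    /* (3) retry limit */
    base[W_TX_RETRYLIMIT / 2] = 7;
    /* (4) request transmission from that LOC */
    base[W_TXREQ_SET / 2] = txreq_bit[slot];
}
```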

Wifi Receive Procedure
To receive data via wifi, you either need to handle the wifi received data interrupt, or you need to poll W_RXBUF_WRCSR - whenever it is != W_RXBUF_READCSR, there is a new packet. When there is a new packet, take the following approach:
  (1) Calculate the length of the new packet (read "received frame length"
      which is +8 bytes from the start of the packet) - total frame length
      is (12 + received frame length) padded to a multiple of 4 bytes.
  (2) Read the data out of the RX FIFO area (keep in mind it's a circular
      buffer and you may have to wrap around the end of the buffer)
  (3) Set the value of W_RXBUF_READCSR to the location of the next packet
      (add the length of the packet, and wrap around if necessary)
Keep in mind, W_RXBUF_READCSR and W_RXBUF_WRCSR must be multiplied by 2 to get a byte offset from the start of MAC memory.
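The cursor arithmetic of the receive procedure, together with the RX FIFO values used in the initialization sequence (4C00h..5F60h), as a C sketch. Assumptions: W_RXBUF_BEGIN/END are given in the 4xxxh form (Wifi RAM at 4000h), READCSR is a halfword offset from the start of Wifi RAM, and the helper names are hypothetical. It only computes register values and does not touch hardware.

```c
#include <stdint.h>

#define WIFI_RAM_BASE 0x4000u   /* W_RXBUF_BEGIN/END use this 4xxxh form */

typedef struct {
    uint16_t begin;    /* 050h W_RXBUF_BEGIN */
    uint16_t end;      /* 052h W_RXBUF_END */
    uint16_t wr_addr;  /* 056h W_RXBUF_WR_ADDR (halfword offset) */
    uint16_t readcsr;  /* 05Ah W_RXBUF_READCSR (halfword offset) */
    uint16_t gap;      /* 062h W_RXBUF_GAP (set below end) */
} rx_fifo;

/* Initial FIFO values, mirroring "RX prepare" in the init sequence. */
static rx_fifo rx_fifo_init(uint16_t begin, uint16_t end)
{
    rx_fifo f;
    f.begin   = begin;
    f.end     = end;
    f.wr_addr = (uint16_t)((begin - WIFI_RAM_BASE) / 2u);
    f.readcsr = f.wr_addr;
    f.gap     = (uint16_t)(end - 2u);
    return f;
}

/* Advance READCSR past one packet; frame_len is RXHDR[08h]. */
static uint16_t rx_next_readcsr(const rx_fifo *f, uint16_t readcsr, uint16_t frame_len)
{
    uint32_t begin = f->begin - WIFI_RAM_BASE;        /* byte offsets in Wifi RAM */
    uint32_t end   = f->end - WIFI_RAM_BASE;
    uint32_t total = (12u + frame_len + 3u) & ~3u;    /* RXHDR + frame, padded to 4 */
    uint32_t next  = (uint32_t)readcsr * 2u + total;
    if (next >= end)
        next -= end - begin;                          /* wrap in circular buffer */
    return (uint16_t)(next / 2u);
}
```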

Wifi Change Channels Procedure (ch=1..14)
For Type2 or Type5 (ie. firmware[040h]<>3):      ;(Type2, used in Original-DS)
  RF[firmware[F2h+(ch-1)*6]/40000h] = firmware[F2h+(ch-1)*6] AND 3FFFFh
  RF[firmware[F5h+(ch-1)*6]/40000h] = firmware[F5h+(ch-1)*6] AND 3FFFFh
  delay a few milliseconds                        ;huh?
  IF RF[09h].bit16=0                              ;External Gain (default)
    BB[1Eh]=firmware[146h+(ch-1)]                 ;set BB.Gain register
  ELSEIF RF[09h].bit15=0                          ;Internal Gain from TXVGC Bits
    RF[09h].Bit10..14 = (firmware[154h+(ch-1)] AND 1Fh)  ;set RF.TXVGC Bits
  ENDIF
For Type3 (ie. firmware[040h]=3):                 ;(Type3, used in DS-Lite)
  num_initial_regs = firmware[042h]
  addr=0CEh+num_initial_regs
  num_bb_writes = firmware[addr]
  num_rf_writes = firmware[43h]
  addr=addr+1
  for i=1 to num_bb_writes
    BB[firmware[addr]] = firmware[addr+ch]
    addr=addr+15
  next i
  for i=1 to num_rf_writes
    RF[firmware[addr]] = firmware[addr+ch]
    addr=addr+15
  next i
Congrats, you are now ready to transmit/receive on whatever channel you picked.
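The Type3 table walk above can be sketched in C as a function that collects the BB/RF writes for a channel instead of performing them, which makes the firmware layout explicit (one index byte followed by 14 per-channel values, hence the step of 15). A Type2 helper reads the 24bit words at F2h/F5h+(ch-1)*6, assuming little-endian byte order. Names and the 16-entry cap are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define MAX_WRITES 16   /* arbitrary cap for this sketch */

typedef struct { uint8_t index, value; } reg_write;

typedef struct {
    reg_write bb[MAX_WRITES]; unsigned num_bb;
    reg_write rf[MAX_WRITES]; unsigned num_rf;
} chan_writes;

/* Type3: collect the BB and RF writes for channel ch (1..14). */
static chan_writes type3_channel_writes(const uint8_t *fw, unsigned ch)
{
    chan_writes w;
    memset(&w, 0, sizeof w);
    unsigned addr   = 0xCEu + fw[0x42];   /* skip the initial RF values */
    unsigned num_bb = fw[addr];
    unsigned num_rf = fw[0x43];
    addr++;
    for (unsigned i = 0; i < num_bb; i++, addr += 15) {
        if (w.num_bb < MAX_WRITES) {
            w.bb[w.num_bb].index = fw[addr];
            w.bb[w.num_bb].value = fw[addr + ch];
            w.num_bb++;
        }
    }
    for (unsigned i = 0; i < num_rf; i++, addr += 15) {
        if (w.num_rf < MAX_WRITES) {
            w.rf[w.num_rf].index = fw[addr];
            w.rf[w.num_rf].value = fw[addr + ch];
            w.num_rf++;
        }
    }
    return w;
}

/* Type2: first (second=0) or second (second=1) 24bit RF word for channel ch. */
static uint32_t type2_channel_word(const uint8_t *fw, unsigned ch, int second)
{
    unsigned off = (second ? 0xF5u : 0xF2u) + (ch - 1u) * 6u;
    return (uint32_t)fw[off] | ((uint32_t)fw[off + 1] << 8) | ((uint32_t)fw[off + 2] << 16);
}
```

The RF register index of a Type2 word is then word/40000h, and its data is word AND 3FFFFh, as in the pseudocode.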

Channels
The IEEE802.11b standard (and the NDS hardware) support 14 channels (1..14).
Channels 1..13 use frequencies 2412MHz..2472MHz (in 5MHz steps). Channel 14 uses frequency 2484MHz. Which channels are allowed to be used varies from country to country, as indicated by Bit1..14 of firmware[03Ch]. Channel 14 is rarely used (dating back to an older Japanese standard).
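The channel numbering above maps to center frequencies as follows. `channel_allowed` checks the country mask, assuming firmware[03Ch] is read as a 16bit halfword. Both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Center frequency in MHz for channel 1..14, or 0 if out of range. */
static unsigned channel_mhz(unsigned ch)
{
    if (ch >= 1 && ch <= 13)
        return 2412 + 5 * (ch - 1);   /* 5MHz steps */
    if (ch == 14)
        return 2484;                  /* special case */
    return 0;
}

/* True if Bit<ch> (Bit1..14) is set in the firmware[03Ch] mask. */
static bool channel_allowed(uint16_t fw_03Ch, unsigned ch)
{
    return ch >= 1 && ch <= 14 && ((fw_03Ch >> ch) & 1u);
}
```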

Caution: Nearby channels do overlap, you'll get transmission errors on packets that are transferred simultaneously with packets on nearby channels. But, you won't successfully receive packets from nearby channels (so you won't even "see" that they are there, which is bad, as it doesn't allow you to share the channel synchronized with other hosts; ie. it'd be better if two hosts are using the SAME channel, rather than to use nearby channels).
To avoid that problem, conventionally only channels 1,6,11 are used - however Nintendo uses channels 1,7,13 - which is causing conflicts between channel 6,7, and maybe also between 11,13.


 DS Wifi Hardware Headers

Hardware TX Header (12 bytes) (TXHDR)
The TX header immediately precedes the data to be sent, and should be put at the location that will be given to the register activating a transmission.
  Addr Siz Expl.
  00h  2   Status  - In: Don't care - Out: Status (0000h=Failed, 0001h=Okay)
  02h  2   Unknown - In: Don't care
             Bit0: Usually zero.
             Bit1..15 --------> flags for multiboot slaves number 1..15
             (Should be usually zero, except when sending multiplay commands
             via W_TXBUF_CMD. In that case, the slave flags should ALSO be
             stored in the second halfword of the FRAME BODY. Actually, the
             hardware seems to use only that entry (in the BODY), rather than
             this entry (in the hardware header)).
  04h  1   Unknown - In: Must be 00h..02h (should be 00h)
             (03h..FFh result in error: W_TXSTAT.Bit1 gets set, but
             nevertheless header entry[00h] is kept set to 0001h=Okay)
             ;00h = use W_TX_SEQNO (if enabled in TXBUF_LOCn)
             ;01h = force NOT to use W_TX_SEQNO (even if it is enabled in LOCn)
             ;02h = seems to behave same as 01h
  05h  1   Unknown - In: Don't care - Out: Set to 00h
  06h  2   Unknown - In: Don't care
  08h  1   Transfer Rate (0Ah=1Mbit/s, 14h=2Mbit/s) (other values=1MBit/s, too)
  09h  1   Unknown - In: Don't care
  0Ah  2   Length of IEEE Frame Header+Body+checksum(s) in bytes
             (14bits, upper 2bits are unused/don't care)
The eight "Don't care" bytes should be usually set to zero (although setting them to FFh seems to be working as well). Entries [00h] and [05h] are modified by hardware, all other entries are kept unchanged.
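The TXHDR layout as a C struct, assuming the compiler's natural alignment (which yields exactly 12 bytes here, checked with `_Static_assert`) and a little-endian CPU like the ARM7. Field names for the unknown entries are placeholders, and `make_txhdr` is a hypothetical helper that zeroes the "Don't care" bytes.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t status;       /* 00h out: 0000h=Failed, 0001h=Okay */
    uint16_t slave_flags;  /* 02h bit1..15: multiplay slave flags (usually 0) */
    uint8_t  seqno_ctrl;   /* 04h 00h=use W_TX_SEQNO, 01h/02h=don't */
    uint8_t  unk05;        /* 05h set to 00h by hardware */
    uint16_t unk06;        /* 06h don't care */
    uint8_t  rate;         /* 08h 0Ah=1Mbit/s, 14h=2Mbit/s */
    uint8_t  unk09;        /* 09h don't care */
    uint16_t length;       /* 0Ah IEEE header+body+FCS, 14bit */
} wifi_txhdr;

_Static_assert(sizeof(wifi_txhdr) == 12, "TXHDR must be 12 bytes");
_Static_assert(offsetof(wifi_txhdr, rate) == 8, "rate at 08h");
_Static_assert(offsetof(wifi_txhdr, length) == 10, "length at 0Ah");

static wifi_txhdr make_txhdr(int rate_2mbit, uint16_t ieee_len)
{
    wifi_txhdr h = {0};                          /* "Don't care" fields zeroed */
    h.rate   = rate_2mbit ? 0x14 : 0x0A;
    h.length = (uint16_t)(ieee_len & 0x3FFFu);   /* only 14 bits are used */
    return h;
}
```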

Important note! The TX length includes the length of a 4-byte "FCS" (checksum) for the packet. The hardware generates the FCS for you, but you still must include it in the packet length. Also note that if the 802.11 WEP enabled bit is set in the header, the packet will be automatically encrypted via the WEP algorithm; however, the software is responsible for providing the 4-byte IV block with the WEP key ID and the 24bit IV value. ALSO, you must include the length of the *encrypted* FCS used in packets that have WEP enabled (increase the tx length by another 4 bytes). This value is calculated automatically for you, but you are responsible for including it in the length of your packet (if you have data there, it'll be replaced by the FCS).
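The length rule above as a C helper (hypothetical name). It assumes `payload_len` excludes the 4-byte WEP IV, which software must still place at the start of the body; with WEP, another 4 bytes are added for the hardware-generated encrypted checksum (the 802.11 ICV).

```c
#include <stdbool.h>
#include <stdint.h>

/* TXHDR[0Ah] value: IEEE header + body + FCS, plus IV and ICV if WEP. */
static uint16_t tx_length(unsigned ieee_hdr_len, unsigned payload_len, bool wep)
{
    unsigned len = ieee_hdr_len + payload_len + 4;   /* + FCS (hardware-generated) */
    if (wep)
        len += 4 + 4;                                 /* IV (software) + ICV (hardware) */
    return (uint16_t)(len & 0x3FFFu);                 /* 14bit field */
}
```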

Hardware RX Header (12 bytes) (RXHDR)
The RX header is an informational structure that provides needed information about a received packet. It is written right before the received packet data in the rx circular buffer.
  Addr Siz Expl.
  00h  2   Flags
             Bit0-3: Frame type/subtype:
                   0       management/any frame (except beacon and invalid subtypes)
                   1       management/beacon frame
                   5       control/ps-poll frame
                   8       data/any frame (subtype0..7) (ie. except invalid subtypes)
                   C,D,E,F unknown (firmware is checking for these values)
                   ---
                   C       firmware uses it for data/cf-poll frame, FromDs (*)
                   D       firmware uses it for data/cf-ack frame, FromDs
                   E,F     firmware uses it for data/cf-ack frame, ToDs
                   (*) with DA=broadcast
                   ---
             Bit4: Seems to be always set
             Bit5-7: Seems to be always zero
             Bit8: Set when FC.Bit10 is set (more fragments)
             Bit9: Set when the lower-4bit of Sequence Control are nonzero,
                   it is also set when FC.Bit10 is set (more fragments)
                   So, probably, it is set on fragment-mismatch-errors
             Bit10-14: Seems to be always zero
             Bit15: Set when Frame Header's BSSID value equals W_BSSID register
  02h  2   Unknown (usually 0040h)
  04h  2   Time since last packet (eg. when receiving beacons: totally random on
           the first few packets, but later on it gets equal to Beacon Interval)
           In other cases, this value is equal to the 1st 2 bytes of the DA ?
           [Above time/da effects might be explained by other reason: maybe
           this entry is left unchanged, simply containing old WifiRAM value?]
  06h  2   Transfer Rate (N*100kbit/s) (ie. 14h for 2Mbit/s)
  08h  2   Length of IEEE Frame Header+Body in bytes (excluding FCS checksum)
  0Ah  1   MAX RSSI
  0Bh  1   MIN RSSI
Important Note: Received frames always occupy a multiple of 4 bytes in the RX buffer. Although the actual header length + received frame length may be less, you must pad that length to a multiple of 4 bytes when incrementing the read cursor.
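A minimal C sketch for decoding the 12-byte RX header and computing how far to advance past one received entry. It assumes little-endian halfwords and a byte-addressed copy of the entry; the struct and function names are hypothetical, wrap-around at the end of the circular RX buffer is not handled, and the units expected by the real read cursor register are not covered here.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded copy of the 12-byte hardware RX header. */
typedef struct {
    uint16_t flags;     /* 00h: bit0-3 frame type code, bit15 = BSSID matched W_BSSID */
    uint16_t unknown;   /* 02h: usually 0040h */
    uint16_t time;      /* 04h */
    uint16_t rate;      /* 06h: N*100kbit/s */
    uint16_t length;    /* 08h: IEEE header+body, excluding FCS */
    uint8_t  rssi_max;  /* 0Ah */
    uint8_t  rssi_min;  /* 0Bh */
} rx_header;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

static rx_header rx_parse_header(const uint8_t *p)
{
    assert(p);
    rx_header h;
    h.flags    = rd16(p + 0x00);
    h.unknown  = rd16(p + 0x02);
    h.time     = rd16(p + 0x04);
    h.rate     = rd16(p + 0x06);
    h.length   = rd16(p + 0x08);
    h.rssi_max = p[0x0A];
    h.rssi_min = p[0x0B];
    return h;
}

static unsigned rx_frame_type(const rx_header *h) { return h->flags & 0x0Fu; }
static int rx_bssid_match(const rx_header *h)     { return (h->flags >> 15) & 1; }

/* Bytes from the start of this RX header to the next one:
 * 12-byte header plus frame length, rounded up to a multiple of 4. */
static uint32_t rx_entry_size(const rx_header *h)
{
    return (12u + h->length + 3u) & ~3u;
}
```

For example, a 37-byte (25h) frame occupies 12+37=49 bytes, which rounds up to 52.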

IEEE Header
The above hardware headers should (must) be followed by valid IEEE headers. Although these headers are generated by software, the hardware does interact with them, for example by comparing address fields with W_MACADDR and W_BSSID. It also modifies some of their entries:
1) The sequence control value is replaced by W_TX_SEQNO*10h (when enabled in W_TXBUF_LOCn.Bit13), this replacement does also overwrite the local TXBUF value.
2) The frame control value is modified, namely, the hardware tends to set Bit12 of it. This replacement does NOT modify the local TXBUF, but the remote RXBUF will receive the modified value. Also, Bit0-1 (protocol version) are forcefully set to 0.
3) Transmits via W_TXBUF_BEACON do additionally modify the 64bit timestamp (so W_TXBUF_BEACON should be used ONLY for packets WITH timestamp, ie. Beacons or Probe-Responses). The local TXBUF seems to be left unchanged, but the remote RXBUF will contain the (sender's) W_US_COUNT value.
C) For Control Frames, the hardware header's Length value is transferred as normal (ie. excluding the FCS length; the remote RXBUF will contain the TXBUF length minus 4), but, regardless of that length value, only 10 or 16 bytes (depending on the subtype) of the IEEE frame are actually transferred and/or stored in RXBUF.
X) For Control Frames with Subtype 0Ah, the AID entry is set to C000h, probably ORed with the original value in WifiRAM, or with the W_AID_FULL register?
XX) No idea whether it's possible to send Control Frames with subtype 0Bh..0Fh; for now, it seems that either they aren't sent, or the recipient ignores them (or processes them internally, but without storing them in RXBUF).


 DS Wifi Multiboot < ^

Available Game Advertisement
WMB uses beacon frames to advertise available games for download. The beacon frames are normally used to advertise available access points in most 802.11 systems, but there is nothing preventing their use in this capacity. The advertisement data is fragmented and stored partially in each beacon frame as the payload of a custom information element (tag: 0xDD).

The DS Download Play menu was said to list games when the beacons are broadcast on one of the following channels: 1, 3, 4, 5, 7, 9, 10, 11, 13, and 14. That is WRONG: firmware_v3 checks only channels 1, 7, and 13. Likewise, the DS hosting mechanism only seems to transmit on channels 1, 7, and 13 (apparently selected at random).

All beacon frames transmitted by a DS host have the following format:
  802.11 management frame
802.11 beacon header
Supported rates (tagged IE, advertises 1 Mbit and 2 Mbit)
DS parameter set (tagged IE, note: Distribution System, not Nintendo DS)
TIM vector (tagged IE, transmitted as empty)
Custom extension (tagged IE, tag 0xDD)
Nintendo specific beacon fragment format (information element code 0xDD):
  Offset Description
00h Nintendo Beacon ID (00h,09h,BFh,00h)
04h Stepping Offset for 4808134h/W_BEACONCOUNT2 (always 000Ah)
06h Strange Timestamp (W_US_COUNT*2-VCOUNT*7Fh)/128 (0000h for multiboot)
08h 01 00
0Ah 40 00
0Ch 24 00
0Eh 40 00
10h Randomly generated stream code
12h Number of bytes from entry 18h and up (70h for multiboot) (0 if Empty)
13h Beacon Type (0Bh=Multiboot, 01h=Multicart/Pictochat, 09h=Empty)
14h 0100 0008 (some kind of max,min values?)
For Empty (length zero, used at the very beginning of multiboot)
  18h  No data.
For Multicart (variable length)
  18h  Custom data, usually containing the host name, either in 8bit ascii,
or 16bit unicode format. Sometimes taken from Firmware User Settings,
and sometimes from Cartridge Backup Memory.
For Pictochat (length 8)
  18h  Fixed (always 2348h)
1Ah xxxx
1Ch Chatroom number (00h..03h for Chatroom A..D)
1Dh Number of users already connected (01h..10h) (including host)
1Eh Fixed (always 0004h)
For Multiboot (always 70h bytes)
  18h  24 00 40 00 (varies from game to game)
1Ch End of advertisement flag (00 for non-end, 02 for end packets)
1Dh Always 00, 01, 02, or 04
1Eh Number of players already connected
1Fh Sequence number (0 .. total_advertisement_length)
20h Checksum (on entries 22h and up)
chksum=0, for i=22h to 86h step 2, chksum=chksum+halfword[i], next i,
chksum=FFFFh AND NOT (chksum+chksum/10000h)
22h Sequence number in non-final packet, # of players in final packet
23h Total advertisement length - 1 (in beacons)
24h Datasize in bytes (2 byte little-endian)
(0062h for seq 0..7, 0048h for seq 8, 0001h for seq 9)
26h Data (always 62h bytes, padded with 00h if Datasize<62h)
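The checksum pseudocode above, written out in C as a sketch. It assumes `ie` points at offset 00h of the Nintendo element data (the 00h,09h,BFh,00h ID) and that halfwords are little-endian; `wmb_beacon_checksum` is a hypothetical name. The result would be compared against the halfword at 20h, which is itself outside the summed range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Checksum of one multiboot beacon fragment (entries 22h..87h).
 * ie must provide at least 88h bytes (up to the end of the 62h-byte data). */
static uint16_t wmb_beacon_checksum(const uint8_t *ie, size_t ie_len)
{
    assert(ie_len >= 0x88);
    uint32_t sum = 0;
    for (size_t i = 0x22; i <= 0x86; i += 2)
        sum += (uint32_t)(ie[i] | (ie[i + 1] << 8));   /* little-endian halfword */
    /* add the upper 16 bits once (end-around carry), then invert */
    return (uint16_t)(0xFFFFu & ~(sum + (sum >> 16)));
}
```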
The advertisement fragments are reordered and assembled according to their internal sequence number, to form the overall advertisement payload, as defined below:
  Offset Size Description
000h 32 Icon Palette (same as for ROM Cartridge Icon)
020h 512 Icon Bitmap (same as for ROM Cartridge Icon)
220h 1 Unknown (0Bh)
221h 1 Length of hosting name ;(probably same as firmware
222h 20 Name of hosting DS (10 UCS-2) ;user name?)
236h 1 Max number of players
237h 1 Unknown (00h)
238h 96 Game name (48 UCS-2) (same as 1st line of ROM Cartridge Title)
298h 192 Description (96 UCS-2) (same as further lines of ROM Cart Title)
358h 1 00h if no users are connected
359h 0 End of data if no users are connected
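A sketch of the reassembly step, assuming that fragment N's data belongs at offset N*62h (consistent with the Datasize values listed for sequence 0..9: 8*62h+48h+1 = 359h bytes) and that the sequence number at entry 1Fh is used. The buffer type, the bitmask bookkeeping, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WMB_FRAG_DATA_MAX 0x62
#define WMB_AD_MAX        (10 * WMB_FRAG_DATA_MAX)   /* room for sequence 0..9 */

typedef struct {
    uint8_t  data[WMB_AD_MAX];
    uint16_t received;            /* bit N set = fragment N stored */
} wmb_ad_buffer;

/* Copy one fragment into place. ie = Nintendo element data (offset 00h),
 * with at least 26h+Datasize bytes available. */
static int wmb_ad_add_fragment(wmb_ad_buffer *ad, const uint8_t *ie)
{
    uint8_t  seq  = ie[0x1F];
    uint16_t size = (uint16_t)(ie[0x24] | (ie[0x25] << 8));
    if (seq >= 10 || size > WMB_FRAG_DATA_MAX)
        return -1;
    memcpy(ad->data + seq * WMB_FRAG_DATA_MAX, ie + 0x26, size);
    ad->received |= (uint16_t)(1u << seq);
    return 0;
}

/* Nonzero once fragments 0..total-1 are present (total = entry 23h + 1). */
static int wmb_ad_complete(const wmb_ad_buffer *ad, uint8_t total)
{
    assert(total <= 10);
    uint16_t need = (uint16_t)((1u << total) - 1u);
    return (ad->received & need) == need;
}
```

Once complete, the fields of the table above (eg. the game name at 238h) can be read directly from `data`.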
Authentication process
Once a user B chooses a download offered by a host A, the following standard 802.11 authentication process is observed.
  Host A advertises a game in beacon frames as described above
Client B sends an authentication request (sequence 1) to A
Host A replies with an ACK
Host A sends an authentication reply (sequence 2) to B
Client B replies with an association request
Host A replies with an ACK
Host A sends an association response
Client B responds with an ACK
After this, the two are associated, and will remain so until the transfer is complete or one is idle for several seconds, at which point they will de-associate. For more information on the association process, see the 802.11 standard.

Download process (after authentication)
  Host sends Pings (type 0x01, replies are 0x00, 0x07)
Host sends RSA frame (type 0x03, replies 0x08)
Host sends NDS header (type 0x04, replies 0x09)
Host sends ARM9 binary (type 0x04, replies 0x09)
Host sends ARM7 binary (type 0x04, replies 0x09)
Host terminates transfer (type 0x05, no replies)
The WMB protocol ostensibly implements layers 3 to 7 of the OSI network model, but does not define a new type of network addresses. However, it does define a couple of special broadcast-like MAC addresses within the assigned Nintendo namespace (00:09:BF).

The three channels or flows used for all communications after the MAC broadcast beacons take the form 03:09:BF:00:00:xx, where xx is:
  00 for the main data flow, from host to client    (sent via Port 090h)
10 for the client to host replies (sent via Port 094h)
03 for the feedback flow, host to client (acknowledges the replies)
Observed commands:
  Command   Description
0x01 Ping / Name request
0x03 RSA signature frame
0x04 Data packet
0x05 Post-idle / unknown
Observed replies
  Reply ID  Description
0x00 Pong (ping reply)
0x07 Name reply
0x08 RSA frame reply
0x09 Data packet reply
The host does something unusual with the 802.11 sequence control field: each packet sent out on the 00 flow has a sequence number 2 greater than the previous one, even when the packets are sent back-to-back. When the host acknowledges a reply (on flow 03) from the client about a particular packet, it uses the sequence number one after that of the original packet it sent out on 00. This is the root of one of the major problems in finding a PC card that can transmit WMB packets, as very few cards provide user control over this field. Even when a card is capable of 'raw' 802.11 transmission, it typically handles the sequence control field in hardware or firmware, filling it with a constantly incrementing number.
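A minimal sketch of the host's numbering rule, assuming the number in question is the 12-bit Sequence Number (bits 4-15 of the Sequence Control field) and that it wraps modulo 4096 like a normal 802.11 sequence number. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side state: next sequence number for the 00 flow. */
typedef struct { uint16_t next; } wmb_host_seq;

/* Sequence number for the next frame on the 00 (main data) flow;
 * consecutive frames are spaced 2 apart. */
static uint16_t wmb_next_data_seq(wmb_host_seq *s)
{
    uint16_t seq = s->next;
    s->next = (uint16_t)((seq + 2u) & 0x0FFFu);
    return seq;
}

/* Sequence number for the acknowledgement on the 03 flow:
 * one more than the data frame the client replied to. */
static uint16_t wmb_ack_seq(uint16_t data_seq)
{
    assert(data_seq <= 0x0FFF);
    return (uint16_t)((data_seq + 1u) & 0x0FFFu);
}
```

Because data frames only use every second number, the acknowledgement numbers never collide with data frame numbers.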
  -------------------
Host-to-client packets (on the 0x00 flow)
  0  1  2  3  4    5     6..e-3   e-2  e-1  e-0
06 01 02 00 Size Flags Payload 00 02 00
The first two bytes above are W_CMD_REPLYTIME.
The next two bytes are the slave flags (bit1..15 for slave 1..15) (1=connected).
The size field is in terms of half-words (16 bits), and includes the flags byte along with the payload (so a size of 0x03 represents a flag byte, a command byte, and 4 bytes of payload).
When flags is 0x11, the first byte of the payload is a command. There seems to be no important data when flags is not 0x11 (seen occasionally as 0x01), and ignoring them still results in a complete dump.
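A sketch of splitting a host-to-client frame body into the fields above. It assumes the two leading halfwords are little-endian and that the 3-byte 00 02 00 trailer always follows the payload; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t reply_time;     /* bytes 0-1, W_CMD_REPLYTIME */
    uint16_t slave_flags;    /* bytes 2-3, bit1..15 = slave 1..15 connected */
    uint8_t  flags;          /* byte 5 */
    const uint8_t *payload;  /* byte 6 onward; payload[0] = command if flags==11h */
    size_t   payload_len;    /* size*2 - 1 (size counts the flags byte too) */
} wmb_host_pkt;

/* Returns 0 on success, -1 if the frame body is too short. */
static int wmb_parse_host_pkt(const uint8_t *body, size_t len, wmb_host_pkt *out)
{
    assert(body && out);
    if (len < 6)
        return -1;
    size_t size_bytes = (size_t)body[4] * 2u;            /* flags byte + payload */
    if (size_bytes == 0 || 5u + size_bytes + 3u > len)   /* plus 3-byte trailer */
        return -1;
    out->reply_time  = (uint16_t)(body[0] | (body[1] << 8));
    out->slave_flags = (uint16_t)(body[2] | (body[3] << 8));
    out->flags       = body[5];
    out->payload     = body + 6;
    out->payload_len = size_bytes - 1u;
    return 0;
}
```

With size 03h this yields 5 payload bytes (command byte plus 4 bytes), matching the example in the text.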

The Ping messages (type 0x01) have a payload size of 0x03 but always contain zeroes in the payload. They seem to be used only to keep the connection alive while waiting for the host DS to start the transfer, preventing a time-out de-association.

RSA signature frame payload (type 0x03)
The RSA frame format (type 0x03) sends a table of information about the game being downloaded (most of it redundant with the NDS header, see Appendix), as well as the RSA signature for the DS. I have not looked into computing the signature, as homebrew developers are not privy to Nintendo's private key, making signing a fruitless activity, but it is my understanding that the signature is a 128 byte public key and an 8 byte SHA-1 message digest over the NDS header, ARM9 binary, and ARM7 binary. Notably: the RSA frame itself is not included as part of the data being signed, bringing up various security issues and making Nintendo's firmware engineers look amateurish at best.
There are several abortive sendings of empty RSA frames with a size field of 0x03, before the real frame is sent (always with a size field of 0x75).
  Offset Size Description
0x00 4 ARM9 execute address
0x04 4 ARM7 execute address
0x08 4 0x00
0x0C 4 Header destination
0x10 4 Header destination
0x14 4 Header size (0x160)
0x18 4 0x00
0x1C 4 ARM9 destination address
0x20 4 ARM9 destination address
0x24 4 ARM9 binary size
0x28 4 0x00
0x2C 4 0x022C0000
0x30 4 ARM7 destination address
0x34 4 ARM7 binary size
0x38 4 0x01
0x3C 136 Signature block
0xC4 36 0x00's
0xE8 0 End of frame payload
The offsets in the table are from after the command byte, i.e. two bytes into the 234 bytes of payload including the flags.
The unknown address 0x022C0000 is probably ARM7 related, by comparison with the duplicated header and ARM9 destination addresses 32 and 16 bytes before it, although it has no known significance according to the NDS header.
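A sketch of decoding the RSA frame table, assuming little-endian 32bit values and a pointer to the byte right after the command byte. Only fields with a known meaning are extracted (the duplicate entries and 0x022C0000 are skipped), and the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

typedef struct {
    uint32_t arm9_entry, arm7_entry;
    uint32_t header_dest, header_size;
    uint32_t arm9_dest, arm9_size;
    uint32_t arm7_dest, arm7_size;
    const uint8_t *signature;        /* 136 bytes at offset 3Ch */
} wmb_rsa_frame;

/* p points just after the command byte (03h); len = bytes available from there. */
static int wmb_parse_rsa_frame(const uint8_t *p, size_t len, wmb_rsa_frame *f)
{
    assert(p && f);
    if (len < 0xE8)                  /* eg. the empty frames with size field 03h */
        return -1;
    f->arm9_entry  = rd32le(p + 0x00);
    f->arm7_entry  = rd32le(p + 0x04);
    f->header_dest = rd32le(p + 0x0C);
    f->header_size = rd32le(p + 0x14);
    f->arm9_dest   = rd32le(p + 0x1C);
    f->arm9_size   = rd32le(p + 0x24);
    f->arm7_dest   = rd32le(p + 0x30);
    f->arm7_size   = rd32le(p + 0x34);
    f->signature   = p + 0x3C;
    return 0;
}
```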

Data packet (type 0x04)
The data packets (type 0x04) include a transport-layer sequence number inside of the data packet itself, but no destination offset or other mechanism to allow the packets to be processed out-of-order. The only way to place the data at the correct location in memory is to re-order the packets according to the sequence number and process them sequentially.
  0  1     2       3   ..  End
00 [Sequence #] xx .. yy
The sequence number is a zero based little-endian number. Each packet only contains data for one of the three destination blocks (header, ARM9, ARM7), so the change-of-destination check only needs to be made on packet boundaries.
  -------------------
Client to Host Replies (on the 0x10 flow)
The replies from client to host are sent on the 0x10 flow. The client uses an incrementing sequence control number for all of its packets, with no unusual trickery. Each reply is sent as a standard 802.11 data frame (typically as a Data + CF-Acknowledgement), consisting of 10 data bytes for the WMB payload. The first two are always 0x04 0x81, with the third byte indicating the type of reply, and the remaining 7 bytes being reply-specific.

Idle / Pong reply (type 0x00)
  0  1  2  3  4  5  6  7  8  9
04 81 00 00 00 00 00 00 00 00
One type of packet frequently sent before a download gets underway is what I have termed the Idle or Pong packet (in response to 0x01 'Pings'). It has a reply type field of 0x00 and does not contribute any additional information.

Name reply (type 0x07)
  0  1  2  3  4     5      6     7      8     9
04 81 07 01 [Character0] [Character1] [Character2]
04 81 07 02 [Character3] [Character4] [Character5]
04 81 07 03 [Character6] [Character7] [Character8]
04 81 07 04 [Character9] 01 00 00 00
The name reply (type 0x07) is sent shortly after association is completed, although I am not certain what triggers it. A variable number of pings precedes this reply, but most of them are answered with Pongs. The name reply sends the user-configured DS name (set in the firmware menu), split over four messages (the 4th byte of the packet specifies which message fragment this is, 1-based). The name can be up to 10 UCS-2 characters long; all four messages are still sent if it is shorter (padded with nulls to 10 characters, then 01, then nulls until the end of the frame).
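A sketch of collecting the name from the four fragments, assuming the UCS-2 characters are little-endian like the other Nintendo-provided values. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Store one 10-byte name reply (type 07h) into name[10] (UCS-2).
 * Returns the fragment number 1..4, or -1 if this is not a name reply. */
static int wmb_name_reply_add(const uint8_t r[10], uint16_t name[10])
{
    if (r[0] != 0x04 || r[1] != 0x81 || r[2] != 0x07)
        return -1;
    int frag = r[3];
    if (frag < 1 || frag > 4)
        return -1;
    int first = (frag - 1) * 3;            /* index of Character0/3/6/9 */
    int count = (frag == 4) ? 1 : 3;       /* fragment 4 carries only Character9 */
    assert(first + count <= 10);
    for (int i = 0; i < count; i++)
        name[first + i] = (uint16_t)(r[4 + 2 * i] | (r[5 + 2 * i] << 8));
    return frag;
}
```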

RSA frame receipt reply (type 0x08)
  0  1  2  3  4  5  6  7  8  9
04 81 08 xx xx xx xx xx xx xx
The RSA frame receipt reply contains no extra information; it only acknowledges receipt of a type 0x03 host packet on the main flow (0x00). Bizarrely, the xx bytes in the above table are not driven to a particular value when replying to an RSA frame, and usually contain the same data as the second (of four) name response frames.

Data packet receipt reply (type 0x09)
  0  1  2    3    4        5     6     7  8  9
04 81 09 [Last packet] [Best packet] 00 00 00
[last packet] is the packet number being acknowledged
[best packet] is the highest continuous packet number seen so far
Packet IDs are little-endian numbers, like other Nintendo provided data.
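A sketch of how a client might compute [Best packet] and build the receipt reply. `have[]` is a hypothetical per-packet "received" table; what the real client reports as [Best packet] before packet 0 has arrived is not covered by the text above, so the sketch returns -1 in that case and leaves the choice to the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Highest N such that packets 0..N have all been received, or -1. */
static int wmb_best_packet(const uint8_t *have, int count)
{
    assert(count >= 0);
    int best = -1;
    while (best + 1 < count && have[best + 1])
        best++;
    return best;
}

/* Build the 10-byte data packet receipt reply (type 09h). */
static void wmb_build_data_reply(uint8_t out[10], uint16_t last, uint16_t best)
{
    out[0] = 0x04; out[1] = 0x81; out[2] = 0x09;
    out[3] = (uint8_t)(last & 0xFF); out[4] = (uint8_t)(last >> 8);   /* little-endian */
    out[5] = (uint8_t)(best & 0xFF); out[6] = (uint8_t)(best >> 8);
    out[7] = out[8] = out[9] = 0x00;
}
```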
  -------------------
Host to client acknowledgements (on the 0x03 flow)

These packets contain four data bytes, but three are always zero. The first seems to be random, with no connection to the acknowledged data. The actual indication of acknowledgement is the sequence control number of the packet. It is set to be one greater than the sequence control number of the initial host packet (sent on flow 0x00) that the client has just responded to, to indicate that the reply was received.

Host-to-client acknowledgement
  0  1  2  3
?? 00 00 00
The .NDS format is the standard format for Nintendo DS programs; it originated on original game cards and also appears to a limited extent in WMB binaries. The WMB process only transfers the first 0x160 bytes of the header, the ARM9 binary, and the ARM7 binary (in that order), ignoring the file name and file allocation tables, the overlay data, and some information stored in the banner (the rest is transmitted partially via the beacon advertisement process).


 DS Wifi IEEE802.11 Frames < ^

MAC Frame Format
  10..30 bytes    MAC Header
0..2312 bytes Frame Body
4 bytes Frame Check Sequence (FCS) (aka checksum)
MAC Header (10..30 bytes)
  Size Content
2 Frame Control Field (FC)
2 Duration/ID
6 Address 1
(6) Address 2 (if any)
(6) Address 3 (if any)
(2) Sequence Control (if any)
(6) Address 4 (if any)
Frame Control Field (FC)
  Bit  Expl.
0-1 Protocol Version (0=Current, 1..3=Reserved)
2-3 Type (0=Management, 1=Control, 2=Data, 3=Reserved)
4-7 Subtype (see next chapters) (meaning depends on above Type)
8 To Distribution System (DS)
9 From Distribution System (DS)
10 More Fragments
11 Retry
12 Power Management (0=Active, 1=STA will enter Power-Save mode after..)
13 More Data
14 Wired Equivalent Privacy (WEP) Encryption (0=No, 1=Yes)
15 Order
Bit 8-11 and Bit 13-15 are always 0 in Control Frames.
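A sketch of decoding the Frame Control fields and deriving the MAC header length from them, assuming FC is read as a little-endian halfword from the first two bytes. The lengths follow the chapters below (24 bytes for management frames, 10 or 16 for control frames, 24 or 30 for data frames); extensions from later IEEE revisions are ignored, and the macro/function names are hypothetical.

```c
#include <stdint.h>

#define FC_TYPE(fc)     (((fc) >> 2) & 3u)
#define FC_SUBTYPE(fc)  (((fc) >> 4) & 15u)
#define FC_TO_DS(fc)    (((fc) >> 8) & 1u)
#define FC_FROM_DS(fc)  (((fc) >> 9) & 1u)

/* MAC header length in bytes, or -1 for reserved type/subtype values. */
static int wifi_mac_header_len(uint16_t fc)
{
    unsigned type = FC_TYPE(fc), subtype = FC_SUBTYPE(fc);
    if (type == 0)                              /* management: always 24 bytes */
        return 24;
    if (type == 1) {                            /* control: 10 or 16 bytes */
        if (subtype == 0xC || subtype == 0xD)   /* CTS, ACK */
            return 10;
        if (subtype >= 0xA)                     /* PS-Poll, RTS, CF-End, CF-End+CF-Ack */
            return 16;
        return -1;                              /* subtypes 0..9 reserved */
    }
    if (type == 2)                              /* data: Address 4 only for DS-to-DS */
        return (FC_TO_DS(fc) && FC_FROM_DS(fc)) ? 30 : 24;
    return -1;                                  /* type 3 reserved */
}
```

For example, a Beacon (type 0, subtype 8) has FC=0080h and a 24-byte header.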

Duration/ID Field (16bit)
  0000h..7FFFh  Duration (0-32767)
8000h Fixed value within frames transmitted during the CFP
(CFP=Contention Free Period)
8001h..BFFFh Reserved
C000h Reserved
C001h..C7D7h Association ID (AID) (1..2007) in PS-Poll frames
C7D8h..FFFFh Reserved
48bit MAC Addresses
MAC Addresses are 48bit (6 bytes) (Bit0 is the LSB of the 1st byte),
  0     Group Flag (0=Individual Address, 1=Group Address)
1 Local Flag (0=Universally Administered Address, 1=Locally Administered)
2-23 22bit Manufacturer ID (assigned by IEEE)
24-47 24bit Device ID (assigned by the Manufacturer)
Special NDS related Addresses:
  00 09 BF xx xx xx  NDS-Consoles (Original NDS with firmware v1-v5)
00 16 56 xx xx xx NDS-Consoles (Newer NDS-Lite with firmware v6 and up)
03 09 BF 00 00 00 NDS-Multiboot: host to client (main data flow)
03 09 BF 00 00 10 NDS-Multiboot: client to host (replies)
03 09 BF 00 00 03 NDS-Multiboot: host to client (acknowledges replies)
FF FF FF FF FF FF Broadcast to all stations (eg. Beacons)
Sequence Control Field
  Bit  Expl.
0-3 Fragment Number (0=First (or only) fragment)
4-15 Sequence Number
(increment by 1, except on retransmissions, ie. retries)
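Packing and unpacking the Sequence Control field, as a small C sketch with hypothetical helper names. `seqctl_pack(n, 0)` gives the same value as the W_TX_SEQNO*10h replacement described in the hardware header chapter.

```c
#include <assert.h>
#include <stdint.h>

static uint16_t seqctl_pack(uint16_t seq, uint8_t frag)
{
    assert(seq <= 0x0FFF && frag <= 0x0F);
    return (uint16_t)((seq << 4) | frag);    /* bit0-3 fragment, bit4-15 sequence */
}

static uint16_t seqctl_seq(uint16_t sc)  { return (uint16_t)(sc >> 4); }
static uint8_t  seqctl_frag(uint16_t sc) { return (uint8_t)(sc & 0x0F); }
```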

WEP Frame Body
  3 bytes     Initialization Vector
1 byte Pad (6bit, all zero), Key ID (2bit)
1..? bytes Data (encrypted data)
4 bytes ICV (encrypted CRC32 across Data)
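A sketch of splitting a WEP frame body into its unencrypted parts. The IV is kept as three raw bytes to avoid assuming a byte order, the Key ID is taken from the upper 2 bits of the 4th byte (below the 6 zero pad bits), and the RC4 decryption itself is not shown. Names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t iv[3];      /* Initialization Vector, as transmitted */
    uint8_t key_id;     /* 0..3 */
    size_t  data_len;   /* encrypted data, excluding the 4-byte ICV */
} wep_hdr;

/* body = start of the frame body (right after the MAC header). */
static int wep_parse(const uint8_t *body, size_t len, wep_hdr *out)
{
    if (len < 4 + 1 + 4)          /* IV+KeyID, at least 1 data byte, ICV */
        return -1;
    if (body[3] & 0x3F)           /* pad bits must be zero */
        return -1;
    out->iv[0] = body[0];
    out->iv[1] = body[1];
    out->iv[2] = body[2];
    out->key_id = (uint8_t)(body[3] >> 6);
    out->data_len = len - 4 - 4;
    return 0;
}
```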

 DS Wifi IEEE802.11 Management Frames (Type=0) < ^

All Management Frames have a 24-byte Frame Header, with the following values:
  FC(2), Duration(2), DA(6), SA(6), BSSID(6), Sequence Control(2)
The content of the Frame Body depends on the FC's Subtype:
  Subtype                   Frame Body
0 Association request Capability, ListenInterval, SSID, SuppRates
1 Association response Capability, Status, AID, SuppRates
2 Reassociation request Capability, ListenInterval, CurrAP, SSID, SuppRates
3 Reassociation response Capability, Status, AID, SuppRates
4 Probe request SSID, SuppRates
5 Probe response Same as for Beacon (but without TIM)
8 Beacon Timestamp,BeaconInterval,Capability,SSID,SuppRates,
FH Parameter Set (when using Frequency Hopping),
DS Parameter Set (when using Direct Sequence),
CF Parameter Set (when supporting PCF),
IBSS Parameter Set (when in an IBSS),
TIM (when generated by AP)
9 Announcement traffic indication message (ATIM) Body is "null" (=none?)
A Disassociation ReasonCode
B Authentication AuthAlgorithm, AuthSequence, Status, ChallengeText
C Deauthentication ReasonCode
Subtypes 6..7, and D..F are Reserved.

The separate components of the Frame Body are...
64bit Parameters (8 bytes)
  Timestamp: value of the TSF timer (Timing Synchronization Function, a 64bit microsecond counter, see IEEE802.11 section 11.1) of the frame's source.
48bit Parameters (6 bytes)
  Current AP (Access Point): MAC Address of AP with which station is associated
16bit Parameters (2 bytes)
  Capability Information (see list below)
Status code (see list below) (0000h=Successful, other=Error code)
Reason code (see list below) (Error code)
Association ID (AID) (C000h+1..2007)
Authentication Algorithm (0=Open System, 1=Shared Key, 2..FFFFh=Reserved)
Authentication Transaction Sequence Number (Open System:1-2, Shared Key:1-4)
Beacon Interval (Time between beacons, N*1024 us)
Listen Interval (see note below)
Information elements (1byte ID, 1byte LEN, followed by LEN byte(s) data)
  ID      LEN      Expl.
00h 00h-20h SSID (LEN=0 for broadcast SSID)
01h 01h-08h Supported rates; each (nn AND 7Fh)*500kbit/s, bit7=basic rate flag
02h 05h FH (Frequency Hopping) Parameter Set
DwellTime(16bit), HopSet, HopPattern, HopIndex
03h 01h DS (Distribution System) Parameter Set; Channel (01h..0Eh)
04h 06h CF Parameter Set; Count, Period, MaxDuration, RemainDuration
05h 04h..FEh TIM; Count,Period,Control, 1-251 bytes PartialVirtualBitmap
06h 02h IBSS Parameter Set; ATIM Window length (16bit)
07h-0Fh - Reserved
10h 02h..FEh Challenge text; 1-253 bytes Authentication data
(Used only for Shared Key sequence no 2,3)
(none such for Open System)
(none such for Shared key sequence no 1,4)
11h-1Fh - Reserved for challenge text extension
20h-FFh - Reserved
DDh var Reserved but used by Nintendo for NDS-Multiboot beacons
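IDs 20h-FFh are commonly used anyway; I've received values 2xh..3xh and DDh (from non-Nintendo network routers in the neighborhood). These IDs were reserved in the original IEEE802.11 specs, but later revisions of the standard assign many of them (eg. 2Ah=ERP Information, 30h=RSN, 32h=Extended Supported Rates, DDh=Vendor Specific).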
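A sketch of walking the information elements, eg. to locate the Nintendo multiboot element (DDh, with data starting with 00h,09h,BFh,00h). For Beacons and Probe Responses, the elements start after the 12 bytes of fixed fields (Timestamp, Beacon Interval, Capability). The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the data of the first element with the given ID whose data starts
 * with prefix (prefix_len may be 0), storing its LEN in *out_len; else NULL. */
static const uint8_t *ie_find(const uint8_t *p, size_t len, uint8_t id,
                              const uint8_t *prefix, size_t prefix_len,
                              uint8_t *out_len)
{
    size_t pos = 0;
    while (pos + 2 <= len) {
        uint8_t eid = p[pos], elen = p[pos + 1];
        if (pos + 2 + elen > len)
            break;                               /* truncated element */
        const uint8_t *data = p + pos + 2;
        if (eid == id && elen >= prefix_len &&
            (prefix_len == 0 || memcmp(data, prefix, prefix_len) == 0)) {
            *out_len = elen;
            return data;
        }
        pos += 2u + elen;                        /* skip ID, LEN and data */
    }
    return NULL;
}

/* Supported rates element: each byte is (nn AND 7Fh) * 500 kbit/s. */
static unsigned rate_kbps(uint8_t nn) { return (nn & 0x7Fu) * 500u; }
```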

Capability Information
  Bit0    ESS
Bit1 IBSS
Bit2 CF-Pollable
Bit3 CF-Poll Request
Bit4 Privacy
Bit5 Short Preamble (IEEE802.11b only)
Bit6 PBCC (IEEE802.11b only)
Bit7 Channel Agility (IEEE802.11b only)
Bit5-7 Reserved (0) (original IEEE802.11 specs)
Bit8-15 Reserved (0)
Listen Interval
  ... used to indicate to the AP how often an STA wakes to listen to Beacon
management frames. The value of this parameter is the STA's Listen Interval
parameter of the MLME-ASSOCIATE.request primitive and is expressed in
units of Beacon Interval.
Reason codes
  00h Reserved
01h Unspecified reason
02h Previous authentication no longer valid
03h Deauthenticated because sending station is leaving (or has left) IBSS
or ESS
04h Disassociated due to inactivity
05h Disassociated because AP is unable to handle all currently associated
stations
06h Class 2 frame received from nonauthenticated station
07h Class 3 frame received from nonassociated station
08h Disassociated because sending station is leaving (or has left) BSS
09h Station requesting (re)association is not authenticated with responding
station
0Ah..FFFFh Reserved
Status codes
  00h Successful
01h Unspecified failure
02h..09h Reserved
0Ah Cannot support all requested capabilities in the Capability Information field
0Bh Reassociation denied due to inability to confirm that association exists
0Ch Association denied due to reason outside the scope of this standard
0Dh Responding station doesn't support the specified authentication algorithm
0Eh Received an Authentication frame with authentication transaction sequence
number out of expected sequence
0Fh Authentication rejected because of challenge failure
10h Authentication rejected due to timeout waiting for next frame in sequence
11h Association denied because AP is unable to handle additional associated
stations
12h Association denied due to requesting station not supporting all of the
data rates in the BSSBasicRateSet parameter
13h Association denied due to requesting station not supporting
the Short Preamble option (IEEE802.11b only)
14h Association denied due to requesting station not supporting
the PBCC Modulation option (IEEE802.11b only)
15h Association denied due to requesting station not supporting
the Channel Agility option (IEEE802.11b only)
13h-15h Reserved (original IEEE802.11 specs)
16h..FFFFh Reserved

 DS Wifi IEEE802.11 Control and Data Frames (Type=1/2) < ^

Control Frames (Type=1)
All Control Frames have 10-byte or 16-byte headers, depending on the Subtype:
  Subtype                          Frame Header
A Power Save (PS)-Poll FC AID BSSID TA
B Request To Send (RTS) FC Duration RA TA
C Clear To Send (CTS) FC Duration RA -
D Acknowledgment (ACK) FC Duration RA -
E Contention-Free (CF)-End FC Duration RA BSSID
F CF-End + CF-Ack FC Duration RA BSSID
Subtypes 0..9 are Reserved. Control Frames do not have a Frame Body, so the Header is directly followed by the FCS.

Data Frames (Type=2)
All Data Frames consist of the following components:
  FC, Duration/ID, Address 1, Address 2, Address 3, Sequence Control,
Address 4 (only on From DS to DS), Frame Body, FCS.
The meaning of the 3 or 4 addresses depends on Frame Control FromDS/ToDS bits:
  Frame Control    Address 1  Address 2  Address 3  Address 4
From STA to STA DA SA BSSID -
From DS to STA DA BSSID SA -
From STA to DS BSSID SA DA -
From DS to DS RA TA DA SA
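A sketch of picking DA, SA and BSSID out of a data frame header according to the table above. It assumes the standard field offsets (FC at 0, Address 1 at 4, Address 2 at 10, Address 3 at 16, Address 4 at 24) and a little-endian FC; names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *da, *sa, *bssid;   /* bssid is NULL for DS-to-DS frames */
} data_addrs;

static data_addrs data_frame_addrs(const uint8_t *hdr)
{
    uint16_t fc = (uint16_t)(hdr[0] | (hdr[1] << 8));
    int to_ds = (fc >> 8) & 1, from_ds = (fc >> 9) & 1;
    const uint8_t *a1 = hdr + 4, *a2 = hdr + 10, *a3 = hdr + 16, *a4 = hdr + 24;
    data_addrs r;
    if (!to_ds && !from_ds)     { r.da = a1; r.sa = a2; r.bssid = a3; }    /* STA to STA */
    else if (!to_ds && from_ds) { r.da = a1; r.bssid = a2; r.sa = a3; }    /* DS to STA */
    else if (to_ds && !from_ds) { r.bssid = a1; r.sa = a2; r.da = a3; }    /* STA to DS */
    else                        { r.da = a3; r.sa = a4; r.bssid = NULL; }  /* DS to DS (RA=a1, TA=a2) */
    return r;
}
```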
Frame Control Subtypes for Data Frames (Type=2) are:
  0   Data
1 Data + CF-Ack
2 Data + CF-Poll
3 Data + CF-Ack + CF-Poll
4 Null function (no data)
5 CF-Ack (no data)
6 CF-Poll (no data)
7 CF-Ack + CF-Poll (no data)
8-F Reserved

 DS Backwards-compatible GBA-Mode < ^

When booting a 32pin GBA cartridge, the NDS automatically switches into GBA mode. In that mode, all NDS-related features are disabled, and the console behaves (almost) like a GBA.

GBA Features that are NOT supported on NDS in GBA Mode.
Unlike real GBA, the NDS does not support 8bit DMG/CGB cartridges.
The undocumented Internal Memory Control register (Port 800h) isn't supported, so the NDS doesn't allow using 'overclocked' RAM.
The NDS doesn't have a link-port, so GBA games can be played only in single player mode, link-port accessories cannot be used, and the NDS cannot run GBA code via multiboot.

GBA Features that are slightly different on NDS in GBA Mode.
The CPU, Timers, and Sound Frequencies are probably clocked at 16.76MHz (33.51MHz/2), which would be slightly slower than the original GBA's 16.78MHz clock?
In the BIOS, a single byte in a formerly 00h-filled area has been changed from 00h to 01h, resulting in SWI 0Dh returning a different BIOS checksum.
The GBA picture can be shown on the upper or lower screen (selectable in the boot menu). The backlight for the selected screen is always on, resulting in different colors & much better visibility than on the original GBA. Unlike the GBA-SP, the NDS doesn't have a backlight button.

Screen Border in GBA mode
The GBA screen is centered in the middle of the NDS screen. The surrounding pixels are defined by 32K-color bitmap data in VRAM Block A and B. Each frame, the GBA picture is captured into one block, and is displayed in the next frame (while capturing new data to the other block).
To get a flicker-free border, both blocks should be initialized to contain the same image before entering GBA mode (usually both are zero-filled, resulting in a plain black border).
Note: When using two different borders, the flickering will be irregular, so there appears to be a frame inserted or skipped every few seconds in GBA mode?!

Switching from NDS Mode to GBA Mode
  --- NDS9: ---
ZEROFILL VRAM A,B ;init black screen border (or other color/image)
POWCNT=8003h ;enable 2D engine A on upper screen (0003h=lower)
EXMEMCNT=... ;set Async Main Memory mode (clear bit14)
IME=0 ;disable interrupts
SWI 06h ;halt with interrupts disabled (lockdown)
--- NDS7: ---
POWERMAN.REG0=09h ;enable sound amplifier & upper backlight (05h=lower)
IME=0 ;disable interrupts
wait for VCOUNT=200 ;wait until VBlank
SWI 1Fh with R2=40h ;enter GBA mode, by CustomHalt(40h)
After that, the GBA BIOS will be booted, the GBA Intro will be displayed, and the GBA cartridge (if any) will be started.


 DS Xboo < ^

The DS Xboo cable allows uploading NDS ROM-Images (max 3.9MBytes) to the console via a parallel port connection. It should be the best, simplest, easiest, and fastest way to test code on real hardware. And, at a relatively decent price of 11 cents per diode, it should be by far the least expensive way. You'll have to handle classic tools (screwdrivers, knives, saws, tweezers, and solder), which will probably scare most of you to hell.

DS XBOO Connection Schematic
  Console Pin/Names             Parallel Port Pin/Names
RFU.9 FMW.1 D ---|>|--- DSUB.14 CNTR.14 AutoLF
RFU.6 FMW.2 C ---|>|--- DSUB.1 CNTR.1 Strobe
RFU.10 FMW.3 /RES ---|>|--- DSUB.16 CNTR.31 Init
RFU.7 FMW.4 /S ---|>|--- DSUB.17 CNTR.36 Select
RFU.5 FMW.5 /W --. SL1A - - N.C.
RFU.28 FMW.6 VCC __| SL1B - - N.C.
RFU.2,12 FMW.7 VSS --------- DSUB.18-25 CNTR.19-30 Ground
RFU.8 FMW.8 Q --------- DSUB.11 CNTR.11 Busy
P00 Joypad-A ---|>|--- DSUB.2 CNTR.2 D0
P01 Joypad-B ---|>|--- DSUB.3 CNTR.3 D1
P02 Joypad-Select ---|>|--- DSUB.4 CNTR.4 D2
P03 Joypad-Start ---|>|--- DSUB.5 CNTR.5 D3
P04 Joypad-Right ---|>|--- DSUB.6 CNTR.6 D4
P05 Joypad-Left ---|>|--- DSUB.7 CNTR.7 D5
P06 Joypad-Up ---|>|--- DSUB.8 CNTR.8 D6
P07 Joypad-Down ---|>|--- DSUB.9 CNTR.9 D7
RTC.1 INT aka SI --------- DSUB.10 CNTR.10 /Ack
Parts List: 15 wires, four (DS) or twelve (DS-Lite) "BAT 85" diodes, 1 parallel port socket.

DS XBOO Connection Notes
The Firmware chip (FMW.Pins) hides underneath the RFU shielding plate, so it'd be easier to connect the wires to the RFU.Pins (except on DS-Lite: the RFU pins are terribly small (and have different pin numbers), so using either the FMW.Pins or the mainboard vias (see below GIF) would be easier). The easiest way to make the /W-to-VCC connection is to short-circuit SL1 by putting some solder onto it.
The P00..P07 and INT signals are labeled on the switch side of the mainboard; however, there should be more room for the cables when connecting them to vias on the bottom side (except on DS-Lite: P01 is found only on the switch side). The image below may help to locate these pins.


At the parallel port side, DSUB.Pins or CNTR.Pins can be used for 25pin DSUB or 36pin Centronics sockets, the latter one allowing to use a standard printer cable.
The ring printed on each diode points towards the parallel port side. The 4 diodes are required to prevent the parallel port from pulling up LOW levels on the NDS side. Be sure to use BAT85 diodes; cheaper ones like the 1N4148 drop too much voltage and won't reach stable LOW levels.
The power management chip in the DS-Lite simply refuses to react to the Power-On button when P00..P07 are pulled high by the parallel port (even if it is in HighZ state). The 8 diodes in the data lines solve that problem (they are required on DS-Lite only, not on the original DS).

DS XBOO Operation Notes
The main Upload function is found in no$gba Utility menu, together with further functions in Remote Access sub-menu.
Before uploading anything, download the original firmware. The file is saved as FIRMnnnn.BIN, where "nnnn" is equal to the last 16bit of the console's 48bit MAC address, so firmware images from different consoles get unique filenames. If you don't already have it, also download the NDS BIOS: it contains encryption seed data required to encrypt/decrypt the secure area, and without it, no$gba works only with unencrypted ROM-images. Next, select Patch Firmware to install the nocash firmware.

DS XBOO Troubleshooting
Be sure that the console is switched on, that the XBOO cable is connected, and that you have selected the correct parallel port in no$gba setup (the "multiboot" options in the Various Options screen), and, of course, try to avoid fiddling with the joypad during uploads.
I've tested the cable on two computers; the overall upload/download functions should work reliably. The firmware access functions (which are required only for (un-)installation) worked only with one of the two computers; try using a different computer/parallel port in case of problems.

Nocash Firmware
The primary purpose is to receive uploaded NDS-images via the parallel port connection; additionally, it contains a bootmenu and setup screens similar to the original firmware. The user interface has fewer cryptic symbols and should be altogether faster and easier to use. Important Information about Whatever is supported (but it can be disabled). The setup contains a couple of additional options like automatic daylight saving time adjustment.
The bootmenu allows booting normal NDS and GBA carts; additionally, it can boot NDS-images (or older PassMe-images) from flashcards in the GBA slot. Furthermore, thanks to being coded in asm, the nocash firmware occupies less than 32KBytes, allowing smaller NDS-images to be stored (and booted) in the unused portion of the firmware memory (about 224KBytes). The zero-filled region between cart header and secure area, at 200h..3FFFh, is automatically excluded, so the image may be slightly bigger than the available free memory space.
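The size rule above, as a rough C sketch. `free_bytes` is left as a parameter because the free space is only given as "about 224KBytes"; the function names are hypothetical, and the sketch assumes the image really is zero-filled in the excluded 200h..3FFFh region.

```c
#include <stdint.h>

#define GAP_START 0x200u    /* end of cart header */
#define GAP_END   0x4000u   /* start of secure area */

/* Bytes actually stored once the zero-filled gap is left out. */
static uint32_t stored_image_size(uint32_t image_size)
{
    if (image_size <= GAP_START)
        return image_size;
    if (image_size <= GAP_END)
        return GAP_START;                      /* everything past the header is gap */
    return image_size - (GAP_END - GAP_START); /* minus 3E00h bytes */
}

static int fits_in_firmware(uint32_t image_size, uint32_t free_bytes)
{
    return stored_image_size(image_size) <= free_bytes;
}
```

So an image can exceed the free space by up to 3E00h bytes (15.5KBytes) and still fit.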

Missing
Unlike the original firmware, the current version cannot yet boot via WLAN.