NES Emulation Saga - Writing a NES emulator
Part II: the PPU
This is part of a series of articles (part 1, part 3) on the development of my own NES emulator.
It is written in a mixture of C and C++, uses Allegro 4, and is hosted on Bitbucket.
The PPU is the NES's Picture Processing Unit, and it is fundamental to any NES emulator. The first thing a game usually does is poll the PPU status register in a loop
to wait for two frames, since the PPU is not ready for writes until some time after power-on. After the reset code runs, many games then go into an infinite loop and do everything in the vblank interrupt (NMI) handler:
; Clear the vblank flag if it was set at reset time
bit PPU_STATUS
; Wait 2 vblanks
- bit PPU_STATUS
bpl -
- bit PPU_STATUS
bpl -
;Rest of init code
;[...]
loop:
jmp loop
vblank:
;Most of game code
rti
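On the emulator side, these loops only terminate if reads of PPUSTATUS ($2002) behave as on the hardware: bit 7 reports vblank, and reading the register clears that bit and also resets the write toggle shared by PPUSCROLL and PPUADDR. The following is a minimal sketch of that read; the `PpuRegs` struct and function names are hypothetical, not the emulator's `nes_mem` layout:

```cpp
#include <cstdint>

// Hypothetical register state; the real emulator keeps these in nes_mem.ppu[].
struct PpuRegs {
    uint8_t status = 0;        // PPUSTATUS ($2002)
    bool writeToggle = false;  // shared first/second-write latch for $2005/$2006
};

constexpr uint8_t kStatusVblank = 0x80;  // bit 7: vblank has started

// Side-effecting read of $2002: returns the current flags, then clears the
// vblank flag and resets the write toggle.
uint8_t readPpuStatus(PpuRegs &ppu) {
    const uint8_t value = ppu.status;
    ppu.status &= static_cast<uint8_t>(~kStatusVblank);
    ppu.writeToggle = false;
    return value;
}

// What `bit PPU_STATUS` / `bpl -` tests: the N flag receives bit 7 of the value read.
bool vblankSeen(uint8_t statusValue) {
    return (statusValue & kStatusVblank) != 0;
}
```

With this behaviour, the first `bit PPU_STATUS` discards a stale flag, and each `bpl -` loop spins until the emulator sets the flag again at the start of vblank.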
Main emulator loop
Therefore, the main emulator loop ends up structured around the PPU, and looks like this:
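const unsigned int scanlineCycles = 113; //CPU cycles per scanline (really about 113.67 on NTSC)
unsigned int cyclesCount = 0;
for(int scanline = 0; scanline < 262; scanline++){
	switch(scanline){
		case 0 ... 239: //GCC/Clang case-range extension
			//Visible lines (240 of them)
			RenderScanline(nes_mem, scanline, screen);
			break;
		case 241:
			//First vblank line: set the flag, then fire the NMI if it is enabled
			nes_mem.ppu[PPUSTATUS] |= PPUSTATUS_VBLANK;
			if(nes_mem.ppu[PPUCTRL] & PPUCTRL_NMI) machine.NMI();
			//Usually, since it is the most predictable state,
			//emulators do savestate saving/loading here also
			break;
		case 261:
			//Pre-render line: clear vblank and sprite 0 hit before rendering starts anew
			nes_mem.ppu[PPUSTATUS] &= (255u - PPUSTATUS_VBLANK);
			nes_mem.ppu[PPUSTATUS] &= (255u - PPUSTATUS_SPR0HIT);
			break;
		default:
			//Line 240 (post-render) and lines 242-260 (vblank): nothing to do
			break;
	}
	//Run the CPU for one scanline's worth of cycles
	while(cyclesCount < scanlineCycles){
		int cycles = machine.DoStep();
		cyclesCount += cycles;
	}
	//Carry the overshoot of the last instruction into the next scanline
	cyclesCount -= scanlineCycles;
}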
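The loop above gives the CPU a fixed 113 cycles per scanline. On NTSC hardware a scanline is 341 PPU dots and the PPU runs three dots per CPU cycle, so a scanline is really 341 / 3 ≈ 113.67 CPU cycles, and the fixed budget drifts by roughly 175 CPU cycles per 262-line frame. One way to avoid that drift, sketched here for NTSC only and ignoring the odd-frame dot skip, is to keep the budget in PPU dots so that the remainder carries over exactly; `runScanlineCpu` and the `doStep` callback (a stand-in for `machine.DoStep()`) are hypothetical names:

```cpp
#include <functional>

// NTSC timing: 341 PPU dots per scanline, 3 PPU dots per CPU cycle.
constexpr long kDotsPerScanline = 341;
constexpr long kDotsPerCpuCycle = 3;

// Runs the CPU until it has caught up with one more scanline of PPU time.
// `dotBalance` persists across calls: positive means the CPU is behind the PPU,
// zero or negative means the last instruction ran into the next scanline.
// Returns the number of CPU cycles executed for this scanline.
long runScanlineCpu(long &dotBalance, const std::function<int()> &doStep) {
    long cycles = 0;
    dotBalance += kDotsPerScanline;
    while (dotBalance > 0) {
        const int stepCycles = doStep();  // hypothetical stand-in for machine.DoStep()
        cycles += stepCycles;
        dotBalance -= stepCycles * kDotsPerCpuCycle;
    }
    return cycles;
}
```

Over a frame the CPU then stays within one instruction of the real 262 × 341 / 3 ≈ 29780.67 cycles, instead of losing two thirds of a cycle on every line.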
Scrolling storm
The main difficulty in emulating the PPU is the complex behaviour around scrolling and register writes. This seems to have first been documented well
in a famous 1999 document called "SKINNY.TXT", now better known as loopy's PPU document. It is described in more detail on the NESdev wiki pages
"PPU rendering" and
"PPU scrolling".
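The core of loopy's model is a pair of 15-bit VRAM address registers, `v` (current) and `t` (temporary), a 3-bit fine X scroll `x`, and the write toggle `w` shared by $2005 and $2006; the CPU-visible registers only ever update these. The following is a minimal sketch of the write handlers, following the bit layouts on the NESdev "PPU scrolling" page; the `Loopy` struct and the function names are hypothetical, not taken from the emulator:

```cpp
#include <cstdint>

// Internal scroll registers from loopy's document.
// v/t layout: yyy NN YYYYY XXXXX (fine Y, nametable, coarse Y, coarse X).
struct Loopy {
    uint16_t v = 0;  // current VRAM address
    uint16_t t = 0;  // temporary VRAM address (top-left of the screen)
    uint8_t x = 0;   // fine X scroll, 3 bits
    bool w = false;  // first/second write toggle
};

// $2000 (PPUCTRL): nametable select goes into bits 10-11 of t.
void writePpuCtrl(Loopy &r, uint8_t d) {
    r.t = static_cast<uint16_t>((r.t & ~0x0C00) | ((d & 0x03) << 10));
}

// $2002 (PPUSTATUS) read: resets the write toggle.
void onStatusRead(Loopy &r) { r.w = false; }

// $2005 (PPUSCROLL): first write is the X scroll, second is the Y scroll.
void writePpuScroll(Loopy &r, uint8_t d) {
    if (!r.w) {
        r.t = static_cast<uint16_t>((r.t & ~0x001F) | (d >> 3));  // coarse X
        r.x = static_cast<uint8_t>(d & 0x07);                     // fine X
    } else {
        r.t = static_cast<uint16_t>((r.t & ~0x73E0)
                                    | ((d & 0x07) << 12)    // fine Y
                                    | ((d & 0xF8) << 2));   // coarse Y
    }
    r.w = !r.w;
}

// $2006 (PPUADDR): high byte first (bit 14 cleared), then the low byte,
// which also copies t into v immediately.
void writePpuAddr(Loopy &r, uint8_t d) {
    if (!r.w) {
        r.t = static_cast<uint16_t>((r.t & 0x00FF) | ((d & 0x3F) << 8));
    } else {
        r.t = static_cast<uint16_t>((r.t & 0xFF00) | d);
        r.v = r.t;
    }
    r.w = !r.w;
}
```

For example, writing 0x7D and then 0x5E to $2005 gives coarse X = 15 and fine X = 5 from the first byte, and fine Y = 6 and coarse Y = 11 from the second, so t = 0x616F.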
Although this approach is still quite simple, with the hack mentioned in the comment even Super Mario Bros., a notoriously difficult game to emulate, plays fine. The way
the renderer calculates the addresses for graphical data is also quite hacked together:
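//x: horizontal position including scroll; y: vertical position including scroll, computed earlier (not shown)
for(unsigned int scr_x = 0, x = nes_mem.ppu_x; scr_x < 256; scr_x++, x++){
	//Tile column and row within one 32x30-tile nametable
	const unsigned int coarse_x_lo = (x >> 3u) & (bit5 - 1u);
	const unsigned int coarse_y_lo = ((y >> 3u) % 30);
	//Set once the scroll has crossed into the next nametable horizontally/vertically
	const unsigned int coarse_x_hi = (x >> 3u) >= 32u;
	const unsigned int coarse_y_hi = (y >> 3u) >= 30u;
	//Pixel offset inside the 8x8 tile
	const unsigned int fine_x = x & (bit3 - 1u);
	const unsigned int fine_y = y & (bit3 - 1u);
	//Base nametable selected in PPUCTRL
	const unsigned int nametbl = nes_mem.ppu[PPUCTRL] & PPUCTRL_NAMETBL;
	//Nametable byte: coarse X in bits 0-4, coarse Y in bits 5-9, nametable in bits 10-11
	//(flipped when the scroll wraps), plus bit 13 for the 0x2000 base
	unsigned int bg_name_addr =
		(coarse_x_lo) |
		(coarse_y_lo << 5u) |
		((nametbl << 10u) ^ ((coarse_x_hi << 10u) | (coarse_y_hi << 11u))) |
		bit13;
	const unsigned int bg_name = nes_mem.GetPPU(bg_name_addr);
	//Background pattern table (0x0000 or 0x1000) selected in PPUCTRL
	const unsigned int bg_table = (nes_mem.ppu[PPUCTRL] & PPUCTRL_BGADDR) >> 4u;
	//Each 16-byte tile stores its low bitplane first and its high bitplane 8 bytes later
	const unsigned int bg_plane0_addr =
		fine_y |
		0 |
		(bg_name << 4u) |
		(bg_table << 12u);
	const unsigned int bg_plane1_addr =
		fine_y |
		bit3 |
		(bg_name << 4u) |
		(bg_table << 12u);
	//[...] fetch both plane bytes, take bit (7 - fine_x) of each, add the attribute palette, draw
}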
I suppose this must look terrible to someone familiar with how the NES PPU operates, or to the author of a more accurate emulator.
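For comparison, in loopy's model the renderer never recomputes addresses from the scroll values: it reads them straight out of `v` and walks `v` with two increment operations, while bits of `t` are copied back into `v` at fixed points (the horizontal bits at dot 257 of each line, the vertical bits during the pre-render line). A minimal sketch of those operations, following the formulas on the NESdev "PPU scrolling" page; the function names are hypothetical:

```cpp
#include <cstdint>

// Nametable byte for the tile v points at.
uint16_t tileAddress(uint16_t v) {
    return static_cast<uint16_t>(0x2000 | (v & 0x0FFF));
}

// Attribute byte covering that tile (one byte per 4x4-tile area).
uint16_t attributeAddress(uint16_t v) {
    return static_cast<uint16_t>(0x23C0 | (v & 0x0C00) | ((v >> 4) & 0x38) | ((v >> 2) & 0x07));
}

// Next tile column, wrapping into the horizontally adjacent nametable.
uint16_t incrementCoarseX(uint16_t v) {
    if ((v & 0x001F) == 31) {
        v &= ~0x001F;  // coarse X = 0
        v ^= 0x0400;   // switch horizontal nametable
    } else {
        v += 1;
    }
    return v;
}

// Next pixel row; after fine Y 7, advance coarse Y, wrapping at row 29
// into the vertically adjacent nametable.
uint16_t incrementY(uint16_t v) {
    if ((v & 0x7000) != 0x7000) {
        return static_cast<uint16_t>(v + 0x1000);  // fine Y < 7: just advance it
    }
    v &= ~0x7000;                                   // fine Y = 0
    unsigned int coarseY = (v & 0x03E0) >> 5;
    if (coarseY == 29) {
        coarseY = 0;
        v ^= 0x0800;                                // switch vertical nametable
    } else if (coarseY == 31) {
        coarseY = 0;                                // out-of-range row: wrap without switching
    } else {
        coarseY += 1;
    }
    return static_cast<uint16_t>((v & ~0x03E0) | (coarseY << 5));
}

// Copies from t: coarse X and horizontal nametable, or fine Y, coarse Y and vertical nametable.
uint16_t copyHorizontal(uint16_t v, uint16_t t) {
    return static_cast<uint16_t>((v & ~0x041F) | (t & 0x041F));
}
uint16_t copyVertical(uint16_t v, uint16_t t) {
    return static_cast<uint16_t>((v & ~0x7BE0) | (t & 0x7BE0));
}
```

The pattern address built from `bg_name`, `fine_y` and `bg_table` in the code above would stay the same; only fine Y would come from bits 12-14 of `v` instead of from `y`.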
Glitches in graphical detail
I already shared some interesting, glitch-art-like screenshots on Twitter when I first
started coding this.
Before I implemented reading from the PPU, which a few, mostly older, games do, some games had no collision with the world. In Pac-Man,
you could walk right through walls. I don't think that looks as ridiculous (or amusing) as Mappy, however.
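Reads of PPU memory go through PPUDATA ($2007), and those reads are buffered: below the palette range ($3F00) a read returns the contents of an internal buffer and only then refills it from VRAM, so the first read after setting PPUADDR returns stale data. The address then advances by 1 or 32 depending on PPUCTRL bit 2. A minimal sketch, assuming a hypothetical flat 16 KiB `vram` array with no mirroring or mapper logic, rather than the emulator's `nes_mem.GetPPU`:

```cpp
#include <array>
#include <cstdint>

// Hypothetical PPU memory model: a flat 16 KiB array, no mirroring or mappers.
struct PpuBus {
    std::array<uint8_t, 0x4000> vram{};
    uint16_t v = 0;            // current VRAM address (set through PPUADDR)
    uint8_t readBuffer = 0;    // internal PPUDATA read buffer
    bool increment32 = false;  // PPUCTRL bit 2: add 32 (down) instead of 1 (across)
};

// Read of $2007 (PPUDATA).
uint8_t readPpuData(PpuBus &ppu) {
    const uint16_t addr = ppu.v & 0x3FFF;
    uint8_t result;
    if (addr < 0x3F00) {
        result = ppu.readBuffer;                   // value fetched by the previous read
        ppu.readBuffer = ppu.vram[addr];
    } else {
        result = ppu.vram[addr];                   // palette reads are not delayed...
        ppu.readBuffer = ppu.vram[addr - 0x1000];  // ...but the buffer gets the byte "underneath"
    }
    ppu.v = static_cast<uint16_t>(ppu.v + (ppu.increment32 ? 32 : 1));
    return result;
}
```

This is why code that reads the nametable normally performs one dummy read right after writing PPUADDR.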
Limitations of the approach
The approach to scrolling described above works well for games that scroll horizontally, but it seems to fail when games change the
vertical scroll at a split point, a more involved technique that requires writing both PPUSCROLL and PPUADDR in a particular sequence. The issue
is easy to see in The Legend of Zelda and in DuckTales.
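The NESdev wiki describes one such sequence for changing both X and Y scroll mid-frame: write $2006, $2005, $2005, $2006 in that order, relying on the shared write toggle so that the final $2006 write copies a fully formed address into `v`. The sketch below computes the four bytes and simulates their effect on loopy's `t`/`v`; all names are hypothetical, and it is only meant to show why a renderer that looks at PPUSCROLL and PPUCTRL alone is likely to miss the vertical part of the split:

```cpp
#include <array>
#include <cstdint>

// Minimal loopy-style state: temporary/current VRAM address, fine X, write toggle.
struct ScrollRegs {
    uint16_t t = 0, v = 0;
    uint8_t fineX = 0;
    bool w = false;
};

void writeScroll(ScrollRegs &r, uint8_t d) {  // $2005
    if (!r.w) {
        r.t = static_cast<uint16_t>((r.t & ~0x001F) | (d >> 3));
        r.fineX = static_cast<uint8_t>(d & 0x07);
    } else {
        r.t = static_cast<uint16_t>((r.t & ~0x73E0) | ((d & 0x07) << 12) | ((d & 0xF8) << 2));
    }
    r.w = !r.w;
}

void writeAddr(ScrollRegs &r, uint8_t d) {  // $2006
    if (!r.w) {
        r.t = static_cast<uint16_t>((r.t & 0x00FF) | ((d & 0x3F) << 8));
    } else {
        r.t = static_cast<uint16_t>((r.t & 0xFF00) | d);
        r.v = r.t;  // takes effect immediately, even mid-frame
    }
    r.w = !r.w;
}

// The four bytes for a split to nametable `nt` (0-3) at scroll (x, y),
// in the order they are written: $2006, $2005, $2005, $2006.
std::array<uint8_t, 4> splitWrites(unsigned nt, uint8_t x, uint8_t y) {
    return {static_cast<uint8_t>((nt & 3u) << 2),
            y,
            x,
            static_cast<uint8_t>((((y & 0xF8) << 2) | (x >> 3)) & 0xFF)};
}

void applySplit(ScrollRegs &r, const std::array<uint8_t, 4> &b) {
    r.w = false;           // a game would normally read $2002 first to reset the toggle
    writeAddr(r, b[0]);    // high byte: nametable bits, clears fine Y
    writeScroll(r, b[1]);  // second write (w == 1): fine Y and coarse Y
    writeScroll(r, b[2]);  // first write (w == 0): coarse X and fine X
    writeAddr(r, b[3]);    // low byte: coarse X and low coarse Y bits, then v = t
}
```

For instance, a split to nametable 0 at X = 0x7D, Y = 0x5E produces the writes 0x00, 0x5E, 0x7D, 0x6F and leaves v = 0x616F with fine X = 5.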
I don't think it is worth sticking with this hacky method; it makes more sense to rewrite the PPU to do the correct thing. It shouldn't be too much
work, though reading all the specs with great attention to detail is annoying.