Another hurdle is the graphics hardware. The Game Boy uses a tile- and sprite-based approach, which is great for a wide variety of 2D games but becomes a major bottleneck when you want to render at a lower level. Suddenly you not only have to calculate the pixel color, you also have to find its offset in the tile data and write it to the correct bits of the correct byte of the tile.
The 256-color VGA framebuffer is perfect for this kind of work because of the 1:1 relationship between bytes and pixels, and the simple correspondence between memory offset and pixel position.
So you have a slower CPU that's 8-bit (basically an 8080) rather than 16-bit doing more work because of the complex graphics model.
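To make the difference concrete, here is a small sketch (my own illustration, not code from the project) of plotting one pixel in a Game Boy style tile-data layout versus a VGA mode 13h linear framebuffer. The tile arrangement assumed here (20x18 tiles laid out linearly, 16 bytes per tile, two bytes per row with the low bitplane first and bit 7 as the leftmost pixel) matches the documented Game Boy tile format, but a real game's tilemap indirection can complicate it further.

```c
#include <stdint.h>
#include <assert.h>

/* Plot a 2-bit pixel into GB-style tile data. Assumes tiles are mapped
   linearly: tile index = (y/8)*20 + (x/8), 16 bytes per tile. */
static void gb_plot(uint8_t *tiledata, int x, int y, uint8_t color)
{
    int tile   = (y / 8) * 20 + (x / 8);   /* which tile */
    int row    = y % 8;                    /* row inside the tile */
    int offset = tile * 16 + row * 2;      /* 2 bytes per tile row */
    uint8_t mask = 0x80 >> (x % 8);        /* bit 7 = leftmost pixel */

    if (color & 1) tiledata[offset]     |= mask;   /* low bitplane  */
    else           tiledata[offset]     &= ~mask;
    if (color & 2) tiledata[offset + 1] |= mask;   /* high bitplane */
    else           tiledata[offset + 1] &= ~mask;
}

/* VGA mode 13h: one byte per pixel, and the offset is just y*320 + x. */
static int vga_offset(int x, int y) { return y * 320 + x; }
```

The VGA case is a single multiply-add and one byte store; the Game Boy case needs two read-modify-write cycles plus the tile arithmetic, per pixel.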
I think 1988-89 was when the USA signed the Berne Convention, which defined copyright as life+50 for authors (when assigned to corporations it has a fixed duration of, iirc, 95 years).
But the Berne Convention also has a stipulation that all signatories must respect the duration in the initial nation of publication, and that can be longer than the minimum terms of the convention agreement.
Thus you get a ratcheting effect where multinationals will try to convince national governments to up their copyright terms to be "more competitive".
A kind of inverse of the race to the bottom that they first ran on taxes between US states (leading to Delaware becoming the state to file your incorporation in), and have since applied across the globe under the banner of competition.
BTW, there is a claim that Lord of the Rings became popular because a US publisher thought it didn't need to respect Tolkien's UK copyright when publishing a cheap paperback. At the time the USA had not yet signed the Berne Convention.
Frankly it seems like a historic pattern: an industrial nation begins to slow down and tries to shore up its economy with IP laws, another nation comes along and ignores those laws to bootstrap its own industry, and then the pattern repeats some decades down the road.
So far the changeover has been UK to USA to China. And you can basically see China trying to clamp down on their lax IP adherence right now.
Wolfenstein 3D predates the Gameboy Color by six years (1992 and 1998 respectively), and was officially ported in 1994 to the SNES, the Gameboy's contemporary home console, so this isn't retrofuturistic at all.
I'm just guessing here, but it looks like the two processors share SRAM and the beefy ARM processor draws the scene and writes it as tiles to SRAM. The Z80 reads the tiles from SRAM and blits them to the screen.
Basically you have an ARM processor doing a whole lot of work, and a Z80 in charge of moving it around and drawing supporting UI.
Yes, that is essentially what I'm doing.
The Arm does most of the heavy lifting for the actual game and the Z80 does input, sound, HUD, palette fading, hands+gun, main game loop, and of course it spends a lot of time just shuffling data to vram.
RAM is limited, so the KE04 internally renders to a 2-bitplane framebuffer (the bit-banding feature of the KE04 helps greatly here).
When the Z80 needs the next frame, it triggers an interrupt on the KE04 to wake it from sleep mode; the KE04 then converts the framebuffer into GB-VRAM-ready tile + map attribute data and stores it in the dp-sram.
Z80 can then DMA directly from dp-sram into vram.
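As a rough illustration of the conversion step being described (this is my guess at the shape of it, not the author's code), suppose the KE04 keeps its two bitplanes as separate 1bpp buffers, 8 pixels per byte, 20 bytes per scanline. Game Boy tile data wants those planes interleaved per tile row, so the conversion is mostly a reshuffle:

```c
#include <stdint.h>
#include <assert.h>

/* Assumed source layout: two 160x144 1bpp planes, row-major, 20 bytes
   per scanline. Output: GB tile data, 16 bytes per 8x8 tile, each tile
   row stored as { low-plane byte, high-plane byte }. */
#define SCREEN_W_TILES 20
#define SCREEN_H_TILES 18

static void planes_to_gb_tiles(const uint8_t *plane0, const uint8_t *plane1,
                               uint8_t *tiles /* 20*18*16 bytes */)
{
    for (int ty = 0; ty < SCREEN_H_TILES; ty++)
        for (int tx = 0; tx < SCREEN_W_TILES; tx++) {
            uint8_t *dst = tiles + (ty * SCREEN_W_TILES + tx) * 16;
            for (int r = 0; r < 8; r++) {
                /* byte holding this tile row in each plane */
                int src = (ty * 8 + r) * SCREEN_W_TILES + tx;
                *dst++ = plane0[src];   /* low bitplane  */
                *dst++ = plane1[src];   /* high bitplane */
            }
        }
}
```

Because the planes are already packed 8 pixels to a byte, no per-pixel work is needed at this stage; it is pure byte shuffling, which an M0+ can do quickly.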
Some ranges in the dp-sram are for command buffers used for Z80<->KE04 communication.
The KE04 I am using has 128KB of ROM and 16KB of RAM, and lacks hardware division. This presents some interesting challenges in terms of memory and ROM space usage, and in juggling speed vs RAM/ROM usage.
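One common way around missing hardware division (a hypothetical sketch, not necessarily what this project does) is a precomputed reciprocal table in fixed point, trading a bit of RAM/ROM for speed, which is exactly the juggling act described above:

```c
#include <stdint.h>
#include <assert.h>

/* Precompute 16.16 fixed-point reciprocals once, then each divide in
   the hot path becomes a multiply and a shift. Table size (256) is an
   illustrative choice; a raycaster mostly divides by small distances. */
#define RECIP_MAX 256
static uint32_t recip[RECIP_MAX];

static void recip_init(void)
{
    for (uint32_t d = 1; d < RECIP_MAX; d++)
        recip[d] = ((1u << 16) + d - 1) / d;   /* round up: keeps small
                                                  quotients exact */
}

/* q = n / d with no divide instruction at call time. */
static uint32_t fast_div(uint32_t n, uint32_t d)
{
    return (n * recip[d]) >> 16;
}
```

The rounding-up in the table keeps results exact for the small numerators a renderer typically uses; very large numerators would need a wider intermediate.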
You could of course put something much beefier in there but I think that would take too much of the fun away from the project.
All in all it's great fun and I'm learning a lot as I go along.
If I were to make another hardware revision I would use a CPLD instead of the MBC1 chip, and try to lose the dp-sram in favour of a normal sram.
They're utilizing a dual-port SRAM, meaning that the co-processor can read and write to the RAM at the same time as the Gameboy CPU can read and write to it. Those pins along the cartridge edge are actually just the address and data lines of the Gameboy CPU.
They've written a program for the Gameboy CPU whose job is to DMA data from the RAM to video RAM (it's a bit more complicated than that, since the architecture of the Gameboy GPU isn't set up for streaming video to it).
The game itself is running on the ARM co-processor, writing data to a known location in the DP-SRAM and the Gameboy CPU is streaming it to the display.
That's very similar to what people are doing with the BeagleBone black...there's a pair of PRUs in the AM3XXX processor that have direct access to memory. So, you do the hard work on the ARM, but let the PRU push pixels (or other data) that needs to be real time, jitter free, etc.
That's basically right. He does some fancy stuff like DMA from the cartridge to VRAM to make it fast enough, but basically he just copies each frame into memory on the CGB and then swaps the background being displayed to make it appear. It takes two V-Blanks to copy an entire frame, so it runs at 30 FPS (while the CGB runs at about 60 FPS).
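For concreteness, here is a hedged sketch of the CGB DMA arithmetic involved (register layout per public Game Boy documentation; exactly how the author schedules his transfers is a guess). The CGB's new DMA unit (HDMA1..HDMA5 at FF51..FF55) moves data in 16-byte blocks, with HDMA5's low 7 bits holding (blocks - 1), so a single general-purpose transfer tops out at 128 blocks = 2048 bytes, which is why a full frame's tile data has to be split across multiple transfers and V-Blanks:

```c
#include <stdint.h>
#include <assert.h>

/* Encode a transfer length for the CGB HDMA5 register (FF55):
   low 7 bits = (bytes / 16) - 1, so 2048 bytes encodes as 0x7F. */
static uint8_t hdma5_length(uint16_t bytes)
{
    return (uint8_t)((bytes / 16) - 1);
}

/* A full 20x18 tile background is 360 tiles * 16 bytes each. */
static uint16_t frame_bytes(void) { return 20 * 18 * 16; }
```

5760 bytes of tile data against a 2048-byte-per-transfer cap (and only so much V-Blank time to spend) lines up with the two-V-Blank, 30 FPS cadence described above.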
I've invited the author to this thread since it seems several people are guessing/assuming what he is doing, plus I'm sure they'd like to know the great reaction they got here :)
That kind of trick was very common back in the day.
The Amiga used a very similar setup between the chipset and the CPU.
And cartridge based consoles have often included coprocessors on the carts (but nothing as potent as this ARM). At the tail end of the SNES years there was even a simple "GPU" in some of its carts.
I always wondered how all those 3D polygon games worked on the Sega Genesis tile engine too. I guess the games just rendered a screen buffer in RAM and created tiles on the fly.
Most games had their graphics tiles stored in ROM, but some games had 8KB of RAM instead of bank-switched ROM. Elite is in that category. So is Legend of Zelda, although Zelda's tiles seem to be stored verbatim in the program ROM, while Elite's must be algorithmically generated.
It's crazy to remember that the GBC lacked even Mode 7; something comparable wouldn't come to Nintendo handhelds until the Game Boy Advance. A full raycaster running on the Color, at a high frame rate no less, is a very strange sight.
I wonder just how wildly impractical it would have been to build such outboard hardware acceleration into a cartridge in 1998.
The Gameboy had several address banks which allowed for whatever coprocessor you wished to put in; you just DMA out of the address space. I suspect the unit volumes just weren't there to justify the engineering expense in the Gameboy's case.
Coincidentally, the Gameboy game X from 1992 ( https://www.youtube.com/watch?v=AyjU4MtonZM ) was developed by Dylan Cuthbert, who later went on to work on Star Fox and the Super FX chip.
That was the first FPS game I played, and it appears to use a very limited form of raycasting. Maybe just drawing entire walls with one raycast and some scaling.
It looks like "pop-in" happens frequently in that game, so I'd wager that a short ray range and not very many rays across the view (which would mean only ever considering nearby geometry) meant far fewer calculations per frame.
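A toy sketch of the trade-off being guessed at here (everything below, map included, is made up for illustration): march a ray through a tile map in fixed steps and give up after a short maximum range. Fewer, shorter rays are cheap, but anything beyond the range simply isn't drawn until you get closer, which is exactly the "pop-in" effect:

```c
#include <assert.h>

#define MAP_W 8
#define MAP_H 8
static const char map[MAP_H][MAP_W + 1] = {
    "########",
    "#......#",
    "#......#",
    "#..##..#",
    "#......#",
    "#......#",
    "#......#",
    "########",
};

/* Cast from (x16,y16) along (dx16,dy16), all in 1/16-cell fixed point.
   Returns the number of steps before hitting '#', or max_steps if
   nothing was hit in range (i.e. the wall will "pop in" later). */
static int cast(int x16, int y16, int dx16, int dy16, int max_steps)
{
    for (int i = 0; i < max_steps; i++) {
        x16 += dx16;
        y16 += dy16;
        if (map[y16 >> 4][x16 >> 4] == '#')
            return i;
    }
    return max_steps;
}
```

Halving the ray count and range roughly quarters the per-frame work, at the cost of distant walls appearing out of nowhere.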
I remember playing that in high school! That's really impressive use of the tech, and was staggeringly impressive to me at the time. Learning z80 assembly on the TI series got me into programming at an early age.
It's worth pointing out that the TI-83(+) contains a proper z80, running at 6MHz, with full support for the z80's extended instruction set and 16-bit arithmetic, which certainly helped this game out quite a bit. The Gameboy itself is somewhat underpowered in comparison; its sharp80 processor is considerably stripped down, lacks most of the extended instruction set, and runs at a slower 4MHz. Given these limitations, I'm (a) staggeringly impressed to see Wolfenstein running on the thing in any capacity, and (b) totally understand why the coprocessor is necessary to make it work, especially at that buttery smooth framerate. There's no hardware graphics scaling, for one; all the tech demos I've seen that do scaling appear to do so entirely in software, using hblank trickery to speed things up a bit. Doing a raycaster without hardware scaling for the texture lookups (or any ability to rewrite VRAM mid-scanline, for that matter) would be pretty tough.
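The software scaling mentioned above usually amounts to something like this fixed-point column stretcher (an illustrative sketch, not code from any of those demos): to draw a 64-texel texture column at an arbitrary wall height, step through the texture at (64 / height) texels per screen pixel.

```c
#include <stdint.h>
#include <assert.h>

#define TEX_H 64   /* texture column height in texels (assumed) */

/* Resample a TEX_H-texel column to wall_h output pixels using 16.16
   fixed point: one add and one shift per pixel, no divides in the
   loop. This is the work a hardware scaler would otherwise do. */
static void scale_column(const uint8_t *texcol, uint8_t *out, int wall_h)
{
    uint32_t step = ((uint32_t)TEX_H << 16) / wall_h;  /* texels/pixel */
    uint32_t pos  = 0;
    for (int y = 0; y < wall_h; y++) {
        out[y] = texcol[pos >> 16];
        pos += step;
    }
}
```

The single divide per column (for `step`) is exactly the kind of operation a divide-less CPU would also want to replace with a lookup table.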
It's also curious to point out that this probably wouldn't be possible on the Black and White Gameboy (Pocket); from his frame disassembly, he's using the arbitrary DMA copy hardware exclusive to the Gameboy Color, and I don't think a straight sharp80 copy routine would be able to complete all 120 tiles during a single vblank and still have any room to do much of anything else.
I'm presently working on a Gameboy Emulator in Lua, and it seems like this is another esoteric cartridge I'll have a ton of trouble supporting. An entire additional CPU inside the cart! What sorcery :)
The big difference there is that the Gameboy didn't have a way for you to draw on the screen directly. It was a sprite/tile-mapped-only system. Same with the NES and SNES, which is why a co-processor such as the SuperFX was needed to generate the tiles for arbitrary graphics. The Megaman X series also used a coprocessor (in X2 and later, I think?) that allowed them to use compressed data and arbitrary rotation of a sprite by generating the tiles on the fly.
A few games on the NES had RAM instead of ROM for graphics storage. Many more games supported bank-switching. Each 8KB bank of graphics tiles would provide enough tiles to uniquely cover a little over 1/4 of the screen, and hsync time would be enough to switch to the next bank, every 1/4 of the image.
I think that transferring 60K of data from the CPU to PPU every frame would be completely infeasible, though. All that in mind, I'm very impressed by how well the game Elite runs: https://www.youtube.com/watch?v=zoBIOi00sEI
Interestingly, in MegaMan X2 and X3, the Cx4 coprocessor built into the game cartridge is used for some very specific wireframe animations used in only very specific areas of each game and was largely unnecessary. (Only 2 scenes in X2: a miniboss and the final boss, and one scene in X3: after defeating the final boss)
Interesting, I had thought it did some sprite compression as well, but it looks like I was confusing it with the S-DD1. There are a lot of neat co-processors out there that got used. It's one of the things I loved about the cartridge-era systems, because it let them get upgraded in ways you can't do anymore.
This is huge!!!
I just pulled my Super Mario cartridge out of storage yesterday to test that it still works.
I would love to get a copy when you are done.
He should seriously consider replacing the Wolfenstein content with his own stuff and release it as an independent title! I know I would love to see a new release for my GBC...
There are people out there making new NES games and selling them on real carts. They go for about $40; obviously they don't typically include ARM coprocessors. Sometimes they have blinkenlights though.
That he then has the aesthetic capability to knock out a beautiful fucking box for the cartridge is just humbling.