Wolfenstein 3D – Gameboy cartridge with co-processor (happydaze.se)
493 points by phoboslab on Dec 18, 2016 | 55 comments


This guy's combo of hardware and software engineering is stunning.

That he then has the aesthetic capability to knock out a beautiful fucking box for the cartridge is just humbling.


While this whole project certainly is incredible, the box he created is a remix of existing art and design work:

http://www.gamefaqs.com/pc/564603-wolfenstein-3d/images/1456...

http://www.vgmpf.com/Wiki/index.php?title=File:Wolfenstein_3...


Yeah well I wouldn't even be able to make a blank cardboard box.


Exactly this. I hope to high heaven that this person makes piles of money for how potent their grey matter is.


I'm wondering if he is throwing hardware at a software problem?

Is the gameboy hardware not good enough to play wolf3d by itself? Is a co-processor needed, or just more optimized / better software?


Not even close. The GBC used what is essentially a Zilog Z80 running at 8.4 MHz.

Most problematic is that the Z80 is only an 8-bit CPU.

The original minimum requirement on PC for Wolf 3D was a 286, a 16-bit processor running at 25 MHz.


The 386 ran at 25 MHz. I'm pretty certain I had Wolfenstein 3D running on a 286 at 12 MHz... but that was a long time ago and my memory could be off.

I remember I couldn't play Doom, though. It required a 386+.


Another hurdle is the graphics hardware. The Game Boy uses a tile- and sprite-based approach, which is great for a wide variety of 2D games but becomes a major bottleneck when you want to render at a lower level. Suddenly you don't just calculate the pixel color; you have to work out which tile the pixel falls in and write its color bits into the correct place in that tile's data.

The 256-color VGA framebuffer is perfect for this type of work because of the 1:1 relationship between bytes and pixels, and the simple correlation between memory offset and pixel position.

So you have a slower, 8-bit CPU (basically an 8080) rather than a 16-bit one, doing more work because of the more complex graphics model.
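
To make that concrete, here's a rough sketch of what writing a single pixel costs in each model. This is my own illustration, not code from the project, and the linear 20-tiles-per-row layout on the Game Boy side is an assumption about how a framebuffer-style renderer might arrange its tiles:

    /* Illustration only, not the project's code.
       VGA mode 13h style: one byte per pixel in a linear framebuffer. */
    void vga_put_pixel(unsigned char *fb, int x, int y, unsigned char color) {
        fb[y * 320 + x] = color;                      /* one store, done */
    }

    /* Game Boy 2bpp tiles: 8x8 pixels, 16 bytes per tile (2 bytes per row,
       one per bitplane). Assumes tiles laid out 20 per 160-pixel row. */
    void gb_put_pixel(unsigned char *tiles, int x, int y, unsigned char color) {
        int tile = (y / 8) * 20 + (x / 8);
        int row  = y & 7;
        int bit  = 7 - (x & 7);
        unsigned char *p = tiles + tile * 16 + row * 2;
        p[0] = (p[0] & ~(1 << bit)) | ((color & 1) << bit);         /* plane 0 */
        p[1] = (p[1] & ~(1 << bit)) | (((color >> 1) & 1) << bit);  /* plane 1 */
    }

Every pixel on the tile side needs shifts and masks into two separate bitplane bytes, which is exactly the overhead described above.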


DOOM had a more sophisticated and CPU-demanding graphical engine.


An interesting reminder of what we lose when we extend copyrights ad infinitum.


Wolfenstein is still under copyright. This project would not be lost because of extended copyright law.


> Wolfenstein is still under copyright. This project would not be lost because of extended copyright law.

Yep, and it will be for another 100 years or so, assuming Carmack et al. live another 30 plus the 70 years post-creator-death. Thanks, Mickey Mouse! :D


The original length of copyright in the United States was 14 years, plus an optional 14-year extension.

The first Wolfenstein game was released in 1981.


I think the late '80s was when the USA signed the Berne Convention, which defined copyright as life+50 for authors (when assigned to corporations it has a fixed duration of, IIRC, 90 years).

But the Berne Convention also has a stipulation that all signatories have to respect the copyright duration of the original nation of publication, and that can be longer than the convention's minimum terms.

Thus you get a ratcheting effect where multinationals will try to convince national governments to up their copyright terms to be "more competitive".

A kind of inverse of the race to the bottom that they first ran on taxes between US states (leading to Delaware being the state to file your incorporation in), and which has since spread across the globe under the banner of competition.

BTW, there is a claim that Lord of the Rings became popular because a US publisher thought he didn't need to respect Tolkien's UK copyright when publishing a cheap paperback. At the time the USA had not yet signed the Berne Convention.

Frankly, it seems like a historic pattern: an industrial nation begins to slow down and tries to shore up its economy with IP laws, then another nation comes along, ignores those laws to bootstrap its own industry, and repeats the pattern some decades down the road.

So far the changeover has been UK to USA to China. And you can basically see China trying to clamp down on its lax IP enforcement right now.


id Software open-sourced a bunch of their older stuff, including Wolfenstein:

https://github.com/id-Software/wolf3d

Edit: never mind, it's just the engine. You still need to pull the actual data files from a copy of the game.


Just my kind of thing: that sort of retrofuturism, patching the old with just enough new to make it sexy.

Serious kudos.


Wolfenstein 3D predates the Gameboy Color by six years (1992 and 1998 respectively), and was officially ported in 1994 to the SNES, the Gameboy's contemporary home console, so this isn't retrofuturistic at all.


Well well well, both have a Z80 (at half the clock, though) of late-'70s design, thank you.


SNES did not have a Z80.


The GBA, which had the official W3D port, had an ARM7.


Amazing - I still don't fully understand how you manage to do raycasting on a tile-based system, with so many textures to account for...


I'm just guessing here, but it looks like the two processors share SRAM and the beefy ARM processor draws the scene and writes it as tiles to SRAM. The Z80 reads the tiles from SRAM and blits them to the screen.

Basically you have an ARM processor doing a whole lot of work, and a Z80 in charge of moving it around and drawing supporting UI.


Yes, that is essentially what I'm doing. The ARM does most of the heavy lifting for the actual game, and the Z80 does input, sound, HUD, palette fading, hands+gun, the main game loop, and of course it spends a lot of time just shuffling data to VRAM. RAM is limited, so the KE04 internally renders to a 2-bitplane framebuffer (the bit-banding feature of the KE04 greatly helps). When the Z80 needs the next frame, it triggers an interrupt on the KE04 to wake it from sleep mode; the KE04 converts the framebuffer into GB-VRAM-ready tile + map attribute data and stores it in the dp-sram. The Z80 can then DMA directly from dp-sram into VRAM. Some ranges in the dp-sram are command buffers used for Z80<->KE04 communication.
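
In simplified pseudo-C (the real code differs, and the view size here is just a placeholder), the framebuffer-to-tile conversion essentially boils down to interleaving the two bitplanes per tile row:

    /* Simplified sketch: assumes a planar framebuffer, one byte per
       8 horizontal pixels per plane. GB tile format is 16 bytes per
       8x8 tile, 2 bytes per row (low bitplane byte, then high). */
    #define VIEW_W 160            /* placeholder view width in pixels  */
    #define VIEW_H 144            /* placeholder view height in pixels */
    #define STRIDE (VIEW_W / 8)   /* bytes per plane row               */

    void fb_to_tiles(const unsigned char *plane0, const unsigned char *plane1,
                     unsigned char *out /* tile buffer in dp-sram */) {
        for (int ty = 0; ty < VIEW_H / 8; ty++)
            for (int tx = 0; tx < STRIDE; tx++)
                for (int row = 0; row < 8; row++) {
                    int src = (ty * 8 + row) * STRIDE + tx;
                    *out++ = plane0[src];   /* low bitplane of this tile row  */
                    *out++ = plane1[src];   /* high bitplane of this tile row */
                }
    }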

The KE04 I am using has 128KB ROM and 16KB RAM, and lacks hardware division. This presents some interesting challenges in terms of memory and ROM space, and in juggling speed vs RAM/ROM usage. You could of course put something much beefier in there, but I think that would take too much of the fun away from the project.
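
A classic way around a missing divide instruction in a raycaster (whether or not it's what ends up in the final code) is a fixed-point reciprocal table, trading some RAM or ROM for speed. A generic sketch, with an arbitrary table size and 16.16 format:

    #include <stdint.h>

    /* Generic sketch: replace x / d with x * (1/d) using a precomputed
       16.16 fixed-point reciprocal table. Table size is arbitrary. */
    #define RECIP_ENTRIES 1024

    static uint32_t recip_tab[RECIP_ENTRIES];    /* recip_tab[d] ~= 65536 / d */

    void init_recip_table(void) {
        recip_tab[0] = 0xFFFFFFFFu;              /* clamp the divide-by-zero case */
        for (uint32_t d = 1; d < RECIP_ENTRIES; d++)
            recip_tab[d] = 65536u / d;           /* software divide, once at boot */
    }

    /* wall_height = scale / perpendicular_distance, with no runtime division.
       Assumes scale is small enough that the product fits in 32 bits. */
    uint32_t wall_height(uint32_t scale, uint32_t perp_dist) {
        if (perp_dist >= RECIP_ENTRIES)
            perp_dist = RECIP_ENTRIES - 1;
        return (scale * recip_tab[perp_dist]) >> 16;
    }

A 1024-entry table like this already eats 4KB, which is exactly the kind of speed-vs-memory juggling mentioned above.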

All in all it's great fun and I'm learning a lot as I go along. If I were to make another hardware revision I would use a CPLD instead of the MBC1 chip, and try to lose the dp-sram in favour of a normal SRAM.

Cheers, --Anders


Yeah, more or less.

They're utilizing a dual-port SRAM, meaning that the co-processor can read and write to the RAM at the same time as the Gameboy CPU can read and write to it. Those pins along the cartridge edge are actually just the address and data lines of the Gameboy CPU.

They've written a program for the Gameboy CPU whose job is to DMA data from that RAM into video RAM (it's a bit more complicated than that, because the Gameboy's graphics hardware isn't really set up to have video streamed at it).

The game itself is running on the ARM co-processor, writing data to a known location in the DP-SRAM and the Gameboy CPU is streaming it to the display.
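
For reference, kicking off one of those copies on the Gameboy Color side looks roughly like this (register names per the Pan Docs; the routine itself is an illustration of the mechanism, not code from the project):

    /* GBC VRAM DMA registers, names per the Pan Docs. */
    #define HDMA1 (*(volatile unsigned char *)0xFF51)  /* source, high byte  */
    #define HDMA2 (*(volatile unsigned char *)0xFF52)  /* source, low byte   */
    #define HDMA3 (*(volatile unsigned char *)0xFF53)  /* dest,   high byte  */
    #define HDMA4 (*(volatile unsigned char *)0xFF54)  /* dest,   low byte   */
    #define HDMA5 (*(volatile unsigned char *)0xFF55)  /* length / mode / go */

    /* Copy len16 blocks of 16 bytes, e.g. from cart RAM (0xA000+) into VRAM. */
    void gdma_copy(unsigned int src, unsigned int dst, unsigned char len16) {
        HDMA1 = src >> 8;             /* low 4 bits of the source are ignored */
        HDMA2 = src & 0xF0;
        HDMA3 = (dst >> 8) & 0x1F;    /* destination always lands in 0x8000-0x9FF0 */
        HDMA4 = dst & 0xF0;
        HDMA5 = (len16 - 1) & 0x7F;   /* bit 7 = 0: general-purpose DMA; the CPU
                                         is halted until the copy completes */
    }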

Very cool stuff!


That's very similar to what people are doing with the BeagleBone Black... there's a pair of PRUs in the AM3xxx processor that have direct access to memory. So you do the hard work on the ARM, but let the PRUs push the pixels (or other data) that need to be real-time, jitter-free, etc.

This guy is driving the CRT of an old Mac with it... very cool: https://trmm.net/Mac-SE_video


That's basically right. He does some fancy stuff like DMA from the cartridge to VRAM to make it fast enough, but in essence he just copies each frame into memory on the CGB and then swaps which background is displayed to make it appear. It takes two V-blanks to copy an entire frame, so it runs at 30 FPS (while the CGB's display refreshes at about 60 Hz).
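
One way that swap can work (a guess at the mechanism, not something confirmed from the cartridge) is to keep the two frame buffers in the two halves of tile-data memory and flip LCDC bit 4, which selects where the background fetches its tile data:

    /* Guess, not confirmed from the project: buffer A's tiles live at
       0x8000-0x87FF, buffer B's at 0x9000-0x97FF; with tile indices 0-119
       in the map, LCDC bit 4 picks which area the background reads from. */
    #define LCDC (*(volatile unsigned char *)0xFF40)

    void flip_frame(void) {
        LCDC ^= 0x10;   /* toggle BG tile-data area = swap displayed buffer */
    }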


I've invited the author to this thread since it seems several people are guessing/assuming what he is doing, plus I'm sure he'd like to know about the great reaction it got here :)


That kind of trick was very common back in the day.

The Amiga used a very similar setup between the chipset and the CPU.

And cartridge-based consoles have often included coprocessors on the carts (but nothing as potent as this ARM). At the tail end of the SNES years there was even a simple "GPU" in some of its carts.


> At the tail end of the SNES years there was even a simple "GPU" in some of its carts.

Starfox was the most famous example of that approach, right?


I believe it was the Super FX chip, also used in Vortex I think, and there was also the Mode 7 chip IIRC.


Doom as well. It didn't really help, though.


I always wondered how all those 3D polygon games worked on the Sega Genesis's tile engine too. I guess the games just rendered a screen buffer in RAM and created the tiles on the fly.


Elite, as implemented on the NES: https://www.youtube.com/watch?v=zoBIOi00sEI

Most games had their graphics tiles stored in ROM, but some games had 8KB of RAM instead of bank-switched ROM. Elite is in that category. So is Legend of Zelda, although Zelda's tiles seem to be stored verbatim in the program ROM, while Elite's must be algorithmically generated.


Star Fox had a 3D co-processor too! https://en.wikipedia.org/wiki/Super_FX

Nintendo really pulled off some impressive feats back in those days.


It wasn't actually Nintendo that made the Super FX chip; it was Argonaut Games (https://en.wikipedia.org/wiki/Argonaut_Games) who developed it.

They originally codenamed it the Mathematical Argonaut Rotation I/O, or “MARIO”, as is printed on the chip's surface.


It's crazy to remember that the GBC lacked even Mode 7; something comparable wouldn't come to Nintendo handhelds until the Game Boy Advance. A full raycaster running on the Color, at a high frame rate no less, is a very strange sight.

I wonder just how wildly impractical it would have been to build such outboard hardware acceleration into a cartridge in 1998.


The SNES had Doom running with a SuperFX 2 coprocessor in 1995. https://www.youtube.com/watch?v=n18tcF4nbqE

The Gameboy had several address banks which allowed for whatever coprocessor you wished to put in; you just DMA out of the address space. I suspect the unit volumes just weren't there to justify the engineering expense in the Gameboy's case.

Coincidentally, the Gameboy game X from 1992 (https://www.youtube.com/watch?v=AyjU4MtonZM) was developed by Dylan Cuthbert, who later went on to work on Star Fox and the Super FX chip.


That Doom music. :D


I wonder what techniques Faceball 2000 used on the original Gameboy (https://www.youtube.com/watch?v=iZq31JUYf8M)

That was the first FPS game I played, and it appears to use a very limited form of raycasting. Maybe just drawing entire walls with one raycast and some scaling.
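
For contrast, the "full" version of the technique casts one ray per screen column and walks the map grid cell by cell (DDA); something like Faceball presumably does far less work per frame. A minimal sketch in C, using floats for clarity (a real Gameboy or ARM implementation would use fixed-point math and lookup tables):

    #include <math.h>

    /* Minimal per-column raycaster sketch. (px, py) is the player position,
       (dirx, diry) the facing vector, (planex, planey) the camera plane. */
    #define MAP_W 8
    #define MAP_H 8
    #define SCR_W 160
    #define SCR_H 144

    static const int world[MAP_H][MAP_W] = {   /* 1 = wall, 0 = open */
        {1,1,1,1,1,1,1,1},
        {1,0,0,0,0,0,0,1},
        {1,0,0,1,0,0,0,1},
        {1,0,0,1,0,0,0,1},
        {1,0,0,0,0,0,0,1},
        {1,0,1,0,0,1,0,1},
        {1,0,0,0,0,0,0,1},
        {1,1,1,1,1,1,1,1},
    };

    /* draw_col() is assumed to clip and draw a vertical wall slice. */
    void render(float px, float py, float dirx, float diry,
                float planex, float planey,
                void (*draw_col)(int x, int top, int bottom))
    {
        for (int x = 0; x < SCR_W; x++) {
            float camx = 2.0f * x / SCR_W - 1.0f;      /* -1..1 across the view */
            float rdx = dirx + planex * camx;
            float rdy = diry + planey * camx;

            int mx = (int)px, my = (int)py, side = 0;
            float ddx = (rdx == 0) ? 1e30f : fabsf(1.0f / rdx);
            float ddy = (rdy == 0) ? 1e30f : fabsf(1.0f / rdy);
            int stepx = (rdx < 0) ? -1 : 1;
            int stepy = (rdy < 0) ? -1 : 1;
            float sdx = ((rdx < 0) ? (px - mx) : (mx + 1.0f - px)) * ddx;
            float sdy = ((rdy < 0) ? (py - my) : (my + 1.0f - py)) * ddy;

            while (!world[my][mx]) {                   /* walk the grid cell by cell */
                if (sdx < sdy) { sdx += ddx; mx += stepx; side = 0; }
                else           { sdy += ddy; my += stepy; side = 1; }
            }
            float dist = side ? (sdy - ddy) : (sdx - ddx);  /* perpendicular distance */
            int h = (int)(SCR_H / dist);
            draw_col(x, SCR_H / 2 - h / 2, SCR_H / 2 + h / 2);
        }
    }

draw_col() here is a placeholder for whatever actually puts a scaled wall slice on screen, which on a tile-based machine is most of the battle, as discussed elsewhere in the thread.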


It looks like pop-in happens frequently in that video, so I'd wager it uses short rays, and not very many of them to cover the view (which would mean only looking at closer things), giving far fewer calculations per frame.


Someone managed to make this for the TI-83+, which only had a Z80 chip and nothing else:

https://www.youtube.com/watch?v=9wjM8ude3Rs


I remember playing that in high school! That's really impressive use of the tech, and was staggeringly impressive to me at the time. Learning z80 assembly on the TI series got me into programming at an early age.

It's worth pointing out that the TI-83(+) contains a proper Z80, running at 6 MHz, with full support for the Z80's extended instruction set and 16-bit arithmetic, which certainly helped this game out quite a bit. The Gameboy itself is somewhat underpowered in comparison; its Sharp LR35902 processor is considerably stripped down, lacks most of the extended instruction set, and runs at a slower 4 MHz. Given these limitations, I'm (a) staggeringly impressed to see Wolfenstein running on the thing in any capacity, and (b) totally understand why the coprocessor is necessary to make it work, especially at that buttery smooth framerate. There's no hardware graphics scaling, for one; all the tech demos I've seen that do scaling appear to do it entirely in software, using hblank trickery to speed things up a bit. Doing a raycaster without hardware scaling for the texture lookups (or any ability to rewrite VRAM mid-scanline, for that matter) would be pretty tough.

It's also worth pointing out that this probably wouldn't be possible on the black-and-white Gameboy (Pocket): from his frame disassembly, he's using the general-purpose DMA copy hardware exclusive to the Gameboy Color, and I don't think a straight software copy routine on the Sharp CPU would be able to move all 120 tiles during a single vblank and still have any room left to do much of anything else.
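
Back-of-envelope, using Pan Docs figures and rounding freely: 120 tiles × 16 bytes = 1,920 bytes, while vblank is only 10 scanlines × 456 dots, roughly 1,140 machine cycles (about twice that in CGB double-speed mode). Even a well-unrolled software copy costs a few machine cycles per byte, so the full frame would need several thousand cycles and simply doesn't fit; the CGB's general-purpose DMA moves roughly two bytes per machine cycle, which is what lets the transfer squeeze into the window.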

I'm presently working on a Gameboy Emulator in Lua, and it seems like this is another esoteric cartridge I'll have a ton of trouble supporting. An entire additional CPU inside the cart! What sorcery :)

http://bgb.bircd.org/pandocs.htm#gameboytechnicaldata

https://en.wikipedia.org/wiki/TI-83_series#Technical_specifi...


How does one even copy such a cartridge to a file?


The big difference there is that the Gameboy didn't have a way for you to draw on the screen directly; it was a sprite/tile-mapped-only system. Same with the NES and SNES, which is why a co-processor such as the Super FX was needed to generate the tiles for arbitrary graphics. The Megaman X series also used a coprocessor (in X2 and up, I think?) that allowed them to use compressed data and arbitrary rotation of a sprite by generating the tiles on the fly.


A few games on the NES had RAM instead of ROM for graphics storage. Many more games supported bank-switching. Each 8KB bank of graphics tiles would provide enough tiles to uniquely cover a little over 1/4 of the screen, and hsync would be enough time to switch to the next bank, every 1/4 of the image.

I think that transferring 60K of data from the CPU to the PPU every frame would be completely infeasible, though. With all that in mind, I'm very impressed by how well Elite runs: https://www.youtube.com/watch?v=zoBIOi00sEI


Interestingly, in MegaMan X2 and X3, the Cx4 coprocessor built into the game cartridge is used for a handful of wireframe animations in only a few specific areas of each game, and was largely unnecessary (only two scenes in X2, a miniboss and the final boss, and one scene in X3, after defeating the final boss).


Interesting, I had thought it did some sprite compression too, but it looks like I was confusing it with the S-DD1. There are a lot of neat co-processors out there that got used. It's one of the things I loved about the cartridge-era systems: it let them get upgraded in ways you can't do anymore.



This is huge!!! I just pulled my Super Mario cartridge out of storage yesterday to test that it still works. I would love to get a copy when you are done.


He should seriously consider replacing the Wolfenstein content with his own stuff and release it as an independent title! I know I would love to see a new release for my GBC...


Just curious, but what would you pay?


There are people out there making new NES games and selling them on real carts. They go for about $40; obviously they don't typically include ARM coprocessors. Sometimes they have blinkenlights though.

https://www.retrousb.com/index.php?cPath=30


I'd pay somewhere around $50 if it included box art and everything. I don't imagine that's quite enough to turn a profit.


I'd pay $39-49 because it's really cool.


Perhaps he can get a cheap license from id Software to do a couple hundred of these cartridges...



