Hacker News
Reverse engineering the binary data format for Star Wars: Yoda Stories (zachtronics.com)
299 points by krispykrem on Oct 7, 2014 | 66 comments



I'm not sure why there's a sports car in the game's tileset

I have a suspicion -- it's from one of Mark Hamill's more-regrettable roles. What a great Easter Egg! (unless it's actually part of the game somewhere)

http://fffmp.fffmoviepostersc.netdna-cdn.com/wp-content/uplo...


Holy crap, that's absolutely the same car. I'll update the writeup with your discovery. Thanks!


I guess the two tiles are really unused, aren't they? If they are, that's great stuff for The Cutting Room Floor wiki (http://tcrf.net)! Don't get lost on there, as it's somewhat like TV Tropes.


In your page update, Mark's family name should read Hamill.


Sort of related:

The trailer for Corvette Summer can be watched at https://www.youtube.com/watch?v=4W9pmT6JTO4

The '70s were a strange time for custom cars. This movie preserves the custom car culture of the era fairly well.


This put a big ol' smile on my face. I remember Yoda Stories (there was a similar game, Indiana Jones and his Desktop Adventures, same engine?)

Very thorough RE of the data format. Posts like this are why I still come to HN!


Yeah, the author mentions that it was probably a later version of the same engine: "The VERS identifier clearly starts a "version" section, which contains the following four bytes: 0x00, 0x02, 0x00, 0x00. My guess is that this is version 2.0 of the file format, as Yoda Stories was actually the successor to an Indiana Jones game that appears to use the same engine, ..."
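For anyone who wants to poke at those four bytes, here is a minimal Python sketch of one way to read them as "2.0". It assumes the bytes form a single little-endian 32-bit integer whose low 16 bits hold major and minor; that layout is a guess consistent with the quote, not something the article confirms, and `read_version` is a made-up helper name.

```python
import struct

def read_version(data: bytes, offset: int = 0):
    """Return (major, minor) from a 'VERS' section starting at `offset`."""
    tag = data[offset:offset + 4]
    if tag != b"VERS":
        raise ValueError(f"expected b'VERS', found {tag!r}")
    # Assumption: the four bytes are one little-endian unsigned 32-bit value.
    (raw,) = struct.unpack_from("<I", data, offset + 4)
    # Assumption: bits 8-15 are the major version, bits 0-7 the minor version.
    return (raw >> 8) & 0xFF, raw & 0xFF

# The bytes quoted above: 0x00, 0x02, 0x00, 0x00
sample = b"VERS" + bytes([0x00, 0x02, 0x00, 0x00])
print(read_version(sample))  # (2, 0)
```

Read little-endian, the value is 0x0200, which is where the "version 2.0" reading comes from.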


Every time I think "hey, I have this funky DAT file that I need data from. I bet I could reverse engineer it." And then I open the file and my eyes glaze over. Then I read a headline like this, and I think, "Hey, maybe this will help." And then I realize... nope, still way over my head.

Sigh.


10/10 good article.

I wonder if something remotely similar is possible with today's games, with all the custom data formats and compression algorithms being used.


It's not necessarily easy, but decoding MMO data files can be massively profitable for players. Reams of valuable data are often stored client-side to save on sending it during gameplay. You can maybe learn what items are in the game (including items in not-yet-released updates), where the items drop, and their exact drop rates. You can learn the exact HP and other stats of mobs (if they aren't normally displayed).

If you mean extracting images, 2d image formats are pretty easy and 3d are quite doable. Things like http://kayin.moe/?p=2218 exist for some games.


Yeah, though most often tools like the one you linked are built by partially reverse engineering the game's binary (i.e., with a disassembler and a lot of time) rather than by a simple guessing-based method like the one the author used here.


I'm still bummed that OnLive died. Seemed to me like it would have been the PERFECT platform to kill cheaters for MMOs that weren't twitch-based.


I doubt there are really many custom compression algorithms, mainly just custom container formats.


There are some. I had to deal with a Microsoft project using one, and in the end the only solution we came up with for dealing with it was extracting the decompression code out of the EXE file and embedding it in a DLL (at run-time, to prevent copyright infringement issues).



Absolutely! Are there any games you'd like to see in particular?


Nothing in particular.

I love reading RE posts but I don't plan on making any mods to any games in the near future. Therefore I don't want to specify any 'targets' for your efforts unless I'm willing to extend them :)

I was talking about something big and sophisticated, like Crysis or Battlefield (off the top of my head). I'm under the impression they have something more than uncompressed bitmaps and 8bpp resolution, but I've never done anything like this to be certain.

Thanks! Keep up the good work.


The CryEngine documentation would be a good start for Crysis (http://docs.cryengine.com/display/SDKDOC1/Home). Battlefield would require some reverse engineering since it runs on Frostbite, which is EA's internal game engine and not publicly available.


I'm wondering whether there are any Android games that'd use custom formats or whether lots use off the shelf engines.

I've always had a passing interest in game reverse engineering - I remember the days when "ripper" applications had a good chance of pulling the music out of a game.


I would love to see something like Football Manager, it's perfect for data extraction, I think, as it's largely a game of data.

I know a few programs are able to read the data, such as FM Genie Scout (Watch out if you download, it's filled with adware) and FMRTE (Paid only).


I would be very interested to see this happen. I'd like to get into RE but don't know where one would begin. Having an example from a modern game might give some nice pointers to start working on the games I play and love right now.


Personally, it's not particular games I'd like to see, but approaches to getting information from games.

Finding and latching onto a rendering call, then shooting the inputs to that call off to a separate file to export market data from a game, was one of the more interesting approaches I've heard of. But how someone managed to work that out is beyond me.


The four bytes of ID followed by four bytes of size is an old and somewhat standard technique: http://en.wikipedia.org/wiki/Interchange_File_Format
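The ID-plus-size layout is simple enough to walk with a few lines of Python. This is only a sketch: `iter_chunks` is an illustrative name, the defaults follow IFF (big-endian sizes, odd-sized chunks padded to an even boundary), and whether a given game file uses the same byte order or padding is an assumption you would have to check against the actual data.

```python
import io
import struct

def iter_chunks(stream, byte_order=">", pad_to_even=True):
    """Yield (chunk_id, payload) pairs from an ID+size chunk stream."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return  # end of stream (or a truncated trailing header)
        chunk_id = header[:4]
        (size,) = struct.unpack(byte_order + "I", header[4:])
        payload = stream.read(size)
        if len(payload) < size:
            raise ValueError(f"chunk {chunk_id!r} is truncated")
        if pad_to_even and size % 2:
            stream.read(1)  # IFF pads odd-sized chunks with one extra byte
        yield chunk_id, payload

# Tiny hand-built stream: a 3-byte chunk (plus pad byte) and a 2-byte chunk.
demo = (b"ABCD" + struct.pack(">I", 3) + b"xyz" + b"\x00"
        + b"EFGH" + struct.pack(">I", 2) + b"hi")
for chunk_id, payload in iter_chunks(io.BytesIO(demo)):
    print(chunk_id, len(payload))
```

For little-endian files without padding you would pass `byte_order="<"` and `pad_to_even=False`. Sections that don't carry a size field need special-casing.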


The author also created SpaceChem, which is one of the best programming/engineering games out there. For just programming games, I'd call it my second all time favorite just behind Robot Odyssey, an ancient Apple ][ game.

This gives some great context to where SpaceChem came from. Thanks Zach!


I thought I recognized the favicon! He's also the guy behind Infiniminer (the primary inspiration for Minecraft) and KOHCTPYKTOP (a game where you design basic digital circuits that I'd like to personally thank him for, for expanding my mind beyond the "pure" software realm: Thanks, Zach!)


SpaceChem is included in the current Humble Bundle, with still 10 hours remaining, FWIW.


Sweet! Thanks for the tip. Looks like EFF just got another donation for $6.54 USD, and thank you Zach for donating your software for charity!


I want to pick up more skills like this. I've just done something similar at work to break into some event recorder files we could only view, as the application provided no utility for exporting the data into CSV (among many other limitations). I was pretty lucky and fluked a lot of it, mainly by just identifying a pattern and messing with the hex values of a file, then viewing what it did in the viewer application provided.

I'd love to expand my skills and try this out on a number of other projects. Is there some good starting material which can push me in the right direction?


Great article, it brings back memories of hex hacking. I remember doing the same with Virtua Tennis for the PC; then someone else wrote up an article after doing the same thing.

http://www.gamefaqs.com/pc/557900-virtua-tennis/faqs/19110

I think I made a little tool to unlock all the options and enable players you couldn't otherwise get, IIRC.


Great article. Thanks for being so rigorous with the step-by-step process and for annotating your thoughts on each step, it's very instructive.


That's the difficult part about documenting CTFs and reverse engineering tasks. In the past I tried to do it afterwards but it's a bit clunky. Now I try to do it at the same time, like a journal, and it's getting better (still not as good as OP :))


This is a really well written article; full of great, reusable tricks and techniques in reverse engineering. Clear screenshots, humorous yet technical content, neat results. Hats off to you, sir, keep up the good work and I look forward to reading more of your tutorials.


This looks to me like the file is in something like the IFF format: http://en.wikipedia.org/wiki/Interchange_File_Format



Thank you for that write up, it's very detailed and showed your thought process extremely well.


Love it. Any pointers for an easier old (or new) game to have a go at pulling apart like this?


I'd definitely check out Halo 2. The modding community was massive. In fact, modding Halo 2 was what introduced me to programming :)

The maps have a very basic encryption (checksum). Map signers are all over the internet if you don't feel like doing the work yourself. There are also many high-level tools to play with the maps. I remember taking a vehicle from one level and placing it in another was a trivial process. YouTube has a plethora of videos where modders show off what they could do.


For anyone else looking - Halo 2 is hard to get legitimately now. eBay copies are looking like upwards of $90AUD!


That must be an Australian issue because on eBay and in local stores, it is still $10ish.


I'm not sure - http://www.ebay.com.au/sch/Video-Games-Consoles-/1249/i.html...

Amazon has a (used?) copy for ~$17 - AFAICT it requires online activation so this may not work well.


I was referencing the Xbox version of the game. If I remember correctly, the PC version of the game actually allowed cheaters, which takes all the fun out of it IMO.

I'm sure you could find an old Xbox and the game for less than $90AUD. Or, you could buy an Xbox, flash the disk drive, download a Halo .iso and burn it to a Verbatim disc. That's what I usually do.


NES games are pretty easy to tear apart, and emulator tools are pretty good. I use fceuxdsp, running in Wine because I'm too lazy to build it natively. The 6502 was too slow to do any complex compression, and there was really no need to encrypt an NES ROM. Later consoles, of course, get more complex, but the NES is a good place to start.


I don't think that the lack of compression is a matter of speed. After all, C64 games did it all the time. The reason is more likely that it is convenient to store graphics and code plainly in the ROM since it can then be mapped directly. The NES didn't have enough RAM.


Doom's WAD format is very well documented but still a fun binary format to write a parser to read. I remember writing a parser for it in QBASIC when I was a teen!
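If you want to try it today, here is a rough Python sketch of the first step: reading the header and lump directory. It follows the documented layout (a 4-byte `IWAD`/`PWAD` magic, lump count and directory offset as little-endian 32-bit integers, then 16-byte directory entries). `read_wad_directory` is just a name picked for this example, and the demo data is a hand-built stand-in, not a real WAD.

```python
import struct

def read_wad_directory(data: bytes):
    """Return (kind, lumps) where lumps is a list of (name, offset, size)."""
    # Header: 4-byte magic, lump count, directory offset (little-endian).
    magic, num_lumps, dir_offset = struct.unpack_from("<4sii", data, 0)
    if magic not in (b"IWAD", b"PWAD"):
        raise ValueError(f"not a WAD file: {magic!r}")
    lumps = []
    for i in range(num_lumps):
        # Each 16-byte entry: lump offset, lump size, 8-byte NUL-padded name.
        pos, size, raw_name = struct.unpack_from("<ii8s", data, dir_offset + 16 * i)
        name = raw_name.split(b"\0", 1)[0].decode("ascii", errors="replace")
        lumps.append((name, pos, size))
    return magic.decode("ascii"), lumps

# Build a tiny in-memory WAD with one lump so the parser can be exercised.
lump = b"hello"
header = struct.pack("<4sii", b"PWAD", 1, 12 + len(lump))
entry = struct.pack("<ii8s", 12, len(lump), b"DEMO")
wad = header + lump + entry
print(read_wad_directory(wad))
```

With a real file you would pass the contents of `open(path, "rb").read()` and then slice `data[pos:pos + size]` to get each lump.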


Can you upload those sprites?



I found Waldo! Just kidding, but I found Indiana Jones. I also doubt that the sprites on the left side of him are part of this Star Wars game.

Edit:

I did the same for the Indiana Jones game: https://i.imgur.com/c1kKfV7.png

(thanks to ImageMagick's convert and montage)

Edit 2:

Well, they're part of the game according to http://starwars.wikia.com/wiki/Star_Wars:_Yoda_Stories:

>Indiana Jones is featured as an easter egg in a mission, as a sequel to a mission in Desktop Adventures. His similarity to Han Solo is remarked upon.


You should take them down - you don't have the right to copy and distribute those images. That link is illegal. Please take it down.


https://www.synalysis.net is a program specifically for reversing binary files.


That was such a great game.


I'm stretching my memory a little but I think there was an Indiana Jones game using what I vaguely recognised at the time as the same engine, which I have equally fond memories of!


http://en.wikipedia.org/wiki/Indiana_Jones_and_His_Desktop_A... is what you're looking for.

I'll have to see later if the same tricks apply to both games!


The site has been taken down!


Thanks! :)


Can someone please write a C/C++/Go/Rust version, as I am C#-illiterate?


If I may be so blunt, I suggest you help yourself. You will no longer be 'C# illiterate' when you have.

I'm having difficulty understanding in what way C# is difficult to read for someone who can write in the languages that you mentioned.


Must be a troll or a comment on the wrong thread. I know neither language, have no clue about programming and have no trouble understanding the logic in the code snippets.


It is frustrating to see tools like this written in C#, because most C# developers pay no mind to cross-platform compatibility. Often they'll build the UI using the Microsoft-specific frameworks, which Mono doesn't support. .NET programs also don't often run well in Wine, leaving me with a ton of work to do to get them running on Linux.

It's not really the language that's the problem so much as the incredibly proprietary environment in which it's used.


These aren't tools being offered; it's a description of how to do it yourself (tools would need to be adjusted to the data). The code presented looks very clear, with simple, self-explanatory calls (mostly accessing, reading, and writing files).


This code should work as is using Mono.

But the complaint was weird in the first place?


> This code should work as is using Mono.

Sure, this one does, but nearly any .NET program with a GUI won't. It's very frustrating to run into useful programs that I can't run, and most often applications that fit that description are written in a .NET language.

> But the complaint was weird in the first place?

Yeah, true. There's nothing hard about reading C#, just running it.


You make it sound as if there were no portable (or 'non-windows') C# applications. Stuff I care about:

http://banshee.fm/

https://wiki.gnome.org/Apps/Tomboy

http://f-spot.org/

http://keepass.info/


At least in the game-hacking-with-a-GUI space, there very nearly aren't any.


Wow, you're really narrowing this down now. OK, there aren't many tools in the portable C# game-hacking-with-a-GUI space. Why is that such a great issue, and why is it relevant to this article?


It's just annoying, as someone who likes to dabble in that space. There's a lot of good, interesting work done and it's frustrating that it's done using a proprietary technology I can't use when there are tons of other options.

This article is about game hacking using C# and never addressed the issue of portability, because game hackers who use C# never think about portability. That was my point.


Maybe it doesn't address the "issue" of portability because it in fact is a portable program. Maybe it doesn't address portability because it's an article about reverse engineering a game. Maybe it doesn't address portability because the code is meant to illustrate his process rather than for others to port it. I still don't see how your criticism is relevant at all.



