Hacker News

I saw a longer video of this that Ethan Mollick posted, and in that one the sequences are longer and do appear to demonstrate a fair amount of consistency. The clips in the summary video on the paper's home page don't backtrack because they're showing a number of distinct environments, but you only get a few seconds of each.

If I studied the longer one more closely I'm sure I'd find inconsistencies, but it seemed able to recall the presence/absence of destroyed items, dead monsters, etc. on subsequent loops around a central obstruction that completely obscured them for quite a while. That did seem pretty surprising to me, as I expected it to match how you'd described it.




Yes, it definitely is very good at simulating gameplay footage, don't get me wrong. Its input for predicting the next frame isn't just the previous frame; it has access to a whole sequence of prior frames.

But to say the model is simulating actual gameplay (i.e. that a person could actually play Doom in this) is far-fetched. It's definitely great that the model was able to remember that the gray wall was still there after we turned around, but it's untenable for actual gameplay that the wall completely changed location and orientation.


> it's untenable for actual gameplay that the wall completely changed location and orientation.

It would work in an SCP-themed game. Or a dreamscape/Inception-themed one.

Hell, "you're trapped in a Doom-like dreamscape, escape before you lose your mind" is a very interesting pitch for a game. Basically take this Doom thing and make walking through a specific, unique-looking doorway from the original game the victory condition - the player's job would be to coerce the model into generating it, while also not dying in the Doom fever dream itself. I'd play the hell out of this.

(Implementation-wise, just loop in a simple recognition model to continuously evaluate the victory condition from the last few frames, plus some OCR to detect when the player's hit-point indicator on the HUD drops to zero.)
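The wiring for that loop could look something like this - a sketch only, where `victory_model` (the doorway classifier) and `read_hp` (OCR over the HUD region) are hypothetical callables you'd plug in:

```python
import numpy as np

def game_loop(frames, victory_model, read_hp, window=4):
    """Scan a stream of generated frames; return 'win', 'dead', or 'timeout'.

    frames        -- iterable of HxWx3 uint8 arrays (the model's output)
    victory_model -- callable: list of recent frames -> True if the target
                     doorway is recognized (a small classifier in practice)
    read_hp       -- callable: frame -> int hit points (OCR on the HUD
                     region in practice)
    """
    recent = []
    for frame in frames:
        recent.append(frame)
        recent = recent[-window:]              # keep only the last few frames
        if read_hp(frame) <= 0:                # HUD says zero HP: dead
            return "dead"
        if len(recent) == window and victory_model(recent):
            return "win"                       # doorway seen across the window
    return "timeout"
```

Evaluating the victory condition over a window of frames rather than a single one is doing real work here: it filters out one-frame hallucinations of the doorway.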

(I'll happily pay $100 this year to the first project that gets this to work. I bet I'm not the only one. Doesn't have to be Doom specifically, just has to be interesting.)


Check out the actual modern Doom WAD MyHouse, which implements these ideas. It totally breaks our preconceptions of what the Doom engine is capable of.

https://en.wikipedia.org/wiki/MyHouse.wad


MyHouse is excellent, but it mostly breaks our perception of what the Doom engine is capable of by not really using the Doom engine. It leans heavily on engine features which were embellishments by the GZDoom project, and never existed in the original Doom codebase.


To be honest, I agree! That would be an interesting gameplay concept for sure.

Mainly I just wanted to temper the expectation I'm seeing throughout this thread that the model is actually simulating Doom. I don't know what will be required to get from here to there, but we're definitely not there yet.


Or what about training the model on many FPS games? Surviving in one nightmare that morphs into another, into another, into another ...


What you're pointing at mirrors the same kind of limitation in using LLMs for role-play/interactive fiction.


Maybe a hybrid approach would work: certain things, like inventory, stored as variables, lists, etc.

Wouldn't be as pure though.
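The hybrid idea might look like this - authoritative state kept in plain variables, with only the rendering left to the model. Everything here is invented for illustration (the `HardState` name, the fields, the conditioning hook):

```python
from dataclasses import dataclass, field

@dataclass
class HardState:
    """Game state the model is never trusted to remember."""
    hp: int = 100
    ammo: int = 50
    inventory: set = field(default_factory=set)

    def pickup(self, item):
        self.inventory.add(item)

    def hit(self, dmg):
        self.hp = max(0, self.hp - dmg)       # clamp at zero, never negative

# Each step, the hard state would be serialized and fed to the frame
# model as extra conditioning, rather than "remembered" in pixels:
#   frame = model(prev_frames, actions, encode(state))
```

Not as pure, as the parent says, but it means a picked-up blue key can never hallucinate itself back onto the floor.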


Give it state by having a rendered-but-offscreen pixel area that's fed back in as byte data for the next frame.
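A minimal sketch of that feedback loop, assuming made-up dimensions and a `model` callable that maps the extended canvas to the next one - only the top rows are shown to the player, while the scratch strip rides along as hidden state:

```python
import numpy as np

H, W, SCRATCH_ROWS = 120, 160, 8   # assumed sizes, for illustration only

def step(model, prev_canvas, action):
    """One autoregressive step over the extended canvas."""
    canvas = model(prev_canvas, action)        # (H + SCRATCH_ROWS, W, 3)
    onscreen = canvas[:H]                      # what the player sees
    # canvas[H:] is the offscreen scratch area; it is never displayed,
    # but the whole canvas is fed back as input to the next step.
    return onscreen, canvas

# Toy "model" that just echoes its input, to show the shapes:
dummy = lambda c, a: c
canvas0 = np.zeros((H + SCRATCH_ROWS, W, 3), np.uint8)
onscreen, canvas1 = step(dummy, canvas0, action=0)
```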


Huh.

Fun variant: give it hidden state by doing the offscreen scratch pixel buffer thing, but without grading its content in training. Train the model as before, grading only the "onscreen" output, and let it keep the side channel to do what it wants with. It'd be interesting to see how it would use it, what data it would store, and how that data would be encoded.
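Training-wise, "not grading the scratch area" could just be a loss mask: 1 over the onscreen region, 0 over the hidden strip, so gradient pressure never touches the side channel. A sketch with assumed dimensions:

```python
import numpy as np

H, W, SCRATCH_ROWS = 120, 160, 8   # assumed sizes, matching nothing real

def masked_l2(pred, target):
    """Mean squared error over the onscreen region only.

    pred, target -- float arrays of shape (H + SCRATCH_ROWS, W, 3)
    """
    mask = np.zeros((H + SCRATCH_ROWS, W, 1), np.float32)
    mask[:H] = 1.0                             # grade onscreen rows only
    return float(((pred - target) ** 2 * mask).sum() / mask.sum())
```

With this loss, anything the model writes into the bottom `SCRATCH_ROWS` rows is free: it contributes zero error, so whatever encoding emerges there is purely whatever helps predict the graded region.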


It's an empirical question, right? But they didn't do it...



