> 860M UNet and CLIP ViT-L/14 (540M)
Checkpoint size:
- 4.27 GB
- 7.7 GB (full EMA)
Running on a TPU-v5e:
- Peak compute per chip (bf16): 197 TFLOPs
- Peak compute per chip (int8): 393 TFLOPs
- HBM2 capacity and bandwidth: 16 GB, 819 GBps
- Interchip interconnect bandwidth: 1,600 Gbps
This is quite impressive, especially considering the speed. But there's still a ton of room for improvement: it seems the model didn't even memorize the game, despite having the capacity to do so hundreds of times over. So there's definitely lots of room for optimization methods, though who knows how such methods would affect existing techniques, since the goal here is to memorize.
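The "hundreds of times over" claim checks out on a back-of-envelope basis. All sizes below are approximate assumptions (bf16 parameter storage, a ~12 MB registered DOOM.WAD), not figures from the paper:

```python
# Back-of-envelope capacity comparison; sizes are rough assumptions.
unet_params = 860e6          # SD 1.4 UNet
clip_params = 540e6          # CLIP ViT-L/14 text encoder
bytes_per_param = 2          # bf16 storage

model_bytes = (unet_params + clip_params) * bytes_per_param
doom_wad_bytes = 12e6        # registered DOOM.WAD is roughly 12 MB

ratio = model_bytes / doom_wad_bytes
print(f"model ~{model_bytes / 1e9:.1f} GB, ~{ratio:.0f}x the WAD")
```

So even ignoring the VAE and EMA weights, the trainable parameters alone could store the game data a couple of hundred times over.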
What's also interesting about this work is that it's basically saying you can rip a game if you're willing to "play" (automate) it enough times and spend a lot more on storage and compute. I'm curious what the comparison in cost and time would be if you hired an engineer to reverse engineer Doom (and how much prior knowledge they'd get credit for, considering the pretrained models and the ViZDoom environment. Was the Doom source code in T5's training data? And which ViT checkpoint was used? I can't keep track of Google's ViT checkpoints).
I would love to see the checkpoint of this model. I think people would find some really interesting stuff taking it apart.
Those are valid points, but irrelevant in the context of this research.
Yes, the computational cost is ridiculous compared to the original game, and yes, it lacks basic things like pre-computing, storing, etc. That said, you could assume that all of that can either be done at the margin of this discovery, OR will naturally improve over time, OR will become less important as a blocker.
The fact that you can model a sequence of frames with such contextual awareness, without explicitly having to encode it, is the real breakthrough here, both from a pure gaming standpoint and for simulation in general.
I suppose it also doesn't really matter what kind of resources the game originally requires. The diffusion model isn't going to require twice as much memory just because the game does. Presumably you wouldn't even need to be able to render the original game in real time: I would imagine the basic technique would work even if you used a state-of-the-art, Hollywood-quality offline renderer to produce each input frame, and that the performance of the diffusion model would be similar?
Well, the majority of ML systems are compression machines (entropy minimizers), so ideally you'd want to see if you can learn the assets and game mechanics through play alone (which is what this paper shows). Better still would be to do so more efficiently than the devs themselves, finding a better compression. Certainly the game is not perfectly optimized. But still, this is a step in that direction. I mean, no one has accomplished this before, so even with a model of far higher capacity it's progress. (I think people are interpreting my comment as dismissive. I'm critiquing, but the key point I was making is that there are likely better architectures, training methods, and all sorts of things still to research. Personally I'm glad there's more to research. That's the fun part.)
>you could assume that all that can be either done at the margin of this discovery OR over time will naturally improve OR will become less important as a blocker.
OR one can hope it will be thrown on the heap of nonviable tech with the rest of the spam waste.
1) the model has enough memory to store not only all the game assets and the engine, but even hundreds of "plays".
2) me mentioning that there's still a lot of room to make these things better (it seems you think so too, so maybe not this one?)
3) an interesting point I raised to compare the current state of things (I mean, I'll give you this one, but it's just a random thought and I'm not reviewing this paper in an academic setting. This is HN, not NeurIPS. I'm just curious ¯\_(ツ)_/¯)
4) the point that you can rip a game
I'm really not sure which of these you're contesting, because I said several things.
> it lacks basic things like pre-computing, storing, etc.
It does? Last I checked, neural nets store information. I guess I need to return my PhD, because there's a UNet in SD 1.4 and it contains a decoder.
1) Yes, you are correct. The point I was making is that, in the context of the discovery/research, that's outside the scope and 'easier' to do, as it has been done in other verticals (e.g. e2e self-driving).
2) yep, aligned here
3) I'm not fully following here, but agreed this is not NeurIPS, and no Schmidhuber-style bickering.
4) The network does store information; it just doesn't store gameplay information. That could be forced, but as per point 1 it is, and I think rightly so, beyond the scope of this research.
1) I'm not sure this is outside the scope. It's also not something I'd use to reject a paper were I to review this at a conference. I mean, you've got to start somewhere, and unlike reviewer 2 I don't think any criticism is rejection criteria; that would be silly, given the lack of globally optimal solutions. But I'm also unconvinced this is proven by self-driving vehicles, though I'm not an RL expert.
3) It's always hard to evaluate. I was thinking about ripping the game, and so a reasonable metric is a comparison with a human's ability to perform the task. Of course, I'm A LOT faster than my dishwasher at cleaning dishes, but I'm not occupied while it's running, so it still has high utility. (Someone tell reviewer 2, lol)
4) Why should we believe that it doesn't store gameplay? The model was fed "user" inputs and frames, so it has this information, and this information appears useful for learning the task.
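For what it's worth, the mechanics are easy to sketch: as I understand the paper, the learned action embeddings replace SD 1.4's CLIP text conditioning, and past frame latents are concatenated channel-wise with the noisy latent, so past key presses and past frames are literally part of the model input. A rough numpy sketch (shapes, sizes, and the random table are my assumptions, not the paper's code):

```python
import numpy as np

# Rough sketch (hypothetical shapes/interface) of action + frame conditioning.
rng = np.random.default_rng(0)
n_actions, embed_dim, context_len = 8, 768, 4

# Learned action embeddings stand in for CLIP text embeddings;
# here the table is just random for illustration.
action_table = rng.normal(size=(n_actions, embed_dim))

past_actions = np.array([0, 3, 3, 5])        # last few key presses
action_embeds = action_table[past_actions]   # (4, 768) -> cross-attention

# Past frames (as VAE latents) are concatenated channel-wise with the
# noisy latent being denoised.
past_latents = np.zeros((context_len, 64, 64, 4))
noisy_latent = np.zeros((64, 64, 4))
history = np.concatenate(list(past_latents), axis=-1)         # (64, 64, 16)
unet_input = np.concatenate([noisy_latent, history], axis=-1)  # (64, 64, 20)
print(action_embeds.shape, unet_input.shape)
```

Whatever the exact wiring, the actions are in the input, so "it doesn't store gameplay" needs an argument, not an assertion.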
>What's also interesting about this work is it's basically saying you can rip a game if you're willing to "play" (automate) it enough times and spend a lot more on storage and compute
That's the least of it. It means you can generate a game from real footage. Want a perfect flight sim? Put a GoPro in the cockpit of every airliner for a year.
> Want a perfect flight sim? Put a GoPro in the cockpit of every airliner for a year.
You're jumping ahead there, and I'm not convinced you could ever do this (unless your model is already a great physics engine). The paper itself feeds the controls into the network. But a flight sim would be harder, because you'd also need to feed in air conditions. I just don't see how you could do this from video alone, let alone just video from the cockpit. Humans could not do this; there's just not enough information.
There's an enormous amount of information if your GoPro placement includes all the flight instruments. Humans can and do predict aircraft state at t+1 by parsing a visual field that includes the instruments; that is what the instruments are for.
Plus, presumably, either training it on pilot inputs (and being able to map those to joystick inputs and mouse clicks), or having the user sit in an identical fake cockpit with a camera to pick up their movements.
And, unless you wanted a simulator that only allowed perfectly normal flight, you'd have to have those airliners go through every possible situation that you wanted to reproduce: warnings, malfunctions, emergencies, pilots pushing the airliner out of its normal flight envelope, etc.
The possibilities seem to extend far beyond gaming (given enough computational resources).
You can feed it videos of the usage of any software, or real-world footage recorded by a GoPro mounted on your shoulder (with body motion measured by sensors, though the action space would be much larger).
Such a "game engine" can potentially be used as a simulation gym environment to train RL agents.
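As a sketch of what such a gym-style wrapper could look like: the `WorldModel` class and its `predict` interface below are entirely hypothetical stand-ins for a learned frame predictor, not anything from the paper, and the reward/termination logic is left as a stub because it would need its own learned head or heuristic.

```python
import numpy as np

class WorldModel:
    """Hypothetical stand-in for a diffusion model that predicts the
    next frame given a short history of frames and actions."""
    def predict(self, frames, actions):
        # A real model would denoise conditioned on (frames, actions);
        # here we just return a blank frame of the right shape.
        return np.zeros((240, 320, 3), dtype=np.uint8)

class NeuralEnv:
    """Minimal gym-style loop over the learned 'engine'."""
    def __init__(self, model, context_len=4):
        self.model = model
        self.context_len = context_len
        self.frames, self.actions = [], []

    def reset(self):
        first = np.zeros((240, 320, 3), dtype=np.uint8)
        self.frames, self.actions = [first], []
        return first

    def step(self, action):
        self.actions.append(action)
        nxt = self.model.predict(self.frames[-self.context_len:],
                                 self.actions[-self.context_len:])
        self.frames.append(nxt)
        # Reward and termination would come from a separate learned head.
        return nxt, 0.0, False, {}
```

An RL agent would then interact with `NeuralEnv` exactly as it would with ViZDoom, never touching the original engine.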
Wouldn't it make more sense to train on Microsoft Flight Simulator the same way they did DOOM? Though I'm not sure what the point would be if the game already exists.
- https://www.reddit.com/r/gaming/comments/a4yi5t/original_doo...
- https://huggingface.co/CompVis/stable-diffusion-v-1-4-origin...
- https://cloud.google.com/tpu/docs/v5e
- https://github.com/Farama-Foundation/ViZDoom
- https://zdoom.org/index