
I’m not sure how that would help vs just training the model with the conditionings described in the paper.

I’m not very familiar with Gaussian splat models, but aren’t they just a way of constructing images from multiple superimposed, parameterized Gaussian distributions, sort of like a Fourier series builds waveforms from sine and cosine waves?
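Roughly, yes — the "sum of Gaussians" picture looks something like this toy sketch. (Hedge: real 3D Gaussian splatting projects 3D Gaussians, depth-sorts them, and alpha-composites; here I just additively blend 2D Gaussians to show the analogy to summing basis functions.)

```python
import numpy as np

def render_gaussians(means, covs, colors, opacities, h, w):
    """Toy renderer: composite an image as a weighted sum of 2D
    Gaussians, loosely analogous to summing sinusoids in a Fourier
    series. Each Gaussian has a mean (position), covariance (shape),
    color, and opacity."""
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs, ys], axis=-1).astype(float)  # (h, w, 2)
    img = np.zeros((h, w, 3))
    for mu, cov, c, a in zip(means, covs, colors, opacities):
        d = pts - np.asarray(mu, dtype=float)
        inv = np.linalg.inv(cov)
        # Mahalanobis distance gives an anisotropic Gaussian falloff
        m = np.einsum('hwi,ij,hwj->hw', d, inv, d)
        weight = a * np.exp(-0.5 * m)
        img += weight[..., None] * np.asarray(c, dtype=float)
    return np.clip(img, 0.0, 1.0)

# Two blobs: a tight red one and a broad blue one
img = render_gaussians(
    means=[(16, 16), (40, 30)],
    covs=[np.eye(2) * 30, np.eye(2) * 80],
    colors=[(1, 0, 0), (0, 0, 1)],
    opacities=[0.9, 0.6],
    h=64, w=64,
)
```

Fitting a scene then means optimizing those means/covariances/colors so the rendered image matches the training views.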

I’m not seeing how that would apply here but I’d be interested in hearing how you would do it.




I'm not certain where it would fit in, but my thinking is this.

There's been a bunch of work on making splats efficient and good at representing geometry. Reading more, perhaps NeRFs would be a better fit, since they're an actual neural network.

My thinking is that if you trained a NeRF ahead of time to represent the geometry and layout of the levels, and plugged it into the diffusion model (as part of computing the latents, and then also on the output side so it can be used to improve the rendering), then the diffusion model could focus on learning how actions manipulate the world without having to learn the geometry representation.
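One hypothetical way to wire that up (my own sketch, not anything from the paper): have the frozen, pre-trained NeRF render a geometry signal like depth for the current camera pose, then concatenate it to the diffusion latents as extra condition channels, so the denoiser gets geometry for free.

```python
import numpy as np

def condition_latents(latents, nerf_depth):
    """Hypothetical conditioning: normalize a depth map rendered by a
    pre-trained NeRF and stack it onto the diffusion latents as an
    extra channel, so the denoiser need not re-learn level geometry.
    latents: (C, H, W), nerf_depth: (H, W) -> returns (C+1, H, W)."""
    depth = (nerf_depth - nerf_depth.min()) / (np.ptp(nerf_depth) + 1e-8)
    return np.concatenate([latents, depth[None]], axis=0)

rng = np.random.default_rng(0)
latents = rng.normal(size=(4, 16, 16))     # stand-in diffusion latents
depth = rng.random((16, 16))               # stand-in NeRF depth render
cond = condition_latents(latents, depth)   # shape (5, 16, 16)
```

The denoiser's input convolution would just need one extra channel; everything downstream stays the same.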


I don’t know if that would really help; I have a hard time imagining exactly what that model would be doing in practice.

To be honest none of the stuff in the paper is very practical, you almost certainly do not want a diffusion model trying to be an entire game under any circumstances.

What you might want to do is use a diffusion model to transform a low-poly, low-fidelity game world into something photorealistic. The geometry, player movement, physics etc. would all make sense, and then the model paints over it something that looks like reality, based on some primitive texture cues in the low-fidelity render.
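That "paint over a low-fi render" idea is basically the standard img2img trick: partially noise the render, then denoise, so the output keeps the input's structure but gains detail. A minimal sketch, with a stand-in for a trained denoiser (the `denoise_step` callable here is hypothetical, not a real model):

```python
import numpy as np

rng = np.random.default_rng(0)

def stylize(low_fi_frame, denoise_step, strength=0.6, steps=10):
    """img2img-style sketch: add noise proportional to `strength`,
    then run the tail of the denoising schedule. Low strength keeps
    more of the input's geometry; high strength repaints more.
    `denoise_step(x, t)` stands in for a trained diffusion denoiser."""
    x = low_fi_frame + rng.normal(0.0, strength, low_fi_frame.shape)
    start = int(steps * strength)
    for t in range(start, 0, -1):
        x = denoise_step(x, t / steps)
    return x

# Toy denoiser stand-in: just pulls the sample back toward the input.
frame = rng.random((8, 8, 3))  # the "low-poly render"
out = stylize(frame, lambda x, t: x - 0.5 * t * (x - frame))
```

The real version would condition the denoiser on the render (and on temporal context, for video) rather than merely starting from it.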

I’d bet money that something like that will happen and it is the future of games and video.


Yeah, I realize this will never be useful for much in practice (although maybe as some kind of client-side prediction for cloud gaming? But if you could run this in real time, you could likely just run the actual game in real time as well, unless there's some massive world running on the server that's too large to stream the geometry for effectively). I was mostly just trying to think of a way to avoid the issues someone mentioned with fake-looking frames, or the model forgetting what the level looks like when you turn around.

Not exactly that, but Nvidia already does something like this; they call it DLSS. It uses previous frames and motion vectors to predict the next frame using machine learning.
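The core temporal trick there is reprojecting the previous frame along per-pixel motion vectors before the network cleans it up. A nearest-neighbor sketch of that warp step (simplified; real implementations use subpixel sampling and handle disocclusions):

```python
import numpy as np

def warp_frame(prev, motion):
    """Reproject `prev` along per-pixel motion vectors to predict the
    next frame. motion[y, x] = (dy, dx) says where content at (y, x)
    moved to, so we fetch each output pixel from (y - dy, x - dx).
    Nearest-neighbor, edge-clamped."""
    h, w = prev.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    sy = np.clip(ys - motion[..., 0], 0, h - 1).astype(int)
    sx = np.clip(xs - motion[..., 1], 0, w - 1).astype(int)
    return prev[sy, sx]

prev = np.zeros((4, 4))
prev[1, 1] = 1.0                 # a single bright pixel
motion = np.zeros((4, 4, 2))
motion[..., 1] = 1               # everything moved one pixel right
pred = warp_frame(prev, motion)  # bright pixel now at (1, 2)
```

Whatever the warp can't explain (newly revealed geometry, lighting changes) is what the learned part has to hallucinate — which is where the "fake-looking frames" risk comes back in.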




