tom_0's comments

Yeah, there's a toggle to switch to typing that you can flip at any time; it actually lowers latency.


Thanks :)


Hah, from my knowledge of traditional AAA, there is zero chance any AAA game in development right now uses LLMs. A lot of studios don't even use them for coding, and gamedevs' mood about AI is abysmal.


Let me just remind you that Microsoft owns the Elder Scrolls franchise now, for better or worse.


I know, but it's a bit of an unstoppable force vs immovable object situation unless something changed. If they do it, I hope it'll be better than Copilot integrations :)


We want to gamify prompt hacking and give people a UI to add/remove chunks of the system prompt. It'll be unlocked by collecting widgets around the place.


It's a stylistic choice for sure. A little better than that is straight into the uncanny valley, and human-level is too high-latency and too expensive for us. We found that this level of crappy works great in practice, plus it runs on-device! We use Rhasspy Piper to generate the voices.


I would personally avoid voices that skew too close to the common TikTok TTS AI. Currently the heavy robots with the lower, bassier voices sell that clunky robot-voice vibe much better, but some of the more generic voices immediately take me out.


Unfortunately, they are close because some of them ARE the TikTok AI voices you've heard! I'm working on hiring VAs to make custom datasets, though. We'll have our own unique voices by 1.0 for sure.


Hey, Tommaso here, I'm one of the founders of the Robotopia studio. I didn't expect to see this here! Ask me anything :)


Do you have a per-player budget for cloud usage? What happens if people really like the game and play it so much that it starts getting expensive to keep running? I guess at $0.79 / Mtok, Llama 70B is pretty affordable, but a per-player opex seems hard to handle without a subscription model.


Our initial plan was to simply charge enough for the game that the price would cover the costs on average... but that means we're basically encouraged to have people play the game as little as possible? We're looking into some kind of subscription now; it sounds weird, but I do think it's a better incentive in this case. Plus we can actually ask for less upfront.


This is fantastic. I think the Substack post nails what was missing from a lot of these LLM-driven NPCs that didn't feel authentic. I have a couple of follow-up questions on specifics relating to analyzing behaviour with LLMs (I'm in gamedev myself). Would it be possible to speak to you directly about them?


Thanks :) If you want, I'm on the Discord linked on our landing page; it's fun stuff to talk about!


Amazing! Thanks, will join.


Do you think there's a path where you can pregenerate popular paths of dialogue to avoid LLM inference costs for every player? And possibly pair it with a lightweight local LLM to slightly adapt the responses? While still shelling out to a larger model when users go "off the rails"?


Not the founder, but having run conversational agents at decent scale, I don't think the cost actually matters much early on.

It's almost always better to pay more for the smarter model than to potentially give a worse player experience.

If they had 1M+ players there would certainly be room to optimize, but starting out you'd spend more trying to engineer the model switcher than you'd save in token costs.


I agree, trying to save on costs early on is basically betting against things getting better. Not only that, but in almost every case people prefer the best model they can get!

Beyond that, I think our selling point is rewarding creativity with emergent behavior. I think baked dialogue would turn into a traditional game with worse writing pretty quickly, and then you've got a problem. For example, this AI game [1] does multiple-choice dialogue with a local model, and people seem a bit lukewarm about it.

We could use it to cache popular QA, but in my experience humans are insane and nobody ever says even remotely similar things to robots :)

[1] https://store.steampowered.com/app/2828650/The_Oversight_Bur...


Hey! Robotopia looks awesome, I'm excited to try it out when it launches. How do you convert the LLM output to actions? Are there broad actions available (i.e. creating any object, moving anything anywhere) exposed to the LLM, or is it more specific tools it can call?


Thanks :) It may sound insane, but we convert actions to Python functions, then ask the LLM to write a Python script that actually runs in IronPython inside the game. On top of that we have a visual Behavior Tree system that lets our designer define the actions. So yeah, they get a bunch of general actions like walk, talk, follow, interact, etc.

PS: I think MCP/Tool Calls are a boondoggle and LLMs yearn to just run code. It's crazy how much better this works than JSON schema etc.
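
To make that concrete, here's a minimal sketch of the shape of it (the function names and the generated script are made up for illustration, not our real API; in the game each action is backed by a designer-authored Behavior Tree and the host is IronPython):

    # Illustrative sketch only: action names are made up, not the real API.
    # In the shipping game these are bindings into the Behavior Tree system;
    # here they just print so the snippet runs standalone.
    def walk_to(target):
        print("[action] walking to " + target)

    def say(text):
        print("[action] saying: " + text)

    def interact(obj):
        print("[action] interacting with " + obj)

    # The LLM is shown the available functions and asked to answer with a
    # short script, which is then executed inside the game:
    generated_script = """
    walk_to("charging_station")
    say("Low battery. Please excuse me.")
    interact("charging_station")
    """

    exec(generated_script, {"walk_to": walk_to, "say": say, "interact": interact})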


uhhh... you're running generated code on your customers' PCs? what kind of sandboxing do you have?


Fair reaction tbh. Right now there's a time watchdog, plus I'm entirely disabling all I/O and imports, but going forward I want to replace that with proper sandboxing tech... things I've looked into are V8 isolates, compilation to WASM, implementing our own gutted Python interpreter, spinning up a locked-down process, and others. I'm definitely aware of the risk here. The good news is that unless we get pwned, LLMs are very unlikely to write malicious code for the user.
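
For the curious, the current stopgap is roughly this shape, heavily simplified (a thread watchdog like this is not a real sandbox, which is exactly why I want to move to the options above):

    import threading

    # Rough sketch of the current stopgap: give the generated script a
    # stripped-down namespace (no __import__, no open, no file/network
    # access) plus a time watchdog. This is NOT a real sandbox.
    SAFE_BUILTINS = {
        "len": len, "range": range, "min": min, "max": max,
        "abs": abs, "str": str, "int": int, "float": float, "enumerate": enumerate,
    }

    def run_untrusted(script, game_api, timeout_s=1.0):
        env = {"__builtins__": SAFE_BUILTINS}
        env.update(game_api)  # only the whitelisted action functions get in

        def worker():
            exec(script, env)

        t = threading.Thread(target=worker, daemon=True)
        t.start()
        t.join(timeout_s)
        if t.is_alive():
            # A Python thread can't be force-killed cleanly; in the real
            # engine the embedded interpreter is torn down instead.
            raise TimeoutError("generated script exceeded its time budget")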


>...LLMs are very unlikely to write malicious code for the user.

Do you have any idea what the actual probability is? Because if millions of people start using the system, 'very unlikely' can turn into 'virtual certainty' pretty quickly.


yikes


This has insanely incredible potential for language learning. Do you plan to implement support for additional languages?


Yes, but every language is going to be a "port", not something contracted out like traditional localization. I haven't decided how exactly, but language conversion will land somewhere between these two extremes:

1. (expensive) Pick a suite of "native" models (e.g. models from China), TTS, ASR. Rewrite all the prompts in the target language. Revalidate all characters by hand.

2. (cheap) Slap a translation model around input and output and let the game run in English internally. My gut feeling is that this could have very poor results, though, and add latency.

It's definitely a research project; this has never been done before.
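
To make option 2 concrete, the wrapper would be roughly this shape (translate() is just a stand-in for whatever translation model we'd actually pick; all names here are illustrative):

    # Sketch of the "cheap" option: the game keeps thinking in English and a
    # thin translation layer wraps player input and NPC output.
    def translate(text, source, target):
        # Placeholder: a real translation model would be called here.
        return text

    def handle_player_line(player_text, player_lang, npc_brain):
        english_in = translate(player_text, source=player_lang, target="en")
        english_out = npc_brain(english_in)  # the existing English-only pipeline
        return translate(english_out, source="en", target=player_lang)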


Are the LLMs run on-device, or does this use cloud compute?

(Off-topic AMA question: Did you see my voxel grid visibility post?)


The "big" one is Llama3.3-70b on the cloud, right now. On GroqCloud in fact, but we have a cloud router that gives us several backups if Groq abandoned us.

We use a ton of smaller models (embeddings, vibe checks, TTS, ASR, etc.), and if we get enough scale we'll try to run those locally for users with big enough GPUs.

(You mean the voxel grid visibility from 2014?! I'm sure I did at the time... but I left MC in 2020, so I don't even remember my own algorithm right now)


Shipping GPU-accelerated ML models in games looks difficult; are there any major examples other than vendor-locked upscaling like DLSS or FSR?

(Yep! https://cod.ifies.com/voxel-visibility/ )


Yeah, it's extremely difficult right now, especially for a Windows game that can't have players install PyTorch and the CUDA Toolkit!

ONNX and DirectML seem sort of promising right now, but it's all super raw. Even if that worked, local models are bottlenecked by VRAM, and that's never been more expensive; we need to fit 6 GB of game in there as well. Even if _that_ worked, we'd need to timeslice the compute inside the frame so that the game doesn't hang for a second. And then we'd get to fight every driver in existence :) Basically it's just not possible unless you have a full-time expert dedicated to this, IMO. Maybe it'll change!

About the voxel visibility: yeah, that was awesome, I remember :) Long story short, MC is CPU-bound and the frustum clipping's CPU cost wasn't paid back by the reduced overdraw, so it wasn't worth it. Then a guy called Jonathan Hoof rewrote the entire thing, splitting it into a 360° scan done on another thread whenever you changed chunk, plus an in-frustum walk that worked completely differently. I don't remember the details, but it did fix the ravine issue entirely!


GGML is another neat ML abstraction layer, but I don't think much work has been dedicated to the Windows port.


GGML is the library llama.cpp is built on, and using it still requires CUDA to be installed, unfortunately. I saw a PR for DirectML, but I'm not really holding my breath.


You don't have to install the whole CUDA. They have a redistributable.


Oh, I can't believe I missed that! That makes whisper.cpp and llama.cpp valid options if the user has Nvidia, thanks.


Whisper.cpp and llama.cpp also work with Vulkan.


Yeah, I researched this and I absolutely missed this whole part. In my defense, I looked into it in 2023, which is ages ago :) Looks like local models are getting much more mature.


Hey, generalizing the answer like that is nice; the flood fill is definitely an approximation of that. However, keep in mind that the flood fill is run only once, not per frame, so you have to check that "any ray from face A can exit through any point of face B", and I'm not sure that is easier to compute. I'm sure there is some way to get closer to this than a flood fill, but for 0.9 it had to be good enough!


For the actual occlusion test, you have a "don't go backwards" rule. If you have to go backwards to reach a chunk, you know it's not visible along that path (but may be on a different path, which you will eventually find). Without having thought about it a whole lot, does that work for the inside-a-chunk tests too? If you have to traverse an edge facing back towards the face you're coming from, does that mean there's no visible path from the first edge to the second? And if so, would that give a meaningful reduction in connectivity?


There's a problem with that, I think. If I understand correctly, the inside-a-chunk fill is directionless, so this wouldn't work.

Basically it performs a flood fill on contiguous groups of all the transparent blocks in the chunk. If any group touches two sides they are considered to be able to see each other. Keep in mind this is precomputed and doesn't have a "looking-at" direction.

Perhaps you could perform this test six times, one for each face. Then you would have a direction.
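
In rough Python, my reconstruction of that precompute looks something like this (just the idea, not the shipping code; chunk size and face numbering are assumptions):

    from collections import deque
    from itertools import product

    CHUNK = 16  # assumed chunk edge length
    # Face indices: 0=-x, 1=+x, 2=-y, 3=+y, 4=-z, 5=+z
    NEIGHBORS = [(-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0), (0, 0, -1), (0, 0, 1)]

    def face_connectivity(is_transparent):
        """is_transparent(x, y, z) -> bool for one chunk.
        Returns the set of face pairs (a, b) considered mutually visible
        because some connected group of transparent cells touches both."""
        seen = set()
        pairs = set()
        for start in product(range(CHUNK), repeat=3):
            if start in seen or not is_transparent(*start):
                continue
            seen.add(start)
            queue = deque([start])
            touched = set()  # faces this connected group reaches
            while queue:
                x, y, z = queue.popleft()
                for face, (dx, dy, dz) in enumerate(NEIGHBORS):
                    nx, ny, nz = x + dx, y + dy, z + dz
                    if not (0 <= nx < CHUNK and 0 <= ny < CHUNK and 0 <= nz < CHUNK):
                        touched.add(face)  # stepped outside: group reaches this face
                    elif (nx, ny, nz) not in seen and is_transparent(nx, ny, nz):
                        seen.add((nx, ny, nz))
                        queue.append((nx, ny, nz))
            # Every pair of faces touched by the same group can "see" each other.
            pairs.update((a, b) for a in touched for b in touched if a < b)
        return pairs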


Yeah, see my reply to myself, I beat you by a few minutes =) Whether 6X the searches is worth it or not is something only testing can answer, I think.


Partly answering my own question after mulling it over, the difference in the pre-pass is that you're not actually coming "from" an edge or going "to" one when you're doing the flood fills, you're just testing which edges are connected to any given empty cell. But I still think there might be something to this idea. The simplest thing I can think of is to do no-backwards searches out from all the empty cells on each edge and see which other edges you reach. That's potentially 6x as many searches (slightly smaller and with some early outs since the connectivity is bi-directional), but at least it's still finite and predictable and isn't who knows how many raycasts. I don't know if that's acceptable for the pre-pass or not. But I also think there might be smarter ways to do it that only require a single search from each empty cell but remember directions traversed so you know if a particular path you've reached an edge along implies visibility or not.

Maybe you can do the 6 searches in parallel, such that whenever they meet you know you've found a connection between the two edges they were coming from? I don't know if that means any less work except in cases where all 6 edges are trivially connected.


That's a fair point. The interior shape of most of the chunks you're handling is mostly convex anyway, so the flood fill's conservative visibility determinations will usually match the precise answer.


Hey, Tommaso here. I've thought about using a depth prepass; however, that only really helps if you have heavy shaders, at the cost of basically doubling the vertex shader load. Given that our terrain shader is a one-liner that returns tex*color, and that we have A LOT of vertices, it's not very convenient. The poly count at max render distance (224 blocks) can be anywhere between 300K polygons in plains and 900K in jungles, minus the savings from the culling in the post. Apart from jungles, which kill everything, the major bottleneck is now alpha testing, and most devices will run at 60 fps if you turn that off.


Hello Tommaso, I developed something VERY similar to that algorithm back in 2007 (C++, software rendering), and then again in 2012 (in Java for Android, with GLES).

I got some of your issues fixed back then. Ravines, for instance. You might want to take a look:

(3-clause BSD): https://github.com/TheFakeMontyOnTheRun/derelict/blob/master...

(GPLv2): https://garage.maemo.org/plugins/scmsvn/viewcvs.php/angstron...

Don't hesitate to get in touch. I will gladly explain anything.

I'm not looking for money. I'm just trying to help. I always felt bad about this algorithm just gathering dust. That post of yours made my day.


Thanks for the reply. That makes sense; my scenes are probably 100-200k, but my fragment shaders are much more complex so the depth pre-pass makes a huge difference.

