
It's just too slow for the autocompletion use case to do it like that. Ideally, you're never chaining serial requests to the LLM. Even if you stuff all the data into a single prompt, execution time seems to grow superlinearly with the number of tokens, so it again gets very slow.
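A minimal sketch of the two request patterns being contrasted, assuming an OpenAI-style chat API (the client, model name, and prompts are illustrative, not from the thread):

    # Hedged sketch assuming the OpenAI Python client; names are placeholders.
    from openai import OpenAI

    client = OpenAI()

    def complete(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4-32k",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    # Serial chaining: every round trip pays full network + generation latency.
    summary = complete("Summarize the relevant files: <file contents>")
    suggestion = complete("Given this summary, complete the code: " + summary)

    # Single prompt: one round trip, but a long prompt slows generation too.
    suggestion = complete("Here are all the relevant files: <file contents>\n"
                          "Complete the code at the cursor: ...")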


Yeah, I agree it's too slow for autocompletion at the moment, but this would be for full feature implementations, not just autocomplete. For example, if I have a repo I want to add a table and REST API implementation to, it can do this: https://imgur.com/a/mIJvaJr (ignore the formatting errors in the UI; somehow parts show up as code and others don't, but the API wouldn't have this issue, especially since you can use the system message to enforce output format, as in the sketch below).
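A hedged sketch of using the system message that way, again assuming an OpenAI-style client; the exact instruction wording and prompts are illustrative:

    # Sketch: the system role pins the output to a machine-parseable format.
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4-32k",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Output only unified diffs against the repo root. "
                        "No prose outside the diff."},
            {"role": "user",
             "content": "Add a users table and a REST API to manage it."},
        ],
    )
    print(resp.choices[0].message.content)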

I'm happy to wait even 30-60 seconds for this, since I can easily evaluate it, criticize it (and the model will correct it), and then just apply the patch and move on. I think the results will be much better with the 32k model, but that remains to be seen.
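The criticize-and-correct step is just appending the critique to the conversation and re-asking; a sketch under the same assumed API, with illustrative prompts:

    # Sketch of the review loop: append a critique, ask the model to revise.
    from openai import OpenAI

    client = OpenAI()

    def chat(messages):
        resp = client.chat.completions.create(model="gpt-4-32k",
                                              messages=messages)
        return resp.choices[0].message.content

    messages = [{"role": "user",
                 "content": "Add a users table and a REST API for it."}]
    draft = chat(messages)  # the 30-60s first pass
    messages += [
        {"role": "assistant", "content": draft},
        {"role": "user", "content": "The endpoint skips input validation; fix it."},
    ]
    revised = chat(messages)  # the model corrects its own output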



