Hacker News | sudb's comments

We last submitted a SWE-bench Verified result in November 2024 - at the time I believe we were in the top 5 entrants.

We expect Engine to be as good as the other code-writing agents out there at the moment - we understand almost everyone in the space to be using very similar base models and agent scaffolding.


The closest I could get to getting your LLM to identify itself was as LaMDA, which makes me think this is probably a Gemma model - am I close?


Haha no, quite off


Looks cool! If you're able to say - where/how do you run these virtual desktop instances?


It's on our production AKS cluster :) So fully scalable.

By the way, we've launched on Product Hunt! We'd appreciate an upvote if you're interested (same goes for anyone else here!)

https://www.producthunt.com/posts/cyberdesk


I know of https://modal.com/, which I believe is used by Codegen and Cognition.

Anecdotally, I hear that many companies in the LLM agent space roll their own sandbox solutions - I've heard of both Firecracker- and Kubernetes-based implementations.


I use this for work - but there are edge cases all over the place that I keep running into (e.g. Yarn being installed on GitHub-hosted runners, but not on self-hosted ones or in act - https://github.com/actions/setup-node/issues/182)

Apart from that it's been quite good!


Same experience here. Edge cases everywhere, though most can be worked around.

You can specify different runners to use. The default images are a compromise to keep size down. There is a very large image that tries to include everything you might want. I would suggest trying that if you don’t mind the very large (15GB IIRC) image.


I definitely remember considering the larger images - I think we ended up not using them since my work's use case for act is running users' GitHub workflows on demand on temporary VMs. The hope was that most usage would be covered by the smaller images - and in fairness that has been true so far.


I worked on this! Happy to answer any questions anyone has.


I had a problem recently trying to send LLM-generated text between two web servers under my control, from AWS to Render - I was getting 403s for command injection from Render's Cloudflare protection, which is opaque and not configurable by users.

The hacky workaround, which has been working reliably for a while now, was to encode the offending request body on the sending server and decode it on the destination server.
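
A minimal sketch of that encode/decode workaround, assuming base64 inside a small JSON envelope. The comment above does not say which encoding was actually used, so the base64 choice, the envelope fields, and the function names `encode_body` and `decode_body` are all hypothetical:

```python
import base64
import json


def encode_body(text: str) -> bytes:
    """Sender side: hide the raw text from pattern-matching filters.

    Assumption: URL-safe base64 in a JSON envelope. Any reversible
    encoding that removes shell-like substrings would serve the same purpose.
    """
    payload = base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")
    return json.dumps({"encoding": "base64", "payload": payload}).encode("utf-8")


def decode_body(body: bytes) -> str:
    """Receiver side: reverse encode_body and return the original text."""
    envelope = json.loads(body)
    if envelope.get("encoding") != "base64":
        raise ValueError("unexpected encoding in request body")
    return base64.urlsafe_b64decode(envelope["payload"]).decode("utf-8")


if __name__ == "__main__":
    original = "To clean up, run `rm -rf /tmp/build; cat /etc/passwd`"
    wire = encode_body(original)
    # The substring that looks like command injection no longer appears on the wire.
    assert b"/etc/passwd" not in wire
    assert decode_body(wire) == original
```

This only sidesteps the false positives; it is not a security measure, so the destination server still has to treat the decoded text as untrusted input.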


I've been a customer of A&A for a few years - I had the same reticence at first but thought I'd try them anyway. The only time I've come close to the 1TB monthly usage quota is this month, entirely because (for work) I had to download a very large number of docker images, in addition to a normal usage of 500-750GB. I think it helps that some unused portion of the usage quota gets rolled over too. Out of curiosity, what are you doing that would make you regularly exceed 1TB?


I don't think I'm doing anything crazy: the odd game from Steam, some ML models from Hugging Face, running a Plex server for myself and friends/family. I checked my usage and I exceed 1TB every month, so that's why I was put off.

I've been lucky that a new ISP came to my area, so I can avoid Openreach infrastructure completely. They've put an FTTH line directly into my house and offer up to 8Gbps symmetrical.


This YouTube video by No Boilerplate (https://www.youtube.com/watch?v=XcF6tvepRlg) was a great summary and really got my interest up initially, but I personally got bogged down quite quickly by the prospect of having to learn Morse code.


You don't have to learn Morse code to pass the initial ham test.


In the US, I don't believe Morse is required for any tier of license anymore. For sure, I have my General and don't know it.


Yes, it's any ham test in the U.S.

The CW (Morse code) portion was eliminated back in the 2000s.


Or any ham test in the United States (note: learning Morse code is worth doing anyway)


Turso's SQLite fork libSQL[1] has an extension/improvement that adds the ability to alter columns and drop constraints (albeit not via a DROP CONSTRAINT clause). I'm not affiliated, but have been using libSQL recently and am finding it to be a very pleasant experience.

Although I agree conceptually that SQLite's limited type system is frustrating, if your use case allows, an ORM might save you from having to think about it or touch it directly.

[1] https://github.com/tursodatabase/libsql/tree/main

