Hacker News new | past | comments | ask | show | jobs | submit | prasoonds's comments login

Really illuminating. Reading the comments, I can't help but think how bad most of us are at predicting the future. While the OP's world might still be somewhat far away, it doesn't seem nearly as far-fetched now as it would have seemed in 2013.


humanity being out-of-work is the premise of Marx's writings and we are not closer to it now than those 170 years ago.


I disagree.

Sonnet 3.6 (the 2022-10-22 release of Sonnet 3.5) is head and shoulders above GPT-4 and anyone who has been using both regularly can attest to this fact.

Reasoning models do reason quite well but you need the right problems to ask them. Don't throw open-ended problems at them. They perform well on problems with one (or many) correct solution(s). Code is a great example - o1 has fixed tricky code bugs for me where Sonnet and other GPT-4 class models have failed.

LLMs are leaky abstractions still - as the user, you need to know when and how to use them. This, I think, will get fixed in the 1-2 years. For now, there's no substitute for hands on time using these weird tools. But the effort is well worth it.


> one (or many) correct solution(s).

> Code is a great example

I’d argue that most coding problems have one truly correct solution and many many many half correct solutions.

I personally have not found AI coding assistance very helpful, but from blog posts by people who do much of the code I see from Claude is very barebones html templates and small scripts which call out to existing npm packages. Not really reasoning or problem solving per se.

I’m honestly curious to hear what tricky code bugs sonnet has helped you solve.

It’s led me down several incorrect paths, one of which actually burned me at work.


This is really cool! Any chance of adding support for the Arc browser? Right now, Chrome allows for some websites (WhatsApp, Spotify, YouTube Music) to be made into "apps" already via PWAs. Arc - which is based on chromium, for some mysterious reason, chooses to not support PWAs so this would be extremely useful for Arc!


I've heard Rust is especially hard for LLMs, what with the need to thinking deeply about variable ownership and lifetimes. I wonder if it will work better if you try o1-mini (as long as you're still providing the right context!)

And I would've thought that AI would be good at creating new routes (wrap this function in a POST route) - certainly I've done this with Python and TS and it's been fine. Maybe it's a Rust specific issue? Likewise, SQL migrations I would've expected to also just work, especially if your schema change is small! Interesting that it doesn't!

Which tool do you use? Cursor or something else?


Also to add some more context:

- I've found LLM codegen working very well with standard React + TS code

- It sucks when using less knows languages or less popular frameworks (I tried reverse engineering github copilot lua code from the neovim extension in one instance and it really didn't work well)

I'd be curious to hear people's experience here.


This is cool! I'm a regular user of Cursor (I barely write any code now - just prompt and tab tab tab).

The thing that's really keeping me in Cursor (vs decamping for Zed) is NOT the Cmd K or the sidebar chat. It's actual Cursor's Tab autocomplete model. I've tried many other tab completion models and Cursor Tab blows them out of the water. Supermaven's new release is promising, however.

For the use-case you're solving for (issues with code-privacy), I think something like https://codeium.com/ does already allow for on-prem deployments with enterprise support. I'm trying to think of who would be served well by a fork of VSCode vs Continue.dev or something like a codeium VPS deployment.


Curious: what is your use case for cursor, i.e. what is the problem domain or complexity level (on a scale from boilerplate code to core architecture) that you typically work with? Also what plan are you on and is that enough for your needs?


Hi, we're working with an large typescript codebase and I develop fully fledged features with Cursor which involve non-trivial UI wiring up work. Decent amount of state management using React too.

It's definitely not boilerplate level code - but it also also stops well short of architecture level work. Here's a recent prompt of me trying to implement line-by-line diff generation: https://pastebin.com/WRJpNwqc

When I need to do architectural/design work, I do it in multiple passes, not generating any code, just lots detailed text with tons of feedback/back-and-forth with the model.

I'm on the pro plan but it's not enough for my needs - I run through the quota in 10-15 days. Then, I revert to using Sonnet 3.5 via API key.


Big fan of Codeium here. They're the only player who took vim/neovim users seriously from the start, and it's great to see people building extensions like monkoose/neocodeium. Their models or ux may not always be the best available, but their understanding of their target market and their product positioning is peak IMO, particularly the enterprise/on-prem options you mentioned. My money's on them long-term.


Yeah, I have not given enough time and did so yesterday; it suggests new parameters for example, so I only have to begin a "refactoring" and then tab tab... Vim guys are missing out.


Here’s an audio version if anyone prefers listening instead

https://drive.google.com/file/d/1DLtJAShDN54ooPWpBspHPV98rh2...


We've made an open source fork of Jupyter - kind of like Cursor but for Jupyter.

See GH: https://github.com/pretzelai/pretzelai/

You can install it with pip install pretzelai (in a new environment preferably) - then run it with pretzel lab. You can bring your own keys or use the default free (for now) AI server.

We also have a hosted version to make it easy to try it out: https://pretzelai.app

Would love to get your feedback!


Can it reliably generate interspersed blocks of markup and code with a single prompt?


Hmm do you mean you want to create multiple cells from a single prompt - some code cells, then some markdown cells, then some code cells and so on?

The sidebar can certainly produce code mixed with markdown but right now, we process the markdown and show visually.

https://imgur.com/a/bpYu8yN

The cell level Cmd + K shortcut only works on a given cell to create or edit code and fix errors. Just tested it and it generates markdown well (just start your prompt with "this is a markdown cell")

https://imgur.com/VuDciQN

In the sidebar/chat window, it should be trivial to not parse the markdown and just show it raw. I'll work on it. In the main notebook, it's a bit harder but we are planning to allow multi-cell insertions but it will probably take 2-3 weeks.


Yeah the golden goose for me personally is the ability to say "create a jupyter notebook about x topic" and have an LLM spit out interspersed markdown (w/ inline latex) and python cells. It would be really cool if the LLM was good at segmenting those chunks and drawing stuff/evaluating output at interesting points. Quick example to illustrate the idea:

https://imgur.com/04FUp9s

I find Cursor to be extremely good right up to that point - I can work with Jupyter via the VS code extension and quickly get mixed markdown like how you're describing now - but it cannot do the multi-cell output or intelligent segmenting described above. I currently split it apart myself from the big 'ol block of markdown output.


This is something we've experimented with and I know some other tools out there claim to do this, I've just found that there's a very simple issue with this: if the AI gets any step wrong, every subsequent step is wrong and then you have to review every bit of code/markdown bit by bit, and it ends up turning into more work than just doing the analysis step by step while guiding the AI. I'm optimistic that this will change over time as the AI gets better, but it's still quite fragile (although it demos really well...)


So if you had 3 markdown cells and 3 python cells, I would design the tool to pull all the content out of those cells and present it (sans all that ipynb markup, just contents, probably in markdown) to the model as the full context for every edit you want to make. So the tool would need to know how to transform a given notebook into a collection of markdown/python cells which it would present to the model to make edits. The model would need to return updated cells in the same format, and the tool would update the cells in the document (or just replace them entire with new cells from the response). I would be fine with this just blowing away all previous evaluation results.

Do you think that approach would work? Not sure if I'm misunderstanding the issue you're describing and I recognize it is likely much messier than I imagine.


This is something we're planning on doing - just generate a large bit of text with markdown text and code in the middle. This is actually how the newer models already generate code - with the only difference being there's only one code block.

Via the use of <thinking></thinking> blocks, it's pretty straightforward to get the the model to evaluate it's own work and plan the next steps (basically chain of thought) but then you can filter out the <thinking> block in the final output.

The last trick to making this actually work is to give the AI model evaluation power - make it be able to run certain inspection code to evaluate its decisions so far and feel that evaluation to the next set of steps.

Combining all of this, it's very possible to convert an AI chat into a multi-step markdown + code notebook that actually works.


I see, interesting. Hadn't come upon this use-case before but makes sense.

I've made a GitHub issue for this feature: https://github.com/pretzelai/pretzelai/issues/142

If you'd like to be updated when we have this feature in, please leave a comment on the issue. Alternatively, my email is in my bio - feel free to email me so that when we have this feature, we can send you an update!


Horrors beyond comprehension.


I wonder if there’s a devbox-as-a-service tool out there. I use a MacBook Air for most of my work and on occasion would be benefited by using a beefier machine in the cloud. I just don’t want to set up a machine, set up sync etc.


You could just rent a beefy server for like $40/month at hetzner or OVH and use VS Code with the remote development extension.


Consider applying for YC's Summer 2025 batch! Applications are open till May 13

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: