Hacker News | snickell's comments

I really like "agent assisted coding". I think the word "vibe" is gonna always swing in a yolo direction, so having different words is helpful for differentiating fundamentally different applications of the same agentic coding tools.


abbreviation ass.coding.


This is a brilliant idea, I really hope somebody on the iPhone 18 design team reads it. I think there’s a huge pent up demand for a mini model, many of us would pay more for it than the large versions.


On an iPhone 12 mini, wishing I hadn’t upgraded to iOS 26 because now my phone is notably laggy. Word to the wise. I use swiping for input and would consider it now unusable due to extreme lag.

The physical aspect I can’t give up is I can hold the phone with my thumb on the bottom and my middle finger on the top and scroll with my index finger to read. Wish I could buy that capability on a new iPhone, maybe one even slightly smaller.

Time to go find out if there’s even a way to downgrade, oof this is slow.


Yeah, the last supported OS upgrade for iPhones really makes them dog slow. (no idea if this is actually the last for the 12 series)


The distance between Earth and Mars varies between roughly 180 and 1,340 light-seconds.


But carpeting that distance across the entire volume of space between the planets with data centers every few light-seconds apart seems ambitious. A hundred or more data centers in space?

> throw tons of datacenter and compute that's anywhere more than a few light-seconds from the nearest existing datacenter

I think I'm misinterpreting the comment.


It's like the transatlantic internet cable. One really beefy interconnect is more than enough for two halves of the planet to talk.

We wouldn't need to blanket the solar system in data centers to be able to communicate with other planets. We would only need enough connections so that no matter where in their respective orbits they are, there is a line of radio "sight" that is clear enough for high bandwidth communications to work.

I don't have access to the specifics, but I imagine something between 5 and 10 satellite data centers orbiting the sun in between earth and mars would be enough to maintain communications with minimum delay regardless of when in the solar year the comms take place.


At their maximum separation, Mars & Earth are about 20 minutes apart. If we had 10 satellite data centers all in perfect alignment (disregarding the sun, which obviously makes a hash out of things) they'd still each be 2 minutes apart.

Once you take into consideration the sun, plus the fact that you'd need to cover the full disk to keep all data centers within a few minutes of another one in an unbroken chain back to both planets, I just don't get the math involved here.

But, I'm also terrible at both math and visualization, so I readily concede I may be missing something obvious.


Think of it more like 3 circles.

The inner circle has Earth's orbit in it. The outer circle is Mars's orbit.

The middle circle would be a ring of relatively stationary satellites in between them.

And in the center of all 3 circles is the Sun, which will not allow radio signals to pass through.

I drew a crappy illustration to demonstrate: https://ibb.co/tP2rkzS0

When Mars and the Earth are on opposite sides of the sun, a satellite ring can transmit around the sun and keep the communication lines open.

Having a ring of relay satellites gives you a set distance to transmit from Mars. The satellites can then transmit their received data from the one that is closest to Mars to the one that is closest to Earth, which would then send the data to Earth.

This is helpful for a variety of reasons, but the most important one is that with this setup, even when the Sun is in between Earth and Mars, you could still send data around the sun.

Constant communication, no communications breakdowns. Even if 1 satellite failed for some reason, a bit of maneuvering would allow the others to backfill the gap until it could be repaired or replaced.

Even when Earth and Mars are close together, it would still be smart to use the relay so that the power levels are easily calculated and maintained.


That makes sense. I guess I was hung up on “a few light seconds” since that’s more like, what, 5-10 minutes per hop?
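A rough check of that per-hop figure, as a sketch only. It assumes a circular relay ring at 1.25 AU (a made-up radius between Earth's roughly 1 AU orbit and Mars's roughly 1.52 AU orbit), evenly spaced relays, and 1 AU ≈ 499 light-seconds. The function names are illustrative.

```python
import math

AU_LIGHT_SECONDS = 499.0  # approximate light travel time across 1 AU, in seconds


def hop_light_seconds(ring_radius_au: float, n_relays: int) -> float:
    """Straight-line distance between adjacent relays on an evenly spaced ring."""
    chord_au = 2 * ring_radius_au * math.sin(math.pi / n_relays)
    return chord_au * AU_LIGHT_SECONDS


def relays_needed(ring_radius_au: float, max_hop_light_seconds: float) -> int:
    """Smallest relay count that keeps every hop under the given limit."""
    n = 3
    while hop_light_seconds(ring_radius_au, n) > max_hop_light_seconds:
        n += 1
    return n


RING_RADIUS_AU = 1.25  # assumption: ring sits between the two orbits

for n in (5, 10):
    minutes = hop_light_seconds(RING_RADIUS_AU, n) / 60
    print(f"{n} relays: about {minutes:.1f} light-minutes per hop")

print("relays for 5 light-second hops:", relays_needed(RING_RADIUS_AU, 5.0))
```

Under these assumptions, 5 to 10 relays put neighbours roughly 6 to 12 light-minutes apart, while keeping every hop to a few light-seconds would take several hundred relays on the ring alone.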


Data will travel at the speed of light within a margin of error, so "a few light seconds" means "a few seconds".

There will be some lag, since each satellite would need to cache the data before retransmitting and keep it for a short period in case of failure. Assume that doubles the time for each hop under ideal conditions: getting information to a point 4 light-seconds away would take approximately 8 seconds, and a response would take a minimum of 16 seconds, assuming the reply starts the instant the message is received.
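The arithmetic above, as a small sketch. The factor of 2 per hop is the assumption stated in the comment, not a measured figure, and the function names are made up for illustration.

```python
from typing import Sequence


def one_way_delay(hop_light_seconds: Sequence[float], overhead_factor: float = 2.0) -> float:
    """Total one-way delay in seconds when each hop costs overhead_factor times its light time."""
    return sum(hop * overhead_factor for hop in hop_light_seconds)


def round_trip_delay(hop_light_seconds: Sequence[float], overhead_factor: float = 2.0) -> float:
    """Minimum delay for a reply, assuming the far end answers instantly."""
    return 2 * one_way_delay(hop_light_seconds, overhead_factor)


# A single 4 light-second hop, as in the example above.
print(one_way_delay([4.0]))     # 8.0
print(round_trip_delay([4.0]))  # 16.0
```

Passing a longer list of hops models a chain of relays under the same doubling assumption.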


I think many open source projects already experience two buckets of contributors which maps nicely to the two class distinction inherent in this model:

1) a bunch of people who contributed one or two PRs, but it took the maintainers more time to review/merge the PR than the dev time contributed

2) a much smaller set of people who come back and do more and more PRs, eventually contributing more time than it takes to review their work

A major existing reason to review PRs from class 1 "once or twice" contributors (perhaps the main reason?) is that all class 2 "maintainer-level" contributors start as class 1.

I agree there's an awkward middle ground here, now you have to define where the boundary is between class 1 and class 2, but I think if you were able to graph contribution level you'd find there's already something of a bimodal distribution naturally in many projects anyway.


If you want to try what Karpathy is describing live today, here's a demo I wrote a few months ago: https://universal.oroborus.org/

It takes mouse clicks, sends them to the LLM, and asks it to render static HTML+CSS of the output frame. HTML+CSS is basically a JPEG here, the original implementation WAS JPEG but diffusion models can't do accurate enough text yet.
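The loop described above might look roughly like this. This is a sketch, not the demo's actual code: the prompt wording, the `Click` type, and the stand-in `fake_model` are all assumptions, and a real version would call an LLM API where the stub is.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumption: a "model" is any callable that takes a prompt and returns HTML+CSS.
Model = Callable[[str], str]


@dataclass
class Click:
    x: int
    y: int


def build_prompt(previous_html: str, click: Click) -> str:
    """Describe the last frame and the click, and ask for the next frame."""
    return (
        "You are rendering the next frame of a user interface.\n"
        f"Previous frame (HTML+CSS):\n{previous_html}\n"
        f"The user clicked at x={click.x}, y={click.y}.\n"
        "Reply with the complete static HTML+CSS of the next frame only."
    )


def run_session(model: Model, first_frame: str, clicks: List[Click]) -> List[str]:
    """Feed each click to the model and collect the frames it renders."""
    frames = [first_frame]
    for click in clicks:
        frames.append(model(build_prompt(frames[-1], click)))
    return frames


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return f"<html><body><p>frame for a {len(prompt)}-character prompt</p></body></html>"


frames = run_session(fake_model, "<html><body>start</body></html>", [Click(10, 20), Click(30, 40)])
print(len(frames), "frames rendered")
```

There is no application code between the click and the pixels here: the model's output is the frame.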

My conclusions from doing this project and interacting with the result were: if LLMs keep scaling in performance and cost, programming languages are going to fade away. The long-term future won't be LLMs writing code, it'll be LLMs doing direct computation.


What scares me is that the obvious pool of money to fund the deficit in the cost of operating LLMs comes from the most subtle native advertising imaginable. Can you resist ads where, say, AirBnB pays OpenAI privately to “dope” the o3 hyperspace such that AirBnB is moved imperceptibly closer to tokens like value and authentic??

How much would AirBnB pay for the intelligence everyone gets all their info from having a subtle bias like this? Sliiightly more likely to assume folks will stay in airbnbs vs a hotel when they travel, sliiightly more likely to describe the world in these terms.

How much would companies pay to directly, methodically and undetectably bias “everyone’s most frequent conversant” toward them?


> Can you resist ads where, say, AirBnB pays OpenAI privately to “dope” the o3 hyperspace such that AirBnB is moved imperceptibly closer to tokens like value and authentic??

This would be a very impressive technical feat


Anthropic demoed something similar with Golden Gate Claude a year ago:

https://www.anthropic.com/news/golden-gate-claude


I use AI heavily in my own programming, so I’m not against it, but I suspect this “as much as” is mostly copilot doing “tab completion” style autocompletions, not AI writing and modifying functions on its own.


This is a really interesting project, and a great read. I learned a lot. I'm falling down the rabbit hole pretty hard reading about the "Leap" algorithm (https://www.usenix.org/system/files/atc20-maruf.pdf) it uses to predict remote memory prefetches.

It's easy to focus on libgraft's SQLite integration (comparing to turso, etc), but I appreciate that the author approached this as a more general and lower-level distributed storage problem. If it proves robust in practice, I could see this being used for a lot more than just sqlite.

At the same time, I think "low level general solutions" are often unhinged when they're not guided by concrete experience. The author's experience with sqlsync, and applying graft to sqlite on day one, feels like it gives them standing to take a stab at a general solution. I like the approach they came up with, particularly shifting responsibility for reconciliation to the application/client layer. Because reconciliation lives heavily in tradeoff space, it feels right to require the application to think closely about how they want to do it.

A lot of the questions here are requesting comparisons to existing SQLite replication systems. The article actually has a great section on this topic at the bottom: https://sqlsync.dev/posts/stop-syncing-everything/#compariso...


Thank you! I'm extremely excited and interested to explore applying Graft to solutions outside of SQLite/SQLSync. That was a driving factor behind why I decided to make it more general. But you're absolutely right, I'm glad I spent time developing use cases first and then worked backwards to a general solution. I made a lot of mistakes in the process that I wouldn't have seen if I had gone the other way.

And yea, I fell pretty far down the "Leap" rabbit hole. It's a fun one :)


This is the smoothest tom sawyer move I've ever seen IRL, I wonder how many people are now grinding out your GTK4 port with our favorite LLM/system to see if it can. It'll be interesting to see if anyone gets something working with current-gen LLMs.

UPDATE: naive (just fed it your description verbatim) cline + claude 3.7 was a total wipeout. It looked like it was making progress, then freaked out, deleted 3/4 of its port, and never recovered.


>> This is the smoothest tom sawyer move I've ever seen IRL

That made me laugh. True, but not really the motivation. I honestly don't think LLMs can code significant real-world things yet and I'm not sure how else to prove that since they can code some interesting things. All the talk about putting programmers out of work has me calling BS but also thinking "show me". This task seems like a good combination of simple requirements, not much documentation, real world existing problem, non-trivial code size, limited scope.


I agree. I tried something similar: a conversion of a simple PHP library from one system to another. It was only like 500 loc but Gemini 2.5 completely failed around line 300, and even then its output contained straight up hallucinations, half-brained additions, wrong namespaces for dependencies, badly indented code and other PSR style violations. Worse, it also changed working code and broke it.


Try asking it to generate a high-level plan of how it's going to do the conversion first, then to generate function definitions for the new functions, then have it generate tests for the new functions, then actually write them, while giving it the output of the tests.

It's not like people just one-shot a whole module of code, why would LLMs?
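The staged workflow described above could be scripted along these lines. This is a sketch under assumptions: `model` stands for whatever LLM call you use, `run_tests` for your test harness (returning an empty string on success), and the prompt wording is invented for illustration.

```python
from typing import Callable

Model = Callable[[str], str]            # prompt -> model reply
TestRunner = Callable[[str, str], str]  # (code, tests) -> test output, "" if all pass


def staged_port(model: Model, source: str, run_tests: TestRunner, max_rounds: int = 3) -> str:
    """Plan, define functions, write tests, implement, then iterate on test output."""
    plan = model(f"Write a high-level plan for converting this code:\n{source}")
    signatures = model(f"Following this plan, write only the new function definitions:\n{plan}")
    tests = model(f"Write tests for these function definitions:\n{signatures}")
    code = model(f"Implement these functions so the tests pass:\n{signatures}\n{tests}")
    for _ in range(max_rounds):
        output = run_tests(code, tests)
        if not output:
            break  # tests pass, stop iterating
        code = model(f"The tests produced this output:\n{output}\nFix this code:\n{code}")
    return code


def echo_model(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return f"reply to: {prompt.splitlines()[0]}"


print(staged_port(echo_model, "old code", lambda code, tests: ""))
```

Each stage gets only the previous stage's output, which keeps the prompts small. Whether that beats a single large prompt is exactly what the thread is arguing about.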


> It's not like people just one-shot a whole module of code, why would LLMs?

For conversions between languages or libraries, you often do just one-shot it, writing or modifying code from start to end in order.

I remember 15 years ago taking a 10,000 line Java code base and porting it to JavaScript mostly like this, with only a few areas requiring a bit more involved and non-sequential editing.


I think this shows how the approach LLMs take is wrong. For us it's easy because we simply iterate over every function with a simple prompt to translate it, while still being careful to note anything that may call for a higher-level change.

Maybe the mistake is mistaking LLMs as capable people instead of a simple, but optimised neuron soup tuned for text.


So, you didn't test it until the end? Or did you have to build it in such a way that it was partially testable?


One of the nifty things about the target being JavaScript was that I didn’t have to finish it before I could run it—it was the sort of big library where typical code wouldn’t use most of the functionality. It was audio stuff, so there were a couple of core files that needed more careful porting (from whatever in Java to Mozilla’s Audio Data API, which was a fairly good match), and then the rest was fairly routine that could be done gradually, as I needed them or just when I didn’t have anything better to focus on. Honestly, one of the biggest problems was forgetting to prefix instance properties with `this.`


I know many people who can and will one-shot a rewrite of 500 LOC. In my world, 500 LOC is about the length of a single function. I don't understand why we should be talking about generating a high level plan with multiple tests etc. for a single function.

And I don't think this is uncommon. Just a random example from Github, this file is 1800 LOC and 4 functions. It implements one very specific thing that's part of a broader library. (I have no affiliation with this code.)

https://github.com/elemental/Elemental/blob/master/src/optim...


> I don't understand why we should be talking about generating a high level plan with multiple tests etc. for a single function.

You don't have to, you can write it by hand. I thought we were talking about how we can make computers write code, instead of humans, but it seems that we're trying to prove that LLMs aren't useful instead.


If we have to break the problem into tiny pieces that can be individually tested in order for LLMs to be useful, I think it clearly limits LLM usability to a particular niche of programming.


> If we have to break the problem into tiny pieces that can be individually tested

Isn't this something that we should have been doing for decades of our own volition?

Separation of concerns, single responsibility principle, all of that talk and trend of TDD or at the very least having good test coverage, or writing code that at least can be debugged without going insane (no Heisenbugs, maybe some intermediate variables to stop on in a debugger, instead of just endless chained streams, though opinions are split, at least code that is readable and not 3 pages worth per function).

Because when I see long bits of code that I have to change without breaking anything surrounding them, I don't feel confident in doing that even if it's a codebase I'm familiar with, much less trust an AI on it (at that point it might be a "Hail Mary", a last ditch effort in hoping that at least the AI can find method in the madness before I have to get my own hands dirty and make my hair more gray).


You don't have to, the LLM will.


No, it's simply being demonstrated that they're not as useful as some claim.


By saying "why do I have to use a specific technique, instead of a naive one, to get what I want"?


"Why do I have to put in more work to use this tool vs. not using it?"


Which is exactly what I said here:

https://news.ycombinator.com/item?id=43537443


Isn't that basically the process some "thinking" models try to do for you under the hood? Prompting itself to improve your prompt. I actually have no idea but this is what I guessed it did when using it.


They do some variant of this, but this is more directed. They might not do this on their own, they might follow other lines of reasoning. Maybe more effective, maybe less.


Only 500 lines? That's minuscule.


Did you paste it into the chat or did you use it with a coding agent like Cline?

I am majorly impressed with the combination VSCode + Cline + Gemini

Today I had it duplicate an esp32 program from UDP communication to TCP.

It first copied the file (funnily enough by writing it again instead of just a straight cp). Then it started to change all the headers and declarations. Then in a third step it changed one bigger function. And in the last step it changed some smaller functions.

And it reasoned exactly that way: "Let's start with this first ... Let's now do this ..." until it was done.


I’ve just moved from expensive Claude Code to Cursor and Gemini. What are your thoughts on Cursor vs Cline?

Thank you


> I honestly don't think LLMs can code significant real-world things yet and I'm not sure how else to prove that since they can code some interesting things

In my experience it seems like it depends on what they’ve been trained on

They can do some pretty amazing stuff in python, but fail even at the most basic things in arm64 assembly

These models have probably not seen a lot of GTK3/4 code and maybe not even a single example of porting between the two versions

I wonder if finetuning could help with that


Yes, very much agree, an interesting benchmark. Particularly because it’s in a “tier 2” framework (gtkmm) in terms of amount of code available to train an LLM on. That tests the LLM's ability to plan and problem solve, compared with, say, “convert to the latest version of react”, where the LLM has access to tens of thousands (more?) of similar ports in its training dataset and mostly has to pattern match.


>> Particularly because it’s in a “tier 2” framework (gtkmm) in terms of amount of code available to train an LLM on.

I asked GPT4 to write an empty GTK4 app in C++. I asked for a menu bar with File, Edit, View at the top and two GL drawing areas separated by a spacer. It produced what looked like usable code with a couple lines I suspected were out of place. I did not try to compile it so don't know if it was a hallucination, but it did seem to know about gtkmm 4.


It definitely knows what GTK4 is, when it freaked out on me and lost the code, it was using all gtkmm-4.0 headers, and had the compiler error count down to 10 (most likely with tons of logic errors, but who knows).

But LLM performance varies (and this is a huge critique!) not just on what they theoretically know, but how, erm, cross-linked it is with everything else, and that requires lots of training data in the topic.

Metaphorically, I think this is a little like the difference for humans in math between being able to list+define techniques to solve integrals vs being able to fluidly apply them without error.

I think a big and very valid critique of LLMs (compared to humans) is that they are stronger at "memory" than reasoning. They use their vast memory as a crutch to hide the weaknesses in their reasoning. This makes benchmarks like "convert from gtkmm3 to gtkmm4" both challenging AND very good benchmarks of what real programmers are able to do.

I suspect if we gave it a similarly sized 2kloc conversion problem with a popular web framework in TS or JS, it would one-shot it. But again, it's "cheating" to do this: it's leveraging having read a zillion conversions by humans and what they did.


Programmers who code interesting things likely shouldn’t worry. The legions who code voluminous but shallow corporate apps and glue might be more concerned.


>All the talk about putting programmers out of work

I keep thinking it may be specifically web programmers, given that a lot of the web is essentially CRUD and serves the same function.


Smooth? Nah.

Tom Sawyer? Yes.

