> I think the opposite, MCP is destined to fail for the exact same reason the semantic web failed, nobody makes money when things aren't locked down.
I think this is right. MCP resembles robots.txt evolved into some higher lifeform, but it's still very much "describe your resources for us to exploit them".
The reason the previous agent wave died (it was a Java thing in the 90s) was that eventually everyone realized they couldn't trust their code once it was running on a machine it was supposed to be negotiating with. Fundamentally there is an information asymmetry problem between interacting agents, entirely by design. Take that away and huge swathes of society will stop functioning.
This is unavoidable and the only way to mitigate the negatives is sousveillance.[0]
I reject law enforcement's claims that this will make their lives less safe and that they will need to take mitigating steps, including wearing masks and not giving out their names.[1]
In small towns of old, everyone knew the police and the judge: where they lived, and which schools their children attended, because their own kids may have sat next to them in class. This was fine, and it served as a moderating force against the worst impulses of law enforcement.
If you want to actually implement an ACME client from first principles, reading the RFC (plus related RFCs for JOSE etc) is probably easier than you think. I did exactly that when I made a client for myself.
I also wrote up a digested description of the issuance flow here: https://www.arnavion.dev/blog/2019-06-01-how-does-acme-v2-wo... It's not a replacement for reading the RFCs, but it presents the information in the sequence that you would follow for issuance, so think of it like an index to the RFC sections.
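For a feel of the shape of it, here's a rough sketch of that issuance sequence in Python (per RFC 8555; the post_as_jws helper standing in for the JOSE/JWS signing of RFC 7515 is hypothetical, and error handling/polling is elided):

```python
import requests

DIRECTORY_URL = "https://acme-v02.api.letsencrypt.org/directory"

def post_as_jws(url: str, payload: dict, nonce: str) -> requests.Response:
    """Hypothetical helper: wrap payload in a JWS signed with your
    account key (plus the nonce and url fields) and POST it."""
    raise NotImplementedError

def issue(domain: str) -> None:
    d = requests.get(DIRECTORY_URL).json()                # discover endpoints
    nonce = requests.head(d["newNonce"]).headers["Replay-Nonce"]

    # 1. Create (or find) an account bound to your account key.
    acct = post_as_jws(d["newAccount"], {"termsOfServiceAgreed": True}, nonce)

    # 2. Order a certificate for the identifiers you want.
    order = post_as_jws(
        d["newOrder"],
        {"identifiers": [{"type": "dns", "value": domain}]},
        acct.headers["Replay-Nonce"],                     # fresh nonce per POST
    ).json()

    # 3. For each URL in order["authorizations"]: pick a challenge
    #    (e.g. http-01), serve the key authorization at
    #    /.well-known/acme-challenge/<token>, notify the server,
    #    and poll until the authorization is "valid".

    # 4. POST a CSR to order["finalize"], poll the order until "valid",
    #    then download the cert from its "certificate" URL.
```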
The thing with alchemy was not that their hypotheses were wrong (they eventually created chemistry), but that their method of secret esoteric mysticism over open inquiry was wrong.
Newton is the great example of this: he led a dual life, where in one he did science openly to a community to scrutinize, in the other he did secret alchemy in search of the philosopher's stone. History has empirically shown us which of his lives actually led to the discovery and accumulation of knowledge, and which did not.
Wouldn't that be the best thing possible for our industry? Watching the bandwagoners and "vibe coders" get destroyed and come begging for actual thinking talent would be delicious. I think the bets are equal on whether later LLMs can unfuck current LLM code to the degree that no one needs to be re-hired... but my bet is on your side, that bad code collapses under its own weight. As does bad management in thrall to trends whose repercussions they don't understand. The scenario you're describing is almost too good. It would be a renaissance for the kind of thinking coders you're talking about - those of us who spend 90% of our time considering how to fit a solution to a domain and a specific problem - and it would scare the hell out of the next crop of corner suite assholes, essentially enshrining the belief that only smart humans can write code that performs on the threat/performance model needed to deal with any given problem.
>> the vast majority of an engineer's time isn't spent writing -- it's spent reading and thinking.
Unfortunately, this understanding of how we need to do our job is now extremely rare, both among hirees and the people who hire them. You're lucky if you can find an employer who understands the value of it. But this is what makes a "10x coder": the unpaid time spent lying awake in bed, sleepless, until you can untangle the real logic problems you'll have to turn into code the next day.
This is cool, though the notes in your example look pretty random? Are they actually random, or is it just too modern for me to hear without playing it?
I'm a fairly average pianist, but sight reading is a (relative) strength. Being able to play random notes is definitely part of it, but for me sight-reading is more about getting a sense of the gist of the music (a lot of pattern matching of common phrases, cadences, hand positions etc) - this is kind of subconscious - and then my focus is on keeping my internal version aligned with what's on the page (spotting where the written music does something different or interesting and making sure I hit those notes). The latter part would definitely improve by practicing random notes, but the first bit is more akin to improvisation: you've got some lossy, distilled version of the music in your head (from memory or from your first mental parse of the full manuscript) and you're trying to recreate it (or expound on it).
I think what really helped my reading was having lots of cheap/free sheet music on hand and just trying to play it (simplifying massively if needed, but trying to get the sense of it, even if only playing 20% of the notes)
It kind of is, when they were given $500B and told to make a return in 10-ish years. They have to put the capital in play where it has the largest ROI potential. They are gambling that Jony has another iPhone in him.
I don't know enough about any of this to weigh in on it, but when you take investor money, you aren't supposed to sit on it or do a slow burn (at least not VC money); it's meant to be gasoline, and you moonshot with it.
I'm curious if rust has this problem. The problem I notice in npm land is that many developers have no taste. Example: there's a library for globbing called glob. You'd think it would just be a function that does globbing, but no, the author decided it should ALSO be a standalone commandline executable, and so it includes a large commandline option parser. They could have easily made a separate commandline tool that includes a library that does the globbing, but no; this is a common and shit pattern in npm. I'd say easily 25% or more of all "your dependencies are out of date" messages are related to the argument parsing for the commandline tool in these libraries. That's just one example.
Also, there's arguably design. Should a 'glob' library actually read the file system and give you filenames, or should it just tell you if a string matches a glob and leave the rest to you? I think the latter, the simplest thing, is the better design. It means fewer dependencies and more flexibility: I don't have to hack it or add an option to use my own file system (like for testing), I can use it with a change-monitoring system, etc...
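To make the design difference concrete, here's roughly what the two shapes look like using Python's stdlib equivalents (fnmatch for pure matching, glob for the filesystem-reading flavor):

```python
from fnmatch import fnmatch   # pure matching: no filesystem access
from glob import glob         # "do everything": reads the real filesystem

# The minimal design: the caller supplies the names, the library just
# matches. Works with any source of names -- a real dir, a fake FS in
# tests, a change-monitoring event stream, etc.
names = ["main.py", "notes.txt", "util.py"]
matches = [n for n in names if fnmatch(n, "*.py")]   # ['main.py', 'util.py']

# The all-in-one design: the library reads the filesystem for you, so
# feeding it any other source of names requires hooks or hacks.
files_on_disk = glob("*.py")
```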
And I'm sure there are tons of devs who like that glob is a "do everything for me" library instead of a "do one specific thing" library, which makes it worse, because you get more "internet points" the less your library requires the person using it to be a good dev.
I can't imagine it's any different in rust land, except maybe for the executable thing. There are just too many devs, and all of us, myself included, don't always make the best choices.
That first PR (115733) would make me quit after a week if we implemented this crap at my job and someone forced me to babysit an AI through its PRs in this fashion. The others are also rough.
A wall of noise that tells you nothing of substance, delivered in an authoritative tone as if what it's doing were objective and truthful, immediately followed by:
- The 8 actual lines of code (discounting the tests & boilerplate) it wrote to fix the issue being questioned by the person reviewing the code, who doesn't seem convinced it's actually fixing what it should be fixing
- Not running the "comprehensive" regression tests at all
- When they do run, they fail
- When they get "fixed" oh-so confidently, they still fail. Fifty-nine failing checks. Some of these tests take upward of an hour to run.
So the reviewer here has to read all the generated slop in the PR description and try to grok what the PR is about, read through the changes himself anyway (thankfully it's only a ~50 line diff in this situation, but imagine if this was a large refactor of some sort with a dozen files changed), and then drag it by the hand multiple times to try to fix issues it itself is causing. All the while you have to tag the AI as if it's another colleague and talk to it as if it's not just going to spit out whatever inane bullshit it thinks you want to hear based on the question asked. Test failed? Well, tests fixed! (no, they weren't)
And we're supposed to be excited about having this crap thrust on us, with clueless managers being sold on this being a replacement for an actual dev? We're being told this is what peak efficiency looks like?
It's a bit annoying to see something I think is worth keeping on a website, only to find that the author has excluded their website from the Wayback Machine.
Now the content can't easily be stored in a reliable way. I would have to host it myself and build some way of finding it.
Value-capture pricing is a fantasy often spouted by salesmen. Current-era AI systems have limited differentiation, so the final price will trend towards the cost of running the system.
So far I have not been convinced that any particular platform is more than 3 months ahead of the competition.
I work at OpenAI (not on Codex) and have used it successfully for multiple projects so far. Here's my flow:
- Always run more than one rollout of the same prompt -- they will turn out different
- Look through the parallel implementations, see which is best (even if it's not good enough), then figure out what changes to your prompt would have helped nudge towards the better solution.
- In addition, add new modifications to the prompt to resolve the parts that the model didn't do correctly.
- Repeat loop until the code is good enough.
If you do this and also split your work into smaller parallelizable chunks, you can find yourself spending just a few hours looping between prompt tuning and code review, with massive projects implemented in a short period of time.
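The loop itself is basically best-of-n sampling plus prompt revision. A minimal sketch of its shape, with generate() as a placeholder for whatever model API or CLI you actually drive (none of the names here are Codex's real interface):

```python
def generate(prompt: str) -> str:
    """Placeholder for your actual model call (API client, CLI, etc.)."""
    raise NotImplementedError

def rollouts(prompt: str, n: int = 4) -> list[str]:
    # Same prompt, n independent samples: they will turn out different.
    return [generate(prompt) for _ in range(n)]

prompt = "Implement the converter described in SPEC.md."
accepted = False
while not accepted:
    candidates = rollouts(prompt)
    for i, c in enumerate(candidates):
        print(f"--- candidate {i} ---\n{c}")
    # The human steps: review the candidates, pick the best one,
    # and fold what you learned back into the prompt.
    accepted = input("good enough? [y/n] ") == "y"
    if not accepted:
        prompt += "\n" + input("prompt addition: ")
```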
I've used this for "API munging" but also pretty deep Triton kernel code and it's been massive.
> it could be the phrase "total number of kittens"
There is this minimalism present in math culture. I sort of understand it: when tossing and mixing formulas, it helps to have everything as small as possible. But later, when it's published, it really sucks for readability. Okay, here's an item doing some heavy lifting in this formula; what is it for? Hell if I know, some joker labeled it 'φ'.
I like to joke: if you think programmers are bad at naming things, you should see the mathematicians. They take a perverse pride in their inability to name things.
The worst are programs derived directly from a math paper. If your variable holds the correlation coefficient, call it that. We have thousands of years of language and labels we can use to share our ideas with others; don't encrypt it and call it "rho".
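The fix costs one line of typing. In Python (statistics.correlation is stdlib as of 3.10; the data here is made up):

```python
import statistics

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]

# Straight from the paper: what is rho? Hell if I know.
rho = statistics.correlation(xs, ys)

# Named for the reader: no paper required.
correlation_coefficient = statistics.correlation(xs, ys)
```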
If you are old enough you remember posting to Usenet and the warning that would accompany each new submission:
This program posts news to thousands of machines throughout the entire civilized world. Your message will cost the net hundreds if not thousands of dollars to send everywhere. Please be sure you know what you are doing. Are you absolutely sure that you want to do this? [ny]
Maybe we need something similar in LLM clients. It could be phrased in terms of how many pounds of atmospheric carbon the request will produce.
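Something like this, though every constant below is a made-up placeholder (real per-token energy and grid carbon intensity vary wildly by model and region):

```python
# All constants are illustrative placeholders, not measurements.
JOULES_PER_TOKEN = 0.3       # made-up per-token inference energy
GRAMS_CO2_PER_KWH = 400.0    # made-up grid carbon intensity

def carbon_warning(n_tokens: int) -> str:
    kwh = n_tokens * JOULES_PER_TOKEN / 3.6e6   # joules -> kWh
    grams = kwh * GRAMS_CO2_PER_KWH
    return (f"This request will emit roughly {grams:.2f} g of CO2. "
            "Are you absolutely sure that you want to do this? [ny]")

print(carbon_warning(100_000))
```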
> Google said Amazon doesn’t have a special deal. The company and Amazon declined to offer specifics.
> Google and Amazon say the payment options aren’t new. Google said Amazon was among a few companies that had been able to offer non-Google payment options for their existing customers, under a test program.
"It's not a special deal. It's just that only a few companies can benefit from it."
Converting a dictionary into a list of records when you know that's what you want ... easy, mechanical, boring af, and something we should almost obviously outsource to machines. LLMs are great at this.
Deciding whether to use a dictionary or a stream of records as part of your API? You need to internalize the impacts of that decision. LLMs are generally not going to worry about those details unless you ask. And you absolutely need to ask.
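For the mechanical kind, the whole job is often a one-liner; say you have users keyed by name (made-up data):

```python
users = {"alice": {"age": 30}, "bob": {"age": 25}}

# Dictionary keyed by name -> flat list of records.
records = [{"name": name, **fields} for name, fields in users.items()]
# [{'name': 'alice', 'age': 30}, {'name': 'bob', 'age': 25}]
```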
I had the same reaction when they said that "younger study participants had the most enthusiastic preference for M3 Expressive." Could it be that young people are the most likely to be impressed by pretty bullshit, and that the whole point of this redesign is futile?
It's incredible how bad this keeps getting and how much they ignore formerly well-established UI principles in favour of "vibe design" and pseudoscientific "studies".
What is the explanation for this? What is the reason that even the most well-funded companies in the world fuck this up so bad?
At some point they resize the send button into a circle of comically huge proportions — eating even more space from the actual content — because they did eye-tracking testing and users "find" it in 0.9s instead of 1.6s. Surely there's some explanation for this clinical level of madness.
---
> These factors can be quantified in users’ responses to new M3 Expressive designs. We found a 32% increase in subculture perception, which indicates that expressive design makes a brand feel more relevant and “in-the-know.” We also saw a 34% boost in modernity, making a brand feel fresh and forward-thinking. On top of that, there was a 30% jump in rebelliousness, suggesting that expressive design positions a brand as bold, innovative, and willing to break from convention.
Jesus christ, we're already a sci-fi dystopia and we didn't even realise.
>I don't get the whole "all-in" mentality around LLMs
To be uncharitable and cynical for a moment (and talking generally rather than about this specific post), it yields content. It gives people something to talk about, letting them define their personality by absolutes when in reality the world is infinite shades of gradients.
Go "all in" on something and write about how amazing it is. In a month, write your "why I'm giving up on X" post about the very thing you went all in on, and how relieved/better off you are. It's such an incredibly tired gimmick.
"Why I dumped SQL for NoSQL and am never looking back"
"Why NoSQL failed me"
"Why we at FlakeyCo are all in on this new JavaScript framework!"
"Why we dumped that new JavaScript framework"
This same incredibly boring cycle is seen on here over and over and over again, and somehow people fall for it. Like, it's a huge indicator that the writer more than likely has bad judgment and probably shouldn't be the person to listen to about much.
Like most rational people who use decent judgement (rather than feeling I need to go "all in" on something, as if the more I commit the more real the thing I'm committing to becomes), I leverage LLMs many, many times in my day to day. They have authored approximately zero percent of my actual code, yet they're still a spectacular resource.
That's very neat! I will look at Truffle. The TLA+ interpreter is definitely "weird" in that it does this double duty of both evaluating a predicate while also using that same predicate to extract hints about possible next states. I wonder how well this highly unusual side-effectful pattern can be captured in Truffle.
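For anyone curious what that dual duty looks like, here's a toy illustration in Python (not the actual interpreter): evaluating a conjunct like x' = x + 1 binds a candidate next state as a side effect, while the remaining conjuncts act as ordinary boolean checks on it.

```python
# Illustrative only: a next-state "predicate" like x' = x + 1 /\ x' < 3
# is evaluated for truth, but evaluating x' = e also *produces* a
# candidate next state as a side effect -- the dual duty in question.

def next_states(state: dict) -> list[dict]:
    candidates = []
    # "x' = x + 1": evaluating the equation binds the primed variable.
    nxt = {"x": state["x"] + 1}
    # "x' < 3": with x' now bound, this is an ordinary boolean check.
    if nxt["x"] < 3:
        candidates.append(nxt)
    return candidates

# Enumerate all states reachable from x = 0.
frontier, seen = [{"x": 0}], []
while frontier:
    s = frontier.pop()
    if s not in seen:
        seen.append(s)
        frontier.extend(next_states(s))
print(seen)   # [{'x': 0}, {'x': 1}, {'x': 2}]
```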
Edit: okay the more I look into GraalVM the more impressed I am. I will have to sit down and really go through their docs. Oracle was actually cooking here.
>This proposal is about bots identifying themselves through open HTTP headers.
The problem is that to CF, everything that isn't Chrome is a bot (only a slight exaggeration). So browsers that aren't made by large corporations wouldn't have this. It's like how CF uses CORS.
CORS isn't only CF, but it's an example of their requiring obscure things no one else really uses, and using them in weird ways that cause most browsers to be unable to do it. The HTTP header CA signing is yet another of these things, and the weird modifications of TLS flags fall right in there too. It's basically Proof-of-Chrome via a Gish gallop of new "standards" they come up with.
>Absolutely nothing wrong with this, as it's site owners that make the decision for their own sites.
I agree. It's their choice. I am just laying out the consequences of these mostly uninformed choices. Site owners won't initially be aware that they're blocking a large number of their actual human visitors. I've seen it play out again and again with sites and CF. Eventually the sites are doing so much work maintaining their whitelists of UAs and IPs that one wonders why they use CF at all, if they're doing the job themselves anyway.
And that's not even starting on the bad and aggressive defaults for CF free accounts. In the last month or two they have slightly improved this, so there's some hope. They know they are a problem because they're so big.
"It was a decision I could make because I’m the CEO of a major Internet infrastructure company." ... "Literally, I woke up in a bad mood and decided someone shouldn't be allowed on the Internet. No one should have that power." - Cloudflare CEO Matthew Prince
(ps. You made some good and valid points, re: IETF process status quo, personal choice, etc, it's not me doing the downvotes)
It sucks that we've collectively surrendered the urls to our content to centralized services that can change their terms at any time without any control. Content can always be moved, but moving the entire audience associated with a url is much harder.
I don't understand the reasoning for persisting LLM output that can be generated at any point. If I want to use an LLM to understand someone else's commits, I can use the LLM best suited for that task at the time I need the information, which will likely be more accurate than what was available at the time of the commit and will have access to more context.
I also believe that commit messages should focus on information the code doesn't already convey. Whatever the LLM can generate from looking at your code is likely not the info I'll seek when I read your commit message.