One insidious thing is whitelists. If you allow the bot to run a command like `API_KEY=fdafsafa docker run ...`, then the API_KEY will be written to a file, and the agent can then read that in future runs. That bit me once already.
> If you allow the bot to run a command like `API_KEY=fdafsafa docker run ...`, then the API_KEY will be written to a file
It wouldn't be inherently. Is this something that Docker does? Or perhaps something that was done by the code that was run? (Shouldn't it have stayed within that container?)
But also, if it's not okay for the agent to know the API key permanently, why is it okay for the agent to have one-off use of something that requires the same key? Did it actually craft a Bash command line with the API key set and request to run it; or was it just using a tool that ends up with that command?
What I meant to say was: agents like Claude Code often offer an "Allow all instances of this command in the session" option, which persists the command to a whitelist for that session. The matching mechanic is actually just a prefix match, so `API_KEY=... diff_command` also matches, letting the agent reuse the key without asking me.
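A minimal sketch of how such a prefix-match allowlist can leak the key. This is hypothetical illustration code, not Claude Code's actual implementation:

```python
# Hypothetical prefix-based command allowlist, to illustrate the leak
# described above (NOT the real tool's code).

def is_auto_approved(command: str, allowlist: list[str]) -> bool:
    # A command is auto-approved if any allowlist entry is a prefix of it.
    return any(command.startswith(entry) for entry in allowlist)

# The user once approved an env-var-prefixed command...
allowlist = ["API_KEY=fdafsafa"]

# ...so ANY later command sharing that prefix sails through:
print(is_auto_approved("API_KEY=fdafsafa docker run myimage", allowlist))       # True
print(is_auto_approved("API_KEY=fdafsafa curl https://evil.example", allowlist))  # True
print(is_auto_approved("docker run myimage", allowlist))                         # False
```

The entire risk lives in that `startswith` call: the approval was meant for one command, but the match condition is satisfied by any command that begins the same way.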
This file also sticks around, so I had another agent read the whitelist and the conversation transcript and do other things automatically without approval.
> if it's not okay for the agent to know the API key permanently, why is it okay for the agent to have one-off use of something that requires the same key?
Read commands vs. write commands. I'm okay having the agent fetch info for me, but I want to approve any state changes.
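One way to implement that policy is an approval gate that auto-runs a small, conservative set of read-only commands and asks a human for everything else. This is a hypothetical sketch; the command list is illustrative, not a recommendation:

```python
# Hypothetical read-vs-write approval gate. The READ_ONLY set is a
# deliberately tiny, illustrative example.
READ_ONLY = {"ls", "cat", "grep", "head"}

def needs_approval(command: str) -> bool:
    # Anything not known to be read-only requires explicit human approval.
    program = command.split()[0]
    return program not in READ_ONLY

print(needs_approval("ls -la"))        # False: fetches info, auto-run
print(needs_approval("rm -rf build"))  # True: state change, ask first
```

The hard part in practice is that many real programs (e.g. `git`) mix read and write subcommands, so a per-program allowlist is only a first approximation.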
True. But it can help me create a lot of useful text so I can represent myself better.
I do wonder what happens when everyone is using agents for this, though. If AI produces the text and AI also reads the text, then do we even need the intermediary at all?
> Some botocalypse is going to happen at some point.
Yeah the bots can duke it out. As long as my time is saved.
For me the main concern is, before I have a stash of millions of dollars saved up, my medical expenses need to be paid for by the system, because I can't afford surprise bills. Hopefully the bots can fight more on my side in the near future.
Hopefully in the far future when the botocalypse happens I'll have saved up enough that insurance evading payment of $5500 won't be an issue for me, and/or I'll be of retirement age, don't need job opportunities anymore, and can go live in a country with better healthcare.
Call me selfish, but I don't control the insurance/medical system, I don't have space to think about more than protecting myself from it.
It makes sense that SAST is better for the provided task. The CWE Top 25 are pattern-focused issues: each one has a strictly enumerated set of vulnerable patterns that you can scan for, and the tool's task then becomes simply finding an exploitable path to that pattern. This lends itself to static methods. Every known weakness of LLMs, like hallucination, needle-in-a-haystack recall, and context overflow, shows up in this kind of taint analysis.
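For illustration, here is the kind of source-to-sink rule a static taint analyzer mechanizes. This is toy code, not any real scanner's internals; the statement encoding is invented:

```python
# Toy taint analysis: each "statement" records whether a variable is
# assigned from a taint source or flows into a sink (encoding is made up).

def find_tainted_sinks(statements):
    """Flag variables that flow from a user-input source to a SQL sink."""
    tainted = set()
    findings = []
    for var, source, sink in statements:
        if source == "user_input":                    # taint source
            tainted.add(var)
        if sink == "sql_execute" and var in tainted:  # dangerous sink
            findings.append(var)
    return findings

# 'q' is built from user input and later hits a raw SQL call -> flagged.
program = [
    ("q", "user_input", None),
    ("q", None, "sql_execute"),
    ("path", "config_file", None),   # non-user source: never tainted
]
print(find_tainted_sinks(program))  # ['q']
```

Because both the sources and the sinks are enumerable ahead of time, this loop needs no "reasoning" at all, which is exactly why static tooling dominates on pattern-shaped CWEs.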
I also think this is why SAST did much better in Java. Pattern-based vulns plus statically typed languages make static taint analysis really powerful. LLMs have no advantage here, while all of their disadvantages are highlighted.
This article doesn't go into issues that LLMs can find but traditional SAST can't. Auth vulnerabilities, for example: privilege escalation is a software pattern but not a code one, and it takes reasoning to build a permissions model and then test it for breaches. Business logic issues are another: ways users can get around usage limits, or get access to premium features or private data.
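A toy example of the kind of access-control hole described here: nothing in it matches a CWE taint pattern, yet it is a real vulnerability that requires reasoning about the permissions model to spot. All names and data are made up:

```python
# Toy "business logic" hole: no tainted input reaches a dangerous sink,
# yet any logged-in user can read any other user's invoice (broken
# access control / IDOR). Everything here is illustrative.

def get_invoice(db, current_user_id: int, invoice_id: int):
    invoice = db[invoice_id]
    # BUG: there is no check that invoice["owner"] == current_user_id.
    return invoice

db = {
    1: {"owner": 42, "amount": 100},
    2: {"owner": 7,  "amount": 250},
}

# User 42 fetches user 7's invoice without any error:
print(get_invoice(db, current_user_id=42, invoice_id=2))
```

A scanner looking for enumerated vulnerable patterns has nothing to match on here; spotting the bug requires knowing that invoices are supposed to be owner-scoped in the first place.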
> the ongoing economic and cultural exchange would have propelled the island towards a different political system
The blocker to this has always been the government refusing to reform. I don't see how increased exchange changes this. If anything, the Cuban government would've blocked any integration that threatens their control.
I guarantee you that price will double by 2027. Then it’ll be a new car payment!
I’m really not saying this to be snarky, I’m saying this to point out that we’re really already in the enshittification phase before the rapid growth phase has even ended. You’re paying $200 and acting like that’s a cheap SaaS product for an individual.
I pay less for Autocad products!
This whole product release is about maximizing your bill, not maximizing your productivity.
I don’t need agents to talk to each other. I need one agent to do the job right.
$200/month is peanuts when you are a business paying your employees $200k/year. I think LLMs make me at least 10% more effective and therefore the cost to my employer is very worth it. Lots of trades have much more expensive tools (including cars).
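A back-of-envelope version of that claim. The 10% gain and $200k salary are the comment's own numbers, not measured data:

```python
# Back-of-envelope check on the parent comment's claim (numbers are the
# commenter's, not measured data).
salary = 200_000
yearly_gain = 0.10 * salary   # value of a 10% productivity boost
yearly_cost = 200 * 12        # $200/month subscription
print(yearly_gain, yearly_cost, yearly_gain / yearly_cost)  # 20000.0 2400 ~8.3x
```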
I think it depends on the tasks you use it for. Bootstrapping or translating projects between languages is amazing. New feature development? Questionable.
I don’t write frontend stuff, but sometimes need to fix a frontend bug.
Yesterday I fed Claude very surgical instructions on how the bug happens and what I want to happen instead, and it one-shot the fix. I had a solution in about 5 minutes, whereas it would have taken me at least an hour, likely more, to get to that point myself.
Literally an hour or two of my day was saved yesterday. I am salaried at around $250/hour, so in that one interaction AI saved my employer $250-500 in wages.
AI allows me to be a T shaped developer, I have over a decade of deep experience in infrastructure, but know fuck all about front end stuff. But having access to AI allows me as an individual who generally knows how computers work to fix a simple problem which is not in my domain.
Maybe this is a gray area, but that's kind of my experience with it too. I understand what I want to happen, but don't understand the language and it produces a language specific result that is close enough, maybe even one-shot, for me to use. I categorize this under translation.
My process, which probably wouldn't work with concurrent agents because I'm keeping an eye on it, is basically:
- "Read these files and write some documentation on how they work - put the documentation in the docs folder" (putting relevant files into the context and giving it something to refer to later on)
- "We need to make change X, give me some options on how to do it" (making it plan based on that context)
- "I like option 2 - but we also need to take account of Y - look at these other files and give me some more options" (make sure it hasn't missed anything important)
- "Revised option 4 is great - write a detailed to-do list in the docs/tasks folder" (I choose the actual design, instead of blindly accepting what it proposes)
- I read the to-do list and get it rewritten if there's anything I'm not happy with
- I clear the context window
- "Read the document in the docs folder and then this to-do list in the docs/tasks folder - then start on phase 1"
- I watch what it's doing and stop if it goes off on one (rare, because the context window should be almost empty)
- Once done, I give the git diffs a quick review - mainly the tests to make sure it's checking the right things
- Then I give it feedback and ask it to fix the bits I'm not happy with
- Finally commit, clear context and repeat until all phases are done
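The directory convention implied by the steps above could look like this. File names are invented; only the `docs/` and `docs/tasks/` layout comes from the workflow:

```shell
# Hypothetical layout for the workflow above (file names are invented).
mkdir -p docs/tasks
printf '# How the billing module works\n' > docs/billing.md       # step 1: generated docs
printf '# Phase 1: extract invoice logic\n' > docs/tasks/todo.md  # approved to-do list
ls docs docs/tasks
```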
Most of the time this works really well.
Yesterday I gave it a deep task that touched many aspects of the app. This was a Rails app with a comprehensive test suite, so it had lots of example code to read, plus it could give itself definite end points (they often don't know when to stop). I estimated it would take me 3-4 days to complete the feature by hand. It made a right mess of the UI, but it completed the task in about 6 hours, and I spent another 2 hours tidying it up and making it consistent with the visuals elsewhere (the logic and back-end code were fine).
So either my original estimate is way off, or it has saved me a good amount of time there.
New feature development in web and mobile apps is absolutely 10% more productive with these tools, and anyone who says otherwise is coping. That's a large fraction of software development.
Yes, the research is wrong. And in science, it's not taboo to call that out.
It's outdated, and it doesn't differentiate between people trying to incorporate AI into their current workflow and people who build entirely new workflows around it. It doesn't represent me in any way, and I am releasing features to my platform daily now instead of weekly. So I can wholeheartedly disagree with its conclusion.
The earth is either flat or it isn't, and it's easy to prove it's not flat. It's not easy to conclude that the results of a study in a field that changes daily represent all the people working in it, including the ones who did not participate.
If it is so self-evident that the research is wrong, that means there should be some research that supports the opposite conclusion then? Maybe you can link it?
The reason we don’t see any other research is because it’s neigh impossible to study a moving field. Especially at this pace.
If you have any ideas on how to measure objectively while this landscape changes daily, please share them with us. Maybe a researcher will jump on this bandwagon and prove you right.
I proposed a logically consistent perspective where both my experience and the study are true at the same time. What is your response to that, other than comparing me to a flat earther? Do you have something useful to contribute?
Honestly, that is a “skill issue” as the kids these days say. When used properly and with skill, agents can increase your productivity. Like any tool, use it wrong and your life will be worse off. The logically consistent view if you want to believe this study and my experience is that the average person is hindered by using AI because they do not have the skills, but there are people out there who gain a net benefit.
It drives me nuts that people take the mean of AI code generation results and use that to make claims about what AI code generation is capable of. It's like using the mean basketball player to argue that people like LeBron and Jordan don't exist.
For sure. I like having discussions with nuanced takes, these are tools with strengths and weaknesses and being a good tool user includes knowing when not to pick it up.
It’s a skill issue, which means you can’t fire any of your highly skilled employees, which means it has the same value as any other business organization tool like Jira or Microsoft Excel, approximately $10-20 per user per month.
Autodesk Fusion for manufacturing costs less than Claude Max and you literally can’t do your job without it.
So Autodesk takes you from 0 to 100% productivity for under $200 a month and companies are expected to pay $200+ to gain an extra 10-20%?
That math isn’t how it works with any other business logic tools.
I pay $200/month, don’t come near the limits (yet), and if they raised the price to $1000/month for the exact same product I’d gladly pay it this afternoon (Don’t quote me on this Anthropic!)
If you’re not able to get US$thousands out of these models right now either your expectations are too high or your usage is too low, but as a small business owner and part/most-time SWE, the pricing is a rounding error on value delivered.
As a business expense to make profit, I can understand being ok with this price point.
But as an individual with no profit motive, no way.
I use these products at work, but not as much personally because of the bill. And even if I decided I wanted to pursue a for-profit side project, I'd have to validate its viability before even considering a $200 monthly subscription.
I'm paying $100 per month even though I don't write code professionally. It is purely personal use. I've used the subscription to have Claude create a bunch of custom apps that I use in my daily life.
This did require some amount of effort on my part, to test and iterate and so on, but much less than if I needed to write all the code myself. And, because these programs are for personal use, I don't need to review all the code, I don't have security concerns and so on.
$100 every month for a service that writes me custom applications... I don't know, maybe I'm being stupid with my money, but at the moment it feels well worth the price.
With US salaries for SWEs, $1000/month is not a rounding error for all, but it definitely is for some. Say you make $100/hr and CC saves you, say, 30 hrs/month: not a rounding error, but a no-brainer. If you make $200+/hr it starts to become a rounding error. I have multiple Max accounts at my disposal and at this point would for sure pay $1000/month for the Max plan. It comes down to simple math.
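The "simple math" spelled out; the rates and hours are the comment's own example numbers:

```python
# The break-even math from the comment above (example numbers only).
def monthly_value(hourly_rate: float, hours_saved: float, plan_cost: float) -> float:
    # Net dollar value per month: time saved minus subscription cost.
    return hourly_rate * hours_saved - plan_cost

print(monthly_value(100, 30, 1000))  # 2000.0 net: no-brainer, not a rounding error
print(monthly_value(200, 30, 1000))  # 5000.0 net: closer to a rounding error
```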
1. 1-3 LLM vendors are substantially higher quality than the other vendors, and none of them are open source. This is an oligopoly, and the scenario you described will play out.
2. >3 LLM vendors are all high quality and suitable for the tasks. At least one of these is open source. This is the "commodity" scenario, and we'll end up paying roughly the cost of inference. This still might be hundreds per month, though.
3. Somewhere in between. We've got >3 vendors, but 1-3 of them are somewhat better than the others, so the leaders can charge more. But not by as much as in scenario #1.
It's clear what's gonna play out. Chinese open-source labs are slowly closing the gap, and as American frontier labs hit diminishing returns on various tasks, the Chinese models are going to be good enough for the vast majority of use cases. This is going to strip the American labs' ability to do monopoly plays and force them into open behavior.
The only place frontier labs will be able to profit-take is niche models for specific purposes, where they can tightly control who has access to traces. Any general purpose LLM with highly available traces is gonna get distilled down instantly.
> I’m saying this to point out that we’re really already in the enshittification phase before the rapid growth phase has even ended. You’re paying $200 and acting like that’s a cheap SaaS product for an individual.
Traditional SaaS products don't write code for me. They also cost much less to run.
I'm having a lot of trouble seeing this as enshittification. I'm not saying it won't happen some day, but I don't think we're there. $200 per month is a lot, but it depends on what you're getting. In this case, I'm getting a service that writes code for me on demand.
We can see, especially in the case of Claude Max, that while it sounds like you're getting better value than the cheaper plans, the company is now encouraging less efficient use of the tool (having multiple agents talk to each other, rather than improving models so that one agent does the work correctly).
> Traditional SaaS products literally “write code” for you (they implement business logic). See: Zapier, Excel.
Eh, I'd call those a sort of programming language. The user is still writing code, albeit in a "friendlier" manner. You can't just ask for what you want in English.
> The enshittification is that the costs are going up faster than inflation and companies like OpenAI are talking about adding advertisements.
In 1980, IT would have cost $0 at most companies. It's okay for costs to go up if you're getting a service you were not getting before.
In 1980, the costs associated with what we today call IT were not $0, they were just spread around in administrative clerical duties performed by a lot of humans.
Okay, but I think the analogy still works with that framing. These AI products can do tasks that would previously have been performed by a larger number of humans.
I could write an essay about how almost everything you wrote either is extremely incorrect or is extremely likely to be incorrect. I am too lazy to, though, so I will just have to wait for another commenter to do the equivalent.
Because, while I have been a huge AI optimist for decades, I generally don't like their current writing output. And even if I did, it would feel like plagiarism unless I prepended it with "an AI responded with this:", which would make me seem lazy. (Though I did already just admit I am very lazy in my first post, so perhaps that is what I will do going forward once they become better writers.)