I started a company in this space about 2 years ago. We are doing fine. What we've learned so far is that a lot of these techniques are simply optimisations to tackle some deficiency in LLMs that is a problem "today". These are not going to be problems tomorrow, because the technology will shift - as it has many times over the last 2 years.
So yeah, cool, caching and all of that... but give it a couple of months and a better technique will come out - or more capable models.
Many years ago, when disk encryption on AWS was not an option, my team and I had to spend 3 months coming up with a way to encrypt the disks, and to do it well, because at the time there was no standard way. It was very difficult, as it required pushing encrypted images (as far as I remember). Soon after we started, AWS introduced standard disk encryption that you can turn on by clicking a button. We wasted 3 months for nothing. We should have waited!
What I've learned from this is that oftentimes it is better to do absolutely nothing.
This is the most important observation. I'm getting so many workshop invitations from my corporate colleagues about AI and agents. What most people don't get is that these clever patterns they "invented" will be obsolete next week. That nice company blog post about agents - the one that went viral recently - will be obsolete next month. It's hard for my colleagues to swallow that this age is different: when you studied the Gang of Four or a software architecture pattern book, you had learned a common language; these days the half-life of an AI pattern is about a week. Ask 10 professionals what an agent actually is and you will get 10 different answers, yet each assumes that the way they use the term is the common understanding.
This is also why it's perfectly fine to wait out this AI hype and see what sticks afterward. It probably won't cost too much time to catch up, because at that point everyone who knows what they're doing only learned that a month or two ago anyway.
> It probably won't cost too much time to catch up
That's a risky bet. It is more likely that the user interface of AI will evolve. Some things will stick, some will not. Three years from now, many things that are clunky now will be replaced by more intuitive things. But some things that already work now will still be in place. People who have been heavy users of AI between now and then will definitely have a head start on those who only start then.
In general, I'm not too afraid of UIs - those are usually very learnable in a short amount of time. It's the underlying concepts and abstractions that take more time to pick up, but right now a lot of them seem to be based on observations (or just general "feels") of the behaviour of particular models, of which new ones appear every year.
Counterpoint to these two posts: a journeyman used to have to make his own tools. He could easily have bought them, or his master could have made them. Making your own tools gives you vastly greater skills when using the tools. So I know how fast AI agents and model APIs are evolving, but I'm writing them anyway. Every break in my career has been someone telling me it's impossible and then me doing it anyway. If you use an agent framework, you really have no idea how artificially constrained you are. You're so constrained, and yet you are oblivious to it.
On the "wasting three months" remark (GP): if it's a key value proposition, just do it. Don't wait. If it's not a key value prop, then don't do it at all. Oftentimes what I've built has been better tailored to our product than what AWS built.
I agree with this point. It is about being a craftsman, especially the point about whether it is part of your KVP or not.
In addition to this, if you do have the skills to do it, then you can either patent it or open source it.
This will allow you to be part of the ecosystem, giving you much greater heft in the community. At the very least, if you've done something, put it out there as an alternative to what's being pushed by AWS (or whoever). You never know...
You can make your own hand plane, and you will be a better woodworker for it. Still, in a few months your competition will be using electric planes and routers.
The hand plane vs. the electric plane may not be the right metaphor. It will be more like one hand plane vs. another.
Just because it is a "great big" company pushing it with all their might ($s) doesn't mean it is the best solution out there. There's a lot of people who would prefer the alternative.
Like a previous post said, just make sure it lies in your base competency (which you have if you've developed it) and is part of your key value proposition.
The cult of efficiency aims to turn craftsmanship into something that only concerns hobbyists. Everything else is optimizing money in vs money out, to get as close as possible to revenue being deposited directly into shareholders' bank accounts.
Note that even many of those "long knowledge" things people learned are obsolete today, but the people who follow them just haven't figured it out yet. See how many of those object-oriented design patterns look very silly the minute you use immutable data structures and have access to functional programming constructs in your language - and nowadays most languages have them. Many seminal books on how to program from the early 2000s, especially those covering "pure" OO, look quite silly today.
And yet, despite being largely obsolete in the specifics, the Gang of Four remains highly relevant and useful in the generalities. All these books continue to be absolutely great foundations if you look past their immediate advice.
I think knowing when to do nothing is being able to evaluate if the problem the team is tackling is essential or tangential to the core focus of the project, and also whether the problem is something new or if it's been around for a while and there is still no standard way to solve it.
Yeah, that will be the make-or-break moment, because if it's too essential, it will be implemented, but if it's not, it may become a competitive advantage.
Vehemently disagree. We implemented our own context-editing features 4 months back. Last month, Claude released a very similar feature set to what we'd had all along. We were still glad we did it, because
(A) it took me half a day to do that work
(B) our solution is still more powerful for our use case
(C) our solution works on other models as well.
It all comes down to trying to predict your vendors' roadmap (or, if you're savvy, getting a peek into it) and whether the feature you want to create is fundamental to your application's behavior (I doubt encryption is, unless you're a storage company).
This is the "Wait Calculation" and it's fiendish because there exists only some small, finite window in which it is indeed better to start before the tech is "better" in order to "win" (i.e. get "there" first, wherever "there" is in your scenario).
If we wait long enough, we just end up dead, so it turns out we didn't need to do anything at all whatsoever. Of course there's a balance - oftentimes starting out and growing up with the technology gives you background and experience that becomes an advantage when it hits escape velocity.
These days it seems like training yourself into a specialty that provides steadyish income for a year before someone obliterates your professional/corporate/field’s scaffolding with AI and you have to start over is kind of a win. Doesn’t it feel like a win? Look at the efficiency!
I agree with the sentiment. Things are moving so fast that waiting now is a legitimate strategy. Though it is also easy to fall into the trap of "well, if we continue along these lines, we might as well wait 4-5 years and get AGI" - which, even if true, still feels off imo, because you aren't participating in the process.
One example is that there used to be a whole complex apparatus around getting models to do chain of thought reasoning, e.g., LangChain. Now that is built in as reasoning and they are heavily trained to do it. Same with structured outputs and tool calls — you used to have to do a bunch of stuff to get models to produce valid JSON in the shape you want, now it’s built in and again, they are specifically trained around it. It used to be you would have to go find all relevant context up front and give it to the model. Now agent loops can dynamically figure out what they need and make the tool calls to retrieve it. Etc etc.
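To make the structured-output point concrete, here's roughly what it looks like today - a minimal sketch assuming the OpenAI Python SDK's JSON-schema response format (the model name and the schema are purely illustrative; Anthropic and others expose similar structured-output options):

```python
# Minimal sketch: structured output as a first-class API feature rather than
# prompt gymnastics. Assumes the OpenAI Python SDK; schema/model are illustrative.
from openai import OpenAI

client = OpenAI()

invoice_schema = {
    "name": "invoice",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "total": {"type": "number"},
        },
        "required": ["vendor", "total"],
        "additionalProperties": False,
    },
}

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Extract vendor and total: ACME Corp, $41.50"}],
    response_format={"type": "json_schema", "json_schema": invoice_schema},
)

# The message content is valid JSON matching the schema - no retry/parse loops needed.
print(resp.choices[0].message.content)
```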
LangChain generally felt pointless for me to use - not a good abstraction. If anything, it keeps you from the most important thing you need in this fast-evolving ecosystem: a direct, prompt-level (if you can even call that low-level) understanding of what is going on.
For JSON I agree: now I can just mention JSON and provide examples, and the response always comes back in the right format. But for tool calling and information retrieval, I have never seen a system that actually works, nor have these worked in my tests.
Now, I'm open to the idea that I am just using it wrong, but I have seen several reports around the web that the most people got in tool calling accuracy is 80%, which is unusable for any production system. Also, for info retrieval, I have seen it lose coherence the more data is available overall.
Is there a model that actually achieved 100% tool calling accuracy?
So far I have built systems for that myself, surrounding the LLM, and only that way has it worked well in production.
If we expand this to 3 years, the single biggest shift that totally changed LLM development is the increase in size of context windows from 4,000 to 16,000 to 128,000 to 256,000.
When we were at 4,000 and 16,000 context windows, a lot of effort was spent on nailing down text splitting, chunking, and reduction.
For all intents and purposes, the size of current context windows obviates all of that work.
What else changed?
- Multimodal LLMs - Text extraction from PDFs was a major issue for RAG/document intelligence. A lot of time was wasted trying to figure out custom text extraction strategies for documents. Now, you can just feed the image of a PDF page into an LLM and get back a better transcription.
- Reduced emphasis on vector search. People have found that for most purposes, having an agent grep your documents is cheaper and better than using a more complex RAG pipeline. Boris Cherny created a stir when he talked about Claude Code doing it that way[0] (a rough sketch of that style of retrieval is below).
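To make the grep point concrete, here's a minimal sketch of that style of retrieval: expose a plain text search as a tool and let the agent decide what to search for, instead of embedding and ranking chunks. The function name, the markdown-only glob, and the hit limit are my own illustration, not how Claude Code actually implements it.

```python
# Minimal grep-style retrieval tool an agent can call (illustrative, not Claude Code's code).
import re
from pathlib import Path

def grep_tool(pattern: str, root: str = "docs", max_hits: int = 20) -> list[str]:
    """Return 'path:line: text' hits for a regex across a document tree."""
    hits = []
    for path in Path(root).rglob("*.md"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if re.search(pattern, line, re.IGNORECASE):
                hits.append(f"{path}:{lineno}: {line.strip()}")
                if len(hits) >= max_hits:
                    return hits
    return hits
```

The appeal is exactly that there is no index to build or keep in sync: the agent iterates on the query itself when the first search misses.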
>For all intents and purposes, the size of current context windows obviates all of that work.
Large context windows can make some problems easier or go away for sure. But you may still have the same issue of getting the right information to the model. If your data is much larger than e.g. 256k tokens you still need to filter it. Either way, it can still be beneficial (cost, performance, etc.) to filter out most of the irrelevant information.
>Reduced emphasis on vector search. People have found that for most purposes, having an agent grep your documents is cheaper and better than using a more complex rag pipeline
This has been obvious from the beginning for anyone familiar with information retrieval (R in RAG). It's very common that search queries are looking for exact matches, not just anything with similar meaning. Your linked example is code search. Exact matches/regex type of searches are generally what you are looking for there.
I'm amazed at this question and the responses you're getting.
These last few years, I've noticed that the tone around AI on HN changes quite a bit by waking time zone.
EU waking hours have comments that seem disconnected from genAI. And, while the US hours show a lot of resistance, it's more fear than a feeling that the tools are worthless.
It's really puzzling to me. This is the first time I've noticed such a disconnect in the community about what the reality of things is.
To answer your question personally, genAI has changed the way I code drastically about every 6 months in the last two years. The subtle capability differences change what sorts of problems I can offload. The tasks I can trust them with get larger and larger.
It started with better autocomplete, and now, well, agents are writing new features as I write this comment.
Despite the latest and greatest models… I still see glaring logic errors in the code produced for anything beyond basic CRUD apps. They still make up fields that don't exist and assign nonsensical values to variables. I'll give you an example: in the code in question, Codex assigned a required field LoanAmount a value from a variable called assessedFeeAmount… simply because, as far as I can tell, it had no idea how to get the correct value from the current function/class.
That's why I don't get people who claim to be letting an agent run for an hour on some task. LLMs tend to make so many small errors like that, which are so hard to catch if you aren't super careful.
I wouldn't want to have to review the output of an agent going wild for an hour.
The agent reviews the code. The agent has access to tools. It writes the code, runs it through a test, reads the error, fixes the code, keeps going. It passes the code off to another agent with a prompt to review code and give it notes. They pass it back and forth, another agent reads and creates documentation. It keeps going and passes things back.
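In rough Python, the loop being described is something like the sketch below. The `llm()` helper is hypothetical and stands in for whatever model API you use; pytest is assumed as the test runner, and a real setup would sandbox the execution.

```python
# Minimal sketch of the write -> test -> fix -> review loop described above.
# `llm(prompt)` is a hypothetical helper returning model text.
import subprocess
from pathlib import Path

def write_test_fix(task: str, llm, max_rounds: int = 5):
    code = llm(f"Write solution.py for this task:\n{task}")
    for _ in range(max_rounds):
        Path("solution.py").write_text(code)
        result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        if result.returncode == 0:
            break  # tests pass, stop iterating
        code = llm(
            f"The tests failed with:\n{result.stdout}\n{result.stderr}\n"
            f"Fix the code:\n{code}"
        )
    review = llm(f"Review this code and list remaining issues:\n{code}")
    return code, review
```

Whether the loop converges on anything useful depends entirely on the tests and the reviewer prompt actually catching the small errors the thread above is complaining about.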
Now that's the idea, anyway. Of course they all lie to each other, and there are hallucinations every step of the way. If you want to see a great example, look at the documentation for the TEMU marketplace API. The whole API system - docs, examples, etc. - appears to be vibe coded: lots of nonsensical formatting, methods that don't work, and parameters in examples that just say "test" or "parameters", yet they are presented as working examples with actual response examples (like a normal API) while largely appearing to be just made up!
Who says anyone’s reviewing anything? I’m seeing more and more influencers and YouTubers playing engineer or just buying an app from an overseas app farm. Do you think anyone in that chain gives the first shit what the code is like?
If the LLM can test the code it will fix those issues automatically. That’s how it can keep going for hours and produce something useful. You need to review the code and tests obviously afterwards.
The main line of contention is how much autonomy these agents are capable of handling in a competitive environment. One side generally argues that they should be fully driven by humans (i.e. offloading tedious tasks you know the exact output of but want to save time not doing) while the other side generally argues that AI agents should handle tasks end-to-end with minimal oversight.
Both sides have valid observations in their experiences and circumstances. And perhaps this is simply another engineering "it depends" phenomenon.
The disconnect is quite simple: there are people who are professionals and are willing to put the time in to learn, and then there's the vast majority of others who don't, and who will bitch and moan about how it is shit, etc. If you can't get these tools to make your job easier and more productive, you ought to be looking for a different career…
You're not doing yourself any favors by labeling people who disagree with you as undereducated or uninformed. There are enough over-hyped products/techniques/models/magical-thinking to warrant skepticism. At the root of this thread is an argument (paraphrasing) encouraging people to just wait until someone solves major problems instead of tackling them themselves. This is a broad statement of faith, if I've ever seen one, in a very religious sense: "Worry not, the researchers and foundation models will provide."
My skepticism, and my intuition that AI innovations are not exponential but sigmoid, are not because I don't understand what gradient descent, transformers, RAG, CoT, or multi-head attention are. My statement of faith is: the ROI economics are going to catch up with the exuberance way before AGI/ASI is achieved; sure, you're getting improving agents for now, but that's not going to justify the 12- or 13-digit USD investments. The music will stop, and improvements will slow to a drip.
Edit: I think at its root, the argument is between folks who think AI will follow the same curve as past technological trends, and those who believe "it's different this time".
> labeling people who disagree with you undereducated or uninformed
I did neither of these two things... :) I personally could not care less about
- (over)hype
- 12/13/14/15 ... digit USD investment
- exponential vs. sigmoid
There are basically two groups of industry folk:
1. those that see technology as absolutely transformational and are already doing amazeballs shit with it
2. those that argue how it is bad/not-exponential/ROI/...
If I were a professional (I am), I would do everything in my power to learn everything there is to learn (and then more) and join Group #1. But it is easier to be in Group #2, as being in Group #1 requires time and effort and frustrations and throwing your laptop out the window and ... :)
Mutually exclusive groups 1 and 2 are a false dichotomy. One can have a grasp on the field, keep up to date with recent papers, have an active Claude subscription, use agents, and still have a net-negative view of "AI" as a whole, considering the false promises, hucksters, charlatans, and an impending economic reckoning.
tl;dr version: having negative view of the industry is decoupled from one's familiarity with, and usage of the tools, or the willingness to learn.
> considering the false promises, hucksters, charlatans and an impending economic reckoning.
I hack for a living. I could hardly give two hoots about "false promises" or "hucksters" or some "impending economic reckoning…" I made a general comment that a whole lot of people simply discount technology on technical grounds (a favorite here on HN)…
> I could hardly give two hoots about “false promises” or “hucksters”
I suppose this is the crux of our misunderstanding: I deeply care about the long-term health and future of the field that gave me a hobby that continues to scratch a mental itch with fractal complexity/details, a career, and more money than I ever imagined.
> or some "impending economic reckoning…"
I'm not going to guess if you missed the last couple of economic downturns or rode them out, but an economic reckoning may directly impact your ability to hack for a living, that's the thing you prize.
I see the first half of group 1, but where's the second half? Don't get me wrong, there's some cool and interesting stuff in this space, but nothing I'd describe as close to "amazeballs shit."
You should see what I've seen (and many other people have, too). After 30 years of watching humans do it (fairly poorly, as there is an extremely small percentage of truly great SWEs), the stuff I am seeing is ridiculously amazing.
Can you describe some of it? On one hand, it is amazing that a computer can go from prose to code at all. On the other hand, it’s what I like to describe as a dancing bear. The bear is not a very good dancer, but it’s amazing that it can dance at all.
I’d make the distinction between these systems and what they’re used for. The systems themselves are amazing. What people do with them is pretty mundane so far. Doing the same work somewhat faster is nice, and it’s amazing that computers can do it, but the result is just a little more of the same output.
If there is really amazing stuff happening with this technology, how did we have two recent major outages that were caused by embarrassing problems? I would guess that, at least in the Cloudflare instance, some of the responsible code was AI generated.
Microsoft is saying they're generating 30% of their code now and there's clearly been a lot of stability issues with Windows 11 recently that they've publicly acknowledged. It's not hard to tell a story that involves layoffs, increased pressure to ship more code, AI tools, and software quality issues. You can make subtle jabs about your peers as much as you want but that isn't going to change public perception when you ship garbage.
The whole point is that the outages happened, not that AI code caused them. If AI is so useful/amazing, then these outages should be less common, not more. It's obviously not rock-solid evidence. Yeah, AI could be useful and speed up or even improve a code base, but there isn't any evidence that it's actually improving anything; the only real studies point to imagined productivity improvements.
They're not logistic, this is a species of nonsense claim that irks me even more than claiming "capabilities gains are exponential, singularity 2026!"; it actually includes the exponential-gains claim and then tries to tack on epicycles to preempt the lack of singularities.
Remember, a logistic curve is an exponential (so, roughly, a process whose outputs feed its growth, the classic example being population growth, where more population makes more population) with a carrying capacity (the classic example is again population, where you need to eat to be able to reproduce).
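Concretely, the textbook logistic model is dP/dt = r·P·(1 − P/K): while P is far below the carrying capacity K, the (1 − P/K) term is close to 1 and you are left with plain exponential growth, dP/dt ≈ r·P. That's the sense in which the "it's logistic" claim contains the exponential-gains claim and only adds a ceiling on top of it.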
Singularity 2026 is open and honest, wearing its heart on its sleeve. It's a much more respectable wrong position.
It's disheartening. I have a colleague, very senior, who dislikes AI for a myriad of reasons and doesn't want to adapt unless forced by mgmt. I feel that from 2022-2024 the majority of my colleagues were in this camp - either afraid of AI, or looking at it as something a "real" developer would never use. In 2025 that seemed to change a bit. American HN seemed to adapt more quickly, while EU companies are still lacking the foresight to see what is happening on the grand scale.
I'm pretty senior and I just don't find it very useful. It is useful for certain things (deep code search, writing non-production helper scripts, etc.) and I'm happy to use it for those things, but it still seems like a long way off for it to be able to really change things. I don't foresee any of my coworkers being left behind if they don't adopt it.
AI gives you either free expertise or free time. If you can make software above the level of Gemini or Claude output, then have it write your local tools, or have it write synthetic data for tests, or have it optimize your zshrc or bash profile. Maybe have it implement changes your skip level wants to see made, which you know are amateurish, unsound garbage with revolting UI. Rather than waste your day writing ill-advised but high-quality code just to show them why it's a bad idea, you can have AI write the code for you, to illustrate your point without spending any real work hours on it.
Just in my office, I have seen “small tools” like Charles Proxy almost entirely disappear. Everyone writes/shares their AI-generated solutions now rather than asking cyber to approve a 3rd party envfile values autoloader to be whitelisted across the entire organization.
Senior as well, a few years from finishing up my career. I run 8 to 12 terminals the entire day. They are changing existing code and writing new stuff all day, every day. Hundreds of thousands of lines of changed/added/removed code in production… and a lot fewer issues than when every line was typed in by me (or another human).
What sort of work do you do? I suspect a lot of the differences of opinion here are caused by these systems being a lot better at some kinds of programming than others.
I do lower level operating systems work. My bread and butter is bit-packing shenanigans, atomics, large-scale system performance, occasionally assembly language. It’s pretty bad at those things. It comes up with code that looks like what you’d expect, but doesn’t actually work.
It's good for searching big codebases. "I'm crashing over here because this pointer has the low bit set, what would do that?" It's not consistent, but it's easy to check what it finds, and it saves time overall. It can be good for making tests, especially when given an example to work from. And it's really good for helper scripts. But so far, production code is a no-go for me.
I use GenAI for text translation, text-to-voice and voice-to-text, where it is extremely useful. For coding I often have the feeling it is useless, but sometimes it is useful - like most tools...
Exactly. It's really weird to see all these people claiming these wonderful things about LLMs. Maybe it's really just different levels of amazement, but I understand how LLMs work, and I actually use ChatGPT quite a bit for certain things (searching, asking stuff I know it can find online, discussing ideas or questions I have, etc.).
But all the times I tried using LLMs to help me coding, the best it performs is when I give it a sample code (more or less isolated) and ask it for a certain modification that I want.
More often than not, it makes seemingly random mistakes, and I have to look at the details to see if there's something I didn't catch, so the smaller the scope, the better.
If I ask for something more complex or more broad, it’s almost certain it will make many things completely wrong.
At some point, it's such hard work to detail exactly what you want, with all the context, that it's better to just do it yourself, because you're writing a wall of text for a one-time thing.
But anyway, I guess I remain waiting. Waiting until FreeBSD catches up with Linux, because it should be easy, right? The code is there in the Linux kernel, just tell an agent to port it to FreeBSD.
I'm waiting for the explosion of open source software that isn't bloated and that runs optimized, because I guess agents should be able to optimize code? I'm waiting for my operating system to get better over time instead of worse.
Instead I noticed the last move from WhatsApp was to kill the desktop app and keep a single web wrapper. I guess maintaining different codebases didn't get cheaper with the rise of LLMs? Who knows. Now Windows releases updates that break localhost. Ever since the rise of LLMs I haven't seen software release features any faster, or any Cambrian explosion of open source software copying old commercial leaders.
I think it is an interesting thought experiment to try to visualize 2025 without the internet ever existing because we take it completely for granted that the internet has made life better.
It seems pretty clear to me that culture, politics and relationships are all objectively worse.
Even with remote work, I am not completely sure I am happier than when I used to go to the office. I know I am certainly not walking as much as I did when I would go to the office.
Amazon is vastly more efficient than any kind of shopping in the pre-internet days but I can remember shopping being far more fun. Going to a store and finding an item I didn't know I wanted because I didn't know it existed. That experience doesn't exist for me any longer.
Information retrieval has been made vastly more efficient, so instead of spending huge amounts of time at the library, I get all of that back in free time. But what I would have spent my free time doing before the internet has largely disappeared.
I think we want to take the internet for granted because the idea that the internet is a long term, giant mistake is unthinkable to the point of almost having a blasphemous quality.
Childhood? Wealth inequality?
It is hard to see how AI as an extension of the internet makes any of this better.
Chlorofluorocarbons, microplastics, UX dark patterns, mass surveillance, planned obsolescence, fossil fuels, TikTok, ultra-processed food, antibiotic overuse in livestock, nuclear weapons.
It's a defensible claim I think. Things that people want are not always good for humanity as a whole, therefore things can be useful and also not good for humanity as a whole.
There was some confusion. I originally read Wiseowise's comment as a failure to think of anything that could be "useful but bad for humanity". But given the followup response above I assume they're actually saying that LLMs are similar to tools like the Internet or Wikipedia and therefore should simply not be in the bad for humanity category.
Whether that's true or not, it is a different claim which doesn't fit the way I responded. It does fit the way Libidinalecon responded.
> EU waking hours have comments that seem disconnected from genAI. And, while the US hours show a lot of resistance, it's more fear than a feeling that the tools are worthless.
I don't think it's because the audience is different but because the moderators are asleep when Europeans are up. There are certain topics which don't really survive on the frontpage when moderators are active.
I'm unsure how you're using "moderators." We, the audience, are all 'moderators' if we have the karma. The operators of the site are pretty hands-off as far as content in general.
This would mean it is because the audience is different.
I'm sure this site works quite differently from what you say. There's no paid team of moderators flicking stories and comments off the site because management doesn't like them.
There's dang who I've seen edit headlines to match the site rules. Then there's the army of users upvoting and flagging stories, voting (up and down) and flagging comments. If you have some data to backup your sentiments, please do share it - we'd certainly like to evaluate it.
My email exchanges with Dang, as part of the moderation that happens around here, have all been positive
1. I've been moderated, got a slowdown timeout for a while
2. I've emailed about specific accounts, (some egregious stuff you've probably never seen)
3. Dang once emailed me to ask why I flagged a story that was near the top, but getting heavily flagged by many users. He sought understanding before making moderation choices
I will defend HN moderation people & policies 'til the cows come home. There is nothing close to what we have here on HN, which is largely about us being involved in the process and HN having a unique UX and size
dang announced they were moved from volunteer to paid position a few years ago. More rumblings about more mods brought on since then. What makes you say you're "so sure"?
> There's no paid team of moderators flicking stories and comments off the site because management doesn't like them.
Emphasis mine. The question is does the paid moderation team disappear unfavorable posts and comments, or are they merely downranked and marked dead (which can still be seen by turning on showdead in your profile).
The by far more common action is for the mods to restore a story which has been flagged to oblivion by a subset of the HN community, where it then lands on the front page because it already has sufficient pointage
It's not controversial to say that submissions are being moderated, that's how this (and many other) sites work. I haven't made any claims about how often it happens, or how it relates to second-chance moderation.
What I'm pointing out is just that moderation isn't the same at different times of the day and that this sometimes can explain what content you see during EU and US waking hours. If you're active during EU daytime hours and US morning hours, you can see the pattern yourself. Tools like hnrankings [1] make it easy to watch how many top-10 stories fall off the front page at different times of day over a few days.
> I’m referring to the actual moderators of this website removing posts from the front page.
This is what you said. There has only been one until this year, so now we have two.
The moderation patterns you see are the community's doing, and certainly have significant time factors that play into that. The idea that someone is going into the system and making manual changes to remove content is the conspiracy theory.
Anything sovereign AI or whatever is gone immediately when the mods wake up.
Got an EU cloud article? Publish it at 11am CET, and it disappears around 12:30.
On the foundational level: test-time compute (reasoning), heavy RL post-training, 1M+ context lengths, etc.
On the application layer, connecting with sandboxes/VMs is one of the biggest shifts (Cloudflare's Code Mode, etc.). Giving an LLM a sandbox unlocks on-the-fly computation, calculations, RPA - anything really.
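Mechanically, "giving an LLM a sandbox" can be as simple as the sketch below: run model-generated Python in a separate process with a timeout and feed stdout/stderr back as the tool result. This is just an illustration of the pattern, not Cloudflare's or anyone's actual implementation; a production sandbox would also isolate the filesystem and network.

```python
# Minimal sketch of a code-execution tool for an LLM (illustrative only).
import subprocess
import sys
import tempfile

def run_in_sandbox(generated_code: str, timeout_s: int = 10) -> str:
    """Execute model-generated Python in a subprocess and return its output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        return "error: execution timed out"
```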
MCPs - or rather, standardized function calling - are another one.
Also, local LLMs are becoming almost viable because of better and better distillation, relying on quick web search for facts, etc.
We started putting them in image and video models and now image and video models are insane.
I think the next period of high and rapid growth will be in media (image, video, sound, 3D), not text.
It's much harder to adapt LLMs to solving business use cases with text. Each problem is niche, you have to custom tailor the solution, and the tooling is crude.
The media use cases, by contrast, are low hanging fruit and result in 10,000x speedups and cost reductions almost immediately. The models are pure magic.
I think more companies would be wise to ignore text for now and focus on visual domain problems.
Nano Banana has so much more utility than agents. And there are so many low hanging fruit ways to make lots of money.
Don't sleep on image and video. That's where the growth salient is.
> Nano Banana has so much more utility than agents.
I am so far removed from multimedia spaces that I truly can't imagine a universe where this could be true. Agents have done incredible things for me and Nano Banana has been a cool gimmick for making memes.
Anyone have a use case for media models that'll expand my mind here?
We now have capacity to program and automate in the optics, signals, and spatial domains.
As someone in the film space, here's just one example: we are getting extremely close to being able to make films with only AI tools.
Nano Banana makes it easy to create character and location consistent shots that adhere to film language and the rules of storytelling. This still isn't "one shot", and considerable effort still needs to be put in by humans. Not unlike AI assistance in IDEs requiring a human engineer pilot.
We're entering the era of two person film studios. You'll undoubtedly start seeing AI short films next year. I had one art school professor tell me that film seems like it's turning into animation, and that "photorealism" is just style transfer or an aesthetic choice.
The film space is hardly the only space where these models have utility. There are so many domains. News, shopping, gaming, social media, phone and teleconference, music, game NPCs, GIS, design, marketing, sales, pitching, fashion, sports, all of entertainment, consumer, CAD, navigation, industrial design, even crazy stuff like VTubing, improv, and LARPing. So much of what we do as humans is non-text based. We haven't had effective automation for any of this until this point.
This is a huge percentage of the economy. This is actually the beating heart of it all.
Been thinking about this. Curious why you positioned it as Nano Banana having more utility than agents, when it seems like the next level would be Nano Banana with agents?
> we are getting extremely close to being able to make films with only AI tools
AI still can’t reliably write text on background details. It can’t get shadows right. If you ask it to shoot things from a head on perspective, for example a bookshelf, it fails to keep proportions accurate enough. The bookshelf will not have parallel shelves. The books won’t have text. If in a library, the labels will not be in Dewey decimal order.
It still lacks a huge amount of understanding about how the world works necessary to make a film. It has its uses, but pretending like it can make a whole movie is laughable.
Still impossible at this point and for the foreseeable future. You can prompt all you want, you cannot get an image or video model to create certain scenes, even with a well defined spec
Exactly. You can still open the generations in Photoshop.
I'd say the image and video tools are much further along and much more useful than AI code gen (not to dunk on code autocomplete). They save so much time and are quite incredible at what they can do.
I don't think equating "extremely close" with "pretending like it can" is a fair way to frame the sentiment of the comment you were replying to. Saying something is close to doing something is not the same as saying it already can.
In terms of cinema tech, it took us arguably until the early 1940s to achieve "deep focus in artificial light". About 50 years!
The last couple of years of development in generative video looks, to me, like the tech is improving more quickly than the tech it is mimicking did. This seems unsurprising - one was definitely a hardware problem, and the other is most likely a mixture of hardware and software problems.
Your complaints (or analogous technical complaints) would have been acceptable issues - things one had to work around - for a good deal of cinema history.
We've already reached people complaining about "these book spines are illegible", which feels very close to "it's difficult to shoot in focus, indoors". Will that take four or five decades to achieve, based on the last 3 - 5 years of development?
The tech certainly isn't there yet, nor am I pretending like it is, and nor was the comment you replied to. To call it close is not laughable, though, in the historical context.
The much more interesting question is: At what point is there an audience for the output? That's the one that will actually matter - not whether it's possible to replicate Citizen Kane.
I suspect you're right, but it's a bit discouraging to consider that an alternative way of framing this is that companies like OpenAI have a huge advantage in this landscape and anything that works will end up behind their API.
In some ways, the fact that the technology will shift is the problem as model behavior keeps changing. It's rather maddening unstable ground to build on. Really hard to gauge the impact to customer experience from a new model.
Is JS dev really still so mercurial as it was 5 to 10 years ago? I'm not so sure. Back then, there would be a new topic daily about some new JS framework etc etc.
I still occasionally see a blip of activity but I can't say it's anything like what we witnessed in the past.
Though I will agree that gen AI trends feel reminiscent of that period of JS dev history.
I’m working on a couple apps using Typescript and for me (ex-JS hacker coming back to it after some years) it’s still an insane menu of bad choices and new “better” frameworks, some of which are abandoned before you get done reading the docs. Though I get that it probably moved faster a few years ago.
I settled on what seemed like the most “standard” set of things (marketable skills blabla) and every week I read an article about how that stack is dead, and everybody supposedly uses FancyStack now.
Adding insult to injury, I have relearned the fine art of inline styles. I assume table layouts are next.
To lurch back on topic: I’m doing this for AI-related stuff and yes, the AI pace of change is much worse, but they sure do make a nice feedback loop.
If it is, it's entirely self-inflicted today. There's some tentpole tech that is reliable enough to stick with and get things done. Has been for a while.
You could use the likes of Amazon/Anthropic, or use Google, which has had transparent disk encryption for 10+ years, and Gemini, which already has the transparent caching discussed here built in.
If you've spent any time with the Vertex LLM APIs, you wouldn't be so enthusiastic about using Google's platform (I say this as someone who prefers GCP to AWS for compute and networking).