There's a lot of beautiful writing on these topics on the "pure math" side, but it's hard to figure out what results are important for deep learning and to put them in a form that doesn't take too much of an investment in pure math.
I think the first chapter of [1] is a good introduction to general facts about high-dimensional stuff. I think this is where I first learned about "high-dimensional oranges" and so on.
For something more specifically about the problem of "packing data into a vector" in the context of deep learning, last year I wrote a blog post meant to give some exposition [2].
One really nice approach to this general subject is to think in terms of information theory. For example, take the fact that, for a fixed epsilon > 0, we can find exp(C d) vectors in R^d with all pairwise inner products smaller than epsilon in absolute value. (Here C is some constant depending on epsilon.) People usually find this surprising geometrically. But now, say you want to communicate a symbol by transmitting d numbers through a Gaussian channel. Information theory says that, on average, I should be able to use these d numbers to transmit C d nats of information. (C is called the channel capacity, and depends on the magnitude of the noise and e.g. the range of values I can transmit.) The statement that there exist exp(C d) vectors with small inner products is related to a certain simple protocol to transmit a symbol from an alphabet of size exp(C d) with small error rate. (I'm being quite informal with the constants C.)
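If you want to poke at this numerically, here's a small sketch (illustrative constants, nothing rigorous): sample random unit vectors in R^d and look at the largest pairwise |inner product|. Random vectors only witness the existence claim with high probability, but that's enough to see the effect.

    # Sketch: exponentially many random unit vectors in R^d stay nearly
    # orthogonal. n grows like exp(C*d) for a small illustrative C.
    import numpy as np

    rng = np.random.default_rng(0)

    def max_abs_inner_product(n, d):
        v = rng.standard_normal((n, d))
        v /= np.linalg.norm(v, axis=1, keepdims=True)  # unit vectors
        g = np.abs(v @ v.T)                            # pairwise |<v_i, v_j>|
        np.fill_diagonal(g, 0.0)                       # drop the trivial diagonal
        return g.max()

    for d in (64, 256, 1024):
        n = 100 + int(np.exp(0.005 * d))   # exponential in d, small constant
        print(d, n, round(max_abs_inner_product(n, d), 3))

The printed maximum shrinks as d grows even though n is growing exponentially, which is the geometric fact the channel-capacity story exploits.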
This is a hacky joke. No sane engineer would ever sign off on this. Even for a 1-5 person team, why would I want a probabilistic selection of test execution?
The solution of running only e2e tests on affected files has been around since long before LLMs. This is a band-aid on poor CI.
I have worked at large, competent companies, and the problem of "which e2e tests to execute" is significantly more complicated than you suggest. I've worked with smart engineers who put a lot of time into this problem only to get middling results.
How does that reconcile with the article, which states:
> Did Claude catch all the edge cases? Yes, and I'm not exaggerating. Claude never missed a relevant E2E test. But it tends to run more tests than needed, which is fine - better safe than sorry.
If you have some particular issue with the author's methodology, you should state that.
If you have some particular issue with the article, you should state that. Otherwise, the most charitable interpretation of your position I can come up with is "the article is wrong for some reason I refuse to specify", which doesn't lead to a productive dialogue.
I think you're the one being uncharitable here. The meaning of what he's saying is very clear. You can't say this probabilistic method (using LLMs to decide your e2e test plan) works if you only have a single example of it working.
It's really not clear. Using probabilistic methods to determine your e2e test plan is already best practice at large tech shops, and to be quite honest the heuristics that they used to use were pretty poor and arbitrary.
The author said they used Claude to decide which E2E tests to run and "Claude never missed a relevant E2E test."
How many times did they conduct this experiment? Over how long a period? How did they determine which tests were relevant and that Claude didn't miss them? Did they try it on more than one project?
My point was that none of this tells me this will work in general.
If the author can keep the whole function code_change -> relevant E2E_TESTS in his head, it seems to be a trivial application.
We don't know the methodology, since the author does not state how he verified that function or how he would verify the function for a large code base.
It seems to me like we have the answers to all those questions.
- Do we know which projects people work on?
It's pretty easy to discover that OP works on https://livox.com.br/en/, a tool that uses AI to let people with disabilities speak. That sounds like a reasonable project to me.
- Do we know which codebases (greenfield, mature, proprietary etc.) people work on?
The e2e tests took 2 hours to run and the website quotes ~40M words. That is not greenfield.
- Do we know the level of expertise the people have?
It seems like they work on nontrivial production apps.
- How much additional work did they have reviewing, fixing, deploying, finishing etc.?
I think you might be confusing end-to-end (E2E) tests with other types of testing, such as unit and integration tests. No one is advocating this approach for unit tests, which should still run in their entirety on every pull request.
Running all E2E tests in a pipeline isn't feasible due to time constraints (it takes hours). Most companies just run these tests nightly (and we still do), which means we would still catch any issues that slip through the initial screening. But so far, nothing has.
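For the curious, here's a rough sketch of what this kind of pre-screening step could look like in CI. It's a minimal illustration rather than our actual setup; the model name, the e2e/ directory layout, and the prompt are assumptions, and the nightly full run still acts as the safety net.

    # Hypothetical CI step: ask an LLM which E2E tests a diff plausibly affects.
    # Assumptions: tests live under e2e/, ANTHROPIC_API_KEY is set, and the
    # model id is swapped for whatever you actually have access to.
    import subprocess
    import anthropic

    diff = subprocess.run(["git", "diff", "origin/main...HEAD"],
                          capture_output=True, text=True).stdout
    tests = subprocess.run(["ls", "e2e"], capture_output=True, text=True).stdout

    prompt = (
        "List the E2E test files below that could plausibly be affected by "
        "this diff, one per line. Err on the side of including more.\n\n"
        f"Tests:\n{tests}\nDiff:\n{diff}"
    )

    client = anthropic.Anthropic()
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # assumption; use your model id
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    selected = [l.strip() for l in msg.content[0].text.splitlines() if l.strip()]
    print("\n".join(selected))  # hand this list to the E2E runner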
> The solution of running only e2e tests on affected files has been around since long before LLMs.
This doesn't work in distributed systems, since changing the behavior of one file that's compiled in one binary can cause a downstream issue in a separate binary that sends a network call to the first. e.g. A programmer makes a behavioral change to binary #1 that falls within defined behavior, but encounters Hyrum's Law because of a valid behavior of binary #2.
I completely agree with the thesis here. I also have not seen a massive productivity boost with the use of AI.
I think that there will be neurological fatigue occurring whereby if software engineers are not actively practicing problem-solving, discernment, and translation into computer code - those skills will atrophy...
Yeah, AI is not the 2x or 10x technology of the future™ it was promised to be. It may be the case that any productivity boost is happening within existing private code bases. Even still, there should be a modest uptick in noticeably improved software being deployed to the market, which does not appear to be there.
In my consulting practice I am seeing this phenomenon regularly, whereby new founders or stir-crazy CTOs push the use of AI and ultimately find that they're spending more time wrangling a spastic code base than they are building shared understanding and working together.
I have recently taken on advisory roles and retainers just to reinstill engineering best practices.
> I think that there will be neurological fatigue occurring whereby if software engineers are not actively practicing problem-solving, discernment, and translation into computer code - those skills will atrophy...
I've found this to be the case with most (if not all) skills, even riding a bike. Sure, you don't forget how to ride it, but your ability to expertly articulate with the bike in a synergistic and tool-like way atrophies.
If that's the case with engineering, and I believe it to be, it should serve as a real warning.
Yes and this is the placid version where lazy programmers elect to lighten their cognitive load by farming out to AI.
An insidious version is AGI replacing human cognition.
To replace human thought is to replace a biological ability which progresses on evolutionary timescales, not on a Moore's-law-like curve. The tissue in your skull will quite literally be as useful as a cow's for solving problems... think about that.
Automating labor in the 20th century disrupted society, and we've seen its consequences. Replacing cognition entirely (driving, writing, decision making, and communication) yields far worse outcomes than transitioning the population from food production to knowledge work.
If not our bodies and not our minds, then what do we have? (Note: Altman's universal basic income ought to trip every dystopian alarm bell).
Whether adopted passively or foisted on us actively, cognition is what makes us human. Let's not let Claude Code be the nexus for something worse.
There's no connection between AI and AGI, apart from hopes. Besides which, if you're talking about AGI, you're talking about artificial people. That means:
• They don't really want to be servants.
• They have biases and preferences.
• Some of them are stupid.
• If you'd like to own an AGI that thinks for you, the AGI would also like one.
• They are people with cognition, even if we stop being.
AGI just means what it says: Artificial General Intelligence. AGIs don't have to have selfish traits like we do, and they don't have to follow the rules of natural selection; they just need to solve general problems.
Think of them like worker bees. Bees can solve general problems, though not on the level humans do; they are like some primitive kind of AGI. They also live and die as servants to the queen and don't want to be queens themselves. The reason why is interesting, btw: it involves genetics and game theory.
This is highly theoretical anyways, we have no idea how to make an AGI yet, and LLMs are probably a dead end as they can't interact with the physical world.
These postulated entities are by definition people. Not humans, because they lack the biology, but that's a detail.
If you think they're going to be trained on all the world's data, that's still supposing them to be an extension of AI. No, they'll have to pick up their knowledge culturally, the same way everybody else does, by watching cartoons - I mean by interactions with mentors. They might have their own culture, but only the same way that existing groups of people with a shared characteristic do, and they can't weave it out of air; it has to derive from existing culture. There's a potential for an AGI to "think faster", but I'm skeptical about what that amounts to in practice or how much use it would be to them.
> These postulated entities are by definition people.
Why? Does your definition postulate that people are the only thing in the universe that can measure up to us? Or the inverse, that every entity as sentient and intelligent as us must be called a person?
My opinion is that a lot of what makes us like this is physiological. Unless the developers go out of their way to simulate these things, a hypothetical AGI won't be similar to us no matter how much human-made content it ingests. And why would they do that? Why would you want to implement physical pain, or fear, or human needs, or biases and fallacies driven from our primal instincts? Would implementing all these things even be possible at the point where we find an inroad towards AGI? All of that might require creating a comprehensive human brain simulation, not just a self-learning machine.
I think it's almost certain that, while there would be some mutual understanding, an AGI would feel like a completely different species to us.
The latter: intelligence is one thing, and to imagine that an artificial intelligence would be some kind of beyond-intelligence, a beyond-person, is to needlessly multiply entities. The assumption should be that there's only (the potential to create) people like us, because to imagine beyond-people is to get mystical about it. "Beyond-rats" is what I say to that.
I have sympathy with the point about physiology, though, I think being non-biological has to feel very different. You're released from a lot of the human condition, you're not driven by hormones or genes, your plans aren't hijacked to get you to reproduce or eat more or whatever animal thing, you don't have the same needs. That's all liable to alienate you from the meat-based folk. However, you're still a person.
Same - I use it at work at a big tech company and the real-world efficiency gains on net are probably nonexistent. We have multiple large and not-so-large codebases. For a super trivial script or creating a struct from documentation it does the thing - great. For unit tests it's about 50-50 whether it's useful or I waste a few hours and delete the change set. In any moderately complex codebase Claude Sonnet or GPT in agent mode builds unneeded complexity, gets lost in a spiraling number of nonsense steps, and constantly builds things that already exist in the codebase. In the best outcome I have to edit and review so heavily that it's like jumping in on someone else's PR halfway through and having to grok what the heck they misunderstood.
The only actual net positive is the Claude.md that some people maintain - it's a good context dump for new engineers!
What most of these comments are missing is the attempt at standardization and unification.
There are a lot of comments that people need X feature in order to switch to Y editor. While that may be true and your particular workflow requires certain features, what is overlooked is the survival pressure for editors.
It appears that our industry is moving towards adoption, sometimes mandatory, of AI coding agents. Regardless of your feelings on the topic, having good tooling to support this effort comes down to: switching costs, compatibility with existing editors, and a strong ecosystem of third party extensions.
While Cursor/Windsurf jumped the gun on bespoke editor integrations with LLMs - the adoption of MCP and other SDKs for coding agents means it's plug and play. The full feature set will be in every editor connected to every agent.
I think Zed wins on having the lowest switching costs for most developers. Paying down generic solutions like the Agent Client Protocol (ACP) now is a good strategy. It took multiple parties coming together for us to get TLS, OAuth 2.0, and ECMAScript.
I don't see why most editors should behave like hand crafted musical instruments when in reality they are much more akin to high quality knives in a kitchen (sure you have your favorite knife set and bring it from job to job, but at the end of the day you can be just as productive with a different knife when necessary).
> I don't see why most editors should behave like hand crafted musical instruments when in reality they are much more akin to high quality knives in a kitchen (sure you have your favorite knife set and bring it from job to job, but at the end of the day you can be just as productive with a different knife when necessary).
This is such a poor analogy. Yes, a good chef can make do with a different knife, but there is a reason why chefs pay significantly more for high quality knives, keep them sharpened, and treat them with more diligence and care than other kitchen tools. A blunt knife can actually be dangerous. Consequently, a lot of chefs buy knives that are effectively hand crafted / forged, out of this relentless pursuit of quality.
> What most of these comments are missing is the attempt at standardization and unification.
> While that may be true and your particular workflow requires certain features, what is overlooked is the survival pressure for editors.
I think your general perception is not something I agree with. I want to use software I enjoy using. Programming is a creative exercise for me, and I want to use the tools I enjoy. If a tool is not enjoyable to use, I do not want to use it. Sometimes, productivity does increase enjoyment, but sometimes it doesn't. For example, arguably I would have been more productive in my Java days if I used Eclipse, but because the editor was so bad, I preferred to learn the APIs myself and use Sublime Text instead.
I also don't think I'm sympathetic to the survival of any particular editor. Software comes and goes, and sustainably built business models will prevail. All of the AI-first editors hinge on this being the right iteration of this technology, and we simply do not have a long enough timeframe or context to know if this is truly the best way to write code using AI. MCP/ACP, whatever else might be the best strategy for now, but I think it's too early for anyone to suggest that we've come to the right conclusion forever.
As someone who is in a position to see what the next really disruptive innovation is, you're quite right that there exist much, much better ways to write and collaborate on code. Flying leaps of innovation compared to Zed's tiny shuffle-steps.
Zed spent their innovation budget on Rust and GPUI, and as a result they have no energy to question the status quo of IDEs as a whole. Git and LSP are antiquated but form the bedrock of their plans for the future.
Essentially at this point they can only do spaghetti engineering: adding more and more complexity on top of the complexity that already exists. IDEs have been through so many iterations of this process already that all the real wins are in refactoring: moving the whole system (and ecosystem) design sideways, which is the one thing they dare not try to do (though it happens to be my forte).
BABLR is a parser framework, and agAST is the DOM structure at the heart of our state layer. Come to our Discord if you want to learn more. We're trying to launch in the next day or two here.
I'm sorry if this is blunt but is Agent Client Protocol... ...good?
It just looks to me like a dongle bolted onto the past 50 years of kludges in editor design. It hasn't got 1/20th of the value proposition that a proper shared state layer would offer.
Zed succeeds at reducing the switching cost. I used NeoVim for ten years daily and configured it way back in college days.
I thought I would be unable to move to a GUI editor and it turns out that the speed and efficiency of Zed plus the almost one-to-one mapping of Vim features means that I am extremely productive in Zed.
A trend I have noticed as well. I consider this pattern to ultimately be the forcing function of free market capitalism itself.
Once a brand accumulates sufficient reputational capital through genuine quality, the profit-maximizing imperative inevitably drives extraction over quality. (I would extend this argument briefly outside of the domain of economic theory and into physics: we do not observe low entropy being temporally consistent anywhere in the universe.)
The market doesn’t reward maintaining expensive quality standards when cheaper alternatives can temporarily coast on accumulated goodwill - shareholders demand margin expansion, private equity needs returns, and the competitive landscape punishes companies that leave money on the table by over-investing in product integrity.
Less of a moral failure by individual companies and more structural incentive alignment: capitalism systematically rewards converting hard won brand trust into extractable rents until the reputation is depleted, at which point capital simply moves to the next target.
The pattern you’re observing isn’t a bug but the logical endpoint of a system that treats reputation as just another asset to be optimized for shareholder value rather than a covenant with customers.
Started an entire consulting practice to get engineering teams and founders out of vibe coded pits. Even got a great domain for it - vibebusters
So far business is booming and clients are happy with both human interactions with senior engineers as well as a final deliverable on best practices for using AI to write code.
In the last few months we have worked with startups who have vibe coded themselves into an abyss. Either because they never made the correct hires in the first place or they let technical talent go. [1]
The thinking was that they could iterate faster, ship better code, and have an always-on 10x engineer in the form of Claude Code.
I've observed perfectly rational founders become addicted to the dopamine hit as they see Claude Code output what looks like weeks or years of software engineering work.
It's overgenerous to allow anyone to believe AI can actually "think" or "reason" through complex problems. Perhaps we should be measuring time saved typing rather than cognition.
As if startups before LLMs were creating great code. Right now on the front page, a YC company is offering a “Founding Full Stack Engineer” $100K-$150K. What quality of code do you think they will end up with?
Notably, that is a company that... adds AI to group chats. Startups offering crap salaries with a vague promise of equity in a vague product idea with no moat are a dime a dozen, and have been well before LLMs came around.
Have you seen the companies YC has been funding recently? All you need to do is mention AI and YC will throw some money your way. I don't know if you saw my first attempt at a post, but someone should suggest AI for HN comment formatting and I'm sure it will be funded.
Acrely — AI for HVAC administration
Aden — AI for ERP operations
AgentHub — AI for agent simulation and evaluation
Agentin AI — AI for enterprise agents
AgentMail — AI for agent email infrastructure
AlphaWatch AI — AI for financial search
Alter — AI for secure agent workflow access control
Altur — AI for debt collection voice agents
Ambral — AI for account management
Anytrace — AI for support engineering
April — AI for voice executive assistants
AutoComputer — AI for robotic desktop automation
Autosana — AI for mobile QA
Autotab — AI for knowledge work
Avent — AI for industrial commerce
b-12 — AI for chemical intelligence
Bluebirds — AI for outbound targeting
burnt — AI for food supply chain operations
Cactus — AI for smartphone model deployment
Candytrail — AI for sales funnel automation
CareSwift — AI for ambulance operations
Certus AI — AI for restaurant phone lines
Clarm — AI for search and agent building
Clodo — AI for real estate CRMs
Closera — AI for commercial real estate employees
Clueso — AI for instructional content generation
cocreate — AI for video editing
Comena — AI for order automation in distribution
ContextFort — AI for construction drawing reviews
Convexia — AI for pharma drug discovery
Credal.ai — AI for enterprise workflow assistants
CTGT — AI for preventing hallucinations
Cyberdesk — AI for legacy desktop automation
datafruit — AI for DevOps engineering
Daymi — AI for personal clones
DeepAware AI — AI for data center efficiency
Defog.ai — AI for natural-language data queries
Design Arena — AI for design benchmarks
Doe — AI for autonomous private equity workforce
Double – Coding Copilot — AI for coding assistance
EffiGov — AI for local government call centers
Eloquent AI — AI for complex financial workflows
F4 — AI for compliance in engineering drawings
Finto — AI for enterprise accounting
Flai — AI for dealership customer acquisition
Floot — AI for app building
Fluidize — AI for scientific experiments
Flywheel AI — AI for excavator autonomy
Freya — AI for financial services voice agents
Frizzle — AI for teacher grading
Galini — AI guardrails as a service
Gaus — AI for retail investors
Ghostship — AI for UX bug detection
Golpo — AI for video generation from documents
Halluminate — AI for training computer use
HealthKey — AI for clinical trial matching
Hera — AI for motion design
Humoniq — AI for BPO in travel and transport
Hyprnote — AI for enterprise notetaking
Imprezia — AI for ad networks
Induction Labs — AI for computer use automation
iollo — AI for multimodal biological data
Iron Grid — AI for hardware insurance
IronLedger.ai — AI for property accounting
Janet AI — AI for project management (AI-native Jira)
Kernel — AI for web agent browsing infrastructure
Kestroll — AI for media asset management
Keystone — AI for software engineering
Knowlify — AI for explainer video creation
Kyber — AI for regulatory notice drafting
Lanesurf — AI for freight booking voice automation
Lantern — AI for Postgres application development
Lark — AI for billing operations
Latent — AI for medical language models
Lemma — AI for consumer brand insights
Linkana — AI for supplier onboarding reviews
Liva AI — AI for video and voice data labeling
Locata — AI for healthcare referral management
Lopus AI — AI for deal intelligence
Lotas — AI for data science IDEs
Louiza Labs — AI for synthetic biology data
Luminai — AI for business process automation
Magnetic — AI for tax preparation
MangoDesk — AI for evaluation data
Maven Bio — AI for BioPharma insights
Meteor — AI for web browsing (AI-native browser)
Mimos — AI for regulated firm visibility in search
Minimal AI — AI for e-commerce customer support
Mobile Operator — AI for mobile QA
Mohi — AI for workflow clarity
Monarcha — AI for GIS platforms
moonrepo — AI for developer workflow tooling
Motives — AI for consumer research
Nautilus — AI for car wash optimization
NOSO LABS — AI for field technician support
Nottelabs — AI for enterprise web agents
Novaflow — AI for biology lab analytics
Nozomio — AI for contextual coding agents
Oki — AI for company intelligence
Okibi — AI for agent building
Omnara — AI for agent command centers
OnDeck AI — AI for video analysis
Onyx — AI for generative platform development
Opennote — AI for note-based tutoring
Opslane — AI for ETL data pipelines
Orange Slice — AI for sales lead generation
Outlit — AI for quoting and proposals
Outrove — AI for Salesforce
Pally — AI for relationship management
Paloma — AI for billing CRMs
Parachute — AI for clinical evaluation and deployment
PARES AI — AI for commercial real estate brokers
People.ai — AI for enterprise growth insights
Perspectives Health — AI for clinic EMRs
Pharmie AI — AI for pharmacy technicians
Phases — AI for clinical trial automation
Pingo AI — AI for language learning companions
Pleom — AI for conversational interaction
Qualify.bot — AI for commercial lending phone agents
Reacher — AI for creator collaboration marketing
Ridecell — AI for fleet operations
Risely AI — AI for campus administration
Risotto — AI for IT helpdesk automation
Riverbank Security — AI for offensive security
Saphira AI — AI for certification automation
Sendbird — AI for omnichannel agents
Sentinel — AI for on-call engineering
Serafis — AI for institutional investor knowledge graphs
Sigmantic AI — AI for HDL design
Sira — AI for HR management of hourly teams
Socratix AI — AI for fraud and risk teams
Solva — AI for insurance
Spotlight Realty — AI for real estate brokerage
StackAI — AI for low-code agent platforms
stagewise — AI for frontend coding agents
Stellon Labs — AI for edge device models
Stockline — AI for food wholesaler ERP
Stormy AI — AI for influencer marketing
Synthetic Society — AI for simulating real users
SynthioLabs — AI for medical expertise in pharma
Tailor — AI for retail ERP automation
Tecto AI — AI for governance of AI employees
Tesora — AI for procurement analysis
Trace — AI for workflow automation
TraceRoot.AI — AI for automated bug fixing
truthsystems — AI for regulated governance layers
Uplift AI — AI for underserved voice languages
Veles — AI for dynamic sales pricing
Veritus Agent — AI for loan servicing and collections
Verne Robotics — AI for robotic arms
VoiceOS — AI for voice interviews
VoxOps AI — AI for regulated industry calls
Vulcan Technologies — AI for regulatory drafting
Waydev — AI for engineering leadership insights
Wayline — AI for property management voice automation
Wedge — AI for healthcare trust layers
Workflow86 — AI for workflow automation
ZeroEval — AI for agent evaluation and optimization
And the ideas may or may not be bad. I don't know enough about any of the business segments. But to paraphrase the famous Steve Jobs quote, "those aren't businesses, they are features" [1]: features that a company already in the business should be able to add to an existing product with real users by throwing a few halfway decent engineers at them.
[1] He said that about Dropbox. He wasn't wrong, just premature. For the price of 2TB on Dropbox, you can get the entire GSuite with 2TB, or Office365 with 1TB per user for up to five users, 5TB in all.
now you can, but, what, are you gonna lie down and wait for tech giants to do everything? Not every company needs to be Apple. If Dropbox filed for bankruptcy tomorrow, they've still made millionaires of thousands of people and given jobs to hundreds more, and enabled people to share their files online.
Steve Jobs gets to call other companies small because Apple is huge, but there are thousands of companies that "are just features". Yeah, features they forgot to add!
Out of the literally thousands of companies that YC has invested in, only about a dozen have gone public, the rest are either dead, zombies or got acquired. These are all acquisition plays.
Even the ones that have gone public haven’t done that well in aggregate.
Dropbox was solving a hard infrastructure problem at scale. These companies are just making some API calls to a model.
If an established company in any of these verticals - not necessarily BigTech - see an opportunity, they are either going to throw a few engineers at the problem and add it as a feature or hire a company like the one I work for and we are going to knock out an implementation in a few months.
The one YC company I mentioned above is expecting to have their product written by one “full stack engineer” that they are only willing to pay $150K for. How difficult can it be?
Which seems fine? VC money gets thrown at a problem, the problem may or may not get solved by a particular team, but a company gets created, some people do some work, some people make money, others don't. I don't get it. Are you saying no one should bother doing anything because someone else is already doing it or that it's not difficult so why try?
Do you think they're all using actual LLMs? I've got a natural language parser I could probably market as "AI Semantic Detection" even though it's all regular expressions.
I have a confession to make, I was about to downvote you because I thought you just asked ChatGPT to come up with some ridiculous company concepts and copy and pasted.
Then I saw the sibling comment and searched a couple of company names and realized they were real.
From what I’ve read, this is a consequence of applicants themselves concentrating on AI, which preceded their AI-filled batches. YC still has a very low acceptance rate, btw.
Shush please. I wasn't old enough to cash in on the Y2K contracting boons; I'm hoping the vibe coding 200k LOC b2b AI slop "please help us scale to 200 users" contracting gigs will be lucrative.
Does this functionality exist on iOS? I'm looking for an iOS app that wraps Parakeet or Whisper in a custom iOS keyboard.
That way I can switch to the dictation keyboard, press dictate, and have the transcription inserted in any application (first or third party).
MacWhisper is fantastic for macOS system dictation but the same abilities don't exist on iOS yet. The native iOS dictation is quite good but not as accurate with bespoke technical words / acronyms as Whisper cpp.
I really want to run it locally on a phone, but as a developer it's scary to think about making a native mobile app and having to work with the iOS toolchain. I don't have bandwidth at the moment, but if anyone knows of any OSS mobile alternatives, feel free to drop them!
It doesn't do any of that, it just captures more of the student market.
They want a student to use it and say “I wouldn’t have learned anything without study mode”.
This also allows them to fill their data coffers more with bleeding edge education. “Please input the data you are studying and we will summarize it for you.”
Not to be contrarian, but do you have any evidence of this assertion? Or are you just confidently confabulating a response for something outside of the data you've been exposed to? Because a commentor below provided a study that directly contradicts this.
This isn't study mode, it's a different AI tutor, but:
"The median learning gains for students, relative to the pre-test baseline (M = 2.75, N = 316), in the AI-tutored group were over double those for students in the in-class active learning group."
"The occurrence of inaccurate “hallucinations” by the current [LLMs] poses a significant challenge for their use in education. [...] we enriched our prompts with comprehensive, step-by-step answers, guiding the AI tutor to deliver accurate and high-quality explanations (v) to students. As a result, 83% of students reported that the AI tutor’s explanations were as good as, or better than, those from human instructors in the class."
Not at all dismissing the study, but if you want to replicate these results for yourself, this level of gain over a classroom setting may be tricky to achieve without having someone make class materials for the bot to present to you first.
Edit: the authors further say
"Krupp et al. (2023) observed limited reflection among students using ChatGPT without guidance, while Forero (2023) reported a decline in student performance when AI interactions lacked structure and did not encourage critical thinking. These previous approaches did not adhere to the same research-based best practices that informed our approach."
Two other studies failed to get positive results at all. YMMV a lot apparently (like, all bets are off and your learning might go in the negative direction if you don't do everything exactly as in this study)
In case you find it interesting: I deployed an early version of a "lesson administering" bot on a college campus that guides students through tutored activities of content curated by a professor in the "study mode" style -- that is, forcing them to think for themselves. We saw an immediate student performance gain on exams of about 1 stdev in the course. So with the right material and the right prompting, things are looking promising.
OpenAI should figure out how to onboard teachers. A teacher uploads context for the year, and OpenAI distributes a chatbot to the class that's permanently locked into study mode. Basically like the GPT store but with an interface and UX tuned for a classroom.
There are studies showing that LLMs make experienced devs slower in their work. I wouldn't be surprised if it was the same for self study.
However, consider the extent to which LLMs make the learning process more enjoyable. More students will keep pushing because they have someone to ask. Also, having fun & being motivated is such a massive factor when it comes to learning. And, finally, keeping at it at 50% of the speed for 100% of the material always beats working at 100% of the speed for 50% of the material. Who cares if you're slower - we're slower & faster without LLMs too! Those who persevere aren't the fastest; they're the ones with the most grit & discipline, and LLMs make that more accessible.
The study you're referencing doesn't make that conclusion.
It concludes there's a learning curve that generally takes about 50 hours to get past. The data shows that the one engineer who had more than 50 hours of experience with Cursor actually worked faster.
This is largely my experience, now. I was much slower initially, but I've now figured out the correct way to prompt, guide, and fix the LLM to be effective. I produce way more code and am mentally less fatigued at the end of each day.
People keep citing this study (and it was on the top of HN for a day). But this claim falls flat when you find out that the test subjects had effectively no experience with LLM-equipped editors, and the 1-2 people in the study who actually did have experience with these tools showed a marked increase in productivity.
Like yeah, if you’ve only ever used an axe you probably don’t know the first thing about how to use a chainsaw, but if you know how to use a chainsaw you’re wiping the floor with the axe wielders. Wholeheartedly agree with the rest of your comment; even if you’re slow you lap everyone sitting on the couch.
I presume you're referring to the recent METR study. One aspect of the study population, which seems like an important causal factor in the results, is that they were working in large, mature codebases with specific standards for code style, which libraries to use, etc. LLMs are much better at producing "generic" results than matching a very specific and idiosyncratic set of requirements. The study involved the latter (specific) situation; helping people learn mainstream material seems more like the former (generic) situation.
(Qualifications: I was a reviewer on the METR study.)
I believe we'll see that the benefits and drawbacks of AI augmentation for humans performing various tasks vary wildly based on the task, the way the AI is being asked to interact, and the AI model.
I would be interested to see if there have already been studies about the efficacy of tutors at good colleges. In my experience (in academia), the students who make it into an Ivy or an elite liberal arts school make extensive use of tutor resources, but not in a helpful way. They basically just get the tutor to work problems for them (often their homework!) and feel like they've "learned" things because tough questions always seem so obvious when you've been shown the answer. In reality, what it means is that they have no experience being confused or having to push past difficult things they were stuck on. And those situations are some of the most valuable for learning.
I bring this up because the way I see students "study" with LLMs is similar to this misapplication of tutoring. You try something, feel confused and lost, and immediately turn to the pacifier^H^H^H^H^H^H^H ChatGPT helper to give you direction without ever having to just try things out and experiment. It means students are so much more anxious about exams where they don't have the training wheels. Students have always wanted practice exams with similar problems to the real one with the numbers changed, but it's more than wanting it now. They outright expect it and will write bad evals and/or even complain to your department if you don't do it.
I'm not very optimistic. I am seeing a rapidly rising trend at a very "elite" institution of students being completely incapable of using textbooks to augment learning concepts that were introduced in the classroom. And not just struggling with it, but lashing out at professors who expect them to do reading or self study.
Come on. Asking an educational product to do a basic sanity test as to whether it helps is far too high a bar. Almost no educational app does that sort of thing.