Hacker News | Bishonen88's comments

All your posts are written by AI. All of them have the same "The x is not y, it's z" phrases.

I think you're being paranoid, honestly. It's simply not true. Maybe you're an AI trying to dissimulate.

I know I've seen something like this on Hacker News before. A SaaS for taking over the IDE at any point of a recording, just without the video.

EDIT: https://news.ycombinator.com/item?id=28207662

Seems like this, but with a slightly different spin?

EDIT2: Gave it a go. Works as intended, so good job on that. The video being a video makes it a bit awkward, though - if I stop the recording and edit some part, I'd want to see the changes live, but for that I guess I'd have to start the server myself? And when I hit play, my changes got deleted anyway (?).

As for the usefulness aspect, personally I'm not sure this has a benefit over e.g. watching YouTube tutorials/following books. I watched one of the videos, and I'd have to concentrate on the video, the text, and the audio at the same time, and it wouldn't be me typing the code anyway, so I'm not sure how much I'd remember of it. I'd have to stop, open a new project, and try to rewrite it myself to memorize the concepts more deeply. But that's just my personal take - it might be that there's a big user base for this kind of interactive learning!


> As for the usefulness aspect, personally I'm not sure this has a benefit over e.g. watching YouTube tutorials/following books.

I do like YouTube video tutorials, but only as long as they're short. Watching Handmade Hero (by Casey Muratori) for example was a little frustrating: the videos are long, the codebase is large, things are moving fast, and I'd get lost.

I often wished I could pause the video to look up the definition of a function, or get an overview of when each file/line was edited and jump straight to that point.

Books/blogs are ok for explaining large codebases that already exist, but not for following a project as the code constantly changes. The book Crafting Interpreters did a really good job there, but that's really rare and hard to do.

I think CodeMic could be useful for this kind of long-form tutorial.


I think you mean Scrimba. Yes, it's similar in the sense that in both tools, when you're playing back a recording, you're not looking at the code as a video; instead, the code is there as text. You can pause the recording, look at the files in the project, scroll up and down in the editor, etc.

The difference is that CodeMic records and replays inside your editor, not on the web. Currently, only VSCode is supported, but the output is independent of VSCode, making it easy to bring it to other editors and even the web.

Another difference is that CodeMic is not focused on web development or any particular stack. It's more general.


At what point does the project outgrow the AI, in your experience? I have a 70k LOC backend/frontend/database/docker app that Claude still one-shots most features/tasks I throw at it. Perhaps it's not as good at remembering all the intertwined side effects between functionalities/UIs, and I have to tell it "in the calendar view, we must hide it as well", but that takes little time/effort.

Does it break down at some point to the extent that it simply does not finish tasks? Honest question, as I've seen this sentiment stated before and assumed that sooner or later I'd face it myself, but so far I haven't.


I find that with more complex projects (full-stack application with some 50 controllers, services, and about 90 distinct full-feature pages) it often starts writing code that simply breaks functionality.

For example, we had to update some fairly complex code that calculates a financial penalty amount. The amount is defined by law and recently received an overhaul, so we had to change our implementation.

Every model we tried (and we have corporate access and legal allowance to use pretty much all of them) failed to update it correctly. Models would start changing parts of the calculation that didn't need to be updated. After being told that those specific parts shouldn't be touched and to retry, most of them would go right back to changing them again. The legal definition of the calculation logic is, surprisingly, pretty clear, and we do have rigorous tests in place to ensure the calculations are correct.

Beyond that, it was frustrating trying to get the models to stick to our coding standards. Our application has developers from other teams doing work on it as well. We enforce a minimum standard to ensure code quality doesn't suffer and other people can take over without much issue. This standard is documented in the code itself but also explicitly written out in the repository in simple language. Even when explicitly prompting the models to stick to the standard and copy-pasting it into the actual chat, they would ignore 50% of it.

The most apt comparison I can make is that of a consultant who always agrees with you to your face, but when doing the actual work ignores half of your instructions, so you end up running after them trying to minimize the mess and clean-up you have to do. It outputs more code, but it doesn't meet the standards we have. I'd genuinely be happy to offload tasks to AI so I can focus on the more interesting parts of my work, but from my experience and that of my colleagues, it's just not working out for us (yet).


I noticed that you said "models" & not "agents". Agents can receive feedback from automated QA systems, such as linters, unit, & integration tests, which can dramatically improve their work.

There's still the risk that the agent will try to modify the QA systems themselves, but that's why there will always be a human in the loop.
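
Roughly the loop I have in mind - a hypothetical TypeScript sketch, with askModel()/applyPatch() as stand-ins and an npm lint/test gate assumed; this isn't any specific agent framework's API:

```ts
import { execSync } from "node:child_process";

interface QaResult { ok: boolean; output: string; }

// Run the project's own QA gates (linter + tests) and capture their output.
function runQa(): QaResult {
  try {
    return { ok: true, output: execSync("npm run lint && npm test", { encoding: "utf8" }) };
  } catch (err: any) {
    return { ok: false, output: String(err.stdout ?? err.message) };
  }
}

// Placeholders: in a real agent these would call the model and edit files on disk.
async function askModel(task: string, feedback: string): Promise<string> {
  return `patch for "${task}" given feedback:\n${feedback}`;
}
function applyPatch(patch: string): void { /* write the edits to disk */ }

async function agentLoop(task: string, maxRounds = 3): Promise<boolean> {
  let feedback = "";
  for (let round = 0; round < maxRounds; round++) {
    applyPatch(await askModel(task, feedback));
    const qa = runQa();
    if (qa.ok) return true;   // lint + tests pass: done
    feedback = qa.output;     // otherwise feed the failures back and try again
  }
  return false;               // give up and escalate to the human in the loop
}
```

A bare model call only sees what you paste in; the loop above is what lets the failures themselves steer the next attempt.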


Should've clarified in that case. I used models as a general stand-in for AI.

To provide a bit more context:

- We use VS Code (plus derivatives like Cursor) hooked up to general models, with context access to the entire repository.

- We have an MCP server with access to our company-internal framework and tools (especially the documentation), so the models should know how they are used.

So far, we've found two use cases that make AI work for us:

1. Code review. This took quite a bit of refinement of the instructions, but we've got it to a point where it provides decent comments on the things we want it to comment on. It still fails on the more complex application logic, but it will consistently point out minor things. It's now used as a pre-PR review, so engineers can fix things before publishing a PR. Less noise for the rest of the developers.

2. CRUD cruft like tests for a controller (rough sketch below). We still create the controller endpoint, but given the controller, the DTOs, and an example of how another controller's tests are done, it will produce decent code. Even then, we still often have to fix a couple of things and debug to see where it went wrong, like fixing a broken test by removing the actual strictlyEquals() call.
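
Roughly the shape of the boilerplate I mean - made-up controller/DTO/route names and a Fastify/Jest-style setup for illustration, not our actual code:

```ts
import { buildApp } from "../src/app"; // hypothetical app factory

describe("InvoiceController", () => {
  const app = buildApp();

  afterAll(() => app.close());

  it("creates an invoice from a valid DTO", async () => {
    const dto = { customerId: 42, amountCents: 1999 };

    const res = await app.inject({ method: "POST", url: "/invoices", payload: dto });

    expect(res.statusCode).toBe(201);
    // The strict assertion a model sometimes "fixes" by deleting it instead of debugging:
    expect(res.json()).toStrictEqual({ id: expect.any(Number), ...dto });
  });

  it("rejects a DTO without a customerId", async () => {
    const res = await app.inject({ method: "POST", url: "/invoices", payload: { amountCents: 1999 } });

    expect(res.statusCode).toBe(400);
  });
});
```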

Just keeping up with the newest AI changes is hard. We all have personal curiosity, but at the end of the day we need to deliver our product and only have so much time to experiment with AI stuff. Never mind all the other developments in our regulation-heavy environment and tech stack we need to keep on top of.


> At what point does the project outgrow the AI, in your experience? I have a 70k LOC backend/frontend/database/docker app that Claude still one-shots most features/tasks I throw at it.

How do you do this?

Admittedly, I'm using Copilot, not CC.

I can't get Copilot to finish a refactor properly, let alone a feature. It'll miss an import rename, leave in duplicated code, update half the use cases but not all, etc. And that's with all the relevant files in context, and letting it search the codebase so it can get more context.

It can talk about DRY, good factoring, or SOLID, but it only applies them when it feels like it, despite what's in AGENTS.md. I have much better results when I break the task down into small chunks myself and do NOT tell it the whole story.


I'm having trouble at 150k, but I'm not sure the issue is the size per se, as opposed to whether the set of relevant context is easy to find. The "relevant" part threatens to bring in disparate parts of the codebase; the "easy to find" part determines whether a human has to manually curate the context.

Some interesting parts in the text, some not so interesting ones. The author seems to think he's a big deal, though - a month ago, I did not know who he was. My work environment has never heard of him (SDE at FAANG). Maybe I'm an outlier and he really does influence the whole expectation management at companies with his writing, or maybe the success (?) of gastown got to him and he thinks he's bigger than he actually is. Time will tell. In any case, the self-glorification in an article like that throws me off for some reason.

He's early Amazon, early Google, so he's seen two companies super-scale. Few people last through two paradigm shifts, though that's no guarantee of credentials. But at the time he was famous for a specific accidentally public post that showed people how far Bezos's influence ramified through Amazon, and how his choices contrasted with Google's approach to platforms.

https://news.ycombinator.com/item?id=3101876


Popular blogger from roughly a decade ago. His rants were frequently cited early in my career. I think he’s fallen off in popularity substantially since.

An everything app for personal life. Todos, a calendar with Google sync, Garmin integration, habits, long-term goals, meals, leaderboards (within the family), notifications, notes (MD-exportable), finances (account balances, transactions, investments), shopping lists with scannable receipts, a quick inbox for random ideas which then get converted into tasks/goals/habits/notes, weekly reviews, morning manifestos, and more.

70k LOC. Deployed on a NAS. Superpowered with AI for daily recaps. Looks like any modern commercial SaaS.


When you write code yourself, you're convinced each line is correct as you write it. That assumption is hard to shake, so you spend hours hunting for bugs that turn out to be obvious. When reading AI-generated code fresh, you lack that assumption, so bugs can jump out faster. That's at least my naive explanation for this phenomenon.

I think deploying can already be done rather easily with the help of LLMs, using Docker and VPSes (e.g. Hetzner and co.).

What I struggle with is the legal overhead of e.g. collecting money for an app/website. I have a semi-finished app which I know I could deploy within a few hours, but collecting money while living in Germany is a minefield from what I understand. I don't want my name made public with the app. A GmbH (LLC) costs thousands (?). The whole GDPR minefield, the Google Fonts usage scam, etc. makes me hold back.

Googling/Reddit only gives so much insight.

If someone has a good reference about starting a SaaS/App from within EU/Germany with all the legalities etc. I'd be super interested!


Crazy free tier. I reckon I used ~2 weeks' worth of the Claude $20 subscription within an hour. Spawned something like 12 semi-big tasks and still didn't see any warnings.

Not totally understanding this. Crazy good free tier, or crazy in that it used a crazy amount of tokens that cost you? Sorry, just trying to see what you're saying.

My bad then. I meant "crazy good", as in the free tier gave me a tremendous amount of tokens.

What I didn't realize, though, is that the limit doesn't reset every 5 hours as is the case for Claude. I hit the limit of the free tier about 2 hours in, and while I was expecting to be able to continue later today, it tells me I can continue in a week.

So my hype about the amount of tokens one gets compared to Claude was a bit too eager. Hitting the limit and having to wait a week probably means we get a comparable token amount vs. the $20 Claude plan. I wonder how much more I'd get by buying the $20 Plus package. The pricing page doesn't make that clear (since there was no free plan before yesterday, I guess): https://developers.openai.com/codex/pricing/


https://www.youtube.com/watch?v=7lzx9ft7uMw

^ An everything app for personal use that I'm thinking about making public in some way.

~50k LOC across ~400 files. Docker, Postgres, React + Fastify. I'd say between 15 and 20 hours of vibe coding.

- Tasks, Goals, Habits

- Calendar showing all of the above with two-way Google sync

- Household sharing of markdown notes, goals and more

- Financial projections, spending, earning, recurring transactions and more

- Meal tracking with pics, last eaten, star rating and more

- Gantt chart for goals

- Dashboard for at a glance view

- PWA for Android with layout optimizations

- Dark mode

... and more

Could I have done it in the last 5 years? Yes. It would've taken 3-4 months, if not more, though. Now, we could argue 24/7 about whether it's clean code, super maintainable, etc. The code written by hand wouldn't be either if it were just me doing a hobby project.

Shipping is rather straightforward as well, thanks to LLMs. They hold your hand most of the way. Being a techie makes this much, much easier...

I think developers are cooked one way or another. It won't take long now. The same question asked a year ago would've gotten a dramatically different answer: AI was helpful to some extent but couldn't code up basic things.


At work we were joking that people will use LLMs to create fancy-looking documents, which will then be parsed by LLMs back into something concise and to the point. With LLMs handling the sending of messages as well, the whole concept becomes even more efficient.

I can just imagine that many people won't be using stuff like this to automate copy-pasting etc., but will literally let LLMs handle conversations for them (which will in turn be read by other LLMs).

"You free to chat?" "Always. I'm a bot." "…Same."

This post has been written by a human :)


More charitable take: they'd be using LLMs as secretaries.

Having a delegate to deal with communications is something people embrace when they can afford it. "My people will talk to your people" isn't an unusual concept. LLMs could be an alternative to human secretaries that's affordable to the middle class and below.

