> but not contemplating that it's going to be out in the wild for 10 years either way
I think there are a lot of developers working in repos where it's almost guaranteed that their code will _not_ still be there in 10 years, or 5 years, or even 1 year.
>I think there are a lot of developers working in repos where it's almost guaranteed that their code will _not_ still be there in 10 years, or 5 years, or even 1 year.
And in almost all of those cases, they'd be wrong.
I think I calculated the half-life of my code written at my first stint of Google (15 years ago) as 1 year. Within 1 year, half of the code I'd written was deprecated, deleted, or replaced, and it continued to decay exponentially like that throughout my 6-year tenure there.
Interestingly, I still have some code in the codebase, which I guess makes sense because I submitted about 680K LOC (note: not all hand-authored, there was a lot of output from automated tools in that) and 2^15 is 32768, so I'd expect to have about 20 lines left, which is actually surprisingly close to accurate (I didn't precisely count, but a quick glance at what I recognized suggested about 200 non-deprecated lines remain in prod). It is not at all the code that I thought would still be there 15 years later, or that I was most proud of. The most durable change appears to be renaming some attributes in a custom templating language that is now deeply embedded in the Search stack, as well as some C++ code that handles how various search options are selected and persisted between queries.
I think this both proves and disproves the original point. Most of your code is temporary. You have no idea which parts of your code is temporary. It's probably not the parts that you wish were temporary, which will almost certainly be made permanent.
In my experience the code will, but by year 5 nobody is left who worked on it from inception, and by year 10 nobody knows anybody who did, and during that time it reaches a stage where nobody will ever feel any sense of ownership or care about the code in its entirety again.
I come into work and work on a 20 year old codebase every day, working on slowly modernizing it while preserving the good parts. In my experience, and I've been experimenting with both a lot, LLM-based tools are far worse at this than they are at starting new greenfield projects.
When it comes to professional development, I've almost never worked on a codebase less than 10 years old, and it was always [either silently or overtly] understood that the software we are writing is a project that's going to effectively live forever. Or at least until the company is no longer recognizable from what it is today. It just seems wild and unbelievable to me, to go to work at a company and know that your code is going to be compiled, sent off to customers, and then nobody is ever going to touch it again. Where the product is so throwaway that you're going to work on it for about a year and then start another greenfield codebase. Yet there are companies that operate that way!
How can you possibly know which type of repo you're in ahead of time? My experience is that "temporary" code frequently becomes permanent and I've also been on the other side of those decisions 40 years later.
Unless you’re producing demos for sales presentation (internally or externally), it’s always worth it to produce something good. Bad code will quickly slow you down and it will be a never ending parade of bug tickets.
That depends on how quick the feedback loop is for your decisions. If it takes weeks or months to find the impact of your changes, or worse, if you're insulated somehow from those changes, you may not be pushed toward improving the quality of your code.
> People may not know that the reason they like your product is because the code is so good, but everyone likes software that is mostly free from bugs, performs extremely well, helps them do their work quickly, and is obviously created by people the care deeply about the quality of the product they produce (you know, the kind that acutally read bug reports, and fix problems quickly).
I would classify all of those as "capabilities and limitations of your product"
I read OPs "good code" to mean "highly aesthetic code" (well laid out, good abstractions, good comments, etc. etc.), and in that sense I agree no customer who's just using the product actually cares about that.
Another definition of "good code" is probably "code that meets the requirements without unexpected behavior" and in that sense of course end users care about good code, but you could give me two black boxes that act the same externally, one written as a single line , single character variables, etc. etc. etc. and another written to be readable, and I wouldn't care so long as I wasn't expected to maintain it.
>but you could give me two black boxes that act the same externally, one written as a single line , single character variables, etc. etc. etc. and another written to be readable, and I wouldn't care so long as I wasn't expected to maintain it.
The reality of software products is that they are in nearly in all cases developed/maintained over time, though--and whenever that's the case, the black box metaphor fails. It's an idealization that only works for single moments of time, and yet software development typically extends through the entire period during which a product has users.
> I read OPs "good code" to mean "highly aesthetic code" (well laid out, good abstractions, good comments, etc. etc.)
The above is also why these properties you've mentioned shouldn't be considered aesthetic only: the software's likelihood of having tractable bugs, manageable performance concerns, or to adapt quickly to the demands of its users and the changing ecosystem it's embedded in are all affected by matters of abstraction selection, code organization, and documentation.
But those aesthetics stem from that need for fewer bugs, performance, maintainability. Identifying/defining code smell comes from experience of what does and doesn’t work.
> I wouldn't care so long as I wasn't expected to maintain it.
But, if you’re the one putting out that software, of course you will have to maintain it! When your users come back with a bug or a “this flow is too slow,” you will have to wade into the innards (at least until AI can do that without mistakes).
But the thing is that someone has to maintain it. And while beautiful code is not the same as correct code, the first is impactful in getting the second and keeping it.
And most users are not consuming your code. They’re consuming some compiled, transpiled, or minified version of it. But they do have expectations and it’s easier to amend the product if the source code is maintainable.
I really don't see how this can be possible unless they're accepting abysmal recall? Perhaps I'm missing something fundamental here, but the idea that AI and non-AI assisted text can be separated with "nearly 0 false positives" just says to me that it's really just a filter for the weakest, most obvious AI generated text. Is that valuable?
We use playwright for interacting with the browser, so while it's not available by default, we do support bulk exporting tests as playwright to move off our platform or to customers who want to run deterministic versions of the tests on their own infra (you can also run them on ours!)
This is interesting, I think we've shied away a bit from security-ish use cases since it's outside of our personal core competencies, do you have examples of what tools exist today for catching things like that? Or is it totally adhoc?
> can the agents check their email? other notification methods?
Yes to email (for paying customers agents spin up with unique addresses), no to other notifications, but as soon as a paying customer has a use case for SMS, etc. we'll build it.
> Are your agents good at testing other agents? e.g. I want your agent to ask our agent a few questions and complete a few UI interactions with the results.
I'd say this is one of our strong suits I think, specifically the UIs tend to be easy to navigate for browser agents, and the LLM as a judge offers pretty good feedback on chat quality and it can inform later actions. (I'd be remiss not to mention though that a good LLM eval framework like Braintrust is probably the best first line though)
> How do you handle testing onboarding flows?
We can step through most onboarding flows if you start from logged out state & give the context it'll need (i.e. a stripe test card, etc.) That said though, setting up integrations that require multi-page hops is still a pain point in our system and leaves a lot to be desired.
Would love to talk more about your specific case and see if we can help! founders@propolis.tech
Hey I'm Matt! Really excited to answer any questions.
To elaborate a little bit on the "canary" comment --
For a while at Airtable I was on the infra team that managed the deploy (basically click run and then sit and triage issues for a day), One of my first contributions on the team was adding a new canary analysis framework that made it easier to catch and rollback bugs automatically. Two things always bothered me about the standard canary release process:
1) It necessarily treats some users as lower value, and thus more acceptable to risk exposing bugs to (this makes sense for things like free-tier, etc. but the more you segment out, the less representative and thus less effective your canary is). When every customer interaction matters (as is the case for so many types of businesses) this approach is harder to justify
2) Low frequency / high impact bugs are really difficult to catch in canary analysis. While it’s easy to write metrics that catch glaring drops/spikes in metrics, more subtle high impact regressions are much harder and often require user reports (which we did not factor in as part of our canary). Example: how do you write a canary metric that auto rolls back when an enterprise account owner (small % of overall users) logs in and a broken modal prevents them from interacting with your website.
I view what we’re building at Propolis as an answer to both of these things. I envision a deploy process (very soon) that lets us roll out to simulated traffic and canary on THAT before you actually hit real users (and then do a traditional staged release, etc.)
seems like you are misappropriating what canaries are useful and used for... they are designed to be lightweight and shallow... hence the name and whole analogy, canaries never were meant to determine if a mine was structurally unsafe etc
Canaries are lightweight and shallow once they exist. Building a canary from the ground up is still beyond us, but if you don’t want to kill an actual bird that is pretty much the only way to go.
I thought about doing this as well but ended up viewing it personally as a "lose-lose" for myself rather than a "win-win". I wonder if that says anything about my risk-aversion.
It means you are not a degenerate gambler. You probably also do not enjoy slot machines, or the bullshit that passes for card games in a casino. I want to play blackjack, not "blackjack" where you change the allowed bets and rules specifically to increase the house edge thank you very much.
I think there are a lot of developers working in repos where it's almost guaranteed that their code will _not_ still be there in 10 years, or 5 years, or even 1 year.