Buried in here are great examples of why rewrites don’t help: “The module that d...

jjoonathan · on Sept 21, 2021

Right, but GC encourages you to not think about memory at all until the program starts tipping over and fixing the underlying cause of the leak now requires an architecture change because the "we hold onto everything" assumption got baked into the structure in 2 places that you know about and 5 that you don't.

I don't miss the rote parts of manual memory management, but it had the enormously beneficial side effect of making people consider object lifetimes upfront (to keep the retain graph acyclic) and cultivate occasional familiarity with leak tracking tools. Problematic patterns like the undo queue or query correlator that accidentally leak everything tended to become obvious when writing the code, rather than while running it. These days, I keep seeing those same memory management anti-patterns show up when I ask interviewees to tell a debugging war story. Sometimes I even see otherwise capable devs shooting in the dark and missing when it comes to the "what's eating RAM" problem.

I feel like GC in long-form program development substitutes a small problem for a big one. Short-form programming can get away with just leaking everything, which is what GC does anyway, so I'm not sure there's any benefit there either.

tl;dr: get off my lawn.

titzer · on Sept 21, 2021

GC will not fix trashy programming. The problem is that many GC'd languages have adopted a style guide that commits to a lot of unnecessary allocations. For example, in Java, you can't parse an integer out of the middle of a string without allocating in-between. Ditto with lots of other common operations. Java has oodles of trashy choices. With auto-boxing, allocations are hidden. Without reified (let's say, type-specialized) generics, all the collection classes carry extra overhead for boxing values.

I write almost all of my code in Virgil these days. It is fully garbage-collected but nothing forces you into a trashy style. E.g. I use (and reuse) StringBuilders, DataReaders, and TextReaders that don't create unnecessary intermediate garbage. It makes a big difference.

Sometimes avoiding allocation means reusing a data structure and "resetting" or clearing its internal state to be empty. This works if you are careful about it. It's a nightmare if you are not careful about it.

I'm not going back to manual memory management, and I don't want to think about ownership. So GC.

edit: Java also highly discourages reimplementing common JDK functionality, but I've found building a customized datastructure that fits exactly my needs (e.g. an intrusive doubly-linked list) can work wonders for performance.

jjoonathan · on Sept 21, 2021

> many GC'd languages have adopted a style guide that commits to a lot of unnecessary allocations.

Oh, that too. I forgot to rant about that.

> Virgil

Unfortunately I'd rather live with a crummy language that has strong ecosystem, tooling, and developer availability, so I'll never really know. It does sound nice, though.

pjmlp · on Sept 21, 2021

Yeah, but that was one of Java's 1.0 mistakes, that thankfully Go, .NET, D, Swift, among others, did not make.

Now lets see if Valhalla actually happens.

josephg · on Sept 21, 2021

> Right, but GC encourages you to not think about memory at all

I’ve come to a new obvious realisation with this sort of thing recently: if you care about some metric, make a test for it early and run it often.

If you care about correctness, grow unit tests and run them at least every commit.

If you care about performance, write a benchmark and run it often. You’ll start noticing what makes performance improve and regress, which over time improves your instincts. And you’ll start finding it upsetting when a small change drops performance by a few percent.

If you care about memory usage, do the same thing. Make a standard test suite and measure it regularly. Ideally write the test as early as possible in the development process. Doing things in a sloppy way will start feeling upsetting when it makes the metric get worse.

I find when I have a clear metric, it always feels great when I can make the numbers improve. And that in turn makes it really effortless bring my attention to performance work.

tptacek · on Sept 22, 2021

Not so much. Here we have an example of a memory pressure problem that's evident only under high load in realistic environments. This is a classic problem with performance engineering: it's usually difficult to do realistic automated load testing. Instead, you end up running lab experiments, which are time-consuming to set up.

The whole post is essentially about how tricky it was to surface the problems their customers were seeing in the field. I'd resist the urge to respond to that with a platitude about automated testing.

josephg · on Sept 22, 2021

Yes it can be difficult to do realistic automated load testing. But I suppose I see this as more evidence that if you're going to do load testing, do it right! In complex systems you often need real world usage data, or your metrics won't predict reality.

I've been running into this a lot writing software for collaborative editing. Randomly generated editing traces work fine for correctness testing. But doing performance testing with random traces is unrepresentative. The way people move their cursors around a text box while editing is idiosyncratic. Lots of optimizations make performance worse with random editing histories, but improve performance for real world data sets.

tptacek · on Sept 21, 2021

Plenty of C programs do the equivalent of ioutil.ReadAll; it's not a GC thing.

jjoonathan · on Sept 21, 2021

"Leak everything because we can get away with it here" is a fine memory management strategy. "Why does my program keep getting killed?" isn't.

tptacek · on Sept 21, 2021

This has nothing to do with leaking (nothing "leaked"; it's a garbage-collected runtime). It's about memory pressure, which, I promise you, is a very real perf problem in C programs, and why we memory profile them. The difference between incremental and one-shot reads is not a GC vs. non-GC thing.

xondono · on Sept 21, 2021

> Buried in here are great examples of why rewrites don’t help

That has not been my experience. Rewrites do sometimes help, because in a lot of codebases there’s too many “pet” modules or badly designed frozen interfaces.

Rewrites can help in those situations, because there’s no sacred cows anymore. The issue is that a lot of people do rewrites as translations, without touching structures.

silisili · on Sept 21, 2021

Agreed with this 100%.

So many posts here over the years of examples of 'how we rewrote from x to y and saw 2000% gains', where x and y are languages. Such examples are 100% meaningless. Rewrites from the ground up -should- always be way faster, since it's all greenfield. If trying to make a language comparison, rewrite the entire thing in both languages!

josephg · on Sept 21, 2021

Yes absolutely. I wrote an article a couple months ago which was trending here where I got a 5000x performance improvement over an existing system. One of the changes I made was moving to rust, and some people seemed to think the takeaway was “rewriting the code in rust made it 5000x faster”. It wasn’t that. Automerge already had a rust version of their code which ran a benchmark in 5 minutes. Yjs does the same benchmark in less than 1 second in javascript.

Yjs is so fast because it makes better choices with its data structures. A recent PR in automerge-rs brought the same 5 minute test down to 2 seconds by changing the data structure it uses.

Rust/C/C++ give you more tools to write high performance code. But if you put everything on the heap with copies everywhere, your code won’t be necessarily any faster than it would in JS / python / ruby. And on the flip side, you can achieve very respectable performance in dynamic languages with a bit of care along the hot path.

marcus_holmes · on Sept 22, 2021

Not only greenfield, but the problem domain is much better understood. A lot of architecture choices are made in the early days of a project when the problem isn't sufficiently understood to make the choice correctly.

I'm a huge fan of writing the first version of anything as an problem-exploration prototype, intended to be discarded and rewritten. As Fred Brooks said, "you're going to rewrite anyway, you might as well plan for it" [0]

[0] paraphrased from https://en.wikiquote.org/wiki/Fred_Brooks "The management question, therefore, is not whether to build a pilot system and throw it away. You will do that. […] Hence plan to throw one away; you will, anyhow."

kelnos · on Sept 22, 2021

In my experience, the prototype never gets thrown away when it should be, and sometimes it's never thrown away at all. It just gets extended, poorly, until development grinds to a halt because you can no longer add features or fix bugs without creating new bugs.

Then you either a) stop what you're doing and spend many months rewriting, or b) spin up a parallel team that does the rewrite, while the old team maintains the old code and does their best to add the most critical features and fix the most critical bugs without breaking anything else in the process.

Neither approach is good. (a) means you'll probably lose customers due to lack of progress on their pet issues. (b) means your development costs have doubled, and you have a team full of people who are demotivated and demoralized because they know they're working on something that's soon destined for the junk heap.

I usually build the first version expecting that it will live on for quite a long time (and sometimes/often be the only version), and build with an eye toward ease of refactor and even ease of rearchitecting. Yes, it's slower than building a prototype-quality product, and yes, sometimes product managers complain that the extra time needed will blow a market opportunity. Those PMs are usually wrong, and even if they are potentially right, building the prototype always takes longer than expected, so the PMs end up fretting over time-to-market anyway.

laumars · on Sept 21, 2021

This is where profiling helps more. Find the weak parts of the code, try to optimise those. If the language proves to be a barrier then you have a justification for a rewrite.

All too often people don’t understand how to performance tune software properly and instead blame other things first (eg garbage collection)

bluGill · on Sept 21, 2021

Most slow languages make escape to C easy for cases where the language is the issue. Most fast languages make writing a C APIed interface easy, so if the language is your issue just rewrite the parts where that is the problem.

Of course eventually you get to the point where enough of the code is in a fast language that writing everything in the fast language to avoid the pain of language interfaces is worth it.

laumars · on Sept 21, 2021

And there’s time when even C isn’t sufficient and a developer needs to resort to inlined assembly. But most of the time the starting language (whatever that might be) is good enough. Even here, the issue wasn’t the language, it was the implementation. And even where the problem is the language, there will always be hot paths that need hardware performant code (be that CPU, memory, or sometimes other devices like disk IO) and there will be other parts in most programs that need to be optimised for developer performance.

Not everyone is writing sqlite or kernel development level software. Most software projects are a trade off of time vs purity.

That all said, backend web development is probably the edge case here. But even there, that’s only true if you’re trying to serve several thousand requests a second on a monolithic site in something like CGI/Perl. Then I’d argue there’s not point fixing any hot paths and just rewrite the entire thing. But even then, there’s still no need to jump straight to C, skipping Go, Java, C#, and countless others.

pjmlp · on Sept 21, 2021

Except when the program is actually written in C, then better hold the Algorithms and Data Structures book and dust it off, or Intel/AMD/ARM/... manuals.

bluGill · on Sept 21, 2021

Algorithms and data structures come BEFORE dropping to c.

These days it is rare that you can beat your compiler with hand machine code, and even if you can it isn't worth it because the difference is typically small and only applies to one specific machine.

Of course once in C you can often think about memory locality and other cache factors that higher languages hide from you.

pjmlp · on Sept 22, 2021

Many applications still start in C, there is no dropping into C.

wrs · on Sept 21, 2021

Quite true, a rewrite can help if it is also a "rethink". But you don't have to switch languages to get that effect--in fact you'll probably do better if you don't throw a new language/library into the mix.

My point was that, contrary to what is apparently a common impulse, rewriting the same thing in a different language while maintaining the lack of attention to performance considerations that was present in the first version isn't going to help much.

coliveira · on Sept 21, 2021

This is less an argument for a rewrite than an argument for redesigning parts of your codebase, which can be done much more easily than a complete rewrite.

xondono · on Sept 21, 2021

The tricky thing is that it’s easy to end up with a result that’s not far off. Some modules will improve, but a lot of the time these kind of bottlenecks tend to happen because the performant version is not very idiomatic (feels weird), it’s too verbose, or it’s to confusing to think through.

Unless you have the same team (and they learned the lesson the first time), it’s very likely to end up with modules that perform in a similar way.

Sometimes changing the language makes thinking about the problems easier.

hinkley · on Sept 21, 2021

I would argue that the rewrites help when the information architecture for the original code is proven to be wrong, and there is either no way to refactor the old code to the new model, or employee turnover has resulted in nobody having an emotional attachment to the old code.

That said, to slot in a new implementation you often have to make the external API very similar to the old one, which can complicate making the improvements you're after.

bluGill · on Sept 21, 2021

> there is either no way to refactor the old code to the new model

That doesn't happen. Write facades as needed. Even if they are slower than everything else write the facades so you can keep in production all along.

hinkley · on Sept 21, 2021

If you get the object ownership and the internal state model wrong (information architecture) facades don't help you.

You can't put an idempotent or pure functional wrapper around a design that isn't re-entrant and expect anything good to come from it. IF you get it to work, it'll be dog slow.

bluGill · on Sept 21, 2021

Last time I was in a rewrite the boss had the old software on a computer next to him with the label "Product owner of rewrite". He regularly when asked how to do something looked at what that did.

_ivvf · on Sept 21, 2021

I downvoted you at first and then changed my mind. I think I would like your comment more if it were more worded like: "buried in here are great examples of important optimizations that did not require a rewrite". Or something like: "this article does a great job of showing that you can hit many reasonable performance targets while using a GC'ed language like Go."

You can pretty much always get better performance with more control over memory, and more importantly, you can dramatically lower overall memory usage and avoid GC pauses, but you have to weigh that against the fact that automated memory management is one of the few programming language features that is basically proven to give a massive developer productivity boost. In my corner of the industry, everyone chooses the GC'ed languages and performance isn't really a major concern most of the time.

olau · on Sept 21, 2021

> The problem here isn’t that the language has GC, it’s that memory usage was just not considered.

While I agree with the gist of what you're saying, I do think runtimes based on the we'll-clean-it-up-some-day GC paradigm makes it more important to consider memory allocation than less laissez-faire paradigms (like RAII or reference counting), contrary to how it's presented in the glamorous brochures.

jerf · on Sept 21, 2021

Put it this way: Each of the things mentioned in that post were errors that could just as easily have been made in Rust, and Rust would not necessarily have helped avoid. At best you can make a case for the errors being more explicit, but in my personal experience even that would be weak.

The last error in particular, using byte buffers instead of a streaming abstraction, is pervasive in programming. I don't know if Rust is necessarily any worse than Go's library environment for dealing with that problem but I doubt it's any better. By having io.Reader in the standard library from the beginning (and not because of any other particular virtue of the language, IMHO) it has had one of the best ecosystems for dealing with streams without having to manifest them as full bytes around [1].

It amounts to, the root problem is that they didn't have the problem they thought they have. Rust will blow the socks off the competition w.r.t. memory efficiency of lots of small objects, which is why it's so solid in the browser space. But that's not the problem they were having. Go's just fine where they seem to have ultimately ended up, stream processing things with transient per-object processing. Even if you do some allocation in the processing, the GC ends up not being a big deal because the runs end up scanning over not much memory not all that frequently. This is why Go is so popular in network servers. Could Rust do better? Yes. Absolutely, beyond a shadow of a doubt. But not enough to matter, in a lot of cases.

[1]: An expansion on that thought if you like: https://news.ycombinator.com/item?id=28368080

tptacek · on Sept 21, 2021

I think the Rust and Go stories with buffers vs. readers is pretty comparable. They both have good support for readers, and to-good support for reading whole messages into slices or Vec<u8>'s.

jerf · on Sept 21, 2021

Good to hear. I hope it's something all new languages have going forward, because like I mentioned in my extended post it's almost all about setting the tone correctly early in the standard library & culture, rather than any sort of "language feature" Go had.

As mostly-a-network engineer it's a major pet peeve of mine when I have to step back into some environment where everything works with strings. I can just feel the memory screaming.

pjmlp · on Sept 22, 2021

You mean just like XML-RPC and JSON-RPC (sorry REST), work?

Because the best way to contribute to global warming is to waste CPU cycles serializing and deserializing data structures into XML and JSON, and parsing them as well.

_ivvf · on Sept 21, 2021

More importantly, GC'ed languages tend to use at least 2x the memory of un-GC'ed languages and have to deal with the consequences of GC-induced pauses and generally inferior native code interop. Whether that matters to you or not depends on your application. No one is going to use a GC'ed language in the Linux Kernel, but practically 100% of backend applications are written in GC'ed languages because the productivity benefits are of automatic memory management are massive.

fiddlerwoaroof · on Sept 21, 2021

I’m not really sure if that 2x figure is accurate. I’ve seen charts on both sides of this and a lot here depends on your programming language and the things it can optimize: with Linear/Affine types, I’m fairly sure Haskell could, in theory, eliminate GC deterministically from the critical sections of your code-base without forcing you to adopt manual memory management universally.

But, there’s just the fact that people writing real-time/near real-time systems do, in fact, choose GC languages and make it work: video games are one example with Minecraft and Unity being the major examples. But also HFT systems: Jane Street heavily uses Ocaml and other companies use Java/etc. with specialized GCs.

This is not even to mention the microbenchmarks that seem to indicate that Common Lisp and Java can match or exceed Rust for tasks like implementing lock-free hash maps and various other things https://programming-language-benchmarks.vercel.app/problem/s...

_ivvf · on Sept 21, 2021

I am aware that you can hit really good latency targets with GC'ed languages, like in the video game and finance industry. Whenever I investigate examples, though, I find the devs have to go through a ton of effort to avoid memory allocations, and then I ask if using the GC'ed language was even worth it in the first place?

I'm actually fascinated with the idea of going off-heap in the hotspots of GC'ed languages to get better performance. Netty, for instance, relies on off-heap allocations to achieve better networking performance. But, once you do so, you start incurring the disadvantages of languages like C/C++, and it can get complicated mixing the two styles of code.

perfectspiral · on Sept 21, 2021

"Whenever I investigate examples, though, I find the devs have to go through a ton of effort to avoid memory allocations"

Yep, also the median dev in a GC'ed language is simply incapable of writing super efficient code in these languages because they rarely have to. You would have to bring in the best of the best people from those communities or put your existing devs through a pretty significant education process that is similar in difficulty to just learning/using Rust.

The resulting code will be very different to what typical code looks like in those languages, so the supposed homogeneity benefits of just writing fast C#/Java when it's needed are probably not quite true. You'd basically have to keep that project staffed up with these kinds of people and ensure they have very good Prod observability to ensure regressions don't appear.

_ivvf · on Sept 21, 2021

Yes, and I think one important aspect to this is the necessary CI/CD changes needed to support these kinds of optimizations. If your performance targets are tight enough that you are making significant non-standard optimizations in your GC'ed language, you're probably going to want some automated performance regression testing in your deployment pipeline to ensure you don't ship something that falls down under load. In my experience, building and maintaining those pipeline components is not easy.

igouy · on Sept 22, 2021

> … tasks like implementing lock-free hash maps…

Please be specific.

You pointed to spectral-norm, what does that have to do with lock-free hash maps?

The 2.java program seems to be 4x slower than the 7.rs program !

fiddlerwoaroof · on Sept 22, 2021

Look at 2.cl, though: the lisp solution is faster than everything except one c++ solution. (And, aside from the SIMD intrinsics, the lisp solution is fairly idiomatic)

I was referring to this with the lock-free hash maps: https://twitter.com/nodefunallowed/status/137196906733924761...

igouy · on Sept 22, 2021

> I was referring to this with the lock-free hash maps…

Well thank you for providing an actual reference.

afaict from a twitter thread, "42nd At Threadmill" and "Luckless" are both Lisp re-implementations of the same Java hashtable code.

afaict the Rust sofware is not a re-implementation of that same Java hashtable code.

afaict that chart does not show any measurements of Java software, just Lisp and Rust.

So "… Java can match or exceed Rust …" seems to be based on nothing.

> Look at 2.cl, though…

So hand-coded AVX is hand-coded AVX in any language?

tsimionescu · on Sept 21, 2021

I mostly agree with what you're saying, but I'll also add that GC pauses are mostly a problem of yester-year unless you're either managing truly enormous amounts of memory or have hard real-time requirements (and even then it's debatable). Modern GCs, as seen in Go, Java 11+, .NET 4.5+ guarantee sub-millisecond pauses on terrabyte-large heaps (I believe the JS GC does as well, but I'm less sure).

zamadatix · on Sept 21, 2021

Rewrites can definitely help but rushing into them before doing these other things is going to net you a lot less gain for the time.

coliveira · on Sept 21, 2021

That is correct.

In the worst case, you can always (even on GC'd languages) pre-allocate buffers and do your work without new memory requests. But you need to plan for this, in the same way you'd do in a language without GC.