
Good to include the whole quote.

I wonder if using string builders is really a critical 3%... and how many people who do practice premature optimization actually measure if their optimization of choice is in their program's critical 3%.




String builders are often not a critical optimization, but the additional cognitive burden on the reader is nearly zero. In some languages, string builders can even overload the `+=` operator, which makes the type of the object the only visible difference apart from the final conversion to string.

In languages that have immutable strings, a chain of `+=` operators is basically O(n^2) vs O(n) for a string builder. Given how easy the optimization is, there's little excuse not to use them for any bulk append operation.
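To illustrate the two shapes in Python (a sketch, not a benchmark; note that CPython happens to optimize some in-place `+=` cases, so a naive timing may not show the asymptotic gap):

```python
def concat_naive(parts):
    # Each += on an immutable string may copy everything built so far,
    # giving O(n^2) total work for n characters in the worst case.
    s = ""
    for p in parts:
        s += p
    return s

def concat_builder(parts):
    # A list used as a string builder: appends are amortized O(1),
    # and the final join is a single O(n) pass.
    buf = []
    for p in parts:
        buf.append(p)
    return "".join(buf)
```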


The standard approach to this in Python is

    rv = []
    for x in y:
        rv.append(f(x))
        if g(x):
            rv.append(h(x))
    return ''.join(rv)
This gives you the same O(N²) to O(N) speedup you would get from a StringBuilder.

More recently, though, I've often been preferring the following construction instead:

    for x in y:
        yield f(x)
        if g(x):
            yield h(x)
This is sometimes actually faster (when you can pass the result to somefile.writelines, for example, which despite its name does not append newlines to the items) and is usually less code. If you want to delegate part of this kind of string generation to another function, in Python 3.3+ you can use `yield from f(x)` rather than `for s in f(x): yield s` (or just the `yield f(x)` you use if `f` returns a string), and the delegation is cleaner and more efficient than appending to a list while the other function internally joins its own list to hand you a string.
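A minimal sketch of that delegation; `row` is a made-up helper standing in for whatever sub-generator you delegate to:

```python
def row(x):
    # Hypothetical sub-generator yielding several fragments per item.
    yield str(x)
    yield ","

def render(items):
    # `yield from` (Python 3.3+) delegates to the sub-generator,
    # replacing the older `for s in row(x): yield s` spelling.
    for x in items:
        yield from row(x)

# e.g. ''.join(render([1, 2])) == "1,2,"
```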

However, if you're optimizing a deeply nested string generator, you're better off using the list approach and passing in the incomplete list to callee functions so they can append to it. Despite the suggestive syntax, at least last time I checked, `yield from` doesn't directly delegate the transmission of the iterated values; on this old netbook, it costs about 240 ns per item per stack level of `yield from`. (By comparison, a simple Python function call and return takes about 420 ns on the same machine.)
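The list-passing alternative described above might look like this (a sketch; `emit_row` is a hypothetical callee):

```python
def emit_row(x, out):
    # The callee appends directly to the shared buffer instead of
    # routing every item back up through generator frames.
    out.append(str(x))
    out.append(",")

def build(items):
    out = []
    for x in items:
        emit_row(x, out)
    return "".join(out)
```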

But if you really wanted your code to run fast you wouldn't have written it in Python anyway. You'd've used JS, LuaJIT, or Golang. Or maybe Scheme. Or C or Rust. But not Python.


Okay, so I checked this for JavaScript, and it's not actually true -- in Chrome, a vanilla += is faster than pushing into an array and joining.

https://jsperf.com/javascript-concat-vs-join/2

This is why you really should always benchmark. In my view, "premature optimization" is not so much about optimizing too early in a project; it's about writing code in a particular way you assume will make it faster, without testing that assumption first.
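The same kind of check is easy to run locally; in Python, for instance, `timeit` makes the comparison a few lines (numbers will vary by interpreter, input size, and machine, which is the whole point of measuring):

```python
import timeit

# Compare a += loop against str.join on the same input.
setup = "parts = ['x'] * 10_000"
t_plus = timeit.timeit(
    "s = ''\nfor p in parts: s += p", setup=setup, number=100)
t_join = timeit.timeit("''.join(parts)", setup=setup, number=100)
print(f"+= loop: {t_plus:.4f}s  join: {t_join:.4f}s")
```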


So that means JS strings aren't truly immutable in a modern environment (which is fine). The runtime environment is internally using an approach similar to a string builder, which is a good optimization.

I agree that you shouldn't operate on assumptions alone for a decision like whether or not to use a string builder. That's where prior experience should come into play to guide your decisions. For instance, I am not a JS developer, so I have no prior experience to inform a decision to use a builder vs concat in JS.

I cited that case in particular since the slowness of concatenation was called out in the article, and in some languages it actually does make a huge difference at a very small complexity cost.



