bufOps is a dictionary that holds a bunch of functions, accessed with the getters on it. For the sake of this comment, we can concretize and read (buf_empty bufOps) as [] and (buf_append bufOps) as (++).
This code then essentially performs:
foldr (flip (++)) [] xs
Which, if you look up the definition of foldr, is:
((([] ++ xN) ++ ... ) ++ x2) ++ x1
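(For reference, foldr in the Prelude is defined as:

foldr f z []     = z
foldr f z (x:xs) = f x (foldr f z xs)

so with f = flip (++) and z = [], unfolding it over [x1, ..., xN] gives exactly the left-nested chain of appends above.)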
And a definition of ++ is of course:
[] ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)
This means that for lists of this sort a ++ b runs in time O(length a), because it has to descend down the leftmost list to find the empty list -- only once it finds [] can it "work its way backwards" to append elements from a onto b.
If each of x1, x2, ... xN has m elements, then we do 0 + m + 2m + ... + (N-1)m = m * N * (N - 1) / 2 operations. Each ++ does work proportional to the accumulator built so far, and we do N of them; it's O(N^2) in the number of chunks.
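To put hypothetical numbers on that: with N = 1000 chunks of m = 100 characters each, that's 100 * 1000 * 999 / 2, roughly 50 million character copies, versus roughly N * m = 100,000 (plus an O(N) reverse) for the linear version described next.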
The new algorithm, `concat (reverse xs)`, works because `xs` is just a list, which can be reversed by traversing down it in O(N) time, and then the chunks can be concatenated in O(N * m) time.
If a Haskell expert (the author of hdirect, an IDL compiler and Win32 COM interface, sadly defunct now) makes this kind of mistake, how are mere mortals supposed to reason about algorithmic efficiency?
I don't think this was an 'oh my god we're trying to track down why cabal update is so slow and we just can't find it' issue like you make it out to be.
I didn't try to imply that it was difficult to find the performance problem. It's an observation on something else: Haskell is often mentioned for its equational reasoning. I find it interesting that correct equational reasoning in this case led to a correct program, but one with abysmal performance.
It'd be interesting to think how one could encode performance characteristics into equations.
I think having simple and direct denotational semantics necessarily means having relatively complex, indirect operational semantics (e.g: Haskell). Having a simple and direct operational semantics necessarily means having complex and indirect denotational semantics (e.g: C).
C makes performance easy and correctness hard.
Haskell makes correctness easy and performance hard.
The trick is, in most programs, you only need performance for a tiny subset of the program. You need correctness throughout the whole program.
Exactly. Isn't it comforting to know that making mistakes is only human? And isn't this a wonderful practical example of why you want to be open source?
I suppose what Alan Perlis said about LISP programmers—that they know the value of everything and the cost of nothing—is true for Haskell programmers as well.
Most Lisp hackers are knowledgeable about algorithmic complexity analysis.
Alan Perlis's comment is about the inefficiency of Lisp programs that comes from heavy (meta) abstraction and the indirect mapping of abstract software onto hardware.
Admittedly, this particular case seems amenable to fairly straightforward algorithmic complexity analysis, but it seems to me that Haskell's laziness makes it even harder to get a good grasp of the costs involved than in a language like Lisp.
I'd definitely like to see an explanation of what's going on, both at a high level and the specifics of the code. As a Haskell beginner-intermediate, I don't really know what most of those functions are doing (much less the context of what that function's purpose is), but I feel I could probably understand an explanation if it were given.
foldr is a fold -- a "reduce" over a list. It takes a starting value (the accumulator) and a function, and, starting from the rightmost element of the list (that's the 'r' in 'foldr'), combines each element with the accumulator into a new accumulator value.
Let's say we have a list of numbers, the addition operator, and an accumulator of 0.
foldr op acc list
foldr (+) 0 [1, 2, 3, 4]
a = 0, list = [1, 2, 3, 4]
a = 0 + 4 = 4, list = [1, 2, 3]
a = 4 + 3 = 7, list = [1, 2]
a = 7 + 2 = 9, list = [1]
a = 9 + 1 = 10, list = []
Result is 10
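Written as the expression foldr actually builds, that's:

foldr (+) 0 [1, 2, 3, 4]
  = 1 + (2 + (3 + (4 + 0)))
  = 10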
One thing to note is that the accumulator here is itself a singly linked list, and because of the flip the growing accumulator ends up on the left of every ++. Each append therefore has to walk the entire accumulator built so far before it can attach the next chunk. I believe this repeated re-walking is why it's O(n^2).
It looks like the accumulator starts with (buf_empty bufOps), which from context I assume is an empty buffer structure. The operator is (flip (buf_append bufOps)), which appends values onto the buffer when called. Each time it goes through this process, it appends one element.
Because they're using foldr and the list is walked 'backward', this has the natural consequence that the order of the items in strs is reversed as they're added to the buffer.
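A tiny concrete example of that reversal, using plain lists in place of the buffer:

foldr (flip (++)) [] ["a", "b", "c"]
  = (([] ++ "c") ++ "b") ++ "a"
  = "cba"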
New version:
buf_concat bufOps $ reverse strs
The $ is a way of controlling precedence in Haskell; you can replace it with parentheses. So we get:
buf_concat bufOps ( reverse strs )
So reverse just reverses the order of a list, getting things in the order the code wants them. Then it calls buf_concat with the current structure and the items it wants added. So instead of taking one existing list and adding 6 new elements at the end one at a time, it takes one existing list and adds a list of 6 elements to the end once.
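On the same toy input as above, the new code gives the same (reversed) result, but each character is copied only once:

concat (reverse ["a", "b", "c"])
  = concat ["c", "b", "a"]
  = "cba"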
It seems like a relatively simple change. I wonder if using foldl or foldl' to avoid having to rewalk the list every time would have performance similar to the new line.
I find the new line simpler and cleaner either way, but was the choice of a right-fold the cause of the performance problem?
GHCi has a debugger. You can step into an expression and watch as Haskell evaluates it, in full lazy order.
Even if you understand lazy evaluation in theory, I recommend taking some non-trivial (but not too large) expression and :step'ing through the whole evaluation sometime, just to train your intuition.
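For example, with a small definition loaded from a source file (the debugger only steps through interpreted code, and the exact output varies by GHC version -- treat this as a rough sketch):

-- Fold.hs
myfoldr :: (a -> b -> b) -> b -> [a] -> b
myfoldr f z []     = z
myfoldr f z (x:xs) = f x (myfoldr f z xs)

ghci> :load Fold.hs
ghci> :step myfoldr (+) 0 [1, 2, 3]
Stopped in Main.myfoldr, Fold.hs:...
ghci> :step    -- keep stepping to watch each thunk get forced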
acc = "";
for (var i = 0; i < strings.length; i++) {
acc += strings[i]
}
The new version is something like this:
strings.join("")
The second version can pre-allocate a string and copy characters into that string under the hood. The first version has to allocate a new string and recopy the characters on every iteration.
Even if the buffers were vanilla Strings, and there were no pre-allocation, the patch would still have fixed the quadratic-runtime problem. Consider that, for vanilla Strings, the buffer ops are implemented like so:
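A plausible sketch of what the vanilla-String buffer ops could look like (the record and field names here are inferred from the snippet above; the real definition in cabal may differ):

data BufferOps buf = BufferOps
  { buf_empty  :: buf                -- an empty buffer
  , buf_append :: buf -> buf -> buf  -- append two buffers
  , buf_concat :: [buf] -> buf       -- concatenate many buffers
  }

stringBufOps :: BufferOps String
stringBufOps = BufferOps
  { buf_empty  = []
  , buf_append = (++)
  , buf_concat = concat
  }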
Since (++) has a running time that's linear in the length of its left argument, you want to treat it as a right-associative operator to prevent quadratic runtime when joining a bunch of strings.
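To see why, count how many times each string's characters get copied:

-- right-nested: a, b, and c are each copied once; linear overall
a ++ (b ++ (c ++ d))

-- left-nested: a is copied three times, b twice, c once; quadratic as the chain grows
((a ++ b) ++ c) ++ d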
But, going back to the patch in question, the original code was effectively this:
foldr (flip (++)) [] strs
Since flip has been applied to (++), you now want to treat the resulting operator as left-associative to avoid quadratic blow-up. The code, however, uses the right-associative fold, foldr, to join the list of strings strs.
The patch fixes this problem by replacing the code with the equivalent code
concat (reverse strs)
And how is concat defined in the libraries? Looking at the source [1], it's
concat = foldr (++) []
Thus the fix is basically
foldr (++) [] (reverse strs).
This version reverses strs, at linear-time cost, to be able to apply the normal, unflipped (++) with a right-associative fold and thus avoid the dreaded quadratic blow-up.
Nice explanation. I'm surprised that no one so far has mentioned a cool trick for getting around this: define mappend as function composition (a difference list), which makes sure ++ is applied in the right order. It's from LYAH (scroll to the end of the linked chapter):
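For reference, the difference-list trick from that chapter looks roughly like this (lightly adapted, since modern GHC also wants a Semigroup instance):

newtype DiffList a = DiffList { getDiffList :: [a] -> [a] }

toDiffList :: [a] -> DiffList a
toDiffList xs = DiffList (xs ++)

fromDiffList :: DiffList a -> [a]
fromDiffList (DiffList f) = f []

-- appending is just function composition, so it's O(1), and the
-- underlying (++)s end up right-nested when the list is finally built
instance Semigroup (DiffList a) where
  DiffList f <> DiffList g = DiffList (f . g)

instance Monoid (DiffList a) where
  mempty = DiffList id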
I recall that when I studied algorithms and data structures, just swapping to references/pointers instead of copying memory could speed up my algorithms ridiculously. Correct me if I am wrong, but "small" things like these (as we just saw) can be really beneficial!
I wonder if there's something different about Haskell that makes "less copying" more attractive?
In the C++ world, libstdc++ strings have copy-on-write semantics, which (as far as I heard) turned out to be terrible because you have to do reference counting instead, and with multithreading it requires atomic operations, which is slower than copying for small strings.
It's worth noting, though, that immutability doesn't always mean less copying. Immutable arrays, where the entire array must be copied with the change of one element, are an example of that. And by extension, hash tables, etc. Of course there are lots of tricks that can be employed to get around this, to a degree.
>Immutable arrays, where the entire array must be copied with the change of one element, are an example of that
That is not how modern persistent data structures are implemented. Please do not talk about immutability as if it necessarily means having a naive implementation like this.
Like I said there are tricks to get around it. I was referring to a C-style array. I think it's still accurate to say that immutability does not always mean less copying.
As a fellow Clojurist, I agree with the substance of what you're saying here but wish you could express it in a more friendly manner. Both your comments essentially say "you're wrong" without educating or adding value. I don't feel that reflects well on the Clojure community, and I'd like us to do better.
Your tone suggests that you think I don't understand what you're saying (which perhaps I wouldn't, since you succeeded only in telling me I was wrong and failed to actually explain "how modern persistent data structures are implemented" -- as if they were all implemented the same way). Your last sentence suggests that you don't think I know that efficient immutable data structures are an active area of computer science research. Both assumptions are incorrect.
Now, I realize that in my original post, I might have given the wrong impression. I thought that by my second post I was being clear enough, but perhaps I wasn't. Let's try take three:
Immutable data structures do not necessarily guarantee less copying, or necessarily imply a performance gain. A data structure which does not lend itself well to immutability, such as a C-style array, can lead to very inefficient code when used in an immutable fashion. The C-style array or a variation thereof is also the default in most current languages, including Java, Python, C++, Ruby, and many others, so this is hardly a thing of the past. It's important to be aware of the performance characteristics of the data structures one is using, respective to the way in which they are used.
No, you'd have to resort to tricks in order to have a C-style array and encounter this problem; if you just use the default stuff, you get data structures that work great with immutability.
> Immutable arrays, where the entire array must be copied with the change of one element, are an example of that.
Only in a naive implementation. Clojure, for example, has a persistent vector that only requires O(log32 n) copying, which grows so slowly as to be effectively O(1).
If your array only has one future (i.e., there are no references to the unchanged array around), you can re-use the old array. That means you get to mutate in place but still pretend you have immutability.
With the naive representation of a string as a list of characters [Char], it is. The only way to get to the end of the list is to recursively take the tail of the string (see the source at http://hackage.haskell.org/package/base-4.6.0.1/docs/src/GHC...).
There are alternative representations with different trade-offs, such as Data.Text.
Heh... but I was referring to the example seliopou gave, which appears to be in JavaScript, and strings is [String] rather than [Char]. I think the analogy might cause more confusion than clarification.
It is a standard mistake that people sometimes make when working with lists. Lists have O(1) complexity for prepending an element and O(n) complexity for appending an element. So when you have to append a bunch of elements to a list, it is better to reverse the list, prepend the elements, and then reverse the list again, instead of just appending each element. The former approach has linear complexity, the latter quadratic.
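A minimal sketch of the two approaches (helper names are mine):

-- appending one element at a time: each step copies the whole accumulator -- O(n^2)
appendAllSlow :: [a] -> [a] -> [a]
appendAllSlow xs ys = foldl (\acc y -> acc ++ [y]) xs ys

-- reverse, prepend, reverse back: three linear passes -- O(n)
appendAllFast :: [a] -> [a] -> [a]
appendAllFast xs ys = reverse (foldl (flip (:)) (reverse xs) ys)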
I assume you're talking about lists with mutable pointers. I don't think that's what they're talking about here. Functional (persistent) lists are immutable. You can add elements only by creating a new data structure, similar to a linked list node, that will hold immutable references to the existing list and the new element. By convention the element is considered to be in front of the list. Hence prepending is rather easy (constant time) while appending requires rebuilding the whole list (linear time).
> I assume you're talking about lists with mutable pointers. I don't think that's what they're talking about here. Functional (persistent) lists are immutable. [...] Hence prepending is rather easy (constant time) while appending requires rebuilding the whole list (linear time).
You can append in O(1) time using difference lists. They don't have all the niceties of Prolog difference lists, but they are still great if you only have to append:
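A minimal sketch of the idea (names are mine; the dlist package on Hackage provides a polished version of the same thing):

type DList a = [a] -> [a]

fromList :: [a] -> DList a
fromList = (++)

toList :: DList a -> [a]
toList dl = dl []

-- appending two difference lists is just composition: O(1)
append :: DList a -> DList a -> DList a
append = (.)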
"And make no mistake: irony tyrannizes us. The reason why our pervasive cultural irony is at once so powerful and so unsatisfying is that an ironist is impossible to pin down. All irony is a variation on a sort of existential poker-face. All U.S. irony is based on an implicit "I don't really mean what I say." So what does irony as a cultural norm mean to say? That it's impossible to mean what you say? That maybe it's too bad it's impossible, but wake up and smell the coffee already? Most likely, I think, today's irony ends up saying: "How very banal to ask what I mean." Anyone with the heretical gall to ask an ironist what he actually stands for ends up looking like a hysteric or a prig. And herein lies the oppressiveness of institutionalized irony, the too-successful rebel: the ability to interdict the question without attending to its content is tyranny. It is the new junta, using the very tool that exposed its enemy to insulate itself."
It actually does this in a bunch of cases; see Haskell's stream fusion[1] and rewrite rules in general[2] for examples.
However, there aren't any rewrite rules for list-based code, at least with the standard library. Rewrite rules are generally used for libraries designed explicitly with performance in mind like Vector and Bytestring; the standard String type and the Prelude don't fall into this category.
Obvious follow-up question: why does cabal not use more performance-oriented libraries? Does it have to avoid depending on anything because it's the dependency manager?
Because if it's a common pattern, then it's nice to put it in a common library anyway, instead of having everyone write it out by hand. It makes the code cleaner and more efficient. Win win!
Yes! Haskell is a delight for optimization because you can use algebraic reasoning and calculation to derive more efficient code. For example, if I had the following code:
(concat . reverse . map reverse) strs
I could easily replace it with the following code, which is simpler, faster, and reduces memory churn:
(reverse . concat) strs
You can prove that these two implementations produce the same results (see [1] for a step-by-step proof). After you do the proofs for a while, you'll start seeing similar optimizations everywhere -- and applying them.
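Not a proof, but here's a quick confidence check with QuickCheck (the property name is mine):

import Test.QuickCheck

prop_reverseConcat :: [[Int]] -> Bool
prop_reverseConcat strs =
  (concat . reverse . map reverse) strs == (reverse . concat) strs

-- ghci> quickCheck prop_reverseConcat
-- +++ OK, passed 100 tests.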
The rewriting part -- if not the coming-up-with-proofs part -- is done automatically in quite a few libraries. You can specify expression-rewriting rules in pragmas in your source and GHC will carry them out.
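For example, the classic rule from the GHC user's guide fuses two maps into one:

{-# RULES
  "map/map"  forall f g xs.  map f (map g xs) = map (f . g) xs
  #-}

GHC applies such rules during optimization; it's up to the rule's author to make sure the two sides really are equal.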
Sure they do! Optimizing is more than just low-level efficiency. It's also about using appropriate data structures and algorithms, and using advantageous APIs (like buf_concat here) when they're applicable.
This looks like a case of someone using a profiler to optimize a Haskell program. A profiler is nearly always the best place to start when optimizing a program, regardless of the language.
Look. I love Haskell. I usually formulate solutions to programming problems in my head in Haskell before I write code in whatever language I have to deal with. Constructing programs using combinators? Great. Assuming referential transparency everywhere? Great. It's a great language.
But there comes a point in time where you have to project that stuff all away and get down to the brass tacks of what's happening in your system, leaving behind the Platonic heaven you've constructed for yourself. As the patch proves without a doubt, this is very possible to do in Haskell. But sadly, the general ethos of the Haskell community does not bend in that direction. I am a lover of the language, but at the same time a vocal critic of the community in that respect.
Given the amount of effort and mindshare that libraries like Vector, Bytestring and Repa get in the Haskell community, I really don't see where you got your impression. Not everyone cares about performance, sure, but plenty of people do leading to some heavily optimized libraries.
In this day and age, having everyone worry about performance is patently absurd. Most programs don't need to be heavily optimized, so prioritizing programmer efficiency and correctness just makes sense. And Haskell does have the libraries, language features and people to optimize the parts that need optimization!
Why is Haskell different than any other high level language in this regard? All high level languages make compromises between expressivity and performance. If you want to code really close to the metal, C exists and you know where to find it.