Turn O(n^2) reverse into O(n) (github.com/nominolo)
175 points by _ks3e on Dec 16, 2013 | 90 comments



Explanation:

bufOps is a dictionary which holds a bunch of functions accessed with the getters on it. For the sake of this comment, we can concretize and use (buf_empty bufOps) as [] and (buf_append bufOps) as ++.

This code then essentially performs:

    foldr (flip (++)) [] xs
Which, if you look up the definition of foldr, is:

    ((([] ++ xN) ++ ... ) ++ x2) ++ x1
And a definition of ++ is of course:

    [] ++ ys = ys
    (x:xs) ++ ys = x : (xs ++ ys)
This means that for lists of this sort, a ++ b runs in O(length a) time: the definition has to walk (and copy) every cons cell of a before it reaches [] and can splice in b unchanged.

If each of x1, x2, ... xN has m elements, then the appends cost 0 + m + 2m + ... + (N-1)m = m * N * (N - 1) / 2 operations in total. Each ++ copies the entire result accumulated so far (up to about N*m elements), and we do N of them; it's O(m * N^2).

The new algorithm, `concat (reverse xs)`, works because `xs` is just a list of N chunks: reversing it takes O(N) time, and concatenating the reversed chunks takes O(N * m) time, since each ++ inside concat only copies a single m-element chunk.
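
To make that concrete, here is a hedged sketch with the buffer operations specialized to plain lists ((buf_empty bufOps) as [], (buf_append bufOps) as ++, (buf_concat bufOps) as concat), using made-up names oldVersion/newVersion:

    -- Both versions produce the chunks in reversed order, but at very
    -- different cost for N chunks of m elements each.
    oldVersion, newVersion :: [[a]] -> [a]
    oldVersion xs = foldr (flip (++)) [] xs   -- left-nested (++): O(m * N^2)
    newVersion xs = concat (reverse xs)       -- right-nested (++): O(m * N)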


So the original line of code has apparently been present since the very first import by Sigbjørn Finne in https://github.com/nominolo/HTTP/blob/c4765e822eb92196fec955... (check line 443).

If a Haskell expert (e.g., he authored hdirect -- an IDL compiler and interface with Win32 COM -- sadly defunct now) makes this kind of mistake, how are mere mortals supposed to reason about algorithmic efficiency?


I don't think this was an 'oh my god we're trying to track down why cabal update is so slow and we just can't find it' issue like you make it out to be.

edit: http://www.reddit.com/r/haskell/comments/1sh67u/the_reason_w... This comment by the patch author indicates that actually tracking it down was a fairly straightforward profiling job.


I didn't try to imply that it was difficult to find the performance problem. It's an observation on something else: Haskell is often mentioned for its equational reasoning. I find it interesting that correct equational reasoning in this case led to a correct program, but one with abysmal performance.

It'd be interesting to think how one could encode performance characteristics into equations.


I think having simple and direct denotational semantics necessarily means having relatively complex, indirect operational semantics (e.g. Haskell). Having simple and direct operational semantics necessarily means having complex and indirect denotational semantics (e.g. C).

C makes performance easy and correctness hard. Haskell makes correctness easy and performance hard.

The trick is, in most programs, you only need performance for a tiny subset of the program. You need correctness throughout the whole program.


Why do you think that there must be a tradeoff between simplicity of denotational and operational semantics?


Because operational semantics (of contemporary computers, at least) are very different from a simple denotational semantics.

We have to bridge that gap:

1) Either use a compiler that hides away the operational details

2) Or use a language that directly maps to the operational semantics, but then is necessarily far away from the denotational ones


It's known that appending to the end of a linked list is slow. It's just a property of the data type.


> I find it interesting that correct equational reasoning in this case led to a correct program, but one with abysmal performance.

That's not really an interesting property of Haskell, though. You can always write correct but slow code in any language.

> It'd be interesting to think how one could encode performance characteristics into equations.

I've seen the concept of encoding performance in types played around with, but I can't find actual work (if any) that's been done on it.


> You can always write correct but slow code in any language.

Yes, but Haskell makes that exceptionally easy.


Humans occasionally make mistakes, including expert ones.


Exactly. Isn't it comforting to know that making mistakes is only human? And isn't this a wonderful practical example of why you want to be open source?


I suppose what Alan Perlis said about LISP programmers—that they know the value of everything and the cost of nothing—is true for Haskell programmers as well.


This is a bad application of the quote.

Most Lisp hackers are quite knowledgeable about algorithmic complexity analysis.

Alan Perlis's comment is about the inefficiency of Lisp programs caused by heavy (meta) abstraction and the indirect mapping of abstract software onto hardware.


Admittedly, this particular case seems amenable to fairly straightforward algorithmic complexity analysis, but it seems to me that the lazy nature of Haskell makes it even harder to get a good grasp of the costs involved than in a language like Lisp.


Mistakes of this sort have been made in many languages. I don't see what this has to do with Haskell specifically.


I think back then, the 'n' just wasn't very large, so O(n) versus O(n^2) didn't make much of a difference.


Probably a matter of focus: maybe he wrote it once and didn't look back at it. There was lower-hanging fruit elsewhere for the typical use case?


> [...] how are mere mortals supposed to reason about algorithmic efficiency?

You profile, and then you look at the suspicious parts.



I guess nobody ever got upvoted for using StringBuffer/StringBuilder instead of String concatenation in Java.


In JDK 7 '+' concatenation actually uses StringBuilder under the hood, so probably nobody cares.

I can't tell if you're being facetious.


Indeed I was.

It's a good safety barrier against hyped things.


I'd definitely like to see an explanation of what's going on, both at a high level and the specifics of the code. As a Haskell beginner-intermediate, I don't really know what most of those functions are doing (much less the context of what that function's purpose is), but I feel I could probably understand an explanation if it were given.


I think I know enough Haskell to explain.

Old version:

    foldr (flip (buf_append bufOps)) (buf_empty bufOps) strs
foldr is a combination map/reduce in one. It takes a value (the accumulator), runs a function on the rightmost value in a list called strs (that's why it's 'foldr'), and combines the two together into a new accumulator value.

Let's say we have a list of numbers, the addition operator, and an accumulator of 0.

    foldr op acc list
    foldr + 0 [1, 2, 3, 4]
    a = 0, list = [1, 2, 3, 4]
    a = 0 + 4, list = [1, 2, 3]
    a = 4 + 3, list = [1, 2]
    a = 7 + 1, list = [1]
    a = 8, list = []
    Result is 8
One thing to note is that this walks the list 'backward', and in Haskell it's a singly linked list. That means that each time it has to walk the full list. I believe this is why it's O(n^2).

It looks like the accumulator starts with (buf_empty bufOps), which from context I assume is an empty buffer structure. The operator is (flip (buf_append bufOps)), which looks like it adds values onto the buffer structure when called. Each time it goes through this process, it appends one element.

Because they're using foldr and the list is walked 'backward', this has the natural side effect that the order of the items in strs is reversed as it's added to bufOps.

New version:

    buf_concat bufOps $ reverse strs
The $ is a way of controlling precedence in Haskell; you can replace it with parentheses. So we get:

    buf_concat bufOps ( reverse strs )
So reverse just reverses the order of a list, getting things in the order the code wants them. Then it calls buf_concat with the current structure and the items it wants added. So instead of taking one existing list and adding 6 new elements at the end one at a time, it takes one existing list and adds a list of 6 elements to the end once.

It seems like a relatively simple change. I wonder if using foldl or foldl' to avoid having to rewalk the list every time would have performance similar to the new line.

I find the new line simpler and cleaner either way, but was the choice of a right-fold the cause of the performance problem?


> One thing to note is that this walks the list 'backward'

No, foldr walks through the list forward, and exactly once. Your example is evaluated as

    foldr (+) 0 [1, 2, 3, 4]
    = 1 + foldr (+) 0 [2,3,4]
    = 1 + (2 + foldr (+) 0 [3,4])
    = 1 + (2 + (3 + foldr (+) 0 [4]))
    = 1 + (2 + (3 + (4 + foldr (+) 0 [])))
    = 1 + (2 + (3 + (4 + 0)))


Is there an (easy) way to see what/how Haskell evaluates such code (i.e., step through it)? Something like EXPLAIN from SQL or disassemble from Lisp?


GHCI has a debugger. You can step into an expression and watch as Haskell evaluates it, in full lazy order.

Even if you understand lazy evaluation in theory, I recommend taking some non-trivial (but not too large) expression and :step'ing through the whole evaluation sometime, just to train your intuition.
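
For example (a hypothetical session; the exact output depends on your GHC version):

    -- Foo.hs, loaded into GHCi with :load Foo.hs
    total :: Int
    total = foldr (+) 0 [1, 2, 3, 4]

    -- then at the GHCi prompt:
    --   :step total      -- stop at the first reduction of 'total'
    --   :step            -- advance one evaluation step at a time
    --   :continue        -- run the rest to completion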


    $ is a way of controlling precedence in Haskell
But much more importantly, it is `apply`.
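
For reference, the Prelude defines it as plain function application with the lowest possible precedence, which is what lets it stand in for parentheses:

    ($) :: (a -> b) -> a -> b
    f $ x = f x
    infixr 0 $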


Just so no one reading is confused: the result should be 10 in the foldr example. It just skipped one element of the list.


You're right. Good catch.


My Haskell skills still are growing. Can someone explain?


The original line was something like this:

  acc = "";
  for (var i = 0; i < strings.length; i++) {
      acc += strings[i]
  }
The new version is something like this:

  strings.join("")
The second version can pre-allocate a string and copy characters into that string under the hood. The first version has to allocate a new string and recopy the characters on every iteration.


Even if the buffers were vanilla Strings, and there were no pre-allocation, the patch would still have fixed the quadratic-runtime problem. Consider that, for vanilla Strings, the buffer ops are implemented like so:

    buf_append bufOps = (++)
    buf_concat bufOps = concat
Since (++) has a running time that's linear in the length of its left argument, you want to treat it as a right-associative operator to prevent quadratic runtime when joining a bunch of strings.
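
A small illustration of that associativity point, with hypothetical chunks x1, x2, x3:

    x1, x2, x3 :: String
    x1 = "foo"; x2 = "bar"; x3 = "baz"

    goodNesting, badNesting :: String
    goodNesting = x1 ++ (x2 ++ (x3 ++ []))  -- each (++) copies one short chunk
    badNesting  = ((x1 ++ x2) ++ x3) ++ []  -- x1 is copied again at every level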

But, going back to the patch in question, the original code was effectively this:

    foldr (flip (++)) [] strs
Since flip had been applied to (++), you now want to treat the resulting operator as left-associative to avoid quadratic blow-up. The code, however, uses the right-associative fold, foldr, to join the list of strings strs.

The patch fixes this problem by replacing the code with the equivalent code

    concat (reverse strs)
And how is concat defined in the libraries? Looking at the source [1], it's

    concat = foldr (++) []
Thus the fix is basically

    foldr (++) [] (reverse strs)
This version reverses strs, at linear-time cost, to be able to apply the normal, unflipped (++) with a right-associative fold and thus avoid the dreaded quadratic blow-up.

[1] http://hackage.haskell.org/package/base-4.6.0.1/docs/src/GHC...

EDITED TO ADD: Also, if you knew the buffers were vanilla Strings, you could even eliminate the reverse overhead since

    foldr op z xs == foldl (flip op) z (reverse xs)
for all finite lists xs. Thus you could flip (++) and use foldl instead of foldr to get an efficient implementation like this:

    foldl (flip (++)) [] strs
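
Expanding that last version by hand for a hypothetical three-chunk list shows why it stays linear:

    foldl (flip (++)) [] [x1, x2, x3]
    = flip (++) (flip (++) (flip (++) [] x1) x2) x3
    = x3 ++ (x2 ++ (x1 ++ []))

Every (++) now has a single chunk as its left argument, so the total cost is linear in the combined length of the chunks.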


Nice explanation. I'm surprised that no one so far has mentioned a cool trick for getting around this: represent the list as a function, so that mappend becomes function composition, which makes sure ++ is applied in the right order. It's from LYAH (scroll to the end of the linked chapter):

http://learnyouahaskell.com/for-a-few-monads-more#writer
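
For anyone curious, here's a minimal sketch of that trick (essentially what LYAH and the dlist package do, with made-up names): represent a list by the function that prepends it, so appending becomes function composition and the underlying (++)s end up right-nested:

    newtype DiffList a = DiffList { getDiffList :: [a] -> [a] }

    toDiffList :: [a] -> DiffList a
    toDiffList xs = DiffList (xs ++)

    fromDiffList :: DiffList a -> [a]
    fromDiffList (DiffList f) = f []

    appendDL :: DiffList a -> DiffList a -> DiffList a
    appendDL (DiffList f) (DiffList g) = DiffList (f . g)   -- O(1) append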


I recall that when I studied algorithms and data structures, just swapping to references/pointers instead of copying memory could speed up my algorithms ridiculously. Correct me if I am wrong. "Small" things like these (as we just saw) can be really beneficial!


As an aside, immutability (i.e., purity) in Haskell allows references everywhere, and thus less copying. (They call it 'sharing' in Haskell land.)


I wonder if there's something different about Haskell that makes "less copying" more attractive?

In the C++ world, libstdc++ strings have copy-on-write semantics, which (as far as I heard) turned out to be terrible because you have to do reference counting instead, and with multithreading it requires atomic operations, which is slower than copying for small strings.


It's worth noting, though, that immutability doesn't always mean less copying. Immutable arrays, where the entire array must be copied with the change of one element, are an example of that. And by extension, hash tables, etc. Of course there are lots of tricks that can be employed to get around this, to a degree.


>Immutable arrays, where the entire array must be copied with the change of one element, are an example of that

That is not how modern persistent data structures are implemented. Please do not talk about immutability as if it necessarily means having a naive implementation like this.


Like I said there are tricks to get around it. I was referring to a C-style array. I think it's still accurate to say that immutability does not always mean less copying.


"tricks to get around it" if by that you mean non-naive data structures that you're supposed to use in order to make immutability efficient yes.

Egregious mischaracterization.

Said "tricks" are an entire branch of research in CS.


As a fellow Clojurist, I agree with the substance of what you're saying here but wish you could express it in a more friendly manner. Both your comments essentially say "you're wrong" without educating or adding value. I don't feel that reflects well on the Clojure community, and I'd like us to do better.


Your tone suggests that you think I don't understand what you're saying (which perhaps I wouldn't, since you succeeded only in telling me I was wrong and failed to actually explain "how modern persistent data structures are implemented" -- as if they were all implemented the same way). Your last sentence suggests that you don't think I know that efficient immutable data structures are an active area of computer science research. Both assumptions are incorrect.

Now, I realize that in my original post, I might have given the wrong impression. I thought that by my second post I was being clear enough, but perhaps I wasn't. Let's try take three:

Immutable data structures do not necessarily guarantee less copying, or necessarily imply a performance gain. A data structure which does not lend itself well to immutability, such as a C-style array, can lead to very inefficient code when used in an immutable fashion. The C-style array or a variation thereof is also the default in most current languages, including Java, Python, C++, Ruby, and many others, so this is hardly a thing of the past. It's important to be aware of the performance characteristics of the data structures one is using, respective to the way in which they are used.


No, you'd have to resort to tricks in order to have a C-style array and encounter this problem; if you just use the default stuff, you get data structures that work great with immutability.


> Immutable arrays, where the entire array must be copied with the change of one element, are an example of that.

Only in a naive implementation. Clojure, for example, has a persistent vector that only requires O(log32 n) copying, which grows so slowly as to be effectively O(1).

See: http://hypirion.com/musings/understanding-persistent-vector-...


Yes. The right data structure for the right job.

If your array only has one future (i.e., there are no references to the unchanged array around), you can re-use the old array. That means you get to mutate in place but still pretend you have immutability.


How is the original (the for loop that you mentioned) O(n^2)? Isn't that O(n)? Isn't acc += strings[i] an O(1) operation?


Each time strings[i] is added to acc, it may require reallocating the entire acc to a new memory location with enough room for the concatenated string.


`acc` is assumed immutable, so it has to be copied entirely on each concatenation. Copying a string is an O(n) operation.


But strings.length is O(n), and you have to run it n times.


No, it is not.


With the naive representation of a string as a list of characters [Char], it is. The only way to get to the end of the list is to recursively take the tail of the string (see the source at http://hackage.haskell.org/package/base-4.6.0.1/docs/src/GHC...).

There are alternative representations with different trade-offs, such as Data.Text.
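
A minimal sketch of why that's linear (not GHC's actual definition, which uses a strict accumulator, but the cost is the same):

    lengthNaive :: [a] -> Int
    lengthNaive []     = 0
    lengthNaive (_:xs) = 1 + lengthNaive xs   -- walks every cons cell: O(n)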


Heh... but I was referring to the example seliopou gave, which appears to be in JavaScript, and strings is [String] rather than [Char]. I think the analogy might cause more confusion than clarification.


Not if you have to reallocate space for storing acc on each iteration.


It is a standard mistake that people sometimes make when working with lists. Lists have O(1) complexity for prepending an element and O(n) complexity for appending one. So when you have to append a bunch of elements to a list, it is better to reverse the list, prepend the elements, and then reverse the list again, instead of just appending each element. The former approach has linear complexity, the latter quadratic.
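
A hedged sketch of that pattern (made-up names), assuming we start from an existing list and a batch of elements to append:

    -- Quadratic: every (++ [x]) re-walks the whole accumulated list.
    appendAllSlow :: [a] -> [a] -> [a]
    appendAllSlow acc xs = foldl (\ys x -> ys ++ [x]) acc xs

    -- Linear: reverse once, prepend each element in O(1), then reverse back.
    appendAllFast :: [a] -> [a] -> [a]
    appendAllFast acc xs = reverse (foldl (flip (:)) (reverse acc) xs)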


Lists don't have O(n) complexity of appending an element, if you keep track of the tail element.


I assume you're talking about lists with mutable pointers. I don't think that's what they're talking about here. Functional (persistent) lists are immutable. You can add elements only by creating a new data structure, similar to a linked list node, that will hold immutable references to the existing list and the new element. By convention the element is considered to be in front of the list. Hence prepending is rather easy (constant time) while appending requires rebuilding the whole list (linear time).
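
In code, for an ordinary Haskell list (illustrative names):

    prepend :: a -> [a] -> [a]
    prepend x xs = x : xs       -- O(1): the new cell just points at the shared xs

    appendOne :: [a] -> a -> [a]
    appendOne xs x = xs ++ [x]  -- O(n): every cons cell of xs gets rebuilt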


> I assume you're talking about lists with mutable pointers. I don't think that's what they're talking about here. Functional (persistent) lists are immutable. [...] Hence prepending is rather easy (constant time) while appending requires rebuilding the whole list (linear time).

You can append in O(1) time using difference lists. They don't have all the niceties of Prolog difference lists, but they are still great if you only have to append:

http://hackage.haskell.org/package/dlist


A list manager with a pointer to the last element on the list should also be able to append to the managed list in O(1)


That does not work if your list is persistent.


I for one prefer the more "algebraic" implementation. Why worry about performance when you get theorems for free?!

Monads.


I for one prefer the (appropriately) unthinking anti-intellectualism.

Hacker News.


I think (s)he was being sarcastic.


And that's the anti-intellectualism.


"And make no mistake: irony tyrannizes us. The reason why our pervasive cultural irony is at once so powerful and so unsatisfying is that an ironist is impossible to pin down. All irony is a variation on a sort of existential poker-face. All U.S. irony is based on an implicit "I don't really mean what I say." So what does irony as a cultural norm mean to say? That it's impossible to mean what you say? That maybe it's too bad it's impossible, but wake up and smell the coffee already? Most likely, I think, today's irony ends up saying: "How very banal to ask what I mean." Anyone with the heretical gall to ask an ironist what he actually stands for ends up looking like a hysteric or a prig. And herein lies the oppressiveness of institutionalized irony, the too-successful rebel: the ability to interdict the question without attending to its content is tyranny. It is the new junta, using the very tool that exposed its enemy to insulate itself."

From David Foster Wallace's "E Unibus Pluram".


Only if you believe that "intellectualism" means "favoring more abstract approaches regardless of practical concerns."


Why not upgrade the compiler to detect the algebraic implementation and optimize it to the buffer implementation?


It actually does this in a bunch of cases; see Haskell's stream fusion[1] and rewrite rules in general[2] for examples.

However, there aren't rewrite rules that would catch this sort of list code in the standard library. Rewrite rules are generally used for libraries designed explicitly with performance in mind, like Vector and ByteString; the standard String type and the Prelude don't fall into this category.

[1]: http://stackoverflow.com/questions/578063/what-is-haskells-s...

[2]: http://www.haskell.org/ghc/docs/7.0.1/html/users_guide/rewri...
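
As a hedged sketch of what such a rule looks like (hypothetical rule name; the identity itself does hold for finite lists, as discussed further down the thread):

    {-# RULES
    "concat/reverse/map-reverse" forall xss.
        concat (reverse (map reverse xss)) = reverse (concat xss)
      #-}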


Obvious follow-up question: why does cabal not use more performance-oriented libraries? Does it have to avoid depending on anything because it's the dependency manager?


Because if it's a common pattern, then it's nice to put it in a common library anyway, instead of having everyone write it out by hand. It makes the code cleaner and more efficient. Win win!


Your users might care ...



Do people try and optimize Haskell programs? This is a part of Haskell that terrifies me.


Yes! Haskell is a delight for optimization because you can use algebraic reasoning and calculation to derive more efficient code. For example, if I had the following code:

    (concat . reverse . map reverse) strs
I could easily replace it with the following code, which is simpler, faster, and reduces memory churn:

    (reverse . concat) strs
You can prove that these two implementations produce the same results (see [1] for a step-by-step proof). After you do the proofs for a while, you'll start seeing similar optimizations everywhere – and claiming them.

[1] https://gist.github.com/tmoertel/7968466
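
If you'd rather test than prove, here's a quick sanity check of the same identity (a sketch using QuickCheck; the property name is made up, and the equality only holds for finite lists):

    import Test.QuickCheck

    prop_reverseConcat :: [[Int]] -> Bool
    prop_reverseConcat strs =
        (concat . reverse . map reverse) strs == (reverse . concat) strs

    main :: IO ()
    main = quickCheck prop_reverseConcat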


If you can derive this and prove it, shouldn't this optimization be done (theoretically) completely automatically by the compiler?


Computers are good at verifying things, but bad at coming up with them.

(An example for something computers can come up with is free theorems, but in this case that wouldn't have helped.)


The rewriting part, if not the coming-up-with-proofs part, is done automatically in quite a few libraries. You can specify the expression-rewriting rules in pragmas in your source, and GHC will carry them out.


Sure they do! Optimizing is more than just low-level efficiency. It's also about using appropriate data structures and algorithms, and using advantageous APIs (like buf_concat here) when they're applicable.


This looks like a case of someone using a profiler to optimize a haskell program. A profiler is nearly always the best place to start when optimizing a program, regardless of the language.


Look. I love Haskell. I usually formulate solutions to programming problems in my head in Haskell before I write code in whatever language I have to deal with. Constructing programs using combinators? Great. Assuming referential transparency everywhere? Great. It's a great language.

But there comes a point in time where you have to project that stuff all away and get down to the brass tacks of what's happening in your system, leaving behind the Platonic heaven you've constructed for yourself. As the patch proves without a doubt, this is very possible to do in Haskell. But sadly, the general ethos of the Haskell community does not bend in that direction. I am a lover of the language, but at the same time a vocal critic of the community in that respect.


Given the amount of effort and mindshare that libraries like Vector, ByteString, and Repa get in the Haskell community, I really don't see where you got your impression. Not everyone cares about performance, sure, but plenty of people do, leading to some heavily optimized libraries.

In this day and age, having everyone worry about performance is patently absurd. Most programs don't need to be heavily optimized, so prioritizing programmer efficiency and correctness just makes sense. And Haskell does have the libraries, language features and people to optimize the parts that need optimization!


Why? There are lots of people in the Haskell community interested in speed.


Why is Haskell different than any other high level language in this regard? All high level languages make compromises between expressivity and performance. If you want to code really close to the metal, C exists and you know where to find it.


Shouldn't we optimize all our programs?


Only the programs that aren't good (eg fast, simple, small) enough.


I agree with you on that; in this case, though, it wasn't fast enough. I think "optimize" should be part of the "refactoring" phase.


Doesn't hlint catch stuff like this?


How would hlint catch this? The offending line is

    foldr (flip (buf_append bufOps)) (buf_empty bufOps) strs
hlint doesn't know what 'buf_append bufOps' is, or 'buf_empty bufOps' or 'strs'.

If you saw 'foldr (flip (u v)) (x y) z', would you immediately go 'ah, n^2! better throw a warning'? If not, why do you expect hlint to warn?


Yep, this can also bite you in Erlang.



