bufOps is a dictionary that holds a bunch of functions, accessed with the getters on it. For the sake of this comment, we can concretize and read (buf_empty bufOps) as [] and (buf_append bufOps) as (++).
This code then essentially performs:
foldr (flip (++)) [] xs
Which, if you look up the definition of foldr, is:
((([] ++ xN) ++ ... ) ++ x2) ++ x1
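(For reference, foldr in the Prelude is defined as:

foldr f z []     = z
foldr f z (x:xs) = f x (foldr f z xs)

so with f = flip (++) and z = [], unfolding it over [x1, ..., xN] gives exactly the left-nested chain of appends above.)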
And a definition of ++ is of course:
[] ++ ys = ys
(x:xs) ++ ys = x : (xs ++ ys)
This means that for lists of this sort a ++ b runs in time O(length a), because it has to descend down the leftmost list to find the empty list -- only once it finds [] can it "work its way backwards" to append elements from a onto b.
If each of x1, x2, ... xN has m elements, then we do 0 + m + 2m + ... + (N-1)m = m * N * (N - 1) / 2 operations. Each ++ does work proportional to the accumulator built so far, and we do N of them; it's O(N^2) in the number of chunks.
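To put hypothetical numbers on that: with N = 1000 chunks of m = 100 characters each, that's 100 * 1000 * 999 / 2, roughly 50 million character copies, versus roughly N * m = 100,000 (plus an O(N) reverse) for the linear version described next.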
The new algorithm, `concat (reverse xs)`, works because `xs` is just a list, which can be reversed by traversing down it in O(N) time, and then the chunks can be concatenated in O(N * m) time.
If a Haskell expert (the author of hdirect, an IDL compiler and Win32 COM interface, sadly defunct now) makes this kind of mistake, how are mere mortals supposed to reason about algorithmic efficiency?
I don't think this was an 'oh my god we're trying to track down why cabal update is so slow and we just can't find it' issue like you make it out to be.
I didn't try to imply that it was difficult to find the performance problem. It's an observation on something else: Haskell is often mentioned for its equational reasoning. I find it interesting that correct equational reasoning in this case led to a correct program, but one with abysmal performance.
It'd be interesting to think how one could encode performance characteristics into equations.
I think having simple and direct denotational semantics necessarily means having relatively complex, indirect operational semantics (e.g: Haskell). Having a simple and direct operational semantics necessarily means having complex and indirect denotational semantics (e.g: C).
C makes performance easy and correctness hard.
Haskell makes correctness easy and performance hard.
The trick is, in most programs, you only need performance for a tiny subset of the program. You need correctness throughout the whole program.
Exactly. Isn't it comforting to know that making mistakes is only human? And isn't this a wonderful practical example of why you want to be open source?
I suppose what Alan Perlis said about LISP programmers—that they know the value of everything and the cost of nothing—is true for Haskell programmers as well.
Most Lisp hackers are knowledgeable about algorithmic complexity analysis.
Alan Perlis's comment is about the inefficiency of Lisp programs that comes from heavy (meta) abstraction and the indirect mapping of abstract software onto hardware.
Admittedly, this particular case seems amenable to fairly straightforward algorithmic complexity analysis, but it seems to me that Haskell's laziness makes it even harder to get a good grasp of the costs involved than in a language like Lisp.
I'd definitely like to see an explanation of what's going on, both at a high level and the specifics of the code. As a Haskell beginner-intermediate, I don't really know what most of those functions are doing (much less the context of what that function's purpose is), but I feel I could probably understand an explanation if it were given.
foldr is a fold -- a "reduce" over a list. It takes a starting value (the accumulator) and a function, and, starting from the rightmost element of the list (that's the 'r' in 'foldr'), combines each element with the accumulator into a new accumulator value.
Let's say we have a list of numbers, the addition operator, and an accumulator of 0.
foldr op acc list
foldr (+) 0 [1, 2, 3, 4]
a = 0, list = [1, 2, 3, 4]
a = 0 + 4 = 4, list = [1, 2, 3]
a = 4 + 3 = 7, list = [1, 2]
a = 7 + 2 = 9, list = [1]
a = 9 + 1 = 10, list = []
Result is 10
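Written as the expression foldr actually builds, that's:

foldr (+) 0 [1, 2, 3, 4]
  = 1 + (2 + (3 + (4 + 0)))
  = 10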
One thing to note is that the accumulator here is itself a singly linked list, and because of the flip the growing accumulator ends up on the left of every ++. Each append therefore has to walk the entire accumulator built so far before it can attach the next chunk. I believe this repeated re-walking is why it's O(n^2).
It looks like the accumulator starts with (buf_empty bufOps), which from context I assume is an empty buffer structure. The operator is (flip (buf_append bufOps)), which appends values onto the buffer when called. Each time it goes through this process, it appends one element.
Because they're using foldr and the list is walked 'backward', this has the natural consequence that the order of the items in strs is reversed as they're added to the buffer.
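A tiny concrete example of that reversal, using plain lists in place of the buffer:

foldr (flip (++)) [] ["a", "b", "c"]
  = (([] ++ "c") ++ "b") ++ "a"
  = "cba"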
New version:
buf_concat bufOps $ reverse strs
The $ is a way of controlling precedence in Haskell; you can replace it with parentheses. So we get:
buf_concat bufOps ( reverse strs )
So reverse just reverses the order of a list, getting things in the order the code wants them. Then it calls buf_concat with the current structure and the items it wants added. So instead of taking one existing list and adding 6 new elements at the end one at a time, it takes one existing list and adds a list of 6 elements to the end once.
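On the same toy input as above, the new code gives the same (reversed) result, but each character is copied only once:

concat (reverse ["a", "b", "c"])
  = concat ["c", "b", "a"]
  = "cba"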
It seems like a relatively simple change. I wonder if using foldl or foldl' to avoid having to rewalk the list every time would have performance similar to the new line.
I find the new line simpler and cleaner either way, but was the choice of a right-fold the cause of the performance problem?
GHCi has a debugger. You can step into an expression and watch as Haskell evaluates it, in full lazy order.
Even if you understand lazy evaluation in theory, I recommend taking some non-trivial (but not too large) expression and :step'ing through the whole evaluation sometime, just to train your intuition.
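For example, with a small definition loaded from a source file (the debugger only steps through interpreted code, and the exact output varies by GHC version -- treat this as a rough sketch):

-- Fold.hs
myfoldr :: (a -> b -> b) -> b -> [a] -> b
myfoldr f z []     = z
myfoldr f z (x:xs) = f x (myfoldr f z xs)

ghci> :load Fold.hs
ghci> :step myfoldr (+) 0 [1, 2, 3]
Stopped in Main.myfoldr, Fold.hs:...
ghci> :step    -- keep stepping to watch each thunk get forced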
acc = "";
for (var i = 0; i < strings.length; i++) {
acc += strings[i]
}
The new version is something like this:
strings.join("")
The second version can pre-allocate a string and copy characters into that string under the hood. The first version has to allocate a new string and recopy the characters on every iteration.
Even if the buffers were vanilla Strings, and there were no pre-allocation, the patch would still have fixed the quadratic-runtime problem. Consider that, for vanilla Strings, the buffer ops are implemented like so:
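A plausible sketch of what the vanilla-String buffer ops could look like (the record and field names here are inferred from the snippet above; the real definition in cabal may differ):

data BufferOps buf = BufferOps
  { buf_empty  :: buf                -- an empty buffer
  , buf_append :: buf -> buf -> buf  -- append two buffers
  , buf_concat :: [buf] -> buf       -- concatenate many buffers
  }

stringBufOps :: BufferOps String
stringBufOps = BufferOps
  { buf_empty  = []
  , buf_append = (++)
  , buf_concat = concat
  }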
Since (++) has a running time that's linear in the length of its left argument, you want to treat it as a right-associative operator to prevent quadratic runtime when joining a bunch of strings.
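To see why, count how many times each string's characters get copied:

-- right-nested: a, b, and c are each copied once; linear overall
a ++ (b ++ (c ++ d))

-- left-nested: a is copied three times, b twice, c once; quadratic as the chain grows
((a ++ b) ++ c) ++ d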
But, going back to the patch in question, the original code was effectively this:
foldr (flip (++)) [] strs
Since flip has been applied to (++), you now want to treat the resulting operator as left-associative to avoid quadratic blow-up. The code, however, uses the right-associative fold, foldr, to join the list of strings strs.
The patch fixes this problem by replacing the code with the equivalent code
concat (reverse strs)
And how is concat defined in the libraries? Looking at the source [1], it's
concat = foldr (++) []
Thus the fix is basically
foldr (++) [] (reverse strs).
This version reverses strs, at linear-time cost, to be able to apply the normal, unflipped (++) with a right-associative fold and thus avoid the dreaded quadratic blow-up.
Nice explanation. I'm surprised that no one so far has mentioned a cool trick for getting around this: define mappend as function composition (a difference list), which makes sure ++ is applied in the right order. It's from LYAH (scroll to the end of the linked chapter):
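For reference, the difference-list trick from that chapter looks roughly like this (lightly adapted, since modern GHC also wants a Semigroup instance):

newtype DiffList a = DiffList { getDiffList :: [a] -> [a] }

toDiffList :: [a] -> DiffList a
toDiffList xs = DiffList (xs ++)

fromDiffList :: DiffList a -> [a]
fromDiffList (DiffList f) = f []

-- appending is just function composition, so it's O(1), and the
-- underlying (++)s end up right-nested when the list is finally built
instance Semigroup (DiffList a) where
  DiffList f <> DiffList g = DiffList (f . g)

instance Monoid (DiffList a) where
  mempty = DiffList id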
I recall that when I studied algorithms and data structures, just swapping to references/pointers instead of copying memory could speed up my algorithms ridiculously. Correct me if I am wrong, but "small" things like these (as we just saw) can be really beneficial!
I wonder if there's something different about Haskell that makes "less copying" more attractive?
In the C++ world, libstdc++ strings have copy-on-write semantics, which (as far as I heard) turned out to be terrible because you have to do reference counting instead, and with multithreading it requires atomic operations, which is slower than copying for small strings.
It's worth noting, though, that immutability doesn't always mean less copying. Immutable arrays, where the entire array must be copied with the change of one element, are an example of that. And by extension, hash tables, etc. Of course there are lots of tricks that can be employed to get around this, to a degree.
>Immutable arrays, where the entire array must be copied with the change of one element, are an example of that
That is not how modern persistent data structures are implemented. Please do not talk about immutability as if it necessarily means having a naive implementation like this.
Like I said there are tricks to get around it. I was referring to a C-style array. I think it's still accurate to say that immutability does not always mean less copying.
As a fellow Clojurist, I agree with the substance of what you're saying here but wish you could express it in a more friendly manner. Both your comments essentially say "you're wrong" without educating or adding value. I don't feel that reflects well on the Clojure community, and I'd like us to do better.
Your tone suggests that you think I don't understand what you're saying (which perhaps I wouldn't, since you succeeded only in telling me I was wrong and failed to actually explain "how modern persistent data structures are implemented" -- as if they were all implemented the same way). Your last sentence suggests that you don't think I know that efficient immutable data structures are an active area of computer science research. Both assumptions are incorrect.
Now, I realize that in my original post, I might have given the wrong impression. I thought that by my second post I was being clear enough, but perhaps I wasn't. Let's try take three:
Immutable data structures do not necessarily guarantee less copying, or necessarily imply a performance gain. A data structure which does not lend itself well to immutability, such as a C-style array, can lead to very inefficient code when used in an immutable fashion. The C-style array or a variation thereof is also the default in most current languages, including Java, Python, C++, Ruby, and many others, so this is hardly a thing of the past. It's important to be aware of the performance characteristics of the data structures one is using, respective to the way in which they are used.
No, you'd have to resort to tricks in order to have a C-style array and encounter this problem; if you just use the default stuff, you get data structures that work great with immutability.
> Immutable arrays, where the entire array must be copied with the change of one element, are an example of that.
Only in a naive implementation. Clojure, for example, has a persistent vector that only requires O(log32 n) copying, which grows so slowly as to be effectively O(1).
If your array only has one future (i.e., there are no references to the unchanged array around), you can re-use the old array. That means you get to mutate in place but still pretend you have immutability.
With the naive representation of a string as a list of characters [Char], it is. The only way to get to the end of the list is to recursively take the tail of the string (see the source at http://hackage.haskell.org/package/base-4.6.0.1/docs/src/GHC...).
There are alternative representations with different trade-offs, such as Data.Text.
Heh... but I was referring to the example seliopou gave, which appears to be in JavaScript, and strings is [String] rather than [Char]. I think the analogy might cause more confusion than clarification.
It is a standard mistake that people sometimes make when working with lists. Lists have O(1) complexity for prepending an element and O(n) complexity for appending an element. So when you have to append a bunch of elements to a list, it is better to reverse the list, prepend the elements, and then reverse the list again, instead of just appending each element. The former approach has linear complexity, the latter quadratic.
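A minimal sketch of the two approaches (helper names are mine):

-- appending one element at a time: each step copies the whole accumulator -- O(n^2)
appendAllSlow :: [a] -> [a] -> [a]
appendAllSlow xs ys = foldl (\acc y -> acc ++ [y]) xs ys

-- reverse, prepend, reverse back: three linear passes -- O(n)
appendAllFast :: [a] -> [a] -> [a]
appendAllFast xs ys = reverse (foldl (flip (:)) (reverse xs) ys)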
I assume you're talking about lists with mutable pointers. I don't think that's what they're talking about here. Functional (persistent) lists are immutable. You can add elements only by creating a new data structure, similar to a linked list node, that will hold immutable references to the existing list and the new element. By convention the element is considered to be in front of the list. Hence prepending is rather easy (constant time) while appending requires rebuilding the whole list (linear time).
> I assume you're talking about lists with mutable pointers. I don't think that's what they're talking about here. Functional (persistent) lists are immutable. [...] Hence prepending is rather easy (constant time) while appending requires rebuilding the whole list (linear time).
You can append in O(1) time using difference lists. They don't have all the niceties of Prolog difference lists, but they are still great if you only have to append:
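A minimal sketch of the idea (names are mine; the dlist package on Hackage provides a polished version of the same thing):

type DList a = [a] -> [a]

fromList :: [a] -> DList a
fromList = (++)

toList :: DList a -> [a]
toList dl = dl []

-- appending two difference lists is just composition: O(1)
append :: DList a -> DList a -> DList a
append = (.)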
"And make no mistake: irony tyrannizes us. The reason why our pervasive cultural irony is at once so powerful and so unsatisfying is that an ironist is impossible to pin down. All irony is a variation on a sort of existential poker-face. All U.S. irony is based on an implicit "I don't really mean what I say." So what does irony as a cultural norm mean to say? That it's impossible to mean what you say? That maybe it's too bad it's impossible, but wake up and smell the coffee already? Most likely, I think, today's irony ends up saying: "How very banal to ask what I mean." Anyone with the heretical gall to ask an ironist what he actually stands for ends up looking like a hysteric or a prig. And herein lies the oppressiveness of institutionalized irony, the too-successful rebel: the ability to interdict the question without attending to its content is tyranny. It is the new junta, using the very tool that exposed its enemy to insulate itself."
It actually does this in a bunch of cases; see Haskell's stream fusion[1] and rewrite rules in general[2] for examples.
However, there aren't any rewrite rules for list-based code, at least with the standard library. Rewrite rules are generally used for libraries designed explicitly with performance in mind like Vector and Bytestring; the standard String type and the Prelude don't fall into this category.
Obvious follow-up question: why does cabal not use more performance-oriented libraries? Does it have to avoid depending on anything because it's the dependency manager?
Because if it's a common pattern, then it's nice to put it in a common library anyway, instead of having everyone write it out by hand. It makes the code cleaner and more efficient. Win win!
Yes! Haskell is a delight for optimization because you can use algebraic reasoning and calculation to derive more efficient code. For example, if I had the following code:
(concat . reverse . map reverse) strs
I could easily replace it with the following code, which is simpler, faster, and reduces memory churn:
(reverse . concat) strs
You can prove that these two implementations produce the same results (see [1] for a step-by-step proof). After you do the proofs for a while, you'll start seeing similar optimizations everywhere -- and applying them.
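Not a proof, but here's a quick confidence check with QuickCheck (the property name is mine):

import Test.QuickCheck

prop_reverseConcat :: [[Int]] -> Bool
prop_reverseConcat strs =
  (concat . reverse . map reverse) strs == (reverse . concat) strs

-- ghci> quickCheck prop_reverseConcat
-- +++ OK, passed 100 tests.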
The rewriting part -- if not the coming-up-with-proofs part -- is done automatically in quite a few libraries. You can specify expression-rewriting rules in pragmas in your source and GHC will carry them out.
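For example, the classic rule from the GHC user's guide fuses two maps into one:

{-# RULES
  "map/map"  forall f g xs.  map f (map g xs) = map (f . g) xs
  #-}

GHC applies such rules during optimization; it's up to the rule's author to make sure the two sides really are equal.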
Sure they do! Optimizing is more than just low-level efficiency. It's also about using appropriate data structures and algorithms, and using advantageous APIs (like buf_concat here) when they're applicable.
This looks like a case of someone using a profiler to optimize a Haskell program. A profiler is nearly always the best place to start when optimizing a program, regardless of the language.
Look. I love Haskell. I usually formulate solutions to programming problems in my head in Haskell before I write code in whatever language I have to deal with. Constructing programs using combinators? Great. Assuming referential transparency everywhere? Great. It's a great language.
But there comes a point in time where you have to project that stuff all away and get down to the brass tacks of what's happening in your system, leaving behind the Platonic heaven you've constructed for yourself. As the patch proves without a doubt, this is very possible to do in Haskell. But sadly, the general ethos of the Haskell community does not bend in that direction. I am a lover of the language, but at the same time a vocal critic of the community in that respect.
Given the amount of effort and mindshare that libraries like Vector, Bytestring and Repa get in the Haskell community, I really don't see where you got your impression. Not everyone cares about performance, sure, but plenty of people do leading to some heavily optimized libraries.
In this day and age, having everyone worry about performance is patently absurd. Most programs don't need to be heavily optimized, so prioritizing programmer efficiency and correctness just makes sense. And Haskell does have the libraries, language features and people to optimize the parts that need optimization!
Why is Haskell different than any other high level language in this regard? All high level languages make compromises between expressivity and performance. If you want to code really close to the metal, C exists and you know where to find it.