Why is C++ still a very popular language in quantitative finance? (quant.stackexchange.com)
52 points by princeverma on Feb 20, 2012 | 49 comments


For applications bottlenecked by memory performance, like most analytical databases, C++ will often be faster than a language like Java by an integer factor. When people assert Java is about as fast as C++ they are talking about CPU-bound tight loops and similar. It is difficult to make languages like Java approach the memory efficiency of C++, which is why C++ is significantly more performant for applications bottlenecked on the memory subsystem.

These days, more performance-sensitive codes are bottlenecked by memory performance than by CPU performance. The throughput of CPUs has grown faster than our ability to keep those CPUs fed from memory. In the supercomputing world this started to become evident years ago; memory benchmarks like STREAM became more closely correlated with real-world performance than CPU benchmarks like LINPACK for a great many algorithms. The resurgence of C++ is partly driven by this reality since it makes memory optimization relatively straightforward.


This cannot be emphasized enough. High-performance algorithms are constrained by memory latency/bandwidth and the CPU caches. Java, with its large per-object space overhead, is not well suited for this. You need special-case libraries to get the same performance, and in many cases you can just as well use modern C++.
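
To make the layout point concrete, here is a minimal C++ sketch (the Tick struct is made up for illustration): a contiguous vector of values versus a vector of individually heap-allocated objects, which is roughly the shape a Java collection of objects has in memory.

  #include <vector>

  struct Tick { double price; int size; };  // hypothetical market-data record

  // Contiguous layout: one allocation, sequential scans stay in cache.
  double sum_flat(const std::vector<Tick>& ticks) {
      double s = 0;
      for (const Tick& t : ticks) s += t.price;
      return s;
  }

  // One heap allocation per element -- roughly what a Java collection
  // of objects looks like: each access chases a pointer and drags
  // per-object headers through the cache.
  double sum_boxed(const std::vector<const Tick*>& ticks) {
      double s = 0;
      for (const Tick* t : ticks) s += t->price;
      return s;
  }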


It always amuses me the way people treat C++ like it's some kind of dark, corrupting magic that no right thinking person would use.


To be fair, C++ is a mess. It is, however, a very capable mess with extremely broad adoption and compatibility on pretty much every platform anywhere.

Yes, learning C++ is a pain. And it will bite even its expert handlers pretty badly occasionally. But the kind of maintenance overhead suggested by the stackexchange question really isn't there. For non-trivial code (as in the example here) the maintenance and correctness burden of the problem itself is going to be far higher than that of the programming environment.

So yes: just say no to C++ for web server middleware, system maintenance scripting, probably most GUI work, etc... But for the truly hard stuff, I honestly don't see much advantage to anything else in particular. And as pointed out earlier, C++ can be deployed as a straightforward program or shared library on everything with zero dependencies, and that's a HUGE advantage vs. languages with elaborate runtimes.


"To be fair, C++ is a mess. It is, however, a very capable mess with extremely broad adoption and compatibility on pretty much every platform anywhere."

C++, IMO, is not alone in that: XML, Unicode, multi-threaded programming using condition variables and mutexes, integer modulo arithmetic, date/time libraries, IEEE floating point, etc.

The good thing about all of them is that they solve complex real-world problems. The 'bad' thing is that they are more complex than other technologies that only partly solve the same problem.


It isn't?

I always used to think of compiler design and advanced algorithms as dark magic; however, I now understand that that was just based on unfamiliarity with the concepts. I suspect much of the same is true of C++.


I find the cachet that C++ has when you interview for jobs at financial corporations to be equally depressing and hilarious.

When you go in to interview for a C++ position, the guy starts off by looking at you like you're some kind of brain damaged impostor who eats babies when nobody's watching and starts asking about how sizeof works on pointers to member functions or when "inline virtual" does and does not inline something. Essentially, incredibly dick-waving arcane compiler-specific stuff that no one should ever have to know because it makes for really horrible C++ code.
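
(For the curious, the sizeof trivia looks roughly like this; the numbers are compiler- and ABI-specific -- two machine words under the Itanium ABI, variable under MSVC -- which is rather the point:)

  #include <cstdio>

  struct S {
      void f() {}
      virtual void g() {}
  };

  int main() {
      // A pointer-to-member-function has to encode virtual dispatch and
      // this-pointer adjustment, so it is typically bigger than a plain
      // data pointer.
      std::printf("void*:         %zu bytes\n", sizeof(void*));
      std::printf("void (S::*)(): %zu bytes\n", sizeof(void (S::*)()));
  }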

And vice versa: when you interview for a Java/C#/Python/whatever position, some of the interviewers drop how they gave up on all that, and then look on in admiration when a conversation about the actor model drifts into using atomic pointer swapping to build worker queues.
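
(The atomic-pointer-swap trick alluded to is essentially a Treiber stack. Here's a minimal C++11 sketch of the push side, with a made-up Task type; pop is omitted because it additionally needs ABA handling.)

  #include <atomic>

  struct Task {
      Task* next = nullptr;
      // ... payload ...
  };

  std::atomic<Task*> head{nullptr};

  void push(Task* t) {
      // Keep retrying the compare-and-swap until we succeed in swinging
      // head from its current value to our new node.
      t->next = head.load(std::memory_order_relaxed);
      while (!head.compare_exchange_weak(t->next, t,
                                         std::memory_order_release,
                                         std::memory_order_relaxed)) {
          // On failure, t->next is reloaded with the current head,
          // so the loop body is empty.
      }
  }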

None of it matters; the C++ masochist questions are more an indication of the kind of people you'll be coding with than anything else, while the Java/C#/Python/whatever guys are still working on difficult algorithms.

Moral of the story: since the hardest part of working in C++ is passing the ridiculous interview questions, you might as well stick with it and keep looking like a wizard. It's all just programming.


> starts asking about how sizeof works on pointers to member functions or when "inline virtual" does and does not inline something.

You've really had that kind of interview encounter? It's just weird.


Because they have no idea about C++11? Those who treat it this way usually just don't know C++.


C++11 has been finalized for almost two months now! And there's almost one compiler that fully supports it! That's enough time and products for everyone to re-evaluate their experiences with C++!

Seriously, C++, even C++11 is still a mess. Compiler error messages relating to template use -- including the standard library -- are unhelpful at best, except for very recent Clang builds. Compilation time when you actually use what the language has to offer is measured on a geological time scale. The footprint of the generated code, more often than not, makes I-caches cry. You'd be amazed how much faster your computational code runs when it fits properly into an I-cache.

I moved back to plain C a few years ago after having used C++ intensively since '94 or so -- and I'm not looking back. Yes, I do write ~10% more code; it compiles instantaneously, is easier to reason about and to debug, and usually runs just as fast.

I haven't had a chance to try the newfangled C++11 features, but I'd be surprised if they'll make a difference for me.


And by the way, for the typical user, the performance difference between C++ and, say, C# won't be as pronounced (F# is another matter, though).

If you include memory usage in your performance comparison (which is very important for parallel programming), C++ is leagues ahead.


I totally agree, and I have complained about Java's lack of structured value types many times before because they are the main reason for Java's crazy memory usage.

However, the gap has narrowed significantly with the introduction of pointer compression ("compressed oops") in Java. It only works for heaps up to 32GB though, so I guess this is going to be a temporary boost for Java.


Could you point me to any references about pointer comparisons in Java? How is it different than

  Object a = new Object();
  Object b = new Object();
  boolean bool = a == b;


I said "pointer compression" not "pointer comparison".


There are a bunch of really expensive, really useful, 3rd party libraries for financial modeling that are written in C++.

A great many languages have support for calling external code, but debugging the interaction tends to be more trouble than it is worth.


It appears to be mostly inertia. People learned C++ and a few relevant specialized libraries, so they stick with them.

Outside of a few relatively narrow areas like high frequency trading, trading algorithms for direct market access, and solving optimization problems, the speed of C++ is not needed. As a result I am observing more code development occurring in higher-level languages such as Python, R, Matlab and SecDb/Slang. Extremely time- or latency-sensitive tasks are offloaded from these scripting languages onto broker algorithms or other third-party libraries.


SecDB/Slang outside of Goldman?


No SecDB/Slang outside of Goldman because it is completely proprietary, but the single-platform vision within shows no signs of abating.


When I was at GS it depended highly on which team you were on. In FICC, where I was, SecDB/Slang ran everything, but there were definitely signs that it was falling out of favour elsewhere, and more Java and C# projects were emerging.

If I were to speculate, I'd say that although GS has a large technology investment in SecDB/Slang, it's still a language that was invented a long long time ago by technology standards (I don't know exactly when it emerged, but I'd say it's safely a good two decades or more old), and it just doesn't have the performance or features that new languages have with current generation compiler and JIT techniques.


These kinds of judgements-wrapped-in questions always come off as a bit sophomoric. It's like an artist asking, "why do some painters still use oil instead of colored pencil? Pencils are erasable, and so much easier to work with. No nasty fumes!". Or an architect-in-training asking, "Why use concrete and steel to build skyscrapers? They have so many constraints, and are so ugly, and hard to work with." I appreciate an iconoclastic spirit, but not one based on ignorance.


I'm a moderator on the Quant Finance Stack Exchange. This link has been posted to HN before:

http://news.ycombinator.com/item?id=2934042

And I'll say what I said then: the accepted answer on the SE comes from someone who doesn't even work in quantitative finance. In fact, most of the answers on there are totally speculative and should be taken with the proverbial grain they deserve.


A reasonable grain I'd say :)

While I'm not a quant in anything other than job title (I'm a Quant/Dev really), my experience in the financial world suggests that there are two main reasons why C++ is used so often:

1) Legacy Code. When you've already got a C++ pricing platform and teams of C++ devs, you're not going to suddenly rewrite it in another language.

2) Libraries. Almost every bank/hedge fund/trading house already has numerical libraries written in C++. These libraries have been very well tested, and pretty much every aspect of their performance and limitations is a known quantity. Given that, rewriting them in another language is risky. When you're trading size you don't want to run the risk that you'll hit bugs or new edge cases in your models.

For what it's worth, and take this with a grain of salt as well since HFT isn't really my area, I've also seen more than a few HFT models that aren't in C++. I've seen C# used, along with Python and MLs. The time periods are so short that a small increase in language speed doesn't really let you do a whole lot more, and the models are changed so quickly that the development time to bring a new model to production starts to become the limiting factor instead.


Here's the question that should be featured on HN:

http://quant.stackexchange.com/q/306/35

Essentially, C++ is popular because of existing code, particularly proprietary third-party libraries. And you are correct; even HFT shops will use languages other than C++, which is why anyone who's actually worked in this industry will know that the answer to the question of popularity is not performance.

And that's why I hate that SE question so much. It was asked by someone outside the industry and is mostly answered by people outside the industry.


Having a PhD in physics doesn't necessarily mean you've been exposed to the current state of the art in software development practices.

In my experience C++ is the default language in finance because of legacy reasons and because they don't know any better. This was only a couple of years ago, and I saw some horrendously poorly written C++ code -- early-90s style ("let me tell you about the STL"), with virtually no comments for hundreds of lines -- that was being used in production to value mortgage bonds worth tens of billions of dollars. The group I was in had switched to using Python with Numpy, and was much more productive for it.


The article states that C++ is no better for the average user. How many people in QF are average?


C++ is popular in other sectors as well. I use it daily and love it. I can build Python modules with it (Boost Python) so Python coders can easily use my C++ code. I can build Windows DLLs too so the Windows C# guys can use the C++ code as well and they never have to write a line of C++ (nor do the Python guys). C++ is awesome for systems and embedded work. Boost is awesome too and very easy to use IMO. I've no idea why so many people keep trying to find fault with it.
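
For anyone curious how little glue that takes, a minimal Boost.Python binding looks about like this (the price function is a toy placeholder):

  #include <boost/python.hpp>

  // Toy function standing in for real C++ library code.
  double price(double spot, double strike) {
      return spot > strike ? spot - strike : 0.0;
  }

  // Builds a shared library that Python loads with "import pricing".
  BOOST_PYTHON_MODULE(pricing) {
      boost::python::def("price", price);
  }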

I think many people are intimidated by C++ and never really dig in and learn it well. If you do, you'll be a much better programmer because of it.


I'm pretty sure this exact Stack Exchange post was linked on HN before.


Template metaprogramming still solves certain problems better than other solutions. For problems where it isn't needed, though, there are better languages.


What kinds of problems? I can't easily think of problems where C++ templates are more powerful than the metaprogramming facilities of high-level languages like Lisp...


I didn't use the word 'powerful' because powerful generally refers to expressivity. I'm most interested in performance.

C++ template metaprogramming gives you most of the expressivity of generic programming without paying a large runtime penalty (because the resolution still occurs at compile time, the compiler can take advantage of this). This is in contrast to other languages, where run-time generic programming leads to performance penalties.

Now, as the stereotypical example of how C++ template metaprogramming can actually beat standard C programming, compare the performance of C qsort with C++ std::sort.
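
A sketch of why that comparison comes out the way it does: qsort is compiled once against opaque bytes and calls its comparator through a function pointer, while std::sort is instantiated per element type, so the comparison can be inlined into the sorting loop.

  #include <algorithm>
  #include <cstdlib>
  #include <vector>

  // qsort sees only opaque bytes and calls this through a function
  // pointer on every comparison -- hard for the compiler to inline.
  int cmp_int(const void* a, const void* b) {
      int x = *static_cast<const int*>(a);
      int y = *static_cast<const int*>(b);
      return (x > y) - (x < y);
  }

  void sort_c(std::vector<int>& v) {
      std::qsort(v.data(), v.size(), sizeof(int), cmp_int);
  }

  // std::sort is instantiated for int: the comparison is known at
  // compile time and gets inlined into the sorting loop.
  void sort_cpp(std::vector<int>& v) {
      std::sort(v.begin(), v.end());
  }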


Not all expressive generic programming techniques have run time costs. The parent poster mentioned Lisp, in which macros are also usually expanded at compile time. Furthermore, Template Haskell offers type-safe compile time metaprogramming.


Many Quants are not really interested in new languages. I know at least one bank that's C++ and Perl, because that's what they know.


That was a quote from a quant who works at a major bank in Atlanta. Odd thing to downvote.

They're hiring, btw -- they can't find enough C++ programmers who want to be paid six figures.


C++11 is a very modern language. It seems to be moving faster than Java these days.


To be "very modern" there would have to be a replacement for the antiquated header files system. Writing function signatures twice isn't modern. Writing all code that uses templates in header files isn't modern. Waiting minutes or hours for my code to compile isn't modern. C++ carries a lot of baggage from the past and that's how it was conceived from the beginning. I'm still using C++ because it's efficient and software is made for users after all.


Very valid points. It looks like the Modules proposal [1] aims to address some of them. There is certainly still a lot of work to be done, and from what I gathered from the GoingNative panels there is no broad consensus yet on what modules really are. Unfortunately this is still very far away, and for now you have to choose between high compile times or a "header-only" approach.

[1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n334...


I hate all that old baggage as much as the next guy: forward declarations, header guards...

But C++11 has improved on that as well, as much as possible within the old framework.


Exactly; many critiques of C++ are well and truly obsolete. What are the practical advantages of Java's GC over C++'s smart pointers? Actually not much, so why carry that overhead?

Plus, the elephant in the corner of the room is that write-once-run-anywhere in the real world turns out to be: write on Linux on x64, run on Linux on x64. In the Java world, they like to run one VM (the JVM) inside another (on Xen or whatever). Again, why schlep all that around? The world is turning back to native code, for good reason.


Smart pointers use a lot of memory, and some implementations, like boost::shared_ptr, are heap-based. One thing I want from C++ is efficiency, so I don't subscribe to the recently popular idea of using smart pointers everywhere. A Java reference uses 32 bits even on a 64-bit machine (below 32GB of heap). A shared_ptr uses four times that (and the first shared_ptr to an object also allocates a control block on the heap).
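
You can check the size claim directly; on a typical LP64 system this prints 8, 8 and 16, since a shared_ptr carries an object pointer plus a control-block pointer (and the control block itself is a separate heap allocation):

  #include <cstdio>
  #include <memory>

  int main() {
      std::printf("raw pointer: %zu\n", sizeof(int*));
      std::printf("unique_ptr:  %zu\n", sizeof(std::unique_ptr<int>));
      std::printf("shared_ptr:  %zu\n", sizeof(std::shared_ptr<int>));
  }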


> What are the practical advantages of Java's GC over C++'s smart pointers? Actually not much, so why carry that overhead?

Smart pointers aren't free, memory or otherwise.

The syntactic overhead of declaring smart pointers is one. Say what you want, but the verbosity of these declarations is quickly tiring, and ergonomics is relevant. typedefs help, but that leads to the next bit, which is that smart pointers are leaky abstractions in the sense that they are never going to be directly interchangeable with a regular C++ pointer. So the abstraction breaks down (if only a little) any time you need one.

Destruction and RAII have the benefit that it's harder to leak memory, but they can force you into awkward contortions where you potentially need more copies than usual; and because copying is an O(n) process in some instances, those contortions can change the complexity characteristics of data structures or algorithms. Move semantics can alleviate this greatly, however, and they are a welcome addition, as are many other things in C++11 (moves, constexpr, for-each, better enumerations, lambdas, and auto are all very welcome.)
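
A quick sketch of the move-semantics point: returning or handing off a large container in C++11 transfers the underlying buffer in O(1) instead of copying it in O(n).

  #include <cstddef>
  #include <utility>
  #include <vector>

  std::vector<double> make_curve(std::size_t n) {
      std::vector<double> v(n, 0.0);
      // ... fill with data ...
      return v;  // moved (or elided), not copied: O(1), not O(n)
  }

  void example() {
      std::vector<double> a = make_curve(1u << 20);
      std::vector<double> b = std::move(a);  // steals a's buffer in O(1)
  }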

And finally: the massive, unadulterated proliferation of reference types quickly becomes an incredible load intellectually. As a programmer, you likely don't care a ton about the sharing semantics of any individual object and whether it's shared or unique or whatever - that is something that can be done automatically with no intervention on your part.

If it turns out you do care about these things greatly, because they are important to your actual task at hand, it is likely C++ may be a good choice. I have written C++ on the job, and have gone from nightmarish code to much nicer code, and it is a suitable tool for many issues.

The counterargument, which I think is valid too, is that sharing semantics form an important part of an API - you can tell whether you own an object because it's unique, or whether you should be careful because it's shared. But this is of course a benefit that extends to any typed data, in any typed language (that's worth its salt.)

But people who throw around "just use a smart_ptr and you're, like, just as good as Java, obviously" aren't really helping. There are lots of valid points on both sides.

> Again, why schlep all that around?

Because in a vast majority of cases, it's irrelevant and your time and money is probably just as well spent somewhere else.

> The world is turning back to native code, for good reason.

What indications do you have of this? I suppose iOS is a good example, for one. But the massive amount of web property, and the needs of those institutions alone, shows that while the world does need native code, it's not turning back to it with reckless abandon.

Furthermore, native code and garbage collection have nothing to do with each other. And the JVM isn't anywhere near the best example: Java having a very heavy per-object overhead (something like 5-7 words per object for heap metadata) doesn't help memory benchmarks in the slightest, although the JVM does have impressive compilation facilities.

But if you want a much better baseline, compare to something like LuaJIT, which gets comparably close to even C++ with only a tiny memory footprint, or compare it to something like GHC, where the collector is remarkably robust and the per-object overhead can be significantly decreased (all objects instead have only 1 word of overhead, and unlike a language like Java, you can totally unpack structures of composite, non-primitive types, meaning new data types can come 'totally free.')


Totally agree, but I just have to pick one nit on your final point re GHC and unpacking: if you do this you forfeit all generic programming, because a 'lifted' polymorphic type has to be represented by a pointer to a heap object. This is where C++'s unique take on generic programming still wins big.


Aren't C++ smart pointers essentially reference counting? That's well known to be much slower than a well-implemented GC.


[citation needed]

Depending on how many cores you use, what processor you use, and other use cases (e.g. copy-on-write, which is free with RC and super expensive with GC), one of them can be significantly faster than the other. But for a non-specific use case, it has been my experience that they are roughly equivalent.

The place where GC consistently excels is the "no random pauses" - most GCs will occasionally need to stop the world, even when they can mostly do incremental collections. Note that this does not mean they are slower - it is just that the overhead tends to be concentrated in bursts instead of uniformly spread out as in RC.

The place where RC consistently excels is reference loops, and less dependence on implementation robustness.


Just reread, and I realized I got GC and RC mixed up there (thanks, chancho); too late to edit, so I'll repost a fixed version:

The place where RC consistently excels is the "no random pauses" - most GCs will occasionally need to stop the world, even when they can mostly do incremental collections. Note that this does not mean they are slower - it is just that the overhead tends to be concentrated in bursts instead of uniformly spread out as in RC.

The place where GC consistently excels is reference loops, and less dependence on implementation robustness.


You have a strange definition of "excel".


Thanks! I managed to get the two lines confused, not sure how, and it is too late to edit; just posted a correction:

The place where RC consistently excels is the "no random pauses" - most GCs will occasionally need to stop the world, even when they can mostly do incremental collections. Note that this does not mean they are slower - it is just that the overhead tends to be concentrated in bursts instead of uniformly spread out as in RC.

The place where GC consistently excels is reference loops, and less dependence on implementation robustness.


GC may be faster in terms of the total amount of time spent managing memory, but smart pointers are more predictable. Many domains in which C++ is used are more sensitive to the performance spikes that can arise in GC'd environments than to the more expensive but more spread-out cost of managing reference-counted objects. Many video game technical directors would choose spending 1ms of every 30Hz frame on memory bookkeeping rather than going 29 frames with no problems only to encounter a 15ms GC hiccup.


Well, that depends on the type of pointer -- if you only need the reference until the end of the function (or until an owning class is destroyed), you can get a much more performant implementation.

As for speed: with benchmarking, everything is in the details, and while GC can be made to run fast, the actual speed depends on the access patterns, the cache, and when memory is used (and for how long).

In addition, in C++ you can put quite a lot of things on the stack (that is actually where a smart pointer itself lives), which is always going to be much, much faster than anything on the heap.


C++ is harder to reverse engineer than, say, Java, no? I would think this plays a rather large part in it as well.



