We have seen many languages cycle in popularity, but Julia is one of the few high-level languages that can actually match... or in some cases exceed... C/C++ performance.
There are always tradeoffs, and it usually takes a few weeks for people to come to terms with why Julia is unique.
This is actually really easy: most C/C++ code is pretty slow. Beating perfectly optimized C/C++ code by a notable margin is basically impossible (all relatively fast languages tend, in the limit, to converge to theoretical peak CPU performance), but real-world code isn't perfectly optimized. The better question is who wins on a performance-vs-effort graph, and Julia has a ton of major advantages here.

The base language gives you fast implementations of common data structures (e.g. dictionaries and BitSets) and BLAS/LAPACK wrappers that do linear algebra efficiently while your code still looks like math. The package manager makes it basically trivial to add packages for more complicated problems (no need to mess around with makefiles). The REPL makes it really easy to interactively tweak your algorithms and gives you easy ways to introspect the compilation process (@code_native and friends).

Another major advantage is that Julia has macros that make it easy to apply local changes to a block of code's semantics that are whole-compilation-unit compiler flags in C/C++. For example, consider `@fastmath`. In C/C++ you can only opt into fast-math at the per-compilation-unit level, so most projects that have even one part requiring IEEE handling of nonfinite numbers, or strict associativity, will globally opt out of the non-IEEE transforms. In Julia, you just write `@fastmath` before a function (or for loop, or single line) and you get the optimization.
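To make the locality point concrete, here is a minimal sketch (the `sum_fast` / `sum_ieee` names are illustrative, not from the original post):

```julia
# Sketch: fast-math is opted into for one loop only, not per file.
# `sum_fast` is a hypothetical helper name for illustration.
function sum_fast(xs)
    s = 0.0
    @fastmath for x in xs
        s += x        # reassociation/vectorization allowed only here
    end
    return s
end

sum_ieee(xs) = sum(xs)  # the rest of the program keeps IEEE semantics
```

The two functions can coexist in the same file, which is exactly what a per-compilation-unit flag like `-ffast-math` can't give you.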
All the other answers are true. But there is one thing I didn't see anyone saying: thanks to the existence of macros (and `eval`), you can generate and compile code at runtime. Combined with Julia's fast compile times, this allows faster solving of some problems that are too dynamic for ahead-of-time code.
This might sound counterintuitive, given that latency is the problem most often mentioned everywhere else about Julia. But, if you think about it, Julia compiled a plotting library from scratch to native code in around 15 seconds every time you imported it (before Julia 1.9, where native caching of code was introduced and latency was cut down significantly).
This means that for problems where you would like to (for example) generate polynomials at runtime and evaluate each of them a billion times, Julia can generate efficient code for every polynomial, compile it, and run it fast for those billion evaluations. In C/C++/Fortran you would have needed to write a (really fast) generic function to evaluate polynomials, but this would always (TM) have been less efficient than code generated and optimized for each specific polynomial.
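As a hedged sketch of that idea (the `make_poly` name and the Horner construction are my own illustration, not from the original post):

```julia
# Build a specialized evaluator for a polynomial whose coefficients
# are only known at runtime. For coeffs = [c0, c1, c2] this builds
# the expression muladd(muladd(c2, x, c1), x, c0) (Horner's rule),
# which Julia compiles to native code on the first call.
function make_poly(coeffs)
    body = :($(last(coeffs)))
    for c in reverse(coeffs[1:end-1])
        body = :(muladd($body, x, $c))
    end
    @eval x -> $body
end

p = make_poly([1.0, 2.0, 3.0])  # 1 + 2x + 3x^2
p(2.0)                           # returns 17.0
```

Each generated closure has the coefficients baked in as constants, so the hot loop evaluating it a billion times pays no indirection for them.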
Edit: fixed typos and added some remarks that were lacking originally
In general, parallelization was a messy kludge in older languages, which were originally intended for single-CPU machines. Additionally, many modern languages inherited the same old library-ecosystem issues via Simplified Wrapper and Interface Generator (SWIG) template code (Julia also offers similar wrapping support).
Only a few ecosystems, like Go's, had developers take the time to refactor many useful core tools into clean, parallelized versions native to the ecosystem. To a lesser extent, Julia devs seem to focus on similar goals, given how inherently easy it is to do this correctly in Julia.
When one compares the simplicity of a broadcast-operator version of some function in Julia with the amount of effort needed to achieve similar results in pure C/C++... the answer to where the efficiency gains arise should be self-evident.
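For instance (a small sketch of my own, with made-up data):

```julia
# One fused broadcast line: a single pass over the data and a single
# output allocation. The equivalent C would be an explicit malloc
# plus a hand-written loop over every element.
xs = collect(1.0:1000.0)
ys = @. sqrt(xs) + sin(xs)^2
```

The `@.` macro fuses all the element-wise operations into one loop, so no intermediate arrays are allocated.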
One could always embed a Julia program inside a C wrapper if it makes you happier. =)
I do not dispute that C is not the fastest language. However, C99 has the `restrict` keyword, which, when combined with the strict aliasing rules, gives you non-aliasing function arguments (I believe).
There are a few cases where it's easier to get LLVM to generate certain code in one language than another, I imagine: semantic things like aliasing, inlining, and type information.
In general, though, it's just a question of which hoops you have to jump through in which language when comparing C/C++/Julia/Fortran on top of LLVM.
If this weren't their bread and butter, and not an example they picked themselves, it would indeed not be fair... But they chose this example specifically and said this is how you get state-of-the-art performance with Mojo, so...
That's cool, but Mojo literally just came out.