How close is Futhark to being considered production-ready? How close is the generated code to hand-optimized OpenCL for real problems (I did look at the benchmarks, but only the Rodinia suite had comparison numbers, and those are trivial to beat)?
OpenCL is easy to write for anyone who knows C (far more people than fans of functional programming) and moderately difficult to write decently well. What's the value proposition of Futhark? Is it just that some people prefer writing in a functional DSL rather than C, or does it purport to enable some optimizations that are hard to write well manually?
> How close is Futhark to being considered production-ready?
It's not 1.0 yet, and there are ways to trip up the compiler by accidentally writing irregular code (and the workarounds are nonobvious), but several people have used it productively (albeit not in production by most standards). Addressing this is part of the research we're doing, and we are quite sure we know how to get there.
> OpenCL is easy to write for anyone who knows C (far more people than fans of functional programming) and moderately difficult to write decently well. What's the value proposition of Futhark? Is it just that some people prefer writing in a functional DSL rather than C, or does it purport to enable some optimizations that are hard to write well manually?
In principle, a skilled GPU programmer can always outperform Futhark. The value proposition of Futhark is partly that you can obtain decent code without having low-level knowledge (usually within a factor of two of hand-written performance), and partly that even a skilled GPU programmer can get there much faster. Most low-level GPU languages (like OpenCL and CUDA) make it very hard to write code that is both fast and modular. For example, you might write some really nice CUDA kernel that operates on vectors, but there is no obvious way to apply it to every row of a matrix; you will probably have to write a new kernel from scratch, one that is very similar to the old one but shares no code with it. In Futhark, you would just 'map' the old function over the matrix, and let the compiler figure out how to turn it into one or more segmented operations.
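To make the reuse pattern concrete, here is a minimal sketch in plain Python (not Futhark, and not GPU code). The names `normalize` and `map_rows` are hypothetical and exist only for illustration. The point is that the vector-level function is applied to every row without being rewritten; in Futhark, turning that nested map into segmented GPU operations would be the compiler's job.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[Vector]


def normalize(v: Vector) -> Vector:
    """The 'old' vector-level function: scale entries so they sum to 1."""
    total = sum(v)
    return [x / total for x in v]


def map_rows(f: Callable[[Vector], Vector], m: Matrix) -> Matrix:
    """Apply a vector function to every row of a matrix, reusing it unchanged."""
    return [f(row) for row in m]


# Hypothetical input data, chosen so the results are exact in floating point.
matrix = [[1.0, 1.0, 2.0], [2.0, 2.0, 4.0]]
normalized = map_rows(normalize, matrix)
```

With a hand-written kernel, the equivalent of `map_rows(normalize, ...)` usually means a second, row-aware kernel that duplicates the logic of the first.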
Essentially, the Futhark compiler is not bound by the restriction that the optimised code should be maintainable by humans, so it is able to use inlining, duplication, fusion and a host of other transformations that are helpful for performance, but no sane programmer would write if they cared about the maintainability of their code.
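Fusion is one of the transformations named above. As a rough illustration only, the plain-Python sketch below shows two element-wise passes with an intermediate array, then the same computation collapsed into a single pass. The function names are hypothetical, and this says nothing about how the Futhark compiler actually implements fusion. On a GPU, the unfused form would roughly correspond to two kernels with an intermediate array in global memory.

```python
from typing import List


def scale(x: float) -> float:
    return 2.0 * x


def offset(x: float) -> float:
    return x + 1.0


def unfused(xs: List[float]) -> List[float]:
    """Two separate passes: the intermediate list is fully materialised."""
    tmp = [scale(x) for x in xs]
    return [offset(t) for t in tmp]


def fused(xs: List[float]) -> List[float]:
    """One pass: both functions are applied per element, no intermediate list."""
    return [offset(scale(x)) for x in xs]
```

Both functions compute the same result; the fused one is what you would write by hand for speed, at the cost of no longer having two separately reusable passes.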
In practice, Futhark usually gets beat on single primitive operations (matrix multiplication, FFT, etc), but at an application level, it tends to do well.
Thank you for the detailed response. I'll try to allocate a weekend to port some of my usual scientific GPGPU test cases to Futhark and see how it goes.
> Most low-level GPU languages (like OpenCL and CUDA) make it very hard to write code that is both fast and modular.
That's absolutely true, which is why a frequent solution is to achieve modularity by using simple metaprogramming to generate kernel code from higher level constraints. I understand Futhark aims to obviate at least a subset of these issues, so there's value there.
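For readers unfamiliar with that approach, here is a minimal sketch of the kind of metaprogramming meant, assuming a simple string template on the host side. `KERNEL_TEMPLATE` and `make_elementwise_kernel` are hypothetical names, and real projects tend to use more elaborate generators. The sketch only produces OpenCL C source text; compiling and launching it is left out.

```python
from string import Template

# OpenCL C source for a generic element-wise kernel; the placeholders are
# filled in on the host before the source is handed to the OpenCL compiler.
KERNEL_TEMPLATE = Template("""\
__kernel void ${name}(__global const ${ctype} *src,
                      __global ${ctype} *dst,
                      const unsigned int n)
{
    size_t i = get_global_id(0);
    if (i < n) {
        dst[i] = ${expr};
    }
}
""")


def make_elementwise_kernel(name: str, ctype: str, expr: str) -> str:
    """Return kernel source that applies `expr` to each element of `src`."""
    return KERNEL_TEMPLATE.substitute(name=name, ctype=ctype, expr=expr)


# Two kernels generated from one template instead of two hand-written copies.
square_src = make_elementwise_kernel("square_f32", "float", "src[i] * src[i]")
negate_src = make_elementwise_kernel("negate_f64", "double", "-src[i]")
```

This buys code reuse at the source-text level, but composition (say, applying a generated kernel to every row of a matrix) still has to be handled by the generator's author.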
> Essentially, the Futhark compiler is not bound by the restriction that the optimised code should be maintainable by humans, so it is able to use inlining, duplication, fusion and a host of other transformations that are helpful for performance, but no sane programmer would write if they cared about the maintainability of their code.
Am I correct in assuming that you can still (easily) obtain, inspect and profile the Futhark-generated OpenCL code using the usual tools? I suppose a sensible workflow could be using Futhark to get as far as you can and then hand optimize the bottleneck kernels further.
> In practice, Futhark usually gets beat on single primitive operations (matrix multiplication, FFT, etc), but at an application level, it tends to do well.
Do you happen to know of any scientific/engineering applications (lattice Boltzmann methods, molecular dynamics, etc.) using Futhark in the wild? From a quick glance at the website it seemed like all of the applications are focused on image/video processing, any reason for that?
> Am I correct in assuming that you can still (easily) obtain, inspect and profile the Futhark-generated OpenCL code using the usual tools? I suppose a sensible workflow could be using Futhark to get as far as you can and then hand optimize the bottleneck kernels further.
Sort of. Futhark generates pretty simple OpenCL host code, but there is not yet an obvious way to tie the generated kernels back to the original Futhark source code. As a result, it's easy enough to detect that some specific kernel is the bottleneck, and often also why, but it can be a bit of a puzzle to connect it to a part of the original program. While you can in principle edit the generated kernels yourself, it's not code that is at all nice to read or modify.
> Do you happen to know of any scientific/engineering applications (lattice Boltzmann methods, molecular dynamics, etc.) using Futhark in the wild?
No. The closest are simple things like nbody[0] simulations.
> From a quick glance at the website it seemed like all of the applications are focused on image/video processing, any reason for that?
Not sure. Most of these are stencils, and while Futhark does alright with stencils, it doesn't do anything particularly clever (no hexagonal tiling, for example). Maybe it's just that they are easy and satisfying to write, because you trivially get something visual at the end.
Futhark's design was originally inspired by nasty financial algorithms (the ones from the FinPar suite[1]), which tend to be a combination of Monte Carlo methods and differential equations. I'd say that is Futhark's main strength.
Is it true that more people know OpenCL than functional programming? To me it seems like functional programming is a lot more prevalent in domains where this would be used, but I don't have any numbers. Do you by any chance have a source?
I didn't assert that more people know OpenCL than functional programming; I said far more people know C than functional programming. Anyone who knows C can write (bad) OpenCL after a trivial tutorial, so the initial barrier to entry is low. Writing decently optimized OpenCL is, as I said, moderately difficult and requires some domain-specific knowledge, and it scales up from there if you want to get close to optimal.
Unequivocally, C. Futhark is clearly meant for reasonably high-performance computing, and I don't know of a single person in that domain who isn't a competent C programmer, even if they spend a lot of time writing code in other languages (C++ mostly, also Fortran, Julia, Python with Cython/Numba, etc.). Functional programming, on the other hand, is something many (most, I'd contend) people in this domain consciously stay away from, myself included.