For ML and Data Science I'd say very immature. HLearn does interesting things but is far from providing a workable ML or Data Science workflow. We need solid, unquestionable Matrix and Vector libs upon which the ~50 bread and butter algorithms can be solidly implemented. Then Diagrams needs super-tight integration, and connections to "big data" systems can be built. Frames is moving in the right direction type-wise but it'd be really nice to get HLists (or even vinyl?) in as representations which can be transparently backed by the aforementioned common matrix libs. Lenses could probably help a lot here and I think Anthony is going in that direction exactly.
Notably, the GPU story is actually quite strong and could be integrated quite nicely with the hypothetical ML and Data Science platform I'm describing. That'd be no small boost to its power.
What's the difference between ML and data science? ML has more programming?
And what's the difference between data science and statistics? Statisticians try to come up with new mathematics, while data scientists just try to come up with ways to apply that maths?
I'll take a brief stab since I think there are important differences but this is also holy war territory so YMMV.
ML tends to be more concerned with scale and computational efficiency as the algorithms used are often (a) heavyweight and (b) less human interpretable. Success is primarily measured objectively but via proxy measures like prediction accuracy. It may require data science efforts behind the scenes in order to succeed.
Data Science tends to be more concerned with interpretation than scale or repetition. It sits somewhere between analyst, statistician, and journalist. It's also a much more interactive workflow, often involving a great deal of successive, simple models and lots of exploratory visualization. Measures of success include discovery of human-relatable inferences from data, substantiation or dismissal of human-relevant experimental hypotheses, and success in communication of those ideas.
Statistics is a field of study of some of the most important mathematical foundations of each of these roles. A statistician's skill is more generalized but also more abstract. A statistician who learns CS and journalism could easily head into either of the other fields.
1: I'd have rated Haskell's concurrency as "best in class" thanks to its STM implementation. Other languages do have STM, but AFAIK only Haskell provides both efficiency and safety thanks to its type system. Other languages have to choose between tracking every variable (slow) and letting the programmer declare which values are going to be rewound if the transaction aborts (unsafe). (A small sketch of what that type-enforced safety looks like follows below.)
2: Your section on IDEs does not consider Leksah. I've just come back to this after a long time on Eclipse and Atom, and it's now a mature IDE specialized for Haskell. You might consider bumping Haskell to "mature" based on this.
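To illustrate the safety point in 1, here is a minimal sketch (the account-transfer example is my own, not from the article): inside `atomically` only STM actions are allowed, so the type checker statically rules out irrevocable IO inside a transaction that might be retried.

    import Control.Concurrent.STM

    -- Move funds between two balances in one atomic transaction.
    transfer :: TVar Int -> TVar Int -> Int -> IO ()
    transfer from to amount = atomically $ do
      balance <- readTVar from
      check (balance >= amount)         -- blocks and retries until funds are available
      writeTVar from (balance - amount)
      modifyTVar' to (+ amount)
      -- putStrLn "transferred"         -- would not compile: IO is not STM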
I agree on concurrency. It is best in class and also offers a lot of choice (which some other popular languages like JS and Go are not providing).
When it comes to IDEs I do think immature would also be my choice. There is a lot of movement on this front so I have hopes this may change soon. My main points are: 7.10 has been out for a while, but most editor integrations have/had (until recently) some major issues with it. Also, Stack is becoming quite popular, yet most editor integrations are not Stack-aware. Finally, the editor integrations are not very plug-and-play. As I said, this is changing rapidly with some necessary fixes to GHC/cabal made and projects like stack-ide and ghci-ng.
> When it comes to IDEs I do think immature would also be my choice.
If you have not already logged any issues, can you please list the top few things that bugged you about Leksah here and I will make sure they get added to the issue tracker? Even if they are already there it will help us prioritise to know what your pain points were.
> 2: Your section on IDEs does not consider Leksah. I've just come back to this after a long time on Eclipse and Atom, and it's now a mature IDE specialized for Haskell. You might consider bumping Haskell to "mature" based on this.
On a recent thread on Reddit, the Leksah maintainer acknowledged that on some platform (I forget which), Leksah crashes in every session, and has a CPU-spinning bug.
For me, it was (and still is) terribly hard to find libraries that do what I want. Although Hackage has a large database, it's quite confusing and very hard to figure out how a library actually works.
A few years ago, I tried to parse HTML files for a very small thing that I wanted to do and I just couldn't find or understand whether there's actually something out there that I could use. So, instead, I ended up learning Parsec and writing my own crappy parser...
I really don't think that's what it should be like. And maybe it's just my own fault. If not, there should be thousands out there who start their Haskell journey with such a frustrating experience. There's a deep and dark abyss in which beginners fall after an initial tutorial.
I've never had a similar experience with any other language. The upside, of course, is that Haskell is the most beautiful language I know.
I think this is because the abstractions are so abstract. When functionality is glued together with very general combinators and operators there's not really much to grab hold of unless you understand the abstractions.
Well said. What always saved me is that brave and devoted individuals wrote tutorials and published them to help me peel away the layers of abstractions. There's no way I could have done that on my own.
TagSoup is pretty poor — it's far from being an HTML parser (as defined by the HTML spec). The big problem is that because it's really just a tokenizer, it doesn't do any of the handling of mismatched tags, which is really the interesting part of parsing HTML. Even for the HTML that can be parsed by a streaming parser, it's not great, as it doesn't insert any implied tags (which are a big part of HTML!).
It does, in its own code, state:
-- We make some generalisations:
-- <!name is a valid tag start closed by >
-- <?name is a valid tag start closed by ?>
-- </!name> is a valid closing tag
-- </?name> is a valid closing tag
-- <a "foo"> is a valid tag attibute in ! and ?, i.e missing an attribute name
-- We also don't do lowercase conversion
-- Entities are handled without a list of known entity names
-- We don't have RCData, CData or Escape modes (only effects dat and tagOpen)
All of which mean that it doesn't actually follow what the HTML spec says! Just because it claims to support the spec doesn't mean much!
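To make the "just a tokenizer" point concrete, here is roughly what TagSoup does with a mismatched tag (the example is mine and the output is paraphrased, so treat it as illustrative):

    import Text.HTML.TagSoup (Tag(..), parseTags)

    main :: IO ()
    main = print (parseTags "<b><i>x</b>" :: [Tag String])
    -- ≈ [TagOpen "b" [], TagOpen "i" [], TagText "x", TagClose "b"]
    -- The unclosed <i> is passed through untouched; a spec-following parser
    -- would infer the missing </i> as part of tree construction.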
I haven't used tagsoup, but I believe that is the point; it was designed to let you scrape data from badly formed HTML you got from somewhere else, rather than helpfully pointing out the broken tags.
The HTML spec defines how to parse any arbitrary stream of characters, and it is what is implemented in browsers and hence is what best supports badly-formed HTML (because it's typically written aimed at browsers!). Therefore, by not following the spec, TagSoup has worse compatibility with badly-formed HTML.
I know that. The situation got better in recent years. I was trying to point out that as a beginner, it is really hard to find your way and I still think it is. I learned Parsec and solved my problem that way since I could find an okay-ish tutorial for that. If at that time I had found a TagSoup tutorial, I would have probably used that.
It is sometimes hard to grasp how it is to not know something. As a beginner, you don't just write down your Monads and Monad Transformers. You don't just read the source code of libraries on Hackage. You need a lot of guidance.
This is a great and a balanced list. I have definitely enjoyed some time spent programming in Haskell.
One frustration that I have had while dabbling in Haskell is that some libraries on Hackage assume a GNU/Linux kind of system. For example, I was reading Simon Marlow's book and, while trying to compile the examples on a Windows machine, a particular library could not be installed because of the lack of 'configure' (GNU autotools? I only use Windows nowadays due to professional reasons and have not used GNU/Linux in the last 11 years). I hope that library authors do not make such assumptions unless it is really necessary.
Is there an opportunity for a company to create an implementation of Haskell which fixes some commonly mentioned problems (e.g. stop the world GC) - like proprietary JVM vendors? Is it even technically feasible? Of course, such a hypothetical company would also need to spend a lot of time writing useful libraries.
I'm a Haskell Windows user (and an Idris developer), and I have little trouble installing most things (except those directly targeted at Linux). That's not because I'm awesome, but because there's a trick to it. The trick is to install msys2 (https://msys2.github.io), and do the installs from there. Set up its lib and include directories as extra dirs in Cabal's config (example below). When you need some lib or build tool, look for it with pacman -Ss and install it. To start with, installing make and autoconf helps you over the configure hurdle.
And if you hit Cabal hell now and then, that happens to everybody so it's probably not because you're on Windows, just make a sandbox and see if it works there.
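For what it's worth, the Cabal config bit mentioned above looks roughly like this; extra-include-dirs and extra-lib-dirs are real Cabal settings, but the paths are illustrative and depend on where msys2 is installed:

    -- in the global Cabal config (typically %APPDATA%\cabal\config on Windows)
    extra-include-dirs: C:\msys64\mingw64\include
    extra-lib-dirs: C:\ms64\mingw64\lib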
I don't mean to trivialize the complexities and importance of setting up a development environment you are happy with, but Linux is very easily available to a Windows user via virtualbox VMs or boot2docker. Are you sure you can't find a workflow involving those that you're happy with? Editing files doesn't have to use an editor in the VM.
You can keep doing all the coding that you've been doing on the Windows side. Just set up a shared folder which will instantly update the VirtualBox VM. Keep one terminal tab open that's ssh'd into the box, and just switch over to that tab when you want to run anything.
Although hinted at in "Mobile apps" and "Package management", I think you might want to add "Deployment" to your list of criteria for a language. It is often overlooked with languages and frameworks, but the ability and method of deploying code (particularly with a running system) should not be ignored.
> Haskell is an amazing language for writing your own compiler. If you are writing a compiler in another language you should genuinely consider switching.
Uhm, no, thank you, but no. Haskell is poorly equipped for implementing compilers, because of its lack of unrestricted compile-time metaprogramming.
I'm spoiled. This is what I want from a language for implementing compilers:
1) Cheap, boilerplate-free definition for chains of slightly different ASTs and IRs. Haskell and ML are notoriously weak in this area (see the "Expression Problem" thing)
2) Boilerplate-free generation of various (potentially complex) visitors for these ASTs and IRs. For Haskell, the Scrap Your Boilerplate library is the closest thing to what I want, but it is still far too clumsy (see the sketch after this list).
3) I need efficient embedded Prolog and Datalog (yes, both, the latter allows some cool optimisations not possible for the former). Not feasible without compile-time metaprogramming. Both are important for simplifying compiler construction.
4) Parsing is not very important. But still, it is necessary. The best Haskell has on offer is Parsec. Sorry, not good enough for my requirements.
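For reference on point 2, a minimal SYB-style traversal looks roughly like this (the Expr type and the rewrite are my own toy example); whether this counts as clumsy is exactly the judgement call being argued here:

    {-# LANGUAGE DeriveDataTypeable #-}
    import Data.Generics (Data, Typeable, everywhere, mkT)

    data Expr = Lit Int | Add Expr Expr | Neg Expr
      deriving (Show, Data, Typeable)

    -- One generic bottom-up pass: eliminate double negation everywhere
    -- in the tree without writing a case for every constructor.
    simplify :: Expr -> Expr
    simplify = everywhere (mkT step)
      where
        step (Neg (Neg e)) = e
        step e             = e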
So "Haskell is poorly equipped for implementing compilers" because it doesn't fit your bizarro requirements, including "efficient embedded Prolog and Datalog"...
Haskell is poorly equipped for implementing compilers because writing compilers in Haskell requires tons of boilerplate, which is not needed in the languages that are a much better fit for this task. Kinda obvious from the definition of what "poorly equipped" means.
And why do you call these basic requirements bizarre? For example, something like a region inference is just a few lines of Prolog vs. dozens or even hundreds of lines of code in Haskell.
>Haskell is poorly equipped for implementing compilers because writing compilers in Haskell requires tons of boilerplate, which is not needed in the languages that are a much better fit for this task.
And yet your "much better languages" are even more niche compared to Haskell, and no major programming language has been written in them.
So, it's basically "Haskell is poorly equipped compared to my impossible standards, as implemented by this hodge-podge of Scheme-based tools that I alone have ever used together for compiler implementation".
>And why do you call these basic requirements bizarre? For example, something like a region inference is just a few lines of Prolog vs. dozens or even hundreds of lines of code in Haskell.
Because no known programming language resorts to "a few lines of Prolog" to save a few tens of lines for "region inference"?
Exactly. Compilers construction is a very specific problem domain, which requires specially tailored DSLs, not a "general purpose" language. Haskell is a way too general purpose for this task.
> and no major programming language has been written in them.
C is even more unsuitable for this than Haskell, and yet most of the compilers are written in C. Technical superiority does not count at all in choosing an implementation platform.
> Because no known programming language resorts to "a few lines of Prolog" to save a few tens of lines for "region inference"?
JFYI, a prototype implementation of SSA for GCC was written in Prolog (see Sebastian Pop's work [1]). If embedding Prolog in C were technically feasible, it would have stayed that way. Also, to see how much Prolog rocks, check out Harlan [2].
In order to avoid an unnecessary flame war: I don't give a crap about popularity-based arguments. Got any technical, well-grounded, rational arguments? Go on, I'd be delighted to comment. Otherwise I'm not interested. And, btw, you're clinging to just one of my four points against Haskell. Nothing to say about the other three?
>C is even more unsuitable for this than Haskell, and yet most of the compilers are written in C. Technical superiority does not count at all in choosing an implementation platform.
Or maybe technical superiority is a more varied thing than what you consider (merely a language's features and "expression" capabilities), and includes tooling, portability, speed, direct access to hardware, existing codebase to work upon and lots of other parameters, for which C was king or very near the top for many decades...
>In order to avoid an unnecessary flame war: I don't give a crap about popularity-based arguments.
And from my side: I don't give a crap for "technical arguments" that focus only on a very narrow area of technical benefits ignoring lots of others.
Even "familiarity of the team with a language" is a technical argument (in the sense that it speaks to the capability of a teams to practically use something).
I also don't think that "popularity" is just fluff. It might not be 100% solid proof, but it does point to real, battle-proven results (as opposed to vapour-ware and hand-waving about superiority).
In the end, history and actual results are the only criteria. That's being pragmatic and empiricist -- as opposed to mere beliefs that don't pan out in practice.
The features I am talking about are very platform-agnostic. The only requirement for implementing them is compile-time metaprogramming, which is trivial to add to any language, including C.
The only explanation for not using this approach is plain ignorance. The majority of people have no understanding of compilers at all. And yet they are writing compilers. In C or Haskell.
And, btw., teams that cannot learn a tiny DSL in an hour (or write this DSL in a week) should not be allowed to write compilers. It's kinda obvious.
>The only explanation for not using this approach is plain ignorance. The majority of people have no understanding of compilers at all. And yet they are writing compilers. In C or Haskell.
That includes people like Lars Bak, Martin Odersky, Anders Hejlsberg, Simon Peyton Jones, etc? I wonder where the burden of proof should be placed in this kind of arrogant argument -- as if "compile-time metaprogramming" is some magic bullet for compiler design...
I am glad you did not list anyone related to GCC and LLVM, because both rely on the static DSLs (and code generation) heavily.
As for the rest - yes, compiler construction is not their main research topic. They can all be great language designers, but their approach to compiler construction is needlessly overcomplicated.
A view of a compiler as a chain of stupidly trivial passes, ideally expressed as term rewriting, is not very common among the researchers, but so far it yields the best results.
>They can all be great language designers, but their approach to compiler construction is needlessly overcomplicated.
Actually Lars Bak and team are not "language designers". Smalltalk (Strongtalk), Java (HotSpot), JavaScript (V8) have all been designed as languages by other people.
They are world-renowned experts in compiler design.
A JIT is a totally different beast from a simple pipeline compiler. You should have known better. Anyway, you evidently do not know what you're talking about. I'll consider this trolling unless you can show your compiler code.
Any language that allows compile-time metaprogramming (and not as restricted as TH) should allow you to implement all the components I want. One example of such a system is the Nanopass framework [1], used alongside cKanren [2].
> 2) Boilerplate-free generation of various (potentially complex) visitors for these ASTs and IRs. For Haskell, Scrap Your Boilerplate library is the closest thing to what I want, but it is still far too clumsy.
Have you tried QuickCheck?
> 4) Parsing is not very important. But yet, it is necessary. The best Haskell got on offer is Parsec. Sorry, not good enough for my requirements.
I've not found much better than (atto-)Parsec, to the point where I miss it in other languages.
Pretty basic: graceful error recovery, arbitrarily complex error reporting, arbitrary actions (potentially with some side effects). I.e., everything that is needed for the IDE integration and for the professional compiler frontends with usable, friendly error messages and recovery suggestions.
Could you please point at their error recovery functionality (I could not find it by a quick scan)? My gut feeling is that combinator-based parsing is not very well suited for this, but I may be missing something obvious.
This is the chief drawback of these two libraries: they are poorly documented (and are arguably not documented at all). In general, parsers is for parsing and trifecta is for reporting errors and diagnostics. Trifecta looks especially nice for giving context to error sites, but I haven't used it much yet (only for highlighting, really).
By "error recovery," do you mean backtracking? If so, there are several ways of doing that, which can be found here:
See the 'choice', 'option', and 'try' combinators, and also the '<|>' operator in Control.Applicative.
If you need more than that, you can extend parsers' monadic parsing to roll your own error recovery.
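A tiny Parsec-flavoured sketch of those combinators (my own toy example, not from the thread) — note that this is backtracking within alternatives, not the error recovery being asked about:

    import Text.Parsec
    import Text.Parsec.String (Parser)

    -- 'try' backtracks when an alternative fails after consuming input:
    -- on "lettuce", the keyword branch fails at the space, and the
    -- identifier branch gets to re-read from the start.
    keywordOrIdent :: Parser String
    keywordOrIdent = try (string "let" <* space)
                 <|> many1 letter

    -- 'option' supplies a default when an optional suffix is absent.
    version :: Parser (Int, Int)
    version = do
      major <- read <$> many1 digit
      minor <- option 0 (char '.' *> (read <$> many1 digit))
      return (major, minor)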
Currently, the best way to understand how parsers and trifecta work is to look at projects that use them. I will give you some links to the ones I've used and found helpful, if a bit more complicated than the parsers I am currently writing:
And here is a relevant Reddit thread, which includes a link to Edward Kmett's slide deck which motivates trifecta/parsers and gives a high-level view of how they work:
By recovery I mean something like "f(x,y,z;" triggering an error message "did you miss ')' here?", with parser recovering and proceeding further, in a hope to report more errors in a single run - see how Clang handles this, with a handcrafted recursive descent parser.
Backtracking does not help here, it must be done with custom heuristics at each node. I am doing this kind of things declaratively on top of an elaborate PEG-based generator (using an idea of signals attached to both successful and failed branches, and then choosing which signals to execute based on the exact points of failure), but would be very interested to see a lightweight combinator-based solution.
`trifecta` basically produces `clang`-style error messages (i.e. they have the pretty colors, underlined text spans, cursor pointers into the text, and really good error messages indicating the real problem and not some unrelated problem). That's why most people recommend `trifecta` when error reporting quality is paramount.
The author of the `trifecta` library is extremely knowledgeable about parsing/backtracking/recovery/error-reporting and has put a lot of thought into how to improve error message relevance as much as possible.
Ok, I think I understand now how it works (played with Idris parser a bit). Sort of a simple trick, I'll borrow it for the lower tier combinator-based parsing layer.
I was even more interested in handcrafted, custom error messages rather than automatically constructed ones. I still cannot figure out how to stuff custom recovery heuristics into combinator-based code.
I think the rating difference between sections "application domains" and "common programming" illustrates clearly what is known as "academia language", i.e. advanced, mature features and lack of real application stories and libraries.
> * recalling C++ & Java, not crazy about the idea of going back to having to specify types for everything.
This is a big misconception. The type system in Haskell becomes much more of a tool than a hindrance. Although you will tend to write type signatures for all your top level functions, type inference is very good and it's not necessary most of the time.
The reason you should consider writing at least the top-level signatures is that when you do, the compiler will help you fill in the blanks with type holes. See
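A quick illustration of type holes (my own minimal example, assuming a reasonably recent GHC): leave an underscore where you are unsure, and the compiler reports the type that belongs there.

    sumList :: [Int] -> Int
    sumList xs = foldr _ 0 xs
    -- GHC reports something like:
    --   Found hole: _ :: Int -> Int -> Int
    -- i.e. the missing piece must combine a list element with the accumulator.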
> recalling C++ & Java, not crazy about the idea of going back to having to specify types for everything.
In dynamically typed languages, static types get substituted with tons of trivial (stupidly trivial) tests, which is way more lines of code than simple type annotations.
I thought it was pretty straight forward coming from Erlang.
The biggest thing for me was practice thinking in recursion.
edit: Erlang and Haskell was just side hobby. I didn't do anything major with them. But I thought the languages were small enough, Haskell has such a clean looking syntax.
Oh, I think you're trying to say the threshold of being good with Haskell is much higher? Cause I see that argument quite a bit.
> I thought it was pretty straight forward coming from Erlang.
> The biggest thing for me was practice thinking in recursion.
I use Clojure, generally like functional programming, and am fine using recursion.
> Haskell has such a clean looking syntax.
I have gotten fond of Clojure's Lisp-like syntax.
> Oh, I think you're trying to say the threshold of being good with Haskell is much higher? Cause I see that argument quite a bit.
Not sure -- haven't spent any time with Haskell. Never tried Erlang either. Do you think it generally takes longer to become productive with Haskell than with other functional programming languages?
> recalling C++ & Java, not crazy about the idea of going back to having to specify types for everything.
I want to reiterate the other comments and clarify: C++ & Java have poor type systems compared to Haskell. Haskell's is not only much safer, but it also allows for much, much better inference, so you rarely ever need to add type annotations. People do typically add top-level annotations, for documentation's sake, in most cases, but they are not required.
My understanding is that type inferencing allows you to omit specifying types in many places (the compiler figures out the types in those places at compile-time), but that you still need to specify types when necessary. (Please correct me if I'm wrong.)
I think Rust does this, but looking at Rust code, there's still types specified all over the place, and it looks much more cluttered to me than dynamically typed languages I've used.
Here's a slightly longer example of Haskell's awesome type inference with some additional commentary to complement the example that elbenshira gave:
>>> let showAdd x y = show (x + y)
>>> :type showAdd
showAdd :: (Num a, Show a) => a -> a -> String
The compiler infers:
* The two arguments, `x` and `y`, must be some type `a`
* They must be the same type `a`
* The type `a` must implement the `Num` interface (because we add them together using `(+)`)
* The result of addition must also be the same type `a`
* The type `a` must also implement the `Show` interface (because we call `show` on the result of addition, which also has type `a`)
Notice how the compiler works backwards from the call to `show` on the result of addition to infer that the two original arguments must also implement the `Show` interface.
We didn't need to annotate any argument types or interfaces: the compiler did all the work for us. This is what people mean when they say that Haskell has "global" type inference.
A language like Rust has "local" type inference, meaning that you have to help the compiler a little bit by providing the types and interfaces of function arguments, but then the compiler can infer the types of locally bound variables from that point forward.
From my (short) time with Haskell, I've found the type system to be the best I've ever used. It really does feel like it's helping you, instead of getting in the way. And so it feels more like a dynamic language than other strongly typed languages I've used (e.g. Scala, Java).
Below is a simple example, but look, no types to type!
>> let plus1 = (+1)
>> plus1 10
11
>> :t plus1
plus1 :: Num a => a -> a
You can ask for the type by `:t plus1`.
I've also used Clojure, and I do really like Lisps, but I don't think untyped languages give enough benefit for all the problems they cause. I'm excited, however, about Typed Clojure (and Typed Racket).
We made a choice with Rust to force you to write type annotations in function signatures, because it prevents errors at a distance. Even in languages with full type inference, writing out signatures is still considered good style for this reason.
Trying to get a Haskell application deployed to the web was an exercise in frustration. Somehow the dependency resolution for its package management and binaries has become profoundly fucked up as well.