
For me, it was (and still is) terribly hard to find libraries that do what I want. Although Hackage has a large database, it's quite confusing and very hard to figure out how a library actually works.

A few years ago, I tried to parse HTML files for a very small thing that I wanted to do and I just couldn't find or understand whether there's actually something out there that I could use. So, instead, I ended up learning Parsec and writing my own crappy parser...

I really don't think that's what it should be like. And maybe it's just my own fault. If not, there must be thousands out there who start their Haskell journey with such a frustrating experience. There's a deep and dark abyss into which beginners fall after an initial tutorial.

I've never had a similar experience with any other language. The upside, of course, is that Haskell is the most beautiful language I know.




I think this is because the abstractions are so abstract. When functionality is glued together with very general combinators and operators there's not really much to grab hold of unless you understand the abstractions.


Well said. What always saved me is that brave and devoted individuals wrote tutorials and published them to help me peel away the layers of abstractions. There's no way I could have done that on my own.


Which is why I view Haskell's community as one of its main, less tangible qualities. :-)

Edit: clarity. Apparently my brain is too Haskelled today to write English.


Sometimes, but sometimes a function just has a type of String -> String with nothing besides the name to tell you what it does.


Google "haskell parse html": the first hit is a tutorial, the second is the TagSoup library on Hackage.


TagSoup is pretty poor: it's far from being an HTML parser (as defined by the HTML spec). The big problem is that because it's really just a tokenizer, it doesn't do any of the handling of mismatched tags, which is really the interesting part of parsing HTML. Even for the HTML that a streaming parser can handle, it's not great, as it doesn't imply any tags (and implied tags are a big part of HTML!).

It does, in its own code, state:

    -- We make some generalisations:
    -- <!name is a valid tag start closed by >
    -- <?name is a valid tag start closed by ?>
    -- </!name> is a valid closing tag
    -- </?name> is a valid closing tag
    -- <a "foo"> is a valid tag attibute in ! and ?, i.e missing an attribute name
    -- We also don't do lowercase conversion
    -- Entities are handled without a list of known entity names
    -- We don't have RCData, CData or Escape modes (only effects dat and tagOpen)
All of which means that it doesn't actually follow what the HTML spec says! The fact that it claims to support the spec doesn't mean much!
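
To make the tokenizer-versus-parser distinction concrete, here is a rough sketch using only base. Everything in it is hypothetical: `Token`, `tokenize` and `implyParagraphEnds` are toy names, not TagSoup's API, and the single rule shown is only in the spirit of the spec's implied end tags, not the actual tree-construction algorithm.

```haskell
-- Toy token type; a hypothetical stand-in, not TagSoup's actual Tag type.
data Token = Open String | Close String | Text String
  deriving (Eq, Show)

-- Very naive tokenizer: no attributes, comments, or entities.
-- It only reports what is literally in the input.
tokenize :: String -> [Token]
tokenize [] = []
tokenize ('<' : '/' : rest) =
  let (name, rest') = break (== '>') rest
  in Close name : tokenize (drop 1 rest')
tokenize ('<' : rest) =
  let (name, rest') = break (== '>') rest
  in Open name : tokenize (drop 1 rest')
tokenize s =
  let (txt, rest) = break (== '<') s
  in Text txt : tokenize rest

-- One illustrative rule: a new <p> closes a <p> that is still open,
-- and the end of input closes whatever <p> is left open.
implyParagraphEnds :: [Token] -> [Token]
implyParagraphEnds = go False
  where
    go inP [] = [Close "p" | inP]
    go inP (Open "p" : ts)
      | inP       = Close "p" : Open "p" : go True ts
      | otherwise = Open "p" : go True ts
    go _ (Close "p" : ts) = Close "p" : go False ts
    go inP (t : ts) = t : go inP ts
```

On the input `<p>one<p>two`, `tokenize` yields two `Open "p"` tokens and no closing tags at all, while `implyParagraphEnds` adds a `Close "p"` before the second paragraph and another at the end. A real HTML parser has to do that kind of work for many more elements.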


I haven't used tagsoup, but I believe that is the point; it was designed to let you scrape data from badly formed HTML you got from somewhere else, rather than helpfully pointing out the broken tags.


The HTML spec defines how to parse any arbitrary stream of characters, and it is what is implemented in browsers and hence is what best supports badly-formed HTML (because it's typically written aimed at browsers!). Therefore, by not following the spec, TagSoup has worse compatibility with badly-formed HTML.


Google is your friend: "html scraping Haskell" -> https://hackage.haskell.org/package/scalpel

Googling is the best way to go through Hackage.


I know that. The situation got better in recent years. I was trying to point out that as a beginner, it is really hard to find your way and I still think it is. I learned Parsec and solved my problem that way since I could find an okay-ish tutorial for that. If at that time I had found a TagSoup tutorial, I would have probably used that.

It is sometimes hard to grasp what it is like not to know something. As a beginner, you don't just write down your Monads and Monad Transformers. You don't just read the source code of libraries on Hackage. You need a lot of guidance.


You may be interested in Hayoo: http://hayoo.fh-wedel.de/?query=Html

Unlike Hoogle (which only searches a few things), Hayoo searches most of Hackage.



