Hacker News

I am writing my own scraper for recipe sites that grabs only the recipes and parses them into a machine-readable (searchable) format. It turns out you don't need much for parsing, because an incredibly large percentage of these sites use WordPress with either the Tasty Recipes plugin or the WPRM (WP Recipe Maker) plugin.

The only tedious part at this point is writing the search crawlers for each site: some are reusable, others are not.
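To illustrate the reusable versus one-off split, here is a rough sketch rather than my actual code. The `SearchCrawler` interface, the function names, and the example hosts are all made up for illustration. The reusable case leans on WordPress's built-in search, which takes the query in the `s` parameter; the `/page/N/` pagination path is an assumption that holds only for sites using pretty permalinks.

```typescript
// Hypothetical interface: each site gets an object that turns a query
// into the URL of a search-results page to crawl.
interface SearchCrawler {
  site: string;
  searchUrl(query: string, page?: number): string;
}

// Reusable case: stock WordPress search (`?s=` query parameter).
// ASSUMPTION: pagination lives at /page/N/, which requires pretty permalinks.
function wordpressCrawler(site: string): SearchCrawler {
  return {
    site,
    searchUrl(query: string, page: number = 1): string {
      const path = page > 1 ? `/page/${page}/` : "/";
      return `https://${site}${path}?s=${encodeURIComponent(query)}`;
    },
  };
}

// One-off case: the site has its own search route, so the caller
// supplies a function that builds the path and query string.
function customCrawler(
  site: string,
  buildPath: (query: string, page: number) => string
): SearchCrawler {
  return {
    site,
    searchUrl(query: string, page: number = 1): string {
      return `https://${site}${buildPath(query, page)}`;
    },
  };
}

// Registry keyed by hostname; both hostnames are placeholders.
const crawlers = new Map<string, SearchCrawler>();
crawlers.set("wp-blog.example", wordpressCrawler("wp-blog.example"));
crawlers.set(
  "odd-site.example",
  customCrawler(
    "odd-site.example",
    (q, p) => `/find?term=${encodeURIComponent(q)}&p=${p}`
  )
);
```

Most sites would go through `wordpressCrawler`, and only the stragglers need a hand-written `buildPath`.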

I had assumed that this would be much more difficult, but after a weekend of writing the cheerio utils that pull only the recipes from Tasty or WPRM tags, I found myself nearly done. The frontend and search engine tuning will take much longer.
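As a rough idea of the first step, here is a dependency-free sketch (not the actual cheerio utils) that guesses which plugin rendered a page so the right set of selectors can be applied afterwards. The class-name prefixes `tasty-recipes` and `wprm-recipe` are assumptions about the markup these plugins emit and should be checked against real pages; `detectRecipePlugin` is a hypothetical helper name.

```typescript
type RecipePlugin = "tasty" | "wprm";

// ASSUMPTION: class-name prefixes each plugin puts on its recipe card.
// Verify against real pages before relying on them.
const PLUGIN_MARKERS: Record<RecipePlugin, string> = {
  tasty: "tasty-recipes",
  wprm: "wprm-recipe",
};

// Scan every class attribute in the HTML and return the first plugin
// whose marker prefixes one of the class tokens, or null if none match.
function detectRecipePlugin(html: string): RecipePlugin | null {
  const classAttr = /class\s*=\s*["']([^"']*)["']/gi;
  const plugins = Object.keys(PLUGIN_MARKERS) as RecipePlugin[];
  let match: RegExpExecArray | null;
  while ((match = classAttr.exec(html)) !== null) {
    const tokens = (match[1] || "").split(/\s+/);
    for (const plugin of plugins) {
      const marker = PLUGIN_MARKERS[plugin];
      if (tokens.some((token) => token.indexOf(marker) === 0)) {
        return plugin;
      }
    }
  }
  return null;
}
```

The result would then decide which group of cheerio selectors to run against the page. The regex is only a cheap pre-check; the extraction itself is better done with a real parser like cheerio.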

It would be really cool if recipe sites could just include a recipe instead of a useless blog post punctuated by ads every 4 sentences, but these people clearly don't want me using their site in the "right" way. Oh well.




Recipes can't be copyrighted, so bloggers write long-form narratives ahead of the recipe to make the contents of the page intellectual property.


A substantial number of people (not me) do enjoy reading the stories and get something out of them. Damn normies.


Me too, but not when I'm in a rush and the JS is flashing things on and off, moving them around the page, and generally making for a miserable experience.

Maybe they could just put it after the recipe???



