I am in the process of writing my own scraper for recipe sites that grabs only the recipes and parses them into a machine-readable (searchable) format.
It turns out you don't need much for parsing, because an incredibly large percentage of these sites use WordPress with either the Tasty Recipes plugin or the WPRM (WP Recipe Maker) plugin.
The only tedious part at this point is writing the different search crawlers for each site - some are reusable while others are not.
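To illustrate the reusable versus one-off split, here is a rough sketch of how per-site search crawlers could be organized. Everything in it is hypothetical: the `SearchCrawler` interface, the `wordpressCrawler` and `customCrawler` helpers, and the example hostnames are made up for illustration. It also assumes a stock WordPress site answers searches at `/?s=<query>` with `paged=<n>` for later pages, which themes and plugins can change.

```typescript
// Hypothetical shape for a per-site search crawler (not from any library).
interface SearchCrawler {
  site: string;
  // Returns the URL of one page of search results for a query.
  searchUrl(query: string, page: number): string;
}

// Reusable case. Assumption: the site uses WordPress's default search,
// i.e. /?s=<query>, with the `paged` query variable for pagination.
function wordpressCrawler(origin: string): SearchCrawler {
  return {
    site: origin,
    searchUrl(query: string, page: number): string {
      const base = `${origin}/?s=${encodeURIComponent(query)}`;
      return page > 1 ? `${base}&paged=${page}` : base;
    },
  };
}

// One-off case: a site with its own search endpoint gets a hand-written
// crawler. The path template here is invented for the example.
function customCrawler(origin: string, pathTemplate: string): SearchCrawler {
  return {
    site: origin,
    searchUrl(query: string, page: number): string {
      const path = pathTemplate
        .replace("{q}", encodeURIComponent(query))
        .replace("{page}", String(page));
      return `${origin}${path}`;
    },
  };
}

// Placeholder hostnames, not real recipe sites.
const crawlers: SearchCrawler[] = [
  wordpressCrawler("https://blog-one.example"),
  wordpressCrawler("https://blog-two.example"),
  customCrawler("https://recipes.example", "/search/{q}/p/{page}"),
];
```

Calling `crawlers.map((c) => c.searchUrl("chicken soup", 1))` then gives one search URL per site, and only the sites that deviate from the default need their own code.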
I had assumed that this would be much more difficult, but after a weekend of writing the cheerio utils for pulling only the recipes from Tasty or WPRM tags, I found myself nearly done. The frontend and search engine tuning will take much longer.
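For anyone curious what such a cheerio util might look like, here is a minimal sketch, not the actual code. The `Recipe` type and `extractRecipe` function are hypothetical, and the CSS class names are assumptions about the markup the two plugins typically emit; check them against the pages you are scraping, since plugin versions and themes can differ.

```typescript
import * as cheerio from "cheerio";

// Hypothetical output shape for a parsed recipe.
interface Recipe {
  title: string;
  ingredients: string[];
  instructions: string[];
}

interface PluginSelectors {
  root: string;
  title: string;
  ingredient: string;
  instruction: string;
}

// Assumed class names for each plugin's recipe card; verify on real pages.
const PLUGINS: PluginSelectors[] = [
  {
    root: ".wprm-recipe-container",
    title: ".wprm-recipe-name",
    ingredient: ".wprm-recipe-ingredient",
    instruction: ".wprm-recipe-instruction-text",
  },
  {
    root: ".tasty-recipes",
    title: ".tasty-recipes-title",
    ingredient: ".tasty-recipes-ingredients li",
    instruction: ".tasty-recipes-instructions li",
  },
];

// Returns the first recipe card found in the page, or null if neither
// plugin's markup is present. Everything outside the card is ignored.
function extractRecipe(html: string): Recipe | null {
  const $ = cheerio.load(html);
  for (const sel of PLUGINS) {
    const root = $(sel.root).first();
    if (root.length === 0) continue;

    // Collect the text of every match, collapsing runs of whitespace.
    const texts = (query: string): string[] =>
      root
        .find(query)
        .toArray()
        .map((el) => $(el).text().replace(/\s+/g, " ").trim());

    return {
      title: root.find(sel.title).first().text().trim(),
      ingredients: texts(sel.ingredient),
      instructions: texts(sel.instruction),
    };
  }
  return null;
}
```

Because the lookup is scoped to the plugin's root element, the surrounding blog post and ads never make it into the result.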
It would be really cool if recipe sites could just include a recipe instead of a useless blog post punctuated by ads every 4 sentences, but these people clearly don't want me using their site in the "right" way. Oh well.