I'd never heard of this site before today. This looks cool and like something I would actually want to use, and I think you've got a very promising way to demo your product while doing content marketing.
That said, I had absolutely no idea how to make it work. I clicked around, and recognized the effects of some of the things I was doing, but it didn't really make sense to me and I'm still confused. If you're trying to acquire users from this kind of page, I'd recommend having a giant call to action somewhere at the top of the page with a link to some introductory explainer material.
You're welcome :) By the way, I've found that running things through e.g. UserTesting.com will reveal this kind of info in a way you won't necessarily get via feedback from friends or HN. Lots of facepalm moments, but it's definitely worth doing to get a new perspective on how people perceive your marketing and product.
Looks cool, but I couldn't find any definitive answer on whether it obeys robots.txt. All I found was that it's up to the end user to determine which pages get crawled.
I'm not too fussed about people crawling my sites (as you say, it's gonna happen anyway), but I do worry about certain dynamically built sections of websites that are off-limits to bots for good technical reasons.
Although we do put the onus on users to pay attention to robots.txt, we realize that in reality some of them won't respect it. That's one of the primary reasons we designed our crawler the way we did: requiring people to actually specify the links it will visit (as opposed to spidering around sites following every link). Our hope, at least, is that this makes people put a little thought into the data they want (and where they want it from) before hitting a site.
Sounds good enough to me. Might have to keep an eye out for it, hopefully your users will all be good ;)
Maybe you could design a feature that gives them a friendly reminder if they're about to cross someone's robots.txt, letting them check that they aren't about to do something seriously annoying. Sometimes a robots.txt is overzealous and should be crossed; most of the time it shouldn't.
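A minimal sketch of what that reminder check could look like, using Python's stdlib `urllib.robotparser` (the `kimono-bot` user agent is just a placeholder, and a real version would fetch the site's robots.txt rather than take it as a string):

```python
from urllib import robotparser

def is_allowed(robots_txt, user_agent, url):
    """Check a URL against robots.txt rules before queueing it for a crawl."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

rules = """
User-agent: *
Disallow: /private/
"""

print(is_allowed(rules, "kimono-bot", "http://example.com/private/page"))  # False
print(is_allowed(rules, "kimono-bot", "http://example.com/public/page"))   # True
```

The UI would only need to surface a warning when this returns False, while still letting the user proceed for the overzealous-robots.txt cases.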
I believe I remember you previously writing that you're also going to allow site owners to log in to kimono and turn off spidering for their site?
This is an awesome product. Requests:
- add the target URL to the JSON
- merge several APIs into one, e.g. if I'm scraping 20 job postings, I'd like to make one GET that returns all the jobs.
I'm looking to do some job crawling as well, but across more sites than your price tiers offer. It would be interesting to see pricing on a sliding scale of number of sites / frequency; I'm looking to crawl lots of different sites at a relatively low frequency. Compiling all those results into one API would go a long way as well.
I'd also love to see something that can do some intelligent fuzzy matching: give it a page and tell it to find content that matches some rules, for example a link that matches an email regex or text that matches a phone number regex.
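As an illustration, this kind of rule-based matching can start out as plain regexes run over the extracted page text (the patterns below are deliberately simplified; real-world email and phone matching is much messier):

```python
import re

# Simplified illustrative patterns, not production-grade validators
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

page_text = "Contact jane.doe@example.com or call (555) 123-4567."

print(EMAIL_RE.findall(page_text))  # ['jane.doe@example.com']
print(PHONE_RE.findall(page_text))  # ['(555) 123-4567']
```

The "intelligent" part would be letting users attach rules like these to a selection instead of (or alongside) the point-and-click selectors.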
Also, is it safe to use the API key in JavaScript that runs on the client? Maybe you should do signed URLs in the same format as S3, or have a public read-only key.
It's safe in the sense that we don't support 'private' APIs yet, don't charge money yet, and don't let you authenticate any other parts of the service with your API key. But yes, you're right, it will have to be dealt with eventually. This is on our radar to roll out well before we actually start charging people or offer different types of security features. It will probably be something like a public-key/private-key scheme.
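For what it's worth, an S3-style pre-signed URL boils down to an HMAC over the path plus an expiry, computed and verified server-side so the secret never reaches the client. A rough Python sketch (the secret, path, and parameter names are all made up for illustration):

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET_KEY = b"server-side-secret"  # illustrative; never shipped to the client

def sign_url(path, expires_in=300):
    """Return a pre-signed URL valid for expires_in seconds."""
    expires = int(time.time()) + expires_in
    msg = f"{path}:{expires}".encode()
    sig = hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expires, 'signature': sig})}"

def verify_url(path, expires, signature):
    """Server-side check: reject expired or tampered links."""
    if int(expires) < time.time():
        return False  # link has expired
    msg = f"{path}:{expires}".encode()
    expected = hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

Client-side JavaScript would then only ever see the short-lived signed URL, not the key itself.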
Thanks for the suggestion. You're right, videos aren't the best way to comprehensively describe how to use this. We're working to put together some deeper documentation (and improving the API editor) so you'll have more control over how things come out.
It's unfortunately not possible right now, but we know people want it, so we're working on it. We'll probably have to tuck it into our advanced mode feature, as it's not something we'll ever be able to solve with the point-and-click UI.
No. You can extract any content you want from the page (incl. text, images, and links), just not meta elements or invisible ones (e.g. <title>, or anything else that usually shows up in <head>, for that matter).
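For anyone curious how a "visible content only" rule can be enforced, here's a rough sketch using Python's stdlib `html.parser` that drops anything inside `<head>` (the skip list is my own assumption for illustration, not kimono's actual rule):

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collect text outside of head/script/style, i.e. roughly what a user sees."""
    SKIP = {"head", "title", "script", "style"}

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many skipped elements we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

p = VisibleTextExtractor()
p.feed("<html><head><title>Hidden</title></head><body><p>Visible</p></body></html>")
print(p.chunks)  # ['Visible']
```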
Great work. It's a big innovation to take a clunky method of crawling info and turn it into something extremely elegant in how the data is output and how someone can work with it.