I'd never heard of this site before today. This looks cool and like something I would actually want to use, and I think you've got a very promising way to demo your product while doing content marketing.
That said, I had absolutely no idea how to make it work. I clicked around, and recognized the effects of some of the things I was doing, but it didn't really make sense to me and I'm still confused. If you're trying to acquire users from this kind of page, I'd recommend having a giant call to action somewhere at the top of the page with a link to some introductory explainer material.
You're welcome :) By the way, I've found that running things through e.g. UserTesting.com will reveal this kind of info in a way you won't necessarily get via feedback from friends or HN. Lots of facepalm moments, but it's definitely worth doing to get a new perspective on how people perceive your marketing and product.
Looks cool, but I couldn't find any definitive answer on whether it obeys robots.txt. All I found was that it's up to the end user to determine which pages get crawled.
I'm not too fussed about people crawling my sites (as you say, it's gonna happen anyway), but I do worry about certain dynamically built sections of websites that are off-limits to bots for good technical reasons.
Although we do put the onus on users to pay attention to robots.txt, we realize that in reality some of them won't respect it. That's one of the primary reasons we designed our crawler the way we did: requiring people to actually specify the links it will visit (as opposed to spidering around sites following every link). Our hope, at least, is that this makes people put a little thought into the data they want (and where they want it from) before hitting a site.
Sounds good enough to me. Might have to keep an eye out for it, hopefully your users will all be good ;)
Maybe you could design a feature that gives them a friendly reminder if they're about to cross someone's robots.txt, letting them check that they aren't about to do something seriously annoying. Sometimes a robots.txt is overzealous and should be crossed; most of the time it shouldn't.
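A minimal sketch of what that reminder check could look like, using Python's stdlib `urllib.robotparser` (the `kimono-bot` user agent is just a placeholder, and a real version would fetch the site's robots.txt rather than take it as a string):

```python
from urllib import robotparser

def is_allowed(robots_txt, user_agent, url):
    """Check a URL against robots.txt rules before queueing it for a crawl."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

rules = """
User-agent: *
Disallow: /private/
"""

print(is_allowed(rules, "kimono-bot", "http://example.com/private/page"))  # False
print(is_allowed(rules, "kimono-bot", "http://example.com/public/page"))   # True
```

The UI would only need to surface a warning when this returns False, while still letting the user proceed for the overzealous-robots.txt cases.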
I believe I remember you previously writing that you're also going to allow site owners to log in to kimono and turn off spidering for their site?
This is an awesome product. Requests:
- add the target URL to the JSON
- merge several APIs into one, e.g. if I'm scraping 20 job postings, I'd like to make one GET that returns all the jobs.
I'm looking to do some job crawling as well, but across more sites than your price tiers offer. It would be interesting to see pricing on a sliding scale of number of sites / frequency; I'm looking to crawl lots of different sites at a relatively low frequency. Compiling all those results into one API would go a long way as well.
I'd also love to see something that can do some intelligent fuzzy matching: give it a page and tell it to find content that matches some rules, for example a link that matches an email regex or text that matches a phone number regex.
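As an illustration, this kind of rule-based matching can start out as plain regexes run over the extracted page text (the patterns below are deliberately simplified; real-world email and phone matching is much messier):

```python
import re

# Simplified illustrative patterns, not production-grade validators
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

page_text = "Contact jane.doe@example.com or call (555) 123-4567."

print(EMAIL_RE.findall(page_text))  # ['jane.doe@example.com']
print(PHONE_RE.findall(page_text))  # ['(555) 123-4567']
```

The "intelligent" part would be letting users attach rules like these to a selection instead of (or alongside) the point-and-click selectors.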
Also, is it safe to use the API key in JavaScript that runs on the client? Maybe you should do signed URLs in the same format as S3, or have a public read-only key.
It's safe in the sense that we don't support 'private' APIs yet, don't charge money yet, and don't let you authenticate any other parts of the service with your API key. But yes, you're right, it will have to be dealt with eventually. This is on our radar to roll out well before we actually start charging people or offer different types of security features. It will probably be something like a public-key/private-key scheme.
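For what it's worth, an S3-style pre-signed URL boils down to an HMAC over the path plus an expiry, computed and verified server-side so the secret never reaches the client. A rough Python sketch (the secret, path, and parameter names are all made up for illustration):

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET_KEY = b"server-side-secret"  # illustrative; never shipped to the client

def sign_url(path, expires_in=300):
    """Return a pre-signed URL valid for expires_in seconds."""
    expires = int(time.time()) + expires_in
    msg = f"{path}:{expires}".encode()
    sig = hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()
    return f"{path}?{urlencode({'expires': expires, 'signature': sig})}"

def verify_url(path, expires, signature):
    """Server-side check: reject expired or tampered links."""
    if int(expires) < time.time():
        return False  # link has expired
    msg = f"{path}:{expires}".encode()
    expected = hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

Client-side JavaScript would then only ever see the short-lived signed URL, not the key itself.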
Thanks for the suggestion. You're right, videos aren't the best way to comprehensively describe how to use this. We're working to put together some deeper documentation (and improving the API editor) so you'll have more control over how things come out.
It's unfortunately not possible right now, but we know people want it, so we're working on it. We'll probably have to tuck it into our advanced mode feature, as it's not something we'll ever be able to solve with the point-and-click UI.
No. You can extract any content you want from the page (incl. text, images, and links), just not meta elements or invisible ones (e.g. <title>, or anything else that usually shows up in <head>, for that matter).
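For anyone curious how a "visible content only" rule can be enforced, here's a rough sketch using Python's stdlib `html.parser` that drops anything inside `<head>` (the skip list is my own assumption for illustration, not kimono's actual rule):

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collect text outside of head/script/style, i.e. roughly what a user sees."""
    SKIP = {"head", "title", "script", "style"}

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many skipped elements we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

p = VisibleTextExtractor()
p.feed("<html><head><title>Hidden</title></head><body><p>Visible</p></body></html>")
print(p.chunks)  # ['Visible']
```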
Great work. It's a big innovation to take a clunky method of crawling info and turn it into something extremely elegant in how the data is output and how someone can work with it.