New pagination and crawling features (kimonolabs.com)
53 points by pranade on Feb 15, 2014 | 31 comments


I'd never heard of this site before today. This looks cool and like something I would actually want to use, and I think you've got a very promising way to demo your product while doing content marketing.

That said, I had absolutely no idea how to make it work. I clicked around, and recognized the effects of some of the things I was doing, but it didn't really make sense to me and I'm still confused. If you're trying to acquire users from this kind of page, I'd recommend having a giant call to action somewhere at the top of the page with a link to some introductory explainer material.


Thanks a ton for the feedback... that's super helpful. We'll make some tweaks to this page now.


You're welcome :) By the way, I've found that running things through e.g. UserTesting.com will help reveal this kind of info in a way that you won't necessarily get via feedback from friends or HN. Lots of facepalm moments, but it's definitely worth doing to get a new perspective on how people perceive your marketing and product.


Awesome, thanks for the recommendation!


Looks cool, but I couldn't find a definitive answer about whether it obeys robots.txt or not, just that it's up to the end user to determine which pages get crawled.

I'm not too fussed about people crawling my sites (as you say, it's gonna happen anyway), but I do worry about certain dynamically built sections of websites that are off-limits to bots for good technical reasons.


Although we do put the onus on users to pay attention to robots.txt, we realize that in reality some of them won't necessarily respect it. That's one of the primary reasons we designed our crawler the way we did: requiring people to explicitly specify the links it will visit (as opposed to spidering around sites following all links). Our hope, at least, is that this forces people to put a little thought into the data they want (and where they want it from) before hitting a site.


Sounds good enough to me. Might have to keep an eye out for it, hopefully your users will all be good ;)

Maybe you could design a feature that gives them a friendly reminder if they're about to cross someone's robots.txt, letting them check that they aren't about to do something seriously annoying. Sometimes a robots.txt is overzealous and should be crossed; most of the time, though, it isn't.
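A reminder like that could be built on Python's standard-library robots.txt parser. A minimal sketch (the user-agent name and URLs here are made up for illustration):

```python
from urllib.robotparser import RobotFileParser

def allowed_to_fetch(robots_txt: str, url: str, agent: str = "kimono-bot") -> bool:
    """Return True if the given robots.txt permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# A robots.txt that blocks every agent from /private/
rules = "User-agent: *\nDisallow: /private/\n"
print(allowed_to_fetch(rules, "https://example.com/jobs"))       # True
print(allowed_to_fetch(rules, "https://example.com/private/x"))  # False
```

A crawler could run this check before each request and surface a warning, rather than hard-blocking, which keeps the responsibility with the user as described above.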


Ah, yeah that's not a bad idea. Still keep the responsibility on the user, but do our best to keep them honest :)


I believe I remember you previously writing that you're also going to allow site owners to log in to kimono and turn off spidering for their sites?


Yes, absolutely... we'll be providing more control for webmasters before we exit beta.


This is an awesome product. Requests: add the target URL to the JSON output, and allow merging a few APIs into one. E.g., I'm scraping 20 job postings and would like to make one GET that fetches all the jobs.
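The merge the commenter asks for can also be done client-side in a few lines. A sketch, assuming a hypothetical payload shape of `{"results": {"jobs": [...]}}` (the real kimono response format may differ):

```python
import json

def merge_api_results(payloads: list[str]) -> str:
    """Combine several JSON payloads of the assumed shape
    {"results": {"jobs": [...]}} into a single document."""
    combined = []
    for raw in payloads:
        doc = json.loads(raw)
        combined.extend(doc["results"]["jobs"])
    return json.dumps({"results": {"jobs": combined}})

a = '{"results": {"jobs": [{"title": "Engineer"}]}}'
b = '{"results": {"jobs": [{"title": "Designer"}]}}'
merged = json.loads(merge_api_results([a, b]))
print(len(merged["results"]["jobs"]))  # 2
```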


I'm looking to do some job crawling as well, but across more sites than your price tiers offer. It would be interesting to see pricing on a sliding scale of number of sites / frequency; I'm looking to crawl lots of different sites at a relatively low frequency. Compiling all those results into one API would go a long way as well.

I'd also love to see something that can do some intelligent fuzzy matching. Giving it a page and telling it to find content that matches some rules. For example finding a link that matches an email regex or text that matches a phone number regex.
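The fuzzy-matching idea above boils down to running pattern rules over extracted text. A minimal sketch (these regexes are deliberately simplified; production-grade email and phone matching needs far more care):

```python
import re

# Illustrative patterns only, not exhaustive.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}")

def find_contacts(text: str) -> dict:
    """Return all email- and phone-shaped substrings found in `text`."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }

sample = "Reach us at jobs@example.com or (555) 123-4567."
print(find_contacts(sample))
```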


Thanks, great suggestions! We're working on a more powerful API editor that will let you do just that.


Also, is it safe to use the API key in JavaScript that runs on the client? Maybe you should do signed URLs in the same format as S3, or have a public read-only key.


It's safe in the sense that we don't support 'private' APIs yet, don't charge money yet, and don't allow you to authenticate any other parts of the service with your API key. But yes, you're right, it will have to be dealt with eventually. It's on our radar to roll this out well before we actually start charging people or offer different types of security features. It will probably be something like a public-key/private-key scheme.
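For reference, the signed-URL approach mentioned above usually works by attaching an expiry time and an HMAC computed with a server-side secret, in the spirit of S3 pre-signed URLs. A minimal sketch (the URL and secret are hypothetical, and a real scheme would also canonicalize query ordering):

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

def sign_url(base_url: str, secret: bytes, ttl: int = 3600) -> str:
    """Append an expiry timestamp and an HMAC-SHA256 signature.
    `secret` never leaves the server."""
    expires = int(time.time()) + ttl
    payload = f"{base_url}?expires={expires}".encode()
    sig = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return f"{base_url}?{urlencode({'expires': expires, 'signature': sig})}"

def verify(url: str, secret: bytes) -> bool:
    """Check the signature and reject expired URLs."""
    base, _, query = url.partition("?")
    params = dict(p.split("=", 1) for p in query.split("&"))
    if int(params["expires"]) < time.time():
        return False
    payload = f"{base}?expires={params['expires']}".encode()
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, params["signature"])

secret = b"server-side-secret"
url = sign_url("https://api.example.com/data.json", secret)
print(verify(url, secret))  # True
```

The client only ever sees the signed URL, never the secret, which is what makes this safer than shipping the raw API key in client-side JavaScript.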


This really needs actual docs instead of just videos.

Also, there's apparently no way to delete any properties or actually manage collections using the kimonofy GUI, which is pretty annoying.


Thanks for the suggestion. You're right, videos aren't the best way to comprehensively describe how to use this. We're working to put together some deeper documentation (and improving the API editor) so you'll have more control over how things come out.


You guys are pumping out hero features. Keep it up.


I'm guessing the new feature is not supposed to look like this: http://i.imgur.com/cKFgQcg.png

It's pretty broken. Firefox 27.0.1 on Arch, if someone from kimono sees this.


Thanks for the feedback. Just curious: did you see this after you moved/resized the window, or did it happen right away?


It happened straight away. After clicking on any of the data, that's what happens.


Man. If you manage to make it work and cover edge cases, you will be solving a big problem for a lot of people, I imagine.

I hope your service flourishes and continues to exist because it will definitely make my life a lot better.

Best of luck to you!


That's the plan.. thanks!


Any way to pick up elements that are on the page, but not necessarily clickable? E.g. the <title> element of the page.


It's unfortunately not possible right now, but we know people want it, so we're working on it. We'll probably have to tuck it into our advanced-mode feature, as it's not something we'll ever be able to solve with the point-and-click UI.


So this is a scraper that just scrapes links?


No. You can extract any content you want from the page (including text, images, and links), just not meta or invisible elements (e.g. <title>, or anything else that usually shows up in <head>, for that matter).
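For anyone who needs <title> before kimono supports it, grabbing it yourself takes only the standard library. A sketch using `html.parser` (the sample HTML is invented):

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collect the text inside the document's <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

parser = TitleExtractor()
parser.feed("<html><head><title>Job Board</title></head><body>hi</body></html>")
print(parser.title)  # Job Board
```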


This is so good, it's scary.


Awesome, glad you like it!


Great work. It's a big innovation: taking a clunky method of crawling info and turning it into something extremely elegant in how it's output and how someone can work with it.


Nice and impressive work. However, I'm a bit old school and still prefer to do it in a few lines of Python code.
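The "few lines of Python" version of the pagination feature might look something like this: follow a rel="next" link from page to page, scraping as you go. A toy sketch with a fake `fetch` standing in for a real HTTP GET (the page markup and URLs are invented):

```python
import re

def fetch(url: str) -> str:
    """Stand-in for a real HTTP GET (e.g. urllib.request.urlopen)."""
    pages = {
        "/jobs?page=1": '<a class="job">Engineer</a>'
                        '<a rel="next" href="/jobs?page=2">Next</a>',
        "/jobs?page=2": '<a class="job">Designer</a>',
    }
    return pages[url]

def crawl(start: str) -> list[str]:
    """Follow rel="next" links, collecting job titles from each page."""
    jobs, url = [], start
    while url:
        html = fetch(url)
        jobs += re.findall(r'class="job">([^<]+)</a>', html)
        nxt = re.search(r'rel="next" href="([^"]+)"', html)
        url = nxt.group(1) if nxt else None
    return jobs

print(crawl("/jobs?page=1"))  # ['Engineer', 'Designer']
```

Regex scraping like this is famously brittle on real-world HTML, which is roughly the gap a point-and-click tool like kimono is trying to close.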



