I don't have any examples anymore, because I deleted all the code I had written using the protocol buffer API when I decided to start writing MarketBot.
Searching for something as simple as "foursquare" never returned the Foursquare app, and the results were never the same between requests.
I did something like this, but using XPath queries. The only problem is when they decide to change these: you will have "fun" times retesting all the DOM locations to fix your scraper once it breaks.
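One way to soften that retesting pain is to keep every XPath in a single table and fail loudly with the name of the selector that stopped matching. This is only a rough sketch: the class names, `SELECTORS`, `scrape_app`, and `SAMPLE_PAGE` are all made up for illustration and are not the real Google Play markup. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and needs well-formed input; a real page would probably need an HTML parser such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical selectors, kept in one place so a markup change means editing
# only this table. The class names are invented for this example.
SELECTORS = {
    "title": ".//h1[@class='doc-banner-title']",
    "developer": ".//a[@class='doc-header-link']",
    "rating": ".//div[@class='ratings']/span[@class='average']",
}


def scrape_app(page_source):
    """Extract every field, raising an error that names the broken selector."""
    root = ET.fromstring(page_source.strip())
    result = {}
    for field, path in SELECTORS.items():
        node = root.find(path)
        if node is None:
            raise LookupError(f"selector for {field!r} no longer matches: {path}")
        result[field] = (node.text or "").strip()
    return result


# Invented, well-formed sample page standing in for a real app listing.
SAMPLE_PAGE = """
<html><body>
  <h1 class="doc-banner-title">Foursquare</h1>
  <a class="doc-header-link" href="/developer?id=foursquare">foursquare</a>
  <div class="ratings"><span class="average">4.2</span></div>
</body></html>
"""

if __name__ == "__main__":
    print(scrape_app(SAMPLE_PAGE))
```

When the markup changes, the error points at the one selector to fix instead of leaving you to hunt through the scraper.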
> I did something like this, but using xpath queries, only problem is when they decide to change these
That's a strong reason for going with a library that is used by multiple users (more people to distribute the workload over). But more realistically, you can just use the existing API endpoints, because there are client apps in the wild that rely on those APIs working the same way, at least until Google sunsets Google Play.
There was an API which I found a while back, but it didn't work at all. All of its unit tests failed, so my guess was that Google had changed their API once again and the project was worthless. The solution I came up with was the only reliable one I could build. It was only about 6 lines of code.
Apps are likely to be published on Google Play first, then on 3rd-party markets.
If an app is published exclusively on a 3rd-party market, it's likely distributed directly as an APK, so there's no centralized way to identify it.
Do you run into any query limits with aggressive scraping? A colleague wrote an Android Market scraper a year or two back and she was repeatedly stymied by limits on the number of queries she could make to the store before getting cut off.
We don't do any aggressive scraping at the moment, so I can't really speak to that. The way our backend works is that when we want to add an app to the site (http://gdgt.com/best/apps), we query for the search results and then grab the app we're looking for.
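The "grab the app we're looking for" step could look roughly like the sketch below. It assumes the search results have already been parsed into dictionaries; the `package_id` and `title` keys, the `pick_app` helper, and the sample data are all hypothetical, not our actual backend code.

```python
from typing import Dict, List, Optional


def pick_app(results: List[Dict[str, str]], package_id: str) -> Optional[Dict[str, str]]:
    """Return the first search result whose package id matches exactly, else None."""
    for app in results:
        if app.get("package_id") == package_id:
            return app
    return None


# Invented search results for the query "foursquare"; ids are placeholders.
SAMPLE_RESULTS = [
    {"package_id": "com.example.checkin", "title": "Check-in Helper"},
    {"package_id": "com.example.foursquare", "title": "Foursquare"},
]

if __name__ == "__main__":
    print(pick_app(SAMPLE_RESULTS, "com.example.foursquare"))
```

Matching on an exact identifier rather than taking the top hit means the ordering of the search results doesn't matter.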
http://github.com/chadrem/market_bot
Edit: I would like to point out that Google Play actually does have an API. It uses protocol buffers, as you can see if you inspect the Google Play APK.
http://github.com/kanzure/android-market-api-py