Scraping data from an app: real world example (thewebscraping.club)
117 points by DantesTravel on Sept 4, 2022 | 24 comments



Most app have some form of SSL pinning system in place which means that you have to perform additional work to allow the proxy to decrypt the HTTPS traffic.


"Most app have some form of SSL pinning system in place..."

I would like to see the data, if any, supporting this statement. I would expect some apps would use pinning, but most would not.

Google recommends against it.

https://developer.android.com/training/articles/security-ssl...


I would say based on personal observation that the more scrape-worthy an app is, the more likely it has cert pinning. Rather obvious if you think about it, really. High value targets especially from big shops tend to have other measures like complex MACs that make scraping hell.

I’m sure most largely-worthless-to-scrape apps don’t employ cert pinning.


> other measures like complex MACs that make scraping hell.

Do you have examples of these techniques?


Recent example I encountered: TikTok web API has dynamically generated parameters X-Bogus, msToken and _signature (could be slightly wrong, it’s been a while) that are verified server-side. I haven’t reversed their mobile app so not sure if they also employ MACs there, but I’ve seen these from other apps in the past. And it’s harder when employed in an app; on the web you’ll be reversing (obfuscated) JavaScript in a readily available debugger, whereas for an app you’ll likely be reversing from disassembly.
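
To illustrate the general idea of a server-verified request MAC, here is a minimal sketch. It is not TikTok's scheme, which is not described here: the secret key, the parameter names, and the choice of HMAC-SHA256 over a sorted query string are all assumptions made for the example.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Hypothetical secret. In a real app it would be buried in obfuscated code,
# which is what makes reversing the signature scheme hard.
SECRET_KEY = b"example-secret-embedded-in-client"


def sign_params(params: dict, key: bytes = SECRET_KEY) -> str:
    """Return a hex HMAC-SHA256 over the canonically ordered query string."""
    canonical = urlencode(sorted(params.items()))
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_request(params: dict, signature: str, key: bytes = SECRET_KEY) -> bool:
    """Server-side check: recompute the MAC and compare in constant time."""
    return hmac.compare_digest(sign_params(params, key), signature)


# Hypothetical request parameters.
params = {"item_id": "123", "ts": "1662249600"}
signature = sign_params(params)
```

A scraper that changes any parameter without being able to recompute the signature gets rejected, which is why the signing routine has to be reversed first.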


>Caution: Certificate Pinning is not recommended for Android applications due to the high risk of future server configuration changes, such as changing to another Certificate Authority, rendering the application unable to connect to the server without receiving a client software update.

This actually applies to websites and browsers as well.
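
A rough sketch of why a pinned client breaks when the server's certificate changes. It assumes the pin is a SHA-256 fingerprint of the whole DER-encoded certificate (real implementations often pin the public key instead), and the byte strings below are placeholders, not real certificates.

```python
import hashlib


def fingerprint(cert_der: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as a hex string."""
    return hashlib.sha256(cert_der).hexdigest()


def is_pinned(cert_der: bytes, pinned: set) -> bool:
    """Accept the connection only if the presented certificate matches a pin."""
    return fingerprint(cert_der) in pinned


# Placeholder bytes standing in for real DER-encoded certificates.
original_cert = b"placeholder: certificate shipped when the app was built"
rotated_cert = b"placeholder: certificate issued later by a different CA"
proxy_cert = b"placeholder: certificate generated by an intercepting proxy"

# The pin set is baked into the app at build time.
PINS = {fingerprint(original_cert)}

accepts_original = is_pinned(original_cert, PINS)
accepts_rotated = is_pinned(rotated_cert, PINS)
accepts_proxy = is_pinned(proxy_cert, PINS)
```

The same check that rejects an intercepting proxy also rejects a legitimately rotated certificate until the app ships an updated pin set.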


Why isn't there a site-controlled fallback setting for this?

Does this not make sense? Any given website's best interest is to continue to be reachable.


Every escape hatch in the certificate validation is also an additional avenue for attack. For example, using a DNS record to override certificate pins makes DNS cache poisoning much more valuable to the attacker.


Every layer of security is also an additional accessibility hurdle.


Got it, thanks @tremon.


I reverse engineer Android apps for work and pinning is present in all but the lowest effort apps I encounter.


Most games I've tried to examine have had cert pinning enabled.


Can someone suggest some resources to understand the additional work needed to decrypt the pinned https traffic?



This is really helpful, thanks!


Previously I had some success with https://httptoolkit.tech/ and running the app on an Android emulator.


I have worked with the site mentioned in the article’s API previously. I am not sure why they used the overhead of a "scraping framework" when it was just JSON they needed to parse.


Perhaps they're used to the Scrapy APIs and tools? I agree that an HTTP requests library would have been sufficient but maybe it was just easier for them to use the framework they're used to.
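
For comparison, a minimal sketch of the "just an HTTP library" approach using only the Python standard library. The endpoint URL and the `products`, `name`, and `price` fields are hypothetical and do not come from the article's API.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical endpoint, for illustration only.
API_URL = "https://api.example.com/v1/products"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """GET a URL and decode the JSON body."""
    request = Request(
        url,
        headers={"Accept": "application/json", "User-Agent": "example-client/0.1"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_products(payload: dict) -> list:
    """Keep only the fields of interest from the (assumed) response shape."""
    return [
        {"name": item.get("name"), "price": item.get("price")}
        for item in payload.get("products", [])
    ]


# Usage (performs a network request, so it is left commented out):
# rows = extract_products(fetch_json(API_URL))
```

A framework starts to pay off once you need retries, throttling, or item pipelines; for a single JSON endpoint this is about all there is.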


That is really nice. The last time I attempted scraping an app, I used an Android emulator (BlueStacks), then maybe Wireshark or Charles to get the API endpoint. It didn't work for some reason, though. I don't remember the exact error, and I am kinda skeptical about app scraping being this easy.


I recently did something similar with good results (I found the API endpoints I was interested in) using the official Android emulator and https://github.com/mitmproxy/mitmproxy

I did have to jump through some hoops with the emulator and pushing my own SSL cert to its read-only system partition. But it was a few commands and easy enough.
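
Once traffic is flowing through the proxy, a small mitmproxy script can list the JSON endpoints an app calls. This is only a sketch: the file name `log_json.py` and the filtering on the `content-type` header are my own choices, and it assumes the emulator already trusts the proxy's certificate.

```python
"""log_json.py: load with `mitmdump -s log_json.py`."""

# (method, URL without query string) pairs already reported.
seen_endpoints = set()


def response(flow):
    """mitmproxy hook, called once for each completed HTTP response."""
    content_type = flow.response.headers.get("content-type", "")
    if "json" not in content_type.lower():
        return  # skip HTML, images, and other non-API traffic

    # Drop the query string so paginated calls collapse into one endpoint.
    endpoint = (flow.request.method, flow.request.pretty_url.split("?", 1)[0])
    if endpoint not in seen_endpoints:
        seen_endpoints.add(endpoint)
        print(f"{endpoint[0]} {endpoint[1]}")
```

Deduplicating on method and path keeps the output short enough to spot the calls worth replaying by hand.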


Thank you. I am going to try out your solution :) appreciate it.


There's also PCAPdroid [1], which you can run straight from your phone with no root. It works with HTTPS traffic too when you enable the MITM setting.

[1]: https://github.com/emanuele-f/PCAPdroid


This is a great method. It's essentially creating a custom client for their server. Fiddler makes interception of encrypted traffic from apps a lot easier than the last time I tried this. Really nice.


I use Burp for the same purpose. Very convenient, and it solves the problem of MITM certificates.



