It's not useful in those cases, but usually for those JS-rendered sites you can r...

squaresmile · on June 8, 2022

Or the data is stored in JS objects within script tags in the HTML and can be extracted programmatically. That's getting common with SSG sites built on SPA frameworks.

For example, the new Google Play Store website stores its data in AF_initDataCallback calls, which can be extracted with re.findall(r"<script nonce=\"\S+\">AF_initDataCallback\((.*?)\);", html_string).
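
A minimal sketch of that extraction, run against made-up markup rather than a real Play Store page. The pattern is the one quoted above; the re.DOTALL flag, the function name and the sample HTML are assumptions added for illustration. The captured text is a JS object literal, so it may need further cleanup before something like json.loads will accept it.

```python
import re

# Pattern quoted from the comment above. DOTALL is an added assumption so
# that payloads spanning several lines still match.
AF_CALLBACK_RE = re.compile(
    r"<script nonce=\"\S+\">AF_initDataCallback\((.*?)\);", re.DOTALL
)


def extract_callback_payloads(html_string):
    """Return the raw argument text of every AF_initDataCallback call."""
    return AF_CALLBACK_RE.findall(html_string)


# Made-up sample markup for illustration only, not real Play Store output.
SAMPLE_HTML = (
    "<html><body>"
    '<script nonce="abc123">AF_initDataCallback({key: \'ds:1\', data: [1, "two"]});</script>'
    '<script nonce="abc123">AF_initDataCallback({key: \'ds:2\', data: []});</script>'
    "</body></html>"
)

payloads = extract_callback_payloads(SAMPLE_HTML)
for payload in payloads:
    print(payload)
```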

phone8675309 · on June 8, 2022

I used to do that when I was responsible for a set of web crawlers to extract public records data, but the problem is that changes happen and these sorts of things become out of date fairly quickly.

Getting this working in a headless browser driven by Selenium would probably be easier to maintain.
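
A rough sketch of what that could look like, assuming the selenium package and a local Chrome install. The helper names and the make_driver parameter are hypothetical; the Selenium import is deferred so the fetch logic can be exercised with a stand-in driver.

```python
def make_headless_chrome():
    """Start headless Chrome. Needs the selenium package and a local Chrome."""
    # Imported lazily so fetch_rendered_html can be used with any driver-like object.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def fetch_rendered_html(url, make_driver=make_headless_chrome):
    """Load url in a browser and return the page source once the load finishes."""
    driver = make_driver()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always shut the browser down, even if the load fails
```

Pages that fill in data after the initial load would likely also need an explicit wait (for example Selenium's WebDriverWait) before reading page_source.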

867-5309 · on June 8, 2022

Nowadays you usually have to submit HTTP headers and cookies too; that's always a fun process of elimination.
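
That process of elimination can be automated, roughly, as below. This is only a sketch: minimal_headers and the works callback are hypothetical names, and works stands in for whatever you use to send the request and judge the response (status code, expected text in the body, and so on). The header values are placeholders.

```python
def minimal_headers(headers, works):
    """Drop headers one at a time, keeping only those the request fails without.

    `works` is a caller-supplied function: it receives a dict of headers,
    sends the request however you like, and returns True if the response
    is usable.
    """
    required = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in required.items() if k != name}
        if works(trial):
            required = trial  # still succeeds, so this header was not needed
    return required


# Placeholder headers, as if copied from a browser's network tab.
captured = {
    "User-Agent": "example-agent",
    "Accept": "*/*",
    "Cookie": "session=abc",
    "Referer": "https://example.com/",
}


def fake_works(headers):
    # Stand-in for a real request: pretend the site needs these two headers.
    return "User-Agent" in headers and "Cookie" in headers


needed = minimal_headers(captured, fake_works)
print(sorted(needed))
```

A real works function would make one request per candidate set, so it is worth adding a delay between attempts.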