Gathering stats requires keeping them somewhere. Making inferences. Documenting the inference engine. Explaining the magic to users. Sounds a lot more complicated than explaining which HTML tags will be parsed.
Proxies are already complicated as is. Caching proxies more so. (Think of how Varnish has a - probably Turing complete - DSL to decide what to serve and/or cache and when, and how.)
Parsing HTML content won't get you the full benefit an inference engine would. An inference engine could easily learn that 90% of the users who hit your landing page are going to log in and end up on their home screen, so it would push the static resources for the home screen too. Similarly, it might know that it already pushed those resources earlier in the session and only push the new static resources that are unique to you once you log in (saving the round trip of the client rejecting the pushed resource). Doing it via stateless HTML parsing is never going to work because you have no idea of the state of the session. That doesn't mean there's no place for a mixture of approaches (and yes, you could teach the HTML parsing about historical pushes, but then you're back to the concern you raised about storing that data somewhere).
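To make the session-state part concrete, the bookkeeping could be as small as a per-connection set of already-pushed paths. A rough sketch in Go (the names alreadyPushed, connKey and maybePush are made up, there's no locking, and a real proxy would key on the actual HTTP/2 connection rather than a string):

    package pushproxy

    import "net/http"

    // alreadyPushed remembers which paths were already pushed earlier in the
    // session, keyed by connection. Not concurrency-safe; a sketch only.
    var alreadyPushed = map[string]map[string]bool{}

    // maybePush pushes only the assets this connection hasn't received yet,
    // saving the round trip of the client rejecting a duplicate push.
    func maybePush(w http.ResponseWriter, connKey string, assets []string) {
        pusher, ok := w.(http.Pusher)
        if !ok {
            return // not HTTP/2, nothing to push
        }
        if alreadyPushed[connKey] == nil {
            alreadyPushed[connKey] = map[string]bool{}
        }
        for _, a := range assets {
            if alreadyPushed[connKey][a] {
                continue // pushed earlier in this session, skip it
            }
            if err := pusher.Push(a, nil); err == nil {
                alreadyPushed[connKey][a] = true
            }
        }
    }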
The HTML parsing approach is probably great from an "80% of the benefit for 20% of the effort" standpoint on small-scale websites (i.e. the majority). A super accurate inference engine might use deep learning to train what to serve on a very personalized level if you have a lot of users and the CPU/latency trade-off makes sense for your business model (i.e. more accuracy for a larger slice of your population). A less accurate one might just collect statistics in a DB and make cheap, less accurate guesses from that (or use more "classic ML" like Bayes) if you have a medium number of users, the CPU usage makes more sense, and you're OK with the maintenance burden of a DB. IMO it's a sliding scale of tradeoffs, with different approaches making sense depending on your priorities.
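For the "statistics in a DB + cheap guesses" end of that scale, the guess can be as dumb as counting page/asset co-occurrences and pushing anything above a threshold. A rough sketch in Go (in-memory maps stand in for the DB, and RecordVisit, GuessPushes and the 90% cutoff are all made up):

    package pushstats

    // pageHits and assetAfter stand in for the statistics DB; a real setup
    // would persist them and probably decay old counts.
    var (
        pageHits   = map[string]int{}            // page path -> number of visits
        assetAfter = map[string]map[string]int{} // page path -> asset path -> count
    )

    // RecordVisit is called when the proxy observes which assets followed a page.
    func RecordVisit(page string, assets []string) {
        pageHits[page]++
        if assetAfter[page] == nil {
            assetAfter[page] = map[string]int{}
        }
        for _, a := range assets {
            assetAfter[page][a]++
        }
    }

    // GuessPushes returns the assets that followed the page in at least 90% of
    // recorded visits: the cheap, less accurate guess.
    func GuessPushes(page string) []string {
        var out []string
        total := pageHits[page]
        if total == 0 {
            return out
        }
        for asset, n := range assetAfter[page] {
            if float64(n)/float64(total) >= 0.9 {
                out = append(out, asset)
            }
        }
        return out
    }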
Yes, I agree that a hypothetical ML/AI would of course outperform any naive and simple solution. But usually magic technology is required to do that, otherwise it wouldn't be magic :)
That said, a simple heuristic like "after requesting a URL, the server got these requests on the same HTTP/2 connection in less than 1 second, and those were static assets served with Expires headers" could work.
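As a sketch of roughly what I mean, in Go (the extension check stands in for "served with Expires headers", which a real proxy would read off the upstream response, and using the remote address as the connection key is a simplification):

    package main

    import (
        "net/http"
        "strings"
        "sync"
        "time"
    )

    type pageVisit struct {
        path string
        at   time.Time
    }

    var (
        mu        sync.Mutex
        followUps = map[string]map[string]bool{} // page path -> assets seen within 1s
        lastPage  = map[string]pageVisit{}       // connection key -> last page requested
    )

    // isStaticAsset is a crude stand-in for "static asset with an Expires header".
    func isStaticAsset(path string) bool {
        return strings.HasSuffix(path, ".css") || strings.HasSuffix(path, ".js") ||
            strings.HasSuffix(path, ".png") || strings.HasSuffix(path, ".woff2")
    }

    func handler(w http.ResponseWriter, r *http.Request) {
        connKey := r.RemoteAddr // simplification for "same HTTP/2 connection"

        mu.Lock()
        if isStaticAsset(r.URL.Path) {
            // Learn: this asset followed the page requested <1s earlier on this connection.
            if prev, ok := lastPage[connKey]; ok && time.Since(prev.at) < time.Second {
                if followUps[prev.path] == nil {
                    followUps[prev.path] = map[string]bool{}
                }
                followUps[prev.path][r.URL.Path] = true
            }
        } else {
            lastPage[connKey] = pageVisit{r.URL.Path, time.Now()}
            // Apply: push whatever has historically followed this page,
            // before writing the response itself.
            if pusher, ok := w.(http.Pusher); ok {
                for asset := range followUps[r.URL.Path] {
                    pusher.Push(asset, nil) // errors ignored for brevity
                }
            }
        }
        mu.Unlock()

        w.Write([]byte("ok"))
    }

    func main() {
        http.HandleFunc("/", handler)
        // Server push only exists on HTTP/2, which for net/http means TLS.
        http.ListenAndServeTLS(":8443", "cert.pem", "key.pem", nil)
    }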