
Why are page loads so slow or demanding? I can't imagine how a web crawler could be DoS'ing you if it's in good faith. What is the TPS? What caching are you doing? What's your stack like?


Not GP, but from having run a small/niche search engine that got hammered by a crawler in the past:

The webserver was a single VM running a Java + Spring application in Tomcat, connecting to an overworked Solr cluster that did the actual faceted searching.
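
For context, the request path was roughly this shape (a minimal sketch only; the class, field, and endpoint names are made up, and I'm assuming plain SolrJ under Spring MVC):

    import java.util.Map;
    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RequestParam;
    import org.springframework.web.bind.annotation.RestController;

    // Hypothetical controller shape: one Spring MVC endpoint per search page,
    // each uncached request fanning out to Solr for the faceted query.
    @RestController
    public class SearchController {
        private final SolrClient solr =
            new HttpSolrClient.Builder("http://solr-cluster:8983/solr/products").build();

        @GetMapping("/search")
        public Map<String, Object> search(@RequestParam String q,
                                          @RequestParam(required = false) String category)
                throws Exception {
            SolrQuery query = new SolrQuery(q);
            query.setFacet(true);
            query.addFacetField("category", "brand", "price_range");
            if (category != null) {
                // Unsanitized for brevity; a real endpoint would escape this.
                query.addFilterQuery("category:" + category);
            }
            // Every cache miss becomes one of these round trips, which is
            // what the overworked Solr cluster was absorbing.
            QueryResponse rsp = solr.query(query);
            return Map.of("numFound", rsp.getResults().getNumFound(),
                          "docs", rsp.getResults());
        }
    }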

Caches kept most page loads for organic traffic within respectable bounds, but the crawler destroyed our cache hit rate while it was scraping the site. At one point it also exhausted a concurrent connection limit of some kind, because so many slow or timing-out requests were in flight at the same time.
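
To make the cache-busting concrete: an in-process page cache is bounded, and a crawler walking every facet/filter combination fills it with pages nobody else will ever request, evicting the hot entries. A toy sketch of that effect (pure JDK, names made up):

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Toy access-ordered LRU cache for rendered pages, capped at a fixed size.
    // A crawler requesting thousands of distinct long-tail URLs pushes the
    // popular pages out, so organic traffic starts missing and hitting Solr again.
    public class PageCache {
        private final int maxEntries;
        private final Map<String, String> cache;

        public PageCache(int maxEntries) {
            this.maxEntries = maxEntries;
            this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                    return size() > PageCache.this.maxEntries;
                }
            };
        }

        public synchronized String get(String url) { return cache.get(url); }
        public synchronized void put(String url, String html) { cache.put(url, html); }
    }

With a cap of, say, 10k entries and a crawler generating far more unique facet URLs than that, the hot set gets evicted almost immediately; that's the "destroyed our cache hit rate" part.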


I would expect a small to medium e-commerce site to cache all of its pages.


That approach depends on what's in the pages.

Does the served HTML contain everything: the product, related products, comments, reviews, etc.? If so, caching the entire page might be counterproductive.

But if the page is designed so that it's broken up into fragments, then caching heavily is your friend.

Those fragments can be loaded client-side (JavaScript) or assembled server-side (ESI, SSR, etc.) -- but page fragments are key to increasing how much and how long you can cache.
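
For example (a hedged sketch, not any particular shop's setup): the product page shell stays short-lived or uncached, while an expensive fragment like reviews gets its own URL that a CDN/edge cache or a server-side cache can hold onto, stitched in via an ESI include or a client-side fetch. Assuming Spring's cache abstraction, with hypothetical names:

    import org.springframework.cache.annotation.Cacheable;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RestController;

    // Each fragment gets its own URL, so it can be cached independently of the
    // product page that embeds it (e.g. <esi:include src="/fragments/reviews/123"/>
    // at the edge, or a client-side fetch).
    @RestController
    public class ReviewFragmentController {

        // Cached server-side under its own key; TTL/eviction is configured
        // separately in the cache manager.
        @Cacheable(value = "reviewFragments", key = "#productId")
        @GetMapping(value = "/fragments/reviews/{productId}", produces = "text/html")
        public String reviewsFragment(@PathVariable long productId) {
            // Hypothetical expensive render: pull reviews and build the HTML snippet.
            return renderReviewsHtml(productId);
        }

        private String renderReviewsHtml(long productId) {
            return "<section class=\"reviews\">reviews for product " + productId + "</section>";
        }
    }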

And with this, you've added a layer of complexity that a small to medium e-commerce site may not be able to handle with its in-house talent.



