cushychicken · on Dec 3, 2021

The problem with waiting 1-2 seconds between requests is that if you’re trying to scrape on the scale of millions of pages, the difference between 30 parallel requests/sec and a single request every 1-2 seconds is the difference between a process that takes 9 hours and one that takes a month.
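The arithmetic behind that comparison is just pages divided by request rate. A quick sketch, assuming a hypothetical crawl of exactly one million pages and perfectly steady throughput (no retries, no failures):

```python
def crawl_duration_seconds(pages: int, requests_per_second: float) -> float:
    """Wall-clock time to fetch `pages` pages at a steady request rate."""
    return pages / requests_per_second


def fmt(seconds: float) -> str:
    """Render a duration as hours and days."""
    hours = seconds / 3600
    return f"{hours:,.1f} h ({hours / 24:,.1f} days)"


PAGES = 1_000_000  # hypothetical: "millions of pages" taken as one million

scenarios = {
    "30 parallel requests/sec": 30.0,
    "1 request every 1 s": 1.0,
    "1 request every 2 s": 0.5,
}

for label, rate in scenarios.items():
    print(f"{label:>26}: {fmt(crawl_duration_seconds(PAGES, rate))}")
```

That works out to about 9.3 hours at 30 requests/sec versus roughly 11.6 to 23.1 days at one request every 1-2 seconds. A full 30-day month at one request every 2 seconds covers about 1.3 million pages.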

Fortunately for me, I'm almost certainly never going to have to do this on the scale of millions of pages. If time proves me wrong, I suspect I'll hire someone with more expertise to take over that part of the project.

I'm deliberately erring toward a very conservative interval. For me, optimizing the runtime is more about tightening my iteration cycles as the sole developer than about fitting the job into a reasonable timeframe.
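For illustration, a conservative fixed-interval crawl can be as simple as a sequential loop that starts each request at least N seconds after the previous one started. This is a minimal sketch under stated assumptions: `paced_map`, the `fetch` callable, and the 2-second default are hypothetical names and values, not anything from the post. The clock and sleep functions are injectable so the pacing can be checked without real waiting.

```python
import time
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def paced_map(
    fetch: Callable[[T], R],
    items: Iterable[T],
    min_interval: float = 2.0,  # hypothetical conservative default, in seconds
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[R]:
    """Call `fetch` on each item in order, starting calls at least
    `min_interval` seconds apart (measured start-to-start)."""
    results: List[R] = []
    last_start: Optional[float] = None
    for item in items:
        if last_start is not None:
            # Only sleep for whatever part of the interval the previous
            # fetch did not already use up.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(fetch(item))
    return results
```

Measuring start-to-start means a slow response eats into the wait instead of adding to it, so the interval stays the single knob that trades crawl speed against iteration speed.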