So, a Google Compute Engine setup I did a while back: preemptible instances + a Celery queue + some autoscaling based on load.
The guts to make all that work were 50 or so lines of config. I think my autoscale script was 20 or so lines of Python.
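For a sense of scale, here is a minimal sketch of the decision half of such a script. It assumes the worker count is derived from queue depth and arrival rate; the function names (`desired_workers`, `resize_delta`) and every threshold are hypothetical. Reading the queue from the broker and resizing the pool of GCE VMs are left out, since the comment does not say how that was done.

```python
import math

# Hypothetical tuning knobs: the original script's numbers are not given.
JOBS_PER_WORKER_PER_MIN = 500  # assumed throughput of one worker VM
MIN_WORKERS = 4                # assumed always-on baseline
MAX_WORKERS = 400              # assumed ceiling on burst VMs


def desired_workers(queue_depth: int, arrivals_per_min: float,
                    drain_minutes: float = 5.0) -> int:
    """Workers needed to keep up with new jobs and clear the current
    backlog within `drain_minutes`, clamped to [MIN_WORKERS, MAX_WORKERS]."""
    needed_rate = arrivals_per_min + queue_depth / drain_minutes
    wanted = math.ceil(needed_rate / JOBS_PER_WORKER_PER_MIN)
    return max(MIN_WORKERS, min(MAX_WORKERS, wanted))


def resize_delta(current: int, desired: int) -> int:
    """Positive: create that many VMs; negative: delete; zero: hold."""
    return desired - current


# Example: 10,000 jobs queued while 50,000 jobs/min arrive, 20 VMs running.
target = desired_workers(queue_depth=10_000, arrivals_per_min=50_000)
print(target, resize_delta(current=20, desired=target))  # prints "104 84"
```

Running something like this on a timer and handing the delta to whatever creates or deletes preemptible VMs is roughly the whole loop; the clamp to `MIN_WORKERS` is what keeps a baseline fleet alive between spikes.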
I guess the biggest downside was that spinning up a new server took about 2 minutes, so big load spikes took a bit to level out. But with GCE per-minute billing, all the napkin math I did says it's a fair bit cheaper per CPU unit to do it this way.
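Why a 2-minute spin-up makes spikes slow to level out comes down to a little arithmetic: while new VMs boot, jobs arriving faster than the existing fleet can handle pile up, and once the new VMs are serving, that backlog only shrinks at the rate of the spare capacity. A rough sketch, with hypothetical function names and made-up rates (the comment gives no throughput figures):

```python
import math


def backlog_during_boot(arrivals_per_min: float, capacity_per_min: float,
                        boot_minutes: float = 2.0) -> float:
    """Jobs that pile up while new VMs boot; existing workers keep
    processing at `capacity_per_min` during the wait."""
    return max(0.0, arrivals_per_min - capacity_per_min) * boot_minutes


def minutes_to_drain(backlog: float, new_capacity_per_min: float,
                     arrivals_per_min: float) -> float:
    """Time to clear the backlog once the new VMs are serving.
    Returns math.inf if the new capacity still cannot keep up."""
    spare = new_capacity_per_min - arrivals_per_min
    if spare <= 0:
        return math.inf
    return backlog / spare


# Made-up rates: a spike to 1,000 jobs/min hits a fleet that handles
# 400 jobs/min; after scaling out, the fleet handles 1,600 jobs/min.
backlog = backlog_during_boot(1_000, 400)
print(backlog, minutes_to_drain(backlog, 1_600, 1_000))  # prints "1200.0 2.0"
```

Scaling out to well above the spike rate, rather than just matching it, is what keeps the drain time short.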
In conclusion, someone good with devops + a normal work queue system could have done this years ago. I guess it's cool to lower the barrier to entry, but it does not seem like a game changer. It totally IS cool to scale out backend work to a job farm; it just doesn't seem like the only way to do it.
That's essentially what Google Cloud Dataproc gives you (managed Hadoop/Spark):
- Per-minute billing
- 0-to-cluster in under 90 seconds (targeting 30 seconds)
- Preemptible VMs
- Custom VMs
Now you start with a job, pay a 30-second penalty, and execute it on an entirely ephemeral cluster. The "get a cluster, fill it with jobs, and round up to an hour" model is indeed outdated IMHO.
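To make the difference concrete, here is a rough per-node comparison of billed minutes under the two models. It assumes the 30-second startup target and whole-minute rounding; real billing minimums and prices are deliberately left out, and the function names are hypothetical.

```python
import math


def billed_minutes_hourly(job_minutes: float) -> int:
    """Hourly model: every started hour is billed in full."""
    return math.ceil(job_minutes / 60) * 60


def billed_minutes_ephemeral(job_minutes: float,
                             startup_seconds: float = 30.0) -> int:
    """Per-minute model on a fresh cluster: pay for startup plus the job,
    rounded up to the next whole minute (billing minimums ignored)."""
    return math.ceil(job_minutes + startup_seconds / 60)


for job in (5, 61):
    # prints "5 60 6", then "61 120 62"
    print(job, billed_minutes_hourly(job), billed_minutes_ephemeral(job))
```

Under these assumptions, the startup penalty costs a fraction of a minute per node, while hourly rounding can cost up to 59 minutes per node, which is the gap the reply is pointing at.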
Well, it depends on my workload. The setup above on GCE had a stable load of X thousand jobs per minute, then bursts of up to 100x for short periods. So for me it made sense to have a 24/7 Celery cluster for the base load and to add and remove nodes for the variable load. There was never a point at which shutting down the cluster made sense.
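The base-plus-burst split can be sized with simple node-minute arithmetic. This is a sketch that assumes a fixed per-node throughput and ignores boot time and any price difference between preemptible and regular VMs. The function names are hypothetical, and the rates are made up, since the comment only says "X thousand" jobs per minute with bursts up to 100x.

```python
import math


def split_nodes(base_rate: float, peak_rate: float,
                jobs_per_node_per_min: float) -> tuple[int, int]:
    """Return (always-on base nodes, extra burst nodes needed at peak)."""
    base = math.ceil(base_rate / jobs_per_node_per_min)
    total = math.ceil(peak_rate / jobs_per_node_per_min)
    return base, max(0, total - base)


def node_minutes_per_day(base_rate: float, burst_multiplier: float,
                         burst_minutes_per_day: float,
                         jobs_per_node_per_min: float) -> tuple[int, float]:
    """Compare node-minutes for (a) a fleet sized for peak and left on all
    day, versus (b) a 24/7 base fleet plus burst nodes only during bursts."""
    base, burst = split_nodes(base_rate, base_rate * burst_multiplier,
                              jobs_per_node_per_min)
    always_peak = (base + burst) * 24 * 60
    base_plus_burst = base * 24 * 60 + burst * burst_minutes_per_day
    return always_peak, base_plus_burst


# Made-up numbers: 2,000 jobs/min base, 100x bursts totalling 30 min/day,
# 500 jobs/min per node.
print(node_minutes_per_day(2_000, 100, 30, 500))  # prints "(576000, 17640)"
```

With these made-up numbers, a peak-sized fleet left on all day uses more than 30 times the node-minutes of the base-plus-burst layout. The base nodes are busy around the clock either way, so the savings come entirely from not keeping burst nodes around.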