So google compute setup I did a while back with preemptible instances + a celery...

vgt · on Nov 4, 2016

That's essentially what Google Cloud Dataproc gives you (managed Hadoop/Spark):

- Per-minute billing

- 0-to-cluster in under 90 seconds (aim for 30 seconds)

- Pre-emptiblem VMs

- Custom VMs

Now you start with a job, pay a 30 second penalty, and execute it on an entirely ephemeral cluster. The "get a cluster and fill it with jobs and round up to an hour" model is indeed outdated IMHO.

(work at Google Cloud)

brianwawok · on Nov 5, 2016

Well it depends on my workload. The setup above on GCE had a stable load of X thousand jobs per minute, then burst loads up to 100x for short times. So for me it made sense to have a 24/7 celery cluster for the base load and add and remove nodes for the variable node. There was never a point shutting down the cluster made sense.