
We're also doing this at Thumbtack. We run all of our Spark jobs in job-scoped Cloud Dataproc clusters. We wrote a custom Airflow operator which launches a cluster, schedules a job on that cluster, and shuts down the cluster upon job completion. Since Google can bring up Spark clusters in < 90s and bills per minute, this works really well for us, simplifying our infrastructure and eliminating resource contention issues.
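The create-run-teardown lifecycle described above can be sketched roughly as follows. This is a minimal, hypothetical illustration of the pattern, not Thumbtack's actual operator: the `client` object and its method names are illustrative stand-ins rather than the real Dataproc or Airflow APIs. The key point is the `try`/`finally`, which guarantees the cluster is deleted even when the job fails.

```python
# Hypothetical sketch of the job-scoped cluster pattern: create a
# cluster, submit the Spark job, and always tear the cluster down
# afterwards -- even if the job raises. Method names are illustrative.

def run_job_on_ephemeral_cluster(client, cluster_name, job):
    """Run `job` on a freshly created, job-scoped cluster."""
    client.create_cluster(cluster_name)  # ~90s on Dataproc, per the comment
    try:
        return client.submit_job(cluster_name, job)
    finally:
        # Teardown runs whether the job succeeded or raised,
        # so idle clusters never accumulate.
        client.delete_cluster(cluster_name)
```

Wrapping this in an Airflow operator means each DAG task owns its cluster end to end, which is what eliminates the resource contention the comment mentions.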


Co-author of the blog here.

Awesome stuff, glad to see folks leveraging the possibilities! Perhaps as a follow-up you could write a guest blog on how this works for you! Feel free to ping me offline.


Have you tried calculating what percentage increase in cost there would be if you moved to an AWS-style billing model?

Basically, I'm curious whether your hands are tied to GCP because of the fine-grained billing they provide.
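For short-lived, job-scoped clusters, the billing granularity dominates the comparison. A toy calculation, assuming a hypothetical $1/hour cluster rate and a 10-minute job (made-up illustrative numbers, not real Dataproc or EC2 prices):

```python
import math

# Compare per-minute vs per-hour billing for a short-lived cluster.
# Usage is rounded up to the billing granularity, as metered billing does.

def billed_cost(runtime_minutes, rate_per_hour, granularity_minutes):
    """Cost when usage is rounded up to the billing granularity."""
    billed = math.ceil(runtime_minutes / granularity_minutes) * granularity_minutes
    return billed / 60 * rate_per_hour

runtime = 10   # a 10-minute Spark job (hypothetical)
rate = 1.00    # $/hour for the cluster (hypothetical)

per_minute = billed_cost(runtime, rate, granularity_minutes=1)   # ~$0.17
per_hour = billed_cost(runtime, rate, granularity_minutes=60)    # $1.00
```

Under these assumptions the same 10-minute job costs 6x more with hourly rounding, which is why the one-cluster-per-job pattern leans so heavily on fine-grained billing.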




