gerhardlazu · 2026-05-26T14:41:43

Until recently, rewriting an established open source project could take years. With LLMs, that's changed. We're rewriting CRIU in Zig, and expect it to be complete in months, not years.

gerhardlazu · 2026-05-01T15:06:32

Loophole Labs | Senior Systems Engineer | REMOTE (Americas & Europe) | $150k–195k USD + equity | Go, Zig, Rust, C, eBPF, CRIU, Kubernetes

We make Architect: a Kubernetes runtime that hibernates idle pods in place and wakes them in 50ms with TCP connections intact. Five engineers. You'd be the 6th.

If tracing an x86 instruction in the morning and hunting a control-plane race in the afternoon both sound fun, and you insist on measuring rather than guessing, this is the job.

Customers run Architect for workloads where cold starts hurt: real-time voice & video AI agents, long-warming JVM apps, stateful data services that can't be rescheduled cheaply. 1.0 shipped Q1 2026; you'd join mid-way through 2.0. Seed-stage, VC-funded, a few years of runway. Fully distributed across the Americas and Europe.

What you'd work on:

- Hibernation surface: containerd shims and CRUISE, our Zig-native CRIU replacement.

- Control plane: per-node DaemonSet streaming checkpoints; admission controller resizing hibernated pods in place.

- Networking and migration: eBPF/XDP at line rate; cross-node live migration to production; cross-cloud next.

You're a senior generalist. Years across the stack: assembly to frontends, hardware-near, comfortable in x86. Tests ship with the code, decisions get worked out in writing, and you measure rather than guess. Strong in Go; willing to use Zig, Rust, or C.

Bonus: eBPF/XDP, CRIU, Linux kernel internals, containerd, gVisor, live migration, or public writing in kernel/container/eBPF land. Strong systems depth and the willingness to pick up the rest is enough on its own.

Apply: https://loopholelabs.io/careers - we respond within the week (typically a few days).

gerhardlazu · on July 31, 2023

Another https://dagger.io fan here. Have been using it since late 2021 to continuously deploy a Phoenix app to Fly.io: https://github.com/thechangelog/changelog.com/pull/395. Every commit goes into production.

This is what the GHA workflow currently looks like: https://github.com/thechangelog/changelog.com/blob/c7b8a57b2...

FWIW, you can see how everything fits together in this architecture diagram: https://github.com/thechangelog/changelog.com/blob/master/IN...

gerhardlazu · on July 23, 2023

I really like the work that you're doing Thomas, this is the right approach. FWIW, https://fly.io/blog/carving-the-scheduler-out-of-our-orchest... is one of my favourite posts on your blog.

For everyone else reading this, we have been running https://changelog.com on Fly.io since April 2022. This is what our architecture currently looks like: https://github.com/thechangelog/changelog.com/blob/master/IN...

After 15 months & more than 100 million requests served by our Phoenix + PostgreSQL app running on Fly.io, I would be hard pressed to find a reason to complain.

- Some deploys failed, and re-running the pipeline fixed it.

- Early July 2023, 9k requests from Frankfurt returned 503s. Issue lasted 10 seconds.

- While experimenting with machines, after many creations & deletions, one volume could not be deleted. Next day, the volume was gone.

That's about it after 15 months of running production workloads on Fly.io.
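To put those numbers in proportion, here is a rough back-of-the-envelope sketch. It assumes "9k" means roughly 9,000 requests and uses 100 million as the total. Since more than 100 million requests were served, the result is an upper bound on the share of requests affected by that one incident, not a measured error rate.

```python
# Rough figures taken from the comment above; both are approximations.
failed_requests = 9_000        # "9k requests from Frankfurt returned 503s"
total_requests = 100_000_000   # "more than 100 million requests" (lower bound)

# Share of all requests that hit the 503 incident (upper bound).
error_rate = failed_requests / total_requests

print(f"{error_rate:.4%}")
```

Failed deploys and the stuck volume are not request failures, so they are left out of this figure.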

We often mention our Fly.io experience in our Kaizen pod episodes, which we publish every ~2 months: https://changelog.com/topic/kaizen. For anyone curious, this is the episode in which we announced the migration: https://changelog.com/shipit/50. There is a detailed PR which goes with it: https://github.com/thechangelog/changelog.com/pull/407. We've been talking about our migration plan from apps v1 (Nomad) to apps v2 (flyd) recently: https://changelog.com/friends/2#transcript-138

I'm sorry to hear that many of you didn't have the best experience. I know that things will continue improving at Fly.io. My hope is that one day, all these hard times will make for great stories. This gives me hope: https://community.fly.io/t/reliability-its-not-great/11253

Keep improving.

gerhardlazu · on Dec 27, 2020

changelog.com used to be WordPress, then became a Phoenix app because it needed features that were hacky to implement & then manage in WP. It's more of a podcasting platform these days rather than a CMS.

The code in this repo tells the truth about what it is, and even shows how it works: https://github.com/thechangelog/changelog.com

gerhardlazu · on Dec 27, 2020

For what it's worth, Rook, OpenEBS or Longhorn are worth exploring.

gerhardlazu · on Dec 27, 2020

My first Supermicro just turned 9 and it's still running strong, with a fresh install of Ubuntu 20.04 & k3s over the holidays. The second Supermicro turned 5, and has been running FreeBSD all this time like a champ. They are both loft guardians.

A bunch of bare metal hosts run on Scaleway / Online, and different VMs & managed services run in DigitalOcean, Linode, AWS & GCP. I sometimes spin up the odd bare metal instance on Equinix Metal (formerly Packet).

A diverse fleet means that there's always something new to learn and try out. A single large host would make me anxious, as no internet provider or power grid is 100% reliable and available. Also, software upgrades sometimes fail, and things get messed up all the time, which is when I find it most efficient to just start from scratch. A single host makes that less convenient.

Every approach has its pros and cons, which is why my main workstation is a 20 Xeon W with 64GB RAM & 1TB NVMe : ). Yes, there is a backup workstation which doubles up as a mobile one, meaning that it can work without power or hard internet for almost a day. Options are good ; )

gerhardlazu · on Dec 27, 2020

> Does this imply there is a cloud abstract layer that should come

crossplane.io comes closest afaik

> And is k8s the simplest possible abstraction? And if not - what is?

If you are asking about the simplest possible abstraction for container scheduling and orchestration, then I believe Nomad from HashiCorp or Docker Swarm are simpler. As for managed solutions with wide adoption in all types of environments and the largest investment to date, I am not aware of anything on par with K8S.

gerhardlazu · on Dec 27, 2020

We are both! I would also add lazy to that paradox. My surname is a letter off, and that's as close as it gets : )

The devil is in the details: there is more to it than dynamic & static content. We are using Fastly, otherwise we couldn't serve all the traffic that we do.

The best part is that it's all public - https://github.com/thechangelog/changelog.com - and we welcome contributions, especially those that simplify our setup without compromising on resiliency and availability. I'm looking forward to yours ; )

gerhardlazu · on Dec 27, 2020

K8S is an API that the majority is agreeing on, which is rare. There is a lot of amazing tooling, a staggering amount of ongoing innovation, all built on solid concepts: declarative models, emitted metrics (the /proc equivalent, but with larger scope) and versioned infrastructure as data (a.k.a. GitOps).
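As a toy illustration of what "declarative models" means here: you state the desired end state, and a control loop works out the steps to get there. The sketch below is only that idea in miniature. The function name, the data shapes and the workload names are all hypothetical, and this is not how Kubernetes controllers are actually implemented.

```python
# Hypothetical sketch: names and data shapes are illustrative, not a Kubernetes API.

def reconcile(desired: dict, actual: dict) -> list:
    """Return the actions needed to move `actual` replica counts to `desired`."""
    actions = []
    # Look at every workload that is either wanted or currently running.
    for name in sorted(desired.keys() | actual.keys()):
        want = desired.get(name, 0)
        have = actual.get(name, 0)
        if want > have:
            actions.append(f"start {want - have} x {name}")
        elif want < have:
            actions.append(f"stop {have - want} x {name}")
        # Equal counts need no action: the declared state already holds.
    return actions


desired = {"web": 3, "worker": 2}               # what we declare
actual = {"web": 1, "worker": 2, "legacy": 1}   # what is running right now
plan = reconcile(desired, actual)
print(plan)
```

The caller never writes the "start" or "stop" steps. It only edits `desired`, and running `reconcile` again once the states match yields an empty plan.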

For someone who is known as the King of Bash (self-proclaimed) - https://speakerdeck.com/gerhardlazu/how-to-write-good-bash-c... - and after a decade of Puppet, Chef, Ansible and oh wow that sweet bash https://github.com/gerhard/deliver - even if all my workstations and work servers (yup, all running k3s) are provisioned with Make (bash++), I still think that K8S is the better approach to running production infrastructure. Using simple and well-defined components (e.g. external-dns, ingress-nginx, prometheus-operator etc.) that adhere to a universal API, and are maintained by many smart people all around the world, is a better proposition than scripting in my opinion.

At the end of the day, I'm in it for the shared mindset, great conversations and a genuine desire to do better, which I have not seen before K8S & the wider CNCF. I will go out on a limb here and assume that I love scripting just as much as you do, but go beyond this aspect and you will discover that there's more to it than "thin install scripts that deploy containers" (which are not just glorified jails or unikernels).

ClumsyPilot · on Dec 27, 2020

I think you've hit the nail on the head - the point is not just Kubernetes, it's that you can build standard infrastructure on top. Any software can be (in theory) set up with a Helm chart and configured in a standard way through YAML ConfigMaps, rather than some esoteric config files or scripts which are different for every piece of software.
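For anyone who hasn't seen one, a ConfigMap is just a small, uniformly shaped object: `apiVersion`, `kind`, `metadata` and a `data` map of string keys to string values. Here is a minimal sketch that builds one. The helper `config_map`, the name `example-app-config` and the two settings are made up for illustration, and the manifest is printed as JSON, which Kubernetes accepts alongside YAML.

```python
import json


def config_map(name, settings):
    """Build a ConfigMap manifest; values in `data` must be strings."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        # Convert every value to a string, as the `data` field requires.
        "data": {key: str(value) for key, value in settings.items()},
    }


# Hypothetical application settings, purely for illustration.
manifest = config_map(
    "example-app-config",
    {"LOG_LEVEL": "info", "MAX_CONNECTIONS": 100},
)
print(json.dumps(manifest, indent=2))
```

The same shape works for any application, which is the "standard way" being described: only the keys and values under `data` change from one piece of software to the next.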