Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Writing from the SRE side of the discipline, it's commonly a configuration change (or a "flag flip") that ultimately winds up causing an outage. All too seldom are configuration data considered part of the same deployable surface area (and, as a corollary, part of the same blast radius) as program text.

I've mostly resigned myself today to deploying the configuration change and watching for anomalies in my monitoring for a number of hours or days afterward, but I acknowledge that I also have both a process supervisor that will happily let me crash loop my programs and deployment infrastructure that will nonetheless allow me to roll things back. Without either of those, I'm honestly at a loss as to how I'd safely operate this product.




  # Update A
  
  ## config.ext
  
  foo = false
  
  ## src.py
  
  from config import config
  
  if config('foo'):
      work(2 / 0)
  else:
      work(10 / 5)
"Yep, we rigorously tested it."

  # Update B
  
  ## config.ext
  
  foo = true
"It's just a config change, let's go live."


Yeah, that's about right.

The most insidious part of this is when there are entire swaths of infrastructure in place that circumvent the usual code review process in order to execute those configuration changes. Boolean flags like your `config('foo')` here are most common, but I've also seen nested dictionaries shoved through this way.


When I was at FB there were a load of SEVs caused by config changes, such that the repo itself would print out a huge warning about updating configs and show you how to do a canary to avoid this problem.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: