
You are dead on. We do have a bug where we are not recovering the Oracle connectivity correctly. It is on our radar to address the issue. https://github.com/department-of-veterans-affairs/caseflow-m...

However, there is actually another half of the story that I never posted. VACOLS is a really old Oracle DB (from the 80s) that is out of our control. Somehow, it has a "feature" where you can only make one TCP connection to it every 2-3 seconds. So if we lose the connection to the database, it will take many seconds to recover. By that point, our ELB health check would've fired and restarted our EC2 instances. This is why recoverability of the database connection is not an immediate priority.

Here's how we preallocate the VACOLS connection pool to work around this throttling feature. https://github.com/department-of-veterans-affairs/caseflow/b...
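
To make the idea concrete, here is a rough Python sketch of eager preallocation under a connection-rate limit. It is not the Caseflow code linked above; `PreallocatedPool`, `connect`, `size`, and `min_interval` are hypothetical names chosen for illustration. The assumption is that all connections are opened once at boot, spaced far enough apart to stay under the throttle, so that requests never have to open one themselves.

```python
import queue
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class PreallocatedPool(Generic[T]):
    """Opens every connection up front, spacing attempts to respect a throttle."""

    def __init__(
        self,
        connect: Callable[[], T],       # hypothetical factory that opens one connection
        size: int,                      # number of connections to hold
        min_interval: float,            # assumed minimum seconds between connects
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._idle: "queue.Queue[T]" = queue.Queue(maxsize=size)
        last_attempt: Optional[float] = None
        for _ in range(size):
            if last_attempt is not None:
                # Wait out whatever remains of the throttle window.
                remaining = min_interval - (clock() - last_attempt)
                if remaining > 0:
                    sleep(remaining)
            last_attempt = clock()
            self._idle.put(connect())

    def acquire(self, timeout: Optional[float] = None) -> T:
        """Borrow an idle connection; raises queue.Empty if none frees up in time."""
        return self._idle.get(timeout=timeout)

    def release(self, conn: T) -> None:
        """Return a borrowed connection to the pool."""
        self._idle.put(conn)
```

The trade-off is a slower boot (roughly `(size - 1) * min_interval` seconds) in exchange for never paying the throttle on the request path.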

The infrastructure we operate in is very challenging (and interesting) because of legacy systems. That's why common-sense engineering often may not apply at USDS.




I also bet those challenging legacy systems are in many cases far better built than what "modern" systems would provide. Sure, there will be wacky things to work around, but I've seen my share of wacky engineering in brand-new systems too. Common-sense engineering seems to be few and far between these days.

Kudos on having something interesting to work on.



