Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Two early databases I worked on.

The first contained monetary values. These were split over two columns, a decimal column holding the magnitude of the value, and a string column, containing an ISO currency code. Sounds good so far, right? Well, I learned much later (after, of course, having relied on the data) that the currency code column had only been added after expanding into Europe … but not before expanding into Canada. So when it had been added, there had been mixed USD/CAD values, but no currency code column to distinguish them. But when the column was added, they just defaulted it all to USD. So and USD value could be CAD — you "just" needed to parse the address column to find out.

Another one was a pair of Postgres DBs. To provide "redundancy" in case of an outage, there were two such databases. But no sort of Postgres replication strategy was used between them, rather, IIRC, the client did the replication. There was no formal specification of the consensus logic — if it could even be said to have such logic; I think it was just "try both, hope for the best". Effectively, this is a rather poorly described multi-master setup. They'd noticed some of the values hadn't replicated properly, and wanted to know how bad it was; could I find places where the databases disagreed?

I didn't know the term "split brain" at the time (that would have helped!), but that's what this setup was in. What made pairing data worse is that, while any column containing text was a varchar, IIRC the character set of the database was just "latin1". The client ran on Windows, and it was just shipping the values from the Windows API "A" functions directly to the database. So Windows has two sets of APIs for like … everything with a string, an "A" version, and a "W" version. "W" is supposed to be Unicode¹, but "A" is "the computer's locale", which is nearly never latin1. Worse, the company had some usage on machines that were set to like, the Russian locale is, or the Greek locale. So every string value in the database was, effectively, in a different character set, and nowhere was it specified which. The assumption is the same bytes would always get shipped back to the same client, or something? It wasn't always the case, and if you opened a client and poked around enough, you'd find mojibake easily enough. Now remember we're trying to find mismatched/unreplicated rows? Some rows were mismatched in character encoding only: the values on the two DBs were technically the same, just encoded differently. (Their machines' Python setup was also broken, because Python was ridiculously out of date. I'm talking 2.x where the x was too old, this was before the problems of Python 3 were relevant. Everything in the company was C++, so this didn't matter much to the older hands there, but … god a working Python would have made working with character set issues so much easier.)

¹IIRC, it's best described as "nearly UTF-16"




> But when the column was added, they just defaulted it all to USD. So and USD value could be CAD — you "just" needed to parse the address column to find out.

I bet no one even considered the idea that someone in Canada might pay in USD dollars.

I had to... discuss... with someone the possibility that if we were recording money received from a bank, we might do well to record the currency type as well, because... banks can deal with multiple currencies. "No, banks don't do that. That's not possible". This US company was getting ready to expand their services over to Europe, and I couldn't understand why no one could understand that this might be a necessary item to record.

Someone using this system in France to help finance a project in New York, for example, might want to know whether Euros or USD were the subject of the project. This was sort of a PM tool to note the cost of a project and how much was left in a budget - it wasn't moving money directly. We had a system to choose specific banks and note that 'person X moved Y currency units to the project' but... no ability to select the currency units. If it was from a UK bank, it would be GBP. A US bank, USD, etc. I was voted down as not understanding how banks worked.

Months later, a demo was presented right before launch to the people who were financing the project, and they asked how someone would know the currency being used for each project. Back came a work ticket "high priority" because we were already late for launch and multiple screens now had to accommodate this "new" requirement.

I know at least some of this is how I present info, but... being 'nice', being 'inquisitive', raising issues up the chain of command, etc.. rarely work. These sorts of extremely obvious things get ignored until the actual end users bring up the same concern. Somehow I wasn't supposed to know this sort of info because "you're a developer, you're not some international banker". Insane...


For such told-you-so scenarios, when there is a disagreement and you are pretty sure you are correct, raise a ticket, describe the issue and close it "As Designed" with comment detailing the discussion (including the overruling party). Next time there is a ticket to do this with "high priority", tell them "we already have a ticket for this, let me reopen it" for everyone to see... risky though, saving face/burning bridges and all that.


Agreed. I usually document the disagreement somewhere - tickets, or email thread at very least.


> I bet no one even considered the idea that someone in Canada might pay in USD dollars.

Certainly not I, at the time! (In my defense, I was an intern, and was shocked to learn that rows labelled "USD" could not be assumed to be USD. I hope today I am less naïve… but even by today me's standards, that's pretty appalling data quality.)

> I was voted down as not understanding how banks worked.

Yep, I feel this. The best engineers I have worked with make it their job to become an expert in the subject matter they're designing/engineering software for. I mean … why wouldn't they?


Reminds me of a date column where half the dates were MM/DD/YYYY and the rest DD/MM/YYYY. I did manage to eventually find the point in time when it changed and normalise the database. For those wondering about DD/MM, Australia.



Sure, just trying to explain it to US ;)




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: