
In the past year or so I have thought a lot about how to design tables and columns within databases, and there is nearly nothing that wouldn't get more robust by adding a "valid_from" and "valid_till" and letting the field accept multiple values. Someone's name is Foo? What if they change it to Bar at some point and you need to access something from before with the old name?

If you have only a name field that holds a single value, that is going to require a crazy workaround. If names reference a person together with a validity date, it is much easier. But you need to make that decision pretty early.
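A minimal sketch of that shape, in plain SQL; the table and column names are just illustrative:

    CREATE TABLE person (
        person_id bigint PRIMARY KEY
    );

    CREATE TABLE person_name (
        person_id  bigint REFERENCES person,
        name       text NOT NULL,
        valid_from date NOT NULL,
        valid_till date            -- NULL = still current
    );

    -- The name in effect on a given date
    SELECT name
    FROM person_name
    WHERE person_id = 42
      AND valid_from <= DATE '2019-06-01'
      AND (valid_till IS NULL OR valid_till > DATE '2019-06-01');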



The tradeoff is that this is very expensive at the scale of large geospatial data models, both in terms of performance and storage. In practice, it is much more common to just take regular snapshots of the database. If you want to go back in time, you have to spin up an old snapshot of the database model.

A less obvious issue is that to make this work well, you need to do time interval intersection searches/joins at scale. There is a dearth of scalable data structures and algorithms for this in databases.
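For what it's worth, single-node PostgreSQL gets you part of the way with range types and GiST indexes (table names here are illustrative), though it doesn't address the at-scale problem:

    CREATE TABLE address_history (
        person_id bigint,
        address   text,
        validity  tstzrange NOT NULL
    );
    CREATE INDEX ON address_history USING gist (validity);

    -- Interval-intersection search: everything overlapping a given window
    SELECT *
    FROM address_history
    WHERE validity && tstzrange('2020-01-01', '2021-01-01');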


Anyone who works with human names should take a look at the HL7 V3 and FHIR data models, which were designed for healthcare. They support name validity ranges and a bunch of other related metadata. It can be challenging to efficiently represent those abstract data models in a traditional relational database, because with a fully normalized schema you end up needing a lot of joins.
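As a rough illustration of why the joins pile up (my own sketch of a normalized FHIR HumanName, not an official schema): HumanName carries a use code, a family name, a repeating given-name element, and a validity period, so even the name alone wants two tables:

    CREATE TABLE human_name (
        name_id      bigint PRIMARY KEY,
        person_id    bigint NOT NULL,
        use_code     text,           -- e.g. official, nickname, maiden
        family       text,
        period_start date,
        period_end   date
    );

    CREATE TABLE human_name_given (  -- "given" is a repeating element
        name_id  bigint REFERENCES human_name,
        position int,
        given    text
    );

    -- Current official name, given parts in order
    SELECT n.family, g.given
    FROM human_name n
    JOIN human_name_given g ON g.name_id = n.name_id
    WHERE n.person_id = 42
      AND n.use_code = 'official'
      AND (n.period_end IS NULL OR n.period_end > CURRENT_DATE)
    ORDER BY g.position;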


If you have an "audit" table, where you write a copy of the data before updating it in the primary table, that's a decision you can make at any point.

Of course, you don't get that historical data, but you do get it going forward from there.


SQL:2011 defines temporal tables, which few FOSS databases support. I used it in MariaDB:

https://mariadb.com/kb/en/temporal-tables/

and if your schema doesn't change much, it's practically free to implement, much easier and simpler than copy-pasting audit tables or relying on codegen to do the same.
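For reference, it looks roughly like this (syntax per the MariaDB page above):

    CREATE TABLE person (
        id   INT PRIMARY KEY,
        name VARCHAR(100)
    ) WITH SYSTEM VERSIONING;

    -- Normal reads see only current rows; time travel is one clause away
    SELECT * FROM person
    FOR SYSTEM_TIME AS OF TIMESTAMP '2024-01-01 00:00:00';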


something like https://www.pgaudit.org/ ?

Basically you keep a history of all changes so you can always roll back / get that data if needed?
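For anyone curious, enabling it looks roughly like this (settings taken from the pgaudit docs); note it writes audit records to the regular PostgreSQL server log rather than to a queryable table:

    -- in postgresql.conf: shared_preload_libraries = 'pgaudit'
    CREATE EXTENSION pgaudit;
    SET pgaudit.log = 'write, ddl';  -- log INSERT/UPDATE/DELETE/TRUNCATE and DDL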


The last time we did this, we basically hand-rolled our own, with a database trigger to insert data into a different table whenever an `UPDATE` statement happened.
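A minimal sketch of that kind of hand-rolled trigger, assuming Postgres; dumping the old row as jsonb keeps it tolerant of schema changes, and all names here are illustrative:

    CREATE TABLE person_audit (
        audit_id    bigserial PRIMARY KEY,
        archived_at timestamptz NOT NULL DEFAULT now(),
        old_row     jsonb NOT NULL
    );

    CREATE FUNCTION person_audit_fn() RETURNS trigger AS $$
    BEGIN
        INSERT INTO person_audit (old_row) VALUES (to_jsonb(OLD));
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER person_audit_trg
        BEFORE UPDATE ON person
        FOR EACH ROW EXECUTE FUNCTION person_audit_fn();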

But this seems like it's probably a better solution.


I haven't used pgaudit yet, so I can't vouch for it, but I have it on the backburner/log of things to try for exactly this use case!

I think the real magic is that it leverages the WAL (write-ahead log) from the pg engine itself, which you could certainly hook into yourself too, but I'm not a db expert here.


I just found out about bemi.io, seems like they're targeting this issue


See also "Eventual Business Consistency"[0] by Kent Beck. Really good read.

> Double-dated data—we tag each bit of business data with 2 dates:

> * The date on which the data changed out in the real world, the effective date.

> * The date on which the system found out about the change, the posting date.

> Using effective & posting dates together we can record all the strange twists & turns of feeding data into a system.

[0] https://tidyfirst.substack.com/p/eventual-business-consisten...
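In SQL terms, the double-dating is just two extra columns plus an "as of both dates" query; a sketch (assuming Postgres, illustrative names):

    CREATE TABLE salary (
        person_id      bigint,
        amount_cents   bigint,
        effective_date date NOT NULL,  -- when it changed in the real world
        posting_date   date NOT NULL   -- when the system found out
    );

    -- "What did we believe, as of 2024-03-01, the salary was on 2024-01-15?"
    SELECT DISTINCT ON (person_id) amount_cents
    FROM salary
    WHERE person_id = 42
      AND effective_date <= DATE '2024-01-15'
      AND posting_date   <= DATE '2024-03-01'
    ORDER BY person_id, effective_date DESC, posting_date DESC;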


Thanks for posting this. I read this a while ago, but it is worth revisiting.


One that routinely surprises me is that this is not easy to do in any popular contacts application. I actually would like to keep every address I've ever had in there, in case I need to remember it for some reason later. Maybe I just want to reminisce. I don't want to accidentally have it as an "active" one, though.



