
From experience:

Datomic Cloud is slow, expensive, resource-intensive, designed in the baroque style of massively over-complicated CloudFormation astronautics. Hard to diagnose performance issues. Impossible to back up. Ran into one scenario where apparently we weren't quick enough to migrate to the latest version, AWS had dropped support for $runtime in Lambda, and it became impossible to upgrade the CloudFormation template. Had to write application code to export/reimport prod data from one cluster to another—there was no other migration path (and yes, we were talking to their enterprise support).

We migrated to Postgres and are now using a 10th of the compute resources. Our p99 response times went from 1.3-1.5s to under 300ms once all the read traffic was cut over.

Mother Postgres can do no wrong.

Still, Datomic seems like a cool idea.



As someone who has been using Datomic Pro in production for many years now, I must agree with you. I once began a project with Datomic Cloud and it was a disaster similar to what you described. I learned a lot about AWS, but after about half a year we switched to Datomic Pro.

There were some cool ideas in Datomic Cloud, like Ions and its integrated deployment CLI. But the dev workflow with Datomic Pro in the REPL, potentially connected to your live or staging database, is much more interactive and fun than waiting for CodeDeploy. I guess there is a reason Datomic Pro is the featured product on datomic.com again. It appears that Cognitect took a big bet with Datomic Cloud and it didn't take off; soon after, the Nubank acquisition happened. That being said, Datomic Cloud was not a bad idea, it just turned out that Datomic Pro/On-Prem is much easier to use. Also, of all their APIs, the Peer API of Pro is just the best IME, especially with `d/entity` vs. "pull" etc.


I don't doubt your story, of course, and I love Postgres, but isn't this comparing apples to oranges?

Datomic's killer feature is time travel.

Did you simply not use that feature once you moved off Datomic (and if so, why did you pick Datomic in the first place)? Or are you using some Postgres extension to add it in?


We implemented it in Postgres with 'created_at' and 'deleted_at' columns on everything and filtering to make sure that the object 'exists' at the time the query is concerned with. Changes in relationships between objects are modeled as join tables with a boolean indicating whether the relationship is made or broken and at what time.

Our data model is not large and we already had a very complete test suite, so it was easy to produce another implementation backed by Postgres, RAM, etc.
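Roughly, the pattern looks like this (a sketch using sqlite3 so it runs standalone; the SQL is the same in Postgres, and the table/column names are illustrative, not our actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        created_at INTEGER NOT NULL,  -- time the row came into existence
        deleted_at INTEGER            -- NULL while the row still "exists"
    )""")

# t=1: two items are created; t=5: one of them is soft-deleted
conn.execute("INSERT INTO item VALUES (1, 'widget', 1, NULL)")
conn.execute("INSERT INTO item VALUES (2, 'gadget', 1, NULL)")
conn.execute("UPDATE item SET deleted_at = 5 WHERE id = 2")

def items_as_of(t):
    """Rows that 'existed' at time t: created on or before t, not yet deleted."""
    rows = conn.execute(
        "SELECT name FROM item "
        "WHERE created_at <= ? AND (deleted_at IS NULL OR deleted_at > ?)",
        (t, t)).fetchall()
    return [name for (name,) in rows]

print(items_as_of(3))  # ['widget', 'gadget']
print(items_as_of(7))  # ['widget']
```

The join-table variant for relationships is the same idea: a row per link/unlink event, filtered with the same as-of predicate.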


Yeah, it seems you could substitute thoughtful schema design (avoiding updates and deletes) for time travel as a feature.

I wonder if anyone has made a collection of reference examples implemented this way (and, in general, I think a substantial compendium of good examples of DB schemas, and the thinking behind them, could be worthwhile).


It’s called a slowly changing dimension. In this example, it’s a type-2.

https://en.m.wikipedia.org/wiki/Slowly_changing_dimension
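For concreteness, a type-2 change closes the current row and appends a new version, so point-in-time lookups are a range predicate. A sketch (sqlite3 so it runs standalone; all names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_dim (
        customer_id INTEGER,
        city        TEXT,
        valid_from  INTEGER NOT NULL,
        valid_to    INTEGER            -- NULL marks the current version
    )""")

def scd2_update(customer_id, city, now):
    """Type-2 change: close the current version and append a new one."""
    conn.execute(
        "UPDATE customer_dim SET valid_to = ? "
        "WHERE customer_id = ? AND valid_to IS NULL",
        (now, customer_id))
    conn.execute(
        "INSERT INTO customer_dim VALUES (?, ?, ?, NULL)",
        (customer_id, city, now))

scd2_update(42, "Lisbon", 1)   # initial version
scd2_update(42, "Berlin", 9)   # the customer moves at t=9

def city_as_of(customer_id, t):
    row = conn.execute(
        "SELECT city FROM customer_dim "
        "WHERE customer_id = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?)",
        (customer_id, t, t)).fetchone()
    return row[0] if row else None

print(city_as_of(42, 5))  # 'Lisbon'
print(city_as_of(42, 9))  # 'Berlin'
```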


I'm moderately confident you could mechanically transform a time-oblivious schema into a history-preserving one, and then write a view on top of it which gave a slice at a particular time. Moderately.
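What I have in mind, assuming each write gets stamped with a transaction number (a sketch in sqlite3; all names are hypothetical): every logical UPDATE becomes "close the current version, insert a new one", and the "view" is just an as-of predicate over the versioned table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    -- history-preserving version of a plain account(id, balance) table:
    -- every write becomes an insert stamped with a transaction number
    CREATE TABLE account_history (
        id      INTEGER,
        balance INTEGER,
        tx_from INTEGER NOT NULL,  -- tx that created this version
        tx_to   INTEGER            -- tx that superseded it (NULL = current)
    )""")

def write(id_, balance, tx):
    """A logical UPDATE/INSERT on the original table, rewritten mechanically."""
    conn.execute("UPDATE account_history SET tx_to = ? "
                 "WHERE id = ? AND tx_to IS NULL", (tx, id_))
    conn.execute("INSERT INTO account_history VALUES (?, ?, ?, NULL)",
                 (id_, balance, tx))

write(1, 100, tx=1)
write(1, 250, tx=2)

def account_as_of(tx):
    """The 'view': the plain table as it looked just after transaction tx."""
    return conn.execute(
        "SELECT id, balance FROM account_history "
        "WHERE tx_from <= ? AND (tx_to IS NULL OR tx_to > ?)",
        (tx, tx)).fetchall()

print(account_as_of(1))  # [(1, 100)]
print(account_as_of(2))  # [(1, 250)]
```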


That is essentially what MVCC does.


Yes, although AFAIK those hidden MVCC columns (xmin, xmax?) aren't very usable from an application standpoint -- the obsoleted rows only hang around until the next VACUUM, right?

I realize you're not claiming those columns are useful from an application perspective. Just curious to know if I'm wrong and they are useful.

Because, as I understand it, the selling point of Datomic is its audit-trail functionality, which is admittedly a bit onerous to implement in an RDBMS, even though I feel like every project eventually needs it.


I meant MVCC is the proof that you can automate the transform of a schema into a versioned schema. How and if the DBMS exposes that is another concern.

The garbage collection / VACUUM part of an MVCC system is the harder part; saving all versions and querying a point in time is the easy one.


Oracle lets you use the MVCC data to query past states of the database, called "flashback":

https://docs.oracle.com/en/database/oracle/oracle-database/2...

You can configure how long the old data is kept:

https://docs.oracle.com/en/database/oracle/oracle-database/2...

Worked examples:

http://www.dba-oracle.com/t_rman_149_flasbback_query.htm


Wow, super informative. Thank you so much.


That is an extremely good point.


Snodgrass wrote a whole book on that topic: Developing Time-Oriented Database Applications in SQL (1999).

It is available as PDF on his publications page:

https://www2.cs.arizona.edu/~rts/publications.html


Maybe search around on bitemporal database table modeling.


I've built a couple of systems that would have been Datomic's bread and butter.

Each time, the company was more comfortable with mainstream DBs, so we ended up going with something like you're talking about, built on top of a DB. A couple of the projects existed because a mainstream DB wouldn't scale.

The systems definitely worked, but it was also a lot of implementation complexity for an otherwise simple business proposition: "store this data as facts."


https://www.postgresql.org/docs/11/contrib-spi.html#id-1.11.... discusses a model for implementing time travel in Postgres <12 using SPI. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit... discusses why it was removed in Postgres 12 - it seems logical that it's more maintainable to implement in plpgsql, though as far as I can tell there aren't off-the-shelf implementations of this.

We use https://django-simple-history.readthedocs.io/en/latest/ (with some custom tooling for diff generation) for audit logs and resettability, and while you can't move an entire set of tables back in time simultaneously, it's usually sufficient for understanding data history.


Datomic's 'time travel' is an audit feature, not something for your application/business logic to depend on. Performance reasons make it impractical, unless you only have like 10 users and very little data.


That's certainly not how it sells and markets itself.

The first feature listed under its benefits (and the only reason I've ever heard Datomic brought up, or considered it myself, for production workflows) is using that stuff in application workflows: https://docs.datomic.com/pro/time/filters.html#history

It could be you're saying it in fact doesn't work well performance-wise; that would (surprise me but) certainly explain why it's not more popular. Still, I think it's clear it wants you to use this as an application feature.


Welcome to sales tactics ;)

Datomic is great but, as another commenter said, is good for "small-ish backoffice systems that never has to be web scale". You can probably rely on querying history for internal applications. I think their primary market was companies using it internally, but they never made this clear.


Ironically, Hickey fired the one marketer they hired for Datomic.

He lucked out when a unicorn went all in on it. Word around Cognitect was, Datomic was barely breaking even.


> "small-ish backoffice systems that never has to be web scale"

Doesn't production use of Datomic by Nubank and Netflix (to mention just two examples) belie this assertion?


Alternatively, Datomic wasn't performing up to snuff, and they found it cheaper to buy Cognitect than do a DB migration :D


Do those companies specify what they use it for? They probably have their own internal "small-ish backoffice systems".


Nubank is one thing, but for Netflix, just like for any big company 10000 DB technologies are probably in use at the same time.

And 9996 of them are used for stuff like the internal HR DB or other minor projects.


Are they _forcing_ you to use CloudFormation? Or is it just the officially supported mechanism?

> Mother Postgres can do no wrong.

I'll say that Postgres is usually the answer for the vast majority of use cases. Even when you think you need something else to do something different, it's probably still a good-enough solution. I've seen teams pitching other systems just because they wanted to push a bunch of JSON. Guess what: PG can handle that fine and can even run SQL queries against it. PG can also access other database systems with its foreign data wrappers (https://wiki.postgresql.org/wiki/Foreign_data_wrappers).
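For example, querying inside stored JSON is a one-liner. A sketch using sqlite3's JSON1 functions so it runs standalone; in Postgres you'd use a jsonb column and operators like `->>` instead (the data here is made up):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (body TEXT)")  # jsonb in Postgres

docs = [{"user": "ann", "tags": ["db", "sql"]},
        {"user": "bob", "tags": ["web"]}]
conn.executemany("INSERT INTO docs VALUES (?)",
                 [(json.dumps(d),) for d in docs])

# filter on a field inside the JSON; in Postgres: WHERE body->>'user' = 'ann'
rows = conn.execute(
    "SELECT json_extract(body, '$.user') FROM docs "
    "WHERE json_extract(body, '$.tags[0]') = 'db'").fetchall()
print(rows)  # [('ann',)]
```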

The main difficulty is that horizontally scaling it is not trivial (although not impossible, and third-party offerings can improve that).


Yes. Postgres is such a reliable and known quantity that IMO it should be the default choice for just about anything.

Don't misunderstand me. There are plenty of times when something else is the right choice. I'm just saying, when I have a say in the matter, folks need to clear that bar -- "tell me why tool xyz is going to be so much better than postgres for this use case that it justifies the overhead of adding another piece of software infrastructure."

Like, you want to add a document database? Obviously Mongo, Elasticsearch, etc. are "best of breed." But Postgres is pretty capable, and this team is already good at it. Are we ever going to have so many documents that e.g. Elasticsearch's mostly-effortless horizontal scaling even comes into play? If you don't ever see yourself scaling past 1,000 documents, then adding a new piece of infra is a total joke. I see that kind of thing all the time. I can't tell if developers truly do not understand scale, or if they simply do not give a f--- and just want to play with shiny new toys and enrich their resumes.

I mean, I've literally had devops guys telling me we needed a Redis cluster even though we were only storing a few kilobytes of data, read dozens of times daily, with zero plans to scale. That could have been a f'in Postgres table. The devops guy defended that choice hard even when pressed by mgmt to reduce AWS spend. WTF?


> Postgres is such a reliable and known quantity that IMO it should be the default choice for just about anything.

This is being repeated so often. And yet — the above is true, IF (and that's a big if for some of us) you are OK with having your database on a single machine.

If you want a distributed database with strict serializability, where some nodes can go down and you still get correct answers, Postgres is not it.


Totally agree. That's really my thinking as well. Default to Postgres unless you have a reason not to choose it, and a need for distributed serializability is one of those cases where Postgres is an easy "nope, not suitable."

But I've also been burned by people reflexively reaching for $SHINY_NEW_TOY by default, when really there is no need. Architects and senior-level devs are the worst offenders. They throw a bunch of needlessly buzzword-compliant infra at a problem and then move on. They have the time and freedom to learn $SHINY_NEW_TOY well enough to MVP a product, but then the project is passed on to people who don't have that luxury.

I feel like there's a progression that often happens:

1. Early engineers: stick to Postgres or another RDBMS because it's all they know

2. Mid-stage engineers with "senior" in their title for the first time: reach for $SHINY_NEW_TOY

3. Late-stage engineers: stick to Postgres because it's something the whole team already knows and they recognize the true long-term cost of throwing multiple new bits of software infra into the mix


Where does SQLite fit into the heuristic? I thought for toy apps or small sites it might be an even easier and cheaper option.


> Datomic Cloud is slow, expensive, resource intensive, designed in the baroque style of massively over-complicated CloudFormation astronautics. Hard to diagnose performance issues. Impossible to backup.

You should give TerminusDB a go (https://terminusdb.com/): it's really OSS, the cloud version is cheap and fast, there aren't tons of baroque settings, and it's easy to back up using clone.

TerminusDB is a graph database with a git-like model with push/pull/clone semantics, as well as a Datalog query language.


I guess this is why the datomic.com front page now defaults to Datomic Pro and not Cloud.


They should have just focused on having a great Kubernetes setup experience, their focus on this cloud stuff always seemed strange to me.


Why were backups impossible? Couldn't you have backed up the storage resources?


As far as I could tell, there was no straightforward way to point a new instance of the compute resources back at the old storage resources since they're all provisioned in one CF template.


> Mother Postgres can do no wrong.

Simple, eloquent, damn true.



