Datomic Cloud is slow, expensive, resource intensive, designed in the baroque style of massively over-complicated CloudFormation astronautics. Hard to diagnose performance issues. Impossible to backup. Ran into one scenario where apparently we weren't quick enough to migrate to the latest version, AWS had dropped support for $runtime in Lambda, and it became impossible to upgrade the CloudFormation template. Had to write application code to export/reimport prod data from one cluster to another—there was no other migration path (and yes, we were talking to their enterprise support).
We migrated to Postgres and are now using a 10th of the compute resources. Our p99 response times went from 1.3-1.5s to under 300ms once all the read traffic was cut over.
As someone who has been using Datomic Pro in production for many years now, I must agree with you. I once began a project with Datomic Cloud and it was a disaster similar to what you described. I learned a lot about AWS, but after about half a year we switched to Datomic Pro.
There were some cool ideas in Datomic Cloud, like Ions and its integrated deployment CLI. But the dev workflow with Datomic Pro in the REPL, potentially connected to your live or staging database, is much more interactive and fun than waiting for CodeDeploy.
I guess there is a reason Datomic Pro is the featured product on datomic.com again. It appears that Cognitect took a big bet on Datomic Cloud and it didn't take off; soon after, the Nubank acquisition happened. That said, Datomic Cloud was not a bad idea, it just turned out that Datomic Pro/On-Prem is much easier to use. Also, of all their APIs, the "Peer API" of Pro is just the best IME, especially with `d/entity` vs. "pull" etc.
I don't doubt your story of course, and I love Postgres, but comparing apples to oranges no?
Datomic's killer feature is time travel.
Did you simply not use that feature once you moved off Datomic (and if so, why did you pick Datomic in the first place)? Or are you using Postgres with some extension to add it in?
We implemented it in Postgres with 'created_at' and 'deleted_at' columns on everything and filtering to make sure that the object 'exists' at the time the query is concerned with. Changes in relationships between objects are modeled as join tables with a boolean indicating whether the relationship is made or broken and at what time.
Our data model is not large and we had a very complete test suite already, so it was easy to produce another implementation backed by postgres, RAM, etc.
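For readers curious what that looks like concretely, here's a minimal sketch of the `created_at`/`deleted_at` pattern the parent describes. SQLite's in-memory engine stands in for Postgres, and the table and column names are invented for illustration, not the commenter's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE items (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    created_at TEXT NOT NULL,   -- when the row came into existence
    deleted_at TEXT             -- NULL while the row still "exists"
)""")

# An item that existed from 2021-01-01 until 2021-06-01, and one that still exists.
conn.execute("INSERT INTO items (name, created_at, deleted_at) VALUES (?, ?, ?)",
             ("widget", "2021-01-01", "2021-06-01"))
conn.execute("INSERT INTO items (name, created_at, deleted_at) VALUES (?, ?, NULL)",
             ("gadget", "2021-03-01"))

def items_as_of(ts):
    """Rows that 'existed' at ts: created on or before ts, and either
    never deleted or deleted strictly after ts."""
    rows = conn.execute(
        "SELECT name FROM items "
        "WHERE created_at <= ? AND (deleted_at IS NULL OR deleted_at > ?)",
        (ts, ts)).fetchall()
    return sorted(r[0] for r in rows)

print(items_as_of("2021-02-01"))  # ['widget'] — gadget didn't exist yet
print(items_as_of("2021-04-01"))  # ['gadget', 'widget'] — both alive
print(items_as_of("2021-07-01"))  # ['gadget'] — widget deleted by then
```

Every query in the application then has to carry that existence filter, which is the main tax of this approach compared to a database that does it for you.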
Yeah, it seems you could substitute thoughtful schema design (avoiding updates and deletes) for time travel as a feature.
I wonder if anyone has made a collection of reference examples implemented this way (and in general I think a substantial compendium of good examples of DB schemas, and the thinking behind them, could be worthwhile).
I'm moderately confident you could mechanically transform a time-oblivious schema into a history-preserving one, and then write a view on top of it which gave a slice at a particular time. Moderately.
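As a toy proof of concept of that mechanical transform: a shadow history table maintained by triggers, plus an "as of" query that gives the slice at a particular time. SQLite triggers stand in here for Postgres trigger functions, and all names (tables, the `clock` counter) are invented for the sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The time-oblivious base table the application already uses.
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);

-- Mechanically derived history table: each version is valid over
-- [valid_from, valid_to); valid_to IS NULL means "current".
CREATE TABLE accounts_history (
    id INTEGER, balance INTEGER,
    valid_from INTEGER NOT NULL,
    valid_to   INTEGER
);

-- A monotonic "transaction time" counter.
CREATE TABLE clock (tick INTEGER NOT NULL);
INSERT INTO clock VALUES (0);

CREATE TRIGGER acc_ins AFTER INSERT ON accounts BEGIN
    UPDATE clock SET tick = tick + 1;
    INSERT INTO accounts_history SELECT NEW.id, NEW.balance, tick, NULL FROM clock;
END;

CREATE TRIGGER acc_upd AFTER UPDATE ON accounts BEGIN
    UPDATE clock SET tick = tick + 1;
    UPDATE accounts_history SET valid_to = (SELECT tick FROM clock)
        WHERE id = NEW.id AND valid_to IS NULL;
    INSERT INTO accounts_history SELECT NEW.id, NEW.balance, tick, NULL FROM clock;
END;

CREATE TRIGGER acc_del AFTER DELETE ON accounts BEGIN
    UPDATE clock SET tick = tick + 1;
    UPDATE accounts_history SET valid_to = (SELECT tick FROM clock)
        WHERE id = OLD.id AND valid_to IS NULL;
END;
""")

# The application writes to `accounts` as usual; history accrues on the side.
conn.execute("INSERT INTO accounts VALUES (1, 100)")            # tick 1
conn.execute("UPDATE accounts SET balance = 250 WHERE id = 1")  # tick 2
conn.execute("DELETE FROM accounts WHERE id = 1")               # tick 3

def accounts_as_of(tick):
    """The 'slice at a particular time': versions valid at the given tick."""
    return conn.execute(
        "SELECT id, balance FROM accounts_history "
        "WHERE valid_from <= ? AND (valid_to IS NULL OR valid_to > ?)",
        (tick, tick)).fetchall()
```

The transform is indeed mechanical (one history table and three triggers per base table), which suggests it could be generated from the schema. What it doesn't give you is Datomic's cross-table consistency of a single as-of basis, unless all triggers share one counter as above.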
Yes, although AFAIK those hidden MVCC columns (xmin, xmax?) aren't very usable from an application standpoint -- the obsoleted rows only hang around until the next VACUUM, right?
I realize you're not claiming those columns are useful from an application perspective. Just curious to know if I'm wrong and they are useful.
Because as I understand it, the selling point of Datomic is its audit-trail functionality, and that is admittedly a bit onerous to implement in an RDBMS. Even though I feel like every project needs/requires that eventually.
I meant MVCC is the proof that you can automate the transform of a schema into a versioned schema. How and if the DBMS exposes that is another concern.
The garbage collection / VACUUM part of an MVCC system is the harder part; saving all versions and querying a point in time is the easy one.
I've built a couple of systems that would have been Datomic's bread and butter.
Each time the company was more comfortable with mainstream DBs, so we ended up going with something like you're talking about, built on top of a DB. A couple of the projects existed precisely because a mainstream DB wouldn't scale.
The systems definitely worked, but it was also a lot of implementation complexity on an otherwise simple business proposition: "store this data as facts".
We use https://django-simple-history.readthedocs.io/en/latest/ (with some custom tooling for diff generation) for audit logs and resettability, and while you can't move an entire set of tables back in time simultaneously, it's usually sufficient for understanding data history.
Datomic's 'time travel' is an audit feature, not something for your application/business logic to depend on. Performance reasons make it impractical, unless you only have like 10 users and very little data.
That's certainly not how it sells and markets itself.
The first feature on benefits (and the only reason I've ever heard Datomic brought up and/or considered it myself for production workflows) is using that stuff in application workflows: https://docs.datomic.com/pro/time/filters.html#history
Could be you're saying it in fact doesn't work well performance-wise, that'd (surprise me but) certainly explain why it's not more popular -- but I think it's clear it wants you to use this as an application feature.
Datomic is great but, as another commenter said, it's good for "small-ish backoffice systems that never has to be web scale". You can probably rely on querying history for internal applications. I think their primary market was companies using it internally, but they never made this clear.
> "small-ish backoffice systems that never has to be web scale".
Doesn't production use of Datomic by Nubank and Netflix (to mention just two examples) belie this assertion?
Are they _forcing_ you to use CloudFormation? Or is it just the officially supported mechanism?
> Mother Postgres can do no wrong.
I'll say that Postgres is usually the answer for the vast majority of use cases. Even when you think you need something else to do something different, it's probably still a good-enough solution. I've seen teams pitching other systems just because they wanted to push a bunch of JSON. Guess what: PG can handle that fine and can even run SQL queries against it. PG can also access other database systems with its foreign data wrappers (https://wiki.postgresql.org/wiki/Foreign_data_wrappers).
The main difficulty is that horizontally scaling it is not trivial (although not impossible, and third-party offerings can help).
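On the "push a bunch of JSON" point above: in Postgres this is `json`/`jsonb` columns plus operators like `->>`. Here's a rough analogue using SQLite's JSON1 functions as a stand-in (assuming your SQLite build includes them, as modern Python builds do; table and field names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, body TEXT)")

# Store raw JSON documents in an ordinary column.
conn.executemany("INSERT INTO events (body) VALUES (?)", [
    ('{"kind": "click", "user": "ana"}',),
    ('{"kind": "view",  "user": "bo"}',),
    ('{"kind": "click", "user": "bo"}',),
])

# Query into the documents with SQL; in Postgres this would be
# SELECT body->>'user' FROM events WHERE body->>'kind' = 'click';
rows = conn.execute(
    "SELECT json_extract(body, '$.user') FROM events "
    "WHERE json_extract(body, '$.kind') = 'click'").fetchall()
print([r[0] for r in rows])  # ['ana', 'bo']
```

Postgres goes further than this sketch: `jsonb` is stored in a binary form and can be GIN-indexed, so these lookups stay fast at scale.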
Yes. Postgres is such a reliable and known quantity that IMO it should be the default choice for just about anything.
Don't misunderstand me. There are plenty of times when something else is the right choice. I'm just saying, when I have a say in the matter, folks need to clear that bar -- "tell me why tool xyz is going to be so much better than postgres for this use case that it justifies the overhead of adding another piece of software infrastructure."
Like, you want to add a document database? Obviously Mongo, Elasticsearch, etc. are "best of breed." But Postgres is pretty capable and this team is already good at it. Are we ever going to have so many documents that e.g. Elasticsearch's mostly-effortless horizontal scaling even comes into play? If you don't ever see yourself scaling past 1,000 documents then adding a new piece of infra is a total joke. I see that kind of thing all the time. I can't tell if developers truly do not understand scale, or if they simply don't give a f--- and just want to play with shiny new toys and enrich their resumes.
I mean, I've literally had devops guys telling me we need a Redis cluster even though we were only storing a few kilobytes of data, that was read dozens of times daily with zero plans to scale. That could have been a f'in Postgres table. Devops guy defended that choice hard even when pressed by mgmt to reduce AWS spend. WTF?
> Postgres is such a reliable and known quantity that IMO it should be the default choice for just about anything.
This gets repeated so often. And yet the above is only true IF (and that's a big if for some of us) you are OK with having your database on a single machine.
If you want a distributed database with strict serializability, where some nodes can go down and you still get correct answers, Postgres is not it.
Totally agree. That's really my thinking as well. Default to Postgres unless you have a reason not to choose it, and a need for distributed serializability is one of those cases where Postgres is an easy "nope, not suitable."
But I've also been burned by people reflexively reaching for $SHINY_NEW_TOY by default, when really there is no need. Architects and senior-level devs are the worst offenders. They throw a bunch of needlessly buzzword-compliant infra at a problem and then move on. They have the time and freedom to learn $SHINY_NEW_TOY well enough to MVP a product, but then the project is passed on to people who don't have that luxury.
I feel like there's a progression that often happens:
1. Early engineers: stick to Postgres or another RDBMS because it's all they know
2. Mid-stage engineers with "senior" in their title for the first time: reach for $SHINY_NEW_TOY
3. Late-stage engineers: stick to Postgres because it's something the whole team already knows and they recognize the true long-term cost of throwing multiple new bits of software infra into the mix
> Datomic Cloud is slow, expensive, resource intensive, designed in the baroque style of massively over-complicated CloudFormation astronautics. Hard to diagnose performance issues. Impossible to backup.
You should give TerminusDB a go (https://terminusdb.com/), it's really OSS, the cloud version is cheap, fast, there are not tons of baroque settings, and it's easy to backup using clone.
TerminusDB is a graph database with a git-like model (push/pull/clone semantics) as well as a Datalog query language.
As far as I could tell, there was no straightforward way to point a new instance of the compute resources back at the old storage resources since they're all provisioned in one CF template.
> Datomic Cloud is slow, expensive, resource intensive, designed in the baroque style of massively over-complicated CloudFormation astronautics. Hard to diagnose performance issues. Impossible to backup. Ran into one scenario where apparently we weren't quick enough to migrate to the latest version, AWS had dropped support for $runtime in Lambda, and it became impossible to upgrade the CloudFormation template. Had to write application code to export/reimport prod data from one cluster to another—there was no other migration path (and yes, we were talking to their enterprise support).
> We migrated to Postgres and are now using a 10th of the compute resources. Our p99 response times went from 1.3-1.5s to under 300ms once all the read traffic was cut over.
Mother Postgres can do no wrong.
Still, Datomic seems like a cool idea.