RxDB – a real-time database on top of PouchDB

lukevp · on Sept 1, 2020

The concept of RxDB, being able to iterate observables of your change stream, is great. Centering on schemas and TypeScript is as well. The BroadcastChannel-based leader election also solved the common issue with Pouch where you can’t respond to real-time changes and update your UI if a separate tab was watching too.

It’s wise of them to also support pulling data from GraphQL. I built the first version of NoteBrook on top of couch/pouch, and the biggest pain points were:

1. PouchDB predates async/await and TypeScript. The typings can be inconsistent, and it’s very difficult to properly manage the lifetimes of local databases because of the promise chaining.

2. A database technology for real-time replication needs ACLs on a per-document basis. I built provisioning scripts to manage separate databases per user, as suggested, and it’s very cumbersome. It prevents me from using lots of data sharing models, like promoting one record to public view, or sharing a record with another user in a different tenant.

3. Personally, I feel that the naming / marketing of the product is poor. The names (Couch, Futon, Fauxton, Pouch, Couchbase) do not feel like professional-grade products I can depend on to run a business.

I think it would be good for RxDB to support its own backend technology and move away from PouchDB, and in the meantime, de-emphasize the relationship with PouchDB. The RxDB product doesn’t require you to use PouchDB and at this point it’s got a lot of buzz around it and doesn’t need that tie to PouchDB to continue.

I feel an ideal library would offer a real-time and offline client db, user auth, and billing, tied to a backend database of my choosing (eg. Postgres, MariaDB). The data layer should allow me to specify the durability requirements of each write (down to the specific entity type) and enforce an atomic commit on all related records in one transaction at the level of the most strict entity in the commit. It should be trivial to shard because the syncing protocol the clients use itself should be available to the server. Multitenancy should be built in. It should support real-time ephemeral in memory data for collaboration/chat scenarios, with optional durability that’s eventually consistent. Schema definitions and migrations should be built in to the product.
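To make the "most strict entity in the commit" idea concrete, here is a minimal sketch. Everything in it is hypothetical: the durability levels, the entity type names, and the policy table are invented for illustration and do not come from any existing library.

```typescript
// Hypothetical: entity names, levels, and the policy table are all invented.
type Durability = "memory" | "local-disk" | "replicated";

// Higher number means a stricter guarantee.
const strictness: Record<Durability, number> = {
  memory: 0,
  "local-disk": 1,
  replicated: 2,
};

// Per-entity-type durability requirements (assumed configuration).
const durabilityPolicy: Record<string, Durability> = {
  cursorPosition: "memory",
  chatMessage: "local-disk",
  invoice: "replicated",
};

type PendingWrite = { entityType: string; id: string };

// One transaction commits at the strictest level any of its writes requires.
function requiredDurability(batch: PendingWrite[]): Durability {
  let result: Durability = "memory";
  for (const w of batch) {
    // Unknown entity types fall back to the strictest level to stay safe.
    const level = durabilityPolicy[w.entityType] ?? "replicated";
    if (strictness[level] > strictness[result]) result = level;
  }
  return result;
}

const mixedBatch = requiredDurability([
  { entityType: "chatMessage", id: "m1" },
  { entityType: "invoice", id: "i1" },
]);
const ephemeralBatch = requiredDurability([{ entityType: "cursorPosition", id: "c1" }]);
```

Here `mixedBatch` resolves to `"replicated"` because the invoice drags the whole commit up, while `ephemeralBatch` stays at `"memory"`.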

If you have similar ideas and are interested in working with me on this project, or hearing more, drop me a line.

typingmonkey · on Sept 1, 2020

Actually, the long-term plan is to move away from PouchDB. Before the last major release [1] I decided to make RxDB usable with other frontend/backend databases, but then found out that I should first refactor some pain points.

[1] https://github.com/pubkey/rxdb/issues/1636

lukevp · on Sept 1, 2020

That’s great to hear! Are you pubkey? If so you should put that in your profile!

I hope you know that I think RxDB is a great lib and going in a good direction! I recommended it to the Supabase team a couple weeks ago. I am less long on PouchDB; it feels like an old generation of this tech and not the best way to solve these problems in 2020.

typingmonkey · on Sept 1, 2020

I updated my profile. I do not know Supabase; is it similar to Hasura? They spent quite some effort [1] to make the RxDB GraphQL replication work with their backend. Maybe Supabase can use a similar wrapper over GraphQL.

[1] https://hasura.io/learn/graphql/react-rxdb-offline-first/int...

adav · on Sept 1, 2020

Cool! I was just thinking that RxDB + Hasura would be pretty nifty for an offline-first MVP and here the legwork has already been done :)

kiwicopple · on Sept 2, 2020

Hey there, I'm the CEO of Supabase - we're pretty new (YC S20).

We've been considering using RxDB. I'm especially excited about the changes you have planned for the next version. I'd love to chat whenever you have time - my email is in my profile.

k__ · on Sept 1, 2020

Ah, nice.

Does it work with AppSync? It seems to me that Amplify/DataStore is a product like RxDB, and they use AppSync as backend.

Edit: I just saw it seems to integrate custom GraphQL, so it's probably a direct competitor to DataStore in that regard.

typingmonkey · on Sept 1, 2020

No, it does not work with AppSync, but it could be hacked into doing that by modifying the GraphQL plugin. There would be huge value in replacing the AWS DataStore, mainly because it is missing basic features like complex queries or queries with sort params.

jadbox · on Sept 2, 2020

I also would be highly interested in picking up RxDB if it can integrate with either AppSync or PostgreSQL. I guess with the latter, would it be possible to use RxDB with the GraphQL plugin against Postgraphile? Feels like a lot of 'magic', but it would be great if something like this just worked.

k__ · on Sept 1, 2020

Ah good to know you already looked into that :D

Thanks for the info!

typingmonkey · on Sept 1, 2020

Yes I did :D . I wrote a full master thesis about the differences and tradeoffs of these "realtime" databases.

faizshah · on Sept 1, 2020

Is it available online? Sounds interesting.

k__ · on Sept 1, 2020

Well, DataStore is rather new AND AWS specific, so I simply assumed you wouldn't even care.

but good to know. Reminds me of the fact that I still have to write mine, lol

jamil7 · on Sept 1, 2020

> I feel an ideal library would offer a real-time and offline client db, user auth, and billing, tied to a backend database of my choosing (eg. Postgres, MariaDB).

I've been searching for something like this for a really long time (specifically in the context of mobile but web would also be great). For the last project I settled on Realm and their sync service (not entirely open source). There's a lot of things that get you 90% of the way there but surprisingly little that ticks all the boxes.

lukevp · on Sept 2, 2020

I've also gone through this process and evaluated Firebase, Realm, Pouch/Couch... They all ultimately fall down at some point and you end up in a situation where you have to build some strange hybrid with a separate API to do the features that aren't supported, and you can't really use them as full MBaaS like you'd hope.

Send me an e-mail from the info in my profile, and I'll include you in the alpha test! I would really appreciate some additional feedback.

thomasfedb · on Sept 1, 2020

If you could document your experience with Pouch/Couch that would be excellent. I've considered using the stack for an offline-first SPA but the curve seems so steep, particularly the database creation, proxying, etc. required to make Couch behave with many clients.

MuffinFlavored · on Sept 1, 2020

> The concept of RxDB, being able to iterate observables of your change stream

Why can't this be done with a thin wrapper around Postgres pub/sub + triggers?

lukevp · on Sept 2, 2020

RxDB is using RxJS and is meant for building an observable DB on the client side (i.e. it has local data and works offline). If you were using it server-side with Node, it wouldn't be as compelling.

diegoperini · on Sept 1, 2020

Looks amazing. I always loved how Rx composes with IO on the client side, this looks like the missing half. I hope it survives.

For people who are soon to be choosing a stack for their projects, please be careful. In 2015, we adopted RethinkDB which had very similar ambitions as RxDB and was open source. Unfortunately, RethinkDB is now abandoned (kinda). Many promising subscribe-able databases are still experimental. If you are thinking long-term, you may want to consider more boring options.

remon · on Sept 1, 2020

Subscribe-on-update databases are, almost by definition, problematic to use at scale as a generic storage solution. The fundamental problem is that they do not solve many real world problems efficiently enough to warrant the significantly higher running cost. Of course there are exceptions but it'll be hard to launch a MongoDB type of product that uses a subscription only model (see Firebase RTDB and its problems and lack of adoption in this space)

The reason developers gravitate towards subscription/rx based paradigms is because it results in very clean architecture and code. Unfortunately it comes at a cost which increases rather than decreases per client/user when volume increases. Some companies or projects can absorb that cost but not all, and typically less so when the project or its userbase grows.

A subscription-based model will do work whenever data changes for each subscriber, whereas more traditional pull-based architectures only do work when a client specifically needs the data. This can be mitigated to some extent by micromanaging subscriptions, but that kills most of the value of the model.

There are also plenty of issues with this model if multiple clients are allowed to write to the same data, which every single example project seems to try and do. There's a reason master-master updates, consensus algorithms and CRDTs all come at significant cost. It's usually hard. And when it's easy you probably don't need subscribe-on-update in the first place.
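For anyone who hasn't hit the multi-writer problem yet, a minimal sketch of the simplest mitigation, revision-checked writes (optimistic concurrency), may help. This is an illustration only, with made-up names; it is not how RxDB, PouchDB or Firebase implement it, and it only detects the conflict, it does not merge anything.

```typescript
// Hypothetical sketch of revision-checked writes (optimistic concurrency).
type Versioned<T> = { rev: number; value: T };
type WriteResult<T> =
  | { ok: true; doc: Versioned<T> }
  | { ok: false; conflict: Versioned<T> };

class RevStore<T> {
  private docs = new Map<string, Versioned<T>>();

  read(id: string): Versioned<T> | undefined {
    return this.docs.get(id);
  }

  // A write must name the revision it was based on; a stale base is rejected.
  // For a document that does not exist yet, the base revision is ignored.
  write(id: string, baseRev: number, value: T): WriteResult<T> {
    const current = this.docs.get(id);
    if (current !== undefined && current.rev !== baseRev) {
      return { ok: false, conflict: current };
    }
    const doc: Versioned<T> = { rev: baseRev + 1, value };
    this.docs.set(id, doc);
    return { ok: true, doc };
  }
}

const shared = new RevStore<string>();
shared.write("title", 0, "Draft"); // creates revision 1

// Two clients both read revision 1, then both try to write.
const fromAlice = shared.write("title", 1, "Draft v2 (Alice)"); // accepted, revision 2
const fromBob = shared.write("title", 1, "Draft v2 (Bob)"); // rejected, base is stale
```

Bob's client now has to decide what to do with the rejected write (retry, merge, ask the user), and that decision is where the real cost sits.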

joshribakoff · on Sept 1, 2020

I think you’re oversimplifying it. I was on teams that participated in architecture design for this stuff at Twitch. If we had 100,000s of clients polling at the same moment, that would absolutely knock over the server; the canonical “stupid easy” fix is to add jitter, which can be done on either client or server.

In fact the pull-based approach is doing work all the time to process all the polling. A push-based approach only does work when needed (when data changes).

I’m not saying it’s not new or hard to scale. I’m just saying objectively that push-based is more efficient, at least in terms of raw data sent down the wire.
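A back-of-the-envelope sketch of the trade-off being argued here, plus what jitter looks like in code. Every number and function name below is an assumption picked for illustration, not a measurement from any real system.

```typescript
// Hypothetical model; every number here is an assumption.

// Polling: each client asks once per interval, whether or not data changed.
function pollRequests(clients: number, windowSec: number, intervalSec: number): number {
  return clients * Math.floor(windowSec / intervalSec);
}

// Push: the server sends one message per subscriber per change.
function pushMessages(subscribers: number, changesInWindow: number): number {
  return subscribers * changesInWindow;
}

// Jitter: add a random offset in [0, maxJitterMs) so polls do not line up.
function nextPollDelayMs(
  intervalMs: number,
  maxJitterMs: number,
  random: () => number = Math.random,
): number {
  return intervalMs + Math.floor(random() * maxJitterMs);
}

// 100k clients over one minute.
const polled = pollRequests(100000, 60, 5); // polling every 5 seconds
const pushedQuiet = pushMessages(100000, 2); // data changed twice in that minute
const pushedBusy = pushMessages(100000, 30); // data changed every 2 seconds
```

With these made-up inputs, polling costs 1,200,000 requests, push costs 200,000 messages when the data is quiet and 3,000,000 when it is busy. Which side wins depends on how the change rate compares to the poll rate.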

lukevp · on Sept 1, 2020

I agree, I think the move to more mature state management on the client in general sort of obviated the need for these databases. E.g., if you’re using Redux, or MobX, or Apollo, you can cache offline data directly in your state tree, and can define validation functions locally on state transitions to keep your data valid with your business rules while it’s offline. I still feel there’s a space for this tech, but that it needs to integrate into state management as a first-class citizen.

Object definitions, graphs, and validation functions need to be definable in one place and replicated to clients. I’ve never seen this implemented, but if you don’t, you end up either having a schemaless DB or building your schema twice, and same with validation functions.

I shouldn’t have to create functions to observe changes and copy them to and from my redux state, or build my ui around PouchDB’s lifecycle methods. There are middleware providers out there but it’s not first class.

IggleSniggle · on Sept 1, 2020

I haven't tried this because I don't need persistent client state, but couldn't you just take your redux or whatever state tree and shove it into a persistent storage, and restore on load?
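Roughly what that could look like, as a minimal sketch. The `KeyValueStore` interface is a stand-in for something like `localStorage`; the key name, the state shape and the version check are all assumptions made for the example.

```typescript
// Hypothetical sketch; `KeyValueStore` stands in for localStorage or similar.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

type AppState = {
  version: number;
  notes: Record<string, { title: string; body: string }>;
};

const STATE_KEY = "app-state"; // assumed key name
const CURRENT_VERSION = 1;
const emptyState: AppState = { version: CURRENT_VERSION, notes: {} };

function saveState(kv: KeyValueStore, state: AppState): void {
  kv.setItem(STATE_KEY, JSON.stringify(state));
}

// Restore on load; fall back to an empty state if nothing usable is stored.
function loadState(kv: KeyValueStore): AppState {
  const raw = kv.getItem(STATE_KEY);
  if (raw === null) return emptyState;
  try {
    const parsed = JSON.parse(raw) as AppState;
    return parsed.version === CURRENT_VERSION ? parsed : emptyState;
  } catch {
    return emptyState; // corrupted or unexpected JSON
  }
}

// In-memory stand-in so the sketch runs outside a browser.
function memoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (k) => data.get(k) ?? null,
    setItem: (k, v) => {
      data.set(k, v);
    },
  };
}

const kvStore = memoryStore();
saveState(kvStore, { version: 1, notes: { n1: { title: "Hello", body: "offline" } } });
const restoredState = loadState(kvStore);
```

This covers save and restore on one device. It does nothing about syncing between tabs or devices, or about conflicting edits.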

mauflows · on Sept 1, 2020

I love the idea of CouchDB, but the ecosystem centers too much on Pouch, which doesn't have consistent maintenance. If IBM was smart they'd sponsor it in a big way.

typingmonkey · on Sept 1, 2020

Yes, PouchDB does not get the love it deserves. But it is still actively maintained; the last bug I had was fixed promptly by the community and merged.

WorldMaker · on Sept 1, 2020

I feel the other way around; Pouch is pretty rock solid at this point and I've not had trouble throwing data into it or moving it around between Pouch installs. Meanwhile every single CouchDB server and installation has vastly different limits from each other. Some are obvious after the fact (but would have been nice to know much earlier in architecture planning), such as very limited attachment sizes and very poor attachment upload/download/sync performance. Others are much less obvious and seem incredibly arbitrary and underdocumented: Pouch and localhost Apache Couch 1/2 are very flexible in terms of database and document naming (supporting a large character set and mostly ignoring things), while Couchbase, Cloudant, and Apache Couch 3 all have very different allowed character sets and are differently strict about certain names.

On top of that, hosting any of the server applications is increasingly a pain to set up and manage, especially from a PaaS approach rather than an IaaS approach. IBM has only done terrible things to make this worse over time. My requirement when I started was that I needed to host on Azure, and Cloudant supported that at the time. IBM of course being IBM and focusing entirely on IBM Cloud and its new name/strategy/plan every ~nine months dropped the Azure support I'm still told I need. But the constant name changes (IBM BlueMix, IBM Cloud, whatever it will be next week, IBM WatsonRain or whatever), plan changes, and cheese moving don't give me a lot of confidence in IBM's Cloud efforts, even if I wasn't feeling a lot of pressure from my IT colleagues to get everything (back) into Azure. I'm almost desperate enough to build a Pouch driver for CosmosDB myself at this point.

yen223 · on Sept 2, 2020

I've experimented with RxDB in a side project of mine, a spaced-repetition web app that is designed to be usable offline, with background sync between clients. The offline-first requirement meant using something like IndexedDB to store state client-side, which is where RxDB comes in.

Overall I like the library. It makes the hard task of interacting with IndexedDB a lot more pleasant. The GraphQL support is also a nice touch. Coupled with something like Hasura, it took me like a day to get sync working.

My one criticism with RxDB specifically is that the documentation around writing queries can be a bit hit-or-miss.

shellac · on Sept 1, 2020

Title probably ought to be "a reactive database". Used properly, it should let you reduce latency, but it doesn't make any guarantees.

typingmonkey · on Sept 1, 2020

Yes and no. Read "realtime" not like "realtime computing" but like a marketing keyword introduced by firebase which means something like live-replication. https://firebase.google.com/docs/database?hl=en

aabbcc1241 · on Sept 1, 2020

Saw RxDB before but it feels too heavy for my small projects. Would try that if it can support leveldb/sqlite as storage backend in the future.

My typical approach is directly logging onto fs with replaying upon restart for small projects, with fs based json store / leveldb for more demanding tasks. For non-trivial scale projects, RethinkDB is a fair choice.
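A minimal sketch of that "log every write, replay on restart" approach. The names are hypothetical, and the log here is an in-memory array of JSON lines so the example is self-contained; on disk it would be an append-only file.

```typescript
// Hypothetical sketch of "log every write, replay on restart".
// The log is an array of JSON lines here; on disk it would be an append-only file.
type LogEntry =
  | { op: "set"; key: string; value: unknown }
  | { op: "del"; key: string };

class LoggedMap {
  private state = new Map<string, unknown>();

  constructor(private log: string[] = []) {
    // Replay: rebuild in-memory state from the log, oldest entry first.
    for (const line of log) this.apply(JSON.parse(line) as LogEntry);
  }

  set(key: string, value: unknown): void {
    this.append({ op: "set", key, value });
  }

  del(key: string): void {
    this.append({ op: "del", key });
  }

  get(key: string): unknown {
    return this.state.get(key);
  }

  private append(entry: LogEntry): void {
    this.log.push(JSON.stringify(entry)); // record the write first
    this.apply(entry); // then update the in-memory state
  }

  private apply(entry: LogEntry): void {
    if (entry.op === "set") this.state.set(entry.key, entry.value);
    else this.state.delete(entry.key);
  }
}

const logLines: string[] = [];
const beforeRestart = new LoggedMap(logLines);
beforeRestart.set("theme", "dark");
beforeRestart.set("draft", "hello");
beforeRestart.del("draft");

// Simulated restart: a new instance sees only the log.
const afterRestart = new LoggedMap(logLines);
```

After the simulated restart, `afterRestart.get("theme")` is `"dark"` and `"draft"` is gone, because the delete was replayed too. A real version would also need log compaction at some point.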

haolez · on Sept 1, 2020

What's the advantage of this over triggers in a relational database? Or even notify/listen in PostgreSQL? (assuming these triggers are connected to the API, of course).

I guess it's probably a matter of scale, but I really don't know.

typingmonkey · on Sept 1, 2020

It runs fully on the client and is offline first. A listener to PostgreSQL will not work when the device goes offline.

snthpy · on Sept 1, 2020

Fair enough, so what about triggers on a sqlite db on the client?

typingmonkey · on Sept 1, 2020

There is a big difference between having a changestream of writes to the database, and observing results of multiple queries.
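To illustrate the difference with a toy example (this is an in-memory sketch with made-up names, not RxDB's actual API): a change stream fires once per write, while an observed query only emits when its result set actually changes.

```typescript
// Minimal in-memory sketch; NOT RxDB's API. All names are hypothetical.
type Card = { id: string; due: number };
type WriteEvent = { op: "upsert" | "delete"; id: string };

class TinyStore {
  private docs = new Map<string, Card>();
  private writeListeners: Array<(e: WriteEvent) => void> = [];

  // Change stream: one event per write, regardless of who cares about it.
  onWrite(listener: (e: WriteEvent) => void): void {
    this.writeListeners.push(listener);
  }

  upsert(card: Card): void {
    this.docs.set(card.id, card);
    this.emit({ op: "upsert", id: card.id });
  }

  delete(id: string): void {
    if (this.docs.delete(id)) this.emit({ op: "delete", id });
  }

  find(predicate: (c: Card) => boolean): Card[] {
    return Array.from(this.docs.values())
      .filter(predicate)
      .sort((a, b) => a.due - b.due);
  }

  // Observed query: re-run after each write, notify only if the result changed.
  observe(predicate: (c: Card) => boolean, onResult: (cards: Card[]) => void): void {
    const initial = this.find(predicate);
    let last = JSON.stringify(initial);
    onResult(initial);
    this.onWrite(() => {
      const next = this.find(predicate);
      const serialized = JSON.stringify(next);
      if (serialized !== last) {
        last = serialized;
        onResult(next);
      }
    });
  }

  private emit(e: WriteEvent): void {
    for (const listener of this.writeListeners) listener(e);
  }
}

const cardStore = new TinyStore();
const writeEvents: WriteEvent[] = [];
const dueSoonResults: Card[][] = [];
cardStore.onWrite((e) => writeEvents.push(e));
cardStore.observe((c) => c.due <= 10, (cards) => dueSoonResults.push(cards));

cardStore.upsert({ id: "a", due: 5 }); // matches the query: result changes
cardStore.upsert({ id: "b", due: 99 }); // does not match: write event only
cardStore.upsert({ id: "b", due: 100 }); // still does not match: write event only
```

The change stream sees three writes, while the observed query emits twice: once with the initial empty result and once when card `a` starts matching. Re-running the query on every write is the naive way to do it and is only meant to show the behavior.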

dezmou · on Sept 1, 2020

I don't see any authentication-related stuff like PouchDB has, like creating a user with a password hash, automatically handling the cookie in the browser, or restricting a document to a user or group.

b1ackb0x · on Sept 1, 2020

But is it really "realtime"? I thought something should have consistent millisecond or even microsecond latencies to be called realtime, but that's probably not possible for a JavaScript application.

typingmonkey · on Sept 1, 2020

Yes and no. Read "realtime" not like "realtime computing" but like a marketing keyword introduced by firebase which means something like live-replication. https://firebase.google.com/docs/database?hl=en

joshribakoff · on Sept 1, 2020

If you wanted to be pedantic, nothing is real-time, not even what you see with your eyes, due to photon latency. The word is clearly being used colloquially.

b1ackb0x · on Sept 1, 2020

It is clearly used in a misleading way, since this database is not about latencies or timing at all.

SirSavary · on Sept 1, 2020

I know few people that use "realtime" to mean it in the "realtime OS" sense. It's usually used in layman's terms to mean "happening live".

RxDB isn't being misleading, it's another valid usage of the term.

remon · on Sept 1, 2020

How is this different from Firebase RTDB? And perhaps more importantly, does it address the scalability and consistency issues associated with Firebase RTDB? Google introduced Firestore to Firebase specifically because RTDB has limited usability for larger real world applications that go beyond "sync device state to DB".

Even the offline first paradigm is fundamentally flawed in general and certainly when it comes to offline data manipulation. Either you can afford to mutate your data on the local device and sync it when possible, in which case you clearly don't need subscriptions to real-time mutations of remote data because within that scope you are the source of truth. Or, you're interested in mutations of real-time data from multiple clients in which case you need to deal with conflict resolution (mutually exclusive changes of data) which is not reliably possible with this model and scalability (linear increase in pub/sub * increasing query cost = exponential scaling).

Are there any large projects or companies that currently have this in production?

typingmonkey · on Sept 1, 2020

There are many mobile apps out there which work great and are offline first, for example the WhatsApp client. So saying it is "fundamentally flawed" sounds weird to me.

I have no deep knowledge of Firebase RTDB so I cannot do any comparison on that point.

jchrisa · on Sept 1, 2020

In-flight business objects for major airlines is probably considered scale: https://www.couchbase.com/customers/united-airlines

evan_ · on Sept 1, 2020

that's CouchDB, not PouchDB. PouchDB is a JavaScript implementation of CouchDB.

Graphguy · on Sept 1, 2020

That's also Couchbase and not CouchDB. They forked off CouchDB many years ago.

Cloudant (API Compatible with CouchDB) has a number of case studies you can reference for production success with the Couch API/ecosystem.

Cabify - https://www.ibm.com/case-studies/cabify-cloudant

Ticket Fairy - https://www.ibm.com/case-studies/the-ticket-fairy-cloud-clou...

We.Trade - https://www.ibm.com/case-studies/wetrade-blockchain-fintech-...

Disclaimer: I work for IBM Cloud

jchrisa · on Sept 1, 2020

I was referring to the pattern and scaling issue the parent described, which are largely overcome in mature implementations.

jamil7 · on Sept 1, 2020

I'm not following how this is a flawed concept. Any kind of collaborative document editing app is an example of when you might want both?