The founders of Convex are some of the most talented developers I know.
The CTO worked with Turing Award winner Barbara Liskov on Viewstamped Replication Revisited [1], the revision of the pioneering consensus protocol, and later the founding team pulled off the migration of Dropbox from S3 to their own custom storage stack called Magic Pocket [2].
The deterministic simulation testing techniques [3] they used to develop Dropbox's revised sync algorithm are still state of the art today (few systems are designed to be tested like this), and the online verification techniques [4] they used to verify their production systems are vital to building large-scale systems safely.
I can't think of a stronger technical team to do something like Convex, and couldn't imagine a better devops team to be running the backend.
Thank you! We have a really talented team working with us but we're very actively growing. If anyone is looking for a job in San Francisco: https://www.convex.dev/jobs
I'm equally impressed by their background, and puzzled that they chose to target frontend developers rather than backend developers (whose problems, presumably, they would be much better able to solve?)
Hi Shawn. I don't remember you bringing this up when we spoke in person recently, but it's a great question.
In our opinion, the very best way to evaluate yourself as a backend developer is how directly you solve problems for frontend developers. We believe in the merit of customer obsession, and the customers are not buying queues. They're buying the product as they see it: its surfaces, workflows and experience. And that's what the frontend developers, PMs, and designers are creating.
Historically, all these backend technologies that only interoperate with each other are useful only insofar as they make product creation and improvement easier, more reliable, etc. We strongly believe that as soon as you don't need them anymore, you should toss them out. They're complex and not proprietary to your product.
Convex (and serverless in general) is just the next step in providing more powerful abstractions that allow companies to double down on frontend engineering (work that adds product value) instead of reimplementing the same backend/devops plumbing the users never see (work that, at best, merely sustains product value).
So, given that we recognize this need, I respectfully disagree that we're not well equipped to solve these problems for frontend developers! Most of our team's recent work has been designing synchronization and storage platforms to enable product development, including work on web, desktop, and mobile libraries/SDKs. We feel we have a lot of both empathy for and experience in this space, and we're very proud of our early product and the enthusiasm from the web dev community.
hey jamwt! no, I wasn't thinking about this at the time; someone mentioned it when discussing Convex and I thought "huh, that's an interesting way to look at market-founder fit," so I just tossed it out there.
great answer :) if you can solve DX for frontend you're solving it for everyone.
I was one of the original TLs (tech leads) of Google Photos, specifically on Android, and eventually managed all of the frontend teams for Google Photos. So I hope it's not a stretch to say that I have deep empathy for frontend problems. We also have great frontend-oriented folks on the team.
I also worked with jamwt at a previous company (Bump) doing frontend work. When we worked together, I remember the acceleration of our ability to execute when the backend people were closely involved in our data syncing problems. Heck, jamwt and team even wrote most of the initial client syncing code. When backend and frontend folks work closely together to solve data problems, you can make magical things happen. In the end we're just product-oriented engineers trying to ship delightful experiences.
Thanks, but no. Never. I will keep doing it server side:
$messages = DB::select(
'SELECT * FROM messages JOIN users ON users.id=messages.user_id'
);
It is amazing how much cruft developers are willing to deal with these days, and how many CPU cycles get burned for nothing, as the Firebase example fires one query per message to get the user. This would be bad enough on the server. But with the Firebase example, it would also create a client-server HTTP roundtrip for each message. Mind-boggling.
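For comparison, here is a minimal Python sketch of the two access patterns being contrasted. It uses the standard-library sqlite3 module as an in-memory stand-in, so it is not Firestore and not Laravel's DB facade. One function does a single JOIN, and the other runs one query per message to fetch its author. The schema and the query counting are illustrative assumptions.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema mirroring the messages/users tables in the SQL above.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE messages (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
        INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO messages VALUES (1, 1, 'hi'), (2, 2, 'hey'), (3, 1, 'sup');
        """
    )
    return conn


def load_with_join(conn: sqlite3.Connection):
    # One statement: the database resolves every author itself.
    rows = conn.execute(
        "SELECT messages.body, users.name FROM messages "
        "JOIN users ON users.id = messages.user_id ORDER BY messages.id"
    ).fetchall()
    return rows, 1  # (results, number of queries issued)


def load_n_plus_one(conn: sqlite3.Connection):
    # One query for the messages, then one more per message for its author.
    queries = 1
    results = []
    for body, user_id in conn.execute(
        "SELECT body, user_id FROM messages ORDER BY id"
    ).fetchall():
        (name,) = conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        queries += 1
        results.append((body, name))
    return results, queries


conn = make_db()
rows, n_queries = load_n_plus_one(conn)  # n_queries == 4 for 3 messages
```

Both functions return the same rows. The naive version issues 1 + N queries. Whether each extra query also costs a separate network round trip depends on the client and transport, not on the query pattern alone.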
Technically that's how you load data from Firestore specifically, but yes, that's how it's done. I started working with a team who uses Firestore, and let me tell you, I've never hated a database so much. No disrespect to anyone from the Google team, but I cannot fathom why they made the decisions they made.
1) You are severely limited in how you can query. The list of limitations is too long to recount here, but querying is nearly worthless.
2) The database design strongly pushes you towards nesting collections, making side effects cleanup a disaster, especially as a database grows in complexity.
3) You cannot sort on a field without creating an index first. I get why creating an index is a good idea, but I can't even write a simple analytics script without indexing the fields first.
4) It gears itself towards frontend developers who don't know how to write a backend, and encourages bad practices for them. One example: Firebase lets you manually edit production data very easily from within their dashboard. Like it's treated almost like it's a CSV.
5) The recommended development approach is to directly query the DB from the frontend, with no server in between. This means any data security has to be implemented in a separate DB rules document. The syntax and structure of this doc is super limited and often results in a giant, unmaintainable file.
6) Firestore cannot count. As in, you literally cannot query the number of records in a collection. If you want that value, you have to store it as a separate field in the collection, and then update that value each time you add and remove a doc. MADNESS.
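Point 6 describes a workaround: keep a separate count field in sync by hand. Here is a minimal sketch of it. This is a plain in-memory Python model, not the Firestore SDK. The `Collection` class and its `count` field are hypothetical, and the lock stands in for whatever transaction mechanism the real database provides.

```python
import threading


class Collection:
    """Hypothetical collection that keeps a denormalized document count."""

    def __init__(self) -> None:
        self.docs: dict[str, dict] = {}
        self.count = 0  # the separately stored counter field
        self._lock = threading.Lock()  # stand-in for a database transaction

    def add(self, doc_id: str, data: dict) -> None:
        with self._lock:
            # Write the document and bump the counter together, so the
            # counter never disagrees with the set of stored documents.
            # Overwriting an existing document must not bump it.
            if doc_id not in self.docs:
                self.count += 1
            self.docs[doc_id] = data

    def remove(self, doc_id: str) -> None:
        with self._lock:
            # Only decrement if a document was actually deleted.
            if self.docs.pop(doc_id, None) is not None:
                self.count -= 1


c = Collection()
c.add("a", {"x": 1})
c.add("b", {"x": 2})
c.remove("b")  # c.count is back to 1
```

The frustration in the comment is that every write path in the app has to remember to do this. Any path that forgets leaves the stored count wrong.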
Yes, these are all limitations. But the advantage that Firestore has is significant, and for some applications it's worth accepting those limitations:
* Firestore is truly a "fire and forget" database that scales without effort or maintenance. If your app works for 500 users, it will work for 500 million. Without a devops staff.
Yes, Firestore (aka Cloud Datastore) feels crippled compared to running aggregations and joins on an RDBMS. If your data and load fit on a single node Postgres, by all means use it, that's a great solution! When your requirements exceed that, you're in a different world. You can look at Spanner ($$$) or its clones (operational load, maturity). Or you can do what I did, and run the firestore/datastore as a master database and replicate data to other stores (eg BigQuery) for analytics.
Firestore's sweet spots are very small (eg, you have many microservices and want simple cheap zero-maintenance persistence) or very large (where scaling and availability would give you headaches anyway). In the middle, traditional RDBMSes are great.
I also just joined a team using Firestore and I feel exactly this same way. It absolutely blew me away how bad Firestore is for any kind of conventional CRUD app use cases. I have never worked with a database less fit for purpose, and I worked with MongoDB in 2012.
Yup. One of our products was built on Firestore, and I'd estimate 90% of my time on that project is spent trying to turn Firestore into a more featureful product to meet the application's requirements. You can push it pretty far if you have the developer resources, but it often seems ludicrous that someone is willing to spend that much to build what other database solutions provide out of the box for free.
> It gears itself towards frontend developers who don't know how to write a backend, and encourages bad practices for them. One example: Firebase lets you manually edit production data very easily from within their dashboard. Like it's treated almost like it's a CSV.
This is truly spot on. I'm one of those devs who couldn't write a good backend when I first began coding apps (as a hobby). I started out with Firebase because that was the tool used by most YouTube videos and Medium blog posts introducing people to app development. Eventually I had to learn other technologies such as Elasticsearch, DynamoDB (which I would also stay the hell away from), and Postgres.
Thankfully Supabase decided to stay away from the NoSQL format, so getting started with Postgres was smoother.
Looking back I don't even know why I jumped into using Firebase, since I do have a pretty good footing in SQL querying. I don't know why Google isn't bothering with an Elasticsearch-like NoSQL solution for Firebase. And quite frankly, I would only use Firebase for Auth and RTDB for basic database stuff. If I had known this earlier, I might have saved months of learning Firebase Cloud Firestore crap online.
It suffers from trying to go upmarket with something that was never meant for ultra-serious, large, production apps. Look at the original Wired article [1]: every story of the founders' pitching to customers is about being able to quickly zero-to-one a concept. It seems great for internal apps, quick prototypes, and simple things and that's about it.
Yep. And what’s worse is that Firestore is notoriously difficult to migrate away from (at least from what I’ve heard, haven’t yet tried it myself). So if you take the prototype and turn it into an mvp, you’re kinda stuck with it.
If the goal is to create a prototyping DB, ideally there’d be some nice off-ramps or migration tools for when the app needs to become production ready.
Great list! I think we've had similar frustrations with firestore.
#3 is not really correct. You can sort on any single field but if you have multiple fields, then yes, you must create an index.
#4 I partially agree. The web dashboard makes things I don't want to do (accidentally edit or delete a field/document) dangerously easy, and things I do want to do (copy the contents of a document, save the results of a query, copy the text of a field) exceedingly difficult. The truth is that firestore is geared toward people who want an easy way to get near real-time data synchronization. It really sacrifices almost everything else.
The number one most annoying thing to do with firestore is work with their security rules.
This has nothing to do with business. It's fundamental limitations from the technical architecture.
It does provide a lot of serverless scalability for what it offers, but it's the classic case of optimizing for a situation that won't happen for 99% of their apps.
That's a good guess, but no this is a syntax limitation. For instance, if you wanted to say "I want records where record.score >= 10 and record.score <= 100", you can't do it, because you can't filter on the same field twice.
This is not correct. You can filter on the same field twice. What you can't do is filter on different fields, e.g.:
I want records where record.score >= 10 and record.date <= 2022-01-01
The first form (single field) is a simple range query on a single index - that's what Firestore is optimized for. The second form (with different fields) potentially requires walking the near-entirety of both indexes looking for matches, and therefore has unbounded time and computational requirements.
Firestore is designed so that you can't do things that don't scale. Sometimes that sucks, especially when you know that the data volume for that query will always be "reasonable". But the limitations are not arbitrary.
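The index argument above can be illustrated with a sorted list and Python's bisect module. This is a simplified model of a single-field index, not Firestore's actual storage, and the records are made up. A range on one field takes two binary searches plus one contiguous slice. A second inequality on a different field cannot use that ordering, so every candidate in the first range has to be fetched and checked one by one.

```python
from bisect import bisect_left, bisect_right

# Hypothetical records: (score, date, doc_id). ISO date strings compare
# correctly as plain strings.
RECORDS = [
    (5, "2021-06-01", "a"),
    (12, "2022-03-01", "b"),
    (40, "2021-12-31", "c"),
    (99, "2021-01-15", "d"),
    (150, "2020-05-05", "e"),
]

# A single-field "index": (score, doc_id) pairs kept sorted by score.
SCORE_INDEX = sorted((score, doc_id) for score, _, doc_id in RECORDS)
SCORE_KEYS = [score for score, _ in SCORE_INDEX]
DATES = {doc_id: date for _, date, doc_id in RECORDS}


def score_between(lo: int, hi: int) -> list[str]:
    # lo <= score <= hi: two binary searches bound a contiguous slice,
    # so the cost is O(log n + k) for k results.
    start = bisect_left(SCORE_KEYS, lo)
    end = bisect_right(SCORE_KEYS, hi)
    return [doc_id for _, doc_id in SCORE_INDEX[start:end]]


def score_and_date(lo: int, date_max: str) -> tuple[list[str], int]:
    # score >= lo AND date <= date_max: the score bound still seeks into
    # the index, but the date bound is a filter over every candidate.
    # Work grows with the size of the score range, not with the result.
    start = bisect_left(SCORE_KEYS, lo)
    candidates = SCORE_INDEX[start:]
    hits = [doc_id for _, doc_id in candidates if DATES[doc_id] <= date_max]
    return hits, len(candidates)


print(score_between(10, 100))  # ['b', 'c', 'd']
```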
I never assumed they were arbitrary, it's more that they don't fit the product. This DB architecture fits some use cases, I'm sure, but the decision should be made by an experienced person. In the real world, what happens is that early stage startups get hooked on the whole "fast and scalable prototype" marketing copy, then hire some FE devs that build the entire app on Firestore because they don't know any better. 99 times out of 100, a startup's use case is not going to be a good fit for Firestore, but Firebase has no vested interest in informing the customer about this little detail. I've only worked with Firebase for 1 year and I've now seen this exact situation happen 3-5 times.
1) That’s a JOIN and Firestore is not unique among NoSQL databases for being bad at it.
2) That’s the code you’d use in a web browser to load Firestore data. That code also handles all of the API and Authentication code you’d normally put in front of another DB. It’s serverless. So that is a fully functional snippet, your SQL example needs an API layer and frontend deserialization code.
3) It does not make a new HTTP connection per request. It uses a long-lived gRPC connection.
2: The code does not handle authentication. The code literally just says "Give me this, give me that...".
3: It is still a "a client-server http roundtrip" for every query. It does not open and close the http connection every time. But it sends a "GET / HTTP ..." request for every query. With hostname, accepted formats, encodings etc.
The code does not handle authentication because it does not need to - all data access and auth rules happen within Firestore/Firebase. If a client requests data they do not have access to, they won’t get it.
...and you will show your users stale data, which is a non-starter for a chat app. You will then need to implement some custom push/invalidate mechanism, based on a custom interpretation of the DB event log.
The entire point of the Fire* style of database is that these trade-offs are not worth it in the long run, that few development teams have the skill and time to implement this themselves, and that databases can and should solve this for you.
I have little love for Cloud Firestore, it's a trash fire riddled with poor decisions, but if you don't even understand what the problem is with that SQL query, you don't understand the expectations users have nowadays of front-end applications.
Why would a SQL query show stale data? Inserts and selects are fine, the only thing needed is signaling. You can use Firestore for that, or just a separate thin layer on a much more capable database backend.
> "that few development teams have the skill"
This is the fundamental problem. There's no magic answer to a lack of skill.
Because the way that you will ultimately have to scale something like PostgreSQL for example is going to end up with eventual consistency which Firestore doesn’t have to deal with.
There's no reason for eventual consistency. Firestore is backed by Bigtable and Spanner (both strongly consistent, just like PostgreSQL). There's nothing magic about this: it just provides a pub/sub channel on the watched keys and automatically does a SELECT loop in the background once an INSERT triggers that key.
You can do the same thing simply by selecting on a `chatid` in a table to get the latest messages/inserts. Again, the only thing needed is the pub/sub layer, not an entirely different database.
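The "pub/sub layer plus SELECT" idea in this comment can be sketched in Python with sqlite3. This is a toy built on stated assumptions and is not how Firestore is implemented. Subscribers register interest in a chat id. Every insert publishes that key, and each matching subscriber reruns its SELECT and receives the fresh rows. `ChatFeed` and its methods are hypothetical names.

```python
import sqlite3
from collections import defaultdict
from typing import Callable

Listener = Callable[[list[tuple]], None]


class ChatFeed:
    """Hypothetical thin pub/sub layer over an ordinary SQL table."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE messages (id INTEGER PRIMARY KEY, chat_id INTEGER, body TEXT)"
        )
        # Watched key (chat id) -> callbacks interested in it.
        self.listeners: dict[int, list[Listener]] = defaultdict(list)

    def latest(self, chat_id: int, limit: int = 50) -> list[tuple]:
        return self.conn.execute(
            "SELECT id, body FROM messages WHERE chat_id = ? ORDER BY id DESC LIMIT ?",
            (chat_id, limit),
        ).fetchall()

    def subscribe(self, chat_id: int, listener: Listener) -> None:
        self.listeners[chat_id].append(listener)
        listener(self.latest(chat_id))  # deliver the initial snapshot

    def insert(self, chat_id: int, body: str) -> None:
        with self.conn:  # commit before notifying, so the rerun sees the row
            self.conn.execute(
                "INSERT INTO messages (chat_id, body) VALUES (?, ?)", (chat_id, body)
            )
        # Publish the key; each subscriber simply reruns its SELECT.
        for listener in self.listeners[chat_id]:
            listener(self.latest(chat_id))


feed = ChatFeed()
seen: list[list[tuple]] = []
feed.subscribe(7, seen.append)
feed.insert(7, "hello")  # seen == [[], [(1, 'hello')]]
```

Across several server processes, the in-process listener table would have to become a real message broker or a database notification mechanism, such as PostgreSQL's LISTEN/NOTIFY.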
Ultimately you're kinda just comparing preference for SQL syntax over JS in that case.
The JS example is also async (always adds visual complexity to code calls), whereas the PHP example is blocking. That SQL query doesn't look particularly efficient either...
There are plenty of good, solid criticisms of Firebase & NoSQL, but these aren't it. E.g., your HTTP roundtrip argument has been debunked by sibling commenters, but client-side DB calls are still riddled with adjacent problems, and ultimately adding more complexity on the server side to avoid the need for client-side DB access is often worthwhile.
Most apps I build have a rdbms main data store but I use firestore for things like real time interactions or more recently real time commenting.
It would be a bit more time consuming and challenging to set this up across iOS, android, and web. A lot more time would be spent figuring out the system to make this work well and without issues at scale. On the other hand… It takes like 20 min to set up a real time feature with firestore that works across all platforms. It scales well and works great for what it is.
If people try to use it in place of a rdbms or complex use cases, they’re going to have a bad time. But for those who can utilize their tool belt effectively, it’s awesome.
I definitely would recommend keeping it server side if you don't need any other Firebase features. You don't use Firebase as a normal database. You use it for its realtime feature set.
This is a strawman and no one would actually design a messaging app using a NoSQL db like that. Typically one would employ a strategy to fan-out data such that the message document has everything needed to render a message in the UI.
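Here is a minimal sketch of that fan-out approach, using plain Python dicts as stand-ins for documents. The field names and the `rename_user` helper are hypothetical. Each message embeds the author fields needed to render it, so showing the chat needs no per-message user lookup. The cost moves to writes: renaming a user means rewriting every message that copied the old name.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    users: dict[str, dict] = field(default_factory=dict)
    messages: dict[str, dict] = field(default_factory=dict)

    def post(self, msg_id: str, user_id: str, text: str) -> None:
        user = self.users[user_id]
        # Denormalize: copy what the UI needs into the message document.
        self.messages[msg_id] = {
            "text": text,
            "author_id": user_id,
            "author_name": user["name"],
        }

    def render(self) -> list[str]:
        # One pass over messages; no extra read per message.
        return [f'{m["author_name"]}: {m["text"]}' for m in self.messages.values()]

    def rename_user(self, user_id: str, new_name: str) -> int:
        # The price of fan-out: every copy of the old name must be updated.
        self.users[user_id]["name"] = new_name
        touched = 0
        for m in self.messages.values():
            if m["author_id"] == user_id:
                m["author_name"] = new_name
                touched += 1
        return touched


s = Store(users={"u1": {"name": "alice"}})
s.post("m1", "u1", "hi")
s.rename_user("u1", "alicia")  # returns 1: one message rewritten
```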
Firestore does provide global consistency, so the following quote is incorrect:
> In Cloud Firestore, the data on the client are loaded from the database at different points in time. Even if you listen for realtime updates, results from separate queries will not remain in sync. This creates consistency anomalies and bugs in your app.
I built a fairly involved mobile application that used Firestore and cloud functions. The criticisms of the two made in the article are very fair. I also think there are even more significant ramifications of the issues touched on which result in horrible problems for developers.
I have a lengthy list of complaints about both Firestore and the Firebase flavor of Cloud Functions; however, I will say that the ease of getting started with the Firebase suite is unmatched, in my experience. Compared to any product on AWS or raw GCP, it feels like an actual product with people thinking about their users.
There is also a large community around the firebase products, the main example being Invertase.io which provides amazing open source native clients for firebase.
Regarding Convex specifically, the approach of writing queries server side seems great. The docs aren't clear (to me) about whether the queries only send incremental state changes. I would assume and hope that is the case.
In this bit[1] it seems like the function needs to execute the entire query again, which could become a significant performance issue.
> Later on, if any mutation inserts, updates, or deletes a record that overlaps with the read set, Convex knows it needs to recompute the listMessages query. If the result of listMessages changes, the new value is synced to the client and the component rerenders.
FWIW Firestore is able to send incremental updates and not just rerun the query when data changes. There is a complex system for broadcasting individual documents as the commit happens. I posted a bit about this a long time ago: https://news.ycombinator.com/item?id=26910411
hi! sujay from convex here. I remember reading about your "reverse query engine" when we were getting started last year and really liking that framing of the broadcast problem here.
as james mentions, we entirely re-run the javascript function whenever we detect any of its inputs change. incrementality at this layer would be very difficult, since we're dealing with a general purpose programming language. also, since we fully sandbox and determinize these javascript "queries," the majority of the cost is in accessing the database.
eventually, I'd like to explore "reverse query execution" on the boundary between javascript and the underlying data using an approach like differential dataflow [1]. the materialize folks [2] have made a lot of progress applying it for OLAP and readyset [3] is using similar techniques for OLTP.
Yeah, I like the model where the full function becomes the point of reactivity. It's very different from Firestore (which is trying to sync data to the device so it can run locally). It does allow Firestore to have some better offline behavior by computing queries with cached data locally, but there are benefits (which the article points out) to remote execution too.
I wish y'all the best; I do think this is a super cool product!
I'm not sure what your definition of smart polling is but there is no polling going on here - the query only reruns when the data dependencies change server-side due to a subsequent mutation.
Another key distinction is that "query" here doesn't refer to a database read, it could be a complex function containing multiple reads, relatively-arbitrary compute, etc.
We track the readset for any active subscription. When a new write transaction commits we compare the writeset for the transaction against any active subscriptions. If there's an intersection then it's likely that query will have been invalidated, so we rerun it.
Note that there are false-positives here, i.e., it's possible for the inputs to a query function to change without the outputs actually changing, but this is not a significant issue in practice.
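The invalidation rule described in this comment can be modeled in a few lines of Python. This is a hypothetical sketch, not Convex's code. Reads are tracked as a set of keys per subscription. Each committed write set is intersected with every read set, and any overlapping query is rerun. The usage at the end also shows the false positive mentioned above: a rerun whose result turns out to be unchanged.

```python
from dataclasses import dataclass, field
from typing import Callable


class Tracker:
    """Records which keys a query function reads during one run."""

    def __init__(self, data: dict[str, int]) -> None:
        self.data = data
        self.read_set: set[str] = set()

    def get(self, key: str) -> int:
        self.read_set.add(key)
        return self.data.get(key, 0)


@dataclass
class Subscription:
    query: Callable[[Tracker], int]
    result: int = 0
    read_set: set[str] = field(default_factory=set)
    runs: int = 0


class Engine:
    def __init__(self, data: dict[str, int]) -> None:
        self.data = data
        self.subs: list[Subscription] = []

    def subscribe(self, query: Callable[[Tracker], int]) -> Subscription:
        sub = Subscription(query)
        self._run(sub)
        self.subs.append(sub)
        return sub

    def _run(self, sub: Subscription) -> None:
        tracker = Tracker(self.data)
        sub.result = sub.query(tracker)
        # Refresh the read set on every run: control flow inside the
        # function may read different keys next time.
        sub.read_set = tracker.read_set
        sub.runs += 1

    def commit(self, writes: dict[str, int]) -> None:
        self.data.update(writes)
        write_set = set(writes)
        for sub in self.subs:
            # Rerun only when the write set intersects what the query read.
            if sub.read_set & write_set:
                self._run(sub)


engine = Engine({"a": 1, "b": 2, "c": 5})
total = engine.subscribe(lambda t: t.get("a") + t.get("b"))
engine.commit({"c": 9})           # disjoint from {"a", "b"}: no rerun
engine.commit({"a": 3, "b": 0})   # overlaps: rerun, but 3 + 0 == 1 + 2
```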
I was an early developer at Firebase. I think we made Firebase so easy to use, and spoke so little about the technicals, that the whole software ecosystem now underestimates the complexity involved. I see various Firebase competitors asserting various "mistakes it makes" without really understanding what it delivers, which is understandable, because we never marketed it that way; we spoke only about how it could help you build more easily.
The idea that n queries instead of a join is slow is not as true as you would think. Firestore supports streaming and pipelining at its core, and can reuse cache across operations. At the end of the day, the data goes over a narrow network channel. If you can saturate the channel and don't leave any gaps, what's the performance difference between the data coming from a single query or from many back-to-back ones? The data is transferred to the client either way. Both Firebase databases are pipelined, so this "many round trips" argument doesn't hold up if the client can issue the queries without waiting for responses (such as the code in this article).
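The pipelining argument can be checked with a back-of-the-envelope latency model. The model assumes a fixed round-trip time and a fixed transfer time per response, and it ignores server processing. Sending n independent queries one after another pays the round trip n times. Pipelining them on one connection pays it roughly once.

```python
def sequential_ms(n: int, rtt_ms: float, transfer_ms: float) -> float:
    # Wait for each response before sending the next request.
    return n * (rtt_ms + transfer_ms)


def pipelined_ms(n: int, rtt_ms: float, transfer_ms: float) -> float:
    # Send all n requests back-to-back; responses stream back over the
    # same connection, so only one round trip is on the critical path.
    return rtt_ms + n * transfer_ms


# 50 lookups, 80 ms round trip, 1 ms to transfer each response:
print(sequential_ms(50, 80, 1))  # 4050
print(pipelined_ms(50, 80, 1))   # 130
```

In this model, a single JOIN that returns the same bytes also costs one round trip plus the transfer, i.e. the same 130 ms. That is the commenter's point. The model only holds while the queries are independent. If the client must see one answer before it knows the next request, the round trips stack up again.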
The other is consistency levels and correctness. I constantly see devs call Firebase an eventually consistent database, which is wrong; it's causally consistent [1], and this makes a huge difference when trying to do OLTP. The offline capabilities are built on the consistency primitives, and it's the only way it can work. So while this Convex article is banging on about "End-to-End Correctness Philosophy", they miss the most important quality of correctness, and if they are not careful, they will miss the required engineering and then be unable to deliver an offline cache over real-time streams. I see this playing out with Supabase; I warned them personally before they got into YCombinator that what they were building was not causally consistent. Since then, they have had to rearchitect their real-time features after shipping them. (I have not reviewed their latest design yet, so I have no idea whether they have it right yet.)
Many things sucked about Firebase, such as the bespoke security rules and the lack of views. So Convex is on the money shipping functions on the backend. I think Supabase is shipping competitors' mistakes with its row-level security language. Personally, I think Firebase's mistakes can be fixed with the addition of an open-source Firebase server [1], as the clients are already open source and the mistakes all have to do with just the server. The real tech was always in the clients anyway (offline cache, connection management, operation queues).
It will be interesting to see if building expressly for React is a good idea. Firebase shipped many adapters, like https://github.com/FirebaseExtended/reactfire, using the "thin-waist" principle of not over-fitting. But JavaScript technology moved from callbacks to async while Firebase was in the field, so the current API is no longer idiomatic. And Convex is setting itself up for even more ecosystem fragility: what if React changes its API or falls out of favor? This is a big risk! I hope they can roll with whatever happens!
Thanks for the feedback. We think React is the right community to target right now but the end state for Convex is certainly not limited to React.
The specific concern with regards to multiple round trips is scenarios where the client needs to send multiple serial requests to render a web view, e.g., having to fetch a list of posts first before knowing which post ids to fetch comments for. These request waterfalls can lead to high page load times for many web apps.
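Waterfalls are different from the pipelining case because the requests depend on each other. Here is a minimal Python sketch. The request names and the single fixed round-trip time are assumptions. It computes page-load latency as the longest chain of dependent requests, because a request can only be sent once its inputs are known. The graph is assumed to be acyclic.

```python
from functools import lru_cache


def load_time_ms(deps: dict[str, list[str]], rtt_ms: float) -> float:
    """Latency of a request DAG where each request costs one round trip
    and can only start after all of its dependencies have returned."""

    @lru_cache(maxsize=None)
    def finish(req: str) -> float:
        # Start when the slowest dependency finishes (or at t=0 if none).
        start = max((finish(d) for d in deps[req]), default=0.0)
        return start + rtt_ms

    return max(finish(r) for r in deps)


# Hypothetical page: comment ids are unknown until the posts arrive.
waterfall = {"posts": [], "comments": ["posts"], "avatars": ["comments"]}
independent = {"posts": [], "comments": [], "avatars": []}
print(load_time_ms(waterfall, 100))    # 300.0
print(load_time_ms(independent, 100))  # 100.0
```

Under this model, the client's cost drops to a single round trip only if the dependent reads are moved behind one server-side call.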
Consistency-wise Convex supports full serializability. The main point this article is making with regards to consistency however is providing tools to ensure not only that the database is consistent but that the data rendered to the end user is also a consistent view. I talk a little more about that here: https://youtu.be/B9aeddqwVas?t=446
Btw we are huge fans of Firebase and think it did a ton to advance the industry forward. Thanks for your work on it!
> Consistency-wise Convex supports full serializability.
As discussed elsewhere, both Firebase DBs do too, and they also provide clients with causally consistent snapshots, so you are not actually adding anything over the Firebase offering. Firebase has optimistic updates too and can use client-side persistent storage (I think Convex has just an in-memory cache at the moment).
> e.g., having to fetch a list of posts first before knowing which post ids to fetch comments for.
Yes this does perform badly. Though this type of join is also an IO hog on a relational DB too, just it's not so visible.
Convex queries remind me of RethinkDB's query language (ReQL). RethinkDB also had the Horizon project [0]; looking at today's projects like Convex, Thin [1] and Supabase [2] kinda makes me wonder what the RethinkDB guys could have built.
I love Firebase for what it provides "in the box": auth, a NoSQL database, Cloud Functions, and a ton of added features like real-time updates. My use case is mostly quickly developing personal apps and "side apps" for work that complement our primary applications. It excels at that.
There are a lot of comparisons in here about querying, and those are fair-ish / true-ish, but I do think they're also missing what Firebase is "about", or at least what I've found it useful for.
If you're thinking about weighty server-side business logic and SQL queries... yeah, Firebase isn't built around SQL; it's not built to optimize big business logic and SQL-like queries (granted, there are still ways to do these things).
Firebase isn't a likely replacement for your enterprise CRUD app that carries with it a ton of logic for each and every possible CRUD action. And you do have to stop and think about how you are going to structure your collections, docs and so on.
Having said that I think NOT being the solution for complex CRUD apps is what makes Firebase pretty great.
I think the idea that business logic on the server is kinda wonky on Firebase is generally true. At the same time:
"Once again, with Cloud Firestore you could put this code into Cloud Functions and use them as a business logic layer, but it's messy and requires giving up some of the platform's other features like reactivity and optimistic updates."
I think that statement is deceptively broad. Just using cloud functions doesn't eliminate optimistic updates and so on, the effect is limited to when or how you use them...
Actually I think Convex team are right, synchronous functions or database views would be huge improvement to Firebase.
Firebase Functions are asynchronous and can be reordered relative to DB operations. They would be much more powerful if they were synchronous and on the write path (so you could put auth logic in them instead of being forced to use the bespoke security language).
Firebase functions are at-least-once though, so they are pretty good but could be better.
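With at-least-once delivery, a trigger handler can see the same event more than once, so it generally needs to be idempotent. Here is a minimal Python sketch. It assumes each delivery carries a unique, stable event id; that field is hypothetical here, so check what your trigger payload actually provides. The handler records the ids it has processed and ignores duplicates.

```python
from dataclasses import dataclass, field


@dataclass
class LikeCounter:
    """Hypothetical trigger target: counts likes, tolerating redelivery."""

    likes: int = 0
    processed: set[str] = field(default_factory=set)

    def on_like_created(self, event_id: str) -> None:
        # Assumption: a redelivered event reuses its event_id, so anything
        # already applied is skipped. In a real system the check and the
        # update would have to happen in one transaction to stay correct
        # under concurrent deliveries.
        if event_id in self.processed:
            return
        self.processed.add(event_id)
        self.likes += 1


counter = LikeCounter()
for event_id in ["e1", "e2", "e1"]:  # "e1" is delivered twice
    counter.on_like_created(event_id)
# counter.likes == 2
```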
Absolutely. But if you decide to use Cloud Functions with Firebase you have to give up the reactivity and automatic optimistic updates that Firebase normally provides.
Part of the difficulty of writing about Firebase is that it's actually a whole collection of tools with a host of complex tradeoffs. But that's also a lot of the difficulty in using it as well! Understanding when to use Cloud Firestore vs Realtime Database and when to load data directly vs via Cloud Functions are all complex questions with unclear answers.
To be clear, I think it's great for Firebase to have competition (same feeling as e.g., supabase). But the "you can only load documents" argument doesn't seem technically correct to me.
Except for maybe in the sense that convex is built on functions from day one so the docs and platform will lead you there by default rather than loading the documents which is non-optimal.
But that's a different argument than:
> With Cloud Firestore, the client interacts with its data by loading documents straight from the database.
I would hope that the authors know that you can run Firestore queries on the server just like Convex apparently can, either from Firebase/Google Cloud Functions, from Google Cloud Run, or from any other server.
Author here! I think the comparison between Convex and Supabase is quite similar to Convex vs. Firebase! Supabase also encourages developers to load individual SQL queries from the client, supports edge functions without having a reactivity story for them, etc. Supabase is designed to be a Firebase alternative and appears to be taking most of its high-level approach.
While I like the idea of reactive server side queries, the focus on React and Javascript/Typescript really turns me off.
I use Flutter for my front end, and I simply haven't had any of the trouble they say people are having with Firebase and React. Perhaps the problem is not the Firebase half of that combo.
I do use Typescript for Cloud Functions in Firebase and it IS slower than manipulating documents directly. I'd love having a more responsive mutation path with server-side business logic. But I'd love it more if I didn't have to write it in JS/TS.
I think Supabase is fine. At least you are writing security logic in Postgres instead of Cloud Firestore Authorization Language. Last time I tried the Cloud Firestore (2019) security language, it was by far the most complex part of the app, and the local simulator behavior diverged quite a bit. I expect the Postgres bits of Supabase will be reproducible locally without the Supabase bits.
It's impossible to give a proper recommendation without additional details but Firebase will do just fine. There are tons of YouTube tutorials and Medium posts about how to do that.
[1] https://pmg.csail.mit.edu/papers/vr-revisited.pdf
[2] https://www.wired.com/2016/03/epic-story-dropboxs-exodus-ama...
[3] https://dropbox.tech/infrastructure/rewriting-the-heart-of-o...
[4] https://www.oreilly.com/library/view/velocity-conference-new...