This is a difficult one to answer succinctly, but I'll leave some quick thoughts.
One of the things that made this tricky is that we weren't just replacing some small system with a single API. We fundamentally changed the underlying architecture of the database and built it around an entirely different paradigm for querying: a columnar query engine, with an architecture designed for the cloud and object storage.
So we made a bunch of changes all at once. We didn't start out intending to do that. We wanted to enable some things in the DB like infinite cardinality, tiered data storage, SQL capabilities, and a bunch more. Once we laid all that out, I knew we'd be rewriting the database one way or another.
This was in early 2020. And I figured if we were going to look at some significant rewrite, I'd probably want to do it in Rust. But rewriting your core in a new language is a highly risky endeavor. Honestly, if you can figure out a way to do it iteratively, that's what I'd recommend. A big bang rewrite is the worst possible thing you can do. And it's super stressful.
But... I didn't see a way around that. So we started small, just me and one other person, around March of 2020. Then we added another team member in May (hey Andrew). The three of us spent the next 6 months treating it as a kind of research project. We evaluated building it around existing database engines (like DuckDB and ClickHouse) and looked at what tools we'd want to use.
By August of 2020 we'd settled on building it in Rust with Apache Arrow, Apache DataFusion, and Parquet as the persistence format. I announced this crazy plan in November of 2020 at our online conference and said we were hiring.
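In case it helps make that stack concrete, here's a minimal sketch of what it looks like from the Rust side: DataFusion as the SQL engine, reading Parquet into Arrow record batches. The table name, file, and query are made up for illustration; this is not our actual code.

```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // Register a Parquet file as a queryable table; DataFusion reads it
    // into Arrow record batches under the hood.
    let ctx = SessionContext::new();
    ctx.register_parquet("cpu", "cpu.parquet", ParquetReadOptions::default())
        .await?;

    // Run SQL directly against the columnar data (hypothetical schema).
    let df = ctx
        .sql("SELECT host, avg(usage) FROM cpu GROUP BY host")
        .await?;
    df.show().await?;
    Ok(())
}
```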
Over the first 3 months of 2021 we formed a team of 9 people around it. Everyone else in the company stayed focused on the existing product, so the majority of our engineering effort was elsewhere. I think this was critical. Actually, it was quite difficult to have 9 people that early in the project. We hadn't originally planned to scale up that quickly, but we had such a flood of great people interested in joining (new hires and internal transfers) that we decided to go for it.
Over the next few years we kept this small group working on the new DB while everyone else was working on previous versions of the product. In mid-2022 we were far enough along to bring up the database alongside one of our production environments and start mirroring workloads onto the new DB. This was critical over the following 6 months or so.
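For anyone wondering what mirroring workloads means mechanically, here's a rough sketch of the shape of it. The client types and write API here are hypothetical (our actual setup was more involved); the point is that the old system stays authoritative while the same writes get replayed against the new engine off the hot path, so you can compare behavior without risking production.

```rust
use std::sync::Arc;

// Hypothetical clients for the existing engine and the new one. These
// just stand in for whatever write API both systems expose.
struct OldDb;
struct NewDb;

#[derive(Clone)]
struct WriteBatch(Vec<u8>);

impl OldDb {
    async fn write(&self, _batch: &WriteBatch) -> Result<(), String> {
        Ok(()) // pretend this hits the production engine
    }
}

impl NewDb {
    async fn write(&self, _batch: &WriteBatch) -> Result<(), String> {
        Ok(()) // pretend this hits the new engine
    }
}

// The old system stays authoritative: its result is what the client sees.
// The same batch is mirrored to the new engine in a background task, and
// mirror failures are logged for comparison rather than surfaced to callers.
async fn handle_write(
    old: Arc<OldDb>,
    new: Arc<NewDb>,
    batch: WriteBatch,
) -> Result<(), String> {
    let result = old.write(&batch).await;
    tokio::spawn(async move {
        if let Err(e) = new.write(&batch).await {
            eprintln!("mirror write to new engine failed: {e}");
        }
    });
    result
}
```

Run something like this long enough, across real production traffic, and you build up the confidence to eventually cut over.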
We started getting more people from the engineering team looped into the effort in the 4 months leading up to the first launch.
Starting with a small team and scaling up as you get farther along is critical, I think.
There's so much more I could probably write about this, but I'll leave it at this for now :)