InfluxDB 3.0 System Architecture

mbell · on June 27, 2023

We used InfluxDB back in the 0.8/0.9 days and it worked really well, scaled nicely with the large number of metrics we were storing.

The switch to a tag based architecture in 1.0 completely broke the database for our use case, it could no longer handle large metric cardinality. Things improved a bit around 1.2, but never got back to something usable for us.

We ultimately moved to using clickhouse for time series data and haven't had to think about it since.

Where is influx at now? Can they handle millions of metrics again? What would bring us back?

pauldix · on June 27, 2023

InfluxDB 3.0 is built around a columnar query engine (Apache DataFusion) with data stored in Parquet files in object storage. Eliminating cardinality concerns was one of the top drivers for creating 3.0. I mention some of the other big things we wanted to achieve in some other comments in this HN thread.

InfluxDB 3.0 is optimized for ingestion performance and data compression, paired with a fast columnar query engine. So we can ingest with fewer CPUs, less RAM and reduce storage cost because it's all compressed and put into object store. And we support SQL now (in addition to InfluxQL) with fast analytic queries.

We don't have open source releases yet (that's for later this year), but we have it available in the cloud as a multi-tenant product or dedicated clusters.

zX41ZdbW · on June 28, 2023

It sounds like reimplementing ClickHouse, but worse. Sorry for pointing this out, but isn't it?

cannonpalms · on June 28, 2023

It actually sports much better ingest performance versus ClickHouse, and query performance is close and improving.

zX41ZdbW · on June 29, 2023

This is a bold claim.

ilyt · on June 27, 2023

We were stuck on 1.x for a long time. Downsampling seemed to be eternally broken (or rather not performant enough) regardless of versions so we wrote our own downsampler doing it on ingestion (in riemann.io).

And as world seemed to converge on Prometheus/prometheus-compatible interfaces we will probably eventually migrate to VictoriaMetrics or something else "talking prometheus"

InfluxQL was shit. Flux looks far more complex for 90%+ things we use PromQL for now so it is another disadvantage. I'm sure it's cool for data science but all we need to do is to turn some things to rate and do some basic math or stats on it.

> Can they handle millions of metrics again? What would bring us back?

we had one instance with ~25 mil distinct series eating around 26 GB RAM. I'd suggest looking into VictoriaMetrics. Mimir is a bit more complicated to run and seems to require far more hardware for similar performance, but has distinction (whether that's advantage or not, eh...) of using object store instead of plain old disk which makes HA a bit easier.

ithkuil · on June 27, 2023

Yes, influxdb 3.0 uses a new columnar store engine (IOx) that offers "unbounded cardinality". See more at https://www.influxdata.com/blog/intro-influxdb-iox/

(Disclaimer: I work at InfluxData)

Shish2k · on June 27, 2023

Separate ingest / compaction / query sounds pretty useful - literally just this morning I needed to learn about docker memory limits, because somehow querying 10 days of data OOM’ed my 32GB server… (And my entire database, holding a little over a year of data, is only 33GB on disk o_O)

(Now it’s running in a container, and the container crashes and restarts whenever somebody opens a medium-sized dashboard, causing a few seconds of lost metrics - so being able to OOM-kill the query-daemon while leaving the ingest-daemon running sounds like a step forwards :) )

valyala · on June 28, 2023

Separating the storage layer from data ingestion and query layer is very smart idea, which is used behind VictoriaMetrics cluster architecture [1]. It makes the cluster more reliable when either data ingestion or query side has temporary issues. This architecture also allows independent scaling of data ingestion, query and storage layers.

[1] https://docs.victoriametrics.com/Cluster-VictoriaMetrics.htm...

louwrentius · on June 27, 2023

What is the future of the Flux query language now that SQL support is added? I've observed that on Grafana.com, most InfluxDB dashboards being shared are all InfluxQL based, and Flux-based dashboards are virtually non-existant.

Personally, if we use InfluxDB in combination with Grafana, the InfluxQL language was quite easy to use in terms of discoverability. With Flux, there's quite a steep learning curve. And mostly clicking something together in the Influx Query builder and copy/pasting it in Grafana, not a fan, although it does work fine.

pauldix · on June 27, 2023

For new users we are suggesting they use either InfluxQL or SQL. Both are supported natively in 3.0 and we'll continue to support them. We were able to bring InfluxQL support because of its similarity to SQL. We were able to build an InfluxQL parser in Rust and have that converted into DataFusion query plans (the SQL engine we use).

Flux is an entire language and runtime so we weren't able to implement it natively in Rust. We'll continue to support it for our customers, but the path forward long term is InfluxQL and SQL.

We also submitted a FlightSQL plugin to Grafana that works with InfluxDB 3.0. So either the InfluxDB 1.x plugin using InfluxQL or the FlightSQL plugin work.

louwrentius · on June 28, 2023

Thank you for sharing, very informative!

peterdekr · on June 27, 2023

Is there any news on the self hosted opensource version of 3.0? Would love to try the Rust based engine and test it's performance.

pauldix · on June 27, 2023

We'll be releasing an alpha of InfluxDB 3.0 open source later this year. I'm actually personally on the hook for this one while our engineering team is focused on our cloud and on-premise commercial offerings.

So it's planned, but it's going to take a little time.

ericb · on June 27, 2023

What's left to do for the release? Also, the alpha aspect seems surprising--isn't influx running this in production now?

pauldix · on June 28, 2023

The open source build will be using the libraries of what we have in our production system, but it'll be a single server release, which isn't something we're running in production. So while many components are things we've been running production for a while, the server process as a whole will be new.

ilyt · on June 27, 2023

Interesting that they come to pretty much same infrastructure as victoriametrics and mimir

https://docs.victoriametrics.com/Cluster-VictoriaMetrics.htm...

https://grafana.com/docs/mimir/latest/get-started/about-graf...

pauldix · on June 27, 2023

Hello, one of the authors here, happy to answer any questions!

simonz05 · on June 27, 2023

Paul, impressive work on the rewrite.

Can you shed light on the biggest challenges and steps you (and the team) took to overcome them to succeed with the rewrite? We often hear about how major rewrites can fail or be massively delayed, but you seem to have succeeded.

pauldix · on June 27, 2023

This is a difficult one to answer succinctly, but I'll leave some quick thoughts.

One of the things that made this tricky is that we weren't just replacing some small system with a single API. We fundamentally changed the underlying architecture of the database and built it around an entirely different paradigm for querying. This is the result of building it around a columnar query engine with a database architecture designed for the cloud and object storage.

So we made a bunch of changes all at once. We didn't start out this way. We wanted to enable some things in the DB like infinite cardinality, tiered data storage, SQL capabilities and a bunch more. When we saw all that, I knew we'd be rewriting the database one way or another.

This was in early 2020. And I figured if we were going to look at some significant rewrite, I'd probably want to do it in Rust. But rewriting your core in a new language is a highly risky endeavor. Honestly, if you can figure out a way to do it iteratively, that's what I'd recommend. A big bang rewrite is the worst possible thing you can do. And it's super stressful.

But... I didn't see a way around that. So we started small with me and one other person working on it starting around March of 2020. Then we added another team member in May (hey Andrew). The three of us spend the next 6 months treating it as a kind of research project. We evaluated building it around existing database engines (like DuckDB and Clickhouse) and looked at what tools we'd want to use.

By August of 2020 we'd settled on building it in Rust with Apache Arrow, Apache DataFusion, and Parquet as the persistence format. I announced this crazy plan in November of 2020 at our online conference and said we were hiring.

Over the first 3 months of 2021 we formed a team around it of 9 people. Everyone else in the company was still focused on everything else we were doing. So the majority of our engineering efforts were focused elsewhere. I think this was critical. Actually, it was quite difficult to have 9 people this early in the project. We hadn't originally planned to scale up that quickly, but we had a flood of great people interested in joining the project (new hires and internal transfers) that we decided to go for it.

Over the next few years we kept this small group working on the new DB while everyone else was working on previous versions of the product. In mid-2022 we were far enough along to bring up the database alongside one of our production environments and start mirroring workloads onto the new DB. This was critical over the following 6 months or so.

We started getting more people from the engineering team looped into the effort in the 4 months leading up to the first launch.

Starting with a small team and scaling up as you get farther along is critical, I think.

There's so much more I could probably write about this, but I'll leave it at this for now :)

capital_guy · on June 27, 2023

Thanks for stopping in. I've been seeing a lot of InfluxDB 3.0 content in the past few days. It would be helpful for me at least to see more comparison between 2.x and 3.0? Not sure if there is a changelog or list of things that were added/deleted/are now incompatible between versions. Cheers

pauldix · on June 27, 2023

The differences in 2.x and 3.x are quite significant. The 3.0 database was a ground up rewrite in a new language (v1 and v2 were in Go, v3 is in Rust).

InfluxDB v2 was all about the Flux language and a much broader set of API capabilities along with an integrated UI.

For 3.0 we focused on the core database technology. We were able to bring the 1.x APIs forward, which means 3.0 supports both InfluxQL and SQL natively. We were only able to add Flux support through a separate process and a lower level gRPC API that the two use to communicate.

The underlying database architecture is also completely different. v1 and v2 are essentially an inverted index paired with a time series store. v3 organizes data into larger Parquet files and pairs that with a columnar query engine (Apache DataFusion) to execute fast queries against it.

Me or someone on our team should probably write a detailed post about the underlying database architecture to highlight the differences between the versions.

We built 3.0 mainly to accomplish some things that we were unable to deliver in v1 or v2: * Unlimited cardinality * Tiered data storage * Fast analytic queries * SQL compatability * Bulk data import and export (coming soon to v3)

Then there are the systems architecture changes we made highlighted in this blog post. v1 InfluxDB was a monolithic database that had all these components in one. The v3 design allows us to scale ingest, query, and compaction separately, which is something that kept coming up in larger scale use cases.

whinvik · on June 27, 2023

This is very impressive. Did you use a library for Arrow or did you build your own? We have sometimes faced issues when using the PyArrow library.

pauldix · on June 28, 2023

It's built around the arrow-rs library, which we've contributed to significantly: https://github.com/apache/arrow-rs