
Generally Parquet files are combined in an LSM style, compacting smaller files into larger ones. Parquet isn't really meant for the "journal" of level-0, append-one-record-at-a-time style storage; it's meant for the levels that follow.
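
For illustration, a minimal compaction pass with pyarrow (directory names and row group size are made up for the example; a real compactor would stream batches instead of loading everything into memory):

    import pyarrow.dataset as ds
    import pyarrow.parquet as pq

    # Read a pile of small level-0 files as one logical dataset...
    small = ds.dataset("level0/", format="parquet")

    # ...and rewrite them as a single larger file with bigger row groups.
    pq.write_table(small.to_table(), "level1/compacted.parquet",
                   row_group_size=1_000_000)

    # The small inputs can be deleted once the new file is durable.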

So Feather for journaling and Parquet for long-term processing?

You basically can't do row-by-row appends to any columnar format stored in a single file. You could kludge around it by allocating arenas inside the file, but that still causes huge write amplification: instead of writing a row in a single block, you'd have to write a block per column.

You can do row-by-row appends to a Feather file (Arrow IPC — the naming is confusing). It works fine. The main problem is that the per-append overhead is kind of silly — it costs over 300 bytes (IIRC) per append.
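
Roughly what that looks like with pyarrow (just a sketch; every appended row becomes its own record batch, and the batch framing is where the per-append overhead comes from):

    import pyarrow as pa

    schema = pa.schema([("ts", pa.int64()), ("msg", pa.string())])

    with pa.OSFile("journal.arrows", "wb") as sink:
        with pa.ipc.new_stream(sink, schema) as writer:
            # One tiny record batch per appended row.
            batch = pa.record_batch(
                [pa.array([1], type=pa.int64()), pa.array(["hello"])],
                schema=schema,
            )
            writer.write_batch(batch)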

I wish there was an industry standard format, schema-compatible with Parquet, that was actually optimized for this use case.


Creating a new record batch for a single row is also a huge kludge leading to a lot of write amplification. At that point, you're better off storing rows than pretending it's columnar.

I actually wrote a row storage format reusing Arrow data types (not Feather), just laying them out row-wise instead of columnar. Validity bits of the different columns are collected into a shared per-row bitmap, and fixed offsets within a record allow extracting any field in a zero-copy fashion. I store those in RocksDB, for now.

https://git.kantodb.com/kantodb/kantodb/src/branch/main/crat...

https://git.kantodb.com/kantodb/kantodb/src/branch/main/crat...

https://git.kantodb.com/kantodb/kantodb/src/branch/main/crat...
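
(Not the actual KantoDB encoding; just a toy sketch of the general shape: one shared validity bitmap up front, then fixed-width fields at offsets known from the schema, so any field can be read without decoding the rest of the row.)

    import struct

    # Toy row: 1-byte validity bitmap, then an int64 and a float64 column.
    ROW = struct.Struct("<Bqd")

    def encode(col0, col1):
        validity = (col0 is not None) | ((col1 is not None) << 1)
        return ROW.pack(validity, col0 or 0, col1 or 0.0)

    def read_col1(row: bytes):
        # Jump straight to col1's fixed offset; col0 is never decoded.
        if not (row[0] >> 1) & 1:
            return None
        (value,) = struct.unpack_from("<d", row, ROW.size - 8)
        return value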


> Creating a new record batch for a single row is also a huge kludge leading to a lot of write amplification.

Sure, except insofar as I didn’t want to pretend to be columnar. There just doesn’t seem to be anything out there that meets my (experimental) needs better. I wanted to stream out rows, event sourcing style, and snarf them up in batches into Parquet in a separate process. Using Feather like it’s a row store can do this.

> kantodb

Neat project. I would seriously consider using that in a project of mine, especially now that LLMs can help out with the exceedingly tedious parts. (The current stack is regrettable, but a prompt like “keep exactly the same queries but change the API from X to Y” is well within current capabilities.)


Frankly, RocksDB, SQLite or Postgres would be easy choices for that. (Fast) durable writes are actually a nasty problem with lots of little details to get just right, or you end up with corrupted data on restart. For example, blocks may be written out of order, so on a crash you may end up storing 1 2 <old_data> 4, and if you trust all content seen in the file, or even a footer in block 4, you're screwed.
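
The usual defense is to make every record self-validating and stop replay at the first record that doesn't check out, instead of trusting a footer. A rough sketch (not any particular library's format):

    import os, struct, zlib

    def append(wal, payload: bytes):
        # Length and CRC up front; the record only counts if both match.
        wal.write(struct.pack("<II", len(payload), zlib.crc32(payload)))
        wal.write(payload)
        wal.flush()
        os.fsync(wal.fileno())  # durable before acknowledging the write

    def replay(wal):
        while True:
            header = wal.read(8)
            if len(header) < 8:
                return  # clean end of log
            length, crc = struct.unpack("<II", header)
            payload = wal.read(length)
            if len(payload) < length or zlib.crc32(payload) != crc:
                return  # torn or out-of-order tail: ignore it
            yield payload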

Speaking as a Rustafarian, there are some libraries out there that "just" implement a WAL, which is all you need, but they're nowhere near as battle-tested as the above.

Also, if KantoDB is not compatible with Postgres in something that isn't utterly stupid, it's automatically considered a bug or a missing feature (but I have plenty of those!). I refuse to do bug-for-bug compatibility, and there's some stuff that's just better left unimplemented in this millennium, but the intent is to make it I Can't Believe It's Not Postgres, and to run integration tests against actual everyday software.

Also, definitely don't use KantoDB for anything real yet. It's very early days.


> Frankly, RocksDB, SQLite or Postgres would be easy choices for that. (Fast) durable writes are actually a nasty problem with lots of little details to get just right, or you end up with corrupted data on restart. For example, blocks may be written out of order, so on a crash you may end up storing 1 2 <old_data> 4, and if you trust all content seen in the file, or even a footer in block 4, you're screwed.

I have a WAL that works nicely. It surely has some issues on a crash if blocks are written out of order, but this doesn’t matter for my use case.

But none of those other choices actually do what I wanted without quite a bit of pain. First, unless I wire up some kind of CDC system or add extra schema complexity, I can stream in but I can’t stream out. But a byte or record stream streams natively. Second, I kind of like the Parquet schema system, and I wanted something compatible. (This was all an experiment. The production version is just a plain database. Insert is INSERT and queries go straight to the database. Performance and disk space management are not amazing, but it works.)

P.S. The KantoDB website says “I’ve wanted to … have meaningful tests that don’t have multi-gigabyte dependencies and runtime assumptions“. I have a very nice system using a ~100 line Python script that fires up a MySQL database using the distro mysqld, backed by a Unix socket, requiring zero setup or other complication. It’s mildly offensive that it takes mysqld multiple seconds to do this, but it works. I can run a whole bunch of copies in parallel, in the same Python process even, for a nice, parallelized reproducible testing environment. Every now and then I get in a small fight with AppArmor, but I invariably win the fight quickly without requiring any changes that need any privileges. This all predates Docker, too :). I’m sure I could rig up some snapshot system to get startup time down, but that would defeat some of the simplicity of the scheme.
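
Not that script, but the general recipe is roughly this (stock mysqld flags; paths made up):

    import os, subprocess, tempfile, time

    datadir = tempfile.mkdtemp()
    socket = os.path.join(datadir, "mysql.sock")

    # Initialize a throwaway data directory with no root password.
    subprocess.run(["mysqld", "--no-defaults", "--initialize-insecure",
                    f"--datadir={datadir}"], check=True)

    # Run the server on a Unix socket only: no TCP port, no privileges needed.
    server = subprocess.Popen(["mysqld", "--no-defaults", f"--datadir={datadir}",
                               f"--socket={socket}", "--skip-networking"])

    # Wait for the socket to appear, then point the tests at it.
    while not os.path.exists(socket):
        time.sleep(0.1)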


Agreed.

There is room still for an open source HTAP storage format to be designed and built. :-)


I still don't understand what happened to using Apache Avro [1] for row-oriented fast write use cases.

I think by now a lot of people know you can write to Avro and compact to Parquet, and that is a key area of development. I'm not sure of a great solution yet.
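
The write-rows-to-Avro, compact-to-Parquet flow looks roughly like this (a sketch using fastavro and pyarrow; the schema and file names are invented):

    import fastavro
    import pyarrow as pa
    import pyarrow.parquet as pq

    schema = {"type": "record", "name": "Event",
              "fields": [{"name": "ts", "type": "long"},
                         {"name": "msg", "type": "string"}]}

    # Row-oriented and append-friendly: write events as they arrive.
    with open("events.avro", "wb") as out:
        fastavro.writer(out, schema, [{"ts": 1, "msg": "hello"}])

    # Later, compact a batch of Avro files into a columnar Parquet file.
    with open("events.avro", "rb") as f:
        rows = list(fastavro.reader(f))
    pq.write_table(pa.Table.from_pylist(rows), "events.parquet")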

Apache Iceberg tables can sit on top of Avro files as one of the storage engines/formats, in addition to Parquet or even the old ORC format.

Apache Hudi [2] was looking into HTAP capabilities - writing to a row store, then compacting or merging on read into a column store in the background, so you can get the best of both worlds. I don't know where they've ended up.

[1] https://avro.apache.org/

[2] https://hudi.apache.org/


You can still gain a lot of performance by doing less I/O.

You do realize that Google Maps prioritizes what's displayed on the map based on corporate relationships & money exchanged, right?

You are presented with the SEO crap, but you make your own decisions.

It's the difference between buying the top sponsored result on an online marketplace vs. reading reviews and deciding between products.


I see none of the instructions there.

Technically, not "not model-able" but "not modeled". As in, the effort was not made, and it is easy to omit. And doing it in the general case is a lot of work, hence the expand-contract and only-two-versions designs.
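
For concreteness, a generic expand-contract sequence (not from this thread; `db` is a hypothetical connection object). Old and new code run side by side during the rollout, so every step has to work for both:

    # 1. Expand: add the new column as nullable; old code ignores it.
    db.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

    # 2. Deploy code that writes both columns but still reads the old one.

    # 3. Backfill existing rows.
    db.execute("UPDATE users SET display_name = full_name"
               " WHERE display_name IS NULL")

    # 4. Deploy code that reads the new column.

    # 5. Contract: once no running version touches the old column, drop it.
    db.execute("ALTER TABLE users DROP COLUMN full_name")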

> and time-skewed deployments.

Yeah, those pesky laws of physics, getting in the way of purity.

You simply cannot deploy simultaneously to an active fleet of servers.


> Your quote, for instance, is used in the context of reading old logs using a typed schema if that schema changes. But that's a non-issue in the FP languages mentioned above since they tend towards the use of unstructured data (maps, lists) and lambdas

The application that assumes that key "foo" is in the map, and crashes if it's not, is just as brittle[1] as the one that assumes that the data deserializes into a struct with the field foo present.

[1]: In practice, it's more brittle because the crashes are more unexpected and at runtime. Javascript has a legacy of this...
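
A tiny illustration of the symmetry (hypothetical field names):

    from dataclasses import dataclass

    payload = {"user": "alice"}  # "foo" never arrived

    # Unstructured version: the missing key is only discovered at the use
    # site, possibly deep in the program, at runtime.
    payload["foo"]  # KeyError

    # Typed version: the same assumption, checked once at the boundary.
    @dataclass
    class Event:
        user: str
        foo: int

    Event(**payload)  # TypeError: missing required argument 'foo'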


This infographic looks like it's lying:

https://infobeautiful4.s3.amazonaws.com/2022/03/IIB-Ukraine3...

from https://informationisbeautiful.net/visualizations/ukraine-ru...

Afghanistan gets 15 shapes for 15,000 dead, Russia gets 1 for 9,750,000 dead? Actually including Russian deaths would make that look very different.


The US has a very strong belief in punishing people. It helps them create an "out group" to shun. For those people, the worse the conditions of your jail are, the better. It's some sort of a relic of the specific religious background common in the USA, and it's disgusting.

Other parts of the world believe in human dignity and helping people fix the things that are broken in their lives. Look up Norwegian prisons...

