
I'd be interested in the write amplification, since Redpanda went pretty low-level in the IO layer. How do you guarantee atomic writes when virtually no disk provides guarantees beyond the page level? A failed write to a page could, at least in theory, destroy data already written on that page, so one has to resort to writing data multiple times.



The write is a two-stage process, which works quite well for the Kafka layer since it's all batched. Let me explain.

1. First, we have a core-local, shared chunk cache with DMA-aligned buffers: https://github.com/vectorizedio/redpanda/blob/dev/src/v/stor...

2. Second, we eagerly dispatch IO blocks, with manual accounting of each block's offset within the DMA section: https://github.com/vectorizedio/redpanda/blob/dev/src/v/stor...

3. Third, we adaptively fallocate space ahead of writes to prevent metadata contention on the file handle itself.

4. Fourth, we issue an fdatasync(), since we care about the data being safe. (We guard against data corruption with checksums, etc.; it's too long to type here, but I can expand in a blog post if there's interest. A toy recovery sketch follows below.)

5. Fifth, imagine a big write, say 1MB for simplicity. It gets broken up into 16-128KB DMA writes, and the last step is an fdatasync for acks=-1 (Raft needs this). A rough sketch of this flow is right below.
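To make steps 3-5 concrete, here is a blocking POSIX approximation. This is a sketch under assumptions, not Redpanda's actual code: the real path goes through Seastar's asynchronous DMA file API, and the function name, chunk size, and alignment below are all invented for illustration.

```cpp
// Illustrative only: a blocking POSIX approximation of steps 3-5
// (preallocate, chunked aligned writes, fdatasync). Assumes Linux
// and a file opened with O_WRONLY | O_DIRECT.
#include <fcntl.h>     // fallocate (Linux), O_DIRECT
#include <unistd.h>    // pwrite, fdatasync
#include <algorithm>   // std::min
#include <cstdlib>     // posix_memalign, free
#include <cstring>     // memcpy, memset

constexpr size_t kAlign = 4096;        // typical O_DIRECT alignment
constexpr size_t kChunk = 128 * 1024;  // upper end of the 16-128KB range

// Append len bytes at file_off, in aligned DMA-sized chunks, then
// fdatasync so the data is durable before acking acks=-1.
bool append_durable(int fd, off_t file_off, const char* data, size_t len) {
    // Step 3: preallocate so each write doesn't mutate file metadata.
    if (fallocate(fd, 0, file_off, static_cast<off_t>(len)) != 0) {
        return false;
    }
    void* raw = nullptr;
    if (posix_memalign(&raw, kAlign, kChunk) != 0) {
        return false;
    }
    char* buf = static_cast<char*>(raw);
    // Step 5: break the big write into aligned chunks.
    for (size_t done = 0; done < len;) {
        size_t n = std::min(kChunk, len - done);
        std::memcpy(buf, data + done, n);
        // O_DIRECT needs block-aligned sizes; pad the tail with zeros.
        // A real log tracks the true logical length separately.
        size_t padded = (n + kAlign - 1) / kAlign * kAlign;
        std::memset(buf + n, 0, padded - n);
        if (pwrite(fd, buf, padded, file_off + static_cast<off_t>(done)) < 0) {
            free(buf);
            return false;
        }
        done += n;
    }
    free(buf);
    // Step 4: fdatasync makes the data durable without forcing all
    // file metadata out, which is cheaper than a full fsync.
    return fdatasync(fd) == 0;
}
```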

There are nuances between each step, but hopefully this gives you a hint of how it's done.
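On the parent's torn-write question specifically: the usual pattern (which the checksum aside in step 4 hints at) is not to prevent a torn page but to detect it during recovery and truncate the log back to the last valid batch; Raft re-replication then restores anything lost, so you never have to write the data twice up front. A toy sketch, with an invented header layout and zlib's crc32 standing in for whatever checksum the real on-disk format uses:

```cpp
// Toy recovery scan: every batch carries a length + CRC header, and on
// startup we keep only the prefix of the log whose checksums verify.
// The header layout is invented for illustration; link with -lz.
#include <zlib.h>   // crc32
#include <cstdint>
#include <cstring>
#include <vector>

struct BatchHeader {
    uint32_t size;  // payload bytes following the header
    uint32_t crc;   // checksum of those payload bytes
};

// Returns how many bytes of `log` are provably intact; a torn page in
// the tail fails its CRC, and everything from there on is discarded.
size_t valid_prefix(const std::vector<unsigned char>& log) {
    size_t off = 0;
    while (off + sizeof(BatchHeader) <= log.size()) {
        BatchHeader h;
        std::memcpy(&h, log.data() + off, sizeof h);
        size_t end = off + sizeof h + h.size;
        if (end > log.size()) break;  // partial batch at the tail
        uint32_t sum = static_cast<uint32_t>(
            crc32(0L, log.data() + off + sizeof h, h.size));
        if (sum != h.crc) break;      // torn/corrupted batch
        off = end;
    }
    return off;
}
```

Because a failed write can only damage the unacknowledged tail of the log, truncating at the first bad checksum is safe.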


Thanks for your explanation. I'll dive into the source to check it out in a bit more detail, but your reply gave me a good overview to get started :)


Cool. If you want help hacking on it, there is a community Slack at vectorized.io/slack, or GitHub Discussions works too.



