
I'd be interested in the write amplification, since Redpanda went pretty low-level in the IO layer. How do you guarantee atomic writes when virtually no disk provides guarantees beyond the page level? A failed write to a page could, at least in theory, destroy data already written on that page, so one has to resort to writing data multiple times.



The write is a two-stage process, which works quite well for the Kafka layer since it's all batched. Let me explain.

1. First, we have a core-local, shared chunk cache with DMA-aligned buffers: https://github.com/vectorizedio/redpanda/blob/dev/src/v/stor...

2. Second, we eagerly dispatch IO blocks, with manual accounting of each block's offset within the DMA section: https://github.com/vectorizedio/redpanda/blob/dev/src/v/stor...

3. Third, we adaptively fallocate space ahead of writes to prevent metadata contention on the file handle itself.

4. Fourth, we issue an fdatasync(), since we care about the data being safe. (We guard against data corruption with checksums, etc.; it's too long to type here, but I can expand in a blog post if there's interest. A toy recovery sketch follows below.)

5. Fifth, imagine a big write, say 1MB for simplicity. It gets broken up into 16-128KB DMA writes, and the last step is an fdatasync for acks=-1 (Raft needs this). A rough sketch of this flow is right below.
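To make steps 3-5 concrete, here is a blocking POSIX approximation. This is a sketch under assumptions, not Redpanda's actual code: the real path goes through Seastar's asynchronous DMA file API, and the function name, chunk size, and alignment below are all invented for illustration.

```cpp
// Illustrative only: a blocking POSIX approximation of steps 3-5
// (preallocate, chunked aligned writes, fdatasync). Assumes Linux
// and a file opened with O_WRONLY | O_DIRECT.
#include <fcntl.h>     // fallocate (Linux), O_DIRECT
#include <unistd.h>    // pwrite, fdatasync
#include <algorithm>   // std::min
#include <cstdlib>     // posix_memalign, free
#include <cstring>     // memcpy, memset

constexpr size_t kAlign = 4096;        // typical O_DIRECT alignment
constexpr size_t kChunk = 128 * 1024;  // upper end of the 16-128KB range

// Append len bytes at file_off, in aligned DMA-sized chunks, then
// fdatasync so the data is durable before acking acks=-1.
bool append_durable(int fd, off_t file_off, const char* data, size_t len) {
    // Step 3: preallocate so each write doesn't mutate file metadata.
    if (fallocate(fd, 0, file_off, static_cast<off_t>(len)) != 0) {
        return false;
    }
    void* raw = nullptr;
    if (posix_memalign(&raw, kAlign, kChunk) != 0) {
        return false;
    }
    char* buf = static_cast<char*>(raw);
    // Step 5: break the big write into aligned chunks.
    for (size_t done = 0; done < len;) {
        size_t n = std::min(kChunk, len - done);
        std::memcpy(buf, data + done, n);
        // O_DIRECT needs block-aligned sizes; pad the tail with zeros.
        // A real log tracks the true logical length separately.
        size_t padded = (n + kAlign - 1) / kAlign * kAlign;
        std::memset(buf + n, 0, padded - n);
        if (pwrite(fd, buf, padded, file_off + static_cast<off_t>(done)) < 0) {
            free(buf);
            return false;
        }
        done += n;
    }
    free(buf);
    // Step 4: fdatasync makes the data durable without forcing all
    // file metadata out, which is cheaper than a full fsync.
    return fdatasync(fd) == 0;
}
```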

There are nuances between each step, but hopefully this gives you a hint of how it's done.
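On the parent's torn-write question specifically: the usual pattern (which the checksum aside in step 4 hints at) is not to prevent a torn page but to detect it during recovery and truncate the log back to the last valid batch; Raft re-replication then restores anything lost, so you never have to write the data twice up front. A toy sketch, with an invented header layout and zlib's crc32 standing in for whatever checksum the real on-disk format uses:

```cpp
// Toy recovery scan: every batch carries a length + CRC header, and on
// startup we keep only the prefix of the log whose checksums verify.
// The header layout is invented for illustration; link with -lz.
#include <zlib.h>   // crc32
#include <cstdint>
#include <cstring>
#include <vector>

struct BatchHeader {
    uint32_t size;  // payload bytes following the header
    uint32_t crc;   // checksum of those payload bytes
};

// Returns how many bytes of `log` are provably intact; a torn page in
// the tail fails its CRC, and everything from there on is discarded.
size_t valid_prefix(const std::vector<unsigned char>& log) {
    size_t off = 0;
    while (off + sizeof(BatchHeader) <= log.size()) {
        BatchHeader h;
        std::memcpy(&h, log.data() + off, sizeof h);
        size_t end = off + sizeof h + h.size;
        if (end > log.size()) break;  // partial batch at the tail
        uint32_t sum = static_cast<uint32_t>(
            crc32(0L, log.data() + off + sizeof h, h.size));
        if (sum != h.crc) break;      // torn/corrupted batch
        off = end;
    }
    return off;
}
```

Because a failed write can only damage the unacknowledged tail of the log, truncating at the first bad checksum is safe.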


Thanks for your explanation. I'll dive into the source to check it out in a bit more detail, but your reply gave me a good overview to get started :)


Cool. If you want help hacking on it, there is a community Slack at vectorized.io/slack, or GitHub Discussions works too.



