
I haven't run into weird Kafka data loss issues like you describe - although, I will note, a lot of applications don't actually have much testing to notice something like 1 in 10k messages being dropped if it was happening.[0]

But when I've done that testing, Kafka hasn't been the problem.

The problem I've run into most is that ordering is a giant fucking pain in the ass if you actually want consistent replayability and don't have trivial partitioning needs. Some consumers want things in order by customer ID, other consumers want things in order by sold product ID, others by invoice ID? Uh oh. If you're thinking you could easily replay to debug, the size and scope of the data you have to process for some of those cases just exploded. Or you wrote N times, once for each of those, and then hopefully your multi-write transaction implementation was perfect!
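A toy sketch of that scope explosion (plain Python, no real Kafka client — the partition count, event shapes, and crc32-as-key-hash are all stand-ins): if the stream is partitioned by customer ID, any single customer's events stay ordered, but replaying for one *product* means scanning every partition and re-sorting.

```python
import zlib

NUM_PARTITIONS = 4

def partition_for(key: str) -> int:
    # Stand-in for Kafka's key hashing; crc32 is just a
    # deterministic placeholder for this sketch.
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

# (seq, customer_id, product_id) -- seq is the producer's write order.
events = [
    (1, "cust-a", "prod-x"),
    (2, "cust-b", "prod-x"),
    (3, "cust-a", "prod-y"),
    (4, "cust-c", "prod-x"),
    (5, "cust-b", "prod-y"),
]

# Produce: partition by customer_id, so each customer's events stay in order.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for ev in events:
    partitions[partition_for(ev[1])].append(ev)

# Replay for one product: no single partition is guaranteed to hold all of
# prod-x's events, so you scan *every* partition and re-sort by seq.
prod_x = sorted(
    ev for part in partitions.values() for ev in part if ev[2] == "prod-x"
)
print(prod_x)
```

The alternative the comment mentions — writing the event N times, once per ordering key — avoids the full scan but puts you on the hook for making all N writes atomic.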

[0] In fairness, a lot of applications also don't guarantee that they never drop requests at all, obviously. Return a 500, retry, and hope you don't run out of retries very often; if you do, the request is just dropped on the ground, and for most companies/applications that's considered acceptable loss.



I would say that requiring in-order events is a huge anti-pattern. What guarantee do you have that they were actually produced in order, and that all the clocks are in sync enough to know that without a doubt?
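The clock problem is easy to demo with a toy sketch (plain Python; the 50 ms skew and timestamps are made-up numbers): if one producer's wall clock runs behind, sorting by timestamp inverts the causal order.

```python
SKEW_MS = 50        # assumed clock offset on producer B (hypothetical)
true_time = 1_000   # reference instant, in ms

# A's event happens first; B causally reacts 10 ms later,
# but B's skewed clock stamps it "earlier".
event_a = {"producer": "A", "ts": true_time}
event_b = {"producer": "B", "ts": true_time + 10 - SKEW_MS}

ordered = sorted([event_a, event_b], key=lambda e: e["ts"])
print([e["producer"] for e in ordered])  # ['B', 'A'] -- causally wrong
```

This is part of why ordering guarantees tend to be scoped to a single partition's offsets rather than to producer timestamps.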



