There are other reasons for duplicates in event streams, not just the dupes introduced by at-least-once processing in Kinesis or Kafka workers. We've done a lot of thinking about this at Snowplow (all open-source); this is a good starting point:
http://snowplowanalytics.com/blog/2015/08/19/dealing-with-du...
Our last release started to tackle dupes caused by bots, spiders and dodgy UUID algos:
http://snowplowanalytics.com/blog/2016/12/20/snowplow-r86-pe...
Hi, Jin here from Amplitude. You are absolutely right that there are other sources of duplicates. Our real-time data store sits behind an event processor (not covered in this blog post) that handles all the major event-duplication scenarios. That is why the real-time store focuses on duplicates introduced by message-bus replays, something that systems such as Druid do not address.
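To make the at-least-once replay case concrete, here is a minimal sketch of deduplicating on an event ID. It is not Amplitude's or Snowplow's actual implementation. It assumes each event is a dict carrying a hypothetical `event_id` and `payload`, and it keeps only a bounded, in-memory window of recently seen IDs (a real store would persist this state). Comparing a fingerprint of the payload lets it tell a redelivered copy (same ID, same payload), which is dropped, apart from a colliding ID from a dodgy UUID generator (same ID, different payload), which is kept under a fresh ID. `ReplayDeduplicator`, `max_ids` and `fingerprint` are illustrative names only.

```python
import hashlib
import json
import uuid
from collections import OrderedDict


class ReplayDeduplicator:
    """Hypothetical in-memory deduplicator keyed on an assumed `event_id` field.

    Remembers the payload fingerprints seen for the most recent `max_ids`
    distinct IDs, evicting the oldest ID first (FIFO window).
    """

    def __init__(self, max_ids=100_000):
        self.max_ids = max_ids
        self.seen = OrderedDict()  # event_id -> set of payload fingerprints

    @staticmethod
    def fingerprint(payload):
        # Canonical JSON, so key order does not change the hash.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def process(self, event):
        """Return the event to keep, or None if it is a redelivered copy."""
        event_id = event["event_id"]
        fp = self.fingerprint(event["payload"])
        fps = self.seen.get(event_id)
        if fps is None:
            # First sighting of this ID: remember it, evicting the oldest
            # ID once the window is full.
            self.seen[event_id] = {fp}
            if len(self.seen) > self.max_ids:
                self.seen.popitem(last=False)
            return event
        if fp in fps:
            # Same ID and same payload: a replayed copy, so drop it.
            return None
        # Same ID but a different payload: an ID collision, not a replay.
        # Keep the event under a fresh ID so it does not clash.
        fps.add(fp)
        return dict(event, event_id=str(uuid.uuid4()))


if __name__ == "__main__":
    dedup = ReplayDeduplicator(max_ids=1000)
    original = {"event_id": "e1", "payload": {"type": "click", "user": 7}}
    replayed = {"event_id": "e1", "payload": {"user": 7, "type": "click"}}
    clashing = {"event_id": "e1", "payload": {"type": "view", "user": 9}}
    print(dedup.process(original) is original)  # True: first sighting
    print(dedup.process(replayed))  # None: replay with reordered keys
    print(dedup.process(clashing)["event_id"] != "e1")  # True: re-IDed
```

The bounded window is the main design trade-off: a replay that arrives after its ID has been evicted will slip through, so the window has to cover the longest replay you expect from the message bus.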