I'm curious about the 2021 measure of total disk space that Discord consumes. Serve...

latchkey · on Aug 24, 2021

Unique images or just copied from elsewhere?

objektif · on Aug 24, 2021

How does that matter? Do they keep track of all images on the internet?

jjice · on Aug 24, 2021

I don't know much about image de-duplication, but maybe they can get some sort of fingerprint/hash for an image, see if they already have it, and then serve that already existing image.
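A minimal sketch of the exact-match version of that idea, assuming a simple in-memory content-addressed store (the `DedupStore` class is hypothetical, not anything Discord has described). The SHA-256 digest of the uploaded bytes is the key, so a re-upload of identical bytes is stored once and served from the existing copy.

```python
import hashlib


class DedupStore:
    """Hypothetical content-addressed blob store: identical bytes are kept once."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}   # SHA-256 hex digest -> image bytes
        self.uploads: dict[str, str] = {}   # upload id -> digest of its bytes

    def put(self, upload_id: str, data: bytes) -> bool:
        """Record an upload; return True only if these exact bytes were new."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        self.uploads[upload_id] = digest
        return is_new

    def get(self, upload_id: str) -> bytes:
        """Serve the already-stored copy for any upload with the same bytes."""
        return self.blobs[self.uploads[upload_id]]

    def stored_bytes(self) -> int:
        """Total bytes actually kept, after deduplication."""
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
print(store.put("msg1", b"same meme bytes"))  # True
print(store.put("msg2", b"same meme bytes"))  # False: reuses the first copy
print(store.stored_bytes())                   # 15
```

This only catches byte-identical re-uploads, which is the SHA-256 caveat about recompression.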

I'd imagine a hash like SHA256 would be tricky, because if the image was recompressed even once along its internet journey, we'd get a different hash. But maybe there's an effective way to fingerprint images. I have a utility on my machine (czkawka maybe?) that does really good image de-duplication with what seemed like a common algorithm (based on a quick look at the source).
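To illustrate why SHA-256 breaks on recompression while a perceptual hash doesn't, here's a simplified difference hash (dHash), one common perceptual-hash technique. I'm not claiming this is what czkawka uses. Assumptions: the image is already decoded to a 2D list of grayscale values, the downscale is nearest-neighbour rather than the smoother resize real implementations use, and small ±1 pixel noise stands in for an extra round of lossy compression.

```python
import hashlib


def resize_nearest(pixels, width, height):
    """Nearest-neighbour downscale of a 2D list of grayscale values (0-255).

    Simplification: real dHash implementations usually resize with smoothing.
    """
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def dhash(pixels, hash_size=8):
    """Difference hash: one bit per comparison of horizontally adjacent cells."""
    small = resize_nearest(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for x in range(hash_size):
            # Bit is 1 when brightness drops from left to right.
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def to_bytes(pixels):
    """Flatten a grayscale grid into raw bytes, as a file hash would see them."""
    return bytes(v for row in pixels for v in row)


if __name__ == "__main__":
    # Synthetic 64x64 image (made-up data) and a copy with ±1 noise.
    img = [[(3 * x + y) % 200 + 20 for x in range(64)] for y in range(64)]
    noisy = [[v + (1 if (x + y) % 2 == 0 else -1) for x, v in enumerate(row)]
             for y, row in enumerate(img)]

    same_sha = (hashlib.sha256(to_bytes(img)).digest()
                == hashlib.sha256(to_bytes(noisy)).digest())
    print("SHA-256 equal:", same_sha)                            # False
    print("dHash distance:", hamming(dhash(img), dhash(noisy)))  # 0
```

The cryptographic hash changes completely on any byte change, while the dHash only looks at coarse brightness gradients, so tiny per-pixel noise leaves it unchanged.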

No idea though, just spitballing.

vortico · on Aug 25, 2021

I'd imagine Discord uses deduplication, but I bet it doesn't save them even 5% of their storage space.

Ekaros · on Aug 25, 2021

I think it might; spamming the same meme images over and over is quite common in some servers. On the other hand, the bigger pictures might overwhelm these in sheer size.

vortico · on Aug 25, 2021

Yeah, that's why I assumed it wouldn't help that much. People re-upload 100kB memes all the time, but the bulk would probably be 5MB phone pictures that won't typically be re-uploaded.
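A back-of-the-envelope way to check that intuition: dedup savings should be weighted by bytes, not by upload count. The numbers below are made up purely for illustration (1,000 memes of 100 kB each uploaded 5 times, versus 10,000 unique 5 MB photos uploaded once); they are not Discord data.

```python
def dedup_savings(images):
    """Fraction of raw bytes saved by storing each distinct image once.

    images: list of (size_in_bytes, times_uploaded) pairs, one per distinct image.
    """
    raw = sum(size * uploads for size, uploads in images)
    stored = sum(size for size, _ in images)
    return 1 - stored / raw


# Hypothetical mix, NOT real Discord numbers.
memes = [(100_000, 5)] * 1_000        # small images, heavily re-uploaded
photos = [(5_000_000, 1)] * 10_000    # large phone photos, uploaded once
print(f"{dedup_savings(memes + photos):.2%}")  # 0.79%
```

Even though over a quarter of the uploads in this made-up mix are redundant copies, under 1% of the bytes are saved, because the large one-off photos dominate total size.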

kroltan · on Aug 25, 2021

The plural of anecdote isn't data, but about 20% of the images I post on Discord come from Discord in the first place, cross-posting among different servers.

PeterCorless · on Aug 24, 2021

Yes. There are ways to group images that seem to be the same. TinEye and Google image search do that. So you'd have a collection of related hashes that equal "Bob's prom photo where he looks like a goofer."
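A rough sketch of grouping near-duplicates, assuming each image already has a 64-bit perceptual hash (the file names and hash values below are hypothetical). Images whose hashes differ in at most `max_distance` bits get linked, and union-find merges those links into clusters. I don't know how TinEye or Google actually do this; this is just the textbook version.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def group_similar(hashes: dict[str, int], max_distance: int = 5) -> list[set[str]]:
    """Cluster names whose hashes are within max_distance bits (transitively)."""
    names = list(hashes)
    parent = {n: n for n in names}

    def find(n: str) -> str:
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # O(n^2) pairwise comparison: fine for a demo, too slow at scale.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if hamming(hashes[a], hashes[b]) <= max_distance:
                parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


# Hypothetical 64-bit perceptual hashes.
hashes = {
    "bob_prom.jpg": 0xF0F0F0F0F0F0F0F0,
    "bob_prom_recompressed.jpg": 0xF0F0F0F0F0F0F0F1,  # 1 bit away
    "bob_prom_resized.jpg": 0xF0F0F0F0F0F0F0F3,       # 2 bits away
    "cat.png": 0x0F0F0F0F0F0F0F0F,                    # 64 bits away
}
print(group_similar(hashes))
```

The pairwise loop is quadratic, so at real scale you'd want an index such as a BK-tree to find nearby hashes instead of comparing every pair.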

objektif · on Aug 25, 2021

Yes, definitely, I have seen it work in action, but you can't just tell a user "here, use this smaller and more pixelated version of your image that we think is kind of similar".

PeterCorless · on Aug 25, 2021

Oof! No, you can't. :D

SilverRed · on Aug 25, 2021

I'd imagine that the images are not part of the main database and that they are in some kind of S3-like file storage system.

snak · on Aug 24, 2021

The Cassandra cluster mentioned (12 nodes, 1TB each) only handles text, as far as the article goes.

jhgg · on Aug 24, 2021

We're well over 12 nodes in current year :P

snak · on Aug 24, 2021

Oh, just noticed the article is from 2017. Is there a newer one, related/similar to this one?

PeterCorless · on Aug 24, 2021

lol. :D