> One of the biggest privacy problems in messaging is the availability of loads of meta-data — essentially data about who uses the service, who they talk to, and when they do that talking. […] the same problem exists with virtually every other social media network and private messenger.
Avoiding any metadata leaks without generating tons of cover traffic (to frustrate timing correlation attacks) is very hard.
Signal does indeed use an architecture ("sealed sender") where the server doesn't learn which account is sending a given message, only the IP address it came from and the account it's destined for. That applies by default to chats with your contacts, and optionally to everyone if you enable the setting that allows sealed sender from anyone, which makes you a bit more prone to receiving spam.
But any entity in a position to globally correlate traffic flows into and out of Signal's servers can just make correlations like "whenever Alice, as identified by her phone's IP, sends traffic to Signal, Bob seems to be getting a push notification from Apple or Google, and then his phone connects to Signal, so I think they're talking".
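To make that concrete, here's a toy sketch of the correlation in Python. Everything in it is made up for illustration: the timestamps, the names, the `correlate` function, and the 2-second `window` are my own assumptions, not anything taken from Signal or from a real adversary's tooling. Any single observation is ambiguous, since plenty of other people receive traffic in the same window, but only the real recipient keeps showing up across many sends.

```python
from collections import Counter

def correlate(send_times, receive_events, window=2.0):
    """Count, per recipient, how many of the sender's transmissions were
    followed within `window` seconds by traffic to that recipient.

    send_times:      timestamps (seconds) at which the sender's IP was seen
                     sending to the service
    receive_events:  (timestamp, recipient) pairs for observed notifications
                     or connections
    """
    hits = Counter()
    for t in send_times:
        # A recipient counts at most once per send, however many events match.
        matched = {who for (ts, who) in receive_events if 0 <= ts - t <= window}
        hits.update(matched)
    return hits

# Hypothetical observations (illustrative values only).
alice_sends = [10.0, 55.0, 120.0, 300.0]
receive_events = [
    (10.4, "bob"), (10.9, "carol"),   # two candidates after the first send
    (55.3, "bob"), (56.1, "dave"),
    (120.5, "bob"), (121.0, "carol"),
    (200.0, "dave"),                  # unrelated to any of Alice's sends
    (300.2, "bob"),
]

scores = correlate(alice_sends, receive_events)
for who, n in scores.most_common():
    # Fraction of Alice's sends that were followed by traffic to this person.
    print(who, n / len(alice_sends))
```

With this toy data, "bob" matches all four sends while "carol" and "dave" match only some of them by coincidence. A real observer would have far noisier data and would need many more observations, but the principle is the same, and it is exactly what cover traffic is meant to frustrate.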
How accurate does the timing need to be? I imagine there must be many Bobs getting notifications around the same time. Also, if I use Signal behind a VPN is it still known that I’m talking to the Signal servers?
> Is this true for Signal too? I thought it wasn’t.
It is, because you cannot use Signal without giving them your mobile phone number, and from that point onward they (and anyone they might be sharing data with) know the who/what/when, and more. My gut feeling, notwithstanding any apologist and their weak arguments, is that the design choice is exactly about the who/what/when because it's mandatory despite being entirely unnecessary from a technical perspective.
Every Signal account is represented by the phone number the user provided in order to receive their SMS activation code, and messages are not sent directly between users' clients/apps but relayed through Signal's systems.
Is this true for Signal too? I thought it wasn’t.