Thanks for the detail, though to be clear, I’m not doubting the mathematical theoretical possibility of an SHA256 rainbow table.
I’m saying that even the example scenario you describe is a massive solution space in itself. And then to extrapolate this for all sorts of PII and combinations of words/letters to make this attack worthwhile, is quite a stretch. Again, a slight change in order of words or characters (even something like an extra space) will break your rainbow table lookup. People message in all sorts of weird ways, with abbreviations, typos, slang, or even mix languages.
Even suggesting that “one could just” pre-calculate every 1,2,3,etc combination of words is hilarious.
I categorically do not believe this attack is even remotely feasible.
Let's assume you only care about cracking one word messages in English, and only the 10000 most common words. The time stamp is granular to the hour, and there are ~10000 hours in a year.
Easily available hardware can create 10 billion SHA-1 hashes per second. So to brute force every hash for every combination of one word message and timestamp, you need ... one hundredth of one second. No rainbow tables needed here. Just brute force it. For each hash, you just check every message hash stored on your server to find matches. If you find a match, you've uncovered the probable message text.
And so on for 2, 3 word combinations. Let's take 3 word combinations, assuming all three words are in the 10000 most commonly used English words. There are 10000^3 possible combinations of such words. Once again, the timestamp adds another 10000 possibilities (for the last year). How long does it take to brute force all 3 word combinations? One million seconds - less than two weeks. This is for a SINGLE person working on this with a SINGLE GPU. Not only will a serious attacker have better hardware, but context clues can help limit the possible options so that much longer combinations of words will be possible (if they are common combinations).
Sure, there will be slight punctuation or capitalization differences in some cases and that adds entropy, but not by enough to bother taking into consideration in terms of security.
I guess we're also assuming in all this that Google is not tagging each message with the time received on the server. Given that they already have that information, they might be able to recover the message content without needing the extra hashes for the timestamp.
I’m saying that even the example scenario you describe is a massive solution space in itself. And then to extrapolate this for all sorts of PII and combinations of words/letters to make this attack worthwhile, is quite a stretch. Again, a slight change in order of words or characters (even something like an extra space) will break your rainbow table lookup. People message in all sorts of weird ways, with abbreviations, typos, slang, or even mix languages.
Even suggesting that “one could just” pre-calculate every 1,2,3,etc combination of words is hilarious.
I categorically do not believe this attack is even remotely feasible.