Modern anti-spam and E2E crypto (2014)

lstamour · on July 17, 2015

See also, previous discussion 10 months ago: https://news.ycombinator.com/item?id=8275970 (138 comments)

dang · on July 17, 2015

Thanks, we missed this one.

When a story has had significant attention within the last year or so, we bury reposts as dupes. After that, reposts are fine.

https://news.ycombinator.com/newsfaq.html

fenomas · on July 18, 2015

Isn't this meant to happen automatically? I submitted it assuming HN would tell me if it had been posted before.

dang · on July 18, 2015

It depends on whether the urls exactly match and also on what happens to be loaded in RAM on the server—which gets increasingly non-deterministic as you go back in time.

We're planning to work on better duplicate detection, but the problem is hard to solve well and we don't like to solve things not-well, so it may take a while.
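
To illustrate why exact URL matching misses reposts, here is a minimal sketch of URL normalization before comparison. It is not how HN does it; the `TRACKING_PARAMS` set and the specific rules (dropping `www.`, treating http and https as equal, ignoring fragments) are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical set of query parameters that do not change the page content.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}

def normalize_url(url: str) -> str:
    """Return a canonical form of `url` for duplicate comparison."""
    parts = urlsplit(url.strip())
    # Treat http and https as the same story (an assumption).
    scheme = "https" if parts.scheme in ("http", "https") else parts.scheme
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Keep only non-default ports.
    if parts.port and parts.port not in (80, 443):
        host = f"{host}:{parts.port}"
    # Ignore a trailing slash, but keep the root path.
    path = parts.path.rstrip("/") or "/"
    # Drop tracking parameters and sort the rest so order does not matter.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    query = urlencode(sorted((k, v) for k, v in pairs if k not in TRACKING_PARAMS))
    # The fragment is discarded entirely.
    return urlunsplit((scheme, host, path, query, ""))

def is_duplicate(url: str, seen: set) -> bool:
    """Check `url` against a set of already-normalized URLs."""
    return normalize_url(url) in seen
```

Even this misses the hard cases (the same article on a different domain, or behind a redirect), which is consistent with the problem being hard to solve well.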

fiatjaf · on July 18, 2015

Sometimes it does, for me.

carsonreinke · on July 17, 2015

That is annoying

lstamour · on July 17, 2015

I'm not suggesting you shouldn't discuss it again here, obviously. Just that as I was reading it, I felt I'd read it once before... and I had.

fiatjaf · on July 17, 2015

This sounds like a perfect _proof_ that email is not as good as it sounds when you think of it as "an open decentralized protocol". Spam prevention requires an enormous amount of hard work, computing resources and magic, and is _totally dependent_ on the lack of privacy.

The whole problem, it seems, is that email is the only protocol on the internet where the sender can initiate contact with whoever he wants. I used to think this was good before reading these notes, but now I think other solutions must be sought. Sender-initiated contact is not a feature of anything.

A good suggestion is found at this thread: https://news.ycombinator.com/item?id=9829614

the8472 · on July 17, 2015

> The whole problem, it seems, is that email is the only protocol on the internet where the sender can initiate contact with whoever he wants.

What? All p2p/federated protocols are by necessity sender-initiated because someone has to start the conversation.

The "who contacts whom" part is separate from the "which data am I (not) interested in" problem. With email you simply don't know the latter.

jcranmer · on July 17, 2015

Sender-initiated contact is an absolute necessity for a near-universal communication protocol. If you don't allow for it, then you require that all forms of social introduction be supported in the protocol itself, or you turn introduction protocols into a vehicle for spam.

It's also worth noting that not all spam is unsolicited--people give uninformed consent all the time. (This is part of the reason why legal remedies to spam have failed: it is actually very hard to give a robust, concrete definition to spam).

fiatjaf · on July 17, 2015

Email is not entirely sender-initiated. The sender must know the email address of the receiver, and that step -- getting to know the email address of the receiver -- is not supported in the protocol itself.

In other words, email is not "near-universal" and entirely open, because the receiver must give you his email address, by means of which he is consenting to receive your emails.

---

In any protocol where the message stayed at the sender's server and could be fetched by an authorized potential receiver (such as the one I linked to, with variations introduced by my imagination) there could be ways to quickly "get" one's email address, like today we do by reading the address itself.

For example, instead of showing a written email address in his website, one could show a link to third-party identification/message-request authority which would be in charge of filtering these things and letting the potential receiver choose what he would want to receive. This introduces the problem of spam again, but at least it is outside of the main protocol, and the protocol could live without it.

Another solution, now thinking specifically about the "streams" (see my previous link) protocol, is that each person willing to accept email from new people could open public streams -- and yes, they would be subjected to spam, but since the receiver would have absolute control of that stream and what its server does with the messages, it could enforce a strict format (for example, every message could only be "let me talk to you, add my stream at ____"), and it could change its address from time to time. The customizability of the thing would be its main feature, since no spammer machine would be able to go into all these personal standards.

Also, having a public communication channel that is restricted to messages that say "I want to talk to you" is going to totally inhibit spammer behavior. No spammer will spend time trying to get people to talk to him. It will filter out machines and let only people.
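
A rough sketch of the strict-format idea: the receiver's server accepts a message on the public stream only if it matches one fixed template exactly. The stream address syntax (`host/name`) and the function name `parse_intro` are invented here for illustration; the "streams" proposal does not define them.

```python
import re

# Hypothetical template: the only message the public stream accepts.
# The address syntax "host/name" is an assumption, not part of any spec.
INTRO_PATTERN = re.compile(
    r"let me talk to you, add my stream at (?P<stream>[a-z0-9.-]+/[a-z0-9_-]+)"
)

def parse_intro(message: str):
    """Return the sender's stream address, or None if the message
    deviates from the template in any way."""
    match = INTRO_PATTERN.fullmatch(message.strip())
    return match.group("stream") if match else None
```

Anything with extra text fails `fullmatch`, so the only payload a sender can deliver is an address, and each receiver could pick a different template.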

In any case, the social introduction, as you called it, can exist, and does not have to be supported by the protocol (although, like in my second example, it could be done through the protocol). Supporting it as a feature of the protocol would only make it more automatable and easier for spammers.

---

I've just come up with these ideas while writing, so there are probably a lot of problems, but the main point was that there is always a pre-protocol social introduction, and that can be bad or good depending on how things work. Probably with email it was good to just give your address to everybody on the internet some years ago, but now just by writing it in a post you're going to be massively spammed.

hollerith · on July 17, 2015

I do not see a problem. A system or service that allows sender-initiated contact is valuable. A service that allows end-to-end encryption is valuable. They do not need to be the same service.

fiatjaf · on July 17, 2015

"Valuable". Even a piece of broken stone is valuable. What matters here is how much you are willing to pay, or sacrifice, to get a piece of broken stone.

twocents2p · on July 18, 2015

How can any modern article on spam not have a single reference to SPF and DKIM?

There is nearly zero need to read the contents of an email message to determine 'spam' / 'not spam'. And if you read the PDF paper from 2006, you'll note this is exactly how it's done at Google.

Google has its own internal reputation system, but it also relies on external services. In short, a spammer can buy a clear shot to the Gmail inbox.

E2E has nothing whatsoever to do with spam mail. The only thing E2E will do for spam, is generate mountains of encrypted spam.

carsonreinke · on July 17, 2015

Reputation can go wrong: legitimate emails with double opt-in can still be marked as SPAM by the user. Gmail definitely is slowing that down by offering a warning to the user when there is a `List-Unsubscribe` header. Even that still does not work, and it is probably the reason my AT&T bill is sometimes marked as SPAM.
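
For reference, a small sketch of reading the `List-Unsubscribe` header with Python's standard `email` module. The header format (comma-separated URLs in angle brackets) comes from RFC 2369; the function name `unsubscribe_targets` is made up here, and this says nothing about how Gmail actually processes the header.

```python
import re
from email import message_from_string

def unsubscribe_targets(raw_message: str) -> list[str]:
    """Return the URLs listed in the List-Unsubscribe header, if any."""
    msg = message_from_string(raw_message)
    header = msg.get("List-Unsubscribe", "")
    # Each target is wrapped in angle brackets: <mailto:...>, <https://...>
    return re.findall(r"<([^>]+)>", header)
```

A mail client could offer an unsubscribe prompt when this list is non-empty and fall back to a plain spam report otherwise.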

JoshTriplett · on July 17, 2015

The term "double opt-in" is commonly used by spammers to suggest that this requirement is somehow onerous rather than basic due diligence. Filling in an email address in a form is not an opt-in, as anyone can do that with anyone else's email address. It's necessary to confirm that whoever did so actually owns the email address before you can consider it an opt-in of any kind.
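
The confirmation step described above can be sketched as follows: the form submission only triggers an email containing a token, and the address counts as opted in only when that token comes back. This is a minimal illustration with made-up names (`SECRET_KEY`, `make_token`, `confirm`), not any particular mailing service's implementation.

```python
import hashlib
import hmac

# Hypothetical server-side secret; in practice this would be random and private.
SECRET_KEY = b"replace-with-a-random-secret"

def make_token(email: str) -> str:
    """Token to embed in the confirmation link mailed to `email`."""
    digest = hmac.new(SECRET_KEY, email.lower().encode(), hashlib.sha256)
    return digest.hexdigest()

def confirm(email: str, token: str) -> bool:
    """True only if `token` is the one that was mailed to `email`."""
    expected = make_token(email).encode()
    # Constant-time comparison to avoid leaking the token via timing.
    return hmac.compare_digest(expected, token.encode())
```

Someone who typed another person's address into the form never sees the token, so they cannot complete the subscription.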

Unsubscribe links are commonly used by disreputable spammers as a way of confirming that the address really exists, so relying on the "List-Unsubscribe" header is not always a good idea.

Yes, some people mark transactional emails as spam. However, far more spammers think their mails were justified when they're not. Your "newsletter" may very well be spam, no matter how much you think it's covered by someone's existing tenuously related relationship to some company you bought a pile of email addresses from.

carsonreinke · on July 17, 2015

Double opt-in is an industry standard. Someone buying a list of email addresses is not. Some laws, such as CASL, are now enforcing the concept.

Either way, newsletters/transactional emails can all be marked as SPAM even though the recipient is legitimate. The sender can be negatively affected by a blind reputation system.

"List-Unsubscribe" for sure can be abused, but better than blindly considering every email flagged as SPAM.

DanBC · on July 17, 2015

The industry standard is "confirmed opt in".

Lots of people use the term "double opt in". Some of those people are spammers. If someone cares about sounding credible they should probably use "confirmed opt in" rather than "double opt in".

carsonreinke · on July 17, 2015

Clarification, https://blog.mailchimp.com/opt-in-vs-confirmed-opt-in-vs-dou..., http://support2.constantcontact.com/articles/FAQ/1586.

I'm not sure why it is not credible to use the term double opt-in, please explain.

DanBC · on July 17, 2015

Mostly it's used by spammers. When you use it there's a hard-to-shake impression that you don't understand the point about getting confirmation by email from the email address owner, and that you might be using, e.g., checkboxes on a webform as a confirmation.

That mailchimp blog? It's wrong. What they describe as confirmed opt-in is not confirmed opt in, and what they describe as double opt in is in fact just confirmed opt in.

If anyone from mailchimp is reading: please fix this fucking annoying and stupid error.

EDIT: That constantcontact post is correct. Notice how they put "also known as double opt-in" in brackets, and then never use it again but only use confirmed opt-in?

fiatjaf · on July 17, 2015

Emails with double opt-in still can BE spam. The act of signing up for an email list does not give the email list owner the right to send any amount of any email content to you.

carsonreinke · on July 17, 2015

Traditionally I consider only unsolicited emails to be SPAM; for any double opt-in list whose volume is too high, the user should be able to opt out. Users marking such mail as SPAM and decreasing the sender's reputation because of that seems like the wrong target.

fiatjaf · on July 17, 2015

Sometimes I opt out of email lists, but then I keep getting email from them, probably because I was on multiple email lists and only opted out of one -- although when I subscribed, I subscribed only once.

Sometimes I try to opt out, but they ask me to log in, and I don't remember my password because they asked me to put weird symbols and uppercase letters in it, while my normal login-everywhere password does not have these.

Sometimes I try to opt-out, but the link is broken.

Sometimes (this is what happens most of the time) I am subscribed automatically to email lists whenever I sign up to some website. Shouldn't this be considered spam? I did not receive a confirmation email -- or maybe I did, but the confirmation email was to confirm my account on the site, not my subscription to that email list.

Sometimes the sender forgets to include his opt-out link.

---

The question that these cases pose is: what is the difference between "spam" and "email that can be useful to others but that you don't want to receive"?

And the answer is: SPAM, as explained in the original submission, is a global uncustomizable tag. If something is spam, it is spam to everybody, not just to you. That is not the ideal situation. We could do better, but I don't think it will be better within the email protocol, since it would be impossible for Google to calculate the spam-probability of each message according to its receiving user. The only way is to move to other protocols.

carsonreinke · on July 17, 2015

No doubt, opt-out should not suffer these problems. Is it SPAM after that point? Maybe. I can see at that point the reputation system wins.

`it is spam to everybody`, maybe, but Google is trusting the user to categorize SPAM, which can have some unwanted consequences.

fiatjaf · on July 17, 2015

I'm not defending the current Google practices, just saying they are inevitable.