Well, the idea is that you can check each network against your own set of locally sorted data: if adding network X reduces the overall effectiveness, you just stop using network X and X's score is reduced.
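Roughly the kind of check I mean, as a Python sketch (the majority vote, the accuracy metric and all the names here are made-up illustrations, not a spec):

    # Sketch: score candidate networks against your own locally sorted mail.
    from typing import Callable, Dict, List, Tuple

    Example = Tuple[str, bool]  # (message text, your own spam/ham verdict)

    def accuracy(predict: Callable[[str], bool], data: List[Example]) -> float:
        # Fraction of your own locally labelled examples a predictor gets right.
        return sum(predict(text) == label for text, label in data) / len(data)

    def majority(nets: List[Callable[[str], bool]]) -> Callable[[str], bool]:
        # Majority vote over the currently active networks.
        return lambda text: sum(net(text) for net in nets) * 2 > len(nets)

    def consider(active: List[Callable[[str], bool]],
                 candidate: Callable[[str], bool],
                 local_data: List[Example],
                 scores: Dict[str, float], name: str) -> None:
        # Keep network X only if adding it does not reduce effectiveness;
        # either way, adjust its score by the difference it made.
        before = accuracy(majority(active), local_data)
        after = accuracy(majority(active + [candidate]), local_data)
        scores[name] = scores.get(name, 0.0) + (after - before)
        if after >= before:
            active.append(candidate)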
EDIT: As HN prevents me from adding new comments right now (seriously, HN, allow us to post more than 3 comments per hour; it’s hard to hold a conversation like this), I’ll answer your comment here:
Users would train networks locally based on their own sorting decisions. Those networks would then be submitted to a repo, and you’d get other users’ networks in return. If a network sorts badly (i.e. you keep manually undoing its sorts), you won’t get networks with similar sorting behaviour next time.
The concept would automatically prevent people from adding malicious networks, as they’d end up in users’ local blacklists.
Obviously you wouldn’t blacklist the network itself, but a representation of its concept of sorting.
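To make “a representation of its concept of sorting” concrete, one option would be to fingerprint each network by its verdicts on a fixed, shared reference corpus, and reject future networks whose fingerprints are too close to a blacklisted one. The corpus contents and the threshold below are assumptions of mine, a sketch only:

    from typing import Callable, List, Tuple

    # Placeholder messages; a real corpus would be a fixed, shared sample.
    REFERENCE = ["win a free prize now", "lunch at noon?",
                 "your account was suspended", "see attached report"]

    Fingerprint = Tuple[bool, ...]

    def fingerprint(predict: Callable[[str], bool]) -> Fingerprint:
        # The network's "concept of sorting": its verdict on each reference message.
        return tuple(predict(text) for text in REFERENCE)

    def similarity(a: Fingerprint, b: Fingerprint) -> float:
        # Fraction of reference messages on which two networks agree.
        return sum(x == y for x, y in zip(a, b)) / len(a)

    class LocalBlacklist:
        def __init__(self, threshold: float = 0.9):
            self.threshold = threshold
            self.banned: List[Fingerprint] = []

        def ban(self, predict: Callable[[str], bool]) -> None:
            # Called once you keep manually undoing a network's sorts.
            self.banned.append(fingerprint(predict))

        def rejects(self, predict: Callable[[str], bool]) -> bool:
            # True if this network sorts too much like a previously banned one.
            fp = fingerprint(predict)
            return any(similarity(fp, bad) >= self.threshold
                       for bad in self.banned)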
So, how are these networks getting their data? Users submitting data? That means users are reducing their individual security to increase the security of the group as a whole. You're then left choosing between just consuming this data (and staying secure) or contributing, and we're back at the same point: data needs to be shared before it can be trained against.
Let's also look at the incentives for these networks whose data you can subscribe to. How are they supposed to keep spammers out? Vetting and managing the individual networks takes non-negligible effort, and if that work isn't funded, it will be at a disadvantage against spammers who are doing this for profit.
Finally, I'm not sure that training sets for data like this can be combined without a massive amount of reprocessing, if at all. I'm not familiar enough with the classifier networks involved to know, but I suspect that problem alone ranges somewhere from "non-trivial" to "very hard", if it isn't already solved.
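For what it's worth, combining at the model level (a weighted vote over already-trained classifiers) would sidestep merging the raw training sets, though it only works if every network agrees on the input representation, which is exactly the reprocessing problem I mean. A sketch, with made-up weights:

    from typing import Callable, List, Tuple

    def weighted_vote(nets: List[Tuple[Callable[[str], bool], float]],
                      text: str) -> bool:
        # Sum each network's weighted spam/ham verdict; ties count as ham,
        # i.e. the mail is kept.
        spam = sum(w for predict, w in nets if predict(text))
        ham = sum(w for predict, w in nets if not predict(text))
        return spam > ham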
It sounds good, and in a perfect world we'd have well-run, well-managed shared networks of fully anonymized spam/phishing classification training data that could be combined into individual personal classifiers without heavy reprocessing of large training sets.
I'm just not sure how feasible the individual parts of that are, much less all of them combined into a whole.