Well, the idea is that you can check each network against your own set of locally sorted data: if adding network X reduces the overall effectiveness, you just stop using network X and X's score is reduced.
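Roughly the kind of check I mean, as a Python sketch (the majority vote, the accuracy metric and all the names here are made-up illustrations, not a spec):

    # Sketch: score candidate networks against your own locally sorted mail.
    from typing import Callable, Dict, List, Tuple

    Example = Tuple[str, bool]  # (message text, your own spam/ham verdict)

    def accuracy(predict: Callable[[str], bool], data: List[Example]) -> float:
        # Fraction of your own locally labelled examples a predictor gets right.
        return sum(predict(text) == label for text, label in data) / len(data)

    def majority(nets: List[Callable[[str], bool]]) -> Callable[[str], bool]:
        # Majority vote over the currently active networks.
        return lambda text: sum(net(text) for net in nets) * 2 > len(nets)

    def consider(active: List[Callable[[str], bool]],
                 candidate: Callable[[str], bool],
                 local_data: List[Example],
                 scores: Dict[str, float], name: str) -> None:
        # Keep network X only if adding it does not reduce effectiveness;
        # either way, adjust its score by the difference it made.
        before = accuracy(majority(active), local_data)
        after = accuracy(majority(active + [candidate]), local_data)
        scores[name] = scores.get(name, 0.0) + (after - before)
        if after >= before:
            active.append(candidate)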
EDIT: As HN prevents me from adding new comments right now (seriously, HN, allow us to post more than 3 comments per hour; it’s hard to hold a conversation like this), I’ll answer your comment here:
Users would train networks locally based on their own sorting decisions. Those networks would then be submitted to a repo, and you’d get other users’ networks in return. If a network sorts badly (i.e. you keep manually undoing its sorts), you won’t get networks with similar sorting behaviour next time.
The concept would automatically prevent people from adding malicious networks, as they’d end up in users’ local blacklists.
Obviously you wouldn’t blacklist the network itself, but a representation of its concept of sorting.
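To make “a representation of its concept of sorting” concrete, one option would be to fingerprint each network by its verdicts on a fixed, shared reference corpus, and reject future networks whose fingerprints are too close to a blacklisted one. The corpus contents and the threshold below are assumptions of mine, a sketch only:

    from typing import Callable, List, Tuple

    # Placeholder messages; a real corpus would be a fixed, shared sample.
    REFERENCE = ["win a free prize now", "lunch at noon?",
                 "your account was suspended", "see attached report"]

    Fingerprint = Tuple[bool, ...]

    def fingerprint(predict: Callable[[str], bool]) -> Fingerprint:
        # The network's "concept of sorting": its verdict on each reference message.
        return tuple(predict(text) for text in REFERENCE)

    def similarity(a: Fingerprint, b: Fingerprint) -> float:
        # Fraction of reference messages on which two networks agree.
        return sum(x == y for x, y in zip(a, b)) / len(a)

    class LocalBlacklist:
        def __init__(self, threshold: float = 0.9):
            self.threshold = threshold
            self.banned: List[Fingerprint] = []

        def ban(self, predict: Callable[[str], bool]) -> None:
            # Called once you keep manually undoing a network's sorts.
            self.banned.append(fingerprint(predict))

        def rejects(self, predict: Callable[[str], bool]) -> bool:
            # True if this network sorts too much like a previously banned one.
            fp = fingerprint(predict)
            return any(similarity(fp, bad) >= self.threshold
                       for bad in self.banned)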
So, how are these networks getting their data? Users submitting data? That means users are reducing their individual security to increase the security of the group as a whole. You're then left choosing between just consuming this data (and staying secure) or contributing, and we're back at the same point: data needs to be shared before it can be trained against.
Let's also look at the incentives for these networks whose data you can subscribe to. How are they supposed to keep spammers out? Vetting and managing the individual networks takes non-negligible effort, and if that work isn't funded, it will be at a disadvantage against spammers who are doing this for profit.
Finally, I'm not sure that training sets for data like this can be combined without a massive amount of reprocessing, if at all. I'm not familiar enough with the classifier networks involved to know, but I suspect that problem alone ranges somewhere from "non-trivial" to "very hard", if it isn't already solved.
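For what it's worth, combining at the model level (a weighted vote over already-trained classifiers) would sidestep merging the raw training sets, though it only works if every network agrees on the input representation, which is exactly the reprocessing problem I mean. A sketch, with made-up weights:

    from typing import Callable, List, Tuple

    def weighted_vote(nets: List[Tuple[Callable[[str], bool], float]],
                      text: str) -> bool:
        # Sum each network's weighted spam/ham verdict; ties count as ham,
        # i.e. the mail is kept.
        spam = sum(w for predict, w in nets if predict(text))
        ham = sum(w for predict, w in nets if not predict(text))
        return spam > ham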
It sounds good, and in a perfect world we'd have well-run, well-managed shared networks of fully anonymized spam/phishing classification training data that could be combined into individual personal classifiers without heavy reprocessing of large training sets.
I'm just not sure how feasible the individual parts of that are, much less all of them combined into a whole.