Hacker News | tribler's comments

Interesting! But the DHT of Bittorrent is filled with spam and fakes. How do you create trustworthy results?

(disclaimer: academic working on this problem for 15+ years, Tribler lab)


I usually just count the seeders and leechers. 1000+ seeders are usually seeding something legit. Failing that, I grab several versions and just see what they look like.

I think here, the common-sense solution that works most of the time is more useful than an interesting, complex solution.


It would be pretty cheap and easy for copyright holders to publish a bunch of fake torrents — just copy the torrent+file names of the most popular torrents and fill the files with random data — rent a couple thousand VMs on AWS, and seed them all using these VMs.


Even more insidious: They can distribute something that is similar enough to the original file, but is still a fake. Movies with the climax cut out, books where the plot is changed, games where you cannot win.

Enough downloaders get the file, skim it to make sure they have a viable one, and then keep it in a folder for later consumption. If it passes the scan test they would be likely to get a bunch more seeders. This is one of the reasons torrent sites have comments.


Stuff like what you describe would have lots of artificial seeders, sure - and "they" could even rotate the IP addresses so blacklists don't work.

But it's a big well to poison with such weak tactics, and I think such things have been tried before; and it's not that hard to just...download a different copy.

I usually download a couple different versions of my 'Linux isos' anyway, just in case the audio or the encoding is messed up on one of them. I get that it's fun, and intellectually stimulating, to think about complex solutions to interesting problems, but you still have to look and see if the simple solution is already there. BitTorrent is a robust protocol that's already got built-in mechanisms for these things. The swarm itself attests to the valid files, because those are the files that remain seeded. No need for extra complexity.


It's not effective enough for that, but if it were, people would just move on to the next method.


From experience with Gnutella, spammers can just fake the seeder number.

Gnutella, unlike BT, can propagate standalone chunk hashes, as I understand it, so you can weed out fakes early. BT doesn't give you that before you start the download.

Gnutella 2 has even more armaments to weed out fakes.


I don’t agree


Step 1: solve the identity problem to prevent Sybil attacks.

Step 2: some form of blockchain? People could vote/vouch for torrents in a completely distributed way.


It seems like just about every major problem with the internet right now would be a lot easier if Step 1 were solved. If it could be solved in a way that also preserved privacy, then the net result could even be positive.

As you mention the "b" word, let me mention one proposed solution to Step 1 which does rely on that technology, and claims it "requires no personal information. It lets you prove your humanness without risking your privacy."

https://www.brightid.org/


it doesn't seem like it actually uses blockchain? according to https://en.wikipedia.org/wiki/Proof_of_personhood, it's basically PGP WoT but hopefully actually usable?


I think "PGP WoT but hopefully actually usable" is a good way to describe it, but the system is at least blockchain-adjacent. As the user guide[0] says:

"BrightID and IDChain itself use DAOs on IDChain for governance."

and:

"IDChain (IDChain.one) is a proof-of-authority blockchain where validators are democratically elected by BrightID-verified unique humans."

[0] https://brightid.gitbook.io/brightid/idchain/introduction


Blockchain was the first thing I thought about, because that sounds like a poster-child case of an actually useful blockchain. But thinking again, I'm not sure: wouldn't it be rather expensive to run on a blockchain? Either you use some pre-existing blockchain with smart contracts and such, so basically make an Ethereum DApp and burn gas, or you would need to implement all the same PoW as other cryptocurrencies, which is an unwelcome overhead, considering people don't even like to seed torrents for too long. On the other hand, I'm not sure that the incentive to fake torrents is THAT high, so maybe some very weak version of it would suffice, because very few will be ready to spend money to create fake votes for their torrents. I seriously have no idea.

The second idea that comes to mind is that there are "reputable" release groups for most of the content anyway, many with a web page of their own, and it would suffice to make signing with a private key a common practice, or maybe even implement some sort of standard protocol to fetch these keys and verify torrents (with curated source lists). But then again, it seems practical but not really decentralized anymore, as often happens.

Which gives me a third idea: to know that an item is trusted, you don't really have to make a decentralized reputation system. I mean, you don't need a score and many votes to mark an item as "trusted": you only need 1 trusted vote. So it seems like we could have something like a decentralized certificate authority. So, something exactly like a regular certificate authority: there is a trusted CA, and it can manually sign other CAs that become trusted as well, anyone can revoke certificates and so on, but instead of 1 root CA there are possibly many, different for different nodes/people. Of course, we still have "the hard problem" unsolved, we only transformed it into a different hard problem, but the difference is I think we don't actually have to solve this one! We could be piggybacking on some pre-existing social graph, possibly decentralized and quasi-anonymous. Imagine this being built into some federated social network, like Matrix or Mastodon! You decide to trust someone for some absolutely non-technical reasons that have nothing to do with cryptography, and everything else is relatively easy and simple.

Surely, malicious signatures would still find their way, but they would be rare enough and it would help no one if you can make tons of fake CAs, because they are not trusted by default, and if you can find a compromised CA that is trusted by somebody: well, everybody can just blacklist that CA (and all of its children) after you sign some malware with it.
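To make the "one trusted vote is enough" idea concrete, here is a minimal sketch, assuming each user keeps a personal set of root CAs and a table of who delegated trust to whom. The names (`trusted_signers`, `is_trusted`, the example identities and hashes) are hypothetical, and signatures are modeled as plain names rather than real cryptographic signatures.

```python
from collections import deque

def trusted_signers(roots, delegations, blacklist=frozenset()):
    """Return every CA reachable from my personal roots.

    Blacklisted CAs are never entered, so anything reachable only
    through them (their "children") drops out as well.
    """
    trusted = set()
    queue = deque(r for r in roots if r not in blacklist)
    while queue:
        ca = queue.popleft()
        if ca in trusted:
            continue
        trusted.add(ca)
        for child in delegations.get(ca, ()):
            if child not in blacklist:
                queue.append(child)
    return trusted

def is_trusted(infohash, signatures, roots, delegations, blacklist=frozenset()):
    """A single signature from any trusted CA marks the item as trusted."""
    signers = signatures.get(infohash, set())
    return bool(signers & trusted_signers(roots, delegations, blacklist))

# Hypothetical data: alice is my only root; she vouches for bob, bob for carol.
delegations = {"alice": ["bob"], "bob": ["carol"], "mallory": ["eve"]}
signatures = {"aaaa": {"carol"}, "bbbb": {"eve"}}
my_roots = {"alice"}

print(is_trusted("aaaa", signatures, my_roots, delegations))           # signed via alice -> bob -> carol
print(is_trusted("bbbb", signatures, my_roots, delegations))           # eve is outside my graph
print(is_trusted("aaaa", signatures, my_roots, delegations, {"bob"}))  # bob blacklisted, carol falls with him
```

Note that in this sketch a child that is also vouched for by another trusted CA survives the blacklisting of one parent; whether that is desirable is a design choice the comment leaves open.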

There is one thing I'm not sure about: whether we can somehow (usefully) implement signing and revoking without revealing which of your "friends" signed it. It would seem desirable to make all activity graphs non-transparent and anonymous in a practical sense. It somehow feels possible to me, but I'm sleepy and a bit foggy right now, so maybe there's a problem with it. Of course, it would still be useful without that feature, but a bit less nice. I would surely be more inclined to mark torrents as "verified" for all my "subscribers" if all they will know is that "somebody trusted" verified it, and not that it was me. Maybe it's less of a problem if only "bad" torrents are explicitly marked as such.


I think you might be right that a system doesn't need to be fully Sybil-resistant if you're bootstrapping your web of trust from people you actually know. The main developers of Matrix are working on decentralised reputation systems[0] which might show how this can scale, and I think the underlying protocols of both Matrix and the Fediverse are general enough that they could support granting reputations to content/hashes as well as people/groups/CAs.

Also it sounds like you're almost suggesting some sort of zero-knowledge proof system, whereby a user could calculate the average trust rating for a given entity across all their (friends of) friends, without that result disclosing the rating given by any specific friend. There are probably already algorithms for doing that, if necessary using the techniques of privacy-preserving cryptocurrencies.

[0] https://matrix.org/blog/2020/10/19/combating-abuse-in-matrix...
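One simple way to get an average rating without disclosing any individual friend's rating is additive secret sharing, which is a secure-sum technique rather than a zero-knowledge proof. The sketch below assumes honest-but-curious friends, integer ratings, and at least two participants; `PRIME`, `split` and `private_average` are illustrative names, not part of any existing protocol.

```python
import secrets

PRIME = 2**61 - 1  # modulus, assumed far larger than any possible sum of ratings

def split(rating, n):
    """Split one rating into n random shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n - 1)]
    shares.append((rating - sum(shares)) % PRIME)
    return shares

def private_average(ratings):
    """Simulate the protocol locally: row i is what friend i hands out."""
    n = len(ratings)
    share_matrix = [split(r, n) for r in ratings]
    # Friend j only ever sees column j, i.e. one random-looking share per friend,
    # and publishes just the sum of that column.
    partial_sums = [sum(row[j] for row in share_matrix) % PRIME for j in range(n)]
    total = sum(partial_sums) % PRIME
    return total / n

print(private_average([7, 9, 2]))  # 6.0
```

Each published partial sum looks uniformly random on its own; only the combination of all of them reveals the total, and therefore the average.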


Can torrent creators use crypto to sign their torrents on the DHT? That'd allow for reputation signal in the distributed system.


It's a "well, yes, but actually no" situation, seeing as some torrent-related programs implement a few draft BEPs. I haven't seen any that support the torrent signing BEP, though. https://www.bittorrent.org/beps/bep_0035.html


Which leads to possibly an interesting legal question: If a third-party is vouching for the quality of a given copyright-infringing torrent, are they liable for the copyright-infringement of the people who download that torrent based on its positive rating?

Some jurisdictions have decided that running a search engine for torrents (especially if it doesn't remove results which rights holders claim are leading to copyright infringement) does make the site operator liable.

I suppose if we are being strict, what we are talking about is vouching for the quality of a .torrent metadata file, which can be downloaded by a torrent client without legal problems from the author of that metadata, and it's only when the metadata is used to download the torrent contents that copyright infringement occurs.

The thought experiment I've considered is what would happen if there were a site where people could vote on short hex sequences of a certain length, to decide which sequences are the best. It could be called the "I Rate Bay", because users give each (hash) sequence a rating from 1 to 10.

Of course all of this ignores the fact that by participating in these ratings, someone is probably incriminating themselves by saying they have not only downloaded the torrent contents but read/installed/watched/listened to it. Using that as the basis of a case against someone seems almost reasonable, but pursuing a "contributory infringement" angle strays a little too far into freedom-of-speech violating territory, in my opinion.


I think there’s an argument to be made that if “quality” is limited in scope to “not malware,” then you’re operating a service to promote the public health of the Internet. If you start talking about whether the torrents are good rips, complete, etc., then it would promote more piracy. Not sure that this argument would pass muster given the history in this space, but I do think it would help stifle a malware propagation channel.


The article "What Colour are your bits?" has meaning here.

https://ansuz.sooke.bc.ca/entry/23


It's an interesting thought experiment. But even if you figure out a way to remain on the right side of the law today, the copyright cartels will just buy some new laws to make whatever they don't like illegal. The only way to stop this corruption is to thoroughly defund them.


All it would take is one good PR against libtorrent


Spotnet, a distributed Usenet indexer, does exactly that.


in the piracy business, having a cryptographically-verifiable way of proving that you were the one infringing the copyright sounds like an anti-feature to me...


Persona based, not tied to your meatspace identity.


Is this truly a "Nature-worthy" breakthrough?

Last I heard, EDA tooling failed to innovate. No serious money for open source, little competition and market failure for closed tools.


Taproot with MAST enables a DAO on Bitcoin.

See the full implementation here by Delft University of Technology scientists and students. [1] Security needs work, functionality works. (disclaimer: I'm the responsible professor)

[1] https://github.com/Tribler/trustchain-superapp#luxury-commun...


Any alternative to Big Tech will have to deal with trolls and fake accounts.

This work presents the starting point for mathematics of trust. Disclaimer, involved in this work.


"We therefore have reason to be optimistic about the future of economic theory."



plus IPFS wants to enforce copyrights worldwide [1,2]. Business principles plus monthly bandwidth usage matter for real people.

[1] https://github.com/ipfs/community/blob/master/code-of-conduc...

[2] https://discuss.ipfs.io/tos#8


Our TOS applies to the IPFS HTTP Gateway where Protocol Labs run the infrastructure (bridging data in the IPFS Network to users over HTTP) to ease onboarding/development. There are many different IPFS Gateways (https://ipfs.github.io/public-gateway-checker/), with different local jurisdictions that can each choose their own TOS.

We do not and cannot control the data that each individual node is hosting in the IPFS Network. Each node can choose what they want to host - no central party can filter or blacklist content globally for the entire network. The ipfs.io Gateway is just one of many portals used to view content stored by third parties on the Internet.

That aside, we definitely don't want to 'apply copyright worldwide'. For one, it's not consistent! Trying to create a central rule for all nodes about what is and isn't "allowed" for the system as a whole doesn't work and is part of the problem with more centralized systems like Facebook, Google, Twitter, etc. Instead, give each node the power to decide what it does/doesn't want to host, and easy tooling to abide by local requirements/restrictions if they so choose.


I think you misread. These terms apply to "public IPFS infrastructure" (I read that as things like build services, gateways or bootstrap nodes) and the ipfs.io website, not to the IPFS network as a whole.

The IPFS network itself is decentralized, there's no central authority that can police copyrighted content on individual nodes.


I'm not sure I understand the complaint around this. If you put a file (that you do not have permission to distribute) on the internet in a publicly accessible location, you should expect that someone will want it removed.

IPFS is not trying to be an anonymous file-sharing service afaik.


The complaint is they shouldn't take a stance or at least one that is a little less firm.

The permissions thing, we can all agree if it's a new movie that just came out, ok, yes, don't be spreading that.

What if instead it's an academic journal article from 1930 in a publication that ceased operating in say 1940? You also don't have permission for this and it's still also under copyright.

The strict interpretation would be "not that one either" while there's also some who say "it's ok, there's nobody to even ask, let historians do research".

So some prefer to be grey about it, like many are with obscenity and pornography. We don't, for example, hide renaissance paintings with exposed breasts away from the public in basements for fear of getting shut down by the police.

There's a spirit of the law as well.


For IPFS to work, there needs to be no central authority who can block files.

Today's IPFS is a long way from that - any Joe Random can DoS any particular hash by getting their node at the right place in the DHT and blackholing requests.


> can DoS any particular hash

Can you explain more how this is possible?

So we have one evil user Karen who wants to block access to content ABC.

She will spam the DHT with requests for content ABC. After a while, nodes will stop responding as she hits the rate limit. Now her DHT requests go into the void.

Now Joey wants to request content ABC too. He requests the content, and because the other nodes are no longer responding to Karen's requests, they respond to Joey's request for the content. Now he can fetch the content.


There's a smarter attack...

Every node in the network 'owns' some of the keyspace.

Karen can simply keep reconnecting to the network (brute forcing her PeerID, which determines which bit of the keyspace her node will be responsible for) till she gets assigned that bit of keyspace. Then she can black hole requests to it.

You can defend against that by having multiple owners for a given bit of keyspace (known as quorum in the IPFS design), but evil Karen can simply pretend to be all of the machines hosting that bit of keyspace.

The brute forcing sounds hard, but in a million node IPFS network, on average you only need to do 1 million sha256 hashes, which takes under a second on modern hardware.
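The brute-force step can be sketched as follows, assuming (as the comment implies) that a node's position in the keyspace is the SHA-256 of an input the attacker is free to choose. Real IPFS derives the PeerID from a public key, so `os.urandom` here is a stand-in for generating a fresh key pair, and `brute_force_id` is a hypothetical helper. Matching the first `bits` bits of the target takes about 2^bits attempts on average, and 2^20 is roughly one million, which lines up with the million-node figure.

```python
import hashlib
import os

def prefix_bits_shared(a: bytes, b: bytes) -> int:
    """Number of leading bits two equal-length digests have in common."""
    diff = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return len(a) * 8 - diff.bit_length()

def brute_force_id(target_key: bytes, bits: int):
    """Try random identities until one lands within `bits` prefix bits of the target."""
    target = hashlib.sha256(target_key).digest()
    attempts = 0
    while True:
        attempts += 1
        seed = os.urandom(16)                      # stand-in for a new key pair
        position = hashlib.sha256(seed).digest()   # assumed keyspace position
        if prefix_bits_shared(position, target) >= bits:
            return seed, attempts

# 12 bits keeps the demo fast; expect on the order of 2**12 = 4096 attempts.
seed, attempts = brute_force_id(b"content ABC", 12)
print(f"found an identity after {attempts} attempts")
```

Each extra prefix bit doubles the expected work, so the cost grows with the size of the network rather than with the size of the hash.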


> For IPFS to work, there needs to be no central authority who can block files.

There isn't one.

> any Joe Random can DoS any particular hash

Sounds pretty decentralized to me!


This may only apply to their own ipfs.io gateway


It seems to be global copyright enforcement with blacklisting and blocklists: https://github.com/ipfs/notes/issues/284 (I've testified in US federal court in copyright cases as the expert witness professor. I always try to investigate the DMCA decisions of developers. Difficult tradeoffs.)


_Optional_ (up to a node operator) global copyright enforcement.

Of course they have to play ball with existing legislation in business settings.


Note that outside of special cases like a publicly-accessible gateway, an IPFS node is not supposed to fetch or retransmit data that the node operator has not specifically requested to be stored there. So this copyright enforcement stuff will always mostly apply to these services. (There might of course be some corner cases such as fetching a mutable IPNS resource and then rejecting the data because it happens to match some blacklist, but these are also broadly sensible.)


As hobofan said, this is up to each node to update/block. It really seems like the TOS only applies to their gateway; I don't see how they could enforce this on the entire network.


Certain companies pay 2-3 extra months of salary for each draft or patent.


scientific study: "The importance of touch in development", Brain Research Centre and Department of Psychology, University of British Columbia, Vancouver. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2865952/#__ffn_... (as commented elsewhere)


most certainly, I would fully agree that touch is an important ingredient to socialization and stimulation -

it is the premature claim that it changes DNA that is so annoying and misleading.

because once something changes your DNA it suggests that it is there forever (that is why it captures people's imagination)


Not if it is the methylation of DNA, modulating the expression of that DNA...methylation is frequently modulated by experience.


That study is only a reference to the existing consensus, not the actual study from the OP.

I don't have the time to dig out the actual study but it looks like [1] it's originating in Canada (not NIH/USA) based on the publication of the press release.

The original comment of the thread doesn't appreciate how much research is behind this idea (touch linked to better long-term outcomes). I'd strongly recommend that the commenter read tribler's citation, as that's the scientific consensus at this point. If you want to read the recent study on the DNA effects of touch, the source is below.

1 - https://www.med.ubc.ca/news/holding-infants-or-not-can-leave...


I have performed DNA studies in my career. Measuring and interpreting methylation - understanding what it is, why, when and where it is present is at its infancy at best.

Sadly, I have a prime view of how clickbaity ideas like this one drive most of the motivations behind investigations in the life sciences.

It is not just this paper that is bullshit, the whole field of "epigenetics" that this paper is a representation of is bullshit as well - this paper is just one out of the long line.

As I pointed out, there is simply no proof that handling changes DNA. It could just as well be that handling stimulates the development rate, which, in turn, also shows up as another signal.

The lie is not that A and B are present at the same time. The lie is presenting story as if A was caused by B. There is absolutely no evidence for that.

Now hundreds of scientists want a piece of the "cool" story, will jump headfirst into proving how LOVE will reflect in the DNA. No one will care about understanding what the heck is methylation - they will all be chasing baby handling and methylation. That's what I have seen happening and will keep happening thanks to papers like this.


> is at its infancy at best.

Hence why the main source is published with ~90 subjects studied. Science starts with one paper/experiment and builds from there.

> The lie is not that A and B are present at the same time. The lie is presenting story as if A was caused by B. There is absolutely no evidence for that.

That's not a lie. It's called a hypothesis. It's testable. They've tested it and encourage others to test as well.


I will tell you what they did. They collected data, with no hypothesis or any idea what they are looking for, then desperately fit various models until something showed up as statistically significant.

Of course, you could say: how dare you, how would you even know, ... I work in this field; p-hacking and HARKing (hypothesizing after the results are known) are both pervasive and endemic. They massaged the factors, the genders, the ethnicity, the socioeconomic status, etc. until the model did something that was publishable.

It is simply not possible to accurately correlate these two measures (self-reported minutes of touching a baby and the methylation levels of that baby's DNA) if you are serious about accounting for all the possible variations across all factors.
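The general worry about fitting many models until one comes out significant can be put in numbers with the standard family-wise error formula. This is a sketch assuming independent tests and the conventional 0.05 threshold; neither the threshold nor the test counts are taken from the study under discussion, and `chance_of_false_positive` is an illustrative name.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests
    crosses the significance threshold purely by chance."""
    return 1 - (1 - alpha) ** num_tests

for k in (1, 5, 20):
    print(k, round(chance_of_false_positive(k), 2))
# 1 0.05
# 5 0.23
# 20 0.64
```

So with twenty unplanned comparisons, a "significant" result somewhere is more likely than not even when no real effect exists.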



A perfect example of what I am talking about. What is the final conclusion of that paper written in 2009!:

> Are the neuroendocrine effects of these experiences across the lifespan also mediated by DNA methylation? The answer to this question is not yet known.

So what happened in the following ten (!) years, have we finally figured out whether the effect is mediated by DNA methylation? Nah. Instead, they published another bullshit paper, this time about babies being held...

Ten years is (or rather should be) an eternity in science! The first smartphone had barely been released back then - how far have we gone in technology in this time? Yet we are no closer to having proven or disproven the mechanism. Instead, they would much rather maintain the status quo and publish another bullshit paper.


Just a small caveat. When I quit my PhD (computational protein folding models) in 1987, it was because I didn't want to spend 10 years working on a problem and getting nowhere. Turned out I was wrong - way too optimistic!

Until the recent ML-based announcement from Google (and maybe not even with that in hand), protein folding research went nowhere for at least 30 years. So I wouldn't be too critical of a 10 year gap.


