Weights in neural networks don't always need to be precise. Not all weights are equally useful to the network.
There seems to be a lot of redundancy that can be replaced with approximations.
This technique seems a bit similar to lossy image compression, which replaces exact pixels with a combination of pre-defined patterns (the DCT in JPEG), but here the patterns come not from a cosine function but from a pseudo-random one.
It may also beat simple quantization simply because the added noise acts as dithering, breaking up the banding created by combinations of quantized values.
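A toy sketch of the idea, under my own assumptions (not any specific paper's method): store only a seed and a handful of coefficients, then reconstruct the weights as a linear combination of pseudo-random basis vectors regenerated from that seed.

```python
import numpy as np

def compress(weights, seed, k):
    """Fit k coefficients so a seeded pseudo-random basis
    approximates the original weights (least squares)."""
    rng = np.random.default_rng(seed)
    basis = rng.standard_normal((weights.size, k))  # regenerable from the seed
    coeffs, *_ = np.linalg.lstsq(basis, weights, rcond=None)
    return coeffs  # store only the seed + k floats instead of weights.size floats

def decompress(coeffs, seed, n):
    rng = np.random.default_rng(seed)
    basis = rng.standard_normal((n, coeffs.size))   # same seed -> same basis
    return basis @ coeffs

w = np.random.default_rng(0).standard_normal(256)
c = compress(w, seed=42, k=64)                      # 4x fewer stored values
w_hat = decompress(c, seed=42, n=w.size)
# the residual behaves like broadband noise rather than quantization banding
print(float(np.mean((w - w_hat) ** 2)))
```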
Mixture of experts involves a trained router component that routes to specific experts depending on the input, but without any loss terms enforcing load distribution, training tends to collapse into a state where most inputs get routed to just one or two experts.
Keep in mind that the "experts" are selected per layer, so it's not even a single expert selection you can correlate with a token, but an interplay of abstract features across many experts at many layers.
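A minimal sketch of top-k routing with a load-balancing term; the names and the exact form of the loss are my assumptions, loosely following the auxiliary loss used in Switch-Transformer-style MoE:

```python
import numpy as np

def route(x, w_router, k=2):
    """Pick the top-k experts per token from softmax router scores."""
    logits = x @ w_router                      # (tokens, experts)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    topk = np.argsort(-probs, axis=-1)[:, :k]  # chosen experts per token
    return probs, topk

def load_balance_loss(probs, topk, n_experts):
    """~ (fraction of tokens per expert) . (mean router prob), scaled by E;
    minimized (= 1.0) when load is spread uniformly across experts."""
    counts = np.bincount(topk.ravel(), minlength=n_experts)
    frac_tokens = counts / topk.size
    mean_prob = probs.mean(axis=0)
    return n_experts * float(frac_tokens @ mean_prob)

rng = np.random.default_rng(0)
x = rng.standard_normal((512, 16))        # 512 tokens, 16-dim features
w = rng.standard_normal((16, 8))          # router for 8 experts
p, chosen = route(x, w)
print(load_balance_loss(p, chosen, 8))    # ~1.0 when balanced, grows on collapse
```

Adding this term to the training loss penalizes the router whenever a few experts hog the traffic, which is what keeps the collapse described above in check.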
The warp model in GPUs is great at hiding the DRAM latency. The GPU isn't idly waiting for DRAM.
All threads that need a memory access sit in a hardware queue, and data arriving from DRAM immediately dequeues a thread and runs its work until the next memory access. So you compute at the full throughput of your RAM. Thread scheduling done in software can't achieve that granularity and low overhead, and hyperthreading has too few threads to hide the latency (2 vs 768).
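A back-of-the-envelope version of why so many threads are needed, via Little's law (the numbers below are illustrative assumptions, not any specific GPU's specs):

```python
# Little's law sketch: concurrency needed = latency * issue rate.
# All numbers are illustrative assumptions, not a specific GPU's specs.
dram_latency_cycles = 400        # plausible order of magnitude for a DRAM access
compute_cycles_per_access = 4    # work available between memory accesses
threads_needed = dram_latency_cycles / compute_cycles_per_access
print(threads_needed)            # ~100 threads in flight just to keep ALUs busy,
                                 # far beyond hyperthreading's 2 per core
```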
Facebook and Google+ tried to do this with their real-name policies. It doesn't work as well as one would expect:
• Toxic assholes are not deterred by their name being attached to what they're saying, because they think they're saying righteous things and/or fighting bad people who don't deserve any respect.
• People self-censor, because they don't want to risk upsetting some random violent stranger on the internet who can track them down.
• People who don't use their legal name publicly have trouble participating. This impacts transgender people, but also people using stage names/pen names, and stalking victims.
I think OP's point isn't to prevent toxic assholes from saying whatever righteous things and fighting whatever bad fight, but to limit bot/inorganic/foreign contributions from made-up people - basically to make it "one person one voice".
I kind of like the idea of "one person one voice", but I have two problems with it, which I think will block me from accepting it.
One is that the cost of it seems much too high, even if you can change it to allow the use of chosen aliases (I don't think it matters what a "one person one voice" system calls an authenticated member). I don't really trust everyone I'd have to give my ID details to, and this is just one more bit of stress for so little gain.
The second is that the benefits will never be realised. In an election, one person one vote doesn't work when half the population doesn't vote; you need almost everyone to turn out, otherwise it's the strongest opinions, not the mainstream opinions, that dominate. And I'm quite sure we'll see the exact same thing here, but in spades, and faster. If you don't like the opinion, you just don't show up. Once the centre of the social media diverges sufficiently from the centre of the community, there will be the sort of bullying and self-censorship you foresee, and it will spiral out of control.
There's no need for real names; what's needed is that you can't create multiple accounts. This can be done without linking identities by using two unrelated parties. Party A is the platform and party B is the authenticator: when creating an account on A, you are sent to B to authenticate your identity and receive a token that lets you finish account creation on A. As long as A and B are separate, A never learns the identity of the user, and B never learns who the user claims to be on A.
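A sketch of that handshake under simplifying assumptions of my own (one token per verified identity, B authenticates tokens, A only checks them; a real deployment would use blind signatures so B can't correlate the token either, and public-key signatures instead of a shared secret):

```python
import hmac, hashlib, secrets

class Authenticator:                      # party B: knows identities, not accounts
    def __init__(self, key):
        self.key = key
        self.issued = set()               # identities that already got a token

    def issue_token(self, identity):
        if identity in self.issued:
            raise ValueError("one token per person")
        self.issued.add(identity)
        token = secrets.token_hex(16)     # random, carries no identity info
        tag = hmac.new(self.key, token.encode(), hashlib.sha256).hexdigest()
        return token, tag

class Platform:                           # party A: knows accounts, not identities
    def __init__(self, b_key):
        self.b_key = b_key                # shared-secret sketch of B's signature
        self.used = set()

    def create_account(self, username, token, tag):
        expected = hmac.new(self.b_key, token.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(tag, expected) or token in self.used:
            raise ValueError("invalid or reused token")
        self.used.add(token)
        return f"account '{username}' created"

key = secrets.token_bytes(32)
b = Authenticator(key)
a = Platform(key)
token, tag = b.issue_token("passport#123")        # B verifies the real identity
print(a.create_account("coolalias", token, tag))  # A never sees the identity
```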
I agree completely, and I think it's disgusting and despicable. But honestly this sort of thing has been happening for many, many decades, maybe even centuries, it's just been done a lot more discreetly in the past. The big difference now is that it's so blatant.
While that might sound like an improvement (and it kind of is, since at least we're getting more honest), I also view it as a big regression. At least when there's perceived shame in being corrupt, people aspire to be better. When it just becomes routine, I fear it's the beginning of the end.
Use of a dense matrix is an artificial constraint you've imposed on yourself, but that only disproves the feasibility of your proposed solution, not the entire problem in general.
A similar problem, n-body simulation*, has n² gravitational interactions. You will similarly hit a wall if you try to do it with a dense n² matrix. However, there's a hierarchical solution (Barnes-Hut) that takes advantage of sparsity and the rapid falloff of influence with distance, and can solve it in O(n log n) with an imperceptibly low loss of precision.
Social interactions are sparse, and group interactions can be optimized with clustering. Fine-grained simulation of the entire society is such a massive chaotic problem with so many variables, that some loss of precision from clustering is completely insignificant compared to the inevitable simplifications you'll have to make in the design of the model itself.
* I mean the naive one with a fixed timestep, not trying to solve chaos.
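To make the clustering point concrete, here's a rough sketch of the trick; this is a flat-grid version of what Barnes-Hut does recursively (the recursion is what gets you O(n log n)). Distant particles are replaced by their cell's center of mass, and the resulting error is tiny compared to exact pairwise sums:

```python
import numpy as np

rng = np.random.default_rng(1)
pos = rng.uniform(0, 100, size=(2000, 2))   # 2000 particles in a 100x100 box
mass = rng.uniform(1, 2, size=2000)

def exact_potential(i):
    """O(n) per particle: sum m/d over every other particle."""
    d = np.linalg.norm(pos - pos[i], axis=1)
    d[i] = np.inf                            # skip self-interaction
    return np.sum(mass / d)

def clustered_potential(i, cell=10.0):
    """Near cells: exact. Far cells: one center-of-mass pseudo-particle each."""
    cells = np.floor(pos / cell).astype(int)
    near = np.all(np.abs(cells - cells[i]) <= 1, axis=1)
    d_near = np.linalg.norm(pos[near] - pos[i], axis=1)
    d_near[d_near == 0] = np.inf             # drop self-interaction
    total = np.sum(mass[near] / d_near)
    far = ~near
    keys = cells[far][:, 0] * 1000 + cells[far][:, 1]   # cell id per far particle
    for k in np.unique(keys):                # collapse each far cell to its c.o.m.
        sel = keys == k
        m = mass[far][sel].sum()
        com = np.average(pos[far][sel], axis=0, weights=mass[far][sel])
        total += m / np.linalg.norm(com - pos[i])
    return total

# the two values agree closely; the clustering error is negligible next to
# the modeling simplifications the comment above describes
print(exact_potential(0), clustered_potential(0))
```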
Isn't the naive one hopeless, since those systems require perfect measurement, economic interactions are lagging, the calculations are ordinal, and mathematical systems fail because you can't relabel? (i.e. it fails a function test, so you end up trying to compare unrelated mathematical objects within differing systems).
Even statistics is fully thrown out with the islands of regularity.
The stochastic nature, lack of measurability, and multiple hidden underlying states (value is subjective) require any model to solve chaos somehow.
>Use of a dense matrix is an artificial constraint you've imposed on yourself, but that only disproves the feasibility of your proposed solution, not the entire problem in general.
The use of a dense matrix is the traditional way of solving the problem. The issue is that it solves the wrong problem. You need a dense tensor, which for an economy of just 20 people would require more storage than the world currently has.
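Rough arithmetic behind that claim, assuming (my reading, not stated above) that "dense tensor" means an order-20 tensor with one index per participant:

```python
# Assumption: joint interactions among 20 agents modeled as a dense
# order-20 tensor, 20 values per axis, 4 bytes per entry.
entries = 20 ** 20                  # ~1.0e26 entries
bytes_needed = entries * 4          # ~4.2e26 bytes
print(f"{bytes_needed:.1e} bytes")  # global storage is commonly estimated at
                                    # a few zettabytes, i.e. ~1e22 bytes
```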
Social interactions are sparse until they aren't. If you think otherwise, try to estimate what every European's interaction with Gavrilo Princip was on 27 June 1914 vs 28 June 1914.
As for gravitation: I'm very happy for the planets and asteroids out there. Unfortunately the economy isn't a solar system.