The exposition is not very clear. What exactly do you mean when you say "No edge...

iamtrask · on Nov 25, 2014

Precisely! I highly encourage checking out the slide deck for a graphical representation.

For every node in every other layer, I colocate its edges on the same machine. That way, when a group of, say, 10 nodes in layer 1 each send a weighted message to a single node in layer 2, they can pre-combine their messages into a weighted sum and send only that one value over the network. This happens for every node in the second layer, which reduces network I/O (this is the first optimization).
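To make the pre-combining idea concrete, here is a rough sketch in Python. It is not the actual implementation: the function names, the `placement` map from node to machine, and the toy weights are all hypothetical, chosen only to show how summing on the sending machine cuts the number of network messages.

```python
from collections import defaultdict

def naive_messages(activations, weights, placement):
    """One network message per edge: (sending machine, destination node, value)."""
    return [(placement[src], dst, activations[src] * w)
            for (src, dst), w in weights.items()]

def combined_messages(activations, weights, placement):
    """Pre-sum weighted messages per (machine, destination) before sending."""
    partial = defaultdict(float)
    for (src, dst), w in weights.items():
        partial[(placement[src], dst)] += activations[src] * w
    return [(machine, dst, value) for (machine, dst), value in partial.items()]

def receive(messages):
    """What the layer-2 nodes end up with: the sum of everything sent to them."""
    totals = defaultdict(float)
    for _machine, dst, value in messages:
        totals[dst] += value
    return dict(totals)

# Hypothetical toy setup: 10 layer-1 nodes on machine 0, one layer-2 node "b0".
activations = {f"a{i}": float(i + 1) for i in range(10)}
placement = {src: 0 for src in activations}
weights = {(src, "b0"): 0.5 for src in activations}

naive = naive_messages(activations, weights, placement)
combined = combined_messages(activations, weights, placement)

print(len(naive), len(combined))             # 10 messages vs 1
print(receive(naive) == receive(combined))   # same result at the receiver
```

The receiver sees the same total either way; the only difference is that the combined version sends one value per (machine, destination) pair instead of one per edge.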