This repository contains the source code for ML hardware architectures that
require nearly half as many multiplier units to achieve the same performance.
They do this by executing alternative inner-product algorithms that trade
nearly half of the multiplications for cheap, low-bitwidth additions, while
still producing output identical to that of the conventional inner product.
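To make the trade concrete, the sketch below uses Winograd's classic (1968) inner-product identity. It is one well-known algorithm of this kind, but it is an assumption that it resembles the algorithms implemented here, and it is not presented as this repository's design. For an even-length pair of vectors, each pair of terms `x[2p]*y[2p] + x[2p+1]*y[2p+1]` is computed as `(x[2p] + y[2p+1]) * (x[2p+1] + y[2p])` minus `x[2p]*x[2p+1]` and `y[2p]*y[2p+1]`. Each correction term depends on only one operand, so in a matrix product it is computed once per row of A or column of B and then reused. This leaves roughly one multiplication per two input pairs. The names `matmul_baseline` and `matmul_pairwise` are hypothetical, and the multiplication counters exist only to make the savings visible.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def matmul_baseline(a: Matrix, b: Matrix) -> Tuple[Matrix, int]:
    """Conventional matrix product; returns (C, number of multiplications)."""
    m, k, n = len(a), len(b), len(b[0])
    mults = 0
    c = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                acc += a[i][p] * b[p][j]
                mults += 1
            c[i][j] = acc
    return c, mults


def matmul_pairwise(a: Matrix, b: Matrix) -> Tuple[Matrix, int]:
    """Winograd-style product (hypothetical sketch); shared dimension must be even."""
    m, k, n = len(a), len(b), len(b[0])
    if k % 2:
        raise ValueError("shared dimension K must be even")
    half = k // 2
    mults = 0

    # Correction terms for each row of A, computed once and reused for all N columns.
    row_terms = []
    for i in range(m):
        t = 0
        for p in range(half):
            t += a[i][2 * p] * a[i][2 * p + 1]
            mults += 1
        row_terms.append(t)

    # Correction terms for each column of B, computed once and reused for all M rows.
    col_terms = []
    for j in range(n):
        t = 0
        for p in range(half):
            t += b[2 * p][j] * b[2 * p + 1][j]
            mults += 1
        col_terms.append(t)

    # Main loop: one multiplication per *pair* of input products.
    c = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(half):
                acc += (a[i][2 * p] + b[2 * p + 1][j]) * (a[i][2 * p + 1] + b[2 * p][j])
                mults += 1
            c[i][j] = acc - row_terms[i] - col_terms[j]
    return c, mults


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    size = 32
    a = [[rng.randint(-128, 127) for _ in range(size)] for _ in range(size)]
    b = [[rng.randint(-128, 127) for _ in range(size)] for _ in range(size)]
    c_ref, mults_ref = matmul_baseline(a, b)
    c_alt, mults_alt = matmul_pairwise(a, b)
    assert c_ref == c_alt  # identical output to the conventional product
    print(mults_ref, mults_alt)  # 32768 17408
```

For an M×K by K×N product this sketch uses `M*N*K/2 + M*K/2 + N*K/2` multiplications instead of `M*N*K`, so the ratio approaches one half as M and N grow and the correction terms are amortized. The pre-additions act on the narrow input operands (each sum needs only one more bit than its inputs), which is why they are cheap compared with the multiplications they replace.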