Looks super cool! There is a bit of data cleanup to do. Just looking at speech recognition: "WER" and "Word Error Rate" should be the same metric, and the values are sometimes on a scale of 0 to 1 and other times a percentage. Also, the Switchboard test set is duplicated. Finally, it really should be marked when the training data is augmented with outside data; many of these numbers come from systems trained on outside data, which says more about how much data the researchers have access to than about the ML system design.
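
For what it's worth, here is a rough sketch of the kind of cleanup I mean. The record layout (`dataset`, `metric`, `model`, `value`) and the sample rows are made up for illustration, not the project's actual schema or real results, and the "at or below 1 means a fraction" rule is only a heuristic that would need a manual check.

```python
# Hypothetical record layout; these field names are assumptions, not the real schema.
METRIC_ALIASES = {"wer": "WER", "word error rate": "WER"}


def normalize(record):
    """Return a copy with a canonical metric name and WER as a percentage."""
    out = dict(record)
    name = out["metric"].strip().lower()
    out["metric"] = METRIC_ALIASES.get(name, out["metric"].strip())
    # Heuristic (assumption): a WER at or below 1 was reported as a fraction.
    # A genuine sub-1% WER would be misread, so flagged rows deserve a look.
    if out["metric"] == "WER" and out["value"] <= 1.0:
        out["value"] = out["value"] * 100
    return out


def dedupe(records):
    """Drop rows that repeat the same dataset, metric, model, and value."""
    seen = set()
    unique = []
    for r in records:
        key = (r["dataset"].strip().lower(), r["metric"], r["model"], round(r["value"], 6))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


# Toy rows with invented values, only to exercise the functions.
rows = [
    {"dataset": "Switchboard", "metric": "Word Error Rate", "model": "model-a", "value": 0.25},
    {"dataset": "switchboard", "metric": "WER", "model": "model-a", "value": 25.0},
    {"dataset": "Switchboard", "metric": "WER", "model": "model-b", "value": 30.0},
]
cleaned = dedupe([normalize(r) for r in rows])
```

Marking results that used outside data would need an extra field (say, a boolean per row), which someone would have to fill in by hand from the papers.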