You don't transpose it before the matmul; it's always stored transposed. I.e., when you print the weights of a linear layer in PyTorch, you're actually seeing (A^T)^T, and what's stored is A^T.
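
For reference, a minimal sketch using only the standard PyTorch API: nn.Linear stores its weight with shape (out_features, in_features), so forward() computes x @ weight.T + bias, and the .T there is just a strided view, not a copy done per matmul:

    import torch
    import torch.nn as nn

    # nn.Linear(in_features=3, out_features=5) stores its weight with
    # shape (out_features, in_features) = (5, 3), i.e. already A^T.
    layer = nn.Linear(3, 5)
    print(layer.weight.shape)  # torch.Size([5, 3])

    # forward() is equivalent to x @ weight.T + bias; weight.T is a
    # view with swapped strides, so no transposed copy is materialized.
    x = torch.randn(2, 3)
    y_manual = x @ layer.weight.T + layer.bias
    assert torch.allclose(layer(x), y_manual)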

