Not OP, but one thing that surprised me was if you are doing rust Simd in a library, and part of the code is marked #[inline] but others are not you might see catastrophic performance regressions. We saw an issue where the SIMD version was over 10x slower because we missed marking one function as inline. Essentially rustc converted it from an intrinsic to a regular function call.
https://github.com/rust-lang/rust/issues/107617#issuecomment...