huntergr-arm wrote: Rebased (which pulls in https://github.com/llvm/llvm-project/pull/169925) and tidied up a little. I now see csa_with_arith vectorization as a net win for NEON on a Neoverse core: neutral for i64, better for i32 and i8. https://github.com/llvm/llvm-project/pull/158088