zhaoqi5 wrote: Better choice; this has lower instruction latency than before. By the way, could an `extractelement` + `insertelement` sequence on `v8i32`/`v4i64` types also use the `xvpickve` and `xvinsve0` instructions? https://github.com/llvm/llvm-project/pull/151914