efriedma added a comment.

We use whilelo to lower VECTOR_SPLAT; I think that ends up being one fewer vector instruction. What's the tradeoff between that vs. dup+cmpne?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79356/new/

https://reviews.llvm.org/D79356