arsenm wrote:

> > I mean a 32-bit shift that is reducible to 16-bit. Everything just half sized. We should do that, but it's trickier because we don't want to force vector usage in scalar contexts
>
> in a separate step?

Yes

https://github.com/llvm/llvm-project/pull/125574