[llvm] [X86][Codegen] Shuffle certain shifts on i8 vectors to create opportunity for vectorized shift instructions (PR #117980)

Fri Dec 6 14:28:21 PST 2024

huangjd wrote:

I am working on the test cases now, but first I am measuring the impact of this transformation. Preliminary results suggest that the transformation is beneficial when it runs in a loop where the CPU pipeline can be kept sufficiently full; otherwise the benefit is questionable. Given that vector arithmetic operations are typically used in ML kernels or other highly parallel workloads, could there be a compile flag to toggle this behavior?

https://github.com/llvm/llvm-project/pull/117980