[llvm] [X86][Codegen] Shuffle certain shifts on i8 vectors to create opportunity for vectorized shift instructions (PR #117980)
William Huang via llvm-commits
llvm-commits at lists.llvm.org
Sat Dec 14 22:04:57 PST 2024
huangjd wrote:
Cost/benefit analysis below, assuming a fully utilized pipeline. For example, `op mem, reg` never stalls on the memory load: the load uop is assumed to issue early enough that the actual arithmetic/logic uop can issue as soon as the dependent reg is available.
The v*i8 column is the original latency. The v*i16 and v*i32 columns are the latency values for the shift widened to 16-bit and 32-bit elements respectively.
![Screenshot from 2024-12-15 00-57-40](https://github.com/user-attachments/assets/58b5329d-def5-4949-966a-2f14ef351e72)
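For readers unfamiliar with what "widened" could mean here, below is a minimal scalar sketch. It assumes that widening an i8 shift means shifting in 16-bit lanes and then masking off the bits that leak across byte boundaries, which is the usual trick when there is no per-byte shift instruction. The function names `shl_bytes_ref` and `shl_bytes_widened` are hypothetical and are not taken from the PR; the actual lowering in the patch may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reference behaviour: shift every byte left independently by the same amount.
inline void shl_bytes_ref(const uint8_t *in, uint8_t *out, size_t n,
                          unsigned amt) {
  for (size_t i = 0; i < n; ++i)
    out[i] = static_cast<uint8_t>(in[i] << amt);
}

// Widened form (illustrative assumption): treat each byte pair as one 16-bit
// lane, shift the lane once, then clear the low `amt` bits of each byte so
// that bits shifted out of the low byte do not pollute the high byte.
inline void shl_bytes_widened(const uint8_t *in, uint8_t *out, size_t n,
                              unsigned amt) {
  assert(n % 2 == 0 && amt < 8);
  const uint8_t byteMask = static_cast<uint8_t>(0xFFu << amt);
  const uint16_t laneMask =
      static_cast<uint16_t>(byteMask | (byteMask << 8));
  for (size_t i = 0; i < n; i += 2) {
    // Assemble a little-endian 16-bit lane from two adjacent bytes.
    uint16_t lane = static_cast<uint16_t>(in[i] | (in[i + 1] << 8));
    lane = static_cast<uint16_t>((lane << amt) & laneMask);
    out[i] = static_cast<uint8_t>(lane & 0xFF);
    out[i + 1] = static_cast<uint8_t>(lane >> 8);
  }
}
```

The extra mask is part of the cost being weighed: the widened form is one wide shift plus one AND, so it only pays off if that pair is cheaper than the original i8 sequence on the target.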
https://github.com/llvm/llvm-project/pull/117980