[llvm] [X86][Codegen] Shuffle certain shifts on i8 vectors to create opportunity for vectorized shift instructions (PR #117980)
William Huang via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 16 14:56:48 PST 2024
================
@@ -30044,6 +30150,136 @@ static SDValue LowerShift(SDValue Op, const X86Subtarget &Subtarget,
}
}
+ // SHL/SRL/SRA on vXi8 can be widened to vYi16 or vYi32 if the constant
+ // amounts can be shuffled such that every pair or quad of adjacent elements
+ // has the same value. This introduces an extra shuffle before and after the
+ // shift, and it is profitable if the operand is already a shuffle so that both
+ // can be merged or the extra shuffle is fast.
+ // (shift (shuffle X P1) S1) ->
+ // (shuffle (shift (shuffle X (shuffle P2 P1)) S2) P2^-1) where S2 can be
+ // widened, and P2^-1 is the inverse shuffle of P2.
+ // This is not profitable on XOP or AVX512 because they have 8/16-bit vector
+ // variable shift instructions.
+ // Picking out GFNI because normally it implies AVX512, and there is no
+ // latency data for CPU with GFNI and SSE or AVX only, but there are tests for
+ // such combination anyways.
+ if (ConstantAmt &&
----------------
huangjd wrote:
The code above handles shift widening when adjacent pairs already have the same shift amount. My patch tries to find a permutation that creates such a shift, but it does not perform the widening itself; it hands the result to the code above. It is therefore a different functionality and is better left in a separate section.
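To illustrate the distinction, here is a minimal standalone sketch of the "find a permutation" step only. It is not the code in the patch: the helper names `findPairingPermutation` and `inversePermutation` are hypothetical, it uses plain `std::vector` rather than SelectionDAG types, it handles only the pair (vYi16) case, and it ignores lane restrictions and shuffle cost. It assumes the shuffle semantics Result[I] = Input[Perm[I]].

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <vector>

// Hypothetical helper, not part of the patch: find a permutation Perm such
// that Amts[Perm[0]] == Amts[Perm[1]], Amts[Perm[2]] == Amts[Perm[3]], etc.
// Returns std::nullopt when some shift amount occurs an odd number of times,
// because then no arrangement into equal adjacent pairs exists.
std::optional<std::vector<int>>
findPairingPermutation(const std::vector<unsigned> &Amts) {
  // Bucket element indices by shift amount, preserving index order.
  std::map<unsigned, std::vector<int>> Buckets;
  for (std::size_t I = 0; I < Amts.size(); ++I)
    Buckets[Amts[I]].push_back(static_cast<int>(I));

  std::vector<int> Perm;
  Perm.reserve(Amts.size());
  for (const auto &Entry : Buckets) {
    const std::vector<int> &Indices = Entry.second;
    if (Indices.size() % 2 != 0)
      return std::nullopt;
    // Indices with equal amounts become adjacent in the permuted order.
    Perm.insert(Perm.end(), Indices.begin(), Indices.end());
  }
  return Perm;
}

// Inverse permutation (P2^-1 in the comment above), assuming the shuffle
// semantics Result[I] = Input[Perm[I]], so that Inv[Perm[I]] == I.
std::vector<int> inversePermutation(const std::vector<int> &Perm) {
  std::vector<int> Inv(Perm.size(), -1);
  for (std::size_t I = 0; I < Perm.size(); ++I) {
    assert(Perm[I] >= 0 && static_cast<std::size_t>(Perm[I]) < Perm.size());
    Inv[Perm[I]] = static_cast<int>(I);
  }
  return Inv;
}
```

For shift amounts {1, 2, 2, 1} this sketch yields the permutation {0, 3, 1, 2}, so the permuted amounts are {1, 1, 2, 2}, which the existing widening code could then handle as pairs. The widening itself is deliberately absent here, which mirrors the split described above.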
https://github.com/llvm/llvm-project/pull/117980