[llvm] [X86][Codegen] Shuffle certain shifts on i8 vectors to create opportunity for vectorized shift instructions (PR #117980)
Simon Pilgrim via llvm-commits
llvm-commits at lists.llvm.org
Sun Dec 15 08:38:35 PST 2024
================
@@ -30044,6 +30150,136 @@ static SDValue LowerShift(SDValue Op, const X86Subtarget &Subtarget,
}
}
+ // SHL/SRL/SRA on vXi8 can be widened to vYi16 or vYi32 if the constant
+ // amounts can be shuffled such that every pair or quad of adjacent elements
+ // has the same value. This introduces an extra shuffle before and after the
+ // shift, and it is profitable if the operand is already a shuffle so that both
+ // can be merged or the extra shuffle is fast.
+ // (shift (shuffle X P1) S1) ->
+ // (shuffle (shift (shuffle X (shuffle P2 P1)) S2) P2^-1) where S2 can be
+ // widened, and P2^-1 is the inverse shuffle of P2.
+ // This is not profitable on XOP or AVX512 because they have 8/16-bit vector
+ // variable shift instructions.
+ // GFNI is singled out because it normally implies AVX512, and there is no
+ // latency data for CPUs with GFNI and only SSE or AVX, but there are tests
+ // for such combinations anyway.
+ if (ConstantAmt &&
----------------
RKSimon wrote:
The lowering scheme immediately above this is very similar to what you're doing (and a lot easier to grok) - I'd recommend you look at extending that code instead of introducing this separate implementation.
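For readers following the thread, the transform described in the quoted comment can be sketched outside of LLVM as below. This is only an illustrative scalar model in standalone C++, not the code from the PR or from LowerShift: the names `Vec`, `Perm`, `shlPerByte`, `shlWidened` and `shlViaShuffle` are hypothetical, only SHL on a 16 x i8 vector is modelled, shift amounts are assumed to lie in 0..7, and the merging with an existing shuffle P1 is left out.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <optional>

using Vec = std::array<uint8_t, 16>;  // models a v16i8 value
using Perm = std::array<int, 16>;     // models a shuffle mask

// Reference semantics: shift every byte left by its own amount.
Vec shlPerByte(const Vec &X, const Vec &Amt) {
  Vec R{};
  for (size_t I = 0; I < 16; ++I)
    R[I] = static_cast<uint8_t>(X[I] << Amt[I]);
  return R;
}

// Shift in 16-bit lanes. Requires Amt[2k] == Amt[2k+1]. Bits that cross from
// the low byte into the high byte of a lane are cleared with a byte mask.
Vec shlWidened(const Vec &X, const Vec &Amt) {
  Vec R{};
  for (size_t I = 0; I < 16; I += 2) {
    uint16_t Lane = static_cast<uint16_t>(X[I] | (X[I + 1] << 8));
    uint16_t Shifted = static_cast<uint16_t>(Lane << Amt[I]);
    uint8_t ByteMask = static_cast<uint8_t>(0xFF << Amt[I]);
    R[I] = static_cast<uint8_t>(Shifted & 0xFF) & ByteMask;
    R[I + 1] = static_cast<uint8_t>(Shifted >> 8) & ByteMask;
  }
  return R;
}

// (shift X S1) -> (shuffle (shift (shuffle X P2) S2) P2^-1).
// P2 groups equal shift amounts together; returns nullopt if the grouped
// amounts are not uniform within every adjacent pair.
std::optional<Vec> shlViaShuffle(const Vec &X, const Vec &Amt) {
  Perm P2{};
  std::iota(P2.begin(), P2.end(), 0);
  std::stable_sort(P2.begin(), P2.end(),
                   [&](int A, int B) { return Amt[A] < Amt[B]; });

  Vec XS{}, AS{};
  for (size_t I = 0; I < 16; ++I) {
    XS[I] = X[P2[I]];    // shuffle the operand by P2
    AS[I] = Amt[P2[I]];  // shuffle the constant amounts by P2 (this is S2)
  }
  for (size_t I = 0; I < 16; I += 2)
    if (AS[I] != AS[I + 1])
      return std::nullopt;  // cannot be widened to 16-bit lanes

  Vec W = shlWidened(XS, AS);
  Vec R{};
  for (size_t I = 0; I < 16; ++I)
    R[P2[I]] = W[I];  // apply the inverse shuffle P2^-1
  return R;
}
```

The sketch picks P2 with a stable sort purely for simplicity; a real lowering would need to pick a permutation that is cheap as a shuffle, and SRL/SRA would need different masking (and sign handling) than the SHL case shown here.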
https://github.com/llvm/llvm-project/pull/117980