[llvm] [X86] Fold VPERMV3(X, M, Y) -> VPERMV(CONCAT(X, Y), WIDEN(M)) iff the CONCAT is free (PR #122485)

Simon Pilgrim via llvm-commits llvm-commits at lists.llvm.org
Sat Jan 11 06:17:34 PST 2025


================
@@ -65,10 +65,9 @@ define void @shuffle_v16i32_to_v8i32_1(ptr %L, ptr %S) nounwind {
 ;
 ; AVX512BWVL-FAST-ALL-LABEL: shuffle_v16i32_to_v8i32_1:
 ; AVX512BWVL-FAST-ALL:       # %bb.0:
-; AVX512BWVL-FAST-ALL-NEXT:    vmovdqa (%rdi), %ymm0
-; AVX512BWVL-FAST-ALL-NEXT:    vpmovsxbd {{.*#+}} ymm1 = [1,3,5,7,9,11,13,15]
-; AVX512BWVL-FAST-ALL-NEXT:    vpermi2d 32(%rdi), %ymm0, %ymm1
-; AVX512BWVL-FAST-ALL-NEXT:    vmovdqa %ymm1, (%rsi)
+; AVX512BWVL-FAST-ALL-NEXT:    vmovaps {{.*#+}} ymm0 = [1,3,5,7,9,11,13,15]
+; AVX512BWVL-FAST-ALL-NEXT:    vpermps (%rdi), %zmm0, %zmm0
----------------
RKSimon wrote:

np - I think I understood - what I was asking was whether you had any concerns about domain-crossing stalls on an AVX512-capable target for:
```asm
vpmovsxbd {{.*#+}} ymm0 = [1,3,5,7,9,11,13,15] ; INT DOMAIN - 64-bit load
vpermps (%rdi), %zmm0, %zmm0 ; FP DOMAIN
```
vs
```asm
vmovaps {{.*#+}} ymm0 = [1,3,5,7,9,11,13,15] ; FP DOMAIN - 256-bit load
vpermps (%rdi), %zmm0, %zmm0 ; FP DOMAIN
```
I don't think there would be, and we could tweak the X86FixupVectorConstants tables to allow integer sext/zext loads to replace full-width fp loads.
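
For reference, a minimal Python sketch of the two points under discussion, assuming a simplified lane-level model (this is not LLVM code, and the helper names `vpermv3`, `vpermv`, `widen` and `sext_bytes` are hypothetical): a two-source permute over X and Y picks the same lanes as a single-source permute over CONCAT(X, Y) with the widened mask, and the index constant survives being stored as 8 bytes and sign-extended on load.

```python
def vpermv3(x, mask, y):
    """Two-source permute model (in the style of vpermi2d on 8 lanes).

    Each index selects from x when it is below len(x), otherwise from y.
    Assumes len(x) is a power of two, as with real vector widths.
    """
    n = len(x)
    assert len(y) == n and len(mask) == n
    out = []
    for m in mask:
        m &= 2 * n - 1          # only the low index bits are used
        out.append(y[m - n] if m >= n else x[m])
    return out


def vpermv(src, mask):
    """Single-source permute model (in the style of vpermps/vpermd)."""
    n = len(src)
    return [src[m & (n - 1)] for m in mask]


def widen(mask, n):
    """Pad the mask to n lanes; the extra lanes are don't-care (0 here)."""
    return list(mask) + [0] * (n - len(mask))


def sext_bytes(raw):
    """Sign-extend each stored byte, as a vpmovsxbd constant load would."""
    return [b - 256 if b >= 128 else b for b in raw]


x = list(range(100, 108))        # stand-in for the low 256 bits at (%rdi)
y = list(range(200, 208))        # stand-in for the 256 bits at 32(%rdi)
mask = [1, 3, 5, 7, 9, 11, 13, 15]

two_src = vpermv3(x, mask, y)
one_src = vpermv(x + y, widen(mask, 16))[:8]   # keep the low 8 lanes

stored = bytes(mask)             # 8 bytes instead of 8 x 4 bytes
reloaded = sext_bytes(stored)
```

In this model `two_src` and `one_src` are identical, and `reloaded` equals `mask` because every index fits in a signed byte. The sketch says nothing about execution domains or stalls, which is the open question above.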

https://github.com/llvm/llvm-project/pull/122485

