[PATCH] D73575: [InstCombine] canonicalize splat shuffle after cmp
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 11 11:51:37 PST 2021
spatel added inline comments.
================
Comment at: llvm/test/Transforms/InstCombine/gep-inbounds-null.ll:95
+; CHECK-NEXT: [[TMP0:%.*]] = icmp eq <2 x i8*> [[DOTSPLATINSERT]], zeroinitializer
+; CHECK-NEXT: [[CND:%.*]] = shufflevector <2 x i1> [[TMP0]], <2 x i1> undef, <2 x i32> zeroinitializer
; CHECK-NEXT: ret <2 x i1> [[CND]]
----------------
LuoYuanke wrote:
> On X86 the result of the vector compare instruction would be in %k register, but there is no shuffle instruction for %k register. Here is the test case that was regressed due to this patch. We can duplicate it with "llc -mcpu=skylake-avx512". Any idea to improve it?
>
> ```
> define void @before_canonicalization_i1(<16 x i1> %msk, i32 %in, <16 x i1>* %dst) {
> entry:
> %insrt = insertelement <16 x i32> undef, i32 %in, i32 0
> %splat = shufflevector <16 x i32> %insrt, <16 x i32> poison, <16 x i32> zeroinitializer
> %mul = mul <16 x i32> <i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789, i32 789>, %splat
> %cmp1 = icmp eq <16 x i32> %mul, <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
> %and = and <16 x i1> %msk, %cmp1
> %bc = bitcast <16 x i1> %and to i16
> %cmp = icmp ne i16 %bc, 0
> br i1 %cmp, label %b1, label %b2
> b1:
> store <16 x i1> %and, <16 x i1>* %dst, align 8
> ret void
> b2:
> store <16 x i1> %cmp1, <16 x i1>* %dst, align 8
> ret void
> }
>
> ```
>
>
> ```
> define void @after_canonicalization_i1(<16 x i1> %msk, i32 %in, <16 x i1>* %dst) {
> entry:
> %insrt = insertelement <16 x i32> undef, i32 %in, i32 0
> %0 = mul <16 x i32> %insrt, <i32 789, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
> %1 = icmp eq <16 x i32> %0, zeroinitializer
> %cmp1 = shufflevector <16 x i1> %1, <16 x i1> poison, <16 x i32> zeroinitializer
> %and = and <16 x i1> %cmp1, %msk
> %bc = bitcast <16 x i1> %and to i16
> %cmp.not = icmp eq i16 %bc, 0
> br i1 %cmp.not, label %b2, label %b1
> b1:
> store <16 x i1> %and, <16 x i1>* %dst, align 8
> ret void
> b2:
> store <16 x i1> %cmp1, <16 x i1>* %dst, align 8
> ret void
> }
> ```
>
>
>
This example (not sure if this was over-reduced from original code...) shows a few potential missed folds. I'm not sure yet why we failed all of them.
Reduced the vector length for readability:
```
define <2 x i1> @src(i32 %in) {
%insrt = insertelement <2 x i32> undef, i32 %in, i32 0
%m = mul <2 x i32> %insrt, <i32 789, i32 poison>
%r = icmp eq <2 x i32> %m, zeroinitializer
ret <2 x i1> %r
}
```
1. Scalarize the mul?
2. Followed by scalarize the icmp?
3. We also missed eliminating the mul - probably because we didn't recognize the constant with poison.
https://alive2.llvm.org/ce/z/TZyTU2
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D73575/new/
https://reviews.llvm.org/D73575
More information about the llvm-commits
mailing list