[PATCH] D123652: Add use condition for combine SetCCMOVMSK
Xiang Zhang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 13 20:51:13 PDT 2022
xiangzhangllvm added a comment.
In D123652#3447672 <https://reviews.llvm.org/D123652#3447672>, @RKSimon wrote:
> Please can you add test coverage - preferably for both cases?
I want to, if it is easy to reproduced by a small case, I had add the lit test : )
This combine directly works on DAG, hard to control/tune the "input" DAG for this combine.
I spend hours to write a workable small reproducer, but failed.
I tried to fetch the problem code from the perf-drop project :
1 ; llc -mtriple=x86_64-unknown -mattr=+avx2
2
3 define i1 @zx(<16 x i8> %x, <16 x i8>%y) {
4 %a = icmp ne <16 x i8> %x, %y
5 %b = bitcast <16 x i1> %a to i16
6 %c = icmp eq i16 %b, 0
7 br i1 %c, label %zx1, label %zx2
8
9 zx1:
10 ret i1 1
11
12 zx2:
13 ; call void @fun(i16 %b)
14 ret i1 0
15 }
16
17 declare void @fun(i16 %b)
It can reproduce the "combine allof(cmpeq(x,y)) -> ptest(sub(x,y))", out put:
8 # %bb.0:
9 vpsubb %ymm1, %ymm0, %ymm0
10 vptest %ymm0, %ymm0
11 je .LBB0_1
12 # %bb.2: # %zx2
13 xorl %eax, %eax
14 vzeroupper
15 retq
16 .LBB0_1: # %zx1
17 movb $1, %al
18 vzeroupper
19 retq
When I uncomment the test's line 13 " ; call void @fun(i16 %b)" to " call void @fun(i16 %b)", let the cmp result has more uses.
The previous optimization will change the DAG
from
SelectionDAG has 14 nodes:
t0: ch = EntryToken
t2: v32i8,ch = CopyFromReg t0, Register:v32i8 %0
t4: v32i8,ch = CopyFromReg t0, Register:v32i8 %1
t26: v32i8 = X86ISD::PCMPEQ t2, t4
t31: i32 = X86ISD::MOVMSK t26
t34: i32,i32 = X86ISD::SUB t31, Constant:i32<-1>
t36: ch = X86ISD::BRCOND t0, BasicBlock:ch<zx2 0xb5ed448>, TargetConstant:i8<5>, t34:1
t16: ch = br t36, BasicBlock:ch<zx1 0xb5ed360>
to (change the SUB to XOR )
SelectionDAG has 16 nodes:
t0: ch = EntryToken
t2: v32i8,ch = CopyFromReg t0, Register:v32i8 %1
t4: v32i8,ch = CopyFromReg t0, Register:v32i8 %2
t28: v32i8 = X86ISD::PCMPEQ t2, t4
t33: i32 = X86ISD::MOVMSK t28
t35: i32,i32 = X86ISD::XOR t33, Constant:i32<-1>
t9: ch = CopyToReg t0, Register:i32 %0, t35
t37: ch = X86ISD::BRCOND t9, BasicBlock:ch<zx2 0xb5ed288>, TargetConstant:i8<5>, t35:1
t18: ch = br t37, BasicBlock:ch<zx1 0xb5ed1a0>
So, it fail to meet the logic to do such combine. (fail to reproduce).
But from the DAG level, take combine "MOVMSK(PCMPEQ(X,Y)) == -1 -> PTESTZ(SUB(X,Y),SUB(X,Y))" for example:
This is a obvious bug to combine the "mid-node" without considering if the "mid-node" has other uses.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D123652/new/
https://reviews.llvm.org/D123652
More information about the llvm-commits
mailing list