[PATCH] D124743: [DAGCombine] Add node in the worklist in topological order in CombineTo
Amaury SECHET via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon May 2 03:45:31 PDT 2022
deadalnix added inline comments.
================
Comment at: llvm/test/CodeGen/X86/movmsk-cmp.ll:4564
+; AVX1OR2-NEXT: testb $1, %cl
+; AVX1OR2-NEXT: cmovel %edx, %eax
; AVX1OR2-NEXT: retq
----------------
deadalnix wrote:
> RKSimon wrote:
> > deadalnix wrote:
> > > RKSimon wrote:
> > > > deadalnix wrote:
> > > > > I investigated this regression.
> > > > >
> > > > > before this patch, the DAG after type legalization and optimization looks like:
> > > > > ```SelectionDAG has 25 nodes:
> > > > > t0: ch = EntryToken
> > > > > t62: i8 = xor t52, Constant:i8<-1>
> > > > > t55: i8 = and t62, Constant:i8<1>
> > > > > t53: i8 = and t52, Constant:i8<2>
> > > > > t56: i8 = or t55, t53
> > > > > t57: i8 = setcc t56, Constant:i8<2>, seteq:ch
> > > > > t40: i32 = select t57, Constant:i32<42>, Constant:i32<99>
> > > > > t19: ch,glue = CopyToReg t0, Register:i32 $eax, t40
> > > > > t2: v2f64,ch = CopyFromReg t0, Register:v2f64 %0
> > > > > t4: v2f64,ch = CopyFromReg t0, Register:v2f64 %1
> > > > > t41: v2i64 = setcc t2, t4, setogt:ch
> > > > > t24: i32 = X86ISD::MOVMSK t41
> > > > > t52: i8 = truncate t24
> > > > > t20: ch = X86ISD::RET_FLAG t19, TargetConstant:i32<0>, Register:i32 $eax, t19:1
> > > > > ```
> > > > >
> > > > > After, it looks like:
> > > > > ```
> > > > > SelectionDAG has 20 nodes:
> > > > > t0: ch = EntryToken
> > > > > t55: i8 = and t46, Constant:i8<1>
> > > > > t56: i8 = srl t46, Constant:i8<1>
> > > > > t21: i32 = select t56, Constant:i32<42>, Constant:i32<99>
> > > > > t22: i32 = select t55, t21, Constant:i32<99>
> > > > > t19: ch,glue = CopyToReg t0, Register:i32 $eax, t22
> > > > > t2: v2f64,ch = CopyFromReg t0, Register:v2f64 %0
> > > > > t4: v2f64,ch = CopyFromReg t0, Register:v2f64 %1
> > > > > t40: v2i64 = setcc t2, t4, setogt:ch
> > > > > t24: i32 = X86ISD::MOVMSK t40
> > > > > t46: i8 = truncate t24
> > > > > t20: ch = X86ISD::RET_FLAG t19, TargetConstant:i32<0>, Register:i32 $eax, t19:1
> > > > > ```
> > > > >
> > > > > Arguably, the DAG after this patch looks simpler/better, so it is hard to argue this does the wrong thing here. However, the rest of the backend doesn't seems to be able to do as good of a job on this DAG. Is there something that can be done here? It seems like there is a legitimate blind spot in the X86 backend rather than a problem with this patch. @RKSimon maybe?
> > > > Adding a select(c2, select(c1, x, y), y) fold is possible.
> > > >
> > > > Another option is to just alter the test - InstCombine will now optimize this code to a reduction pattern: https://simd.godbolt.org/z/hfv3YW4Y6
> > > > ```
> > > > define i32 @PR39665_c_ray(<2 x double> %x, <2 x double> %y) {
> > > > %cmp = fcmp ogt <2 x double> %x, %y
> > > > %shift = shufflevector <2 x i1> %cmp, <2 x i1> poison, <2 x i32> <i32 1, i32 undef>
> > > > %1 = and <2 x i1> %cmp, %shift
> > > > %u = extractelement <2 x i1> %1, i64 0
> > > > %r = select i1 %u, i32 42, i32 99
> > > > ret i32 %r
> > > > }
> > > > ```
> > > There is already a fold for select(c2, select(c1, x, y), y) , but the X86 backend tell DAGCombine that it prefers the double select. I tried to force the transform, and I got a mixed bag out of it. Sometime it improve things, sometime it makes them worse. Overall, the result was unclear.
> > >
> > > I'll put up a patch with the updated IR then.
> > Great - can you keep the current test as well (with a TODO comment)? We don't care much about the regression but it'd be useful to keep track of any fix that comes along.
> Just saw that after submitting the patch. I'll update it to keep both.
Done, see D124756
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D124743/new/
https://reviews.llvm.org/D124743
More information about the llvm-commits
mailing list