[PATCH] D124743: [DAGCombine] Add node in the worklist in topological order in CombineTo

Mon May 2 03:39:18 PDT 2022

deadalnix added inline comments.

================
Comment at: llvm/test/CodeGen/X86/movmsk-cmp.ll:4564
+; AVX1OR2-NEXT:    testb $1, %cl
+; AVX1OR2-NEXT:    cmovel %edx, %eax
 ; AVX1OR2-NEXT:    retq
----------------
RKSimon wrote:
> deadalnix wrote:
> > RKSimon wrote:
> > > deadalnix wrote:
> > > > I investigated this regression.
> > > > 
> > > > before this patch, the DAG after type legalization and optimization looks like:
> > > > ```SelectionDAG has 25 nodes:
> > > >   t0: ch = EntryToken
> > > >             t62: i8 = xor t52, Constant:i8<-1>
> > > >           t55: i8 = and t62, Constant:i8<1>
> > > >           t53: i8 = and t52, Constant:i8<2>
> > > >         t56: i8 = or t55, t53
> > > >       t57: i8 = setcc t56, Constant:i8<2>, seteq:ch
> > > >     t40: i32 = select t57, Constant:i32<42>, Constant:i32<99>
> > > >   t19: ch,glue = CopyToReg t0, Register:i32 $eax, t40
> > > >         t2: v2f64,ch = CopyFromReg t0, Register:v2f64 %0
> > > >         t4: v2f64,ch = CopyFromReg t0, Register:v2f64 %1
> > > >       t41: v2i64 = setcc t2, t4, setogt:ch
> > > >     t24: i32 = X86ISD::MOVMSK t41
> > > >   t52: i8 = truncate t24
> > > >   t20: ch = X86ISD::RET_FLAG t19, TargetConstant:i32<0>, Register:i32 $eax, t19:1
> > > > ```
> > > > 
> > > > After, it looks like:
> > > > ```
> > > > SelectionDAG has 20 nodes:
> > > >   t0: ch = EntryToken
> > > >       t55: i8 = and t46, Constant:i8<1>
> > > >         t56: i8 = srl t46, Constant:i8<1>
> > > >       t21: i32 = select t56, Constant:i32<42>, Constant:i32<99>
> > > >     t22: i32 = select t55, t21, Constant:i32<99>
> > > >   t19: ch,glue = CopyToReg t0, Register:i32 $eax, t22
> > > >         t2: v2f64,ch = CopyFromReg t0, Register:v2f64 %0
> > > >         t4: v2f64,ch = CopyFromReg t0, Register:v2f64 %1
> > > >       t40: v2i64 = setcc t2, t4, setogt:ch
> > > >     t24: i32 = X86ISD::MOVMSK t40
> > > >   t46: i8 = truncate t24
> > > >   t20: ch = X86ISD::RET_FLAG t19, TargetConstant:i32<0>, Register:i32 $eax, t19:1
> > > > ```
> > > > 
> > > > Arguably, the DAG after this patch looks simpler/better, so it is hard to argue this does the wrong thing here. However, the rest of the backend doesn't seems to be able to do as good of a job on this DAG. Is there something that can be done here? It seems like there is a legitimate blind spot in the X86 backend rather than a problem with this patch. @RKSimon maybe?
> > > Adding a select(c2, select(c1, x, y), y) fold is possible.
> > > 
> > > Another option is to just alter the test - InstCombine will now optimize this code to a reduction pattern: https://simd.godbolt.org/z/hfv3YW4Y6
> > > ```
> > > define i32 @PR39665_c_ray(<2 x double> %x, <2 x double> %y) {
> > >   %cmp = fcmp ogt <2 x double> %x, %y
> > >   %shift = shufflevector <2 x i1> %cmp, <2 x i1> poison, <2 x i32> <i32 1, i32 undef>
> > >   %1 = and <2 x i1> %cmp, %shift
> > >   %u = extractelement <2 x i1> %1, i64 0
> > >   %r = select i1 %u, i32 42, i32 99
> > >   ret i32 %r
> > > }
> > > ```
> > There is already a fold for select(c2, select(c1, x, y), y) , but the X86 backend tell DAGCombine that it prefers the double select. I tried to force the transform, and I got a mixed bag out of it. Sometime it improve things, sometime it makes them worse. Overall, the result was unclear.
> > 
> > I'll put up a patch with the updated IR then.
> Great - can you keep the current test as well (with a TODO comment)? We don't care much about the regression but it'd be useful to keep track of any fix that comes along.
Just saw that after submitting the patch. I'll update it to keep both.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124743/new/

https://reviews.llvm.org/D124743