[PATCH] D59758: [DAGCombiner] Combine OR as ADD when no common bits are set

Wed Mar 27 11:08:04 PDT 2019

bjope added a comment.

In D59758#1443308 <https://reviews.llvm.org/D59758#1443308>, @spatel wrote:

> In D59758#1443074 <https://reviews.llvm.org/D59758#1443074>, @bjope wrote:
>
> > In D59758#1442959 <https://reviews.llvm.org/D59758#1442959>, @spatel wrote:
> >
> > > In D59758#1441124 <https://reviews.llvm.org/D59758#1441124>, @bjope wrote:
> > >
> > > > Hello reviewers! Do you think this is a good idea?
> > >
> > >
> > > It's an interesting idea. :)
> > >
> > > > I've mostly seen improvements for our OOT target when doing this, but for example llvm/test/CodeGen/X86/split-store.ll also exposes a case when we trigger a rewrite into using SUB.
> > >
> > > Yes, we'd classify that as a slight regression for x86.
> >
> >
> > Isn't split-store.ll showing an improvement (we get one subb instead of andb+orb)?
>
>
> Yes - sorry, I reversed that with signbit-shift.ll. So we would call signbit-shift.ll a slight regression because of the extra mov instruction. We are probably missing a generic combine. Might be similar to the hexagon diff?

Seems to be foldAddSubMasked1 that now is doing the fold

  (or (and (AssertSext X, i1), 1), C) --> (sub C, X)

which looks pretty nice in the DAG, but at least in this case (X86 + vector ops) the load of the splat vector C from constant pool can't be folded into the PSUBD, so we get a separate MOVAPS for that:

    t24: v4i32,ch = MOVAPSrm<Mem:(load 16 from constant-pool)> Register:i64 $rip, TargetConstant:i8<1>, Register:i64 $noreg, TargetConstantPool:i32<<4 x i32> <i32 42, i32 42, i32 42, i32 42>> 0, Register:i32 $noreg, t0
    t21: v4i32 = PCMPGTDrr t2, V_SETALLONES:v4i32
  t20: v4i32 = PSUBDrr t24, t21

instead of this output from isel where the load from constant pool is folded into the POR

      t20: v4i32 = PCMPGTDrr t2, V_SETALLONES:v4i32
    t22: v4i32 = PSRLDri t20, TargetConstant:i8<31>
  t15: v4i32,ch = PORrm<Mem:(load 16 from constant-pool)> t22, Register:i64 $rip, TargetConstant:i8<1>, Register:i64 $noreg, TargetConstantPool:i32<<4 x i32> <i32 42, i32 42, i32 42, i32 42>> 0, Register:i32 $noreg, t0

Afaict, the code after ISel looks better when we do the rewrite into using sub (the height of the DAG is reduced by one). However, since both the PCMPGTD and the PSUBD has tied operands we lose when doing register allocation, and we have to insert an extra move to be pass the result in xmm0. I do not think that we can detect that in the generic DAGCombiner. And if the test case just had been a little bit different (receiving %x in xmm1, if that is possible), then there wouldn't have been an extra MOVDQA.

So is this regression anything to bother about?  (I can add another test case where %x isn't taken from the first argument.)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59758/new/

https://reviews.llvm.org/D59758