[PATCH] D46031: [DAGCombiner] Masked merge: if 'B' is constant, de-canonicalize the pattern (invert the mask).

Wed May 2 14:40:02 PDT 2018

lebedev.ri added a subscriber: andreadb.
lebedev.ri added inline comments.

================
Comment at: test/CodeGen/AArch64/unfold-masked-merge-scalar-variablemask.ll:401-403
+; CHECK-NEXT:    eor w8, w0, w8
 ; CHECK-NEXT:    bic w8, w8, w2
+; CHECK-NEXT:    orr w0, w8, w0
----------------
spatel wrote:
> lebedev.ri wrote:
> > spatel wrote:
> > > This is a real regression, or am I seeing things that aren't there?
> > We replaced two instructions with two other instructions.
> > Unless i'm using a bad `-mcpu` (`-mtriple=aarch64-unknown-linux-gnu -mcpu=cortex-a75`, is there a better choice?),
> > this does not seem to matter in practice.
> > Or i'm simply looking at `llvm-mca` wrong :)
> > {F6087636}
> Correct - 2 instructions change.
> But the whole point of the masked merge exercise was to maximize the throughput depending on the target, right? The code with and/andn+or has a shorter critical path than the dependent chain of xor+and+xor.
> 
> So I think llvm-mca is lying...at least for that CPU model.
> If we plug these in with -mcpu=kryo, we get:
> IPC:            1.32  for the 'eor' chain
> IPC:            1.96  for the 'bic' chain
> 
> Is the problem that x86 can't form 'andn' with an immediate? Can we fix its override of hasAndNot to account for that? Or is the problem that we should be ignoring 'not' ops as candidates for transforming in this function? Or both?
> But the whole point of the masked merge exercise was to maximize the throughput depending on the target, right?

Yes, absolutely.

> So I think llvm-mca is lying...at least for that CPU model.
> If we plug these in with -mcpu=kryo, we get:
> IPC: 1.32 for the 'eor' chain
> IPC: 1.96 for the 'bic' chain

Ok, thank you, that makes more sense.
It would be nice if llvm-mca's docs would contain a list of 'good' cpu models, for which it is known not to lie. (cc @andreadb )

> Is the problem that x86 can't form 'andn' with an immediate?

Yes, that is the motivational case.

> Can we fix its override of hasAndNot to account for that?

Hmm, actually, maybe we can...
Looking at the docs, it is already specified that it takes the value, not the mask-to-be-inverted.

> Or is the problem that we should be ignoring 'not' ops as candidates for transforming in this function? Or both?

I don't think i'm able to answer that. Instcombine should certainly handle that, yes.

Repository:
  rL LLVM

https://reviews.llvm.org/D46031