[PATCH] D46031: [DAGCombiner] Masked merge: if 'B' is constant, de-canonicalize the pattern (invert the mask).

Wed May 2 14:19:19 PDT 2018

spatel added inline comments.

================
Comment at: test/CodeGen/AArch64/unfold-masked-merge-scalar-variablemask.ll:401-403
+; CHECK-NEXT:    eor w8, w0, w8
 ; CHECK-NEXT:    bic w8, w8, w2
+; CHECK-NEXT:    orr w0, w8, w0
----------------
lebedev.ri wrote:
> spatel wrote:
> > This is a real regression, or am I seeing things that aren't there?
> We replaced two instructions with two other instructions.
> Unless i'm using a bad `-mcpu` (`-mtriple=aarch64-unknown-linux-gnu -mcpu=cortex-a75`, is there a better choice?),
> this does not seem to matter in practice.
> Or i'm simply looking at `llvm-mca` wrong :)
> {F6087636}
Correct - 2 instructions change.
But the whole point of the masked merge exercise was to maximize the throughput depending on the target, right? The code with and/andn+or has a shorter critical path than the dependent chain of xor+and+xor.

So I think llvm-mca is lying...at least for that CPU model.
If we plug these in with -mcpu=kryo, we get:
IPC:            1.32  for the 'eor' chain
IPC:            1.96  for the 'bic' chain

Is the problem that x86 can't form 'andn' with an immediate? Can we fix its override of hasAndNot to account for that? Or is the problem that we should be ignoring 'not' ops as candidates for transforming in this function? Or both?

Repository:
  rL LLVM

https://reviews.llvm.org/D46031