[PATCH] D46031: [DAGCombiner] Masked merge: if 'B' is constant, de-canonicalize the pattern (invert the mask).

Fri May 4 06:43:06 PDT 2018

lebedev.ri added inline comments.

================
Comment at: test/CodeGen/AArch64/unfold-masked-merge-scalar-variablemask.ll:401-403
+; CHECK-NEXT:    eor w8, w0, w8
 ; CHECK-NEXT:    bic w8, w8, w2
+; CHECK-NEXT:    orr w0, w8, w0
----------------
spatel wrote:
> lebedev.ri wrote:
> > andreadb wrote:
> > > lebedev.ri wrote:
> > > > spatel wrote:
> > > > > lebedev.ri wrote:
> > > > > > spatel wrote:
> > > > > > > This is a real regression, or am I seeing things that aren't there?
> > > > > > We replaced two instructions with two other instructions.
> > > > > > Unless i'm using a bad `-mcpu` (`-mtriple=aarch64-unknown-linux-gnu -mcpu=cortex-a75`, is there a better choice?),
> > > > > > this does not seem to matter in practice.
> > > > > > Or i'm simply looking at `llvm-mca` wrong :)
> > > > > > {F6087636}
> > > > > Correct - 2 instructions change.
> > > > > But the whole point of the masked merge exercise was to maximize the throughput depending on the target, right? The code with and/andn+or has a shorter critical path than the dependent chain of xor+and+xor.
> > > > > 
> > > > > So I think llvm-mca is lying...at least for that CPU model.
> > > > > If we plug these in with -mcpu=kryo, we get:
> > > > > IPC:            1.32  for the 'eor' chain
> > > > > IPC:            1.96  for the 'bic' chain
> > > > > 
> > > > > Is the problem that x86 can't form 'andn' with an immediate? Can we fix its override of hasAndNot to account for that? Or is the problem that we should be ignoring 'not' ops as candidates for transforming in this function? Or both?
> > > > > But the whole point of the masked merge exercise was to maximize the throughput depending on the target, right?
> > > > 
> > > > Yes, absolutely.
> > > > 
> > > > > So I think llvm-mca is lying...at least for that CPU model.
> > > > > If we plug these in with -mcpu=kryo, we get:
> > > > > IPC: 1.32 for the 'eor' chain
> > > > > IPC: 1.96 for the 'bic' chain
> > > > 
> > > > Ok, thank you, that makes more sense.
> > > > It would be nice if llvm-mca's docs would contain a list of 'good' cpu models, for which it is known not to lie. (cc @andreadb )
> > > > 
> > > > > Is the problem that x86 can't form 'andn' with an immediate?
> > > > 
> > > > Yes, that is the motivational case.
> > > > 
> > > > > Can we fix its override of hasAndNot to account for that?
> > > > 
> > > > Hmm, actually, maybe we can...
> > > > Looking at the docs, it is already specified that it takes the value, not the mask-to-be-inverted.
> > > > 
> > > > > Or is the problem that we should be ignoring 'not' ops as candidates for transforming in this function? Or both?
> > > > 
> > > > I don't think i'm able to answer that. Instcombine should certainly handle that, yes.
> > > llvm-mca is not lying.
> > > cpu cortex-a75 describes the latency of eor/bic/orr using variant scheduling classes.
> > > llvm-mca doesn't know how to analyze variant scheduling classes. So, you should have seen one or more warnings generated by the tool.
> > > 
> > > If for good model you mean a model that doesn't use variant scheduling classes, then you only need to worry about cases where the above mentioned warnings are generated.
> > > 
> > > I am currently working hard on a patch to add support for variant scheduling classes. It is quite tricky, but I am confident that I will have something in the form of a patch ready (hopefully) on next week.
> > > 
> > > Cheers,
> > > Andrea
> > I've come up with this script to do these comparisons {F6101135}
> > I'm guessing i haven't noticed any warnings because i only redirect stdout, which is good to know.
> Aha - thanks for clearing that up. I missed the warnings because they were at the top of the output, and I didn't scroll that far back up. :)
> 
> ```
> $ llvm-mca -mtriple=aarch64 -mcpu=cortex-a75  eor.s 
> warning: don't know how to model variant opcodes.
> note: assume 1 micro opcode.
> 
> ```
> 
> A potential usability improvement would be to make warnings like that one louder in some way (repeat it at the bottom, put asterisks in the stats?). Just a thought...now that I know, I'll definitely look harder at the whole output.
Or maybe duplicate them to stdout too, not just output then to stderr?

Repository:
  rL LLVM

https://reviews.llvm.org/D46031