<div dir="ltr"><div><div>Currently llvm-mca doesn't know how to resolve variant scheduling classes. This problem mostly affects the ARM target.<br></div><div>This has been reported here: <a href="https://bugs.llvm.org/show_bug.cgi?id=36672">https://bugs.llvm.org/show_bug.cgi?id=36672</a><br><br></div>The number of micro opcodes that you see in the llvm-mca output is the default (invalid) number of micro opcodes for instructions associated with a sched-variant class.<br><br></div><div>I plan to send a patch to address (most of) the issues related to the presence of variant scheduling classes. However, keep in mind that ARM sched-predicates rely heavily on TII hooks. Those are going to cause problems for tools like mca (i.e. there is no easy way to "fix" them).<br><br></div><div>At the moment, llvm-mca doesn't know how to analyze these two instructions, since both are associated with a variant scheduling class:<br>   eor     w8, w0, w1<br>   mov   w0, w1


<br><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Apr 18, 2018 at 5:10 PM, Sanjay Patel via Phabricator <span dir="ltr"><<a href="mailto:reviews@reviews.llvm.org" target="_blank">reviews@reviews.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">spatel added subscribers: courbet, andreadb.<br>

spatel added a comment.<br>

<span class=""><br>

In <a href="https://reviews.llvm.org/D45733#1071005" rel="noreferrer" target="_blank">https://reviews.llvm.org/<wbr>D45733#1071005</a>, @lebedev.ri wrote:<br>

<br>

> > Yeah, that is the question I'm having. I did look at the mca output.<br>

><br>

> Here is what MCA says about that for `-mtriple=aarch64-unknown-linux-gnu -mcpu=cortex-a75`<br>

>  F5971838: diff.txt <<a href="https://reviews.llvm.org/F5971838" rel="noreferrer" target="_blank">https://reviews.llvm.org/<wbr>F5971838</a>><br>

>  Or is this a scheduling info problem?<br>

<br>

<br>

</span>Cool - a chance to poke at llvm-mca! (cc @andreadb and @courbet)<br>

<br>

First thing I see is that it's harder to get the sequence we're after on x86 using the basic source premise:<br>

<br>

  int andandor(int x, int y)  {<br>

    __asm volatile("# LLVM-MCA-BEGIN ands");<br>

    int r = (x & 42) | (y & ~42);<br>

    __asm volatile("# LLVM-MCA-END ands");<br>

    return r;<br>

  }<br>

<br>

  int xorandxor(int x, int y) {<br>

    __asm volatile("# LLVM-MCA-BEGIN xors");<br>

    int r = ((x ^ y) & 42) ^ y;<br>

    __asm volatile("# LLVM-MCA-END xors");<br>

    return r;<br>

  }<br>

<br>

...because the input param register doesn't match the output result register. We'd have to hack that in asm...or put the code in a loop, but subtract the loop overhead somehow. Things work/look alright to me other than that.<br>

<br>

I don't know AArch64 that well, but your example is a special case that may be going wrong. I.e., if we have a bit-string constant like 0xff000000, you could get:<br>

        bfxil   w0, w1, #0, #24<br>

...which should certainly be better than:<br>

        eor     w8, w1, w0<br>

        and     w8, w8, #0xff000000<br>

        eor     w0, w8, w1<br>

<br>

AArch64 chose to convert to a shift plus a possibly more expensive bfi for the 0x00ffff00 constant, though. That's not something that we can account for in the generic DAGCombiner, so I'd categorize that as an AArch64-specific bug (either don't use bfi there, or fix the scheduling model, or fix this up in MI somehow).<br>

<div class="HOEnZb"><div class="h5"><br>

<br>

Repository:<br>

  rL LLVM<br>

<br>

<a href="https://reviews.llvm.org/D45733" rel="noreferrer" target="_blank">https://reviews.llvm.org/<wbr>D45733</a><br>

<br>

<br>

<br>

</div></div></blockquote></div><br></div>