[PATCH] D85165: [X86][MC][Target] Initial backend support a tune CPU to support -mtune

Andrea Di Biagio via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Aug 12 11:12:21 PDT 2020


andreadb added a comment.

In D85165#2193969 <https://reviews.llvm.org/D85165#2193969>, @craig.topper wrote:

> @andreadb @RKSimon or @efriedma do any of you have suggestions for simple scheduler tests for this? I was hoping I could use -print-schedule like we used to but that no longer exists.

I remember the design of the `-print-schedule` functionality was a bit problematic because it had a layering violation (see PR37160).
The issue was introduced when support for printing scheduling info for inline assembly was added. The first version of print-schedule didn't have that problem though.

Not sure if it helps, but if your goal is to obtain latency and throughput information for every instruction, you can pipe the output of llc into llvm-mca.

You can put MCA markers around the regions of code that you want llvm-mca to analyze.

Example:

  define void @vzeroupper(<4 x i64>* %x, <4 x i64>* %y) #0 {
    call void asm sideeffect "# LLVM-MCA-BEGIN vzeroupper","~{dirflag},~{fpsr},~{flags}"()
    %a = load <4 x i64>, <4 x i64>* %x
    %b = load <4 x i64>, <4 x i64>* %y
    %c = mul <4 x i64> %a, %b
    store <4 x i64> %c, <4 x i64>* %x
    call void asm sideeffect "# LLVM-MCA-END", "~{dirflag},~{fpsr},~{flags}"()
    ret void
  }

If you now run the following command:

  llc -mtriple=x86_64-unknown-unknown -mcpu=skx < my-vzeroupper-test.ll | llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=skx -all-views=false -instruction-info

Then you should see something like this:

  [0] Code Region - vzeroupper
  
  
  
  Instruction Info:
  [1]: #uOps
  [2]: Latency
  [3]: RThroughput
  [4]: MayLoad
  [5]: MayStore
  [6]: HasSideEffects (U)
  
  [1]    [2]    [3]    [4]    [5]    [6]    Instructions:
   1      7     0.50    *                   vmovdqa       (%rdi), %ymm0
   4      22    1.50    *                   vpmullq       (%rsi), %ymm0, %ymm0
   2      1     1.00           *            vmovdqa       %ymm0, (%rdi)

Note, however, that llvm-mca doesn't allow you to specify different CPUs for different code regions. If you need that, you unfortunately have to split your test into multiple files.

That said, you should then be able to use FileCheck to check the latency/throughput values.
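For reference, each row of the Instruction Info table is whitespace-separated: #uOps, Latency and RThroughput come first, then optional `*`/`U` markers for the MayLoad, MayStore and HasSideEffects columns, then the instruction. The small C++ sketch below pulls those values out of rows like the ones above. The `InstrInfoRow` and `parseRow` names are made up for this illustration and are not part of llvm-mca; in an actual lit test you would match the same columns with FileCheck `CHECK:` lines instead.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// One row of llvm-mca's "Instruction Info" table (hypothetical helper type), e.g.
//   " 4      22    1.50    *                   vpmullq       (%rsi), %ymm0, %ymm0"
struct InstrInfoRow {
  unsigned NumUOps = 0;    // column [1]: #uOps
  unsigned Latency = 0;    // column [2]: Latency
  double RThroughput = 0;  // column [3]: RThroughput
  std::string Mnemonic;    // first token after the flag columns
};

// Parses a row by whitespace. Columns [4]-[6] (MayLoad, MayStore,
// HasSideEffects) are printed as '*' or 'U' markers or left blank, so this
// sketch skips the markers without recording which column they came from.
// Returns std::nullopt for legend/header lines that don't start with numbers.
std::optional<InstrInfoRow> parseRow(const std::string &Line) {
  std::istringstream In(Line);
  InstrInfoRow Row;
  if (!(In >> Row.NumUOps >> Row.Latency >> Row.RThroughput))
    return std::nullopt;
  std::string Tok;
  while (In >> Tok) {
    if (Tok == "*" || Tok == "U")
      continue;  // a MayLoad/MayStore/HasSideEffects marker
    Row.Mnemonic = Tok;
    return Row;
  }
  return std::nullopt;  // numbers but no instruction text
}
```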



================
Comment at: llvm/lib/MC/MCSubtargetInfo.cpp:183-188
+
+    // If there is a match
+    if (CPUEntry) {
+      // Set the features implied by this CPU feature, if any.
+      SetImpliedBits(Bits, CPUEntry->TuneImplies.getAsBitset(), ProcFeatures);
+    } else if (TuneCPU != CPU) {
----------------
Apologies if this has already been asked, but what if these features are not really compatible with `CPU`?
What if, let's say, we have a crazy combination such as `-mcpu=btver2 -mtune=skx`?
Not that I expect people to write that sequence of options :-).
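To make the concern concrete, here is a hypothetical C++ sketch. The feature names, the `TuneRequirement` table and `unusableTuneBits` are invented for illustration and are not LLVM's; `setImpliedBits` only loosely mirrors the `SetImpliedBits` call in the diff (OR in the implied bits, then recurse). It shows how a tuning flag implied by an skx-like tune CPU could assume an ISA feature (here AVX2) that a btver2-like CPU does not have.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical feature indices for illustration only; these are not LLVM's
// real X86 feature or tuning enums.
enum Feature : std::size_t {
  FeatAVX,
  FeatAVX2,
  FeatAVX512F,
  TuneFastVarShuffle,  // hypothetical tuning flag that only pays off with AVX2
  NumFeatures
};
using FeatureBits = std::bitset<NumFeatures>;

struct FeatureKV {
  Feature Value;
  FeatureBits Implies;
};

// Loosely modeled on SetImpliedBits: OR in the implied bits, then recurse into
// whatever those features imply in turn (assumes the table has no cycles).
void setImpliedBits(FeatureBits &Bits, const FeatureBits &Implies,
                    const std::vector<FeatureKV> &Table) {
  Bits |= Implies;
  for (const FeatureKV &FE : Table)
    if (Implies.test(FE.Value))
      setImpliedBits(Bits, FE.Implies, Table);
}

// Hypothetical pairing of a tuning flag with the ISA feature it presumes.
struct TuneRequirement {
  Feature Tune;
  Feature Requires;
};

// Returns the tuning flags set in TuneBits whose required ISA feature is
// missing from CPUBits.
FeatureBits unusableTuneBits(const FeatureBits &CPUBits,
                             const FeatureBits &TuneBits,
                             const std::vector<TuneRequirement> &Reqs) {
  FeatureBits Result;
  for (const TuneRequirement &R : Reqs)
    if (TuneBits.test(R.Tune) && !CPUBits.test(R.Requires))
      Result.set(R.Tune);
  return Result;
}
```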


================
Comment at: llvm/lib/Target/X86/X86Subtarget.cpp:235-236
 
+  if (TuneCPU.empty())
+    TuneCPU = "generic";
+
----------------
Out of curiosity, is there a reason why `TuneCPU` defaults to "generic" rather than to the CPU string (from line 233)?
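A minimal sketch of the alternative the question suggests, using a hypothetical `resolveTuneCPU` helper that is not part of the patch: an explicit tune CPU wins, otherwise fall back to the CPU string, and only use "generic" when neither is set.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper (not in the patch): choose the CPU to tune for.
std::string resolveTuneCPU(std::string_view CPU, std::string_view TuneCPU) {
  if (!TuneCPU.empty())
    return std::string(TuneCPU);  // explicit -mtune wins
  if (!CPU.empty())
    return std::string(CPU);      // otherwise tune for the -mcpu target
  return "generic";               // neither given: generic tuning
}
```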



https://reviews.llvm.org/D85165


