[PATCH] D85165: [X86][MC][Target] Initial backend support a tune CPU to support -mtune
Andrea Di Biagio via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 12 11:12:21 PDT 2020
andreadb added a comment.
In D85165#2193969 <https://reviews.llvm.org/D85165#2193969>, @craig.topper wrote:
> @andreadb @RKSimon or @efriedma do any of you have suggestions for simple scheduler tests for this? I was hoping I could use -print-schedule like we used to but that no longer exists.
I remember the design of the `-print-schedule` functionality was a bit problematic because it had a layering violation (see PR37160).
The issue was introduced when support for printing scheduling info for inline assembly was added. The first version of print-schedule didn't have that problem though.
Not sure if it helps, but if your goal is to obtain latency and throughput information for every instruction, you can pipe the output of llc into llvm-mca.
You can use MCA markers around the regions of code that you want to have analyzed by mca.
Example:
define void @vzeroupper(<4 x i64>* %x, <4 x i64>* %y) #0 {
  call void asm sideeffect "# LLVM-MCA-BEGIN vzeroupper", "~{dirflag},~{fpsr},~{flags}"()
  %a = load <4 x i64>, <4 x i64>* %x
  %b = load <4 x i64>, <4 x i64>* %y
  %c = mul <4 x i64> %a, %b
  store <4 x i64> %c, <4 x i64>* %x
  call void asm sideeffect "# LLVM-MCA-END", "~{dirflag},~{fpsr},~{flags}"()
  ret void
}
If you now run the following command:
> llc < my-vzeroupper-test.ll | llvm-mca -mcpu=skx -all-views=false -instruction-info
Then you should see something like this:
[0] Code Region - vzeroupper
Instruction Info:
[1]: #uOps
[2]: Latency
[3]: RThroughput
[4]: MayLoad
[5]: MayStore
[6]: HasSideEffects (U)
[1]    [2]    [3]    [4]    [5]    [6]    Instructions:
 1      7     0.50    *                   vmovdqa (%rdi), %ymm0
 4      22    1.50    *                   vpmullq (%rsi), %ymm0, %ymm0
 2      1     1.00           *            vmovdqa %ymm0, (%rdi)
Note, however, that mca doesn't allow you to specify different CPUs for different code regions. If you want to do something like that, then unfortunately you need to split your test into multiple files...
That being said, you should then be able to use FileCheck to check the latency/throughput values.
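As a rough sketch of that last step (not something taken from the patch), the example test file could carry a RUN line and CHECK lines like the ones below. The numbers are copied from the sample output above and depend on the scheduling model, so treat them as placeholders. `%s` is the usual lit substitution for the test file itself, and the `#0` attribute group is assumed to be defined elsewhere in the file.

```llvm
; Hypothetical lit test header; the numbers mirror the sample output above
; and would need to be regenerated for the model under test.
; RUN: llc < %s | llvm-mca -mcpu=skx -all-views=false -instruction-info | FileCheck %s

; CHECK: [1] [2] [3] [4] [5] [6] Instructions:
; CHECK: 1 7 0.50 * vmovdqa (%rdi), %ymm0
; CHECK: 4 22 1.50 * vpmullq (%rsi), %ymm0, %ymm0
; CHECK: 2 1 1.00 * vmovdqa %ymm0, (%rdi)
```

FileCheck ignores differences in horizontal whitespace by default, so the column alignment in the CHECK lines does not have to match the mca output exactly.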
================
Comment at: llvm/lib/MC/MCSubtargetInfo.cpp:183-188
+
+ // If there is a match
+ if (CPUEntry) {
+ // Set the features implied by this CPU feature, if any.
+ SetImpliedBits(Bits, CPUEntry->TuneImplies.getAsBitset(), ProcFeatures);
+ } else if (TuneCPU != CPU) {
----------------
Maybe this has already been asked (apologies if so), but what if these features are not really compatible with `CPU`?
What if, let's say, we have a crazy combination such as -mcpu=btver2 -mtune=skx?
Not that I expect people to write that sequence of options :-).
================
Comment at: llvm/lib/Target/X86/X86Subtarget.cpp:235-236
+ if (TuneCPU.empty())
+ TuneCPU = "generic";
+
----------------
Out of curiosity, is there a reason why `TuneCPU` defaults to "generic" and not to the CPU string (from line 233)?
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D85165/new/
https://reviews.llvm.org/D85165