[PATCH] D52174: [TableGen][SubtargetEmitter] Add the ability for processor models to describe dependency breaking instructions.

Andrea Di Biagio via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 17 09:33:23 PDT 2018


andreadb marked 8 inline comments as done.
andreadb added inline comments.


================
Comment at: lib/Target/X86/X86ScheduleBtVer2.td:716
+    VPXORrr, VPANDNrr, VXORPSrr, VXORPDrr,
+    VXORPSYrr, VXORPDYrr, VANDNPSrr, VANDNPDrr,
+    VPSUBBrr, VPSUBDrr, VPSUBQrr, VPSUBWrr,
----------------
RKSimon wrote:
> VANDNPSYrr/VANDNPDYrr?
Interestingly, those were missing in the original implementation of `X86InstrAnalysis::isDependencyBreaking()`.

I have added them to the set.
I have also added two extra tests for VANDNPSYrr/VANDNPDYrr in `test/tools/llvm-mca/X86/BtVer2/zero-idioms-avx-256.s`.
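
As a rough illustration of the kind of check discussed above, here is a minimal, self-contained sketch of a zero-idiom test. It assumes that an instruction counts as dependency breaking when its opcode belongs to a known set and both source operands name the same register. The `Inst` struct, the `isZeroIdiom` function, and the use of opcode names as strings are hypothetical and are not the LLVM API or the code in this patch.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for a decoded instruction: an opcode name and the
// register numbers of its two source operands. Not an LLVM type.
struct Inst {
  std::string Opcode;
  unsigned Src1;
  unsigned Src2;
};

// Opcode names taken from the hunk quoted in this review, plus the two
// 256-bit ANDN variants that were missing from the original set.
static const std::set<std::string> &zeroIdiomOpcodes() {
  static const std::set<std::string> Opcodes = {
      "VPXORrr",   "VPANDNrr",  "VXORPSrr",   "VXORPDrr",
      "VXORPSYrr", "VXORPDYrr", "VANDNPSrr",  "VANDNPDrr",
      "VPSUBBrr",  "VPSUBDrr",  "VPSUBQrr",   "VPSUBWrr",
      "VANDNPSYrr", "VANDNPDYrr"};
  return Opcodes;
}

// Assumption: the result does not depend on the inputs only when the opcode
// is in the set AND both sources are the same register (x ^ x, ~x & x, x - x).
bool isZeroIdiom(const Inst &I) {
  return zeroIdiomOpcodes().count(I.Opcode) != 0 && I.Src1 == I.Src2;
}
```

With this sketch, `vandnps %ymm1, %ymm1, %ymm1` would be classified as a zero idiom, while the same opcode with two different source registers would not.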


================
Comment at: test/tools/llvm-mca/X86/BtVer2/zero-idioms-avx-256.s:68
+# CHECK-NEXT:  -      -      -     2.00    -     2.00    -      -      -      -      -      -      -      -     vaddps	%ymm0, %ymm0, %ymm1
+# CHECK-NEXT:  -      -      -      -     2.00    -     2.00    -      -      -      -      -      -      -     vxorps	%ymm1, %ymm1, %ymm1
+# CHECK-NEXT:  -      -      -     1.00   1.00   1.00   1.00    -      -      -      -      -      -      -     vblendps	$2, %ymm1, %ymm2, %ymm3
----------------
RKSimon wrote:
> Shouldn't this only take a single resource cycle (0.5 rtp)? IIRC dep-breaking 256-bit ops only needs to process the upper half
Nice catch.

I think the latency/throughput of instructions should be fixed by a separate patch. This patch should only help to identify independent operands of an instruction.
I will add a TODO to this test.



================
Comment at: utils/TableGen/CodeGenSchedule.h:311
+class OpcodeInfo {
+  llvm::SmallVector<PredicateInfo, 8> Predicates;
+
----------------
RKSimon wrote:
> Why SmallVector here - the other added classes use std::vector
That vector is expected to be small. In the worst case, it has exactly one element per processor model. Its inline capacity is set to 8 because most targets define fewer than 8 predicates. X86 is the target with the most models (9).

If you want, I can use a `std::vector` here for consistency.


https://reviews.llvm.org/D52174