[llvm-bugs] [Bug 45873] New: [MCA][MCSchedModel] Add an optional DelayCycles vector for SchedWriteRes.

Mon May 11 02:18:37 PDT 2020

https://bugs.llvm.org/show_bug.cgi?id=45873

            Bug ID: 45873
           Summary: [MCA][MCSchedModel] Add an optional DelayCycles vector
                    for SchedWriteRes.
           Product: tools
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: llvm-mca
          Assignee: unassignedbugs at nondot.org
          Reporter: andrea.dibiagio at gmail.com
                CC: andrea.dibiagio at gmail.com, llvm-bugs at lists.llvm.org,
                    matthew.davis at sony.com

This was suggested by Andy in
https://lists.llvm.org/pipermail/llvm-dev/2020-May/141487.html

The idea is to add a DelayCycles vector to SchedWriteRes to indicate the
relative start cycle for each reserved resource. That would effectively model
dependent uOps.

At the moment, it is not possible to delay the consumption of specific hardware
resources. The expectation is that resource consumption always starts at
relative cycle #0 (i.e. relative to the instruction issue cycle).

A vector of DelayCycles (if present) would contain unsigned integer values
(ideally one per processor resource consumed by a write), and those values
would be offsets in cycles relative to the issue cycle.
The absence of a DelayCycles vector would be semantically equivalent to an
all-zeroes DelayCycles vector.
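
As a rough illustration of these semantics, here is a minimal C++ sketch. The
names WriteResSketch, UsageInterval and computeUsage are hypothetical and are
not part of LLVM; the sketch only assumes that each consumed resource has a
cycle count and, optionally, a delay.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for the resource usage of one write.
// This is NOT an LLVM type.
struct WriteResSketch {
  std::vector<unsigned> ResourceCycles; // cycles each resource is held
  std::vector<unsigned> DelayCycles;    // optional; empty means all zeroes
};

// Half-open interval [Start, End), in cycles relative to the issue cycle.
struct UsageInterval {
  unsigned Start;
  unsigned End;
};

// Compute when each resource is held. A missing DelayCycles vector is
// treated exactly like a vector of zeroes.
std::vector<UsageInterval> computeUsage(const WriteResSketch &W) {
  assert((W.DelayCycles.empty() ||
          W.DelayCycles.size() == W.ResourceCycles.size()) &&
         "expected one delay per consumed resource");
  std::vector<UsageInterval> Out;
  Out.reserve(W.ResourceCycles.size());
  for (std::size_t I = 0; I < W.ResourceCycles.size(); ++I) {
    unsigned Delay = W.DelayCycles.empty() ? 0 : W.DelayCycles[I];
    Out.push_back({Delay, Delay + W.ResourceCycles[I]});
  }
  return Out;
}
```

With no DelayCycles, every interval starts at relative cycle #0, which matches
the current behavior; with DelayCycles = {0, 1}, the second resource would only
be held starting from relative cycle #1.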

This would require a mostly mechanical change in tablegen to teach it how to
parse and semantically analyze this new concept. The subtarget-emitter would
eventually generate information about those delay-cycles in a table.
A more complicated change would be needed for the bookkeeping logic in mca
(HardwareUnits/ResourceManager.cpp).

Most x86 processor models would probably benefit from this change. One group of
SchedWrite definitions that might benefit is the writes for horizontal
operations. On most x86 processors, a horizontal add/sub is usually decoded into
a pair of shuffle uOps followed by a single (data-dependent) vector ADD uOp.
The ADD uOp doesn't execute immediately because it needs to wait for the
two shuffle uOps. So the ALU pipe is still available at relative cycle #0, and
it is only consumed by the horizontal operation starting from relative cycle
#1.
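
The horizontal add case can be sketched in C++ as follows. ResourceUse,
isHeldAt and the HAdd* constants are hypothetical names, and the cycle counts
are illustrative rather than taken from any real scheduling model.

```cpp
#include <cassert>

// Hypothetical description of how one write uses one resource.
struct ResourceUse {
  unsigned Cycles; // how long the resource is held
  unsigned Delay;  // proposed offset from the issue cycle
};

// True if the resource is held at RelCycle (relative to the issue cycle).
constexpr bool isHeldAt(ResourceUse U, unsigned RelCycle) {
  return RelCycle >= U.Delay && RelCycle < U.Delay + U.Cycles;
}

// Illustrative numbers only: the shuffle pipe is used from cycle #0, while
// the ALU pipe is consumed one cycle later by the dependent ADD uOp.
constexpr ResourceUse HAddShuffle{2, 0};
constexpr ResourceUse HAddALU{1, 1};

// Without DelayCycles, the ALU pipe has to be modeled as held from cycle #0.
constexpr ResourceUse HAddALUToday{1, 0};

static_assert(!isHeldAt(HAddALU, 0), "ALU pipe is free at relative cycle #0");
static_assert(isHeldAt(HAddALU, 1), "ALU pipe is held at relative cycle #1");
```

Under these assumptions, another instruction could still use the ALU pipe at
relative cycle #0, which cannot be expressed today.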

This was just an example. There are probably various write descriptors (not
just writes for microcoded instructions) which would benefit from this change.

This would also solve a number of known problems with the descriptors in
Haswell/Broadwell. Last but not least, it would allow us to simplify the
bookkeeping logic in llvm-mca and get rid of the not-so-nice "reserved" bit for
processor resource groups. More details about those two issues can be found in
the llvm-dev thread linked above.
