[llvm-bugs] [Bug 45873] New: [MCA][MCSchedModel] Add a optional DelayCycles vector for SchedWriteRes.
via llvm-bugs
llvm-bugs at lists.llvm.org
Mon May 11 02:18:37 PDT 2020
https://bugs.llvm.org/show_bug.cgi?id=45873
Bug ID: 45873
Summary: [MCA][MCSchedModel] Add a optional DelayCycles vector
for SchedWriteRes.
Product: tools
Version: trunk
Hardware: PC
OS: Windows NT
Status: NEW
Severity: enhancement
Priority: P
Component: llvm-mca
Assignee: unassignedbugs at nondot.org
Reporter: andrea.dibiagio at gmail.com
CC: andrea.dibiagio at gmail.com, llvm-bugs at lists.llvm.org,
matthew.davis at sony.com
This was suggested by Andy in
https://lists.llvm.org/pipermail/llvm-dev/2020-May/141487.html
The idea is to add a DelayCycles vector to SchedWriteRes to indicate the
relative start cycle for each reserved resource. That would effectively model
dependent uOps.
At the moment, it is not possible to delay the consumption of specific hardware
resources. The expectation is that resource consumption always starts at
relative cycle #0 (i.e. relative to the instruction issue cycle).
A vector of DelayCycles (if present) would contain unsigned integer values
(ideally one for each processor resource consumed by a write), and those values
would be offsets in cycles relative to the issue cycle.
The absence of a DelayCycles vector would be semantically equivalent to an
all-zeroes DelayCycles vector.
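
To make the intended semantics concrete, here is a minimal C++ sketch. It is
not LLVM code: WriteResSketch, UsageInterval and computeUsage are hypothetical
names invented for this illustration, and the real TableGen and MCSchedModel
representation may end up looking quite different. It only shows how an absent
DelayCycles vector could be treated the same as an all-zeroes one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for one SchedWriteRes entry.
// None of these names come from LLVM; they only illustrate the proposal.
struct WriteResSketch {
  std::vector<unsigned> ResourceIds;    // processor resources consumed
  std::vector<unsigned> ResourceCycles; // cycles each resource is held
  std::vector<unsigned> DelayCycles;    // optional; empty means "all zero"
};

// Half-open interval [Start, End) in cycles, relative to the issue cycle.
struct UsageInterval {
  unsigned ResourceId;
  unsigned Start;
  unsigned End;
};

// Expands a write into one usage interval per consumed resource.
std::vector<UsageInterval> computeUsage(const WriteResSketch &W) {
  assert(W.ResourceIds.size() == W.ResourceCycles.size());
  assert(W.DelayCycles.empty() ||
         W.DelayCycles.size() == W.ResourceIds.size());

  std::vector<UsageInterval> Result;
  for (std::size_t I = 0; I < W.ResourceIds.size(); ++I) {
    // A missing DelayCycles vector behaves like an all-zeroes vector.
    unsigned Delay = W.DelayCycles.empty() ? 0 : W.DelayCycles[I];
    Result.push_back({W.ResourceIds[I], Delay, Delay + W.ResourceCycles[I]});
  }
  return Result;
}
```

With this sketch, a write with an empty DelayCycles vector and the same write
with DelayCycles = {0, 0} produce identical intervals, all starting at relative
cycle #0.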
This would require a mostly mechanical change in tablegen to teach it how to
parse and semantically analyze this new concept. The subtarget-emitter would
eventually generate information about those delay-cycles in a table.
A more complicated change would be needed for the bookkeeping logic in mca
(HardwareUnits/ResourceManager.cpp).
Most x86 processor models would probably benefit from this change. SchedWrite
definitions which might benefit from it include writes for horizontal
operations. On most x86 processors, horizontal add/sub is usually decoded into
a pair of shuffle uOPs followed by a single (data-dependent) vector ADD uOP.
The ADD uOP doesn't execute immediately because it needs to wait for the
two shuffle uOPs. So the ALU pipe is still available at relative cycle #0, and
it is only consumed by the horizontal operation starting from relative cycle
#1.
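
The horizontal add case can be sketched in a few lines of C++. Everything here
is an assumption made for illustration: the Resource enum, ResourceUse, holdsAt
and the two hadd* helpers are hypothetical, and the cycle counts are not taken
from any real scheduling model. The sketch only contrasts today's behaviour
(the ALU pipe is consumed from relative cycle #0) with the proposed one (the
ALU pipe is consumed from relative cycle #1).

```cpp
#include <cassert>
#include <vector>

// Hypothetical resource identifiers, for illustration only.
enum Resource : unsigned { ShufflePort = 0, AluPipe = 1 };

struct ResourceUse {
  Resource Res;
  unsigned Cycles; // how long the resource is held
  unsigned Delay;  // proposed DelayCycles entry: offset from the issue cycle
};

// True if the write holds Res at relative cycle RelCycle (issue cycle = 0).
bool holdsAt(const std::vector<ResourceUse> &Write, Resource Res,
             unsigned RelCycle) {
  for (const ResourceUse &U : Write)
    if (U.Res == Res && RelCycle >= U.Delay && RelCycle < U.Delay + U.Cycles)
      return true;
  return false;
}

// Current model: every resource is consumed starting at relative cycle #0.
std::vector<ResourceUse> haddWithoutDelay() {
  return {{ShufflePort, 1, 0}, {AluPipe, 1, 0}};
}

// Proposed model: the ALU pipe is only consumed from relative cycle #1,
// after the shuffle uOPs that feed the ADD uOP. Values are illustrative.
std::vector<ResourceUse> haddWithDelay() {
  return {{ShufflePort, 1, 0}, {AluPipe, 1, 1}};
}
```

In this sketch, holdsAt(haddWithoutDelay(), AluPipe, 0) is true, whereas
holdsAt(haddWithDelay(), AluPipe, 0) is false, so another instruction could
use the ALU pipe in the issue cycle of the horizontal operation.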
This was just an example. There are probably various write descriptors (not
just writes for microcoded instructions) which would benefit from this change.
This would also solve a number of known problems with the descriptors in
Haswell/Broadwell. Last but not least, it would allow us to simplify the
bookkeeping logic in llvm-mca and get rid of the not-so-nice "reserved" bit for
processor resource groups. More details about those two issues can be found in
the llvm-dev thread linked above.