[llvm-bugs] [Bug 51557] New: [SchedModel][MCA] Improve handling of load uOPs and read-advance.
via llvm-bugs
llvm-bugs at lists.llvm.org
Fri Aug 20 04:49:04 PDT 2021
https://bugs.llvm.org/show_bug.cgi?id=51557
Bug ID: 51557
Summary: [SchedModel][MCA] Improve handling of load uOPs and
read-advance.
Product: tools
Version: trunk
Hardware: PC
OS: Windows NT
Status: NEW
Severity: enhancement
Priority: P
Component: llvm-mca
Assignee: unassignedbugs at nondot.org
Reporter: andrea.dibiagio at gmail.com
CC: andrea.dibiagio at gmail.com, llvm-bugs at lists.llvm.org,
matthew.davis at sony.com
Example:
```
vmulps 112(%rsp), %xmm14, %xmm14
vpermilps $85, %xmm14, %xmm14
```
> llvm-mca -mcpu=skylake -iterations=2 -timeline
```
Timeline view:
                    0123456789
Index     0123456789          0123
[0,0]     DeeeeeeeeeeER  .    .  .   vmulps 112(%rsp), %xmm14, %xmm14
[0,1]     D==========eER .    .  .   vpermilps $85, %xmm14, %xmm14
[1,0]     D==========eeeeeeeeeeER.   vmulps 112(%rsp), %xmm14, %xmm14
[1,1]     D====================eER   vpermilps $85, %xmm14, %xmm14
```
However, the expected timeline looks like this:
```
Timeline view:
                    0123456789
Index     0123456789          0123
[0,0]     DeeeeeeeeeeER  .    .   vmulps 112(%rsp), %xmm14, %xmm14
[0,1]     D==========eER .    .   vpermilps $85, %xmm14, %xmm14
[1,0]     D=====eeeeeeeeeeER  .   vmulps 112(%rsp), %xmm14, %xmm14
[1,1]     D===============eER .   vpermilps $85, %xmm14, %xmm14
```
mca does not schedule the second vmulps earlier because the write-back cycle
for register XMM14 is unknown until cycle 11.
One of the biggest limitations in LLVM is the inability to independently
simulate the individual micro-opcodes (uOPs) of an instruction.
For a simulator like mca, this means that memory uOPs cannot be accurately
tracked. This is the main reason why instructions with memory operands are
often poorly simulated.
ReadAdvance was originally introduced to work around the inability to process
the individual uOPs of an instruction. However, in order to work, read-advance
still requires that the write-back cycle for the input register definition is
known.
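To illustrate why read-advance can stand in for a separately simulated load
uOP, here is a minimal Python sketch. It is not mca code. The function names
are hypothetical, and the 6-cycle load and 4-cycle multiply latencies are
assumptions chosen to add up to the 10 execution cycles that the timelines show
for vmulps. Under those assumptions, both ways of modeling the instruction give
the same completion cycle, provided the cycle at which the register input
becomes ready is known:

```python
LOAD_LATENCY = 6  # assumed latency of the load uOP
MUL_LATENCY = 4   # assumed latency of the multiply uOP


def finish_split_uops(reg_ready, earliest_issue=1):
    """Model the load and the multiply as two independent uOPs."""
    # The load has no register dependency, so it can start right away.
    load_done = earliest_issue + LOAD_LATENCY
    # The multiply needs both the loaded value and the register input.
    mul_issue = max(load_done, reg_ready)
    return mul_issue + MUL_LATENCY


def finish_read_advance(reg_ready, earliest_issue=1):
    """Model the instruction as one unit with a read-advance on the register."""
    # Read-advance lets the instruction issue LOAD_LATENCY cycles before
    # the register input is actually ready.
    issue = max(earliest_issue, reg_ready - LOAD_LATENCY)
    return issue + LOAD_LATENCY + MUL_LATENCY


if __name__ == "__main__":
    for reg_ready in (0, 7, 12):
        print(reg_ready, finish_split_uops(reg_ready), finish_read_advance(reg_ready))
```

The equivalence only holds when `reg_ready` is available at the time the
decision is made, which is the requirement described above.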
In this particular example, the write-back stage for the first VPERMILPS is
unknown until cycle 11. Therefore, the write-back of XMM14 is also unknown
until then. So, the read-advance in the second VMULPS can only trigger at that
point, which is what prevents it from starting earlier.
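The cycle numbers in the two timelines can be reproduced with a small toy
model. The Python sketch below is only an approximation under stated
assumptions, not how mca is implemented: the latencies (10 for vmulps, 1 for
vpermilps) are read off the timelines, the read-advance of 6 cycles is inferred
from the expected timeline, and the rule "the write-back cycle becomes known
when the producer starts executing" is a simplification. All names are
hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    latency: int       # execution cycles, read off the timelines
    read_advance: int  # assumed read-advance on the XMM14 input


# The two-instruction loop body; each instruction reads the XMM14 value
# written by the previous one.
LOOP = [Instr("vmulps", 10, 6), Instr("vpermilps", 1, 0)]


def issue_cycles(iterations, writeback_known_early):
    """Return (name, first execution cycle) for each instruction in the chain."""
    result = []
    prev_issue = None  # cycle at which the producer started executing
    prev_ready = None  # cycle at which the producer's result is available
    for _ in range(iterations):
        for ins in LOOP:
            if prev_issue is None:
                issue = 1  # first instruction: dispatched at 0, starts at 1
            else:
                issue = max(1, prev_ready - ins.read_advance)
                if not writeback_known_early:
                    # Simplification: the consumer cannot use read-advance
                    # before the producer has started executing.
                    issue = max(issue, prev_issue)
            result.append((ins.name, issue))
            prev_issue = issue
            prev_ready = issue + ins.latency
    return result


if __name__ == "__main__":
    print("current: ", issue_cycles(2, writeback_known_early=False))
    print("expected:", issue_cycles(2, writeback_known_early=True))
```

With `writeback_known_early=False` the second vmulps starts at cycle 11, as in
the first timeline; with `True` it starts at cycle 6, as in the expected one.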
There might be ways to partially work around this issue in mca. However, I am
afraid that a proper solution would require changes to the scheduling model
and to how read-advance for memory load operands is defined.
Depending on how we decide to address this issue, this bug could potentially
have an impact on bug 39829 and bug 39830.