[llvm-bugs] [Bug 51557] New: [SchedModel][MCA] Improve handling of load uOPs and read-advance.

via llvm-bugs llvm-bugs at lists.llvm.org
Fri Aug 20 04:49:04 PDT 2021


https://bugs.llvm.org/show_bug.cgi?id=51557

            Bug ID: 51557
           Summary: [SchedModel][MCA] Improve handling of load uOPs and
                    read-advance.
           Product: tools
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: llvm-mca
          Assignee: unassignedbugs at nondot.org
          Reporter: andrea.dibiagio at gmail.com
                CC: andrea.dibiagio at gmail.com, llvm-bugs at lists.llvm.org,
                    matthew.davis at sony.com

Example:

```
vmulps 112(%rsp), %xmm14, %xmm14
vpermilps $85, %xmm14, %xmm14
```

> llvm-mca -mcpu=skylake -iterations=2 -timeline

```
Timeline view:
                    0123456789    
Index     0123456789          0123

[0,0]     DeeeeeeeeeeER  .    .  .   vmulps     112(%rsp), %xmm14, %xmm14
[0,1]     D==========eER .    .  .   vpermilps  $85, %xmm14, %xmm14
[1,0]     D==========eeeeeeeeeeER.   vmulps     112(%rsp), %xmm14, %xmm14
[1,1]     D====================eER   vpermilps  $85, %xmm14, %xmm14
```

However, the expected timeline looks like this:

```
Timeline view:
                    0123456789    
Index     0123456789          0123

[0,0]     DeeeeeeeeeeER  .    .  vmulps     112(%rsp), %xmm14, %xmm14
[0,1]     D==========eER .    .  vpermilps  $85, %xmm14, %xmm14
[1,0]     D=====eeeeeeeeeeER  .  vmulps     112(%rsp), %xmm14, %xmm14
[1,1]     D===============eER .  vpermilps  $85, %xmm14, %xmm14
```


mca doesn't schedule the second vmulps earlier because the write-back cycle
for register XMM14 is unknown until cycle 11.

One of the biggest limitations of the LLVM scheduling model is its inability
to independently simulate the individual micro-opcodes (uOPs) of an instruction.

For a simulator like mca, it means that memory uOPs cannot be accurately
tracked. This is the main reason why instructions with memory operands are
often poorly simulated.
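
To illustrate why per-uOP tracking matters, here is a minimal sketch (not mca
code). It assumes the 10-cycle vmulps seen in the timeline splits into a
6-cycle load uOP and a 4-cycle multiply uOP; that split and all names below are
hypothetical. Cycle numbers are read off the timelines above.

```python
# Assumed split of the 10-cycle load-op vmulps (hypothetical values).
LOAD_LATENCY = 6  # assumed latency of the load uOP
MUL_LATENCY = 4   # assumed latency of the multiply uOP


def whole_instruction_ready(reg_ready: int) -> int:
    """Load and multiply modelled as one unit that waits for the register."""
    return reg_ready + LOAD_LATENCY + MUL_LATENCY


def split_uops_ready(reg_ready: int, load_start: int) -> int:
    """Load uOP runs on its own; only the multiply uOP waits for the register."""
    load_done = load_start + LOAD_LATENCY
    mul_start = max(load_done, reg_ready)
    return mul_start + MUL_LATENCY


# The first vpermilps executes in cycle 11, so XMM14 is usable in cycle 12.
XMM14_READY = 12

print(whole_instruction_ready(XMM14_READY))          # 22
print(split_uops_ready(XMM14_READY, load_start=1))   # 16
```

Under these assumptions the split model makes the second vmulps result
available in cycle 16, which is when the second vpermilps starts executing in
the expected timeline.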

ReadAdvance was originally introduced to work around the inability to process
the individual uOPs of an instruction. However, in order to work, read-advance
still requires that the write-back cycle for the input register definition is
known.

In this particular example, the write-back cycle of the first VPERMILPS is
unknown until cycle 11, when it starts executing. Therefore, the write-back of
XMM14 is also unknown until then, and the read-advance in the second VMULPS can
only trigger at that point.

That is what prevents the VMULPS from starting earlier.
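
The effect can be sketched with a few lines of arithmetic. This is a simplified
model, not mca's implementation: the function name is hypothetical, and the
read-advance value of 6 is an assumption chosen because it is consistent with
both timelines above.

```python
def vmulps_issue_cycle(def_exec_start: int, def_latency: int,
                       read_advance: int, writeback_known_early: bool) -> int:
    """Cycle at which the dependent load-op instruction can start executing.

    def_exec_start: cycle in which the defining instruction starts executing
    def_latency:    latency of the defining instruction
    read_advance:   cycles by which the register read is delayed (assumed)
    """
    write_back = def_exec_start + def_latency
    earliest = max(0, write_back - read_advance)
    if writeback_known_early:
        return earliest
    # The write-back cycle only becomes known once the definition issues,
    # so read-advance cannot take effect before that cycle.
    return max(earliest, def_exec_start)


# First vpermilps: starts executing in cycle 11 with a latency of 1 cycle.
current = vmulps_issue_cycle(11, 1, read_advance=6, writeback_known_early=False)
expected = vmulps_issue_cycle(11, 1, read_advance=6, writeback_known_early=True)
print(current, expected)  # 11 6
```

The two results match the cycles at which the second vmulps starts executing in
the reported timeline (11) and in the expected timeline (6).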

There might be ways to partially work around this issue in mca. However, I am
afraid that a proper solution would require changes to the scheduling model,
and to how read-advance for memory load operands is defined.

Depending on how we decide to address this issue, this bug could potentially
have an impact on bug 39829 and bug 39830.
