<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - [SchedModel][Cortex-A55/Cortex-A53] Unexpected write latency for some ldr instructions"
href="https://bugs.llvm.org/show_bug.cgi?id=50484">50484</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>[SchedModel][Cortex-A55/Cortex-A53] Unexpected write latency for some ldr instructions
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Windows NT
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: AArch64
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>andrea.dibiagio@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>arnaud.degrandmaison@arm.com, llvm-bugs@lists.llvm.org, smithp352@googlemail.com, Ties.Stuij@arm.com
</td>
</tr></table>
<p>
<div>
<pre>Found while investigating an issue in the in-order processor simulation
pipeline of MCA.
Example:
<span class="quote">&gt; $ cat foo.s
&gt; ldr w4, [x6], #4</span>
```
llvm-mca -mtriple=aarch64 --mcpu=cortex-a55 -debug-only=llvm-mca foo.s
```
```
Opcode Name= LDRWpost
SchedClassID=924
Resource Mask=0x00000000000020, Reserved=0, #Units=1, cy=1
Buffer Mask=0x00000000000020
Used Units=0x00000000000020
Used Groups=0x00000000000000
[Def] OpIdx=0, Latency=0, WriteResourceID=0
[Def] OpIdx=1, Latency=4, WriteResourceID=0
[Use] OpIdx=2, UseIndex=0
MaxLatency=4
NumMicroOps=2
```
That particular LDR is matched to opcode LDRWpost.
The first write ([Def] OpIdx=0) is reported with a latency of zero.
That zero-latency write confuses the in-order stage, and it is the root cause
of the issue visible in the following timeline:
```
Timeline view:
                    0123456789
Index     0123456789          012345
[0,0]     DeeE .    .    .    .    .   ldr w4, [x6], #4
[0,1]     .DeeeE    .    .    .    .   str w0, [x21]
[1,0]     .    DeeE .    .    .    .   ldr w4, [x6], #4
[1,1]     .    .DeeeE    .    .    .   str w0, [x21]
[2,0]     .    .    DeeE .    .    .   ldr w4, [x6], #4
[2,1]     .    .    .DeeeE    .    .   str w0, [x21]
[3,0]     .    .    .    DeeE .    .   ldr w4, [x6], #4
[3,1]     .    .    .    .DeeeE    .   str w0, [x21]
[4,0]     .    .    .    .    DeeE .   ldr w4, [x6], #4
[4,1]     .    .    .    .    .DeeeE   str w0, [x21]
```
Under the assumption that there is no aliasing between loads and stores, each
load could start a couple of cycles earlier. However, MCA artificially delays
each load so that the load's write does not happen before the store finishes
executing.
Note: the same problem can be seen with cortex-a53.
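The zero-latency write in the debug dump above can also be spotted
mechanically. The following is a minimal sketch, assuming the dump keeps the
`Opcode Name=` / `[Def] OpIdx=..., Latency=...` shape shown above.
`zero_latency_defs` is a hypothetical helper (not part of LLVM), and SAMPLE is
an abbreviated copy of the dump:
```python
import re

# Hypothetical helper, not part of LLVM. It only assumes the line shapes
# seen in the `-debug-only=llvm-mca` dump quoted in this report.
OPCODE_RE = re.compile(r"Opcode Name=\s*(\S+)")
DEF_RE = re.compile(r"\[Def\]\s+OpIdx=(\d+),\s+Latency=(\d+)")


def zero_latency_defs(dump):
    """Return (opcode, operand index) pairs for every [Def] with Latency=0."""
    hits = []
    opcode = None  # most recent "Opcode Name=" seen so far
    for line in dump.splitlines():
        m = OPCODE_RE.search(line)
        if m:
            opcode = m.group(1)
            continue
        m = DEF_RE.search(line)
        if m and int(m.group(2)) == 0:
            hits.append((opcode, int(m.group(1))))
    return hits


# Abbreviated copy of the dump from this report.
SAMPLE = """\
Opcode Name= LDRWpost
SchedClassID=924
[Def] OpIdx=0, Latency=0, WriteResourceID=0
[Def] OpIdx=1, Latency=4, WriteResourceID=0
[Use] OpIdx=2, UseIndex=0
"""

print(zero_latency_defs(SAMPLE))  # prints [('LDRWpost', 0)]
```
Feeding the captured debug output for both cortex-a53 and cortex-a55 to such a
check could help list the opcodes that are affected.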
Are we missing some InstRW?</pre>
</div>
</p>
</body>
</html>