<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Poor instruction scheduling for salsa20 cypher hot loop"
href="https://bugs.llvm.org/show_bug.cgi?id=32439">32439</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Poor instruction scheduling for salsa20 cypher hot loop
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Common Code Generator Code
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>davide@freebsd.org
</td>
</tr>
<tr>
<th>CC</th>
<td>atrick@apple.com, efriedma@codeaurora.org, llvm-bugs@lists.llvm.org, matze@braunis.de, qcolombet@apple.com
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=18180" name="attach_18180" title="dag dump pre-scheduling">attachment 18180</a> <a href="attachment.cgi?id=18180&action=edit" title="dag dump pre-scheduling">[details]</a></span>
dag dump pre-scheduling
This is the salsa20 benchmark from the testsuite (SingleSource).
I'm not sure whether the scheduling model can be improved or whether this is a
general issue with the instruction scheduler heuristics.
Passing `-O3 -mcpu=cortex-a53 -mtune=cortex-a53`, LLVM generates the following
code for the hot loop (subset of instructions):
```
400638: 0b100106 add w6, w8, w16
40063c: 0b0d0127 add w7, w9, w13
400640: 4ac66400 eor w0, w0, w6, ror #25
400644: 0b120166 add w6, w11, w18
400648: 4ac76463 eor w3, w3, w7, ror #25
40064c: 4ac66442 eor w2, w2, w6, ror #25
400650: 0b100006 add w6, w0, w16
400654: 0b0d0067 add w7, w3, w13
400658: 4ac65d8c eor w12, w12, w6, ror #23
40065c: 0b120046 add w6, w2, w18
400660: 4ac75def eor w15, w15, w7, ror #23
400664: 4ac65c84 eor w4, w4, w6, ror #23
400668: 0b000186 add w6, w12, w0
40066c: 0b0301e7 add w7, w15, w3
400670: 4ac64d08 eor w8, w8, w6, ror #19
400674: 0b020086 add w6, w4, w2
400678: 4ac74d29 eor w9, w9, w7, ror #19
40067c: 4ac64d6b eor w11, w11, w6, ror #19
```
while gcc 7 generates:
```
400688: 0b020175 add w21, w11, w2
40068c: 0b040214 add w20, w16, w4
400690: 0b050233 add w19, w17, w5
400694: 0b030192 add w18, w12, w3
400698: 4ad56508 eor w8, w8, w21, ror #25
40069c: 4ad464e7 eor w7, w7, w20, ror #25
4006a0: 4ad364c6 eor w6, w6, w19, ror #25
4006a4: 4ad26529 eor w9, w9, w18, ror #25
4006a8: 0b0b0115 add w21, w8, w11
4006ac: 0b1000f4 add w20, w7, w16
4006b0: 0b1100d3 add w19, w6, w17
4006b4: 0b0c0132 add w18, w9, w12
4006b8: 4ad55d4a eor w10, w10, w21, ror #23
4006bc: 4ad45dad eor w13, w13, w20, ror #23
4006c0: 4ad35def eor w15, w15, w19, ror #23
4006c4: 4ad25dce eor w14, w14, w18, ror #23
```
The LLVM schedule results in many more stalls and a ~20% runtime regression relative to gcc.
The SelectionDAG for the basic block before scheduling and the initial IR are attached.</pre>
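<pre>As a rough illustration of the difference between the two schedules, the
sketch below measures how many instructions separate each eor from the add that
produces its rotated operand. It only uses the first instructions of each
listing above; the helper name producer_distances is made up for this example,
and it is not a model of the Cortex-A53 pipeline, so it says nothing about
actual stall cycles.

```python
import re

# Instruction text copied from the two listings in this report
# (first six instructions of the LLVM loop, first eight of the gcc loop).
LLVM = """
add w6, w8, w16
add w7, w9, w13
eor w0, w0, w6, ror #25
add w6, w11, w18
eor w3, w3, w7, ror #25
eor w2, w2, w6, ror #25
"""

GCC = """
add w21, w11, w2
add w20, w16, w4
add w19, w17, w5
add w18, w12, w3
eor w8, w8, w21, ror #25
eor w7, w7, w20, ror #25
eor w6, w6, w19, ror #25
eor w9, w9, w18, ror #25
"""


def producer_distances(listing):
    """For each eor, return the distance (in instructions) back to the
    most recent instruction that wrote its rotated source register."""
    last_write = {}  # register name -- index of the latest instruction writing it
    distances = []
    for index, line in enumerate(listing.strip().splitlines()):
        opcode, operands = line.split(None, 1)
        regs = re.findall(r"w\d+", operands)  # [dest, src1, src2]
        if opcode == "eor" and regs[2] in last_write:
            distances.append(index - last_write[regs[2]])
        last_write[regs[0]] = index
    return distances


if __name__ == "__main__":
    print("LLVM:", producer_distances(LLVM))
    print("gcc: ", producer_distances(GCC))
```

In the LLVM listing each eor sits two or three instructions after the add it
depends on, while in the gcc listing every eor is four instructions away from
its producer.</pre>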
</div>
</p>
</body>
</html>