[llvm-bugs] [Bug 32439] New: Poor instruction scheduling for salsa20 cypher hot loop
via llvm-bugs
llvm-bugs at lists.llvm.org
Mon Mar 27 13:51:23 PDT 2017
https://bugs.llvm.org/show_bug.cgi?id=32439
Bug ID: 32439
Summary: Poor instruction scheduling for salsa20 cypher hot loop
Product: libraries
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: enhancement
Priority: P
Component: Common Code Generator Code
Assignee: unassignedbugs at nondot.org
Reporter: davide at freebsd.org
CC: atrick at apple.com, efriedma at codeaurora.org,
llvm-bugs at lists.llvm.org, matze at braunis.de,
qcolombet at apple.com
Created attachment 18180
--> https://bugs.llvm.org/attachment.cgi?id=18180&action=edit
dag dump pre-scheduling
This is the salsa20 benchmark from the testsuite (SingleSource).
I'm not sure whether the Cortex-A53 scheduling model can be improved or whether
this is a general issue with the instruction scheduler heuristics.
Passing `-O3 -mcpu=cortex-a53 -mtune=cortex-a53`, LLVM generates the following
code for the hot loop (subset of instructions):
```
400638: 0b100106 add w6, w8, w16
40063c: 0b0d0127 add w7, w9, w13
400640: 4ac66400 eor w0, w0, w6, ror #25
400644: 0b120166 add w6, w11, w18
400648: 4ac76463 eor w3, w3, w7, ror #25
40064c: 4ac66442 eor w2, w2, w6, ror #25
400650: 0b100006 add w6, w0, w16
400654: 0b0d0067 add w7, w3, w13
400658: 4ac65d8c eor w12, w12, w6, ror #23
40065c: 0b120046 add w6, w2, w18
400660: 4ac75def eor w15, w15, w7, ror #23
400664: 4ac65c84 eor w4, w4, w6, ror #23
400668: 0b000186 add w6, w12, w0
40066c: 0b0301e7 add w7, w15, w3
400670: 4ac64d08 eor w8, w8, w6, ror #19
400674: 0b020086 add w6, w4, w2
400678: 4ac74d29 eor w9, w9, w7, ror #19
40067c: 4ac64d6b eor w11, w11, w6, ror #19
```
while gcc 7 generates:
```
400688: 0b020175 add w21, w11, w2
40068c: 0b040214 add w20, w16, w4
400690: 0b050233 add w19, w17, w5
400694: 0b030192 add w18, w12, w3
400698: 4ad56508 eor w8, w8, w21, ror #25
40069c: 4ad464e7 eor w7, w7, w20, ror #25
4006a0: 4ad364c6 eor w6, w6, w19, ror #25
4006a4: 4ad26529 eor w9, w9, w18, ror #25
4006a8: 0b0b0115 add w21, w8, w11
4006ac: 0b1000f4 add w20, w7, w16
4006b0: 0b1100d3 add w19, w6, w17
4006b4: 0b0c0132 add w18, w9, w12
4006b8: 4ad55d4a eor w10, w10, w21, ror #23
4006bc: 4ad45dad eor w13, w13, w20, ror #23
4006c0: 4ad35def eor w15, w15, w19, ror #23
4006c4: 4ad25dce eor w14, w14, w18, ror #23
```
The LLVM schedule results in many more stalls and a ~20% runtime regression
compared to the gcc 7 one.
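One way to make the difference between the two listings concrete is to measure how far each instruction sits from the nearest earlier instruction that produces one of its inputs. The sketch below is only an illustration: `struct insn`, `min_dep_distance`, `llvm_seq` and `gcc_seq` are hypothetical names, the arrays hand-encode just the first six LLVM and first eight gcc instructions shown above (register numbers only), and nothing here models actual Cortex-A53 latencies or issue rules.

```c
#include <assert.h>
#include <stddef.h>

/* One instruction reduced to W-register numbers: dst = op(src1, src2). */
struct insn { int dst, src1, src2; };

/* Smallest distance, in instructions, between any instruction in the window
   and the nearest earlier instruction that writes one of its sources.
   Returns 0 if the window contains no such dependency. */
static size_t min_dep_distance(const struct insn *seq, size_t n) {
    assert(seq != NULL);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        /* Scan backwards so the first match is the nearest producer. */
        for (size_t j = i; j-- > 0; ) {
            if (seq[j].dst == seq[i].src1 || seq[j].dst == seq[i].src2) {
                size_t d = i - j;
                if (best == 0 || d < best) best = d;
                break;
            }
        }
    }
    return best;
}

/* First six instructions of the LLVM listing (0x400638..0x40064c). */
static const struct insn llvm_seq[] = {
    {6, 8, 16}, {7, 9, 13}, {0, 0, 6}, {6, 11, 18}, {3, 3, 7}, {2, 2, 6},
};

/* First eight instructions of the gcc 7 listing (0x400688..0x4006a4). */
static const struct insn gcc_seq[] = {
    {21, 11, 2}, {20, 16, 4}, {19, 17, 5}, {18, 12, 3},
    {8, 8, 21},  {7, 7, 20},  {6, 6, 19},  {9, 9, 18},
};
```

On these windows the closest producer/consumer pair is two instructions apart in the LLVM listing and four apart in the gcc 7 listing. Whether that gap alone accounts for the observed stalls is an assumption, not something this sketch establishes.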
The pre-scheduling SelectionDAG for the basic block and the initial IR are attached.
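For context, the `add` followed by `eor ... ror #25`, `ror #23`, `ror #19` pattern in both listings matches the first three steps of the Salsa20 quarter-round, since a rotate right by 25, 23 or 19 bits on a 32-bit word equals a rotate left by 7, 9 or 13. The following is a minimal sketch of that quarter-round as defined in the Salsa20 specification. It is not the testsuite's source, which may be written differently, and the names `rotl32` and `quarter_round` are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits; n must be in 1..31. */
static inline uint32_t rotl32(uint32_t x, unsigned n) {
    assert(n > 0 && n < 32);
    return (x << n) | (x >> (32 - n));
}

/* One Salsa20 quarter-round, updating four state words in place. */
static void quarter_round(uint32_t *y0, uint32_t *y1,
                          uint32_t *y2, uint32_t *y3) {
    *y1 ^= rotl32(*y0 + *y3, 7);   /* rotl 7  is the same as ror #25 */
    *y2 ^= rotl32(*y1 + *y0, 9);   /* rotl 9  is the same as ror #23 */
    *y3 ^= rotl32(*y2 + *y1, 13);  /* rotl 13 is the same as ror #19 */
    *y0 ^= rotl32(*y3 + *y2, 18);  /* rotl 18 is the same as ror #14 */
}
```

Each step consumes the result of the previous one, so a single quarter-round is a serial chain. The independent quarter-rounds within a round are what give a scheduler room to interleave, which is what the gcc 7 listing appears to do with its groups of four `add`s followed by four `eor`s.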