<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<span style="font-family: "Courier New", monospace;">Hi,</span><span><br>
</span>
<div><br>
</div>
<div><span style="font-family: "Courier New", monospace;">we at Arm have noticed that assembly can differ when compiling for AArch64</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;">depending on whether debug information is generated or not.
</span><br>
</div>
<div><br>
</div>
<div><span style="font-family: "Courier New", monospace;">The issue is reproducible for the following small example compiled with `-O1`</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;">for `aarch64-arm-linux-gnu`:</span><br>
</div>
<div><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> a() {</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> b(a);</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> for (;;)</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> c("", b);</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> }</span><br>
</div>
<div><br>
</div>
<div><span style="font-family: "Courier New", monospace;">The reason for the difference is that AArch64 frame lowering emits CFI</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;">instructions only when debug information is enabled. CFI instructions</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;">act as scheduling boundaries during instruction scheduling and therefore lead to</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;">different scheduling regions and, overall, a different instruction schedule.</span><br>
</div>
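<div><br>
</div>
<div><span style="font-family: "Courier New", monospace;">As a rough illustration of the boundary effect, here is a toy Python model. It is</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;">not LLVM code: instructions are plain strings, the instruction names are made up,</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;">and anything starting with "CFI" is simply assumed to end a scheduling region.</span><br>
</div>
<pre>
```python
# Toy model, not LLVM code: instructions are plain strings and any entry
# starting with "CFI" is assumed to act as a scheduling boundary.

def is_boundary(instr):
    return instr.startswith("CFI")

def scheduling_regions(block):
    """Split a block into maximal runs of non-boundary instructions."""
    regions, current = [], []
    for instr in block:
        if is_boundary(instr):
            # A boundary closes the region collected so far.
            if current:
                regions.append(current)
            current = []
        else:
            current.append(instr)
    if current:
        regions.append(current)
    return regions

# Hypothetical prologue, without and with CFI pseudo-instructions.
without_cfi = ["stp", "mov", "adrp", "add"]
with_cfi = ["stp", "CFI_def_cfa_offset", "mov",
            "CFI_def_cfa", "CFI_offset", "adrp", "add"]

print(scheduling_regions(without_cfi))
print(scheduling_regions(with_cfi))
```
</pre>
<div><span style="font-family: "Courier New", monospace;">Without CFI entries the toy block forms one region of four instructions; with them</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;">it splits into three smaller regions, so the scheduler has less room to reorder.</span><br>
</div>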
<div><br>
</div>
<div><span style="font-family: "Courier New", monospace;">We see several ways to fix the issue and would welcome comments on this:</span><br>
</div>
<div><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> 1. Enabling unwind tables by default for AArch64: By enabling unwind tables</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> by default, CFI instructions will be inserted in both debug and non-debug</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> mode. This should lead to smaller scheduling regions and probably to less</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> scheduling potential.</span><br>
</div>
<div><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> However, I've measured the average size of scheduling regions for randomly</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> generated programs with and without default unwind tables and found an</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> average difference of 0.5 to 1 instruction. Other architectures such as x86</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> do exactly this and therefore don't face the issue.</span><br>
</div>
<div><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> The following patch on Phabricator implements this change:</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> https://reviews.llvm.org/D68076</span><br>
</div>
<div><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> 2. Postpone insertion of CFI instructions until after instruction scheduling.</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> This would require a new pass running after instruction scheduling that
</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> inserts CFI instructions if needed. The downside I see is increased</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> compile-time and probably some code duplication with frame lowering.
</span><br>
</div>
<div><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> 3. Change instruction scheduling such that CFI instructions get tied together</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> with relevant instructions in such a way that they get scheduled together.</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> If this could work, it would probably be the cleanest solution.</span><br>
</div>
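<div><br>
</div>
<div><span style="font-family: "Courier New", monospace;">To picture option 3, here is a toy Python sketch. It is not LLVM code and says</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;">nothing about how the real scheduler would do this: it assumes each CFI entry</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;">belongs to the instruction right before it, and all names are placeholders.</span><br>
</div>
<pre>
```python
# Toy sketch of option 3, not LLVM code: each CFI entry is glued to the
# instruction it follows, and only whole units are reordered.

def glue_cfi(block):
    """Group every CFI entry with the instruction that precedes it."""
    units = []
    for instr in block:
        if instr.startswith("CFI") and units:
            units[-1].append(instr)
        else:
            units.append([instr])
    return units

def flatten(units):
    """Turn a list of units back into a flat instruction list."""
    return [instr for unit in units for instr in unit]

# Hypothetical block; the names are placeholders, not real AArch64 output.
block = ["stp", "CFI_def_cfa_offset", "mov", "CFI_def_cfa", "add"]
units = glue_cfi(block)

# Stand-in for a scheduler decision: swap the last two units.
units[-1], units[-2] = units[-2], units[-1]
print(flatten(units))
```
</pre>
<div><span style="font-family: "Courier New", monospace;">Because only whole units move, "mov" and its CFI entry stay adjacent after the</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;">swap, and the CFI entries no longer have to split the block into regions.</span><br>
</div>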
<div><br>
</div>
<div><span style="font-family: "Courier New", monospace;">To summarize:</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;">Option 1 would make scheduling in the non-debug case behave as it does in the</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;">debug case and therefore probably cost some scheduling potential. However, it</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;">would be by far the easiest to implement. Options 2 and 3 would probably lead to</span><br>
</div>
<div><span style="font-family: "Courier New", monospace;">better scheduling but seem to be more complex to implement.</span><br>
</div>
<div><br>
</div>
<div><span style="font-family: "Courier New", monospace;">Comments and additional ideas are welcome.</span><br>
</div>
<div><br>
</div>
<div><span style="font-family: "Courier New", monospace;"> David</span><br>
</div>
<span></span><br>
</div>
</body>
</html>