[llvm-dev] [AArch64] Generated assembly differs depending on whether debug information is generated or not
David Tellenbach via llvm-dev
llvm-dev at lists.llvm.org
Thu Sep 26 02:57:05 PDT 2019
Hi,
we at Arm have noticed that assembly can differ when compiling for AArch64
depending on whether debug information is generated or not.
The issue is reproducible for the following small example compiled with `-O1`
for `aarch64-arm-linux-gnu`:
a() {
b(a);
for (;;)
c("", b);
}
The reason for the difference is that AArch64 frame lowering emits CFI
instructions when debug information is enabled but omits them otherwise. CFI
instructions act as scheduling boundaries during instruction scheduling, so
their presence changes the scheduling regions and, with them, the final
instruction schedule.
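To make the effect concrete, here is a toy model in Python of how boundary
instructions split a block into scheduling regions. It is only a sketch and not
LLVM's actual scheduler code: the function names are made up, the instruction
strings are an invented prologue rather than real compiler output, and CFI is
modelled as textual `.cfi_` directives for simplicity.

```python
def scheduling_regions(instructions, is_boundary):
    """Split a straight-line block into regions at boundary instructions.

    The scheduler may only reorder instructions inside one region, so
    every boundary shrinks the window it can work with.
    """
    regions, current = [], []
    for inst in instructions:
        if is_boundary(inst):
            # A boundary closes the current region and is not part of any region.
            if current:
                regions.append(current)
            current = []
        else:
            current.append(inst)
    if current:
        regions.append(current)
    return regions


def is_cfi(inst):
    """Assumption for this sketch: CFI shows up as a '.cfi_' directive."""
    return inst.lstrip().startswith(".cfi_")


# Hypothetical prologue without CFI (non-debug build).
no_debug = [
    "stp x29, x30, [sp, #-32]!",
    "str x19, [sp, #16]",
    "mov x29, sp",
    "adrp x19, b",
]

# The same hypothetical prologue with CFI interleaved (debug build).
with_debug = [
    "stp x29, x30, [sp, #-32]!",
    ".cfi_def_cfa_offset 32",
    "str x19, [sp, #16]",
    "mov x29, sp",
    ".cfi_def_cfa w29, 32",
    "adrp x19, b",
]

# Without CFI the four instructions form a single region.
print(scheduling_regions(no_debug, is_cfi))
# With CFI the same four instructions are spread over three smaller regions.
print(scheduling_regions(with_debug, is_cfi))
```

The real instructions are identical in both lists; only the region boundaries
differ, which is enough for the scheduler to pick a different order.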
We see several ways to fix the issue and would welcome comments on this:
1. Enabling unwind tables by default for AArch64: With unwind tables enabled
by default, CFI instructions are inserted in both debug and non-debug
mode. This should lead to smaller scheduling regions and probably to less
scheduling potential.
However, I've measured the average size of scheduling regions for randomly
generated programs with and without default unwind tables and found an
average difference of 0.5 to 1 instruction. Other architectures such as x86
already enable unwind tables by default and therefore don't face the issue.
The following patch on Phabricator implements this change:
https://reviews.llvm.org/D68076
2. Postpone insertion of CFI instructions until after instruction scheduling.
This would require a new pass running after instruction scheduling that
inserts CFI instructions if needed. The downside I see is increased
compile time and probably some code duplication with frame lowering.
3. Change instruction scheduling such that CFI instructions get tied together
with relevant instructions in such a way that they get scheduled together.
If this could work it would probably be the cleanest solution.
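As a rough illustration of what option 3 is aiming for, the Python sketch
below glues each CFI directive to the instruction before it, so the pair
moves as one unit and no longer splits the region. This is a hypothetical
model only: `glue_cfi` and `flatten` are invented names, the instruction
strings are made up, and pairing a directive with the preceding instruction
is an assumption about what "relevant" would mean, not an existing LLVM
mechanism.

```python
def glue_cfi(instructions):
    """Group each '.cfi_' directive with the instruction that precedes it.

    Returns a list of units; each unit is a list holding one real
    instruction followed by any CFI directives attached to it.
    """
    units = []
    for inst in instructions:
        if inst.lstrip().startswith(".cfi_") and units:
            # Attach the directive to the previous unit instead of
            # treating it as a scheduling boundary.
            units[-1].append(inst)
        else:
            units.append([inst])
    return units


def flatten(units):
    """Turn a list of units back into a flat instruction list."""
    return [inst for unit in units for inst in unit]


# Hypothetical prologue, with and without CFI.
no_debug = [
    "stp x29, x30, [sp, #-32]!",
    "str x19, [sp, #16]",
    "mov x29, sp",
    "adrp x19, b",
]
with_debug = [
    "stp x29, x30, [sp, #-32]!",
    ".cfi_def_cfa_offset 32",
    "str x19, [sp, #16]",
    "mov x29, sp",
    ".cfi_def_cfa w29, 32",
    "adrp x19, b",
]

debug_units = glue_cfi(with_debug)
plain_units = glue_cfi(no_debug)

# Both builds now expose the same number of schedulable units,
# so a scheduler working on units sees the same problem in each.
print(len(debug_units), len(plain_units))

# Reordering units keeps every directive next to its instruction.
reordered = [debug_units[0], debug_units[2], debug_units[1], debug_units[3]]
print(flatten(reordered))
```

Whether such a reordering is actually legal depends on the dependencies
between the real instructions, which this sketch ignores.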
To summarize:
Option 1 would make scheduling in the non-debug case behave as it does in the
debug case and therefore probably cost some scheduling potential. However, it
would be by far the easiest to implement. Options 2 and 3 would probably lead
to better scheduling but seem more complex to implement.
Comments and additional ideas are welcome.
David