[llvm-dev] [AArch64] Generated assembly differs depending on whether debug information is generated or not

Mon Sep 30 12:50:20 PDT 2019

Hi David,

Thanks for looking into this.

It seems like D68076 might not address the underlying issue here (e.g. it probably doesn't improve the situation for projects using `-g -fno-unwind-tables`?).

Would you mind elaborating a bit on your proposals to delay/change CFI instruction insertion? In particular, it'd help to hear a bit about how CFI instructions are inserted today (is some of it done by CFIInstrInserter, and the rest by target-specific frame lowering code?).

best,
vedant

> On Sep 26, 2019, at 6:57 AM, David Tellenbach via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Hi Paul,
> 
> thanks for your comments.
>> This is PR37240 (https://bugs.llvm.org/show_bug.cgi?id=37240).  I suspect this problem affects all targets; your patch D68076 would address it only for AArch64.  Although I would suggest you do some careful measurements to determine the runtime performance effect, to decide whether this is acceptable.
> Yes, in principle the problem that instruction scheduling depends on the presence of CFI instructions should affect more targets than AArch64. However, this does not imply that all of these targets produce inconsistent assembly depending on debug information.
> 
>> The more complete approach in your steps 2 + 3 would solve this for all targets, assuming the solution did not have to be very target-specific.  This would benefit the entire community.
> At least 2. would require a lot of target-dependent changes, because the insertion of CFI instructions would have to be moved from target-specific frame lowering into a (probably again target-specific) insertion pass.
> 
>         David
> 
> On 26/09/2019 13:55, paul.robinson at sony.com wrote:
>> Hi David,
>>  
>> This is PR37240 (https://bugs.llvm.org/show_bug.cgi?id=37240).  I suspect this problem affects all targets; your patch D68076 would address it only for AArch64.  Although I would suggest you do some careful measurements to determine the runtime performance effect, to decide whether this is acceptable.
>>  
>> The more complete approach in your steps 2 + 3 would solve this for all targets, assuming the solution did not have to be very target-specific.  This would benefit the entire community.
>> --paulr
>>  
>> From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of David Tellenbach via llvm-dev
>> Sent: Thursday, September 26, 2019 5:57 AM
>> To: llvm-dev
>> Cc: nd
>> Subject: [llvm-dev] [AArch64] Generated assembly differs depending on whether debug information is generated or not
>>  
>> Hi,
>>  
>> we at Arm have noticed that assembly can differ when compiling for AArch64
>> depending on whether debug information is generated or not.
>>  
>> The issue is reproducible for the following small example compiled with `-O1`
>> for `aarch64-arm-linux-gnu`:
>>  
>>     a() {
>>       b(a);
>>       for (;;)
>>         c("", b);
>>     }
>>  
>> The reason for the difference is that AArch64 frame lowering emits CFI
>> instructions if debug information is enabled, but not otherwise. CFI instructions
>> act as scheduling boundaries during instruction scheduling and therefore lead to
>> differing scheduling regions and an overall different instruction scheduling.
>>  
>> We see several ways to fix the issue and would welcome comments on this:
>>  
>>   1. Enabling unwind tables by default for AArch64: By enabling unwind tables
>>      by default CFI instructions will be inserted in both debug and non-debug
>>      mode. This should lead to smaller scheduling regions and probably to less
>>      scheduling potential.
>>  
>>      However, I've measured the average size of scheduling regions for randomly
>>      generated programs with and without default unwind tables and found an
>>      average difference of 0.5 to 1 instruction. Other architectures such as x86
>>      do exactly this and therefore don't face the issue.
>>  
>>      The following patch on Phabricator introduces the said change:
>>      https://reviews.llvm.org/D68076
>>  
>>   2. Postpone insertion of CFI instructions until after instruction scheduling.
>>      This would require a new pass running after instruction scheduling that  
>>      inserts CFI instructions if needed. The downside I see is increased
>>      compile-time and probably some code duplication with frame lowering.
>>  
>>   3. Change instruction scheduling such that CFI instructions get tied together
>>      with relevant instructions in such a way that they get scheduled together.
>>      If this could work, it would probably be the cleanest solution.
>>  
>> To summarize:
>> 1. would make scheduling in the non-debug case behave like in the
>> debug case and therefore probably cost some scheduling potential. However, it
>> would be by far the easiest to implement. 2. + 3. would probably lead to
>> better scheduling but seem to be more complex to implement.
>>  
>> Comments and additional ideas are welcome.
>>  
>>     David
>>  
> 
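
The scheduling-boundary effect described in the original message can be illustrated with a toy model. This is only a sketch: the opcode names, the sample instruction sequence, and the helper functions are hypothetical and are not taken from LLVM. It assumes, as the message states, that CFI instructions split the instruction stream into separate scheduling regions.

```python
from typing import List


def scheduling_regions(instrs: List[str]) -> List[List[str]]:
    """Split an instruction list into regions at CFI pseudo-instructions.

    Toy assumption: any opcode starting with "CFI" is a scheduling
    boundary and belongs to no region itself.
    """
    regions: List[List[str]] = []
    current: List[str] = []
    for op in instrs:
        if op.startswith("CFI"):
            # A boundary closes the current region (if it is non-empty).
            if current:
                regions.append(current)
            current = []
        else:
            current.append(op)
    if current:
        regions.append(current)
    return regions


def average_region_size(regions: List[List[str]]) -> float:
    """Mean number of instructions per region (0.0 if there are none)."""
    if not regions:
        return 0.0
    return sum(len(r) for r in regions) / len(regions)


# Hypothetical prologue-like sequence, without and with CFI markers.
no_cfi = ["STP", "MOV", "ADRP", "ADD"]
with_cfi = ["STP", "CFI_DEF_CFA_OFFSET", "MOV", "CFI_OFFSET", "ADRP", "ADD"]

print(scheduling_regions(no_cfi))
print(scheduling_regions(with_cfi))
```

In this model the same four real instructions form one region without CFI markers and three smaller regions with them, so a scheduler that reorders only within a region has less freedom in the second case. That is the mechanism by which the debug and non-debug builds can end up with different instruction orders.
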