[cfe-dev] [llvm-dev] How to debug if LTO generate wrong code?

Mon May 30 13:33:58 PDT 2016

We don't use cl::opt in gold; instead, we parse the -plugin-opt values that
gold passes to the plugin (see process_plugin_option).
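
For illustration only, here is a minimal sketch of that style of option
parsing. It is not the actual gold plugin code: the option prefix
"code-model=", the enum CodeModelChoice and the function
parseCodeModelOption are hypothetical names made up for this example, and
the accepted values are assumed to mirror the usual -mcmodel spellings.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a code-model enum; not LLVM's own type.
enum class CodeModelChoice { Small, Kernel, Medium, Large };

// Parses one plugin option string of the assumed form "code-model=<value>".
// Returns true and sets `out` only when the prefix matches and the value is
// recognised; otherwise returns false and leaves `out` untouched.
bool parseCodeModelOption(const std::string &opt, CodeModelChoice &out) {
  static const std::string prefix = "code-model=";  // hypothetical name
  if (opt.compare(0, prefix.size(), prefix) != 0)
    return false;  // some other plugin option

  const std::string value = opt.substr(prefix.size());
  if (value == "small")  { out = CodeModelChoice::Small;  return true; }
  if (value == "kernel") { out = CodeModelChoice::Kernel; return true; }
  if (value == "medium") { out = CodeModelChoice::Medium; return true; }
  if (value == "large")  { out = CodeModelChoice::Large;  return true; }
  return false;  // prefix matched, but the value is unknown
}
```

A plugin written this way would call the function once per option string it
receives and fall back to its default code model when nothing matches. How
the string reaches the plugin is linker specific; with gold driven through
clang it is usually something along the lines of -Wl,-plugin-opt=<option>.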

Cheers,
Rafael

On 30 May 2016 at 02:13, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
> On May 29, 2016, at 5:44 PM, Shi, Steven <steven.shi at intel.com> wrote:
>
> (And I doubt the GNU linker supports LTO with LLVM).
> [Steven]: I’ve pushed GNU Binutils ld to support the LLVM gold plugin; see
> the details in this bug: https://sourceware.org/bugzilla/show_bug.cgi?id=20070.
> The new GNU ld works well with LLVM/Clang LTO when building IA32 code on
> my side. According to the ld owner's comments in the bug, the current X64
> LLVM LTO issue is in the LLVM LTO plugin.
>
>
> The fact that we don't support it for now seems to indicate that it is not a
> widely requested feature, especially considering that it is really a trivial
> option to add.
> What is the linker you're using? Are you building your own clang?
> [Steven]: I’m using the standard LLVM 3.8 with the new GNU ld mentioned above.
> I can build my own clang on my side if needed. I’m happy to hear that it is
> not difficult to enable the large code model in LLVM LTO and that “it is
> really a trivial option to add”. Could you let me know how to enable it? A lot
> of my work has been blocked by the large code model issue. Thank you!
>
>
>
> I can't test it locally, but here is a starting point in the gold plugin,
> inspired by the code present in clang:
>
>
>
> You need to use your linker-specific way of passing the option
> "-lto-use-large-codemodel=..." to the plugin.
>
> Let me know if it works for you!
>
> --
> Mehdi
>
>
>
>
> Steven Shi
> Intel\SSG\STO\UEFI Firmware
>
> Tel: +86 021-61166522
> iNet: 821-6522
>
> From: mehdi.amini at apple.com [mailto:mehdi.amini at apple.com]
> Sent: Monday, May 30, 2016 8:17 AM
> To: Shi, Steven <steven.shi at intel.com>
> Cc: Umesh Kalappa <umesh.kalappa0 at gmail.com>; eliben at gmail.com; llvm-dev
> <llvm-dev at lists.llvm.org>; cfe-dev at lists.llvm.org; Rafael Espíndola
> <rafael.espindola at gmail.com>
> Subject: Re: [cfe-dev] [llvm-dev] How to debug if LTO generate wrong code?
>
>
>
> On May 29, 2016, at 5:10 PM, Shi, Steven <steven.shi at intel.com> wrote:
>
> Hi Mehdi,
> GCC LTO seems to support the large code model on my side, as shown below. If
> the code model is linker specific, does GCC LTO use a special linker that is
> different from the one in GNU Binutils?
>
>
> I don't know anything about GCC.
> (And I doubt the GNU linker supports LTO with LLVM).
>
>
> I’m a bit surprised that neither OS X ld64 nor the gold plugin supports the
> large code model in LTO. Since modern systems widely use 64-bit, needing code
> to run at high addresses (above 2 GB) is a reasonable requirement.
>
>
> The fact that we don't support it for now seems to indicate that it is not a
> widely requested feature, especially considering that it is really a trivial
> option to add.
> What is the linker you're using? Are you building your own clang?
>
> --
> Mehdi
>
>
>
>
>
> $ gcc -g -O0 -flto codemodel1.c -mcmodel=large -o codemodel1_large_lto_gcc.bin
> $ objdump -dS codemodel1_large_lto_gcc.bin
>
> int main(int argc, const char* argv[])
> {
>   40048b:       55                      push   %rbp
>   40048c:       48 89 e5                mov    %rsp,%rbp
>   40048f:       48 83 ec 20             sub    $0x20,%rsp
>   400493:       89 7d ec                mov    %edi,-0x14(%rbp)
>   400496:       48 89 75 e0             mov    %rsi,-0x20(%rbp)
>     int t = global_func(argc);
>   40049a:       8b 45 ec                mov    -0x14(%rbp),%eax
>   40049d:       89 c7                   mov    %eax,%edi
>   40049f:       48 b8 76 04 40 00 00    movabs $0x400476,%rax
>   4004a6:       00 00 00
>   4004a9:       ff d0                   callq  *%rax
>   4004ab:       89 45 fc                mov    %eax,-0x4(%rbp)
>     t += global_arr[7];
>   4004ae:       48 b8 20 09 60 00 00    movabs $0x600920,%rax
>   4004b5:       00 00 00
>   4004b8:       8b 40 1c                mov    0x1c(%rax),%eax
>   4004bb:       01 45 fc                add    %eax,-0x4(%rbp)
>     t += static_arr[7];
>   4004be:       48 b8 c0 0a 60 00 00    movabs $0x600ac0,%rax
>   4004c5:       00 00 00
>   4004c8:       8b 40 1c                mov    0x1c(%rax),%eax
>   4004cb:       01 45 fc                add    %eax,-0x4(%rbp)
>     t += global_arr_big[7];
>   4004ce:       48 b8 60 0c 60 00 00    movabs $0x600c60,%rax
>   4004d5:       00 00 00
>   4004d8:       8b 40 1c                mov    0x1c(%rax),%eax
>   4004db:       01 45 fc                add    %eax,-0x4(%rbp)
>     t += static_arr_big[7];
>   4004de:       48 b8 a0 19 63 00 00    movabs $0x6319a0,%rax
>   4004e5:       00 00 00
>   4004e8:       8b 40 1c                mov    0x1c(%rax),%eax
>   4004eb:       01 45 fc                add    %eax,-0x4(%rbp)
>     return t;
>   4004ee:       8b 45 fc                mov    -0x4(%rbp),%eax
> }
>
> From: mehdi.amini at apple.com [mailto:mehdi.amini at apple.com]
> Sent: Monday, May 30, 2016 4:28 AM
> To: Shi, Steven <steven.shi at intel.com>
> Cc: Umesh Kalappa <umesh.kalappa0 at gmail.com>; eliben at gmail.com; llvm-dev
> <llvm-dev at lists.llvm.org>; cfe-dev at lists.llvm.org; Rafael Espíndola
> <rafael.espindola at gmail.com>
> Subject: Re: [cfe-dev] [llvm-dev] How to debug if LTO generate wrong code?
>
> Hi,
>
>
>
> On May 29, 2016, at 7:36 AM, Shi, Steven <steven.shi at intel.com> wrote:
>
> Hi Mehdi,
> After deeper debugging, I found that my firmware LTO wrong-code issue is
> related to the X64 code model (-mcmodel=large) always being overridden to
> small (-mcmodel=small) in an LTO build. I don't know how to correctly specify
> the large code model for my X64 firmware LTO build. I would appreciate it if
> you could let me know how.
>
> Parts of my UEFI firmware (BIOS) have to be loaded and run at
> high addresses (above 2 GB) from the very beginning, so I need code
> that makes absolutely no assumptions about addresses or data sections. But
> the current LLVM LTO seems to stick to the small code model and generates a
> lot of code with 32-bit RIP-relative addressing, which causes CPU exceptions
> when it runs at addresses above 2 GB.
>
> Below, I simply reuse Eli's codemodel1.c example (link:
> http://eli.thegreenplace.net/2012/01/03/understanding-the-x64-code-models)
> to show the LLVM LTO code model issue.
> $ clang -g -O0 codemodel1.c -mcmodel=large -o codemodel1_large.bin
> $ clang -g -O0 codemodel1.c -mcmodel=small -o codemodel1_small.bin
> $ clang -g -O0 -flto codemodel1.c -mcmodel=large -o codemodel1_large_lto.bin
> $ clang -g -O0 -flto codemodel1.c -mcmodel=small -o codemodel1_small_lto.bin
>
> You will see that codemodel1_large_lto.bin and codemodel1_small_lto.bin are
> exactly the same!
> And if you disassemble codemodel1_large_lto.bin, you will see that it uses
> the small code model (32-bit RIP-relative), not the large one, for
> addressing, as shown below.
>
> $ objdump -dS codemodel1_large_lto.bin
>
> int main(int argc, const char* argv[])
> {
>   4004f0:       55                      push   %rbp
>   4004f1:       48 89 e5                mov    %rsp,%rbp
>   4004f4:       48 83 ec 20             sub    $0x20,%rsp
>   4004f8:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
>   4004ff:       89 7d f8                mov    %edi,-0x8(%rbp)
>   400502:       48 89 75 f0             mov    %rsi,-0x10(%rbp)
>     int t = global_func(argc);
>   400506:       8b 7d f8                mov    -0x8(%rbp),%edi
>   400509:       e8 d2 ff ff ff          callq  4004e0 <global_func>
>   40050e:       89 45 ec                mov    %eax,-0x14(%rbp)
>     t += global_arr[7];
>   400511:       8b 04 25 4c 10 60 00    mov    0x60104c,%eax
>   400518:       03 45 ec                add    -0x14(%rbp),%eax
>   40051b:       89 45 ec                mov    %eax,-0x14(%rbp)
>     t += static_arr[7];
>   40051e:       8b 04 25 dc 11 60 00    mov    0x6011dc,%eax
>   400525:       03 45 ec                add    -0x14(%rbp),%eax
>   400528:       89 45 ec                mov    %eax,-0x14(%rbp)
>     t += global_arr_big[7];
>   40052b:       8b 04 25 6c 13 60 00    mov    0x60136c,%eax
>   400532:       03 45 ec                add    -0x14(%rbp),%eax
>   400535:       89 45 ec                mov    %eax,-0x14(%rbp)
>     t += static_arr_big[7];
>   400538:       8b 04 25 ac 20 63 00    mov    0x6320ac,%eax
>   40053f:       03 45 ec                add    -0x14(%rbp),%eax
>   400542:       89 45 ec                mov    %eax,-0x14(%rbp)
>     return t;
>   400545:       8b 45 ec                mov    -0x14(%rbp),%eax
>   400548:       48 83 c4 20             add    $0x20,%rsp
>   40054c:       5d                      pop    %rbp
>   40054d:       c3                      retq
>   40054e:       66 90                   xchg   %ax,%ax
>
>
> So, does LTO support the large code model? How do I correctly specify the
> LTO code model option?
>
>
> Same answer as before: LTO is set up by the linker, so the option for that,
> if it exists, will be linker specific.
>
> As far as I can tell, neither the libLTO-based linkers (ld64 on OS X for
> example) nor the gold plugin support such an option, and the code model
> is always "default".
>
> I don't know about lld, CC Rafael about that.
>
> --
> Mehdi
>
>
>> -----Original Message-----
>> From: mehdi.amini at apple.com [mailto:mehdi.amini at apple.com]
>> Sent: Wednesday, May 18, 2016 4:02 AM
>> To: Umesh Kalappa <umesh.kalappa0 at gmail.com>
>> Cc: Shi, Steven <steven.shi at intel.com>; llvm-dev
>> <llvm-dev at lists.llvm.org>;
>> cfe-dev at lists.llvm.org
>> Subject: Re: [cfe-dev] [llvm-dev] How to debug if LTO generate wrong code?
>>
>>
>> > On May 17, 2016, at 11:21 AM, Umesh Kalappa
>> <umesh.kalappa0 at gmail.com> wrote:
>> >
>> > Steven,
>> >
>> > As Mehdi stated, the optimisation level is specific to the linker, and it
>> > enables the inter-procedural optimisation passes; please refer function
>>
>> To be very clear: the -O option may trigger *linker* optimizations as
>> well,
>> independently of LTO.
>>
>> --
>> Mehdi
>>
>>
>>
>
>
>