[LLVMdev] Proposal: MCLinker - an LLVM integrated linker

ChengWei Chen chengwei.chen at mediatek.com
Thu Nov 3 07:12:05 PDT 2011


I am curious whether it would be better to do these link-time optimizations at the section level.

For example, invoking Clang with the "-ffunction-sections" and "-fdata-sections"
options places each function and each data item into its own section in the
output file for the ELF backend, so these two options already provide the
same granularity at the section level.
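
As a rough sketch (my own toy example, not from the MCLinker proposal; the
names are arbitrary), consider:

    /* sections.c */
    int counter = 1;     /* with -fdata-sections -> .data.counter */
    char buffer[4096];   /* with -fdata-sections -> .bss.buffer   */

    int foo(int x) { return x + counter; }  /* -ffunction-sections -> .text.foo */
    int bar(int x) { return foo(x) * 2; }   /* -ffunction-sections -> .text.bar */

Compiling this with "clang -c -ffunction-sections -fdata-sections sections.c"
and inspecting the result with "readelf -S sections.o" shows one section per
function and per data item, so a section-aware linker can already reorder or
drop each of them individually.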

This is not new; AFAIK there are already link-time optimizations that work
with function sections. What are the advantages of creating atoms at link
time rather than relying on function-sections and data-sections produced
prior to linking?

Thanks,
ChengWei Chen

2011/11/3 Michael Spencer <bigcheesegs at gmail.com>:
> On Wed, Nov 2, 2011 at 11:05 PM, Don Quixote de la Mancha
> <quixote at dulcineatech.com> wrote:
>> A helpful link-time optimization would be to place subroutines that
>> are used close together in time also close together in the executable
>> file.  That also goes for data that is in the executable file, whether
>> initialized (.data segment) or zero-initialized (.bss).
>>
>> If the unit of linkage of code is the function rather than the
>> compilation module, and the unit of linkage of data is the individual
>> data item rather than all the .data and .bss items together that are
>> in a compilation unit, you could rearrange them at will.
>
> This is exactly what the atom model provides. And some of the use
> cases you describe were actually discussed at the social tonight.
>
> - Michael Spencer
>
>> For architectures such as ARM that cannot make jumps to faraway
>> addresses, you could make the destinations of subroutine calls close
>> to the caller so you would not need so many trampolines.
>>
>> The locality improves the speed because the program would use the code
>> and data caches more efficiently, and would page in data and code from
>> disk less often.
>>
>> Having fewer physically resident pages also saves on precious kernel
>> memory.  I read in O'Reilly's "Understanding the Linux Kernel" that on
>> the i386 architecture, the kernel's page tables consume most of the
>> physical memory in the computer, leaving very little physical memory
>> for user processes!
>>
>> A first cut would be to start with the runtime program startup code,
>> which for a C program then calls main().  The subroutines that main
>> calls would be placed next in the file.  Suppose main calls Foo() and
>> then Bar().  One would then place each of the subroutines that Foo()
>> calls all together, then each of the subroutines that Bar() calls.
>>
>> It would be best if some static analysis were performed to determine
>> in what order subroutines are called, and in what order .data and .bss
>> memory is accessed.
>>
>> Getting that analysis right for the general case would not be easy, as
>> the time-order in which subroutines are called may of course depend on
>> the input data.  To improve the locality, one could produce an
>> instrumented executable which saved a stack trace at the entry of each
>> subroutine.  Examination of all the stack traces would enable a
>> post-processing tool to generate a linker script that would be used
>> for a second pass of the linker.  This is a form of profiler-guided
>> optimization.
>>
>> For extra credit one could prepare multiple input files (or for
>> interactive programs, several distinctly different GUI robot scripts).
>>  Then the tool that prepared the linker script would try to optimize
>> for the average case for most code.
>>
>> Regards,
>>
>> Don Quixote
>> --
>> Don Quixote de la Mancha
>> Dulcinea Technologies Corporation
>> Software of Elegance and Beauty
>> http://www.dulcineatech.com
>> quixote at dulcineatech.com
>
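As for the profile-guided layout idea quoted above, one rough way to record
the order in which functions are first entered is the compilers'
-finstrument-functions hooks.  The following is only my own sketch (the file
name and log format are arbitrary), not something MCLinker provides:

    /* order_trace.c -- link into a build compiled with -finstrument-functions */
    #include <stdio.h>

    #define NO_INSTR __attribute__((no_instrument_function))

    static FILE *trace;

    /* The compiler emits a call to this hook at every function entry. */
    NO_INSTR void __cyg_profile_func_enter(void *this_fn, void *call_site)
    {
        if (!trace)
            trace = fopen("func-order.log", "w");
        if (trace)
            fprintf(trace, "enter %p from %p\n", this_fn, call_site);
    }

    /* Required counterpart of the hook above; nothing to do here. */
    NO_INSTR void __cyg_profile_func_exit(void *this_fn, void *call_site)
    {
        (void)this_fn;
        (void)call_site;
    }

Building the program with "clang -finstrument-functions -ffunction-sections"
and linking this file in writes the entry order to func-order.log; a
post-processing step could then map the addresses back to symbol names with
addr2line and turn that order into a section ordering for the linker, which
is roughly the linker-script pass described in the quoted message.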



