[LLVMdev] Proposal: MCLinker - an LLVM integrated linker
Don Quixote de la Mancha
quixote at dulcineatech.com
Wed Nov 2 23:05:33 PDT 2011
A helpful link-time optimization would be to place subroutines that
are used close together in time also close together in the executable
file. That also goes for data that is in the executable file, whether
initialized (.data segment) or zero-initialized (.bss).
If the unit of linkage of code is the function rather than the
compilation module, and the unit of linkage of data is the individual
data item rather than all the .data and .bss items together that are
in a compilation unit, you could rearrange them at will.
For architectures such as ARM that cannot make jumps to faraway
addresses, you could make the destinations of subroutine calls close
to the caller so you would not need so many trampolines.
The locality improves the speed because the program would use the code
and data caches more efficiently, and would page in data and code from
disk less often.
Having fewer physically resident pages also saves on precious kernel
memory. I read in O'Reilly's "Understanding the Linux Kernel" that on
the i386 architecture, the kernel's page tables consume most of the
physical memory in the computer, leaving very little physical memory
for user processes!
A first cut would be to start with the runtime program startup code,
which for C program then calls main(). The subroutines that main
calls would be placed next in the file. Suppose main calls Foo() and
then Bar(). One would then place each of the subroutines that Foo()
calls all together, then each of the subroutines that Bar() calls.
It would be best if some static analysis were performed to determine
in what order subroutines are called, and in what order .data and .bss
memory is accessed.
Getting that analysis right for the general case would not be easy, as
the time-order in which subroutines are called may of course depend on
the input data. To improve the locality, one could produce an
instrumented executable which saved a stack trace at the entry of each
subroutine. Examination of all the stack traces would enable a
post-processing tool to generate a linker script that would be used
for a second pass of the linker. This is a form of profiler-guided
optimization.
For extra credit one could prepare multiple input files (or for
interactive programs, several distinctly different GUI robot scripts).
Then the tool that prepared the linker script would try to optimize
for the average case for most code.
Regards,
Don Quixote
--
Don Quixote de la Mancha
Dulcinea Technologies Corporation
Software of Elegance and Beauty
http://www.dulcineatech.com
quixote at dulcineatech.com
More information about the llvm-dev
mailing list