What to do about alignment of ELF objects

Thu Apr 23 14:34:07 PDT 2015

On 23 April 2015 at 17:22, Sean Silva <chisophugis at gmail.com> wrote:
>
>
> On Thu, Apr 23, 2015 at 12:12 PM, Rafael Espíndola
> <rafael.espindola at gmail.com> wrote:
>>
>> On 23 April 2015 at 14:17, Rui Ueyama <ruiu at google.com> wrote:
>> > I think the patch for LLVM looks okay, but not sure for the other one.
>> >
>> > Does your patch make the linker unable to handle archive files
>> > containing unaligned objects, or does it just make it slower? If you
>> > cross-link an executable for machines that are generous about
>> > unaligned accesses, say x86, on not-so-generous machines, PowerPC for
>> > example, does it link fine?
>>
>> No difference on x86 (we avoid the copy).
>
>
> This has the potential to radically change LLD's physical/virtual memory
> usage characteristics depending on LLVM_IS_UNALIGNED_ACCESS_FAST (LIUAF)
> along with total memory traffic profile and disk access patterns. For
> example, this patch causes the entire file to be faulted in and read up
> front on !LIUAF whereas the file might be faulted and touched on disk
> sparsely and/or in a random order when LIUAF. Realistically most
> benchmarking and optimization work is going to happen on x86 and so
> performance on !LIUAF is likely to "bit rot" (we currently don't have any
> type of performance CI to avoid this; this is on my TODO list).
>
> Have you tried copying the buffers on x86? Also, if you make sure that the
> incoming archives are aligned so you can avoid the copy on ppc, how much
> faster does it get? I.e. does (time saved from your patch on ppc)  == (time
> copying buffers with your patch on ppc) + (time saved if we use aligned
> archives and avoid the copy with your patch (for testing purposes))?
>
> Can you dig in a bit deeper and figure out where this speedup is coming
> from? As it stands right now, this patch seems like a very opportunistic
> "seems to work on my machine" speedup.

At this point I don't think it is worth it. We don't support PowerPC,
which is why I had to cross-link to benchmark it.

The main point is to delete a bunch of complicated code that is dead on
x86 and just slows down other architectures.

For what it is worth, gold copies data when the buffer is not
sufficiently aligned, so this is known to work.
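
To illustrate the copy-when-misaligned idea, here is a minimal sketch. It
is not gold's or lld's actual code: the helper names isAligned and
alignedOrCopy, and the choice of 8-byte alignment, are assumptions made
for the example only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: true if p is a multiple of align (a power of two).
inline bool isAligned(const void *p, std::size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (reinterpret_cast<std::uintptr_t>(p) & (align - 1)) == 0;
}

// Hypothetical helper: returns a pointer to `size` bytes with the same
// contents as `buf`, aligned to alignof(std::uint64_t). An aligned input
// is returned unchanged and `scratch` is left empty (no copy, as on the
// fast path). Otherwise the bytes are copied into `scratch`, whose
// storage is suitably aligned because it holds std::uint64_t elements.
inline const unsigned char *alignedOrCopy(const unsigned char *buf,
                                          std::size_t size,
                                          std::vector<std::uint64_t> &scratch) {
  scratch.clear();
  if (isAligned(buf, alignof(std::uint64_t)))
    return buf;
  // Round up to whole words, always at least one, so data() is non-null.
  scratch.resize(size / sizeof(std::uint64_t) + 1);
  std::memcpy(scratch.data(), buf, size);
  return reinterpret_cast<const unsigned char *>(scratch.data());
}
```

The caller keeps `scratch` alive for as long as it uses the returned
pointer. The cost of the slow path is one memcpy of the member, which
also means the whole member is read up front.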

Cheers,
Rafael