[PATCH] D31528: [ELF][MIPS] Multi-GOT implementation

Sat Apr 8 00:08:34 PDT 2017

On Tue, Apr 4, 2017 at 9:09 PM, Rui Ueyama <ruiu at google.com> wrote:
> On Tue, Apr 4, 2017 at 5:46 AM, Simon Atanasyan <simon at atanasyan.com> wrote:
>>
>> On Sat, Apr 1, 2017 at 4:20 AM, Rui Ueyama via Phabricator
>> <reviews at reviews.llvm.org> wrote:
>> >
>> > This is not your fault, but I have to say that this MIPS GOT layout is
>> > very odd, too different from other architectures, and too complicated.
>> > I want to avoid supporting this unless I'm convinced that it is
>> > absolutely necessary. It seems to me that MIPS needs a clean, common
>> > new ABI. Only the MIPS ABI imposes a lot of restrictions on the size of
>> > GOT sections and the order of GOT section members, even though MIPS as
>> > a processor is an ordinary RISC ISA. This change would really hurt the
>> > maintainability of LLD; I have already found that some MIPS-specific
>> > behavior is hard to keep right when editing code for all the other
>> > architectures.
>>
>> MIPS will not use old, obsolete ABIs forever. It will switch to a new
>> one, but that will not happen this year or so. Besides other obstacles,
>> there is a hardware problem that prevents a fast switch to, and common
>> acceptance of, a new ABI. Historically, many MIPS instructions are
>> partitioned into 16 bits for the opcode and register fields and 16 bits
>> for an address/index immediate. That is one of the sources of the GOT
>> size limitation and the reason multi-GOT was invented.
>>
>> The biggest part of the patch is isolated in the MipsGotSection class.
>> It adds some new MIPS-specific code, such as a new constructor for the
>> DynamicReloc class. But at the same time it removes some
>> `if (Config->EMachine == EM_MIPS)` statements and MIPS-specific fields
>> from the `SymbolBody` class.
>
>
> It is isolated as a separate class, but we still need to understand and
> modify it whenever we need to do something in relocation processing. I'm
> actually trying to change the design of relocation processing to increase
> its parallelism. We can't parallelize it entirely, but some parts (such as
> deciding whether a symbol needs a GOT slot or not) can be processed on a
> per-file or per-relocation basis.
>
> Then I found that this part of the code is very complex and has grown
> organically. I tried to reduce its complexity and found that keeping
> everything right for MIPS is hard. I really don't want to increase the
> complexity of this code. If you increase it, I won't be able to refactor
> it anymore, because I'm struggling to do that even for the current code.
>
> In addition, the MIPS multi-GOT ABI doesn't seem like the right design to
> me. If multi-GOT is in use, only the first GOT is recognized as a real GOT
> by the dynamic linker, and the secondary GOTs are just sections that
> simulate a GOT. It's too hacky, isn't it?
>
>> > I wonder what performance penalty you would have to pay when you use
>> > the -mxgot option. With the option, you'll need three instructions as
>> > opposed to a single instruction to access a GOT entry. Does that
>> > actually make an observable difference in performance?
>>
>> Regular (without -mxgot) access to GOT requires a single instruction:
>>
>> lw  t9,0(gp)
>>
>> I was wrong when I said two instructions. With the -mxgot option we get
>> three instructions.
>>
>> lui     at,0x0
>> addu    at,at,gp
>> lw      t9,0(at)
>>
>> In the case of MIPS, the global offset table is used not only to call
>> external functions and access external data but also, under some
>> conditions, for local calls and accesses. So using -mxgot can easily
>> increase the code size and reduce performance.
>
>
> How much is the actual performance hit?

Multi-GOT is an attempt to work around this limitation of the MIPS
architecture. It's not my invention; this feature was implemented in the
GNU linker more than ten years ago. Whenever the GOT exceeds the ~64KB
limit, the BFD and gold linkers create a multi-GOT layout.
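
The ~64KB figure follows from the 16-bit immediate of a load such as
`lw t9,0(gp)`: a signed 16-bit displacement spans [-32768, 32767] bytes
around $gp. A minimal sketch of that arithmetic, assuming 4-byte GOT
slots on 32-bit ABIs (O32) and 8-byte slots on 64-bit ABIs (N64) and
ignoring reserved slots; the helper names are hypothetical:

```python
# Sketch: how far a signed 16-bit displacement from $gp can reach.
# Assumptions: 4-byte slots (32-bit ABIs) or 8-byte slots (64-bit ABIs);
# reserved GOT slots are ignored. Helper names are illustrative only.

DISP16_MIN = -(1 << 15)     # -32768
DISP16_MAX = (1 << 15) - 1  # 32767


def fits_in_disp16(offset_from_gp: int) -> bool:
    """True if a GOT slot at this byte offset from $gp is reachable by a
    single `lw reg, offset(gp)` instruction."""
    return DISP16_MIN <= offset_from_gp <= DISP16_MAX


def max_got_entries(entry_size: int) -> int:
    """Upper bound on GOT slots addressable through one $gp value."""
    window = DISP16_MAX - DISP16_MIN + 1  # 65536 bytes: the ~64KB limit
    return window // entry_size


if __name__ == "__main__":
    print(max_got_entries(4))  # 16384
    print(max_got_entries(8))  # 8192
    print(fits_in_disp16(0x7FF0), fits_in_disp16(0x8000))  # True False
```

Linkers conventionally place $gp 0x7ff0 bytes past the start of the GOT
so that the negative half of the range is usable as well; once a
program needs more slots than the window holds, some entries become
unreachable by a single load.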

I do not think that my implementation of multi-GOT makes LLD much more
complicated. The general idea remains the same: collect information about
the various types of required GOT entries, lay out the GOT entries, and
write this layout. Merging the GOTs created for each input file into
larger ones is a rather complicated routine, though. On the other hand,
creating a separate GOT for each input file makes it possible to
parallelize this process. The current implementation, where
MipsGotSection maintains a single `GotEntries` vector for all files,
does not allow processing multiple input files at the same time without
some sort of locking.
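
The per-file-GOT-then-merge structure can be sketched roughly as
follows. This is a hypothetical model, not the patch's code or LLD's
data structures: a file is a list of (relocation type, symbol) pairs, a
GOT is a set of symbol names, and the merge is a plain first-fit
heuristic chosen for brevity (not the algorithm used by the patch, BFD,
or gold). It ignores ABI constraints such as what the dynamic linker
expects in the primary GOT. Python's GIL limits real CPU speedup here;
the point is the data flow: the per-file step needs no locks, and only
the merge is sequential.

```python
# Sketch: build one GOT per input file (independently, so files can be
# processed concurrently), then merge them first-fit into as few GOTs as
# possible, each holding at most `limit` distinct entries.
# Hypothetical model: a file is a list of (reloc_type, symbol) pairs and
# a GOT is a set of symbol names.
from concurrent.futures import ThreadPoolExecutor

# Illustrative subset of GOT-generating MIPS relocation types.
GOT_RELOCS = {"R_MIPS_GOT16", "R_MIPS_CALL16", "R_MIPS_GOT_DISP"}


def build_file_got(relocs):
    """Per-file step: reads only its own input, so no locks are needed."""
    return {sym for rtype, sym in relocs if rtype in GOT_RELOCS}


def merge_gots(file_gots, limit):
    """Sequential step: first-fit merge. Entries shared between files are
    stored once per merged GOT, so they cost no extra space."""
    merged = []  # merged[0] would play the role of the primary GOT
    for got in file_gots:
        if len(got) > limit:
            raise ValueError("a single file's GOT exceeds the limit")
        for target in merged:
            if len(target | got) <= limit:
                target |= got
                break
        else:
            merged.append(set(got))  # copy, so inputs are not mutated
    return merged


def layout_gots(files, limit):
    with ThreadPoolExecutor() as pool:
        file_gots = list(pool.map(build_file_got, files))
    return merge_gots(file_gots, limit)


if __name__ == "__main__":
    files = [
        [("R_MIPS_CALL16", "a"), ("R_MIPS_GOT16", "b")],
        [("R_MIPS_CALL16", "b"), ("R_MIPS_CALL16", "c")],
        [("R_MIPS_GOT_DISP", "d"), ("R_MIPS_32", "e")],
    ]
    print(layout_gots(files, limit=3))
```

With a limit of three entries, the first two files share `b` and fit
into one GOT, while the third file's entry spills into a second GOT.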

The performance degradation from using -mxgot depends on the
application. My tests show that applications built with -mxgot are
1%-4% slower. But more importantly, there are large applications that
cannot be linked without multi-GOT at all, even if they are built with
the -mxgot option, because some relocations operate on a 16-bit GOT
index only.
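
For comparison, the three-instruction -mxgot sequence quoted earlier
(`lui` / `addu` / `lw`) carries a 32-bit GOT offset split across two
16-bit immediates (in GCC's assembler syntax, `%got_hi`/`%got_lo`, or
`%call_hi`/`%call_lo` for calls). A relocation with only one 16-bit
field, such as R_MIPS_GOT16 or R_MIPS_CALL16, has no such escape. A
minimal sketch of the split, assuming 32-bit wrapping arithmetic; the
function names are hypothetical:

```python
# Sketch: split a signed 32-bit GOT offset into the two 16-bit immediates
# used by the -mxgot sequence
#     lui  at, HI        # at = HI << 16 (32-bit view)
#     addu at, at, gp
#     lw   t9, LO(at)    # LO is sign-extended by the load
# Because LO is sign-extended, HI is rounded by adding 0x8000 first.

def sign_extend16(value: int) -> int:
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def to_int32(value: int) -> int:
    """Wrap to a signed 32-bit value, as 32-bit register arithmetic does."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def split_hi_lo(offset: int) -> tuple[int, int]:
    hi = ((offset + 0x8000) >> 16) & 0xFFFF
    lo = offset & 0xFFFF
    return hi, lo


def reconstruct(hi: int, lo: int) -> int:
    """Recompute the offset the way lui + lw would ($gp not added)."""
    return to_int32((hi << 16) + sign_extend16(lo))


if __name__ == "__main__":
    hi, lo = split_hi_lo(0x12348000)
    print(hex(hi), hex(lo))          # 0x1235 0x8000
    print(hex(reconstruct(hi, lo)))  # 0x12348000
```

The rounding step matters whenever bit 15 of the offset is set: the
load subtracts 0x8000, so the high half must be one larger to
compensate.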

-- 
Simon Atanasyan