<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Sat, Apr 8, 2017 at 12:08 AM, Simon Atanasyan <span dir="ltr"><<a href="mailto:simon@atanasyan.com" target="_blank">simon@atanasyan.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>On Tue, Apr 4, 2017 at 9:09 PM, Rui Ueyama <<a href="mailto:ruiu@google.com" target="_blank">ruiu@google.com</a>> wrote:<br>

> On Tue, Apr 4, 2017 at 5:46 AM, Simon Atanasyan <<a href="mailto:simon@atanasyan.com" target="_blank">simon@atanasyan.com</a>> wrote:<br>

>><br>

</span><div><div class="m_3714780965815102404h5">>> On Sat, Apr 1, 2017 at 4:20 AM, Rui Ueyama via Phabricator<br>

>> <<a href="mailto:reviews@reviews.llvm.org" target="_blank">reviews@reviews.llvm.org</a>> wrote:<br>

>> ><br>

>> > This is not your fault, but I have to say that this MIPS GOT layout is<br>
>> > very odd, too different from other architectures, and too complicated.<br>
>> > I want to avoid supporting this unless I'm convinced that it is<br>
>> > absolutely necessary. It seems to me that MIPS needs a clean, common new<br>
>> > ABI. Only the MIPS ABI imposes a lot of restrictions on the size of GOT<br>
>> > sections and the order of GOT section members, even though MIPS as a<br>
>> > processor is an ordinary RISC ISA. This change would really hurt the<br>
>> > maintainability of LLD; I have already found that some MIPS-specific<br>
>> > behavior is hard to keep right when editing code for all the other<br>
>> > architectures.<br>

>><br>

>> MIPS will not use old, obsolete ABIs forever. It will switch to a new<br>
>> one, but that will not happen this year or so. Besides other obstacles,<br>
>> there is a hardware problem that prevents a fast switch to, and broad<br>
>> acceptance of, the new ABI. Historically, many MIPS instructions are<br>
>> partitioned into 16 bits for the opcode and register fields and 16 bits<br>
>> for an address/index. That is one of the sources of the GOT size<br>
>> limitation and the reason multi-GOT was invented.<br>

>><br>

>> The biggest part of the patch is isolated in the MipsGotSection class. It<br>
>> adds some new MIPS-specific code, such as a new constructor of the<br>
>> DynamicReloc class, but at the same time it removes some<br>
>> `if (Config->EMachine == EM_MIPS)` statements and MIPS-specific fields<br>
>> from the `SymbolBody` class.<br>

><br>

><br>

> It is isolated as a separate class, but we still need to understand and<br>
> modify it whenever we need to do something in relocation processing. I'm<br>
> actually trying to change the design of relocation processing to increase<br>
> its parallelism. We can't parallelize it entirely, but some parts (such as<br>
> deciding whether or not a symbol needs a GOT slot) can be processed on a<br>
> per-file or per-relocation basis.<br>

><br>

> Then I found that this part of the code is very complex and has grown<br>
> organically. I tried to reduce its complexity and found that keeping<br>
> everything right for MIPS is hard. I really don't want to increase the<br>
> complexity of this code. If you increase the complexity, I won't be able to<br>
> refactor it anymore, because I'm struggling to do that even for the current<br>
> code.<br>

><br>

> In addition to that, the MIPS multi-GOT ABI doesn't seem like the right<br>
> design to me. If multi-GOT is in use, only the first GOT is recognized as a<br>
> real GOT by the dynamic linker, and the secondary GOTs are just sections<br>
> that simulate a GOT. It's too hacky, isn't it?<br>

><br>

>> > I wonder what performance penalty you would have to pay when you use<br>
>> > the -mxgot option. With the option, you'll need three instructions as<br>
>> > opposed to a single instruction to access a GOT entry. Does that<br>
>> > actually make an observable difference in performance?<br>

>><br>

>> Regular access to the GOT (without -mxgot) requires a single instruction:<br>

>><br>

>> lw  t9,0(gp)<br>

>><br>

>> I was wrong when I said two instructions. With the -mxgot option we get<br>
>> three instructions:<br>

>><br>

>> lui     at,0x0<br>

>> addu    at,at,gp<br>

>> lw      t9,0(at)<br>

>><br>

>> On MIPS, the global offset table is used not only to call external<br>
>> functions and access external data, but also, under some conditions, for<br>
>> local calls and local data access. So with -mxgot we can easily grow the<br>
>> code size and reduce performance.<br>

><br>

><br>

> How much is the actual performance hit?<br>

<br>

</div></div>Multi-GOT is an attempt to work around this limitation of the MIPS<br>
architecture. It's not my invention; this feature was implemented in the GNU<br>
linker more than ten years ago. Whenever the GOT exceeds the ~64KB limit, the<br>
BFD and gold linkers create a multi-GOT layout.<br>

<br>

I do not think that my implementation of multi-GOT makes LLD much more<br>
complicated. The general idea remains the same: collect information about<br>
the various types of required GOT entries, lay out the GOT entries, and write<br>
this layout. Merging the multiple GOTs created for each file into a larger<br>
GOT is a rather complicated routine, though. On the other hand, creating a<br>
separate GOT for each input file makes it possible to parallelize this<br>
process. The current implementation, where MipsGotSection maintains a<br>
single `GotEntries` vector for all files, does not allow processing<br>
multiple input files at the same time without some sort of "locks".<br></blockquote><div><br></div><div>I understand that you are just trying to implement a MIPS ABI, and I also understand that you made an effort to write good code. Your code seems to be a straightforward implementation of the ABI, if I understand it correctly. But new code still inevitably adds complexity, and that's particularly true for this patch, which introduces a new notion of "multi-GOT" only for MIPS. Also, saying that this feature is too odd is not a criticism of you; I think it's a consequence of the MIPS ABI's peculiarities. I believe many peculiarities in the MIPS ABI could have been fixed by now, since they were implemented more than 10 years ago.</div><div><br></div><div>I really do not want to add this much complexity to our relocation processing code, which is already too complicated. Even I don't understand the exact behavior of the current code, and I'm trying to refactor that code now. This patch could make my refactoring impossible.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Performance degradation from using -mxgot depends on the application. My<br>
tests show that applications built with -mxgot are 1%-4% slower. But more<br>
importantly, there are large applications that cannot be linked without<br>
multi-GOT at all, even if they are built with the -mxgot option, because<br>
some relocations operate on a 16-bit GOT index only.<br>

<span class="m_3714780965815102404HOEnZb"><font color="#888888"><br>

--<br>

Simon Atanasyan<br>

</font></span></blockquote></div><br></div></div>
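<div>To make the multi-GOT idea discussed in this thread concrete, here is a minimal C++ sketch under stated assumptions. It models a per-file GOT as a set of symbol names and uses a simple greedy pass that folds each file's GOT into the current one while the union still fits in the window reachable by a signed 16-bit $gp-relative offset (~64KB, i.e. 16384 four-byte or 8192 eight-byte entries). The names `FileGot`, `maxGotEntries`, and `mergeGots` are hypothetical. This is not LLD's `MipsGotSection` and not the BFD/gold algorithm. It also ignores the different kinds of GOT entries (local, global, TLS) and the ordering rules of the primary GOT.</div>

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of the multi-GOT idea; NOT LLD's MipsGotSection or the
// BFD/gold algorithm. All names here are made up for illustration.

// A GOT entry is loaded with a signed 16-bit offset from $gp, so a single GOT
// can span at most 2^16 bytes (~64KB).
constexpr std::size_t kGotWindowBytes = std::size_t{1} << 16;

// Number of entries that fit in one GOT for a given entry size
// (4 bytes on 32-bit ABIs such as O32, 8 bytes on N64).
constexpr std::size_t maxGotEntries(std::size_t entrySize) {
  return kGotWindowBytes / entrySize;
}

// Assumed model: the set of symbols one input file needs GOT entries for.
using FileGot = std::set<std::string>;

// Greedily merge per-file GOTs: fold a file into the current GOT if the union
// of their entries still fits, otherwise start a new GOT.
std::vector<FileGot> mergeGots(const std::vector<FileGot> &files,
                               std::size_t capacity) {
  std::vector<FileGot> gots;
  for (const FileGot &file : files) {
    // Assumption: a single file's GOT always fits on its own.
    assert(file.size() <= capacity && "per-file GOT exceeds the 16-bit window");
    if (!gots.empty()) {
      FileGot merged = gots.back();
      merged.insert(file.begin(), file.end()); // shared symbols counted once
      if (merged.size() <= capacity) {
        gots.back() = std::move(merged);
        continue;
      }
    }
    gots.push_back(file);
  }
  return gots;
}
```

<div>For example, with a capacity of 3 entries, the per-file GOTs {a, b}, {b, c} and {d, e} merge into two GOTs, {a, b, c} and {d, e}; with the O32-sized capacity `maxGotEntries(4)` (16384) they all fit in one. A greedy pass is only one possible strategy: its result depends on file order and is not guaranteed to produce the fewest GOTs. Because each per-file GOT is computed independently, the per-file step is the part that could run in parallel.</div>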