[PATCH] D60958: [PPC64] toc-indirect to toc-relative relaxation

Tue Apr 23 09:29:41 PDT 2019

sfertile added a comment.

Hi MaskRay.

Thanks for doing this. When you originally described this to me, I didn't realize you intended to partition and sort the relocations in the sections other than .rela.toc. This clears up my question regarding the implementation. I'm still a little hesitant about this approach, though. Did you profile link times for the two approaches? I understand the reduction in the number of accesses (you have outlined that very well in another comment). However, that fails to consider two things: how small the number of relocations in .rela.toc is compared to all the relocations in the text section, and how infrequently we have an object file that actually enters the loop.
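To make the comparison concrete, here is a minimal Python model of what a partition-and-sort lookup could look like. It is an assumption about the general shape of such an approach, not the code in the patch. `Rela`, `CodeReloc`, and `resolve_by_merge` are hypothetical names, and the model assumes .rela.toc is sorted by `r_offset`. Code relocations that reference .toc are partitioned out, sorted by the TOC offset they reference, and then matched against .rela.toc in a single forward walk.

```python
from typing import Dict, List, NamedTuple, Optional


class Rela(NamedTuple):
    """Hypothetical model of one .rela.toc entry."""
    r_offset: int  # offset of the TOC slot
    sym: str       # symbol the slot points to
    addend: int


class CodeReloc(NamedTuple):
    """Hypothetical model of a relocation in a code section."""
    r_offset: int      # offset within the code section
    targets_toc: bool  # True if the relocation references .toc
    toc_offset: int    # referenced offset into .toc (only meaningful if targets_toc)


def resolve_by_merge(code_relocs: List[CodeReloc],
                     rela_toc: List[Rela]) -> Dict[int, Optional[Rela]]:
    """Map the r_offset of each .toc-referencing code relocation to its
    .rela.toc entry, or to None when that TOC slot holds a constant."""
    # Partition: keep only the relocations that reference .toc.
    toc_refs = [r for r in code_relocs if r.targets_toc]
    # Sort by referenced TOC offset so both sequences are ascending.
    toc_refs.sort(key=lambda r: r.toc_offset)

    result: Dict[int, Optional[Rela]] = {}
    j = 0
    for ref in toc_refs:
        # j only moves forward, so the walk over .rela.toc is linear overall.
        while j < len(rela_toc) and rela_toc[j].r_offset < ref.toc_offset:
            j += 1
        if j < len(rela_toc) and rela_toc[j].r_offset == ref.toc_offset:
            result[ref.r_offset] = rela_toc[j]
        else:
            result[ref.r_offset] = None  # no relocation: a constant in the TOC
    return result
```

The cost here is the sort over every .toc-referencing relocation in the text section, paid for every object, which is the trade-off this comment is asking to measure.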

The main reason I wasn't so concerned about the n^2 lookup in the original patch is that we would hit that loop so infrequently. Even though it is technically n^2, in practice the object files gcc typically produces when it does put a constant in the TOC lead to only 1 or 2 extra array accesses per lookup rather than n. I've compiled a couple of projects to show what I mean.
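For reference, the offset / 8 guess followed by a backward linear probe can be modeled as below. This is a minimal Python sketch, not the lld code; `Rela` and `lookup_toc_entry` are hypothetical names, and it assumes .rela.toc has at most one relocation per 8-byte TOC slot, sorted by `r_offset` with no duplicates. Each slot below the requested offset that holds a constant (and so has no relocation) can cost one extra probe, so the extra work per lookup is bounded by the "Missing reloc count" reported in the tables below.

```python
from typing import List, NamedTuple, Optional, Tuple


class Rela(NamedTuple):
    """Hypothetical model of one .rela.toc entry."""
    r_offset: int
    sym: str


def lookup_toc_entry(rela_toc: List[Rela], offset: int) -> Tuple[Optional[Rela], int]:
    """Return (entry, extra_probes) for the TOC slot at `offset`.

    entry is None when the slot holds a constant (no relocation);
    extra_probes counts array accesses beyond the first one.
    """
    if not rela_toc:
        return None, 0
    # With one relocation per 8-byte slot, offset // 8 is the right index
    # unless some lower slots hold constants; clamp to the last entry.
    index = min(offset // 8, len(rela_toc) - 1)
    extra = 0
    while True:
        entry = rela_toc[index]
        if entry.r_offset == offset:
            return entry, extra
        # Entries are sorted, so if this one is past `offset` the match can
        # only be at a lower index; otherwise there is no relocation for it.
        if entry.r_offset < offset or index == 0:
            return None, extra
        index -= 1
        extra += 1
```

When no TOC slot holds a constant, every lookup hits on the first access, which matches the clang-built case described below.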

  Protobuf: 628 objects, 27 with a constant in the TOC (4.3%)
    Missing reloc count   Frequency
                      1          19
                      2           8

  Postgres: 704 objects, 12 with a constant in the TOC (1.7%)
    Missing reloc count   Frequency
                      1          10
                      2           1
                      9           1

  FFmpeg: 681 objects, 28 with a constant in the TOC (4.1%)
    Missing reloc count   Frequency
                      1          22
                      2           4
                      3           1
                     96           1

  LLVM: 3294 objects, 355 with a constant in the TOC (10.8%)
    Missing reloc count   Frequency
                      1         180
                      2          61
                      3          48
                      4          22
                      5          13
                      6           4
                      7           5
                      8           3
                      9           2
                     10           2
                     11           3
                     12           3
                     13           2
                     19           1
                     22           1
                     28           2
                     32           1
                     40           1
                     80           1

Clearly there are a few objects where the number of missing relocations does start to get worryingly large (28, 32, 40, 80 and 96), but about 90% or more of the files never hit the loop, and roughly 70% or more of those that do need the loop have at most 1 or 2 extra array accesses per lookup. Note that this is all compiled with gcc; when compiling with clang as the build compiler we end up with *no* objects falling through to the loop.
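The percentages quoted above can be re-derived from the tables with a few lines of Python. The histograms are transcribed from the tables, and `summarize` is a hypothetical helper. For LLVM, the largest data set, it gives about 89% of objects never entering the loop and about 68% of the loop-entering objects having at most 2 missing relocations; the three smaller projects come out higher on both measures.

```python
from typing import Dict, Tuple

# Transcribed from the tables: project -> (total objects, {missing reloc count: objects}).
DATA: Dict[str, Tuple[int, Dict[int, int]]] = {
    "Protobuf": (628, {1: 19, 2: 8}),
    "Postgres": (704, {1: 10, 2: 1, 9: 1}),
    "FFmpeg": (681, {1: 22, 2: 4, 3: 1, 96: 1}),
    "LLVM": (3294, {1: 180, 2: 61, 3: 48, 4: 22, 5: 13, 6: 4, 7: 5, 8: 3,
                    9: 2, 10: 2, 11: 3, 12: 3, 13: 2, 19: 1, 22: 1, 28: 2,
                    32: 1, 40: 1, 80: 1}),
}


def summarize(total: int, hist: Dict[int, int], small: int = 2) -> Tuple[float, float]:
    """Return (% of objects that never enter the loop,
    % of loop-entering objects with at most `small` missing relocations)."""
    entering = sum(hist.values())
    few = sum(n for count, n in hist.items() if count <= small)
    return 100.0 * (total - entering) / total, 100.0 * few / entering


for name, (total, hist) in DATA.items():
    never, few = summarize(total, hist)
    print(f"{name:9} never loop: {never:5.1f}%   <=2 missing: {few:5.1f}%")
```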

FWIW, I think this implementation is clean and understandable enough that we can switch to it, but before deciding this is the best approach I would like to know how it affects the link time of, say, LLVM, both when clang is the build compiler and when gcc is.

Repository:
  rLLD LLVM Linker

https://reviews.llvm.org/D60958