[llvm-dev] Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section

Thu Dec 14 00:11:29 PST 2017

> While adding a 'stride' field is definitely an improvement over simple
> delta+count encoding, it doesn't compare well against the bitmap based
> encoding.
>
> I took a look inside the encoding for the Vim binary. There are some instances
> in the bitmap based encoding like
>   [0x3855555555555555 0x3855555555555555 0x3855555555555555 ...]
> that encode sequences of relocations applying to alternate words. The stride
> based encoding works very well on these and turns it into much more compact
>   [0x0ff010ff 0x0ff010ff 0x0ff010ff ...]
> using stride==0x10 and count==0xff.

Have you looked much at where the RELATIVE relocations are coming from?

I've looked at a PIE build of gold, and they're almost all for
vtables, which mostly have consecutive entries with 8-byte strides.
There are a few for the GOT, a few for static constructors (in
.init_array), and a few for other initialized data, but vtables seem
to account for the vast majority. (Gold has almost 19,000 RELATIVE
dynamic relocs, and only about 500 non-RELATIVE dynamic relocs.)
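
For anyone who wants to run the same check on their own binary, here is a
rough sketch of the stride count I have in mind. It assumes you have already
extracted the RELATIVE relocation offsets (for example from the output of
readelf -r). The function name and the sample offsets are made up for
illustration; they are not taken from the gold build.

```python
from collections import Counter

def stride_histogram(offsets):
    """Count the gaps between consecutive relocation offsets."""
    ordered = sorted(offsets)
    return Counter(b - a for a, b in zip(ordered, ordered[1:]))

# Hypothetical data: a 6-entry vtable (consecutive 8-byte slots) followed by
# a table in which pointers alternate with non-pointer words (16-byte stride).
vtable = [0x1000 + 8 * i for i in range(6)]
table = [0x2000 + 16 * i for i in range(4)]

hist = stride_histogram(vtable + table)
for stride, n in hist.most_common():
    print(f"stride {stride:#x}: {n}")
```

A histogram dominated by a stride of 8 would point at vtable-like data; a
large share of 16 would point at the alternating pattern seen in Vim.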

Where do the 16-byte strides come from? Vim is plain C, right? I'm
guessing its RELATIVE relocation count is fairly low compared to big
C++ apps, and that the pattern comes from some large structure or
structures in the source code where initialized pointers alternate
with non-pointer values. I'm also curious about Roland's app.

In my opinion, the data and my intuition both support your choice of a
jump + bit vector representation.
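
To make the intuition concrete, here is a minimal sketch of a jump + bit
vector scheme. It is a simplification and not the exact layout in the
proposal: the jump is stored as an absolute starting offset, each bit vector
covers 64 words, and the names WORD, BITS, encode and decode are all
assumptions chosen for this example.

```python
WORD = 8    # bytes per relocated word (assumes a 64-bit target)
BITS = 64   # words covered by one bit vector (arbitrary for this sketch)

def encode(offsets):
    """Pack word-aligned offsets into (base, bitmap) pairs.

    Bit i of a bitmap means "relocate the word at base + i * WORD".
    """
    entries = []
    base = None
    bitmap = 0
    for off in sorted(offsets):
        if off % WORD:
            raise ValueError("offset is not word-aligned")
        if base is not None and off - base < WORD * BITS:
            # Still inside the current window: just set one more bit.
            bitmap |= 1 << ((off - base) // WORD)
        else:
            # Out of range: flush the current pair and jump to a new base.
            if base is not None:
                entries.append((base, bitmap))
            base, bitmap = off, 1
    if base is not None:
        entries.append((base, bitmap))
    return entries

def decode(entries):
    """Expand (base, bitmap) pairs back into a list of offsets."""
    out = []
    for base, bitmap in entries:
        i = 0
        while bitmap:
            if bitmap & 1:
                out.append(base + i * WORD)
            bitmap >>= 1
            i += 1
    return out

# Hypothetical data: a dense vtable-like run and an alternating-word run.
dense = [0x1000 + 8 * i for i in range(64)]
alternating = [0x2000 + 16 * i for i in range(32)]

entries = encode(dense + alternating)
for base, bitmap in entries:
    print(f"{base:#x} {bitmap:#018x}")
```

In this sketch the 96 offsets collapse into two pairs, and the dense run
costs the same as the alternating one. That is the property that makes the
bit vector attractive when the data mixes 8-byte and 16-byte strides.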

-cary