[llvm-dev] [RFC] lld: Dropping TLS relaxations in favor of TLSDESC

Tue Nov 7 18:27:37 PST 2017

tl;dr: TLSDESC has solved most of the problems with the formerly inefficient
TLS access models, so I think we can drop TLS relaxation support from lld.

lld's code to handle relocations is a mess; the code consists of a lot of
cascading "if"s and needs a lot of prior knowledge to understand what it is
doing. Honestly it is head-scratching and needs serious refactoring. I'm
trying to simplify it to make it manageable again, and I'm now focusing on
the TLS relaxations.

Thread-local variables in ELF are complicated. The ELF TLS specification [1]
defines 4 different access models: General Dynamic, Local Dynamic, Initial
Exec and Local Exec.
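
To make the four models concrete, here is a minimal sketch of how they can be
requested per variable from C. It assumes GCC or Clang, whose `tls_model`
variable attribute accepts these four model names; the variable and function
names are made up for illustration and have nothing to do with lld's code.

```c
/* One thread-local counter per access model. Every thread gets its own
 * copy of all four; only the code sequence used to find them differs. */
__thread int gd_counter __attribute__((tls_model("global-dynamic")));
static __thread int ld_counter __attribute__((tls_model("local-dynamic")));
__thread int ie_counter __attribute__((tls_model("initial-exec")));
static __thread int le_counter __attribute__((tls_model("local-exec")));

/* Hypothetical accessors: each increments the calling thread's copy. */
int bump_gd(void) { return ++gd_counter; }
int bump_ld(void) { return ++ld_counter; }
int bump_ie(void) { return ++ie_counter; }
int bump_le(void) { return ++le_counter; }
```

Note that the local-exec variable only links into an executable; putting it
into a DSO is exactly the kind of case the more general models exist for.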

I'm not going into the details of the spec here, but the reason we have so
many different models for the same feature is that they differ in speed,
and we have to use the (formerly) slow models when we know less about the
variables' run-time memory layout at compile time or link time. So, there
was a trade-off between generality and performance. For example, if you
want to use thread-local variables in a dlopen(3)'able DSO, you need to
choose the slowest model. If a linker knows at link time that a more
restricted access model is applicable (e.g. if it is linking a main
executable, it knows for sure that it is not creating a DSO that will be
used via dlopen), the linker is allowed to rewrite the instructions that
load thread-local variables so that they use a faster access model. This
rewriting is what "TLS relaxation" refers to.
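
The decision a linker makes here can be sketched roughly as below. This is a
simplified model of the rules in [1], not lld's actual code: the enum, the
function name and the two boolean inputs are all hypothetical, and the
per-architecture instruction rewriting is left out entirely.

```c
#include <stdbool.h>

/* The four access models from the ELF TLS spec, slowest to fastest. */
enum tls_model { TLS_GD, TLS_LD, TLS_IE, TLS_LE };

/* Return the fastest model the linker may relax to.
 * output_is_executable: we are linking a main executable, not a DSO.
 * symbol_is_preemptible: the variable may be defined by another module. */
enum tls_model relax_tls_model(enum tls_model requested,
                               bool output_is_executable,
                               bool symbol_is_preemptible)
{
    /* A DSO might be dlopen'ed later, so nothing can be assumed. */
    if (!output_is_executable)
        return requested;

    switch (requested) {
    case TLS_GD:
        /* Variable lives in a DSO loaded at startup: IE is enough.
         * Variable lives in the executable itself: LE is enough. */
        return symbol_is_preemptible ? TLS_IE : TLS_LE;
    case TLS_LD:
        /* LD is only used for module-local variables. */
        return TLS_LE;
    case TLS_IE:
        return symbol_is_preemptible ? TLS_IE : TLS_LE;
    case TLS_LE:
        return TLS_LE;
    }
    return requested;
}
```

Even in this stripped-down form there are several cases to keep straight, and
each of them needs its own instruction rewriting on every target.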

What makes the situation more complicated is the presence of a newer method
of accessing thread-local variables. After the ELF TLS spec was defined,
TLSDESC [2] was proposed and implemented. With that method, the General
Dynamic and Local Dynamic models (which were pretty slow in the original
spec) are as fast as the much faster Initial Exec model. TLSDESC doesn't
have a trade-off between dlopen'ability and access speed. According to [2],
it also reduces the size of generated DSOs. So it seems like TLSDESC is
strictly a better way of accessing thread-local variables than the old way,
and the thread-local variable performance problem (which the ELF TLS spec
was trying to address by defining four different access models and
relaxations between them) doesn't seem to be a real issue anymore.
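
As I read [2], the idea is that each referenced variable gets a small
descriptor holding a resolver function pointer plus an argument, and the
resolver returns the variable's offset from the thread pointer. The sketch
below only models that idea in plain C: the struct layout, the names and the
fake thread block are assumptions for illustration, and the real mechanism
uses GOT entries and a special calling convention instead.

```c
#include <stddef.h>

/* Hypothetical model of a TLS descriptor: a resolver plus its argument. */
struct tls_desc {
    ptrdiff_t (*resolve)(const struct tls_desc *desc);
    ptrdiff_t arg; /* meaning depends on which resolver is installed */
};

/* Resolver for a variable whose offset is fixed at load time: the
 * dynamic loader stored the offset in arg, so just hand it back. */
ptrdiff_t resolve_static(const struct tls_desc *desc)
{
    return desc->arg;
}

/* Stand-in for the current thread's TLS block (assumption: in reality
 * this is reached through the thread pointer register). */
static int fake_thread_block[16];

/* Generic access sequence: call the resolver, add the thread pointer. */
int *tls_addr(const struct tls_desc *desc)
{
    char *thread_pointer = (char *)fake_thread_block;
    return (int *)(thread_pointer + desc->resolve(desc));
}
```

The point is that the access sequence is the same whether or not the module
was dlopen'ed; only the resolver installed by the loader differs, and in the
static case it is about as cheap as an Initial Exec access.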

lld supports all TLS relaxations as defined by the ELF TLS spec. I accepted
the patches to implement all these features without thinking hard enough
about it, but on second thought, that was likely the wrong decision. Being a
new linker, we don't need to retrace the history of the evolution of the ELF
spec. Instead, we should have implemented whatever makes sense now.

So, I'd like to propose we drop TLS relaxations from lld, including Initial
Exec → Local Exec. Dropping IE→LE is strictly speaking a degradation, but I
don't think that is important. We don't have optimizations for much more
frequent variable access patterns such as locally-accessed variables that
have GOT slots (for which we could in theory skip the GOT access because the
GOT slot values are known at link time), so it is odd that we are only
serious about TLS variables, which are usually much less important. Even if
it turns out that we want IE→LE back after implementing more important
relaxations, I'd like to drop it for now and reimplement it in a different
way later.

This should greatly simplify the code: it not only reduces the complexity
and amount of the existing code, but also reduces the amount of knowledge
you need to have to read the code, without sacrificing the performance of
lld-generated files in practice.

Thoughts?

[1] https://www.akkadia.org/drepper/tls.pdf
[2] http://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt