[LLVMdev] LLD improvement plan

Fri May 29 22:14:23 PDT 2015

On Fri, May 29, 2015 at 7:08 PM, Rui Ueyama <ruiu at google.com> wrote:

> On Fri, May 29, 2015 at 6:01 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>>
>>
>> On Fri, May 29, 2015 at 1:14 AM, Rui Ueyama <ruiu at google.com> wrote:
>>
>>> I want to make it clear that I didn't (at least intend to) compromise
>>> flexibility or beauty of design for short-term performance gain. I was
>>> trying to do simple things in a simple way for both humans and computers,
>>> and I believe I did that fairly well.  I'd even argue that the new design
>>> is cleaner and more expressive than before, because the "atom" model is in
>>> some respects too detailed and restrictive on how to represent data and
>>> relations between symbols, particularly how to represent relocations. It
>>> also lacked the capability to represent indivisible memory areas having
>>> multiple names.
>>>
>>> After I wrote up the first patch, I realized that the goal of the code
>>> is somewhat similar to what the atom model aims to achieve, with some
>>> differences. I assume that you have read the readme file for the new port.
>>> The differences are
>>>
>>>  -  An atom has only one name, but the new "chunk" can have one or more
>>> symbols referring to it. But the actual difference is that chunks are
>>> agnostic of the symbols referring to them in the new design. I have separated
>>> actual data from symbols to get more flexibility. And that flexibility
>>> enabled me to achieve better performance by writing more abstract code
>>> which reads less data.
>>>
>>>  - In the atom model, we have detailed information about relocations,
>>> including relocation target, offset, etc, for each atom. In the new design,
>>> we don't have them. Instead, we have just a set of symbols for each chunk
>>> that needs to be resolved to include that input chunk properly. This is
>>> more abstract and flexible than the existing design.
>>>
>>> - The atom model reads too much data from files prematurely to construct
>>> a complete graph, while the new design avoids that. This is partly an
>>> implementation issue, but partly unavoidable, because we actually needed
>>> to build a more complex data structure.
>>>
>>> - This might stem from the implementation and not from the model
>>> itself, but it's hard to write code for the atom model because its data
>>> types have too many detailed relations with other types. For example,
>>> any atom in the model has to have a "file" that the atom was created
>>> from. This makes it hard to append to the output linker-generated data
>>> that has no source file (we ended up having a notion of "virtual input
>>> file" that doesn't do anything meaningful itself). Another
>>> example is that, if you want to create a symbol on-demand, you've got to
>>> create a "virtual archive" file that returns a "virtual file" containing
>>> one "virtual atom" when the archive file is asked for that symbol. In the
>>> new design, it can be expressed in one line of code instead of multiple
>>> class definitions and object juggling. Also, because relocations are
>>> explicitly represented as "references" in the atom model, we've got to
>>> create platform-specific relocation objects even for linker-generated data
>>> if it refers to other symbols, and let a platform-specific relocation
>>> function consume that data to apply relocations. That's less abstract
>>> than the new design, in which all classes except the actual data type that
>>> needs to know about relocations are agnostic about how relocations are
>>> represented and how to actually apply them.
>>>
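The chunk/symbol split described in the list above can be pictured roughly as follows. This is only an illustrative sketch in Python, not code from LLD; the names Chunk, SymbolTable, needs, and the example symbol names are all hypothetical. It assumes a chunk carries just its bytes plus the set of symbol names it needs resolved, while names live in a separate table, so several names can point at one chunk.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """An indivisible piece of input data; it does not know its own names."""
    data: bytes
    # Symbol names that must be resolved before this chunk can be emitted
    # (a stand-in for detailed per-relocation records).
    needs: set = field(default_factory=set)


class SymbolTable:
    """Maps names to chunks; the chunk itself stays name-agnostic."""

    def __init__(self):
        self._defs = {}  # symbol name -> Chunk

    def define(self, name, chunk):
        self._defs[name] = chunk

    def unresolved(self, chunk):
        # Names the chunk needs that nobody has defined yet.
        return {n for n in chunk.needs if n not in self._defs}

    def names_of(self, chunk):
        # All names that refer to this exact chunk object.
        return sorted(n for n, c in self._defs.items() if c is chunk)


table = SymbolTable()
body = Chunk(data=b"\x90\xc3", needs={"helper"})
table.define("memcpy_impl", body)
table.define("memcpy_alias", body)  # a second name for the same chunk
```

Here `table.names_of(body)` reports both names for the single chunk, and `table.unresolved(body)` stays non-empty until something defines `helper`.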
>>
>> These all sound like things that just indicate "we have some refactoring
>> to do", just like Duncan did for debug metadata, or David is doing for the
>> opaque pointer type, or how the Type system has been changed over the
>> years, or how clang's template parsing is changed to be compatible with
>> weird MSVC behavior. Is there something about the current situation with
>> LLD that made you think that refactoring was hopeless and required a
>> rewrite? If what we currently have doesn't fit our use cases, why not just
>> fix it?
>>
>
> I don't think these points indicate a need for refactoring. Or the meaning
> of refactoring is too broad. "Atom has only one name" is a baked in
> assumption everywhere (like "in SSA variables are assigned only once").
> "Atom has data" is another assumption. "Relocations are represented as
> graph edges" is yet another. Or "everything is represented using atoms and
> references". These design choices are made at the beginning, and they are
> everywhere. If you change them, you have to update virtually all code. Then
> what's the point of refactoring compared to creating a new foundation +
> move code on it? I found that the former is difficult to do.
>

One of the standard reasons to prefer refactoring, even though it appears
to take longer or be more difficult, is that it allows you to always keep
all tests green. It is very easy for things to slip through the cracks and
not promptly return to being green on a "from-scratch" version. This
ultimately turns into bug reports later and the feature needs to be
reimplemented; the apparent simplicity of the "from-scratch" version can
disappear very rapidly.

In the refactoring approach you are forced to incorporate a holistic
understanding of the necessary features into your simplification efforts,
since the tests keep you from accidentally disregarding necessary features.

It is very easy to accidentally buy simplicity at the cost of losing
features; if you eventually need the features back then the apparent
simplicity is an illusion.

-- Sean Silva

>
> I understand what you are saying, because as you might have noticed, I'm
> probably the person who spent the most time refactoring it to do what
> you are saying. I wanted to make it more readable, easier to add features
> to, and faster. I actually worked really hard. Although I partly succeeded,
> I was disappointed in myself because of the lack of progress. In the end, I
> had to conclude that that was not going to work --  they are so different
> that it's not reasonable to spend time in that direction. A better approach
> is to set a new foundation and move existing code onto it, instead of
> reworking it in place. It may also be worth mentioning that the new approach
> worked well. I put together a self-hosting linker in only two weeks, which
> supports dead-stripping and is more than 4x faster.
>
>
>>> Besides these, I'd say from my experience working on the atom model that
>>> the new model's capabilities are not that different from the atom
>>> model's. They are different, there are pros and cons, and I don't agree
>>> that the atom model is more flexible or conceptually better.
>>>
>>
>> I don't understand this focus on "the atom model". "the atom model" is
>> not any particular thing. We can generalize the meaning of atom, we can
>> make it more narrow, we can remove responsibilities from Atom, we can add
>> responsibilities to Atom, we can do whatever is needed. As you yourself
>> admit, the "new model" is not that different from "the atom model". Think
>> of "the atom model" like SSA. LLVM IR is SSA; there is a very large amount
>> of freedom to decide on the exact design within that scope. "the atom
>> model" AFAICT just means that a core abstraction inside the linker is the
>> notion of an indivisible chunk. Our current design might need to be
>> changed, but starting from scratch only to arrive at the same basic idea
>> but now having to effectively maintain two codebases doesn't seem worth it.
>>
>
> A large part of the difficulty in developing the current LLD comes
> from over-generalization to share code between quite different file
> formats. My observation is that we ended up having to write a large amount
> of code to share a small core that doesn't really fit any platform well
> (an example is the virtual archive file I mentioned above -- that was
> invented to hide platform-specific atom creation behind something
> platform-neutral, and because archive files are supported by three
> platforms, they were chosen). Different things are different; we need to get
> the right balance. I don't think that the current balance is right.
>
>> A lot of the issue here is that we are falsely distinguishing
>> "section-based" and "atom-based". A suitable generalization of the notion
>> of "indivisible chunks" and what you can do with them covers both cases,
>> but traditional usage of sections makes the "indivisible chunks" be a lot
>> larger (and loses more information in doing so). But as
>> -ffunction-sections/-fdata-sections shows, there is not really any
>> fundamental difference.
>>
>> -- Sean Silva
>>
>>
>>>
>>> On Thu, May 28, 2015 at 8:22 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>
>>>> On Thu, May 28, 2015 at 6:25 PM, Nick Kledzik <kledzik at apple.com>
>>>> wrote:
>>>>
>>>>>
>>>>> On May 28, 2015, at 5:42 PM, Sean Silva <chisophugis at gmail.com> wrote:
>>>>>
>>>>> I guess, looking back at Nick's comment:
>>>>>
>>>>> "The atom model is a good fit for the llvm compiler model for all
>>>>> architectures.  There is a one-to-one mapping between llvm::GlobalObject
>>>>> (e.g. function or global variable) and lld::DefinedAtom."
>>>>>
>>>>> it seems that the primary issue on the ELF/COFF side is that currently
>>>>> the LLVM backends are taking a finer-grained atomicity that is present
>>>>> inside LLVM, and losing information by converting that to a coarser-grained
>>>>> atomicity that is the typical "section" in ELF/COFF.
>>>>> But doesn't -ffunction-sections -fdata-sections already fix this,
>>>>> basically?
>>>>>
>>>>> On the Mach-O side, the issue seems to be that Mach-O's notion of
>>>>> section carries more hard-coded meaning than e.g. ELF, so at the very least
>>>>> another layer of subdivision below what Mach-O calls "section" would be
>>>>> needed to preserve this information; currently symbols are used as a bit of
>>>>> a hack as this "sub-section" layer.
>>>>>
>>>>> I’m not sure what you mean here.
>>>>>
>>>>>
>>>>> So the problem seems to be that the transport format between the
>>>>> compiler and linker varies by platform, and each one has a different way to
>>>>> represent things, some can't represent everything we want to do, apparently.
>>>>>
>>>>> Yes!
>>>>>
>>>>>
>>>>> BUT it sounds like at least relocatable ELF semantics can, in
>>>>> principle, represent everything that we can imagine an "atom-based file
>>>>> format"/"native format" to want to represent. Just to play devil's
>>>>> advocate here, let's start out with the "native format" being relocatable
>>>>> ELF - on *all platforms*. Relocatable object files are just a transport
>>>>> format between compiler and linker, after all; who cares what we use? If
>>>>> the alternative is a completely new format, then bootstrapping from
>>>>> relocatable ELF is strictly less churn/tooling cost.
>>>>>
>>>>> People on the "atom side of the fence", what do you think? Is there
>>>>> anything that we cannot achieve by saying "native"="relocatable ELF"?
>>>>>
>>>>> 1) Turns out .o files are written once but read many times by the
>>>>> linker.  Therefore, the design goal of .o files should be that they are as
>>>>> fast to read/parse in the linker as possible.  Slowing down the compiler to
>>>>> make a .o file that is faster for the linker to read is a good trade off.
>>>>> This is the motivation for the native format - not that it is a universal
>>>>> format.
>>>>>
>>>>
>>>> I don't think that switching from ELF to something new can make linkers
>>>> significantly faster. We need to handle ELF files carefully so as not to
>>>> waste time on the initial load, but if you do, reading the data required
>>>> for symbol resolution from an ELF file should be satisfactorily fast (I
>>>> did that for COFF -- the current "atom-based ELF" linker is doing too
>>>> many things in the initial load, like reading all relocation tables,
>>>> splitting indivisible chunks of data and connecting them with
>>>> "indivisible" edges, etc.) It looks like we read the symbol table pretty
>>>> quickly in the new implementation, and the bottleneck is now the time to
>>>> insert symbols into the symbol hash table -- which you cannot make
>>>> faster by changing the object file format.
>>>>
>>>> Speaking of performance, if I wanted to make a significant
>>>> difference, I'd focus on introducing new symbol resolution semantics.
>>>> In particular, the Unix linker semantics are pretty bad for performance
>>>> because we have to visit files one by one serially and possibly
>>>> repeatedly. That's not only bad for parallelism but also for the
>>>> single-threaded case because it increases the size of the data to be
>>>> processed. This is, I believe, the true bottleneck of Unix linkers.
>>>> Tackling that problem seems most important to me, and "ELF as a file
>>>> format is slow" is still unproven to me.
>>>>
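To make the "one by one serially and possibly repeatedly" point above concrete, here is a small illustrative model of traditional Unix-style resolution. It is a sketch under stated assumptions, not any real linker's code: the function name unix_resolve and the tuple layout for inputs are made up for this example. Inputs are visited left to right, and an archive member is pulled in only if it defines a symbol that is undefined at that moment, so each archive is rescanned until nothing new is pulled.

```python
def unix_resolve(inputs):
    """Model of serial, order-dependent symbol resolution.

    inputs: list of ("obj", name, defs, refs) or ("lib", name, members),
    where each member is (name, defs, refs) and defs/refs are sets of names.
    Returns (names of linked files in order, still-undefined symbols).
    """
    defined, undefined, linked = set(), set(), []

    def add(name, defs, refs):
        linked.append(name)
        defined.update(defs)
        undefined.difference_update(defs)
        undefined.update(r for r in refs if r not in defined)

    for item in inputs:
        if item[0] == "obj":
            add(*item[1:])  # plain object files are always linked
            continue
        members = list(item[2])
        progress = True
        while progress:  # rescan the archive until nothing new is pulled
            progress = False
            for m in list(members):
                if undefined & set(m[1]):  # member defines a wanted symbol
                    add(*m)
                    members.remove(m)
                    progress = True
    return linked, undefined


main_o = ("obj", "main.o", {"main"}, {"foo"})
libfoo = ("lib", "libfoo.a", [("bar.o", {"bar"}, set()),
                              ("foo.o", {"foo"}, {"bar"})])
```

With `[main_o, libfoo]` the archive has to be scanned twice (foo.o first, then bar.o on the rescan); with `[libfoo, main_o]` nothing is pulled from the archive and `foo` is left undefined, which shows how the result depends on visiting order.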
>>>>
>>>>>
>>>>> 2) I think the ELF camp still thinks that linkers are “dumb”.  That
>>>>> they just collate .o files into executable files.  The darwin linker does a
>>>>> lot of processing/optimizing the content (e.g. Objective-C optimizing, dead
>>>>> stripping, function/data re-ordering).  This is why atom level granularity
>>>>> is needed.
>>>>>
>>>>
>>>> I think that all these things are doable (and are being done) using
>>>> -ffunction-sections.
>>>>
>>>>
>>>>>
>>>>> For darwin, ELF based .o files are not interesting.  It won’t be
>>>>> faster, and it will take a bunch of effort to figure out how to encode all
>>>>> the mach-o info into ELF.  We’d rather wait for a new native format.
>>>>>
>>>>
>>>>
>>>>> -Nick
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>
>>>>>
>>>>
>>>
>>
>