[PATCH] D22683: [ELF] Symbol assignment within input section list

Tue Aug 2 18:47:57 PDT 2016

The patch as is crashes on linkerscript-provide-in-section.s. It is OK
to report an error if something is wrong, but please don't crash.

Cheers,
Rafael

On 2 August 2016 at 18:23, Rui Ueyama <ruiu at google.com> wrote:
> On Tue, Aug 2, 2016 at 3:15 PM, Eugene Leviant <evgeny.leviant at gmail.com>
> wrote:
>>
>>
>>
>> 2016-08-03 0:55 GMT+03:00 Rui Ueyama <ruiu at google.com>:
>>>
>>> On Tue, Aug 2, 2016 at 2:45 PM, Eugene Leviant <evgeny.leviant at gmail.com>
>>> wrote:
>>>>
>>>>
>>>>
>>>> On Wednesday, 3 August 2016, Rui Ueyama wrote:
>>>>>
>>>>> On Tue, Aug 2, 2016 at 3:00 AM, Eugene Leviant
>>>>> <evgeny.leviant at gmail.com> wrote:
>>>>>>
>>>>>> evgeny777 added a comment.
>>>>>>
>>>>>> I think the main reason we're using virtual input sections is that
>>>>>> this is the only way to calculate the correct symbol offset. As you may
>>>>>> know, the location counter is not incremented while we add input sections
>>>>>> to an output section, and the true size of input sections is known only
>>>>>> after the call to OutputSectionBase<ELFT>::assignOffsets().
>>>>>>
>>>>>> So if you can suggest an algorithm that calculates the correct symbol
>>>>>> value (w/o using virtual input sections) in the case below:
>>>>>>
>>>>>>   .foo : { *(.foo); end_foo = .; *(.bar) }
>>>>>>
>>>>>> then we can probably switch to absolute symbols (BTW we can also use
>>>>>> synthetic symbols - there is little difference, if any).
>>>>>> Another interesting question is what will happen if we define an absolute
>>>>>> symbol in a shared object and reference it in an executable? For example:
>>>>>>
>>>>>>   /* script for linking shared library */
>>>>>>   SECTIONS { .text : { text_start = .; *(.text) } }
>>>>>>
>>>>>> So, when shared library is loaded by application, what value would
>>>>>> text_start have, in case it is absolute? I don't know yet, but will try.
>>>>>
>>>>>
>>>>> At first, I suggested you use empty dummy input sections to define
>>>>> linker-script-defined symbols in the hope that in that way we don't need to
>>>>> fix symbol addresses later (I was hoping that symbol addresses are
>>>>> automatically fixed as attached input sections get final output addresses.)
>>>>> Now we know that it doesn't work for many possible use cases. So maybe we
>>>>> want to eliminate the dummy sections and directly define symbols as
>>>>> absolute (or section) symbols.
>>>>
>>>>
>>>> Like I said, the main problem is calculating this "absolute value". How
>>>> are you going to do this? Also, like George said, it is not correct to use
>>>> absolute values for symbols defined inside an output section description.
>>>
>>>
>>> I think you don't need to calculate absolute values. We know the relative
>>> distance between the beginning of the current output section and the
>>> current "." value, so we can create a DefinedSynthetic symbol with the
>>> output section and the relative offset.
>>>
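A minimal sketch of the section-relative idea suggested above, applied to the earlier `.foo : { *(.foo); end_foo = .; *(.bar) }` case. All names here (`Command`, `SectionRelativeSymbol`, `layoutSection`, `finalAddress`) are hypothetical stand-ins, not LLD's actual classes, and input section alignment and thunks are ignored for brevity. If input section sizes change later (for example after thunks are added), the walk would simply be repeated.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of one command inside an output section description.
struct Command {
  enum Kind { InputSection, SymbolAssign } kind;
  std::string name; // input section name or symbol name
  uint64_t size;    // input section size; unused for SymbolAssign
};

// A symbol recorded as (output section, offset), not as an absolute value.
struct SectionRelativeSymbol {
  std::string outputSection;
  uint64_t offset; // distance from the start of the output section
};

// Walk the commands in script order, tracking "." as an offset from the
// start of the output section. Assumption: sizes are already known and no
// alignment padding is inserted between input sections.
std::map<std::string, SectionRelativeSymbol>
layoutSection(const std::string &outName, const std::vector<Command> &cmds,
              uint64_t &sectionSize) {
  std::map<std::string, SectionRelativeSymbol> syms;
  uint64_t dot = 0;
  for (const Command &c : cmds) {
    if (c.kind == Command::InputSection)
      dot += c.size; // an input section advances the relative dot
    else
      syms[c.name] = SectionRelativeSymbol{outName, dot}; // sym = .
  }
  sectionSize = dot;
  return syms;
}

// The final value is known only once the output section has an address.
uint64_t finalAddress(uint64_t sectionBase, const SectionRelativeSymbol &s) {
  return sectionBase + s.offset;
}
```

With a 0x30-byte `.foo` input followed by a 0x10-byte `.bar` input, `end_foo` is recorded at offset 0x30, and its address follows the output section wherever it is finally placed.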
>> Still have to deal with thunks, changing input section size, no?
>
>
> Yes. But even with the current two-pass approach, I think the
> above-mentioned logic should work.
>
>>
>>
>>
>>>>
>>>>
>>>>>
>>>>> In this patch, you are trying to support assignments to symbols.
>>>>> However, we eventually want to support something like this, too.
>>>>>
>>>>>   SECTIONS { .text : { foo.o(.text); . = ALIGN(128); bar.o(.text) } }
>>>>
>>>>
>>>> I do not see any problem in doing this. I think we can use the same
>>>> SymbolInputSection<ELFT> but with non-zero size, so the proper layout will
>>>> be calculated automatically in assignOffsets. Does this make sense?
>>>
>>>
>>> I don't think so. Does it work for more complicated inputs, such as
>>>
>>>   SECTIONS {
>>>     .data : { *(.data) }
>>>     .text : { foo.o(.text); . += SIZEOF(.data); bar.o(.text) }
>>>   }
>>>
>>> ?
>>
>>
>> I think making InputSectionBase<ELFT>::getSize() a virtual method will
>> solve the problem, won't it?
>
>
> It makes getSize() really complicated, no? If the expression is ". =
> SIZEOF(.data) + ALIGN(100)", the input section needs to understand the size
> of the output .data section as well as the current dot value. Also, no input
> sections have vtables now, so adding them only for getSize is probably too
> much.
>
>>
>>
>>>
>>>
>>>>>
>>>>>
>>>>> It looks like this doesn't fit the current architecture. Currently, we
>>>>> create a list of input sections and assign them addresses later. But to
>>>>> process the above script, a single pass would fit well. So I'm wondering
>>>>> if we should merge LinkerScript::createSections and
>>>>> LinkerScript::assignOffsets.
>>>>
>>>>
>>>> How can this be done? We have createThunks() in between.
>>>
>>>
>>> Yeah, we have Thunks. I haven't thought enough about that yet. But why
>>> can't we create thunks earlier, even before createSections?
>>
>>
>> Is this possible? As far as I understand, a thunk contains a jump, which can
>> be between two input sections (or even output sections). Until you create the
>> full layout (like we do in createSections), it looks like a tough problem to
>> solve.
>
>
> Well, I believe it's at least technically doable (I'm not sure how hard it
> is). When the Writer is called, all symbols are resolved, so all relocations
> should know where they point to. That means it can be determined whether
> they need thunks or not.
>
>>
>>
>>>
>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> ================
>>>>>> Comment at: ELF/LinkerScript.cpp:278
>>>>>> @@ -176,3 +277,3 @@
>>>>>>  // Process ONLY_IF_RO and ONLY_IF_RW.
>>>>>>  template <class ELFT> void LinkerScript<ELFT>::filter() {
>>>>>>    // In this loop, we remove output sections if they don't satisfy
>>>>>> ----------------
>>>>>> ruiu wrote:
>>>>>> > Why did you have to make a change to this function?
>>>>>> Two main reasons:
>>>>>>
>>>>>> 1) During the filtering process some output sections may be removed. Those
>>>>>> sections may contain symbols, and SymbolInputSection objects have already
>>>>>> been created for them. To avoid crashes and/or creating dummy symbols I
>>>>>> have to remove those virtual sections as well.
>>>>>>
>>>>>> 2) The old implementation is not technically correct, because it
>>>>>> removes only the first output section found by name lookup. We're still
>>>>>> using OutputSectionFactory<ELFT>, so we may have several sections with the
>>>>>> same name.
>>>>>>
>>>>>> Another reason (though much less significant) is that one-by-one
>>>>>> removal from std::vector is slow, because it stores elements in a
>>>>>> contiguous region of memory, so each removal shifts the elements after it.
>>>>>>
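For reference, the usual way to avoid one-by-one removal from a std::vector is the erase-remove idiom, which compacts the surviving elements in a single pass and also removes every match, not only the first. This is only an illustrative sketch: `OutSec` and its `keep` flag are hypothetical and do not reflect the data structures in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an output section; `keep` models the result of
// the ONLY_IF_RO / ONLY_IF_RW check.
struct OutSec {
  std::string name;
  bool keep;
};

// Remove every rejected section in one linear pass. std::remove_if moves the
// kept elements to the front (preserving their order), and erase() then
// drops the leftover tail in a single operation.
void removeFiltered(std::vector<OutSec> &secs) {
  secs.erase(std::remove_if(secs.begin(), secs.end(),
                            [](const OutSec &s) { return !s.keep; }),
             secs.end());
}
```

Because the predicate is applied to every element, several sections sharing the same name are all handled.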
>>>>>>
>>>>>> https://reviews.llvm.org/D22683
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>
>