[LLVMdev] [lld] Representation of lld::Reference with a fake target

Simon Atanasyan simon at atanasyan.com
Sat Feb 7 00:36:41 PST 2015


My 2c: maybe we should not try to put all target-specific object file
formats into a single YAML/Native representation. Let's define a
universal format for the file "header" in the YAML/Native
representation, plus perhaps some top-level structures common to all
targets, and allow target-specific code to arbitrarily extend these
formats. For example, code in ReaderWriter/ELF would know how to
convert ELF object files into YAML/Native form. In that case we do in
fact get somewhat incompatible YAML/Native formats for ELF, PECOFF,
MachO, etc., but I don't think that is a problem.
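
To make the idea concrete, here is a minimal sketch (with made-up
names; none of this is existing lld code) of what a common "header"
plus a target-owned extension could look like on the C++ side:

    // Hypothetical sketch, not actual lld code: a small header that every
    // reader understands, followed by an opaque target-specific blob that
    // only the matching ReaderWriter knows how to parse.
    #include <cstddef>
    #include <cstdint>
    #include <system_error>

    struct CommonFileHeader {
      uint32_t magic;         // identifies the YAML/Native container
      uint16_t version;       // container format version
      uint16_t targetKind;    // e.g. ELF, PECOFF, MachO
      uint64_t targetExtOff;  // offset of the target-specific section
      uint64_t targetExtSize; // size of the target-specific section
    };

    // Each target registers a handler for its own extension; generic code
    // never looks past CommonFileHeader.
    class TargetExtensionReader {
    public:
      virtual ~TargetExtensionReader() = default;
      virtual bool handles(uint16_t targetKind) const = 0;
      virtual std::error_code parse(const uint8_t *data, size_t size) = 0;
    };

Generic code would dispatch on targetKind and hand the extension bytes
to whichever handler claims them, so ReaderWriter/ELF stays the only
place that knows the ELF-specific layout.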

On Sat, Feb 7, 2015 at 6:28 AM, Rui Ueyama <ruiu at google.com> wrote:
> Not all input files have to be representable in YAML/Native format.
> There are many unrealistic use cases there. No one wants to write an
> executable file in Native because there's no operating system that can run
> that file. The same goes for YAML, and for the combination of a .so file and
> Native/YAML, unless we have an operating system whose loader is able to load
> a YAML .so file.
>
> We might want to write a Native/YAML file as a re-linkable object file (in
> GNU ld that's the -r option), but that's an object file.
>
> So it's totally okay if some input file types are not representable in
> YAML/Native. Some use cases are not real. We can't force all developers to
> spend their time supporting unrealistic use cases.
>
> On Fri, Feb 6, 2015 at 7:04 PM, Shankar Easwaran <shankarke at gmail.com>
> wrote:
>>
>> The intermediate result is also what actually gets written to disk when
>> --output-filetype=yaml or native is chosen.
>>
>> Writing to YAML and reading it back is not doable right after converting
>> input files to atoms, because some of the input files are not representable
>> in YAML format.
>>
>> On Fri, Feb 6, 2015 at 8:48 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>
>>> I think no one is opposing the idea of reading and writing YAML.
>>>
>>> The problem here is why we need to force all developers to write
>>> code to serialize intermediate data in the middle of a link, which nothing
>>> except the round-trip passes needs.
>>>
>>> On Fri, Feb 6, 2015 at 6:41 PM, Shankar Easwaran <shankarke at gmail.com>
>>> wrote:
>>>>
>>>> Doing it for every input file is not useful, as some of the input files
>>>> are not representable in YAML form; shared libraries are one example.
>>>>
>>>> The reason I made the YAML pass run just before the writer is that the
>>>> intermediate result is more complete there: all atoms have been resolved
>>>> at that point, and the state of all atoms is much saner.
>>>>
>>>> It was also easy to use the pass manager, and the code was very small to
>>>> achieve what we are trying to do: ensure that all the information reaching
>>>> the writer is passed through references or atom properties.
>>>>
>>>> Shankar Easwaran
>>>>
>>>>
>>>>
>>>> On Feb 6, 2015, at 19:54, Rui Ueyama <ruiu at google.com> wrote:
>>>>
>>>> On Fri, Feb 6, 2015 at 5:42 PM, Michael Spencer <bigcheesegs at gmail.com>
>>>> wrote:
>>>>>
>>>>> On Fri, Feb 6, 2015 at 5:31 PM, Rui Ueyama <ruiu at google.com> wrote:
>>>>> > There are two questions.
>>>>> >
>>>>> > Firstly, do you think the on-disk format needs to be compatible with a
>>>>> > C++ struct so that we can cast that memory buffer to the struct? That
>>>>> > may be super-fast, but it also comes with many limitations. It's hard
>>>>> > to extend, for example: every time we want to store variable-length
>>>>> > objects we need to define a string-table-like data structure. And I'm
>>>>> > not so sure that it's the fastest approach; because mmap'able objects
>>>>> > are not very compact on disk, slow disk IO could be a bottleneck
>>>>> > compared with a more compact file format. I believe Protocol Buffers
>>>>> > or Thrift are fast enough or might even be faster.
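
As a rough illustration of the trade-off being described here (a sketch
with hypothetical names, not lld's actual native format): a "cast the
mmap'ed buffer to a struct" layout keeps every record fixed-size and
pushes all variable-length data, such as names, into a string table
referenced by offset.

    // Hypothetical mmap-friendly layout; not lld's real native format.
    #include <cstdint>

    struct NativeFileHeader {
      uint32_t magic;
      uint32_t atomCount;
      uint64_t atomTableOffset;   // array of NativeAtomRecord
      uint64_t stringTableOffset; // NUL-terminated names
    };

    struct NativeAtomRecord {
      uint32_t nameOffset;    // offset into the string table
      uint32_t contentSize;
      uint64_t contentOffset; // offset of the atom's bytes
    };

    // Reading is just pointer arithmetic over the mapped buffer.
    inline const char *atomName(const uint8_t *base,
                                const NativeFileHeader &hdr,
                                const NativeAtomRecord &rec) {
      return reinterpret_cast<const char *>(base) + hdr.stringTableOffset +
             rec.nameOffset;
    }

The cost pointed out above is visible in the sketch: every new
variable-length field needs its own table-plus-offset scheme, whereas a
Protocol Buffers/Thrift-style encoding handles that automatically at
the price of a decode step.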
>>>>>
>>>>> I'm not sure here, although I do question whether the object files will
>>>>> even need to be read from disk in your standard edit/compile/debug
>>>>> loop or on a build server. I believe we'll need real data to determine
>>>>> this.
>>>>>
>>>>> >
>>>>> > Secondly, do you know why we are dumping the post-linked object file
>>>>> > to Native format? If we want to have a different kind of *object* file
>>>>> > format, we would want to have a tool to convert an object file in an
>>>>> > existing file format (say, ELF) to "native", and teach LLD how to read
>>>>> > from that file. Currently we are writing a file in the middle of the
>>>>> > linking process, which doesn't make sense to me.
>>>>>
>>>>> This is an artifact of having the native format before we had any
>>>>> readers. I agree that it's weird and not terribly useful to write to
>>>>> the native format in the middle of the link, although I have found it
>>>>> helpful to output YAML. There's no need to be able to read it back in
>>>>> and resume though.
>>>>
>>>>
>>>> Even for YAML, it doesn't make much sense to write it to a file and read
>>>> it back in the middle of the link, does it? I found being able to output
>>>> YAML useful too, but round-tripping is a different thing. In the middle of
>>>> the process we have a bunch of additional information that doesn't exist
>>>> in the input files and doesn't have to be output to the link result. The
>>>> ability to serialize that intermediate result is not useful.
>>>>
>>>> Shankar, you added these round-trip tests. Do you have any opinion?
>>>>
>>>>> Ideally lld -r would be the tool we use to convert COFF/ELF/MachO to
>>>>> the native format.

-- 
Simon Atanasyan


