[PATCH] D40849: CodeGen: support an extension to pass linker options on ELF

Fri Dec 8 11:48:41 PST 2017

compnerd added a comment.

In https://reviews.llvm.org/D40849#947912, @jhenderson wrote:

> In https://reviews.llvm.org/D40849#946880, @compnerd wrote:
>
> > This could be embedded within the note structure.  Because we want this section to be discarded by default, having a `PROGBITS` section seems like a bad idea to me, but if someone else has a better way to accomplish this, I'm not tied to this.
>
>
> Okay, that makes sense. We have control over the linker scripts our proprietary linker uses, so we use /DISCARD/ to achieve this with our own version of the structure. Since there's no guarantee that a linker will recognise this new section, I'm happy for it to stay as SHT_NOTE, but it must fit the SHT_NOTE format, or tools won't be able to interpret it properly.
>
> Should we explicitly reserve a range of version numbers (e.g. 0xF0000000-0xFFFFFFFF) to allow for vendor-specific versions? That way there's no risk of people using a number in their local LLVM ports which will clash with something done upstream. Also, it might be wise to explicitly reserve version 0 and/or some other range, to allow us to do crazier things in the future that we can't anticipate just yet.

Yes, I like the reservation for the vendor range.  Yes, how about reserving the range `0x80000000`-`0x8fffffff` for crazy things in the future?  We could start with version `1`, I'm not tied to `0`.

>> The version field is for future enhancements.  I suppose we can make it a ULEB, but, does a total of 8-bytes matter that much?  (even in a ULEB encoding, that only saves us 6-bytes, in the object file, not the final binary, which should not preserve this field).
> 
> If we are staying with SHT_NOTE, I think a 4/8 byte field is okay, since it fits better with the alignment requirements anyway (pedantic - it would save 7 bytes for a low version, not 6).

Counting is hard!  Yeah, I think the 4-byte field makes it easier and keeps things aligned.

>> No, sorry, the reserved field is for future enhancements; you are right, that I forgot to null terminate the array.
> 
> If we are using a version field, we don't need additional reserved fields, since if we need them in the future, we can just bump the version number.

Good point.  Lets just remove that and go with the null string terminator.

>> I would be interested in the format of the structure in your linker; it is possible that you have a better format specified or that this version could be improved before we implement it.
> 
> Our format is essentially this struct:
> 
>   struct linker_cmd {
>     uint16_t type;
>     uint16_t size;
>     uint8_t *data;
>   };
> 
> 
> The data field's interpretation depends on the type field, and is often in the form of a sub-structure, containing information related to that particular linker command (e.g. the name of the library to be added). We don't need a version field, because the type field has the same sort of effect - if we wanted to change the format for a specific command, we would just change the type field. Looking at the NOTE format, there is already a type field, which achieves the same purpose.
> 
> Having a size field allows the linker to skip over data for types of command it doesn't recognise - this suggests to me that it might be wise to add a size field to the proposed structure as well. Indeed using the NOTE format would give us this for free again, via the descsz field. It would allow linkers to ignore specific versions of commands that it doesn't support, but still allow it to continue reading the rest of the section, in the event that there are multiple commands concatenated together. A size of zero means no data, and the command effectively becomes a flag (a bit like the idea behind .note.GNU-stack).
> 
> In summary from that, our structure has properties that are similar to SHT_NOTE's format, but without the alignment requirements or name field. I think for your proposal, it allows you to drop the Version field, in favour of the type field that comes with the section. Type 1 could simply be null-terminated strings that are added to the command-line (maybe with a flag field - see below).

Yeah, I guess we can identify the "type" using the name `.note.linker-options` and use the type field as the version field.

>> Right now, I expect that most ELF linkers would process the command line arguments after all the objects have been loaded.  In such a scenario, the most likely place for the addition will be the end of the command line, which is fine.
> 
> The linkers that I know of process files one at a time, so I think either adding the switch after that object has been loaded, or at the end are both viable possibilities. Restricting it to adding it to the end loses flexibility in what can be done here. Some options toggle behaviour for individual or groups of files, so it doesn't make sense for them to be added late. Also, in the library use case we currently have, as noted above, this would result in different behaviour to what we currently require, because it could result in the libraries being added in a different order. I think we could/should do something about this. One idea I had was to add a "flags" field, or something that allows to toggle the behaviour between one of three modes:
> 
> 1. Add immediately before the object containing the switch (note, I'm not sure linkers will necessarily be able to support this, because switches could change how the object is loaded, and thus the new section might otherwise not be read in at all deliberately)
> 2. Add immediately after the object in question, so that all future objects are affected by it.
> 3. Add at the end of the link line. What happens here for options that should affect the reading in of object files?
> 4. At the start of the link line, so that it affects all objects. I'm not convinced that this one is feasible but I'm adding it for completeness - for example what happens if an archive member includes this kind of directive?

I don't think that we need to restrict the processing, but prescribing the means is a good idea.  We also want to ensure that we do this well in lld.  Adding the flags enables that, so, perfectly good with that.

>> For `-r` links, I would expect that the sections would be preserved across all the object files and re-emitted into the relocatable linked object.  The final consumption of the object will process the concatenated options and then discard the note.
> 
> I suspect that the behaviour would need to be switch-specific. For some switches, it will be fine (indeed, required) to wait until the final link to process them. However, for others, it may affect how the intermediate -r link is supposed to work. In extreme cases, the switch could have an affect on both intermediate and final links (and of course, there's nothing stopping there being multiple intermediate links along the way). Perhaps another indication that a flag field might be necessary?

Hmm, yeah, having a `flags` field is reasonable, so lets go ahead and repurpose the version to flags and the notes type to version.

> One other question that a colleague mentioned is what is the encoding of the command strings? UTF-8?

UTF-8 if we must, I would prefer that we be more restrictive and keep ASCII as I don't think that any of the tools (clang/lld) really treat options as UTF-8 per se, but rather as ASCII.

Repository:
  rL LLVM

https://reviews.llvm.org/D40849