[llvm-dev] [RFC] Adding support for dynamic entries in yaml2obj

Thu Jan 17 01:55:32 PST 2019

Thanks for bringing this up. Since you posted on the review, I've been
thinking more about the options and overall design of yaml2obj's dynamic
sections (including .dynstr and .dynsym) and how they work from a user's
perspective. This was partly motivated by https://reviews.llvm.org/D56791,
where I had to hand-craft a large .dynamic section and ended up fighting
with .dynsym and .dynstr too (see also
https://bugs.llvm.org/show_bug.cgi?id=40339, for example).

As for the proposal, it sounds reasonable to be able to specify numeric and
string arguments as you propose. However, I do have some
questions/thoughts/points:

1) What should happen if explicit .dynstr content is specified (or, more
specifically, explicit content for the linked section, see point 2)? My
suggestion is that it should be an error to specify string values in
.dynamic and explicit content in .dynstr at the same time.
2) I'd like to avoid the same issue that is present in
https://bugs.llvm.org/show_bug.cgi?id=40337, namely that the .dynamic
section auto-populates strings in .dynstr, even though it is linked against
some other section. Note that this other section could also be a string
table, so using strings in that instance would still be valid.
3) It would be nice to have some mechanism to optionally auto-populate the
.dynamic section with DT_STRTAB, DT_STRSZ, DT_SYMTAB, etc.
4) It should be possible to omit the "Link:" field of the header, to get a
value of 0 there, or to specify a section index, in place of a section name.
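
As a rough illustration of points 1) and 2), the check could look something
like the sketch below. It is Python rather than yaml2obj's C++, and the
dict-based section model, STRING_TAGS and check_dynamic are all hypothetical
names made up for this example. It also assumes that an omitted "Link:"
falls back to .dynstr, as in the original proposal, rather than to 0 as
suggested in point 4).

```python
# Hypothetical model: sections are plain dicts, not yaml2obj's real types.
STRING_TAGS = {"DT_SONAME", "DT_NEEDED", "DT_RPATH", "DT_RUNPATH"}


def check_dynamic(sections, dynamic):
    """Return the name of the section that receives .dynamic's strings.

    Raises ValueError if string values are combined with explicit
    content in the linked section (point 1).
    """
    # Follow the Link field instead of hard-coding .dynstr (point 2).
    # Assumption: an omitted Link means ".dynstr".
    link = dynamic.get("Link", ".dynstr")
    linked = next((s for s in sections if s["Name"] == link), None)

    uses_strings = any(
        e["Tag"] in STRING_TAGS and isinstance(e.get("Value"), str)
        for e in dynamic.get("Entries", [])
    )
    if uses_strings and linked is not None and "Content" in linked:
        raise ValueError(
            f"string values in .dynamic conflict with explicit content in {link}"
        )
    return link
```

With this shape, a .dynamic section linked to some other string table adds
its strings there, and .dynstr is left alone.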

Related to point 3), I foresee several different use-cases:

a) A user wants a regular .dynamic section containing DT_STRTAB etc., with
every value auto-populated. They don't want to specify any tags at all;
yaml2obj does the hard work of fetching the addresses and sizes.
b) As a), but the user wants to extend the .dynamic section with other
tags (e.g. DT_SONAME).
c) As a), but the user wants to override some (not necessarily all) of
the auto-generated tags.
d) A user wants to completely control the content with normal-looking
tags, i.e. no auto-generation at all (e.g. they want to create a dynamic
section without DT_STRTAB).
e) A user wants to hand-craft the content completely (e.g. to create
truncated tags). This is probably best done via explicit content.

I think all except e) could be accomplished by having an extra attribute
for dynamic sections, something like "DeriveTags: true/false". A value of
true would mean that all mandatory tags are automatically generated, unless
an explicit tag with the same DT_* value is specified. To have multiple
tags with the same normally auto-generated DT_* value (or none of them),
this would need to be false.
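
To make the "DeriveTags" semantics a bit more concrete, here is a minimal
Python sketch of how explicit and auto-generated tags might be combined.
The function name build_entries, the (tag, value) tuple representation and
the choice to emit derived tags before explicit ones are assumptions of
mine, and the DT_NULL terminator is ignored.

```python
def build_entries(explicit, derived, derive_tags=True):
    """Combine user-written entries with auto-generated ones.

    explicit: (tag, value) pairs written in the YAML, in order.
    derived:  (tag, value) pairs the tool could compute on its own.
    """
    if not derive_tags:
        # Cases d): the user controls every tag, duplicates included.
        return list(explicit)
    explicit_tags = {tag for tag, _ in explicit}
    # A derived tag is dropped when the user wrote the same DT_* tag (case c).
    kept = [(tag, value) for tag, value in derived if tag not in explicit_tags]
    # Cases a) and b): derived tags first, then anything the user added.
    return kept + list(explicit)
```

For example, with derived tags DT_STRTAB, DT_STRSZ and DT_SYMTAB, an
explicit DT_STRSZ replaces only the derived DT_STRSZ, while an explicit
DT_SONAME is simply appended.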

I like your idea of a missing Value being inferred. This would work well
with the ability to turn off auto-generated tags, and would minimise the
pain of having to write each normally-defaulted tag by hand in this case.

Note: I have no idea how well my above proposal works in relation to the
current design of yaml2obj. My personal ideal is that a user can use
yaml2obj to create basically any ELF object they want, primarily for
testing, so being able to test corner cases (e.g. missing tags, malformed
sections etc) is a significant requirement.

James

On Wed, 16 Jan 2019 at 19:13, Armando Montanez <amontanez at google.com> wrote:

> The goal of this proposal is to introduce a new type of YAML section for
> yaml2obj that allows the population of ELF .dynamic entries via a list of
> tag and value pairs. These entries are interpreted (and potentially
> validated) before being written to the .dynamic section. The simplest way
> to satisfy this requirement is for all dynamic entry values to be numeric
> values. Unfortunately, this inherently prevents entries like DT_SONAME,
> DT_NEEDED, DT_RPATH, and DT_RUNPATH from being specified alongside dynamic
> symbols due to the design of yaml2obj.
>
>
> This proposal introduces three ways to input a value for a dynamic entry.
> For a given dynamic tag, one or more of these methods of setting a value
> may be permitted. All of these cases are illustrated later with an example.
>
>
> 1. For dynamic entry strings that belong in .dynstr, the string itself can
> be used as the value for an entry. (ex. DT_SONAME, DT_NEEDED, DT_RPATH, and
> DT_RUNPATH)
>
>
> 2. A section name can be used in place of an address. In this case, the
> value of the dynamic entry is the sh_addr of the specified section. (ex.
> DT_STRTAB, DT_SYMTAB, DT_HASH, DT_RELA, and others)
>
>
> 3. A value can be specified using hexadecimal or decimal (or other bases
> supported by `StringRef::to_integer()`). (ex. DT_STRSZ, DT_SYMENT,
> DT_RELAENT, and others)
>
>
> Here's an example to illustrate this design:
>
>
> !ELF
> FileHeader:
>  Class:           ELFCLASS64
>  Data:            ELFDATA2LSB
>  Type:            ET_DYN
>  Machine:         EM_X86_64
> Sections:
>  - Name: .dynsym
>    Type: SHT_DYNSYM
>    Address: 0x1000
>  - Name: .data
>    Type: SHT_PROGBITS
>    Flags: [ SHF_ALLOC, SHF_WRITE ]
>  - Name: .dynamic
>    Type: SHT_DYNAMIC
>    Entries:
>      - Tag: DT_SONAME
>        Value: libsomething.so
>      - Tag: DT_SYMTAB
>        Value: .dynsym
>      - Tag: DT_SYMENT
>        Value: 0x18
> DynamicSymbols:
>  Global:
>    - Name: foo
>      Type: STT_FUNC
>      Section: .data
>    - Name: bar
>      Type: STT_OBJECT
>      Section: .data
>
> The final section is of type SHT_DYNAMIC, and the "Entries" key
> illustrates the proposed addition. Walking through the three dynamic
> entries,
>
>
> 1. DT_SONAME: The value of this entry is a string that will be inserted
> into the dynamic string table (.dynstr) alongside the symbol names
> specified in DynamicSymbols. This is possible because .dynstr is
> represented as a StringTableBuilder and .dynamic is linked to .dynstr by
> default. If the .dynamic section had been linked to a section other than
> .dynstr, the value of this entry would have to be a number (the offset of
> the string in the linked string table) rather than a string.
>
>
> 2. DT_SYMTAB: This tag may either be a numeric address or a valid section
> name, and this example illustrates the option of using the name of a
> section rather than the address. This resolves to 0x1000 since .dynsym is
> declared with an address of 0x1000. It would have been equally valid to
> make this entry have a value of 0x1000, but doing so would mean that
> changes to .dynsym's address would need to be manually updated in the
> dynamic entry. It's also worth noting that in the case of DT_SYMTAB it
> wouldn't be too difficult to infer this.
>
>
> 3. DT_SYMENT: This tag is restricted to only having numeric values. This
> entry could easily be inferred as well.
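
A rough Python sketch of how the three entries in the example might resolve
to d_val numbers. build_dynstr is a hypothetical, deliberately naive string
table layout; LLVM's StringTableBuilder may order and merge strings
differently, so the real DT_SONAME offset could differ from the one computed
here.

```python
def build_dynstr(names):
    """Lay out a simple string table: a leading NUL, then each name
    NUL-terminated, in the order given. Returns (bytes, name -> offset)."""
    table = b"\0"
    offsets = {}
    for name in names:
        if name not in offsets:
            offsets[name] = len(table)
            table += name.encode() + b"\0"
    return table, offsets


# Section addresses taken from the YAML above ("Address: 0x1000").
section_addrs = {".dynsym": 0x1000}

# Symbol names from DynamicSymbols share the table with the DT_SONAME string.
dynstr, offsets = build_dynstr(["foo", "bar", "libsomething.so"])

d_val = {
    "DT_SONAME": offsets["libsomething.so"],  # string -> offset in .dynstr
    "DT_SYMTAB": section_addrs[".dynsym"],    # section name -> sh_addr
    "DT_SYMENT": 0x18,                        # plain number, used as-is
}
```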
>
>
> Note that it doesn’t make sense for DT_SYMENT to be any sort of string, so
> it is restricted to only being populated with a numeric value. Similarly,
> it doesn’t make sense for the value of DT_SONAME to ever be interpreted as
> the name of a section. Though at least one input method is required for a
> given dynamic tag, it’s typically the case that not all three are valid. It
> should also be possible to specialize upon certain tags for convenience.
> For example, DT_PLTREL could be specialized to allow “REL” and “RELA” to be
> used as values rather than requiring the values be entered in hexadecimal.
> Evaluating the needs for every dynamic tag isn’t within the scope of this
> proposal, so any tag without a specialization defaults to permitting
> numeric values or the name of a valid section (that is later converted to
> an address).
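
The per-tag rules described above could be sketched as a small dispatcher.
This is an illustrative Python model, not the patch's C++: resolve_value,
the tag sets and the add_string callback are hypothetical names, and the
choice of which tags are number-only is an assumption. DT_REL = 17 and
DT_RELA = 7 are the values from the ELF gABI.

```python
DT_REL, DT_RELA = 17, 7  # tag numbers from the ELF gABI

STRING_TAGS = {"DT_SONAME", "DT_NEEDED", "DT_RPATH", "DT_RUNPATH"}
NUMBER_ONLY_TAGS = {"DT_SYMENT", "DT_STRSZ", "DT_RELAENT"}  # assumed set
SPECIALIZED = {"DT_PLTREL": {"REL": DT_REL, "RELA": DT_RELA}}


def parse_number(text):
    """Parse decimal or prefixed (0x, 0o, 0b) integers; None if not a number."""
    try:
        return int(text, 0)
    except ValueError:
        return None


def resolve_value(tag, text, section_addrs, add_string):
    """Turn the YAML text of a Value into the number stored in the entry."""
    if text in SPECIALIZED.get(tag, {}):
        return SPECIALIZED[tag][text]          # e.g. DT_PLTREL: RELA
    number = parse_number(text)
    if number is not None:
        return number                          # numbers are valid for any tag
    if tag in STRING_TAGS:
        return add_string(text)                # offset in the string table
    if tag not in NUMBER_ONLY_TAGS and text in section_addrs:
        return section_addrs[text]             # section name -> address
    raise ValueError(f"invalid value '{text}' for {tag}")
```

Here add_string stands in for whatever inserts a string into the linked
string table and returns its offset.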
>
>
> Some dynamic tags have strict enough constraints that they can be
> inferred. This limited set of dynamic tags could treat “Value” as an
> optional field, since the value can be inferred from other parts of an ELF
> file. This isn’t a requirement for me, though it's something I'd certainly
> like to have.
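
As a sketch of what inferring an omitted "Value" might look like, the Python
below fills in a few tags from other parts of the file. infer_value,
fill_missing and the dict layout are hypothetical; the only hard fact used
is the Elf64_Sym layout, whose size is 24 bytes (0x18, matching the
DT_SYMENT value in the example).

```python
import struct

# Elf64_Sym: st_name (u32), st_info (u8), st_other (u8), st_shndx (u16),
# st_value (u64), st_size (u64), little-endian with no padding.
ELF64_SYM = struct.Struct("<IBBHQQ")


def infer_value(tag, sections):
    """Compute a value for tags that are fully determined by the file."""
    if tag == "DT_SYMENT":
        return ELF64_SYM.size
    if tag == "DT_SYMTAB":
        return sections[".dynsym"]["Address"]
    if tag == "DT_STRTAB":
        return sections[".dynstr"]["Address"]
    if tag == "DT_STRSZ":
        return len(sections[".dynstr"]["Data"])
    raise ValueError(f"{tag} cannot be inferred; a Value is required")


def fill_missing(entries, sections):
    """Replace a missing Value (None) with the inferred one."""
    return [
        (tag, infer_value(tag, sections) if value is None else value)
        for tag, value in entries
    ]
```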
>
>
> I began working on a patch here, and it will later be updated to reflect
> the RFC:
>
> https://reviews.llvm.org/D56569
>
>
> Best,
>
> Armando
>
>
>