[PATCH] D66426: [lld] Enable a watermark of loadable sections to be generated and placed in a note section

Thu Aug 22 08:53:48 PDT 2019

edd added a comment.

Hi all,

Chris is out of the office until the 28th. I'm sure he'll do what he can to address the technical concerns and gaps raised so far upon his return, but I'll try to field some of the questions regarding intent until then.

In D66426#1637540 <https://reviews.llvm.org/D66426#1637540>, @MaskRay wrote:

> I think I want to hear a bit more about the motivation and how you'd use this feature.

We are primarily concerned about the situation where a studio produces a game for PlayStation that depends on an incidental detail of the OS. The scenario we must avoid whenever possible is an update to the OS interfering with the intended operation of a game. Fixing that is a costly process. We put a lot of effort into making sure our tools do what they can to avoid this situation.

So, given an ELF with a watermark and the ability to recalculate the watermark (perhaps with llvm-readelf), we can detect when additional transformations were applied post-link. Such transformations may break some of the invariants that we have been careful to establish and maintain in our supported tooling and workflows. There are many ways in which our users could accidentally introduce fragility, so the watermark is neither watertight nor all-encompassing. Detecting this one situation is nevertheless useful to us: we may decide to explore which transformations were applied, either to seek (mutual) reassurance or to identify a gap in our SDK offering.

In D66426#1636845 <https://reviews.llvm.org/D66426#1636845>, @ruiu wrote:

> IIRC lld's --build-id={md5,sha1} was slow at first but after we made them a tree-hash to utilize multi-cores, its cost became negligible. Have you considered taking that approach? This watermark hash is probably not a thing that people attack by the collision attack, so it might not have to be a cryptographically-safe hash, but still cryptographically-safe hash function has nice properties compared to non-crypto ones.

We don't really need any kind of cryptographic guarantees, as the watermark is not intended to be part of a strict gating process. We probably would have considered crc32 had it already been available, but our experiments have shown that xxHash adds negligible overhead. md5 or sha1 may also be fine, but we had no need to explore them given the existence of xxHash.
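
For anyone following along, the tree-hash idea in the quoted comment can be sketched roughly as below. This is only an illustration in Python, not lld's code: the function name `tree_hash` and the 1 MiB default chunk size are assumptions made for the example. Each chunk is hashed independently (so the work can be spread over threads), and the final value is a hash of the concatenated per-chunk digests.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def tree_hash(data: bytes, chunk_size: int = 1 << 20) -> bytes:
    """Two-level SHA-1 tree hash (hypothetical helper, for illustration only)."""
    # Split the input into fixed-size chunks; keep one empty chunk for empty input.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    # Hash the chunks independently; pool.map preserves the chunk order.
    with ThreadPoolExecutor() as pool:
        leaves = list(pool.map(lambda c: hashlib.sha1(c).digest(), chunks))
    # The root is the hash of the concatenated leaf digests.
    return hashlib.sha1(b"".join(leaves)).digest()
```

Note that the result differs from a plain SHA-1 of the whole buffer, so any tool that recalculates the value has to use the same chunking scheme.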

What really matters is that only the content of PT_LOADs contributes to the watermark. This is because we would like to be able to recalculate the watermark post-link via a tool and get the same value back, even if the ELF has since been stripped of metadata (DWARF, .symtab, etc).
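
To make that property concrete, here is a minimal sketch of an external recalculation. It is not the code in the patch: it assumes a little-endian ELF64 image, the function names are made up for the example, and `zlib.crc32` stands in for xxHash only because it ships with Python. Only bytes that fall inside PT_LOAD file ranges are fed to the hash.

```python
import struct
import zlib

PT_LOAD = 1  # program header type for loadable segments


def load_segments(elf: bytes):
    """Yield (p_offset, p_filesz) for each PT_LOAD of a little-endian ELF64 image."""
    if elf[:6] != b"\x7fELF\x02\x01":
        raise ValueError("this sketch only handles little-endian ELF64")
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)
    for i in range(e_phnum):
        # Elf64_Phdr starts with: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz
        p_type, _flags, p_offset, _vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", elf, e_phoff + i * e_phentsize
        )
        if p_type == PT_LOAD:
            yield p_offset, p_filesz


def watermark(elf: bytes) -> int:
    """Checksum over PT_LOAD file contents only (CRC-32 as a stand-in for xxHash)."""
    crc = 0
    for offset, size in sorted(load_segments(elf)):
        crc = zlib.crc32(elf[offset:offset + size], crc)
    return crc
```

With this shape, bytes that lie outside every PT_LOAD (for example section contents that are not loaded) do not affect the result, whereas any change to loaded bytes does.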

In D66426#1637540 <https://reviews.llvm.org/D66426#1637540>, @MaskRay wrote:

> Another problem is that .note.gnu.build-id is SHF_ALLOC and included in a PT_LOAD segment. When computing watermark, you probably don't want to include its contents. So you should compute build-id and watermark first before you write.

We would like to have two PT_NOTEs, one for use by the OS and another for use by tooling. This is achieved by linker scripts. The second PT_NOTE is outside of any PT_LOAD and this is where the watermark would be housed. As described above, a required property of the watermark is that it can be recalculated by an external tool to infer whether or not the loadable parts of the ELF have been modified post-link. By having the watermark outside of any PT_LOAD, it is simpler for the external tool to recalculate. For a similar reason, it would actually be better in our case to have the watermark calculated after the build ID value has been "filled-in", as the build ID is inside a PT_LOAD.
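
As an illustration of the layout described above, a tool could check which PT_NOTE segments sit outside every PT_LOAD along these lines. Again, this is a sketch under assumptions (little-endian ELF64 only, hypothetical function names), not something taken from the patch.

```python
import struct

PT_LOAD, PT_NOTE = 1, 4  # program header types


def program_headers(elf: bytes):
    """Return [(p_type, p_offset, p_filesz)] for a little-endian ELF64 image."""
    if elf[:6] != b"\x7fELF\x02\x01":
        raise ValueError("this sketch only handles little-endian ELF64")
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)
    headers = []
    for i in range(e_phnum):
        p_type, _flags, p_offset, _vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", elf, e_phoff + i * e_phentsize
        )
        headers.append((p_type, p_offset, p_filesz))
    return headers


def notes_outside_loads(elf: bytes):
    """Return (offset, size) of each PT_NOTE whose file range overlaps no PT_LOAD."""
    headers = program_headers(elf)
    loads = [(off, off + size) for kind, off, size in headers if kind == PT_LOAD]
    outside = []
    for kind, off, size in headers:
        if kind != PT_NOTE:
            continue
        # Disjoint from a load if it ends before the load starts or starts after it ends.
        if all(off + size <= lo or off >= hi for lo, hi in loads):
            outside.append((off, size))
    return outside
```

A note found this way can hold a value computed over the PT_LOADs without that value feeding back into its own input, which is what keeps the external recalculation simple.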

(Adjusting the code to better accommodate other layouts is certainly something worth considering. I'm just explaining how we intend to make use of it).

In D66426#1639239 <https://reviews.llvm.org/D66426#1639239>, @peter.smith wrote:

> My first reaction is that this seems to be quite a bit of a platform specific feature to build into the linker, it could also make the platform dependent on LLD if this didn't also get into binutils or other ELF linkers.

This is true, but we have a very similar (although admittedly not identical) feature in our existing proprietary linker and we have a requirement that our customers use the linker supplied with our SDK.

In D66426#1639239 <https://reviews.llvm.org/D66426#1639239>, @peter.smith wrote:

> An alternative approach which I believe has been used in other platforms before (for example https://people.freebsd.org/~tmm/elfcksum.c) is to reserve some empty space in the binary that an external tool can post-process to write any checksum/hash etc that you want into it. This is not as convenient but would be compatible with other linkers and not require a LLVM specific extension to ELF.

The existence of the watermark is a requirement on PlayStation. Indeed, we would like to avoid mandating an easily-forgotten post-link step. More to the point, adding the watermark via a post-link step would mean post-link modifications could be made before the watermark is added, which rather defeats the point (that may not have been too clear before - sorry).

Thanks,
Edd

https://reviews.llvm.org/D66426