[llvm-dev] -fhash-long-section-names=N, -fhashed-section-names=map.csv

Christopher Friedt via llvm-dev llvm-dev at lists.llvm.org
Tue Nov 23 13:09:03 PST 2021


Hi list,

I'm fairly new to hacking on LLVM / Clang, and I would like to add a new
command line option, "-fhash-long-section-names=N". The change would
help to overcome the 16-character limit on section names in macOS [1],
which is currently a bit of a showstopper for a certain feature in one
specific project. The option itself does not necessarily need to be
tied to macOS, although ELF does not impose such a limitation on section
name size. The default would be to preserve existing behaviour: section
names are not hashed, and over-long names continue to produce errors [2].
The minimum value for N is chosen to be 16. The maximum value is arbitrary.
A value of 0 indicates "no hashing".

The hashing process will consist of:
* SHA256
* Base64
* Truncate to N

This is already a somewhat common approach to solving this problem on macOS.

The basic idea is this (N = 16):
// this is a short section, so no change
__attribute__((section("foo"))) => "foo"
// this "long" section has been hashed
__attribute__((section("ThisSectionNameIsTooLong"))) => "ip9RNVxH27rCS+Ix"
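
For reference, the transform above can be sketched in a few lines of
Python. This is only a sketch under assumptions: the function name
hash_section_name is made up for illustration, I am assuming the Base64
step encodes the raw SHA256 digest (not its hex form), and that names are
encoded as UTF-8 before hashing. I have not checked that it reproduces
the exact hashed string in the example above.

```python
import base64
import hashlib


def hash_section_name(name: str, n: int = 16) -> str:
    """Return `name` unchanged if it fits in n characters, else a hashed form.

    n == 0 means "no hashing", mirroring the proposed default.
    (hypothetical helper, not part of Clang)
    """
    if n == 0 or len(name) <= n:
        return name
    digest = hashlib.sha256(name.encode("utf-8")).digest()  # SHA256 (raw bytes)
    encoded = base64.b64encode(digest).decode("ascii")      # Base64, 44 chars
    return encoded[:n]                                      # truncate to N


# Short names pass through; long names become exactly n characters.
print(hash_section_name("foo"))
print(hash_section_name("ThisSectionNameIsTooLong"))
```

One consequence of this particular pipeline: the Base64 form of a SHA256
digest is 44 characters long, so in this sketch any N above 44 would
behave like N = 44.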

In the unlikely event of a section name collision, it would be good to
throw an error (a good test point). Also, since hashing is not
trivially reversible, I would like to add another option,
-fhashed-section-names=map.csv, which would write out the mapping from
original to hashed section names in a format that is easy for subsequent
tooling to read.
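
To make the collision check and the map file a bit more concrete, here
is a rough Python sketch. Everything in it is an assumption on my part:
the helper names, the two-column "original,hashed" CSV layout, and the
choice to list only names that were actually hashed. The real format is
still open.

```python
import base64
import csv
import hashlib
import io


def hash_section_name(name, n=16):
    """SHA256 -> Base64 -> truncate to n (hypothetical helper)."""
    if n == 0 or len(name) <= n:
        return name
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")[:n]


def build_section_map(names, n=16):
    """Map each original name to its emitted name; raise on a collision."""
    seen = {}     # emitted name -> original name that produced it
    mapping = {}  # original name -> emitted name
    for name in names:
        emitted = hash_section_name(name, n)
        other = seen.setdefault(emitted, name)
        if other != name:
            raise ValueError(
                f"section name collision: {name!r} and {other!r} "
                f"both map to {emitted!r}"
            )
        mapping[name] = emitted
    return mapping


def write_map_csv(mapping, out):
    """Write only the names that were changed, one row per section."""
    writer = csv.writer(out)
    writer.writerow(["original", "hashed"])  # assumed column layout
    for original, emitted in mapping.items():
        if original != emitted:
            writer.writerow([original, emitted])


buf = io.StringIO()
write_map_csv(build_section_map(["foo", "ThisSectionNameIsTooLong"]), buf)
print(buf.getvalue())
```

Note that the check also catches a short, unhashed name that happens to
equal the hashed form of a long one, which is probably the easiest
collision to construct in a test.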

For macOS, specifically, patterns like the following would also need
transformation:
section("__DATA,phoo")
extern struct foo foo_start[] __asm("section$start$__DATA$phoo");
extern struct foo foo_end[] __asm("section$end$__DATA$phoo");

This is roughly the macOS counterpart of the linker-generated
__start_<section> and __stop_<section> symbols in the ELF world.
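
As a sketch of what that transformation might look like, the Python
below hashes only the section part of a "segment,section" specifier and
applies the same hash to the matching section$start$ / section$end$
labels, so that the two stay in sync. The function names are
hypothetical, and I am assuming that the segment name is left untouched
and that a specifier without a comma is treated as a bare section name.

```python
import base64
import hashlib


def hash_section_name(name, n=16):
    """SHA256 -> Base64 -> truncate to n (hypothetical helper)."""
    if n == 0 or len(name) <= n:
        return name
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")[:n]


def rewrite_section_specifier(spec, n=16):
    """Hash the section field of "segment,section[,...]"; keep the rest."""
    parts = spec.split(",")
    if len(parts) == 1:
        # Assumption: no comma means the whole string is the section name.
        return hash_section_name(spec, n)
    parts[1] = hash_section_name(parts[1], n)
    return ",".join(parts)


def rewrite_boundary_symbol(label, n=16):
    """Rewrite section$start$SEG$SECT and section$end$SEG$SECT labels."""
    fields = label.split("$", 3)
    if (len(fields) == 4 and fields[0] == "section"
            and fields[1] in ("start", "end")):
        fields[3] = hash_section_name(fields[3], n)
        return "$".join(fields)
    return label  # not a section boundary label, leave it alone


print(rewrite_section_specifier("__DATA,ThisSectionNameIsTooLong"))
print(rewrite_boundary_symbol("section$start$__DATA$ThisSectionNameIsTooLong"))
```

Base64 output never contains "," or "$", so the hashed name cannot be
confused with a field separator in either form.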

The Clang frontend changes were fairly straightforward, and it was
quite simple to create the transform itself in Python and in LLVM.
However, I'm a little unsure of how to proceed from here. Likely there
will be some aspect of the AST and some aspect of Sema involved. I have
gone over the documentation and examples [3][4], and I'm still not
entirely sure.

I have done some brute-force experimenting with MCSectionMachO.cpp and
MCSymbolMachO.cpp, but I think that is definitely the wrong approach.

Finally, my questions:
1. First, is this a feature that upstream would accept?
2. Should I use the AST / Replacement approach mentioned in [5]?
3. Is there another, preferable form of "backend magic" that should be used?
4. Are there any existing tests that would be good examples to borrow from?

Would you be able to point me in the right direction?

Thanks, and hope you are well.

C

[1] See "section[16]" here:
https://opensource.apple.com/source/cctools/cctools-921/include/mach-o/loader.h.auto.html
[2]
error: argument to 'section' attribute is not valid for this target:
mach-o section specifier requires a section whose length is between 1
and 16 characters
[3] https://clang.llvm.org/hacking.html
[4] https://clang.llvm.org/docs/InternalsManual.html#adding-new-command-line-option
[5] https://youtu.be/VqCkCDFLSsc?t=2370

