[PATCH] D112349: [Verifier] Add verification logic for GlobalIFuncs
Erich Keane via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 4 06:44:32 PDT 2021
erichkeane added a comment.
In D112349#3107460 <https://reviews.llvm.org/D112349#3107460>, @ibookstein wrote:
> I'll first explain my thought process about the representation of aliases and ifuncs in the IR, and why I think both aliasees and resolvers must always be defined; I hope I'm not completely off track and would love it if @MaskRay could weigh in as to whether I make sense.
> Let's start at the level of the object file; My understanding is that aliases apply to ELF and MachO, and ifuncs apply only to ELF. I'm not at all acquainted with MachO, but on the ELF side, my understanding is that:
>
> 1. Aliases are simply lowered to additional symbols with the same st_value as their aliasee. As long as the value of a symbol has to be concrete/numeric and cannot express a way to refer to another symbol, for aliases to make sense and have the correct semantics at this level, their aliasee must be defined at the IR level. Otherwise all you're left with at the object file level is an undefined symbol with no way to express that it 'wants to' alias an external symbol with some specified name. In other words, symbols are either undefined (st_shndx == 0, st_value meaningless) or defined (st_shndx != 0, st_value meaningful and holds a section offset). If we were to allow aliases to have undefined aliasees, they would decay to simple undefined symbols and lose their aliasee information.
> 2. IFuncs are lowered to specially typed symbols whose st_value is the resolver. In much the same way as aliases, for this to actually have any meaning, the resolver must be defined (because you have no way to specify "the value is in another castle named 'XYZ'", only "defined at offset X" or "undefined"). When we allow ifuncs to have undefined resolvers, they decay to simple undefined symbols with the additional wart of having a special symbol type, but the desired resolver name is lost. Concretely, as long as the linker doesn't throw a fuss at said wart, for the references against that symbol from within the object file this will behave like a simple undefined external function. Because in your implementation one TU will have a cpu_dispatch and therefore a defined resolver, it will 'win' and intra-EXE/intra-DSO references against the ifunc will indeed be bound against the return value of the resolver. If no translation unit in the EXE/DSO had an ifunc with the same name and a defined resolver, you'd end up with a peculiar undefined symbol of type ifunc in the EXE/DSO (same as the .o).
>
> It is my conclusion therefore that ifuncs with undefined resolvers behave exactly like function declarations (and lose the name of the resolver), as long as the linker is willing to accept such weird symbols.
> Therefore, at the IR level, they're representational slack at best, and don't do what you want (possibly binding against a differently-named resolver) at worst, so they should not be allowed.
>
>> My understanding is the frontend's semantic rules are/were different from the IR rules, which is why we implemented it that way.
>
> As I understand it, features like aliases and ifuncs consist mostly of vertical plumbing to expose low-level object-file semantics, and their design must be informed by them.
>
>> I'm not sure what you mean here? Are you suggesting that an undefined resolver should instead just implement an undefined 'function' for the .ifunc? This doesn't seem right? Wouldn't that cause ODR issues?
>
> As I understand it, making the symbol your current implementation calls "x.ifunc" a function **declaration** which gets upgraded to an ifunc with a defined resolver on encountering cpu_dispatch would yield the correct behavior.
>
>> I guess I still don't understand what the practical limitation that requires ifuncs to have a defined resolver? The resolver is just a normal function, so it seems to me that allowing them to have normal linking rules makes sense? I personally think this is the least obtrusive change; this patch is adding a limitation that didn't exist previously unnecessarily.
>
> I //think// I've addressed this in the wall of text above
>
>> It just seems odd I guess to name a function .ifunc, and not have it be an ifunc? What does our linker think about that?
>
> Ah, the name is just a name :)
> As far as the linker is concerned, it encounters an object file with an undefined symbol (of type STT_NOTYPE) and an object file with a defined symbol with the same name, of type STT_GNU_IFUNC. It will bind references in the former against the definition in the latter.
> Here's my trying it out:
>
> itay at CTHULHU ~/tmp/ifuncdecl/tu> cat main.c
> int foo(void);
> int main() { return foo(); }
>
> itay at CTHULHU ~/tmp/ifuncdecl/tu> cat foo.c
> static int foo_impl(void) { return 42; }
> static void *foo_resolver(void) { return &foo_impl; }
> int foo(void) __attribute__((ifunc("foo_resolver")));
>
> itay at CTHULHU ~/tmp/ifuncdecl/tu> clang-14 -c main.c -o main.c.o
> itay at CTHULHU ~/tmp/ifuncdecl/tu> clang-14 -c foo.c -o foo.c.o
> itay at CTHULHU ~/tmp/ifuncdecl/tu> clang-14 main.c.o foo.c.o -o main
> itay at CTHULHU ~/tmp/ifuncdecl/tu> ./main
> itay at CTHULHU ~/tmp/ifuncdecl/tu [42]>
> itay at CTHULHU ~/tmp/ifuncdecl/tu> llvm-readobj-14 --elf-output-style=GNU main.c.o --symbols | grep foo
> 4: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND foo
> itay at CTHULHU ~/tmp/ifuncdecl/tu> llvm-readobj-14 --elf-output-style=GNU foo.c.o --symbols | grep foo
> 1: 0000000000000000 0 FILE LOCAL DEFAULT ABS foo.c
> 3: 0000000000000000 16 FUNC LOCAL DEFAULT 2 foo_resolver
> 4: 0000000000000010 11 FUNC LOCAL DEFAULT 2 foo_impl
> 5: 0000000000000000 16 IFUNC GLOBAL DEFAULT 2 foo
> itay at CTHULHU ~/tmp/ifuncdecl/tu> llvm-readobj-14 --elf-output-style=GNU main --symbols | grep foo
> 35: 0000000000000000 0 FILE LOCAL DEFAULT ABS foo.c
> 36: 0000000000401150 16 FUNC LOCAL DEFAULT 13 foo_resolver
> 37: 0000000000401160 11 FUNC LOCAL DEFAULT 13 foo_impl
> 56: 0000000000401150 16 IFUNC GLOBAL DEFAULT 13 foo
> itay at CTHULHU ~/tmp/ifuncdecl/tu> llvm-readobj-14 --elf-output-style=GNU main --relocations
>
> Relocation section '.rela.dyn' at offset 0x3d0 contains 2 entries:
> Offset Info Type Symbol's Value Symbol's Name + Addend
> 0000000000403ff0 0000000100000006 R_X86_64_GLOB_DAT 0000000000000000 __libc_start_main at GLIBC_2.2.5 + 0
> 0000000000403ff8 0000000200000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0
>
> Relocation section '.rela.plt' at offset 0x400 contains 1 entries:
> Offset Info Type Symbol's Value Symbol's Name + Addend
> 0000000000404018 0000000000000025 R_X86_64_IRELATIVE 401150
>
>
>
>> Thats correct, these aren't 'aliases' or 'ifuncs' as far as the CFE is concerned; they are multiversioned functions. That 'Aliases' and 'ifunc' list in the CFE are the AST-constructs of those, not the IR constructs, so there is no reason to put the multiversioned thinks in that list, since they are implementation details. Emitting an error "invalid alias!"/etc for
>
> I see, makes sense, thanks for the explanation.
>From my perspective, an ifunc is just a linkable entity, as is a resolver. If the linker can merge symbols for the resolver (and an ifunc points to one), it seems to me to make a ton of sense to allow the resolver to be defined in another TU?
I guess I feel the same way with an alias, why can't I just alias to a function in a different TU, so long as this is a linkable entity?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D112349/new/
https://reviews.llvm.org/D112349
More information about the llvm-commits
mailing list