[PATCH] D112349: [Verifier] Add verification logic for GlobalIFuncs

Itay Bookstein via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Nov 3 14:45:59 PDT 2021


ibookstein added a comment.

I'll first explain my thought process about the representation of aliases and ifuncs in the IR, and why I think both aliasees and resolvers must always be defined; I hope I'm not completely off track and would love it if @MaskRay could weigh in as to whether I make sense.
Let's start at the level of the object file; My understanding is that aliases apply to ELF and MachO, and ifuncs apply only to ELF. I'm not at all acquainted with MachO, but on the ELF side, my understanding is that:

1. Aliases are simply lowered to additional symbols with the same st_value as their aliasee. As long as the value of a symbol has to be concrete/numeric and cannot express a way to refer to another symbol, for aliases to make sense and have the correct semantics at this level, their aliasee must be defined at the IR level. Otherwise all you're left with at the object file level is an undefined symbol with no way to express that it 'wants to' alias an external symbol with some specified name. In other words, symbols are either undefined (st_shndx == 0, st_value meaningless) or defined (st_shndx != 0, st_value meaningful and holds a section offset). If we were to allow aliases to have undefined aliasees, they would decay to simple undefined symbols and lose their aliasee information.
2. IFuncs are lowered to specially typed symbols whose st_value is the resolver. In much the same way as aliases, for this to actually have any meaning, the resolver must be defined (because you have no way to specify "the value is in another castle named 'XYZ'", only "defined at offset X" or "undefined"). When we allow ifuncs to have undefined resolvers, they decay to simple undefined symbols with the additional wart of having a special symbol type, but the desired resolver name is lost. Concretely, as long as the linker doesn't throw a fuss at said wart, for the references against that symbol from within the object file this will behave like a simple undefined external function. Because in your implementation one TU will have a cpu_dispatch and therefore a defined resolver, it will 'win' and intra-EXE/intra-DSO references against the ifunc will indeed be bound against the return value of the resolver. If no translation unit in the EXE/DSO had an ifunc with the same name and a defined resolver, you'd end up with a peculiar undefined symbol of type ifunc in the EXE/DSO (same as the .o).

It is my conclusion therefore that ifuncs with undefined resolvers behave exactly like function declarations (and lose the name of the resolver), as long as the linker is willing to accept such weird symbols.
Therefore, at the IR level, they're representational slack at best, and don't do what you want (possibly binding against a differently-named resolver) at worst, so they should not be allowed.

> My understanding is the frontend's semantic rules are/were different from the IR rules, which is why we implemented it that way.

As I understand it, features like aliases and ifuncs consist mostly of vertical plumbing to expose low-level object-file semantics, and their design must be informed by them.

> I'm not sure what you mean here?  Are you suggesting that an undefined resolver should instead just implement an undefined 'function' for the .ifunc?  This doesn't seem right?  Wouldn't that cause ODR issues?

As I understand it, making the symbol your current implementation calls "x.ifunc" a function **declaration** which gets upgraded to an ifunc with a defined resolver on encountering cpu_dispatch would yield the correct behavior.

> I guess I still don't understand what the practical limitation that requires ifuncs to have a defined resolver?  The resolver is just a normal function, so it seems to me that allowing them to have normal linking rules makes sense?  I personally think this is the least obtrusive change; this patch is adding a limitation that didn't exist previously unnecessarily.

I //think// I've addressed this in the wall of text above

> It just seems odd I guess to name a function .ifunc, and not have it be an ifunc?  What does our linker think about that?

Ah, the name is just a name :)
As far as the linker is concerned, it encounters an object file with an undefined symbol (of type STT_NOTYPE) and an object file with a defined symbol with the same name, of type STT_GNU_IFUNC. It will bind references in the former against the definition in the latter.
Here's my trying it out:

  itay at CTHULHU ~/tmp/ifuncdecl/tu> cat main.c
  int foo(void);
  int main() { return foo(); }
  
  itay at CTHULHU ~/tmp/ifuncdecl/tu> cat foo.c
  static int foo_impl(void) { return 42; }
  static void *foo_resolver(void) { return &foo_impl; }
  int foo(void) __attribute__((ifunc("foo_resolver")));
  
  itay at CTHULHU ~/tmp/ifuncdecl/tu> clang-14 -c main.c -o main.c.o
  itay at CTHULHU ~/tmp/ifuncdecl/tu> clang-14 -c foo.c -o foo.c.o
  itay at CTHULHU ~/tmp/ifuncdecl/tu> clang-14 main.c.o foo.c.o -o main
  itay at CTHULHU ~/tmp/ifuncdecl/tu> ./main
  itay at CTHULHU ~/tmp/ifuncdecl/tu [42]>
  itay at CTHULHU ~/tmp/ifuncdecl/tu> llvm-readobj-14 --elf-output-style=GNU main.c.o --symbols | grep foo
       4: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND foo
  itay at CTHULHU ~/tmp/ifuncdecl/tu> llvm-readobj-14 --elf-output-style=GNU foo.c.o --symbols | grep foo
       1: 0000000000000000     0 FILE    LOCAL  DEFAULT   ABS foo.c
       3: 0000000000000000    16 FUNC    LOCAL  DEFAULT     2 foo_resolver
       4: 0000000000000010    11 FUNC    LOCAL  DEFAULT     2 foo_impl
       5: 0000000000000000    16 IFUNC   GLOBAL DEFAULT     2 foo
  itay at CTHULHU ~/tmp/ifuncdecl/tu> llvm-readobj-14 --elf-output-style=GNU main --symbols | grep foo
      35: 0000000000000000     0 FILE    LOCAL  DEFAULT   ABS foo.c
      36: 0000000000401150    16 FUNC    LOCAL  DEFAULT    13 foo_resolver
      37: 0000000000401160    11 FUNC    LOCAL  DEFAULT    13 foo_impl
      56: 0000000000401150    16 IFUNC   GLOBAL DEFAULT    13 foo
  itay at CTHULHU ~/tmp/ifuncdecl/tu> llvm-readobj-14 --elf-output-style=GNU main --relocations
  
  Relocation section '.rela.dyn' at offset 0x3d0 contains 2 entries:
      Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
  0000000000403ff0  0000000100000006 R_X86_64_GLOB_DAT      0000000000000000 __libc_start_main at GLIBC_2.2.5 + 0
  0000000000403ff8  0000000200000006 R_X86_64_GLOB_DAT      0000000000000000 __gmon_start__ + 0
  
  Relocation section '.rela.plt' at offset 0x400 contains 1 entries:
      Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
  0000000000404018  0000000000000025 R_X86_64_IRELATIVE                401150



> Thats correct, these aren't 'aliases' or 'ifuncs' as far as the CFE is concerned; they are multiversioned functions.  That 'Aliases' and 'ifunc' list in the CFE are the AST-constructs of those, not the IR constructs, so there is no reason to put the multiversioned thinks in that list, since they are implementation details.  Emitting an error "invalid alias!"/etc for

I see, makes sense, thanks for the explanation.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112349/new/

https://reviews.llvm.org/D112349



More information about the cfe-commits mailing list