[llvm-dev] Zero length function pointer equality

Richard Smith via llvm-dev llvm-dev at lists.llvm.org
Fri Jul 24 18:39:22 PDT 2020


On Fri, 24 Jul 2020 at 02:42, David Chisnall via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> On 24/07/2020 01:46, David Blaikie via llvm-dev wrote:
> > I believe C++ requires that all functions have a distinct address (ie:
> > &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> > gets optimized into an unconditional assertion failure)
> >
> > But these zero length functions can end up with identical addresses.
> >
> > I'm unaware of anything in the C++ spec (or the LLVM langref) that
> > would indicate that would allow distinct functions to have identical
> > addresses - so should we do something about this in the LLVM backend?
> > add a little padding? a nop instruction? (if we're adding an
> > instruction anyway, perhaps we might as well make it an int3?)
>
> This is also a problem with identical function merging in the linker,
> which link.exe does quite aggressively.  The special case of zero-length
> functions seems less common than the more general case of merging, but in
> both cases you end up with a single implementation in the binary that
> has two symbols at the same address.  For example, consider the
> following trivial program:
>
> #include <stdio.h>
>
> int a()
> {
>     return 42;
> }
>
> int b()
> {
>     return 42;
> }
>
> int main()
> {
>     printf("a == b? %d\n", a == b);
>     return 0;
> }
>
> Compiled with cl.exe /Gy, this prints:
>
> a == b? 1
>
> Given that functions are immutable, it's a somewhat odd decision at the
> abstract machine level to assume that they have identity that is
> distinct from their value (though it can simplify debugging - back
> traces in Windows executables are sometimes quite confusing when you see
> a call into a function that is structurally correct but nominally
> incorrect).
>
> Given that link.exe can happily violate this guarantee in the general
> case, I'm not too concerned that LLVM can violate it in the special
> case.  From the perspective of a programmer, I'm not sure what kind of
> logic would be broken by function equality returning true when two
> functions with different names but identical behaviour are invoked.  I'm
> curious if you have any examples.
>

This is a well-known conformance-violating bug in link.exe; LLVM should not
make things worse by introducing a similar bug itself. Smarter linkers
(for example, I think both lld and gold) perform identical function
combining only when all but one of the function symbols are used solely as
call targets, so that the merged addresses are never actually observed. And
yes, this non-conforming behavior does (rarely) break things in practice.
See this research paper:
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36912.pdf
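
As a hypothetical illustration of the kind of logic that can break (this
sketch is not taken from the paper, and the names on_open, on_close,
registry and handler_name are invented for the example): code that uses a
function's address as its identity, such as a table that maps handlers back
to names, relies on distinct functions comparing unequal.

```c
#include <stddef.h>

typedef int (*handler_fn)(void);

/* Two distinct functions that happen to have identical bodies,
   mirroring a() and b() in the quoted program. */
static int on_open(void)  { return 42; }
static int on_close(void) { return 42; }

/* Hypothetical registry keyed by function address. */
struct entry {
    handler_fn fn;
    const char *name;
};

static const struct entry registry[] = {
    { on_open,  "open"  },
    { on_close, "close" },
};

/* Look up a handler's name by address. The first match wins, so if the
   two functions were merged to one address, on_close would be reported
   as "open". Returns NULL if the address is not registered. */
static const char *handler_name(handler_fn fn)
{
    for (size_t i = 0; i < sizeof registry / sizeof registry[0]; ++i) {
        if (registry[i].fn == fn)
            return registry[i].name;
    }
    return NULL;
}
```

With a conforming toolchain, handler_name(on_close) yields "close". If the
linker folds on_open and on_close into a single address, the same lookup
would silently yield "open" instead.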


> David