[llvm-dev] Zero length function pointer equality

Fri Jul 24 18:36:58 PDT 2020

On Fri, Jul 24, 2020 at 2:42 AM David Chisnall via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> On 24/07/2020 01:46, David Blaikie via llvm-dev wrote:
> > I believe C++ requires that all functions have a distinct address (ie:
> > &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)
> > gets optimized into an unconditional assertion failure)
> >
> > But these zero length functions can end up with identical addresses.
> >
> > I'm unaware of anything in the C++ spec (or the LLVM langref) that
> > would allow distinct functions to have identical addresses - so
> > should we do something about this in the LLVM backend?
> > add a little padding? a nop instruction? (if we're adding an
> > instruction anyway, perhaps we might as well make it an int3?)
>
> This is also a problem with identical function merging in the linker,
> which link.exe does quite aggressively.

Yeah, though that's a choice by the Windows linker to be non-conforming
(and it can be disabled), with respect to both the LLVM IR semantics
and the C++ semantics - which doesn't necessarily mean Clang and LLVM
should also be non-conforming.

> The special case of zero-length
> functions seems less common than the more general case of merging,

On Windows, to be sure - on Linux, for instance, not as much.

> in
> both cases you will end up with a single implementation in the binary
> that has two symbols for the same address.  For example, consider the
> following trivial program:
>
> #include <stdio.h>
>
> int a()
> {
>          return 42;
> }
>
> int b()
> {
>          return 42;
> }
>
> int main()
> {
>          printf("a == b? %d\n", a == b);
>          return 0;
> }
>
> Compiled with cl.exe /Gy, this prints:
>
> a == b? 1
>
> Given that functions are immutable, it's a somewhat odd decision at the
> abstract machine level to assume that they have identity that is
> distinct from their value (though it can simplify debugging - back
> traces in Windows executables are sometimes quite confusing when you see
> a call into a function that is structurally correct but nominally
> incorrect).

Yep, when I used to work on Windows, my teammates and I disabled the
linker feature to make development and debugging easier and backtraces
easier to read.

I think there's value in LLVM's decision here - for debuggability, and
for correctly implementing C++ semantics. I don't think it'd be great if
we went the other direction (defining LLVM IR so that function names
carry no identity - so that merging two LLVM modules could merge
function implementations and redirect function calls to the single
remaining instance). Opt-in, maybe (I guess you could opt in by marking
all functions unnamed_addr - indeed that's why unnamed_addr was
introduced, I think, to allow identical code folding to be implemented
in a way that was correct for C++).

> Given that link.exe can happily violate this guarantee in the general
> case, I'm not too concerned that LLVM can violate it in the special
> case.  From the perspective of a programmer, I'm not sure what kind of
> logic would be broken by function equality returning true when two
> functions with different names but identical behaviour are invoked.  I'm
> curious if you have any examples.

I don't have any concrete examples of C++ code that depends on pointer
inequality between zero-length functions, no. (though we do lots of
work to make Clang conforming in other ways even without code that
requires such conformance)
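
For reference, here is a minimal sketch of the zero-length case itself,
as opposed to the linker-merging case in the a()/b() example. It is only
an illustration under some assumptions: the names empty1, empty2,
compare_visible and compare_at_runtime are made up for this sketch, and
__builtin_unreachable() is a GCC/Clang extension. Whether the two
functions really end up with no instructions and the same address
depends on the target, the optimization level and the compiler version.

```c
/* Hypothetical names, for illustration only. Neither function may ever
 * be called: each body is just __builtin_unreachable() (a GCC/Clang
 * extension), so an optimizing build is free to emit no instructions
 * for it at all. */
void empty1(void) { __builtin_unreachable(); }
void empty2(void) { __builtin_unreachable(); }

/* A comparison the compiler can see through. Distinct functions are
 * assumed to have distinct addresses, so this is typically folded to 0
 * at compile time. */
int compare_visible(void)
{
    return empty1 == empty2;
}

/* The same comparison through volatile pointers, so that the addresses
 * are loaded and compared at run time instead of being folded. */
int compare_at_runtime(void)
{
    void (*volatile p)(void) = empty1;
    void (*volatile q)(void) = empty2;
    void (*a)(void) = p; /* read each volatile pointer exactly once */
    void (*b)(void) = q;
    return a == b;
}
```

Printing both results from main, in the style of the a()/b() example
above, is where an optimized build may disagree with itself: 0 from
compare_visible() and, if the two functions were emitted with zero
length at the same address, 1 from compare_at_runtime().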