[llvm-dev] Zero length function pointer equality

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Thu Jul 23 20:28:18 PDT 2020


On Thu, Jul 23, 2020 at 7:17 PM Richard Smith <richard at metafoo.co.uk> wrote:
>
> On Thu, 23 Jul 2020 at 17:46, David Blaikie <dblaikie at gmail.com> wrote:
>>
>> LLVM can produce zero-length functions from cases like this (when
>> optimizations are enabled):
>>
>> void f1() { __builtin_unreachable(); }
>> int f2() { /* missing return statement */ }
>>
>> This code is valid, so long as the functions are never called.
>>
>> I believe C++ requires that all functions have a distinct address (i.e.,
>> &f1 != &f2), and LLVM optimizes code on this basis (assert(f1 == f2)
>> gets optimized into an unconditional assertion failure).
>>
>> But these zero-length functions can end up with identical addresses.
>>
>> I'm unaware of anything in the C++ spec (or the LLVM LangRef) that
>> would allow distinct functions to have identical addresses, so should
>> we do something about this in the LLVM backend? Add a little padding?
>> A nop instruction? (If we're adding an instruction anyway, perhaps we
>> might as well make it an int3?)
>>
>> (I came across this while looking at DWARF issues with zero-length
>> functions and thinking about whether/how they should be supported.)
>
>
> Yes, I think at least if the optimizer turns a non-empty function into an empty function,

What about functions that are already empty? (Well, I guess at the
LLVM IR level no function can be empty, because every basic block
must end in a terminator instruction; is that the distinction
you're drawing?)

> that's a miscompile for C and C++ source-language programs. My (possibly flawed) understanding is that LLVM is obliged to give a different address to distinct globals if neither of them is marked unnamed_addr,

It seems like other LLVM passes make this assumption too, which is
how "f1 == f2" can be folded to a constant false. I haven't checked
exactly where that constant folding happens. (Hmm, it looks like it
happens in some constant-folding utility: in the inliner if there's
inlining, at IR generation if there's no function indirection, etc.)
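A minimal C++ sketch of the comparison being discussed, assuming a GCC/Clang-compatible compiler (for `__builtin_unreachable`). Unlike the original example, `f2` here is given the same signature as `f1`. This is an assumption made so that `&f1 != &f2` is well-formed without casts. The helper names `addressesDifferFolded` and `addressesDifferAtRuntime` are hypothetical. The first comparison is one an optimizer may fold to a constant. The second loads the pointers through `volatile` objects so it is evaluated at run time. Whether that one can come out false for zero-length bodies depends on the compiler version and flags, which is exactly the issue in this thread, so no result is claimed for it.

```cpp
#include <cassert>  // for the assert() in the usage check described below

// Hypothetical stand-ins for the thread's f1/f2; both share one signature
// so that comparing their addresses needs no casts.
void f1() { __builtin_unreachable(); }
void f2() { __builtin_unreachable(); }

// Distinct functions must have distinct addresses, so the compiler may
// fold this to the constant true without looking at the emitted code.
bool addressesDifferFolded() { return &f1 != &f2; }

// Reading the pointers back through volatile objects stops the compiler
// from folding the comparison, so the result depends on the addresses the
// two (possibly zero-length) bodies were actually given.
bool addressesDifferAtRuntime() {
  void (*volatile p1)() = &f1;
  void (*volatile p2)() = &f2;
  return p1 != p2;
}
```

For example, `assert(addressesDifferFolded())` should hold under any conforming compiler. Neither function is ever called, so the `__builtin_unreachable()` bodies are never executed.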

> so it seems to me that this is a backend bug. Generating a ud2 function body in this case seems ideal to me.

I guess that still leaves the possibility of the last function in an
object file being zero-length? (Or I guess not, because otherwise,
once linked, it could still end up with the same address as the
function that comes after it.)
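The layout argument above can be checked with a toy model. This is a hypothetical sketch, not LLVM or linker code: `Function`, `ObjectFile`, `padEmptyBodies`, `linkObjects`, and `allDistinct` are invented names, bodies are raw x86 bytes, and real section alignment is ignored. It shows that an empty function at the end of one object file lands on the same address as the first function of the next object once they are concatenated. It also shows that giving every empty body a trap instruction (ud2, as suggested above) removes the collision.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical: a function is a name plus the machine-code bytes emitted
// for it, which may be empty.
struct Function {
  std::string name;
  std::vector<std::uint8_t> body;
};

// Hypothetical: an object file's text section, laid out back to back.
using ObjectFile = std::vector<Function>;

// Hypothetical padding step: give every empty body one x86 ud2
// instruction (encoded 0F 0B) so each function occupies at least one byte.
void padEmptyBodies(ObjectFile& obj) {
  for (Function& f : obj) {
    if (f.body.empty()) f.body = {0x0F, 0x0B};
    assert(!f.body.empty());
  }
}

// Toy "linker": concatenates the objects in order (no alignment) and
// returns each function's start offset, in link order.
std::vector<std::size_t> linkObjects(const std::vector<ObjectFile>& objs) {
  std::vector<std::size_t> addrs;
  std::size_t cursor = 0;
  for (const ObjectFile& obj : objs) {
    for (const Function& f : obj) {
      addrs.push_back(cursor);
      cursor += f.body.size();
    }
  }
  return addrs;
}

// True if no two functions share a start offset.
bool allDistinct(const std::vector<std::size_t>& addrs) {
  return std::set<std::size_t>(addrs.begin(), addrs.end()).size() ==
         addrs.size();
}
```

For example, suppose one object contains `g` (a one-byte `ret`, 0xC3) followed by an empty `f1`, and a second object contains `h`. Linking them places both `f1` and `h` at offset 1. After `padEmptyBodies` runs on both objects, the offsets become 0, 1, and 3.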

