<div dir="ltr"><div dir="ltr">On Thu, 23 Jul 2020 at 20:28, David Blaikie <<a href="mailto:dblaikie@gmail.com">dblaikie@gmail.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Thu, Jul 23, 2020 at 7:17 PM Richard Smith <<a href="mailto:richard@metafoo.co.uk" target="_blank">richard@metafoo.co.uk</a>> wrote:<br>

><br>

> On Thu, 23 Jul 2020 at 17:46, David Blaikie <<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>> wrote:<br>

>><br>

>> LLVM can produce zero length functions from cases like this (when<br>

>> optimizations are enabled):<br>

>><br>

>> void f1() { __builtin_unreachable(); }<br>

>> int f2() { /* missing return statement */ }<br>

>><br>

>> This code is valid, so long as the functions are never called.<br>

>><br>

>> I believe C++ requires that all functions have a distinct address (ie:<br>

>> &f1 != &f2) and LLVM optimizes code on this basis (assert(f1 == f2)<br>

>> gets optimized into an unconditional assertion failure)<br>

>><br>

>> But these zero length functions can end up with identical addresses.<br>

>><br>

>> I'm unaware of anything in the C++ spec (or the LLVM langref) that<br>

>> would allow distinct functions to have identical<br>

>> addresses - so should we do something about this in the LLVM backend?<br>

>> add a little padding? a nop instruction? (if we're adding an<br>

>> instruction anyway, perhaps we might as well make it an int3?)<br>

>><br>

>> (I came across this due to DWARF issues with zero length functions &<br>

>> thinking about if/how this should be supported)<br>

><br>

><br>

> Yes, I think at least if the optimizer turns a non-empty function into an empty function,<br>

<br>

What about functions that are already empty? (well, I guess at the<br>

LLVM IR level, no function can be empty, because every basic block<br>

must end in some terminator instruction - is that the distinction<br>

you're drawing?)<br></blockquote><div><br></div><div>Here's what I was thinking: a case could be made that the frontend is responsible for making sure that functions don't start out empty, in much the same way that if the frontend produces a global of zero size, it gets what it asked for.</div><div>But you're right, there really isn't such a thing as an empty function at the IR level, because there's always an entry block and it always has a terminator.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

> that's a miscompile for C and C++ source-language programs. My (possibly flawed) understanding is that LLVM is obliged to give a different address to distinct globals if neither of them is marked unnamed_addr,<br>

<br>

It seems like other LLVM passes make this assumption too - which is<br>

how "f1 == f2" can be folded to a constant false. I haven't checked to<br>

see exactly where that constant folding happens. (hmm, looks like it<br>

happens in some constant folding utility - happens in the inliner if<br>

there's inlining, happens at IR generation if there's no function<br>

indirection, etc)<br>

<br>

> so it seems to me that this is a backend bug. Generating a ud2 function body in this case seems ideal to me.<br>

<br>

Guess that still leaves the possibility of the last function in an<br>

object file as being zero-length? (or I guess not, because otherwise<br>

when linked it could still end up with the same address as the<br>

function that comes after it)<br></blockquote><div><br></div><div>Yes, I think that's right. We should never put a non-unnamed_addr global at the end of a section because we don't know if it will share an address with another global. </div></div></div>
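<div>For reference, a minimal sketch of the source-level guarantee this thread relies on: two distinct functions compare unequal by address. The names first, second and same_address are hypothetical and not from the thread. Both functions deliberately have ordinary, non-empty bodies, so the sketch does not reproduce the zero-length case; it only shows the comparison that a zero-length function could break. The volatile copies are an assumption about how to keep the compiler from folding the comparison to a constant, so that the result reflects the addresses actually assigned.</div>

```cpp
#include <cassert>

// Hypothetical functions with ordinary, non-empty bodies.
int first() { return 1; }
int second() { return 2; }

using Fn = int (*)();

// Compares two function pointers at run time. The volatile copies are
// intended to stop the comparison from being folded to a constant, so the
// answer depends on the addresses the backend and linker really assigned.
bool same_address(Fn a, Fn b) {
    volatile Fn va = a;
    volatile Fn vb = b;
    return va == vb;
}
```

<div>Calling same_address(first, second) is expected to return false, and same_address(first, first) to return true. If both functions were lowered to zero bytes and laid out back to back, a run-time check of this kind could disagree with the constant-folded "f1 == f2" discussed above.</div>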