<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Jun 10, 2009, at 1:18 PM, Nick Lewycky wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div class="gmail_quote">2009/6/10 John McCall <span dir="ltr">&lt;<a href="mailto:rjmccall@apple.com">rjmccall@apple.com</a>&gt;</span><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
There's another point that hasn't been raised yet here, which is that the undefinedness of calling (void*) 0 is a property of C, not necessarily of the LLVM abstract language. I think you can make an excellent case that the standard optimizations should not be enforcing C language semantics,
or at least should allow such optimizations to be disabled.</blockquote><div><br>All sorts of optimizations rely on this, whether as simple as eliminating comparisons of alloca against null to knowing that two malloc'd pointers can never alias (what if malloc returns null? if null is valid then you can store data there...).<br></div></div></blockquote><div><br></div>I'm not saying we should never make *any* assumptions about null, or that C-specific assumptions should be totally unwelcome in standard passes. I'm saying that current practice makes it very difficult to avoid certain C-specific assumptions.</div><div><br></div><div>Let's take your examples. The assumption that alloca never produces null seems like a reasonable cross-language assumption to me, based on alloca's status as a compiler-defined (and totally unstandardized) intrinsic; if I need more rigid semantics, I shouldn't be using alloca. The assumption that the function called malloc never returns aliasing pointers is indeed a C-specific assumption, but it's one that I can easily avoid if necessary by, well, not using C-specific libcall optimizations. And most of these C-inspired assumptions fall into one of those two categories: each is either generally valid or easily disabled.</div><div><br></div><div>On the other hand, the assumption that calls to null are undefined behavior is so hard-coded into instcombine that I can only avoid it by refusing to run the entire instcombine pass, or by carefully guarding how I emit calls that might be to null. And I do think this is inappropriate for a core pass, just as if someone made BasicAliasAnalysis do type-based alias analysis based on C's strict-aliasing rules, or if someone modified a loop-counting pass to use C's signed-overflow semantics, and so on. 
At the very least, there should be some way to configure this on the pass.</div><div><br></div><div><blockquote type="cite"><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left-width: 1px; border-left-style: solid; border-left-color: rgb(204, 204, 204); margin-top: 0pt; margin-right: 0pt; margin-bottom: 0pt; margin-left: 0.8ex; padding-left: 1ex; position: static; z-index: auto; ">Case in point: calls/loads/stores to null may be undefined behavior in C, but they're certainly not undefined behavior in (say) Java. There's a well-known implementation trick in JVMs where you optimistically emit code assuming non-null objects, and then you install signal handlers to raise exceptions in the cases where you're wrong. Now, obviously that trick is going to have implications for the optimizers beyond "don't mark null stores as unreachable", but even so, it really shouldn't be totally precluded
by widespread assumptions of C semantics.</blockquote><div><br>The current workaround is to use an alternate address space for your pointers. At some point we may extend the load/store/call instructions to specify their exact semantics similarly to the integer overflow proposal ( <a href="http://nondot.org/sabre/LLVMNotes/IntegerOverflow.txt">http://nondot.org/sabre/LLVMNotes/IntegerOverflow.txt</a> ).<br>
</div></div></blockquote></div><br><div>I'll note that instcombine actually marks stores to null as unreachable regardless of the address space of the pointer, unless I'm missing something subtle.</div><div><br></div><div>John.</div></body></html>