<div dir="ltr">OK, I think I understand now. I've been able to make some source fixes.<div><br></div><div>One case, above, is a custom data structure that can apparently exist either on the stack or heap (AFAICT), and implements lookups using "base+offset", where, when it's on the heap, base == <nullptr> and offset == <where the data really is>, but when it's on the stack, base == <location of the base>, offset == <4, 8, or whatever> to index into that.</div><div>The easy fix is to translate "base + offset" to "base ? base + offset : reinterpret_cast<const char*>(offset)" -- I think that works around the UB you mentioned. (I'm not sure it's *completely* UB-free, but it's at least less UB...)</div><div><br></div><div>And then there's also this glory: <a href="https://github.com/google/filament/blob/772af1e8971e65bbff314cf696a9f32da79e3485/filament/backend/include/private/backend/CommandStream.h#L102" class="cremed">https://github.com/google/filament/blob/772af1e8971e65bbff314cf696a9f32da79e3485/filament/backend/include/private/backend/CommandStream.h#L102</a></div><div>(hint: there's a loop elsewhere that needs that to be nullptr to terminate...)</div><div><br></div><div>Anyway, horrifying stuff, but seems to be not a miscompile after all. I'm surprised sanitizers didn't catch this, but I guess this kind of check just hasn't been implemented yet. Sorry for the false alarm!</div><div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Aug 27, 2019 at 12:39 PM Jordan Rupprecht <<a href="mailto:rupprecht@google.com">rupprecht@google.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Aug 27, 2019 at 12:29 PM Philip Reames <<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>Well, your problem is that your code is undefined.</p>
<p>All inbounds pointers derived from a non-null base are non-null.
(By assumption, they can't wrap.) And if you have a null base,
then there are no inbounds derived pointers other than null
itself. So, in the case below, the result of isValid is very
likely to be poison which means branching on it is UB. The only
non-poison result would be if %1 != nullptr. <br>
</p>
<p>Philip<br>
</p>
<p>So the only case which can fail is when %1 is null. <br></p></div></blockquote><div>Turns out that's the case -- when I print out the corresponding var for %1, it's nullptr. So the pointer is calculated as base+offset, but IsValid() is essentially only looking at base.</div><div><br></div><div>I'm not sure I understand the rest -- I'll try distilling the test case into some more.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div bgcolor="#FFFFFF"><p>
</p>
<div>On 8/27/19 11:19 AM, Jordan Rupprecht
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Yes, it's inbounds.
<div><br>
</div>
<div>Still narrowing it down (I spent all day yesterday just
finding the file with the suspected miscompile). I'm snipping
out a ton, but the relevant bits in the .ll seems to be:</div>
<div><br>
</div>
<div>define doSomething() {</div>
<div>entry:</div>
<div> %1 = load i8*, i8** %<internal/mangled name of
foo.x>, align 8<br>
</div>
<div> %8 = load i64, i64* %offset31.i, align 1<br>
</div>
<div>...</div>
<div>while.body.lr.ph.i:<br>
</div>
<div> br label %while.body.i<br>
</div>
<div>while.body.i:<br>
</div>
<div> %9 = phi i64 [ %8, %while.body.lr.ph.i ], [ %11,
%if.end.i ]<br>
</div>
<div> %add.ptr.i = getelementptr inbounds i8, i8* %1, i64 %9<br>
</div>
<div>...</div>
<div>if.end.i:</div>
<div> %11 = load i64, i64* %offset.i, align 1<br>
</div>
<div>...</div>
<div> %tobool.i15 = icmp eq i8* %add.ptr.i, null ; <--
relevant inlined IsValid() check</div>
<div>
<div>}</div>
<div><br>
</div>
<div>=> that last check is now:<br>
</div>
</div>
<div> %tobool.i15 = icmp eq i8* %1, null<br>
</div>
<div><br>
</div>
<div>(the "valid pointer" is a base+offset, which I guess is why
it got the name %add.ptr.i, if that's at all relevant)</div>
<div><br>
</div>
<div>
<div>I've also confirmed it's these two patches causing the
failure, and not any regressions in the broken window --
i.e. the test fails at r369795, but passes if I take r369788
as the baseline and then apply r369790 - r369794.</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Mon, Aug 26, 2019 at 9:53
PM Philip Reames <<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF">
<p>Is there any chance <valid_pointer> is a inbounds
geps? Without that, I can't see the connection to the
patch.</p>
<p>Philip<br>
</p>
<div>On 8/26/19 5:05 PM, Jordan Rupprecht wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">btw, we're seeing something that looks like
miscompiles with this patch (& after the fix
in r369795). The code looks pretty well defined (passes
asan/msan/etc.).
<div><br>
</div>
<div>I'm trying to reduce it into something that
actually triggers the bug, but it vaguely looks like:</div>
<div>struct Foo {</div>
<div> bool IsValid() const { return x; }</div>
<div> Foo(char *y): x(y) {}</div>
<div> char *x;</div>
<div>}</div>
<div><br>
</div>
<div>Foo get() {</div>
<div> return some_condition ? Foo(<valid
pointer>) : Foo(nullptr);</div>
<div>}</div>
<div><br>
</div>
<div>void doSomething() {</div>
<div> Foo foo = get();</div>
<div> if (foo.IsValid()) { .. }</div>
<div>}</div>
<div><br>
</div>
<div>What I'm observing is get() go down the branch of
some_condition being true and returning Foo(<valid
pointer>), but foo.IsValid() is returning false.</div>
<div><br>
Given that context, is there anything you might see is
a subtle bug with this patch?</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, Aug 23, 2019
at 10:57 AM Philip Reames via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Author: reames<br>
Date: Fri Aug 23 10:58:58 2019<br>
New Revision: 369789<br>
<br>
URL: <a href="http://llvm.org/viewvc/llvm-project?rev=369789&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?rev=369789&view=rev</a><br>
Log:<br>
[InstCombine] icmp eq/ne (gep inbounds P, Idx..), null
-> icmp eq/ne P, null<br>
<br>
This generalizes the isGEPKnownNonNull rule from
ValueTracking to apply when we do not know if the base
is non-null, and thus need to replace one condition
with another.<br>
<br>
The core notion is that since an inbounds GEP can only
form null if the base pointer is null and the offset
is zero. However, if the offset is non-zero, the the
"inbounds" marker makes the result poison. Thus, we're
free to ignore the case where the offset is non-zero.
Similarly, there's no case under which a non-null base
can result in a null result without generating poison.<br>
<br>
Differential Revision: <a href="https://reviews.llvm.org/D66608" rel="noreferrer" target="_blank">https://reviews.llvm.org/D66608</a><br>
<br>
<br>
Added:<br>
llvm/trunk/test/Transforms/InstCombine/gep-inbounds-null.ll<br>
Modified:<br>
llvm/trunk/lib/Transforms/InstCombine/InstCombineCompares.cpp<br>
<br>
Modified:
llvm/trunk/lib/Transforms/InstCombine/InstCombineCompares.cpp<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/InstCombine/InstCombineCompares.cpp?rev=369789&r1=369788&r2=369789&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/InstCombine/InstCombineCompares.cpp?rev=369789&r1=369788&r2=369789&view=diff</a><br>
==============================================================================<br>
---
llvm/trunk/lib/Transforms/InstCombine/InstCombineCompares.cpp
(original)<br>
+++
llvm/trunk/lib/Transforms/InstCombine/InstCombineCompares.cpp
Fri Aug 23 10:58:58 2019<br>
@@ -894,6 +894,27 @@ Instruction
*InstCombiner::foldGEPICmp(G<br>
Offset = EmitGEPOffset(GEPLHS);<br>
return new
ICmpInst(ICmpInst::getSignedPredicate(Cond), Offset,<br>
Constant::getNullValue(Offset->getType()));<br>
+ } else if (GEPLHS->isInBounds() &&
ICmpInst::isEquality(Cond) &&<br>
+ GEPLHS->getType()->isPointerTy()
&& // TODO: extend to vector geps<br>
+ isa<Constant>(RHS) &&
cast<Constant>(RHS)->isNullValue() &&<br>
+ !NullPointerIsDefined(I.getFunction(),<br>
+
RHS->getType()->getPointerAddressSpace())) {<br>
+ // For most address spaces, an allocation can't
be placed at null, but null<br>
+ // itself is treated as a 0 size allocation in
the in bounds rules. Thus,<br>
+ // the only valid inbounds address derived from
null, is null itself.<br>
+ // Thus, we have four cases to consider:<br>
+ // 1) Base == nullptr, Offset == 0 ->
inbounds, null<br>
+ // 2) Base == nullptr, Offset != 0 -> poison
as the result is out of bounds<br>
+ // 3) Base != nullptr, Offset == (-base) ->
poison (crossing allocations)<br>
+ // 4) Base != nullptr, Offset != (-base) ->
nonnull (and possibly poison)<br>
+ //<br>
+ // (Note if we're indexing a type of size 0, that
simply collapses into one<br>
+ // of the buckets above.)<br>
+ //<br>
+ // In general, we're allowed to make values less
poison (i.e. remove<br>
+ // sources of full UB), so in this case, we
just select between the two<br>
+ // non-poison cases (1 and 4 above).<br>
+ return new ICmpInst(Cond,
GEPLHS->getPointerOperand(), RHS);<br>
} else if (GEPOperator *GEPRHS =
dyn_cast<GEPOperator>(RHS)) {<br>
// If the base pointers are different, but the
indices are the same, just<br>
// compare the base pointer.<br>
<br>
Added:
llvm/trunk/test/Transforms/InstCombine/gep-inbounds-null.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/InstCombine/gep-inbounds-null.ll?rev=369789&view=auto" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/InstCombine/gep-inbounds-null.ll?rev=369789&view=auto</a><br>
==============================================================================<br>
---
llvm/trunk/test/Transforms/InstCombine/gep-inbounds-null.ll
(added)<br>
+++
llvm/trunk/test/Transforms/InstCombine/gep-inbounds-null.ll
Fri Aug 23 10:58:58 2019<br>
@@ -0,0 +1,184 @@<br>
+; NOTE: Assertions have been autogenerated by
utils/update_test_checks.py<br>
+; RUN: opt -S < %s -instcombine | FileCheck %s<br>
+<br>
+;; Start by showing the results of constant folding
(which doesn't use<br>
+;; the poison implied by gep for the nonnull cases).<br>
+<br>
+define i1 @test_ne_constants_null() {<br>
+; CHECK-LABEL: @test_ne_constants_null(<br>
+; CHECK-NEXT: entry:<br>
+; CHECK-NEXT: ret i1 false<br>
+;<br>
+entry:<br>
+ %gep = getelementptr inbounds i8, i8* null, i64 0<br>
+ %cnd = icmp ne i8* %gep, null<br>
+ ret i1 %cnd<br>
+}<br>
+<br>
+define i1 @test_ne_constants_nonnull() {<br>
+; CHECK-LABEL: @test_ne_constants_nonnull(<br>
+; CHECK-NEXT: entry:<br>
+; CHECK-NEXT: ret i1 true<br>
+;<br>
+entry:<br>
+ %gep = getelementptr inbounds i8, i8* null, i64 1<br>
+ %cnd = icmp ne i8* %gep, null<br>
+ ret i1 %cnd<br>
+}<br>
+<br>
+define i1 @test_eq_constants_null() {<br>
+; CHECK-LABEL: @test_eq_constants_null(<br>
+; CHECK-NEXT: entry:<br>
+; CHECK-NEXT: ret i1 true<br>
+;<br>
+entry:<br>
+ %gep = getelementptr inbounds i8, i8* null, i64 0<br>
+ %cnd = icmp eq i8* %gep, null<br>
+ ret i1 %cnd<br>
+}<br>
+<br>
+define i1 @test_eq_constants_nonnull() {<br>
+; CHECK-LABEL: @test_eq_constants_nonnull(<br>
+; CHECK-NEXT: entry:<br>
+; CHECK-NEXT: ret i1 false<br>
+;<br>
+entry:<br>
+ %gep = getelementptr inbounds i8, i8* null, i64 1<br>
+ %cnd = icmp eq i8* %gep, null<br>
+ ret i1 %cnd<br>
+}<br>
+<br>
+;; Then show the results for non-constants. These
use the inbounds provided<br>
+;; UB fact to ignore the possible overflow cases.<br>
+<br>
+define i1 @test_ne(i8* %base, i64 %idx) {<br>
+; CHECK-LABEL: @test_ne(<br>
+; CHECK-NEXT: entry:<br>
+; CHECK-NEXT: [[CND:%.*]] = icmp ne i8*
[[BASE:%.*]], null<br>
+; CHECK-NEXT: ret i1 [[CND]]<br>
+;<br>
+entry:<br>
+ %gep = getelementptr inbounds i8, i8* %base, i64
%idx<br>
+ %cnd = icmp ne i8* %gep, null<br>
+ ret i1 %cnd<br>
+}<br>
+<br>
+define i1 @test_eq(i8* %base, i64 %idx) {<br>
+; CHECK-LABEL: @test_eq(<br>
+; CHECK-NEXT: entry:<br>
+; CHECK-NEXT: [[CND:%.*]] = icmp eq i8*
[[BASE:%.*]], null<br>
+; CHECK-NEXT: ret i1 [[CND]]<br>
+;<br>
+entry:<br>
+ %gep = getelementptr inbounds i8, i8* %base, i64
%idx<br>
+ %cnd = icmp eq i8* %gep, null<br>
+ ret i1 %cnd<br>
+}<br>
+<br>
+;; TODO: vectors not yet handled<br>
+define <2 x i1> @test_vector_base(<2 x
i8*> %base, i64 %idx) {<br>
+; CHECK-LABEL: @test_vector_base(<br>
+; CHECK-NEXT: entry:<br>
+; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds
i8, <2 x i8*> [[BASE:%.*]], i64 [[IDX:%.*]]<br>
+; CHECK-NEXT: [[CND:%.*]] = icmp eq <2 x
i8*> [[GEP]], zeroinitializer<br>
+; CHECK-NEXT: ret <2 x i1> [[CND]]<br>
+;<br>
+entry:<br>
+ %gep = getelementptr inbounds i8, <2 x i8*>
%base, i64 %idx<br>
+ %cnd = icmp eq <2 x i8*> %gep,
zeroinitializer<br>
+ ret <2 x i1> %cnd<br>
+}<br>
+<br>
+define <2 x i1> @test_vector_index(i8* %base,
<2 x i64> %idx) {<br>
+; CHECK-LABEL: @test_vector_index(<br>
+; CHECK-NEXT: entry:<br>
+; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds
i8, i8* [[BASE:%.*]], <2 x i64> [[IDX:%.*]]<br>
+; CHECK-NEXT: [[CND:%.*]] = icmp eq <2 x
i8*> [[GEP]], zeroinitializer<br>
+; CHECK-NEXT: ret <2 x i1> [[CND]]<br>
+;<br>
+entry:<br>
+ %gep = getelementptr inbounds i8, i8* %base, <2
x i64> %idx<br>
+ %cnd = icmp eq <2 x i8*> %gep,
zeroinitializer<br>
+ ret <2 x i1> %cnd<br>
+}<br>
+<br>
+;; These two show instsimplify's reasoning getting to
the non-zero offsets<br>
+;; before instcombine does.<br>
+<br>
+define i1 @test_eq_pos_idx(i8* %base) {<br>
+; CHECK-LABEL: @test_eq_pos_idx(<br>
+; CHECK-NEXT: entry:<br>
+; CHECK-NEXT: ret i1 false<br>
+;<br>
+entry:<br>
+ %gep = getelementptr inbounds i8, i8* %base, i64 1<br>
+ %cnd = icmp eq i8* %gep, null<br>
+ ret i1 %cnd<br>
+}<br>
+<br>
+define i1 @test_eq_neg_idx(i8* %base) {<br>
+; CHECK-LABEL: @test_eq_neg_idx(<br>
+; CHECK-NEXT: entry:<br>
+; CHECK-NEXT: ret i1 false<br>
+;<br>
+entry:<br>
+ %gep = getelementptr inbounds i8, i8* %base, i64 -1<br>
+ %cnd = icmp eq i8* %gep, null<br>
+ ret i1 %cnd<br>
+}<br>
+<br>
+;; Show an example with a zero sized type since
that's<br>
+;; a cornercase which keeps getting mentioned. The
GEP<br>
+;; produces %base regardless of the value of the
index<br>
+;; expression.<br>
+define i1 @test_size0({}* %base, i64 %idx) {<br>
+; CHECK-LABEL: @test_size0(<br>
+; CHECK-NEXT: entry:<br>
+; CHECK-NEXT: [[CND:%.*]] = icmp ne {}*
[[BASE:%.*]], null<br>
+; CHECK-NEXT: ret i1 [[CND]]<br>
+;<br>
+entry:<br>
+ %gep = getelementptr inbounds {}, {}* %base, i64
%idx<br>
+ %cnd = icmp ne {}* %gep, null<br>
+ ret i1 %cnd<br>
+}<br>
+define i1 @test_size0_nonzero_offset({}* %base) {<br>
+; CHECK-LABEL: @test_size0_nonzero_offset(<br>
+; CHECK-NEXT: entry:<br>
+; CHECK-NEXT: [[CND:%.*]] = icmp ne {}*
[[BASE:%.*]], null<br>
+; CHECK-NEXT: ret i1 [[CND]]<br>
+;<br>
+entry:<br>
+ %gep = getelementptr inbounds {}, {}* %base, i64 15<br>
+ %cnd = icmp ne {}* %gep, null<br>
+ ret i1 %cnd<br>
+}<br>
+<br>
+<br>
+;; Finally, some negative tests for sanity checking.<br>
+<br>
+define i1 @neq_noinbounds(i8* %base, i64 %idx) {<br>
+; CHECK-LABEL: @neq_noinbounds(<br>
+; CHECK-NEXT: entry:<br>
+; CHECK-NEXT: [[GEP:%.*]] = getelementptr i8, i8*
[[BASE:%.*]], i64 [[IDX:%.*]]<br>
+; CHECK-NEXT: [[CND:%.*]] = icmp ne i8* [[GEP]],
null<br>
+; CHECK-NEXT: ret i1 [[CND]]<br>
+;<br>
+entry:<br>
+ %gep = getelementptr i8, i8* %base, i64 %idx<br>
+ %cnd = icmp ne i8* %gep, null<br>
+ ret i1 %cnd<br>
+}<br>
+<br>
+define i1 @neg_objectatnull(i8 addrspace(2)* %base,
i64 %idx) {<br>
+; CHECK-LABEL: @neg_objectatnull(<br>
+; CHECK-NEXT: entry:<br>
+; CHECK-NEXT: [[GEP:%.*]] = getelementptr inbounds
i8, i8 addrspace(2)* [[BASE:%.*]], i64 [[IDX:%.*]]<br>
+; CHECK-NEXT: [[CND:%.*]] = icmp eq i8
addrspace(2)* [[GEP]], null<br>
+; CHECK-NEXT: ret i1 [[CND]]<br>
+;<br>
+entry:<br>
+ %gep = getelementptr inbounds i8, i8 addrspace(2)*
%base, i64 %idx<br>
+ %cnd = icmp eq i8 addrspace(2)* %gep, null<br>
+ ret i1 %cnd<br>
+}<br>
<br>
<br>
_______________________________________________<br>
llvm-commits mailing list<br>
<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a><br>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a><br>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote></div></div>
</blockquote></div>