[PATCH] [SCCP] Calculate Null Pointer Comparisons

Mon May 11 14:13:27 PDT 2015

On Mon, May 11, 2015 at 12:35 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
> Sorry, i'm not having a precise enough day :)
> I did not mean to imply they are trapping.
>
> Anyway, i spoke with david, and his view is (and i'm sure he'll
> correct me if i'm wrong) that we should be able to eliminate the
> post-dominated null check as long as their is a load of the inbounds
> pointer.
>
> Would you agree with this?
>
> If so, let's just do that.

Yes, I agree -- storing to / loading from (gep null foo) is UB and we
can exploit that.  However, I don't think we need inbounds on the GEP
to do this.

You probably already know this, but simplifycfg already does this in
some form in removeUndefIntroducingPredecessor:

-simplifycfg can optimize

declare void @g(i8)

define i1 @test_postdominate(i8* %mem) {
  %test = icmp eq i8* %mem, null
  br i1 %test, label %a, label %b
a:
  call void @g(i8 0)
  br label %c
b:
  br label %c
c:
  %join = phi i8* [ %mem, %a ], [ null, %b ]
  %gep = getelementptr i8, i8* %join, i64 4
  %res = load i8, i8* %gep
  call void @g(i8 %res)
  ret i1 %test
}

to

; ModuleID = '/Users/sanjoy/tmp/sccp.ll'

declare void @g(i8)

define i1 @test_postdominate(i8* %mem) {
a:
  %test = icmp eq i8* %mem, null
  call void @g(i8 0)
  %gep = getelementptr i8, i8* %mem, i64 4
  %res = load i8, i8* %gep
  call void @g(i8 %res)
  ret i1 %test
}

The `icmp` is still there since SimplifyCFG does not remove the
comparison, but just the control flow.  Does it make sense to make
simplifycfg smarter instead of sccp?

-- Sanjoy

> If not, i'll drag him kicking and screaming onto this thread, since i
> dont' have a strong opinon  :)
>
>
> On Mon, May 11, 2015 at 11:51 AM, Sanjoy Das
> <sanjoy at playingwithpointers.com> wrote:
>> On Mon, May 11, 2015 at 11:39 AM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>> I agree with your assessment.
>>>
>>> (IE to me, this transformation seems wrong, since it applies inbounds
>>> to a path that was not previously inbounds.
>>> In particular, the operands only guaranteed it would be inbounds along
>>> the left path.
>>> This has made it unconditionally inbounds)
>>>
>>>
>>> We already say we don't hoist things that can trap.
>>
>> The lang ref does not say that inbounds GEPs ever trap -- all they do
>> is produce poison, like nsw/nuw additions.
>>
>>> My uninformed view is that we shouldn't be unconditionally hoisting
>>> inbounds geps unless they post-dominate the target hoist point.
>>>
>>> But again, i'd love to hear other opinions.
>>>
>>>
>>> On Mon, May 11, 2015 at 11:02 AM, Sanjoy Das
>>> <sanjoy at playingwithpointers.com> wrote:
>>>> LLVM currently turns non-postdominating inbounds GEPS to
>>>> postdominating GEPS.  For instance, opt -O3 will transform
>>>>
>>>> define i8* @test_postdominate_2(i8* %mem, i1 %cond) {
>>>>  entry:
>>>>   br i1 %cond, label %left, label %right
>>>>
>>>>  left:
>>>>   %tL = getelementptr inbounds i8, i8* %mem, i32 42
>>>>   br label %merge
>>>>
>>>>  right:
>>>>   br label %merge
>>>>
>>>>  merge:
>>>>   %m = phi i8* [ %tL, %left ], [ null, %right ]
>>>>   ret i8* %m
>>>> }
>>>>
>>>> to
>>>>
>>>> define i8* @test_postdominate_2(i8* readnone %mem, i1 %cond) #0 {
>>>> entry:
>>>>   %tL = getelementptr inbounds i8, i8* %mem, i64 42
>>>>   %m = select i1 %cond, i8* %tL, i8* null
>>>>   ret i8* %m
>>>> }
>>>>
>>>> attributes #0 = { nounwind readnone }
>>>>
>>>> So we can go from "%tL does not postdominate X"  to "%tL postdominates
>>>> X" via -O3.
>>>>
>>>> IOW, if we want to rely on control dependence on inbounds GEPs then
>>>> llvm::isSafeToSpeculativelyExecute will have to return false for
>>>> inbounds GEPs.  I don't think control dependence on an instruction
>>>> that may be freely moved around means anything.
>>>>
>>>> -- Sanjoy
>>>>
>>>>
>>>> On Mon, May 11, 2015 at 10:49 AM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>>>> On Mon, May 11, 2015 at 10:41 AM, Sanjoy Das
>>>>> <sanjoy at playingwithpointers.com> wrote:
>>>>>> I'm not sure that this is doing the right thing -- I don't think simply //producing// `undef` implies undefined behavior.
>>>>>
>>>>> I asked David/Nick (the two closest experts i had), and they both
>>>>> agreed that if a null check is postdominated by an inbounds GEP, it's
>>>>> fair to say the null check always comes out as "not null"
>>>>>
>>>>>>
>>>>>> For instance, in
>>>>>>
>>>>>>   declare void @g()
>>>>>>
>>>>>>   define i1 @test_postdominate(i8* %mem) {
>>>>>>     %test = icmp eq i8* %mem, null
>>>>>>     br i1 %test, label %a, label %b
>>>>>>   a:
>>>>>>     call void @g()
>>>>>>     br label %c
>>>>>>   b:
>>>>>>     br label %c
>>>>>>   c:
>>>>>>     %join = phi i8* [ %mem, %a ], [ null, %b ]
>>>>>>     %gep = getelementptr inbounds i8, i8* %join, i64 4
>>>>>>     ret i1 %test
>>>>>>   }
>>>>>>
>>>>>> if `%mem` is `null`, then we have to execute the call to `@g`, but this pass make the branch into `a` dead (so `@f` never calls `@g`).
>>>>>
>>>>> This is correct, as far as my experts told me.
>>>>>
>>>>>>  The fact that `%gep` is `undef` (or poison) does not make this program undefined.
>>>>>> In fact, if `inbounds` GEP instructions could invoke UB, then we would not be able to hoist such instructions out of loops.
>>>>>
>>>>>
>>>>> I leave this disagreement to experts that are not me :)
>>>>>
>>>>> Note that we use exactly the same line of implication to propagate
>>>>> nsw/etc onto operands of inbounds geps (and the language ref
>>>>> explicitly says this is okay :P)