[llvm-dev] Calling a null pointer. How undefined it is?

Gor Nishanov via llvm-dev llvm-dev at lists.llvm.org
Tue Jun 14 06:21:09 PDT 2016


Hi Mats:

>> Undefined behavior is not guaranteed to cause a stop. It's
>> just "do something that may or may not be what you expected".
>> Yes, most operating systems prevent NULL accesses, and cause
>> the application to crash, but it's by no means guaranteed.

You are right. After thinking a bit more, reading your reply and
re-reading llvm blog post on undefined behavior, I realized, duh,
that even on systems where NULL access results in a crash we can
still get into a bad situation.

  struct State {
     FnPtr Fn;
     State() : Fn(&SomeFunction) {}

     void Go() { (*Fn)(); } // may be devirtualized to SomeFunction()
     void Stop() { Fn = nullptr; }
     bool IsDone() { return Fn == nullptr; }
   };

Since, calling Go() when the state machine has reached the "Done"
state is undefined, it is perfectly legal to replace an indirect
call with direct call to its target. Thus, if the user end up
calling Go() in a done state, it won't crash, but, invoke SomeFunction()
which may do some bad things since we are no longer in a valid state.

>> Certainly in a debug build, I'd add an assert for `fn != nullptr`
>> in the `Go` function. Possibly even actively prevent it from getting through
>> in release builds - presumably the `SomeFunction` is more than a few
>> instructions anyway.

I am thinking that I will probably let the frontend/library writer to worry
about inserting the checks as needed. Moreover, fn != nullptr won't help
in the general case as the memory for the state machine might have been freed
and reused, so it will need some kind of flow control guard + memory sanitizer
if safety is desired.

Thanks for your reply, Mats.

--
Gor

P.S.

Though, situation is not as dire. User normally does not call low level
coroutine machinery directly and goes through a higher level coroutine
library abstractions instead.

For example, in a case of a generator:

    generator<int> range(int from, int to) {
       for(int i = from; i < to; ++n)
          co_yield i;
    }
    int main() {
       int sum = 0;
       for (auto v: range(1,100))
          sum += v;
       return sum;
    }

which, BTW, translates down to this in -O3 :-)

    define i32 @main() #5 {
    entry:
      ret i32 4950
    }

The generator destructor calls @llvm.coro.destroy to destroy the coroutine;
An iterator calls @llvm.coro.resume and @llvm.coro.done to go to the next value
and to check if there are no more values.

In operator++() and operator* library writer can add assert(!llvm.coro.done())
to make sure that we are not past the end.


On Tue, Jun 14, 2016 at 2:06 AM, mats petersson <mats at planetcatfish.com> wrote:
> Undefined behaviour is not guaranteed to cause a stop. It's just "do
> something that may or may not be what you expected". Yes, most operating
> systems prevent NULL accesses, and cause the application to crash, but it's
> by no means guaranteed.
>
> My worry is that you'll potentially get very weird crashes in systems where
> `nullptr` isn't a location that is prevented from executing. I've
> accidentally hit this with kernel mode drivers, where something has actually
> mapped address zero as a valid address - and then you scratch your head A
> LOT for the case where it is executing at address 0x43, and trying to access
> some random location you can't use (or invalid instruction, or whatever it
> may be).
>
> The only worse bug to understand was when I accidentally overwrote the
> entire code up to the point where the memset was looping at - but after the
> second or third time I managed to forget to include the string.h [which in
> that environment reversed the order of the address and size, so I'd fill
> from size, address number of bytes - size typically being small, and address
> much larger -> filling all over the place], I learned to recognise the
> pattern.
>
> Both cases cause "What the heck is going on here" type situations - even
> with good debug tools like in circuit emulators or external debgugers, it's
> even worse if you only have regular debugger (or only "printf", "blink the
> LED" etc)
>
> Certainly in a debug build, I'd add an assert for `fn != nullptr` in the
> `Go` function. Possibly even actively prevent it from getting through in
> release builds - presumably the `SomeFunction` is more than a few
> instructions anyway.
>
> --
> Mats
>
> On 14 June 2016 at 05:07, Gor Nishanov via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>>
>> Hi all:
>>
>> This question is related to a state machine generated by LLVM for a
>> coroutine.
>> I stripped all coroutine related details to get to the essence of the
>> question.
>>
>> Let's say I have a state machine that looks like this:
>>
>>   struct State {
>>     FnPtr Fn;
>>     State() : Fn(&SomeFunction) {}
>>
>>     void Go() { (*Fn)(); }
>>     void Stop() { Fn = nullptr; }
>>     bool IsDone() { return Fn == nullptr; }
>>   };
>>
>> Fn field serves two purposes:
>>
>> * Cheap check for done. What can be better than compare with zero!
>> * Guard against programmer mistake: accidentally calling Go() when in a
>>   Stopped state.
>>
>> Is it an appropriate use of undefined behavior?
>>
>> Thank you,
>> Gor
>>
>> P.S.
>>
>> This iassumes -O3 with no ubsan. Sanitizer can, of course, add a null
>> check in Go() prior to an indirect call.
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>


More information about the llvm-dev mailing list