[LLVMdev] readonly and infinite loops
Hal Finkel
hfinkel at anl.gov
Wed Jul 1 15:30:49 PDT 2015
----- Original Message -----
> From: "Nick Lewycky" <nlewycky at google.com>
> To: "Nuno Lopes" <nunoplopes at sapo.pt>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>, "Sanjoy Das"
> <sanjoy at playingwithpointers.com>, "Jeremy Lakeman" <Jeremy.Lakeman at gmail.com>
> Sent: Wednesday, July 1, 2015 5:15:31 PM
> Subject: Re: [LLVMdev] readonly and infinite loops
>
>
>
>
> On 30 June 2015 at 14:30, Nuno Lopes < nunoplopes at sapo.pt > wrote:
>
>
>
>
>
>
> Interesting. Could you give an example why knowing a function will
> halt is
> essential for the heap-to-stack conversion?
>
> The key situation you need to establish in order to do heap-to-stack
> conversion, is that you can see the calls to free (or realloc, etc.)
> along all control-flow paths. If you can, then you can perform the
> conversion (because any additional calls to free that you can't
> observe would lead to a double free, and thus undefined behavior).
> Thus, if we have this situation:
>
> void bar(int *a);
> void foo() {
> int *a = (int*) malloc(sizeof(int)*40);
> bar(a);
> free(a);
> }
>
> we can perform heap-to-stack conversion iff we know that bar(int *)
> always returns normally. If it never returns (perhaps by looping
> indefinitely) then it might capture the pointer, pass it off to some
> other thread, and that other thread might call free() (or it might
> just call free() itself before looping indefinitely). In short, we
> need to know whether the call to free() after the call to bar() is
> dead. If we know that it is reached, then we can perform
> heap-to-stack conversion. Also worth noting is that because we
> unconditionally free(a) after the call to bar(a), it would not be
> legal for bar(a) to call realloc on a (because if realloc did
> reallocate the buffer we'd end up freeing it twice when bar(a) did
> eventually return).
>
> I see, thanks!
> Your argument is that knowing that bar returns implies that 'a'
> cannot be captured or reallocated, otherwise it would be UB. Makes
> sense, yes.
>
>
>
> I'm afraid it's worse than that.
>
>
> void caller() {
> int *ptr = malloc(sizeof(int));
> callee(ptr);
> free(ptr);
> }
>
>
> void callee(int *ptr) {
> if (...) {
> free(ptr);
> log("get critical log message out to the humans, maybe my final
> message will be the last hint they need to finally resolve the bugs
> inside myself");
> }
> }
>
>
>
> C and C++ do permit undefined behaviour to have interactions
> backwards in time. In essence, knowing that UB must happen later can
> cause impossible things to happen now. However, LLVM offers an
> implementation-defined guarantee that no UB occurs until the
> earliest instruction where it occurs.
>
> Your reasoning that we can't have a free in the callee because we'd
> hit a double-free (and hence UB) in the caller is not sufficient to
> perform heap-to-stack transform with that guarantee, because it will
> move the UB sooner to the free() in the callee. That violates our
> implementation guarantee that the log message will be emitted
> (because we'll have already entered UB-land).
>
> The alternatives are to define double-free as not entering full UB
> (similar to poison, but we also have problems to solve with load,
> store, call, branch on value computed through signed overflow,
> etc.), or to remove the guarantee that the log message will be
> emitted.
We'll need to loosen that guarantee. However, for the case you've highlighted, the fact that the routine performs I/O is sufficient to conclude that it might not always return normally in some finite period of time. The C++ specification guarantees only that the function will eventually return (terminate), or perform I/O, or access a volatile variable, or perform some synchronization/atomic operation. Since your function might perform I/O, it cannot be marked as always returning normally, and so we can't perform the conversion.
The critical case here is really a constructor that fills in some structure members and then returns (although some loop may be involved if there is some array to be initialized).
-Hal
>
>
> Nick
>
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
More information about the llvm-dev
mailing list