[LLVMdev] optimization assumes malloc return is non-null
David Vandevoorde
daveed at vandevoorde.com
Thu May 1 06:57:40 PDT 2008
On Apr 30, 2008, at 9:35 PM, Jonathan S. Shapiro wrote:
> On Wed, 2008-04-30 at 18:26 -0400, David Vandevoorde wrote:
>>> Daveed:
>>>
>>> Good to know that I was looking at the correct section. I do not agree
>>> that your interpretation follows the as-if rule, because I do not agree
>>> with your interpretation of the C library specification of malloc().
>>
>>
>> Before I go on, let me state that this is not a contentious issue
>> among WG14: There is no doubt that the intent of the standard is that
>> this be a valid optimization.
>
> And of course, ANSI C working groups are infallible. Consider
> "noalias". :-)
Ah, but with the help of DMR, noalias was avoided and infallibility
maintained.
(Kidding.)
Seriously, I don't think this particular item is a committee booboo,
but I didn't mean to imply it cannot be. I only intended to note that
the intent is that such optimization be allowed (and that to the best
of my knowledge the words reflect that).
> But I think I am starting to understand your argument, and if so I
> concur that the optimization is correct in this case. Let me ask
> some further questions just to confirm.
>
>
> Let me attempt to replay what I believe is happening here:
>
> 1. There is an observably linear chain of calls to malloc() in the
> example, and because of the structure of the example, the compiler can
> see that there are no *unobserved* calls to malloc() in the program.
(You meant "no *observed*", right?)
> 2. After all inlining of malloc() has proceeded, the compiler is free to
> observe that all calls succeed by partial evaluation (thus the non-zero
> return value) and that after this is done there is no witness to the
> internal heap limit variable.
>
> 3. Progressive application of dead variable elimination will then delete
> the entire chain of updates to the heap limit variable, eliminating the
> side effects that would normally be induced by malloc(). They will also
> delete the uncaptured pointer result from malloc().
>
> 4. Since the result pointer is not captured, and the heap limit variable
> is not consulted, the entire result of the call to malloc() is the
> return of a non-NULL pointer.
>
> 5. Constant folding then reduces the IF condition (malloc() == NULL) to
> false.
>
> 6. Dead code elimination then reduces the if/then/else to simply the
> else body.
>
> 7. Optimizer now reduces the whole thing to "return 1".
>
>
> Is this more or less the chain of events here (or if not this, then
> something of a similar nature)?
I think it's a valid set of transformations, but I don't think it's
the only approach that can lead to this result. I was thinking that a
compiler could be aware of the semantics of malloc: It wouldn't
necessarily "inline" the call, but if it could prove that the result
was not leaked or "used" (indirected), then a non-heap address could
be produced. Alternatively, the malloc call could be promoted to
alloca (as Chris noted in his reply, only if a small bound on the
allocation is known, which is clearly the case here), and then it
becomes akin to eliminating an ordinary dead local variable.
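To make that concrete, here is a rough source-level sketch of the reduction
being described (illustrative only; per the discussion below, the original
example used main with unused parameters, which I drop here):

#include <stdlib.h>

/* Sketch of the example under discussion: the pointer returned by malloc
   neither escapes nor is dereferenced, so under the reasoning above the
   allocation can be treated as one that cannot fail. */
int main(void) {
    if (malloc(sizeof(int)) == NULL) { return 0; } else { return 1; }
}

/* After promoting the allocation to a dead local (or to a unique non-heap
   address) and folding the comparison, this is equivalent to:

   int main(void) { return 1; }
*/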
>
>
> If so, I understand and concur that this optimization is valid in this
> example, primarily due to perverse and non-obvious consequences of the
> particular bit of sample code.
>
> Just to cement my understanding, I suspect that *either* of the
> following two changes to the sample code would make this optimization
> invalid:
>
> 1. Rename main() to f()
>
> 2. Introduce a call to some external procedure anywhere in main().
>
> In either case, the compiler can no longer witness all reachable calls
> to malloc(), and must assume conservatively that the internally produced
> side effect results of malloc() may be sourced by other units of
> compilation, and therefore cannot be elided.
>
> Is this correct?
I'm of the opinion that even with such changes the optimization is
valid. Let me rename main to f (and drop the unused parameters) and
add calls to a function g. The original example then becomes:
#include <stdlib.h>

void g(void);   /* declared here so the fragment is self-contained; defined elsewhere */

int f() {
    g();
    if (malloc(sizeof(int)) == NULL) { return 0; } else { return 1; }
    g();
}
The malloc value cannot escape and the allocated storage isn't used
(in other words, it can be "garbage collected" right away). So it can
be replaced by a unique address (though the same address can be used
for each invocation, IMO).
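To illustrate that substitution (a sketch only; the name below is made up,
and a compiler would do this on its intermediate representation rather than
in source):

void g(void);                     /* defined in some other translation unit */

/* Hypothetical stand-in for the unique, necessarily non-null address. */
static char reserved_block[sizeof(int)];

int f() {
    g();
    /* malloc(sizeof(int)) replaced by the unique non-heap address */
    if ((void *)reserved_block == NULL) { return 0; } else { return 1; }
    g();
}

The address of a static object can never compare equal to NULL, so the
condition folds to false and the function is in turn equivalent to
"g(); return 1;".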
Note that within the confines of the C (or C++) standard, g has no way
of observing the result of the call to malloc. (We might "know" that
there is a static-lifetime variable somewhere that changed, but that
variable isn't required by the standard, and a compiler may instead
posit that for this very call to malloc, a separate heap is used.)
There is (at least) a subtlety with that reasoning in C having to do
with "lifetime": The allocated _object_ has a lifetime (7.20.3/1) that
extends until deallocation. My view is that the storage only becomes
an object when the pointer returned is bound to a typed pointer
(7.20.3/1). If that's right, every instance of the particular malloc
call can produce the same value; if not, we'd need to be able to bound
the number of instances of that call. (C++ has a convenient "conformance
within resource limits" clause that provides a different way out.)
> Hmm. No. I can weasel out of this in case (2), because the standard does
> not require that calls to malloc() be processed in order, so in some
> cases calls to malloc() appearing in main() can still be eliminated by
> logically rearranging them to occur before any of the external calls to
> malloc(). I bet LLVM doesn't know about this. :-)
>
>> (I should have been more careful in my earlier response and clarified
>> that modifying a nonvolatile object is a side-effect that can be
>> optimized away in many cases.)
>
> If I understand the standard correctly, this can be done exactly if
> there is no witness to the side effect. Furthermore, chains of
> intermediate unwitnessed side effects can be collapsed if the optimizer
> can prove that the intermediate results are unwitnessed. Yes?
Yes, I think so.
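(A made-up illustration of that, for what it's worth: the updates below are
side effects on a nonvolatile object with no witness, so the whole chain can
be collapsed.)

/* 'counter' has internal linkage, is not volatile, and is never read,
   so every store to it is an unwitnessed side effect.  A conforming
   compiler may delete the entire chain of updates and reduce work()
   to "return 42;". */
static unsigned counter;

int work(void) {
    for (int i = 0; i != 1000; ++i)
        counter += i;             /* dead stores: nothing observes them */
    return 42;
}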
>> ...However, the optimization is allowed even without that: Just by noting
>> that the return value of malloc isn't used (nor leaked); I suspect
>> that's what the compiler does in this case (though I don't know).
>
> I think that is not quite strong enough. If there is any possibility of
> preceding or succeeding calls to malloc() in compilation units that are
> not visible at compile time, then the compiler cannot determine that the
> current call to malloc() must succeed or that its side effects are
> unwitnessed.
See above. If the allocation is small enough that it can be modelled
by a "reserved heap", I think it can.
> I suppose that the compiler can do special case things in main on the
> theory that these calls to malloc() may be executed out of order by a
> conforming implementation, but I suspect that goes well beyond the
> cleverness of the current LLVM implementation. It's the sort of thing
> that the VAX/VMS optimizer or the IBM PL.8 research optimizer might
> actually have done. And certainly the Bulldog optimizer.
(I should probably mention that I'm no optimizer pro. So my reasoning
is based on language specs.)
>>> Setting the matter of the standard entirely aside, the currently
>>> implemented behavior deviates so blatantly from common sense and
>>> universal convention that it really must be viewed as a bug.
>
> Just to be explicit: I now see that my statement above is wrong in the
> present example.
>
> Daveed: Thanks for being patient on this. I appreciate your willingness
> to help me see this one.
No problem at all. I should probably worry that I find ISO exegesis
mildly fun :-P
Daveed