[LLVMdev] optimization assumes malloc return is non-null
David Vandevoorde
daveed at vandevoorde.com
Thu May 1 06:57:40 PDT 2008
On Apr 30, 2008, at 9:35 PM, Jonathan S. Shapiro wrote:
> On Wed, 2008-04-30 at 18:26 -0400, David Vandevoorde wrote:
>>> Daveed:
>>>
>>> Good to know that I was looking at the correct section. I do not agree
>>> that your interpretation follows the as-if rule, because I do not agree
>>> with your interpretation of the C library specification of malloc().
>>
>>
>> Before I go on, let me state that this is not a contentious issue
>> among WG14: There is no doubt that the intent of the standard is that
>> this be a valid optimization.
>
> And of course, ANSI C working groups are infallible. Consider
> "noalias". :-)
Ah, but with the help of DMR, noalias was avoided and infallibility
maintained.
(Kidding.)
Seriously, I don't think this particular item is a committee booboo,
but I didn't mean to imply it cannot be. I only intended to note that
the intent is that such optimization be allowed (and that to the best
of my knowledge the words reflect that).
> But I think I am starting to understand your argument, and if so I
> concur that the optimization is correct in this case. Let me ask
> some further questions just to confirm.
>
>
> Let me attempt to replay what I believe is happening here:
>
> 1. There is an observably linear chain of calls to malloc() in the
> example, and because of the structure of the example, the compiler can
> see that there are no *unobserved* calls to malloc() in the program.
(You meant "no *observed*", right?)
> 2. After all inlining of malloc() has proceeded, the compiler is free to
> observe that all calls succeed by partial evaluation (thus the non-zero
> return value) and that after this is done there is no witness to the
> internal heap limit variable.
>
> 3. Progressive application of dead variable elimination will then delete
> the entire chain of updates to the heap limit variable, eliminating the
> side effects that would normally be induced by malloc(). They will also
> delete the uncaptured pointer result from malloc().
>
> 4. Since the result pointer is not captured, and the heap limit variable
> is not consulted, the entire result of the call to malloc() is the
> return of a non-NULL pointer.
>
> 5. Constant folding then reduces the IF condition (malloc() == NULL) to
> false.
>
> 6. Dead code elimination then reduces the if/then/else to simply the
> else body.
>
> 7. Optimizer now reduces the whole thing to "return 1".
>
>
> Is this more or less the chain of events here (or if not this, then
> something of a similar nature)?
I think it's a valid set of transformations, but I don't think it's
the only approach that can lead to this result. I was thinking that a
compiler could be aware of the semantics of malloc: It wouldn't
necessarily "inline" the call, but if it could prove that the result
was not leaked or "used" (indirected), then a non-heap address could
be produced. Alternatively, the malloc call could be promoted to
alloca (as Chris noted in his reply, only if a small bound on the
allocation is known, which is clearly the case here), and then it
becomes akin to eliminating an ordinary dead local variable.
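To make that concrete, here is a rough source-level sketch of the reduction
being described (illustrative only; per the discussion below, the original
example used main with unused parameters, which I drop here):

#include <stdlib.h>

/* Sketch of the example under discussion: the pointer returned by malloc
   neither escapes nor is dereferenced, so under the reasoning above the
   allocation can be treated as one that cannot fail. */
int main(void) {
    if (malloc(sizeof(int)) == NULL) { return 0; } else { return 1; }
}

/* After promoting the allocation to a dead local (or to a unique non-heap
   address) and folding the comparison, this is equivalent to:

   int main(void) { return 1; }
*/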
>
>
> If so, I understand and concur that this optimization is valid in this
> example, primarily due to perverse and non-obvious consequences of the
> particular bit of sample code.
>
> Just to cement my understanding, I suspect that *either* of the
> following two changes to the sample code would make this optimization
> invalid:
>
> 1. Rename main() to f()
>
> 2. Introduce a call to some external procedure anywhere in main().
>
> In either case, the compiler can no longer witness all reachable calls
> to malloc(), and must assume conservatively that the internally produced
> side effect results of malloc() may be sourced by other units of
> compilation, and therefore cannot be elided.
>
> Is this correct?
I'm of the opinion that even with such changes the optimization is
valid. Let me rename main to f (and drop the unused parameters) and
add calls to a function g. The original example then becomes:
#include <stdlib.h>

void g(void);   /* declared here so the fragment is self-contained; defined elsewhere */

int f() {
    g();
    if (malloc(sizeof(int)) == NULL) { return 0; } else { return 1; }
    g();
}
The malloc value cannot escape and the allocated storage isn't used
(in other words, it can be "garbage collected" right away). So it can
be replaced by a unique address (though the same address can be used
for each invocation, IMO).
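To illustrate that substitution (a sketch only; the name below is made up,
and a compiler would do this on its intermediate representation rather than
in source):

void g(void);                     /* defined in some other translation unit */

/* Hypothetical stand-in for the unique, necessarily non-null address. */
static char reserved_block[sizeof(int)];

int f() {
    g();
    /* malloc(sizeof(int)) replaced by the unique non-heap address */
    if ((void *)reserved_block == NULL) { return 0; } else { return 1; }
    g();
}

The address of a static object can never compare equal to NULL, so the
condition folds to false and the function is in turn equivalent to
"g(); return 1;".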
Note that within the confines of the C (or C++) standard, g has no way
of observing the result of the call to malloc. (We might "know" that
there is a static-lifetime variable somewhere that changed, but that
variable isn't required by the standard, and a compiler may instead
posit that for this very call to malloc, a separate heap is used.)
There is (at least) a subtlety with that reasoning in C having to do
with "lifetime": The allocated _object_ has a lifetime (7.20.3/1) that
extends until deallocation. My view is that the storage only becomes
an object when the pointer returned is bound to a typed pointer
(7.20.3/1). If that's right, every instance of the particular malloc
call can produce the same value; if not, we'd need to be able to bound
the number of instances of that call. (C++ has a convenient "conformance
within resource limits" clause that provides a different way out.)
> Hmm. No. I can weasel out of this in case (2), because the standard does
> not require that calls to malloc() be processed in order, so in some
> cases calls to malloc() appearing in main() can still be eliminated by
> logically rearranging them to occur before any of the external calls to
> malloc(). I bet LLVM doesn't know about this. :-)
>
>> (I should have been more careful in my earlier response and clarified
>> that modifying a nonvolatile object is a side-effect that can be
>> optimized away in many cases.)
>
> If I understand the standard correctly, this can be done exactly if
> there is no witness to the side effect. Furthermore, chains of
> intermediate unwitnessed side effects can be collapsed if the optimizer
> can prove that the intermediate results are unwitnessed. Yes?
Yes, I think so.
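(A made-up illustration of that, for what it's worth: the updates below are
side effects on a nonvolatile object with no witness, so the whole chain can
be collapsed.)

/* 'counter' has internal linkage, is not volatile, and is never read,
   so every store to it is an unwitnessed side effect.  A conforming
   compiler may delete the entire chain of updates and reduce work()
   to "return 42;". */
static unsigned counter;

int work(void) {
    for (int i = 0; i != 1000; ++i)
        counter += i;             /* dead stores: nothing observes them */
    return 42;
}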
>> ...However, the optimization is allowed even without that: Just by noting
>> that the return value of malloc isn't used (nor leaked); I suspect
>> that's what the compiler does in this case (though I don't know).
>
> I think that is not quite strong enough. If there is any possibility of
> preceding or succeeding calls to malloc() in compilation units that are
> not visible at compile time, then the compiler cannot determine that the
> current call to malloc() must succeed or that its side effects are
> unwitnessed.
See above. If the allocation is small enough that it can be modelled
by a "reserved heap", I think it can.
> I suppose that the compiler can do special case things in main on the
> theory that these calls to malloc() may be executed out of order by a
> conforming implementation, but I suspect that goes well beyond the
> cleverness of the current LLVM implementation. It's the sort of thing
> that the VAX/VMS optimizer or the IBM PL.8 research optimizer might
> actually have done. And certainly the Bulldog optimizer.
(I should probably mention that I'm no optimizer pro. So my reasoning
is based on language specs.)
>>> Setting the matter of the standard entirely aside, the currently
>>> implemented behavior deviates so blatantly from common sense and
>>> universal convention that it really must be viewed as a bug.
>
> Just to be explicit: I now see that my statement above is wrong in the
> present example.
>
> Daveed: Thanks for being patient on this. I appreciate your willingness
> to help me see this one.
No problem at all. I should probably worry that I find ISO exegesis
mildly fun :-P
Daveed