[LLVMdev] optimization assumes malloc return is non-null

Sun May 4 19:29:21 PDT 2008

On May 1, 2008, at 3:39 PM, David Vandevoorde wrote:
>
> Not quite.  Although there is a requirement there (and more precise  
> ones in Clause 3), there is no prohibition against doing additional,  
> observable stuff (e.g., log the calls) and hence allocations cannot  
> be elided.

That's correct, there is no prohibition, but, if one does, there are  
no requirements placed upon the semantics of the program, none:

   --If  a program contains a violation of a rule for which no  
diagnostic
     is required, this International Standard places  no  requirement   
on
     implementations with respect to that program.

   1.4.12  undefined behavior                            
[defns.undefined]
   behavior, such as might arise upon use of an  erroneous  program   
con-
   struct  or  of  erroneous  data,  for  which  the  Standard imposes  
no
   requirements.

You may also want to read these which cover the basic idea:

   17.1.6  default behavior                       
[defns.default.behavior]
   a  description of replacement function and handler function  
semantics.
   Any specific behavior provided by the implementation, within the  
scope
   of the required behavior.

   17.1.14  replacement function                       
[defns.replacement]
   a reserved function whose definition is provided  by  a  C++   
program.
   Only  one definition for such a function is in effect for the  
duration
   of the program's execution, as the  result  of  creating  the   
program
   (_lex.phases_)  and resolving the definitions of all translation  
units
   (_basic.link_).

   17.1.15  required behavior                    
[defns.required.behavior]
   a description of replacement function and handler function   
semantics,
   applicable to both the behavior provided by the implementation and  
the
   behavior that shall be provided by any function definition in the  
pro-
   gram.   If  a  function  defined  in  a  C++ program fails to meet  
the
   required behavior when it executes, the behavior is undefined.

4 For non-reserved replacement and handler functions,  Clause  _lib.lan-
   guage.support_  specifies two behaviors for the functions in  
question:
   their required and default behavior.  The default behavior  
describes a
   function  definition  provided  by  the  implementation.  The  
required
   behavior describes the semantics of a function definition provided   
by
   either  the  implementation or a C++ program.  Where no distinction  
is
   explicitly made in the description,  the  behavior  described  is   
the
   required behavior.

1 Clauses _lib.language.support_ through _lib.input.output_ describe the
   behavior  of  numerous  functions defined by the C++ Standard  
Library.
   Under some circumstances, however, certain of these function   
descrip-
   tions  also  apply  to  replacement  functions  defined in the  
program
   (_lib.definitions_).

   17.4.3.6  Other functions                        
[lib.res.on.functions]

1 In certain cases (replacement functions, handler functions, operations
   on types used to instantiate standard  library  template   
components),
   the  C++ Standard Library depends on components supplied by a C++  
pro-
   gram.  If these components do not meet their requirements,  the   
Stan-
   dard places no requirements on the implementation.

2 In particular, the effects are undefined in the following cases:

   --for  replacement  functions  (_lib.new.delete_),  if  the   
installed
     replacement function does not implement the semantics of the   
appli-
     cable Required behavior paragraph.

   --for  handler  functions (_lib.new.handler_,  
_lib.terminate.handler_,
     _lib.unexpected.handler_), if the installed  handler  function   
does
     not  implement  the  semantics  of  the applicable Required  
behavior
     paragraph

These are meant to constrain the replacements as I've described.

> No, because both "delete" and "new" may have side effects.

No, please read the above paragraphs again.

> The problem is that with a user-written operator new, there may be
> other valid ways (besides operator new) to get at the storage
> returned.  The object lifetime rules don't always give such programs
> undefined behavior.

The reason why all that wording was included was to ensure the  
semantics of the implementation when the standard library uses the  
user's new/delete operator.  If the user does something funny and  
causes the library to not work, the standard places the responsibility  
squarely in the users lap.  We do that through the wording above.  If  
the user meets the requirements, then:

   a = new stack<int>;

produces no observable semantics.  If there were no such constraint,  
then that same line could produce:

   new allocated 24 bytes at 0xfe4238

on the standard output device (cout/stdout).  The only thing that  
ensures that line doesn't produce that output is the _hard_  
requirement on the users program.  Now, we knew that some users would  
want to put a print in new/delete implementations and do exactly as  
above.  We didn't have a good way to describe that we wanted to happen  
and we didn't spend the effort to describe it well, so, we choose the  
cheap way out and described exactly what they had to do, no more, no  
less for the standard to place any requirement on the semantics.

There would be little point in insuring the semantics of programs when  
replacements are used if the replacements could foster the reuse of in  
use memory.  At best, I think you either run into unspecified behavior  
(for reads) or undefined behavior (for writes) for allocations by the  
system routines.  Though, I do agree, the standard could be made more  
clear exactly what we meant.

> (I personally consider this a defect in the standard, but I don't   
> think that's unarguable.)

> Which part of the standard would I be missing that implies so?

   A C++ implementation pro-
   vides access to, and management of, dynamic  storage  via  the   
global
   allocation  functions  operator  new and operator new[] and the  
global
   deallocation functions operator delete and operator delete[].

The standard is not meant to be read outside the scope of english nor  
the usual and customary terms used by computer science.  It that were  
meant to be the case, we'd have a formal definition for C++.  The  
standard isn't a formal definition.

That says that new is an allocation function and that delete is a  
deallocation function.  These are the usual and customary CS terms.

> it may be the object lifetime rules.  I'm pretty sure it would work  
> if instead of
> "double", a type with a destructor was used.  Worth investigating?

Yes, there are various bits in the lifetime rules that also constrain:

5 Before the lifetime of an object has started  but  after  the  storage
   which the object will occupy has been allocated34) or, after the  
life-
   time  of  an  object has ended and before the storage which the  
object
   occupied is reused or released, any pointer that refers to the  
storage
   location  where the object will be or was located may be used but  
only
   in  limited  ways.   Such  a  pointer  refers  to  allocated    
storage
   (_basic.stc.dynamic.deallocation_),  and  using  the pointer as if  
the
   pointer were of type void*, is well-defined.  Such a  pointer  may   
be
   dereferenced  but  the  resulting  lvalue  may only be used in  
limited
   ways, as described below.

[ see the rest of the standard for all the gory details. ]

>>> A lot of nice guarantees that we have with malloc/free aren't
>>> available with new/delete.  Also, since new/delete can be overridden
>>> at any time (as late as runtime with LD_PRELOAD and friends),
>>
>> 3 The program's definitions are used instead  of  the  default
>> versions
>>  supplied  by  the  implementation  (_dcl.fct.def_).   Such
>> replacement
>>  occurs prior to program startup (_basic.def.odr_, _basic.start_).
>>
>> So, the replacement is done before start, if later, there are no
>> requirements.  And, the replacement has known semantics.
>
>
> But isn't that still too late?

Too late?  What could that possibly mean?  There is by definition  
nothing before the start.  Talking about something that happens first  
as being too late is non-sensical (by this, we merely mean outside the  
bounds of the standard and by that, we mean, the standard places no  
requirements on such a program).

> The optimizer often must make decisions way before that.

There is no optimizer in the standard, so there is no way to even talk  
about it.  All there is, are the abstract semantics when the program  
is run and the constraint on the actual implementation semantics that  
derive from the abstract semantics.

> (Plus, even though the standard currently fails to address dynamic  
> libraries, in practice implementations must keep things working  
> right.)

The standard places no requirements in the face of shared libraries, I  
know, it kinda sucks, but there it is.  The implementation is free to  
map onto the standard as it sees fit however.  The quality can be  
judged by the user however.