[llvm-dev] [RFC] A nofree (and nosynch) function attribute: Mixing dereferenceable and delete

Tue Jul 10 19:12:25 PDT 2018

Hi Hal,

I'm interested in this functionality and the overall idea of inferring
things from the function body to turn into attributes. I'm looking at
this from the XRay instrumentation angle.

Overall, this is a +1 from me. Some questions below though:

On Wed, Jul 11, 2018 at 12:01 PM Hal Finkel via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> Hi, everyone,
>
> I'd like to propose adding a nofree function attribute to indicate that
> a function does not, directly or indirectly, call a memory-deallocation
> function (e.g., free, C++'s operator delete). Clang/LLVM can currently
> misoptimize functions that:
>
>  1. Have a reference argument.
>
>  2. Free the memory backing the object to which the reference is bound
> during the function's execution.
>
> Because we tag, in Clang, all reference arguments using the
> dereferenceable attribute, LLVM assumes that the pointer is
> unconditionally dereferenceable throughout the course of the entire
> function. This isn't true, however, if the memory is freed during the
> execution of the function. For more information, please see the
> discussion in https://reviews.llvm.org/D48239.
>
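
A minimal sketch of the pattern described above, for concreteness. The
names `Widget`, `finish`, and `demoFinish` are hypothetical and do not come
from D48239. The source code is well defined: it only reads `w` again when
the object was not released. The concern is that an optimizer which treats
`w` as dereferenceable for the whole function might speculate the second
load above its guard, and so execute it after the delete.

```cpp
#include <cassert>

// Hypothetical type, used only for illustration.
struct Widget { int value; };

// `w` is a reference, so Clang tags the parameter as dereferenceable.
int finish(Widget &w, bool release) {
    int snapshot = w.value;   // valid: the object is alive on entry
    if (release)
        delete &w;            // frees the memory backing the reference
    // The source only touches `w` again when it was NOT released.
    // Hoisting this load above the branch would read freed memory.
    if (!release)
        snapshot += w.value;
    return snapshot;
}

// Small usage example exercising both paths.
void demoFinish() {
    Widget *heap = new Widget{7};
    assert(finish(*heap, true) == 7);    // object released inside finish
    Widget local{5};
    assert(finish(local, false) == 10);  // object read twice, not released
}
```
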
> To solve this problem, we need to give LLVM more information in order to
> help it determine when a pointer, which is dereferenceable when the
> function begins to execute, will still be dereferenceable later on in
> the function's execution. This nofree attribute can be part of that
> solution. If we know that free (and friends) are not called by the
> function (nor by any function called by the function, and so on), then
> we know that pointers that started out dereferenceable will stay that
> way (except as explained below).
>
> I'm initially proposing this to be only a function attribute, although
> one could easily imagine a parameter attribute as well (that indicates
> that a particular pointer argument is not freed by the function). This
> might be useful, but for the use case of helping dereferenceable, it
> would be subtle to use, unless the parameter was also marked as noalias,
> because you'd need to know that the parameter was not also aliased with
> another argument (or had not been captured). Another analysis would need
> to provide this kind of information.
>
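
The aliasing subtlety can be sketched as follows, assuming a hypothetical
per-parameter "not freed" marking on `kept`. `Buffer`, `releaseSecond`, and
`demoReleaseSecond` are made-up names. The function never frees through
`kept`, yet `kept` still dies if the caller passes the same object through
the second parameter.

```cpp
#include <cassert>

// Hypothetical type, used only for illustration.
struct Buffer { int size; };

// `kept` is never freed *through this parameter*, so a per-parameter
// attribute could plausibly be attached to it. Returns true when both
// parameters named the same object, i.e. when `kept` was freed anyway.
bool releaseSecond(Buffer &kept, Buffer *released) {
    bool aliased = (&kept == released);  // compared before the delete
    delete released;
    return aliased;
}

// Small usage example: one aliasing call, one non-aliasing call.
void demoReleaseSecond() {
    Buffer *shared = new Buffer{1};
    assert(releaseSecond(*shared, shared));   // kept is gone afterwards
    Buffer local{2};
    assert(!releaseSecond(local, new Buffer{3}));
}
```

Without noalias (or an equivalent analysis), the marking on `kept` alone
would not tell the optimizer which of the two cases it is in.
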
> Also, just because a function does not, directly or indirectly, call
> free does not mean that it cannot cause memory to be deallocated. The
> function might communicate (synchronize) with another thread causing
> that other thread to delete the memory. For this reason, to use
> dereferenceable as we currently do, we also need to know that the
> function does not synchronize with any other threads. To solve this
> problem, like nofree, I propose to add a nosynch attribute to indicate
> that a function does not use (non-relaxed) atomics or otherwise
> synchronize with any other threads (e.g., perform I/O or, as a practical
> matter, use volatile accesses).
>

How far does the attribute go? For example, does it propagate up the
caller stack?

This might be a basic IR question, but I suppose this only works for
definitions in the same module. I wonder whether the attribute can be
asserted/added on declarations, and whether we can somehow ensure at
link time that the attribute holds. For example, a function declaration
might say `nofree` today while the implementation later changes to free
memory; how might we guard against that?

Will this also extend/change the default attributes that are defined
for the intrinsics? XRay has a couple of intrinsics that have a number
of attributes, and I imagine some other intrinsics for the sanitizers
would need to learn about the attribute as well.

How extensively do we expect attributes like this to need handling in
transformations such as inlining, outlining, partial inlining, etc.?

Is the default assumption going to be that a function that isn't
marked `nofree` *will* free, with the optimizer pessimizing accordingly?
Would it make more sense to have a positive attribute, say `frees`, and
relax the default assumption to "does not free"?
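
One way to picture the propagation and default-assumption questions is a
toy bottom-up inference over a call graph. This is only a sketch under
stated assumptions, not the code in D49165: it assumes the attribute is a
positive guarantee, so a declaration without a visible body, or any
unknown callee, is treated as possibly freeing. `ToyFunction`,
`ToyModule`, `inferNoFree`, and `demoInferNoFree` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model of a function; not an LLVM data structure.
struct ToyFunction {
    bool hasBody;                      // false for a bare declaration
    bool callsDeallocator;             // directly calls free/operator delete
    std::vector<std::string> callees;  // names of the functions it calls
};

typedef std::map<std::string, ToyFunction> ToyModule;

// Returns the names of the functions that could be marked nofree.
std::set<std::string> inferNoFree(const ToyModule &module) {
    std::set<std::string> noFree;
    // Optimistic start: every defined function with no direct deallocation.
    for (const auto &entry : module)
        if (entry.second.hasBody && !entry.second.callsDeallocator)
            noFree.insert(entry.first);
    // Iterate to a fixed point, dropping any function that calls
    // something not (or no longer) known to be nofree.
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto &entry : module) {
            if (noFree.count(entry.first) == 0)
                continue;
            for (const std::string &callee : entry.second.callees) {
                if (noFree.count(callee) == 0) {
                    noFree.erase(entry.first);
                    changed = true;
                    break;
                }
            }
        }
    }
    return noFree;
}

// Small usage example on a six-function toy module.
void demoInferNoFree() {
    ToyModule m;
    m["leaf"] = ToyFunction{true, false, {}};
    m["caller"] = ToyFunction{true, false, {"leaf"}};
    m["freer"] = ToyFunction{true, true, {}};
    m["callsFreer"] = ToyFunction{true, false, {"freer"}};
    m["external"] = ToyFunction{false, false, {}};
    m["callsExternal"] = ToyFunction{true, false, {"external"}};
    std::set<std::string> result = inferNoFree(m);
    assert(result.count("leaf") == 1 && result.count("caller") == 1);
    assert(result.count("callsFreer") == 0);
    assert(result.count("callsExternal") == 0);
}
```

In this model the property flows from callees to callers, and a body-less
declaration blocks it unless something outside the module vouches for it.
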

> I've posted a patch for the nofree attribute
> (https://reviews.llvm.org/D49165). nosynch's implementation would be
> very similar (except instead of looking for calls to free, it would look
> for uses of non-relaxed atomics, volatile ops, and known functions that
> are not I/O functions).
>
> With both of these attributes (nofree and nosynch), a function argument
> with the dereferenceable attribute will be known to be dereferenceable
> throughout the execution of the attributed function. We can update
> isDereferenceableAndAlignedPointer to include these additional checks on
> the current function.
>
> One more choice we have: We can, as I proposed above, essentially weaken
> the current semantics of dereferenceable to not exclude
> mid-function-execution deallocation. We can also add a second attribute
> with the current, stronger, semantics. Alternatively, we can keep the
> current attribute as-is, and add a second attribute with the weaker
> semantics (and switch Clang to use that).
>
> Please let me know what you think.
>

I've not worked out the full matrix of possibilities in my head yet,
but what are the risks of relaxing the default semantics and then
introducing the stronger attributes? Maybe you or someone else has
thought that through before; it would be great to have a summary, or
an idea of the pros and cons, of doing that instead of attempting to
infer non-freeing behaviour.

Cheers

-- 
Dean