[LLVMdev] RFC: attribute for a pointer which is dereferenceable xor null

Fri Feb 13 09:48:12 PST 2015

On 02/12/2015 10:20 PM, Hal Finkel wrote:
> ----- Original Message -----
>> From: "Philip Reames" <listmail at philipreames.com>
>> To: llvmdev at cs.uiuc.edu
>> Sent: Thursday, February 12, 2015 11:59:17 AM
>> Subject: [LLVMdev] RFC: attribute for a pointer which is dereferenceable xor null
>>
>>
>> I'd like to propose that we add an attribute which expresses the
>> notion that the specified value is either null or dereferenceable up
>> to a fixed size. (Note the xor.) Our current dereferenceable(n)
>> attribute doesn't quite fit the bill: it implies that the pointer is
>> non-null. Similarly, our nonnull attribute says nothing about
>> dereferenceability.
>>
>> There are two syntax proposals below, but let's start with the
>> motivation.
>>
>> These semantics arise in a number of common cases:
>> - In C, malloc is defined to either return null, or a dereferenceable
>> region of the size requested.
> I think this is really only useful if we allowed 'n' to be a runtime value.
For malloc, you might have a point.  However, I believe that the same is
true for operator new, and the size will frequently be a compile-time
constant there.

I am not proposing that 'n' be a runtime value.  I am not opposed to
it, but it's not part of this proposal.
>
>> - In Java, any reference is either null or dereferenceable to the
>> size of the static type.
>> - I suspect this will also be useful for Julia, Go, Rust, and others
>> for similar reasons.
>>
>> With such an attribute available, we can increase the effectiveness
>> of LICM. We can't move a load outside a loop if it might introduce a
>> fault. Knowing that a pointer is dereferenceable(N) at a location
>> (i.e. the loop preheader) allows us to satisfy this constraint. In
>> the near term, we can simply add a case in the dereferenceability
>> analysis that combines the new attribute and isKnownNonNull. This
>> won't be too effective out of the box, but will enable testing with
>> llvm.assumes and might catch some cases. I will probably also add a
>> case to look at the controlling branch to the loop preheader since
>> in practice that tends to be where an unswitched null check would
>> live.
>>
>> Longer term, I plan on introducing a mechanism to have isKnownNonNull
>> consider trivially dominating conditions. This will make the
>> proposed attribute more powerful, but is explicitly not part of this
>> proposal. That's a lot more work and will need a fair amount of
>> discussion on its own.
>>
>> Now, on to possible syntax.
>>
>> Option 1
>> We could simply redefine our current notion of dereferenceable(N) to
>> allow the pointer to be null. Since we already have the nonnull
>> attribute, this wouldn't lose any expressibility. Frontends would
>> need to be modified to emit both dereferenceable(N) and nonnull if
>> they want to preserve the same semantics. Most of the existing
>> utility functions for dereferenceability in LLVM would be modified
>> to just check both. There'd need to be a forward migration added to
>> the bitcode reader to enable upgrade from the old semantics to the
>> new.
>>
>> This is my preferred option, but in offline conversation, Hal
>> objected to this change. I'll let him describe his objection since I
>> was never quite clear on it.
> I feel this would be all pain and no gain. We already have the dereferenceable attribute, and a fair amount of code now exists which depends on the current semantics. Introducing a silent semantic change now requires, at least, all producers to be updated. Plus it would be confusing; we currently assume that dereferenceable pointers in address-space zero are not null (and optimize based on that). 'dereferenceable' is the terminology we use for that (not 'dereferenceableAndNotNull'), and I don't like the proposed inconsistency with our API. Lastly, it would be inconsistent with its name: a null pointer in address-space zero is not dereferenceable.
I think this is a far smaller change than you're indicating.  There are
only a handful of places in the code base that directly access the
attributes; we'd extend them to check 'new deref' and 'nonnull' at
once.  As a result, most of the APIs would be semantically unchanged.
We might want to rename them, but that's a separate and less risky change.
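
As a rough illustration of "check both at once", here is a toy model
in C++.  It is only a sketch under my own assumptions: the struct and
function names are hypothetical and are not LLVM's attribute API.

```cpp
#include <cstdint>

// Toy model of what is known about one pointer value. All names here
// are hypothetical; this is not LLVM's real attribute API.
struct PointerFacts {
  std::uint64_t DerefOrNullBytes = 0; // 'new deref': null, or N valid bytes
  bool NonNull = false;               // nonnull attribute or isKnownNonNull
};

// Bytes that are known to be safe to load. Under the redefined
// semantics the size only counts once null has been ruled out.
std::uint64_t knownDereferenceableBytes(const PointerFacts &P) {
  return P.NonNull ? P.DerefOrNullBytes : 0;
}

// Old-style query: is the pointer dereferenceable for Size bytes?
bool isDereferenceable(const PointerFacts &P, std::uint64_t Size) {
  return Size <= knownDereferenceableBytes(P);
}
```

Callers that only go through a query like isDereferenceable would see
the same answers as before, provided frontends emit both attributes.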

Your naming point is a reasonable one.  I'm more okay with the
separation between "this has a dereferenceable attribute (but might
still be null)" and "this pointer is dereferenceable".  I think that in
practice this is likely to cause less confusion than introducing a
parallel attribute.
>
>> Option 2
>> We introduce a new attribute with the desired semantics. This results
>> in a collection of confusing overlapping attributes, but is
>> otherwise straightforward.
>>
>> My proposed strawman syntax would be: dereferenceable_or_null(N).
>> (Bikeshedding welcomed.) This would be a legal parameter and return
>> attribute on both function declarations and call sites (i.e. calls
>> and invokes). As above, we'd extend all the places that
>> currently consider 'dereferenceable' to consider the new attribute
>> in combination with isKnownNonNull.
> Okay; I don't object to this attribute. Just so we're on the same page, what is your use case? Is it like the Java case you mentioned above? Also, I wonder: Are you satisfied with the static size constraint, or do you also want runtime sizes?
>
My use case is the Java object case.  I do not need runtime sizes.
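
To make that concrete, here is a hand-written C++ analogue.  This is a
sketch only: the names are hypothetical and the hoisting is done by
hand to show what LICM would be permitted to do, not produced by any
pass.  A reference is either null or points to a whole object, so once
the null check guarding the loop has passed, the field load cannot
fault and may move to the preheader even if the loop runs zero times.

```cpp
#include <cstddef>

// Stands in for a Java object whose size is known from its static type.
struct Box { int Scale; };

// Before: B->Scale is loaded inside the loop on every iteration.
int sumScaled(const Box *B, const int *Data, std::size_t N) {
  if (!B) return 0;               // null check dominating the loop
  int Sum = 0;
  for (std::size_t I = 0; I < N; ++I)
    Sum += B->Scale * Data[I];
  return Sum;
}

// After: the load is hoisted to the preheader. This is safe because B
// is known non-null here and therefore dereferenceable for sizeof(Box).
int sumScaledHoisted(const Box *B, const int *Data, std::size_t N) {
  if (!B) return 0;
  const int Scale = B->Scale;     // executes even when N == 0
  int Sum = 0;
  for (std::size_t I = 0; I < N; ++I)
    Sum += Scale * Data[I];
  return Sum;
}
```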

Philip