[llvm-dev] pointer provenance introduction to load/store instructions

Tue Jun 15 04:31:55 PDT 2021

>> As far as I understand, your goal is to declare what's the set of 
>> objects a pointer may alias with at a memory dereference operation.
>> For example, you may want to say that the following load can only 
>> dereference objects pointed-to by %p and %q:
>> %v = load i8, i8* %r
>> 
>> If %r aliases with some object other than %p/%q, the load triggers UB.
>> This allows you to do better AA.
> 
> Yes, this should make it possible to optimize something like:
> 
> #include <stdint.h>
>
> int foo(int *a, int *b) {
>   if ((uintptr_t)a + 4 == (uintptr_t)b) {
>     return b[0];
>   } else {
>     return a[1];
>   }
> }
> 
> to something like (pseudo code, assuming 32-bit pointers and a 4-byte int):
>   %a.gep = getelementptr i32, i32* %a, i32 1
>   %c = icmp eq i32* %a.gep, %b                         ; This will not result in any code
>   %prov = select i1 %c, i32* %b, i32* %a               ; This will also not result in any code
>   %result = load i32, i32* %a.gep, ptr_provenance i32* %prov
>   ret i32 %result
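A C-level sanity check of that transformation, as a minimal sketch: `foo_folded` is a hypothetical name for what the rewritten IR computes (one load through `a + 1`), and the code assumes `sizeof(int) == 4`. When `a` and `b` point into the same array, both functions return the same value. When they point to two distinct but adjacent objects, `foo_folded` reads `a[1]` outside the object `a` points to, which is UB in C; that is the case the `ptr_provenance i32* %prov` operand is meant to make well-defined in the IR.

```c
#include <stdint.h>

/* The function from the example, with the condition's parentheses balanced. */
int foo(int *a, int *b) {
  if ((uintptr_t)a + 4 == (uintptr_t)b) {
    return b[0];
  } else {
    return a[1];
  }
}

/* Hypothetical folded form: both branches load from the same address a + 1.
 * Assumes sizeof(int) == 4, so a + 1 and (uintptr_t)a + 4 coincide.
 * In C this is only defined when a[1] lies inside the object a points to;
 * the IR version instead attaches b's provenance via ptr_provenance. */
int foo_folded(int *a, int *b) {
  (void)b; /* in the IR version b only contributes provenance, not an address */
  return a[1];
}
```

For example, with `int arr[4]`, both `foo(arr, arr + 1)` and `foo_folded(arr, arr + 1)` return `arr[1]`.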

This approach is the reverse of what I was thinking. Instead of restricting provenance, you are adding provenance. This is a more dangerous approach: the provenance information can then never be deleted, because it's required for correctness. The other way around, provenance information only aids optimization and is not required for correctness, so it can be dropped.
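A toy model of the distinction, as a sketch of my own and not taken from either proposal: provenance is a bitset of object IDs, and an access is defined only if the object it touches is in the provenance the load carries. `prov_set`, `access_defined`, `restrict_prov`, and `replace_prov` are hypothetical names. Narrowing provenance can always be undone by dropping the hint, because the result is then a superset; replacing it cannot, because the fallback (the address operand's own provenance) may not include the object being accessed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model: objects are IDs 0..31; a provenance is a bitset of IDs. */
typedef uint32_t prov_set;

/* The access is defined only if the touched object is in the provenance. */
bool access_defined(unsigned touched_obj, prov_set prov) {
  return (prov >> touched_obj) & 1u;
}

/* Restrictive hint: intersect with the address operand's provenance.
 * Dropping the hint gives a superset, so every access that was defined
 * stays defined; only alias-analysis precision is lost. */
prov_set restrict_prov(prov_set operand, prov_set hint, bool dropped) {
  return dropped ? operand : (operand & hint);
}

/* Replacing provenance: the load uses an explicit provenance instead of the
 * operand's. Dropping it falls back to the operand's provenance, which may
 * not contain the touched object, turning a defined access into UB. */
prov_set replace_prov(prov_set operand, prov_set explicit_prov, bool dropped) {
  return dropped ? operand : explicit_prov;
}
```

In the `foo` example, `%a.gep` has provenance {a} but touches object b when the two objects are adjacent: with the explicit provenance {b} the access is defined, and after dropping it the access is not.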

So the main caveat of the proposal is that every single optimization touching memory operations needs to learn how to preserve & handle this new provenance information. Maybe all the changes will come down to just AA & a few utility functions, but still, every place that creates or copies a memory operation needs to be audited.
In general, it's good practice to add new features to the IR such that they can be ignored by existing code that doesn't know about them.

>> This is useful when you have the restrict keyword in a function 
>> argument and you inline that function. LLVM right now has no way to 
>> restrict aliasing per scope or operation, just per function.
>> (we have seen the same story with every other attribute.)
>> 
>> The goal sounds useful, though it would be nice to see some
>> performance numbers, as this is a complex feature and we need to
>> understand if it's worth it.
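A minimal C sketch of the inlining scenario (function names are hypothetical): the `restrict` qualifiers promise that `dst` and `src` do not overlap for the duration of the call to `scale`, which lets the compiler reorder or vectorize the loads of `src[i]` past the stores to `dst[i]`. Once `scale` is inlined into `caller`, that promise should cover only the inlined loop, which a per-function attribute on `caller` cannot express.

```c
/* restrict: within this call, dst and src do not overlap. */
static void scale(int *restrict dst, const int *restrict src, int n, int k) {
  for (int i = 0; i < n; ++i)
    dst[i] = src[i] * k;
}

/* caller's own parameters carry no restrict promise. Once scale() is
 * inlined here, the no-overlap guarantee applies only to the inlined loop,
 * i.e. to a scope inside caller(), not to caller() as a whole. */
void caller(int *out, const int *in, int n) {
  scale(out, in, n, 3);
}
```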
> 
> What kind of performance numbers are you interested in?

I think the first question is around benefits: Are there benchmarks we care about that benefit from this patch? Are there regressions? Even though the extra code is not materialized in assembly, it still exists and may interact with the inliner heuristics, for example.

> This is true. In my view, that discussion is more or less orthogonal to what the Full Restrict patches add. For Full Restrict we do need to track the (noalias) provenance (this is needed for the 'based-on' rule). For that a number of helpers were introduced:
> - llvm.noalias : adds 'restrict/noalias' information to a pointer
> - llvm.provenance.noalias : adds 'restrict/noalias' information to a pointer (ptr_provenance path)
> - llvm.noalias.arg.guard : combines a computational path with a ptr_provenance path:
> -- Only Load and Store have an explicit ptr_provenance argument
> -- Other places where the provenance must be tracked (when storing the pointer, when passing it to a function, when returning it),
>   the result of the 'llvm.noalias.arg.guard' is used, as that tracks both sides.
> - llvm.noalias.copy.guard : annotates that a pointer points to a memory block containing restrict pointers.
> -- This allows SROA to identify that a restrict pointer is copied when splitting up load/store of aggregates or
>    replacing memcpy.
> 
> So, assuming that a memcpy and aggregate load/store propagate provenance, this allows us to keep track of that provenance.
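A C sketch of the source patterns these helpers target, as I read the list: a pointer based on a `restrict` parameter that is stored, passed to a call, and returned (the places where the `llvm.noalias.arg.guard` result would be used), and a struct with a `restrict` member copied with `memcpy` (the case `llvm.noalias.copy.guard` annotates for SROA). All names are hypothetical, and the comments describe intent only; the IR the patches actually emit is not shown.

```c
#include <string.h>

/* Hypothetical callee: its argument must carry p's noalias provenance,
 * not just the address. */
static int read_first(const int *q) { return q[0]; }

int *saved_ptr; /* storing the pointer lets it escape the local path */

int *escape_points(int *restrict p) {
  int *derived = p + 1;         /* "based on" p, so still under p's restrict */
  saved_ptr = derived;          /* store of the pointer value */
  int v = read_first(derived);  /* passed as a call argument */
  derived[0] = v + 1;           /* plain store: the case with a ptr_provenance operand */
  return derived;               /* returned from the function */
}

/* A struct with a restrict member: copying the struct copies the pointer
 * together with its restrict meaning. */
struct view {
  int *restrict data;
  int len;
};

/* If SROA replaces this memcpy with scalar loads/stores, the copied
 * restrict pointer's provenance is lost unless the source block is marked
 * as containing restrict pointers. */
int sum_copy(const struct view *src) {
  struct view local;
  memcpy(&local, src, sizeof local);
  int s = 0;
  for (int i = 0; i < local.len; ++i)
    s += local.data[i];
  return s;
}
```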

Thanks for this quick summary! I need to think more about this explicit provenance tracking and how far can we stretch it. This stuff is not trivial :)

Nuno