[PATCH] D131628: [LangRef] Add description for nocallback attribute

Mon Aug 15 11:56:56 PDT 2022

nlopes added a comment.

In D131628#3723580 <https://reviews.llvm.org/D131628#3723580>, @jdoerfert wrote:

> In D131628#3721285 <https://reviews.llvm.org/D131628#3721285>, @nlopes wrote:
>
>> In D131628#3719212 <https://reviews.llvm.org/D131628#3719212>, @jdoerfert wrote:
>>
>>> In D131628#3719136 <https://reviews.llvm.org/D131628#3719136>, @nlopes wrote:
>>>
>>>> Anyway, the semantics issue pointed out above must be resolved or else the patch must be reverted as currently the lowering of clang is wrong and will miscompile programs.
>>>
>>> Let's just resolve this now and here.
>>>
>>> I figured us using it implies we need it. However, here is some data: https://tianshilei.me/wp-content/uploads/2022/06/ipdps-2022.pdf
>>
>> My question is normal and reasonable.
>
> My answers are too :). One thing that is missing from this conversation is an example of a breakage. Or did I miss that?

No. Your answer is on the line of "trust me, I know what I'm doing". That's not an ok answer. We have the right for a proper answer with numbers.

Regarding the bug, you said yourself and I quote "You are not wrong about LTO", so I assumed you understood the issue. I left the sketch of the bug in the other thread nevertheless.

>> I ask it every time someone wants to add something new to the IR.
>> The LLVM community must know why it has to deal with extra complexity forever. I will refuse any IR feature that gives 0.00001% speedup for 1 benchmark.
>
> That is great, but not the case here.

Numbers please?

>>> Note section IV-B2 which talks about inter-procedural reachability, that is only possible with nocallback on intrinsics (and leaf on malloc and free in some cases).
>>
>> No numbers are provided. My question still stands.
>
> There are numbers in the paper, both for the impact of the entire inter-procedural store-load forwarding as well as the different parts.
> All OpenMP GPU benchmarks benefit from this substantially, especially once all paper parts have been upstreamed. 
> That is why we did the study (=paper) and prototype. Cleaning it up and upstreaming simply takes time.

I would appreciate if you could summarize the impact of this feature. There are no numbers in the section IV-B2 you pointed out.

>> What's the impact of this attribute in your reachability analysis?
>
> It allows to avoid spurious call edges due to intrinsics and annotated functions (see above) instead of treating intrinsics as if they would be implicitly nocallback.
>
> Below, we want to know if `g()` can be reached by `f()`. We can only be answered with "no" if we have `nocallback` on llvm.memset.
>
>   void kernel() { 
>    g();
>    f();
>   }
>   static void f() {
>     llvm.memset(...) 
>   }
>   static void g() { ... }

The address of the functions must have been stored to memory, otherwise the functions are not callable indirectly.
memset is writeonly, so it can't call anything as it can't read any pointer from memory. Problem solved. memcpy is argmemonly.
I know the argument doesn't apply everywhere, but how many programs write functions pointers to memory and call some intrinsic where you cannot deduce it won't read those pointers? It's a very tight condition.
Hence I ask again: how impactful is this feature?

>> Nevertheless, we cannot introduce a new IR feature that we know that is buggy and without a plan to fix it. Absolutely no way! The patch must be reverted first. Then you can propose a plan to reintroduce the feature in a sound way if it's justified. We need numbers please.
>
> The feature is not buggy. LTO works fine as the `inaccessiblememonly` is actually not hardcoded but a side-effect.
> Let's do the test:
> `a.ll`
>
>   define i32 @main() {
>   entry:
>     %r1 = call i32 @foo1()
>     %r2 = call i32 @foo2()
>     ret i32 %r2
>   }
>   
>   declare i32 @foo1() inaccessiblememonly nocallback
>   declare i32 @foo2() inaccessiblememonly nocallback
>
> `b.ll`
>
>   @g = internal global i32 0
>   
>   define i32 @foo1() noinline {
>   entry:
>     %l = load i32, i32* @g
>     %a = add i32 %l, 1
>     store i32 %a, i32* @g
>     ret i32 %l
>   }
>   define i32 @foo2() {
>   entry:
>     %l = load i32, i32* @g
>     %a = add i32 %l, 1
>     store i32 %a, i32* @g
>     call i32 @foo1()
>     ret i32 %l
>   }
>
> Now we run LTO:
> All attributes we needed to drop are gone.

Thank you for the example!
That shows the linker already drops the nocallback attribute. However, it seems the linker only drops the attribute when linking in the definition. But I don't think this is sufficient. nocallback means you don't call any function in that file. Once you bring in more functions you can't ensure you won't call those extra functions. You'll get the call graph potentially wrong.
The linker must always drop nocallback.

I also suggest that the documentation should mention the attributed must be dropped during linking.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131628/new/

https://reviews.llvm.org/D131628