[PATCH] D26437: Use -fno-unit-at-a-time and -funit-at-a-time

Stanislav Mekhanoshin via llvm-commits llvm-commits at lists.llvm.org
Fri Nov 11 14:44:27 PST 2016


rampitec added a comment.

In https://reviews.llvm.org/D26437#593300, @jlebar wrote:

> Ah, I think I understand your problem.
>  The problem is that you are explicitly marking this function as convergent, and then are surprised that LLVM is removing the convergent attribute.
>  Is that right?


Right. Note, it is marked in the standard header. I can remove attribute from the implementation, it does not change anything.

> This is WAI for how we have interpreted the convergent attribute in clang and llvm.  Maybe our intended behavior is wrong, we can discuss it.  But first I want to be really clear that this has nothing to do with running IPOs, or compiling a bitcode library.  You would have exactly the same problem, and exactly the same bug, if your testcase were the whole program.  Does that make sense?

In fact when OpenCL conformance test subgroups is compiled in the whole program mode. It does hang right now and passes if I compile library to preserve convergent attribute on sub_group_barrier.

> As I said above:
> 
>> clang adds the "convergent" attribute to every function it emits. llvm then removes this attribute from functions that do not contain convergent instructions or call convergent functions.

This statement confuses me, that is not what I observe:

  void foo(void)
  {
  }

So you say clang will add convergent attribute here and llvm will remove it. Let's see:

clang -o sb0.bc -c -emit-llvm  -x cl -cl-std=CL2.0 -target amdgcn-amd-amdhsa-opencl -O0 sb0.cl

I'm using -O0, so llvm does not optimize (note, if I use -O0 for original sub_group_barrier implementation then attribute is preserved as well, so I'm sure -O0 does not run this optimization). Result:

  ; Function Attrs: nounwind
  define void @foo() #0 {
  entry:
    ret void
  }
  
  attributes #0 = { nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-features"="+fp64-denormals,-fp32-denormals" "unsafe-fp-math"="false" "use-soft-float"="false" }

I do not see convergent attribute added.

> This is the reason that you are seeing what you are seeing: llvm is removing the "convergent" attribute from this function because it does not contain any convergent instructions.  Again, it has nothing to do with bitcode libraries.  Indeed, if we made the change proposed here, it would likely not fix your bug.  As soon as you link this bitcode library into another library and run the IPOs over the combined result, it's going to again remove the convergent attribute on this function.

I can tell you for sure that after linking with actual main module and optimizing in whole mode this really fixes the bug ;)

>> You may argue that atomic_work_item_fence does not have convergent attribute, but it should not in fact.
> 
> Can you elaborate on why atomic_work_item_fence should not have the convergent attribute but sub_group_barrier should be convergent?
>  Specifically, the convergent attribute prevents additional control-flow dependencies from being added to a callsite.  Are you saying that it is incorrect to convert
> 
>   sub_group_barrier();
>   if (a) { ... }
>   else { ... }
>    
> 
> into
> 
>   if (a) { sub_group_barrier(); ... }
>   else { sub_grouop_barrier(); ... }
>    
> 
> but it *is* ok to perform the same transformation on atomic_work_item_fence?  Why?

I suppose so, because fence does not act as a barrier, just makes memory visible.

>> Then I can remove a body of the sub_group_barrier completely so it does not call anything at all, result is still the same.
> 
> Yes, there is currently no way for you to say "this empty function must be convergent."
>  Can you explain why this is an important feature?

That is not important. That was just to show that lack of attribute on fence is not a problem per-se.

>> Also note that besides functions attribute pass PassManagerBuilder adds a lot of other passes under unit-a-time-mode which are not expected to run in a non-whole program mode, we have there IPSCCP, GlobalOptimizer, DeadArgElimination etc. All of that is not supposed to run on a library module.
> 
> If it's OK with you, I'd like to focus on the convergent issue first.  Once we get that resolved, Richard Smith should be able to speak to your larger question of funit-at-a-time.

OK, even though I believe the latter is even more important.


Repository:
  rL LLVM

https://reviews.llvm.org/D26437





More information about the llvm-commits mailing list