[PATCH] D26437: Use -fno-unit-at-a-time and -funit-at-a-time
Stanislav Mekhanoshin via llvm-commits
llvm-commits at lists.llvm.org
Fri Nov 11 13:42:15 PST 2016
rampitec added a comment.
The full testcase is rather big and requires a lot of setup, but I have created a really small one, sb1.cl:
#include <opencl-c.h>
__attribute__((overloadable, always_inline, convergent)) void
sub_group_barrier(cl_mem_fence_flags flags, memory_scope scope)
{
  if (flags)
    atomic_work_item_fence(flags, memory_order_acq_rel, scope);
}
The file opencl-c.h comes from clang. Note that the convergent attribute is added because of the following contents of that file:
#define __ovld __attribute__((overloadable))
#define __conv __attribute__((convergent))
void __ovld __conv sub_group_barrier(cl_mem_fence_flags flags, memory_scope scope);
Now I compile it to produce a library .bc module:
clang -o sb1.bc -c -emit-llvm -x cl -cl-std=CL2.0 -target amdgcn-amd-amdhsa-opencl -O3 -I <path to llvm>/llvm/tools/clang/lib/Headers sb1.cl
This is the result:
; Function Attrs: alwaysinline nounwind
define void @_Z17sub_group_barrierj12memory_scope(i32 %flags, i32 %scope) local_unnamed_addr #0 {
attributes #0 = { alwaysinline nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-features"="+fp64-denormals,-fp32-denormals" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-features"="+fp64-denormals,-fp32-denormals" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind }
The convergent attribute is missing. You may argue that atomic_work_item_fence does not have the convergent attribute, but in fact it should not have it. Moreover, if I remove the body of sub_group_barrier completely, so that it does not call anything at all, the result is still the same.
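To make the effect concrete, here is a rough toy model in Python of the kind of inference that appears to be responsible. It rests on an assumption: that the function attributes pass strips convergent from any defined function whose body calls nothing convergent. The Function class and strip_unneeded_convergent are hypothetical names invented for this illustration. This is not LLVM code, and the real pass works on call-graph SCCs rather than a simple fixed-point loop.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Function:
    """Hypothetical stand-in for an IR function; not an LLVM class."""
    name: str
    attrs: Set[str] = field(default_factory=set)
    # Names of the functions called in the body; None marks a declaration.
    callees: Optional[List[str]] = None


def strip_unneeded_convergent(module: Dict[str, Function]) -> None:
    """Drop 'convergent' from defined functions that call nothing convergent.

    Assumed rule, simplified: declarations are never touched, because their
    bodies are unknown. The loop repeats until nothing changes, so that a
    caller is revisited after one of its callees loses the attribute.
    """
    changed = True
    while changed:
        changed = False
        for fn in module.values():
            if fn.callees is None or "convergent" not in fn.attrs:
                continue
            if not any("convergent" in module[c].attrs for c in fn.callees):
                fn.attrs.discard("convergent")
                changed = True


# Model of sb1.cl: a convergent definition calling a non-convergent declaration.
module = {
    "atomic_work_item_fence": Function("atomic_work_item_fence"),
    "sub_group_barrier": Function(
        "sub_group_barrier",
        {"alwaysinline", "convergent", "nounwind"},
        ["atomic_work_item_fence"],
    ),
}
strip_unneeded_convergent(module)
print(sorted(module["sub_group_barrier"].attrs))
```

Under this model the definition loses convergent whether or not it still calls atomic_work_item_fence, which matches the observation above: the inference sees only the body and has no way to know that the library author marked the function convergent on purpose.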
If I use the -fno-unit-at-a-time option, the attribute is in place:
; Function Attrs: alwaysinline convergent nounwind
define void @_Z17sub_group_barrierj12memory_scope(i32 %flags, i32 %scope) #0 {
Also note that, besides the function attributes pass, PassManagerBuilder adds many other passes in unit-at-a-time mode that are not expected to run in a non-whole-program mode: IPSCCP, GlobalOptimizer, DeadArgElimination, etc. None of them is supposed to run on a library module.
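The gating described here can be pictured with a small toy model. It is only a sketch: module_pipeline and the "PerFunctionOptimizations" entry are hypothetical placeholders, the pass order is illustrative, and none of this is the PassManagerBuilder API. The pass names in WHOLE_PROGRAM_PASSES are the ones mentioned in this comment.

```python
from typing import List

# Passes named in this comment as added only in unit-at-a-time mode.
WHOLE_PROGRAM_PASSES = [
    "IPSCCP",
    "GlobalOptimizer",
    "DeadArgElimination",
    "FunctionAttrs",
]


def module_pipeline(unit_at_a_time: bool) -> List[str]:
    """Return the names of the passes a toy pipeline would schedule."""
    passes: List[str] = []
    if unit_at_a_time:
        # These passes assume the whole program is visible.
        passes.extend(WHOLE_PROGRAM_PASSES)
    # Hypothetical stand-in for everything that runs in either mode.
    passes.append("PerFunctionOptimizations")
    return passes


print(module_pipeline(unit_at_a_time=True))
print(module_pipeline(unit_at_a_time=False))
```

With the flag off, a library module is compiled without any of the passes that assume whole-program visibility.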
Repository:
rL LLVM
https://reviews.llvm.org/D26437