[Clang] Convergent Attribute
Ettore Speziale via cfe-commits
cfe-commits at lists.llvm.org
Fri May 6 13:56:12 PDT 2016
Hello,
> In the case of foo, there could be a problem.
> If you do not mark it convergent, the LLVM sink pass pushes the call to foo into the then branch of the ternary operator, so the program is incorrectly optimized.
>
> Really? It looks like the problem is that you lied to the compiler by marking the function as 'pure'. The barrier is a side-effect that cannot be removed or duplicated, so it's not correct to mark this function as pure.
I was trying to write a very small example to trick LLVM and trigger the optimization. It is based on Transforms/Sink/convergent.ll:
define i32 @foo(i1 %arg) {
entry:
  %c = call i32 @bar() readonly convergent
  br i1 %arg, label %then, label %end
then:
  ret i32 %c
end:
  ret i32 0
}
declare i32 @bar() readonly convergent
Here is another example:
void foo0(void);
void foo1(void);

__attribute__((convergent)) void baz() {
  barrier(CLK_GLOBAL_MEM_FENCE);
}

void bar(int x) {
  if (x < 5)
    foo0();
  else
    foo1();
  baz();
  if (x < 5)
    foo0();
  else
    foo1();
}
Based on Transforms/JumpThreading/basic.ll:
define void @h_con(i32 %p) {
  %x = icmp ult i32 %p, 5
  br i1 %x, label %l1, label %l2
l1:
  call void @j()
  br label %l3
l2:
  call void @k()
  br label %l3
l3:
; CHECK: call void @g() [[CON:#[0-9]+]]
; CHECK-NOT: call void @g() [[CON]]
  call void @g() convergent
  %y = icmp ult i32 %p, 5
  br i1 %y, label %l4, label %l5
l4:
  call void @j()
  ret void
l5:
  call void @k()
  ret void
; CHECK: }
}
If you do not mark baz convergent, you get this:
clang -x cl -emit-llvm -S -o - test.c -O0 | opt -mem2reg -jump-threading -S
define void @bar(i32 %x) #0 {
entry:
  %cmp = icmp slt i32 %x, 5
  br i1 %cmp, label %if.then2, label %if.else3
if.then2:                                  ; preds = %entry
  call void @foo0()
  call void @baz()
  call void @foo0()
  br label %if.end4
if.else3:                                  ; preds = %entry
  call void @foo1()
  call void @baz()
  call void @foo1()
  br label %if.end4
if.end4:                                   ; preds = %if.else3, %if.then2
  ret void
}
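This is illegal: x might not be the same for all work-items, so after threading, the work-items with x < 5 execute the barrier through the call to @baz in %if.then2, while the others execute it through the call in %if.else3. The barrier is no longer reached in uniform control flow by the whole work-group.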
I'll update the patch so that:
* it uses the jump-threading example
* it makes the attribute available in OpenCL and CUDA
* it also provides the [[clang::convergent]] spelling
Thanks,
Ettore Speziale
--------------------------------------------------
Ettore Speziale — Compiler Engineer
speziale.ettore at gmail.com
espeziale at apple.com
--------------------------------------------------