[Clang] Convergent Attribute

Ettore Speziale via cfe-commits cfe-commits at lists.llvm.org
Fri May 6 13:56:12 PDT 2016


Hello,

> In the case of foo, there could be a problem.
> If you do not mark it convergent, the LLVM sink pass pushes the call to foo into the then branch of the ternary operator, so the program is optimized incorrectly.
> 
> Really? It looks like the problem is that you lied to the compiler by marking the function as 'pure'. The barrier is a side-effect that cannot be removed or duplicated, so it's not correct to mark this function as pure.

I was trying to write a very small example to trick LLVM and trigger the optimization. It is based on Transforms/Sink/convergent.ll:

define i32 @foo(i1 %arg) {
entry:
  %c = call i32 @bar() readonly convergent
  br i1 %arg, label %then, label %end

then:
  ret i32 %c

end:
  ret i32 0
}

declare i32 @bar() readonly convergent

Here is another example:

void foo0(void);
void foo1(void);

__attribute__((convergent)) void baz() {
  barrier(CLK_GLOBAL_MEM_FENCE);
}

void bar(int x, global int *y) {
  if (x < 5)
    foo0();
  else
    foo1();

  baz();

  if (x < 5)
    foo0();
  else
    foo1();
}

Based on Transforms/JumpThreading/basic.ll:

define void @h_con(i32 %p) {
  %x = icmp ult i32 %p, 5
  br i1 %x, label %l1, label %l2

l1:
  call void @j()
  br label %l3

l2:
  call void @k()
  br label %l3

l3:
; CHECK: call void @g() [[CON:#[0-9]+]]
; CHECK-NOT: call void @g() [[CON]]
  call void @g() convergent
  %y = icmp ult i32 %p, 5
  br i1 %y, label %l4, label %l5

l4:
  call void @j()
  ret void

l5:
  call void @k()
  ret void
; CHECK: }
}

If you do not mark baz convergent, you get this:

clang -x cl -emit-llvm -S -o - test.c -O0 | opt -mem2reg -jump-threading -S

define void @bar(i32 %x) #0 {
entry:
  %cmp = icmp slt i32 %x, 5
  br i1 %cmp, label %if.then2, label %if.else3

if.then2:                                         ; preds = %entry
  call void @foo0()
  call void @baz()
  call void @foo0()
  br label %if.end4

if.else3:                                         ; preds = %entry
  call void @foo1()
  call void @baz()
  call void @foo1()
  br label %if.end4

if.end4:                                          ; preds = %if.else3, %if.then2
  ret void
}

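This is illegal, because the value of x might not be the same for all work-items: work-items with x < 5 would reach the copy of baz in if.then2 while the others reach the copy in if.else3, so they no longer meet at the same barrier.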
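To make the failure concrete, here is a minimal host-side sketch in plain C. It is not OpenCL and not part of the patch. It models each call to baz as a numbered barrier call site and checks whether every work-item in a group reaches the same one. The function names and site numbers are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Original bar(): one call to baz() after the first if/else joins.
 * Every work-item reaches the same call site, whatever its x is. */
static int barrier_site_original(int x) {
    (void)x;
    return 0;
}

/* bar() after jump threading: baz() is duplicated into both arms,
 * so the call site reached depends on x. Site 1 stands for the copy
 * in if.then2, site 2 for the copy in if.else3. */
static int barrier_site_threaded(int x) {
    return (x < 5) ? 1 : 2;
}

/* Returns true when all n work-items (one x value each) reach the
 * same barrier call site under the given model. */
static bool group_converges(int (*site)(int), const int *xs, size_t n) {
    assert(n > 0 && "need at least one work-item");
    for (size_t i = 1; i < n; ++i) {
        if (site(xs[i]) != site(xs[0]))
            return false;
    }
    return true;
}
```

With x values such as {3, 7} in one work-group, the original model converges and the threaded one does not. The threaded model only converges when x happens to be uniform across the group, which the compiler cannot assume.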

I’ll update the patch so that:

* it uses the jump-threading example
* it marks the attribute as available in OpenCL and CUDA
* it provides the [[clang::convergent]] attribute

Thanks,
Ettore Speziale

--------------------------------------------------
Ettore Speziale — Compiler Engineer
speziale.ettore at gmail.com
espeziale at apple.com
--------------------------------------------------


