[PATCH] D25343: [OpenCL] Mark group functions as convergent in opencl-c.h
Ettore Speziale via cfe-commits
cfe-commits at lists.llvm.org
Tue Oct 25 15:12:09 PDT 2016
Hello,
> As far as I understand the whole problem is that the optimized functions are marked by __attribute__((pure)). If the attribute is removed from your example, we get LLVM dump preserving correctness:
>
> define i32 @bar(i32 %x) local_unnamed_addr #0 {
> entry:
>   %call = tail call i32 @foo() #2
>   %tobool = icmp eq i32 %x, 0
>   %.call = select i1 %tobool, i32 0, i32 %call
>   ret i32 %.call
> }
I used __attribute__((pure)) only to force LLVM to apply the transformation and show you an example of incorrect behavior.
This is another example:
  void foo();
  int baz();

  int bar(int x) {
    int y;
    if (x)
      y = baz();
    foo();
    if (x)
      y = baz();
    return y;
  }
Which gets lowered into:
  define i32 @bar(i32) #0 {
    %2 = icmp eq i32 %0, 0
    br i1 %2, label %3, label %4

  ; <label>:3                                       ; preds = %1
    tail call void (...) @foo() #2
    br label %7

  ; <label>:4                                       ; preds = %1
    %5 = tail call i32 (...) @baz() #2
    tail call void (...) @foo() #2
    %6 = tail call i32 (...) @baz() #2
    br label %7

  ; <label>:7                                       ; preds = %3, %4
    %8 = phi i32 [ %6, %4 ], [ undef, %3 ]
    ret i32 %8
  }
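As you can see, the call sites of foo in the optimized IR are not control-equivalent to the only call site of foo in the unoptimized IR: the call has been duplicated into both arms of the branch on x. Now imagine foo is implemented in another module and contains a call to a convergent function, e.g. barrier(). Work-items that disagree on x would then reach the barrier through different call sites, so you are going to generate incorrect code.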
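
To make the loss of control equivalence concrete, here is a minimal single-threaded sketch in plain C. It is only an illustration, not what the compiler emits: foo_from(), last_site, baz_count, same_site() and check_sites() are hypothetical helpers introduced here, bar_transformed() is a hand-written approximation of the optimized IR above, and y is initialized to 0 (where the IR has undef) so the sketch stays well defined.

```c
#include <assert.h>

/* Hypothetical instrumentation: remembers which call site of foo ran last. */
int last_site = -1;
int baz_count = 0;

void foo_from(int site) { last_site = site; }
int baz(void) { return ++baz_count; }

/* Shape of the original source: foo has exactly one call site. */
int bar_original(int x) {
    int y = 0;          /* assumption: initialized to avoid reading garbage */
    if (x)
        y = baz();
    foo_from(0);        /* the only call site */
    if (x)
        y = baz();
    return y;
}

/* Hand-written approximation of the optimized IR: foo is duplicated. */
int bar_transformed(int x) {
    if (x == 0) {
        foo_from(1);    /* copy in block %3 */
        return 0;       /* the IR returns undef on this path */
    }
    (void)baz();
    foo_from(2);        /* copy in block %4 */
    return baz();
}

/* Returns 1 if two "work-items" with inputs x0 and x1 reach the same
   call site of foo, 0 otherwise. */
int same_site(int (*bar)(int), int x0, int x1) {
    int s0, s1;
    bar(x0);
    s0 = last_site;
    bar(x1);
    s1 = last_site;
    return s0 == s1;
}

void check_sites(void) {
    /* One call site: every work-item meets at the same place. */
    assert(same_site(bar_original, 0, 1));
    /* Duplicated call: work-items that disagree on x diverge. */
    assert(!same_site(bar_transformed, 0, 1));
}
```

For a single thread both versions behave the same, which is why the transformation is legal for ordinary functions. It only becomes a problem when the callee must be reached by all work-items through the same call site, which is what the convergent marking is meant to express.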
Bye
--------------------------------------------------
Ettore Speziale — Compiler Engineer
speziale.ettore at gmail.com
espeziale at apple.com
--------------------------------------------------