[PATCH] D25343: [OpenCL] Mark group functions as convergent in opencl-c.h

Anastasia Stulova via cfe-commits cfe-commits at lists.llvm.org
Fri Oct 28 03:46:49 PDT 2016


Hi Ettore,

Thanks for the example. Up to now we have used noduplicate to prevent this erroneous optimisation, but using convergent instead would work equally well. And, as has been pointed out, convergent is less restrictive, so it allows more optimisations in LLVM, e.g. unrolling a loop that contains a convergent operation.

I think we now have good motivation for adding the convergent attribute explicitly. I am still wondering whether we need to keep noduplicate at all, but I guess this is something we can decide later.

Cheers,
Anastasia

-----Original Message-----
From: Ettore Speziale [mailto:speziale.ettore at gmail.com] 
Sent: 25 October 2016 23:12
To: Anastasia Stulova
Cc: Ettore Speziale; Liu, Yaxun (Sam); reviews+D25343+public+a10e9553b0fc895f at reviews.llvm.org; alexey.bader at intel.com; aaron.ballman at gmail.com; Clang Commits; Sumner, Brian; Stellard, Thomas; Arsenault, Matthew; nd
Subject: Re: [PATCH] D25343: [OpenCL] Mark group functions as convergent in opencl-c.h

Hello,

> As far as I understand the whole problem is that the optimized functions are marked by __attribute__((pure)). If the attribute is removed from your example, we get LLVM dump preserving correctness:
> 
> define i32 @bar(i32 %x) local_unnamed_addr #0 {
> entry:
>  %call = tail call i32 @foo() #2
>  %tobool = icmp eq i32 %x, 0
>  %.call = select i1 %tobool, i32 0, i32 %call
>  ret i32 %.call
> }

I’ve used __attribute__((pure)) only to force LLVM to apply the transformation and to show you an example of incorrect behavior.

This is another example:

void foo();
int baz();

int bar(int x) {
  int y;
  if (x) 
    y = baz();
  foo();
  if (x) 
    y = baz();
  return y;
} 

Which gets lowered into:

define i32 @bar(i32) #0 {
  %2 = icmp eq i32 %0, 0
  br i1 %2, label %3, label %4

; <label>:3                                       ; preds = %1
  tail call void (...) @foo() #2
  br label %7

; <label>:4                                       ; preds = %1
  %5 = tail call i32 (...) @baz() #2
  tail call void (...) @foo() #2
  %6 = tail call i32 (...) @baz() #2
  br label %7

; <label>:7                                       ; preds = %3, %4
  %8 = phi i32 [ %6, %4 ], [ undef, %3 ]
  ret i32 %8
}

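As you can see, the call sites of foo in the optimized IR are not control-equivalent to the only call site of foo in the unoptimized IR: the optimizer has made each call to foo control-dependent on x. Now imagine that foo is implemented in another module and contains a call to a convergent function, e.g. barrier(). Work-items that disagree on x would then reach the barrier through different call sites, and you are going to generate incorrect code.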

Bye

--------------------------------------------------
Ettore Speziale — Compiler Engineer
speziale.ettore at gmail.com
espeziale at apple.com
--------------------------------------------------


