[PATCH] D79781: [OpenCL] Add cl_khr_extended_subgroup extensions

Thu Jun 4 08:12:36 PDT 2020

PiotrFusik marked 2 inline comments as done.
PiotrFusik added inline comments.

================
Comment at: clang/lib/Headers/opencl-c.h:15594
+int     __ovld sub_group_elect(void);
+int     __ovld sub_group_non_uniform_all( int predicate );
+int     __ovld sub_group_non_uniform_any( int predicate );
----------------
Anastasia wrote:
> PiotrFusik wrote:
> > Anastasia wrote:
> > > PiotrFusik wrote:
> > > > Anastasia wrote:
> > > > > PiotrFusik wrote:
> > > > > > Anastasia wrote:
> > > > > > > PiotrFusik wrote:
> > > > > > > > Anastasia wrote:
> > > > > > > > > Ideally `convergent` attribute was intended to be used in the non-divergent scenarios. So I don't know if it's going to do what is needed here. Did you look into this already?
> > > > > > > > > 
> > > > > > > > > If we look at the Clang documentation it says:
> > > > > > > > > 
> > > > > > > > > > In languages designed for SPMD/SIMT programming model, e.g. OpenCL or CUDA, the call instructions of a function with this attribute must be executed by all work items or threads in a work group or sub group.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > I remember @nhaehnle  was looking at using `convergent` with operations in the divergent control flow some time ago https://reviews.llvm.org/D68994? I am not sure where this thread ended up and whether we can expect this to work currently?
> > > > > > > > Thanks for pointing this out! My understanding of the `convergent` attribute was that it's for uniform control flow, as per the documentation you cited.
> > > > > > > > A quick check shows that Intel Graphics Compiler doesn't suffer from this invalid optimization.
> > > > > > > > Yet I agree that the functions should be marked somehow. It is with `__conv` ?
> > > > > > > 
> > > > > > > > A quick check shows that Intel Graphics Compiler doesn't suffer from this invalid optimization.
> > > > > > > 
> > > > > > > Did you check on the examples that @nhaehnle provided in the review?
> > > > > > > 
> > > > > > > > Yet I agree that the functions should be marked somehow. It is with __conv ?
> > > > > > > 
> > > > > > > I am not sure there is a solution to this in the upstream LLVM at present. I am hoping @nhaehnle  can provide us more information. Alternatively, we could run his examples against the latest LLVM revision to see if the problem still remains or not. I am not saying that it should block this review, I am ok that we commit your patch but if we know already that this functionality can't be fully supported it would be good to at least create a bug to OpenCL component to record this.
> > > > > > The following example:
> > > > > > ```
> > > > > > kernel void test(global int *data)
> > > > > > {
> > > > > >     uint id = (uint) get_global_id(0);
> > > > > >     if (id < 4)
> > > > > >         data[id] = sub_group_elect();
> > > > > >     else
> > > > > >         data[id] = sub_group_elect();
> > > > > > }
> > > > > > ```
> > > > > > with `clang -S -emit-llvm` emits invalid code both with and without the `convergent` attribute.
> > > > > > 
> > > > > > We don't have this problem in our Intel Graphics Compiler though. That's because we replace the subgroup functions with internal intrinsics early enough. These intrinsics are marked with both the `convergent` and `inaccessiblememonly` attributes. Then in the `SimplifyCFG` pass we prevent optimization if both of these attributes are present: https://github.com/intel/intel-graphics-compiler/blob/master/IGC/Compiler/GenTTI.cpp#L397
> > > > > > 
> > > > > > I don't understand what can go wrong in the second example (with the jump threading pass) ?
> > > > > > 
> > > > > Thanks for checking. Do you mind creating a bug to clang for now under OpenCL component https://bugs.llvm.org/enter_bug.cgi?product=clang although we might reclassify this for a wider scope later on.
> > > > > 
> > > > > 
> > > > > 
> > > > > >  Then in the SimplifyCFG pass we prevent optimization if both of these attributes are present: https://github.com/intel/intel-graphics-compiler/blob/master/IGC/Compiler/GenTTI.cpp#L397
> > > > > 
> > > > > I see. Perhaps we need to come up with a new semantic of `convergent` and update LLVM passes... or maybe we have to introduce a new attribute. Not sure. I suggest we add a comment to https://reviews.llvm.org/D68994 and see if it gets picked up.
> > > > > 
> > > > > 
> > > > Do you mean a bug for the invalid optimizations of these new subgroup functions?
> > > > I added a comment to https://reviews.llvm.org/D68994.
> > > > Do you mean a bug for the invalid optimizations of these new subgroup functions?
> > > 
> > > yes, you can just copy/paste your test above:
> > > 
> > > 
> > > > The following example:
> > > > 
> > > > kernel void test(global int *data)
> > > > {
> > > >     uint id = (uint) get_global_id(0);
> > > >     if (id < 4)
> > > >         data[id] = sub_group_elect();
> > > >     else
> > > >         data[id] = sub_group_elect();
> > > > }
> > > > 
> > > > with clang -S -emit-llvm emits invalid code both with and without the convergent attribute.
> > > 
> > > If you can it would be good to highlight what is invalid in IR produced by upstream LLVM now.
> > > 
> > > 
> > Shall I submit the bug before this change is merged?
> The order is not important as long as we have it eventually. :)
Submitted https://bugs.llvm.org/show_bug.cgi?id=46199

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79781/new/

https://reviews.llvm.org/D79781