[LLVMdev] Instructions that cannot be duplicated

shreyas krishnan shreyas76 at gmail.com
Wed Oct 7 14:57:39 PDT 2009


If I may -an unrelated question perhaps- can clang in trunk compile
this CL program with the 2  OPENCL EXTENSION and keyword __kernel?

thanks
shrey



On Wed, Oct 7, 2009 at 11:20 AM, Villmow, Micah <Micah.Villmow at amd.com> wrote:
> Is there a current way to specify that an instruction or function call
> cannot be duplicated and thus any optimizations that might want to duplicate
> this instruction would fail?
>
>
>
> The problem deals with barrier in OpenCL 1.0. One of the conditions of using
> barrier is that if a barrier exists inside of control flow, every thread in
> a work-group must execute the barrier instruction(6.11.9).
>
>
>
> However, in this simple CL code:
> #pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics: enable
>
> #pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics: enable
>
> __kernel void
>
> KMeansMapReduceAtomic(const int  num_attributes,
>
>                       const int  num_objects,
>
>                       __global int* delta_d
>
>                       )
>
> {
>
>                 __local int clusterCount[256];
>
>                 __local int sTemp[1];  // amd opencl needed this to be an
> array
>
>     const unsigned int point_id = get_local_id(0);
>
>    int  index = 0;
>
>    int i, addr;
>
>    int xx = get_local_id(0);
>
>    clusterCount[xx] = 0;
>
>    if(get_local_id(0) == 0){
>
>      sTemp[0] = 0; //sTemp is for prefix sum
>
>      }
>
>    barrier(CLK_LOCAL_MEM_FENCE);
>
>    int idWithinCluster = 300; // anthing other then zero
>
>    if (point_id < num_objects) {
>
>     idWithinCluster = atom_add(&clusterCount
>
>                                [index],1);
>
>                 }
>
>                   barrier(CLK_LOCAL_MEM_FENCE);
>
> int numMembers = 2;
>
>                 if(idWithinCluster == 0)                 {
>
>                                 clusterCount[index] = atom_add(&sTemp[0],
> numMembers);//This holds the prefix offset
>
>                 }
>
>    delta_d[xx] = clusterCount[index];
>
> }
>
>
>
> produces bitcode file which has 3 barriers.
>
>
>
> The problem is now that the second if/barrier pair:
>
> if (point_id < num_objects) {
>
>     idWithinCluster = atom_add(&clusterCount
>
>                                [index],1);
>
>                 }
>
>                   barrier(CLK_LOCAL_MEM_FENCE);
>
>
>
>
>
> is transformed into flow control equivalent to :
>
> if (point_id >= num_objects) {
>
>   barrier(CLK_LOCAL_MEM_FENCE);
>
> } else {
>
>     idWithinCluster = atom_add(&clusterCount
>
>                                [index],1);
>
>   barrier(CLK_LOCAL_MEM_FENCE);
>
> }
>
>
>
> which violates opencl, which can cause undefined behavior on the underlying
> hardware, as each barrier is unique.
>
>
>
> So we want to disable all optimizations around barrier instructions, but not
> in other cases when no barrier instruction exists. One way to do this is to
> mark an instruction as not being copyable, but is there a method of doing
> this in LLVM?
>
>
>
> Also, this barrier does not map to llvm.barrier because llvm.barrier only
> seems to worry about memory operations and not synchronization between
> threads.
>
>
>
> Thanks for any help,
>
> Micah
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>




More information about the llvm-dev mailing list