[LLVMdev] Instructions that cannot be duplicated

Villmow, Micah Micah.Villmow at amd.com
Wed Oct 7 11:20:05 PDT 2009


Is there a current way to specify that an instruction or function call
cannot be duplicated and thus any optimizations that might want to
duplicate this instruction would fail?

 

The problem deals with barrier in OpenCL 1.0. One of the conditions of
using barrier is that if a barrier exists inside of control flow, every
thread in a work-group must execute the barrier instruction(6.11.9). 

 

However, in this simple CL code:
#pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics: enable

#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics: enable

__kernel void  

KMeansMapReduceAtomic(const int  num_attributes,

                      const int  num_objects,

                      __global int* delta_d

                      ) 

{

                __local int clusterCount[256];

                __local int sTemp[1];  // amd opencl needed this to be
an array

    const unsigned int point_id = get_local_id(0);

   int  index = 0;

   int i, addr;

   int xx = get_local_id(0);

   clusterCount[xx] = 0;

   if(get_local_id(0) == 0){

     sTemp[0] = 0; //sTemp is for prefix sum

     }

   barrier(CLK_LOCAL_MEM_FENCE);

   int idWithinCluster = 300; // anthing other then zero

   if (point_id < num_objects) {

    idWithinCluster = atom_add(&clusterCount

                               [index],1);

                }

                  barrier(CLK_LOCAL_MEM_FENCE);

int numMembers = 2; 

                if(idWithinCluster == 0)                 {

                                clusterCount[index] =
atom_add(&sTemp[0], numMembers);//This holds the prefix offset

                }

   delta_d[xx] = clusterCount[index];

}

 

produces bitcode file which has 3 barriers.

 

The problem is now that the second if/barrier pair:

if (point_id < num_objects) {

    idWithinCluster = atom_add(&clusterCount

                               [index],1);

                }

                  barrier(CLK_LOCAL_MEM_FENCE);

 

 

is transformed into flow control equivalent to :

if (point_id >= num_objects) {

  barrier(CLK_LOCAL_MEM_FENCE);

} else {

    idWithinCluster = atom_add(&clusterCount

                               [index],1);

  barrier(CLK_LOCAL_MEM_FENCE);

}

 

which violates opencl, which can cause undefined behavior on the
underlying hardware, as each barrier is unique.  

 

So we want to disable all optimizations around barrier instructions, but
not in other cases when no barrier instruction exists. One way to do
this is to mark an instruction as not being copyable, but is there a
method of doing this in LLVM?

 

Also, this barrier does not map to llvm.barrier because llvm.barrier
only seems to worry about memory operations and not synchronization
between threads.

 

Thanks for any help,

Micah

                

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20091007/5cebd323/attachment.html>


More information about the llvm-dev mailing list