[LLVMdev] Instructions that cannot be duplicated
shreyas krishnan
shreyas76 at gmail.com
Wed Oct 7 14:57:39 PDT 2009
If I may ask a perhaps unrelated question: can clang in trunk compile
this CL program, with its 2 OPENCL EXTENSION pragmas and the __kernel
keyword?
Thanks,
shrey
On Wed, Oct 7, 2009 at 11:20 AM, Villmow, Micah <Micah.Villmow at amd.com> wrote:
> Is there currently a way to specify that an instruction or function call
> cannot be duplicated, so that any optimization that tries to duplicate it
> fails?
>
>
>
> The problem concerns barrier in OpenCL 1.0. One of the conditions of using
> barrier is that if a barrier appears inside control flow, every thread in
> a work-group must execute the barrier instruction (section 6.11.9).
>
>
>
> However, this simple CL code:
>
> #pragma OPENCL EXTENSION cl_khr_local_int32_base_atomics: enable
> #pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics: enable
>
> __kernel void
> KMeansMapReduceAtomic(const int num_attributes,
>                       const int num_objects,
>                       __global int* delta_d)
> {
>     __local int clusterCount[256];
>     __local int sTemp[1]; // AMD OpenCL needed this to be an array
>     const unsigned int point_id = get_local_id(0);
>     int index = 0;
>     int i, addr;
>     int xx = get_local_id(0);
>
>     clusterCount[xx] = 0;
>     if (get_local_id(0) == 0) {
>         sTemp[0] = 0; // sTemp is for prefix sum
>     }
>     barrier(CLK_LOCAL_MEM_FENCE);
>
>     int idWithinCluster = 300; // anything other than zero
>     if (point_id < num_objects) {
>         idWithinCluster = atom_add(&clusterCount[index], 1);
>     }
>     barrier(CLK_LOCAL_MEM_FENCE);
>
>     int numMembers = 2;
>     if (idWithinCluster == 0) {
>         // This holds the prefix offset
>         clusterCount[index] = atom_add(&sTemp[0], numMembers);
>     }
>     delta_d[xx] = clusterCount[index];
> }
>
>
>
> produces a bitcode file that has 3 barriers, although the source calls
> barrier only twice.
>
>
>
> The problem now is that the second if/barrier pair:
>
> if (point_id < num_objects) {
>     idWithinCluster = atom_add(&clusterCount[index], 1);
> }
> barrier(CLK_LOCAL_MEM_FENCE);
>
> is transformed into control flow equivalent to:
>
> if (point_id >= num_objects) {
>     barrier(CLK_LOCAL_MEM_FENCE);
> } else {
>     idWithinCluster = atom_add(&clusterCount[index], 1);
>     barrier(CLK_LOCAL_MEM_FENCE);
> }
>
>
>
> which violates OpenCL and can cause undefined behavior on the underlying
> hardware: each barrier is unique, so threads that take different branches
> end up waiting at different barriers.
>
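
To make the quoted problem concrete, here is a small host-side toy model in
Python. It is not OpenCL and not LLVM; it only records which barrier
instruction each work-item of one work-group would wait at. The function
names, the site labels "B", "B1" and "B2", and the sizes used are all
hypothetical, chosen for illustration.

```python
def original_site(point_id, num_objects):
    """Source form: the if-block is followed by one barrier (site "B")."""
    if point_id < num_objects:
        pass  # stands in for the atom_add in the kernel
    return "B"


def duplicated_site(point_id, num_objects):
    """Transformed form: the barrier was copied into both branches."""
    if point_id >= num_objects:
        return "B1"  # copy of the barrier on the fall-through path
    # the atom_add would happen here
    return "B2"      # copy of the barrier after the atom_add


def sites_reached(kernel, local_size, num_objects):
    """Set of barrier sites at which the work-items of one group wait."""
    return {kernel(point_id, num_objects) for point_id in range(local_size)}


def is_uniform(kernel, local_size, num_objects):
    """True when every work-item waits at the same barrier instruction."""
    return len(sites_reached(kernel, local_size, num_objects)) == 1
```

With an assumed work-group of 256 work-items and num_objects = 100,
sites_reached(original_site, 256, 100) is {"B"}, while
sites_reached(duplicated_site, 256, 100) is {"B1", "B2"}: the group is split
across two distinct barriers. The duplicated form is only harmless when the
condition happens to be uniform across the group, for example when
num_objects >= 256.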
>
>
> So we want to disable all optimizations around barrier instructions, but
> leave them enabled where no barrier instruction is present. One way to do
> this is to mark an instruction as not being copyable; is there a way of
> doing this in LLVM?
>
>
>
> Also, this barrier does not map to llvm.barrier because llvm.barrier only
> seems to worry about memory operations and not synchronization between
> threads.
>
>
>
> Thanks for any help,
>
> Micah
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>