[llvm] r194760 - Add addrspacecast instruction.

Thu Nov 21 09:42:51 PST 2013

Hi Michele,

Thanks for the patch.

I think your patch allows for too much behaviour, at least your test case is not legal OpenCL code. However, I'm not exactly sure what you were trying to achieve. The following:

 __global *int b = (__global int *)A

with A defined as

 __local int *A

is actually not allowed in OpenCL 1.2 (and earlier) [1]:

"A pointer to address space A can only be assigned to a pointer to the same address space A. Casting a pointer to address space A to a pointer to address space B is illegal."

I haven't checked whether this situation has changed in the new and shiny OpenCL 2.0.

As an aside, this makes me wonder whether the definition of NULL below is actually wrong. It should maybe simply be defined as 0. In any case clang should not fail with an assertion failure :-)

The situation in CUDA is a bit different. As far as I understand, for older CUDA devices the compiler should be able to figure out what the address space of a pointer is, so in the example below, the associated memory space of p is __shared__. For newer CUDA devices (compute capability 2.0 and up) there is a unified address space, so the address space associated with a pointer can actually change. So, if we have something like

__global__ int A[10]
__shared__ int B[10]

int *p;
p = A;
p = B;

Then p first is a pointer to global memory and next to shared memory. 

Also I still have a number of failing test cases. For example, the following CUDA code still gives me an assertion failure:

#define N 32

__global__ void foo(int* p) {
  __shared__ unsigned char x[N];

  for (unsigned int i=0; i<(N/4); i++) {
    ((unsigned int *)x)[i] = 0;
  }
}

as does:

__global__ void foo() {
  __shared__ int A[10];
  int* p = A + 1;
  p[threadIdx.x] = 0;
}

Regards,

 Jeroen

[1] http://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/global.html

On Nov 21, 2013, at 2:27 PM, Michele Scandale <michele.scandale at gmail.com> wrote:

> On 11/18/2013 10:18 PM, Michele Scandale wrote:
>> On 11/18/2013 07:29 PM, Jeroen Ketema wrote:
>>> 
>>> Hi Michele,
>>> 
>>> I'm currently on 195003.
>>> 
>>> I'm compiling the OpenCL code with:
>>> 
>>> lvm-build/Debug+Asserts/bin/clang -target nvptx--nvidiacl -emit-llvm -c 
>>> kernel.cl
>>> 
>>> (and similarly for the CUDA code)
>>> 
>>> Slightly simpler CUDA example:
>>> 
>>> __global__ void foo() {
>>>   __shared__ int A[10];
>>>   int* p = A;
>>>   p[0] = 0;
>>> }
>>> 
>>> Slightly simpler OpenCL example:
>>> 
>>> #define NULL ((void*)0)
>>> __kernel void foo(int i, __global int *A)
>>> {
>>>   __global int *a;
>>> 
>>>   if (i == 0)
>>>     a = A;
>>>   else
>>>     a = NULL;
>>> 
>>>   if (a != NULL)
>>>     A[0] = 0;
>>> }
>>> 
>>> Note the definition of NULL is as in libclc (llvm related project).
>>> 
>> 
>> Thanks, it was the definition of NULL that I was using the prevent me to
>> reproduce the problem. From a quick look, is a problem related to clang: now
>> bitcast changed semantic and so clang must be updated in order to match the new
>> semantic. A quick workaround seems to solve the problem, but I think that a
>> better solution would be to introduce a new CastKind in clang in order to track
>> address space conversion during semantic analysis, in order to drive nicely the
>> code generation.
>> Till tomorrow I wont able to work on this.
>> 
>> -Michele
>> 
> 
> Here I prepared a possible fix: http://llvm-reviews.chandlerc.com/D2241
> 
> -Michele