[cfe-dev] C++ AMP

Tue Sep 11 11:50:40 PDT 2012

Hi,

On May 15, 2012, at 2:25 AM, John Bytheway wrote:

> On 15/05/12 00:05, Hal Finkel wrote:
>> On Mon, 14 May 2012 23:47:03 +0100
>> John Bytheway <jbytheway+llvm at gmail.com> wrote:
>> 
>>> On 14/05/12 21:50, Chandler Carruth wrote:
>>>> On Mon, May 14, 2012 at 1:37 PM, Rahul Garg
>>>> <rahulgarg44 at gmail.com
>>>> <mailto:rahulgarg44 at gmail.com>> wrote:
>>>> 
>>>>    Hi.
>>>> 
>>>>    New here. Tried searching for any discussions about C++ AMP but
>>>> did not find anything.
>>>>    I am wondering if there are any plans or projects by anyone to
>>>> support C++ AMP extensions through Clang?
>>>> 
>>>> 
>>>> Not currently. There is much work to be done outside of AMP on
>>>> Windows, so until we make headway there, it doesn't make much sense
>>>> to look at AMP specifically.
>>> 
>>> I see no particular reason to aim to support AMP only on Windows.
>>> Personally, I would be more interested in a Linux implementation.
>> 
>> Out of curiosity, what does 'supporting AMP' mean in this context.
> 
> I presume it means implementing the specification:
> <http://download.microsoft.com/download/4/0/E/40EA02D8-23A7-4BD2-AD3A-0BFFFB640F28/CppAMPLanguageAndProgrammingModel.pdf>
> 

We are planning to implement a subset of C++ AMP targeting OpenCL on Linux, minus the MS-specific parts of the specification.

>> Aside from the tile_static storage-class specifier and the
>> restrict(amp) clause, is the rest of it just a template library, or
>> would other frontend modifications be necessary?
> 
> I see some classes use __declspec(property(get)), which is a bit silly.
> It looks like the spec is also assuming sizeof(long)==4, which could be
> a problem.
> 

Some parts of the spec do require frontend (FE) changes.

As an example, suppose we have a greatly simplified container class myarray that holds a GPU-side array (which is an OpenCL buffer underneath). The only permitted way in C++ AMP to pass it from the host to the GPU side is to enclose the object in a functor that is passed to parallel_for_each.

  On the host side, myarray is defined like: class myarray { cl_mem p; };

  On the GPU side, the kernel sees that buffer as a __global pointer: the host has to pass the cl_mem through clSetKernelArg before it can be used inside the kernel.

This implies that:
1) myarray would somehow have to look different on the GPU side, like: class myarray { __global T *p; }; and
2) one way to make it work is to serialize the myarray through a series of clSetKernelArg calls and then reconstruct it on the device side from the kernel arguments, using the GPU-side declaration. This would mean clang generating some form of automatic serialization/deserialization code for such a functor class, since it involves introspecting an arbitrary user-supplied functor, and AFAIK that cannot be done with just templates in C++.
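
To make the two points above concrete, here is a rough host-only sketch in plain C++ of what the generated code might look like. It does not use OpenCL at all: buffer_handle stands in for cl_mem, each kernel_arg stands in for one clSetKernelArg call, and launch stands in for parallel_for_each. All of these names, and the hand-written serialize/deserialize pair, are hypothetical; the idea is only that the compiler would emit something of this shape per functor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cl_mem: an opaque handle to a device buffer.
struct buffer_handle { float *storage; };

// Host-side layout, cf. class myarray { cl_mem p; }
struct host_myarray { buffer_handle p; };

// Device-side layout, cf. class myarray { __global T *p; }
struct device_myarray { float *p; };

// The functor as the user writes it on the host (hypothetical example).
struct host_functor {
    host_myarray a;
    float scale;
};

// The same functor as the kernel would see it.
struct device_functor {
    device_myarray a;
    float scale;
    void operator()(std::size_t i) const { a.p[i] *= scale; }
};

// One kernel argument slot; models a single clSetKernelArg call.
struct kernel_arg {
    enum kind_t { buffer, scalar } kind;
    buffer_handle buf;
    float value;
};

// What the compiler would have to generate: flatten the functor,
// member by member, into a list of kernel arguments.
std::vector<kernel_arg> serialize(const host_functor &f) {
    std::vector<kernel_arg> args;
    args.push_back({kernel_arg::buffer, f.a.p, 0.0f});
    args.push_back({kernel_arg::scalar, {nullptr}, f.scale});
    return args;
}

// The device-side counterpart: rebuild the functor from the arguments,
// using the device-side layout (the handle becomes a raw pointer).
device_functor deserialize(const std::vector<kernel_arg> &args) {
    assert(args.size() == 2);
    assert(args[0].kind == kernel_arg::buffer);
    assert(args[1].kind == kernel_arg::scalar);
    return device_functor{{args[0].buf.storage}, args[1].value};
}

// Stand-in for parallel_for_each: serialize on the "host",
// deserialize on the "device", then run the body for each index.
void launch(const host_functor &f, std::size_t n) {
    std::vector<kernel_arg> args = serialize(f);
    device_functor k = deserialize(args);
    for (std::size_t i = 0; i < n; ++i) k(i);
}
```

Note that serialize and deserialize have to know every member of the functor and its type. That is the introspection step that, as far as I can tell, cannot be written generically with templates and would have to come from the frontend.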

Regards,
Ray