[cfe-dev] [LLVMdev] C++AMP -> OpenCL (NVPTX) prototype

corngood at gmail.com corngood at gmail.com
Sun Apr 14 11:18:40 PDT 2013

On April 14, 2013 09:42:28 AM Hal Finkel wrote:
> Dave,
> [I've copied the cfe-dev list as well.]
> Thanks for sharing this! I think this sounds very interesting. I don't know
> much about AMP, but I do have users who are also interested in accelerator
> targeting, and I'd like you to share your thoughts on:
>  1. Does your implementation share common functionality with the 'captured
> statement' work that Intel is currently doing (in order to support Cilk,
> OpenMP, etc.)? If you're not aware of it, see:
> http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20130408/077615.html
> -- This should end up in trunk soon. I ask because if the current
> captured statement patches would almost, but not quite, work for you, then
> it would be interesting to understand why.

Kernels in AMP are represented by a lambda, so I haven't had to do anything 
special to capture variables.  I do some work in the opt passes to marshal 
certain types (buffer references so far; also textures, etc. in the future), so
maybe there's some overlap there.  
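
To illustrate why a lambda needs no extra capture machinery, here is a minimal
sketch in standard C++, not actual AMP code.  parallel_for_each_sketch and
scale are hypothetical names, and the "launch" is just a host loop standing in
for a device dispatch.  The captured values become fields of the closure
object, and those fields are what a marshaling pass would have to rewrite.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an AMP-style launch: runs the kernel once per
// index on the host. A real implementation would dispatch to the device.
template <typename Kernel>
void parallel_for_each_sketch(std::size_t extent, Kernel kernel) {
    for (std::size_t i = 0; i < extent; ++i)
        kernel(i);
}

// Scales every element of `data` by `factor`. The lambda captures `factor`
// and the raw buffer pointer by value; both end up as fields of the closure.
std::vector<float> scale(std::vector<float> data, float factor) {
    float* buf = data.data();
    parallel_for_each_sketch(data.size(), [=](std::size_t i) {
        buf[i] *= factor;
    });
    return data;
}
```

In this sketch the buffer is captured as a plain pointer; in AMP proper it
would be a buffer reference type such as array_view.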

Thanks for the link, I'll have to read more about it.

>  2. What will be necessary to eliminate the two-clang-invocations problem.
> If we ever grow support for embedded accelerator targeting (through AMP,
> OpenACC, OpenMP 4+, etc.), it sounds like this will be a common
> requirement, and if I had to guess, there is common interest in putting the
> necessary infrastructure in place.

The only reason I have two clang invocations right now is because of how I 
dealt with address-spaces.  In the Shevlin Park presentation, they mentioned 
doing analysis and assigning address-spaces after codegen, but I just assign 
them using __attribute__((address_space(N))) for now, and zero them out for CPU 
codegen with a TargetOpt.  It sort of piggybacks on the OpenCL -> 
NVPTX/SPIR/AMD/etc. address space abstraction.  The other differences are 
similar to how CodeGenOpts.CUDAIsDevice works.
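
A rough sketch of the idea of annotating pointers for the device and dropping
the annotation for the host.  Note that this uses a preprocessor macro rather
than a target option, so it is a swapped-in technique and not what the
prototype does.  AMP_DEVICE_COMPILE and AMP_GLOBAL are hypothetical names, and
address space 1 is picked only for illustration; the real number depends on
the target's mapping.

```cpp
#include <cstddef>

// AMP_DEVICE_COMPILE is a hypothetical macro, assumed to be defined only for
// the device invocation. On the host the qualifier expands to nothing, which
// has the same effect as mapping every address space to 0.
#if defined(__clang__) && defined(AMP_DEVICE_COMPILE)
#define AMP_GLOBAL __attribute__((address_space(1)))
#else
#define AMP_GLOBAL
#endif

// Sums `n` floats from a buffer that is placed in the annotated address
// space when compiled for the device, and in the default one on the host.
float sum_buffer(const AMP_GLOBAL float* data, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}
```

The same source then compiles for both sides, at the cost of one compiler
invocation per target.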

Unfortunately it won't be sufficient for a full implementation of AMP, which 
doesn't specify (to my knowledge) any address-space declaration on pointer 
types, but still allows pointers into buffers in various address-spaces.

>  -Hal

To be honest, I'm not crazy about the AMP specification, I just like the idea 
of compiling a heterogeneous module for host/device code, which can be easily 
integrated into an existing C++ application.  I'd be happy for it to drop the 
MS-specific syntax like properties, use C++ attributes wherever possible 
instead of keywords, and have explicit address spaces like CUDA/OpenCL.

I think the big problem is going to be making it robustly handle two very 
different targets in one pass.  Most obviously, supporting different bitness for 
host/device.  My testing was all on 64/32-bit (host/device), but all other 
combinations are available in practice.
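
One place the bitness mismatch bites is any data shared between host and
device.  A minimal sketch, assuming a hypothetical KernelArgs block that the
host fills in and the device reads: fixed-width fields keep the layout
independent of pointer width, where void* or std::size_t would not.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical argument block passed from host to device. Every field has a
// fixed width, so the layout does not change with the target's pointer size.
struct KernelArgs {
    std::uint64_t buffer_address;  // device pointer, always stored as 64 bits
    std::uint32_t element_count;
    std::uint32_t stride;
};

static_assert(std::is_standard_layout<KernelArgs>::value,
              "KernelArgs must have a predictable layout");
static_assert(sizeof(KernelArgs) == 16,
              "KernelArgs layout must not depend on pointer width");

// Packs a 32-bit device address into the block; the value is zero-extended.
KernelArgs make_args(std::uint32_t device_ptr32, std::uint32_t count) {
    return KernelArgs{device_ptr32, count, 1u};
}
```

The static_asserts are the useful part: they fail at compile time on either
side if the two targets ever disagree about the layout.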

- Dave
