[cfe-dev] [LLVMdev] C++AMP -> OpenCL (NVPTX) prototype
corngood at gmail.com
Sun Apr 14 11:18:40 PDT 2013
On April 14, 2013 09:42:28 AM Hal Finkel wrote:
> [I've copied the cfe-dev list as well.]
> Thanks for sharing this! I think this sounds very interesting. I don't know
> much about AMP, but I do have users who are also interested in accelerator
> targeting, and I'd like you to share your thoughts on:
> 1. Does your implementation share common functionality with the 'captured
> statement' work that Intel is currently doing (in order to support Cilk,
> OpenMP, etc.)? If you're not aware of it, see:
> html -- This should end up in trunk soon. I ask because if the current
> captured statement patches would almost, but not quite, work for you, then
> it would be interesting to understand why.
Kernels in AMP are represented by a lambda, so I haven't had to do anything
special to capture variables. I do some work in the opt passes to marshal
certain types (buffer references so far; also textures, etc. in the future), so
maybe there's some overlap there.
Thanks for the link, I'll have to read more about it.
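As a rough illustration of why lambdas need no extra capture machinery, here is
a minimal host-only sketch. The names parallel_for_each_sketch and scale are
hypothetical and are not taken from the prototype or from the AMP headers; the
raw pointers merely stand in for the buffer references mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for an AMP-style parallel_for_each.
// A real implementation would compile the lambda body for the device;
// this sketch just runs the kernel once per index on the CPU.
template <typename Kernel>
void parallel_for_each_sketch(std::size_t extent, Kernel kernel) {
    for (std::size_t i = 0; i < extent; ++i) {
        kernel(i);
    }
}

// Scales every element of `data` by `factor` using a lambda "kernel".
std::vector<float> scale(const std::vector<float>& data, float factor) {
    std::vector<float> out(data.size());
    const float* in_ptr = data.data();  // stands in for a buffer reference
    float* out_ptr = out.data();
    // The closure object carries factor, in_ptr and out_ptr by value, so the
    // language has already performed the capture step.
    parallel_for_each_sketch(data.size(), [=](std::size_t i) {
        out_ptr[i] = in_ptr[i] * factor;
    });
    return out;
}
```

Calling scale on a three-element vector with a factor of 2 doubles each
element. What the lambda cannot express by itself is how a captured buffer
reference is handed to the device, which is where the marshalling comes in.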
> 2. What will be necessary to eliminate the two-clang-invocations problem.
> If we ever grow support for embedded accelerator targeting (through AMP,
> OpenACC, OpenMP 4+, etc.), it sounds like this will be a common
> requirement, and if I had to guess, there is common interest in putting the
> necessary infrastructure in place.
The only reason I have two clang invocations right now is how I
dealt with address spaces. In the Shevlin Park presentation, they mentioned
doing analysis and assigning address spaces after codegen, but I just assign
them using __attribute__((address_space(N))) for now, and zero them out for CPU
codegen with a TargetOpt. It sort of piggybacks on the OpenCL ->
NVPTX/SPIR/AMD/etc. address space abstraction. The other differences are
similar to how CodeGenOpts.CUDAIsDevice works.
Unfortunately it won't be sufficient for a full implementation of AMP, which
doesn't specify (to my knowledge) any address-space declaration on pointer
types, but still allows pointers into buffers in various address-spaces.
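To make the explicit-attribute approach concrete, here is a minimal sketch. It
is not the prototype's code: the prototype zeroes the address spaces with a
target option, whereas this sketch swaps in a preprocessor macro as a simpler
stand-in. SKETCH_DEVICE_BUILD, SKETCH_GLOBAL and add_one are hypothetical
names, and the address space number 1 is only illustrative.

```cpp
#include <cassert>

// Hypothetical macros. SKETCH_DEVICE_BUILD would be defined only for the
// device compile; the host compile sees plain pointers, which plays the role
// of "zeroing out" the address space for CPU codegen.
#if defined(__clang__) && defined(SKETCH_DEVICE_BUILD)
#define SKETCH_GLOBAL __attribute__((address_space(1)))
#else
#define SKETCH_GLOBAL
#endif

// The same source serves both compiles; only the address space of the
// pointer parameter differs between them.
inline void add_one(SKETCH_GLOBAL int* data, int count) {
    for (int i = 0; i < count; ++i) {
        data[i] += 1;
    }
}
```

On the host, add_one takes an ordinary int*. The limitation described above
shows up as soon as the source has no such annotation: a bare pointer says
nothing about which address space its target lives in.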
To be honest, I'm not crazy about the AMP specification; I just like the idea
of compiling a heterogeneous module for host/device code, which can be easily
integrated into an existing C++ application. I'd be happy for it to drop the
MS-specific syntax like properties, use C++ attributes wherever possible instead
of keywords, and have explicit address spaces like CUDA/OpenCL.
I think the big problem is going to be making it robustly handle two very
different targets in one pass. Most obviously, that means supporting different
bitness for host and device. My testing was all on 64/32 bit, but all other
combinations occur in practice.
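One place the bitness mismatch bites is any data shared between host and
device. The sketch below shows a common way to sidestep it with fixed-width
fields. It is an assumption about how one might handle it, not something the
prototype does, and all type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Layout depends on the target: pointers and size_t are typically 4 bytes
// on a 32-bit target and 8 bytes on a 64-bit one, so host and device could
// disagree about this struct.
struct NaiveKernelArgs {
    float* data;
    std::size_t count;
};

// Hypothetical fixed-width alternative with the same size on every target.
struct PortableKernelArgs {
    std::uint64_t data_handle;  // device address or buffer id, widened to 64 bits
    std::uint64_t count;
};

static_assert(sizeof(PortableKernelArgs) == 16,
              "argument block size must not depend on target bitness");

// Packs host-side values into the fixed-width argument block.
inline PortableKernelArgs pack_args(std::uint64_t handle, std::size_t count) {
    return PortableKernelArgs{handle, static_cast<std::uint64_t>(count)};
}
```

This only covers explicitly marshalled arguments. Types captured directly by a
kernel lambda would still be laid out by whichever target compiles them, which
is the harder part of the one-pass problem.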