[cfe-dev] [LLVMdev] C++AMP -> OpenCL (NVPTX) prototype

Sun Apr 14 07:42:28 PDT 2013

----- Original Message -----
> From: corngood at gmail.com
> To: llvmdev at cs.uiuc.edu
> Sent: Saturday, April 13, 2013 9:13:57 PM
> Subject: [LLVMdev] C++AMP -> OpenCL (NVPTX) prototype
> 
> After reading about Intel's 'Shevlin Park' project to implement C++AMP in
> llvm/clang, and failing to find any code for it, I decided to try to
> implement something similar.  I did it as an excuse to explore and hack
> on llvm/clang, which I hadn't done before, but it's now at the point where
> it will run the simplest matrix multiplication sample from MSDN, so I
> thought I might as well share it.
> 
> The source is in:
> https://github.com/corngood/llvm.git
> https://github.com/corngood/clang.git
> https://github.com/corngood/compiler-rt.git [unchanged]
> https://github.com/corngood/amp.git [simple test project]
> 
> It's fairly hacky, and very fragile, so don't expect anything that isn't
> used in the sample to work.  I also haven't tested it on large datasets,
> and there are some things that definitely need fixing before I'd expect
> good performance (e.g. workgroup size).  It currently works only on NVIDIA
> GPUs, and has only been tested on my shitty old 9600GT on amd64 linux with
> the stable binary drivers.
> 
> The compilation process currently works like this:
> 
> .cpp -> [clang++ -fc++-amp] -> .ll
> 	- compile non-amp code
> 
> .cpp -> [clang++ -fc++-amp -famp-is-kernel] -> .amp.ll
> 	- compile amp kernels only
> 
> .amp.ll -> [opt -amp-to-opencl] -> .nvvm.ll
> 	- create kernel wrapper to deal with buffer/const inputs
> 	- add nvvm annotations
> 
> .nvvm.ll -> [llc -march=nvptx] -> .ptx
> 	- compile kernels to NVPTX (unchanged)
> 
> .ll + .ptx -> [opt -amp-create-stubs .ptx] -> .opt.ll
> 	- embed ptx as array data
> 	- create functions to get kernel info, load inputs, etc
> 
> .opt.ll -> [llc] -> .o
> 	- unchanged
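To make the pipeline above concrete, here is a hedged Python sketch. It only builds the command lines (it does not run them) and mimics the "embed ptx as array data" step by rendering PTX text as an LLVM IR byte-array constant. Assumptions, not taken from the repositories: the `-S`, `-emit-llvm`, `-o` and `-filetype=obj` arguments are stock clang/opt/llc options added so that each step writes the file named in the diagram; passing the .ptx path as the value of the fork's `-amp-create-stubs` option is a guess; and `amp_pipeline`, `ptx_to_ir_global` and `__amp_kernel_ptx` are hypothetical names.

```python
import shlex
from pathlib import Path
from typing import List


def amp_pipeline(source: str) -> List[List[str]]:
    """Return the six tool invocations from the diagram for one .cpp file.

    -fc++-amp, -famp-is-kernel, -amp-to-opencl and -amp-create-stubs exist
    only in the patched fork; the other options are stock clang/opt/llc.
    """
    stem = str(Path(source).with_suffix(""))
    ll, amp_ll, nvvm_ll = stem + ".ll", stem + ".amp.ll", stem + ".nvvm.ll"
    ptx, opt_ll, obj = stem + ".ptx", stem + ".opt.ll", stem + ".o"
    return [
        # Host pass: compile the non-AMP code to textual IR.
        ["clang++", "-fc++-amp", "-S", "-emit-llvm", source, "-o", ll],
        # Device pass: same source, AMP kernels only.
        ["clang++", "-fc++-amp", "-famp-is-kernel", "-S", "-emit-llvm",
         source, "-o", amp_ll],
        # Wrap kernels for buffer/const inputs and add nvvm annotations.
        ["opt", "-amp-to-opencl", "-S", amp_ll, "-o", nvvm_ll],
        # Unmodified NVPTX backend.
        ["llc", "-march=nvptx", nvvm_ll, "-o", ptx],
        # Embed the PTX into the host IR (ASSUMED argument form).
        ["opt", "-amp-create-stubs", ptx, "-S", ll, "-o", opt_ll],
        # Ordinary host object file.
        ["llc", "-filetype=obj", opt_ll, "-o", obj],
    ]


def ptx_to_ir_global(ptx: str, name: str = "__amp_kernel_ptx") -> str:
    """Render PTX text as a NUL-terminated LLVM IR [N x i8] constant.

    The global name is hypothetical. Printable ASCII is emitted literally;
    quotes, backslashes and everything else use LLVM's \\XX hex escapes.
    """
    data = ptx.encode("utf-8") + b"\x00"  # trailing NUL: usable as a C string
    chars = []
    for b in data:
        if 0x20 <= b < 0x7F and b not in (0x22, 0x5C):  # skip '"' and '\'
            chars.append(chr(b))
        else:
            chars.append("\\%02X" % b)
    body = "".join(chars)
    return '@%s = private unnamed_addr constant [%d x i8] c"%s"' % (
        name, len(data), body)


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for cmd in amp_pipeline("matmul.cpp"):
        print(shlex.join(cmd))
    print(ptx_to_ir_global(".version 3.1\n"))
```

Passing each list to `subprocess.run(cmd, check=True)` would execute the chain, assuming the patched tools are on `PATH`. Keeping the two clang invocations as separate entries mirrors the current two-pass design; merging them is the "one clang call" goal mentioned below.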
> 
> The clang steps only differ in codegen, so eventually they should be
> combined into one clang call.  NVPTX is meant to be replaced with SPIR at
> some point, to make it portable, which is why I didn't bother with text
> kernel generation.
> 
> I won't go into implementation details, but if anyone is interested, or
> working on something similar, feel free to get in touch.

Dave,

[I've copied the cfe-dev list as well.]

Thanks for sharing this! I think this sounds very interesting. I don't know much about AMP, but I do have users who are also interested in accelerator targeting, and I'd like you to share your thoughts on:

 1. Does your implementation share common functionality with the 'captured statement' work that Intel is currently doing (in order to support Cilk, OpenMP, etc.)? If you're not aware of it, see: http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20130408/077615.html -- This should end up in trunk soon. I ask because if the current captured statement patches would almost, but not quite, work for you, then it would be interesting to understand why.

 2. What will be necessary to eliminate the two-clang-invocations problem? If we ever grow support for embedded accelerator targeting (through AMP, OpenACC, OpenMP 4+, etc.), it sounds like this will be a common requirement, and I'd guess there is common interest in putting the necessary infrastructure in place.

 -Hal

> 
> Thanks,
> Dave McFarland