[Openmp-dev] Preferred alternative to a C++ dialect for device library functions

Finkel, Hal J. via Openmp-dev openmp-dev at lists.llvm.org
Tue Jan 14 10:13:51 PST 2020


On 1/14/20 11:28 AM, Jon Chesterfield wrote:
Hi Hal,

In this case (atomicInc) it's a target-specific IR intrinsic, but there are other cases where there is target-independent IR support but no clang builtin. OpenCL memory fences, for one.

So the general question is what to do about functions with IR/asm support but no clang builtins.


My general preference is just to add the Clang intrinsic. Adding target-specific intrinsics is, generically, pretty easy: one line in include/clang/Basic/BuiltinsAMDGPU.def, a few lines of code in CodeGenFunction::EmitAMDGPUBuiltinExpr in lib/CodeGen/CGBuiltin.cpp, and some lines for testing in test/CodeGen/builtins-amdgcn.c (which, unfortunately, doesn't seem to exist, but just make one like test/CodeGen/builtins-nvptx.c or like test/CodeGen/builtins-nvptx-sm_70.cu).

For the target-independent ones, please post an RFC to cfe-dev about adding the intrinsics so that we can settle on that before you need them.

If there's something complicated about the frontend work, I recommend a .ll file as a work-around.

 -Hal


Thanks!

Jon


On Tue, Jan 14, 2020 at 5:15 PM Finkel, Hal J. <hfinkel at anl.gov> wrote:


On 1/14/20 10:37 AM, Jon Chesterfield via Openmp-dev wrote:
Hello OpenMP dev,

A motivating example is atomicInc for amdgcn. There is ISA support for this so a good implementation folds to a single instruction. There is no corresponding clang intrinsic, though there is an llvm intrinsic.


Do you mean a target-specific intrinsic, or a target-independent intrinsic?

 -Hal


I see the following options:
- Implement it in IR, linked into deviceRTL
- Inline assembly
- Delay implementation until the intrinsic can be added to clang
- Implement in terms of CAS
- Your suggestion here

Adding atomicInc.ll to the source tree is the easy short-term fix. It has drawbacks in terms of future ABI changes, build complexity, and limited precedent: libclc does this, but nothing else does.

Inline assembly works (modulo getting the syntax right) and hits the right instruction.

Implementing in terms of CAS means one can stay in HIP or OpenCL, but performance suffers.
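
To make the CAS option concrete, here is a minimal sketch in portable host C++ using std::atomic rather than HIP or OpenCL. It assumes the wrap-around semantics documented for CUDA's atomicInc (store 0 if the old value is >= the limit, otherwise old + 1, and return the old value). The name atomic_inc_cas and the relaxed memory ordering are assumptions for illustration, not an existing deviceRTL function.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: emulates atomicInc with a compare-and-swap loop.
//   old = *addr; *addr = (old >= limit) ? 0 : old + 1; return old;
inline std::uint32_t atomic_inc_cas(std::atomic<std::uint32_t>& addr,
                                    std::uint32_t limit) {
  std::uint32_t old = addr.load(std::memory_order_relaxed);
  std::uint32_t desired;
  do {
    desired = (old >= limit) ? 0u : old + 1u;
    // On failure, compare_exchange_weak reloads `old` with the current
    // value, so the next iteration recomputes `desired` from fresh data.
  } while (!addr.compare_exchange_weak(old, desired,
                                       std::memory_order_relaxed,
                                       std::memory_order_relaxed));
  return old;
}

// Small single-threaded check of the wrap-around behaviour.
inline void check_wraparound() {
  std::atomic<std::uint32_t> counter{0};
  assert(atomic_inc_cas(counter, 2) == 0);  // 0 -> 1
  assert(atomic_inc_cas(counter, 2) == 1);  // 1 -> 2
  assert(atomic_inc_cas(counter, 2) == 2);  // 2 -> 0 (wraps at the limit)
  assert(counter.load() == 0);
}
```

The loop may retry under contention, which is where the performance cost relative to a single hardware instruction comes from.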

What would you prefer out of these options?

Thanks,

Jon



_______________________________________________
Openmp-dev mailing list
Openmp-dev at lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev


--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
