[PATCH] D42319: [CUDA] CUDA has no device-side library builtins.

Mon Jan 22 14:55:08 PST 2018

tra added a comment.

In https://reviews.llvm.org/D42319#983377, @jlebar wrote:

> How does this affect e.g. calling memcpy()?  There isn't a standard library implementation of this on nvptx, but we do want calls to memcpy() to be lowered to llvm.memcpy so that they can be optimized.

We implement memcpy as a call to __builtin_memcpy() which gets code-gen-ed as usual.  NVPTX also lowers all memcpy/memset/memmove as loads/stores, so those don't need external library. This behavior is not affected by this patch.

This patch's goal is to prevent clang codegen-ing its idea of the library builtin function while ignoring the implementation we've provided in the headers for device side.

Original issue I had was triggered by code roughly similar to this:

  extern "C" __device__ int logf(float a) { return __nv_logf(a); }
  __global__ void kernel() { logf(0.0f); }

In the AST, the kernel was calling the logf functions above However, when clang generated code, it considered that logf is a library builtin with known semantics and happily codegen'ed a call to @llvm.log.f32, which NVPTX back-end has no way to lower. The patch adds a safety net in clang so it does not generate code for builtins which we have disabled (or can't handle) in NVPTX.

https://reviews.llvm.org/D42319