[cfe-dev] CUDA and libm

Hal Finkel via cfe-dev cfe-dev at lists.llvm.org
Mon Jan 25 10:00:39 PST 2016


----- Original Message -----
> From: "Justin Lebar via cfe-dev" <cfe-dev at lists.llvm.org>
> To: cfe-dev at lists.llvm.org, c-gpu-team at google.com
> Sent: Sunday, January 24, 2016 1:29:10 PM
> Subject: [cfe-dev] CUDA and libm
> 
> Hi, I'm looking for some advice on the right way to implement math
> functions in device-side CUDA code.  I'm pretty new here, so please
> forgive the likely many gross misunderstandings below.
> 
> There are three classes of things I'm concerned with:
> 
>  * Functions declared in math.h (e.g. sinf),
>  * Builtin math functions (e.g. __builtin_sinf), and
>  * LLVM intrinsic math instructions (e.g. llvm.sin.f32).
> 
> At the moment the way this works is that the CUDA headers provided by
> nvidia define inline functions along the lines of
> 
>   __device__ float sinf(float v) {
>   #ifdef FAST_MATH
>     return __nv_fast_sinf(v);
>   #else
>     return __nv_sinf(v);
>   #endif
>   }
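[Editor's note: the compile-time dispatch in that wrapper can be modeled on the host. The following is a hypothetical stand-alone analogue for illustration only; my_sinf, fast_sinf_approx, and precise_sinf are invented names, and the polynomial is a made-up cheap approximation, not what __nv_fast_sinf actually computes.]

```cpp
#include <cmath>

// Hypothetical host-side analogue of the CUDA header wrapper above.
// fast_sinf_approx stands in for __nv_fast_sinf and precise_sinf for
// __nv_sinf; both are invented for this sketch.
inline float fast_sinf_approx(float v) {
  return v - (v * v * v) / 6.0f;  // cheap low-order approximation
}

inline float precise_sinf(float v) { return std::sin(v); }

inline float my_sinf(float v) {
#ifdef FAST_MATH
  return fast_sinf_approx(v);  // selected when -DFAST_MATH is passed
#else
  return precise_sinf(v);      // default: the precise implementation
#endif
}
```

Without -DFAST_MATH the preprocessor keeps only the precise branch, so the choice costs nothing at run time.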
> 
> This function is C++, *not* extern "C".  __nv_sinf and __nv_fast_sinf
> are defined in libdevice [1], a bitcode library provided by nvidia.
> 
> Some functions inside libdevice, e.g. __nv_fast_sinf, do nothing more
> than call the equivalent llvm nvvm intrinsic, e.g.
> llvm.nvvm.sin.approx.f.  This then gets lowered to the equivalent PTX
> instruction.  Other functions in libdevice, such as __nv_sinf, do
> nontrivial computation explicitly written out in the llvm bitcode.
> 
> Following so far?  If so, you may note that the state of the world is
> rather incomplete!  Here are the problems I see at the moment:
> 
>  * Many builtins don't work.  For example, clang emits __builtin_sinf
> as a call to the C symbol sinf [2].  But that function doesn't exist
> on the device; nvidia's not-extern-"C" ::sinf is not the same symbol.
> In particular this means that libstdc++ is not going to work well,
> since it implements e.g. std::sin(float) as a call to __builtin_sinf.
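[Editor's note: to make the linkage point concrete, here is a minimal sketch. The cuda_headers namespace and wrapper_is_distinct_symbol helper are invented for this illustration; the point is only that a function named sinf with C++ linkage mangles to something like _Z4sinff, so a call the compiler emits to the plain C symbol "sinf" can never bind to it.]

```cpp
#include <cmath>

// Invented namespace standing in for nvidia's header-defined wrapper.
namespace cuda_headers {
// C++ linkage: this mangles (e.g. to _ZN12cuda_headers4sinfEf), so it
// is a different symbol from the C library's "sinf".
inline float sinf(float v) { return v; }
}  // namespace cuda_headers

// Two distinct functions: the C-linkage sinf from libm and the C++
// wrapper above.  A call lowered to the C symbol binds to the former.
inline bool wrapper_is_distinct_symbol() {
  float (*c_symbol)(float) = ::sinf;             // C linkage, from libm
  float (*wrapper)(float) = cuda_headers::sinf;  // C++ linkage
  return c_symbol != wrapper;
}
```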
> 
>  * Many math optimizations aren't going to work (I think), because
> e.g. SimplifyLibCalls checks for function names like "log" [3], but
> by the time we get there, we're calling __nv_log.

We already have a solution to this problem: when you instantiate TLI, you can call TLI::setAvailableWithName to set an alternate name for some of the library calls. We already do some of this in the initialize function in lib/Analysis/TargetLibraryInfo.cpp.
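[Editor's note: a toy model of that mechanism, for readers unfamiliar with TLI. This is not the real LLVM API; LibFuncModel and TargetLibraryModel are invented names that only mirror the shape of setAvailableWithName: the libcall stays recognized, but under an alternate symbol name that optimizations can look up.]

```cpp
#include <map>
#include <string>

// Toy stand-in for LLVM's list of recognized library functions.
enum class LibFuncModel { sinf_func, logf_func };

class TargetLibraryModel {
  // Default: each libcall is available under its standard name.
  std::map<LibFuncModel, std::string> names_{
      {LibFuncModel::sinf_func, "sinf"},
      {LibFuncModel::logf_func, "logf"}};

public:
  // Mirrors the shape of TLI's setAvailableWithName: record an
  // alternate symbol name for a recognized libcall.
  void setAvailableWithName(LibFuncModel f, std::string name) {
    names_[f] = std::move(name);
  }

  // Optimizations keyed on the libcall consult this instead of a
  // hard-coded standard name.
  const std::string &getName(LibFuncModel f) const { return names_.at(f); }
};
```

With this shape, a pass like SimplifyLibCalls matches the remapped name (e.g. __nv_sinf) just as it would match sinf.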

 -Hal

> 
>  * (At least some) llvm intrinsics sort of work, but e.g. llvm.sin.f32
> gets lowered to the nvvm instruction sin.approx.f32, while ::sinf from
> the CUDA headers only does this transformation if fast-math is
> enabled.  Maybe this is sound if we only emit llvm.sin.f32 when
> fast-math is enabled; I dunno.
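[Editor's note: a small illustration of why gating the approximate lowering on fast-math matters. The polynomial below is an invented cheap approximation, not the actual sin.approx.f32 hardware behavior; the point is that such approximations are fine near zero but drift badly for larger inputs, which a non-fast-math build should not silently absorb.]

```cpp
#include <cmath>

// Invented cheap approximation standing in for an approximate hardware
// sine; NOT the real sin.approx.f32 behavior.
inline float approx_sinf(float v) { return v - (v * v * v) / 6.0f; }

// Absolute error of the approximation versus the precise result.
inline float approx_error_at(float v) {
  return std::fabs(approx_sinf(v) - std::sin(v));
}
```

Near zero the error is tiny, but at v = 2 it is already a few tenths; unconditionally substituting the approximation would change results for code compiled without fast-math.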
> 
> My question for the list is about the right way to fix these problems.
> It seems to me that since the optimizer explicitly uses knowledge of
> the various math functions, we shouldn't define inline versions of
> them.  Instead, we should leave them as plain calls to e.g. sinf until
> we lower to nvvm in llvm.  We have a header baked into clang that
> already disables some CUDA headers; in theory we should be able to
> disable CUDA's math_functions.h using the same mechanism.
> 
> When lowering to nvvm, we can make the determination as to whether we
> want to call a function defined in libdevice, invoke an nvvm
> intrinsic, or whatever.  If we call into libdevice we'll want to
> rerun some optimization passes, but I presume we can order these
> passes appropriately.
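[Editor's note: the per-call decision described above might be sketched as follows. lowerMathCall is an invented helper; only the llvm.nvvm.*.approx.f and __nv_* naming conventions come from the thread, and the exact set of available approximate intrinsics is an assumption.]

```cpp
#include <map>
#include <string>

// Hypothetical sketch of the decision made when lowering a math libcall
// to nvvm: under fast-math, prefer an approximate nvvm intrinsic when
// one exists; otherwise fall back to the libdevice bitcode definition.
inline std::string lowerMathCall(const std::string &name, bool fastMath) {
  // Functions assumed here to have an approximate intrinsic form.
  static const std::map<std::string, std::string> approx = {
      {"sinf", "llvm.nvvm.sin.approx.f"},
      {"cosf", "llvm.nvvm.cos.approx.f"}};
  if (fastMath) {
    auto it = approx.find(name);
    if (it != approx.end())
      return it->second;
  }
  return "__nv_" + name;  // e.g. sinf -> __nv_sinf in libdevice
}
```

Because the IR keeps the standard names (sinf, logf, ...) until this point, SimplifyLibCalls and friends see the names they expect, and the libdevice substitution happens only once, at lowering time.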
> 
> In order to do this, we'll need libdevice to be available to llvm.
> It's not clear to me whether it is at the moment; clang sees it, but
> I'm not sure if clang passes both its generated IR and all of
> libdevice to llvm, or if it just copies the relevant definitions from
> libdevice into the IR it sends to llvm.  If it's the latter, we could
> always copy *all* of libdevice into the generated IR.  But I hope we
> could do better.
> 
> I think this proposal lets us solve all three problems above, but
> again I'm not sure I'm not missing something, or if there's a more
> canonical way to do this.  Any feedback is appreciated!
> 
> -Justin
> 
> [1]
> http://docs.nvidia.com/cuda/libdevice-users-guide/__nv_sin.html#__nv_sin
> [2]
> https://github.com/llvm-mirror/clang/blob/e2636ac0bad65451c3eb6272d7ab3abbba96da17/lib/CodeGen/CGBuiltin.cpp#L1971
> [3]
> https://github.com/llvm-mirror/llvm/blob/b3bc79d5556108307026be07e7eaa644cce041ab/lib/Transforms/Utils/SimplifyLibCalls.cpp#L1339
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory


