[cfe-dev] CUDA and libm
Hal Finkel via cfe-dev
cfe-dev at lists.llvm.org
Mon Jan 25 10:00:39 PST 2016
----- Original Message -----
> From: "Justin Lebar via cfe-dev" <cfe-dev at lists.llvm.org>
> To: cfe-dev at lists.llvm.org, c-gpu-team at google.com
> Sent: Sunday, January 24, 2016 1:29:10 PM
> Subject: [cfe-dev] CUDA and libm
>
> Hi, I'm looking for some advice on the right way to implement math
> functions in device-side CUDA code. I'm pretty new here, so please
> forgive the likely many gross misunderstandings below.
>
> There are three classes of things I'm concerned with:
>
> * Functions declared in math.h (e.g. sinf),
> * Builtin math functions (e.g. __builtin_sinf), and
> * LLVM intrinsic math instructions (e.g. llvm.sin.f32).
>
> At the moment the way this works is that the CUDA headers provided by
> nvidia define inline functions along the lines of
>
> __device__ float sinf(float v) {
> #ifndef FAST_MATH
> return __nv_fast_sinf(v);
> #else
> return __nv_sinf(v);
> #endif
> }
>
> This function is C++, *not* extern "C". __nv_sin and __nv_fast_sin
> are defined in libdevice [1], a bitcode library provided by nvidia.
>
> Some functions inside libdevice, e.g. __nv_fast_sinf, do nothing more
> than call the equivalent llvm nvvm intrinsic, e.g.
> llvm.nvvm.sin.approx.f. This then gets lowered to an equivalent nvvm
> machine instruction. Other functions in libdevice, such as
> __nv_sinf,
> do nontrivial computation explicitly written out in the llvm bitcode.
>
> Following so far? If so, you may note that the state of the world is
> rather incomplete! Here are the problems I see at the moment:
>
> * Many builtins don't work. For example, clang emits __builtin_sinf
> as a call to sinf [2]. But that function doesn't exist; nvidia's
> not-extern-"C" ::sinf is not the same. In particular this means that
> libstdc++ is not going to work well, since it implements e.g.
> std::sin(float) as a call to __builtin_sinf.
>
> * Many math optimizations aren't going to work (I think), because
> e.g.
> SimplifyLibCalls checks for function names like "log" [3], but by the
> time we get there, we're calling __nv_log.
We already have a solution to this problem. When you instantiate TLI, you can call TLI::setAvailableWithName to set an alternate name for some of the library calls. We already do some of this in lib/Analysis/TargetLibraryInfo.cpp in initialize.
-Hal
>
> * (At least some) llvm intrinsics sort of work, but e.g.
> llvm.sin.f32
> gets lowered to the nvvm instruction sin.approx.f32, while ::sin from
> the CUDA headers only does this transformation if fast-math is
> enabled. Maybe this is sound if we only emit llvm.sin.f32 if
> fast-math is enabled; I dunno.
>
> My question for the list is about the right way to fix these
> problems.
> It seems to me that since the optimizer explicitly uses knowledge of
> the various math functions, we shouldn't define inline versions of
> them. Instead, we should leave them as plain calls to e.g. sinf
> until
> we lower to nvvm in llvm. We have a header baked into clang that
> already disables some CUDA headers; in theory we should be able to
> disable CUDA's math_functions.h using the same mechanism.
>
> When lowering to nvvm, we can make the determination as to whether we
> want to call a function defined in libdevice, invoke an nvvm
> intrinsic, or whatever. If we call into libdevice we'll want to
> rerun
> some optimization passes, but I presume we can order these order
> these
> passes appropriately.
>
> In order to do this, we'll need libdevice to be available to llvm.
> It's not clear to me whether it is at the moment; clang sees it, but
> I'm not sure if clang passes both its generated IR and all of
> libdevice to llvm, or if it just copies the relevant definitions from
> libdevice into the IR it sends to llvm. If it's the latter, we could
> always copy *all* of libdevice into the generated IR. But I hope we
> could do better.
>
> I think this proposal lets us solve all three problems above, but
> again I'm not sure I'm not missing something, or if there's a more
> canonical way to do this. Any feedback is appreciated!
>
> -Justin
>
> [1]
> http://docs.nvidia.com/cuda/libdevice-users-guide/__nv_sin.html#__nv_sin
> [2]
> https://github.com/llvm-mirror/clang/blob/e2636ac0bad65451c3eb6272d7ab3abbba96da17/lib/CodeGen/CGBuiltin.cpp#L1971
> [3]
> https://github.com/llvm-mirror/llvm/blob/b3bc79d5556108307026be07e7eaa644cce041ab/lib/Transforms/Utils/SimplifyLibCalls.cpp#L1339
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
More information about the cfe-dev
mailing list