[llvm-dev] Math functions for CUDA patch

Justin Lebar via llvm-dev llvm-dev at lists.llvm.org
Fri Jun 15 15:16:37 PDT 2018


In general we try to convert nvvm intrinsics to proper LLVM intrinsics, so
that LLVM can understand what's going on and optimize the code.  There's a
whole bunch of these in AutoUpgrade.cpp, search for "nvvm".
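
For example (a sketch from memory; the exact set of upgraded intrinsics is
in AutoUpgrade.cpp and changes between LLVM versions), a call like

  %r = call i32 @llvm.nvvm.brev32(i32 %x)

gets upgraded to the target-independent

  %r = call i32 @llvm.bitreverse.i32(i32 %x)

which the rest of the optimizer knows how to reason about.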

Both the generic LLVM and the nvvm intrinsics are ultimately translated to the
same PTX.
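
E.g. for the sqrt case quoted below, both @llvm.sqrt.f32 and
@llvm.nvvm.sqrt.rn.f should end up as roughly the same PTX instruction,
something like (a sketch; the exact .ftz / .approx variant depends on
function attributes and backend options):

  sqrt.rn.f32  %f2, %f1;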

On Fri, Jun 15, 2018 at 3:00 PM Artem Belevich <tra at google.com> wrote:

> +CC: jlebar, llvm-dev@ as this may be of some interest to other users of
> the NVPTX back-end.
>
> On Thu, Jun 14, 2018 at 9:33 AM Gheorghe-Teod Bercea <
> Gheorghe-Teod.Bercea at ibm.com> wrote:
>
>> Hi Artem,
>>
>> I hope things are well.
>>
>> Just touching base regarding the patch I posted last week:
>> https://reviews.llvm.org/D47849
>>
>> Based on your expertise with the CUDA toolchain in Clang: at optimization
>> levels of O1 or higher, are math functions translated to device functions
>> at all?
>>
>> I have been having a mixed experience with that. On the device side, for
>> CUDA, some functions (like pow) will be translated to a device version,
>> but some functions, like sqrt, will use the llvm intrinsic version even
>> though an nvvm version of the function exists.
>
>
>> I have been trying to leverage the existing CUDA functionality for the
>> OpenMP device toolchain. I've been able to get OpenMP to do exactly what
>> CUDA does, but my question is: does CUDA do the right thing by using llvm
>> intrinsics on the device side? Or do we perhaps need to fix CUDA too?
>>
>>
> AFAICT, clang does not do anything special about translating math library
> calls into libdevice calls. We do include CUDA SDK headers that end up
> providing device-side overloads for a subset of libm calls; see
> include/math_functions.hpp in the CUDA SDK.
> Those headers map math functions to __nv_* functions that come with the
> CUDA SDK's libdevice bitcode. We link in the necessary bits of that bitcode
> before passing the module to LLVM.
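>
> As a rough sketch of what those header-provided overloads look like
> (simplified for illustration, not the literal header contents):
>
>   // declared by the headers, defined in libdevice's bitcode
>   extern "C" __device__ float __nv_sqrtf(float x);
>   extern "C" __device__ float __nv_powf(float x, float y);
>
>   // device-side overloads that forward to libdevice
>   __device__ inline float sqrtf(float x) { return __nv_sqrtf(x); }
>   __device__ inline float powf(float x, float y) { return __nv_powf(x, y); }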
>
> Those libdevice functions in turn sometimes use NVPTX-specific intrinsics.
> E.g. fsqrt has this IR:
>
> define float @__nv_fsqrt_rn(float %x) #0 {
>   ...
>   %3 = call float @llvm.nvvm.sqrt.rn.ftz.f(float %x)
>
>
> Then LLVM replaces calls to some of those intrinsics with their generic LLVM
> counterparts:
>
> https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineCalls.cpp#L1466
>
> This way LLVM has the ability to reason about these calls and can optimize
> some of them.
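>
> For the fsqrt case above, that means a call like
>
>   %3 = call float @llvm.nvvm.sqrt.rn.ftz.f(float %x)
>
> can be rewritten (roughly speaking, and only when instcombine considers it
> safe; the .ftz variants in particular are only rewritten when the
> function's ftz-related attributes match) into the generic
>
>   %3 = call float @llvm.sqrt.f32(float %x)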
>
> So, depending on the optimization level, you may or may not see some of these
> transformations; hopefully that explains the inconsistencies you have seen.
>
>
>
>> Please let me know your thoughts on this.
>>
>> Thanks,
>>
>> --Doru
>>
>>
>>
>>
>
> --
> --Artem Belevich
>