[LLVMdev] [NVPTX] We need an LLVM CUDA math library, after all

Dmitry Mikushin dmitry at kernelgen.org
Sun Feb 17 06:48:59 PST 2013


Hi Justin,

I don't understand why, for instance, the X86 backend handles pow
automatically, while NVPTX has to be a PITA that requires the user to bring
his own pow implementation. Even at a very general level, this limits users'
interest in the LLVM NVPTX backend. Could you please elaborate on the
rationale behind your point? Why are the accuracy modes I suggested not
sufficient, in your opinion?

- D.

2013/2/17 Justin Holewinski <justin.holewinski at gmail.com>

> I would be very hesitant to expose all math library functions as
> intrinsics.  I believe linking with a target-specific math library is the
> correct approach, as it decouples the back end from the needs of the source
> program/language.  Users should be free to use any math library
> implementation they choose.  Intrinsics are meant for functions that
> compile down to specific ISA features, like fused multiply-add and square
> root.
> On Feb 16, 2013 8:46 PM, "Dmitry Mikushin" <dmitry at kernelgen.org> wrote:
>
>> Dear Yuan,
>>
>> Sorry for the delay in replying.
>>
>> The answers to your questions depend on where the math library sits in
>> the code generation pipeline. At KernelGen, we currently have a user-level
>> CUDA math module, adopted from cicc internals [1]. It is intended to be
>> linked with the user LLVM IR module right before proceeding with the final
>> optimization and the backend. For the last few months we have been using
>> this method as a temporary workaround for the absence of many math
>> functions, to keep up the pace of application testing in our compiler test
>> suite. Supplying math this way is not portable and introduces many issues,
>> for instance:
>> 1) The frontend (DragonEgg, in our case) must be taught to emit real math
>> function calls instead of the LLVM intrinsics that NVPTX cannot handle.
>> 2) However, not all intrinsics can be replaced by math calls directly.
>> For example, there is no cdexp call, but it can be modelled with sincos.
>> 3) Our math module assumes sm_20, and could be inefficient or
>> non-portable on other families of GPUs.
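To illustrate item 2, here is a minimal C++ sketch of modelling a complex
exponential with sine and cosine, using exp(a + ib) = exp(a) * (cos(b) +
i*sin(b)). The helper name cdexp_via_sincos is hypothetical and is not taken
from the KernelGen module; separate std::sin and std::cos calls stand in for
a single sincos call, which is not part of standard C++.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Hypothetical helper (name chosen for illustration only): complex
// exponential built from real exp, sin and cos.
// A GPU math module would typically obtain s and c from one sincos call.
std::complex<double> cdexp_via_sincos(std::complex<double> z) {
    const double magnitude = std::exp(z.real());  // exp(a)
    const double s = std::sin(z.imag());          // sin(b)
    const double c = std::cos(z.imag());          // cos(b)
    return {magnitude * c, magnitude * s};
}
```

The result should agree with std::exp on std::complex<double> up to rounding.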
>>
>> Instead of this approach, I think the math library should be implemented
>> *as a lowering pass in the backend*, working directly with intrinsics. In
>> that case naming is not important, and final optimization is the job of
>> the backend. But there is another important thing: the backend should
>> codegen math with respect to accuracy settings, specified either as
>> backend options or as function attributes (a quite recent addition to
>> LLVM). The accuracy settings should be:
>> 1) fast-math (ftz, prec-div, prec-sqrt, fma, etc.)
>> 2) Whether or not to use GPU-specific low-precision functions (__sin,
>> __cos, etc.)
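The two groups of settings could be carried around roughly as follows. This
is only a sketch under assumptions: MathAccuracy, fastMathPreset and
selectMathFunction are hypothetical names, not existing LLVM or NVPTX
interfaces, and only sinf/cosf are mapped to CUDA's fast __sinf/__cosf
variants to keep the example short.

```cpp
#include <cassert>
#include <string>

// Hypothetical accuracy settings; the fields mirror the options listed
// above and do not correspond to an existing LLVM structure.
struct MathAccuracy {
    bool ftz = false;               // flush denormals to zero
    bool precDiv = true;            // precise division
    bool precSqrt = true;           // precise square root
    bool fma = true;                // allow fused multiply-add
    bool useFastFunctions = false;  // prefer low-precision GPU functions
};

// Convenience preset (assumption): trade accuracy for speed everywhere.
MathAccuracy fastMathPreset() {
    MathAccuracy acc;
    acc.ftz = true;
    acc.precDiv = false;
    acc.precSqrt = false;
    acc.useFastFunctions = true;
    return acc;
}

// Picks the device function name for a single-precision math call.
std::string selectMathFunction(const std::string &name,
                               const MathAccuracy &acc) {
    if (acc.useFastFunctions && (name == "sinf" || name == "cosf"))
        return "__" + name;  // e.g. __sinf instead of sinf
    return name;             // otherwise keep the precise version
}
```

A lowering pass would consult such a structure once per call site, so the
same IR could be compiled either precisely or fast without frontend changes.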
>>
>> With the latter approach, math handling in NVPTX will conform to the rest
>> of LLVM, and no host-dependent tweaks will be needed.
>>
>> I'm also interested in contributing to this development at a reasonable
>> depth. Moving this part forward entirely on our own would slow down
>> progress on our main targets too much, which is why I'm asking for your
>> help and cooperation.
>>
>> Best regards,
>> - Dima.
>>
>> [1]
>> https://hpcforge.org/scm/viewvc.php/*checkout*/trunk/src/cuda/include/math.bc?root=kernelgen
>>
>> 2013/2/8 Yuan Lin <yulin at nvidia.com>
>>
>>> Yes, it helps a lot and we are working on it.
>>>
>>> A few questions,
>>>
>>> 1) What will be your use model of this library? Will you run
>>> optimization phases after linking with the library? If so, what are they?
>>>
>>> 2) Do you care if the names of functions differ from those in
>>> libm? For example, it would be gpusin() instead of sin().
>>>
>>> 3) Do you need a different library for different host
>>> platforms? Why?
>>>
>>> 4) Any other functions (besides math) you want to see in this
>>> library?
>>>
>>> Thanks.
>>>
>>> Yuan
>>>
>>> *From:* Dmitry Mikushin [mailto:dmitry at kernelgen.org]
>>> *Sent:* Thursday, February 07, 2013 2:09 PM
>>> *To:* Justin Holewinski; LLVM Developers Mailing List
>>> *Cc:* Yuan Lin
>>> *Subject:* [NVPTX] We need an LLVM CUDA math library, after all
>>>
>>> Hi Justin, gentlemen,
>>>
>>> I'm afraid I have to escalate this issue at this point. Since it was
>>> first discussed last summer, it has been sufficient for us for a while to
>>> disable the lowering of math calls into intrinsics at the DragonEgg level
>>> and link them against CUDA math functions at the LLVM IR level. Now I can
>>> say: this is no longer sufficient, and we need the NVPTX backend to deal
>>> with GPU math.
>>>
>>> > There also is no standard libm for PTX.
>>>
>>> Yes, that's right, but there is an interesting idea: codegen the CUDA math
>>> headers into LLVM IR and link the result with the user module at the IR
>>> level. This method gives great flexibility with respect to high-level
>>> languages: the user no longer needs to deal with headers and can have math
>>> right in the IR, regardless of the language it was lowered from. I can
>>> confirm this method works very well for us with C and Fortran, but in order
>>> to make accurate replacements of unsupported intrinsic calls, it needs to
>>> become aware of the NVPTX backend capabilities in the form of:
>>>
>>> bool NVPTXTargetMachine::isIntrinsicSupported(Function& intrinsic) and
>>> string NVPTXTargetMachine::whichMathCallReplacesIntrinsic(Function& intrinsic)
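A standalone sketch of what these two queries might look like is given here.
It is an assumption-laden illustration, not NVPTX code: intrinsics are
identified by name rather than by llvm::Function, free functions replace the
NVPTXTargetMachine members, and the contents of both tables (kSupported,
kReplacements) are made up for the example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Illustrative tables only; they do not reflect the real backend state.
static const std::set<std::string> kSupported = {
    "llvm.sqrt.f32", "llvm.fma.f32"};
static const std::map<std::string, std::string> kReplacements = {
    {"llvm.pow.f32", "powf"},
    {"llvm.sin.f32", "sinf"},
    {"llvm.cos.f32", "cosf"}};

// True if the backend can select the intrinsic directly.
bool isIntrinsicSupported(const std::string &intrinsic) {
    return kSupported.count(intrinsic) != 0;
}

// Returns the math call to emit instead of the intrinsic, or an empty
// string if no replacement is known.
std::string whichMathCallReplacesIntrinsic(const std::string &intrinsic) {
    auto it = kReplacements.find(intrinsic);
    return it == kReplacements.end() ? std::string() : it->second;
}
```

A frontend or IR-level pass could call the first query for every intrinsic it
is about to emit and fall back to the second one when the answer is false.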
>>>
>>> > I would prefer not to lower such things in the back-end since
>>> > different compilers may want to implement such functions differently
>>> > based on speed vs. accuracy trade-offs.
>>>
>>> Who are those different compilers? We are LLVM, the complete compiler
>>> stack, which should handle these things according to its own preferences.
>>> Derived compilers may certainly think differently, and it's their own
>>> business to change anything they want and never contribute back. We should
>>> not forget that there are a lot of derived projects that use LLVM directly,
>>> like KernelGen or many of the embedded DSLs that have recently started
>>> flourishing. Their completeness and future rely on LLVM. For these reasons,
>>> I would strongly prefer that LLVM/NVPTX supply a reference GPU math
>>> implementation, and I invite you and everyone else to form a joint roadmap
>>> to deliver it.
>>>
>>> Before we start: IANAL, but something tells me there could be a
>>> licensing issue with releasing the LLVM IR emitted from CUDA headers.
>>> Could you please check this with NVIDIA?
>>>
>>> Many thanks,
>>> - D.
>>>
>>> 2012/9/6 Justin Holewinski <justin.holewinski at gmail.com>:
>>> > On 09/06/2012 10:02 AM, Dmitry N. Mikushin wrote:
>>> >>
>>> >> Dear all,
>>> >>
>>> >> During app compilation we get a crash in the NVPTX backend:
>>> >>
>>> >> LLVM ERROR: Cannot select: 0x732b270: i64 = ExternalSymbol'__powisf2'
>>> >> [ID=18]
>>> >>
>>> >> As I understand it, LLVM tries to lower the following call
>>> >>
>>> >> %28 = call ptx_device float @llvm.powi.f32(float 2.000000e+00, i32 %8)
>>> >> nounwind readonly
>>> >>
>>> >> to a device intrinsic. The table llvm/IntrinsicsNVVM.td does not contain
>>> >> such an intrinsic; however, it should be a builtin, according to
>>> >> cuda/include/math_functions.h
>>> >
>>> >
>>> > It actually gets lowered into an external function call.
>>> >
>>> >
>>> >>
>>> >> Is my understanding correct, and do we simply need to add the
>>> >> corresponding definition to llvm/IntrinsicsNVVM.td? How do we do that,
>>> >> and what are the rules?
>>> >
>>> >
>>> > PTX does not have an instruction (or simple series of instructions) that
>>> > implements pow, so this will not be handled.  I would prefer not to lower
>>> > such things in the back-end since different compilers may want to
>>> > implement such functions differently based on speed vs. accuracy
>>> > trade-offs.
>>> >
>>> > There also is no standard libm for PTX.  It is up to the higher-level
>>> > compiler to link against a run-time library that provides functions like
>>> > pow (see include/math_functions.h in a CUDA distribution).
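For the llvm.powi.f32 call that triggered the error above, such a run-time
library would have to provide the __powisf2 symbol. A minimal sketch of one
common way to compute an integer power, repeated squaring, is shown here. The
name powi_sketch is hypothetical, and this is not claimed to be the CUDA
implementation.

```cpp
#include <cassert>

// Hypothetical stand-in for the missing __powisf2 symbol: raises a to an
// integer power b by repeated squaring.
float powi_sketch(float a, int b) {
    const bool reciprocal = b < 0;   // negative exponent: invert at the end
    float result = 1.0f;
    while (true) {
        if (b & 1) result *= a;      // multiply in the current bit
        b /= 2;                      // truncates toward zero, so b reaches 0
        if (b == 0) break;
        a *= a;                      // square the base for the next bit
    }
    return reciprocal ? 1.0f / result : result;
}
```

For example, powi_sketch(2.0f, 10) needs only a handful of multiplications
instead of ten, which matters when the exponent is only known at run time.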
>>> >
>>> >>
>>> >> Thanks,
>>> >> - D.
>>> >
>>> > --
>>> > Thanks,
>>> >
>>> > Justin Holewinski