[LLVMdev] PTX builtin functions.

Tue Nov 22 08:40:58 PST 2011

On Mon, Nov 21, 2011 at 5:31 PM, Justin Holewinski
<justin.holewinski at gmail.com> wrote:
> On Mon, Nov 21, 2011 at 11:45 AM, Alberto Magni <alberto.magni86 at gmail.com>
> wrote:
>>
>> On Mon, Nov 21, 2011 at 3:36 PM, Justin Holewinski
>> <justin.holewinski at gmail.com> wrote:
>> > On Mon, Nov 21, 2011 at 7:01 AM, Alberto Magni
>> > <alberto.magni86 at gmail.com>
>> > wrote:
>> >>
>> >> Hi Justin,
>> >>
>> >> Attached is the patch for the integer max instruction.
>> >> The multiclass PTX_INTRINSIC_INT3 in PTXIntrinsicInstrInfo.td
>> >> is almost an exact copy of PTX_INT3 in PTXInstrInfo.td; perhaps
>> >> a modified version of this class could be defined in a separate file.
>> >
>> >
>> > I'm copying llvmdev.  We should keep discussions like this on the list
>> > for
>> > the benefit of others.
>>
>> I always forget "Reply to All".
>>
>> > We can probably factor out a generic description, or even just use the
>> > PTX_INT3 multiclass directly.  The PTXIntrinsicInstrInfo.td file is
>> > included
>> > by PTXInstrInfo.td, so anything defined in PTXInstrInfo.td is available
>> > in
>> > PTXIntrinsicInstrInfo.td.
>>
>> I agree with you, but my class PTX_INTRINSIC_INT3 works with an Intrinsic
>> and not with an SDNode, as PTX_INT3 does.
>> PTX_INTRINSIC_INT3 also requires the type of the immediate
>> to be present in the pattern, e.g. (i32 imm:$b).
>
>
> Alright, I'm fine with that.
>
>>
>>
>> >>
>> >>
>> >> Do you agree with this approach?
>> >> Also, do you think that a class like PTX_INTRINSIC_INT3_SIGNED
>> >> (a clone of PTX_INT3_SIGNED) is required?
>> >
>> >
>> > Yes, I believe we should split these into signed and unsigned variants.
>> >  The
>> > results of max/min operations can definitely be different depending on
>> > whether the operands are signed or unsigned.  Since this information is
>> > not
>> > encoded in LLVM types, we may want to create two versions for each
>> > integer
>> > type; something like:
>> >
>> > i32 @llvm.ptx.max.signed.i32(i32, i32)
>> > i32 @llvm.ptx.max.unsigned.i32(i32, i32)
>>
>> Yes, this is the only way.
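
A minimal C++ sketch of why the signed/unsigned split matters: the same 32-bit pattern gives a different maximum depending on how it is interpreted. The function names max_signed and max_unsigned are illustrative only and are not part of the patch or of LLVM.

```cpp
#include <algorithm>
#include <cstdint>

// Signed maximum: operands are read as two's-complement values.
std::int32_t max_signed(std::int32_t a, std::int32_t b) {
  return std::max(a, b);
}

// Unsigned maximum: the same bits are read as non-negative values.
std::uint32_t max_unsigned(std::uint32_t a, std::uint32_t b) {
  return std::max(a, b);
}

// Reinterpret a signed value as unsigned; this conversion is well defined
// (modulo 2^32), so -1 becomes 0xFFFFFFFF.
std::uint32_t as_unsigned(std::int32_t v) {
  return static_cast<std::uint32_t>(v);
}
```

With operands -1 and 1, max_signed picks 1, while max_unsigned on the same bits picks 0xFFFFFFFF. Since LLVM's i32 carries no signedness, the intrinsic name has to.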
>
>
> A couple more comments:
>
> Please make sure to set TargetPrefix="ptx" for the intrinsics (probably best
> in the multiclass, see PTXReadSpecialRegisterIntrinsic_r32).

Ok

> I'm not sure how to define a GCCBuiltin for an intrinsic that can take
> multiple types, but it's probably worth looking into so we can expose this
> intrinsic to Clang.

This could be an issue. I looked for something similar in other backends
and found no previous examples. It may be worth asking about this
explicitly on the mailing list.
The only fallback I see is to define every intrinsic explicitly
for every data type,
but this would prevent using the multiclass to define
the patterns.
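
To give an idea of how many definitions the fallback implies, here is a small C++ sketch that enumerates one intrinsic/builtin pair per signedness and type. The type list (i16, i32, i64) and the __builtin_ptx_max_* spellings are assumptions made for illustration; only the llvm.ptx.max.signed.i32 style of name comes from the discussion above.

```cpp
#include <string>
#include <vector>

// One explicitly defined intrinsic and the builtin it would map to.
struct IntrinsicVariant {
  std::string llvm_name;     // e.g. "llvm.ptx.max.signed.i32"
  std::string builtin_name;  // hypothetical, e.g. "__builtin_ptx_max_signed_i32"
};

// Enumerate every (signedness, type) combination for max.
std::vector<IntrinsicVariant> enumerate_max_variants() {
  const std::vector<std::string> signs = {"signed", "unsigned"};
  const std::vector<std::string> types = {"i16", "i32", "i64"};  // assumed set
  std::vector<IntrinsicVariant> out;
  for (const auto &s : signs) {
    for (const auto &t : types) {
      out.push_back({"llvm.ptx.max." + s + "." + t,
                     "__builtin_ptx_max_" + s + "_" + t});
    }
  }
  return out;
}
```

Under these assumptions, max alone needs six separate definitions, and the same would repeat for min and any other integer intrinsic.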

Bye.

>
>
>>
>>
>> >
>> > Otherwise, the patch looks good.
>> >
>> >>
>> >>
>> >> Thanks,
>> >>
>> >> Alberto
>> >>
>> >> On Wed, Nov 16, 2011 at 5:44 PM, Alberto Magni
>> >> <alberto.magni86 at gmail.com> wrote:
>> >> > On Wed, Nov 16, 2011 at 2:17 PM, Justin Holewinski
>> >> > <justin.holewinski at gmail.com> wrote:
>> >> >> On Wed, Nov 16, 2011 at 9:16 AM, Justin Holewinski
>> >> >> <justin.holewinski at gmail.com> wrote:
>> >> >>>
>> >> >>> On Wed, Nov 16, 2011 at 8:05 AM, Alberto Magni
>> >> >>> <alberto.magni86 at gmail.com>
>> >> >>> wrote:
>> >> >>>>
>> >> >>>> Dear Justin,
>> >> >>>>
>> >> >>>> I am trying to add the support for some OpenCL builtin functions
>> >> >>>> to
>> >> >>>> the PTX backend.
>> >> >>>> The attached file represents the first stub of a patch for the fmax
>> >> >>>> builtin function.
>> >> >>>
>> >> >>> First off, thanks for helping to improve the PTX back-end!
>> >> >>> There are really two main issues here.  First, OpenCL built-in
>> >> >>> functions
>> >> >>> do not belong in the PTX back-end.  These will be implemented in
>> >> >>> the
>> >> >>> libclc
>> >> >>> library (http://www.pcc.me.uk/~peter/libclc).  The back-end will
>> >> >>> only
>> >> >>> implement PTX intrinsics, which may be used by the OpenCL built-in
>> >> >>> functions
>> >> >>> in libclc.  However, this particular function (max) corresponds to
>> >> >>> a
>> >> >>> PTX
>> >> >>> instruction, so it makes sense to implement it as an intrinsic in
>> >> >>> the
>> >> >>> back-end.
>> >> >>> Second, intrinsic functions require a bit more work.  You're off to
>> >> >>> a
>> >> >>> great start, but intrinsics are implemented a bit differently.  It
>> >> >>> looks
>> >> >>> like LLVM does not have a max intrinsic, so we'll need to create
>> >> >>> one.
>> >> >>>  Have
>> >> >>> a look at include/llvm/IntrinsicsPTX.td.  This file defines the
>> >> >>> PTX-specific
>> >> >>> intrinsics.  You can add an intrinsic for max here, and then
>> >> >>> implement
>> >> >>> a
>> >> >>> pattern-match in the PTXInstrInfo.td file.  There is no need to
>> >> >>> create
>> >> >>> a new
>> >> >>> SDNode type for intrinsics, unless they require some special
>> >> >>> handling
>> >> >>> in the
>> >> >>> C++ code, which I do not see being the case here.
>> >> >>
>> >> >> Sorry, there's a typo here.  The intrinsic pattern matching goes in
>> >> >> PTXIntrinsicInstrInfo.td.
>> >> >>
>> >> >
>> >> > Thank you for the pointers. I will let you know when I have the first
>> >> > patch.
>> >> >
>> >> >>>
>> >> >>> When you define a new intrinsic, use the following template as a
>> >> >>> name:
>> >> >>> int_ptx_max.  This will define the LLVM intrinsic as
>> >> >>> @llvm.ptx.max().
>> >> >>>  Please follow the same convention when naming the __builtin_*
>> >> >>> function.
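
The naming rule described here (int_ptx_max becomes llvm.ptx.max) can be sketched in C++ as follows. The helper intrinsic_name_from_def is hypothetical and only mimics the mapping for illustration; it is not TableGen's actual code.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: derive the LLVM intrinsic name from a TableGen
// definition name by dropping the "int_" prefix, turning underscores into
// dots, and prepending "llvm.". Returns an empty string if the prefix is
// missing.
std::string intrinsic_name_from_def(const std::string &def) {
  const std::string prefix = "int_";
  if (def.compare(0, prefix.size(), prefix) != 0) {
    return std::string();  // not an intrinsic definition name
  }
  std::string rest = def.substr(prefix.size());
  std::replace(rest.begin(), rest.end(), '_', '.');
  return "llvm." + rest;
}
```

For example, intrinsic_name_from_def("int_ptx_max") yields "llvm.ptx.max", which is the name used in a call such as @llvm.ptx.max().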
>> >> >>>
>> >> >>>>
>> >> >>>> The test case I am trying is the following:
>> >> >>>>
>> >> >>>> define ptx_device float @f(float %x, float %y) {
>> >> >>>> entry:
>> >> >>>>  %z = call float @fmax(float %x, float %y)
>> >> >>>>  ret float %z
>> >> >>>> }
>> >> >>>>
>> >> >>>> declare float @fmax(float, float)
>> >> >>>>
>> >> >>>> But at the moment llc crashes saying that "calls are not
>> >> >>>> supported".
>> >> >>>> This does not
>> >> >>>> happen with LLVM builtins like llvm.sqrt.f32.
>> >> >>>
>> >> >>> Which version of LLVM are you using?  Calls to PTX device functions
>> >> >>> have
>> >> >>> been implemented for a little while now, so I'm surprised to see
>> >> >>> that
>> >> >>> error.
>> >> >>>  Perhaps it's because the fmax function is not defined as
>> >> >>> ptx_device.
>> >> >>>
>> >> >
>> >> > This is the test case that I am using to verify that the max builtin
>> >> > function I am implementing
>> >> > is actually recognised. I took inspiration from the llvm-intrinsic.ll
>> >> > test case.
>> >> > The command I am using to compile is:
>> >> >
>> >> > llc -march=ptx32 -mattr=+ptx22 fmax.ll
>> >> >
>> >> > The option -mattr does not seem to have any effect.
>> >> > I tried also with the ptx_device qualifier with the same outcome.
>> >> > I am using llvm from the svn repository.
>> >> >
>> >> > Bye,
>> >> >
>> >> > Alberto
>> >> >
>> >> >>>>
>> >> >>>> Can you please give me a hint about what I am missing, or some
>> >> >>>> general
>> >> >>>> advice on how
>> >> >>>> to add builtin functions?
>> >> >>>>
>> >> >>>> Thank you in advance,
>> >> >>>>
>> >> >>>> Alberto.
>> >> >>>>
>> >> >>>> _______________________________________________
>> >> >>>> LLVM Developers mailing list
>> >> >>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> >> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >> >>>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> --
>> >> >>>
>> >> >>> Thanks,
>> >> >>> Justin Holewinski