[LLVMdev] PTX builtin functions.

Wed Nov 23 03:57:31 PST 2011

On Tue, Nov 22, 2011 at 5:01 PM, Villmow, Micah <Micah.Villmow at amd.com> wrote:
> Alberto,
>  The AMDIL backend solves your problem with intrinsic overloading this way:
> def int_AMDIL_mad     : GCCBuiltin<"__amdil_mad">, TernaryIntFloat;
>
> Where TernaryIntFloat is defined as:
> class TernaryIntFloat :
>          Intrinsic<[llvm_anyfloat_ty], [LLVMMatchType<0>,
>          LLVMMatchType<0>, LLVMMatchType<0>], []>;
>
> This allows us to write a multi-def for int_AMDIL_mad like so:
> defm MAD  : TernaryIntrinsicFloat<IL_OP_MAD, int_AMDIL_mad>;
>
> Where TernaryIntrinsicFloat is defined as:
> multiclass TernaryIntrinsicFloat<ILOpCode opcode, Intrinsic intr>
> {
>  def _f32 : ThreeInOneOut<opcode, (outs GPRF32:$dst),
>      (ins GPRF32:$src, GPRF32:$src2, GPRF32:$src3),
>      !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),
>      [(set GPRF32:$dst,
>          (intr GPRF32:$src, GPRF32:$src2, GPRF32:$src3))]>;
>  def _v2f32 : ThreeInOneOut<opcode, (outs GPRV2F32:$dst),
>      (ins GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3),
>      !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),
>      [(set GPRV2F32:$dst,
>          (intr GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3))]>;
> ...
> }
>
> Now, this doesn't completely work, because LLVM does not allow overloading of intrinsic values, so a little coding is needed in the *IntrinsicInfo class.
> AMD always encodes builtin names as __amdil_mad_f32, __amdil_mad_v2f32, __amdil_mad_v4f32, etc.
> So in the function "*IntrinsicInfo::lookup_name", when attempting to find out which intrinsic the function maps to, the AMDIL backend strips off the type and then looks up just '__amdil_mad'.
>
> This is how you can do intrinsic overloading in LLVM.
>
> Hope this helps,
> Micah

Thank you Micah, it really does.

At the moment the PTX backend does not have a PTXIntrinsicInfo class;
the only backend that has one is MBlaze.
If Justin agrees with the approach, I will look into how to generate the
PTXGenIntrinsics.inc file (I am still learning TableGen)
required by PTXIntrinsicInfo and write the lookUp method.
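
To check my understanding of the name-stripping step Micah describes, here
is a minimal standalone sketch in plain C++. It is not the AMDIL code and
it does not use any LLVM API: the enum values, the table contents, the
__builtin_ptx_* names and the helpers isTypeSuffix, stripTypeSuffix and
lookupBuiltin are all hypothetical, and I am assuming the type suffix is
always of the form _f32, _i32, _v2f32, and so on.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical intrinsic IDs. A real backend would take these from the
// TableGen-generated intrinsics include file instead.
enum IntrinsicID { NotIntrinsic = 0, PtxMax, PtxMin };

// Returns true if S looks like a type suffix: an optional vector prefix
// "v<N>", then 'f' or 'i', then a bit width (e.g. "f32", "i64", "v2f32").
// Assumption: this naming scheme is only a guess modelled on the AMDIL
// examples in the quoted message.
static bool isTypeSuffix(const std::string &S) {
  std::size_t I = 0;
  if (I < S.size() && S[I] == 'v') {
    ++I;
    std::size_t Start = I;
    while (I < S.size() && std::isdigit(static_cast<unsigned char>(S[I])))
      ++I;
    if (I == Start)
      return false; // 'v' must be followed by an element count
  }
  if (I >= S.size() || (S[I] != 'f' && S[I] != 'i'))
    return false;
  ++I;
  std::size_t Start = I;
  while (I < S.size() && std::isdigit(static_cast<unsigned char>(S[I])))
    ++I;
  // Need at least one width digit and nothing left over afterwards.
  return I > Start && I == S.size();
}

// Drops a trailing "_<type>" from Name, or returns Name unchanged when the
// part after the last underscore is not a type suffix.
static std::string stripTypeSuffix(const std::string &Name) {
  std::string::size_type Pos = Name.rfind('_');
  if (Pos == std::string::npos || !isTypeSuffix(Name.substr(Pos + 1)))
    return Name;
  return Name.substr(0, Pos);
}

// Maps a (possibly type-suffixed) builtin name to a hypothetical ID.
unsigned lookupBuiltin(const std::string &Name) {
  static const std::unordered_map<std::string, unsigned> Table = {
      {"__builtin_ptx_max", PtxMax}, {"__builtin_ptx_min", PtxMin}};
  auto It = Table.find(stripTypeSuffix(Name));
  return It == Table.end() ? NotIntrinsic : It->second;
}
```

With this, __builtin_ptx_max_f32 and __builtin_ptx_max_v2f32 would both
resolve to the same entry, while a name that only happens to end in a word
starting with 'f' or 'i' is left alone and looked up as written.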

Cheers,

Alberto

>
>> -----Original Message-----
>> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
>> On Behalf Of Alberto Magni
>> Sent: Tuesday, November 22, 2011 8:41 AM
>> To: Justin Holewinski
>> Cc: LLVM Developers Mailing List
>> Subject: Re: [LLVMdev] PTX builtin functions.
>>
>> On Mon, Nov 21, 2011 at 5:31 PM, Justin Holewinski
>> <justin.holewinski at gmail.com> wrote:
>> > On Mon, Nov 21, 2011 at 11:45 AM, Alberto Magni
>> <alberto.magni86 at gmail.com>
>> > wrote:
>> >>
>> >> On Mon, Nov 21, 2011 at 3:36 PM, Justin Holewinski
>> >> <justin.holewinski at gmail.com> wrote:
>> >> > On Mon, Nov 21, 2011 at 7:01 AM, Alberto Magni
>> >> > <alberto.magni86 at gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> Hi Justin,
>> >> >>
>> >> >> attached you find the patch for the integer max instruction.
>> >> >> The multiclass PTX_INTRINSIC_INT3 in file
>> PTXIntrinsicInstrInfo.td
>> >> >> is almost an exact copy of  PTX_INT3 in PTXInstrInfo.td, maybe
>> >> >> a modification of this class can be defined in a separate file.
>> >> >
>> >> >
>> >> > I'm copying llvmdev.  We should keep discussions like this on the
>> list
>> >> > for
>> >> > the benefit of others.
>> >>
>> >> I always forget "Reply to All".
>> >>
>> >> > We can probably factor out a generic description, or even just use
>> the
>> >> > PTX_INT3 multiclass directly.  The PTXIntrinsicInstrInfo.td file
>> is
>> >> > included
>> >> > by PTXInstrInfo.td, so anything defined in PTXInstrInfo.td is
>> available
>> >> > in
>> >> > PTXIntrinsicInstrInfo.td.
>> >>
>> >> I agree with you but my class PTX_INTRINSIC_INT3 works with an
>> Intrinsic
>> >> and not with a SDNode, like PTX_INT3.
>> >> PTX_INTRINSIC_INT3 also requires the presence of the type of
>> >> the immediate in the pattern, e.g. (i32 imm:$b).
>> >
>> >
>> > Alright, I'm fine with that.
>> >
>> >>
>> >>
>> >> >>
>> >> >>
>> >> >> Do you agree with this approach ?
>> >> >> Also, do you think that a class like PTX_INTRINSIC_INT3_SIGNED
>> >> >> (a clone of PTX_INT3_SIGNED) is required ?
>> >> >
>> >> >
>> >> > Yes, I believe we should split these into signed and unsigned
>> variants.
>> >> >  The
>> >> > results of max/min operations can definitely be different
>> depending on
>> >> > whether the operands are signed or unsigned.  Since this
>> information is
>> >> > not
>> >> > encoded in LLVM types, we may want to create two versions for each
>> >> > integer
>> >> > type; something like:
>> >> >
>> >> > i32 @llvm.ptx.max.signed.i32(i32, i32)
>> >> > i32 @llvm.ptx.max.unsigned.i32(i32, i32)
>> >>
>> >> Yes, this is the only way.
>> >
>> >
>> > A couple more comments:
>> >
>> > Please make sure to set TargetPrefix="ptx" for the intrinsics
>> (probably best
>> > in the multiclass, see PTXReadSpecialRegisterIntrinsic_r32)
>>
>> Ok
>>
>> > I'm not sure how to define a GCCBuiltin for an intrinsic that can
>> take
>> > multiple types, but it's probably worth looking into so we can expose
>> this
>> > intrinsic to Clang.
>>
>> This could be an issue. I looked for something similar in other
>> backends and found no previous examples. It may be worth asking
>> on the ML explicitly about this.
>> The only fallback that I see is to explicitly define every intrinsic
>> for every data type, but this would prevent the use of the multiclass
>> for the definition of the patterns.
>>
>>
>> Bye.
>>
>> >
>> >
>> >>
>> >>
>> >> >
>> >> > Otherwise, the patch looks good.
>> >> >
>> >> >>
>> >> >>
>> >> >> Thanks,
>> >> >>
>> >> >> Alberto
>> >> >>
>> >> >> On Wed, Nov 16, 2011 at 5:44 PM, Alberto Magni
>> >> >> <alberto.magni86 at gmail.com> wrote:
>> >> >> > On Wed, Nov 16, 2011 at 2:17 PM, Justin Holewinski
>> >> >> > <justin.holewinski at gmail.com> wrote:
>> >> >> >> On Wed, Nov 16, 2011 at 9:16 AM, Justin Holewinski
>> >> >> >> <justin.holewinski at gmail.com> wrote:
>> >> >> >>>
>> >> >> >>> On Wed, Nov 16, 2011 at 8:05 AM, Alberto Magni
>> >> >> >>> <alberto.magni86 at gmail.com>
>> >> >> >>> wrote:
>> >> >> >>>>
>> >> >> >>>> Dear Justin,
>> >> >> >>>>
>> >> >> >>>> I am trying to add the support for some OpenCL builtin
>> functions
>> >> >> >>>> to
>> >> >> >>>> the PTX backend.
>> >> >> >>>> The attached file represents the first stub of a patch for
>> the fmax
>> >> >> >>>> builtin function.
>> >> >> >>>
>> >> >> >>> First off, thanks for helping to improve the PTX back-end!
>> >> >> >>> There are really two main issues here.  First, OpenCL built-
>> in
>> >> >> >>> functions
>> >> >> >>> do not belong in the PTX back-end.  These will be implemented
>> in
>> >> >> >>> the
>> >> >> >>> libclc
>> >> >> >>> library (http://www.pcc.me.uk/~peter/libclc).  The back-end
>> will
>> >> >> >>> only
>> >> >> >>> implement PTX intrinsics, which may be used by the OpenCL
>> built-in
>> >> >> >>> functions
>> >> >> >>> in libclc.  However, this particular function (max)
>> corresponds to
>> >> >> >>> a
>> >> >> >>> PTX
>> >> >> >>> instruction, so it makes sense to implement it as an
>> intrinsic in
>> >> >> >>> the
>> >> >> >>> back-end.
>> >> >> >>> Second, intrinsic functions require a bit more work.  You're
>> off to
>> >> >> >>> a
>> >> >> >>> great start, but intrinsics are implemented a bit
>> differently.  It
>> >> >> >>> looks
>> >> >> >>> like LLVM does not have a max intrinsic, so we'll need to
>> create
>> >> >> >>> one.
>> >> >> >>>  Have
>> >> >> >>> a look at include/llvm/IntrinsicsPTX.td.  This file defines
>> the
>> >> >> >>> PTX-specific
>> >> >> >>> intrinsics.  You can add an intrinsic for max here, and then
>> >> >> >>> implement
>> >> >> >>> a
>> >> >> >>> pattern-match in the PTXInstrInfo.td file.  There is no need
>> to
>> >> >> >>> create
>> >> >> >>> a new
>> >> >> >>> SDNode type for intrinsics, unless they require some special
>> >> >> >>> handling
>> >> >> >>> in the
>> >> >> >>> C++ code, which I do not see being the case here.
>> >> >> >>
>> >> >> >> Sorry, there's a typo here.  The intrinsic pattern matching
>> goes in
>> >> >> >> PTXIntrinsicInstrInfo.td.
>> >> >> >>
>> >> >> >
>> >> >> > Thank you for the pointers. I will let you know when I have the
>> >> >> > first patch.
>> >> >> >
>> >> >> >>>
>> >> >> >>> When you define a new intrinsic, use the following template
>> as a
>> >> >> >>> name:
>> >> >> >>> int_ptx_max.  This will define the LLVM intrinsic as
>> >> >> >>> @llvm.ptx.max().
>> >> >> >>>  Please follow the same convention when naming the
>> __builtin_*
>> >> >> >>> function.
>> >> >> >>>
>> >> >> >>>>
>> >> >> >>>> The test case I am trying is the following:
>> >> >> >>>>
>> >> >> >>>> define ptx_device float @f(float %x, float %y) {
>> >> >> >>>> entry:
>> >> >> >>>>  %z = call float @fmax(float %x, float %y)
>> >> >> >>>>  ret float %z
>> >> >> >>>> }
>> >> >> >>>>
>> >> >> >>>> declare float @fmax(float, float)
>> >> >> >>>>
>> >> >> >>>> But at the moment llc crashes saying that "calls are not
>> >> >> >>>> supported",
>> >> >> >>>> this does not
>> >> >> >>>> happen with llvm builtins like llvm.sqrt.f32
>> >> >> >>>
>> >> >> >>> Which version of LLVM are you using?  Calls to PTX device
>> functions
>> >> >> >>> have
>> >> >> >>> been implemented for a little while now, so I'm surprised to
>> see
>> >> >> >>> that
>> >> >> >>> error.
>> >> >> >>>  Perhaps it's because the fmax function is not defined as
>> >> >> >>> ptx_device.
>> >> >> >>>
>> >> >> >
>> >> >> > This is the test case that I am using to verify that the max
>> >> >> > builtin function I am implementing is actually recognised.
>> >> >> > I took inspiration from the llvm-intrinsic.ll test case.
>> >> >> > The command I am using to compile is:
>> >> >> >
>> >> >> > llc -march=ptx32 -mattr=+ptx22 fmax.ll
>> >> >> >
>> >> >> > The option -mattr does not seem to have any effect.
>> >> >> > I tried also with the ptx_device qualifier with the same
>> outcome.
>> >> >> > I am using llvm from the svn repository.
>> >> >> >
>> >> >> > Bye,
>> >> >> >
>> >> >> > Alberto
>> >> >> >
>> >> >> >>>>
>> >> >> >>>> Can you please give me a hint on what I am missing, or some
>> >> >> >>>> general
>> >> >> >>>> advice on how
>> >> >> >>>> to add builtin functions.
>> >> >> >>>>
>> >> >> >>>> Thank you in advance,
>> >> >> >>>>
>> >> >> >>>> Alberto.
>> >> >> >>>>
>> >> >> >>>> _______________________________________________
>> >> >> >>>> LLVM Developers mailing list
>> >> >> >>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> >> >> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> >> >> >>>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> --
>> >> >> >>>
>> >> >> >>> Thanks,
>> >> >> >>> Justin Holewinski