[LLVMdev] PTX builtin functions.

Alberto Magni alberto.magni86 at gmail.com
Thu Dec 8 10:44:58 PST 2011


If I do so, something strange happens.
If I add these 5 lines to include/llvm/IntrinsicsPTX.td:

let TargetPrefix = "ptx", isTarget = 1 in {
  def int_ptx_max_signed : Intrinsic<[llvm_anyint_ty],
                                     [LLVMMatchType<0>, LLVMMatchType<0>],
                                     [IntrNoMem, Commutative]>;
}

I get the following compilation error:

In file included from MBlazeIntrinsicInfo.cpp:99:
MBlazeGenIntrinsics.inc: In function ‘llvm::FunctionType*
getType(llvm::LLVMContext&, unsigned int)’:
MBlazeGenIntrinsics.inc:651: error: ‘Tys’ was not declared in this scope

This happens because the generated file MBlazeGenIntrinsics.inc contains a
reference to the PTX intrinsics. The error arises because MBlaze intrinsics
are not overloaded, and therefore the variable Tys is not defined.

I am not sure whether this is a limitation of the MBlaze backend or of the
PTX backend.

Anyway, I had noticed isTarget (I included it in my previous patch), but I
thought it only worked for the XXXIntrinsics.td file.

Alberto


On Thu, Dec 8, 2011 at 5:36 PM, Villmow, Micah <Micah.Villmow at amd.com> wrote:
> It is my understanding that all you need to do is specify let isTarget = 1
> in your .td file and it will generate target specific intrinsics. This
> should allow you to keep the IntrinsicsPTX.td file in the same location.
>
>
>
> Micah
>
>
>
> From: Justin Holewinski [mailto:justin.holewinski at gmail.com]
> Sent: Monday, December 05, 2011 6:13 AM
> To: Alberto Magni
> Cc: Villmow, Micah; LLVM Developers Mailing List
>
>
> Subject: Re: [LLVMdev] PTX builtin functions.
>
>
>
> On Sun, Dec 4, 2011 at 1:10 PM, Alberto Magni <alberto.magni86 at gmail.com>
> wrote:
>
> Hi Justin,
>
> sorry for the delay, I have been busy.
>
> Micah's proposal requires moving the definitions of the intrinsics
> from include/llvm/IntrinsicsPTX.td to lib/Target/PTX/PTXIntrinsics.td,
> thus allowing the generation of the file PTXGenIntrinsics.inc, which
> will be included by PTXIntrinsicInfo.cpp.
> This is quite a big modification. Do you agree with it,
> or do you have a better solution?
>
>
>
> I'm opposed to this, mainly because we need the intrinsic definitions to be
> defined during LLVM IR optimization and not just at code-gen time.  This is
> particularly important for pure intrinsics, like llvm.ptx.read.tid.x(),
> where the optimizers can fold multiple calls to these functions into a
> single call.  Without the intrinsic definitions in
> include/llvm/IntrinsicsPTX.td, this optimization would be illegal.
>
>
>
> At the moment, I'm not seeing a clean solution to this.  Overloading the
> intrinsics by writing custom code in PTXIntrinsicInfo.h/.cpp is only a
> partial solution, with the problems mentioned above.  In my mind, the
> cleanest solution would be to just write out explicit intrinsics for each
> possible type.  We can still use multiclasses to an extent:
>
>
>
> multiclass PTXBinaryIntrinsic<string prefix> {
>   def _u16 : Intrinsic<[llvm_i16_ty], [llvm_i16_ty, llvm_i16_ty],
>                        [IntrNoMem]>,
>              GCCBuiltin<!strconcat(prefix, "_u16")>;
>   // Repeat for s16, u32, s32, u64, s64, f32, f64
> }
>
> defm int_ptx_mad : PTXBinaryIntrinsic<"__builtin_ptx_mad">;
>
>
>
> It's not the cleanest, but it gets the job done (unless I'm missing
> something).
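
As a rough illustration of what the explicit per-type approach produces, the
C++ sketch below enumerates the names that such a multiclass would yield. It
assumes the suffix list from the multiclass comment and TableGen's default
naming (drop "int_", replace '_' with '.', prepend "llvm."). The helper names
expandBuiltinNames and intrinsicNameFromDef are hypothetical and not part of
LLVM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Type suffixes taken from the multiclass comment in the thread.
static const char *const kTypeSuffixes[] = {"u16", "s16", "u32", "s32",
                                            "u64", "s64", "f32", "f64"};

// Hypothetical helper: mirrors !strconcat(prefix, "_u16") for every suffix.
std::vector<std::string> expandBuiltinNames(const std::string &prefix) {
  std::vector<std::string> names;
  for (const char *suffix : kTypeSuffixes)
    names.push_back(prefix + "_" + suffix);
  return names;
}

// Hypothetical helper: derives the IR-level name from a TableGen def name,
// assuming no explicit name was given (int_ptx_max -> llvm.ptx.max).
std::string intrinsicNameFromDef(const std::string &defName) {
  assert(defName.compare(0, 4, "int_") == 0 && "expected an int_ prefix");
  std::string name = defName.substr(4);
  for (char &c : name)
    if (c == '_')
      c = '.';
  return "llvm." + name;
}
```

With the prefix "__builtin_ptx_mad" this yields eight builtin names, from
__builtin_ptx_mad_u16 to __builtin_ptx_mad_f64, and the def int_ptx_mad_u16
maps to llvm.ptx.mad.u16.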
>
>
>
>
> Also I don't know yet how to make llvm recognize the intrinsics
> defined in lib/Target/PTX/PTXIntrinsics.td, the only other
> backend that does so is MBlaze.
>
> A tentative patch is attached.
>
> Bye,
> Alberto
>
> On Wed, Nov 23, 2011 at 2:36 PM, Justin Holewinski
>
> <justin.holewinski at gmail.com> wrote:
>>
>> On Nov 23, 2011 8:33 AM, "Justin Holewinski" <justin.holewinski at gmail.com>
>> wrote:
>>>
>>>
>>> On Nov 23, 2011 6:57 AM, "Alberto Magni" <alberto.magni86 at gmail.com>
>>> wrote:
>>> >
>>> > On Tue, Nov 22, 2011 at 5:01 PM, Villmow, Micah <Micah.Villmow at amd.com>
>>> > wrote:
>>> > > Alberto,
>>> > >  The AMDIL backend solves your problem with intrinsic overloading
>>> > > this
>>> > > way:
>>> > > def int_AMDIL_mad     : GCCBuiltin<"__amdil_mad">, TernaryIntFloat;
>>> > >
>>> > > Where TernaryIntFloat is defined as:
>>> > > class TernaryIntFloat :
>>> > >          Intrinsic<[llvm_anyfloat_ty], [LLVMMatchType<0>,
>>> > >          LLVMMatchType<0>, LLVMMatchType<0>], []>;
>>> > >
>>> > > This allows us to write a multi-def for int_AMDIL_mad like so:
>>> > > defm MAD  : TernaryIntrinsicFloat<IL_OP_MAD, int_AMDIL_mad>;
>>> > >
>>> > > Where TernaryIntrinsicFloat is defined as:
>>> > > multiclass TernaryIntrinsicFloat<ILOpCode opcode, Intrinsic intr>
>>> > > {
>>> > >  def _f32 : ThreeInOneOut<opcode, (outs GPRF32:$dst),
>>> > >      (ins GPRF32:$src, GPRF32:$src2, GPRF32:$src3),
>>> > >      !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),
>>> > >      [(set GPRF32:$dst,
>>> > >          (intr GPRF32:$src, GPRF32:$src2, GPRF32:$src3))]>;
>>> > >  def _v2f32 : ThreeInOneOut<opcode, (outs GPRV2F32:$dst),
>>> > >      (ins GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3),
>>> > >      !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),
>>> > >      [(set GPRV2F32:$dst,
>>> > >          (intr GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3))]>;
>>> > > ...
>>> > > }
>>> > >
>>> > > Now, this doesn't completely work, because LLVM does not allow
>>> > > overloading of intrinsic values, so a little coding is needed in the
>>> > > *IntrinsicInfo class.
>>> > > AMD always encodes builtin names as __amdil_mad_f32,
>>> > > __amdil_mad_v2f32, __amdil_mad_v4f32, etc.
>>> > > So in the function "*IntrinsicInfo::lookup_name", when attempting to
>>> > > find out which intrinsic the function maps to, the AMDIL backend
>>> > > strips off the type and then looks up just '__amdil_mad'.
>>> > >
>>> > > This is how you can do intrinsic overloading in LLVM.
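
The suffix-stripping lookup described above could look roughly like the
following C++ sketch. It is not the AMDIL code: the enum, the suffix table and
the function names stripTypeSuffix and lookupBuiltin are assumptions made for
illustration, and only the three suffixes mentioned in the message are handled.

```cpp
#include <cassert>
#include <string>

// Hypothetical intrinsic IDs; a real backend would get these from TableGen.
enum IntrinsicID { NotIntrinsic = 0, AmdilMad };

// True if s ends with suffix and has at least one character before it.
static bool endsWith(const std::string &s, const std::string &suffix) {
  return s.size() > suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Remove a trailing type suffix such as "_f32" or "_v2f32", if present.
std::string stripTypeSuffix(const std::string &name) {
  static const char *const kSuffixes[] = {"_v4f32", "_v2f32", "_f32"};
  for (const char *suffix : kSuffixes) {
    const std::string s(suffix);
    if (endsWith(name, s))
      return name.substr(0, name.size() - s.size());
  }
  return name;
}

// Map a (possibly type-suffixed) builtin name to a single intrinsic ID.
IntrinsicID lookupBuiltin(const std::string &name) {
  if (stripTypeSuffix(name) == "__amdil_mad")
    return AmdilMad;
  return NotIntrinsic;
}
```

Here __amdil_mad_f32, __amdil_mad_v2f32 and __amdil_mad_v4f32 all resolve to
the same ID, while an unrelated name such as fmax is reported as not being an
intrinsic.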
>>> > >
>>> > > Hope this helps,
>>> > > Micah
>>> >
>>> > Thank you Micah, it really does.
>>> >
>>> > At the moment the PTX backend does not have a PTXIntrinsicInfo class;
>>> > the only backend that has one is MBlaze.
>>> > If Justin agrees with the approach, I will look into how to generate the
>>> > PTXGenIntrinsics.inc file (I am still learning TableGen)
>>> > required by PTXIntrinsicInfo and write the lookUp method.
>>>
>>> Looks good to me.  For OpenCL support in clang, we definitely need the
>>> built-in function support.  And the total number of intrinsics like this
>>> should be relatively minimal.
>>
>> One thing I forgot to mention: once these are implemented, it may be worth
>> implementing some instruction selection patterns to collapse icmp/fcmp and
>> select pairs into max/min whenever it makes sense.
>>
>>>
>>> >
>>> > Cheers,
>>> >
>>> > Alberto
>>> >
>>> > >
>>> > >> -----Original Message-----
>>> > >> From: llvmdev-bounces at cs.uiuc.edu
>>> > >> [mailto:llvmdev-bounces at cs.uiuc.edu]
>>> > >> On Behalf Of Alberto Magni
>>> > >> Sent: Tuesday, November 22, 2011 8:41 AM
>>> > >> To: Justin Holewinski
>>> > >> Cc: LLVM Developers Mailing List
>>> > >> Subject: Re: [LLVMdev] PTX builtin functions.
>>> > >>
>>> > >> On Mon, Nov 21, 2011 at 5:31 PM, Justin Holewinski
>>> > >> <justin.holewinski at gmail.com> wrote:
>>> > >> > On Mon, Nov 21, 2011 at 11:45 AM, Alberto Magni
>>> > >> <alberto.magni86 at gmail.com>
>>> > >> > wrote:
>>> > >> >>
>>> > >> >> On Mon, Nov 21, 2011 at 3:36 PM, Justin Holewinski
>>> > >> >> <justin.holewinski at gmail.com> wrote:
>>> > >> >> > On Mon, Nov 21, 2011 at 7:01 AM, Alberto Magni
>>> > >> >> > <alberto.magni86 at gmail.com>
>>> > >> >> > wrote:
>>> > >> >> >>
>>> > >> >> >> Hi Justin,
>>> > >> >> >>
>>> > >> >> >> attached you will find the patch for the integer max instruction.
>>> > >> >> >> The multiclass PTX_INTRINSIC_INT3 in file PTXIntrinsicInstrInfo.td
>>> > >> >> >> is almost an exact copy of PTX_INT3 in PTXInstrInfo.td; maybe
>>> > >> >> >> a modification of this class can be defined in a separate file.
>>> > >> >> >
>>> > >> >> >
>>> > >> >> > I'm copying llvmdev.  We should keep discussions like this on the
>>> > >> >> > list for the benefit of others.
>>> > >> >>
>>> > >> >> I always forget "Reply to All".
>>> > >> >>
>>> > >> >> > We can probably factor out a generic description, or even just use
>>> > >> >> > the PTX_INT3 multiclass directly.  The PTXIntrinsicInstrInfo.td file
>>> > >> >> > is included by PTXInstrInfo.td, so anything defined in
>>> > >> >> > PTXInstrInfo.td is available in PTXIntrinsicInstrInfo.td.
>>> > >> >>
>>> > >> >> I agree with you, but my class PTX_INTRINSIC_INT3 works with an
>>> > >> >> Intrinsic rather than with an SDNode, as PTX_INT3 does.
>>> > >> >> PTX_INTRINSIC_INT3 also requires the presence of the type of
>>> > >> >> the immediate in the pattern, e.g. (i32 imm:$b).
>>> > >> >
>>> > >> >
>>> > >> > Alright, I'm fine with that.
>>> > >> >
>>> > >> >>
>>> > >> >>
>>> > >> >> >>
>>> > >> >> >>
>>> > >> >> >> Do you agree with this approach ?
>>> > >> >> >> Also, do you think that a class like PTX_INTRINSIC_INT3_SIGNED
>>> > >> >> >> (a clone of PTX_INT3_SIGNED) is required ?
>>> > >> >> >
>>> > >> >> >
>>> > >> >> > Yes, I believe we should split these into signed and unsigned
>>> > >> >> > variants.  The results of max/min operations can definitely be
>>> > >> >> > different depending on whether the operands are signed or unsigned.
>>> > >> >> > Since this information is not encoded in LLVM types, we may want to
>>> > >> >> > create two versions for each integer type; something like:
>>> > >> >> >
>>> > >> >> > i32 @llvm.ptx.max.signed.i32(i32, i32)
>>> > >> >> > i32 @llvm.ptx.max.unsigned.i32(i32, i32)
>>> > >> >>
>>> > >> >> Yes, this is the only way.
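
To make the signed/unsigned point concrete, here is a small C++ sketch showing
that the same pair of 32-bit patterns gives different maxima depending on how
the bits are interpreted. The function names maxSigned and maxUnsigned are
hypothetical, and the code only models the arithmetic, not the PTX instructions
themselves.

```cpp
#include <cassert>
#include <cstdint>

// Treat both 32-bit patterns as signed values and return the larger pattern.
// The unsigned-to-signed cast assumes two's complement, which holds on all
// common targets (and is guaranteed since C++20).
uint32_t maxSigned(uint32_t a, uint32_t b) {
  return static_cast<int32_t>(a) > static_cast<int32_t>(b) ? a : b;
}

// Treat both 32-bit patterns as unsigned values and return the larger one.
uint32_t maxUnsigned(uint32_t a, uint32_t b) {
  return a > b ? a : b;
}
```

For the operands 0xFFFFFFFF and 1, the signed version returns 1 (because
0xFFFFFFFF is -1 when signed), while the unsigned version returns 0xFFFFFFFF.
Since an i32 in LLVM IR carries no sign, the intrinsic name has to.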
>>> > >> >
>>> > >> >
>>> > >> > A couple more comments:
>>> > >> >
>>> > >> > Please make sure to set TargetPrefix="ptx" for the intrinsics (probably
>>> > >> > best in the multiclass, see PTXReadSpecialRegisterIntrinsic_r32).
>>> > >>
>>> > >> Ok
>>> > >>
>>> > >> > I'm not sure how to define a GCCBuiltin for an intrinsic that can take
>>> > >> > multiple types, but it's probably worth looking into so we can expose
>>> > >> > this intrinsic to Clang.
>>> > >>
>>> > >> This could be an issue. I looked for something similar in other
>>> > >> backends and found no previous examples. It may be worth asking about
>>> > >> this explicitly on the ML.
>>> > >> The only fallback that I see is to define every intrinsic explicitly
>>> > >> for every data type, but this would prevent the use of the multiclass
>>> > >> for the definition of the patterns.
>>> > >>
>>> > >>
>>> > >> Bye.
>>> > >>
>>> > >> >
>>> > >> >
>>> > >> >>
>>> > >> >>
>>> > >> >> >
>>> > >> >> > Otherwise, the patch looks good.
>>> > >> >> >
>>> > >> >> >>
>>> > >> >> >>
>>> > >> >> >> Thanks,
>>> > >> >> >>
>>> > >> >> >> Alberto
>>> > >> >> >>
>>> > >> >> >> On Wed, Nov 16, 2011 at 5:44 PM, Alberto Magni
>>> > >> >> >> <alberto.magni86 at gmail.com> wrote:
>>> > >> >> >> > On Wed, Nov 16, 2011 at 2:17 PM, Justin Holewinski
>>> > >> >> >> > <justin.holewinski at gmail.com> wrote:
>>> > >> >> >> >> On Wed, Nov 16, 2011 at 9:16 AM, Justin Holewinski
>>> > >> >> >> >> <justin.holewinski at gmail.com> wrote:
>>> > >> >> >> >>>
>>> > >> >> >> >>> On Wed, Nov 16, 2011 at 8:05 AM, Alberto Magni
>>> > >> >> >> >>> <alberto.magni86 at gmail.com>
>>> > >> >> >> >>> wrote:
>>> > >> >> >> >>>>
>>> > >> >> >> >>>> Dear Justin,
>>> > >> >> >> >>>>
>>> > >> >> >> >>>> I am trying to add support for some OpenCL builtin functions to
>>> > >> >> >> >>>> the PTX backend.
>>> > >> >> >> >>>> The attached file represents the first stub of a patch for the
>>> > >> >> >> >>>> fmax builtin function.
>>> > >> >> >> >>>
>>> > >> >> >> >>> First off, thanks for helping to improve the PTX back-end!
>>> > >> >> >> >>> There are really two main issues here.  First, OpenCL built-in
>>> > >> >> >> >>> functions do not belong in the PTX back-end.  These will be
>>> > >> >> >> >>> implemented in the libclc library
>>> > >> >> >> >>> (http://www.pcc.me.uk/~peter/libclc).  The back-end will only
>>> > >> >> >> >>> implement PTX intrinsics, which may be used by the OpenCL
>>> > >> >> >> >>> built-in functions in libclc.  However, this particular
>>> > >> >> >> >>> function (max) corresponds to a PTX instruction, so it makes
>>> > >> >> >> >>> sense to implement it as an intrinsic in the back-end.
>>> > >> >> >> >>> Second, intrinsic functions require a bit more work.  You're off
>>> > >> >> >> >>> to a great start, but intrinsics are implemented a bit
>>> > >> >> >> >>> differently.  It looks like LLVM does not have a max intrinsic,
>>> > >> >> >> >>> so we'll need to create one.  Have a look at
>>> > >> >> >> >>> include/llvm/IntrinsicsPTX.td.  This file defines the
>>> > >> >> >> >>> PTX-specific intrinsics.  You can add an intrinsic for max here,
>>> > >> >> >> >>> and then implement a pattern-match in the PTXInstrInfo.td file.
>>> > >> >> >> >>> There is no need to create a new SDNode type for intrinsics,
>>> > >> >> >> >>> unless they require some special handling in the C++ code,
>>> > >> >> >> >>> which I do not see being the case here.
>>> > >> >> >> >>
>>> > >> >> >> >> Sorry, there's a typo here.  The intrinsic pattern matching goes in
>>> > >> >> >> >> PTXIntrinsicInstrInfo.td.
>>> > >> >> >> >>
>>> > >> >> >> >
>>> > >> >> >> > Thank you for the pointers. I will let you know when I have the
>>> > >> >> >> > first patch.
>>> > >> >> >> >
>>> > >> >> >> >>>
>>> > >> >> >> >>> When you define a new intrinsic, use the following template as a
>>> > >> >> >> >>> name: int_ptx_max.  This will define the LLVM intrinsic as
>>> > >> >> >> >>> @llvm.ptx.max().  Please follow the same convention when naming
>>> > >> >> >> >>> the __builtin_* function.
>>> > >> >> >> >>>
>>> > >> >> >> >>>>
>>> > >> >> >> >>>> The test case I am trying is the following:
>>> > >> >> >> >>>>
>>> > >> >> >> >>>> define ptx_device float @f(float %x, float %y) {
>>> > >> >> >> >>>> entry:
>>> > >> >> >> >>>>  %z = call float @fmax(float %x, float %y)
>>> > >> >> >> >>>>  ret float %z
>>> > >> >> >> >>>> }
>>> > >> >> >> >>>>
>>> > >> >> >> >>>> declare float @fmax(float, float)
>>> > >> >> >> >>>>
>>> > >> >> >> >>>> But at the moment llc crashes saying that "calls are not
>>> > >> >> >> >>>> supported"; this does not happen with LLVM builtins like
>>> > >> >> >> >>>> llvm.sqrt.f32.
>>> > >> >> >> >>>
>>> > >> >> >> >>> Which version of LLVM are you using?  Calls to PTX device
>>> > >> >> >> >>> functions have been implemented for a little while now, so I'm
>>> > >> >> >> >>> surprised to see that error.  Perhaps it's because the fmax
>>> > >> >> >> >>> function is not defined as ptx_device.
>>> > >> >> >> >>>
>>> > >> >> >> >
>>> > >> >> >> > This is the test case that I am using to verify that the max
>>> > >> >> >> > builtin function I am implementing is actually recognised.
>>> > >> >> >> > I took inspiration from the llvm-intrinsic.ll test case.
>>> > >> >> >> > The command I am using to compile is:
>>> > >> >> >> >
>>> > >> >> >> > llc -march=ptx32 -mattr=+ptx22 fmax.ll
>>> > >> >> >> >
>>> > >> >> >> > The option -mattr does not seem to have any effect.
>>> > >> >> >> > I also tried with the ptx_device qualifier, with the same outcome.
>>> > >> >> >> > I am using LLVM from the svn repository.
>>> > >> >> >> >
>>> > >> >> >> > Bye,
>>> > >> >> >> >
>>> > >> >> >> > Alberto
>>> > >> >> >> >
>>> > >> >> >> >>>>
>>> > >> >> >> >>>> Can you please give me a hint on what I am missing, or some
>>> > >> >> >> >>>> general advice on how to add builtin functions?
>>> > >> >> >> >>>>
>>> > >> >> >> >>>> Thank you in advance,
>>> > >> >> >> >>>>
>>> > >> >> >> >>>> Alberto.
>>> > >> >> >> >>>>
>>> > >> >> >> >>>> _______________________________________________
>>> > >> >> >> >>>> LLVM Developers mailing list
>>> > >> >> >> >>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> > >> >> >> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>> > >> >> >> >>>>
>>> > >> >> >> >>>
>>> > >> >> >> >>>
>>> > >> >> >> >>>
>>> > >> >> >> >>> --
>>> > >> >> >> >>>
>>> > >> >> >> >>> Thanks,
>>> > >> >> >> >>> Justin Holewinski
>>> > >> >> >> >>
>>> > >> >> >> >>
>>> > >> >> >> >>
>>> > >> >> >> >> --
>>> > >> >> >> >>
>>> > >> >> >> >> Thanks,
>>> > >> >> >> >> Justin Holewinski
>>> > >> >> >> >>
>>> > >> >> >
>>> > >> >> >
>>> > >> >> >
>>> > >> >> >
>>> > >> >> > --
>>> > >> >> >
>>> > >> >> > Thanks,
>>> > >> >> >
>>> > >> >> > Justin Holewinski
>>> > >> >> >
>>> > >> >
>>> > >> >
>>> > >> >
>>> > >> >
>>> > >> > --
>>> > >> >
>>> > >> > Thanks,
>>> > >> >
>>> > >> > Justin Holewinski
>>> > >> >
>>> > >>
>>> > >
>>> > >
>
>
>
>
>
> --
>
> Thanks,
>
>
>
> Justin Holewinski
>
>
