[LLVMdev] PTX builtin functions.
Villmow, Micah
Micah.Villmow at amd.com
Thu Dec 8 08:36:03 PST 2011
It is my understanding that all you need to do is specify let isTarget = 1 in your .td file and it will generate target-specific intrinsics. This should allow you to keep the IntrinsicsPTX.td file in its current location.
Micah
From: Justin Holewinski [mailto:justin.holewinski at gmail.com]
Sent: Monday, December 05, 2011 6:13 AM
To: Alberto Magni
Cc: Villmow, Micah; LLVM Developers Mailing List
Subject: Re: [LLVMdev] PTX builtin functions.
On Sun, Dec 4, 2011 at 1:10 PM, Alberto Magni <alberto.magni86 at gmail.com> wrote:
Hi Justin,
sorry for the delay, I have been busy.
Micah's proposal requires moving the definitions of the intrinsics
from include/llvm/IntrinsicsPTX.td to lib/Target/PTX/PTXIntrinsics.td,
thus allowing the generation of the file PTXGenIntrinsics.inc, which
will be included by PTXIntrinsicInfo.cpp.
This is quite a big modification; do you agree with it?
Or do you have a better solution?
I'm opposed to this, mainly because we need the intrinsic definitions to be defined during LLVM IR optimization and not just at code-gen time. This is particularly important for pure intrinsics, like llvm.ptx.read.tid.x(), where the optimizers can fold multiple calls to these functions into a single call. Without the intrinsic definitions in include/llvm/IntrinsicsPTX.td, this optimization would be illegal.
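To make the folding argument concrete, here is a toy C++ model of what a CSE-style pass may do with calls to a pure intrinsic. It is a sketch, not LLVM's actual EarlyCSE or GVN code: the Call struct, foldPureCalls, and the PureCallees list are hypothetical, and the list stands in for the IntrNoMem property that an intrinsic definition in IntrinsicsPTX.td would carry.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy call instruction: %Result = call @Callee(Args...).
struct Call {
  std::string Result;
  std::string Callee;
  std::vector<std::string> Args;
};

// Keeps the first of several identical calls to a pure (side-effect free)
// callee and records in Replaced which later results now reuse it.
// Calls to callees not listed in PureCallees are always kept.
std::vector<Call> foldPureCalls(const std::vector<Call> &Calls,
                                const std::vector<std::string> &PureCallees,
                                std::map<std::string, std::string> &Replaced) {
  std::map<std::pair<std::string, std::vector<std::string>>, std::string> Seen;
  std::vector<Call> Kept;
  for (Call C : Calls) {
    // Rewrite arguments that refer to an already-folded result.
    for (std::string &A : C.Args) {
      auto R = Replaced.find(A);
      if (R != Replaced.end())
        A = R->second;
    }
    bool IsPure = std::find(PureCallees.begin(), PureCallees.end(),
                            C.Callee) != PureCallees.end();
    if (IsPure) {
      auto Key = std::make_pair(C.Callee, C.Args);
      auto It = Seen.find(Key);
      if (It != Seen.end()) {
        Replaced[C.Result] = It->second; // reuse the earlier value
        continue;                        // drop the redundant call
      }
      Seen[Key] = C.Result;
    }
    Kept.push_back(C);
  }
  return Kept;
}
```

Given two argument-free calls to llvm.ptx.read.tid.x, only the first is kept and the second result is mapped to the first; two calls to an ordinary external function such as printf are both kept, because nothing marks them as free of side effects.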
At the moment, I'm not seeing a clean solution to this. Overloading the intrinsics by writing custom code in PTXIntrinsicInfo.h/.cpp is only a partial solution, with the problems mentioned above. In my mind, the cleanest solution would be to just write out explicit intrinsics for each possible type. We can still use multiclasses to an extent:
multiclass PTXBinaryIntrinsic<string prefix> {
  def _u16 : Intrinsic<[llvm_i16_ty], [llvm_i16_ty, llvm_i16_ty], [IntrNoMem]>,
             GCCBuiltin<!strconcat(prefix, "_u16")>;
  // Repeat for s16, u32, s32, u64, s64, f32, f64
}
let TargetPrefix = "ptx" in
defm int_ptx_max : PTXBinaryIntrinsic<"__builtin_ptx_max">;
It's not the cleanest, but it gets the job done (unless I'm missing something).
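A rough C++ model of which records such a multiclass would produce, assuming the eight suffixes listed in its comment and using max as the example (defm int_ptx_max : PTXBinaryIntrinsic<"__builtin_ptx_max">). The ExpandedIntrinsic struct and expandPerType function are hypothetical; the point is that TableGen names each inner def by appending its name to the defm name, while !strconcat builds the matching builtin name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One def produced by a defm: the record name and its GCCBuiltin name.
struct ExpandedIntrinsic {
  std::string DefName;
  std::string BuiltinName;
};

// Models expanding "defm <DefmName> : PTXBinaryIntrinsic<BuiltinPrefix>;":
// each inner def (_u16, _s16, ...) is named DefmName + suffix, and its
// builtin name is !strconcat(BuiltinPrefix, suffix).
std::vector<ExpandedIntrinsic> expandPerType(const std::string &DefmName,
                                             const std::string &BuiltinPrefix) {
  const std::vector<std::string> Suffixes = {"_u16", "_s16", "_u32", "_s32",
                                             "_u64", "_s64", "_f32", "_f64"};
  std::vector<ExpandedIntrinsic> Out;
  for (const std::string &S : Suffixes)
    Out.push_back({DefmName + S, BuiltinPrefix + S});
  return Out;
}
```

For int_ptx_max this yields eight records, from int_ptx_max_u16 / __builtin_ptx_max_u16 through int_ptx_max_f64 / __builtin_ptx_max_f64, that is, one explicit, non-overloaded intrinsic per type.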
Also, I don't know yet how to make LLVM recognize the intrinsics
defined in lib/Target/PTX/PTXIntrinsics.td; the only other
backend that does so is MBlaze.
A tentative patch is attached.
Bye,
Alberto
On Wed, Nov 23, 2011 at 2:36 PM, Justin Holewinski
<justin.holewinski at gmail.com> wrote:
>
> On Nov 23, 2011 8:33 AM, "Justin Holewinski" <justin.holewinski at gmail.com>
> wrote:
>>
>>
>> On Nov 23, 2011 6:57 AM, "Alberto Magni" <alberto.magni86 at gmail.com>
>> wrote:
>> >
>> > On Tue, Nov 22, 2011 at 5:01 PM, Villmow, Micah <Micah.Villmow at amd.com>
>> > wrote:
>> > > Alberto,
>> > > The AMDIL backend solves your problem with intrinsic overloading this
>> > > way:
>> > > def int_AMDIL_mad : GCCBuiltin<"__amdil_mad">, TernaryIntFloat;
>> > >
>> > > Where TernaryIntFloat is defined as:
>> > > class TernaryIntFloat :
>> > >   Intrinsic<[llvm_anyfloat_ty], [LLVMMatchType<0>,
>> > >             LLVMMatchType<0>, LLVMMatchType<0>], []>;
>> > >
>> > > This allows us to write a multi-def for int_AMDIL_mad like so:
>> > > defm MAD : TernaryIntrinsicFloat<IL_OP_MAD, int_AMDIL_mad>;
>> > >
>> > > Where TernaryIntrinsicFloat is defined as:
>> > > multiclass TernaryIntrinsicFloat<ILOpCode opcode, Intrinsic intr>
>> > > {
>> > >   def _f32 : ThreeInOneOut<opcode, (outs GPRF32:$dst),
>> > >                (ins GPRF32:$src, GPRF32:$src2, GPRF32:$src3),
>> > >                !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),
>> > >                [(set GPRF32:$dst,
>> > >                  (intr GPRF32:$src, GPRF32:$src2, GPRF32:$src3))]>;
>> > >   def _v2f32 : ThreeInOneOut<opcode, (outs GPRV2F32:$dst),
>> > >                (ins GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3),
>> > >                !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),
>> > >                [(set GPRV2F32:$dst,
>> > >                  (intr GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3))]>;
>> > >   ...
>> > > }
>> > >
>> > > Now, this doesn't completely work, because LLVM does not allow
>> > > overloading of intrinsic values, so a little code is needed in the
>> > > *IntrinsicInfo class.
>> > > AMD always encodes builtin names as __amdil_mad_f32,
>> > > __amdil_mad_v2f32, __amdil_mad_v4f32, etc.
>> > > So in the function "*IntrinsicInfo::lookup_name", when attempting to
>> > > find out which intrinsic the function maps to, the AMDIL backend strips
>> > > off the type and then looks up just '__amdil_mad'.
>> > >
>> > > This is how you can do intrinsic overloading in LLVM.
>> > >
>> > > Hope this helps,
>> > > Micah
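A minimal C++ sketch of the name lookup described above, assuming the type is always the last underscore-separated component of the builtin name. IntrinsicID, stripTypeSuffix, and lookupName here are hypothetical standalone code, not the signature of LLVM's TargetIntrinsicInfo hook or the AMDIL source; the exact name is tried first so that an unsuffixed builtin still resolves.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical intrinsic IDs standing in for a generated enum.
enum IntrinsicID { NotIntrinsic = 0, AMDIL_mad, AMDIL_max };

// Drops the last '_'-separated component, e.g. "__amdil_mad_v2f32" ->
// "__amdil_mad". Assumes the type is always encoded as that last component.
std::string stripTypeSuffix(const std::string &Name) {
  std::string::size_type Pos = Name.rfind('_');
  if (Pos == std::string::npos || Pos == 0)
    return Name;
  return Name.substr(0, Pos);
}

// Tries the exact name first, then the name with its type suffix removed.
IntrinsicID lookupName(const std::string &Name) {
  static const std::map<std::string, IntrinsicID> Table = {
      {"__amdil_mad", AMDIL_mad}, {"__amdil_max", AMDIL_max}};
  auto It = Table.find(Name);
  if (It == Table.end())
    It = Table.find(stripTypeSuffix(Name));
  return It == Table.end() ? NotIntrinsic : It->second;
}
```

With this table, __amdil_mad_f32 and __amdil_mad_v4f32 both resolve to the single AMDIL_mad entry, while an unknown name such as foo_f32 maps to NotIntrinsic.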
>> >
>> > Thank you Micah, it really does.
>> >
>> > At the moment the PTX backend does not have a PTXIntrinsicInfo class;
>> > the only backend that has one is MBlaze.
>> > If Justin agrees with the approach, I will look into how to generate the
>> > PTXGenIntrinsics.inc file (I am still learning TableGen)
>> > required by PTXIntrinsicInfo and write the lookUp method.
>>
>> Looks good to me. For OpenCL support in clang, we definitely need the
>> built-in function support. And the total number of intrinsics like this
>> should be relatively minimal.
>
> One thing I forgot to mention: once these are implemented, it may be worth
> implementing some instruction selection patterns to collapse icmp/fcmp and
> select pairs into max/min whenever it makes sense.
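One way to picture such a pattern is a C++ matcher over a simplified icmp/select pair. This is a hypothetical sketch (Pred, MinMax, and matchMinMax are not LLVM APIs) that covers only integer compares with strict predicates; fcmp is left out because NaN handling can make a floating-point select differ from a plain max/min.

```cpp
#include <cassert>
#include <string>

// Simplified shapes: %c = icmp <P> L, R ; %s = select %c, T, F
enum class Pred { SGT, SLT, UGT, ULT };
enum class MinMax { None, SMax, SMin, UMax, UMin };

// Returns the min/max operation an icmp+select pair computes, or None.
// select(L > R, L, R) == max(L, R) and select(L > R, R, L) == min(L, R);
// the "<" predicates swap the two results.
MinMax matchMinMax(Pred P, const std::string &L, const std::string &R,
                   const std::string &T, const std::string &F) {
  bool Same = (T == L && F == R);    // picks L when the compare is true
  bool Swapped = (T == R && F == L); // picks R when the compare is true
  if (!Same && !Swapped)
    return MinMax::None;
  switch (P) {
  case Pred::SGT: return Same ? MinMax::SMax : MinMax::SMin;
  case Pred::SLT: return Same ? MinMax::SMin : MinMax::SMax;
  case Pred::UGT: return Same ? MinMax::UMax : MinMax::UMin;
  case Pred::ULT: return Same ? MinMax::UMin : MinMax::UMax;
  }
  return MinMax::None;
}
```

The signed and unsigned predicates map to different operations, so a real pattern set would need both the signed and unsigned max/min intrinsics.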
>
>>
>> >
>> > Cheers,
>> >
>> > Alberto
>> >
>> > >
>> > >> -----Original Message-----
>> > >> From: llvmdev-bounces at cs.uiuc.edu
>> > >> [mailto:llvmdev-bounces at cs.uiuc.edu]
>> > >> On Behalf Of Alberto Magni
>> > >> Sent: Tuesday, November 22, 2011 8:41 AM
>> > >> To: Justin Holewinski
>> > >> Cc: LLVM Developers Mailing List
>> > >> Subject: Re: [LLVMdev] PTX builtin functions.
>> > >>
>> > >> On Mon, Nov 21, 2011 at 5:31 PM, Justin Holewinski
>> > >> <justin.holewinski at gmail.com> wrote:
>> > >> > On Mon, Nov 21, 2011 at 11:45 AM, Alberto Magni
>> > >> > <alberto.magni86 at gmail.com> wrote:
>> > >> >>
>> > >> >> On Mon, Nov 21, 2011 at 3:36 PM, Justin Holewinski
>> > >> >> <justin.holewinski at gmail.com> wrote:
>> > >> >> > On Mon, Nov 21, 2011 at 7:01 AM, Alberto Magni
>> > >> >> > <alberto.magni86 at gmail.com> wrote:
>> > >> >> >>
>> > >> >> >> Hi Justin,
>> > >> >> >>
>> > >> >> >> attached you will find the patch for the integer max instruction.
>> > >> >> >> The multiclass PTX_INTRINSIC_INT3 in file PTXIntrinsicInstrInfo.td
>> > >> >> >> is almost an exact copy of PTX_INT3 in PTXInstrInfo.td; maybe
>> > >> >> >> a modified version of this class could be defined in a separate file.
>> > >> >> >
>> > >> >> >
>> > >> >> > I'm copying llvmdev. We should keep discussions like this on the
>> > >> >> > list for the benefit of others.
>> > >> >>
>> > >> >> I always forget "Reply to All".
>> > >> >>
>> > >> >> > We can probably factor out a generic description, or even just use
>> > >> >> > the PTX_INT3 multiclass directly. The PTXIntrinsicInstrInfo.td file
>> > >> >> > is included by PTXInstrInfo.td, so anything defined in
>> > >> >> > PTXInstrInfo.td is available in PTXIntrinsicInstrInfo.td.
>> > >> >>
>> > >> >> I agree with you, but my class PTX_INTRINSIC_INT3 works with an
>> > >> >> Intrinsic and not with an SDNode, like PTX_INT3 does.
>> > >> >> PTX_INTRINSIC_INT3 also requires the type of the immediate to be
>> > >> >> present in the pattern, e.g. (i32 imm:$b).
>> > >> >
>> > >> >
>> > >> > Alright, I'm fine with that.
>> > >> >
>> > >> >>
>> > >> >>
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> Do you agree with this approach?
>> > >> >> >> Also, do you think that a class like PTX_INTRINSIC_INT3_SIGNED
>> > >> >> >> (a clone of PTX_INT3_SIGNED) is required?
>> > >> >> >
>> > >> >> >
>> > >> >> > Yes, I believe we should split these into signed and unsigned
>> > >> >> > variants. The results of max/min operations can definitely be
>> > >> >> > different depending on whether the operands are signed or unsigned.
>> > >> >> > Since this information is not encoded in LLVM types, we may want to
>> > >> >> > create two versions for each integer type; something like:
>> > >> >> >
>> > >> >> > i32 @llvm.ptx.max.signed.i32(i32, i32)
>> > >> >> > i32 @llvm.ptx.max.unsigned.i32(i32, i32)
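A small C++ illustration of why the split is needed: the same 32-bit pattern gives different maxima depending on how it is interpreted. maxSigned and maxUnsigned are hypothetical helpers standing in for the two proposed intrinsics.

```cpp
#include <cassert>
#include <cstdint>

// Both functions see the same 32-bit pattern; only the interpretation differs.
std::uint32_t maxUnsigned(std::uint32_t A, std::uint32_t B) {
  return A > B ? A : B;
}

std::uint32_t maxSigned(std::uint32_t A, std::uint32_t B) {
  // Reinterpret as two's complement (well-defined since C++20, and the
  // behavior of mainstream compilers before that).
  std::int32_t SA = static_cast<std::int32_t>(A);
  std::int32_t SB = static_cast<std::int32_t>(B);
  return SA > SB ? A : B;
}
```

For the bit pattern 0xFFFFFFFF and 1, the unsigned maximum is 0xFFFFFFFF (4294967295), while the signed maximum is 1, because 0xFFFFFFFF is -1 in two's complement.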
>> > >> >>
>> > >> >> Yes, this is the only way.
>> > >> >
>> > >> >
>> > >> > A couple more comments:
>> > >> >
>> > >> > Please make sure to set TargetPrefix="ptx" for the intrinsics
>> > >> > (probably best in the multiclass, see PTXReadSpecialRegisterIntrinsic_r32).
>> > >>
>> > >> Ok
>> > >>
>> > >> > I'm not sure how to define a GCCBuiltin for an intrinsic that can
>> > >> > take multiple types, but it's probably worth looking into so we can
>> > >> > expose this intrinsic to Clang.
>> > >>
>> > >> This could be an issue. I looked for something similar in other
>> > >> backends and found no previous examples. It may be worth asking on
>> > >> the mailing list explicitly.
>> > >> The only fallback that I see is to explicitly define every intrinsic
>> > >> for every data type, but this would prevent using the multiclass to
>> > >> define the patterns.
>> > >>
>> > >>
>> > >> Bye.
>> > >>
>> > >> >
>> > >> >
>> > >> >>
>> > >> >>
>> > >> >> >
>> > >> >> > Otherwise, the patch looks good.
>> > >> >> >
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> Thanks,
>> > >> >> >>
>> > >> >> >> Alberto
>> > >> >> >>
>> > >> >> >> On Wed, Nov 16, 2011 at 5:44 PM, Alberto Magni
>> > >> >> >> <alberto.magni86 at gmail.com> wrote:
>> > >> >> >> > On Wed, Nov 16, 2011 at 2:17 PM, Justin Holewinski
>> > >> >> >> > <justin.holewinski at gmail.com> wrote:
>> > >> >> >> >> On Wed, Nov 16, 2011 at 9:16 AM, Justin Holewinski
>> > >> >> >> >> <justin.holewinski at gmail.com> wrote:
>> > >> >> >> >>>
>> > >> >> >> >>> On Wed, Nov 16, 2011 at 8:05 AM, Alberto Magni
>> > >> >> >> >>> <alberto.magni86 at gmail.com> wrote:
>> > >> >> >> >>>>
>> > >> >> >> >>>> Dear Justin,
>> > >> >> >> >>>>
>> > >> >> >> >>>> I am trying to add support for some OpenCL built-in functions
>> > >> >> >> >>>> to the PTX backend.
>> > >> >> >> >>>> The attached file represents the first stub of a patch for
>> > >> >> >> >>>> the fmax built-in function.
>> > >> >> >> >>>
>> > >> >> >> >>> First off, thanks for helping to improve the PTX back-end!
>> > >> >> >> >>> There are really two main issues here. First, OpenCL built-in
>> > >> >> >> >>> functions do not belong in the PTX back-end. These will be
>> > >> >> >> >>> implemented in the libclc library (http://www.pcc.me.uk/~peter/libclc).
>> > >> >> >> >>> The back-end will only implement PTX intrinsics, which may be used
>> > >> >> >> >>> by the OpenCL built-in functions in libclc. However, this
>> > >> >> >> >>> particular function (max) corresponds to a PTX instruction, so it
>> > >> >> >> >>> makes sense to implement it as an intrinsic in the back-end.
>> > >> >> >> >>> Second, intrinsic functions require a bit more work. You're off to
>> > >> >> >> >>> a great start, but intrinsics are implemented a bit differently.
>> > >> >> >> >>> It looks like LLVM does not have a max intrinsic, so we'll need to
>> > >> >> >> >>> create one. Have a look at include/llvm/IntrinsicsPTX.td. This
>> > >> >> >> >>> file defines the PTX-specific intrinsics. You can add an intrinsic
>> > >> >> >> >>> for max here, and then implement a pattern-match in the
>> > >> >> >> >>> PTXInstrInfo.td file. There is no need to create a new SDNode type
>> > >> >> >> >>> for intrinsics, unless they require some special handling in the
>> > >> >> >> >>> C++ code, which I do not see being the case here.
>> > >> >> >> >>
>> > >> >> >> >> Sorry, there's a typo here. The intrinsic pattern matching goes in
>> > >> >> >> >> PTXIntrinsicInstrInfo.td.
>> > >> >> >> >>
>> > >> >> >> >
>> > >> >> >> > Thank you for the pointers. I will let you know when I have the
>> > >> >> >> > first patch.
>> > >> >> >> >
>> > >> >> >> >>>
>> > >> >> >> >>> When you define a new intrinsic, use the following template as a
>> > >> >> >> >>> name: int_ptx_max. This will define the LLVM intrinsic as
>> > >> >> >> >>> @llvm.ptx.max(). Please follow the same convention when naming the
>> > >> >> >> >>> __builtin_* function.
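A short C++ sketch of the naming rule, assuming the usual TableGen convention that an intrinsic record without an explicit name gets one derived from the record name. intrinsicIRName is a hypothetical helper, not a TableGen function.

```cpp
#include <cassert>
#include <string>

// Maps a TableGen intrinsic record name to its IR name: drop "int_",
// prepend "llvm.", and turn every remaining '_' into '.'.
// Returns an empty string for names without the "int_" prefix.
std::string intrinsicIRName(const std::string &RecordName) {
  const std::string Prefix = "int_";
  if (RecordName.compare(0, Prefix.size(), Prefix) != 0)
    return "";
  std::string Name = "llvm." + RecordName.substr(Prefix.size());
  for (char &C : Name)
    if (C == '_')
      C = '.';
  return Name;
}
```

It maps int_ptx_max to llvm.ptx.max and int_ptx_read_tid_x to llvm.ptx.read.tid.x; a signed variant named int_ptx_max_signed_i32 would likewise become llvm.ptx.max.signed.i32.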
>> > >> >> >> >>>
>> > >> >> >> >>>>
>> > >> >> >> >>>> The test case I am trying is the following:
>> > >> >> >> >>>>
>> > >> >> >> >>>> define ptx_device float @f(float %x, float %y) {
>> > >> >> >> >>>> entry:
>> > >> >> >> >>>>   %z = call float @fmax(float %x, float %y)
>> > >> >> >> >>>>   ret float %z
>> > >> >> >> >>>> }
>> > >> >> >> >>>>
>> > >> >> >> >>>> declare float @fmax(float, float)
>> > >> >> >> >>>>
>> > >> >> >> >>>> But at the moment llc crashes saying that "calls are not
>> > >> >> >> >>>> supported"; this does not happen with LLVM intrinsics like
>> > >> >> >> >>>> llvm.sqrt.f32.
>> > >> >> >> >>>
>> > >> >> >> >>> Which version of LLVM are you using? Calls to PTX device functions
>> > >> >> >> >>> have been implemented for a little while now, so I'm surprised to
>> > >> >> >> >>> see that error. Perhaps it's because the fmax function is not
>> > >> >> >> >>> defined as ptx_device.
>> > >> >> >> >>>
>> > >> >> >> >
>> > >> >> >> > This is the test case that I am using to verify that the max
>> > >> >> >> > built-in function I am implementing is actually recognised. I took
>> > >> >> >> > inspiration from the llvm-intrinsic.ll test case.
>> > >> >> >> > The command I am using to compile is:
>> > >> >> >> >
>> > >> >> >> > llc -march=ptx32 -mattr=+ptx22 fmax.ll
>> > >> >> >> >
>> > >> >> >> > The option -mattr does not seem to have any effect.
>> > >> >> >> > I also tried with the ptx_device qualifier, with the same outcome.
>> > >> >> >> > I am using LLVM from the svn repository.
>> > >> >> >> >
>> > >> >> >> > Bye,
>> > >> >> >> >
>> > >> >> >> > Alberto
>> > >> >> >> >
>> > >> >> >> >>>>
>> > >> >> >> >>>> Can you please give me a hint about what I am missing, or some
>> > >> >> >> >>>> general advice on how to add builtin functions?
>> > >> >> >> >>>>
>> > >> >> >> >>>> Thank you in advance,
>> > >> >> >> >>>>
>> > >> >> >> >>>> Alberto.
>> > >> >> >> >>>>
>> > >> >> >> >>>> _______________________________________________
>> > >> >> >> >>>> LLVM Developers mailing list
>> > >> >> >> >>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> > >> >> >> >>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> > >> >> >> >>>>
>> > >> >> >> >>>
>> > >> >> >> >>>
>> > >> >> >> >>>
>> > >> >> >> >>> --
>> > >> >> >> >>>
>> > >> >> >> >>> Thanks,
>> > >> >> >> >>> Justin Holewinski
>> > >> >> >> >>
>> > >> >> >> >>
>> > >> >> >> >>
>> > >> >> >> >> --
>> > >> >> >> >>
>> > >> >> >> >> Thanks,
>> > >> >> >> >> Justin Holewinski
>> > >> >> >> >>
>> > >> >> >
>> > >> >> >
>> > >> >> >
>> > >> >> >
>> > >> >> > --
>> > >> >> >
>> > >> >> > Thanks,
>> > >> >> >
>> > >> >> > Justin Holewinski
>> > >> >> >
>> > >> >
>> > >> >
>> > >> >
>> > >> >
>> > >> > --
>> > >> >
>> > >> > Thanks,
>> > >> >
>> > >> > Justin Holewinski
>> > >> >
>> > >>
>> > >
>> > >
--
Thanks,
Justin Holewinski