[LLVMdev] PTX builtin functions.
Hi Justin,
sorry for the delay, I have been busy.
Micah's proposal requires moving the definitions of the intrinsics
from include/llvm/IntrinsicsPTX.td to lib/Target/PTX/PTXIntrinsics.td,
so that the file PTXGenIntrinsics.inc can be generated and then
included by PTXIntrinsicInfo.cpp.
This is quite a big modification. Do you agree with it,
or do you have a better solution?
Also, I don't yet know how to make LLVM recognize the intrinsics
defined in lib/Target/PTX/PTXIntrinsics.td; the only other
backend that does so is MBlaze.
A tentative patch is attached.
Bye,
Alberto
>> > > Alberto,
>> > > The AMDIL backend solves your problem with intrinsic overloading this
>> > > way:
>> > > def int_AMDIL_mad : GCCBuiltin<"__amdil_mad">, TernaryIntFloat;
>> > >
>> > > Where TernaryIntFloat is defined as:
>> > > class TernaryIntFloat :
>> > > Intrinsic<[llvm_anyfloat_ty], [LLVMMatchType<0>,
>> > > LLVMMatchType<0>, LLVMMatchType<0>], []>;
>> > >
>> > > This allows us to write a multi-def for int_AMDIL_mad like so:
>> > > defm MAD : TernaryIntrinsicFloat<IL_OP_MAD, int_AMDIL_mad>;
>> > >
>> > > Where TernaryIntrinsicFloat is defined as:
>> > > multiclass TernaryIntrinsicFloat<ILOpCode opcode, Intrinsic intr>
>> > > {
>> > > def _f32 : ThreeInOneOut<opcode, (outs GPRF32:$dst),
>> > > (ins GPRF32:$src, GPRF32:$src2, GPRF32:$src3),
>> > > !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),
>> > > [(set GPRF32:$dst,
>> > > (intr GPRF32:$src, GPRF32:$src2, GPRF32:$src3))]>;
>> > > def _v2f32 : ThreeInOneOut<opcode, (outs GPRV2F32:$dst),
>> > > (ins GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3),
>> > > !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),
>> > > [(set GPRV2F32:$dst,
>> > > (intr GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3))]>;
>> > > ...
>> > > }
>> > >
>> > > Now, this doesn't completely work, because LLVM does not allow
>> > > overloading of intrinsics values, so there needs to be a little coding in
>> > > *IntrinsicInfo class.
>> > > AMD always encodes builtin names as __amdil_mad_f32,
>> > > __amdil_mad_v2f32, __amdil_mad_v4f32, etc....
>> > > So in the function "*IntrinsicInfo::lookup_name", when attempting to
>> > > find out which intrinsic the function maps to, the AMDIL backend strips
>> > > off the type suffix and then looks up just '__amdil_mad'.
>> > >
>> > > This is how you can do intrinsic overloading in LLVM.
>> > >
>> > > Hope this helps,
>> > > Micah
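For illustration, the suffix-stripping lookup Micah describes could be sketched in standalone C++ roughly as follows. This is not the AMDIL code: the names stripTypeSuffix, lookupName and IntrinsicID, and the suffix list itself, are assumptions made for this example (the thread only mentions _f32, _v2f32 and _v4f32).

```cpp
#include <array>
#include <string_view>

// Hypothetical list of type suffixes appended to builtin names.
constexpr std::array<std::string_view, 3> kTypeSuffixes = {
    "_v2f32", "_v4f32", "_f32"};

// Returns the name with a trailing type suffix removed, or the name
// unchanged if it does not end in one of the known suffixes.
std::string_view stripTypeSuffix(std::string_view name) {
  for (std::string_view suffix : kTypeSuffixes) {
    if (name.size() > suffix.size() &&
        name.substr(name.size() - suffix.size()) == suffix) {
      return name.substr(0, name.size() - suffix.size());
    }
  }
  return name;
}

// Hypothetical intrinsic IDs; a real backend would use generated ones.
enum class IntrinsicID { NotIntrinsic, Mad };

// Maps a (possibly type-suffixed) builtin name to a single intrinsic ID.
IntrinsicID lookupName(std::string_view name) {
  std::string_view base = stripTypeSuffix(name);
  if (base == "__amdil_mad")
    return IntrinsicID::Mad;
  return IntrinsicID::NotIntrinsic;
}
```

With this sketch, lookupName("__amdil_mad_f32") and lookupName("__amdil_mad_v2f32") both resolve to the same ID, which is the effect the overloading trick relies on.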
>> >
>> > Thank you Micah, it really does.
>> >
>> > At the moment the PTX backend does not have a PTXIntrinsicInfo class,
>> > the only backend which does so is MBlaze.
>> > If Justin agrees with the approach, I will look into how to generate the
>> > PTXGenIntrinsics.inc file (I am still learning TableGen)
>> > required by PTXIntrinsicInfo and write the lookUp method.
>> Looks good to me. For OpenCL support in clang, we definitely need the
>> built-in function support. And the total number of intrinsics like this
>> should be relatively minimal.
> One thing I forgot to mention: once these are implemented, it may be worth
> implementing some instruction selection patterns to collapse icmp/fcmp and
> select pairs into max/min whenever it makes sense.
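As a rough illustration of the compare-and-select pairs mentioned here, the following C++ sketch writes a signed integer max and min in the two-step form that such a pattern would recognise. The function names are hypothetical, and only the integer case is shown.

```cpp
#include <cstdint>

// Two-step form: a comparison (like "icmp sgt") feeding a select.
std::int32_t selectMaxSigned(std::int32_t a, std::int32_t b) {
  bool aIsGreater = a > b;    // comparison step
  return aIsGreater ? a : b;  // select step
}

// Same idea for min, using a "less than" comparison.
std::int32_t selectMinSigned(std::int32_t a, std::int32_t b) {
  bool aIsLess = a < b;       // comparison step
  return aIsLess ? a : b;     // select step
}
```

Each function computes exactly what a single max or min instruction would for these operands, which is why collapsing the pair is attractive.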
>
>> > >>
>> > >> >> >>
>> > >> >> >> Hi Justin,
>> > >> >> >>
>> > >> >> >> attached you find the patch for the integer max instruction.
>> > >> >> >> The multiclass PTX_INTRINSIC_INT3 in file PTXIntrinsicInstrInfo.td
>> > >> >> >> is almost an exact copy of PTX_INT3 in PTXInstrInfo.td, maybe
>> > >> >> >> a modification of this class can be defined in a separate file.
>> > >> >> >
>> > >> >> >
I'm copying llvmdev. We should keep discussions like this on the list for
the benefit of others.
>> > >> >>
>> > >> >> I always forget "Reply to All".
>> > >> >>
>> > >> >> > We can probably factor out a generic description, or even just
>> > >> >> > use the PTX_INT3 multiclass directly. The PTXIntrinsicInstrInfo.td
>> > >> >> > file is included by PTXInstrInfo.td, so anything defined in
>> > >> >> > PTXInstrInfo.td is available in PTXIntrinsicInstrInfo.td.
>> > >> >>
>> > >> >> I agree with you, but my class PTX_INTRINSIC_INT3 works with an
>> > >> >> Intrinsic, whereas PTX_INT3 works with an SDNode.
>> > >> >> PTX_INTRINSIC_INT3 also requires the presence of the type of
>> > >> >> the immediate in the pattern, e.g. (i32 imm:$b).
>> > >> >
>> > >> >
Alright, I'm fine with that.
>> > >> >
>> > >> >>
>> > >> >>
>> > >> >> >>
>> > >> >> >>
>> > >> >> >> Do you agree with this approach ?
>> > >> >> >> Also, do you think that a class like PTX_INTRINSIC_INT3_SIGNED
>> > >> >> >> (a clone of PTX_INT3_SIGNED) is required ?
>> > >> >> >
>> > >> >> >
>> > >> >> > Yes, I believe we should split these into signed and unsigned
>> > >> >> > variants. The results of max/min operations can definitely be
>> > >> >> > different depending on whether the operands are signed or
>> > >> >> > unsigned. Since this information is not encoded in LLVM types,
>> > >> >> > we may want to create two versions for each integer type;
>> > >> >> > something like:
>> > >> >> >
>> > >> >> > i32 @llvm.ptx.max.signed.i32(i32, i32)
>> > >> >> > i32 @llvm.ptx.max.unsigned.i32(i32, i32)
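To make the signed/unsigned point concrete, here is a small C++ sketch in which the same two 32-bit patterns give different max results depending on how they are interpreted. The function names are hypothetical and are not the proposed intrinsics; the conversion from an out-of-range unsigned value to std::int32_t is assumed to wrap in the usual two's-complement way (guaranteed only from C++20 on).

```cpp
#include <algorithm>
#include <cstdint>

// Signed max: reinterprets both 32-bit patterns as two's-complement values.
std::uint32_t maxSigned(std::uint32_t a, std::uint32_t b) {
  std::int32_t sa = static_cast<std::int32_t>(a);  // assumes wrap-around
  std::int32_t sb = static_cast<std::int32_t>(b);
  return static_cast<std::uint32_t>(std::max(sa, sb));
}

// Unsigned max: compares the raw 32-bit patterns as unsigned values.
std::uint32_t maxUnsigned(std::uint32_t a, std::uint32_t b) {
  return std::max(a, b);
}
```

For the operands 0xFFFFFFFF and 1, the signed version treats the first as -1 and picks 1, while the unsigned version picks 0xFFFFFFFF. A single i32 type cannot express that difference, hence the two intrinsic names.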
>> > >> >>
Yes, this is the only way.
>> > >> >
>> > >> >
>> > >> > A couple more comments:
>> > >> >
>> > >> > Please make sure to set TargetPrefix="ptx" for the intrinsics
>> > >> > (probably best in the multiclass, see
>> > >> > PTXReadSpecialRegisterIntrinsic_r32).
>> > >>
Ok
>> > >>
>> > >> > I'm not sure how to define a GCCBuiltin for an intrinsic that can
>> > >> > take multiple types, but it's probably worth looking into so we can
>> > >> > expose this intrinsic to Clang.
>> > >>
>> > >> This could be an issue. I looked for something similar in other
>> > >> backends and found no previous examples. It may be worth asking
>> > >> explicitly about this on the ML.
>> > >> The only fallback that I see is to explicitly define every intrinsic
>> > >> for every data type, but this would prevent the use of the multiclass
>> > >> for the definition of the patterns.
>> > >>
>> > >>
Bye.
>> > >>
Otherwise, the patch looks good.
Thanks,

Alberto
>> > >> >> >> >>>> Dear Justin,
>> > >> >> >> >>>>
>> > >> >> >> >>>> I am trying to add support for some OpenCL builtin
>> > >> >> >> >>>> functions to the PTX backend.
>> > >> >> >> >>>> The attached file represents the first stub of a patch for
>> > >> >> >> >>>> the fmax builtin function.
>> > >> >> >> >>>
>> > >> >> >> >>> First off, thanks for helping to improve the PTX back-end!
>> > >> >> >> >>> There are really two main issues here. First, OpenCL
>> > >> >> >> >>> built-in functions do not belong in the PTX back-end.
>> > >> >> >> >>> These will be implemented in the libclc library
>> > >> >> >> >>> (http://www.pcc.me.uk/~peter/libclc). The back-end will
>> > >> >> >> >>> only implement PTX intrinsics, which may be used by the
>> > >> >> >> >>> OpenCL built-in functions in libclc. However, this
>> > >> >> >> >>> particular function (max) corresponds to a PTX
>> > >> >> >> >>> instruction, so it makes sense to implement it as an
>> > >> >> >> >>> intrinsic in the back-end.
>> > >> >> >> >>> Second, intrinsic functions require a bit more work.
>> > >> >> >> >>> You're off to a great start, but intrinsics are
>> > >> >> >> >>> implemented a bit differently. It looks like LLVM does
>> > >> >> >> >>> not have a max intrinsic, so we'll need to create one.
>> > >> >> >> >>> Have a look at include/llvm/IntrinsicsPTX.td. This file
>> > >> >> >> >>> defines the PTX-specific intrinsics. You can add an
>> > >> >> >> >>> intrinsic for max here, and then implement a
>> > >> >> >> >>> pattern-match in the PTXInstrInfo.td file. There is no
>> > >> >> >> >>> need to create a new SDNode type for intrinsics, unless
>> > >> >> >> >>> they require some special handling in the C++ code,
>> > >> >> >> >>> which I do not see being the case here.
>> > >> >> >> >>
Sorry, there's a typo here. The intrinsic pattern matching goes in
PTXIntrinsicInstrInfo.td.
>> > >> >> >> >>
>> > >> >> >> >
Thank you for the pointers. I will let you know when I have the first
patch.
>> > >> >> >> >
>> > >> >> >> >>> When you define a new intrinsic, use the following
>> > >> >> >> >>> template as a name: int_ptx_max. This will define the
>> > >> >> >> >>> LLVM intrinsic as @llvm.ptx.max().
>> > >> >> >> >>> Please follow the same convention when naming the
>> > >> >> >> >>> __builtin_* function.
>> > >> >> >> >>>
>> > >> >> >> >>>>
>> > >> >> >> >>>> The test case I am trying is the following:
>> > >> >> >> >>>>
>> > >> >> >> >>>> define ptx_device float @f(float %x, float %y) {
>> > >> >> >> >>>> entry:
>> > >> >> >> >>>> %z = call float @fmax(float %x, float %y)
>> > >> >> >> >>>> ret float %z
>> > >> >> >> >>>> }
>> > >> >> >> >>>>
>> > >> >> >> >>>> declare float @fmax(float, float)
>> > >> >> >> >>>>
>> > >> >> >> >>>> But at the moment llc crashes saying that "calls are not
>> > >> >> >> >>>> supported". This does not happen with LLVM builtins
>> > >> >> >> >>>> like llvm.sqrt.f32.
>> > >> >> >> >>>
>> > >> >> >> >>> Which version of LLVM are you using? Calls to PTX device
>> > >> >> >> >>> functions have been implemented for a little while now,
>> > >> >> >> >>> so I'm surprised to see that error.
>> > >> >> >> >>> Perhaps it's because the fmax function is not defined as
>> > >> >> >> >>> ptx_device.
>> > >> >> >> >>>
>> > >> >> >> >
>> > >> >> >> > This is the test case that I am using to verify that the max
>> > >> >> >> > builtin function I am implementing is actually recognised.
>> > >> >> >> > I took inspiration from the llvm-intrinsic.ll test case.
>> > >> >> >> > The command I am using to compile is:
>> > >> >> >> >
>> > >> >> >> > llc -march=ptx32 -mattr=+ptx22 fmax.ll
>> > >> >> >> >
>> > >> >> >> > The option -mattr does not seem to have any effect.
>> > >> >> >> > I also tried with the ptx_device qualifier, with the same
>> > >> >> >> > outcome.
>> > >> >> >> > I am using llvm from the svn repository.
>> > >> >> >> >
>> > >> >> >> > Bye,
>> > >> >> >> >
>> > >> >> >> > Alberto
>> > >> >> >> >
>> > >> >> >> >>>> Can you please give me a hint on what I am missing, or
