<p><br>

On Nov 23, 2011 8:33 AM, "Justin Holewinski" <<a href="mailto:justin.holewinski@gmail.com">justin.holewinski@gmail.com</a>> wrote:<br>

><br>

><br>

> On Nov 23, 2011 6:57 AM, "Alberto Magni" <<a href="mailto:alberto.magni86@gmail.com">alberto.magni86@gmail.com</a>> wrote:<br>

> ><br>

> > On Tue, Nov 22, 2011 at 5:01 PM, Villmow, Micah <<a href="mailto:Micah.Villmow@amd.com">Micah.Villmow@amd.com</a>> wrote:<br>

> > > Alberto,<br>

> > >  The AMDIL backend solves your problem with intrinsic overloading this way:<br>

> > > def int_AMDIL_mad     : GCCBuiltin<"__amdil_mad">, TernaryIntFloat;<br>

> > ><br>

> > > Where TernaryIntFloat is defined as:<br>

> > > class TernaryIntFloat :<br>

> > >          Intrinsic<[llvm_anyfloat_ty], [LLVMMatchType<0>,<br>

> > >          LLVMMatchType<0>, LLVMMatchType<0>], []>;<br>

> > ><br>

> > > This allows us to write a multi-def for int_AMDIL_mad like so:<br>

> > > defm MAD  : TernaryIntrinsicFloat<IL_OP_MAD, int_AMDIL_mad>;<br>

> > ><br>

> > > Where TernaryIntrinsicFloat is defined as:<br>

> > > multiclass TernaryIntrinsicFloat<ILOpCode opcode, Intrinsic intr><br>

> > > {<br>

> > >  def _f32 : ThreeInOneOut<opcode, (outs GPRF32:$dst),<br>

> > >      (ins GPRF32:$src, GPRF32:$src2, GPRF32:$src3),<br>

> > >      !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),<br>

> > >      [(set GPRF32:$dst,<br>

> > >          (intr GPRF32:$src, GPRF32:$src2, GPRF32:$src3))]>;<br>

> > >  def _v2f32 : ThreeInOneOut<opcode, (outs GPRV2F32:$dst),<br>

> > >      (ins GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3),<br>

> > >      !strconcat(opcode.Text, " $dst, $src, $src2, $src3"),<br>

> > >      [(set GPRV2F32:$dst,<br>

> > >          (intr GPRV2F32:$src, GPRV2F32:$src2, GPRV2F32:$src3))]>;<br>

> > > ...<br>

> > > }<br>

> > ><br>

> > > Now, this doesn't completely work, because LLVM does not allow overloading of intrinsic values, so a little coding is needed in the *IntrinsicInfo class.<br>

> > > AMD always encodes builtin names as __amdil_mad_f32, __amdil_mad_v2f32, __amdil_mad_v4f32, etc.<br>

> > > So in the function "*IntrinsicInfo::lookup_name", when attempting to find out which intrinsic the function maps to, the AMDIL backend strips off the type suffix and then looks up just '__amdil_mad'.<br>


> > ><br>

> > > This is how you can do intrinsic overloading in LLVM.<br>

> > ><br>

> > > Hope this helps,<br>

> > > Micah<br>

> ><br>

> > Thank you Micah, it really does.<br>

> ><br>

> > At the moment the PTX backend does not have a PTXIntrinsicInfo class,<br>

> > the only backend which does so is MBlaze.<br>

> > If Justin agrees with the approach I will look into how to generate the<br>

> > PTXGenIntrinsics.inc file (I am still learning TableGen)<br>

> > required by PTXIntrinsicInfo and write the lookUp method.<br>

><br>

> Looks good to me.  For OpenCL support in clang, we definitely need the built-in function support.  The total number of intrinsics like this should be relatively small.</p>
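<p>The name lookup Micah describes in the quoted message (strip the type suffix, then look up the base builtin name) can be sketched outside LLVM as follows. This is only an illustration in plain C++: the enum, the suffix list, and the function names stripTypeSuffix and lookupName are assumptions made for the example, not the AMDIL backend's actual code or any LLVM API.</p>

```cpp
#include <array>
#include <string_view>

// Hypothetical intrinsic IDs; a real backend would take these from a
// TableGen-generated .inc file rather than a hand-written enum.
enum class IntrinsicID { NotIntrinsic, Mad };

// Illustrative type suffixes, following the naming scheme quoted in the
// thread (__amdil_mad_f32, __amdil_mad_v2f32, __amdil_mad_v4f32).
constexpr std::array<std::string_view, 3> kTypeSuffixes = {
    "_v2f32", "_v4f32", "_f32"};

// Remove one trailing type suffix if the name ends with a known one;
// otherwise return the name unchanged.
constexpr std::string_view stripTypeSuffix(std::string_view name) {
  for (std::string_view suffix : kTypeSuffixes) {
    if (name.size() > suffix.size() &&
        name.substr(name.size() - suffix.size()) == suffix) {
      return name.substr(0, name.size() - suffix.size());
    }
  }
  return name;
}

// Map a (possibly type-suffixed) builtin name to a single intrinsic ID.
constexpr IntrinsicID lookupName(std::string_view name) {
  const std::string_view base = stripTypeSuffix(name);
  if (base == "__amdil_mad") {
    return IntrinsicID::Mad;
  }
  return IntrinsicID::NotIntrinsic;
}
```

<p>With this sketch, every typed spelling of the builtin resolves to the same ID, which is the effect the quoted description relies on. A real implementation would presumably consult a generated table instead of a chain of comparisons.</p>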

<p>One thing I forgot to mention:  once these are implemented, it may be worth adding some instruction selection patterns that collapse icmp/fcmp and select pairs into max/min wherever it makes sense.</p>
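<p>To make the compare-and-select idea concrete, here is a small C++ sketch of the source-level shape such a pair takes. It is not a TableGen pattern and not backend code; the function names are hypothetical. It also shows why the signed and unsigned variants discussed in the quoted thread have to stay separate: the same 32-bit pattern can win or lose the comparison depending on how it is interpreted.</p>

```cpp
#include <cstdint>

// Source-level shape of a signed compare followed by a select:
// the comparison picks which operand is returned.
constexpr std::int32_t selectSignedMax(std::int32_t a, std::int32_t b) {
  return a > b ? a : b;
}

// The same shape with an unsigned comparison.
constexpr std::uint32_t selectUnsignedMax(std::uint32_t a, std::uint32_t b) {
  return a > b ? a : b;
}

// Reinterpret a signed value as its unsigned 32-bit pattern
// (well-defined modular conversion in C++).
constexpr std::uint32_t asUnsigned(std::int32_t v) {
  return static_cast<std::uint32_t>(v);
}
```

<p>For the operands -1 and 1, the signed version returns 1, while the unsigned version applied to the same bit patterns returns 0xFFFFFFFF, so a pattern that folds the pair into a max instruction has to match on the signedness of the comparison.</p>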

<p>><br>

> ><br>

> > Cheers,<br>

> ><br>

> > Alberto<br>

> ><br>

> > ><br>

> > >> -----Original Message-----<br>

> > >> From: <a href="mailto:llvmdev-bounces@cs.uiuc.edu">llvmdev-bounces@cs.uiuc.edu</a> [mailto:<a href="mailto:llvmdev-bounces@cs.uiuc.edu">llvmdev-bounces@cs.uiuc.edu</a>]<br>

> > >> On Behalf Of Alberto Magni<br>

> > >> Sent: Tuesday, November 22, 2011 8:41 AM<br>

> > >> To: Justin Holewinski<br>

> > >> Cc: LLVM Developers Mailing List<br>

> > >> Subject: Re: [LLVMdev] PTX builtin functions.<br>

> > >><br>

> > >> On Mon, Nov 21, 2011 at 5:31 PM, Justin Holewinski<br>

> > >> <<a href="mailto:justin.holewinski@gmail.com">justin.holewinski@gmail.com</a>> wrote:<br>

> > >> > On Mon, Nov 21, 2011 at 11:45 AM, Alberto Magni<br>

> > >> <<a href="mailto:alberto.magni86@gmail.com">alberto.magni86@gmail.com</a>><br>

> > >> > wrote:<br>

> > >> >><br>

> > >> >> On Mon, Nov 21, 2011 at 3:36 PM, Justin Holewinski<br>

> > >> >> <<a href="mailto:justin.holewinski@gmail.com">justin.holewinski@gmail.com</a>> wrote:<br>

> > >> >> > On Mon, Nov 21, 2011 at 7:01 AM, Alberto Magni<br>

> > >> >> > <<a href="mailto:alberto.magni86@gmail.com">alberto.magni86@gmail.com</a>><br>

> > >> >> > wrote:<br>

> > >> >> >><br>

> > >> >> >> Hi Justin,<br>

> > >> >> >><br>

> > >> >> >> attached you find the patch for the integer max instruction.<br>

> > >> >> >> The multiclass PTX_INTRINSIC_INT3 in file<br>

> > >> PTXIntrinsicInstrInfo.td<br>

> > >> >> >> is almost an exact copy of  PTX_INT3 in PTXInstrInfo.td, maybe<br>

> > >> >> >> a modification of this class can be defined in a separate file.<br>

> > >> >> ><br>

> > >> >> ><br>

> > >> >> > I'm copying llvmdev.  We should keep discussions like this on the<br>

> > >> list<br>

> > >> >> > for<br>

> > >> >> > the benefit of others.<br>

> > >> >><br>

> > >> >> I always forget "Reply to All".<br>

> > >> >><br>

> > >> >> > We can probably factor out a generic description, or even just use<br>

> > >> the<br>

> > >> >> > PTX_INT3 multiclass directly.  The PTXIntrinsicInstrInfo.td file<br>

> > >> is<br>

> > >> >> > included<br>

> > >> >> > by PTXInstrInfo.td, so anything defined in PTXInstrInfo.td is<br>

> > >> available<br>

> > >> >> > in<br>

> > >> >> > PTXIntrinsicInstrInfo.td.<br>

> > >> >><br>

> > >> >> I agree with you but my class PTX_INTRINSIC_INT3 works with an<br>

> > >> Intrinsic<br>

> > >> >> and not with a SDNode, like PTX_INT3.<br>

> > >> >> PTX_INTRINSIC_INT3 also requires the presence of the type of<br>

> > >> >> the immediate in the pattern, e.g. (i32 imm:$b).<br>

> > >> ><br>

> > >> ><br>

> > >> > Alright, I'm fine with that.<br>

> > >> ><br>

> > >> >><br>

> > >> >><br>

> > >> >> >><br>

> > >> >> >><br>

> > >> >> >> Do you agree with this approach ?<br>

> > >> >> >> Also, do you think that a class like PTX_INTRINSIC_INT3_SIGNED<br>

> > >> >> >> (a clone of PTX_INT3_SIGNED) is required ?<br>

> > >> >> ><br>

> > >> >> ><br>

> > >> >> > Yes, I believe we should split these into signed and unsigned<br>

> > >> variants.<br>

> > >> >> >  The<br>

> > >> >> > results of max/min operations can definitely be different<br>

> > >> depending on<br>

> > >> >> > whether the operands are signed or unsigned.  Since this<br>

> > >> information is<br>

> > >> >> > not<br>

> > >> >> > encoded in LLVM types, we may want to create two versions for each<br>

> > >> >> > integer<br>

> > >> >> > type; something like:<br>

> > >> >> ><br>

> > >> >> > i32 @llvm.ptx.max.signed.i32(i32, i32)<br>

> > >> >> > i32 @llvm.ptx.max.unsigned.i32(i32, i32)<br>

> > >> >><br>

> > >> >> Yes, this is the only way.<br>

> > >> ><br>

> > >> ><br>

> > >> > A couple more comments:<br>

> > >> ><br>

> > >> > Please make sure to set TargetPrefix="ptx" for the intrinsics<br>

> > >> (probably best<br>

> > >> > in the multiclass, see PTXReadSpecialRegisterIntrinsic_r32)<br>

> > >><br>

> > >> Ok<br>

> > >><br>

> > >> > I'm not sure how to define a GCCBuiltin for an intrinsic that can<br>

> > >> take<br>

> > >> > multiple types, but it's probably worth looking into so we can expose<br>

> > >> this<br>

> > >> > intrinsic to Clang.<br>

> > >><br>

> > >> This could be an issue. I looked for something similar in other<br>

> > >> backends<br>

> > >> and I found no previous examples. It may be worth asking on the ML<br>

> > >> explicitly for this.<br>

> > >> The only fallback that I see is to define explicitly every intrinsic<br>

> > >> for every data type,<br>

> > >> but this would prevent the usage of the multiclass for the definition<br>

> > >> of the patterns.<br>

> > >><br>

> > >><br>

> > >> Bye.<br>

> > >><br>

> > >> ><br>

> > >> ><br>

> > >> >><br>

> > >> >><br>

> > >> >> ><br>

> > >> >> > Otherwise, the patch looks good.<br>

> > >> >> ><br>

> > >> >> >><br>

> > >> >> >><br>

> > >> >> >> Thanks,<br>

> > >> >> >><br>

> > >> >> >> Alberto<br>

> > >> >> >><br>

> > >> >> >> On Wed, Nov 16, 2011 at 5:44 PM, Alberto Magni<br>

> > >> >> >> <<a href="mailto:alberto.magni86@gmail.com">alberto.magni86@gmail.com</a>> wrote:<br>

> > >> >> >> > On Wed, Nov 16, 2011 at 2:17 PM, Justin Holewinski<br>

> > >> >> >> > <<a href="mailto:justin.holewinski@gmail.com">justin.holewinski@gmail.com</a>> wrote:<br>

> > >> >> >> >> On Wed, Nov 16, 2011 at 9:16 AM, Justin Holewinski<br>

> > >> >> >> >> <<a href="mailto:justin.holewinski@gmail.com">justin.holewinski@gmail.com</a>> wrote:<br>

> > >> >> >> >>><br>

> > >> >> >> >>> On Wed, Nov 16, 2011 at 8:05 AM, Alberto Magni<br>

> > >> >> >> >>> <<a href="mailto:alberto.magni86@gmail.com">alberto.magni86@gmail.com</a>><br>

> > >> >> >> >>> wrote:<br>

> > >> >> >> >>>><br>

> > >> >> >> >>>> Dear Justin,<br>

> > >> >> >> >>>><br>

> > >> >> >> >>>> I am trying to add the support for some OpenCL builtin<br>

> > >> functions<br>

> > >> >> >> >>>> to<br>

> > >> >> >> >>>> the PTX backend.<br>

> > >> >> >> >>>> The attached file represents the first stub of a patch for<br>

> > >> the fmax<br>

> > >> >> >> >>>> builtin function.<br>

> > >> >> >> >>><br>

> > >> >> >> >>> First off, thanks for helping to improve the PTX back-end!<br>

> > >> >> >> >>> There are really two main issues here.  First, OpenCL built-<br>

> > >> in<br>

> > >> >> >> >>> functions<br>

> > >> >> >> >>> do not belong in the PTX back-end.  These will be implemented<br>

> > >> in<br>

> > >> >> >> >>> the<br>

> > >> >> >> >>> libclc<br>

> > >> >> >> >>> library (<a href="http://www.pcc.me.uk/~peter/libclc">http://www.pcc.me.uk/~peter/libclc</a>).  The back-end<br>

> > >> will<br>

> > >> >> >> >>> only<br>

> > >> >> >> >>> implement PTX intrinsics, which may be used by the OpenCL<br>

> > >> built-in<br>

> > >> >> >> >>> functions<br>

> > >> >> >> >>> in libclc.  However, this particular function (max)<br>

> > >> corresponds to<br>

> > >> >> >> >>> a<br>

> > >> >> >> >>> PTX<br>

> > >> >> >> >>> instruction, so it makes sense to implement it as an<br>

> > >> intrinsic in<br>

> > >> >> >> >>> the<br>

> > >> >> >> >>> back-end.<br>

> > >> >> >> >>> Second, intrinsic functions require a bit more work.  You're<br>

> > >> off to<br>

> > >> >> >> >>> a<br>

> > >> >> >> >>> great start, but intrinsics are implemented a bit<br>

> > >> differently.  It<br>

> > >> >> >> >>> looks<br>

> > >> >> >> >>> like LLVM does not have a max intrinsic, so we'll need to<br>

> > >> create<br>

> > >> >> >> >>> one.<br>

> > >> >> >> >>>  Have<br>

> > >> >> >> >>> a look at include/llvm/IntrinsicsPTX.td.  This file defines<br>

> > >> the<br>

> > >> >> >> >>> PTX-specific<br>

> > >> >> >> >>> intrinsics.  You can add an intrinsic for max here, and then<br>

> > >> >> >> >>> implement<br>

> > >> >> >> >>> a<br>

> > >> >> >> >>> pattern-match in the PTXInstrInfo.td file.  There is no need<br>

> > >> to<br>

> > >> >> >> >>> create<br>

> > >> >> >> >>> a new<br>

> > >> >> >> >>> SDNode type for intrinsics, unless they require some special<br>

> > >> >> >> >>> handling<br>

> > >> >> >> >>> in the<br>

> > >> >> >> >>> C++ code, which I do not see being the case here.<br>

> > >> >> >> >><br>

> > >> >> >> >> Sorry, there's a typo here.  The intrinsic pattern matching<br>

> > >> goes in<br>

> > >> >> >> >> PTXIntrinsicInstrInfo.td.<br>

> > >> >> >> >><br>

> > >> >> >> ><br>

> > >> >> >> > Thank you for the pointers I will let you know when I have the<br>

> > >> first<br>

> > >> >> >> > patch.<br>

> > >> >> >> ><br>

> > >> >> >> >>><br>

> > >> >> >> >>> When you define a new intrinsic, use the following template<br>

> > >> as a<br>

> > >> >> >> >>> name:<br>

> > >> >> >> >>> int_ptx_max.  This will define the LLVM intrinsic as<br>

> > >> >> >> >>> @llvm.ptx.max().<br>

> > >> >> >> >>>  Please follow the same convention when naming the<br>

> > >> __builtin_*<br>

> > >> >> >> >>> function.<br>

> > >> >> >> >>><br>

> > >> >> >> >>>><br>

> > >> >> >> >>>> The test case I am trying is the following:<br>

> > >> >> >> >>>><br>

> > >> >> >> >>>> define ptx_device float @f(float %x, float %y) {<br>

> > >> >> >> >>>> entry:<br>

> > >> >> >> >>>>  %z = call float @fmax(float %x, float %y)<br>

> > >> >> >> >>>>  ret float %z<br>

> > >> >> >> >>>> }<br>

> > >> >> >> >>>><br>

> > >> >> >> >>>> declare float @fmax(float, float)<br>

> > >> >> >> >>>><br>

> > >> >> >> >>>> But at the moment llc crashes saying that "calls are not<br>

> > >> >> >> >>>> supported",<br>

> > >> >> >> >>>> this does not<br>

> > >> >> >> >>>> happen with LLVM builtins like llvm.sqrt.f32<br>

> > >> >> >> >>><br>

> > >> >> >> >>> Which version of LLVM are you using?  Calls to PTX device<br>

> > >> functions<br>

> > >> >> >> >>> have<br>

> > >> >> >> >>> been implemented for a little while now, so I'm surprised to<br>

> > >> see<br>

> > >> >> >> >>> that<br>

> > >> >> >> >>> error.<br>

> > >> >> >> >>>  Perhaps it's because the fmax function is not defined as<br>

> > >> >> >> >>> ptx_device.<br>

> > >> >> >> >>><br>

> > >> >> >> ><br>

> > >> >> >> > This is the testcase that I am using to verify that the max<br>

> > >> builtin<br>

> > >> >> >> > function I am implementing<br>

> > >> >> >> > is actually recognised. I took inspiration from the llvm-<br>

> > >> intrinsic.ll<br>

> > >> >> >> > test case.<br>

> > >> >> >> > The command I am using to compile is:<br>

> > >> >> >> ><br>

> > >> >> >> > llc -march=ptx32 -mattr=+ptx22 fmax.ll<br>

> > >> >> >> ><br>

> > >> >> >> > The option -mattr does not seem to have any effect.<br>

> > >> >> >> > I tried also with the ptx_device qualifier with the same<br>

> > >> outcome.<br>

> > >> >> >> > I am using llvm from the svn repository.<br>

> > >> >> >> ><br>

> > >> >> >> > Bye,<br>

> > >> >> >> ><br>

> > >> >> >> > Alberto<br>

> > >> >> >> ><br>

> > >> >> >> >>>><br>

> > >> >> >> >>>> Can you please give me a hint on what I am missing, or some<br>

> > >> >> >> >>>> general<br>

> > >> >> >> >>>> advice on how<br>

> > >> >> >> >>>> to add builtin functions.<br>

> > >> >> >> >>>><br>

> > >> >> >> >>>> Thank you in advance,<br>

> > >> >> >> >>>><br>

> > >> >> >> >>>> Alberto.<br>

> > >> >> >> >>>><br>

> > >> >> >> >>>> _______________________________________________<br>

> > >> >> >> >>>> LLVM Developers mailing list<br>

> > >> >> >> >>>> <a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu">http://llvm.cs.uiuc.edu</a><br>

> > >> >> >> >>>> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

> > >> >> >> >>>><br>

> > >> >> >> >>><br>

> > >> >> >> >>><br>

> > >> >> >> >>><br>

> > >> >> >> >>> --<br>

> > >> >> >> >>><br>

> > >> >> >> >>> Thanks,<br>

> > >> >> >> >>> Justin Holewinski<br>

> > >> >> >> >><br>

> > >> >> >> >><br>

> > >> >> >> >><br>

> > >> >> >> >> --<br>

> > >> >> >> >><br>

> > >> >> >> >> Thanks,<br>

> > >> >> >> >> Justin Holewinski<br>

> > >> >> >> >><br>

> > >> >> ><br>

> > >> >> ><br>

> > >> >> ><br>

> > >> >> ><br>

> > >> >> > --<br>

> > >> >> ><br>

> > >> >> > Thanks,<br>

> > >> >> ><br>

> > >> >> > Justin Holewinski<br>

> > >> >> ><br>

> > >> ><br>

> > >> ><br>

> > >> ><br>

> > >> ><br>

> > >> > --<br>

> > >> ><br>

> > >> > Thanks,<br>

> > >> ><br>

> > >> > Justin Holewinski<br>

> > >> ><br>

> > >><br>


> > ><br>

> > ><br>

</p>