[PATCH v2 1/1] R600: Limit FMA to EG+ with FP64 hw.

Jan Vesely jan.vesely at rutgers.edu
Tue Oct 14 17:23:44 PDT 2014


On Tue, 2014-10-14 at 19:42 -0400, Tom Stellard wrote:
> On Tue, Oct 14, 2014 at 05:19:06PM -0400, Jan Vesely wrote:
> > On Tue, 2014-10-14 at 13:10 -0400, Tom Stellard wrote:
> > > On Mon, Oct 13, 2014 at 11:10:05AM -0400, Jan Vesely wrote:
> > > > v2: fixup nested predicates
> > > > 
> > > > Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
> > > > ---
> > > >  lib/Target/R600/AMDGPUISelLowering.cpp   |  5 +++++
> > > >  lib/Target/R600/AMDGPUInstructions.td    |  1 +
> > > >  lib/Target/R600/AMDGPUSubtarget.h        |  4 ++++
> > > >  lib/Target/R600/EvergreenInstructions.td | 16 ++++++++++------
> > > >  4 files changed, 20 insertions(+), 6 deletions(-)
> > > 
> > > We need to add a test case for at least one of the non-fp64 EG/NI
> > > cards to make sure FMA is not emitted.
> > 
> > I dug a bit into this since my card (turks) is not supposed to support
> > fp64, yet FMA both gets generated and runs OK.
> > 
> > The first part is due to using +fp64-denormals in AMDGPUSubtarget.cpp,
> > It forces HWFP64 for all targets. After I removed it this patch works as
> > expected. Not sure what the original intention was, isn't the
> > fp64-denormals flag enabled based on GPU features?
> > 
> > I'm not sure about the second part. Either the manual is wrong and FMA
> > does not require FP64. Or turks does support fp64.
> > Is there a way to check this?
> > both EG and NI manuals only say that DPFP is not available on all r7xx
> > products (which I think is a copy paste error).
> > 
> 
> Even though it appears to work, we need to stick with the documentation
> and disable this instruction on Turks and other devices that don't
> support FP64.  My guess is if you tested it with some corner cases
> it would fail.

yeah, it's a pity. I'll try to fix libclc before reposting this patch.
Aaron reported that sin/cos with FMA worked on his CEDAR as well.
wonder what other fp64 instructions would do, and also whether catalyst
uses it.

jan

> 
> -Tom
> 
> > jan
> > 
> > > 
> > > > 
> > > > diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp b/lib/Target/R600/AMDGPUISelLowering.cpp
> > > > index 6fd4317..b03ec72 100644
> > > > --- a/lib/Target/R600/AMDGPUISelLowering.cpp
> > > > +++ b/lib/Target/R600/AMDGPUISelLowering.cpp
> > > > @@ -244,6 +244,11 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(TargetMachine &TM) :
> > > >      setOperationAction(ISD::FCOPYSIGN, MVT::f64, Expand);
> > > >    }
> > > >  
> > > > +  if (!Subtarget->hasFMA()) {
> > > > +    setOperationAction(ISD::FMA, MVT::f32, Expand);
> > > > +    setOperationAction(ISD::FMA, MVT::f64, Expand);
> > > > +  }
> > > > +
> > > >    setOperationAction(ISD::FP16_TO_FP, MVT::f64, Expand);
> > > >  
> > > >    setLoadExtAction(ISD::EXTLOAD, MVT::f16, Expand);
> > > > diff --git a/lib/Target/R600/AMDGPUInstructions.td b/lib/Target/R600/AMDGPUInstructions.td
> > > > index a608627..e1dec7e 100644
> > > > --- a/lib/Target/R600/AMDGPUInstructions.td
> > > > +++ b/lib/Target/R600/AMDGPUInstructions.td
> > > > @@ -34,6 +34,7 @@ class AMDGPUShaderInst <dag outs, dag ins, string asm, list<dag> pattern>
> > > >  
> > > >  }
> > > >  
> > > > +def HWFP64 : Predicate<"Subtarget.hasHWFP64()">;
> > > >  def FP32Denormals : Predicate<"Subtarget.hasFP32Denormals()">;
> > > >  def FP64Denormals : Predicate<"Subtarget.hasFP64Denormals()">;
> > > >  def UnsafeFPMath : Predicate<"TM.Options.UnsafeFPMath">;
> > > > diff --git a/lib/Target/R600/AMDGPUSubtarget.h b/lib/Target/R600/AMDGPUSubtarget.h
> > > > index 55a0c58..2bba6e0 100644
> > > > --- a/lib/Target/R600/AMDGPUSubtarget.h
> > > > +++ b/lib/Target/R600/AMDGPUSubtarget.h
> > > > @@ -169,6 +169,10 @@ public:
> > > >      return (getGeneration() >= EVERGREEN);
> > > >    }
> > > >  
> > > > +  bool hasFMA() const {
> > > > +    return (getGeneration() >= EVERGREEN) && hasHWFP64();
> > > > +  }
> > > > +
> > > >    bool IsIRStructurizerEnabled() const {
> > > >      return EnableIRStructurizer;
> > > >    }
> > > > diff --git a/lib/Target/R600/EvergreenInstructions.td b/lib/Target/R600/EvergreenInstructions.td
> > > > index 8117b60..92e37cd 100644
> > > > --- a/lib/Target/R600/EvergreenInstructions.td
> > > > +++ b/lib/Target/R600/EvergreenInstructions.td
> > > > @@ -257,11 +257,16 @@ def VTX_READ_GLOBAL_128_eg : VTX_READ_128_eg <1,
> > > >  
> > > >  let Predicates = [isEGorCayman] in {
> > > >  
> > > > -// Should be predicated on FeatureFP64
> > > > -// def FMA_64 : R600_3OP <
> > > > -//   0xA, "FMA_64",
> > > > -//   [(set f64:$dst, (fma f64:$src0, f64:$src1, f64:$src2))]
> > > > -// >;
> > > > +let Predicates = [HWFP64,isEGorCayman] in {
> > > > +
> > > > +//def FMA_64 : R600_3OP <
> > > > +//  0xA, "FMA_64",
> > > > +//  [(set f64:$dst, (fma f64:$src0, f64:$src1, f64:$src2))]
> > > > +//>;
> > > > +
> > > > +def FMA_eg : FMA_Common<0x7>;
> > > > +
> > > > +}
> > > >  
> > > >  // BFE_UINT - bit_extract, an optimization for mask and shift
> > > >  // Src0 = Input
> > > > @@ -319,7 +324,6 @@ def BIT_ALIGN_INT_eg : R600_3OP <0xC, "BIT_ALIGN_INT", [], VecALU>;
> > > >  def : ROTRPattern <BIT_ALIGN_INT_eg>;
> > > >  def MULADD_eg : MULADD_Common<0x14>;
> > > >  def MULADD_IEEE_eg : MULADD_IEEE_Common<0x18>;
> > > > -def FMA_eg : FMA_Common<0x7>;
> > > >  def ASHR_eg : ASHR_Common<0x15>;
> > > >  def LSHR_eg : LSHR_Common<0x16>;
> > > >  def LSHL_eg : LSHL_Common<0x17>;
> > > > -- 
> > > > 1.9.3
> > > > 
> > 
> > -- 
> > Jan Vesely <jan.vesely at rutgers.edu>
> 
> 

-- 
Jan Vesely <jan.vesely at rutgers.edu>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141014/b4fd65c1/attachment.sig>


More information about the llvm-commits mailing list