[PATCH v2 1/1] R600: Limit FMA to EG+ with FP64 hw.

Jan Vesely jan.vesely at rutgers.edu
Tue Oct 14 17:10:28 PDT 2014


On Tue, 2014-10-14 at 14:27 -0700, Matt Arsenault wrote:
> On 10/14/2014 02:19 PM, Jan Vesely wrote:
> > On Tue, 2014-10-14 at 13:10 -0400, Tom Stellard wrote:
> >> On Mon, Oct 13, 2014 at 11:10:05AM -0400, Jan Vesely wrote:
> >>> v2: fixup nested predicates
> >>>
> >>> Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
> >>> ---
> >>>   lib/Target/R600/AMDGPUISelLowering.cpp   |  5 +++++
> >>>   lib/Target/R600/AMDGPUInstructions.td    |  1 +
> >>>   lib/Target/R600/AMDGPUSubtarget.h        |  4 ++++
> >>>   lib/Target/R600/EvergreenInstructions.td | 16 ++++++++++------
> >>>   4 files changed, 20 insertions(+), 6 deletions(-)
> >> We need to add a test case for at least one of the non-fp64 EG/NI
> >> cards to make sure FMA is not emitted.
> > I dug a bit into this since my card (turks) is not supposed to support
> > fp64, yet FMA both gets generated and runs OK.
> >
> > The first part is due to using +fp64-denormals in AMDGPUSubtarget.cpp,
> > It forces HWFP64 for all targets. After I removed it this patch works as
> > expected. Not sure what the original intention was, isn't the
> > fp64-denormals flag enabled based on GPU features?
> Yes and no. FP64 denormals can be either enabled or disabled on any SI, 
> so it needs to be settable separate from the device features. The 
> feature string has the unhelpful behavior of unsetting the processor and 
> all other features if you disable a feature added to the processor's 
> feature set. It makes sense for fp64-denormals to imply hw fp64, but I 
> guess that is what causes your problem. You can try removing the implies 
> FeatureFP64 from FeatureFP64Denormals

Removing the implies works as well; I'll add it to the patch.
But it might be a good idea to fix libclc first.
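
To make the interaction concrete, here is a toy model of what I think is going on. It is only a sketch: ToySubtarget, enableFeature and turksHasFMA are made-up names, and the feature strings are just labels, not the real LLVM subtarget machinery. It only models enabling a feature and pulling in whatever it implies, then evaluates the same condition as hasFMA() in the patch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical generation ordering, oldest first (illustration only).
enum class Generation { R600, R700, Evergreen, NorthernIslands, SouthernIslands };

// Toy stand-in for a subtarget: a generation plus a set of enabled features.
struct ToySubtarget {
  Generation gen;
  std::set<std::string> features;

  bool hasHWFP64() const { return features.count("fp64") != 0; }
  // Same condition as hasFMA() in the patch: EG or newer, and hw fp64.
  bool hasFMA() const { return gen >= Generation::Evergreen && hasHWFP64(); }
};

using ImpliesMap = std::map<std::string, std::vector<std::string>>;

// Enable a feature and, transitively, everything it implies.
void enableFeature(ToySubtarget &st, const ImpliesMap &implies,
                   const std::string &name) {
  if (!st.features.insert(name).second)
    return; // already enabled, nothing more to do
  auto it = implies.find(name);
  if (it == implies.end())
    return;
  for (const std::string &dep : it->second)
    enableFeature(st, implies, dep);
}

// Turks-like card (Northern Islands, no hw fp64 in its own feature set)
// with fp64-denormals forced on, as the subtarget code currently does.
bool turksHasFMA(bool denormalsImplyFP64) {
  ImpliesMap implies;
  if (denormalsImplyFP64)
    implies["fp64-denormals"] = {"fp64"};
  ToySubtarget turks{Generation::NorthernIslands, {}};
  enableFeature(turks, implies, "fp64-denormals");
  return turks.hasFMA();
}
```

With the implies edge in place turksHasFMA(true) reports FMA as available; without it turksHasFMA(false) does not, which matches what I see after removing the implies.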

I think we can use the alternate paths from sincos_helpers.cl as a sw fma
implementation.
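
For reference, the kind of fp32-only fallback I have in mind looks roughly like the sketch below. This is not taken from libclc; headOf, splitMul and softFma are hypothetical names, and it uses the generic head/tail splitting trick (mask off the low 12 significand bits so the partial products are exact in float). It assumes no overflow or underflow and no fast-math, and it is not guaranteed to be correctly rounded in every case.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Head of x: clear the low 12 significand bits, leaving 12 significant bits,
// so that any product of two heads or tails fits in a float exactly.
float headOf(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  bits &= 0xfffff000u;
  float h;
  std::memcpy(&h, &bits, sizeof h);
  return h;
}

struct Product {
  float hi; // rounded product a*b
  float lo; // estimate of the rounding error a*b - hi
};

// Dekker-style product using only fp32 multiplies and adds.
Product splitMul(float a, float b) {
  float ah = headOf(a), al = a - ah;
  float bh = headOf(b), bl = b - bh;
  float p = a * b;
  float e = (((ah * bh - p) + ah * bl) + al * bh) + al * bl;
  return {p, e};
}

// Hypothetical software fma: add c to the product while carrying the
// low-order parts along instead of dropping them.
float softFma(float a, float b, float c) {
  Product m = splitMul(a, b);
  float s = m.hi + c;
  // Two-sum: recover the rounding error of m.hi + c.
  float v = s - m.hi;
  float err = (m.hi - (s - v)) + (c - v);
  return s + (err + m.lo);
}
```

For a = 1 + 2^-12, a plain a*a - 1 in fp32 loses the 2^-24 term when the product is rounded, while softFma(a, a, -1.0f) keeps it.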

> 
> >
> > I'm not sure about the second part. Either the manual is wrong and FMA
> > does not require FP64. Or turks does support fp64.
> > Is there a way to check this?
> > both EG and NI manuals only say that DPFP is not available on all r7xx
> > products (which I think is a copy paste error).
> >
> > jan
> Turks does not support fp64, and r770 definitely supports fp64.

Just out of curiosity, is this based on some internal docs?
I only checked the supported OpenGL version: 4.0 requires both fp64 and
fma, and fma is also required by OpenCL 1.1.

jan

> 
> >
> >>> diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp b/lib/Target/R600/AMDGPUISelLowering.cpp
> >>> index 6fd4317..b03ec72 100644
> >>> --- a/lib/Target/R600/AMDGPUISelLowering.cpp
> >>> +++ b/lib/Target/R600/AMDGPUISelLowering.cpp
> >>> @@ -244,6 +244,11 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(TargetMachine &TM) :
> >>>       setOperationAction(ISD::FCOPYSIGN, MVT::f64, Expand);
> >>>     }
> >>>   
> >>> +  if (!Subtarget->hasFMA()) {
> >>> +    setOperationAction(ISD::FMA, MVT::f32, Expand);
> >>> +    setOperationAction(ISD::FMA, MVT::f64, Expand);
> >>> +  }
> >>> +
> >>>     setOperationAction(ISD::FP16_TO_FP, MVT::f64, Expand);
> >>>   
> >>>     setLoadExtAction(ISD::EXTLOAD, MVT::f16, Expand);
> >>> diff --git a/lib/Target/R600/AMDGPUInstructions.td b/lib/Target/R600/AMDGPUInstructions.td
> >>> index a608627..e1dec7e 100644
> >>> --- a/lib/Target/R600/AMDGPUInstructions.td
> >>> +++ b/lib/Target/R600/AMDGPUInstructions.td
> >>> @@ -34,6 +34,7 @@ class AMDGPUShaderInst <dag outs, dag ins, string asm, list<dag> pattern>
> >>>   
> >>>   }
> >>>   
> >>> +def HWFP64 : Predicate<"Subtarget.hasHWFP64()">;
> >>>   def FP32Denormals : Predicate<"Subtarget.hasFP32Denormals()">;
> >>>   def FP64Denormals : Predicate<"Subtarget.hasFP64Denormals()">;
> >>>   def UnsafeFPMath : Predicate<"TM.Options.UnsafeFPMath">;
> >>> diff --git a/lib/Target/R600/AMDGPUSubtarget.h b/lib/Target/R600/AMDGPUSubtarget.h
> >>> index 55a0c58..2bba6e0 100644
> >>> --- a/lib/Target/R600/AMDGPUSubtarget.h
> >>> +++ b/lib/Target/R600/AMDGPUSubtarget.h
> >>> @@ -169,6 +169,10 @@ public:
> >>>       return (getGeneration() >= EVERGREEN);
> >>>     }
> >>>   
> >>> +  bool hasFMA() const {
> >>> +    return (getGeneration() >= EVERGREEN) && hasHWFP64();
> >>> +  }
> >>> +
> >>>     bool IsIRStructurizerEnabled() const {
> >>>       return EnableIRStructurizer;
> >>>     }
> >>> diff --git a/lib/Target/R600/EvergreenInstructions.td b/lib/Target/R600/EvergreenInstructions.td
> >>> index 8117b60..92e37cd 100644
> >>> --- a/lib/Target/R600/EvergreenInstructions.td
> >>> +++ b/lib/Target/R600/EvergreenInstructions.td
> >>> @@ -257,11 +257,16 @@ def VTX_READ_GLOBAL_128_eg : VTX_READ_128_eg <1,
> >>>   
> >>>   let Predicates = [isEGorCayman] in {
> >>>   
> >>> -// Should be predicated on FeatureFP64
> >>> -// def FMA_64 : R600_3OP <
> >>> -//   0xA, "FMA_64",
> >>> -//   [(set f64:$dst, (fma f64:$src0, f64:$src1, f64:$src2))]
> >>> -// >;
> >>> +let Predicates = [HWFP64,isEGorCayman] in {
> >>> +
> >>> +//def FMA_64 : R600_3OP <
> >>> +//  0xA, "FMA_64",
> >>> +//  [(set f64:$dst, (fma f64:$src0, f64:$src1, f64:$src2))]
> >>> +//>;
> >>> +
> >>> +def FMA_eg : FMA_Common<0x7>;
> >>> +
> >>> +}
> >>>   
> >>>   // BFE_UINT - bit_extract, an optimization for mask and shift
> >>>   // Src0 = Input
> >>> @@ -319,7 +324,6 @@ def BIT_ALIGN_INT_eg : R600_3OP <0xC, "BIT_ALIGN_INT", [], VecALU>;
> >>>   def : ROTRPattern <BIT_ALIGN_INT_eg>;
> >>>   def MULADD_eg : MULADD_Common<0x14>;
> >>>   def MULADD_IEEE_eg : MULADD_IEEE_Common<0x18>;
> >>> -def FMA_eg : FMA_Common<0x7>;
> >>>   def ASHR_eg : ASHR_Common<0x15>;
> >>>   def LSHR_eg : LSHR_Common<0x16>;
> >>>   def LSHL_eg : LSHL_Common<0x17>;
> >>> -- 
> >>> 1.9.3
> >>>
> 

-- 
Jan Vesely <jan.vesely at rutgers.edu>