[PATCH v2 1/1] R600: Limit FMA to EG+ with FP64 hw.
Jan Vesely
jan.vesely at rutgers.edu
Tue Oct 14 17:10:28 PDT 2014
On Tue, 2014-10-14 at 14:27 -0700, Matt Arsenault wrote:
> On 10/14/2014 02:19 PM, Jan Vesely wrote:
> > On Tue, 2014-10-14 at 13:10 -0400, Tom Stellard wrote:
> >> On Mon, Oct 13, 2014 at 11:10:05AM -0400, Jan Vesely wrote:
> >>> v2: fixup nested predicates
> >>>
> >>> Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
> >>> ---
> >>> lib/Target/R600/AMDGPUISelLowering.cpp | 5 +++++
> >>> lib/Target/R600/AMDGPUInstructions.td | 1 +
> >>> lib/Target/R600/AMDGPUSubtarget.h | 4 ++++
> >>> lib/Target/R600/EvergreenInstructions.td | 16 ++++++++++------
> >>> 4 files changed, 20 insertions(+), 6 deletions(-)
> >> We need to add a test case for at least one of the non-fp64 EG/NI
> >> cards to make sure FMA is not emitted.
> > I dug into this a bit, since my card (Turks) is not supposed to support
> > fp64, yet FMA both gets generated and runs OK.
> >
> > The first part is due to using +fp64-denormals in AMDGPUSubtarget.cpp.
> > It forces HWFP64 for all targets. After I removed it, this patch works as
> > expected. I'm not sure what the original intention was; isn't the
> > fp64-denormals flag enabled based on GPU features?
> Yes and no. FP64 denormals can be either enabled or disabled on any SI,
> so it needs to be settable separately from the device features. The
> feature string has the unhelpful behavior of unsetting the processor and
> all other features if you disable a feature added to the processor's
> feature set. It makes sense for fp64-denormals to imply hw fp64, but I
> guess that is what causes your problem. You can try removing the implied
> FeatureFP64 from FeatureFP64Denormals.
Removing the implies works as well; I'll add it to the patch,
but it might be a good idea to fix libclc first.
I think we can use the alternate paths from sincos_helpers.cl as a software
fma implementation.
>
> >
> > I'm not sure about the second part. Either the manual is wrong and FMA
> > does not require FP64, or Turks does support fp64.
> > Is there a way to check this?
> > Both the EG and NI manuals only say that DPFP is not available on all R7xx
> > products (which I think is a copy-paste error).
> >
> > jan
> Turks does not support fp64, and r770 definitely supports fp64.
Just out of curiosity, is this based on some internal docs?
I only checked the supported OpenGL version: 4.0 requires both fp64 and
fma, and fma is also required by OpenCL 1.1.
jan
>
> >
> >>> diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp b/lib/Target/R600/AMDGPUISelLowering.cpp
> >>> index 6fd4317..b03ec72 100644
> >>> --- a/lib/Target/R600/AMDGPUISelLowering.cpp
> >>> +++ b/lib/Target/R600/AMDGPUISelLowering.cpp
> >>> @@ -244,6 +244,11 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(TargetMachine &TM) :
> >>> setOperationAction(ISD::FCOPYSIGN, MVT::f64, Expand);
> >>> }
> >>>
> >>> + if (!Subtarget->hasFMA()) {
> >>> + setOperationAction(ISD::FMA, MVT::f32, Expand);
> >>> + setOperationAction(ISD::FMA, MVT::f64, Expand);
> >>> + }
> >>> +
> >>> setOperationAction(ISD::FP16_TO_FP, MVT::f64, Expand);
> >>>
> >>> setLoadExtAction(ISD::EXTLOAD, MVT::f16, Expand);
> >>> diff --git a/lib/Target/R600/AMDGPUInstructions.td b/lib/Target/R600/AMDGPUInstructions.td
> >>> index a608627..e1dec7e 100644
> >>> --- a/lib/Target/R600/AMDGPUInstructions.td
> >>> +++ b/lib/Target/R600/AMDGPUInstructions.td
> >>> @@ -34,6 +34,7 @@ class AMDGPUShaderInst <dag outs, dag ins, string asm, list<dag> pattern>
> >>>
> >>> }
> >>>
> >>> +def HWFP64 : Predicate<"Subtarget.hasHWFP64()">;
> >>> def FP32Denormals : Predicate<"Subtarget.hasFP32Denormals()">;
> >>> def FP64Denormals : Predicate<"Subtarget.hasFP64Denormals()">;
> >>> def UnsafeFPMath : Predicate<"TM.Options.UnsafeFPMath">;
> >>> diff --git a/lib/Target/R600/AMDGPUSubtarget.h b/lib/Target/R600/AMDGPUSubtarget.h
> >>> index 55a0c58..2bba6e0 100644
> >>> --- a/lib/Target/R600/AMDGPUSubtarget.h
> >>> +++ b/lib/Target/R600/AMDGPUSubtarget.h
> >>> @@ -169,6 +169,10 @@ public:
> >>> return (getGeneration() >= EVERGREEN);
> >>> }
> >>>
> >>> + bool hasFMA() const {
> >>> + return (getGeneration() >= EVERGREEN) && hasHWFP64();
> >>> + }
> >>> +
> >>> bool IsIRStructurizerEnabled() const {
> >>> return EnableIRStructurizer;
> >>> }
> >>> diff --git a/lib/Target/R600/EvergreenInstructions.td b/lib/Target/R600/EvergreenInstructions.td
> >>> index 8117b60..92e37cd 100644
> >>> --- a/lib/Target/R600/EvergreenInstructions.td
> >>> +++ b/lib/Target/R600/EvergreenInstructions.td
> >>> @@ -257,11 +257,16 @@ def VTX_READ_GLOBAL_128_eg : VTX_READ_128_eg <1,
> >>>
> >>> let Predicates = [isEGorCayman] in {
> >>>
> >>> -// Should be predicated on FeatureFP64
> >>> -// def FMA_64 : R600_3OP <
> >>> -// 0xA, "FMA_64",
> >>> -// [(set f64:$dst, (fma f64:$src0, f64:$src1, f64:$src2))]
> >>> -// >;
> >>> +let Predicates = [HWFP64,isEGorCayman] in {
> >>> +
> >>> +//def FMA_64 : R600_3OP <
> >>> +// 0xA, "FMA_64",
> >>> +// [(set f64:$dst, (fma f64:$src0, f64:$src1, f64:$src2))]
> >>> +//>;
> >>> +
> >>> +def FMA_eg : FMA_Common<0x7>;
> >>> +
> >>> +}
> >>>
> >>> // BFE_UINT - bit_extract, an optimization for mask and shift
> >>> // Src0 = Input
> >>> @@ -319,7 +324,6 @@ def BIT_ALIGN_INT_eg : R600_3OP <0xC, "BIT_ALIGN_INT", [], VecALU>;
> >>> def : ROTRPattern <BIT_ALIGN_INT_eg>;
> >>> def MULADD_eg : MULADD_Common<0x14>;
> >>> def MULADD_IEEE_eg : MULADD_IEEE_Common<0x18>;
> >>> -def FMA_eg : FMA_Common<0x7>;
> >>> def ASHR_eg : ASHR_Common<0x15>;
> >>> def LSHR_eg : LSHR_Common<0x16>;
> >>> def LSHL_eg : LSHL_Common<0x17>;
> >>> --
> >>> 1.9.3
> >>>
>
--
Jan Vesely <jan.vesely at rutgers.edu>
More information about the llvm-commits mailing list