[PATCH] R600/SI: Aggressively fold fma and mad

Wed Jan 28 11:05:31 PST 2015

On 01/28/2015 10:11 AM, Tom Stellard wrote:
> On Mon, Jan 26, 2015 at 12:59:24PM -0800, Matt Arsenault wrote:
>
>>  From 19ce04e9893142edfa79078d9b5e9991a9cb7445 Mon Sep 17 00:00:00 2001
>> From: Matt Arsenault <Matthew.Arsenault at amd.com>
>> Date: Sun, 25 Jan 2015 12:56:18 -0800
>> Subject: [PATCH 4/5] R600: Copy aggressive fma combines for mad
>>
>> v_mad_f32 has the same result as the separate add and
>> multiply, and is always full rate, so we should always
>> try to form these as long as we don't need to support
>> denormals. I don't think there is a great way to share this
>> code without adding a new generic mad node and a generic
>> check for denormal support.
> Could you add a TLI query something like:
>
> SDValue mergeMulAdd(SDValue A, SDValue B, SDValue C)
>
> so that the target could decide what opcode to use?

I considered something like that, but I was more seriously considering
adding FMAD as a generic node, which we would mark as Expand if we don't
want it. I've never been sure why this doesn't already exist,
considering there is already an fmuladd intrinsic, which I believe has
the same semantics: it gets the same result as the separate operations.
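
To make the mad vs. fma distinction concrete, here is a minimal sketch in
portable C++. It is not code from the patch. The names mad_like and fused are
hypothetical, and mad_like only models "same result as the separate multiply
and add", i.e. two roundings instead of one.

```cpp
#include <cmath>

// Hypothetical helper modelling mad-style semantics: multiply, round the
// product to float, then add (two roundings in total). The volatile store
// keeps the compiler from contracting the expression into a fused fma.
float mad_like(float a, float b, float c) {
  volatile float product = a * b;
  return product + c;
}

// Fused multiply-add: the exact value of a * b + c is rounded only once.
float fused(float a, float b, float c) {
  return std::fma(a, b, c);
}
```

With a = b = 1 + 2^-12 and c = -(1 + 2^-11), the exact product is
1 + 2^-11 + 2^-24. Rounding that product to float drops the 2^-24 term, so
mad_like returns 0, while fused keeps it and returns 2^-24. Either combine is
only a legal replacement for fmul + fadd under the conditions that match its
rounding behavior.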
>> ---
>>   lib/Target/R600/AMDGPUISelLowering.cpp | 133 ++++++++++
>>   lib/Target/R600/AMDGPUISelLowering.h   |   3 +
>>   lib/Target/R600/AMDGPUInstructions.td  |   5 -
>>   lib/Target/R600/R600Instructions.td    |   2 +-
>>   lib/Target/R600/SIISelLowering.cpp     |  25 +-
>>   lib/Target/R600/SIInstructions.td      |   5 +-
>>   test/CodeGen/R600/mad-combine.ll       | 446 +++++++++++++++++++++++++++++++++
>>   7 files changed, 587 insertions(+), 32 deletions(-)
>>   create mode 100644 test/CodeGen/R600/mad-combine.ll
>>
>> diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp b/lib/Target/R600/AMDGPUISelLowering.cpp
>> index d3897fe..f3769e3 100644
>> --- a/lib/Target/R600/AMDGPUISelLowering.cpp
>> +++ b/lib/Target/R600/AMDGPUISelLowering.cpp
>> @@ -395,6 +395,9 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(TargetMachine &TM) :
>>     setTargetDAGCombine(ISD::SELECT_CC);
>>     setTargetDAGCombine(ISD::STORE);
>>   
>> +  setTargetDAGCombine(ISD::FADD);
>> +  setTargetDAGCombine(ISD::FSUB);
>> +
>>     setBooleanContents(ZeroOrNegativeOneBooleanContent);
>>     setBooleanVectorContents(ZeroOrNegativeOneBooleanContent);
>>   
>> @@ -2419,6 +2422,128 @@ SDValue AMDGPUTargetLowering::performMulCombine(SDNode *N,
>>     return DAG.getSExtOrTrunc(Mul, DL, VT);
>>   }
>>   
>> +// FIXME: Mostly copied directly from generic FMA combines.
>> +// We can form f32 mads as long as denormals are not requested.
> Do you
?
>
>>  From b4f466d606626343ca30a9a8daec35bd61362027 Mon Sep 17 00:00:00 2001
>> From: Matt Arsenault <Matthew.Arsenault at amd.com>
>> Date: Thu, 22 Jan 2015 18:41:24 -0800
>> Subject: [PATCH 5/5] R600/SI: Only form v_mad_f32 without denormals
>>
>> According to some sources, v_mad_f32 does not support them.
> Do we ever have denormals enabled?
Not by default anywhere right now, but the hardware has the option. The
intention is that a device should report CL_FP_DENORM; denormals would
then be supported by default, and -cl-denorms-are-zero would disable them
for speed. SI+ has a config register to control denormal support, but it's
not particularly useful for f32 since it forces the instructions to run
at the rate of the f64 instructions. On VI, most of the instructions
support denormals at the normal rate, so eventually it should default to
using them. v_mad_f32 still doesn't support denormals there as far as I
know.
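
As a rough illustration of why the combine has to be gated on denormal
support, here is a small C++ model. It is an assumption for discussion, not a
description of the hardware: flush_denormal and mad_no_denormals are
hypothetical names, and the model simply flushes subnormal inputs and results
to zero.

```cpp
#include <cmath>

// Hypothetical model of flush-to-zero: a subnormal value becomes a zero
// with the same sign; every other value passes through unchanged.
float flush_denormal(float x) {
  return std::fpclassify(x) == FP_SUBNORMAL ? std::copysign(0.0f, x) : x;
}

// Hypothetical model of a mad that ignores denormals: subnormal inputs and
// a subnormal result are all treated as zero (an assumption, see above).
float mad_no_denormals(float a, float b, float c) {
  volatile float product = flush_denormal(a) * flush_denormal(b);
  return flush_denormal(product + flush_denormal(c));
}
```

Under this model, mad_no_denormals(1e-20f, 1e-20f, 0.0f) returns 0, whereas a
separate fmul and fadd with denormals enabled would produce a subnormal value
near 1e-40. That mismatch is the reason to skip forming the mad when
denormals are requested.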