[PATCHES] R600/SI: New V_FRACT fix, intrinsic for S_FLBIT_I32, and more

Tue Mar 10 15:21:11 PDT 2015

If we didn't have to deal with fsub, only one pattern would be needed.
One possible solution is to expand fsub so that it is translated into
(v_add_f32 a, -b), and then convert it back to v_sub_f32 in the
shrinking pass where possible.

The hardware internally expands (v_sub_f32 a, b) into (v_add_f32 a, -b) anyway.
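
As a rough illustration of why one fadd-based pattern would be enough if
fsub were expanded: under IEEE 754 arithmetic, a - b and a + (-b) produce
the same result, so the two ways of writing fract are interchangeable.
This is only a host-side C++ sketch; the helper names fract_via_sub and
fract_via_add_neg are hypothetical and not part of the patch.

```cpp
#include <cmath>

// Hypothetical helpers for illustration only; they are not in the patch.

// fract written the way the f32 pattern sees it: x - floor(x)
float fract_via_sub(float x) {
  return x - std::floor(x);
}

// fract written the way the f64 pattern sees it after fsub has been
// expanded into fadd + fneg: x + (-floor(x))
float fract_via_add_neg(float x) {
  return x + (-std::floor(x));
}
```

Both helpers should agree for any finite input, which is what would allow a
single pattern to match once fsub is always expanded.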

Marek

On Tue, Mar 10, 2015 at 5:37 PM, Marek Olšák <maraeo at gmail.com> wrote:
> On Tue, Mar 10, 2015 at 3:50 PM, Tom Stellard <tom at stellard.net> wrote:
>> On Thu, Mar 05, 2015 at 10:33:15PM +0100, Marek Olšák wrote:
>>> From e9f7ebe3fa7751e40b7d7cf4fadc17c7c8ef3a4a Mon Sep 17 00:00:00 2001
>>> From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.olsak at amd.com>
>>> Date: Sun, 1 Mar 2015 23:07:48 +0100
>>> Subject: [PATCH 2/3] R600/SI: Expand fract to floor, then only select V_FRACT
>>>  on CI
>>>
>>> V_FRACT is buggy on SI.
>>>
>>> R600-specific code is left intact.
>>>
>>> v2: drop the multiclass, use complex VOP3 patterns
>>> ---
>>>  lib/Target/R600/AMDGPUISelLowering.cpp |  3 ---
>>>  lib/Target/R600/R600ISelLowering.cpp   |  4 +++
>>>  lib/Target/R600/SIISelLowering.cpp     |  6 +++++
>>>  lib/Target/R600/SIInstructions.td      | 22 +++++++++++++++++
>>>  test/CodeGen/R600/llvm.AMDGPU.fract.ll | 45 +++++++++++++++++++++++++++++++---
>>>  5 files changed, 73 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/lib/Target/R600/AMDGPUISelLowering.cpp b/lib/Target/R600/AMDGPUISelLowering.cpp
>>> index 4707279..62a33fa 100644
>>> --- a/lib/Target/R600/AMDGPUISelLowering.cpp
>>> +++ b/lib/Target/R600/AMDGPUISelLowering.cpp
>>> @@ -885,9 +885,6 @@ SDValue AMDGPUTargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
>>>        return LowerIntrinsicIABS(Op, DAG);
>>>      case AMDGPUIntrinsic::AMDGPU_lrp:
>>>        return LowerIntrinsicLRP(Op, DAG);
>>> -    case AMDGPUIntrinsic::AMDGPU_fract:
>>> -    case AMDGPUIntrinsic::AMDIL_fraction: // Legacy name.
>>> -      return DAG.getNode(AMDGPUISD::FRACT, DL, VT, Op.getOperand(1));
>>>
>>>      case AMDGPUIntrinsic::AMDGPU_clamp:
>>>      case AMDGPUIntrinsic::AMDIL_clamp: // Legacy name.
>>> diff --git a/lib/Target/R600/R600ISelLowering.cpp b/lib/Target/R600/R600ISelLowering.cpp
>>> index c738611..cf0a60f 100644
>>> --- a/lib/Target/R600/R600ISelLowering.cpp
>>> +++ b/lib/Target/R600/R600ISelLowering.cpp
>>> @@ -837,6 +837,10 @@ SDValue R600TargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const
>>>      case Intrinsic::AMDGPU_rsq:
>>>        // XXX - I'm assuming SI's RSQ_LEGACY matches R600's behavior.
>>>        return DAG.getNode(AMDGPUISD::RSQ_LEGACY, DL, VT, Op.getOperand(1));
>>> +
>>> +    case AMDGPUIntrinsic::AMDGPU_fract:
>>> +    case AMDGPUIntrinsic::AMDIL_fraction: // Legacy name.
>>> +      return DAG.getNode(AMDGPUISD::FRACT, DL, VT, Op.getOperand(1));
>>>      }
>>>      // break out of case ISD::INTRINSIC_WO_CHAIN in switch(Op.getOpcode())
>>>      break;
>>> diff --git a/lib/Target/R600/SIISelLowering.cpp b/lib/Target/R600/SIISelLowering.cpp
>>> index 7d794b8..5c9a9f9 100644
>>> --- a/lib/Target/R600/SIISelLowering.cpp
>>> +++ b/lib/Target/R600/SIISelLowering.cpp
>>> @@ -932,6 +932,12 @@ SDValue SITargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,
>>>                         Op.getOperand(1),
>>>                         Op.getOperand(2),
>>>                         Op.getOperand(3));
>>> +
>>> +  case AMDGPUIntrinsic::AMDGPU_fract:
>>> +  case AMDGPUIntrinsic::AMDIL_fraction: // Legacy name.
>>> +    return DAG.getNode(ISD::FSUB, DL, VT, Op.getOperand(1),
>>> +                       DAG.getNode(ISD::FFLOOR, DL, VT, Op.getOperand(1)));
>>> +
>>>    default:
>>>      return AMDGPUTargetLowering::LowerOperation(Op, DAG);
>>>    }
>>> diff --git a/lib/Target/R600/SIInstructions.td b/lib/Target/R600/SIInstructions.td
>>> index ab1f08f..6b9230a 100644
>>> --- a/lib/Target/R600/SIInstructions.td
>>> +++ b/lib/Target/R600/SIInstructions.td
>>> @@ -3288,6 +3288,28 @@ def : Pat <
>>>    (V_CNDMASK_B32_e64 $src0, $src1, $src2)
>>>  >;
>>>
>>> +//===----------------------------------------------------------------------===//
>>> +// Fract Patterns
>>> +//===----------------------------------------------------------------------===//
>>> +
>>> +let Predicates = [isCI] in {
>>> +
>>> +// Convert (x - floor(x)) to fract(x)
>>> +def : Pat <
>>> +  (f32 (fsub (f32 (VOP3Mods f32:$x, i32:$mods)),
>>> +             (f32 (ffloor (f32 (VOP3Mods f32:$x, i32:$mods)))))),
>>> +  (V_FRACT_F32_e64 $mods, $x, DSTCLAMP.NONE, DSTOMOD.NONE)
>>> +>;
>>> +
>>> +// Convert (x + (-floor(x))) to fract(x)
>>> +def : Pat <
>>> +  (f64 (fadd (f64 (VOP3Mods f64:$x, i32:$mods)),
>>> +             (f64 (fneg (f64 (ffloor (f64 (VOP3Mods f64:$x, i32:$mods)))))))),
>>> +  (V_FRACT_F64_e64 $mods, $x, DSTCLAMP.NONE, DSTOMOD.NONE)
>>> +>;
>>> +
>>> +} // End Predicates = [isCI]
>>> +
>>
>> Why are there different patterns for f32 and f64?  Also, can we match
>> this pattern on VI too?
>
> isCI includes all CI and later chips.
>
> fsub is expanded to fneg+fadd for f64, which is why we need different patterns.
>
> Marek