[llvm] r312095 - [AMDGPU] Use v_max_f* for fcanonicalize
Mekhanoshin, Stanislav via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 29 20:27:01 PDT 2017
Is there a packed max? I do not recall.
Stas
--- Original message ---
From: Matt Arsenault <arsenm2 at gmail.com>
Sent: August 29, 2017 8:09:25 PM
To: Mekhanoshin, Stanislav <Stanislav.Mekhanoshin at amd.com>
CC: llvm-commits at lists.llvm.org
Subject: Re: [llvm] r312095 - [AMDGPU] Use v_max_f* for fcanonicalize
>
>> On Aug 29, 2017, at 20:03, Stanislav Mekhanoshin via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>>
>> Author: rampitec
>> Date: Tue Aug 29 20:03:38 2017
>> New Revision: 312095
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=312095&view=rev
>> Log:
>> [AMDGPU] Use v_max_f* for fcanonicalize
>>
>> If denorms are not flushed, we can use max instead of multiplication
>> by 1. For double this is simply faster, while for float and half
>> it is shorter, because mul uses the constant bus and VOP3.
>>
>> Differential Revision: https://reviews.llvm.org/D36856
>>
>> Modified:
>> llvm/trunk/lib/Target/AMDGPU/AMDGPUInstructions.td
>> llvm/trunk/lib/Target/AMDGPU/SIInstructions.td
>> llvm/trunk/test/CodeGen/AMDGPU/fcanonicalize-elimination.ll
>> llvm/trunk/test/CodeGen/AMDGPU/fcanonicalize.f16.ll
>> llvm/trunk/test/CodeGen/AMDGPU/fcanonicalize.ll
>>
>> Modified: llvm/trunk/lib/Target/AMDGPU/AMDGPUInstructions.td
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AMDGPU/AMDGPUInstructions.td?rev=312095&r1=312094&r2=312095&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/AMDGPU/AMDGPUInstructions.td (original)
>> +++ llvm/trunk/lib/Target/AMDGPU/AMDGPUInstructions.td Tue Aug 29 20:03:38 2017
>> @@ -42,9 +42,12 @@ class AMDGPUShaderInst <dag outs, dag in
>> field bits<32> Inst = 0xffffffff;
>> }
>>
>> -def FP16Denormals : Predicate<"Subtarget.hasFP16Denormals()">;
>> -def FP32Denormals : Predicate<"Subtarget.hasFP32Denormals()">;
>> -def FP64Denormals : Predicate<"Subtarget.hasFP64Denormals()">;
>> +def FP16Denormals : Predicate<"Subtarget->hasFP16Denormals()">;
>> +def FP32Denormals : Predicate<"Subtarget->hasFP32Denormals()">;
>> +def FP64Denormals : Predicate<"Subtarget->hasFP64Denormals()">;
>> +def NoFP16Denormals : Predicate<"!Subtarget->hasFP16Denormals()">;
>> +def NoFP32Denormals : Predicate<"!Subtarget->hasFP32Denormals()">;
>> +def NoFP64Denormals : Predicate<"!Subtarget->hasFP64Denormals()">;
>> def UnsafeFPMath : Predicate<"TM.Options.UnsafeFPMath">;
>>
>> def InstFlag : OperandWithDefaultOps <i32, (ops (i32 0))>;
>>
>> Modified: llvm/trunk/lib/Target/AMDGPU/SIInstructions.td
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AMDGPU/SIInstructions.td?rev=312095&r1=312094&r2=312095&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Target/AMDGPU/SIInstructions.td (original)
>> +++ llvm/trunk/lib/Target/AMDGPU/SIInstructions.td Tue Aug 29 20:03:38 2017
>> @@ -1278,20 +1278,47 @@ defm : BFMPatterns <i32, S_BFM_B32, S_MO
>> // FIXME: defm : BFMPatterns <i64, S_BFM_B64, S_MOV_B64>;
>> defm : BFEPattern <V_BFE_U32, V_BFE_I32, S_MOV_B32>;
>>
>> +let Predicates = [NoFP16Denormals] in {
>> def : Pat<
>> (fcanonicalize (f16 (VOP3Mods f16:$src, i32:$src_mods))),
>> (V_MUL_F16_e64 0, (i32 CONST.FP16_ONE), $src_mods, $src, 0, 0)
>> >;
>> +}
>>
>> +let Predicates = [FP16Denormals] in {
>> +def : Pat<
>> + (fcanonicalize (f16 (VOP3Mods f16:$src, i32:$src_mods))),
>> + (V_MAX_F16_e64 $src_mods, $src, $src_mods, $src, 0, 0)
>> +>;
>> +}
>> +
>> +let Predicates = [NoFP32Denormals] in {
>> def : Pat<
>> (fcanonicalize (f32 (VOP3Mods f32:$src, i32:$src_mods))),
>> (V_MUL_F32_e64 0, (i32 CONST.FP32_ONE), $src_mods, $src, 0, 0)
>> >;
>> +}
>> +
>> +let Predicates = [FP32Denormals] in {
>> +def : Pat<
>> + (fcanonicalize (f32 (VOP3Mods f32:$src, i32:$src_mods))),
>> + (V_MAX_F32_e64 $src_mods, $src, $src_mods, $src, 0, 0)
>> +>;
>> +}
>>
>> +let Predicates = [NoFP64Denormals] in {
>> def : Pat<
>> (fcanonicalize (f64 (VOP3Mods f64:$src, i32:$src_mods))),
>> (V_MUL_F64 0, CONST.FP64_ONE, $src_mods, $src, 0, 0)
>> >;
>> +}
>> +
>> +let Predicates = [FP64Denormals] in {
>> +def : Pat<
>> + (fcanonicalize (f64 (VOP3Mods f64:$src, i32:$src_mods))),
>> + (V_MAX_F64 $src_mods, $src, $src_mods, $src, 0, 0)
>> +>;
>> +}
>>
>> def : Pat<
>> (fcanonicalize (v2f16 (VOP3PMods v2f16:$src, i32:$src_mods))),
>
>
> I just noticed you missed the packed case.
>
> -Matt