[PATCH] D25975: AMDGPU/SI: Make f16 a legal type for VI subtargets
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 26 09:52:43 PDT 2016
arsenm added inline comments.
================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:82-83
// TODO: Subtarget feature for i16
- if (Subtarget->getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS)
+ if (Subtarget->getGeneration() >= AMDGPUSubtarget::VOLCANIC_ISLANDS) {
addRegisterClass(MVT::i16, &AMDGPU::SReg_32RegClass);
----------------
The TODO was taken care of already
================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:85-86
addRegisterClass(MVT::i16, &AMDGPU::SReg_32RegClass);
+ addRegisterClass(MVT::f16, &AMDGPU::SReg_32RegClass);
+ addRegisterClass(MVT::f16, &AMDGPU::VGPR_32RegClass);
+ }
----------------
tstellarAMD wrote:
> There is a 1-to-1 mapping between types and register classes, so you can drop the first addRegisterClass(MVT::f16, ... ) call
I think this should only be added to SReg_32. We already have other random inconsistencies from having f32 in a VGPR class and i32 in SGPR which I've been trying to fix
================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:3602
- if (VT != MVT::f32 && VT != MVT::f64)
+ if (VT != MVT::f16 && VT != MVT::f32 && VT != MVT::f64)
return SDValue();
----------------
Does this need an f16 is legal check? If not this can probably just be an !isVector check, any of the other FP types are unusable
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:1415-1416
- if (Opc == AMDGPU::V_MAD_F32 || Opc == AMDGPU::V_MAC_F32_e64) {
+ if (Opc == AMDGPU::V_MAD_F16 || Opc == AMDGPU::V_MAC_F16_e64 ||
+ Opc == AMDGPU::V_MAD_F32 || Opc == AMDGPU::V_MAC_F32_e64) {
+ bool IsF16 = Opc == AMDGPU::V_MAD_F16 || Opc == AMDGPU::V_MAC_F16_e64;
----------------
I would prefer checking the more common f32 cases first
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:1505
- if (Opc == AMDGPU::V_MAC_F32_e64) {
+ if (Opc == AMDGPU::V_MAC_F16_e64 || Opc == AMDGPU::V_MAC_F32_e64) {
UseMI.untieRegOperand(
----------------
Ditto
================
Comment at: lib/Target/AMDGPU/SIInstructions.td:443-445
+ (f16 (fmad (VOP3NoMods0 f32:$src0, i32:$src0_modifiers, i1:$clamp, i32:$omod),
+ (VOP3NoMods f32:$src1, i32:$src1_modifiers),
+ (VOP3NoMods f32:$src2, i32:$src2_modifiers))),
----------------
The f32 patterns on the sources look wrong. Not sure how this compiles
https://reviews.llvm.org/D25975
More information about the llvm-commits
mailing list