[PATCH] D135447: [AMDGPU] Add llvm.is.fpclass intrinsic to existing SelectionDAG fp class support and introduce GlobalISel implementation for AMDGPU
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 3 11:49:47 PDT 2022
arsenm added inline comments.
================
Comment at: llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp:2320
+ case Intrinsic::is_fpclass: {
+ unsigned Flags = MachineInstr::copyFlagsFromInstruction(CI);
+
----------------
This should get an IRTranslator test to make sure the flags are passed through
================
Comment at: llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp:2324-2325
+ Type *FpEltTy = FpValue->getType()->getScalarType();
+ const APInt &TestMaskValue =
+ cast<Constant>(CI.getOperand(1))->getUniqueInteger();
+ const fltSemantics &FpSem = FpEltTy->getFltSemantics();
----------------
getUniqueInteger is unnecessarily fancy, can just cast to ConstantInt directly
================
Comment at: llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp:2332
+ .addImm(TestMaskValue.getZExtValue())
+ .addImm((unsigned)APFloat::SemanticsToEnum(FpSem))
+ .setMIFlag(MachineInstr::NoFPExcept);
----------------
Do you really need the float type operand? I know bfloat16 isn't going to work without it, but I thought the plan was to introduce FP types to LLT
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.td:133
+def AMDGPUis_fpclass_impl : SDNode<"ISD::IS_FPCLASS", AMDGPUFPClassOp>;
----------------
Should avoid defining an AMDGPU node for this and move this to generic code
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:870
+
+bool AMDGPUInstructionSelector::selectG_IS_FPCLASS(MachineInstr &I) const {
+ MachineBasicBlock *BB = I.getParent();
----------------
I don't see why you need to manually select this (maybe sharing the pattern between the existing intrinsic is annoying because the new intrinsic uses immarg?)
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:880-883
+ const llvm::fltSemantics &fpSem =
+ APFloat::EnumToSemantics((APFloat::Semantics)I.getOperand(3).getImm());
+ if (APFloat::semanticsSizeInBits(fpSem) != Size)
+ return false;
----------------
Should be no reason to check this here
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:894-902
+ if (TII.isVOP3(Opcode) && !STI.hasVOP3Literal()) {
+ Register ConstantReg = MRI->createVirtualRegister(&AMDGPU::VGPR_32RegClass);
+ BuildMI(*BB, CmpClassBuilder.getInstr(), DL, TII.get(AMDGPU::V_MOV_B32_e32),
+ ConstantReg)
+ .addImm(Mask);
+ CmpClassBuilder.addReg(ConstantReg);
+ } else {
----------------
You can just unconditionally materialize the constant into a register and let SIFoldOperands sort out the constant bus restriction
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:905-906
+ MachineInstr *CmpClass = CmpClassBuilder;
+ RBI.constrainGenericRegister(CmpClass->getOperand(0).getReg(),
+ *TRI.getBoolRC(), *MRI);
+ bool Ret = constrainSelectedInstRegOperands(*CmpClass, TII, TRI, RBI);
----------------
You shouldn't need to special case the result constraint
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:3945
+ OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
+ OpdsMapping[2] = OpdsMapping[3] = nullptr;
+ break;
----------------
Pretty sure this default constructs to null
================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -global-isel=0 -march=amdgcn -mcpu=gfx908 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX9SELDAG %s
+; RUN: llc -global-isel=1 -march=amdgcn -mcpu=gfx908 -verify-machineinstrs < %s | FileCheck --check-prefix=GFX9GLISEL %s
----------------
Should use some share prefixes, a lot of these functions are the same. Also needs a gfx7 and 8 run lines for the half promotion
================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll:1922
+declare i1 @llvm.is.fpclass.f64(double, i32)
+declare <2 x i1> @llvm.is.fpclass.v2f16(<2 x half>, i32)
+declare <2 x i1> @llvm.is.fpclass.v2f32(<2 x float>, i32)
----------------
v3f16 and v4f16 are also potentially interesting
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D135447/new/
https://reviews.llvm.org/D135447
More information about the llvm-commits
mailing list