[PATCH] D135447: [AMDGPU] Add llvm.is.fpclass intrinsic to existing SelectionDAG fp class support and introduce GlobalISel implementation for AMDGPU

Thu Nov 3 11:49:47 PDT 2022

arsenm added inline comments.

================
Comment at: llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp:2320
+  case Intrinsic::is_fpclass: {
+    unsigned Flags = MachineInstr::copyFlagsFromInstruction(CI);
+
----------------
This should get an IRTranslator test to make sure the flags are passed through

================
Comment at: llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp:2324-2325
+    Type *FpEltTy = FpValue->getType()->getScalarType();
+    const APInt &TestMaskValue =
+        cast<Constant>(CI.getOperand(1))->getUniqueInteger();
+    const fltSemantics &FpSem = FpEltTy->getFltSemantics();
----------------
getUniqueInteger is unnecessarily fancy, can just cast to ConstantInt directly 

================
Comment at: llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp:2332
+        .addImm(TestMaskValue.getZExtValue())
+        .addImm((unsigned)APFloat::SemanticsToEnum(FpSem))
+        .setMIFlag(MachineInstr::NoFPExcept);
----------------
Do you really need the float type operand? I know bfloat16 isn't going to work without it, but I thought the plan was to introduce FP types to LLT

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.td:133

+def AMDGPUis_fpclass_impl : SDNode<"ISD::IS_FPCLASS", AMDGPUFPClassOp>;

----------------
Should avoid defining an AMDGPU node for this and move this to generic code

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:870
+
+bool AMDGPUInstructionSelector::selectG_IS_FPCLASS(MachineInstr &I) const {
+  MachineBasicBlock *BB = I.getParent();
----------------
I don't see why you need to manually select this (maybe sharing the pattern between the existing intrinsic is annoying because the new intrinsic uses immarg?)

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:880-883
+  const llvm::fltSemantics &fpSem =
+      APFloat::EnumToSemantics((APFloat::Semantics)I.getOperand(3).getImm());
+  if (APFloat::semanticsSizeInBits(fpSem) != Size)
+    return false;
----------------
Should be no reason to check this here

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:894-902
+  if (TII.isVOP3(Opcode) && !STI.hasVOP3Literal()) {
+    Register ConstantReg = MRI->createVirtualRegister(&AMDGPU::VGPR_32RegClass);
+    BuildMI(*BB, CmpClassBuilder.getInstr(), DL, TII.get(AMDGPU::V_MOV_B32_e32),
+            ConstantReg)
+        .addImm(Mask);
+    CmpClassBuilder.addReg(ConstantReg);
+  } else {
----------------
You can just unconditionally materialize the constant into a register and let SIFoldOperands sort out the constant bus restriction

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:905-906
+  MachineInstr *CmpClass = CmpClassBuilder;
+  RBI.constrainGenericRegister(CmpClass->getOperand(0).getReg(),
+                               *TRI.getBoolRC(), *MRI);
+  bool Ret = constrainSelectedInstRegOperands(*CmpClass, TII, TRI, RBI);
----------------
You shouldn't need to special case the result constraint 

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp:3945
+    OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
+    OpdsMapping[2] = OpdsMapping[3] = nullptr;
+    break;
----------------
Pretty sure this default constructs to null 

================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN:  llc -global-isel=0 -march=amdgcn -mcpu=gfx908 -verify-machineinstrs < %s  | FileCheck --check-prefix=GFX9SELDAG %s
+; RUN:  llc -global-isel=1 -march=amdgcn -mcpu=gfx908 -verify-machineinstrs < %s  | FileCheck --check-prefix=GFX9GLISEL %s
----------------
Should use some share prefixes, a lot of these functions are the same. Also needs a gfx7 and 8 run lines for the half promotion 

================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.ll:1922
+declare i1 @llvm.is.fpclass.f64(double, i32)
+declare <2 x i1> @llvm.is.fpclass.v2f16(<2 x half>, i32)
+declare <2 x i1> @llvm.is.fpclass.v2f32(<2 x float>, i32)
----------------
v3f16 and v4f16 are also potentially interesting 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135447/new/

https://reviews.llvm.org/D135447