[PATCH] D139398: [AMDGPU] Add bf16 storage support

Thu Dec 8 07:43:45 PST 2022

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUCallingConv.td:49
+  CCIfType<[bf16], CCBitConvertToType<i16>>,
+  CCIfType<[v2bf16], CCBitConvertToType<i32>>,
   CCIfNotInReg<CCIfType<[f32, i32, f16, i16, v2i16, v2f16] , CCAssignToReg<[
----------------
Pierre-vh wrote:
> arsenm wrote:
> > Without being added to a register class, all the tablegen changes should not do anything
> bf16 ones seem to not be needed but if I don't have the v2bf16 ones I get "cannot allocate arguments" in "test_arg_store_v2bf16"
Sounds like a type legality logic bug which I don't want to spend time fighting

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:913
+      else
+        RegisterVT = (ScalarVT == MVT::bf16 ? MVT::v2bf16 : MVT::v2f16);
       IntermediateVT = RegisterVT;
----------------
If you wanted the promote to i32, you could have done it here instead of in the tablegen cc handling

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5558
+  // This lowers them into (i16 (bitconvert (f32 (fptrunc x))))
+  if (Op->getValueType(0) != MVT::i16)
+    return SDValue();
----------------
Op.getValueType()

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5563
+  return DAG.getNode(ISD::BITCAST, SL, MVT::i16,
+                     DAG.getFPExtendOrRound(Op->getOperand(0), SL, MVT::f16));
+}
----------------
Should be specific cast, not FPExtOrRound. I don't think the FP_ROUND case would be correct

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5569
+  // (bitconvert x))))
+  if (!Op->getValueType(0).isFloatingPoint() ||
+      Op->getOperand(0).getValueType() != MVT::i16)
----------------
Should use Op.getValueType() instead of Op->getValueType(0)

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5573-5576
+  SDLoc SL(Op);
+  return DAG.getNode(
+      ISD::FP_EXTEND, SL, MVT::f32,
+      DAG.getNode(ISD::BITCAST, SL, MVT::f16, Op->getOperand(0)));
----------------
ExpandNode covers lowering BF16_TO_FP. It also has a shift by 16-bits into the high bits. Is this correct?

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4819-4831
+    // When we don't have 16 bit instructions, bf16 is illegal and gets
+    // softened to i16 for storage, with float being used for arithmetic.
+    //
+    // After softening, some i16 -> fp32 bf16_to_fp operations can be left over.
+    // Lower those to (f32 (fp_extend (f16 (bitconvert x))))
+    if (!Op->getValueType(0).isFloatingPoint() ||
+        Op->getOperand(0).getValueType() != MVT::i16)
----------------
Pierre-vh wrote:
> arsenm wrote:
> > Pierre-vh wrote:
> > > arsenm wrote:
> > > > Pierre-vh wrote:
> > > > > arsenm wrote:
> > > > > > The generic legalizer should have handled this?
> > > > > It looks like those operations are not implemented in the generic legalizer, e.g. I get 
> > > > > ``` 
> > > > > Do not know how to promote this operator's operand!
> > > > > ```
> > > > Right, this is the code that would go there
> > > Do I just copy/paste this code in that PromoteInt function, and keep a copy here too in LowerOperation? (not really a fan of copy-pasting code in different files, I'd rather keep it all here)
> > > We need to have the lowering too AFAIK, it didn't go well when I tried to remove it
> > I'm not following why you need to handle it here
> IIRC:
>  - I need to handle FP_TO_BF16 in ReplaceNodeResult because that's what the Integer Legalizer calls (through CustomLowerNode)
>  - I need to handle both opcodes in LowerOperation because otherwise they'll fail selection. They can be left over from expanding/legalizing other operations.
But why are they custom? We don't have to handle FP16_TO_FP or FP_TO_FP16 there, and they aren't custom lowered. They have the same basic properties. We have this:

```
    setOperationAction(ISD::FP16_TO_FP, MVT::i16, Promote);
    AddPromotedToType(ISD::FP16_TO_FP, MVT::i16, MVT::i32);
    setOperationAction(ISD::FP_TO_FP16, MVT::i16, Promote);
    AddPromotedToType(ISD::FP_TO_FP16, MVT::i16, MVT::i32);
```

I'd expect the same basic pattern

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139398/new/

https://reviews.llvm.org/D139398