[PATCH] D139398: [AMDGPU] Add bf16 storage support

Pierre van Houtryve via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Dec 9 01:31:54 PST 2022


Pierre-vh added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:913
+      else
+        RegisterVT = (ScalarVT == MVT::bf16 ? MVT::v2bf16 : MVT::v2f16);
       IntermediateVT = RegisterVT;
----------------
arsenm wrote:
> If you wanted the promote to i32, you could have done it here instead of in the tablegen cc handling
Do you mean somewhere else in that function? Changing v2bf16 to i32 here doesn't fix it 
I also tried changing the function above but I kept running into asserts so I just left the TableGen CC for now


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5563
+  return DAG.getNode(ISD::BITCAST, SL, MVT::i16,
+                     DAG.getFPExtendOrRound(Op->getOperand(0), SL, MVT::f16));
+}
----------------
arsenm wrote:
> Should be specific cast, not FPExtOrRound. I don't think the FP_ROUND case would be correct
But we need to do f32 -> f16, isn't FP_ROUND used for that? I thought it's what we needed


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5573-5576
+  SDLoc SL(Op);
+  return DAG.getNode(
+      ISD::FP_EXTEND, SL, MVT::f32,
+      DAG.getNode(ISD::BITCAST, SL, MVT::f16, Op->getOperand(0)));
----------------
arsenm wrote:
> ExpandNode covers lowering BF16_TO_FP. It also has a shift by 16-bits into the high bits. Is this correct?
Ah I didn't know that, though as long as we use custom lowering, and our FP_TO_BF16/BF16_TO_FP methods are consistent, it should be fine, no?


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4819-4831
+    // When we don't have 16 bit instructions, bf16 is illegal and gets
+    // softened to i16 for storage, with float being used for arithmetic.
+    //
+    // After softening, some i16 -> fp32 bf16_to_fp operations can be left over.
+    // Lower those to (f32 (fp_extend (f16 (bitconvert x))))
+    if (!Op->getValueType(0).isFloatingPoint() ||
+        Op->getOperand(0).getValueType() != MVT::i16)
----------------
arsenm wrote:
> Pierre-vh wrote:
> > arsenm wrote:
> > > Pierre-vh wrote:
> > > > arsenm wrote:
> > > > > Pierre-vh wrote:
> > > > > > arsenm wrote:
> > > > > > > The generic legalizer should have handled this?
> > > > > > It looks like those operations are not implemented in the generic legalizer, e.g. I get 
> > > > > > ``` 
> > > > > > Do not know how to promote this operator's operand!
> > > > > > ```
> > > > > Right, this is the code that would go there
> > > > Do I just copy/paste this code in that PromoteInt function, and keep a copy here too in LowerOperation? (not really a fan of copy-pasting code in different files, I'd rather keep it all here)
> > > > We need to have the lowering too AFAIK, it didn't go well when I tried to remove it
> > > I'm not following why you need to handle it here
> > IIRC:
> >  - I need to handle FP_TO_BF16 in ReplaceNodeResult because that's what the Integer Legalizer calls (through CustomLowerNode)
> >  - I need to handle both opcodes in LowerOperation because otherwise they'll fail selection. They can be left over from expanding/legalizing other operations.
> But why are they custom? We don't have to handle FP16_TO_FP or FP_TO_FP16 there, and they aren't custom lowered. They have the same basic properties. We have this:
> 
> 
> ```
>     setOperationAction(ISD::FP16_TO_FP, MVT::i16, Promote);
>     AddPromotedToType(ISD::FP16_TO_FP, MVT::i16, MVT::i32);
>     setOperationAction(ISD::FP_TO_FP16, MVT::i16, Promote);
>     AddPromotedToType(ISD::FP_TO_FP16, MVT::i16, MVT::i32);
> ```
> 
> I'd expect the same basic pattern
PromoteIntegerOperand, PromoteFloatOperand and PromoteIntegerResult don't handle FP_TO_BF16 and BF16_TO_FP, and unless we put a Custom lowering mode it'll assert/unreachable.
I tried to make it work (for a while) using the default expand but I can't quite get it to work. It feels like there is some legalizer work missing for handling BF16 like we want to.
Even though it's not ideal I think the custom lowering is easiest


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139398/new/

https://reviews.llvm.org/D139398



More information about the cfe-commits mailing list