[PATCH] D139398: [AMDGPU] Add bf16 storage support

Wed Dec 7 00:07:52 PST 2022

Pierre-vh added inline comments.

================
Comment at: clang/lib/Basic/Targets/AMDGPU.h:119
+  bool hasBFloat16Type() const override { return isAMDGCN(getTriple()); }
+  const char *getBFloat16Mangling() const override { return "u6__bf16"; };
+
----------------
Pierre-vh wrote:
> arsenm wrote:
> > Don't understand this mangling. What is u6?
> Not sure; for that one I just copy-pasted the implementation from other targets. All other targets use that mangling scheme.
Ah, I remember now: it's just C++ (Itanium ABI) name mangling. I wasn't sure about the lowercase "u", but a quick search in Clang tells me it marks a vendor-extended type.
So "u6__bf16" reads as: "u" -> vendor-extended type, "6" -> length of the name that follows, "__bf16" -> the name of the type.
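
To make the scheme concrete, here is a minimal sketch of how such a string is put together. The helper name `mangleVendorExtendedType` is hypothetical (it is not a Clang API); it only mirrors the "u" + length + name rule described above.

```cpp
#include <string>

// Hypothetical helper, not part of Clang: builds the mangling of a
// vendor-extended type as 'u', then the decimal length of the type
// name, then the name itself.
std::string mangleVendorExtendedType(const std::string &Name) {
  return "u" + std::to_string(Name.size()) + Name;
}
```

Calling `mangleVendorExtendedType("__bf16")` yields `"u6__bf16"`, since `__bf16` is six characters long.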

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4819-4831
+    // When we don't have 16 bit instructions, bf16 is illegal and gets
+    // softened to i16 for storage, with float being used for arithmetic.
+    //
+    // After softening, some i16 -> fp32 bf16_to_fp operations can be left over.
+    // Lower those to (f32 (fp_extend (f16 (bitconvert x))))
+    if (!Op->getValueType(0).isFloatingPoint() ||
+        Op->getOperand(0).getValueType() != MVT::i16)
----------------
arsenm wrote:
> Pierre-vh wrote:
> > arsenm wrote:
> > > Pierre-vh wrote:
> > > > arsenm wrote:
> > > > > The generic legalizer should have handled this?
> > > > It looks like those operations are not implemented in the generic legalizer, e.g. I get:
> > > > ```
> > > > Do not know how to promote this operator's operand!
> > > > ```
> > > Right, this is the code that would go there
> > Do I just copy/paste this code into that PromoteInt function and keep a copy here in LowerOperation too? (I'm not really a fan of copy-pasting code across files; I'd rather keep it all here.)
> > AFAIK we need the lowering too; it didn't go well when I tried to remove it.
> I'm not following why you need to handle it here
IIRC:
 - I need to handle FP_TO_BF16 in ReplaceNodeResults because that's what the integer type legalizer calls (through CustomLowerNode).
 - I need to handle both opcodes in LowerOperation because otherwise they fail instruction selection. They can be left over from expanding/legalizing other operations.
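
For context on the "i16 for storage, float for arithmetic" model mentioned in the code comment, here is a standalone sketch of moving between raw bf16 bits and float. It is only an illustration of the storage format and is not the DAG lowering in this patch; the function names are hypothetical, and the narrowing direction truncates instead of rounding, which is a simplifying assumption.

```cpp
#include <cstdint>
#include <cstring>

// Widen raw bf16 bits (held in a 16-bit integer) to float. A bf16 value
// has the same layout as the upper 16 bits of an IEEE-754 binary32, so
// the bits are shifted into the high half and reinterpreted.
float bf16BitsToFloat(std::uint16_t Bits) {
  std::uint32_t Wide = static_cast<std::uint32_t>(Bits) << 16;
  float F;
  std::memcpy(&F, &Wide, sizeof(F));
  return F;
}

// Narrow a float to raw bf16 bits by dropping the low 16 bits.
// Assumption for illustration: plain truncation, with no rounding and
// no special handling of NaNs.
std::uint16_t floatToBF16BitsTruncating(float F) {
  std::uint32_t Wide;
  std::memcpy(&Wide, &F, sizeof(Wide));
  return static_cast<std::uint16_t>(Wide >> 16);
}
```

With this model, values live in memory as 16-bit integers and are widened to float before any arithmetic, then narrowed again before being stored.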

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139398/new/

https://reviews.llvm.org/D139398