[PATCH] D139398: [AMDGPU] Add bf16 storage support
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 12 14:51:11 PST 2022
arsenm added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4819-4831
+ // When we don't have 16 bit instructions, bf16 is illegal and gets
+ // softened to i16 for storage, with float being used for arithmetic.
+ //
+ // After softening, some i16 -> fp32 bf16_to_fp operations can be left over.
+ // Lower those to (f32 (fp_extend (f16 (bitconvert x))))
+ if (!Op->getValueType(0).isFloatingPoint() ||
+ Op->getOperand(0).getValueType() != MVT::i16)
----------------
Pierre-vh wrote:
> arsenm wrote:
> > Pierre-vh wrote:
> > > arsenm wrote:
> > > > Pierre-vh wrote:
> > > > > arsenm wrote:
> > > > > > Pierre-vh wrote:
> > > > > > > arsenm wrote:
> > > > > > > > Pierre-vh wrote:
> > > > > > > > > arsenm wrote:
> > > > > > > > > > The generic legalizer should have handled this?
> > > > > > > > > It looks like those operations are not implemented in the generic legalizer, e.g. I get
> > > > > > > > > ```
> > > > > > > > > Do not know how to promote this operator's operand!
> > > > > > > > > ```
> > > > > > > > Right, this is the code that would go there
> > > > > > > Do I just copy/paste this code in that PromoteInt function, and keep a copy here too in LowerOperation? (not really a fan of copy-pasting code in different files, I'd rather keep it all here)
> > > > > > > We still need the lowering here AFAIK; it didn't go well when I tried to remove it
> > > > > > I'm not following why you need to handle it here
> > > > > IIRC:
> > > > > - I need to handle FP_TO_BF16 in ReplaceNodeResult because that's what the Integer Legalizer calls (through CustomLowerNode)
> > > > > - I need to handle both opcodes in LowerOperation because otherwise they'll fail selection. They can be left over from expanding/legalizing other operations.
> > > > But why are they custom? We don't have to handle FP16_TO_FP or FP_TO_FP16 there, and they aren't custom lowered. They have the same basic properties. We have this:
> > > >
> > > >
> > > > ```
> > > > setOperationAction(ISD::FP16_TO_FP, MVT::i16, Promote);
> > > > AddPromotedToType(ISD::FP16_TO_FP, MVT::i16, MVT::i32);
> > > > setOperationAction(ISD::FP_TO_FP16, MVT::i16, Promote);
> > > > AddPromotedToType(ISD::FP_TO_FP16, MVT::i16, MVT::i32);
> > > > ```
> > > >
> > > > I'd expect the same basic pattern
> > > PromoteIntegerOperand, PromoteFloatOperand and PromoteIntegerResult don't handle FP_TO_BF16 and BF16_TO_FP, and unless we mark them Custom, legalization hits an assert/unreachable.
> > > I tried for a while to make it work using the default Expand, but I can't quite get it to work. It feels like some legalizer work is missing for handling BF16 the way we want.
> > > Even though it's not ideal, I think the custom lowering is easiest
> > What about Expand? that's where the implemented part is
> Last I tried, Expand will emit a libcall in many cases that we don't handle
Library call is supposed to be a distinct action now; the DAG only did about 5% of the work to migrate to using it. This code can go in the default Expand action
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5564
+ SDLoc SL(Op);
+ const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+
----------------
This is just `*this`; no need to fetch it from the DAG
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5578
+ DAG.getConstant(16, SL,
+ TLI.getShiftAmountTy(MVT::i32, DAG.getDataLayout()))});
+ return DAG.getNode(ISD::TRUNCATE, SL, MVT::i16, Result);
----------------
Can just hardcode i32
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5590
+ SDLoc SL(Op);
+ const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+
----------------
This is just `*this`; no need to fetch it from the DAG
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5596
+ ISD::SHL, SL, MVT::i32,
+ {DAG.getNode(ISD::ZERO_EXTEND, SL, MVT::i32, Op->getOperand(0)),
+ DAG.getConstant(16, SL,
----------------
This can be any_extend and the combiner will probably turn it into one
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:5598
+ DAG.getConstant(16, SL,
+ TLI.getShiftAmountTy(MVT::i32, DAG.getDataLayout()))});
+ Result = DAG.getBitcast(MVT::f32, Result);
----------------
Can just hardcode i32
================
Comment at: llvm/test/CodeGen/AMDGPU/bf16.ll:11
+; than simple load/stores.
+
+define void @test_load_store(ptr addrspace(1) %in, ptr addrspace(1) %out) {
----------------
Doesn't cover the different f32/f64 conversions?
================
Comment at: llvm/test/CodeGen/AMDGPU/bf16.ll:2953
+; GFX10-NEXT: s_setpc_b64 s[30:31]
+ %ins.0 = insertvalue { <32 x i32>, bfloat } undef, <32 x i32> %b, 0
+ %ins.1 = insertvalue { <32 x i32>, bfloat } %ins.0 ,bfloat %a, 1
----------------
Use poison instead of undef in tests
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D139398/new/
https://reviews.llvm.org/D139398