[PATCH] D66666: [AMDGPU] Remove unnecessary movs for v_fmac operands
Ryan Taylor via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 23 11:55:54 PDT 2019
rtaylor marked 2 inline comments as done.
rtaylor added inline comments.
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2630-2633
+ if (!Src0Mods && !Src1Mods && !Clamp && !Omod &&
+ (ST.getConstantBusLimit(Opc) > 1 ||
+ !Src0->isReg() ||
+ !RI.isSGPRReg(MBB->getParent()->getRegInfo(), Src0->getReg()))) {
----------------
arsenm wrote:
> These are the exact conditions as checked above
Yes, they are. I could factor them into a local helper function and call it from both places, but that would be just as ugly because there are so many conditions to pass as parameters. Alternatively, I could pass MI and re-fetch all those operands, which would duplicate the code that is already in the function.
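For reference, here is a minimal sketch of the first option, with the conditions bundled into one struct so that the helper takes a single parameter. It is standalone and does not use the LLVM API: OperandState and operandsAllowConversion are hypothetical names, and the fields only stand in for the values the quoted hunk reads (Src0Mods, Src1Mods, Clamp, Omod, ST.getConstantBusLimit(Opc), Src0->isReg(), RI.isSGPRReg(...)).

```cpp
#include <cassert>

// Hypothetical stand-in for the values the quoted condition reads.
// None of these names exist in SIInstrInfo.cpp.
struct OperandState {
  bool HasSrc0Mods;          // stands in for Src0Mods being non-zero
  bool HasSrc1Mods;          // stands in for Src1Mods being non-zero
  bool HasClamp;             // stands in for Clamp being non-zero
  bool HasOmod;              // stands in for Omod being non-zero
  unsigned ConstantBusLimit; // stands in for ST.getConstantBusLimit(Opc)
  bool Src0IsReg;            // stands in for Src0->isReg()
  bool Src0IsSGPR;           // stands in for RI.isSGPRReg(MRI, Src0->getReg())
};

// Mirrors the boolean structure of the quoted condition so that both
// call sites could share a single predicate.
static bool operandsAllowConversion(const OperandState &S) {
  return !S.HasSrc0Mods && !S.HasSrc1Mods && !S.HasClamp && !S.HasOmod &&
         (S.ConstantBusLimit > 1 || !S.Src0IsReg || !S.Src0IsSGPR);
}
```

Each call site would fill in an OperandState once and test operandsAllowConversion(State). Whether that is actually cleaner than repeating the condition is the open question here.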
================
Comment at: test/CodeGen/AMDGPU/fmac-fma-sgpr-copy.ll:4-12
+define amdgpu_cs float @test1(<4 x i32> inreg %a, float %b, float %y) {
+entry:
+ %buf.load = call <4 x i32> @llvm.amdgcn.s.buffer.load.v4i32(<4 x i32> %a, i32 0, i32 0)
+ %vec1 = bitcast <4 x i32> %buf.load to <4 x float>
+ %.i095 = extractelement <4 x float> %vec1, i32 0
+ %.i098 = fsub nnan arcp float %b, %.i095
+ %fma1 = call float @llvm.fma.f32(float %y, float %.i098, float %.i095) #3
----------------
arsenm wrote:
> You should be able to reduce this. You shouldn't need any vector operations
I tried a few different things, but I couldn't find a simpler form that produces a convertible V_FMAC. Suggestions?
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D66666/new/
https://reviews.llvm.org/D66666