[PATCH] D99662: [AArch64] Add Machine InstCombiner patterns for FMUL indexed variant

Wed Apr 7 11:23:45 PDT 2021

SjoerdMeijer added a comment.

Sorry for the delay, mostly nits inlined, one question about missing f16 tests.

But the other thing I was just wondering, not that I mind these patterns here, but are we not expecting that the VDUP is sunk to its user? I think that's probably what I would expect, but don't know if that is a fair expectation.

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:4553
+  case AArch64::FMULv2f32:
+    assert(Root.getOperand(1).isReg() && Root.getOperand(2).isReg());
+    Found = Match(AArch64::DUPv2i32lane, 1, MCP::FMULv2i32_indexed_OP1);
----------------
Nit: this assert can be hoisted out of this switch?

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:4802
+static MachineInstr *genIndexedMultiply(
+    MachineFunction &MF, MachineRegisterInfo &MRI, const TargetInstrInfo *TII,
+    MachineInstr &Root, SmallVectorImpl<MachineInstr *> &InsInstrs,
----------------
Nit: I think you can get the MF via a few `getParent()` calls on `Root`, so you don't have to pass it. From memory I can't remember if that is also true fro MRI and TII, but I don't have strong opinions on this. 

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:5790
+    Opc = AArch64::FMULv2i32_indexed;
+    genIndexedMultiply(MF, MRI, TII, Root, InsInstrs, IdxDupOp, Opc, RC);
+    break;
----------------
Nit: perhaps pass `RC` and `Opc` directly in here? Don't really need these assignments?

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:5821
+  case MachineCombinerPattern::FMULv8i16_indexed_OP1:
+  case MachineCombinerPattern::FMULv8i16_indexed_OP2: {
+    RC = &AArch64::FPR128RegClass;
----------------
Are we missing tests for these cases?
For this you'll probably need to add `-mattr=+fullfp16` to a RUN line.

================
Comment at: llvm/test/CodeGen/AArch64/arm64-fma-combines.ll:164
+  %add = fadd fast <2 x double> %mul, %ad
+  store <2 x double> %add, <2 x double>* %ret, align 16
+  br label %for.cond
----------------
Nit: can we just do a `ret %add` here?

================
Comment at: llvm/test/CodeGen/AArch64/arm64-fma-combines.ll:179
+  %add = fadd fast <4 x float> %mul, %ad
+  store <4 x float> %add, <4 x float>* %ret, align 16
+  br label %for.cond
----------------
Same?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99662/new/

https://reviews.llvm.org/D99662