[PATCH] D99662: [AArch64] Add Machine InstCombiner patterns for FMUL indexed variant

Fri Apr 9 07:26:43 PDT 2021

asavonic added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:4553
+  case AArch64::FMULv2f32:
+    assert(Root.getOperand(1).isReg() && Root.getOperand(2).isReg());
+    Found = Match(AArch64::DUPv2i32lane, 1, MCP::FMULv2i32_indexed_OP1);
----------------
SjoerdMeijer wrote:
> Nit: this assert can be hoisted out of this switch?
I think we can remove remove the assert completely. The `Match` function checks the same condition already.

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:4802
+static MachineInstr *genIndexedMultiply(
+    MachineFunction &MF, MachineRegisterInfo &MRI, const TargetInstrInfo *TII,
+    MachineInstr &Root, SmallVectorImpl<MachineInstr *> &InsInstrs,
----------------
SjoerdMeijer wrote:
> Nit: I think you can get the MF via a few `getParent()` calls on `Root`, so you don't have to pass it. From memory I can't remember if that is also true fro MRI and TII, but I don't have strong opinions on this. 
Thanks! We can get all three arguments from `Root`.

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:5790
+    Opc = AArch64::FMULv2i32_indexed;
+    genIndexedMultiply(MF, MRI, TII, Root, InsInstrs, IdxDupOp, Opc, RC);
+    break;
----------------
SjoerdMeijer wrote:
> Nit: perhaps pass `RC` and `Opc` directly in here? Don't really need these assignments?
Done.

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:5821
+  case MachineCombinerPattern::FMULv8i16_indexed_OP1:
+  case MachineCombinerPattern::FMULv8i16_indexed_OP2: {
+    RC = &AArch64::FPR128RegClass;
----------------
SjoerdMeijer wrote:
> Are we missing tests for these cases?
> For this you'll probably need to add `-mattr=+fullfp16` to a RUN line.
Thanks a lot! I didn't know about `-mattr=+fullfp16`.

================
Comment at: llvm/test/CodeGen/AArch64/arm64-fma-combines.ll:164
+  %add = fadd fast <2 x double> %mul, %ad
+  store <2 x double> %add, <2 x double>* %ret, align 16
+  br label %for.cond
----------------
SjoerdMeijer wrote:
> Nit: can we just do a `ret %add` here?
This an attempt to keep `shuffle` and `mul` in different basic blocks. If we return in the second basic block, the optimizer merges them into one before we reach instruction selection.
In the latest revision of the patch I changed the tests to ensure that this never happens.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99662/new/

https://reviews.llvm.org/D99662