[PATCH] D62301: Fold Address Computations into Load/Store instructions for AArch64
Tim Northover via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu May 23 05:32:16 PDT 2019
t.p.northover added inline comments.
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1388-1389
+// Both instructions opcodes should 'X' or 'W'
+static bool isAddrFoldableInst(MachineBasicBlock::iterator Update,
+ MachineBasicBlock::iterator I) {
+ unsigned IOpc = I->getOpcode();
----------------
I think these names could be improved. Maybe `AddrCalcI` and `MemI`?
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1401-1402
+ }
+ // will return true if both instructions are ADDWrs and (STRWui and LDRWui).
+ else if (Update->getOpcode() == AArch64::ADDWrs) {
+ switch(IOpc) {
----------------
I don't see why we're checking ADDWrs here. This is an address computation, so it'll always be 64-bit.
Even if it hypothetically wasn't (say, with 32-bit pointers), the size of the arithmetic has no bearing on the type being stored, which is what the W in STRWui and LDRWui refers to.
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1419-1420
+ unsigned IOpc = I->getOpcode();
+ if(UpdateOpc == AArch64::ADDXrs ||
+ UpdateOpc == AArch64::ADDWrs) {
+ switch(IOpc) {
----------------
Are you planning to add more variants in the immediate future? If not, it's probably best to convert this to an assertion.
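Concretely, I'm imagining something like this sketch (the enumerators and helper name here are stand-ins, not the patch's actual code): once callers guarantee the update is a shifted-register ADD, the switch collapses to an assert.

```cpp
#include <cassert>

// Toy stand-ins for the real AArch64 opcode enumerators.
enum Opcode { ADDXrs, ADDWrs, LDRWui, STRWui, OTHER };

// Rather than re-switching over opcodes we already vetted earlier, assert
// the precondition: callers must only pass shifted-register ADDs here.
static Opcode getFoldedOpcode(Opcode UpdateOpc, Opcode MemOpc) {
  assert((UpdateOpc == ADDXrs || UpdateOpc == ADDWrs) &&
         "expected a shifted-register ADD as the update instruction");
  // ... map MemOpc to its register-offset variant here ...
  return MemOpc;
}
```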
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1437
+
+// here will merge the both ADD and STR/LDR instructions if able to merge both
+// and will create new instruction.
----------------
Can we start sentences with a capital letter, please.
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1449
+ ++NextI;
+ // getting the immediate value of operand three
+ int Value = Update->getOperand(3).getImm();
----------------
This is not a useful comment.
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1473
+ // Paired instruction.
+ MIB = BuildMI(*I->getParent(), I, I->getDebugLoc(), TII->get(Opc))
+ .add(I->getOperand(0))
----------------
I don't believe this is reachable. It's certainly wrong: no ldp or stp instruction allows an address formed from the sum of two registers, let alone with a shift.
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1499
+
+ if(I->getOperand(1).isOnRegUseListNext()) {
+ MachineBasicBlock *MBB = I->getParent();
----------------
What's the benefit of doing the transformation at all if we don't get to remove the update instruction? I think there should be an earlier check that the load/store is the only user before doing anything.
If it remains, the test should probably be whether the register is killed by its use in `I`.
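Roughly this shape — a toy model, not the real `llvm::MachineOperand` API, but the real class has the same `isKill()` query:

```cpp
#include <cassert>

// Toy model of a register operand. In LLVM proper this is
// llvm::MachineOperand, whose kill flag marks the last use of a register.
struct Operand {
  unsigned Reg;
  bool KillFlag;
  bool isKill() const { return KillFlag; }
};

// Only fold the address computation if the memory instruction is the last
// user of the computed address. Otherwise the ADD has to stay anyway, and
// folding it into the load/store saves nothing.
static bool shouldFoldAddrCalc(const Operand &BaseUse) {
  return BaseUse.isKill();
}
```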
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1511-1513
bool AArch64LoadStoreOpt::isMatchingUpdateInsn(MachineInstr &MemMI,
MachineInstr &MI,
unsigned BaseReg, int Offset) {
----------------
I don't think this is the right place for your code. What you're doing involves a fundamentally different set of mergeability criteria, even if the two happen to overlap.
You could easily imagine someone extending this function to support `ADDXrs` with the intention of applying it in the **same** way as ADDXri, and that's what this function should be saved for.
Instead, I think we need a distinct pair of names for what you're doing and for the existing transformation. Perhaps `mergeUpdateToIndexedMemOp` and `mergeUpdateToSingleUser`, though I'm not entirely happy with those and I'll think some more.
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1518-1519
+ case AArch64::ADDWrs:
+ case AArch64::ADDXrs:
+ {
+ if (!MI.getOperand(1).isReg() || !MI.getOperand(2).isReg() || !MI.getOperand(3).isImm())
----------------
LLVM style braces start on the same line as the case.
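i.e. something like this (toy opcodes, just to show the layout):

```cpp
#include <cassert>

// Sketch of the preferred layout: the brace opens on the case label's line.
static bool needsExtraOperandChecks(int Opc) {
  switch (Opc) {
  case 1:
  case 2: {
    // Per-case locals live inside braces opened on the label line.
    bool Checked = true;
    return Checked;
  }
  default:
    return false;
  }
}
```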
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1523-1524
+ int ShiftValue = MI.getOperand(3).getImm();
+ if (ShiftValue > 4)
+ break;
+ // The update instruction source and destination register must be the
----------------
I think this needs significantly better validation. The acceptable shifts depend strongly on which load or store this is being folded into: non-paired loads and stores only allow a shift of 0 or log2 of the access size, and paired ones don't allow a register offset (shifted or not) at all.
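Something along these lines (helper name invented here, and the access size would come from the memory opcode in the real patch):

```cpp
#include <cassert>

// Sketch of per-instruction shift validation. For non-paired
// register-offset loads/stores the only legal shift amounts are 0 and
// log2 of the access size in bytes (e.g. #2 for a 32-bit LDR). Paired
// loads/stores have no register-offset form at all.
static bool isLegalFoldedShift(unsigned AccessBytes, bool IsPaired,
                               int Shift) {
  if (IsPaired)
    return false; // ldp/stp cannot take a register offset at all.
  if (Shift == 0)
    return true;
  int Log2Size = 0;
  while ((1u << Log2Size) < AccessBytes)
    ++Log2Size;
  return Shift == Log2Size && (1u << Log2Size) == AccessBytes;
}
```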
================
Comment at: lib/Target/AArch64/AArch64LoadStoreOptimizer.cpp:1785-1786
if (Update != E) {
- // Merge the update into the ld/st.
- MBBI = mergeUpdateInsn(MBBI, Update, /*IsPreIdx=*/false);
- return true;
+ if (Update->getOpcode() == AArch64::ADDXri ||
+ Update->getOpcode() == AArch64::SUBXri) {
+ // Merge the update into the ld/st.
----------------
I think this needs refactoring: we shouldn't hard-code what `mergeUpdateInsn` supports outside that function. Instead, the merge should be able to bail gracefully when it doesn't understand ADDWrs or whatever.
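The shape I have in mind is something like this (toy enumerators and a stand-in result type; the real function would return the iterator past the merged instruction):

```cpp
#include <cassert>
#include <optional>

// Toy sketch of letting the merge itself bail out, instead of
// pre-filtering the opcodes at every call site.
enum Opcode { ADDXri, SUBXri, ADDXrs, ADDWrs, UNKNOWN };

static std::optional<int> tryMergeUpdateInsn(Opcode UpdateOpc) {
  switch (UpdateOpc) {
  case ADDXri:
  case SUBXri:
    return 1; // A pre/post-indexed form exists: perform the merge.
  default:
    return std::nullopt; // Unrecognized update: bail, leave code untouched.
  }
}
```

Callers then just test the result (`if (auto R = tryMergeUpdateInsn(...))`) rather than duplicating the opcode list.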
================
Comment at: test/CodeGen/AArch64/fold_addressing_modes_aarch64.ll:32
+
+attributes #0 = { minsize norecurse nounwind optsize "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="cortex-a53" "target-features"="+aes,+crc,+crypto,+fp-armv8,+neon,+sha2" "unsafe-fp-math"="false" "use-soft-float"="false" }
+
----------------
You can prune all the metadata out of this, and I think we also need more than just one test. There are many different ways this transformation could potentially go wrong.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D62301/new/
https://reviews.llvm.org/D62301