[PATCH] D38196: [AArch64] Avoid interleaved SIMD store instructions for Exynos

Tue Oct 3 08:32:07 PDT 2017

kristof.beyls added a comment.

Hi Abderrazek,

I've added a few comments, but ran out of time looking through the full patch.
Hopefully this is already useful to you.

Thanks!

Kristof

================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:71-74
+  typedef enum {
+    FMLA,
+    Interleave
+  } Subpass;
----------------
In other parts, comments state "Certain SIMD instructions with vector element operand are not efficient."
In this enum, FMLA is used for that situation.
It seems that either the name of the enum value should be changed, or the comments be updated?

================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:81-82
+
+  //maximum number of replacement instructions.
+  static const unsigned MaxNumRepl = 10;
+
----------------
Maybe the comment should indicate why a limit must be placed on the maximum number of instructions replaced & why 10 is a good value?

================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:90
+  /// to replace the instruction InstDesc by the instructions stored in the
+  /// array InstDescRep1.
   /// Return true if replacement is recommended.
----------------
The comment states "InstDescRep1", but the argument name in the code is "InstDescRepl"

================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:98-107
   /// Determine if we need to exit the vector by element instruction
   /// optimization pass early. This makes sure that Targets with no need
   /// for this optimization do not spent any compile time on this pass.
   /// This check is done by comparing the latency of an indexed FMLA
   /// instruction to the latency of the DUP + the latency of a vector
   /// FMLA instruction. We do not check on other related instructions such
   /// as FMLS as we assume that if the situation shows up for one
----------------
I guess as is, the comment cannot be fully true anymore?
Either this checks also the "Interleave" patterns, or this optimization does spend some compile time on this pass for a Target that does not need this optimization?

================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:159
+/// to replace the instruction InstDesc by the instructions stored in the
+/// array InstDescRep1.
 /// Return true if replacement is recommended.
----------------
s/InstDescRep1/InstDescRepl/?

================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:163
     MachineFunction *MF, const MCInstrDesc *InstDesc,
-    const MCInstrDesc *InstDescRep1, const MCInstrDesc *InstDescRep2,
+    const MCInstrDesc *InstDescRepl[], unsigned NumRepl,
     std::map<unsigned, bool> &VecInstElemTable) const {
----------------
I guess there is a good reason why a vector cannot be used here instead of a pointer and a separate NumRepl argument?

================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:172
   unsigned SCIdx = InstDesc->getSchedClass();
-  unsigned SCIdxRep1 = InstDescRep1->getSchedClass();
-  unsigned SCIdxRep2 = InstDescRep2->getSchedClass();
+  unsigned SCIdxRepl[MaxNumRepl];
+  for (unsigned i=0; i<NumRepl; i++)
----------------
This probably should be a SmallVector and the constant MaxNumRepl should disappear?

================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:172-181
+  unsigned SCIdxRepl[MaxNumRepl];
+  for (unsigned i=0; i<NumRepl; i++)
+    SCIdxRepl[i] = InstDescRepl[i]->getSchedClass();
+
   const MCSchedClassDesc *SCDesc =
-      SchedModel.getMCSchedModel()->getSchedClassDesc(SCIdx);
-  const MCSchedClassDesc *SCDescRep1 =
-      SchedModel.getMCSchedModel()->getSchedClassDesc(SCIdxRep1);
-  const MCSchedClassDesc *SCDescRep2 =
-      SchedModel.getMCSchedModel()->getSchedClassDesc(SCIdxRep2);
+    SchedModel.getMCSchedModel()->getSchedClassDesc(SCIdx);
+  const MCSchedClassDesc *SCDescRepl[MaxNumRepl];
----------------
kristof.beyls wrote:
> This probably should be a SmallVector and the constant MaxNumRepl should disappear?
I think it's clearer if the above 2 loops would be fused, making the cryptic variable array variable SCIdxRepl unnecessary.

================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:178
+    SchedModel.getMCSchedModel()->getSchedClassDesc(SCIdx);
+  const MCSchedClassDesc *SCDescRepl[MaxNumRepl];
+	for (unsigned i=0; i<NumRepl; i++)
----------------
Similarly, this probably needs to be a SmallVector?

================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:192-197
+    for (unsigned i=0; i<NumRepl; i++)
+      if (!SCDescRepl[i]->isValid() || SCDescRepl[i]->isVariant())
+      {
+        VecInstElemTable[InstDesc->getOpcode()] = false;
+        return false;
+      }
----------------
My guess is that this logic would also be clearer if it would be fused with the previous loop populating SCDescRepl[i].

================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:228
+                                                     Subpass sp) {
+  assert(sp == FMLA || sp == Interleave);
   std::map<unsigned, bool> VecInstElemTable;
----------------
if this function would us a switch to handle different enum values, the assert here wouldn't be needed; having just an assert or llvm_unreachable on the default case. I think that's more idiomatic in the LLVM code base.

================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:235-238
+    const MCInstrDesc* ReplInstrMCID[2] = {&TII->get(AArch64::DUPv4i32lane),
+                                           &TII->get(AArch64::FMULv4f32)};
+    if (!shouldReplaceInstruction(MF, OriginalMCID, ReplInstrMCID, 2,
+                                  VecInstElemTable))
----------------
As pointed out earlier, a vector/SmallVector rather than an array would result in not needing to manually calculate and specify the length of the array here?

================
Comment at: llvm/lib/Target/AArch64/AArch64VectorByElementOpt.cpp:417

+/// Load/Store Interelaving instructions are not always beneficial.
+/// Replace them by zip instructions and classical load/store.
----------------
typo

https://reviews.llvm.org/D38196