[PATCH] D48128: [ARM] Parallel DSP IR Pass

Sam Parker via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 13 08:59:12 PDT 2018


samparker added a comment.

Hi Sjoerd,

Just a few comments before I go home; I'll continue looking tomorrow. I was going to say that maybe this pass should live in the ARM backend, but I also see other passes in this directory handling target-specific intrinsics.

cheers,
sam



================
Comment at: lib/Transforms/Scalar/ParallelDSP.cpp:42
+
+  using ParallelMACList = std::vector<ParallelMAC>;
+  using ReductionList   = std::vector<Reduction>;
----------------
SmallVector instead?
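Something along these lines, say (the inline element counts are just a guess):

  using ParallelMACList = SmallVector<ParallelMAC, 8>;
  using ReductionList   = SmallVector<Reduction, 8>;

(with llvm/ADT/SmallVector.h included, if it isn't already).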


================
Comment at: lib/Transforms/Scalar/ParallelDSP.cpp:132
+      MatchSMLAD(F);
+      return true;
+    }
----------------
This should only return true if something has changed.
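I.e. something like the following, assuming MatchSMLAD is changed to report whether it actually rewrote anything:

  // Only report a change to the pass manager if we transformed something.
  bool Changed = MatchSMLAD(F);
  return Changed;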


================
Comment at: lib/Transforms/Scalar/ParallelDSP.cpp:250
+
+  if (Ld0 == Ld1) {
+    LLVM_DEBUG(dbgs() << "OK: loads are the same.\n");
----------------
Why do we consider these sequential?


================
Comment at: lib/Transforms/Scalar/ParallelDSP.cpp:257
+
+  // In case Ld0/Ld1 are volatile loads, this will return false.
+  if (isConsecutiveAccess(Ld0, Ld1, *DL, *SE)) {
----------------
Would it be wiser to check for volatile loads explicitly before calling this?
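Roughly (Ld0/Ld1 as in the patch), so the intent is obvious at the call site:

  // Bail out on volatile/atomic loads before querying isConsecutiveAccess.
  if (!Ld0->isSimple() || !Ld1->isSimple())
    return false;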


================
Comment at: lib/Transforms/Scalar/ParallelDSP.cpp:357
+
+static bool MatchReductions(Function &F, Loop *TheLoop, BasicBlock *Header,
+                            ReductionList &Reductions) {
----------------
This interface could be simplified by passing HasFnNoNanAttr instead of the function. Similarly, I don't see why it's necessary to pass both the loop and its header.
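A rough sketch of what I mean (names are only suggestions):

  static bool MatchReductions(bool HasFnNoNanAttr, Loop *TheLoop,
                              ReductionList &Reductions) {
    // The header can be recovered from the loop itself.
    BasicBlock *Header = TheLoop->getHeader();
    ...
  }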


================
Comment at: lib/Transforms/Scalar/ParallelDSP.cpp:419
+    // Pattern 1:
+    //  %add11 = add i32 %mul, %mac1.037
+    //  %add16 = add i32 %mul10, %add11
----------------
nit: there's probably a nicer way to comment these patterns...


================
Comment at: lib/Transforms/Scalar/ParallelDSP.cpp:513
+  unsigned AddrSpace = VecLd0->getPointerAddressSpace();
+  Value *VecPtr = Builder.CreateBitCast(VecLd0->getPointerOperand(),
+                                        AccTy->getPointerTo(AddrSpace));
----------------
Cores will need to support unaligned accesses to do this. It should also be double-checked that the load/store optimiser doesn't generate any unaligned LDRD later; I think there is already an option in that pass to check for that.
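For the alignment side, I'd expect a guard roughly like this before creating the wide load (assuming the pass has a TargetTransformInfo available; the exact hook and parameters are from memory, so please double-check them):

  // Only widen the loads if the target is happy with a potentially
  // misaligned access of the new width at this alignment.
  unsigned NewWidth = DL->getTypeSizeInBits(AccTy);
  bool Fast = false;
  if (!TTI->allowsMisalignedMemoryAccesses(AccTy->getContext(), NewWidth,
                                           AddrSpace, VecLd0->getAlignment(),
                                           &Fast) ||
      !Fast)
    return false;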


https://reviews.llvm.org/D48128
