[PATCH] D31772: [PowerPC] Add pass to expand extra memcpy calls

Mon May 15 08:45:32 PDT 2017

nemanjai added a comment.

Overall, I think this is fine with the comments addressed. It would be good to get some feedback from @hfinkel on whether it would be possible/beneficial to run loop optimizations after this.

================
Comment at: lib/Target/PowerPC/PPCLowerMemIntrinsics.cpp:28
+
+#define DEBUG_TYPE "ppc-lower-mem-intrinsics"
+
----------------
I really think the debug type and the options should reflect what this pass does. Naming something `lower-mem-intrinsics` suggests that without it we don't lower these, which isn't entirely the case. Also, no real information is gained from statements such as "extra memcpy expansions". On the other hand, saying what this thing does is useful - such as "Expand memcpy calls into loops under specified conditions". And respectively, an option such as `ppc-enable-memcpy-loops`.

================
Comment at: lib/Target/PowerPC/PPCLowerMemIntrinsics.cpp:35
+// enabled with 'ppc-expand-extra-memcpy=true'. A follow on patch will refine
+// the expansions based on profiling data.
+
----------------
I suppose the temporary comments such as the last two sentences are fine as long as you remember to remove them in the follow-up patch.

================
Comment at: lib/Target/PowerPC/PPCLowerMemIntrinsics.cpp:78
+
+char PPCLowerMemIntrinsics::ID = 0;
+
----------------
Seems like in most implementation files, the convention is to put these at the bottom of the file. Having them in a fixed location in every file makes them easy to find.

================
Comment at: lib/Target/PowerPC/PPCLowerMemIntrinsics.cpp:84
+// Checks whether the cpu arch is one where we want to expand
+// memcpy calls. We expand for little-endian PPC cpus.
+static bool CPUCheck(const std::string &CpuStr) {
----------------
Umm, this is kind of meaningless. Technically, PowerPC CPU's have been capable of running in both little-endian and big-endian mode for about 4-5 generations. If the pass does not check endianness, there's no real need to mention it here.

================
Comment at: lib/Target/PowerPC/PPCLowerMemIntrinsics.cpp:139
+  bool AnyExpanded = false;
+  assert(Intrinsic::memcpy == F.getIntrinsicID());
+  // loop over all memcpy calls
----------------
Assert message please.

================
Comment at: lib/Target/PowerPC/PPCLowerMemIntrinsics.cpp:146
+      const TargetTransformInfo &TTI =
+          getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
+      ppcExpandMemCpyAsLoop(MC, TTI);
----------------
Wouldn't we want the TTI from the caller? I know that in practice there's no difference, but I can certainly envision possibilities where that isn't really the case.

================
Comment at: lib/Target/PowerPC/PPCLowerMemIntrinsics.cpp:160
+
+  if (skipModule(M))
+    return false;
----------------
I don't think there's a need to split these two early exits since there's nothing between them.

================
Comment at: lib/Target/PowerPC/PPCTargetMachine.cpp:351
+  if (TM->getOptLevel() != CodeGenOpt::None)
+    addPass(createPPCLowerMemIntrinsicsPass());
+
----------------
Would it be possible (or appropriate) to run the loop passes after this since we may have introduced loops? Seems like unrolling and vectorization may be beneficial (at least in some circumstances).

================
Comment at: lib/Target/PowerPC/PPCTargetTransformInfo.cpp:469
+    unsigned OpSize = OpTy->getBitWidth() / 8;
+    // loops just incase the remaining bytes are greater or equal to
+    // twice the largest copy operand type.
----------------
Nit: full sentences for comments. And s/incase/in case.

================
Comment at: test/CodeGen/PowerPC/memcpy-loop-expansion.ll:13
+; RUN: FileCheck %s --check-prefix OPTNONE
+
+declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i32, i1) #0
----------------
In general, I think all the functions should include checks for `PWR7`, `OPTSMALL` and `OPTNONE` (providing that the `-Os/-Oz` can be specified on the command line for the tool you're invoking).

================
Comment at: test/CodeGen/PowerPC/memcpy-loop-expansion.ll:141
+  ret i8* %dst
+; PWR7-LABEL: @memcpy_power7
+; PWR7:       tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 %len, i32 1, i1 false)
----------------
I think this should be ALL shouldn't it? You have the CPU=pwr7 attribute on the function, so regardless of the invocation, we shouldn't optimize. Same goes for `memcpy_opt_small`.

Repository:
  rL LLVM

https://reviews.llvm.org/D31772