[PATCH] D31772: [PowerPC] Add pass to expand extra memcpy calls
Sean Fertile via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed May 17 11:51:33 PDT 2017
sfertile marked 6 inline comments as done.
sfertile added inline comments.
================
Comment at: lib/Target/PowerPC/PPCLowerMemIntrinsics.cpp:28
+
+#define DEBUG_TYPE "ppc-lower-mem-intrinsics"
+
----------------
nemanjai wrote:
> I really think the debug type and the options should reflect what this pass does. Naming something `lower-mem-intrinsics` suggests that without it we don't lower these, which isn't entirely the case. Also, no real information is gained from statements such as "extra memcpy expansions". On the other hand, saying what this thing does is useful - such as "Expand memcpy calls into loops under specified conditions". And respectively, an option such as `ppc-enable-memcpy-loops`.
This makes sense to me. Initially I thought we would end up doing similar expansions for some of the other memory-intrinsics. I've only updated the debug-type and option for now but I think I will change the file-name and pass-name as well since they really aren't describing what we are currently doing. If/when we add other transformations we can update the name to reflect that.
================
Comment at: lib/Target/PowerPC/PPCLowerMemIntrinsics.cpp:35
+// enabled with 'ppc-expand-extra-memcpy=true'. A follow on patch will refine
+// the expansions based on profiling data.
+
----------------
nemanjai wrote:
> I suppose the temporary comments such as the last two sentences are fine as long as you remember to remove them in the follow-up patch.
Removed the temporary comment anyway.
================
Comment at: lib/Target/PowerPC/PPCLowerMemIntrinsics.cpp:84
+// Checks whether the cpu arch is one where we want to expand
+// memcpy calls. We expand for little-endian PPC cpus.
+static bool CPUCheck(const std::string &CpuStr) {
----------------
nemanjai wrote:
> Umm, this is kind of meaningless. Technically, PowerPC CPU's have been capable of running in both little-endian and big-endian mode for about 4-5 generations. If the pass does not check endianness, there's no real need to mention it here.
I added 'ppc64le' since that is the default target_cpu on little-endian. I thought that little-endian was only officially supported starting in Power8, so having a LE cpu indicated Power8 or above for the architecture. If thats not the case I can rethink this switch.
================
Comment at: lib/Target/PowerPC/PPCLowerMemIntrinsics.cpp:146
+ const TargetTransformInfo &TTI =
+ getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
+ ppcExpandMemCpyAsLoop(MC, TTI);
----------------
nemanjai wrote:
> Wouldn't we want the TTI from the caller? I know that in practice there's no difference, but I can certainly envision possibilities where that isn't really the case.
You are right. The reason I moved the check to inside the loop was so that I could get the TTI from the caller, then I forgot to actually do that.
================
Comment at: lib/Target/PowerPC/PPCTargetMachine.cpp:351
+ if (TM->getOptLevel() != CodeGenOpt::None)
+ addPass(createPPCLowerMemIntrinsicsPass());
+
----------------
nemanjai wrote:
> Would it be possible (or appropriate) to run the loop passes after this since we may have introduced loops? Seems like unrolling and vectorization may be beneficial (at least in some circumstances).
Yes, especially for the known size loop expansion. I initially tested this with the loop expansion being hand unrolled between 2-4 times and it had much better performance in my micro-benches.
================
Comment at: test/CodeGen/PowerPC/memcpy-loop-expansion.ll:13
+; RUN: FileCheck %s --check-prefix OPTNONE
+
+declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture writeonly, i8* nocapture readonly, i64, i32, i1) #0
----------------
nemanjai wrote:
> In general, I think all the functions should include checks for `PWR7`, `OPTSMALL` and `OPTNONE` (providing that the `-Os/-Oz` can be specified on the command line for the tool you're invoking).
I've changed the optsmall checks to be run as part of the OPT test since they didn't really have to be run on there own. The reason the no-opt test is run with llc is because opt didn't respect -O0 when invoked like this. I'll expand the PWR7 checks to the other functions as well.
Repository:
rL LLVM
https://reviews.llvm.org/D31772
More information about the llvm-commits
mailing list