[PATCH] Reimplement heuristic for estimating complete-unroll optimization effects.
Chandler Carruth
chandlerc at gmail.com
Mon Feb 23 18:44:56 PST 2015
Initial comments below. I'd also like to see some compile-time impact measurements, but maybe wait to settle the design a bit first.
================
Comment at: lib/Transforms/Scalar/LoopUnrollPass.cpp:488-492
@@ +487,7 @@
+ EarlyExitFound = false;
+ // Estimate number of instructions, that could be simplified if we replace a
+ // load with the corresponding constant. Since the same load will take
+ // different values on different iterations, we have to go through all
+ // loop's
+ // iterations here.
+ for (Iteration = 0; Iteration < TripCount; ++Iteration) {
----------------
This comment seems out of date.
================
Comment at: lib/Transforms/Scalar/LoopUnrollPass.cpp:494-495
@@ +493,4 @@
+ for (Iteration = 0; Iteration < TripCount; ++Iteration) {
+ if (EarlyExitFound)
+ break;
+
----------------
There is essentially no documentation of "EarlyExitFound" or of why this behavior is correct.
Generally, I'm worried about all the 'break's and loop control flow here. I think this is going to be a bit easier to understand if our early-stop points just return.
================
Comment at: lib/Transforms/Scalar/LoopUnrollPass.cpp:497
@@ +496,3 @@
+
+ BBSetVector BBWorklist;
+ BBWorklist.insert(L->getHeader());
----------------
Rather than declaring this in the iteration loop, please declare it in simulateLoop and clear it on each iteration so that we re-use allocated storage. We're *very* likely to put roughly the same number of BBs on the worklist each trip through.
================
Comment at: lib/Transforms/Scalar/LoopUnrollPass.cpp:505-512
@@ +504,10 @@
+
+ analyzeBlock(BB);
+
+ // If unrolled body turns out to be too big, bail out.
+ if (UnrolledLoopSize - NumberOfOptimizedInstructions >
+ MaxUnrolledLoopSize)
+ return;
+
+ addBBSuccessors(BB, &BBWorklist);
+ }
----------------
I think it would be clearer to inline analyzeBlock and addBBSuccessors here... At the least, inlining addBBSuccessors avoids passing the worklist around.
================
Comment at: lib/Transforms/Scalar/LoopUnrollPass.cpp:515-519
@@ -469,10 +514,7 @@
- // We start by adding all loads to the worklist.
- for (auto &LoadDescr : LoadBaseAddresses) {
- LoadInst *LI = LoadDescr.first;
- SimplifiedValues[LI] = computeLoadValue(LI, Iteration);
- if (CountedInstructions.insert(LI).second)
- NumberOfOptimizedInstructions += TTI.getUserCost(LI);
-
- for (User *U : LI->users())
- Worklist.insert(cast<Instruction>(U));
+ // If we found no optimization opportunities on the first iteration, we
+ // won't find them on later ones too.
+ if (!NumberOfOptimizedInstructions) {
+ UnrolledLoopSize = UINT_MAX;
+ break;
+ }
----------------
Why not return?
================
Comment at: lib/Transforms/Scalar/LoopUnrollPass.cpp:524-531
@@ -480,13 +523,10 @@
- // And then we try to simplify every user of every instruction from the
- // worklist. If we do simplify a user, add it to the worklist to process
- // its users as well.
- while (!Worklist.empty()) {
- Instruction *I = Worklist.pop_back_val();
- if (!L->contains(I))
- continue;
- if (!visit(I))
- continue;
- for (User *U : I->users())
- Worklist.insert(cast<Instruction>(U));
- }
+ // Visit all GEPs in the loop L, and find those that after complete loop
+ // unrolling would become constants, or BaseAddress+Constant. For this to
+ // happen, a GEPs SCEV expression should contain at most one AddRec
+ // expression, and the loop corresponding to this expression should be L. The
+ // rest SCEV sub-expressions should be either constants, or ScevUnknown
+ // (which would become the base address).
+ // If the expression contains the base address, then after subtracting it, we
+ // should get AddRec with constant step and start.
+ void cacheSCEVResults() {
----------------
I find it a bit surprising to do this caching up-front. I would expect it to work better to lazily populate the cache based on the expressions we see.
If that makes sense, I think it also makes sense to factor this into a completely separate change that just wraps up SCEV queries in a lazily populated cache, specifically for the purpose of simulation? Might make sense to fully sink this into SCEV, I don't know and would defer to others that know SCEV better.
http://reviews.llvm.org/D7837
More information about the llvm-commits mailing list