<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""></div><div class=""><br class=""></div><div class="">In this patch we use a more aggressive strategy for choosing the threshold. The idea is that we can go beyond the default threshold if we can significantly optimize the unrolled body, as long as we stay within reasonable limits. Two examples illustrate this:</div><div class="">a) DefaultThreshold=150, LoopSize=50, IterationsNumber=5, UnrolledLoopSize=50*5=250, NumberOfPotentiallyOptimizedInstructions=90</div><div class="">In this case we would get 250 instructions after unrolling, and after inst-simplify+DCE we’d go down to 160 instructions. Though that’s still bigger than DefaultThreshold, I think we do want to unroll here, since it would speed up this part by ~90/250=36%.</div><div class=""><br class=""></div><div class="">b) DefaultThreshold=150, LoopSize=1000, IterationsNumber=1000, UnrolledLoopSize=1000*1000, NumberOfPotentiallyOptimizedInstructions=500. The absolute number of optimized instructions in this example is bigger than in the previous one, but we don’t want to unroll here, because the resulting code would be huge, and we’d only save 500/(1000*1000)=0.05% of the instructions.</div><div class=""><br class=""></div><div class="">To handle both situations, I suggest unrolling only if we can optimize a significant portion of the loop body, e.g. if after unrolling we can remove N% of the code. We might want to have an absolute upper limit for the final code size too (i.e. 
even if we optimize 50% of the instructions, don’t unroll a loop with 10^8 iterations), but for now I decided to add only one new parameter to avoid over-complicating things.</div><div class=""><br class=""></div><div class="">In the patch I use a weight for each optimized instruction (yep, like in the previous patch, but here it’s more meaningful), which is essentially equivalent to the ratio between the original and the removed code size.</div><div class=""><br class=""></div><div class="">My testing is still in progress, and I’ll probably do some tuning later; the current default value (10) was taken almost at random.</div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">Do these patches look better?</div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">Michael</div><div class=""><br class=""></div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Jan 26, 2015, at 11:50 AM, Michael Zolotukhin <<a href="mailto:mzolotukhin@apple.com" class="">mzolotukhin@apple.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><blockquote type="cite" class=""><div class=""><br class="Apple-interchange-newline">On Jan 25, 2015, at 6:06 AM, Hal Finkel <<a href="mailto:hfinkel@anl.gov" class="">hfinkel@anl.gov</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><span class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; 
-webkit-text-stroke-width: 0px; float: none; display: inline !important;">----- Original Message -----</span><br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><blockquote type="cite" class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">From: "Michael Zolotukhin" <<a href="mailto:mzolotukhin@apple.com" class="">mzolotukhin@apple.com</a>><br class="">To: "Hal Finkel" <<a href="mailto:hfinkel@anl.gov" class="">hfinkel@anl.gov</a>><br class="">Cc: "Arnold Schwaighofer" <<a href="mailto:aschwaighofer@apple.com" class="">aschwaighofer@apple.com</a>>, "Commit Messages and Patches for LLVM"<br class=""><<a href="mailto:llvm-commits@cs.uiuc.edu" class="">llvm-commits@cs.uiuc.edu</a>><br class="">Sent: Sunday, January 25, 2015 1:20:18 AM<br class="">Subject: Re: [RFC] Heuristic for complete loop unrolling<br class=""><br class=""><br class="">On Jan 24, 2015, at 12:38 PM, Hal Finkel <<span class="Apple-converted-space"> </span><a href="mailto:hfinkel@anl.gov" class="">hfinkel@anl.gov</a><span class="Apple-converted-space"> </span>> wrote:<br class=""><br class="">----- Original Message -----<br class=""><br class=""><br class="">From: "Michael Zolotukhin" <<span class="Apple-converted-space"> </span><a href="mailto:mzolotukhin@apple.com" class="">mzolotukhin@apple.com</a><span class="Apple-converted-space"> </span>><br class="">To: "Hal Finkel" <<span class="Apple-converted-space"> </span><a href="mailto:hfinkel@anl.gov" 
class="">hfinkel@anl.gov</a><span class="Apple-converted-space"> </span>><br class="">Cc: "Arnold Schwaighofer" <<span class="Apple-converted-space"> </span><a href="mailto:aschwaighofer@apple.com" class="">aschwaighofer@apple.com</a><span class="Apple-converted-space"> </span>>, "Commit<br class="">Messages and Patches for LLVM"<br class=""><<span class="Apple-converted-space"> </span><a href="mailto:llvm-commits@cs.uiuc.edu" class="">llvm-commits@cs.uiuc.edu</a><span class="Apple-converted-space"> </span>><br class="">Sent: Saturday, January 24, 2015 2:26:03 PM<br class="">Subject: Re: [RFC] Heuristic for complete loop unrolling<br class=""><br class=""><br class="">Hi Hal,<br class=""><br class=""><br class="">Thanks for the review! Please see my comments inline.<br class=""><br class=""><br class=""><br class=""><br class="">On Jan 24, 2015, at 6:44 AM, Hal Finkel <<span class="Apple-converted-space"> </span><a href="mailto:hfinkel@anl.gov" class="">hfinkel@anl.gov</a><span class="Apple-converted-space"> </span>> wrote:<br class=""><br class="">[moving patch review to llvm-commits]<br class=""><br class="">+static bool CanEliminateLoadFrom(Value *V) {<br class=""><br class="">+ if (Constant *C = dyn_cast<Constant>(V)) {<br class=""><br class="">+ if (GlobalVariable *GV = dyn_cast<GlobalVariable>(C))<br class=""><br class="">+ if (GV->isConstant() && GV->hasDefinitiveInitializer())<br class=""><br class=""><br class="">Why are you casting to a Constant, and then to a GlobalVariable, and<br class="">then checking GV->isConstant()? 
There seems to be unnecessary<br class="">redundancy here ;)<br class="">Indeed:)<br class=""><br class=""><br class=""><br class=""><br class="">+ return GV->getInitializer();<br class=""><br class="">+ }<br class=""><br class="">+ return false;<br class=""><br class="">+}<br class=""><br class="">+static unsigned ApproximateNumberOfEliminatedInstruction(const Loop<br class="">*L,<br class=""><br class="">+ ScalarEvolution &SE) {<br class=""><br class="">This function seems mis-named. For one thing, it is not counting the<br class="">number of instructions potentially eliminated, but the number of<br class="">loads. Eliminating the loads might lead to further constant<br class="">propagation, and really calculating the number of eliminated<br class="">instructions would require estimating that effect, right?<br class="">That’s right. But I’d rather change the function name than add such<br class="">calculations, since they would look very narrowly targeted, and I<br class="">worry that we might start to lose some cases as well. How about<br class="">‘NumberOfConstantFoldedLoads’?<br class=""><br class=""><br class="">Sounds good to me.<br class=""><br class=""><br class=""><br class=""><br class=""><br class=""><br class=""><br class="">if (TripCount && Count == TripCount) {<br class=""><br class="">- if (Threshold != NoThreshold && UnrolledSize > Threshold) {<br class=""><br class="">+ if (Threshold != NoThreshold && UnrolledSize > Threshold + 20 *<br class="">ElimInsns) {<br class=""><br class=""><br class="">20, huh? Is this a heuristic for constant propagation? It feels like<br class="">we should be able to do better than this.<br class="">Yep, that’s a parameter of the heuristic. 
Counting each ‘constant’<br class="">load as 1 is too conservative and doesn’t give much here,<br class=""><br class="">Understood.<br class=""><br class=""><br class=""><br class="">but since<br class="">we don’t actually count the number of eliminated instructions we need<br class="">some estimate for it. This estimate is really rough, since in some<br class="">cases we can eliminate the entire loop body, while in others we<br class="">can’t eliminate anything.<br class=""><br class="">I think that we might as well estimate it. Once we know the loads<br class="">that unrolling will constant fold, put them into a SmallPtrSet S.<br class="">Then, use a worklist-based iteration of the uses of instructions in<br class="">S. For each use that has operands that are all either constants, or<br class="">in S, queue them and keep going. Make sure you walk breadth first<br class="">(not depth first) so that you capture things with multiple feeder<br class="">loads properly. As you do this, count the number of instructions<br class="">that will be constant folded (keeping a Visited set in the usual way<br class="">so you don't over-count). This will be an estimate, but should be a<br class="">very accurate one, and can be done without any heuristic parameters<br class="">(and you still only visit a linear number of instructions, so should<br class="">not be too expensive).<br class="">The biggest gain comes not from expressions like buf[0]*buf[3], which can<br class="">be constant folded when buf[0] and buf[3] are substituted with<br class="">constants, but from x*buf[0], when buf[0] turns out to be 0 or 1.<br class="">I.e. we don’t constant-fold it, but we simplify the expression based<br class="">on the particular value. I.e. 
the original convolution test is in<br class="">some sense equivalent to the following code:<br class="">const int a[] = {0, 1, 5, 1, 0};<br class="">int *b;<br class="">for(i = 0; i < 100; i++) {<br class="">for(j = 0; j < 5; j++)<br class="">b[i] += b[i+j]*a[j];<br class="">}<br class=""><br class=""><br class="">Complete unrolling will give us:<br class=""><br class="">for(i = 0; i < 100; i++) {<br class="">b[i] += b[i]*a[0];<br class=""><br class="">b[i] += b[i+1]*a[1];<br class=""><br class="">b[i] += b[i+2]*a[2];<br class=""><br class="">b[i] += b[i+3]*a[3];<br class=""><br class="">b[i] += b[i+4]*a[4];<br class="">}<br class=""><br class=""><br class="">After const-prop we’ll get:<br class=""><br class="">for(i = 0; i < 100; i++) {<br class="">b[i] += b[i]*0;<br class=""><br class="">b[i] += b[i+1]*1;<br class=""><br class="">b[i] += b[i+2]*5;<br class=""><br class="">b[i] += b[i+3]*1;<br class=""><br class="">b[i] += b[i+4]*0;<br class="">}<br class="">And after simplification:<br class=""><br class="">for(i = 0; i < 100; i++) {<br class="">b[i] += b[i+1];<br class=""><br class=""><br class="">b[i] += b[i+2]*5;<br class=""><br class="">b[i] += b[i+3];<br class=""><br class="">}<br class=""><br class=""><br class="">As you can see, we actually folded nothing here, but rather we<br class="">simplified more than half of the instructions. 
But it’s hard to<br class="">predict which optimizations will be enabled by the exact values<br class="">that replace the loads<span class="Apple-converted-space"> </span><br class=""></blockquote><br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;">Agreed.</span><br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;">- i.e. 
of course we can check it, but it would be too</span><br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><blockquote type="cite" class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">optimization-specific in my opinion.<span class="Apple-converted-space"> </span><br class=""></blockquote><br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;">I understand your point, but I think using one magic number is swinging the pendulum too far in the other direction.</span><br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><br 
class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><blockquote type="cite" class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">Thus, I decided to use some<br class="">number (20 in the current patch) which represents some average<br class="">profitability of replacing a load with constant. I think I should<br class="">get rid of this magic number though, and replace it with some<br class="">target-specific parameter (that’ll help to address Owen’s<br class="">suggestions).<br class=""></blockquote><br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;">Using a target-specific cost (or costs) is a good idea, but having target-specific magic numbers is going to be a mess. Different loops will unroll, or not, for fairly arbitrary reasons, on different targets. 
This is a heuristic, that I understand, and you won't be able to exactly predict all possible later simplifications enabled by the unrolling. However, the fact that you currently need a large per-instruction boost factor, like 20, I think, means that the model is too coarse.</span><br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;">Maybe it would not be unreasonable to start from the other side: If we first located loads that could be constant folded, and determined the values those loads would actually take, and then simplified the loop instructions based on those values, we'd get a pretty good estimate. This is essentially what the code in lib/Analysis/IPA/InlineCost.cpp does when computing the inlining cost savings, and I think we could use the same technique here. 
Admittedly, the inline cost analysis is relatively expensive, but I'd think we could impose a reasonable size cutoff to limit the overall expense for the case of full unrolling -- for this, maybe the boost factor of 20 is appropriate -- and would also address Owen's point, as far as I can tell, because like the inline cost analysis, we can use TTI to compute the target-specific cost of GEPs, etc. Ideally, we could refactor the ICA a little bit, and re-use the code in there directly for this purpose. This way we can limit the number of places in which we compute similar kinds of heuristic simplification costs.</span><br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"></div></blockquote>Hi Hal,</div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br class=""></div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">Thanks for the comments. I think that’s a good idea, and I’ll try that out. If that doesn’t lead to over-complicated implementation, I’d like it much better than having a magic number. 
I’ll prepare a new patch soon.</div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br class=""></div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">Thanks,</div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">Michael</div><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br class=""><blockquote type="cite" class=""><div class=""><br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; 
text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;">-Hal</span><br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><blockquote type="cite" class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><br class=""><br class="">Thanks,<br class="">Michael<br class=""><br class=""><br class=""><br class=""><br class=""><br class="">-Hal<br class=""><br class=""><br class=""><br class=""><br class=""><br class="">I’d really appreciate a feedback on how to model this in a better<br class="">way.<br class=""><br class=""><br class="">Thanks,<br class="">Michael<br class=""><br class=""><br class=""><br class=""><br class="">Thanks again,<br class="">Hal<br class=""><br class="">----- Original Message -----<br class=""><br class=""><br class="">From: "Michael Zolotukhin" <<span class="Apple-converted-space"> </span><a href="mailto:mzolotukhin@apple.com" class="">mzolotukhin@apple.com</a><span class="Apple-converted-space"> </span>><br class="">To: "LLVM Developers Mailing List (<span 
class="Apple-converted-space"> </span><a href="mailto:llvmdev@cs.uiuc.edu" class="">llvmdev@cs.uiuc.edu</a><span class="Apple-converted-space"> </span>)" <<br class=""><a href="mailto:llvmdev@cs.uiuc.edu" class="">llvmdev@cs.uiuc.edu</a><span class="Apple-converted-space"> </span>><br class="">Cc: "Hal J. Finkel" <<span class="Apple-converted-space"> </span><a href="mailto:hfinkel@anl.gov" class="">hfinkel@anl.gov</a><span class="Apple-converted-space"> </span>>, "Arnold Schwaighofer" <<br class=""><a href="mailto:aschwaighofer@apple.com" class="">aschwaighofer@apple.com</a><span class="Apple-converted-space"> </span>><br class="">Sent: Friday, January 23, 2015 2:05:11 PM<br class="">Subject: [RFC] Heuristic for complete loop unrolling<br class=""><br class=""><br class=""><br class="">Hi devs,<br class=""><br class="">Recently I came across an interesting testcase that LLVM failed to<br class="">optimize well. The test does some image processing, and as a part of<br class="">it, it traverses all the pixels and computes some value based on<br class="">the adjacent pixels. So, the hot part looks like this:<br class=""><br class="">for(y = 0..height) {<br class="">for (x = 0..width) {<br class="">val = 0<br class="">for (j = 0..5) {<br class="">for (i = 0..5) {<br class="">val += img[x+i,y+j] * weight[i,j]<br class="">}<br class="">}<br class="">}<br class="">}<br class=""><br class="">And ‘weight’ is just a constant matrix with some coefficients.<br class=""><br class="">If we unroll the two inner loops (with trip count 5), then we can<br class="">replace weight[i,j] with concrete constant values. In this<br class="">particular case, many of the coefficients are actually 0 or 1, which<br class="">enables huge code simplifications later on. 
But currently we unroll<br class="">only the innermost one, because unrolling both of them would exceed<br class="">the threshold.<br class=""><br class="">When deciding whether to unroll or not, we currently look only at the<br class="">instruction count of the loop. My proposal is to, on top of that,<br class="">check if we can enable any later optimizations by unrolling - in<br class="">this case by replacing a load with a constant. Similar to what we do<br class="">in inlining heuristics, we can estimate how many instructions would<br class="">potentially be eliminated after unrolling and adjust our threshold<br class="">by this value.<br class=""><br class="">I can imagine that it might also be useful for computations<br class="">involving sparse constant matrices (like the identity matrix).<br class=""><br class="">The attached patch implements this feature, and with it we handle the<br class="">original testcase well.<br class=""><br class=""><br class=""><br class=""><br class=""><br class="">Does it look good? 
Of course, any ideas, suggestions and other<br class="">feedback are welcome!<br class=""><br class=""><br class="">Thanks,<br class="">Michael<br class=""><br class="">--<br class="">Hal Finkel<br class="">Assistant Computational Scientist<br class="">Leadership Computing Facility<br class="">Argonne National Laboratory<br class=""><br class=""><br class="">--<br class="">Hal Finkel<br class="">Assistant Computational Scientist<br class="">Leadership Computing Facility<br class="">Argonne National Laboratory<br class=""><br class=""></blockquote><br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;">--<span class="Apple-converted-space"> </span></span><br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline 
!important;">Hal Finkel</span><br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;">Assistant Computational Scientist</span><br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;">Leadership Computing Facility</span><br class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><span class="" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; 
line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;">Argonne National Laboratory</span></div></blockquote></div></div></blockquote></div><br class=""></div></body></html>