<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Feb 4, 2015, at 5:42 PM, Hal Finkel <<a href="mailto:hfinkel@anl.gov" class="">hfinkel@anl.gov</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">----- Original Message -----</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class="">From: "Michael Zolotukhin" <<a href="mailto:mzolotukhin@apple.com" class="">mzolotukhin@apple.com</a>><br class="">To: "Hal J. Finkel" <<a href="mailto:hfinkel@anl.gov" class="">hfinkel@anl.gov</a>><br class="">Cc: "Commit Messages and Patches for LLVM" <<a href="mailto:llvm-commits@cs.uiuc.edu" class="">llvm-commits@cs.uiuc.edu</a>><br class="">Sent: Wednesday, February 4, 2015 1:29:34 PM<br class="">Subject: Re: [RFC] Heuristic for complete loop unrolling<br class=""><br class="">On Feb 3, 2015, at 7:28 PM, Hal Finkel < <a href="mailto:hfinkel@anl.gov" class="">hfinkel@anl.gov</a> > wrote:<br class=""><br class="">----- Original Message -----<br class=""><br class=""><br class="">From: "Michael Zolotukhin" < <a href="mailto:mzolotukhin@apple.com" class="">mzolotukhin@apple.com</a> ><br class="">To: "Hal J. Finkel" < <a href="mailto:hfinkel@anl.gov" class="">hfinkel@anl.gov</a> ><br class="">Cc: "Commit Messages and Patches for LLVM" < <a href="mailto:llvm-commits@cs.uiuc.edu" class="">llvm-commits@cs.uiuc.edu</a><br class=""><blockquote type="cite" class=""><br class=""></blockquote>Sent: Tuesday, February 3, 2015 7:28:48 PM<br class="">Subject: Re: [RFC] Heuristic for complete loop unrolling<br class=""><br class=""><br class="">Hi Hal,<br class=""><br class=""><br class="">These are updated versions of the patches:<br class=""><br class="">I kept visiting only binary operators for now, but added a ‘TODO’<br class="">there. I’ll address it in later patches, since it looks like 1) it<br class="">can be done incrementally if we just add insn-visitors one by one,<br class="">2) if we decide to reuse code from inlining (I prefer this option),<br class="">then It’ll be an independent and probably big effort for refactoring<br class="">this code first.<br class=""><br class=""><br class="">As for the grammar fixes, everything should be fixed now, thank you<br class="">for pointing me to them! Also, I feel like variables and options<br class="">names that I used are also not ideal - if you have better ideas for<br class="">them, please let me know.<br class=""><br class=""><br class="">Is it ok to commit the first patch?<br class=""><br class="">Yes, but a few things to address first:<br class=""><br class="">1. There are a couple of places where you have dyn_cast that should<br class="">always succeed:<br class=""><br class="">+ NumberOfOptimizedInstructions +=<br class=""><br class="">+ TTI.getUserCost(dyn_cast<Instruction>(&I));<br class=""><br class="">+ LoadInst *LI = LoadDescr.first;<br class=""><br class="">...<br class="">+ NumberOfOptimizedInstructions +=<br class=""><br class="">+ TTI.getUserCost(dyn_cast<Instruction>(LI));<br class=""><br class=""><br class="">+ NumberOfOptimizedInstructions +=<br class=""><br class="">+ TTI.getUserCost(dyn_cast<Instruction>(I));<br class=""><br class="">If you need a cast in these place at all (you might for the<br class="">iterators, but for the LoadInst * you shouldn't), use a cast<>, not<br class="">a dyn_cast<>.<br class="">Thanks, fixed! Updated patch attached.<br class=""><br class=""><br class=""><br class=""><br class="">2. In EstimateNumberOfSimplifiedInsns(unsigned Iteration) we have:<br class=""><br class="">+ while (!Worklist.empty()) {<br class=""><br class="">+ Instruction *I = Worklist.pop_back_val();<br class=""><br class="">+ if (!visit(I))<br class=""><br class="">+ continue;<br class=""><br class="">+ for (auto U : I->users()) {<br class=""><br class="">+ Instruction *UI = dyn_cast<Instruction>(U);<br class=""><br class="">+ if (!UI)<br class=""><br class="">+ continue;<br class=""><br class="">+ if (!L->contains(UI))<br class=""><br class="">+ continue;<br class=""><br class="">+ Worklist.push_back(UI);<br class=""><br class="">+ }<br class=""><br class="">+ }<br class=""><br class="">Worklist is a SmallVector, and so I think you might potentially visit<br class="">the same users more than once. For example, if you have two loads<br class="">that can be turned into constants, and then you add the results<br class="">together, the add will be visited twice. One possible solution is to<br class="">keep a Visited set, and explicitly avoid visiting the same user<br class="">twice. To do this, you'll need to make sure that you visit the user<br class="">graph breadth first (by which I mean such that you know you've<br class="">visited all relevant operands of an instructions before visiting the<br class="">instruction itself), not depth first as you currently do.<br class="">That’s true, we can visit some instructions twice, but I don't think<br class="">we can easily avoid that. Breadth first search wouldn’t solve this<br class="">problem, e.g. in the following case:<br class="">(1) %a = load 1<br class="">(2) %b = load 2<br class="">(3) %use_a1 = add %a, 5<br class="">(4) %use_a2 = sub %use_a1, 6<br class="">(5) %common_use = mul %use_a2, %b<br class="">Starting from loads (1) and (2), BFS will visit (3) and (5) first,<br class="">and only then (4). Thus, it will visit (5) before one of its<br class="">operands - (4).<br class=""><br class=""><br class="">We can try to use topological order here, but in this case we would<br class="">visit all users - currently we only visit those that have at least<br class="">one operand simplified. I’m not sure what’s the right call here. I<br class="">think that in real cases the current algorithm should be fine,<br class="">though asymptotically it’s worse, than using topological order. To<br class="">play safe, we can add a limitation, so that we’ll give up if number<br class="">of performed iterations is too high.<br class=""><br class=""><br class="">What do you think?<br class=""><br class=""></blockquote><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">I think there is a simpler solution: Keep as Visited set, but don't use it to directly prune the search. Since we only really care about the number of simplified instructions, add simplified instructions to the Visited set, and if the instruction is already in the set, don't increment NumberOfOptimizedInstructions. So we'll visit multiple times (potentially), but we won't over-count. Do you think that will work?</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""></div></blockquote>Yes, I think that would work. Here is a new patch:</div><div></div></body></html>