<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">I can commit the change. I have a workspace set up with just that patch in it.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">-Andy<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> Karthik Bhat [mailto:blitz.opensource@gmail.com]
<br>
<b>Sent:</b> Friday, April 24, 2015 9:38 AM<br>
<b>To:</b> Aaron Ballman<br>
<b>Cc:</b> Karthik Bhat; llvm-commits; Kaylor, Andrew<br>
<b>Subject:</b> Re: [llvm] r231458 - Add a new pass "Loop Interchange"<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">Hi Aaron,<o:p></o:p></p>
<div>
<p class="MsoNormal">I think we can use <span style="color:black">removeChildLoop, since we want to move the sub-loops of the inner loop into the outer loop after the loops are interchanged. LoopSimplify and LoopInfo currently use this API.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="color:black">But I agree that this part of the code was badly written and was bound to fail. Unfortunately, we were not able to catch it during our testing.
</span><o:p></o:p></p>
</div>
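<div>
<p class="MsoNormal">As a rough illustration of the re-parenting described above, here is a minimal sketch on a toy loop tree. It is not LLVM's Loop or LoopInfo code: ToyLoop, addChild, removeChild and moveSubLoops are hypothetical names used only for this example, and the real removeChildLoop API may behave differently.<o:p></o:p></p>
</div>

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a loop-tree node; this is NOT LLVM's Loop class.
struct ToyLoop {
  std::string Name;
  ToyLoop *Parent = nullptr;
  std::vector<ToyLoop *> SubLoops;

  explicit ToyLoop(std::string N) : Name(std::move(N)) {}

  // Attach Child as a direct sub-loop of this loop.
  void addChild(ToyLoop *Child) {
    assert(Child->Parent == nullptr && "child already has a parent");
    Child->Parent = this;
    SubLoops.push_back(Child);
  }

  // Detach Child from this loop. Returns nullptr (and changes nothing)
  // if Child is not a direct sub-loop, instead of failing later.
  ToyLoop *removeChild(ToyLoop *Child) {
    auto It = std::find(SubLoops.begin(), SubLoops.end(), Child);
    if (It == SubLoops.end())
      return nullptr;
    SubLoops.erase(It);
    Child->Parent = nullptr;
    return Child;
  }
};

// Move every direct sub-loop of From under To. From and To must differ.
void moveSubLoops(ToyLoop &From, ToyLoop &To) {
  assert(&From != &To && "cannot move sub-loops onto the same loop");
  while (!From.SubLoops.empty()) {
    ToyLoop *Child = From.removeChild(From.SubLoops.back());
    To.addChild(Child);
  }
}
```

<div>
<p class="MsoNormal">In this sketch, calling moveSubLoops(Inner, Outer) leaves Inner with no sub-loops and makes each of its former sub-loops a direct child of Outer. Checking that the child is really present before erasing it is the design choice that matters here.<o:p></o:p></p>
</div>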
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">I think Andrew Kaylor's patch, with the modification, should be good. I tested the patch this morning and it seems to work fine. <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Unfortunately, I don't have access to a workspace right now to commit the patch :( I am setting one up.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Sorry once again for the noise.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Thanks and Regards<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Karthik Bhat<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">On Fri, Apr 24, 2015 at 9:47 PM, Aaron Ballman <<a href="mailto:aaron@aaronballman.com" target="_blank">aaron@aaronballman.com</a>> wrote:<o:p></o:p></p>
<p>Ah, I hadn't noticed that patch! Yes, it looks good to me to commit it, but we may still want to consider whether the removal API is the best approach.
<o:p></o:p></p>
<p>-Aaron<o:p></o:p></p>
<div>
<p class="MsoNormal">On Apr 24, 2015 11:53 AM, "Karthik Bhat" <<a href="mailto:blitz.opensource@gmail.com" target="_blank">blitz.opensource@gmail.com</a>> wrote:<o:p></o:p></p>
<div>
<p>Hi Aaron,<br>
Sorry for the delay..<o:p></o:p></p>
<p>Kaylor has proposed a fix which looks good to me. I have added my comments on it. I think it will be submitted shortly.<o:p></o:p></p>
<p><a href="http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150420/273084.html" target="_blank">http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150420/273084.html</a><o:p></o:p></p>
<p>Shall I go ahead and make those changes?<o:p></o:p></p>
<p>Thanks and Regards<o:p></o:p></p>
<p>Karthik Bhat<o:p></o:p></p>
<p><o:p> </o:p></p>
<p><o:p> </o:p></p>
<div>
<p class="MsoNormal">On 24-Apr-2015 8:25 pm, "Aaron Ballman" <<a href="mailto:aaron@aaronballman.com" target="_blank">aaron@aaronballman.com</a>> wrote:<o:p></o:p></p>
<p class="MsoNormal">I am still seeing these failures on Windows; can you please address<br>
them (I suspect reversion would be a horribly painful option I would<br>
like to avoid).<br>
<br>
Thanks!<br>
<span style="color:#888888"><br>
~Aaron</span><o:p></o:p></p>
<div>
<p class="MsoNormal"><br>
On Thu, Apr 23, 2015 at 11:07 AM, Aaron Ballman <<a href="mailto:aaron@aaronballman.com" target="_blank">aaron@aaronballman.com</a>> wrote:<br>
> On Fri, Mar 6, 2015 at 5:11 AM, Karthik Bhat <<a href="mailto:kv.bhat@samsung.com" target="_blank">kv.bhat@samsung.com</a>> wrote:<br>
>> Author: karthik<br>
>> Date: Fri Mar 6 04:11:25 2015<br>
>> New Revision: 231458<br>
>><br>
>> URL: <a href="http://llvm.org/viewvc/llvm-project?rev=231458&view=rev" target="_blank">
http://llvm.org/viewvc/llvm-project?rev=231458&view=rev</a><br>
>> Log:<br>
>> Add a new pass "Loop Interchange"<br>
>> This pass interchanges loops to provide a more cache-friendly memory access.<br>
>><br>
>> For example, given a loop like -<br>
>> for(int i=0;i<N;i++)<br>
>> for(int j=0;j<N;j++)<br>
>> A[j][i] = A[j][i]+B[j][i];<br>
>><br>
>> is interchanged to -<br>
>> for(int j=0;j<N;j++)<br>
>> for(int i=0;i<N;i++)<br>
>> A[j][i] = A[j][i]+B[j][i];<br>
>><br>
>> This pass is currently disabled by default.<br>
>><br>
>> To give a brief introduction it consists of 3 stages-<br>
>><br>
>> LoopInterchangeLegality : Checks the legality of loop interchange based on Dependency matrix.<br>
>> LoopInterchangeProfitability: A very basic heuristic has been added to check for profitability. This will evolve over time.<br>
>> LoopInterchangeTransform : Which does the actual transform.<br>
>><br>
>> LNT performance tests show improvement in the Polybench/linear-algebra/kernels/mvt and Polybench/linear-algebra/kernels/gemver benchmarks.<br>
>><br>
>> TODO:<br>
>> 1) Add support for reductions and lcssa phi.<br>
>> 2) Improve profitability model.<br>
>> 3) Improve loop selection algorithm to select best loop for interchange. Currently the innermost loop is selected for interchange.<br>
>> 4) Improve compile time regression found in llvm lnt due to this pass.<br>
>> 5) Fix issues in Dependency Analysis module.<br>
>><br>
>> A special thanks to Hal for reviewing this code.<br>
>> Review: <a href="http://reviews.llvm.org/D7499" target="_blank">http://reviews.llvm.org/D7499</a><br>
>><br>
>><br>
>><br>
>> Added:<br>
>> llvm/trunk/lib/Transforms/Scalar/LoopInterchange.cpp<br>
>> llvm/trunk/test/Transforms/LoopInterchange/<br>
>> llvm/trunk/test/Transforms/LoopInterchange/currentLimitation.ll<br>
>> llvm/trunk/test/Transforms/LoopInterchange/interchange.ll<br>
>> llvm/trunk/test/Transforms/LoopInterchange/profitability.ll<br>
>> Modified:<br>
>> llvm/trunk/include/llvm/InitializePasses.h<br>
>> llvm/trunk/include/llvm/LinkAllPasses.h<br>
>> llvm/trunk/include/llvm/Transforms/Scalar.h<br>
>> llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp<br>
>> llvm/trunk/lib/Transforms/Scalar/CMakeLists.txt<br>
>> llvm/trunk/lib/Transforms/Scalar/Scalar.cpp<br>
>><br>
>> Modified: llvm/trunk/include/llvm/InitializePasses.h<br>
>> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/InitializePasses.h?rev=231458&r1=231457&r2=231458&view=diff" target="_blank">
http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/InitializePasses.h?rev=231458&r1=231457&r2=231458&view=diff</a><br>
>> ==============================================================================<br>
>> --- llvm/trunk/include/llvm/InitializePasses.h (original)<br>
>> +++ llvm/trunk/include/llvm/InitializePasses.h Fri Mar 6 04:11:25 2015<br>
>> @@ -166,6 +166,7 @@ void initializeLocalStackSlotPassPass(Pa<br>
>> void initializeLoopDeletionPass(PassRegistry&);<br>
>> void initializeLoopExtractorPass(PassRegistry&);<br>
>> void initializeLoopInfoWrapperPassPass(PassRegistry&);<br>
>> +void initializeLoopInterchangePass(PassRegistry &);<br>
>> void initializeLoopInstSimplifyPass(PassRegistry&);<br>
>> void initializeLoopRotatePass(PassRegistry&);<br>
>> void initializeLoopSimplifyPass(PassRegistry&);<br>
>><br>
>> Modified: llvm/trunk/include/llvm/LinkAllPasses.h<br>
>> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/LinkAllPasses.h?rev=231458&r1=231457&r2=231458&view=diff" target="_blank">
http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/LinkAllPasses.h?rev=231458&r1=231457&r2=231458&view=diff</a><br>
>> ==============================================================================<br>
>> --- llvm/trunk/include/llvm/LinkAllPasses.h (original)<br>
>> +++ llvm/trunk/include/llvm/LinkAllPasses.h Fri Mar 6 04:11:25 2015<br>
>> @@ -95,6 +95,7 @@ namespace {<br>
>> (void) llvm::createLICMPass();<br>
>> (void) llvm::createLazyValueInfoPass();<br>
>> (void) llvm::createLoopExtractorPass();<br>
>> + (void)llvm::createLoopInterchangePass();<br>
>> (void) llvm::createLoopSimplifyPass();<br>
>> (void) llvm::createLoopStrengthReducePass();<br>
>> (void) llvm::createLoopRerollPass();<br>
>><br>
>> Modified: llvm/trunk/include/llvm/Transforms/Scalar.h<br>
>> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Transforms/Scalar.h?rev=231458&r1=231457&r2=231458&view=diff" target="_blank">
http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Transforms/Scalar.h?rev=231458&r1=231457&r2=231458&view=diff</a><br>
>> ==============================================================================<br>
>> --- llvm/trunk/include/llvm/Transforms/Scalar.h (original)<br>
>> +++ llvm/trunk/include/llvm/Transforms/Scalar.h Fri Mar 6 04:11:25 2015<br>
>> @@ -140,6 +140,13 @@ Pass *createLICMPass();<br>
>><br>
>> //===----------------------------------------------------------------------===//<br>
>> //<br>
>> +// LoopInterchange - This pass interchanges loops to provide a more<br>
>> +// cache-friendly memory access patterns.<br>
>> +//<br>
>> +Pass *createLoopInterchangePass();<br>
>> +<br>
>> +//===----------------------------------------------------------------------===//<br>
>> +//<br>
>> // LoopStrengthReduce - This pass is strength reduces GEP instructions that use<br>
>> // a loop's canonical induction variable as one of their indices.<br>
>> //<br>
>><br>
>> Modified: llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp<br>
>> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp?rev=231458&r1=231457&r2=231458&view=diff" target="_blank">
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp?rev=231458&r1=231457&r2=231458&view=diff</a><br>
>> ==============================================================================<br>
>> --- llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp (original)<br>
>> +++ llvm/trunk/lib/Transforms/IPO/PassManagerBuilder.cpp Fri Mar 6 04:11:25 2015<br>
>> @@ -77,6 +77,10 @@ static cl::opt<bool><br>
>> EnableMLSM("mlsm", cl::init(true), cl::Hidden,<br>
>> cl::desc("Enable motion of merged load and store"));<br>
>><br>
>> +static cl::opt<bool> EnableLoopInterchange(<br>
>> + "enable-loopinterchange", cl::init(false), cl::Hidden,<br>
>> + cl::desc("Enable the new, experimental LoopInterchange Pass"));<br>
>> +<br>
>> PassManagerBuilder::PassManagerBuilder() {<br>
>> OptLevel = 2;<br>
>> SizeLevel = 0;<br>
>> @@ -239,6 +243,8 @@ void PassManagerBuilder::populateModuleP<br>
>> MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars<br>
>> MPM.add(createLoopIdiomPass()); // Recognize idioms like memset.<br>
>> MPM.add(createLoopDeletionPass()); // Delete dead loops<br>
>> + if (EnableLoopInterchange)<br>
>> + MPM.add(createLoopInterchangePass()); // Interchange loops<br>
>><br>
>> if (!DisableUnrollLoops)<br>
>> MPM.add(createSimpleLoopUnrollPass()); // Unroll small loops<br>
>> @@ -454,6 +460,9 @@ void PassManagerBuilder::addLTOOptimizat<br>
>> // More loops are countable; try to optimize them.<br>
>> PM.add(createIndVarSimplifyPass());<br>
>> PM.add(createLoopDeletionPass());<br>
>> + if (EnableLoopInterchange)<br>
>> + PM.add(createLoopInterchangePass());<br>
>> +<br>
>> PM.add(createLoopVectorizePass(true, LoopVectorize));<br>
>><br>
>> // More scalar chains could be vectorized due to more alias information<br>
>><br>
>> Modified: llvm/trunk/lib/Transforms/Scalar/CMakeLists.txt<br>
>> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/CMakeLists.txt?rev=231458&r1=231457&r2=231458&view=diff" target="_blank">
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/CMakeLists.txt?rev=231458&r1=231457&r2=231458&view=diff</a><br>
>> ==============================================================================<br>
>> --- llvm/trunk/lib/Transforms/Scalar/CMakeLists.txt (original)<br>
>> +++ llvm/trunk/lib/Transforms/Scalar/CMakeLists.txt Fri Mar 6 04:11:25 2015<br>
>> @@ -18,6 +18,7 @@ add_llvm_library(LLVMScalarOpts<br>
>> LoopDeletion.cpp<br>
>> LoopIdiomRecognize.cpp<br>
>> LoopInstSimplify.cpp<br>
>> + LoopInterchange.cpp<br>
>> LoopRerollPass.cpp<br>
>> LoopRotation.cpp<br>
>> LoopStrengthReduce.cpp<br>
>><br>
>> Added: llvm/trunk/lib/Transforms/Scalar/LoopInterchange.cpp<br>
>> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/LoopInterchange.cpp?rev=231458&view=auto" target="_blank">
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/LoopInterchange.cpp?rev=231458&view=auto</a><br>
>> ==============================================================================<br>
>> --- llvm/trunk/lib/Transforms/Scalar/LoopInterchange.cpp (added)<br>
>> +++ llvm/trunk/lib/Transforms/Scalar/LoopInterchange.cpp Fri Mar 6 04:11:25 2015<br>
>> @@ -0,0 +1,1193 @@<br>
>> +//===- LoopInterchange.cpp - Loop interchange pass------------------------===//<br>
>> +//<br>
>> +// The LLVM Compiler Infrastructure<br>
>> +//<br>
>> +// This file is distributed under the University of Illinois Open Source<br>
>> +// License. See LICENSE.TXT for details.<br>
>> +//<br>
>> +//===----------------------------------------------------------------------===//<br>
>> +//<br>
>> +// This Pass handles loop interchange transform.<br>
>> +// This pass interchanges loops to provide a more cache-friendly memory access<br>
>> +// patterns.<br>
>> +//<br>
>> +//===----------------------------------------------------------------------===//<br>
>> +<br>
>> +#include "llvm/ADT/SmallVector.h"<br>
>> +#include "llvm/Analysis/AliasAnalysis.h"<br>
>> +#include "llvm/Analysis/AliasSetTracker.h"<br>
>> +#include "llvm/Analysis/AssumptionCache.h"<br>
>> +#include "llvm/Analysis/BlockFrequencyInfo.h"<br>
>> +#include "llvm/Analysis/CodeMetrics.h"<br>
>> +#include "llvm/Analysis/DependenceAnalysis.h"<br>
>> +#include "llvm/Analysis/LoopInfo.h"<br>
>> +#include "llvm/Analysis/LoopIterator.h"<br>
>> +#include "llvm/Analysis/LoopPass.h"<br>
>> +#include "llvm/Analysis/ScalarEvolution.h"<br>
>> +#include "llvm/Analysis/ScalarEvolutionExpander.h"<br>
>> +#include "llvm/Analysis/ScalarEvolutionExpressions.h"<br>
>> +#include "llvm/Analysis/TargetTransformInfo.h"<br>
>> +#include "llvm/Analysis/ValueTracking.h"<br>
>> +#include "llvm/Transforms/Scalar.h"<br>
>> +#include "llvm/IR/Function.h"<br>
>> +#include "llvm/IR/IRBuilder.h"<br>
>> +#include "llvm/IR/IntrinsicInst.h"<br>
>> +#include "llvm/IR/InstIterator.h"<br>
>> +#include "llvm/IR/Dominators.h"<br>
>> +#include "llvm/Pass.h"<br>
>> +#include "llvm/Support/Debug.h"<br>
>> +#include "llvm/Transforms/Utils/SSAUpdater.h"<br>
>> +#include "llvm/Support/raw_ostream.h"<br>
>> +#include "llvm/Transforms/Utils/LoopUtils.h"<br>
>> +#include "llvm/Transforms/Utils/BasicBlockUtils.h"<br>
>> +using namespace llvm;<br>
>> +<br>
>> +#define DEBUG_TYPE "loop-interchange"<br>
>> +<br>
>> +namespace {<br>
>> +<br>
>> +typedef SmallVector<Loop *, 8> LoopVector;<br>
>> +<br>
>> +// TODO: Check if we can use a sparse matrix here.<br>
>> +typedef std::vector<std::vector<char>> CharMatrix;<br>
>> +<br>
>> +// Maximum number of dependencies that can be handled in the dependency matrix.<br>
>> +static const unsigned MaxMemInstrCount = 100;<br>
>> +<br>
>> +// Maximum loop depth supported.<br>
>> +static const unsigned MaxLoopNestDepth = 10;<br>
>> +<br>
>> +struct LoopInterchange;<br>
>> +<br>
>> +#ifdef DUMP_DEP_MATRICIES<br>
>> +void printDepMatrix(CharMatrix &DepMatrix) {<br>
>> + for (auto I = DepMatrix.begin(), E = DepMatrix.end(); I != E; ++I) {<br>
>> + std::vector<char> Vec = *I;<br>
>> + for (auto II = Vec.begin(), EE = Vec.end(); II != EE; ++II)<br>
>> + DEBUG(dbgs() << *II << " ");<br>
>> + DEBUG(dbgs() << "\n");<br>
>> + }<br>
>> +}<br>
>> +#endif<br>
>> +<br>
>> +bool populateDependencyMatrix(CharMatrix &DepMatrix, unsigned Level, Loop *L,<br>
>> + DependenceAnalysis *DA) {<br>
>> + typedef SmallVector<Value *, 16> ValueVector;<br>
>> + ValueVector MemInstr;<br>
>> +<br>
>> + if (Level > MaxLoopNestDepth) {<br>
>> + DEBUG(dbgs() << "Cannot handle loops of depth greater than "<br>
>> + << MaxLoopNestDepth << "\n");<br>
>> + return false;<br>
>> + }<br>
>> +<br>
>> + // For each block.<br>
>> + for (Loop::block_iterator BB = L->block_begin(), BE = L->block_end();<br>
>> + BB != BE; ++BB) {<br>
>> + // Scan the BB and collect legal loads and stores.<br>
>> + for (BasicBlock::iterator I = (*BB)->begin(), E = (*BB)->end(); I != E;<br>
>> + ++I) {<br>
>> + Instruction *Ins = dyn_cast<Instruction>(I);<br>
>> + if (!Ins)<br>
>> + return false;<br>
>> + LoadInst *Ld = dyn_cast<LoadInst>(I);<br>
>> + StoreInst *St = dyn_cast<StoreInst>(I);<br>
>> + if (!St && !Ld)<br>
>> + continue;<br>
>> + if (Ld && !Ld->isSimple())<br>
>> + return false;<br>
>> + if (St && !St->isSimple())<br>
>> + return false;<br>
>> + MemInstr.push_back(I);<br>
>> + }<br>
>> + }<br>
>> +<br>
>> + DEBUG(dbgs() << "Found " << MemInstr.size()<br>
>> + << " Loads and Stores to analyze\n");<br>
>> +<br>
>> + ValueVector::iterator I, IE, J, JE;<br>
>> +<br>
>> + for (I = MemInstr.begin(), IE = MemInstr.end(); I != IE; ++I) {<br>
>> + for (J = I, JE = MemInstr.end(); J != JE; ++J) {<br>
>> + std::vector<char> Dep;<br>
>> + Instruction *Src = dyn_cast<Instruction>(*I);<br>
>> + Instruction *Des = dyn_cast<Instruction>(*J);<br>
>> + if (Src == Des)<br>
>> + continue;<br>
>> + if (isa<LoadInst>(Src) && isa<LoadInst>(Des))<br>
>> + continue;<br>
>> + if (auto D = DA->depends(Src, Des, true)) {<br>
>> + DEBUG(dbgs() << "Found Dependency between Src=" << Src << " Des=" << Des<br>
>> + << "\n");<br>
>> + if (D->isFlow()) {<br>
>> + // TODO: Handle Flow dependence. Check if it is sufficient to populate<br>
>> + // the Dependence Matrix with the direction reversed.<br>
>> + DEBUG(dbgs() << "Flow dependence not handled");<br>
>> + return false;<br>
>> + }<br>
>> + if (D->isAnti()) {<br>
>> + DEBUG(dbgs() << "Found Anti dependence \n");<br>
>> + unsigned Levels = D->getLevels();<br>
>> + char Direction;<br>
>> + for (unsigned II = 1; II <= Levels; ++II) {<br>
>> + const SCEV *Distance = D->getDistance(II);<br>
>> + const SCEVConstant *SCEVConst =<br>
>> + dyn_cast_or_null<SCEVConstant>(Distance);<br>
>> + if (SCEVConst) {<br>
>> + const ConstantInt *CI = SCEVConst->getValue();<br>
>> + if (CI->isNegative())<br>
>> + Direction = '<';<br>
>> + else if (CI->isZero())<br>
>> + Direction = '=';<br>
>> + else<br>
>> + Direction = '>';<br>
>> + Dep.push_back(Direction);<br>
>> + } else if (D->isScalar(II)) {<br>
>> + Direction = 'S';<br>
>> + Dep.push_back(Direction);<br>
>> + } else {<br>
>> + unsigned Dir = D->getDirection(II);<br>
>> + if (Dir == Dependence::DVEntry::LT ||<br>
>> + Dir == Dependence::DVEntry::LE)<br>
>> + Direction = '<';<br>
>> + else if (Dir == Dependence::DVEntry::GT ||<br>
>> + Dir == Dependence::DVEntry::GE)<br>
>> + Direction = '>';<br>
>> + else if (Dir == Dependence::DVEntry::EQ)<br>
>> + Direction = '=';<br>
>> + else<br>
>> + Direction = '*';<br>
>> + Dep.push_back(Direction);<br>
>> + }<br>
>> + }<br>
>> + while (Dep.size() != Level) {<br>
>> + Dep.push_back('I');<br>
>> + }<br>
>> +<br>
>> + DepMatrix.push_back(Dep);<br>
>> + if (DepMatrix.size() > MaxMemInstrCount) {<br>
>> + DEBUG(dbgs() << "Cannot handle more than " << MaxMemInstrCount<br>
>> + << " dependencies inside loop\n");<br>
>> + return false;<br>
>> + }<br>
>> + }<br>
>> + }<br>
>> + }<br>
>> + }<br>
>> +<br>
>> + // We don't have a DepMatrix to check legality return false<br>
>> + if (DepMatrix.size() == 0)<br>
>> + return false;<br>
>> + return true;<br>
>> +}<br>
>> +<br>
>> +// A loop is moved from index 'from' to an index 'to'. Update the Dependence<br>
>> +// matrix by exchanging the two columns.<br>
>> +void interChangeDepedencies(CharMatrix &DepMatrix, unsigned FromIndx,<br>
>> + unsigned ToIndx) {<br>
>> + unsigned numRows = DepMatrix.size();<br>
>> + for (unsigned i = 0; i < numRows; ++i) {<br>
>> + char TmpVal = DepMatrix[i][ToIndx];<br>
>> + DepMatrix[i][ToIndx] = DepMatrix[i][FromIndx];<br>
>> + DepMatrix[i][FromIndx] = TmpVal;<br>
>> + }<br>
>> +}<br>
>> +<br>
>> +// Checks if outermost non '=','S'or'I' dependence in the dependence matrix is<br>
>> +// '>'<br>
>> +bool isOuterMostDepPositive(CharMatrix &DepMatrix, unsigned Row,<br>
>> + unsigned Column) {<br>
>> + for (unsigned i = 0; i <= Column; ++i) {<br>
>> + if (DepMatrix[Row][i] == '<')<br>
>> + return false;<br>
>> + if (DepMatrix[Row][i] == '>')<br>
>> + return true;<br>
>> + }<br>
>> + // All dependencies were '=','S' or 'I'<br>
>> + return false;<br>
>> +}<br>
>> +<br>
>> +// Checks if no dependence exist in the dependency matrix in Row before Column.<br>
>> +bool containsNoDependence(CharMatrix &DepMatrix, unsigned Row,<br>
>> + unsigned Column) {<br>
>> + for (unsigned i = 0; i < Column; ++i) {<br>
>> + if (DepMatrix[Row][i] != '=' && DepMatrix[Row][i] != 'S' &&<br>
>> + DepMatrix[Row][i] != 'I')<br>
>> + return false;<br>
>> + }<br>
>> + return true;<br>
>> +}<br>
>> +<br>
>> +bool validDepInterchange(CharMatrix &DepMatrix, unsigned Row,<br>
>> + unsigned OuterLoopId, char InnerDep, char OuterDep) {<br>
>> +<br>
>> + if (isOuterMostDepPositive(DepMatrix, Row, OuterLoopId))<br>
>> + return false;<br>
>> +<br>
>> + if (InnerDep == OuterDep)<br>
>> + return true;<br>
>> +<br>
>> + // It is legal to interchange if and only if after interchange no row has a<br>
>> + // '>' direction as the leftmost non-'='.<br>
>> +<br>
>> + if (InnerDep == '=' || InnerDep == 'S' || InnerDep == 'I')<br>
>> + return true;<br>
>> +<br>
>> + if (InnerDep == '<')<br>
>> + return true;<br>
>> +<br>
>> + if (InnerDep == '>') {<br>
>> + // If OuterLoopId represents outermost loop then interchanging will make the<br>
>> + // 1st dependency as '>'<br>
>> + if (OuterLoopId == 0)<br>
>> + return false;<br>
>> +<br>
>> + // If all dependencies before OuterloopId are '=','S'or 'I'. Then<br>
>> + // interchanging will result in this row having an outermost non '='<br>
>> + // dependency of '>'<br>
>> + if (!containsNoDependence(DepMatrix, Row, OuterLoopId))<br>
>> + return true;<br>
>> + }<br>
>> +<br>
>> + return false;<br>
>> +}<br>
>> +<br>
>> +// Checks if it is legal to interchange 2 loops.<br>
>> +// [Theorem] A permutation of the loops in a perfect nest is legal if and only if<br>
>> +// the direction matrix, after the same permutation is applied to its columns,<br>
>> +// has no ">" direction as the leftmost non-"=" direction in any row.<br>
>> +bool isLegalToInterChangeLoops(CharMatrix &DepMatrix, unsigned InnerLoopId,<br>
>> + unsigned OuterLoopId) {<br>
>> +<br>
>> + unsigned NumRows = DepMatrix.size();<br>
>> + // For each row check if it is valid to interchange.<br>
>> + for (unsigned Row = 0; Row < NumRows; ++Row) {<br>
>> + char InnerDep = DepMatrix[Row][InnerLoopId];<br>
>> + char OuterDep = DepMatrix[Row][OuterLoopId];<br>
>> + if (InnerDep == '*' || OuterDep == '*')<br>
>> + return false;<br>
>> + else if (!validDepInterchange(DepMatrix, Row, OuterLoopId, InnerDep,<br>
>> + OuterDep))<br>
>> + return false;<br>
>> + }<br>
>> + return true;<br>
>> +}<br>
>> +<br>
>> +static void populateWorklist(Loop &L, SmallVector<LoopVector, 8> &V) {<br>
>> +<br>
>> + DEBUG(dbgs() << "Calling populateWorklist\n");<br>
>> + LoopVector LoopList;<br>
>> + Loop *CurrentLoop = &L;<br>
>> + std::vector<Loop *> vec = CurrentLoop->getSubLoopsVector();<br>
>> + while (vec.size() != 0) {<br>
>> + // The current loop has multiple subloops in it hence it is not tightly<br>
>> + // nested.<br>
>> + // Discard all loops above it added into Worklist.<br>
>> + if (vec.size() != 1) {<br>
>> + LoopList.clear();<br>
>> + return;<br>
>> + }<br>
>> + LoopList.push_back(CurrentLoop);<br>
>> + CurrentLoop = *(vec.begin());<br>
>> + vec = CurrentLoop->getSubLoopsVector();<br>
>> + }<br>
>> + LoopList.push_back(CurrentLoop);<br>
>> + V.push_back(LoopList);<br>
>> +}<br>
>> +<br>
>> +static PHINode *getInductionVariable(Loop *L, ScalarEvolution *SE) {<br>
>> + PHINode *InnerIndexVar = L->getCanonicalInductionVariable();<br>
>> + if (InnerIndexVar)<br>
>> + return InnerIndexVar;<br>
>> + if (L->getLoopLatch() == nullptr || L->getLoopPredecessor() == nullptr)<br>
>> + return nullptr;<br>
>> + for (BasicBlock::iterator I = L->getHeader()->begin(); isa<PHINode>(I); ++I) {<br>
>> + PHINode *PhiVar = cast<PHINode>(I);<br>
>> + Type *PhiTy = PhiVar->getType();<br>
>> + if (!PhiTy->isIntegerTy() && !PhiTy->isFloatingPointTy() &&<br>
>> + !PhiTy->isPointerTy())<br>
>> + return nullptr;<br>
>> + const SCEVAddRecExpr *AddRec =<br>
>> + dyn_cast<SCEVAddRecExpr>(SE->getSCEV(PhiVar));<br>
>> + if (!AddRec || !AddRec->isAffine())<br>
>> + continue;<br>
>> + const SCEV *Step = AddRec->getStepRecurrence(*SE);<br>
>> + const SCEVConstant *C = dyn_cast<SCEVConstant>(Step);<br>
>> + if (!C)<br>
>> + continue;<br>
>> + // Found the induction variable.<br>
>> + // FIXME: Handle loops with more than one induction variable. Note that,<br>
>> + // currently, legality makes sure we have only one induction variable.<br>
>> + return PhiVar;<br>
>> + }<br>
>> + return nullptr;<br>
>> +}<br>
>> +<br>
>> +/// LoopInterchangeLegality checks if it is legal to interchange the loop.<br>
>> +class LoopInterchangeLegality {<br>
>> +public:<br>
>> + LoopInterchangeLegality(Loop *Outer, Loop *Inner, ScalarEvolution *SE,<br>
>> + LoopInterchange *Pass)<br>
>> + : OuterLoop(Outer), InnerLoop(Inner), SE(SE), CurrentPass(Pass) {}<br>
>> +<br>
>> + /// Check if the loops can be interchanged.<br>
>> + bool canInterchangeLoops(unsigned InnerLoopId, unsigned OuterLoopId,<br>
>> + CharMatrix &DepMatrix);<br>
>> + /// Check if the loop structure is understood. We do not handle triangular<br>
>> + /// loops for now.<br>
>> + bool isLoopStructureUnderstood(PHINode *InnerInductionVar);<br>
>> +<br>
>> + bool currentLimitations();<br>
>> +<br>
>> +private:<br>
>> + bool tightlyNested(Loop *Outer, Loop *Inner);<br>
>> +<br>
>> + Loop *OuterLoop;<br>
>> + Loop *InnerLoop;<br>
>> +<br>
>> + /// Scev analysis.<br>
>> + ScalarEvolution *SE;<br>
>> + LoopInterchange *CurrentPass;<br>
>> +};<br>
>> +<br>
>> +/// LoopInterchangeProfitability checks if it is profitable to interchange the<br>
>> +/// loop.<br>
>> +class LoopInterchangeProfitability {<br>
>> +public:<br>
>> + LoopInterchangeProfitability(Loop *Outer, Loop *Inner, ScalarEvolution *SE)<br>
>> + : OuterLoop(Outer), InnerLoop(Inner), SE(SE) {}<br>
>> +<br>
>> + /// Check if the loop interchange is profitable<br>
>> + bool isProfitable(unsigned InnerLoopId, unsigned OuterLoopId,<br>
>> + CharMatrix &DepMatrix);<br>
>> +<br>
>> +private:<br>
>> + int getInstrOrderCost();<br>
>> +<br>
>> + Loop *OuterLoop;<br>
>> + Loop *InnerLoop;<br>
>> +<br>
>> + /// Scev analysis.<br>
>> + ScalarEvolution *SE;<br>
>> +};<br>
>> +<br>
>> +/// LoopInterchangeTransform interchanges the loop<br>
>> +class LoopInterchangeTransform {<br>
>> +public:<br>
>> + LoopInterchangeTransform(Loop *Outer, Loop *Inner, ScalarEvolution *SE,<br>
>> + LoopInfo *LI, DominatorTree *DT,<br>
>> + LoopInterchange *Pass, BasicBlock *LoopNestExit)<br>
>> + : OuterLoop(Outer), InnerLoop(Inner), SE(SE), LI(LI), DT(DT),<br>
>> + LoopExit(LoopNestExit) {<br>
>> + initialize();<br>
>> + }<br>
>> +<br>
>> + /// Interchange OuterLoop and InnerLoop.<br>
>> + bool transform();<br>
>> + void restructureLoops(Loop *InnerLoop, Loop *OuterLoop);<br>
>> + void removeChildLoop(Loop *OuterLoop, Loop *InnerLoop);<br>
>> + void initialize();<br>
>> +<br>
>> +private:<br>
>> + void splitInnerLoopLatch(Instruction *);<br>
>> + void splitOuterLoopLatch();<br>
>> + void splitInnerLoopHeader();<br>
>> + bool adjustLoopLinks();<br>
>> + void adjustLoopPreheaders();<br>
>> + void adjustOuterLoopPreheader();<br>
>> + void adjustInnerLoopPreheader();<br>
>> + bool adjustLoopBranches();<br>
>> +<br>
>> + Loop *OuterLoop;<br>
>> + Loop *InnerLoop;<br>
>> +<br>
>> + /// Scev analysis.<br>
>> + ScalarEvolution *SE;<br>
>> + LoopInfo *LI;<br>
>> + DominatorTree *DT;<br>
>> + BasicBlock *LoopExit;<br>
>> +};<br>
>> +<br>
>> +// Main LoopInterchange Pass<br>
>> +struct LoopInterchange : public FunctionPass {<br>
>> + static char ID;<br>
>> + ScalarEvolution *SE;<br>
>> + LoopInfo *LI;<br>
>> + DependenceAnalysis *DA;<br>
>> + DominatorTree *DT;<br>
>> + LoopInterchange()<br>
>> + : FunctionPass(ID), SE(nullptr), LI(nullptr), DA(nullptr), DT(nullptr) {<br>
>> + initializeLoopInterchangePass(*PassRegistry::getPassRegistry());<br>
>> + }<br>
>> +<br>
>> + void getAnalysisUsage(AnalysisUsage &AU) const override {<br>
>> + AU.addRequired<ScalarEvolution>();<br>
>> + AU.addRequired<AliasAnalysis>();<br>
>> + AU.addRequired<DominatorTreeWrapperPass>();<br>
>> + AU.addRequired<LoopInfoWrapperPass>();<br>
>> + AU.addRequired<DependenceAnalysis>();<br>
>> + AU.addRequiredID(LoopSimplifyID);<br>
>> + AU.addRequiredID(LCSSAID);<br>
>> + }<br>
>> +<br>
>> + bool runOnFunction(Function &F) override {<br>
>> + SE = &getAnalysis<ScalarEvolution>();<br>
>> + LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();<br>
>> + DA = &getAnalysis<DependenceAnalysis>();<br>
>> + auto *DTWP = getAnalysisIfAvailable<DominatorTreeWrapperPass>();<br>
>> + DT = DTWP ? &DTWP->getDomTree() : nullptr;<br>
>> + // Build up a worklist of loop pairs to analyze.<br>
>> + SmallVector<LoopVector, 8> Worklist;<br>
>> +<br>
>> + for (Loop *L : *LI)<br>
>> + populateWorklist(*L, Worklist);<br>
>> +<br>
>> + DEBUG(dbgs() << "Worklist size = " << Worklist.size() << "\n");<br>
>> + bool Changed = true;<br>
>> + while (!Worklist.empty()) {<br>
>> + LoopVector LoopList = Worklist.pop_back_val();<br>
>> + Changed = processLoopList(LoopList);<br>
>> + }<br>
>> + return Changed;<br>
>> + }<br>
>> +<br>
>> + bool isComputableLoopNest(LoopVector LoopList) {<br>
>> + for (auto I = LoopList.begin(), E = LoopList.end(); I != E; ++I) {<br>
>> + Loop *L = *I;<br>
>> + const SCEV *ExitCountOuter = SE->getBackedgeTakenCount(L);<br>
>> + if (ExitCountOuter == SE->getCouldNotCompute()) {<br>
>> + DEBUG(dbgs() << "Couldn't compute Backedge count\n");<br>
>> + return false;<br>
>> + }<br>
>> + if (L->getNumBackEdges() != 1) {<br>
>> + DEBUG(dbgs() << "NumBackEdges is not equal to 1\n");<br>
>> + return false;<br>
>> + }<br>
>> + if (!L->getExitingBlock()) {<br>
>> + DEBUG(dbgs() << "Loop Doesn't have unique exit block\n");<br>
>> + return false;<br>
>> + }<br>
>> + }<br>
>> + return true;<br>
>> + }<br>
>> +<br>
>> + unsigned selectLoopForInterchange(LoopVector LoopList) {<br>
>> + // TODO: Add a better heuristic to select the loop to be interchanged based<br>
>> + // on the dependence matrix. Currently we select the innermost loop.<br>
>> + return LoopList.size() - 1;<br>
>> + }<br>
>> +<br>
>> + bool processLoopList(LoopVector LoopList) {<br>
>> + bool Changed = false;<br>
>> + bool containsLCSSAPHI = false;<br>
>> + CharMatrix DependencyMatrix;<br>
>> + if (LoopList.size() < 2) {<br>
>> + DEBUG(dbgs() << "Loop doesn't contain minimum nesting level.\n");<br>
>> + return false;<br>
>> + }<br>
>> + if (!isComputableLoopNest(LoopList)) {<br>
>> + DEBUG(dbgs() << "Not valid loop candidate for interchange\n");<br>
>> + return false;<br>
>> + }<br>
>> + Loop *OuterMostLoop = *(LoopList.begin());<br>
>> +<br>
>> + DEBUG(dbgs() << "Processing LoopList of size = " << LoopList.size()<br>
>> + << "\n");<br>
>> +<br>
>> + if (!populateDependencyMatrix(DependencyMatrix, LoopList.size(),<br>
>> + OuterMostLoop, DA)) {<br>
>> + DEBUG(dbgs() << "Populating Dependency matrix failed\n");<br>
>> + return false;<br>
>> + }<br>
>> +#ifdef DUMP_DEP_MATRICIES<br>
>> + DEBUG(dbgs() << "Dependence before inter change \n");<br>
>> + printDepMatrix(DependencyMatrix);<br>
>> +#endif<br>
>> +<br>
>> + BasicBlock *OuterMostLoopLatch = OuterMostLoop->getLoopLatch();<br>
>> + BranchInst *OuterMostLoopLatchBI =<br>
>> + dyn_cast<BranchInst>(OuterMostLoopLatch->getTerminator());<br>
>> + if (!OuterMostLoopLatchBI)<br>
>> + return false;<br>
>> +<br>
>> + // Since we currently do not handle LCSSA PHI's any failure in loop<br>
>> + // condition will now branch to LoopNestExit.<br>
>> + // TODO: This should be removed once we handle LCSSA PHI nodes.<br>
>> +<br>
>> + // Get the Outermost loop exit.<br>
>> + BasicBlock *LoopNestExit;<br>
>> + if (OuterMostLoopLatchBI->getSuccessor(0) == OuterMostLoop->getHeader())<br>
>> + LoopNestExit = OuterMostLoopLatchBI->getSuccessor(1);<br>
>> + else<br>
>> + LoopNestExit = OuterMostLoopLatchBI->getSuccessor(0);<br>
>> +<br>
>> + for (auto I = LoopList.begin(), E = LoopList.end(); I != E; ++I) {<br>
>> + Loop *L = *I;<br>
>> + BasicBlock *Latch = L->getLoopLatch();<br>
>> + BasicBlock *Header = L->getHeader();<br>
>> + if (Latch && Latch != Header && isa<PHINode>(Latch->begin())) {<br>
>> + containsLCSSAPHI = true;<br>
>> + break;<br>
>> + }<br>
>> + }<br>
>> +<br>
>> + // TODO: Handle lcssa PHI's. Currently LCSSA PHI's are not handled. Handle<br>
>> + // the same by splitting the loop latch and adjusting loop links<br>
>> + // accordingly.<br>
>> + if (containsLCSSAPHI)<br>
>> + return false;<br>
>> +<br>
>> + unsigned SelecLoopId = selectLoopForInterchange(LoopList);<br>
>> + // Move the selected loop outwards to the best possible position.<br>
>> + for (unsigned i = SelecLoopId; i > 0; i--) {<br>
>> + bool Interchanged =<br>
>> + processLoop(LoopList, i, i - 1, LoopNestExit, DependencyMatrix);<br>
>> + if (!Interchanged)<br>
>> + return Changed;<br>
>> + // The loops were interchanged; reflect that in LoopList.<br>
>> + Loop *OldOuterLoop = LoopList[i - 1];<br>
>> + LoopList[i - 1] = LoopList[i];<br>
>> + LoopList[i] = OldOuterLoop;<br>
>> +<br>
>> + // Update the DependencyMatrix<br>
>> + interChangeDepedencies(DependencyMatrix, i, i - 1);<br>
>> +<br>
>> +#ifdef DUMP_DEP_MATRICIES<br>
>> + DEBUG(dbgs() << "Dependence after inter change \n");<br>
>> + printDepMatrix(DependencyMatrix);<br>
>> +#endif<br>
>> + Changed |= Interchanged;<br>
>> + }<br>
>> + return Changed;<br>
>> + }<br>
>> +<br>
>> + bool processLoop(LoopVector LoopList, unsigned InnerLoopId,<br>
>> + unsigned OuterLoopId, BasicBlock *LoopNestExit,<br>
>> + std::vector<std::vector<char>> &DependencyMatrix) {<br>
>> +<br>
>> + DEBUG(dbgs() << "Processing Inner Loop Id = " << InnerLoopId<br>
>> + << " and OuterLoopId = " << OuterLoopId << "\n");<br>
>> + Loop *InnerLoop = LoopList[InnerLoopId];<br>
>> + Loop *OuterLoop = LoopList[OuterLoopId];<br>
>> +<br>
>> + LoopInterchangeLegality LIL(OuterLoop, InnerLoop, SE, this);<br>
>> + if (!LIL.canInterchangeLoops(InnerLoopId, OuterLoopId, DependencyMatrix)) {<br>
>> + DEBUG(dbgs() << "Not interchanging Loops. Cannot prove legality\n");<br>
>> + return false;<br>
>> + }<br>
>> + DEBUG(dbgs() << "Loops are legal to interchange\n");<br>
>> + LoopInterchangeProfitability LIP(OuterLoop, InnerLoop, SE);<br>
>> + if (!LIP.isProfitable(InnerLoopId, OuterLoopId, DependencyMatrix)) {<br>
>> + DEBUG(dbgs() << "Interchanging Loops not profitable\n");<br>
>> + return false;<br>
>> + }<br>
>> +<br>
>> + LoopInterchangeTransform LIT(OuterLoop, InnerLoop, SE, LI, DT, this,<br>
>> + LoopNestExit);<br>
>> + LIT.transform();<br>
>> + DEBUG(dbgs() << "Loops interchanged\n");<br>
>> + return true;<br>
>> + }<br>
>> +};<br>
>> +<br>
>> +} // end of namespace<br>
>> +<br>
>> +static bool containsUnsafeInstructions(BasicBlock *BB) {<br>
>> + for (auto I = BB->begin(), E = BB->end(); I != E; ++I) {<br>
>> + if (I->mayHaveSideEffects() || I->mayReadFromMemory())<br>
>> + return true;<br>
>> + }<br>
>> + return false;<br>
>> +}<br>
>> +<br>
>> +bool LoopInterchangeLegality::tightlyNested(Loop *OuterLoop, Loop *InnerLoop) {<br>
>> + BasicBlock *OuterLoopHeader = OuterLoop->getHeader();<br>
>> + BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();<br>
>> + BasicBlock *OuterLoopLatch = OuterLoop->getLoopLatch();<br>
>> +<br>
>> + DEBUG(dbgs() << "Checking if Loops are Tightly Nested\n");<br>
>> +<br>
>> + // A perfectly nested loop will not have any branch in between the outer and<br>
>> + // inner block, i.e. the outer header will branch to either the inner<br>
>> + // preheader or the outer loop latch.<br>
>> + BranchInst *outerLoopHeaderBI =<br>
>> + dyn_cast<BranchInst>(OuterLoopHeader->getTerminator());<br>
>> + if (!outerLoopHeaderBI)<br>
>> + return false;<br>
>> + unsigned num = outerLoopHeaderBI->getNumSuccessors();<br>
>> + for (unsigned i = 0; i < num; i++) {<br>
>> + if (outerLoopHeaderBI->getSuccessor(i) != InnerLoopPreHeader &&<br>
>> + outerLoopHeaderBI->getSuccessor(i) != OuterLoopLatch)<br>
>> + return false;<br>
>> + }<br>
>> +<br>
>> + DEBUG(dbgs() << "Checking instructions in Loop header and Loop latch \n");<br>
>> + // There is no basic block in between; now make sure the outer header<br>
>> + // and outer loop latch do not contain any unsafe instructions.<br>
>> + if (containsUnsafeInstructions(OuterLoopHeader) ||<br>
>> + containsUnsafeInstructions(OuterLoopLatch))<br>
>> + return false;<br>
>> +<br>
>> + DEBUG(dbgs() << "Loops are perfectly nested \n");<br>
>> + // We have a perfect loop nest.<br>
>> + return true;<br>
>> +}<br>
>> +<br>
>> +static unsigned getPHICount(BasicBlock *BB) {<br>
>> + unsigned PhiCount = 0;<br>
>> + for (auto I = BB->begin(); isa<PHINode>(I); ++I)<br>
>> + PhiCount++;<br>
>> + return PhiCount;<br>
>> +}<br>
>> +<br>
>> +bool LoopInterchangeLegality::isLoopStructureUnderstood(<br>
>> + PHINode *InnerInduction) {<br>
>> +<br>
>> + unsigned Num = InnerInduction->getNumOperands();<br>
>> + BasicBlock *InnerLoopPreheader = InnerLoop->getLoopPreheader();<br>
>> + for (unsigned i = 0; i < Num; ++i) {<br>
>> + Value *Val = InnerInduction->getOperand(i);<br>
>> + if (isa<Constant>(Val))<br>
>> + continue;<br>
>> + Instruction *I = dyn_cast<Instruction>(Val);<br>
>> + if (!I)<br>
>> + return false;<br>
>> + // TODO: Handle triangular loops.<br>
>> + // e.g. for(int i=0;i<N;i++)<br>
>> + // for(int j=i;j<N;j++)<br>
>> + unsigned IncomBlockIndx = PHINode::getIncomingValueNumForOperand(i);<br>
>> + if (InnerInduction->getIncomingBlock(IncomBlockIndx) ==<br>
>> + InnerLoopPreheader &&<br>
>> + !OuterLoop->isLoopInvariant(I)) {<br>
>> + return false;<br>
>> + }<br>
>> + }<br>
>> + return true;<br>
>> +}<br>
>> +<br>
>> +// This function indicates the current limitations in the transform as a result<br>
>> +// of which we do not proceed.<br>
>> +bool LoopInterchangeLegality::currentLimitations() {<br>
>> +<br>
>> + BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();<br>
>> + BasicBlock *InnerLoopHeader = InnerLoop->getHeader();<br>
>> + BasicBlock *OuterLoopHeader = OuterLoop->getHeader();<br>
>> + BasicBlock *InnerLoopLatch = InnerLoop->getLoopLatch();<br>
>> + BasicBlock *OuterLoopLatch = OuterLoop->getLoopLatch();<br>
>> +<br>
>> + PHINode *InnerInductionVar;<br>
>> + PHINode *OuterInductionVar;<br>
>> +<br>
>> + // We currently handle only 1 induction variable inside the loop. We also do<br>
>> + // not handle reductions as of now.<br>
>> + if (getPHICount(InnerLoopHeader) > 1)<br>
>> + return true;<br>
>> +<br>
>> + if (getPHICount(OuterLoopHeader) > 1)<br>
>> + return true;<br>
>> +<br>
>> + InnerInductionVar = getInductionVariable(InnerLoop, SE);<br>
>> + OuterInductionVar = getInductionVariable(OuterLoop, SE);<br>
>> +<br>
>> + if (!OuterInductionVar || !InnerInductionVar) {<br>
>> + DEBUG(dbgs() << "Induction variable not found\n");<br>
>> + return true;<br>
>> + }<br>
>> +<br>
>> + // TODO: Triangular loops are not handled for now.<br>
>> + if (!isLoopStructureUnderstood(InnerInductionVar)) {<br>
>> + DEBUG(dbgs() << "Loop structure not understood by pass\n");<br>
>> + return true;<br>
>> + }<br>
>> +<br>
>> + // TODO: Loops with LCSSA PHI's are currently not handled.<br>
>> + if (isa<PHINode>(OuterLoopLatch->begin())) {<br>
>> + DEBUG(dbgs() << "Found an LCSSA PHI in outer loop latch\n");<br>
>> + return true;<br>
>> + }<br>
>> + if (InnerLoopLatch != InnerLoopHeader &&<br>
>> + isa<PHINode>(InnerLoopLatch->begin())) {<br>
>> + DEBUG(dbgs() << "Found an LCSSA PHI in inner loop latch\n");<br>
>> + return true;<br>
>> + }<br>
>> +<br>
>> + // TODO: Current limitation: Since we split the inner loop latch at the point<br>
>> + // where the induction variable is incremented (induction.next), we cannot<br>
>> + // have more than 1 user of induction.next since it would result in broken<br>
>> + // code after the split.<br>
>> + // e.g.<br>
>> + // for(i=0;i<N;i++) {<br>
>> + // for(j = 0;j<M;j++) {<br>
>> + // A[j+1][i+2] = A[j][i]+k;<br>
>> + // }<br>
>> + // }<br>
>> + bool FoundInduction = false;<br>
>> + Instruction *InnerIndexVarInc = nullptr;<br>
>> + if (InnerInductionVar->getIncomingBlock(0) == InnerLoopPreHeader)<br>
>> + InnerIndexVarInc =<br>
>> + dyn_cast<Instruction>(InnerInductionVar->getIncomingValue(1));<br>
>> + else<br>
>> + InnerIndexVarInc =<br>
>> + dyn_cast<Instruction>(InnerInductionVar->getIncomingValue(0));<br>
>> +<br>
>> + if (!InnerIndexVarInc)<br>
>> + return true;<br>
>> +<br>
>> + // Since we split the inner loop latch on this induction variable, make sure<br>
>> + // we do not have any instruction between the induction variable and the<br>
>> + // branch instruction.<br>
>> +<br>
>> + for (auto I = InnerLoopLatch->rbegin(), E = InnerLoopLatch->rend();<br>
>> + I != E && !FoundInduction; ++I) {<br>
>> + if (isa<BranchInst>(*I) || isa<CmpInst>(*I) || isa<TruncInst>(*I))<br>
>> + continue;<br>
>> + const Instruction &Ins = *I;<br>
>> + // We found an instruction. If this is not the induction variable then it is<br>
>> + // not safe to split this loop latch.<br>
>> + if (!Ins.isIdenticalTo(InnerIndexVarInc))<br>
>> + return true;<br>
>> + else<br>
>> + FoundInduction = true;<br>
>> + }<br>
>> + // The loop latch ended and we didn't find the induction variable; report<br>
>> + // this as a current limitation.<br>
>> + if (!FoundInduction)<br>
>> + return true;<br>
>> +<br>
>> + return false;<br>
>> +}<br>
>> +<br>
>> +bool LoopInterchangeLegality::canInterchangeLoops(unsigned InnerLoopId,<br>
>> + unsigned OuterLoopId,<br>
>> + CharMatrix &DepMatrix) {<br>
>> +<br>
>> + if (!isLegalToInterChangeLoops(DepMatrix, InnerLoopId, OuterLoopId)) {<br>
>> + DEBUG(dbgs() << "Failed interchange InnerLoopId = " << InnerLoopId<br>
>> + << " and OuterLoopId = " << OuterLoopId<br>
>> + << " due to dependence\n");<br>
>> + return false;<br>
>> + }<br>
>> +<br>
>> + // Create unique Preheaders if we do not already have one.<br>
>> + BasicBlock *OuterLoopPreHeader = OuterLoop->getLoopPreheader();<br>
>> + BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();<br>
>> +<br>
>> + // Create a unique outer preheader -<br>
>> + // 1) If OuterLoop preheader is not present.<br>
>> + // 2) If OuterLoop Preheader is same as OuterLoop Header<br>
>> + // 3) If OuterLoop Preheader is same as Header of the previous loop.<br>
>> + // 4) If OuterLoop Preheader is Entry node.<br>
>> + if (!OuterLoopPreHeader || OuterLoopPreHeader == OuterLoop->getHeader() ||<br>
>> + isa<PHINode>(OuterLoopPreHeader->begin()) ||<br>
>> + !OuterLoopPreHeader->getUniquePredecessor()) {<br>
>> + OuterLoopPreHeader = InsertPreheaderForLoop(OuterLoop, CurrentPass);<br>
>> + }<br>
>> +<br>
>> + if (!InnerLoopPreHeader || InnerLoopPreHeader == InnerLoop->getHeader() ||<br>
>> + InnerLoopPreHeader == OuterLoop->getHeader()) {<br>
>> + InnerLoopPreHeader = InsertPreheaderForLoop(InnerLoop, CurrentPass);<br>
>> + }<br>
>> +<br>
>> + // Check if the loops are tightly nested.<br>
>> + if (!tightlyNested(OuterLoop, InnerLoop)) {<br>
>> + DEBUG(dbgs() << "Loops not tightly nested\n");<br>
>> + return false;<br>
>> + }<br>
>> +<br>
>> + // TODO: The loops could not be interchanged due to current limitations in the<br>
>> + // transform module.<br>
>> + if (currentLimitations()) {<br>
>> + DEBUG(dbgs() << "Not legal because of current transform limitation\n");<br>
>> + return false;<br>
>> + }<br>
>> +<br>
>> + return true;<br>
>> +}<br>
>> +<br>
>> +int LoopInterchangeProfitability::getInstrOrderCost() {<br>
>> + unsigned GoodOrder, BadOrder;<br>
>> + BadOrder = GoodOrder = 0;<br>
>> + for (auto BI = InnerLoop->block_begin(), BE = InnerLoop->block_end();<br>
>> + BI != BE; ++BI) {<br>
>> + for (auto I = (*BI)->begin(), E = (*BI)->end(); I != E; ++I) {<br>
>> + const Instruction &Ins = *I;<br>
>> + if (const GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(&Ins)) {<br>
>> + unsigned NumOp = GEP->getNumOperands();<br>
>> + bool FoundInnerInduction = false;<br>
>> + bool FoundOuterInduction = false;<br>
>> + for (unsigned i = 0; i < NumOp; ++i) {<br>
>> + const SCEV *OperandVal = SE->getSCEV(GEP->getOperand(i));<br>
>> + const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(OperandVal);<br>
>> + if (!AR)<br>
>> + continue;<br>
>> +<br>
>> + // If we find the inner induction after an outer induction e.g.<br>
>> + // for(int i=0;i<N;i++)<br>
>> + // for(int j=0;j<N;j++)<br>
>> + // A[i][j] = A[i-1][j-1]+k;<br>
>> + // then it is a good order.<br>
>> + if (AR->getLoop() == InnerLoop) {<br>
>> + // We found an InnerLoop induction after OuterLoop induction. It is<br>
>> + // a good order.<br>
>> + FoundInnerInduction = true;<br>
>> + if (FoundOuterInduction) {<br>
>> + GoodOrder++;<br>
>> + break;<br>
>> + }<br>
>> + }<br>
>> + // If we find the outer induction after an inner induction e.g.<br>
>> + // for(int i=0;i<N;i++)<br>
>> + // for(int j=0;j<N;j++)<br>
>> + // A[j][i] = A[j-1][i-1]+k;<br>
>> + // then it is a bad order.<br>
>> + if (AR->getLoop() == OuterLoop) {<br>
>> + // We found an OuterLoop induction after InnerLoop induction. It is<br>
>> + // a bad order.<br>
>> + FoundOuterInduction = true;<br>
>> + if (FoundInnerInduction) {<br>
>> + BadOrder++;<br>
>> + break;<br>
>> + }<br>
>> + }<br>
>> + }<br>
>> + }<br>
>> + }<br>
>> + }<br>
>> + return GoodOrder - BadOrder;<br>
>> +}<br>
>> +<br>
>> +bool isProfitabileForVectorization(unsigned InnerLoopId, unsigned OuterLoopId,<br>
>> + CharMatrix &DepMatrix) {<br>
>> + // TODO: Improve this heuristic to catch more cases.<br>
>> + // If the inner loop is loop independent or doesn't carry any dependency it is<br>
>> + // profitable to move this to outer position.<br>
>> + unsigned Row = DepMatrix.size();<br>
>> + for (unsigned i = 0; i < Row; ++i) {<br>
>> + if (DepMatrix[i][InnerLoopId] != 'S' && DepMatrix[i][InnerLoopId] != 'I')<br>
>> + return false;<br>
>> + // TODO: We need to improve this heuristic.<br>
>> + if (DepMatrix[i][OuterLoopId] != '=')<br>
>> + return false;<br>
>> + }<br>
>> + // If outer loop has dependence and inner loop is loop independent then it is<br>
>> + // profitable to interchange to enable parallelism.<br>
>> + return true;<br>
>> +}<br>
>> +<br>
>> +bool LoopInterchangeProfitability::isProfitable(unsigned InnerLoopId,<br>
>> + unsigned OuterLoopId,<br>
>> + CharMatrix &DepMatrix) {<br>
>> +<br>
>> + // TODO: Add better profitability checks.<br>
>> + // e.g<br>
>> + // 1) Construct dependency matrix and move the one with no loop carried dep<br>
>> + // inside to enable vectorization.<br>
>> +<br>
>> + // This is a rough cost estimation algorithm. It counts the good and bad<br>
>> + // orders of induction variables in the instructions and allows reordering<br>
>> + // if the number of bad orders is greater than the number of good orders.<br>
>> + int Cost = 0;<br>
>> + Cost += getInstrOrderCost();<br>
>> + DEBUG(dbgs() << "Cost = " << Cost << "\n");<br>
>> + if (Cost < 0)<br>
>> + return true;<br>
>> +<br>
>> + // It is not profitable as per the current cache profitability model, but<br>
>> + // check if we can move this loop outside to improve parallelism.<br>
>> + bool ImprovesPar =<br>
>> + isProfitabileForVectorization(InnerLoopId, OuterLoopId, DepMatrix);<br>
>> + return ImprovesPar;<br>
>> +}<br>
>> +<br>
>> +void LoopInterchangeTransform::removeChildLoop(Loop *OuterLoop,<br>
>> + Loop *InnerLoop) {<br>
>> + for (Loop::iterator I = OuterLoop->begin(), E = OuterLoop->end();; ++I) {<br>
>> + assert(I != E && "Couldn't find loop");<br>
>> + if (*I == InnerLoop) {<br>
>> + OuterLoop->removeChildLoop(I);<br>
>> + return;<br>
>> + }<br>
>> + }<br>
>> +}<br>
>> +void LoopInterchangeTransform::restructureLoops(Loop *InnerLoop,<br>
>> + Loop *OuterLoop) {<br>
>> + Loop *OuterLoopParent = OuterLoop->getParentLoop();<br>
>> + if (OuterLoopParent) {<br>
>> + // Remove the loop from its parent loop.<br>
>> + removeChildLoop(OuterLoopParent, OuterLoop);<br>
>> + removeChildLoop(OuterLoop, InnerLoop);<br>
>> + OuterLoopParent->addChildLoop(InnerLoop);<br>
>> + } else {<br>
>> + removeChildLoop(OuterLoop, InnerLoop);<br>
>> + LI->changeTopLevelLoop(OuterLoop, InnerLoop);<br>
>> + }<br>
>> +<br>
>> + for (Loop::iterator I = InnerLoop->begin(), E = InnerLoop->end(); I != E; ++I)<br>
>> + OuterLoop->addChildLoop(InnerLoop->removeChildLoop(I));<br>
><br>
> This for loop is causing failed assertions in debug builds with MSVC;<br>
> the iterator is invalidated when the child loop is removed on<br>
> InnerLoop, so when ++I is executed, the following assertion is<br>
> triggered. I don't think removeChildLoop() is a particularly safe API<br>
> design given how trivial it is for the underlying container to<br>
> invalidate all iterators.<br>
><br>
> 63> FAIL: LLVM :: Transforms/LoopInterchange/reductions.ll (19260 of 22394)<br>
> 63> ******************** TEST 'LLVM ::<br>
> Transforms/LoopInterchange/reductions.ll' FAILED ********************<br>
> 63> Script:<br>
> 63> --<br>
> 63> E:/llvm/2013/Debug/bin\opt.EXE <<br>
> E:\llvm\llvm\test\Transforms\LoopInterchange\reductions.ll -basicaa<br>
> -loop-interchange -S | E:/llvm/2013/Debug/bin\FileCheck.EXE<br>
> E:\llvm\llvm\test\Transforms\LoopInterchange\reductions.ll<br>
> 63> --<br>
> 63> Exit Code: 2<br>
> 63><br>
> 63> Command Output (stdout):<br>
> 63> --<br>
> 63> Command 0: "E:/llvm/2013/Debug/bin\opt.EXE" "-basicaa"<br>
> "-loop-interchange" "-S"<br>
> 63> Command 0 Result: -2147483645<br>
> 63> Command 0 Output:<br>
> 63><br>
> 63><br>
> 63> Command 0 Stderr:<br>
> 63> 0x0F4FCEE6 (0x02AB16A0 0x02AACBD0 0x00000065 0x00322278),<br>
> ?_Debug_message@std@@YAXPB_W0I@Z() + 0x26 bytes(s)<br>
> 63><br>
> 63> 0x0108C5AB (0x0421EE7C 0xCCCCCCCC 0xCCCCCCCC 0x00000000),<br>
> std::_Vector_const_iterator<std::_Vector_val<std::_Simple_types<llvm::Loop<br>
> *> > >::operator++() + 0x4B bytes(s), d:\program files (x86)\microsoft<br>
> visual studio 12.0\vc\include\vector, line 101 + 0x14 byte(s)<br>
> 63><br>
> 63> 0x01D974CF (0x00331A38 0x00337B40 0x00000000 0xCCCCCCCC),<br>
> `anonymous namespace'::LoopInterchangeTransform::restructureLoops() +<br>
> 0x9F bytes(s), e:\llvm\llvm\lib\transforms\scalar\loopinterchange.cpp,<br>
> line 1015 + 0xA byte(s)<br>
> 63><br>
> 63> 0x01D9741E (0x0421EF20 0xCCCCCCCC 0xCCCCCCCC 0x00337B40),<br>
> `anonymous namespace'::LoopInterchangeTransform::transform() + 0x22E<br>
> bytes(s), e:\llvm\llvm\lib\transforms\scalar\loopinterchange.cpp, line<br>
> 1060<br>
> 63><br>
> 63> 0x01D9C9BC (0x0421EE90 0x0421EE9C 0x0421EEB0 0x00337B40),<br>
> `anonymous namespace'::LoopInterchange::processLoop() + 0x20C<br>
> bytes(s), e:\llvm\llvm\lib\transforms\scalar\loopinterchange.cpp, line<br>
> 592<br>
> 63><br>
> 63> 0x01D9CD64 (0x0421EF34 0x0421EF40 0x0421EF54 0x00337B40),<br>
> `anonymous namespace'::LoopInterchange::processLoopList() + 0x304<br>
> bytes(s), e:\llvm\llvm\lib\transforms\scalar\loopinterchange.cpp, line<br>
> 550 + 0x29 byte(s)<br>
> 63><br>
> 63> 0x01D9D4F0 (0x003213C0 0x0421F2B4 0x0421F1F8 0x00000001),<br>
> `anonymous namespace'::LoopInterchange::runOnFunction() + 0x1D0<br>
> bytes(s), e:\llvm\llvm\lib\transforms\scalar\loopinterchange.cpp, line<br>
> 465 + 0x1D byte(s)<br>
> 63><br>
> 63> 0x0191E1D5 (0x003213C0 0x00000000 0xCCCCCCCC 0x002F53DC),<br>
> llvm::FPPassManager::runOnFunction() + 0x105 bytes(s),<br>
> e:\llvm\llvm\lib\ir\legacypassmanager.cpp, line 1538 + 0x17 byte(s)<br>
> 63><br>
> 63> 0x0191E365 (0x002F5408 0x0421F77C 0x0421F2C0 0x00000001),<br>
> llvm::FPPassManager::runOnModule() + 0x75 bytes(s),<br>
> e:\llvm\llvm\lib\ir\legacypassmanager.cpp, line 1558 + 0x15 byte(s)<br>
> 63><br>
> 63> 0x0191F2D9 (0x002F5408 0x0421F304 0x7EFDE000 0xCCCCCCCC),<br>
> `anonymous namespace'::MPPassManager::runOnModule() + 0x1C9 bytes(s),<br>
> e:\llvm\llvm\lib\ir\legacypassmanager.cpp, line 1616 + 0x17 byte(s)<br>
> 63><br>
> 63> 0x0191F971 (0x002F5408 0x0421F570 0x0421F77C 0x00B90A06),<br>
> llvm::legacy::PassManagerImpl::run() + 0x101 bytes(s),<br>
> e:\llvm\llvm\lib\ir\legacypassmanager.cpp, line 1723 + 0x1B byte(s)<br>
> 63><br>
> 63> 0x0191A3ED (0x002F5408 0x00000000 0x00000000 0xCCCCCCCC),<br>
> llvm::legacy::PassManager::run() + 0x1D bytes(s),<br>
> e:\llvm\llvm\lib\ir\legacypassmanager.cpp, line 1757<br>
> 63><br>
> 63> 0x00B90A06 (0x00000004 0x002EA2A0 0x002EFF88 0x75B36014), main()<br>
> + 0x1696 bytes(s), e:\llvm\llvm\tools\opt\opt.cpp, line 614<br>
> 63><br>
> 63> 0x024017A9 (0x0421F7E0 0x778D336A 0x7EFDE000 0x0421F820),<br>
> __tmainCRTStartup() + 0x199 bytes(s),<br>
> f:\dd\vctools\crt\crtw32\dllstuff\crtexe.c, line 626 + 0x19 byte(s)<br>
> 63><br>
> 63> 0x024018ED (0x7EFDE000 0x0421F820 0x77E992B2 0x7EFDE000),<br>
> mainCRTStartup() + 0xD bytes(s),<br>
> f:\dd\vctools\crt\crtw32\dllstuff\crtexe.c, line 466<br>
> 63><br>
> 63> 0x778D336A (0x7EFDE000 0x7AB5E46A 0x00000000 0x00000000),<br>
> BaseThreadInitThunk() + 0x12 bytes(s)<br>
> 63><br>
> 63> 0x77E992B2 (0x024018E0 0x7EFDE000 0x00000000 0x00000000),<br>
> RtlInitializeExceptionChain() + 0x63 bytes(s)<br>
> 63><br>
> 63> 0x77E99285 (0x024018E0 0x7EFDE000 0x00000000 0x00000000),<br>
> RtlInitializeExceptionChain() + 0x36 bytes(s)<br>
> 63><br>
> 63><br>
> 63><br>
> 63> Command 1: "E:/llvm/2013/Debug/bin\FileCheck.EXE"<br>
> "E:\llvm\llvm\test\Transforms\LoopInterchange\reductions.ll"<br>
> 63> Command 1 Result: 2<br>
> 63> Command 1 Output:<br>
> 63><br>
> 63><br>
> 63> Command 1 Stderr:<br>
> 63>CUSTOMBUILD : FileCheck error : '-' is empty.<br>
> 63><br>
> 63><br>
> 63><br>
> 63><br>
> 63> --<br>
> 63><br>
><br>
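> For illustration only, here is a minimal, self-contained sketch of one way to<br>
> avoid holding an iterator across the removal: drain the child list from the<br>
> front until it is empty. Node, removeChild, addChild and moveAllChildren are<br>
> hypothetical stand-ins built on std::vector, not the actual llvm::Loop API,<br>
> and this is not necessarily the fix that should be committed.<br>
><br>

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a loop object: it only tracks its parent and
// an ordered list of non-owning child pointers.
struct Node {
  int Id;
  Node *Parent = nullptr;
  std::vector<Node *> Children;

  explicit Node(int NodeId) : Id(NodeId) {}

  // Detach and return the child at position I. The erase invalidates I and
  // every iterator at or after it, so callers must not reuse I afterwards.
  Node *removeChild(std::vector<Node *>::iterator I) {
    assert(I != Children.end() && "Cannot remove the end iterator");
    Node *Child = *I;
    Children.erase(I);
    Child->Parent = nullptr;
    return Child;
  }

  // Attach Child as the last child of this node.
  void addChild(Node *Child) {
    assert(!Child->Parent && "Child already has a parent");
    Child->Parent = this;
    Children.push_back(Child);
  }
};

// Move every child of From to To, preserving order. A fresh begin() is
// taken on each iteration, so no iterator is kept alive across an erase.
void moveAllChildren(Node &From, Node &To) {
  while (!From.Children.empty())
    To.addChild(From.removeChild(From.Children.begin()));
}
```

> Because begin() is re-read on every iteration, the checked iterators in a<br>
> debug build have nothing stale to complain about. Erasing from the front of<br>
> a vector is linear per removal, which should be acceptable for the small<br>
> child lists assumed here.<br>
><br>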
> ~Aaron<br>
><br>
>> +<br>
>> + InnerLoop->addChildLoop(OuterLoop);<br>
>> +}<br>
>> +<br>
>> +bool LoopInterchangeTransform::transform() {<br>
>> +<br>
>> + DEBUG(dbgs() << "transform\n");<br>
>> + bool Transformed = false;<br>
>> + Instruction *InnerIndexVar;<br>
>> +<br>
>> + if (InnerLoop->getSubLoops().size() == 0) {<br>
>> + BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();<br>
>> + DEBUG(dbgs() << "Calling Split Inner Loop\n");<br>
>> + PHINode *InductionPHI = getInductionVariable(InnerLoop, SE);<br>
>> + if (!InductionPHI) {<br>
>> + DEBUG(dbgs() << "Failed to find the point to split loop latch \n");<br>
>> + return false;<br>
>> + }<br>
>> +<br>
>> + if (InductionPHI->getIncomingBlock(0) == InnerLoopPreHeader)<br>
>> + InnerIndexVar = dyn_cast<Instruction>(InductionPHI->getIncomingValue(1));<br>
>> + else<br>
>> + InnerIndexVar = dyn_cast<Instruction>(InductionPHI->getIncomingValue(0));<br>
>> +<br>
>> + //<br>
>> + // Split at the place where the induction variable is<br>
>> + // incremented/decremented.<br>
>> + // TODO: This splitting logic may not work always. Fix this.<br>
>> + splitInnerLoopLatch(InnerIndexVar);<br>
>> + DEBUG(dbgs() << "splitInnerLoopLatch Done\n");<br>
>> +<br>
>> + // Splits the inner loop's PHI nodes out into a separate basic block.<br>
>> + splitInnerLoopHeader();<br>
>> + DEBUG(dbgs() << "splitInnerLoopHeader Done\n");<br>
>> + }<br>
>> +<br>
>> + Transformed |= adjustLoopLinks();<br>
>> + if (!Transformed) {<br>
>> + DEBUG(dbgs() << "adjustLoopLinks Failed\n");<br>
>> + return false;<br>
>> + }<br>
>> +<br>
>> + restructureLoops(InnerLoop, OuterLoop);<br>
>> + return true;<br>
>> +}<br>
>> +<br>
>> +void LoopInterchangeTransform::initialize() {}<br>
>> +<br>
>> +void LoopInterchangeTransform::splitInnerLoopLatch(Instruction *inc) {<br>
>> +<br>
>> + BasicBlock *InnerLoopLatch = InnerLoop->getLoopLatch();<br>
>> + BasicBlock::iterator I = InnerLoopLatch->begin();<br>
>> + BasicBlock::iterator E = InnerLoopLatch->end();<br>
>> + for (; I != E; ++I) {<br>
>> + if (inc == I)<br>
>> + break;<br>
>> + }<br>
>> +<br>
>> + BasicBlock *InnerLoopLatchPred = InnerLoopLatch;<br>
>> + InnerLoopLatch = SplitBlock(InnerLoopLatchPred, I, DT, LI);<br>
>> +}<br>
>> +<br>
>> +void LoopInterchangeTransform::splitOuterLoopLatch() {<br>
>> + BasicBlock *OuterLoopLatch = OuterLoop->getLoopLatch();<br>
>> + BasicBlock *OuterLatchLcssaPhiBlock = OuterLoopLatch;<br>
>> + OuterLoopLatch = SplitBlock(OuterLatchLcssaPhiBlock,<br>
>> + OuterLoopLatch->getFirstNonPHI(), DT, LI);<br>
>> +}<br>
>> +<br>
>> +void LoopInterchangeTransform::splitInnerLoopHeader() {<br>
>> +<br>
>> + // Split the inner loop header out.<br>
>> + BasicBlock *InnerLoopHeader = InnerLoop->getHeader();<br>
>> + SplitBlock(InnerLoopHeader, InnerLoopHeader->getFirstNonPHI(), DT, LI);<br>
>> +<br>
>> + DEBUG(dbgs() << "Output of splitInnerLoopHeader InnerLoopHeaderSucc & "<br>
>> + "InnerLoopHeader \n");<br>
>> +}<br>
>> +<br>
>> +void LoopInterchangeTransform::adjustOuterLoopPreheader() {<br>
>> + BasicBlock *OuterLoopPreHeader = OuterLoop->getLoopPreheader();<br>
>> + SmallVector<Instruction *, 8> Inst;<br>
>> + for (auto I = OuterLoopPreHeader->begin(), E = OuterLoopPreHeader->end();<br>
>> + I != E; ++I) {<br>
>> + if (isa<BranchInst>(*I))<br>
>> + break;<br>
>> + Inst.push_back(I);<br>
>> + }<br>
>> +<br>
>> + BasicBlock *InnerPreHeader = InnerLoop->getLoopPreheader();<br>
>> + for (auto I = Inst.begin(), E = Inst.end(); I != E; ++I) {<br>
>> + Instruction *Ins = cast<Instruction>(*I);<br>
>> + Ins->moveBefore(InnerPreHeader->getTerminator());<br>
>> + }<br>
>> +}<br>
>> +<br>
>> +void LoopInterchangeTransform::adjustInnerLoopPreheader() {<br>
>> +<br>
>> + BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();<br>
>> + SmallVector<Instruction *, 8> Inst;<br>
>> + for (auto I = InnerLoopPreHeader->begin(), E = InnerLoopPreHeader->end();<br>
>> + I != E; ++I) {<br>
>> + if (isa<BranchInst>(*I))<br>
>> + break;<br>
>> + Inst.push_back(I);<br>
>> + }<br>
>> + BasicBlock *OuterHeader = OuterLoop->getHeader();<br>
>> + for (auto I = Inst.begin(), E = Inst.end(); I != E; ++I) {<br>
>> + Instruction *Ins = cast<Instruction>(*I);<br>
>> + Ins->moveBefore(OuterHeader->getTerminator());<br>
>> + }<br>
>> +}<br>
>> +<br>
>> +bool LoopInterchangeTransform::adjustLoopBranches() {<br>
>> +<br>
>> + DEBUG(dbgs() << "adjustLoopBranches called\n");<br>
>> + // Adjust the loop preheader<br>
>> + BasicBlock *InnerLoopHeader = InnerLoop->getHeader();<br>
>> + BasicBlock *OuterLoopHeader = OuterLoop->getHeader();<br>
>> + BasicBlock *InnerLoopLatch = InnerLoop->getLoopLatch();<br>
>> + BasicBlock *OuterLoopLatch = OuterLoop->getLoopLatch();<br>
>> + BasicBlock *OuterLoopPreHeader = OuterLoop->getLoopPreheader();<br>
>> + BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();<br>
>> + BasicBlock *OuterLoopPredecessor = OuterLoopPreHeader->getUniquePredecessor();<br>
>> + BasicBlock *InnerLoopLatchPredecessor =<br>
>> + InnerLoopLatch->getUniquePredecessor();<br>
>> + BasicBlock *InnerLoopLatchSuccessor;<br>
>> + BasicBlock *OuterLoopLatchSuccessor;<br>
>> +<br>
>> + BranchInst *OuterLoopLatchBI =<br>
>> + dyn_cast<BranchInst>(OuterLoopLatch->getTerminator());<br>
>> + BranchInst *InnerLoopLatchBI =<br>
>> + dyn_cast<BranchInst>(InnerLoopLatch->getTerminator());<br>
>> + BranchInst *OuterLoopHeaderBI =<br>
>> + dyn_cast<BranchInst>(OuterLoopHeader->getTerminator());<br>
>> + BranchInst *InnerLoopHeaderBI =<br>
>> + dyn_cast<BranchInst>(InnerLoopHeader->getTerminator());<br>
>> +<br>
>> + if (!OuterLoopPredecessor || !InnerLoopLatchPredecessor ||<br>
>> + !OuterLoopLatchBI || !InnerLoopLatchBI || !OuterLoopHeaderBI ||<br>
>> + !InnerLoopHeaderBI)<br>
>> + return false;<br>
>> +<br>
>> + BranchInst *InnerLoopLatchPredecessorBI =<br>
>> + dyn_cast<BranchInst>(InnerLoopLatchPredecessor->getTerminator());<br>
>> + BranchInst *OuterLoopPredecessorBI =<br>
>> + dyn_cast<BranchInst>(OuterLoopPredecessor->getTerminator());<br>
>> +<br>
>> + if (!OuterLoopPredecessorBI || !InnerLoopLatchPredecessorBI)<br>
>> + return false;<br>
>> + BasicBlock *InnerLoopHeaderSucessor = InnerLoopHeader->getUniqueSuccessor();<br>
>> + if (!InnerLoopHeaderSucessor)<br>
>> + return false;<br>
>> +<br>
>> + // Adjust Loop Preheader and headers<br>
>> +<br>
>> + unsigned NumSucc = OuterLoopPredecessorBI->getNumSuccessors();<br>
>> + for (unsigned i = 0; i < NumSucc; ++i) {<br>
>> + if (OuterLoopPredecessorBI->getSuccessor(i) == OuterLoopPreHeader)<br>
>> + OuterLoopPredecessorBI->setSuccessor(i, InnerLoopPreHeader);<br>
>> + }<br>
>> +<br>
>> + NumSucc = OuterLoopHeaderBI->getNumSuccessors();<br>
>> + for (unsigned i = 0; i < NumSucc; ++i) {<br>
>> + if (OuterLoopHeaderBI->getSuccessor(i) == OuterLoopLatch)<br>
>> + OuterLoopHeaderBI->setSuccessor(i, LoopExit);<br>
>> + else if (OuterLoopHeaderBI->getSuccessor(i) == InnerLoopPreHeader)<br>
>> + OuterLoopHeaderBI->setSuccessor(i, InnerLoopHeaderSucessor);<br>
>> + }<br>
>> +<br>
>> + BranchInst::Create(OuterLoopPreHeader, InnerLoopHeaderBI);<br>
>> + InnerLoopHeaderBI->eraseFromParent();<br>
>> +<br>
>> + // -------------Adjust loop latches-----------<br>
>> + if (InnerLoopLatchBI->getSuccessor(0) == InnerLoopHeader)<br>
>> + InnerLoopLatchSuccessor = InnerLoopLatchBI->getSuccessor(1);<br>
>> + else<br>
>> + InnerLoopLatchSuccessor = InnerLoopLatchBI->getSuccessor(0);<br>
>> +<br>
>> + NumSucc = InnerLoopLatchPredecessorBI->getNumSuccessors();<br>
>> + for (unsigned i = 0; i < NumSucc; ++i) {<br>
>> + if (InnerLoopLatchPredecessorBI->getSuccessor(i) == InnerLoopLatch)<br>
>> + InnerLoopLatchPredecessorBI->setSuccessor(i, InnerLoopLatchSuccessor);<br>
>> + }<br>
>> +<br>
>> + if (OuterLoopLatchBI->getSuccessor(0) == OuterLoopHeader)<br>
>> + OuterLoopLatchSuccessor = OuterLoopLatchBI->getSuccessor(1);<br>
>> + else<br>
>> + OuterLoopLatchSuccessor = OuterLoopLatchBI->getSuccessor(0);<br>
>> +<br>
>> + if (InnerLoopLatchBI->getSuccessor(1) == InnerLoopLatchSuccessor)<br>
>> + InnerLoopLatchBI->setSuccessor(1, OuterLoopLatchSuccessor);<br>
>> + else<br>
>> + InnerLoopLatchBI->setSuccessor(0, OuterLoopLatchSuccessor);<br>
>> +<br>
>> + if (OuterLoopLatchBI->getSuccessor(0) == OuterLoopLatchSuccessor) {<br>
>> + OuterLoopLatchBI->setSuccessor(0, InnerLoopLatch);<br>
>> + } else {<br>
>> + OuterLoopLatchBI->setSuccessor(1, InnerLoopLatch);<br>
>> + }<br>
>> +<br>
>> + return true;<br>
>> +}<br>
>> +void LoopInterchangeTransform::adjustLoopPreheaders() {<br>
>> +<br>
>> + // We have interchanged the preheaders so we need to interchange the data in<br>
>> + // the preheader as well.<br>
>> + // This is because the content of inner preheader was previously executed<br>
>> + // inside the outer loop.<br>
>> + BasicBlock *OuterLoopPreHeader = OuterLoop->getLoopPreheader();<br>
>> + BasicBlock *InnerLoopPreHeader = InnerLoop->getLoopPreheader();<br>
>> + BasicBlock *OuterLoopHeader = OuterLoop->getHeader();<br>
>> + BranchInst *InnerTermBI =<br>
>> + cast<BranchInst>(InnerLoopPreHeader->getTerminator());<br>
>> +<br>
>> + SmallVector<Value *, 16> OuterPreheaderInstr;<br>
>> + SmallVector<Value *, 16> InnerPreheaderInstr;<br>
>> +<br>
>> + for (auto I = OuterLoopPreHeader->begin(); !isa<BranchInst>(I); ++I)<br>
>> + OuterPreheaderInstr.push_back(I);<br>
>> +<br>
>> + for (auto I = InnerLoopPreHeader->begin(); !isa<BranchInst>(I); ++I)<br>
>> + InnerPreheaderInstr.push_back(I);<br>
>> +<br>
>> + BasicBlock *HeaderSplit =<br>
>> + SplitBlock(OuterLoopHeader, OuterLoopHeader->getTerminator(), DT, LI);<br>
>> + Instruction *InsPoint = HeaderSplit->getFirstNonPHI();<br>
>> + // These instructions should now be executed inside the loop.<br>
>> + // Move instruction into a new block after outer header.<br>
>> + for (auto I = InnerPreheaderInstr.begin(), E = InnerPreheaderInstr.end();<br>
>> + I != E; ++I) {<br>
>> + Instruction *Ins = cast<Instruction>(*I);<br>
>> + Ins->moveBefore(InsPoint);<br>
>> + }<br>
>> + // These instructions were not executed previously in the loop so move them to<br>
>> + // the older inner loop preheader.<br>
>> + for (auto I = OuterPreheaderInstr.begin(), E = OuterPreheaderInstr.end();<br>
>> + I != E; ++I) {<br>
>> + Instruction *Ins = cast<Instruction>(*I);<br>
>> + Ins->moveBefore(InnerTermBI);<br>
>> + }<br>
>> +}<br>
>> +<br>
>> +bool LoopInterchangeTransform::adjustLoopLinks() {<br>
>> +<br>
>> + // Adjust all branches in the inner and outer loop.<br>
>> + bool Changed = adjustLoopBranches();<br>
>> + if (Changed)<br>
>> + adjustLoopPreheaders();<br>
>> + return Changed;<br>
>> +}<br>
>> +<br>
>> +char LoopInterchange::ID = 0;<br>
>> +INITIALIZE_PASS_BEGIN(LoopInterchange, "loop-interchange",<br>
>> + "Interchanges loops for cache reuse", false, false)<br>
>> +INITIALIZE_AG_DEPENDENCY(AliasAnalysis)<br>
>> +INITIALIZE_PASS_DEPENDENCY(DependenceAnalysis)<br>
>> +INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)<br>
>> +INITIALIZE_PASS_DEPENDENCY(ScalarEvolution)<br>
>> +INITIALIZE_PASS_DEPENDENCY(LoopSimplify)<br>
>> +INITIALIZE_PASS_DEPENDENCY(LCSSA)<br>
>> +INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)<br>
>> +<br>
>> +INITIALIZE_PASS_END(LoopInterchange, "loop-interchange",<br>
>> + "Interchanges loops for cache reuse", false, false)<br>
>> +<br>
>> +Pass *llvm::createLoopInterchangePass() { return new LoopInterchange(); }<br>
>><br>
>> Modified: llvm/trunk/lib/Transforms/Scalar/Scalar.cpp<br>
>> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/Scalar.cpp?rev=231458&r1=231457&r2=231458&view=diff" target="_blank">
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/Scalar.cpp?rev=231458&r1=231457&r2=231458&view=diff</a><br>
>> ==============================================================================<br>
>> --- llvm/trunk/lib/Transforms/Scalar/Scalar.cpp (original)<br>
>> +++ llvm/trunk/lib/Transforms/Scalar/Scalar.cpp Fri Mar 6 04:11:25 2015<br>
>> @@ -48,6 +48,7 @@ void llvm::initializeScalarOpts(PassRegi<br>
>> initializeLoopDeletionPass(Registry);<br>
>> initializeLoopAccessAnalysisPass(Registry);<br>
>> initializeLoopInstSimplifyPass(Registry);<br>
>> + initializeLoopInterchangePass(Registry);<br>
>> initializeLoopRotatePass(Registry);<br>
>> initializeLoopStrengthReducePass(Registry);<br>
>> initializeLoopRerollPass(Registry);<br>
>><br>
>> Added: llvm/trunk/test/Transforms/LoopInterchange/currentLimitation.ll<br>
>> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopInterchange/currentLimitation.ll?rev=231458&view=auto" target="_blank">
http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopInterchange/currentLimitation.ll?rev=231458&view=auto</a><br>
>> ==============================================================================<br>
>> --- llvm/trunk/test/Transforms/LoopInterchange/currentLimitation.ll (added)<br>
>> +++ llvm/trunk/test/Transforms/LoopInterchange/currentLimitation.ll Fri Mar 6 04:11:25 2015<br>
>> @@ -0,0 +1,58 @@<br>
>> +; RUN: opt < %s -basicaa -loop-interchange -S | FileCheck %s<br>
>> +;; These are tests that fail to interchange due to current limitations. They will be removed once the loop interchange pass is extended.<br>
>> +<br>
>> +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"<br>
>> +target triple = "x86_64-unknown-linux-gnu"<br>
>> +<br>
>> +@A = common global [100 x [100 x i32]] zeroinitializer<br>
>> +@B = common global [100 x [100 x [100 x i32]]] zeroinitializer<br>
>> +<br>
>> +;;--------------------------------------Test case 01------------------------------------<br>
>> +;; [FIXME] This loop, though valid, is currently not interchanged because we cannot split the inner loop latch when the inner induction<br>
>> +;; variable has multiple uses (it is used both to increment the loop counter and to access A[j+1][i+1]).<br>
>> +;; for(int i=0;i<N-1;i++)<br>
>> +;; for(int j=1;j<N-1;j++)<br>
>> +;; A[j+1][i+1] = A[j+1][i+1] + k;<br>
>> +<br>
>> +define void @interchange_01(i32 %k, i32 %N) {<br>
>> + entry:<br>
>> + %sub = add nsw i32 %N, -1<br>
>> + %cmp26 = icmp sgt i32 %N, 1<br>
>> + br i1 %cmp26, label %for.cond1.preheader.lr.ph, label %for.end17<br>
>> +<br>
>> + for.cond1.preheader.lr.ph:<br>
>> + %cmp324 = icmp sgt i32 %sub, 1<br>
>> + %0 = add i32 %N, -2<br>
>> + %1 = sext i32 %sub to i64<br>
>> + br label %for.cond1.preheader<br>
>> +<br>
>> + for.cond.loopexit:<br>
>> + %cmp = icmp slt i64 %indvars.iv.next29, %1<br>
>> + br i1 %cmp, label %for.cond1.preheader, label %for.end17<br>
>> +<br>
>> + for.cond1.preheader:<br>
>> + %indvars.iv28 = phi i64 [ 0, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next29, %for.cond.loopexit ]<br>
>> + %indvars.iv.next29 = add nuw nsw i64 %indvars.iv28, 1<br>
>> + br i1 %cmp324, label %for.body4, label %for.cond.loopexit<br>
>> +<br>
>> + for.body4:<br>
>> + %indvars.iv = phi i64 [ %indvars.iv.next, %for.body4 ], [ 1, %for.cond1.preheader ]<br>
>> + %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1<br>
>> + %arrayidx7 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv.next, i64 %indvars.iv.next29<br>
>> + %2 = load i32, i32* %arrayidx7<br>
>> + %add8 = add nsw i32 %2, %k<br>
>> + store i32 %add8, i32* %arrayidx7<br>
>> + %lftr.wideiv = trunc i64 %indvars.iv to i32<br>
>> + %exitcond = icmp eq i32 %lftr.wideiv, %0<br>
>> + br i1 %exitcond, label %for.cond.loopexit, label %for.body4<br>
>> +<br>
>> + for.end17:<br>
>> + ret void<br>
>> +}<br>
>> +;; Inner loop not split so it is not interchanged.<br>
>> +; CHECK-LABEL: @interchange_01<br>
>> +; CHECK: for.body4:<br>
>> +; CHECK-NEXT: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body4 ], [ 1, %for.body4.preheader ]<br>
>> +; CHECK-NEXT: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1<br>
>> +; CHECK-NEXT: %arrayidx7 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv.next, i64 %indvars.iv.next29<br>
>> +<br>
>><br>
>> Added: llvm/trunk/test/Transforms/LoopInterchange/interchange.ll<br>
>> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopInterchange/interchange.ll?rev=231458&view=auto" target="_blank">
http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopInterchange/interchange.ll?rev=231458&view=auto</a><br>
>> ==============================================================================<br>
>> --- llvm/trunk/test/Transforms/LoopInterchange/interchange.ll (added)<br>
>> +++ llvm/trunk/test/Transforms/LoopInterchange/interchange.ll Fri Mar 6 04:11:25 2015<br>
>> @@ -0,0 +1,557 @@<br>
>> +; RUN: opt < %s -basicaa -loop-interchange -S | FileCheck %s<br>
>> +;; We test the complete .ll for adjustment in outer loop header/latch and inner loop header/latch.<br>
>> +<br>
>> +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"<br>
>> +target triple = "x86_64-unknown-linux-gnu"<br>
>> +<br>
>> +@A = common global [100 x [100 x i32]] zeroinitializer<br>
>> +@B = common global [100 x i32] zeroinitializer<br>
>> +@C = common global [100 x [100 x i32]] zeroinitializer<br>
>> +@D = common global [100 x [100 x [100 x i32]]] zeroinitializer<br>
>> +<br>
>> +declare void @foo(...)<br>
>> +<br>
>> +;;--------------------------------------Test case 01------------------------------------<br>
>> +;; for(int i=0;i<N;i++)<br>
>> +;; for(int j=1;j<N;j++)<br>
>> +;; A[j][i] = A[j][i]+k;<br>
>> +<br>
>> +define void @interchange_01(i32 %k, i32 %N) {<br>
>> +entry:<br>
>> + %cmp21 = icmp sgt i32 %N, 0<br>
>> + br i1 %cmp21, label %for.cond1.preheader.lr.ph, label %for.end12<br>
>> +<br>
>> +for.cond1.preheader.lr.ph:<br>
>> + %cmp219 = icmp sgt i32 %N, 1<br>
>> + %0 = add i32 %N, -1<br>
>> + br label %for.cond1.preheader<br>
>> +<br>
>> +for.cond1.preheader:<br>
>> + %indvars.iv23 = phi i64 [ 0, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next24, %for.inc10 ]<br>
>> + br i1 %cmp219, label %for.body3, label %for.inc10<br>
>> +<br>
>> +for.body3:<br>
>> + %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ 1, %for.cond1.preheader ]<br>
>> + %arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv23<br>
>> + %1 = load i32, i32* %arrayidx5<br>
>> + %add = add nsw i32 %1, %k<br>
>> + store i32 %add, i32* %arrayidx5<br>
>> + %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1<br>
>> + %lftr.wideiv = trunc i64 %indvars.iv to i32<br>
>> + %exitcond = icmp eq i32 %lftr.wideiv, %0<br>
>> + br i1 %exitcond, label %for.inc10, label %for.body3<br>
>> +<br>
>> +for.inc10:<br>
>> + %indvars.iv.next24 = add nuw nsw i64 %indvars.iv23, 1<br>
>> + %lftr.wideiv25 = trunc i64 %indvars.iv23 to i32<br>
>> + %exitcond26 = icmp eq i32 %lftr.wideiv25, %0<br>
>> + br i1 %exitcond26, label %for.end12, label %for.cond1.preheader<br>
>> +<br>
>> +for.end12:<br>
>> + ret void<br>
>> +}<br>
>> +<br>
>> +; CHECK-LABEL: @interchange_01<br>
>> +; CHECK: entry:<br>
>> +; CHECK: %cmp21 = icmp sgt i32 %N, 0<br>
>> +; CHECK: br i1 %cmp21, label %for.body3.preheader, label %for.end12<br>
>> +; CHECK: for.cond1.preheader.lr.ph:<br>
>> +; CHECK: br label %for.cond1.preheader<br>
>> +; CHECK: for.cond1.preheader:<br>
>> +; CHECK: %indvars.iv23 = phi i64 [ 0, %for.cond1.preheader.lr.ph ], [ %indvars.iv.next24, %for.inc10 ]<br>
>> +; CHECK: br i1 %cmp219, label %for.body3.split1, label %for.end12.loopexit<br>
>> +; CHECK: for.body3.preheader:<br>
>> +; CHECK: %cmp219 = icmp sgt i32 %N, 1<br>
>> +; CHECK: %0 = add i32 %N, -1<br>
>> +; CHECK: br label %for.body3<br>
>> +; CHECK: for.body3:<br>
>> +; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3.split ], [ 1, %for.body3.preheader ]<br>
>> +; CHECK: br label %for.cond1.preheader.lr.ph<br>
>> +; CHECK: for.body3.split1:<br>
>> +; CHECK: %arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv23<br>
>> +; CHECK: %1 = load i32, i32* %arrayidx5<br>
>> +; CHECK: %add = add nsw i32 %1, %k<br>
>> +; CHECK: store i32 %add, i32* %arrayidx5<br>
>> +; CHECK: br label %for.inc10.loopexit<br>
>> +; CHECK: for.body3.split:<br>
>> +; CHECK: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1<br>
>> +; CHECK: %lftr.wideiv = trunc i64 %indvars.iv to i32<br>
>> +; CHECK: %exitcond = icmp eq i32 %lftr.wideiv, %0<br>
>> +; CHECK: br i1 %exitcond, label %for.end12.loopexit, label %for.body3<br>
>> +; CHECK: for.inc10.loopexit:<br>
>> +; CHECK: br label %for.inc10<br>
>> +; CHECK: for.inc10:<br>
>> +; CHECK: %indvars.iv.next24 = add nuw nsw i64 %indvars.iv23, 1<br>
>> +; CHECK: %lftr.wideiv25 = trunc i64 %indvars.iv23 to i32<br>
>> +; CHECK: %exitcond26 = icmp eq i32 %lftr.wideiv25, %0<br>
>> +; CHECK: br i1 %exitcond26, label %for.body3.split, label %for.cond1.preheader<br>
>> +; CHECK: for.end12.loopexit:<br>
>> +; CHECK: br label %for.end12<br>
>> +; CHECK: for.end12:<br>
>> +; CHECK: ret void<br>
>> +<br>
>> +;;--------------------------------------Test case 02-------------------------------------<br>
>> +<br>
>> +;; for(int i=0;i<100;i++)<br>
>> +;; for(int j=100;j>=0;j--)<br>
>> +;; A[j][i] = A[j][i]+k;<br>
>> +<br>
>> +define void @interchange_02(i32 %k) {<br>
>> +entry:<br>
>> + br label %for.cond1.preheader<br>
>> +<br>
>> +for.cond1.preheader:<br>
>> + %indvars.iv19 = phi i64 [ 0, %entry ], [ %indvars.iv.next20, %for.inc10 ]<br>
>> + br label %for.body3<br>
>> +<br>
>> +for.body3:<br>
>> + %indvars.iv = phi i64 [ 100, %for.cond1.preheader ], [ %indvars.iv.next, %for.body3 ]<br>
>> + %arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv19<br>
>> + %0 = load i32, i32* %arrayidx5<br>
>> + %add = add nsw i32 %0, %k<br>
>> + store i32 %add, i32* %arrayidx5<br>
>> + %indvars.iv.next = add nsw i64 %indvars.iv, -1<br>
>> + %cmp2 = icmp sgt i64 %indvars.iv, 0<br>
>> + br i1 %cmp2, label %for.body3, label %for.inc10<br>
>> +<br>
>> +for.inc10:<br>
>> + %indvars.iv.next20 = add nuw nsw i64 %indvars.iv19, 1<br>
>> + %exitcond = icmp eq i64 %indvars.iv.next20, 100<br>
>> + br i1 %exitcond, label %for.end11, label %for.cond1.preheader<br>
>> +<br>
>> +for.end11:<br>
>> + ret void<br>
>> +}<br>
>> +<br>
>> +; CHECK-LABEL: @interchange_02<br>
>> +; CHECK: entry:<br>
>> +; CHECK: br label %for.body3.preheader<br>
>> +; CHECK: for.cond1.preheader.preheader:<br>
>> +; CHECK: br label %for.cond1.preheader<br>
>> +; CHECK: for.cond1.preheader:<br>
>> +; CHECK: %indvars.iv19 = phi i64 [ %indvars.iv.next20, %for.inc10 ], [ 0, %for.cond1.preheader.preheader ]<br>
>> +; CHECK: br label %for.body3.split1<br>
>> +; CHECK: for.body3.preheader:<br>
>> +; CHECK: br label %for.body3<br>
>> +; CHECK: for.body3:<br>
>> +; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3.split ], [ 100, %for.body3.preheader ]<br>
>> +; CHECK: br label %for.cond1.preheader.preheader<br>
>> +; CHECK: for.body3.split1: ; preds = %for.cond1.preheader<br>
>> +; CHECK: %arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv, i64 %indvars.iv19<br>
>> +; CHECK: %0 = load i32, i32* %arrayidx5<br>
>> +; CHECK: %add = add nsw i32 %0, %k<br>
>> +; CHECK: store i32 %add, i32* %arrayidx5<br>
>> +; CHECK: br label %for.inc10<br>
>> +; CHECK: for.body3.split:<br>
>> +; CHECK: %indvars.iv.next = add nsw i64 %indvars.iv, -1<br>
>> +; CHECK: %cmp2 = icmp sgt i64 %indvars.iv, 0<br>
>> +; CHECK: br i1 %cmp2, label %for.body3, label %for.end11<br>
>> +; CHECK: for.inc10:<br>
>> +; CHECK: %indvars.iv.next20 = add nuw nsw i64 %indvars.iv19, 1<br>
>> +; CHECK: %exitcond = icmp eq i64 %indvars.iv.next20, 100<br>
>> +; CHECK: br i1 %exitcond, label %for.body3.split, label %for.cond1.preheader<br>
>> +; CHECK: for.end11:<br>
>> +; CHECK: ret void<br>
>> +<br>
>> +;;--------------------------------------Test case 03-------------------------------------<br>
>> +;; Loops should not be interchanged in this case as it is not profitable.<br>
>> +;; for(int i=0;i<100;i++)<br>
>> +;; for(int j=0;j<100;j++)<br>
>> +;; A[i][j] = A[i][j]+k;<br>
>> +<br>
>> +define void @interchange_03(i32 %k) {<br>
>> +entry:<br>
>> + br label %for.cond1.preheader<br>
>> +<br>
>> +for.cond1.preheader:<br>
>> + %indvars.iv21 = phi i64 [ 0, %entry ], [ %indvars.iv.next22, %for.inc10 ]<br>
>> + br label %for.body3<br>
>> +<br>
>> +for.body3:<br>
>> + %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.body3 ]<br>
>> + %arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv21, i64 %indvars.iv<br>
>> + %0 = load i32, i32* %arrayidx5<br>
>> + %add = add nsw i32 %0, %k<br>
>> + store i32 %add, i32* %arrayidx5<br>
>> + %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1<br>
>> + %exitcond = icmp eq i64 %indvars.iv.next, 100<br>
>> + br i1 %exitcond, label %for.inc10, label %for.body3<br>
>> +<br>
>> +for.inc10:<br>
>> + %indvars.iv.next22 = add nuw nsw i64 %indvars.iv21, 1<br>
>> + %exitcond23 = icmp eq i64 %indvars.iv.next22, 100<br>
>> + br i1 %exitcond23, label %for.end12, label %for.cond1.preheader<br>
>> +<br>
>> +for.end12:<br>
>> + ret void<br>
>> +}<br>
>> +<br>
>> +; CHECK-LABEL: @interchange_03<br>
>> +; CHECK: entry:<br>
>> +; CHECK: br label %for.cond1.preheader.preheader<br>
>> +; CHECK: for.cond1.preheader.preheader: ; preds = %entry<br>
>> +; CHECK: br label %for.cond1.preheader<br>
>> +; CHECK: for.cond1.preheader: ; preds = %for.cond1.preheader.preheader, %for.inc10<br>
>> +; CHECK: %indvars.iv21 = phi i64 [ %indvars.iv.next22, %for.inc10 ], [ 0, %for.cond1.preheader.preheader ]<br>
>> +; CHECK: br label %for.body3.preheader<br>
>> +; CHECK: for.body3.preheader: ; preds = %for.cond1.preheader<br>
>> +; CHECK: br label %for.body3<br>
>> +; CHECK: for.body3: ; preds = %for.body3.preheader, %for.body3<br>
>> +; CHECK: %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ 0, %for.body3.preheader ]<br>
>> +; CHECK: %arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv21, i64 %indvars.iv<br>
>> +; CHECK: %0 = load i32, i32* %arrayidx5<br>
>> +; CHECK: %add = add nsw i32 %0, %k<br>
>> +; CHECK: store i32 %add, i32* %arrayidx5<br>
>> +; CHECK: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1<br>
>> +; CHECK: %exitcond = icmp eq i64 %indvars.iv.next, 100<br>
>> +; CHECK: br i1 %exitcond, label %for.inc10, label %for.body3<br>
>> +; CHECK: for.inc10: ; preds = %for.body3<br>
>> +; CHECK: %indvars.iv.next22 = add nuw nsw i64 %indvars.iv21, 1<br>
>> +; CHECK: %exitcond23 = icmp eq i64 %indvars.iv.next22, 100<br>
>> +; CHECK: br i1 %exitcond23, label %for.end12, label %for.cond1.preheader<br>
>> +; CHECK: for.end12: ; preds = %for.inc10<br>
>> +; CHECK: ret void<br>
>> +<br>
>> +<br>
>> +;;--------------------------------------Test case 04-------------------------------------<br>
>> +;; Loops should not be interchanged in this case as it is not legal due to dependency.<br>
>> +;; for(int j=0;j<99;j++)<br>
>> +;; for(int i=0;i<99;i++)<br>
>> +;; A[j][i+1] = A[j+1][i]+k;<br>
>> +<br>
>> +define void @interchange_04(i32 %k){<br>
>> +entry:<br>
>> + br label %for.cond1.preheader<br>
>> +<br>
>> +for.cond1.preheader:<br>
>> + %indvars.iv23 = phi i64 [ 0, %entry ], [ %indvars.iv.next24, %for.inc12 ]<br>
>> + %indvars.iv.next24 = add nuw nsw i64 %indvars.iv23, 1<br>
>> + br label %for.body3<br>
>> +<br>
>> +for.body3:<br>
>> + %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.body3 ]<br>
>> + %arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv.next24, i64 %indvars.iv<br>
>> + %0 = load i32, i32* %arrayidx5<br>
>> + %add6 = add nsw i32 %0, %k<br>
>> + %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1<br>
>> + %arrayidx11 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv23, i64 %indvars.iv.next<br>
>> + store i32 %add6, i32* %arrayidx11<br>
>> + %exitcond = icmp eq i64 %indvars.iv.next, 99<br>
>> + br i1 %exitcond, label %for.inc12, label %for.body3<br>
>> +<br>
>> +for.inc12:<br>
>> + %exitcond25 = icmp eq i64 %indvars.iv.next24, 99<br>
>> + br i1 %exitcond25, label %for.end14, label %for.cond1.preheader<br>
>> +<br>
>> +for.end14:<br>
>> + ret void<br>
>> +}<br>
>> +<br>
>> +; CHECK-LABEL: @interchange_04<br>
>> +; CHECK: entry:<br>
>> +; CHECK: br label %for.cond1.preheader<br>
>> +; CHECK: for.cond1.preheader: ; preds = %for.inc12, %entry<br>
>> +; CHECK: %indvars.iv23 = phi i64 [ 0, %entry ], [ %indvars.iv.next24, %for.inc12 ]<br>
>> +; CHECK: %indvars.iv.next24 = add nuw nsw i64 %indvars.iv23, 1<br>
>> +; CHECK: br label %for.body3<br>
>> +; CHECK: for.body3: ; preds = %for.body3, %for.cond1.preheader<br>
>> +; CHECK: %indvars.iv = phi i64 [ 0, %for.cond1.preheader ], [ %indvars.iv.next, %for.body3 ]<br>
>> +; CHECK: %arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv.next24, i64 %indvars.iv<br>
>> +; CHECK: %0 = load i32, i32* %arrayidx5<br>
>> +; CHECK: %add6 = add nsw i32 %0, %k<br>
>> +; CHECK: %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1<br>
>> +; CHECK: %arrayidx11 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %indvars.iv23, i64 %indvars.iv.next<br>
>> +; CHECK: store i32 %add6, i32* %arrayidx11<br>
>> +; CHECK: %exitcond = icmp eq i64 %indvars.iv.next, 99<br>
>><o:p></o:p></p>
</div>
</div>
</div>
<p class="MsoNormal">...<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</body>
</html>