<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 10/12/2016 4:35 PM, Charith Mendis
      via llvm-dev wrote:<br>
    </div>
    <blockquote
cite="mid:CA+t4WYzkp+Xzowf-8LLSnyjSX6bYr_e4+eoOiEZ30e7UJ1Nr+g@mail.gmail.com"
      type="cite">
      <div dir="ltr">Hi all,
        <div><br>
        </div>
        <div>Attached is a simple vectorized function whose loops
          perform a simple shuffle.</div>
        <div><br>
        </div>
        <div>I want all loops (inner and outer) to be unrolled by 2, so
          I used -unroll-count=2.</div>
        <div>The inner loops (with k as the induction variable and
          constant trip counts) unroll fully, but the outer loop (with
          j as the induction variable) fails to unroll.</div>
        <div><br>
        </div>
        <div>The LLVM code, with the inner loops fully unrolled, is
          also attached.</div>
        <div><br>
        </div>
        <div>To investigate further, I added the following to
          PassManagerBuilder.cpp to run some canonicalization passes
          and then redo unrolling. I enabled partial unrolling, set a
          huge threshold, and allowed expensive trip counts. It still
          didn't unroll by 2.</div>
        <div><br>
        </div>
        <div>
          <p
            style="margin:0px;font-size:11px;line-height:normal;font-family:menlo"><span
              style="font-variant-ligatures:no-common-ligatures">MPM.add(createLoopUnrollPass());</span></p>
          <p
            style="margin:0px;font-size:11px;line-height:normal;font-family:menlo"><span
              style="font-variant-ligatures:no-common-ligatures">MPM.add(createCFGSimplificationPass()); </span></p>
          <p
            style="margin:0px;font-size:11px;line-height:normal;font-family:menlo"><span
              style="font-variant-ligatures:no-common-ligatures">MPM.add(createLoopSimplifyPass()); </span></p>
          <p
            style="margin:0px;font-size:11px;line-height:normal;font-family:menlo"><span
              style="font-variant-ligatures:no-common-ligatures">MPM.add(createLoopRotatePass(SizeLevel
              == </span><span
              style="font-variant-ligatures:no-common-ligatures;color:rgb(39,42,216)">2</span><span
              style="font-variant-ligatures:no-common-ligatures"> ? </span><span
style="font-variant-ligatures:no-common-ligatures;color:rgb(39,42,216)">0</span><span
              style="font-variant-ligatures:no-common-ligatures"> : -</span><span
style="font-variant-ligatures:no-common-ligatures;color:rgb(39,42,216)">1</span><span
              style="font-variant-ligatures:no-common-ligatures">));</span></p>
          <p
            style="margin:0px;font-size:11px;line-height:normal;font-family:menlo"><span
              style="font-variant-ligatures:no-common-ligatures">MPM.add(createLCSSAPass());</span></p>
          <p
            style="margin:0px;font-size:11px;line-height:normal;font-family:menlo"><span
              style="font-variant-ligatures:no-common-ligatures">MPM.add(createIndVarSimplifyPass()); 
                    </span><span
              style="font-variant-ligatures:no-common-ligatures;color:rgb(0,132,0)">//
              Canonicalize indvars</span></p>
          <p
            style="margin:0px;font-size:11px;line-height:normal;font-family:menlo"><span
              style="font-variant-ligatures:no-common-ligatures">MPM.add(createLoopUnrollPass());</span></p>
        </div>
        <div><br>
        </div>
        <div><br>
        </div>
        <div>Digging deeper, I found that it fails in the
          UnrollRuntimeLoopRemainder function, where it is unable to
          compute the backedge-taken count.</div>
        <div><br>
        </div>
        <div>Can anybody explain what is needed to get the outer loop
          unrolled by 2? It would be a great help.</div>
      </div>
    </blockquote>
    <br>
    Well, I can at least explain what is happening: runtime unrolling
    needs to be able to symbolically compute the trip count to avoid
    inserting a branch after every iteration.  SCEV isn't able to prove
    that your loop isn't an infinite loop (consider the case of
    vectorizable_elements==SIZE_MAX), so it can't compute the trip
    count, and therefore we don't unroll.<br>
    <br>
    There are a few different angles you could use to attack this: you
    could teach the unroller to unroll loops with an uncomputable trip
    count, or you could make the trip count of your loop computable
    somehow.  Changing the unroller is probably straightforward (see the
    recently committed r284044).  Making the trip count computable is
    more complicated: it's probably possible to teach SCEV to reason
    about the overflow in the pointer computation, or maybe you could
    version the loop.<br>
    <br>
    -Eli<br>
    <pre class="moz-signature" cols="72">-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project</pre>
  </body>
</html>