<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 06/11/13 08:54, Arnold wrote:<br>
    </div>
    <blockquote
      cite="mid:47CD5558-95D6-4213-93DA-DAF9866C228F@apple.com"
      type="cite">
      <div><br>
        On Nov 5, 2013, at 7:39 PM, Frank Winter &lt;<a
          moz-do-not-send="true" href="mailto:fwinter@jlab.org">fwinter@jlab.org</a>&gt;
        wrote:<br>
        <br>
      </div>
      <blockquote type="cite">
        <div>
          <div class="moz-cite-prefix">Good that you bring this up. I
            still have no solution to this vectorization problem.<br>
            <br>
            However, I can rewrite the code and insert a second loop
            which eliminates the 'urem' and 'udiv' instructions from the
            index calculations. In this case, the inner loop's trip
            count would equal the SIMD length, and the loop
            vectorizer ignores the loop. Unrolling the loop and relying
            on SLP is not an option, since the loop body can get lengthy.<br>
            <br>
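            A minimal sketch of that rewrite, assuming a SIMD length of 4
            and a trip count that is a multiple of 4 (the function names
            add_flat and add_nested are made up for illustration and are
            not taken from the actual code):<br>
            <br>
```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Flat form: a single loop whose index needs a remainder and a division.
// Assumes n is a multiple of 4 and that every vector holds at least
// 8 * (n / 4) elements.
void add_flat(std::vector<float>& c, const std::vector<float>& a,
              const std::vector<float>& b, std::uint64_t n) {
  for (std::uint64_t i = 0; i < n; ++i) {
    const std::uint64_t ir0 = i % 4 + 8 * (i / 4);
    c[ir0] = a[ir0] + b[ir0];
  }
}

// Nested form: the outer loop walks blocks of 8 elements, the inner loop
// covers the first 4 elements of each block. No remainder or division is
// left inside the loops, but the inner trip count is only 4.
void add_nested(std::vector<float>& c, const std::vector<float>& a,
                const std::vector<float>& b, std::uint64_t n) {
  const std::uint64_t blocks = n / 4;
  for (std::uint64_t blk = 0; blk < blocks; ++blk) {
    const std::uint64_t base = 8 * blk;
    for (std::uint64_t lane = 0; lane < 4; ++lane) {
      c[base + lane] = a[base + lane] + b[base + lane];
    }
  }
}
```
            <br>
            Both functions write the same elements. Only the nested form
            avoids the remainder and the division, at the cost of an
            inner loop with a trip count of 4.<br>
            <br>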
            What would be quicker to implement: <br>
            <br>
            a) Teach the loop vectorizer to handle the 'urem' and 'udiv'
            instructions, or <br>
          </div>
        </div>
      </blockquote>
      <div><br>
      </div>
      This would probably be harder because your individual accesses are
      consecutive within a stride.
      <div><br>
      </div>
      <div>a[0] a[1] a[2] a[3]  a[8] a[9] a[10] a[11]</div>
      <div><br>
        <div>Not something the loop vectorizer currently understands.</div>
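        <div><br>
        </div>
        <div>To make the pattern concrete, here is a small sketch that
          evaluates the index expression quoted further down in this
          thread (i%4 + 8*(i/4)) for the first few iterations. The helper
          name touched_indices is hypothetical and only serves this
          illustration:</div>
        <div><br>
        </div>
```cpp
#include <cstdint>
#include <vector>

// Returns the element indices touched by the first `count` iterations,
// using the index expression from the source loop: i%4 + 8*(i/4).
std::vector<std::uint64_t> touched_indices(std::uint64_t count) {
  std::vector<std::uint64_t> out;
  out.reserve(count);
  for (std::uint64_t i = 0; i < count; ++i) {
    out.push_back(i % 4 + 8 * (i / 4));
  }
  return out;
}
```
        <div><br>
        </div>
        <div>The result is runs of four consecutive indices, with each
          run starting eight elements after the previous one, so the
          access is neither unit-stride nor a single constant stride.</div>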
        <div>
          <blockquote type="cite">
            <div>
              <div class="moz-cite-prefix"> b) have the loop vectorizer
                process loops with trip count equal to the vector length
                ?<br>
              </div>
            </div>
          </blockquote>
          <div><br>
          </div>
          You should be able to change "TinyTripCountVectorThreshold" in
          LoopVectorize.cpp<br>
        </div>
      </div>
    </blockquote>
    <br>
    I managed to set this option when using the 'opt' tool. Is there a way
    to set it when using the API, without changing the default value in
    the source code and recompiling LLVM?<br>
    <br>
    <blockquote
      cite="mid:47CD5558-95D6-4213-93DA-DAF9866C228F@apple.com"
      type="cite">
      <div>
        <div>
          <blockquote type="cite">
            <div>
              <div class="moz-cite-prefix"> <br>
                One of the two solutions will be needed, I guess.<br>
                <br>
                Frank<br>
                <br>
                <br>
                <br>
                On 05/11/13 22:12, Andrew Trick wrote:<br>
              </div>
              <blockquote
                cite="mid:87246971-E2F3-4CE2-A754-A6BC7F8AB7F3@apple.com"
                type="cite"><br>
                <div>
                  <div>On Oct 30, 2013, at 11:21 PM, Renato Golin &lt;<a
                      moz-do-not-send="true"
                      href="mailto:renato.golin@linaro.org">renato.golin@linaro.org</a>&gt;

                    wrote:</div>
                  <br class="Apple-interchange-newline">
                  <blockquote type="cite">
                    <div dir="ltr">On 30 October 2013 18:40, Frank
                      Winter &lt;<a moz-do-not-send="true"
                        href="mailto:fwinter@jlab.org">fwinter@jlab.org</a>&gt;

                      wrote:<br>
                      <div class="gmail_extra">
                        <div class="gmail_quote">
                          <blockquote class="gmail_quote">
                            <div bgcolor="#FFFFFF" text="#000000">
                              <div>      const std::uint64_t ir0 =
                                (i+0)%4;  // not working<br>
                              </div>
                            </div>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>I thought this would be the case when I
                            saw the original expression. Maybe we need
                            to teach modular arithmetic to SCEV?</div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                  <div>
                    <div><br>
                    </div>
                    <div>I let this thread get stale, so here’s the
                      background again:</div>
                    <div><br>
                    </div>
                    <div>source:</div>
                    <div><br>
                    </div>
                    <div>      const std::uint64_t ir0 = i%4 + 8*(i/4);</div>
                    <div>      c[ ir0 ]         = a[ ir0 ]         + b[
                      ir0 ];</div>
                    <div><br>
                    </div>
                    <div>
                      <div>before instcombine:</div>
                      <div><br>
                      </div>
                      <div>  %4 = urem i64 %i.0, 4</div>
                      <div>  %5 = udiv i64 %i.0, 4</div>
                      <div>  %6 = mul i64 8, %5</div>
                      <div>  %7 = add i64 %4, %6</div>
                      <div>  %8 = getelementptr inbounds float* %a, i64
                        %7</div>
                    </div>
                    <div><br>
                    </div>
                    <div>after instcombine:</div>
                    <div><br>
                    </div>
                  </div>
                  <div>
                    <div>  %2 = and i64 %i.04, 3</div>
                    <div>  %3 = lshr i64 %i.04, 2</div>
                    <div>  %4 = shl i64 %3, 3</div>
                    <div>  %5 = or i64 %4, %2</div>
                    <div>  %11 = getelementptr inbounds float* %c, i64
                      %5</div>
                    <div>  store float %10, float* %11, align 4, !tbaa
                      !0</div>
                    <div><br>
                    </div>
                    <div>Honestly, I don't understand why InstCombine
                      "anti-canonicalizes" add-&gt;or. I think that
                      transformation should be deferred until we begin
                      target-specific lowering (e.g. InstOptimize pass).</div>
                  </div>
                  <div><br>
                  </div>
                  <div>Given that we aren't going to change that any
                    time soon, SCEV could probably be taught to
                    recognize the specific pattern:</div>
                  <div><br>
                  </div>
                  <div>Instructions (or (and %a, C1), (shl %b, C2))
                    -> SCEV (add %a, %b)</div>
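                  <div><br>
                  </div>
                  <div>As a rough sketch of why this is sound, the C++
                    below compares the arithmetic index from the source
                    with the bitwise form that instcombine produces, and
                    checks the general shape of the pattern. The side
                    condition that every set bit of C1 lies below bit C2
                    is my assumption about what the recognizer would have
                    to verify; all function names are hypothetical.</div>
                  <div><br>
                  </div>
```cpp
#include <cstdint>

// Index as written in the source: remainder plus scaled quotient.
constexpr std::uint64_t index_arith(std::uint64_t i) {
  return i % 4 + 8 * (i / 4);
}

// Index as it looks after instcombine: and, lshr, shl, or.
constexpr std::uint64_t index_bits(std::uint64_t i) {
  return ((i >> 2) << 3) | (i & 3);
}

// General shape of the pattern, once with 'or' and once with 'add'.
// Both assume c2 < 64 so that the shift is well defined.
constexpr std::uint64_t or_form(std::uint64_t a, std::uint64_t c1,
                                std::uint64_t b, unsigned c2) {
  return (a & c1) | (b << c2);
}

constexpr std::uint64_t add_form(std::uint64_t a, std::uint64_t c1,
                                 std::uint64_t b, unsigned c2) {
  return (a & c1) + (b << c2);
}

// Assumed side condition: the mask has no bit at or above position c2.
// Then the two operands share no set bits, no carry can occur, and
// 'or' computes the same value as 'add'.
constexpr bool mask_fits_below_shift(std::uint64_t c1, unsigned c2) {
  return c2 < 64 && (c1 >> c2) == 0;
}
```
                  <div><br>
                  </div>
                  <div>With C1 = 3 and C2 = 3, as in the IR above, the
                    condition holds and the two forms agree. With a mask
                    such as 15 it does not hold, and the forms can
                    differ.</div>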
                  <div><br>
                  </div>
                  <div>-Andy</div>
                </div>
              </blockquote>
              <br>
              <br>
            </div>
          </blockquote>
          <blockquote type="cite">
            <div>_______________________________________________<br>
              LLVM Developers mailing list<br>
              <a moz-do-not-send="true"
                href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>
                      <a moz-do-not-send="true"
                href="http://llvm.cs.uiuc.edu">http://llvm.cs.uiuc.edu</a><br>
              <a moz-do-not-send="true"
                href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
            </div>
          </blockquote>
        </div>
      </div>
    </blockquote>
    <br>
    <br>
  </body>
</html>