<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 01/20/2018 12:29 PM, hameeza ahmed
      via llvm-dev wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAFMPKeaqnr0_R1pUYwc9eYn25WzrR8KLmi6Fr7k60WNoLyO5sw@mail.gmail.com">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div dir="ltr">I have already seen how <span
          style="font-size:12.8px">__builtin_nontemporal_store is used, but I
          want to automate the identification of nontemporal loads/stores.
          I think I need to write a pass for this. Is it possible to detect
          nontemporal loops without Polly? <br>
        </span></div>
    </blockquote>
    <br>
    Yes, but we don't have anything that does that right now. The cost
    modeling is non-trivial, however. In the loop below, which of those
    accesses would you expect to be nontemporal? Each array those
    accesses touch spans only 8 KB (2048 4-byte ints), and even all
    three together (24 KB) are smaller than many L1 caches. Turning
    those into nontemporal accesses could certainly lead to a
    performance regression for that loop, for subsequent code, or for
    both. If we do this more generally, I suspect that we'd need to
    split the loop so that small trip counts don't use nontemporal
    accesses at all and, for larger trip counts, we don't disturb
    data-reuse opportunities that would otherwise exist.<br>
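    <br>
    The loop-splitting idea above can be sketched in C as a loop
    versioned on a simple footprint check. This is only an illustration
    under stated assumptions: the 32 KiB cache size, the three-stream
    footprint model, and the names ASSUMED_CACHE_BYTES, NUM_STREAMS,
    loop_footprint and add_arrays are hypothetical, not LLVM APIs or
    options, and a real cost model would also have to account for reuse
    by the surrounding code. It uses Clang's __builtin_nontemporal_load
    and __builtin_nontemporal_store when the compiler provides them and
    falls back to ordinary accesses otherwise.<br>
    <pre>
```c
#include <stddef.h>

/* Hypothetical cost-model parameters: assumptions for this sketch,
   not LLVM options. */
#define ASSUMED_CACHE_BYTES (32u * 1024u) /* assume a 32 KiB L1 data cache */
#define NUM_STREAMS 3u                    /* a, b and c are each touched once */

/* Use Clang's nontemporal builtins when the compiler provides them;
   otherwise fall back to ordinary loads and stores. */
#if defined(__has_builtin)
#if __has_builtin(__builtin_nontemporal_load) && \
    __has_builtin(__builtin_nontemporal_store)
#define HAVE_NT_BUILTINS 1
#endif
#endif

/* Approximate bytes touched by the whole loop (n iterations, all streams). */
static size_t loop_footprint(size_t n) {
  return n * sizeof(int) * NUM_STREAMS;
}

/* a[i] = b[i] + c[i], versioned on the footprint check.
   Returns 1 if the nontemporal version ran, 0 otherwise. */
static int add_arrays(int *a, const int *b, const int *c, size_t n) {
  if (loop_footprint(n) <= ASSUMED_CACHE_BYTES) {
    /* Small trip count: the data fits in cache, so keep normal accesses
       and leave the data there for any code that reuses it. */
    for (size_t i = 0; i < n; i++)
      a[i] = b[i] + c[i];
    return 0;
  }
  /* Large trip count: hint that the streamed data need not stay cached. */
  for (size_t i = 0; i < n; i++) {
#ifdef HAVE_NT_BUILTINS
    __builtin_nontemporal_store(__builtin_nontemporal_load(b + i) +
                                    __builtin_nontemporal_load(c + i),
                                a + i);
#else
    a[i] = b[i] + c[i];
#endif
  }
  return 1;
}
```
</pre>
    Under the assumed 32 KiB cache, the 2048-iteration loop from this
    thread (2048 × 4 bytes × 3 arrays = 24 KB) takes the ordinary path;
    only loops whose footprint exceeds the assumed cache size use the
    nontemporal version.<br>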
    <br>
     -Hal<br>
    <br>
    <blockquote type="cite"
cite="mid:CAFMPKeaqnr0_R1pUYwc9eYn25WzrR8KLmi6Fr7k60WNoLyO5sw@mail.gmail.com">
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Sat, Jan 20, 2018 at 11:26 PM, Simon
          Pilgrim <span dir="ltr"><<a
              href="mailto:llvm-dev@redking.me.uk" target="_blank"
              moz-do-not-send="true">llvm-dev@redking.me.uk</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div text="#000000" bgcolor="#FFFFFF">
              <div>
                <div class="h5"> On 20/01/2018 18:16, hameeza ahmed
                  wrote:<br>
                  <blockquote type="cite">
                    <div dir="ltr">Actually, I am working on a vector
                      accelerator that will execute the instructions
                      that are nontemporal.
                      <div><br>
                      </div>
                      <div>For instance, if I have this loop:</div>
                      <div><br>
                      </div>
                      <div>for(i=0;i<2048;i++)</div>
                      <div>a[i]=b[i]+c[i];</div>
                      <div><br>
                      </div>
                      <div>Currently, it emits the following IR:</div>
                      <div><br>
                      </div>
                      <div><br>
                      </div>
                      <div>
                        <div>  %0 = getelementptr inbounds [2048 x i32],
                          [2048 x i32]* @b, i64 0, i64 %index<br>
                        </div>
                        <div>  %1 = bitcast i32* %0 to <16 x i32>*</div>
                        <div>  %wide.load = load <16 x i32>,
                          <16 x i32>* %1, align 16, !tbaa !1</div>
                        <div>  %8 = getelementptr inbounds [2048 x i32],
                          [2048 x i32]* @c, i64 0, i64 %index</div>
                        <div>  %9 = bitcast i32* %8 to <16 x i32>*</div>
                        <div>  %wide.load14 = load <16 x i32>,
                          <16 x i32>* %9, align 16, !tbaa !1</div>
                        <div>  %16 = add nsw <16 x i32>
                          %wide.load14, %wide.load</div>
                        <div>  %20 = getelementptr inbounds [2048 x
                          i32], [2048 x i32]* @a, i64 0, i64 %index</div>
                        <div>  %21 = bitcast i32* %20 to <16 x
                          i32>*</div>
                        <div>  store <16 x i32> %16, <16 x
                          i32>* %21, align 16, !tbaa !1</div>
                      </div>
                      <div><br>
                      </div>
                      <div><br>
                      </div>
                      <div>However, I want it to emit the following IR:</div>
                      <div><br>
                      </div>
                      <div>
                        <div>  %0 = getelementptr inbounds [2048 x i32],
                          [2048 x i32]* @b, i64 0, i64 %index<br>
                        </div>
                        <div>  %1 = bitcast i32* %0 to <16 x i32>*</div>
                        <div>  %wide.load = load <16 x i32>,
                          <16 x i32>* %1, align 16, !tbaa !1,
                          !nontemporal !1</div>
                        <div>  %8 = getelementptr inbounds [2048 x i32],
                          [2048 x i32]* @c, i64 0, i64 %index</div>
                        <div>  %9 = bitcast i32* %8 to <16 x i32>*</div>
                        <div>  %wide.load14 = load <16 x i32>,
                          <16 x i32>* %9, align 16, !tbaa
                          !1, !nontemporal !1</div>
                        <div>  %16 = add nsw <16 x i32>
                          %wide.load14, %wide.load, !nontemporal !1</div>
                        <div>  %20 = getelementptr inbounds [2048 x
                          i32], [2048 x i32]* @a, i64 0, i64 %index</div>
                        <div>  %21 = bitcast i32* %20 to <16 x
                          i32>*</div>
                        <div>  store <16 x i32> %16, <16 x
                          i32>* %21, align 16, !tbaa !1, !nontemporal
                          !1</div>
                      </div>
                      <div><br>
                      </div>
                      <div>so that I can offload the load, add, and store
                        to the accelerator hardware. Is this possible? Do
                        I need a separate pass to detect whether the loop
                        has nontemporal data, or will Polly help here?
                        What do you think?</div>
                    </div>
                  </blockquote>
                </div>
              </div>
              From C/C++ you just need to use the
              __builtin_nontemporal_store/__<wbr>builtin_nontemporal_load
              builtins to tag the stores/loads with the nontemporal
              flag.<br>
              <br>
              <div>for(i=0;i<2048;i++) {<br>
              </div>
              <div>  __builtin_nontemporal_store(
                __builtin_nontemporal_load(b + i) +
                __builtin_nontemporal_load(c + i), a + i);<br>
              </div>
              <div>}<br>
              </div>
              <br>
              There may be an attribute you can tag pointers with
              instead, but I don't know offhand.<span class=""><br>
                <br>
                <blockquote type="cite">
                  <div class="gmail_extra">On Sat, Jan 20, 2018 at 11:02
                    PM, Simon Pilgrim <span dir="ltr"><<a
                        href="mailto:llvm-dev@redking.me.uk"
                        target="_blank" moz-do-not-send="true">llvm-dev@redking.me.uk</a>></span>
                    wrote:<br>
                    <div class="gmail_quote">
                      <blockquote class="gmail_quote" style="margin:0 0
                        0 .8ex;border-left:1px #ccc
                        solid;padding-left:1ex">
                        <div class="m_-9084880504328883834HOEnZb">
                          <div class="m_-9084880504328883834h5">On
                            20/01/2018 17:44, hameeza ahmed via llvm-dev
                            wrote:<br>
                            <blockquote class="gmail_quote"
                              style="margin:0 0 0 .8ex;border-left:1px
                              #ccc solid;padding-left:1ex"> Hello,<br>
                              <br>
                              My work deals with nontemporal loads and
                              stores. I found nontemporal metadata in the
                              LLVM documentation, but it's not shown in the
                              IR.<br>
                              <br>
                              How do I get nontemporal metadata?<br>
                            </blockquote>
                          </div>
                        </div>
                        llvm\test\CodeGen\X86\nontemporal-loads.ll
                        shows how to create nontemporal vector loads in
                        IR; is that what you're after?<span
                          class="m_-9084880504328883834HOEnZb"><font
                            color="#888888"><br>
                            <br>
                            Simon.<br>
                          </font></span></blockquote>
                    </div>
                    <br>
                  </div>
                </blockquote>
                <br>
              </span></div>
          </blockquote>
        </div>
        <br>
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
LLVM Developers mailing list
<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>
</pre>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
  </body>
</html>