<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <br>
    <div class="moz-cite-prefix">On 03/25/2014 02:31 PM, Jingyue Wu
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAMROOrF-M6i37_M4o0Anx-+4gf+d6pynEnRqHqiEyFPG_hMxcQ@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_quote">
          <div dir="ltr">
            <div>This is a follow-up discussion on <a
                moz-do-not-send="true"
href="http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20140324/101899.html"
                target="_blank">http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20140324/101899.html</a>.
              The front-end change was already pushed in r204677, so we
              want to continue with the IR optimization. <br>
            </div>
            <div><br>
            </div>
            <div>In general, we want to write an IR pass to convert
              generic address space usage to non-generic address space
              usage, because accessing the generic address space in CUDA
              and OpenCL is significantly slower than accessing
              non-generic ones (such as shared and constant). </div>
            <div><br>
            </div>
            <div>Here is an example Justin gave: </div>
            <div><br>
            </div>
            <div><span
                style="font-family:arial,sans-serif;font-size:13px"> 
                %ptr = ...</span><br
                style="font-family:arial,sans-serif;font-size:13px">
              <span style="font-family:arial,sans-serif;font-size:13px"> 
                %val = load i32* %ptr</span><br
                style="font-family:arial,sans-serif;font-size:13px">
              <br style="font-family:arial,sans-serif;font-size:13px">
              <span style="font-family:arial,sans-serif;font-size:13px">In
                this case, %ptr is a generic address space pointer
                (assuming an address space mapping where 0 is generic). 
                But if an analysis can prove that the pointer %ptr was
                originally addrspacecast'd from a specific address space
                (or some other mechanism through which the pointer's
                specific address space can be determined), it may be
                beneficial to explicitly convert the IR to something
                like:</span><br
                style="font-family:arial,sans-serif;font-size:13px">
              <br style="font-family:arial,sans-serif;font-size:13px">
              <span style="font-family:arial,sans-serif;font-size:13px"> 
                %ptr = ...</span><br
                style="font-family:arial,sans-serif;font-size:13px">
              <span style="font-family:arial,sans-serif;font-size:13px"> 
                %ptr.0 = addrspacecast i32* %ptr to i32 addrspace(3)*</span><br
                style="font-family:arial,sans-serif;font-size:13px">
              <span style="font-family:arial,sans-serif;font-size:13px"> 
                %val = load i32 addrspace(3)* %ptr.0</span><br
                style="font-family:arial,sans-serif;font-size:13px">
              <br style="font-family:arial,sans-serif;font-size:13px">
              <span style="font-family:arial,sans-serif;font-size:13px">Such
                a translation may generate better code for some targets.</span></div>
          </div>
        </div>
      </div>
    </blockquote>
    Just a note of caution: for some of us, address spaces are
    semantically important (i.e., introducing a cast from one to
    another would be incorrect).  I have no problem with the mechanism
    you're describing being implemented, but it needs to be an opt-in
    feature.<br>
    <br>
    <blockquote
cite="mid:CAMROOrF-M6i37_M4o0Anx-+4gf+d6pynEnRqHqiEyFPG_hMxcQ@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_quote">
          <div dir="ltr">
            <div><br>
            </div>
            <div>There are two major design decisions we need to make: </div>
            <div><br>
            </div>
            <div>1. Where does this pass live? Target-independent or
              target-dependent?</div>
            <div><br>
            </div>
            <div>Both the NVPTX and R600 backends want this optimization,
              which seems a good justification for making this
              optimization target-independent. </div>
            <div><br>
            </div>
            <div>However, we have three concerns about this:</div>
            <div>a) I doubt this optimization is valid for all targets,
              because the LLVM language reference (<a moz-do-not-send="true"
href="http://llvm.org/docs/LangRef.html#addrspacecast-to-instruction"
                target="_blank">http://llvm.org/docs/LangRef.html#addrspacecast-to-instruction</a>)
              says addrspacecast "can be a no-op cast or a complex value
              modification, depending on the target and the address
              space pair." </div>
            <div>b) NVPTX and R600 have different address numbering for
              the generic address space, which makes things more
              complicated. </div>
            <div>c) We don't have a good understanding of the R600
              backend. </div>
            <div><br>
            </div>
            <div>
              Therefore, I would vote for making this optimization
              NVPTX-specific for now. If other targets need this, we can
              later think about how to reuse the code. <br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    No opinion, but if it is target-independent, it needs to be behind
    an opt-in target hook.  <br>
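    <br>
    As a rough illustration of what "behind an opt-in target hook" could
    look like, here is a toy Python model (not LLVM code). The classes, the
    hook name <tt>allows_addrspace_rewrite</tt>, and the pass name
    <tt>toy-addrspace-rewrite</tt> are all made up for this sketch; the only
    point is that the hook defaults to off, so a target gets the pass only
    if it asks for it.<br>
    <pre>
```python
class Target:
    """Hypothetical target description; the hook is off by default."""

    def allows_addrspace_rewrite(self):
        return False


class ToyGPUTarget(Target):
    """Hypothetical target that declares the rewrite safe for itself."""

    def allows_addrspace_rewrite(self):
        return True


def build_pipeline(target):
    """Return the list of pass names to run for `target`."""
    passes = ["toy-cleanup"]  # placeholder for passes every target runs
    if target.allows_addrspace_rewrite():
        # Only targets that opted in ever see the address space rewrite.
        passes.append("toy-addrspace-rewrite")
    return passes


default_passes = build_pipeline(Target())
gpu_passes = build_pipeline(ToyGPUTarget())
```
    </pre>
    With this shape, a target for which address spaces are semantically
    important needs no changes at all: it never overrides the hook, so the
    rewrite never runs on its IR.<br>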
    <blockquote
cite="mid:CAMROOrF-M6i37_M4o0Anx-+4gf+d6pynEnRqHqiEyFPG_hMxcQ@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_quote">
          <div dir="ltr">
            <div><br>
            </div>
            <div>2. How effective do we want this optimization to be? </div>
            <div><br>
            </div>
            <div>In the short term, I want it to be able to eliminate
              unnecessary non-generic-to-generic addrspacecasts the
              front-end generates for the NVPTX target. For example, <br>
            </div>
            <div><br>
            </div>
            <div>%p1 = addrspacecast i32 addrspace(3)* %p0 to i32*</div>
            <div>%v = load i32* %p1</div>
            <div><br>
            </div>
            <div>=></div>
            <div><br>
            </div>
            <div>%v = load i32 addrspace(3)* %p0</div>
            <div><br>
            </div>
            <div>We want similar optimization for store+addrspacecast
              and gep+addrspacecast as well. </div>
            <div><br>
            </div>
            <div>In the long term, we could certainly improve this
              optimization to handle more instructions and more
              patterns. <br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    Just to note, this last bit raises far fewer worries for me about
    the correctness of my work.  If you're loading from a pointer which was
    in a different address space, it seems very logical to combine that
    with the load.  We'd also never generate code like that.  :)<br>
    <br>
    To restate my concern in general terms, it's the introduction of
    *new* casts that worries me, not the exploitation/optimization of
    existing ones.  <br>
    <br>
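    To make the distinction concrete, here is a toy Python model of the
    short-term transformation (not LLVM code). The <tt>Cast</tt> and
    <tt>Load</tt> classes and <tt>fold_existing_casts</tt> are made-up names,
    and address space 0 is assumed to be generic, as in the quoted example.
    The sketch only folds casts that are already in the input into later
    loads; it never creates a cast of its own.<br>
    <pre>
```python
from dataclasses import dataclass

GENERIC = 0  # assumption: address space 0 is the generic one


@dataclass(frozen=True)
class Cast:
    """Toy model of: %dst = addrspacecast %src (src_space to dst_space)."""
    dst: str
    src: str
    src_space: int
    dst_space: int


@dataclass(frozen=True)
class Load:
    """Toy model of: %dst = load through %ptr in address space `space`."""
    dst: str
    ptr: str
    space: int


def fold_existing_casts(insts):
    """Fold specific-to-generic casts that already exist into later loads.

    The result never contains a cast that was not in the input.
    """
    casts = {}  # cast result name: the Cast that defines it
    out = []
    for inst in insts:
        if isinstance(inst, Cast):
            if inst.src_space != GENERIC and inst.dst_space == GENERIC:
                casts[inst.dst] = inst
            out.append(inst)
        elif isinstance(inst, Load) and inst.ptr in casts:
            cast = casts[inst.ptr]
            # Load straight from the original, specific-space pointer.
            out.append(Load(inst.dst, cast.src, cast.src_space))
        else:
            out.append(inst)
    # Drop folded casts whose result no load uses any more.
    used = {i.ptr for i in out if isinstance(i, Load)}
    return [i for i in out
            if not (isinstance(i, Cast) and i.dst in casts
                    and i.dst not in used)]


before = [Cast("p1", "p0", 3, GENERIC), Load("v", "p1", GENERIC)]
after = fold_existing_casts(before)
```
    </pre>
    Here <tt>after</tt> holds a single load from <tt>p0</tt> in address
    space 3, which mirrors the load+addrspacecast example in the quoted
    text. A program with no casts comes back unchanged.<br>
    <br>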
    <font color="#888888">Philip</font><br>
  </body>
</html>