<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <br>
    <div class="moz-cite-prefix">On 03/26/2014 01:33 PM, Jingyue Wu
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAMROOrFfEY9E9B=zMi176f8AkZ3y+A5Uwt2uODt_-D68_CwD9Q@mail.gmail.com"
      type="cite">
      <div dir="ltr"><br>
        <div class="gmail_extra"><br>
          <br>
          <div class="gmail_quote">On Tue, Mar 25, 2014 at 5:32 PM, Matt
            Arsenault <span dir="ltr"><<a moz-do-not-send="true"
                href="mailto:Matthew.Arsenault@amd.com" target="_blank">Matthew.Arsenault@amd.com</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF">
                <div>
                  <div>On 03/25/2014 05:07 PM, Jingyue Wu wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr"><br>
                      <div class="gmail_extra"><br>
                        <br>
                        <div class="gmail_quote">On Tue, Mar 25, 2014 at
                          3:21 PM, Matt Arsenault <span dir="ltr"><<a
                              moz-do-not-send="true"
                              href="mailto:Matthew.Arsenault@amd.com"
                              target="_blank">Matthew.Arsenault@amd.com</a>></span>
                          wrote:<br>
                          <blockquote class="gmail_quote"
                            style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                            <div text="#000000" bgcolor="#FFFFFF">
                              <div>
                                <div>On 03/25/2014 02:31 PM, Jingyue Wu
                                  wrote:<br>
                                </div>
                                <blockquote type="cite">
                                  <div dir="ltr">
                                    <div class="gmail_quote">
                                      <div dir="ltr"><br>
                                        <div>However, we have three
                                          concerns on this:</div>
                                        <div>a) I doubt this
                                          optimization is valid for all
                                          targets, because LLVM language
                                          reference (<a
                                            moz-do-not-send="true"
                                            href="http://llvm.org/docs/LangRef.html#addrspacecast-to-instruction"
                                            target="_blank">http://llvm.org/docs/LangRef.html#addrspacecast-to-instruction</a>)
                                          says addrspacecast "can be a
                                          no-op cast or a complex value
                                          modification, depending on the
                                          target and the address space
                                          pair." <br>
                                        </div>
                                      </div>
                                    </div>
                                  </div>
                                </blockquote>
                              </div>
                              I think most of the simple cast
                              optimizations would be acceptable. The
                              addrspacecasted pointer still needs to
                              point to the same memory location, so
                              changing an access to use a different
                              address space would be OK. I think
                              canonicalizing accesses to use the
                              original address space of a casted pointer
                              when possible would make sense.</div>
                          </blockquote>
                          <div><br>
                          </div>
                          "the address space conversion is legal then
                          both result and operand refer to the same
                          memory location". I don't quite understand
                          this sentence. Does the same memory location
                          mean the same numeric value?</div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                No, it means they could both have different values that
                point to the same physical location. Storing to a
                pointer in one address space should have the same effect
                as storing to the addrspacecasted pointer, though it
                might not use the same value or instructions to do so.
                <div><br>
                </div>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>That makes sense. Thanks! </div>
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF">
                <div> <br>
                  <br>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_extra">
                        <div class="gmail_quote">
                          <div>  </div>
                          <blockquote class="gmail_quote"
                            style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                            <div text="#000000" bgcolor="#FFFFFF">
                              <div> <br>
                                <br>
                                <blockquote type="cite">
                                  <div dir="ltr">
                                    <div class="gmail_quote">
                                      <div dir="ltr">
                                        <div>b) NVPTX and R600 have
                                          different address numbering
                                          for the generic address space,
                                          which makes things more
                                          complicated. </div>
                                        <div>c) We don't have a good
                                          understanding of the R600
                                          backend. </div>
                                        <br>
                                      </div>
                                    </div>
                                  </div>
                                </blockquote>
                                <br>
                              </div>
                              R600 currently does not support the flat
                              address space instructions intended to use
                              for the generic address space. I posted a
                              patch a while ago that half added it,
                              which I can try to work on finishing if it
                              would help.<br>
                              <br>
                              I also do not understand how NVPTX uses
                              address spaces, particularly how it can
                              use 0 as the the generic address space.</div>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>NVPTX backend generates ld.f32 for
                            reading from the generic address space.
                            There's no special machine instruction to
                            read/write from/to the generic address space
                            in R600? <br>
                          </div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                New hardware does have flat address space instructions,
                which is what my patch adds support for. They're just
                not defined in the target yet. This flat address space
                is separate different from 0 / the default. I think of
                addrspace(0) as the address space of allocas, so I don't
                understand how that can be consolidated with generic
                accesses of the other address spaces. Does NVPTX not
                differentiate between accesses of a generic pointer and
                private / alloca'd memory?
                <div><br>
                </div>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>See Justin's followup. Looks like this optimization can
              benefit local accesses as well. <br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    This optimization can benefit all address spaces.  Imagine you have
    a library function that takes pointer arguments in the generic
    address space:<br>
    <br>
    <tt>__device__ float foo(float *a, float *b) {</tt><tt><br>
    </tt><tt>  return *a + *b;</tt><tt><br>
    </tt><tt>}</tt><br>
    <br>
    Now let's say this function is called twice in a kernel function:<br>
    <tt><br>
    </tt><tt>__global__ void kern(float *a, float *b, float *dst) {</tt><tt><br>
    </tt><tt>  __shared__ float buffer[32];</tt><tt><br>
    </tt><tt><br>
    </tt><tt>  ...</tt><tt><br>
    </tt><tt><br>
    </tt><tt>  *dst = foo(a, b) + foo(&buffer[0], &buffer[1]);</tt><tt><br>
    </tt><tt>}</tt><br>
    <br>
    Assuming 'foo' is inlined, you could convert the loads in the first
    call to 'ld.global' and the loads from the second call to
    'ld.shared'.<br>
    <br>
    <blockquote
cite="mid:CAMROOrFfEY9E9B=zMi176f8AkZ3y+A5Uwt2uODt_-D68_CwD9Q@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div text="#000000" bgcolor="#FFFFFF">
                <div> <br>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_extra">
                        <div class="gmail_quote">
                          <div> </div>
                          <blockquote class="gmail_quote"
                            style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                            <div text="#000000" bgcolor="#FFFFFF">
                              <div> <br>
                                <br>
                                <blockquote type="cite">
                                  <div dir="ltr">
                                    <div class="gmail_quote">
                                      <div dir="ltr">
                                        <div>2. How effective do we want
                                          this optimization to be? </div>
                                        <div><br>
                                        </div>
                                        <div>In the short term, I want
                                          it to be able to eliminate
                                          unnecessary
                                          non-generic-to-generic
                                          addrspacecasts the front-end
                                          generates for the NVPTX
                                          target. For example, <br>
                                        </div>
                                        <div><br>
                                        </div>
                                        <div>%p1 = addrspace i32
                                          addrspace(3)* %p0 to i32*</div>
                                        <div>%v = load i32* %p1</div>
                                        <div><br>
                                        </div>
                                        <div>=></div>
                                        <div><br>
                                        </div>
                                        <div>%v = load i32 addrspace(3)*
                                          %p0</div>
                                        <div><br>
                                        </div>
                                        <div>We want similar
                                          optimization for
                                          store+addrspacecast and
                                          gep+addrspacecast as well. </div>
                                        <div><br>
                                        </div>
                                        <div>In a long term, we could
                                          for sure improve this
                                          optimization to handle more
                                          instructions and more
                                          patterns. </div>
                                        <span></span><br>
                                      </div>
                                    </div>
                                  </div>
                                </blockquote>
                              </div>
                              I believe most of the cast simplifications
                              that apply to bitcasts of pointers also
                              apply to addrspacecast. I have some
                              patches waiting that extend some of the
                              more basic ones to understand
                              addrspacecast (e.g. <a
                                moz-do-not-send="true"
href="http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140120/202296.html"
                                target="_blank">http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140120/202296.html</a>),


                              plus a few more that I haven't posted yet.
                              Mostly they are little cast
                              simplifications like your example in
                              instcombine, but also SROA to eliminate
                              allocas that are addrspacecasted.</div>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>We also think InstCombine is a good place
                            to put this optimization, if we decide to go
                            with target-independent. Looking forward to
                            your patches! </div>
                          <br>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                  <br>
                </div>
                I think that strategy only gets you part of the way to
                ideal. For example, preferring to use the original
                address space works well for accesses to objects where
                you start with the known address space. You could also
                have a function with a generic address space argument
                casted back to a specific address space. Preferring the
                original address space in that case is the opposite of
                what you want, although I expect this case will end up
                being much less common in real code and will tend to go
                away after inlining.<br>
                <br>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>You're right. I overlooked this case. I doubt a CUDA
              program would even uses generic-to-non-generic casts,
              because non-generic address space qualifiers only qualify
              declarations. </div>
            <div><br>
            </div>
            <div>The backend can indicate which address spaces it
              prefers using some flags (e.g., preferNonGenericPointers
              as Justin suggested). InstCombine can then look at these
              flags to decide what to do. <br>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    The optimization should also treat the absence of target information
    as a flag to just not run.<br>
    <br>
    <br>
    <br>
  
<DIV>
<HR>
</DIV>
<DIV>This email message is for the sole use of the intended recipient(s) and may 
contain confidential information.  Any unauthorized review, use, disclosure 
or distribution is prohibited.  If you are not the intended recipient, 
please contact the sender by reply email and destroy all copies of the original 
message. </DIV>
<DIV>
<HR>
</DIV>
</body>
</html>