<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <br>

    <div class="moz-cite-prefix">On 03/24/2014 03:08 PM, Tom Stellard

      wrote:<br>

    </div>

    <blockquote cite="mid:20140324190819.GG15147@freedesktop.org"

      type="cite">

      <pre wrap="">On Mon, Mar 24, 2014 at 03:01:14PM -0400, Justin Holewinski wrote:

</pre>

      <blockquote type="cite">

        <pre wrap="">On 03/24/2014 02:53 PM, Tom Stellard wrote:

</pre>

        <blockquote type="cite">

          <pre wrap="">On Mon, Mar 24, 2014 at 02:46:06PM -0400, Justin Holewinski wrote:

</pre>

          <blockquote type="cite">

            <pre wrap="">I don't have anything against making this a target-independent

IR-level, as long as no-one complains about it being a core pass.

Perhaps the pass could only execute if a target explicitly enables a

flag.  Something like "preferNonGenericPointers".  The default could

be 'false', and the pass would only modify the IR if the target sets

it to 'true'.  Of course, this also assumes address space 0 is

generic.  This is currently true for the in-tree targets and

CUDA/OpenCL support in Clang, but I don't believe it's a set rule

anywhere.

</pre>

          </blockquote>

          <pre wrap="">What do mean by 'generic' address space here?

</pre>

        </blockquote>

        <pre wrap="">

In this context, 'generic' means an address space that encompasses

all other address spaces.  This is roughly equivalent to the OpenCL

2.0 generic address space.  Is addrspace(0) not a generic address

space for R600?

</pre>

      </blockquote>

      <pre wrap="">

No, it's not generic.  It's what we use for OpenCL's private address space.

Also, I missed the beginning of this discussion, which target-independent pass

does this impact?</pre>

    </blockquote>

    <br>

    None yet :)<br>

    <br>

    The question is whether a pass that converts generic address space

    usage to non-generic address space usage should be

    target-independent, or specific to a particular back-end.  An

    example would be an IR sequence like the following:<br>

    <br>

      %ptr = ...<br>

      %val = load i32* %ptr<br>

    <br>

    In this case, %ptr is a generic address space pointer (assuming an

    address space mapping where 0 is generic).  But if an analysis can

    prove that %ptr was originally addrspacecast'd from a specific

    address space (or can determine the pointer's specific address

    space by some other mechanism), it may be beneficial to explicitly

    convert the IR to something like:<br>

    <br>

      %ptr = ...<br>

      %ptr.0 = addrspacecast i32* %ptr to i32 addrspace(3)*<br>

      %val = load i32 addrspace(3)* %ptr.0<br>

    <br>

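    Such a translation may generate better code for some targets, since loads and stores through a specific address space can avoid the extra indirection that a generic access may require.<br>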
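    <br>

    As a rough illustration of the analysis step, here is a toy Python model (not LLVM code). The dict-based instruction format and the name specialize_loads are invented for this sketch, and it assumes that address space 0 is generic and that the pointer is defined directly by an addrspacecast:<br>

    <pre wrap="">
```python
GENERIC = 0  # assumption: address space 0 is the generic one


def specialize_loads(insts):
    """Rewrite generic loads whose pointer provably came from a specific space.

    Each instruction is a plain dict; this is a toy model, not the LLVM API.
    """
    origin = {}  # pointer name: specific address space it was cast from
    out = []
    for inst in insts:
        if (inst["op"] == "addrspacecast"
                and inst["to"] == GENERIC and inst["from"] != GENERIC):
            # Remember which specific space this generic pointer came from.
            origin[inst["dst"]] = inst["from"]
            out.append(inst)
        elif inst["op"] == "load" and inst["ptr"] in origin:
            space = origin[inst["ptr"]]
            cast = inst["ptr"] + ".0"
            # Cast back to the specific space, then load through that pointer.
            out.append({"op": "addrspacecast", "dst": cast, "src": inst["ptr"],
                        "from": GENERIC, "to": space})
            out.append({"op": "load", "dst": inst["dst"], "ptr": cast,
                        "space": space})
        else:
            # Nothing is known about this pointer, so leave the instruction alone.
            out.append(inst)
    return out


example = [
    {"op": "addrspacecast", "dst": "%ptr", "src": "%shared",
     "from": 3, "to": GENERIC},
    {"op": "load", "dst": "%val", "ptr": "%ptr", "space": GENERIC},
]
result = specialize_loads(example)
```
</pre>

    A real pass would also have to look through GEPs, PHIs, and so on, and would have to leave any pointer it cannot prove anything about in the generic address space.<br>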

    <br>

    <blockquote cite="mid:20140324190819.GG15147@freedesktop.org"

      type="cite">

      <pre wrap="">

-Tom

</pre>

      <blockquote type="cite">

        <blockquote type="cite">

          <pre wrap="">

-Tom

</pre>

          <blockquote type="cite">

            <pre wrap="">On 03/24/2014 02:28 PM, Jingyue Wu wrote:

</pre>

            <blockquote type="cite">

              <pre wrap="">I agree with your concern. However, both CUDA and OpenCL (two most

popular users of addrspacecast I believe) support generic address

space, and could benefit from this optimization Would we end up

with duplicated code (at least one for CUDA one for opencl) if we

put it in the back-end?

Jingyue

On Mon, Mar 24, 2014 at 11:22 AM, Justin Holewinski

<<a class="moz-txt-link-abbreviated" href="mailto:jholewinski@nvidia.com">jholewinski@nvidia.com</a> <a class="moz-txt-link-rfc2396E" href="mailto:jholewinski@nvidia.com"><mailto:jholewinski@nvidia.com></a>> wrote:

   The hard part would be making this optimization general enough to

   be target-independent.  Optimizing to non-zero address spaces may

   not make sense for all targets (or even all future versions of

   PTX).  I agree that there should be an IR-level optimization for

   this, but perhaps it's too target-specific and should actually live

   in the back-end.

   On 03/24/2014 01:05 PM, Jingyue Wu wrote:

</pre>

              <blockquote type="cite">

                <pre wrap="">   Right. We are aware of this issue, and think it should be

   addressed in the IR optimizer (similar to InstCombineLoadCast and

   InstCombineStoreToCast) instead of clang. Do you think this is an

   appropriate approach? Is this optimization general enough to stay

   in the IR optimizer, or is it target-dependent?

   Jingyue

   On Mon, Mar 24, 2014 at 4:54 AM, Justin Holewinski

   <<a class="moz-txt-link-abbreviated" href="mailto:justin.holewinski@gmail.com">justin.holewinski@gmail.com</a>

   <a class="moz-txt-link-rfc2396E" href="mailto:justin.holewinski@gmail.com"><mailto:justin.holewinski@gmail.com></a>> wrote:

       Hi Jingyue,

       I committed the addrspacecast isel patterns to NVPTX.  Also,

       I wanted to point out that your changes in the last test case

       in this patch (address-spaces.cu)

       represent changes that may lead to performance degradation.

        Specific address spaces should be used whenever possible for

       loads/stores.  Casting everything to a generic address is

       still correct, but may lead to additional indirections for

       the hardware.

       On Fri, Mar 21, 2014 at 2:25 PM, Justin Holewinski

       <<a class="moz-txt-link-abbreviated" href="mailto:jholewinski@nvidia.com">jholewinski@nvidia.com</a> <a class="moz-txt-link-rfc2396E" href="mailto:jholewinski@nvidia.com"><mailto:jholewinski@nvidia.com></a>> wrote:

           addrspacecast support in NVPTX is on my todo list.  I'll

           try to put something together in the next few days.

           On 3/21/14, 2:20 PM, Jingyue Wu wrote:

</pre>

                <blockquote type="cite">

                  <pre wrap="">           Hi,

           Static local variables in CUDA can be declared with

           address space qualifiers, such as __shared__. Therefore,

           the codegen needs to potentially addrspacecast a static

           local variable to the type expected by its declaration.

           Peter did something similar for global variables in

           r157167.

           All clang tests passed.

           Justin: The NVPTX backend support for addrspacecast

           seems incomplete. We can send you follow-up patches

           once this one gets in.

           Jingyue

</pre>

                </blockquote>

                <pre wrap="">

           --

           Thanks,

           Justin Holewinski


       --

       Thanks,

       Justin Holewinski

</pre>

              </blockquote>

              <pre wrap="">

</pre>

            </blockquote>

            <pre wrap="">_______________________________________________

cfe-commits mailing list

<a class="moz-txt-link-abbreviated" href="mailto:cfe-commits@cs.uiuc.edu">cfe-commits@cs.uiuc.edu</a>

<a class="moz-txt-link-freetext" href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits</a>

</pre>

          </blockquote>

        </blockquote>

        <pre wrap="">

</pre>

      </blockquote>

    </blockquote>

    <br>

  </body>

</html>