<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <font face="Calibri">Any llvm.ptx intrinsic should have a matching

      llvm.nvvm intrinsic. If not, then its a bug.</font><br>

    <br>

    <div class="moz-cite-prefix">On 7/6/2016 1:49 PM, Justin Lebar

      wrote:<br>

    </div>

    <blockquote

cite="mid:CAMuNMfrz7+C718_wadwJ=LsG+d=jbKah=PBJOztX0i-n-ku=mw@mail.gmail.com"

      type="cite">

      <blockquote type="cite">

        <pre wrap="">Subject: [PATCH 1/2] NVPTX: Rename __builtin_ptx_shfl -> __nvvm_shfl

</pre>

      </blockquote>

      <pre wrap="">

Please run clang-format over the changes to __clang_cuda_intrinsics.h,

if you wouldn't mind.

</pre>

      <blockquote type="cite">

        <pre wrap="">Subject: [PATCH 3/4] NVPTX: Remove the legacy ptx intrinsics

</pre>

      </blockquote>

      <pre wrap="">

Looks like we're losing a bunch of intrinsics here.  Some of them like

reading tid.w I don't care about (does that even work??).  But most of

the others seem to be providing real functionality that afaict isn't

available elsewhere.  Why are we removing them?

On Wed, Jul 6, 2016 at 4:32 AM, Justin Holewinski

<a class="moz-txt-link-rfc2396E" href="mailto:jholewinski@nvidia.com"><jholewinski@nvidia.com></a> wrote:

</pre>

      <blockquote type="cite">

        <pre wrap="">Yeah, this is probably better handled on llvm-commits. Swapping mailing

lists. Any particular reason these aren't up on Phabricator?

Overall, the changes look good. A few comments:

--- a/lib/Target/NVPTX/NVPTXInstrInfo.cpp

+++ b/lib/Target/NVPTX/NVPTXInstrInfo.cpp

@@ -112,8 +112,6 @@ bool NVPTXInstrInfo::isStoreInstr(const MachineInstr

&MI,

 bool NVPTXInstrInfo::CanTailMerge(const MachineInstr *MI) const {

   unsigned addrspace = 0;

-  if (MI->getOpcode() == NVPTX::INT_CUDA_SYNCTHREADS)

-    return false;

   if (isLoadInstr(*MI, addrspace))

     if (addrspace == NVPTX::PTXLdStInstCode::SHARED)

This should be replaced with the bar0 intrinsic. Looks like a bug that bar0

is missing anyway.

--- a/include/clang/Basic/BuiltinsNVPTX.def

+++ b/include/clang/Basic/BuiltinsNVPTX.def

@@ -15,50 +15,21 @@

 // The format of this database matches clang/Basic/Builtins.def.

 // Builtins retained from previous PTX back-end

I would just remove the comment about being retained from the previous

back-end. The intent is to remove all legacy compatibility.

--- a/test/Analysis/DivergenceAnalysis/NVPTX/diverge.ll

+++ b/test/Analysis/DivergenceAnalysis/NVPTX/diverge.ll

@@ -93,20 +93,20 @@ merge:

 ; int i = 0;

 ; do {

 ;   i++;                  // i here is uniform

-; } while (i < laneid);

+; } while (i < tid);

 ; return i == 10 ? 0 : 1; // i here is divergent

 ;

 ; The i defined in the loop is used outside.

 define i32 @loop() {

 ; CHECK-LABEL: Printing analysis 'Divergence Analysis' for function 'loop'

 entry:

-  %laneid = call i32 @llvm.ptx.read.laneid()

+  %tid = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()

   br label %loop

 loop:

   %i = phi i32 [ 0, %entry ], [ %i1, %loop ]

 ; CHECK-NOT: DIVERGENT: %i =

   %i1 = add i32 %i, 1

-  %exit_cond = icmp sge i32 %i1, %laneid

+  %exit_cond = icmp sge i32 %i1, %tid

   br i1 %exit_cond, label %loop_exit, label %loop

 loop_exit:

   %cond = icmp eq i32 %i, 10

What is the intent of the change from laneid to tid?

On 7/5/2016 5:56 PM, Justin Bogner wrote:

</pre>

        <blockquote type="cite">

          <pre wrap="">

Justin Lebar <a class="moz-txt-link-rfc2396E" href="mailto:jlebar@google.com"><jlebar@google.com></a> writes:

</pre>

          <blockquote type="cite">

            <blockquote type="cite">

              <pre wrap="">

@jlebar, any issues on your end with removing all of the non-nvvm

intrinsics?

</pre>

            </blockquote>

            <pre wrap="">

I don't think so, if the functionality is all there from the nvvm

intrinsics.  Some if it may involve clang changes, which I'm not

exactly volunteering to do, but they should be simple.

</pre>

          </blockquote>

          <pre wrap="">

Okay, so if I've understood you both correctly, the attached six patches

should do the trick, for both LLVM and clang. First, they rename the

shfl intrinsics from __builtin_ptx_shfl to __nvvm_shfl. Second,

they replace uses of cuda.syncthreads with nvvm.barrier0. Third, all of

the ptx intrinsics and builtins are removed. Finally, I wrap the nvvm

intrinsics with a TargetPrefix.

Let me know if you want me to break any of these out into specific

review threads on -commits.

</pre>

        </blockquote>

        <pre wrap="">

-----------------------------------------------------------------------------------

This email message is for the sole use of the intended recipient(s) and may

contain

confidential information.  Any unauthorized review, use, disclosure or

distribution

is prohibited.  If you are not the intended recipient, please contact the

sender by

reply email and destroy all copies of the original message.

-----------------------------------------------------------------------------------

</pre>

      </blockquote>

    </blockquote>

    <br>

  </body>

</html>