Missing TargetPrefix for NVVM intrinsics

Wed Jul 6 04:32:27 PDT 2016

Yeah, this is probably better handled on llvm-commits. Swapping mailing 
lists. Any particular reason these aren't up on Phabricator?

Overall, the changes look good. A few comments:

--- a/lib/Target/NVPTX/NVPTXInstrInfo.cpp
+++ b/lib/Target/NVPTX/NVPTXInstrInfo.cpp
@@ -112,8 +112,6 @@ bool NVPTXInstrInfo::isStoreInstr(const MachineInstr 
&MI,

  bool NVPTXInstrInfo::CanTailMerge(const MachineInstr *MI) const {
    unsigned addrspace = 0;
-  if (MI->getOpcode() == NVPTX::INT_CUDA_SYNCTHREADS)
-    return false;
    if (isLoadInstr(*MI, addrspace))
      if (addrspace == NVPTX::PTXLdStInstCode::SHARED)

This should be replaced with the bar0 intrinsic. Looks like a bug that 
bar0 is missing anyway.

--- a/include/clang/Basic/BuiltinsNVPTX.def
+++ b/include/clang/Basic/BuiltinsNVPTX.def
@@ -15,50 +15,21 @@
  // The format of this database matches clang/Basic/Builtins.def.

  // Builtins retained from previous PTX back-end

I would just remove the comment about being retained from the previous 
back-end. The intent is to remove all legacy compatibility.

--- a/test/Analysis/DivergenceAnalysis/NVPTX/diverge.ll
+++ b/test/Analysis/DivergenceAnalysis/NVPTX/diverge.ll
@@ -93,20 +93,20 @@ merge:
  ; int i = 0;
  ; do {
  ;   i++;                  // i here is uniform
-; } while (i < laneid);
+; } while (i < tid);
  ; return i == 10 ? 0 : 1; // i here is divergent
  ;
  ; The i defined in the loop is used outside.
  define i32 @loop() {
  ; CHECK-LABEL: Printing analysis 'Divergence Analysis' for function 'loop'
  entry:
-  %laneid = call i32 @llvm.ptx.read.laneid()
+  %tid = call i32 @llvm.nvvm.read.ptx.sreg.tid.x()
    br label %loop
  loop:
    %i = phi i32 [ 0, %entry ], [ %i1, %loop ]
  ; CHECK-NOT: DIVERGENT: %i =
    %i1 = add i32 %i, 1
-  %exit_cond = icmp sge i32 %i1, %laneid
+  %exit_cond = icmp sge i32 %i1, %tid
    br i1 %exit_cond, label %loop_exit, label %loop
  loop_exit:
    %cond = icmp eq i32 %i, 10

What is the intent of the change from laneid to tid?



On 7/5/2016 5:56 PM, Justin Bogner wrote:
> Justin Lebar <jlebar at google.com> writes:
>>> @jlebar, any issues on your end with removing all of the non-nvvm intrinsics?
>> I don't think so, if the functionality is all there from the nvvm
>> intrinsics.  Some if it may involve clang changes, which I'm not
>> exactly volunteering to do, but they should be simple.
> Okay, so if I've understood you both correctly, the attached six patches
> should do the trick, for both LLVM and clang. First, they rename the
> shfl intrinsics from __builtin_ptx_shfl to __nvvm_shfl. Second,
> they replace uses of cuda.syncthreads with nvvm.barrier0. Third, all of
> the ptx intrinsics and builtins are removed. Finally, I wrap the nvvm
> intrinsics with a TargetPrefix.
>
> Let me know if you want me to break any of these out into specific
> review threads on -commits.
>


-----------------------------------------------------------------------------------
This email message is for the sole use of the intended recipient(s) and may contain
confidential information.  Any unauthorized review, use, disclosure or distribution
is prohibited.  If you are not the intended recipient, please contact the sender by
reply email and destroy all copies of the original message.
-----------------------------------------------------------------------------------