[llvm] 1ee4d88 - NVPTX: Lower unreachable to exit to allow ptxas to accurately reconstruct the CFG.

Artem Belevich via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 21 11:41:31 PDT 2023


Author: Tim Besard
Date: 2023-06-21T11:40:31-07:00
New Revision: 1ee4d880e8760256c606fe55b7af85a4f70d006d

URL: https://github.com/llvm/llvm-project/commit/1ee4d880e8760256c606fe55b7af85a4f70d006d
DIFF: https://github.com/llvm/llvm-project/commit/1ee4d880e8760256c606fe55b7af85a4f70d006d.diff

LOG: NVPTX: Lower unreachable to exit to allow ptxas to accurately reconstruct the CFG.

PTX does not have a notion of `unreachable`, which results in emitted basic
blocks having an edge to the next block:

```
block1:
  call @does_not_return();
  // unreachable
block2:
  // ptxas will create a CFG edge from block1 to block2
```

This may result in significant changes to the control flow graph, e.g., when
LLVM moves unreachable blocks to the end of the function. That's a problem
in the context of divergent control flow, as `ptxas` uses the CFG to determine
divergent regions, and some instructions must not be executed divergently.

For example, `bar.sync` is not allowed to be executed divergently on Pascal
or earlier. If we start with the following:

```
entry:
  // start of divergent region
  @%p0 bra cont;
  @%p1 bra unlikely;
  ...
  bra.uni cont;
unlikely:
  ...
  // unreachable
cont:
  // end of divergent region
  bar.sync 0;
  bra.uni exit;
exit:
  ret;
```

it is transformed by the branch-folder and block-placement passes to:

```
entry:
  // start of divergent region
  @%p0 bra cont;
  @%p1 bra unlikely;
  ...
  bra.uni cont;
cont:
  bar.sync 0;
  bra.uni exit;
unlikely:
  ...
  // unreachable
exit:
  // end of divergent region
  ret;
```

After moving the `unlikely` block to the end of the function, it has an edge
to the `exit` block, which widens the divergent region and makes the `bar.sync`
instruction happen divergently. That causes wrong computations, as we've been
running into for years with Julia code (which emits a lot of `trap` +
`unreachable` code all over the place).

To work around this, add an `exit` instruction before every `unreachable`,
as `ptxas` understands that exit terminates the CFG. Note that `trap` is not
equivalent, and only future versions of `ptxas` will model it like `exit`.
Another alternative would be to emit a branch to the block itself, but emitting
`exit` seems like a cleaner solution to represent `unreachable` to me.
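
To make the fall-through problem concrete, here is a minimal sketch of how a
consumer of the PTX text could rebuild CFG edges from the linear block layout.
This is a toy model and not how `ptxas` is actually implemented; the names
`Term`, `Block`, `successors`, `hasEdge` and `placedLayout` are hypothetical and
exist only for this illustration. It encodes the layout after block placement
from the example above and lets the end of the `unlikely` block vary:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// How a block ends in the emitted text (toy model, hypothetical names).
enum class Term { None, UniBranch, Exit, Ret };

struct Block {
  std::string name;
  std::vector<std::string> targets; // every branch target named in the block
  Term term;                        // what the last instruction of the block is
};

// Rebuild the successors of block `i` from the linear listing alone: explicit
// branch targets, plus a fall-through edge to the next block in layout order
// when the block does not end in a terminator.
std::vector<std::string> successors(const std::vector<Block> &blocks,
                                    std::size_t i) {
  assert(i < blocks.size());
  std::vector<std::string> out = blocks[i].targets;
  if (blocks[i].term == Term::None && i + 1 < blocks.size())
    out.push_back(blocks[i + 1].name);
  return out;
}

// True if the reconstructed CFG has an edge from block `from` to block `to`.
bool hasEdge(const std::vector<Block> &blocks, const std::string &from,
             const std::string &to) {
  for (std::size_t i = 0; i < blocks.size(); ++i) {
    if (blocks[i].name != from)
      continue;
    for (const std::string &s : successors(blocks, i))
      if (s == to)
        return true;
  }
  return false;
}

// Layout after block placement: `unlikely` sits directly before `exit`.
// `unlikelyEnd` selects how the `unlikely` block ends.
std::vector<Block> placedLayout(Term unlikelyEnd) {
  return {
      {"entry", {"cont", "unlikely"}, Term::UniBranch},
      {"cont", {"exit"}, Term::UniBranch},
      {"unlikely", {}, unlikelyEnd},
      {"exit", {}, Term::Ret},
  };
}
```

In this model, `hasEdge(placedLayout(Term::None), "unlikely", "exit")` is true,
which is the spurious edge that widens the divergent region, while
`hasEdge(placedLayout(Term::Exit), "unlikely", "exit")` is false once the block
ends in `exit`.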

Also note that this may not be sufficient: the block with unreachable control
flow may be branched to from different divergent regions, e.g. after block
merging. In that case `ptxas` could still reconstruct a CFG where divergent
regions are merged (I haven't confirmed this, but also haven't encountered
this pattern in the wild yet):

```
entry:
  // start of divergent region 1
  @%p0 bra cont1;
  @%p1 bra unlikely;
  bra.uni cont1;
cont1:
  // intended end of divergent region 1
  bar.sync 0;
  // start of divergent region 2
  @%p2 bra cont2;
  @%p3 bra unlikely;
  bra.uni cont2;
cont2:
  // intended end of divergent region 2
  bra.uni exit;
unlikely:
  ...
  exit;
exit:
  // possible end of merged divergent region?
```

I originally tried to avoid the above by cloning paths towards `unreachable` and
splitting the outgoing edges, but that quickly became too complicated. I propose
we go with the simple solution first, also because modern GPUs with more flexible
hardware thread schedulers don't even suffer from this issue.

Finally, although I expect this to fix most of
https://bugs.llvm.org/show_bug.cgi?id=27738, I do still encounter
miscompilations with Julia's unreachable-heavy code when targeting these
older GPUs using an older `ptxas` version (specifically, from CUDA 11.4 or
below). This is likely due to related bugs in `ptxas` which have been fixed
since, as I have filed several reproducers with NVIDIA over the past couple of
years. I'm not inclined to look into fixing those issues over here, and will
instead recommend that our users upgrade CUDA to 11.5+ when using these GPUs.

Also see:
- https://github.com/JuliaGPU/CUDAnative.jl/issues/4
- https://github.com/JuliaGPU/CUDA.jl/issues/1746
- https://discourse.llvm.org/t/llvm-reordering-blocks-breaks-ptxas-divergence-analysis/71126

Reviewed By: jdoerfert, tra

Differential Revision: https://reviews.llvm.org/D152789

Added: 
    llvm/lib/Target/NVPTX/NVPTXLowerUnreachable.cpp
    llvm/test/CodeGen/NVPTX/unreachable.ll

Modified: 
    llvm/lib/Target/NVPTX/CMakeLists.txt
    llvm/lib/Target/NVPTX/NVPTX.h
    llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Target/NVPTX/CMakeLists.txt b/llvm/lib/Target/NVPTX/CMakeLists.txt
index 6b35c2a0444f2..693365161330f 100644
--- a/llvm/lib/Target/NVPTX/CMakeLists.txt
+++ b/llvm/lib/Target/NVPTX/CMakeLists.txt
@@ -25,6 +25,7 @@ set(NVPTXCodeGen_sources
   NVPTXLowerAggrCopies.cpp
   NVPTXLowerArgs.cpp
   NVPTXLowerAlloca.cpp
+  NVPTXLowerUnreachable.cpp
   NVPTXPeephole.cpp
   NVPTXMCExpr.cpp
   NVPTXPrologEpilogPass.cpp

diff  --git a/llvm/lib/Target/NVPTX/NVPTX.h b/llvm/lib/Target/NVPTX/NVPTX.h
index 9e8a2761010c4..ec32a95dea908 100644
--- a/llvm/lib/Target/NVPTX/NVPTX.h
+++ b/llvm/lib/Target/NVPTX/NVPTX.h
@@ -47,6 +47,7 @@ MachineFunctionPass *createNVPTXReplaceImageHandlesPass();
 FunctionPass *createNVPTXImageOptimizerPass();
 FunctionPass *createNVPTXLowerArgsPass();
 FunctionPass *createNVPTXLowerAllocaPass();
+FunctionPass *createNVPTXLowerUnreachablePass();
 MachineFunctionPass *createNVPTXPeephole();
 MachineFunctionPass *createNVPTXProxyRegErasurePass();
 

diff  --git a/llvm/lib/Target/NVPTX/NVPTXLowerUnreachable.cpp b/llvm/lib/Target/NVPTX/NVPTXLowerUnreachable.cpp
new file mode 100644
index 0000000000000..1d312f82e6c06
--- /dev/null
+++ b/llvm/lib/Target/NVPTX/NVPTXLowerUnreachable.cpp
@@ -0,0 +1,126 @@
+//===-- NVPTXLowerUnreachable.cpp - Lower unreachables to exit =====--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// PTX does not have a notion of `unreachable`, which results in emitted basic
+// blocks having an edge to the next block:
+//
+//   block1:
+//     call @does_not_return();
+//     // unreachable
+//   block2:
+//     // ptxas will create a CFG edge from block1 to block2
+//
+// This may result in significant changes to the control flow graph, e.g., when
+// LLVM moves unreachable blocks to the end of the function. That's a problem
+// in the context of divergent control flow, as `ptxas` uses the CFG to
+// determine divergent regions, and some intructions may not be executed
+// divergently.
+//
+// For example, `bar.sync` is not allowed to be executed divergently on Pascal
+// or earlier. If we start with the following:
+//
+//   entry:
+//     // start of divergent region
+//     @%p0 bra cont;
+//     @%p1 bra unlikely;
+//     ...
+//     bra.uni cont;
+//   unlikely:
+//     ...
+//     // unreachable
+//   cont:
+//     // end of divergent region
+//     bar.sync 0;
+//     bra.uni exit;
+//   exit:
+//     ret;
+//
+// it is transformed by the branch-folder and block-placement passes to:
+//
+//   entry:
+//     // start of divergent region
+//     @%p0 bra cont;
+//     @%p1 bra unlikely;
+//     ...
+//     bra.uni cont;
+//   cont:
+//     bar.sync 0;
+//     bra.uni exit;
+//   unlikely:
+//     ...
+//     // unreachable
+//   exit:
+//     // end of divergent region
+//     ret;
+//
+// After moving the `unlikely` block to the end of the function, it has an edge
+// to the `exit` block, which widens the divergent region and makes the
+// `bar.sync` instruction happen divergently.
+//
+// To work around this, we add an `exit` instruction before every `unreachable`,
+// as `ptxas` understands that exit terminates the CFG. Note that `trap` is not
+// equivalent, and only future versions of `ptxas` will model it like `exit`.
+//
+//===----------------------------------------------------------------------===//
+
+#include "NVPTX.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/InlineAsm.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/Type.h"
+#include "llvm/Pass.h"
+
+using namespace llvm;
+
+namespace llvm {
+void initializeNVPTXLowerUnreachablePass(PassRegistry &);
+}
+
+namespace {
+class NVPTXLowerUnreachable : public FunctionPass {
+  bool runOnFunction(Function &F) override;
+
+public:
+  static char ID; // Pass identification, replacement for typeid
+  NVPTXLowerUnreachable() : FunctionPass(ID) {}
+  StringRef getPassName() const override {
+    return "add an exit instruction before every unreachable";
+  }
+};
+} // namespace
+
+char NVPTXLowerUnreachable::ID = 1;
+
+INITIALIZE_PASS(NVPTXLowerUnreachable, "nvptx-lower-unreachable",
+                "Lower Unreachable", false, false)
+
+// =============================================================================
+// Main function for this pass.
+// =============================================================================
+bool NVPTXLowerUnreachable::runOnFunction(Function &F) {
+  if (skipFunction(F))
+    return false;
+
+  LLVMContext &C = F.getContext();
+  FunctionType *ExitFTy = FunctionType::get(Type::getVoidTy(C), false);
+  InlineAsm *Exit = InlineAsm::get(ExitFTy, "exit;", "", true);
+
+  bool Changed = false;
+  for (auto &BB : F)
+    for (auto &I : BB) {
+      if (auto unreachableInst = dyn_cast<UnreachableInst>(&I)) {
+        Changed = true;
+        CallInst::Create(ExitFTy, Exit, "", unreachableInst);
+      }
+    }
+  return Changed;
+}
+
+FunctionPass *llvm::createNVPTXLowerUnreachablePass() {
+  return new NVPTXLowerUnreachable();
+}

diff  --git a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
index 3b74dbfeb89f5..a530de78f50ae 100644
--- a/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp
@@ -72,6 +72,7 @@ void initializeNVPTXAtomicLowerPass(PassRegistry &);
 void initializeNVPTXCtorDtorLoweringLegacyPass(PassRegistry &);
 void initializeNVPTXLowerAggrCopiesPass(PassRegistry &);
 void initializeNVPTXLowerAllocaPass(PassRegistry &);
+void initializeNVPTXLowerUnreachablePass(PassRegistry &);
 void initializeNVPTXCtorDtorLoweringLegacyPass(PassRegistry &);
 void initializeNVPTXLowerArgsPass(PassRegistry &);
 void initializeNVPTXProxyRegErasurePass(PassRegistry &);
@@ -98,6 +99,7 @@ extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeNVPTXTarget() {
   initializeNVPTXAtomicLowerPass(PR);
   initializeNVPTXLowerArgsPass(PR);
   initializeNVPTXLowerAllocaPass(PR);
+  initializeNVPTXLowerUnreachablePass(PR);
   initializeNVPTXCtorDtorLoweringLegacyPass(PR);
   initializeNVPTXLowerAggrCopiesPass(PR);
   initializeNVPTXProxyRegErasurePass(PR);
@@ -400,6 +402,8 @@ void NVPTXPassConfig::addIRPasses() {
       addPass(createLoadStoreVectorizerPass());
     addPass(createSROAPass());
   }
+
+  addPass(createNVPTXLowerUnreachablePass());
 }
 
 bool NVPTXPassConfig::addInstSelector() {

diff  --git a/llvm/test/CodeGen/NVPTX/unreachable.ll b/llvm/test/CodeGen/NVPTX/unreachable.ll
new file mode 100644
index 0000000000000..742089df1bd45
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/unreachable.ll
@@ -0,0 +1,23 @@
+; RUN: llc < %s -march=nvptx -mcpu=sm_20 -verify-machineinstrs | FileCheck %s
+; RUN: llc < %s -march=nvptx64 -mcpu=sm_20 -verify-machineinstrs | FileCheck %s
+; RUN: %if ptxas && !ptxas-12.0 %{ llc < %s -march=nvptx -mcpu=sm_20 -verify-machineinstrs | %ptxas-verify %}
+; RUN: %if ptxas %{ llc < %s -march=nvptx64 -mcpu=sm_20 -verify-machineinstrs | %ptxas-verify %}
+
+; CHECK: .extern .func throw
+declare void @throw() #0
+
+; CHECK: .entry kernel_func
+define void @kernel_func() {
+; CHECK: call.uni
+; CHECK: throw,
+  call void @throw()
+; CHECK: exit
+  unreachable
+}
+
+attributes #0 = { noreturn }
+
+
+!nvvm.annotations = !{!1}
+
+!1 = !{ptr @kernel_func, !"kernel", i32 1}


        

