[llvm-branch-commits] [clang] [llvm] [clang] Redefine `noconvergent` and generate convergence control tokens (PR #136282)
Sameer Sahasrabuddhe via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Fri Apr 18 01:28:10 PDT 2025
https://github.com/ssahasra created https://github.com/llvm/llvm-project/pull/136282
This introduces the `-fconvergence-control` flag that emits convergence control intrinsics which are then used as the `convergencectrl` operand bundle on convergent calls.
This also redefines the `noconvergent` attribute in Clang. The existing simple interpretation is that if a statement is marked `noconvergent`, then every asm call is treated as a non-convergent operation in the emitted LLVM IR.
The new semantics introduces a more powerful notion that a `noconvergent` statement may contain convergent operations, but the resulting convergence constraints are limited to the scope of that statement. As a whole the statement itself does not place any convergence constraints on the control flow reaching it. When emitting convergence tokens, this attribute results in a call to the `anchor` intrinsic that determines convergence within the statement.
>From 5681859e308283628da481c0ddc09a39345b3d46 Mon Sep 17 00:00:00 2001
From: Sameer Sahasrabuddhe <sameer.sahasrabuddhe at amd.com>
Date: Tue, 15 Apr 2025 18:00:01 +0530
Subject: [PATCH] [clang] Redefine `noconvergent` and generate convergence
control tokens
This introduces the `-fconvergence-control` flag that emits convergence control
intrinsics which are then used as the `convergencectrl` operand bundle on
convergent calls.
This also redefines the `noconvergent` attribute in Clang. The existing simple
interpretation is that if a statement is marked `noconvergent`, then every asm
call is treated as a non-convergent operation in the emitted LLVM IR.
The new semantics introduces a more powerful notion that a `noconvergent`
statement may contain convergent operations, but the resulting convergence
constraints are limited to the scope of that statement. As a whole the statement
itself does not place any convergence constraints on the control flow reaching
it. When emitting convergence tokens, this attribute results in a call to the
`anchor` intrinsic that determines convergence within the statement.
---
clang/docs/ThreadConvergence.rst | 27 +
.../Analysis/Analyses/ConvergenceCheck.h | 3 +-
clang/include/clang/Basic/AttrDocs.td | 15 +-
.../clang/Basic/DiagnosticSemaKinds.td | 2 +
clang/include/clang/Basic/LangOptions.def | 2 +
clang/include/clang/Driver/Options.td | 5 +
clang/lib/Analysis/ConvergenceCheck.cpp | 43 +-
clang/lib/CodeGen/CGCall.cpp | 8 +-
clang/lib/CodeGen/CGStmt.cpp | 44 +-
clang/lib/CodeGen/CodeGenFunction.cpp | 23 +-
clang/lib/CodeGen/CodeGenFunction.h | 13 +-
clang/lib/CodeGen/CodeGenModule.h | 2 +-
clang/lib/Driver/ToolChains/Clang.cpp | 3 +
clang/lib/Sema/AnalysisBasedWarnings.cpp | 8 +-
clang/test/CodeGenHIP/convergence-tokens.hip | 687 ++++++++++++++++++
.../CodeGenHIP/noconvergent-statement.hip | 109 +++
.../noconvergent-errors/backwards_jump.hip | 23 +
.../noconvergent-errors/jump-into-nest.hip | 32 +
.../SemaHIP/noconvergent-errors/no-errors.hip | 83 +++
.../noconvergent-errors/simple_jump.hip | 23 +
llvm/include/llvm/IR/InstrTypes.h | 8 +-
llvm/include/llvm/IR/IntrinsicInst.h | 12 +
.../Transforms/Utils/FixConvergenceControl.h | 21 +
llvm/lib/IR/Instructions.cpp | 7 +
llvm/lib/IR/IntrinsicInst.cpp | 21 +
llvm/lib/Transforms/Utils/CMakeLists.txt | 1 +
.../Utils/FixConvergenceControl.cpp | 191 +++++
27 files changed, 1365 insertions(+), 51 deletions(-)
create mode 100644 clang/test/CodeGenHIP/convergence-tokens.hip
create mode 100644 clang/test/CodeGenHIP/noconvergent-statement.hip
create mode 100644 clang/test/SemaHIP/noconvergent-errors/backwards_jump.hip
create mode 100644 clang/test/SemaHIP/noconvergent-errors/jump-into-nest.hip
create mode 100644 clang/test/SemaHIP/noconvergent-errors/no-errors.hip
create mode 100644 clang/test/SemaHIP/noconvergent-errors/simple_jump.hip
create mode 100644 llvm/include/llvm/Transforms/Utils/FixConvergenceControl.h
create mode 100644 llvm/lib/Transforms/Utils/FixConvergenceControl.cpp
diff --git a/clang/docs/ThreadConvergence.rst b/clang/docs/ThreadConvergence.rst
index d872ab9cb77f5..ce2ca2cbeacde 100644
--- a/clang/docs/ThreadConvergence.rst
+++ b/clang/docs/ThreadConvergence.rst
@@ -564,6 +564,33 @@ backwards ``goto`` instead of a ``while`` statement.
``outside_loop``. This includes threads that jumped from ``G2`` as well as
threads that reached ``outside_loop`` after executing ``C``.
+.. _noconvergent-statement:
+
+The ``noconvergent`` Statement
+==============================
+
+When a statement is marked as ``noconvergent`` the convergence of threads at the
+start of this statement is not constrained by any convergent operations inside
+the statement.
+
+- When two threads execute a statement marked ``noconvergent``, it is
+ implementation-defined whether they are converged at that execution. [Note:
+ The resulting evaluations must still satisfy the strict partial order imposed
+ by convergence-before.]
+- When two threads are converged at the start of this statement (as determined
+ by the implementation), whether they are converged at each convergent
+ operation inside this statement is determined by the usual rules.
+
+For every label statement ``L`` occurring inside a ``noconvergent``
+statement, every ``goto`` or ``switch`` statement that transfers control to
+``L`` must also occur inside that statement.
+
+.. note::
+
+ Convergence control tokens are necessary for correctly implementing the
+ "noconvergent" statement attribute. When tokens are not in use, the legacy
+ behaviour is retained, where the only effect of this attribute is that
+ ``asm`` calls within the statement are not treated as convergent operations.
Implementation-defined Convergence
==================================
diff --git a/clang/include/clang/Analysis/Analyses/ConvergenceCheck.h b/clang/include/clang/Analysis/Analyses/ConvergenceCheck.h
index bf0d164c6a5bc..74208889a84df 100644
--- a/clang/include/clang/Analysis/Analyses/ConvergenceCheck.h
+++ b/clang/include/clang/Analysis/Analyses/ConvergenceCheck.h
@@ -18,7 +18,8 @@ class AnalysisDeclContext;
class Sema;
class Stmt;
-void analyzeForConvergence(Sema &S, AnalysisDeclContext &AC);
+void analyzeForConvergence(Sema &S, AnalysisDeclContext &AC,
+ bool GenerateWarnings, bool GenerateTokens);
} // end namespace clang
diff --git a/clang/include/clang/Basic/AttrDocs.td b/clang/include/clang/Basic/AttrDocs.td
index 5f37922d352b7..7ef8d3d86fe50 100644
--- a/clang/include/clang/Basic/AttrDocs.td
+++ b/clang/include/clang/Basic/AttrDocs.td
@@ -1700,13 +1700,12 @@ def NoConvergentDocs : Documentation {
This attribute prevents a function from being treated as convergent; when a
function is marked ``noconvergent``, calls to that function are not
automatically assumed to be convergent, unless such calls are explicitly marked
-as ``convergent``. If a statement is marked as ``noconvergent``, any calls to
-inline ``asm`` in that statement are no longer treated as convergent.
+as ``convergent``.
-In languages following SPMD/SIMT programming model, e.g., CUDA/HIP, function
-declarations and inline asm calls are treated as convergent by default for
-correctness. This ``noconvergent`` attribute is helpful for developers to
-prevent them from being treated as convergent when it's safe.
+If a statement is marked as ``noconvergent``, the semantics depends on whether
+convergence control tokens are used in the generated LLVM IR. When convergence
+control tokens are not in use, any calls to inline ``asm`` in that statement are
+treated as not convergent.
.. code-block:: c
@@ -1719,6 +1718,10 @@ prevent them from being treated as convergent when it's safe.
[[clang::noconvergent]] { asm volatile ("nop"); } // the asm call is non-convergent
}
+When tokens are in use, placing the ``noconvergent`` attribute on a statement
+indicates that thread convergence at the entry to that statement is
+:ref:`implementation-defined<noconvergent-statement>`.
+
}];
}
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index dabb6d31b519a..3be697c6337bc 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -6514,6 +6514,8 @@ def note_goto_affects_convergence : Note<
"jump from this goto statement affects convergence">;
def note_switch_case_affects_convergence : Note<
"jump to this case statement affects convergence of loop">;
+def err_jump_into_noconvergent : Error<
+ "cannot jump into a noconvergent statement from outside">;
def err_goto_into_protected_scope : Error<
"cannot jump from this goto statement to its label">;
def ext_goto_into_protected_scope : ExtWarn<
diff --git a/clang/include/clang/Basic/LangOptions.def b/clang/include/clang/Basic/LangOptions.def
index 930c1c06d1a76..c8254af61387b 100644
--- a/clang/include/clang/Basic/LangOptions.def
+++ b/clang/include/clang/Basic/LangOptions.def
@@ -306,6 +306,8 @@ LANGOPT(HIPUseNewLaunchAPI, 1, 0, "Use new kernel launching API for HIP")
LANGOPT(OffloadUniformBlock, 1, 0, "Assume that kernels are launched with uniform block sizes (default true for CUDA/HIP and false otherwise)")
LANGOPT(HIPStdPar, 1, 0, "Enable Standard Parallel Algorithm Acceleration for HIP (experimental)")
LANGOPT(HIPStdParInterposeAlloc, 1, 0, "Replace allocations / deallocations with HIP RT calls when Standard Parallel Algorithm Acceleration for HIP is enabled (Experimental)")
+LANGOPT(ConvergenceControl, 1, 0,
+ "Generate explicit convergence control (experimental)")
LANGOPT(OpenACC , 1, 0, "OpenACC Enabled")
diff --git a/clang/include/clang/Driver/Options.td b/clang/include/clang/Driver/Options.td
index 830d3459a1320..369929c30a623 100644
--- a/clang/include/clang/Driver/Options.td
+++ b/clang/include/clang/Driver/Options.td
@@ -1397,6 +1397,11 @@ def fhip_emit_relocatable : Flag<["-"], "fhip-emit-relocatable">,
HelpText<"Compile HIP source to relocatable">;
def fno_hip_emit_relocatable : Flag<["-"], "fno-hip-emit-relocatable">,
HelpText<"Do not override toolchain to compile HIP source to relocatable">;
+defm convergence_control : BoolFOption<"convergence-control",
+ LangOpts<"ConvergenceControl">, DefaultFalse,
+ PosFlag<SetTrue, [], [ClangOption, CC1Option], "Generate">,
+ NegFlag<SetFalse, [], [ClangOption], "Don't generate">,
+ BothFlags<[], [ClangOption], " explicit convergence control tokens (experimental)">>;
}
// Clang specific/exclusive options for OpenACC.
diff --git a/clang/lib/Analysis/ConvergenceCheck.cpp b/clang/lib/Analysis/ConvergenceCheck.cpp
index 75139388ea19e..93744f8b8e495 100644
--- a/clang/lib/Analysis/ConvergenceCheck.cpp
+++ b/clang/lib/Analysis/ConvergenceCheck.cpp
@@ -16,6 +16,11 @@
using namespace clang;
using namespace llvm;
+static void errorJumpIntoNoConvergent(Sema &S, Stmt *From, Stmt *Parent) {
+ S.Diag(Parent->getBeginLoc(), diag::err_jump_into_noconvergent);
+ S.Diag(From->getBeginLoc(), diag::note_goto_affects_convergence);
+}
+
static void warnGotoCycle(Sema &S, Stmt *From, Stmt *Parent) {
S.Diag(Parent->getBeginLoc(),
diag::warn_cycle_created_by_goto_affects_convergence);
@@ -27,7 +32,8 @@ static void warnJumpIntoLoop(Sema &S, Stmt *From, Stmt *Loop) {
S.Diag(From->getBeginLoc(), diag::note_goto_affects_convergence);
}
-static void checkConvergenceOnGoto(Sema &S, GotoStmt *From, ParentMap &PM) {
+static void checkConvergenceOnGoto(Sema &S, GotoStmt *From, ParentMap &PM,
+ bool GenerateWarnings, bool GenerateTokens) {
Stmt *To = From->getLabel()->getStmt();
unsigned ToDepth = PM.getParentDepth(To) + 1;
@@ -42,7 +48,7 @@ static void checkConvergenceOnGoto(Sema &S, GotoStmt *From, ParentMap &PM) {
}
// Special case: the goto statement is a descendant of the label statement.
- if (ExpandedFrom == ExpandedTo) {
+ if (GenerateWarnings && ExpandedFrom == ExpandedTo) {
assert(ExpandedTo == To);
warnGotoCycle(S, From, To);
return;
@@ -60,10 +66,18 @@ static void checkConvergenceOnGoto(Sema &S, GotoStmt *From, ParentMap &PM) {
SmallVector<Stmt *> Loops;
for (Stmt *I = To; I != ParentFrom; I = PM.getParent(I)) {
+ if (GenerateTokens)
+ if (const auto *AS = dyn_cast<AttributedStmt>(I))
+ if (hasSpecificAttr<NoConvergentAttr>(AS->getAttrs()))
+ errorJumpIntoNoConvergent(S, From, I);
// Can't jump into a ranged-for, so we don't need to look for it here.
- if (isa<ForStmt, WhileStmt, DoStmt>(I))
+ if (GenerateWarnings && isa<ForStmt, WhileStmt, DoStmt>(I))
Loops.push_back(I);
}
+
+ if (!GenerateWarnings)
+ return;
+
for (Stmt *I : reverse(Loops))
warnJumpIntoLoop(S, From, I);
@@ -88,21 +102,29 @@ static void warnSwitchIntoLoop(Sema &S, Stmt *Case, Stmt *Loop) {
}
static void checkConvergenceForSwitch(Sema &S, SwitchStmt *Switch,
- ParentMap &PM) {
+ ParentMap &PM, bool GenerateWarnings,
+ bool GenerateTokens) {
for (SwitchCase *Case = Switch->getSwitchCaseList(); Case;
Case = Case->getNextSwitchCase()) {
SmallVector<Stmt *> Loops;
for (Stmt *I = Case; I != Switch; I = PM.getParent(I)) {
+ if (GenerateTokens)
+ if (const auto *AS = dyn_cast<AttributedStmt>(I))
+ if (hasSpecificAttr<NoConvergentAttr>(AS->getAttrs()))
+ errorJumpIntoNoConvergent(S, Switch, I);
// Can't jump into a ranged-for, so we don't need to look for it here.
- if (isa<ForStmt, WhileStmt, DoStmt>(I))
+ if (GenerateWarnings && isa<ForStmt, WhileStmt, DoStmt>(I))
Loops.push_back(I);
}
- for (Stmt *I : reverse(Loops))
- warnSwitchIntoLoop(S, Case, I);
+ if (GenerateWarnings) {
+ for (Stmt *I : reverse(Loops))
+ warnSwitchIntoLoop(S, Case, I);
+ }
}
}
-void clang::analyzeForConvergence(Sema &S, AnalysisDeclContext &AC) {
+void clang::analyzeForConvergence(Sema &S, AnalysisDeclContext &AC,
+ bool GenerateWarnings, bool GenerateTokens) {
// Iterating over the CFG helps trim unreachable blocks, and locates Goto
// statements faster than iterating over the whole body.
CFG *cfg = AC.getCFG();
@@ -111,9 +133,10 @@ void clang::analyzeForConvergence(Sema &S, AnalysisDeclContext &AC) {
for (CFGBlock *BI : *cfg) {
Stmt *Term = BI->getTerminatorStmt();
if (GotoStmt *Goto = dyn_cast_or_null<GotoStmt>(Term)) {
- checkConvergenceOnGoto(S, Goto, PM);
+ checkConvergenceOnGoto(S, Goto, PM, GenerateWarnings, GenerateTokens);
} else if (SwitchStmt *Switch = dyn_cast_or_null<SwitchStmt>(Term)) {
- checkConvergenceForSwitch(S, Switch, PM);
+ checkConvergenceForSwitch(S, Switch, PM, GenerateWarnings,
+ GenerateTokens);
}
}
}
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 8cb27420dd911..20f251a5ba5b2 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -5773,7 +5773,13 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
Attrs =
Attrs.addFnAttribute(getLLVMContext(), llvm::Attribute::AlwaysInline);
- // Remove call-site convergent attribute if requested.
+ // Remove call-site convergent attribute if this call occurs inside a
+ // noconvergent statement. This is the legacy behaviour when convergence
+ // control tokens are not in use. It only affects inline asm calls, since all
+ // other function calls inherit the convergent attribute from the callee. When
+ // convergence control tokens are in use, any inline asm calls should be
+ // explicitly marked noconvergent, else they simply inherit whatever token is
+ // currently in scope.
if (InNoConvergentAttributedStmt)
Attrs =
Attrs.removeFnAttribute(getLLVMContext(), llvm::Attribute::Convergent);
diff --git a/clang/lib/CodeGen/CGStmt.cpp b/clang/lib/CodeGen/CGStmt.cpp
index 3562b4ea22a24..1a9a574572f67 100644
--- a/clang/lib/CodeGen/CGStmt.cpp
+++ b/clang/lib/CodeGen/CGStmt.cpp
@@ -829,14 +829,24 @@ void CodeGenFunction::EmitAttributedStmt(const AttributedStmt &S) {
} break;
}
}
+ bool LegacyNoConvergent = noconvergent && !CGM.shouldEmitConvergenceTokens();
SaveAndRestore save_nomerge(InNoMergeAttributedStmt, nomerge);
SaveAndRestore save_noinline(InNoInlineAttributedStmt, noinline);
SaveAndRestore save_alwaysinline(InAlwaysInlineAttributedStmt, alwaysinline);
- SaveAndRestore save_noconvergent(InNoConvergentAttributedStmt, noconvergent);
+ SaveAndRestore save_noconvergent(InNoConvergentAttributedStmt,
+ LegacyNoConvergent);
SaveAndRestore save_musttail(MustTailCall, musttail);
SaveAndRestore save_flattenOrBranch(HLSLControlFlowAttr, flattenOrBranch);
CGAtomicOptionsRAII AORAII(CGM, AA);
+ if (noconvergent && CGM.shouldEmitConvergenceTokens()) {
+ EmitBlock(createBasicBlock("noconvergent.anchor"));
+ ConvergenceTokenStack.push_back(
+ emitConvergenceAnchorToken(Builder.GetInsertBlock()));
+ }
EmitStmt(S.getSubStmt(), S.getAttrs());
+ if (noconvergent && CGM.shouldEmitConvergenceTokens()) {
+ ConvergenceTokenStack.pop_back();
+ }
}
void CodeGenFunction::EmitGotoStmt(const GotoStmt &S) {
@@ -3317,16 +3327,6 @@ CodeGenFunction::GenerateCapturedStmtFunction(const CapturedStmt &S) {
return F;
}
-// Returns the first convergence entry/loop/anchor instruction found in |BB|.
-// std::nullptr otherwise.
-static llvm::ConvergenceControlInst *getConvergenceToken(llvm::BasicBlock *BB) {
- for (auto &I : *BB) {
- if (auto *CI = dyn_cast<llvm::ConvergenceControlInst>(&I))
- return CI;
- }
- return nullptr;
-}
-
llvm::CallBase *
CodeGenFunction::addConvergenceControlToken(llvm::CallBase *Input) {
llvm::ConvergenceControlInst *ParentToken = ConvergenceTokenStack.back();
@@ -3348,15 +3348,33 @@ CodeGenFunction::emitConvergenceLoopToken(llvm::BasicBlock *BB) {
return llvm::ConvergenceControlInst::CreateLoop(*BB, ParentToken);
}
+llvm::ConvergenceControlInst *
+CodeGenFunction::emitConvergenceAnchorToken(llvm::BasicBlock *BB) {
+ return llvm::ConvergenceControlInst::CreateAnchor(*BB);
+}
+
llvm::ConvergenceControlInst *
CodeGenFunction::getOrEmitConvergenceEntryToken(llvm::Function *F) {
llvm::BasicBlock *BB = &F->getEntryBlock();
- llvm::ConvergenceControlInst *Token = getConvergenceToken(BB);
+ llvm::ConvergenceControlInst *Token = llvm::getConvergenceControlDef(*BB);
if (Token)
return Token;
- // Adding a convergence token requires the function to be marked as
+ // Adding a convergence entry token requires the function to be marked as
// convergent.
F->setConvergent();
return llvm::ConvergenceControlInst::CreateEntry(*BB);
}
+
+llvm::ConvergenceControlInst *
+CodeGenFunction::getOrEmitConvergenceAnchorToken(llvm::Function *F) {
+ llvm::BasicBlock *BB = &F->getEntryBlock();
+ llvm::ConvergenceControlInst *Token = llvm::getConvergenceControlDef(*BB);
+ if (Token)
+ return Token;
+
+ // Adding a convergence anchor token requires the function to be marked as
+ // not convergent.
+ F->setNotConvergent();
+ return llvm::ConvergenceControlInst::CreateAnchor(*BB);
+}
diff --git a/clang/lib/CodeGen/CodeGenFunction.cpp b/clang/lib/CodeGen/CodeGenFunction.cpp
index 4d29ceace646f..d9226bdd775a3 100644
--- a/clang/lib/CodeGen/CodeGenFunction.cpp
+++ b/clang/lib/CodeGen/CodeGenFunction.cpp
@@ -47,6 +47,7 @@
#include "llvm/Support/CRC.h"
#include "llvm/Support/xxhash.h"
#include "llvm/Transforms/Scalar/LowerExpectIntrinsic.h"
+#include "llvm/Transforms/Utils/FixConvergenceControl.h"
#include "llvm/Transforms/Utils/PromoteMemToReg.h"
#include <optional>
@@ -371,12 +372,6 @@ void CodeGenFunction::FinishFunction(SourceLocation EndLoc) {
assert(DeferredDeactivationCleanupStack.empty() &&
"mismatched activate/deactivate of cleanups!");
- if (CGM.shouldEmitConvergenceTokens()) {
- ConvergenceTokenStack.pop_back();
- assert(ConvergenceTokenStack.empty() &&
- "mismatched push/pop in convergence stack!");
- }
-
bool OnlySimpleReturnStmts = NumSimpleReturnExprs > 0
&& NumSimpleReturnExprs == NumReturnExprs
&& ReturnBlock.getBlock()->use_empty();
@@ -1362,8 +1357,13 @@ void CodeGenFunction::StartFunction(GlobalDecl GD, QualType RetTy,
if (const auto *VecWidth = CurFuncDecl->getAttr<MinVectorWidthAttr>())
LargestVectorWidth = VecWidth->getVectorWidth();
- if (CGM.shouldEmitConvergenceTokens())
- ConvergenceTokenStack.push_back(getOrEmitConvergenceEntryToken(CurFn));
+ if (CGM.shouldEmitConvergenceTokens()) {
+ llvm::ConvergenceControlInst *Token =
+ (FD && FD->hasAttr<NoConvergentAttr>())
+ ? getOrEmitConvergenceAnchorToken(CurFn)
+ : getOrEmitConvergenceEntryToken(CurFn);
+ ConvergenceTokenStack.push_back(Token);
+ }
}
void CodeGenFunction::EmitFunctionBody(const Stmt *Body) {
@@ -1647,6 +1647,13 @@ void CodeGenFunction::GenerateCode(GlobalDecl GD, llvm::Function *Fn,
}
}
+ if (CGM.shouldEmitConvergenceTokens()) {
+ ConvergenceTokenStack.pop_back();
+ assert(ConvergenceTokenStack.empty() &&
+ "mismatched push/pop in convergence stack!");
+ fixConvergenceControl(CurFn);
+ }
+
// Emit the standard function epilogue.
FinishFunction(BodyRange.getEnd());
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index 9254c7077237f..0d20218f6cbf1 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -5339,15 +5339,24 @@ class CodeGenFunction : public CodeGenTypeCache {
// as it's parent convergence instr.
llvm::ConvergenceControlInst *emitConvergenceLoopToken(llvm::BasicBlock *BB);
+ // Emits a convergence_anchor instruction for the given |BB|.
+ llvm::ConvergenceControlInst *
+ emitConvergenceAnchorToken(llvm::BasicBlock *BB);
+
// Adds a convergence_ctrl token with |ParentToken| as parent convergence
// instr to the call |Input|.
llvm::CallBase *addConvergenceControlToken(llvm::CallBase *Input);
- // Find the convergence_entry instruction |F|, or emits ones if none exists.
- // Returns the convergence instruction.
+ // Find the convergence control token in the entry block of |F|, or if none
+ // exists, create an entry token.
llvm::ConvergenceControlInst *
getOrEmitConvergenceEntryToken(llvm::Function *F);
+ // Find the convergence control token in the entry block of |F|, or if none
+ // exists, create an anchor token.
+ llvm::ConvergenceControlInst *
+ getOrEmitConvergenceAnchorToken(llvm::Function *F);
+
private:
llvm::MDNode *getRangeForLoadFromType(QualType Ty);
void EmitReturnOfRValue(RValue RV, QualType Ty);
diff --git a/clang/lib/CodeGen/CodeGenModule.h b/clang/lib/CodeGen/CodeGenModule.h
index 9a0bc675e0baa..1651c87049df8 100644
--- a/clang/lib/CodeGen/CodeGenModule.h
+++ b/clang/lib/CodeGen/CodeGenModule.h
@@ -1751,7 +1751,7 @@ class CodeGenModule : public CodeGenTypeCache {
bool shouldEmitConvergenceTokens() const {
// TODO: this should probably become unconditional once the controlled
// convergence becomes the norm.
- return getTriple().isSPIRVLogical();
+ return getTriple().isSPIRVLogical() || getLangOpts().ConvergenceControl;
}
void addUndefinedGlobalForTailCall(
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index b2dd4b3b54869..c9e37548fa835 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -7098,6 +7098,9 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
if (Args.hasFlag(options::OPT_fhip_new_launch_api,
options::OPT_fno_hip_new_launch_api, true))
CmdArgs.push_back("-fhip-new-launch-api");
+ if (Args.hasFlag(options::OPT_fconvergence_control,
+ options::OPT_fno_convergence_control, false))
+ CmdArgs.push_back("-fconvergence-control");
Args.addOptInFlag(CmdArgs, options::OPT_fgpu_allow_device_init,
options::OPT_fno_gpu_allow_device_init);
Args.AddLastArg(CmdArgs, options::OPT_hipstdpar);
diff --git a/clang/lib/Sema/AnalysisBasedWarnings.cpp b/clang/lib/Sema/AnalysisBasedWarnings.cpp
index 31756d3a2f75a..bb9efacd74a88 100644
--- a/clang/lib/Sema/AnalysisBasedWarnings.cpp
+++ b/clang/lib/Sema/AnalysisBasedWarnings.cpp
@@ -2866,9 +2866,11 @@ void clang::sema::AnalysisBasedWarnings::IssueWarnings(
if (S.getLangOpts().CPlusPlus && !fscope->isCoroutine() && isNoexcept(FD))
checkThrowInNonThrowingFunc(S, FD, AC);
- if (!Diags.isIgnored(diag::warn_cycle_created_by_goto_affects_convergence,
- D->getBeginLoc()))
- analyzeForConvergence(S, AC);
+ bool WarnConvergence = !Diags.isIgnored(
+ diag::warn_cycle_created_by_goto_affects_convergence, D->getBeginLoc());
+ bool GenerateTokens = S.getLangOpts().ConvergenceControl;
+ if (GenerateTokens || WarnConvergence)
+ analyzeForConvergence(S, AC, WarnConvergence, GenerateTokens);
// If none of the previous checks caused a CFG build, trigger one here
// for the logical error handler.
diff --git a/clang/test/CodeGenHIP/convergence-tokens.hip b/clang/test/CodeGenHIP/convergence-tokens.hip
new file mode 100644
index 0000000000000..f1807acc8b0d4
--- /dev/null
+++ b/clang/test/CodeGenHIP/convergence-tokens.hip
@@ -0,0 +1,687 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -triple amdgcn -fconvergence-control -Wno-convergence -emit-llvm -disable-llvm-optzns -fcuda-is-device -o - %s | FileCheck %s
+
+
+// NOTE:
+// =====
+// The following tests are of particular interest:
+// - jump_into_unreachable_nest
+// - backwards_jump_into_nest
+// - forever_loops
+// - backwards_inside_label
+// - switch_backwards
+// - backwards_conditional
+// - duffs_device
+
+#define __device__ __attribute__((device))
+
+__device__ void foo() __attribute__((convergent));
+
+// CHECK-LABEL: @_Z14jump_into_nesti
+// CHECK: entry:
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: if.end: ; preds = %entry
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: while.body: ; preds = %while.cond
+// CHECK: [[TOK2:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK2]]) ]
+// CHECK: do.body: ; preds = %do.cond, %if.then2
+// CHECK: [[TOK4:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK4]]) ]
+// CHECK: do.end: ; preds = %do.cond
+// CHECK: [[TOK7:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK7]]) ]
+
+__device__ int jump_into_nest(int cond) {
+ int i = 0;
+
+ if (cond > 1) {
+ goto jumptarget;
+ }
+
+ foo();
+
+ while (true) {
+ foo();
+ if (cond > 1) {
+ do {
+ i++;
+ foo();
+ jumptarget:
+ i++;
+ } while (true);
+ foo();
+ }
+ }
+
+ return i;
+}
+
+// CHECK-LABEL: @_Z26jump_into_unreachable_nesti
+// CHECK: entry:
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: while.body: ; preds = %while.cond
+// CHECK: [[TOK2:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK2]]) ]
+// CHECK: do.body: ; preds = %do.cond, %if.then
+// CHECK: [[TOK4:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK4]]) ]
+// CHECK: do.end: ; preds = %do.cond
+// CHECK: [[TOK7:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK7]]) ]
+
+// A function with a nest of reducible forever loops, where the entry is from
+// the label ``jumptarget`` since they are unreachable with sequential control
+// flow. The detected cycles are inside out ... the do-while loop is the parent
+// of the while loop. What's even more weird is that the inner loop is actually
+// unreachable because the condition for the do-while never branches to it.
+__device__ int jump_into_unreachable_nest(int cond) {
+ int i = 0;
+
+ foo();
+
+ goto jumptarget;
+
+ while (true) {
+ foo();
+ if (cond > 1) {
+ do {
+ i++;
+ foo();
+ jumptarget:
+ i++;
+ } while (true);
+ foo();
+ }
+ }
+
+ return i;
+}
+
+// CHECK-LABEL: @_Z24backwards_jump_into_nesti
+// CHECK: entry:
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: while.cond: ; preds = %if.end, %entry
+// CHECK: [[TOK1:%[0-9]+]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: while.body: ; preds = %while.cond
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK1]]) ]
+// CHECK: do.body: ; preds = %do.cond, %if.then
+// CHECK: [[TOK6:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK6]]) ]
+// CHECK: do.end: ; preds = %do.cond
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK1]]) ]
+
+// The outer while-loop dominates the backwards jump, and hence remains
+// reducible. But the do-while loop does not dominate and hence becomes an
+// irreducible cycle with two entries: the blocks %do.body and %jumptarget.
+__device__ int backwards_jump_into_nest(int cond1, int cond2, int cond3, int cond4) {
+ int i = 0;
+
+ while (i < cond1) {
+ foo();
+ if (cond2 > i) {
+ do {
+ i++;
+ foo();
+ jumptarget:
+ i++;
+ } while (cond3 > i);
+ foo();
+ }
+ }
+
+ if (cond4 > i) {
+ goto jumptarget;
+ }
+
+ return i;
+}
+
+// CHECK-LABEL: @_Z17forward_all_kindsiiiii
+// CHECK: entry:
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: for.body: ; preds = %for.cond
+// CHECK: [[TOK4:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK4]]) ]
+// CHECK: while.body: ; preds = %while.cond
+// CHECK: [[TOK8:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK8]]) ]
+// CHECK: do.body: ; preds = %do.cond, %if.then6
+// CHECK: [[TOK10:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK10]]) ]
+// CHECK: while.cond10: ; preds = %while.body12, %if.else
+// CHECK: [[TOK15:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: while.body12: ; preds = %while.cond10
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK15]]) ]
+
+// All loops become irreducible due to the jump into the nest.
+__device__ int forward_all_kinds(int cond1, int cond2, int cond3, int cond4, int cond5) {
+ int i = 0;
+
+ if (cond1 < 0)
+ goto jumptarget;
+
+ for (; i < cond5;) {
+ foo();
+ if (cond2 != 0) {
+ while (i < cond3) {
+ foo();
+ if (cond4 > 1) {
+ do {
+ foo();
+ i++;
+ jumptarget:
+ i++;
+ } while (i < cond4);
+ }
+ }
+ } else {
+ while (i < cond3) {
+ foo();
+ i++;
+ }
+ }
+ }
+ return i;
+}
+
+// CHECK-LABEL: @_Z13forever_loopsi
+// CHECK: entry:
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: for.cond: ; preds = %if.end
+// CHECK: [[TOK2:%[0-9]+]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK2]]) ]
+// CHECK: jumptarget: ; preds = %while.body, %if.then
+// CHECK: [[TOK3:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK3]]) ]
+
+// The outer for-loop is "interesting". The frontend eliminates its backedge
+// effectively making it a non-loop, since control never exits the inner
+// while-loop.
+__device__ int forever_loops(int cond1) {
+ int i = 0;
+
+ if (cond1 < 0)
+ goto jumptarget;
+
+ for (;;) {
+ foo();
+ while (true) {
+ jumptarget:
+ i++;
+ foo();
+ }
+ }
+
+ return i;
+}
+
+// CHECK-LABEL: @_Z12nest_to_nesti
+// CHECK: entry:
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: for.cond: ; preds = %if.end3, %entry
+// CHECK: [[TOK1:%[0-9]+]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK1]]) ]
+// CHECK: while.cond: ; preds = %if.end, %if.then
+// CHECK: [[TOK3:%[0-9]+]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TOK1]]) ]
+// CHECK: while.body: ; preds = %while.cond
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK3]]) ]
+// CHECK: while.body5: ; preds = %while.cond4
+// CHECK: [[TOK6:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK6]]) ]
+// CHECK: do.body: ; preds = %do.cond, %if.then7
+// CHECK: [[TOK8:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK8]]) ]
+
+__device__ int nest_to_nest(int cond) {
+ int i = 0;
+
+ for (;;) {
+ foo();
+ if (cond != 0) {
+ while (true) {
+ foo();
+ if (cond > 1) {
+ goto jumptarget;
+ }
+ }
+ }
+ }
+
+ while (true) {
+ foo();
+ if (cond > 1) {
+ do {
+ foo();
+ i++;
+ jumptarget:
+ i++;
+ } while (true);
+ }
+ }
+
+ return i;
+}
+
+// CHECK-LABEL: @_Z22backwards_nest_to_nesti
+// CHECK: entry:
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: while.cond: ; preds = %if.end, %entry
+// CHECK: [[TOK1:%[0-9]+]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: while.body: ; preds = %while.cond
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK1]]) ]
+// CHECK: do.body: ; preds = %do.cond, %if.then
+// CHECK: [[TOK3:%[0-9]+]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TOK1]]) ]
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK3]]) ]
+
+__device__ int backwards_nest_to_nest(int cond) {
+ int i = 0;
+
+ while (true) {
+ foo();
+ if (cond > 1) {
+ do {
+ foo();
+ i++;
+ jumptarget:
+ i++;
+ } while (true);
+ }
+ }
+
+ for (;;) {
+ foo();
+ if (cond != 0) {
+ while (true) {
+ foo();
+ if (cond > 1) {
+ goto jumptarget;
+ }
+ }
+ }
+ }
+
+ return i;
+}
+
+// CHECK-LABEL: @_Z17backwards_and_outi
+// CHECK: entry:
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: if.then: ; preds = %entry
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: jumptarget: ; preds = %if.then4, %if.then
+// CHECK: [[TOK2:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK2]]) ]
+// CHECK: for.cond: ; preds = %if.end6, %if.end
+// CHECK: [[TOK4:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK4]]) ]
+// CHECK: while.cond: ; preds = %if.end5, %if.then2
+// CHECK: [[TOK6:%[0-9]+]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TOK4]]) ]
+// CHECK: while.body: ; preds = %while.cond
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK6]]) ]
+
+__device__ int backwards_and_out(int cond) {
+ int i = 0;
+
+ if (cond > 1) {
+ foo();
+ jumptarget:
+ foo();
+ i++;
+ }
+
+ for (;;) {
+ foo();
+ if (cond != 0) {
+ while (true) {
+ foo();
+ if (cond > 1) {
+ goto jumptarget;
+ }
+ }
+ }
+ }
+
+ return i;
+}
+
+// CHECK-LABEL: @_Z21backwards_inside_loopi
+// CHECK: entry:
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: for.cond: ; preds = %if.end, %entry
+// CHECK: [[TOK1:%[0-9]+]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK1]]) ]
+// CHECK: jumptarget: ; preds = %if.then, %for.cond
+// CHECK: [[TOK3:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK3]]) ]
+
+__device__ int backwards_inside_loop(int cond) {
+ int i = 0;
+ int j = 0;
+ int k = 0;
+ for (;;) {
+ foo();
+ i++;
+ jumptarget:
+ foo();
+ j++;
+ k++;
+ if (cond > 5)
+ goto jumptarget;
+ }
+ return i + j + k;
+}
+
+// CHECK-LABEL: @_Z19loop_backwards_loopi
+// CHECK: entry:
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: for.cond: ; preds = %entry
+// CHECK: [[TOK1:%[0-9]+]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK1]]) ]
+// CHECK: while.cond: ; preds = %if.end, %jumptarget
+// CHECK: [[TOK4:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: while.body: ; preds = %while.cond
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK4]]) ]
+
+__device__ int loop_backwards_loop(int cond) {
+ int i = 0;
+ int j = 0;
+ int k = 0;
+ for (;;) {
+ foo();
+ i++;
+ jumptarget:
+ j++;
+ while (true) {
+ foo();
+ k++;
+ if (cond > 5)
+ goto jumptarget;
+ }
+ }
+ return i + j + k;
+}
+
+// CHECK-LABEL: @_Z22backwards_inside_labeli
+// CHECK: entry:
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: jumptarget: ; preds = %if.then, %entry
+// CHECK: [[TOK1:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK1]]) ]
+
+// The border-case where a goto is a child of its own target label statement.
+// This is actually a reducible cycle and can use the loop intrinsic if we
+// strengthen the semantics of the backwards goto.
+__device__ int backwards_inside_label(int cond) {
+ int i = 0;
+ int j = 0;
+ jumptarget: {
+ foo();
+ i++;
+ j++;
+ if (cond > 5)
+ goto jumptarget;
+ }
+ return i + j;
+}
+
+// CHECK-LABEL: @_Z29backwards_label_inside_branchi
+// CHECK: entry:
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: jumptarget: ; preds = %if.then3, %if.then
+// CHECK: [[TOK2:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK2]]) ]
+
+__device__ int backwards_label_inside_branch(int cond) {
+ int i = 0;
+ int j = 0;
+ if (cond > 0) {
+ jumptarget: {
+ foo();
+ i++;
+ j++;
+ if (cond > 5)
+ goto jumptarget;
+ }
+ } else {
+ j++;
+ }
+ return i + j;
+}
+
+// CHECK-LABEL: @_Z30backwards_inside_labelled_loopi
+// CHECK: entry:
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: while.cond: ; preds = %if.end, %jumptarget
+// CHECK: [[TOK1:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: while.body: ; preds = %while.cond
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK1]]) ]
+
+__device__ int backwards_inside_labelled_loop(int cond) {
+ int i = 0;
+ int j = 0;
+ jumptarget: while (true) {
+ foo();
+ i++;
+ j++;
+ if (cond > 5)
+ goto jumptarget;
+ }
+ return i + j;
+}
+
+// CHECK-LABEL: @_Z15switch_sidewaysi
+// CHECK: entry:
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: sw.bb: ; preds = %entry
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: sw.bb1: ; preds = %entry
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: sw.bb2: ; preds = %entry
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: jumptarget: ; preds = %sw.bb2, %sw.bb1
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: sw.default: ; preds = %entry
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+
+// Convergence is well-defined for a jump within a switch statement.
+__device__ int switch_sideways(int cond) {
+ int i = 0;
+ switch (cond) {
+ case 10:
+ foo();
+ i++;
+ break;
+ case 20:
+ foo();
+ i += 2;
+ goto jumptarget;
+ break;
+ case 30: {
+ foo();
+ i += 3;
+ jumptarget:
+ foo();
+ i += 4;
+ break;
+ }
+ default:
+ foo();
+ break;
+ }
+ return i;
+}
+
+// CHECK-LABEL: @_Z16switch_backwardsi
+// CHECK: entry:
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: sw.bb: ; preds = %entry
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: sw.bb1: ; preds = %entry
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: jumptarget: ; preds = %sw.bb3, %sw.bb1
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: sw.bb3: ; preds = %entry
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: sw.default: ; preds = %entry
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+
+// Convergence is declared to be implementation-defined for a backward jump
+// within a switch statement. But we don't actually do anything if it does not
+// create a cycle. So it's perfectly legal that the convergent operations below
+// use the entry token and not their own anchor token.
+__device__ int switch_backwards(int cond) {
+ int i = 0;
+ switch (cond) {
+ case 10:
+ foo();
+ i++;
+ break;
+ case 30: {
+ foo();
+ i += 3;
+ jumptarget:
+ foo();
+ i += 4;
+ break;
+ }
+ case 20:
+ foo();
+ i += 2;
+ goto jumptarget;
+ break;
+ default:
+ foo();
+ break;
+ }
+ return i;
+}
+
+// CHECK-LABEL: @_Z18switch_fallthroughi
+// CHECK: entry:
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: sw.bb: ; preds = %entry
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: sw.bb1: ; preds = %entry
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: sw.bb2: ; preds = %entry, %sw.bb1
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: sw.default: ; preds = %entry
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+
+// Convergence is well-defined with fallthrough.
+__device__ int switch_fallthrough(int cond) {
+ int i = 0;
+ switch (cond) {
+ case 10:
+ foo();
+ i++;
+ break;
+ case 20:
+ foo();
+ i += 2;
+ case 30: {
+ foo();
+ i += 3;
+ i += 4;
+ break;
+ }
+ default:
+ foo();
+ break;
+ }
+ return i;
+}
+
+// CHECK-LABEL: @_Z19forward_conditionali
+// CHECK: entry:
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: if.then: ; preds = %entry
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: jumptarget: ; preds = %if.else, %if.then
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+
+// Convergence is well-defined for forward jumps from the then-part to the
+// else-part of a conditional statement.
+__device__ int forward_conditional(int cond) {
+ int i = 0;
+
+ if (cond > 0) {
+ foo();
+ i++;
+ goto jumptarget;
+ } else {
+ jumptarget:
+ foo();
+ i++;
+ }
+
+ return i;
+}
+
+// CHECK-LABEL: @_Z21backwards_conditionali
+// CHECK: entry:
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: if.then: ; preds = %entry
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: if.else: ; preds = %entry
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+
+// Convergence is declared to be implementation-defined for a backwards jump
+// from the else-part to the then-part of a conditional statement. But just like
+// the backwards jump in a switch, we don't actually do anything about it if it
+// does not produce a cycle. Hence the convergent operations here use the entry
+// token rather than their own anchor token.
+__device__ int backwards_conditional(int cond) {
+ int i = 0;
+
+ foo();
+ if (cond > 0) {
+ foo();
+ jumptarget:
+ i++;
+ } else {
+ foo();
+ i++;
+ goto jumptarget;
+ }
+
+ return i;
+}
+
+// CHECK-LABEL: @_Z12duffs_devicei
+// CHECK: entry:
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: sw.bb: ; preds = %entry
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: do.body: ; preds = %do.cond, %sw.bb
+// CHECK: [[TOK3:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK3]]) ]
+// CHECK: sw.bb1: ; preds = %entry, %do.body
+// CHECK: [[TOK5:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #3 [ "convergencectrl"(token [[TOK5]]) ]
+
+// Note that the convergent operation in case 0 outside the do-while loop has
+// well-defined convergence, and must use the entry token. The other operations
+// must use anchor tokens.
+__device__ int duffs_device(int count) {
+ int i = 0;
+ int n = (count + 7) / 8;
+ switch (count % 8) {
+ case 0:
+ foo();
+ do { i++;
+ foo();
+ case 7: i++;
+ foo();
+ case 6: i++;
+ case 5: i++;
+ case 4: i++;
+ case 3: i++;
+ case 2: i++;
+ case 1: i++;
+ } while (--n > 0);
+ }
+ return i;
+}
diff --git a/clang/test/CodeGenHIP/noconvergent-statement.hip b/clang/test/CodeGenHIP/noconvergent-statement.hip
new file mode 100644
index 0000000000000..64579ec226768
--- /dev/null
+++ b/clang/test/CodeGenHIP/noconvergent-statement.hip
@@ -0,0 +1,109 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -triple amdgcn -fconvergence-control -emit-llvm -disable-llvm-optzns -fcuda-is-device -o - %s | FileCheck %s
+
+#define __device__ __attribute__((device))
+
+__device__ void foo() __attribute__((convergent));
+
+__device__ bool check() __attribute__((convergent));
+
+// ------ The entry of a noconvergent function uses an anchor
+// CHECK-LABEL: @_Z3bari
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #4 [ "convergencectrl"(token [[TOK0]]) ]
+__device__ __attribute__((noconvergent)) int bar(int x) {
+ foo();
+ return x;
+}
+
+// ------ No token on a call to a noconvergent function
+// CHECK-LABEL: @_Z17call_noconvergenti
+// CHECK: %call = call noundef i32 @_Z3bari(i32 noundef %4) #
+__device__ int call_noconvergent(int cond) {
+ int i = 0;
+
+ while (i < cond) {
+ i = bar(i);
+ }
+
+ return i;
+}
+
+// CHECK-LABEL: @_Z16branch_statementi
+// CHECK: noconvergent.anchor:
+// CHECK: [[TOK1:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: %call1 = call noundef zeroext i1 @_Z5checkv() #4 [ "convergencectrl"(token [[TOK1]]) ]
+// CHECK: if.then:
+// CHECK: call void @_Z3foov() #4 [ "convergencectrl"(token [[TOK1]]) ]
+__device__ int branch_statement(int cond) {
+ int i = 0;
+
+ [[clang::noconvergent]] if (check()) {
+ foo();
+ } else {
+ foo();
+ }
+
+ return i;
+}
+
+// CHECK-LABEL: @_Z19branch_substatementi
+// CHECK: entry:
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: %call1 = call noundef zeroext i1 @_Z5checkv() #4 [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: noconvergent.anchor:
+// CHECK: [[TOK1:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #4 [ "convergencectrl"(token [[TOK1]]) ]
+// CHECK: noconvergent.anchor2:
+// CHECK: [[TOK2:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #4 [ "convergencectrl"(token [[TOK2]]) ]
+__device__ int branch_substatement(int cond) {
+ int i = 0;
+
+ if (check()) [[clang::noconvergent]] {
+ foo();
+ } else [[clang::noconvergent]] {
+ foo();
+ }
+
+ return i;
+}
+
+// CHECK-LABEL: @_Z9loop_bodyi
+// CHECK: entry:
+// CHECK: [[TOK0:%[0-9]+]] = call token @llvm.experimental.convergence.entry()
+// CHECK: while.cond: ; preds = %noconvergent.anchor, %entry
+// CHECK: [[TOK1:%[0-9]+]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TOK0]]) ]
+// CHECK: %call1 = call noundef zeroext i1 @_Z5checkv() #4 [ "convergencectrl"(token [[TOK1]]) ]
+// CHECK: noconvergent.anchor: ; preds = %while.body
+// CHECK: [[TOK2:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: call void @_Z3foov() #4 [ "convergencectrl"(token [[TOK2]]) ]
+__device__ int loop_body(int cond) {
+ int i = 0;
+
+ while (check()) [[clang::noconvergent]] {
+ foo();
+ i++;
+ }
+
+ return i;
+}
+
+// CHECK-LABEL: @_Z14loop_statementi
+// CHECK: noconvergent.anchor:
+// CHECK: [[TOK1:%[0-9]+]] = call token @llvm.experimental.convergence.anchor()
+// CHECK: while.cond:
+// CHECK: [[TOK2:%[0-9]+]] = call token @llvm.experimental.convergence.loop() [ "convergencectrl"(token [[TOK1]]) ]
+// CHECK: %call1 = call noundef zeroext i1 @_Z5checkv() #4 [ "convergencectrl"(token [[TOK2]]) ]
+// CHECK: while.body:
+// CHECK: call void @_Z3foov() #4 [ "convergencectrl"(token [[TOK2]]) ]
+__device__ int loop_statement(int cond) {
+ int i = 0;
+
+ [[clang::noconvergent]] while (check()) {
+ foo();
+ i++;
+ }
+
+ return i;
+}
diff --git a/clang/test/SemaHIP/noconvergent-errors/backwards_jump.hip b/clang/test/SemaHIP/noconvergent-errors/backwards_jump.hip
new file mode 100644
index 0000000000000..557431a75feda
--- /dev/null
+++ b/clang/test/SemaHIP/noconvergent-errors/backwards_jump.hip
@@ -0,0 +1,23 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: not %clang_cc1 -fsyntax-only -triple amdgcn -fconvergence-control -Wno-unused-value %s -o - 2>&1 | FileCheck %s
+
+#define __device__ __attribute__((device))
+
+__device__ void foo() __attribute__((convergent));
+
+__device__ int simple_jump_backwards(int cond) {
+ int i = 0;
+
+ // CHECK: :[[# @LINE+1]]:{{[0-9]+}}: error: cannot jump into a noconvergent statement from outside
+ while (true) [[clang::noconvergent]] {
+ foo();
+ jumptarget:
+ i++;
+ }
+
+ if (cond > 1) {
+ goto jumptarget;
+ }
+
+ return i;
+}
diff --git a/clang/test/SemaHIP/noconvergent-errors/jump-into-nest.hip b/clang/test/SemaHIP/noconvergent-errors/jump-into-nest.hip
new file mode 100644
index 0000000000000..8c76dc14fd5b4
--- /dev/null
+++ b/clang/test/SemaHIP/noconvergent-errors/jump-into-nest.hip
@@ -0,0 +1,32 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: not %clang_cc1 -fsyntax-only -triple amdgcn -fconvergence-control -Wno-unused-value %s -o - 2>&1 | FileCheck %s
+
+#define __device__ __attribute__((device))
+
+__device__ void foo() __attribute__((convergent));
+
+__device__ int jump_into_nest(int cond) {
+ int i = 0;
+
+ if (cond > 1) {
+ goto jumptarget;
+ }
+
+ foo();
+
+ while (true) {
+ foo();
+ if (cond > 1) {
+ // CHECK: :[[# @LINE+1]]:{{[0-9]+}}: error: cannot jump into a noconvergent statement from outside
+ do [[clang::noconvergent]] {
+ i++;
+ foo();
+ jumptarget:
+ i++;
+ } while (true);
+ foo();
+ }
+ }
+
+ return i;
+}
diff --git a/clang/test/SemaHIP/noconvergent-errors/no-errors.hip b/clang/test/SemaHIP/noconvergent-errors/no-errors.hip
new file mode 100644
index 0000000000000..6280863658554
--- /dev/null
+++ b/clang/test/SemaHIP/noconvergent-errors/no-errors.hip
@@ -0,0 +1,83 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang_cc1 -fsyntax-only -triple amdgcn -fconvergence-control -Wno-unused-value %s -o - 2>&1
+
+#define __device__ __attribute__((device))
+
+__device__ void foo() __attribute__((convergent));
+
+__device__ int annotated_target(int cond) {
+ int i = 0;
+
+ if (cond > 1) {
+ goto jumptarget;
+ }
+
+ while (true) {
+ foo();
+ jumptarget: [[clang::noconvergent]] {
+ foo();
+ i++;
+ }
+ }
+
+ return i;
+}
+
+__device__ int jump_across(int cond) {
+ int i = 0;
+
+ if (cond > 1) {
+ goto jumptarget;
+ }
+
+ while (true) [[clang::noconvergent]] {
+ foo();
+ i++;
+ }
+
+ jumptarget:
+ return i;
+}
+
+__device__ int jump_internal(int cond) {
+ int i = 0;
+
+ while (true) [[clang::noconvergent]] {
+ foo();
+ jumptarget:
+ i++;
+
+ if (cond > 1) {
+ goto jumptarget;
+ }
+ }
+ return i;
+}
+
+__device__ void nest_to_nest(int cond) {
+ int i = 0;
+
+ for (;;) {
+ foo();
+ if (cond != 0) {
+ [[clang::noconvergent]] while (true) {
+ foo();
+ if (cond > 1) {
+ goto jumptarget;
+ }
+ }
+ }
+ }
+
+ while (true) {
+ foo();
+ if (cond > 1) {
+ do {
+ foo();
+ i++;
+ jumptarget:
+ i++;
+ } while (true);
+ }
+ }
+}
diff --git a/clang/test/SemaHIP/noconvergent-errors/simple_jump.hip b/clang/test/SemaHIP/noconvergent-errors/simple_jump.hip
new file mode 100644
index 0000000000000..4321e99aa6467
--- /dev/null
+++ b/clang/test/SemaHIP/noconvergent-errors/simple_jump.hip
@@ -0,0 +1,23 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: not %clang_cc1 -fsyntax-only -triple amdgcn -fconvergence-control -Wno-unused-value %s -o - 2>&1 | FileCheck %s
+
+#define __device__ __attribute__((device))
+
+__device__ void foo() __attribute__((convergent));
+
+__device__ int simple_jump(int cond) {
+ int i = 0;
+
+ if (cond > 1) {
+ goto jumptarget;
+ }
+
+ // CHECK: :[[# @LINE+1]]:{{[0-9]+}}: error: cannot jump into a noconvergent statement from outside
+ while (true) [[clang::noconvergent]] {
+ foo();
+ jumptarget:
+ i++;
+ }
+
+ return i;
+}
diff --git a/llvm/include/llvm/IR/InstrTypes.h b/llvm/include/llvm/IR/InstrTypes.h
index 8e47e3c7b3a7c..616c374a07b90 100644
--- a/llvm/include/llvm/IR/InstrTypes.h
+++ b/llvm/include/llvm/IR/InstrTypes.h
@@ -45,6 +45,7 @@ class StringRef;
class Type;
class Value;
class ConstantRange;
+class ConvergenceControlInst;
namespace Intrinsic {
typedef unsigned ID;
@@ -1180,12 +1181,7 @@ class CallBase : public Instruction {
InsertPosition InsertPt = nullptr);
/// Return the convergence control token for this call, if it exists.
- Value *getConvergenceControlToken() const {
- if (auto Bundle = getOperandBundle(llvm::LLVMContext::OB_convergencectrl)) {
- return Bundle->Inputs[0].get();
- }
- return nullptr;
- }
+ ConvergenceControlInst *getConvergenceControlToken() const;
static bool classof(const Instruction *I) {
return I->getOpcode() == Instruction::Call ||
diff --git a/llvm/include/llvm/IR/IntrinsicInst.h b/llvm/include/llvm/IR/IntrinsicInst.h
index 93750d6e3845e..5cad494298b9b 100644
--- a/llvm/include/llvm/IR/IntrinsicInst.h
+++ b/llvm/include/llvm/IR/IntrinsicInst.h
@@ -1889,6 +1889,18 @@ class ConvergenceControlInst : public IntrinsicInst {
ConvergenceControlInst *Parent);
};
+/// Returns the first occurence of a ConvergenceControlInst in \p BB
+inline ConvergenceControlInst *getConvergenceControlDef(BasicBlock &BB) {
+ for (auto &I : BB) {
+ if (auto *CI = dyn_cast<llvm::ConvergenceControlInst>(&I))
+ return CI;
+ }
+ return nullptr;
+}
+
+CallBase *setConvergenceControlToken(CallBase *CB,
+ ConvergenceControlInst *Token);
+
} // end namespace llvm
#endif // LLVM_IR_INTRINSICINST_H
diff --git a/llvm/include/llvm/Transforms/Utils/FixConvergenceControl.h b/llvm/include/llvm/Transforms/Utils/FixConvergenceControl.h
new file mode 100644
index 0000000000000..7432692108af2
--- /dev/null
+++ b/llvm/include/llvm/Transforms/Utils/FixConvergenceControl.h
@@ -0,0 +1,21 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// A utility function to fix convergence control tokens in the presence of
+// irreducible control flow.
+//
+//===----------------------------------------------------------------------===//
+
+namespace llvm {
+class Function;
+
+// Detect and fix invalid convergence control tokens after the entire function
+// is emitted in LLVM IR.
+void fixConvergenceControl(llvm::Function *F);
+
+} // namespace llvm
diff --git a/llvm/lib/IR/Instructions.cpp b/llvm/lib/IR/Instructions.cpp
index 18109bf107858..3fba7fa229c11 100644
--- a/llvm/lib/IR/Instructions.cpp
+++ b/llvm/lib/IR/Instructions.cpp
@@ -623,6 +623,13 @@ bool CallBase::hasClobberingOperandBundles() const {
getIntrinsicID() != Intrinsic::assume;
}
+ConvergenceControlInst *CallBase::getConvergenceControlToken() const {
+ if (auto Bundle = getOperandBundle(llvm::LLVMContext::OB_convergencectrl)) {
+ return cast<ConvergenceControlInst>(Bundle->Inputs[0].get());
+ }
+ return nullptr;
+}
+
MemoryEffects CallBase::getMemoryEffects() const {
MemoryEffects ME = getAttributes().getMemoryEffects();
if (auto *Fn = dyn_cast<Function>(getCalledOperand())) {
diff --git a/llvm/lib/IR/IntrinsicInst.cpp b/llvm/lib/IR/IntrinsicInst.cpp
index 256bce1abe71f..42e9fcde9afb1 100644
--- a/llvm/lib/IR/IntrinsicInst.cpp
+++ b/llvm/lib/IR/IntrinsicInst.cpp
@@ -913,3 +913,24 @@ ConvergenceControlInst::CreateLoop(BasicBlock &BB,
auto *Call = CallInst::Create(Fn, {}, {OB}, "", BB.getFirstInsertionPt());
return cast<ConvergenceControlInst>(Call);
}
+
+CallBase *llvm::setConvergenceControlToken(CallBase *CB,
+ ConvergenceControlInst *Token) {
+ llvm::Value *bundleArgs[] = {Token};
+ llvm::OperandBundleDef OB("convergencectrl", bundleArgs);
+
+ SmallVector<OperandBundleDef> Bundles;
+ for (unsigned I = 0, E = CB->getNumOperandBundles(); I != E; ++I) {
+ auto Bundle = CB->getOperandBundleAt(I);
+ if (Bundle.getTagID() == LLVMContext::OB_convergencectrl) {
+ continue;
+ }
+ Bundles.emplace_back(Bundle);
+ }
+ Bundles.push_back(OB);
+
+ CallBase *NewCB = CallBase::Create(CB, Bundles, CB->getIterator());
+ CB->replaceAllUsesWith(NewCB);
+ CB->eraseFromParent();
+ return NewCB;
+}
diff --git a/llvm/lib/Transforms/Utils/CMakeLists.txt b/llvm/lib/Transforms/Utils/CMakeLists.txt
index 78cad0d253be8..fb69ffe8ee9c1 100644
--- a/llvm/lib/Transforms/Utils/CMakeLists.txt
+++ b/llvm/lib/Transforms/Utils/CMakeLists.txt
@@ -25,6 +25,7 @@ add_llvm_component_library(LLVMTransformUtils
EntryExitInstrumenter.cpp
EscapeEnumerator.cpp
Evaluator.cpp
+ FixConvergenceControl.cpp
FixIrreducible.cpp
FlattenCFG.cpp
FunctionComparator.cpp
diff --git a/llvm/lib/Transforms/Utils/FixConvergenceControl.cpp b/llvm/lib/Transforms/Utils/FixConvergenceControl.cpp
new file mode 100644
index 0000000000000..91a9f0bff86fe
--- /dev/null
+++ b/llvm/lib/Transforms/Utils/FixConvergenceControl.cpp
@@ -0,0 +1,191 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// A utility function to fix convergence control tokens in the presence of
+// irreducible control flow.
+//
+// When generating convergence control tokens for iteration statements, CodeGen
+// ignores the presence of any jumps that may cause the resulting CFG to be
+// irreducible. CodeGen optimistically emits a `loop` intrinsic in the header of
+// every iteration statement, which is invalid if control can jump into that
+// statement. We fix this in the LLVM IR after the whole function is fully
+// generated. For each irreducible cycle discovered in the CFG, we replace the
+// `loop` intrinsic in the header with suitable invocations of `anchor` instead.
+//
+// Separately, a "backwards goto" can create a cycle that encapsulates an
+// iteration statement. As a result, the `loop` intrinsic in that iteration
+// statement no longer uses a token from its immediate parent cycle in the CFG.
+// This is also fixed by replacing the `loop` intrinsic with an `anchor`
+// intrinsic.
+//
+// The overall result is to produce in implementation-defined convergence as a
+// result of unstructured jumps as defined in the Clang spec for convergence.
+//
+// ===----------------------------------------------------------------------===//
+
+#include "llvm/Transforms/Utils/FixConvergenceControl.h"
+#include "llvm/IR/CycleInfo.h"
+#include "llvm/IR/InstIterator.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include <map>
+
+#define DEBUG_TYPE "convergence-fixup"
+
+using namespace llvm;
+
+static void
+initializeTokenSources(SmallVectorImpl<ConvergenceControlInst *> &Worklist,
+ Function *F) {
+ for (Instruction &II : instructions(F)) {
+ if (auto *CB = dyn_cast<ConvergenceControlInst>(&II)) {
+ if (CB->isEntry() || CB->isAnchor())
+ Worklist.push_back(CB);
+ }
+ }
+}
+
+namespace llvm {
+
+void fixConvergenceControl(Function *F) {
+ CycleInfo CI;
+ CI.compute(*F);
+
+ // F->dump();
+ // CI.dump();
+
+ enum DecisionTy { Delete, Replace };
+
+ // Can't use a DenseMap because we later insert while iterating.
+ std::map<CallBase *, DecisionTy> Decision;
+ SmallPtrSet<CallBase *, 4> NonIntrinsicUsers;
+
+ SmallVector<ConvergenceControlInst *> Worklist;
+ initializeTokenSources(Worklist, F);
+
+ while (!Worklist.empty()) {
+ ConvergenceControlInst *CB = Worklist.pop_back_val();
+ LLVM_DEBUG(llvm::dbgs() << "Visiting: " << *CB << "\n");
+ Cycle *CurrentCycle = CI.getCycle(CB->getParent());
+
+ for (Use &U : CB->uses()) {
+ auto *UserCB = cast<CallBase>(U.getUser());
+ if (auto *C = dyn_cast<ConvergenceControlInst>(UserCB)) {
+ Worklist.push_back(C);
+ continue;
+ }
+ Cycle *UserCycle = CI.getCycle(UserCB->getParent());
+ // A non-intrinsic user cannot use a token defined outside its own cycle.
+ if (UserCycle && !UserCycle->contains(CurrentCycle))
+ NonIntrinsicUsers.insert(UserCB);
+ }
+
+ if (!CurrentCycle)
+ continue;
+
+ // A loop intrinsic is no longer useful in two cases:
+ // 1. Its cycle became irreducible, or,
+ // 2. The cycle appears `rotated` in the CFG and the call is not in the
+ // header. This happens when a loop statement is unreachable via
+ // sequential control flow, but is jumped into by a goto or switch.
+ //
+ // We will be visiting its users later.
+ if (!CurrentCycle->isReducible() ||
+ CurrentCycle->getHeader() != CB->getParent()) {
+ LLVM_DEBUG(llvm::dbgs() << " Delete.\n");
+ Decision[CB] = Delete;
+ continue;
+ }
+
+ // A token use is valid only if the def is with the immediate parent. It's
+ // okay if the def is with a sibling, as long as the common parent is the
+ // immediate parent.
+ //
+ // The def can end up outside the parent when a goto forms a reducible cycle
+ // around a loop statement. Such a new reducible cycle does not itself have
+ // a heart.
+ ConvergenceControlInst *TokenUsed = CB->getConvergenceControlToken();
+ Cycle *DefCycle = CI.getCycle(TokenUsed->getParent());
+ assert(CurrentCycle == DefCycle || !CurrentCycle->contains(DefCycle));
+ Cycle *Parent = CurrentCycle->getParentCycle();
+ if (DefCycle != Parent && Parent && !Parent->contains(DefCycle)) {
+ // Don't overwrite if previous decision was to delete.
+ Decision.try_emplace(CB, Replace);
+ }
+ }
+
+ SmallVector<CallBase *> ToDelete;
+ // For deletion candidates, decide how to process each of the uses.
+ for (auto [CB, D] : Decision) {
+ if (D != Delete)
+ continue;
+ ToDelete.push_back(CB);
+
+ for (Use &U : CB->uses()) {
+ auto *ConvOp = cast<CallBase>(U.getUser());
+ // Users that are calls to the loop intrinsic can no longer use this as
+ // the parent token, so replace them with anchors.
+ if (auto *Child = dyn_cast<ConvergenceControlInst>(ConvOp)) {
+ // Don't overwrite if previous decision was to delete. Note that we are
+ // inserting while iterating over the std::map. It is possible that the
+ // newly inserted node is not visited, which is okay because we are only
+ // iterating over candidates mapped to ``Delete``.
+ Decision.try_emplace(Child, Replace);
+ continue;
+ }
+ // Other convergent users should be made non-converent.
+ NonIntrinsicUsers.insert(ConvOp);
+ }
+ }
+
+ for (auto [CB, D] : Decision) {
+ if (D != Replace)
+ continue;
+ Cycle *CurrentCycle = CI.getCycle(CB->getParent());
+ assert(CurrentCycle && CurrentCycle->isReducible());
+ LLVM_DEBUG(llvm::dbgs() << " Replace with anchor: " << *CB << "\n");
+ auto *Anchor = ConvergenceControlInst::CreateAnchor(*CB->getParent());
+ CB->replaceAllUsesWith(Anchor);
+ CB->eraseFromParent();
+ }
+
+ // Make all non-intrinsic users non-convergent. It would have been convenient
+ // to just strip the token and the ``convergent`` attribute, but attributes
+ // get checked on the callee too if they don't exist on the call. We could
+ // have set the ``noconvergent`` attribute if it existed. For now,
+ // equivalently, we replace the token with an anchor.
+ for (CallBase *CB : NonIntrinsicUsers) {
+ auto *Token = ConvergenceControlInst::CreateAnchor(*CB->getParent());
+ CB = setConvergenceControlToken(CB, Token);
+ LLVM_DEBUG(llvm::dbgs() << " Make non-convergent: " << *CB << "\n");
+ }
+
+ bool Changed = true;
+ while (Changed) {
+ Changed = false;
+ for (unsigned I = 0, E = ToDelete.size(); I != E; ++I) {
+ CallBase *CB = ToDelete[I];
+ if (CB) {
+ LLVM_DEBUG(llvm::dbgs() << "Try delete:\n" << *CB << "\n");
+ if (!CB->use_empty()) {
+ LLVM_DEBUG(llvm::dbgs() << " ... has pending use.\n");
+ continue;
+ }
+ LLVM_DEBUG(llvm::dbgs() << " ... deleted.\n");
+ CB->eraseFromParent();
+ ToDelete[I] = nullptr;
+ Changed = true;
+ }
+ }
+ }
+ LLVM_DEBUG(for (CallBase *CB : ToDelete) assert(!CB););
+
+ // F->dump();
+}
+
+} // end namespace llvm
More information about the llvm-branch-commits
mailing list