[Mlir-commits] [mlir] Update summary and description fields for Barrier Ops in NVVMOps.td (PR #156726)
llvmlistbot at llvm.org
Wed Sep 3 11:27:31 PDT 2025
https://github.com/smada3 created https://github.com/llvm/llvm-project/pull/156726
Several operations in the NVVM dialect were missing summaries and descriptions. This PR
adds summaries and descriptions for the following operations:
1. nvvm.bar.warp.sync
2. nvvm.barrier
3. nvvm.barrier0
4. nvvm.mbarrier.arrive
5. nvvm.mbarrier.arrive.expect_tx
6. nvvm.mbarrier.arrive.expect_tx.shared
7. nvvm.mbarrier.arrive.nocomplete
8. nvvm.mbarrier.arrive.nocomplete.shared
9. nvvm.mbarrier.arrive.shared
10. nvvm.mbarrier.init
11. nvvm.mbarrier.init.shared
12. nvvm.mbarrier.inval
13. nvvm.mbarrier.inval.shared
14. nvvm.mbarrier.test.wait
15. nvvm.mbarrier.test.wait.shared
16. nvvm.mbarrier.try_wait.parity
17. nvvm.mbarrier.try_wait.parity.shared
Documentation available here: [mbarrier](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier), [barrier](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-barrier)
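For reference, here is a minimal, illustrative sketch of the mbarrier flow these descriptions cover, written against the assembly formats visible in this patch and assuming the current opaque-pointer syntax (`!llvm.ptr<3>` for shared memory); the function and value names are hypothetical and not part of the patch:

```mlir
llvm.func @mbarrier_example(%bar : !llvm.ptr<3>, %count : i32,
                            %txcount : i32, %phase : i32, %ticks : i32) {
  // Initialize the mbarrier object in shared memory with `count` participants.
  nvvm.mbarrier.init.shared %bar, %count : !llvm.ptr<3>, i32
  // Signal arrival and set the expected transaction count for this phase.
  nvvm.mbarrier.arrive.expect_tx.shared %bar, %txcount : !llvm.ptr<3>, i32
  // Wait (via the generated retry loop) until the phase with the given
  // parity completes, using `ticks` as the suspend-time hint.
  nvvm.mbarrier.try_wait.parity.shared %bar, %phase, %ticks : !llvm.ptr<3>, i32, i32
  llvm.return
}
```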
From 5a30326effddfa17543224bfeed474c117ea99e4 Mon Sep 17 00:00:00 2001
From: Stefan Mada <smada at nvidia.com>
Date: Wed, 3 Sep 2025 17:57:54 +0000
Subject: [PATCH] Update summary and description fields for Barrier Ops in
NVVMOps.td
---
mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td | 263 +++++++++++++++++++-
1 file changed, 260 insertions(+), 3 deletions(-)
diff --git a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
index 9d93b4efe7a5b..5338bb2336ed9 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
@@ -494,6 +494,26 @@ def NVVM_PMEventOp : NVVM_PTXBuilder_Op<"pmevent">,
/// mbarrier.init instruction with generic pointer type
def NVVM_MBarrierInitOp : NVVM_PTXBuilder_Op<"mbarrier.init">,
Arguments<(ins LLVM_AnyPointer:$addr, I32:$count, PtxPredicate:$predicate)> {
+ let summary = "MBarrier Initialization Op";
+ let description = [{
+ The `nvvm.mbarrier.init` operation initializes an mbarrier object at the specified
+ memory location.
+
+ This operation initializes the mbarrier object with the following state:
+ - Current phase: 0
+ - Expected arrival count: `count`
+ - Pending arrival count: `count`
+ - Transaction count (tx-count): 0
+
+ The operation takes the following operands:
+ - `addr`: A pointer to the memory location of the mbarrier object. Uses generic
+ addressing, but the address must still be in the shared memory space.
+ - `count`: Integer specifying the number of threads that will participate in barrier
+ synchronization. Must be in the range [1, 2²⁰ - 1].
+ - `predicate`: Optional predicate for conditional execution.
+
+ [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-init)
+ }];
string llvmBuilder = [{
createIntrinsicCall(builder, llvm::Intrinsic::nvvm_mbarrier_init, {$addr, $count});
}];
@@ -509,6 +529,13 @@ def NVVM_MBarrierInitOp : NVVM_PTXBuilder_Op<"mbarrier.init">,
/// mbarrier.init instruction with shared pointer type
def NVVM_MBarrierInitSharedOp : NVVM_PTXBuilder_Op<"mbarrier.init.shared", [NVVMRequiresSM<80>, DeclareOpInterfaceMethods<BasicPtxBuilderOpInterface>]>,
Arguments<(ins LLVM_PointerShared:$addr, I32:$count, PtxPredicate:$predicate)> {
+ let summary = "Shared MBarrier Initialization Op";
+ let description = [{
+ This Op is the same as the generic memory variant except that the mbarrier object
+ should be accessed using a shared-memory pointer instead of a generic-memory pointer.
+
+ [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-init)
+ }];
string llvmBuilder = [{
createIntrinsicCall(builder, llvm::Intrinsic::nvvm_mbarrier_init_shared, {$addr, $count});
}];
@@ -521,6 +548,21 @@ def NVVM_MBarrierInitSharedOp : NVVM_PTXBuilder_Op<"mbarrier.init.shared", [NVVM
def NVVM_MBarrierInvalOp : NVVM_Op<"mbarrier.inval">,
Arguments<(ins LLVM_AnyPointer:$addr)> {
+ let summary = "MBarrier Invalidation Operation";
+ let description = [{
+ The `nvvm.mbarrier.inval` operation invalidates an mbarrier object at the
+ specified memory location.
+
+ This operation marks the mbarrier object as invalid, making it safe to repurpose
+ the memory location for other uses or to reinitialize it as a new mbarrier object.
+ It is undefined behavior if the mbarrier object is already invalid.
+
+ The operation takes the following operand:
+ - `addr`: A pointer to the memory location of the mbarrier object. Uses generic
+ addressing, but the address must still be in the shared memory space.
+
+ [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-inval)
+ }];
string llvmBuilder = [{
createIntrinsicCall(builder, llvm::Intrinsic::nvvm_mbarrier_inval, {$addr});
}];
@@ -529,6 +571,13 @@ def NVVM_MBarrierInvalOp : NVVM_Op<"mbarrier.inval">,
def NVVM_MBarrierInvalSharedOp : NVVM_Op<"mbarrier.inval.shared">,
Arguments<(ins LLVM_PointerShared:$addr)> {
+ let summary = "Shared MBarrier Invalidation Operation";
+ let description = [{
+ This Op is the same as the generic memory variant except that the mbarrier object
+ should be accessed using a shared-memory pointer instead of a generic-memory pointer.
+
+ [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-inval)
+ }];
string llvmBuilder = [{
createIntrinsicCall(builder, llvm::Intrinsic::nvvm_mbarrier_inval_shared, {$addr});
}];
@@ -538,6 +587,22 @@ def NVVM_MBarrierInvalSharedOp : NVVM_Op<"mbarrier.inval.shared">,
def NVVM_MBarrierArriveOp : NVVM_Op<"mbarrier.arrive">,
Results<(outs LLVM_Type:$res)>,
Arguments<(ins LLVM_AnyPointer:$addr)> {
+ let summary = "MBarrier Arrive Operation";
+ let description = [{
+ The `nvvm.mbarrier.arrive` operation performs an arrive-on operation on the
+ mbarrier object at the specified address. Uses the default `.release.cta` semantics.
+
+ This operation causes the executing thread to signal its arrival at the barrier.
+ The operation returns an opaque 64-bit value that captures the phase of the
+ mbarrier object prior to the arrive-on operation. The contents of this state
+ value are implementation-specific.
+
+ The operation takes the following operand:
+ - `addr`: A pointer to the memory location of the mbarrier object. Uses generic
+ addressing, but the address must still be in the shared memory space.
+
+ [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-arrive)
+ }];
string llvmBuilder = [{
$res = createIntrinsicCall(builder, llvm::Intrinsic::nvvm_mbarrier_arrive, {$addr});
}];
@@ -547,6 +612,13 @@ def NVVM_MBarrierArriveOp : NVVM_Op<"mbarrier.arrive">,
def NVVM_MBarrierArriveSharedOp : NVVM_Op<"mbarrier.arrive.shared">,
Results<(outs LLVM_Type:$res)>,
Arguments<(ins LLVM_PointerShared:$addr)> {
+ let summary = "Shared MBarrier Arrive Operation";
+ let description = [{
+ This Op is the same as the generic memory variant except that the mbarrier object
+ should be accessed using a shared-memory pointer instead of a generic-memory pointer.
+
+ [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-arrive)
+ }];
string llvmBuilder = [{
$res = createIntrinsicCall(builder, llvm::Intrinsic::nvvm_mbarrier_arrive_shared, {$addr});
}];
@@ -556,6 +628,25 @@ def NVVM_MBarrierArriveSharedOp : NVVM_Op<"mbarrier.arrive.shared">,
def NVVM_MBarrierArriveNocompleteOp : NVVM_Op<"mbarrier.arrive.nocomplete">,
Results<(outs LLVM_Type:$res)>,
Arguments<(ins LLVM_AnyPointer:$addr, I32:$count)> {
+ let summary = "MBarrier Arrive No-Complete Operation";
+ let description = [{
+ The `nvvm.mbarrier.arrive.nocomplete` operation performs an arrive-on operation
+ on the mbarrier object with the guarantee that it will not cause the barrier to
+ complete its current phase. Uses the default `.release.cta` semantics.
+
+ This operation causes the executing thread to signal its arrival at the barrier
+ with a specified count, but ensures that the barrier phase will not complete as
+ a result of this operation. The operation returns an opaque 64-bit value that
+ captures the phase of the mbarrier object prior to the arrive-on operation.
+
+ The operation takes the following operands:
+ - `addr`: A pointer to the memory location of the mbarrier object. Uses generic
+ addressing, but the address must still be in the shared memory space.
+ - `count`: Integer specifying the count argument to the arrive-on operation.
+ Must be within the valid range specified in the contents of the mbarrier object.
+
+ [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-arrive)
+ }];
string llvmBuilder = [{
$res = createIntrinsicCall(builder, llvm::Intrinsic::nvvm_mbarrier_arrive_noComplete, {$addr, $count});
}];
@@ -565,6 +656,13 @@ def NVVM_MBarrierArriveNocompleteOp : NVVM_Op<"mbarrier.arrive.nocomplete">,
def NVVM_MBarrierArriveNocompleteSharedOp : NVVM_Op<"mbarrier.arrive.nocomplete.shared">,
Results<(outs LLVM_Type:$res)>,
Arguments<(ins LLVM_PointerShared:$addr, I32:$count)> {
+ let summary = "Shared MBarrier Arrive No-Complete Operation";
+ let description = [{
+ This Op is the same as the generic memory variant except that the mbarrier object
+ should be accessed using a shared-memory pointer instead of a generic-memory pointer.
+
+ [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-arrive)
+ }];
string llvmBuilder = [{
$res = createIntrinsicCall(builder, llvm::Intrinsic::nvvm_mbarrier_arrive_noComplete_shared, {$addr, $count});
}];
@@ -573,6 +671,26 @@ def NVVM_MBarrierArriveNocompleteSharedOp : NVVM_Op<"mbarrier.arrive.nocomplete.
def NVVM_MBarrierArriveExpectTxOp : NVVM_PTXBuilder_Op<"mbarrier.arrive.expect_tx">,
Arguments<(ins LLVM_AnyPointer:$addr, I32:$txcount, PtxPredicate:$predicate)> {
+ let summary = "MBarrier Arrive with Expected Transaction Count";
+ let description = [{
+ The `nvvm.mbarrier.arrive.expect_tx` operation performs an expect-tx operation
+ followed by an arrive-on operation on the mbarrier object. Uses the default
+ `.release.cta` semantics.
+
+ This operation first performs an expect-tx operation with the specified transaction
+ count, then performs an arrive-on operation with an implicit count of 1. The
+ expect-tx operation updates the expected transaction count for the barrier.
+
+ The operation takes the following operands:
+ - `addr`: A pointer to the memory location of the mbarrier object. Uses generic
+ addressing, but the address must still be in the shared memory space.
+ - `txcount`: An unsigned integer specifying the expected transaction count
+ for the expect-tx operation. This represents the number of asynchronous transactions
+ expected to complete before the barrier phase completes.
+ - `predicate`: Optional predicate for conditional execution.
+
+ [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-arrive)
+ }];
let assemblyFormat = "$addr `,` $txcount (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)";
let extraClassDefinition = [{
std::string $cppClass::getPtx() { return std::string("mbarrier.arrive.expect_tx.b64 _, [%0], %1;"); }
@@ -580,7 +698,14 @@ def NVVM_MBarrierArriveExpectTxOp : NVVM_PTXBuilder_Op<"mbarrier.arrive.expect_t
}
def NVVM_MBarrierArriveExpectTxSharedOp : NVVM_PTXBuilder_Op<"mbarrier.arrive.expect_tx.shared">,
- Arguments<(ins LLVM_PointerShared:$addr, I32:$txcount, PtxPredicate:$predicate)> {
+ Arguments<(ins LLVM_PointerShared:$addr, I32:$txcount, PtxPredicate:$predicate)> {
+ let summary = "Shared MBarrier Arrive with Expected Transaction Count";
+ let description = [{
+ This Op is the same as the generic memory variant except that the mbarrier object
+ should be accessed using a shared-memory pointer instead of a generic-memory pointer.
+
+ [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-arrive)
+ }];
let assemblyFormat = "$addr `,` $txcount (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)";
let extraClassDefinition = [{
std::string $cppClass::getPtx() { return std::string("mbarrier.arrive.expect_tx.shared.b64 _, [%0], %1;"); }
@@ -588,7 +713,36 @@ def NVVM_MBarrierArriveExpectTxSharedOp : NVVM_PTXBuilder_Op<"mbarrier.arrive.ex
}
def NVVM_MBarrierTryWaitParityOp : NVVM_PTXBuilder_Op<"mbarrier.try_wait.parity">,
- Arguments<(ins LLVM_AnyPointer:$addr, I32:$phase, I32:$ticks)> {
+ Arguments<(ins LLVM_AnyPointer:$addr, I32:$phase, I32:$ticks)> {
+ let summary = "MBarrier Potentially-Blocking Try Wait with Phase Parity";
+ let description = [{
+ The `nvvm.mbarrier.try_wait.parity` operation performs a potentially-blocking
+ test for the completion of a specific phase of an mbarrier object using phase
+ parity. It uses the default `.acquire.cta` semantics.
+
+ This operation waits for the completion of the mbarrier phase indicated by the
+ phase parity. While it lowers to the PTX `mbarrier.try_wait.parity`
+ instruction, this MLIR operation generates a loop that retries the wait until
+ it completes, ensuring the barrier phase has actually completed rather than
+ potentially timing out.
+
+ The operation takes the following operands:
+ - `addr`: A pointer to the memory location of the mbarrier object. Uses generic
+ addressing, but the address must still be in the shared memory space.
+ - `phase`: An integer specifying the phase parity (0 or 1). Even phases
+ have parity 0, odd phases have parity 1.
+ - `ticks`: An unsigned integer specifying the suspend time hint in
+ nanoseconds. This may be used instead of the system-dependent time limit.
+
+ **Implementation behavior**:
+ This operation generates a PTX loop that repeatedly calls the underlying
+ `mbarrier.try_wait.parity` instruction until the barrier phase completes.
+ Unlike the raw PTX instruction which may return without completion after a
+ timeout, this MLIR operation guarantees completion by continuing to loop until
+ the specified phase is reached.
+
+ [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-mbarrier-test-wait-try-wait)
+ }];
let assemblyFormat = "$addr `,` $phase `,` $ticks attr-dict `:` type(operands)";
let extraClassDefinition = [{
std::string $cppClass::getPtx() {
@@ -607,7 +761,14 @@ def NVVM_MBarrierTryWaitParityOp : NVVM_PTXBuilder_Op<"mbarrier.try_wait.parity"
}
def NVVM_MBarrierTryWaitParitySharedOp : NVVM_PTXBuilder_Op<"mbarrier.try_wait.parity.shared">,
- Arguments<(ins LLVM_PointerShared:$addr, I32:$phase, I32:$ticks)> {
+ Arguments<(ins LLVM_PointerShared:$addr, I32:$phase, I32:$ticks)> {
+ let summary = "Shared MBarrier Potentially-Blocking Try Wait with Phase Parity";
+ let description = [{
+ This Op is the same as the generic memory variant except that the mbarrier object
+ should be accessed using a shared-memory pointer instead of a generic-memory pointer.
+
+ [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-mbarrier-test-wait-try-wait)
+ }];
let assemblyFormat = "$addr `,` $phase `,` $ticks attr-dict `:` type(operands)";
let extraClassDefinition = [{
std::string $cppClass::getPtx() {
@@ -628,6 +789,30 @@ def NVVM_MBarrierTryWaitParitySharedOp : NVVM_PTXBuilder_Op<"mbarrier.try_wait.p
def NVVM_MBarrierTestWaitOp : NVVM_Op<"mbarrier.test.wait">,
Results<(outs LLVM_Type:$res)>,
Arguments<(ins LLVM_AnyPointer:$addr, LLVM_Type:$state)> {
+ let summary = "MBarrier Non-Blocking Test Wait Operation";
+ let description = [{
+ The `nvvm.mbarrier.test.wait` operation performs a non-blocking test for the
+ completion of a specific phase of an mbarrier object. It uses the default
+ `.acquire.cta` semantics.
+
+ This operation tests whether the mbarrier phase specified by the state operand
+ has completed. It is a non-blocking instruction that immediately returns the
+ completion status without suspending the executing thread.
+
+ The operation takes the following operands:
+ - `addr`: A pointer to the memory location of the mbarrier object. Uses generic
+ addressing, but the address must still be in the shared memory space.
+ - `state`: An opaque value returned by a previous `mbarrier.arrive`
+ operation on the same mbarrier object during the current or immediately
+ preceding phase.
+
+ The operation returns a boolean value indicating whether the specified phase
+ has completed:
+ - `true`: The immediately preceding phase has completed
+ - `false`: The phase is still incomplete (current phase)
+
+ [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-mbarrier-test-wait-try-wait)
+ }];
string llvmBuilder = [{
$res = createIntrinsicCall(builder, llvm::Intrinsic::nvvm_mbarrier_test_wait, {$addr, $state});
}];
@@ -637,6 +822,13 @@ def NVVM_MBarrierTestWaitOp : NVVM_Op<"mbarrier.test.wait">,
def NVVM_MBarrierTestWaitSharedOp : NVVM_Op<"mbarrier.test.wait.shared">,
Results<(outs LLVM_Type:$res)>,
Arguments<(ins LLVM_PointerShared:$addr, LLVM_Type:$state)> {
+ let summary = "Shared MBarrier Non-Blocking Test Wait Operation";
+ let description = [{
+ This Op is the same as the generic memory variant except that the mbarrier object
+ should be accessed using a shared-memory pointer instead of a generic-memory pointer.
+
+ [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-mbarrier-test-wait-try-wait)
+ }];
string llvmBuilder = [{
$res = createIntrinsicCall(builder, llvm::Intrinsic::nvvm_mbarrier_test_wait_shared, {$addr, $state});
}];
@@ -648,6 +840,15 @@ def NVVM_MBarrierTestWaitSharedOp : NVVM_Op<"mbarrier.test.wait.shared">,
//===----------------------------------------------------------------------===//
def NVVM_Barrier0Op : NVVM_Op<"barrier0"> {
+ let summary = "CTA Barrier Synchronization Op (Barrier ID 0)";
+ let description = [{
+ The `nvvm.barrier0` operation is a convenience operation that performs barrier
+ synchronization and communication within a CTA (Cooperative Thread Array) using
+ barrier ID 0. It is functionally equivalent to `nvvm.barrier` with `barrierId = 0`.
+
+ [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-barrier)
+ }];
+
let assemblyFormat = "attr-dict";
string llvmBuilder = [{
createIntrinsicCall(
@@ -657,6 +858,34 @@ def NVVM_Barrier0Op : NVVM_Op<"barrier0"> {
}
def NVVM_BarrierOp : NVVM_Op<"barrier", [AttrSizedOperandSegments]> {
+ let summary = "CTA Barrier Synchronization Op";
+ let description = [{
+ The `nvvm.barrier` operation performs barrier synchronization and communication
+ within a CTA (Cooperative Thread Array). It causes executing threads to wait for
+ all non-exited threads participating in the barrier to arrive.
+
+ The operation takes two optional operands:
+
+ - `barrierId`: Specifies a logical barrier resource with value 0 through 15.
+ Each CTA instance has sixteen barriers numbered 0..15. Defaults to 0 if not specified.
+ - `numberOfThreads`: Specifies the number of threads participating in the barrier.
+ When specified, the value must be a multiple of the warp size. If not specified,
+ all threads in the CTA participate in the barrier.
+
+ The barrier operation guarantees that when the barrier completes, prior memory
+ accesses requested by participating threads are performed relative to all threads
+ participating in the barrier. It also ensures that no new memory access is
+ requested by participating threads before the barrier completes.
+
+ When a barrier completes, the waiting threads are restarted without delay, and
+ the barrier is reinitialized so that it can be immediately reused.
+
+ This operation generates the `.aligned` version of the PTX barrier instruction,
+ indicating that all threads in the CTA execute the same barrier instruction.
+
+ [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-barrier)
+ }];
+
let arguments = (ins
Optional<I32>:$barrierId,
Optional<I32>:$numberOfThreads);
@@ -1046,6 +1275,34 @@ def NVVM_VoteSyncOp
def NVVM_SyncWarpOp :
NVVM_Op<"bar.warp.sync">,
Arguments<(ins LLVM_Type:$mask)> {
+ let summary = "Warp Barrier Synchronization Op";
+ let description = [{
+ The `nvvm.bar.warp.sync` operation performs barrier synchronization for threads
+ within a warp.
+
+ This operation causes the executing thread to wait until all threads corresponding
+ to the `mask` operand have executed a `bar.warp.sync` with the same mask value
+ before resuming execution.
+
+ The `mask` operand specifies a 32-bit integer mask indicating threads participating
+ in the barrier, where each bit position corresponds to the thread's lane ID within
+ the warp. Only threads with their corresponding bit set in the mask participate
+ in the barrier synchronization.
+
+ **Important constraints**:
+ - The behavior is undefined if the executing thread is not included in the mask
+ (i.e., the bit corresponding to the thread's lane ID is not set)
+ - For compute capability sm_6x or below, all threads in the mask must execute
+ the same `bar.warp.sync` instruction in convergence
+
+ This operation also guarantees memory ordering among participating threads.
+ Threads within the warp that wish to communicate via memory can store to memory,
+ execute `bar.warp.sync`, and then safely read values stored by other threads
+ in the warp.
+
+ [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-bar-warp-sync)
+ }];
+
string llvmBuilder = [{
createIntrinsicCall(builder, llvm::Intrinsic::nvvm_bar_warp_sync, {$mask});
}];