[clang] [llvm] [AMDGPU] Introduce asyncmark/wait intrinsics (PR #180467)
Sameer Sahasrabuddhe via cfe-commits
cfe-commits at lists.llvm.org
Wed Feb 11 23:07:49 PST 2026
================
@@ -2785,6 +2938,84 @@ bool WaitcntBrackets::mergeScore(const MergeInfo &M, unsigned &Score,
return OtherShifted > MyShifted;
}
+bool WaitcntBrackets::mergeAsyncMarks(ArrayRef<MergeInfo> MergeInfos,
+ ArrayRef<CounterValueArray> OtherMarks) {
+ bool StrictDom = false;
+
+ LLVM_DEBUG(dbgs() << "Merging async marks ...");
+ // Early exit: both empty
+ if (AsyncMarks.empty() && OtherMarks.empty()) {
+ LLVM_DEBUG(dbgs() << " nothing to merge\n");
+ return false;
+ }
+ LLVM_DEBUG(dbgs() << '\n');
+
+ // Determine maximum length needed after merging
+ auto MaxSize = (unsigned)std::max(AsyncMarks.size(), OtherMarks.size());
+
+ // For each backedge in isolation, the algorithm reaches a fixed point after
+ // the first call to merge(). This is unchanged even with the AsyncMarks
+ // array because we call mergeScore just like the other cases.
+ //
+ // But in the rare pathological case, a nest of loops that pushes marks
+ // without waiting on any mark can cause AsyncMarks to grow very large. We cap
+ // it to a reasonable limit. We can tune this later or potentially introduce a
+ // user option to control the value.
+ MaxSize = std::min(MaxSize, MaxAsyncMarks);
+
+ // Keep only the most recent marks within our limit.
+ if (AsyncMarks.size() > MaxSize)
+ AsyncMarks.erase(AsyncMarks.begin(),
+ AsyncMarks.begin() + (AsyncMarks.size() - MaxSize));
+
+ // Pad with zero-filled marks if our list is shorter. Zero represents "no
+ // pending async operations at this checkpoint" and acts as the identity
+ // element for max() during merging. We pad at the beginning since the marks
+ // need to be aligned in most-recent order.
+ CounterValueArray ZeroMark{};
+ AsyncMarks.insert(AsyncMarks.begin(), MaxSize - AsyncMarks.size(), ZeroMark);
+
+ LLVM_DEBUG({
+ dbgs() << "Before merge:\n";
+ for (const auto &Mark : AsyncMarks) {
+ llvm::interleaveComma(Mark, dbgs());
+ dbgs() << '\n';
+ }
+ });
+
+ LLVM_DEBUG({
+ dbgs() << "Other marks:\n";
+ for (const auto &Mark : OtherMarks) {
+ llvm::interleaveComma(Mark, dbgs());
+ dbgs() << '\n';
+ }
+ });
+
+ // Merge element-wise using the existing mergeScore function and the
+ // appropriate MergeInfo for each counter type. Iterate only while we have
+ // elements in both vectors.
+ unsigned OtherSize = OtherMarks.size();
+ unsigned OurSize = AsyncMarks.size();
+ unsigned MergeCount = std::min(OtherSize, OurSize);
+ assert(OurSize == MaxSize);
+ for (unsigned Idx = 1; Idx <= MergeCount; ++Idx) {
----------------
ssahasra wrote:
Done in #181095.
https://github.com/llvm/llvm-project/pull/180467
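
For readers following the quoted hunk, here is a minimal standalone sketch of the trim / pad / element-wise merge steps described in its comments. It is an illustration under stated assumptions, not the code from the PR: `Mark`, `NumCounters`, `mergeMarks` and the `Cap` parameter are hypothetical stand-ins for `CounterValueArray`, the counter count, `mergeAsyncMarks` and `MaxAsyncMarks`; a plain `max` replaces `mergeScore`; and since the loop body is cut off in the quote, the indexing from the most-recent end is assumed from the "aligned in most-recent order" comment.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the types used in the quoted hunk.
constexpr std::size_t NumCounters = 3;
using Mark = std::array<unsigned, NumCounters>;

// Merge Other into Ours, keeping at most Cap of the most recent marks.
// Returns true if any entry of Ours had to grow (a simplified notion of
// the StrictDom result; the real code delegates to mergeScore).
bool mergeMarks(std::vector<Mark> &Ours, const std::vector<Mark> &Other,
                std::size_t Cap) {
  if (Ours.empty() && Other.empty())
    return false;

  std::size_t MaxSize = std::min(std::max(Ours.size(), Other.size()), Cap);

  // Drop the oldest marks (at the front) if we exceed the limit.
  if (Ours.size() > MaxSize)
    Ours.erase(Ours.begin(),
               Ours.begin() +
                   static_cast<std::ptrdiff_t>(Ours.size() - MaxSize));

  // Pad at the front with zero marks; zero is the identity for max().
  Ours.insert(Ours.begin(), MaxSize - Ours.size(), Mark{});

  // Walk both lists from the most recent mark backwards (assumed alignment).
  bool StrictDom = false;
  std::size_t MergeCount = std::min(Other.size(), Ours.size());
  for (std::size_t Idx = 1; Idx <= MergeCount; ++Idx) {
    Mark &Mine = Ours[Ours.size() - Idx];
    const Mark &Theirs = Other[Other.size() - Idx];
    for (std::size_t C = 0; C < NumCounters; ++C) {
      if (Theirs[C] > Mine[C]) {
        Mine[C] = Theirs[C];
        StrictDom = true;
      }
    }
  }
  return StrictDom;
}
```

With this sketch, merging a one-mark list with a two-mark list first pads the shorter list at the front, so the newest marks on both sides are compared with each other and the older mark from the other side is compared against zeros.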