[PATCH] D115302: [AA] Teach AA about convergent intrinsics that affect loads/stores.
Artem Belevich via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 7 16:03:51 PST 2021
tra created this revision.
tra added reviewers: efriedma, nikic.
Herald added subscribers: jeroen.dobbelaere, bixia, hiraditya.
tra requested review of this revision.
Herald added a project: LLVM.
We do not have a good way to represent memory accesses that may be performed by
other threads on a GPU. Convergent intrinsics that may modify unspecified memory
are the best proxy we have at the moment, so treat such intrinsics as ModRef.
This fixes a miscompile in NVPTX that was exposed by
https://github.com/llvm/llvm-project/commit/fa5d31f825699b0fe858d4f432bd3fbbbec523c8
and resulted in shared memory stores/loads being incorrectly eliminated.
https://lists.llvm.org/pipermail/llvm-dev/2021-November/154060.html
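For reference, the failure mode corresponds to a source pattern along the lines of the following hypothetical CUDA sketch (the function and variable names are illustrative, not taken from the linked report):

  // Every thread publishes a value into block-shared memory, synchronizes,
  // and then reads back whichever value ended up in the shared slot.
  __device__ int pick_one(int v) {
    __shared__ int s;  // becomes an addrspace(3) global in NVPTX IR
    s = v;             // each thread stores its own value
    __syncthreads();   // lowers to the convergent @llvm.nvvm.barrier0 intrinsic
    return s;          // must reload: another thread's store may be the winner
  }

Without the AA change, GVN forwarded each thread's own store to the reload across the barrier, even though the value visible after __syncthreads() may have been written by a different thread.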
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D115302
Files:
llvm/lib/Analysis/AliasAnalysis.cpp
llvm/test/Analysis/GlobalsModRef/intrinsic_convergent.ll
Index: llvm/test/Analysis/GlobalsModRef/intrinsic_convergent.ll
===================================================================
--- /dev/null
+++ llvm/test/Analysis/GlobalsModRef/intrinsic_convergent.ll
@@ -0,0 +1,33 @@
+; RUN: opt -globals-aa -gvn -S < %s | FileCheck %s
+target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
+target triple = "nvptx64-nvidia-cuda"
+
+@s = internal local_unnamed_addr addrspace(3) global i32 undef, align 4
+
+; CHECK-LABEL: @bar_sync
+; CHECK: store
+; CHECK: tail call void @llvm.nvvm.bar.sync(i32 0)
+; CHECK: load
+define dso_local i32 @bar_sync(i32 %0) local_unnamed_addr {
+ store i32 %0, i32* addrspacecast (i32 addrspace(3)* @s to i32*), align 4
+ tail call void @llvm.nvvm.bar.sync(i32 0)
+ %2 = load i32, i32* addrspacecast (i32 addrspace(3)* @s to i32*), align 4
+ ret i32 %2
+}
+
+declare void @llvm.nvvm.bar.sync(i32) #0
+
+; CHECK-LABEL: @barrier0
+; CHECK: store
+; CHECK: tail call void @llvm.nvvm.barrier0()
+; CHECK: load
+define dso_local i32 @barrier0(i32 %0) local_unnamed_addr {
+ store i32 %0, i32* addrspacecast (i32 addrspace(3)* @s to i32*), align 4
+ tail call void @llvm.nvvm.barrier0()
+ %2 = load i32, i32* addrspacecast (i32 addrspace(3)* @s to i32*), align 4
+ ret i32 %2
+}
+
+declare void @llvm.nvvm.barrier0() #0
+
+attributes #0 = { convergent nounwind }
Index: llvm/lib/Analysis/AliasAnalysis.cpp
===================================================================
--- llvm/lib/Analysis/AliasAnalysis.cpp
+++ llvm/lib/Analysis/AliasAnalysis.cpp
@@ -42,6 +42,7 @@
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"
+#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"
@@ -226,6 +227,18 @@
AAQueryInfo &AAQI) {
ModRefInfo Result = ModRefInfo::ModRef;
+ // Unless a convergent intrinsic carries explicit memory access attributes,
+ // assume that it may touch anything.
+ // TODO: This is overly conservative. We may need to query TTI whether a
+ // particular intrinsic really affects the given memory location.
+ // E.g. some intrinsics may affect only specific address spaces.
+ if (Call->isConvergent() && isa<IntrinsicInst>(Call)) {
+ Function *F = Call->getCalledFunction();
+ if (!(F->onlyReadsMemory() || F->doesNotAccessMemory() ||
+ F->onlyAccessesArgMemory()))
+ return ModRefInfo::ModRef;
+ }
+
for (const auto &AA : AAs) {
Result = intersectModRef(Result, AA->getModRefInfo(Call, Loc, AAQI));
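For context, here is a minimal sketch of how an AA client observes the new behavior (illustrative only, not part of the patch; it assumes the standard AAResults API):

  #include "llvm/Analysis/AliasAnalysis.h"
  #include "llvm/Analysis/MemoryLocation.h"
  #include "llvm/IR/InstrTypes.h"
  #include "llvm/IR/Instructions.h"

  using namespace llvm;

  // Hypothetical helper: a stored value may only be forwarded across `Barrier`
  // to `Reload` if the barrier cannot modify the reloaded location. With this
  // patch, a convergent intrinsic without memory attributes (e.g.
  // @llvm.nvvm.barrier0) reports ModRef, so this returns false and the reload
  // is preserved.
  static bool mayForwardAcrossBarrier(AAResults &AA, const CallBase &Barrier,
                                      const LoadInst &Reload) {
    ModRefInfo MRI = AA.getModRefInfo(&Barrier, MemoryLocation::get(&Reload));
    return !isModSet(MRI);
  }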