[llvm] [AMDGPU] Add IR-level pass to rewrite away address space 7 (PR #77952)
Piotr Sobczak via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 30 22:53:22 PST 2024
================
@@ -0,0 +1,1983 @@
+//===-- AMDGPULowerBufferFatPointers.cpp -----------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This pass lowers operations on buffer fat pointers (addrspace 7) to
+// operations on buffer resources (addrspace 8) and is needed for correct
+// codegen.
+//
+// # Background
+//
+// Address space 7 (the buffer fat pointer) is a 160-bit pointer that consists
+// of a 128-bit buffer descriptor and a 32-bit offset into the buffer.
+// The buffer resource part needs to be a "raw" buffer resource (it must have
+// a stride of 0 and bounds checks must be in raw buffer mode or disabled).
+//
+// When these requirements are met, a buffer resource can be treated as a
+// typical (though quite wide) pointer that follows typical LLVM pointer
+// semantics. This allows the frontend to reason about such buffers (which are
+// often encountered in the context of SPIR-V kernels).
+//
+// However, because of their non-power-of-2 size, these fat pointers cannot be
+// present during translation to MIR (though this restriction may be lifted
+// during the transition to GlobalISel). Therefore, this pass is needed in order
+// to correctly implement these fat pointers.
+//
+// The resource intrinsics take the resource part (the address space 8 pointer)
+// and the offset part (the 32-bit integer) as separate arguments. In addition,
+// many users of these buffers manipulate the offset while leaving the resource
+// part alone. For these reasons, we typically want to keep the resource and
+// offset parts in separate variables, combining them only when required,
+// such as when inserting these values into aggregates or moving them to
+// memory.
+//
+// Therefore, at a high level, `ptr addrspace(7) %x` becomes `ptr addrspace(8)
+// %x.rsrc` and `i32 %x.off`, which will be combined into `{ptr addrspace(8),
+// i32} %x = {%x.rsrc, %x.off}` if needed. Similarly, `vector<Nxp7>` becomes
+// `{vector<Nxp8>, vector<Nxi32>}` and its component parts.
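+//
+// As an illustrative sketch (not taken verbatim from the pass's tests),
+// pointer arithmetic such as
+// ```
+// %q = getelementptr i32, ptr addrspace(7) %p, i32 4
+// ```
+// is then expressible purely in terms of the parts:
+// ```
+// ; %q.rsrc is simply %p.rsrc - the resource part is unchanged
+// %q.off = add i32 %p.off, 16
+// ```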
+//
+// # Implementation
+//
+// This pass proceeds in three main phases:
+//
+// ## Rewriting loads and stores of p7
+//
+// The first phase is to rewrite away all loads and stores of
+// `ptr addrspace(7)`, including aggregates containing such pointers, to ones
+// that use `i160`. This is handled by `StoreFatPtrsAsIntsVisitor`, which
+// visits loads, stores, and allocas and, if the loaded or stored type
+// contains `ptr addrspace(7)`, rewrites that type to one where the p7s are
+// replaced by i160s, copying other parts of aggregates as needed. In the case
+// of a store, each pointer is `ptrtoint`d to i160 before storing, and loaded
+// integers are `inttoptr`d back.
+// This same transformation is applied to vectors of pointers.
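+//
+// As a sketch of the intended rewrite (illustrative, not from the tests),
+// ```
+// store ptr addrspace(7) %p, ptr %mem
+// ```
+// becomes
+// ```
+// %p.int = ptrtoint ptr addrspace(7) %p to i160
+// store i160 %p.int, ptr %mem
+// ```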
+//
+// Such a transformation allows the later phases of the pass to avoid handling
+// buffer fat pointers moving to and from memory, where we would have to
+// handle the incompatibility between a `{Nxp8, Nxi32}` representation and
+// `Nxi160` directly. Instead, that transposing action (where the vectors of
+// resources and vectors of offsets are concatenated before being stored to
+// memory) is handled by implementing only `inttoptr` and `ptrtoint`.
+//
+// Atomic operations on `ptr addrspace(7)` values are not supported, as the
+// hardware does not include a 160-bit atomic.
+//
+// ## Type remapping
+//
+// We use a `ValueMapper` to mangle uses of [vectors of] buffer fat pointers
+// to the corresponding struct type, which has a resource part and an offset
+// part.
+//
+// This is done using a `BufferFatPtrToStructTypeMap` and a
+// `FatPtrConstMaterializer`, usually by way of `setType`ing values. Constants
+// are handled here because there isn't a good way to fix them up later.
+//
+// This has the downside of leaving the IR in an invalid state (for example,
+// the instruction `getelementptr {ptr addrspace(8), i32} %p, ...` will exist),
+// but all such invalid states will be resolved by the third phase.
+//
+// Functions that don't take buffer fat pointers are modified in place. Those
+// that do take such pointers have their basic blocks moved to a new function
+// whose arguments and return values use {ptr addrspace(8), i32} instead.
+// This phase also records intrinsics so that they can be remangled or deleted
+// later.
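+//
+// For example (an illustrative sketch), a function
+// ```
+// define float @f(ptr addrspace(7) %p)
+// ```
+// becomes
+// ```
+// define float @f({ ptr addrspace(8), i32 } %p)
+// ```
+// with its basic blocks moved over and their values remapped.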
+//
+//
+// ## Splitting pointer structs
+//
+// The meat of this pass consists of defining semantics for operations that
+// produce or consume [vectors of] buffer fat pointers in terms of their
+// resource and offset parts. This is accomplished through the `SplitPtrStructs`
+// visitor.
+//
+// In the first pass through each function that is being lowered, the splitter
+// inserts new instructions to implement the split-structures behavior, which is
+// needed for correctness and performance. It records a list of "split users",
+// instructions that are being replaced by operations on the resource and offset
+// parts.
+//
+// Split users do not necessarily need to produce parts themselves (a
+// `load float, ptr addrspace(7)` does not, for example), but, if they do not
+// generate fat buffer pointers, they must RAUW in their replacement
+// instructions during the initial visit.
+//
+// When these new instructions are created, they use the split parts recorded
+// for their initial arguments in order to generate their replacements, creating
+// a parallel set of instructions that does not refer to the original fat
+// pointer values but instead to their resource and offset components.
+//
+// Instructions that produce buffer fat pointers from sources that do not
+// have split parts (an `extractvalue` from an aggregate, for example) have
+// such parts generated using `extractvalue`. This is also the initial
+// handling of PHI nodes, which
+// are then cleaned up.
+//
+// ### Conditionals
+//
+// PHI nodes are initially given resource parts via `extractvalue`. However,
+// this is not an efficient rewrite of such nodes, as, in most cases, the
+// resource part in a conditional or loop remains constant throughout the loop
+// and only the offset varies. Failing to optimize away these constant resources
+// would cause additional registers to be sent around loops and might lead to
+// waterfall loops being generated for buffer operations due to the
+// "non-uniform" resource argument.
+//
+// Therefore, after all instructions have been visited, the pointer splitter
+// post-processes all encountered conditionals. Given a PHI node or select,
+// getPossibleRsrcRoots() collects all values that the resource parts of that
+// conditional's inputs could come from, as well as all conditional
+// instructions encountered during the search. If, after filtering out the
+// initial node itself, the set of encountered conditionals is a subset of the
+// potential roots and there is a single potential resource that isn't in the
+// conditional set, that value is the only possible value the resource argument
+// could have throughout the control flow.
+//
+// If that condition is met, then a PHI node can have its resource part changed
+// to the singleton value and then be replaced by a PHI on the offsets.
+// Otherwise, each PHI node is split into two, one for the resource part and one
+// for the offset part, which replace the temporary `extractvalue` instructions
+// that were added during the first pass.
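+//
+// For example (an illustrative sketch), when no single resource root is
+// found,
+// ```
+// %p = phi ptr addrspace(7) [ %a, %bb1 ], [ %b, %bb2 ]
+// ```
+// is rewritten to the pair of PHIs
+// ```
+// %p.rsrc = phi ptr addrspace(8) [ %a.rsrc, %bb1 ], [ %b.rsrc, %bb2 ]
+// %p.off = phi i32 [ %a.off, %bb1 ], [ %b.off, %bb2 ]
+// ```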
+//
+// Similar logic applies to `select`, where
+// `%z = select i1 %cond, ptr addrspace(7) %x, ptr addrspace(7) %y`
+// can be split into `%z.rsrc = %x.rsrc` and
+// `%z.off = select i1 %cond, i32 %x.off, i32 %y.off`
+// if both `%x` and `%y` have the same resource part, but two `select`
+// operations will be needed if they do not.
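+//
+// As a concrete sketch of the single-root case (illustrative IR), a loop that
+// strides through a buffer
+// ```
+// %p = phi ptr addrspace(7) [ %base, %entry ], [ %p.next, %loop ]
+// %p.next = getelementptr i32, ptr addrspace(7) %p, i32 1
+// ```
+// only needs a PHI for the offset, with %base.rsrc used directly:
+// ```
+// %p.off = phi i32 [ %base.off, %entry ], [ %p.next.off, %loop ]
+// %p.next.off = add i32 %p.off, 4
+// ```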
+//
+// ### Final processing
+//
+// After conditionals have been cleaned up, the IR for each function is
+// rewritten to remove all the old instructions that have been split up.
+//
+// Any instruction that used to produce a buffer fat pointer (and therefore now
+// produces a resource-and-offset struct after type remapping) is
+// replaced as follows:
+// 1. All debug value annotations are cloned to reflect that the resource part
+// and offset parts are computed separately and constitute different
+// fragments of the underlying source language variable.
+// 2. All uses that were themselves split are replaced by a `poison` of the
+// struct type, as they will themselves be erased soon. This rule, combined
+// with debug handling, should leave the use lists of split instructions
+// empty in almost all cases.
+// 3. If a user of the original struct-valued result remains, the structure
+// needed for the new types to work is constructed out of the newly-defined
+// parts, and the original instruction is replaced by this structure
+// before being erased. Instructions requiring this construction include
+// `ret` and `insertvalue`.
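+//
+// For instance (a sketch), if `%p` was such an instruction and a `ret` of it
+// remains, the struct is rebuilt before the return:
+// ```
+// %p.0 = insertvalue { ptr addrspace(8), i32 } poison,
+//     ptr addrspace(8) %p.rsrc, 0
+// %p.1 = insertvalue { ptr addrspace(8), i32 } %p.0, i32 %p.off, 1
+// ret { ptr addrspace(8), i32 } %p.1
+// ```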
+//
+// # Consequences
+//
+// This pass does not alter the CFG.
+//
+// Alias analysis information will become coarser, as the LLVM alias analyzer
+// cannot handle the buffer intrinsics. Specifically, while we can determine
+// that the following two loads do not alias:
+// ```
+// %y = getelementptr i32, ptr addrspace(7) %x, i32 1
+// %a = load i32, ptr addrspace(7) %x
+// %b = load i32, ptr addrspace(7) %y
+// ```
+// we cannot (except through some code that runs during scheduling) determine
+// that the rewritten loads below do not alias.
+// ```
+// %y.off = add i32 %x.off, 1
+// %a = call @llvm.amdgcn.raw.ptr.buffer.load(ptr addrspace(8) %x.rsrc,
+//     i32 %x.off, ...)
+// %b = call @llvm.amdgcn.raw.ptr.buffer.load(ptr addrspace(8) %x.rsrc,
+//     i32 %y.off, ...)
+// ```
+// However, existing alias information is preserved.
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPU.h"
+#include "AMDGPUTargetMachine.h"
+#include "GCNSubtarget.h"
+#include "SIDefines.h"
+#include "llvm/ADT/SetOperations.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/Analysis/ConstantFolding.h"
+#include "llvm/CodeGen/TargetPassConfig.h"
+#include "llvm/IR/AttributeMask.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/DebugInfo.h"
+#include "llvm/IR/DerivedTypes.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/InstIterator.h"
+#include "llvm/IR/InstVisitor.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/IntrinsicsAMDGPU.h"
+#include "llvm/IR/Metadata.h"
+#include "llvm/IR/Operator.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/Support/AtomicOrdering.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/ErrorHandling.h"
+#include "llvm/Transforms/Utils/Cloning.h"
+#include "llvm/Transforms/Utils/Local.h"
+#include "llvm/Transforms/Utils/ValueMapper.h"
+
+#define DEBUG_TYPE "amdgpu-lower-buffer-fat-pointers"
+
+using namespace llvm;
+
+static constexpr unsigned BufferOffsetWidth = 32;
+
+namespace {
+/// Recursively replace instances of ptr addrspace(7) and vector<Nxptr
+/// addrspace(7)> with some other type as defined by the relevant subclass.
+class BufferFatPtrTypeLoweringBase : public ValueMapTypeRemapper {
+ DenseMap<Type *, Type *> Map;
+
+ Type *remapTypeImpl(Type *Ty, SmallPtrSetImpl<StructType *> &Seen);
+
+protected:
+ virtual Type *remapScalar(PointerType *PT) = 0;
+ virtual Type *remapVector(VectorType *VT) = 0;
+
+ const DataLayout &DL;
+
+public:
+ BufferFatPtrTypeLoweringBase(const DataLayout &DL) : DL(DL) {}
+ Type *remapType(Type *SrcTy) override;
+ void clear() { Map.clear(); }
+};
+
+/// Remap ptr addrspace(7) to i160 and vector<Nxptr addrspace(7)> to
+/// vector<Nxi160> in order to correctly handle loading/storing these values
+/// from memory.
+class BufferFatPtrToIntTypeMap : public BufferFatPtrTypeLoweringBase {
+ using BufferFatPtrTypeLoweringBase::BufferFatPtrTypeLoweringBase;
+
+protected:
+ Type *remapScalar(PointerType *PT) override { return DL.getIntPtrType(PT); }
+ Type *remapVector(VectorType *VT) override { return DL.getIntPtrType(VT); }
+};
+
+/// Remap ptr addrspace(7) to {ptr addrspace(8), i32} (the resource and offset
+/// parts of the pointer) so that we can easily rewrite operations on these
+/// values that aren't loading them from or storing them to memory.
+class BufferFatPtrToStructTypeMap : public BufferFatPtrTypeLoweringBase {
+ using BufferFatPtrTypeLoweringBase::BufferFatPtrTypeLoweringBase;
+
+protected:
+ Type *remapScalar(PointerType *PT) override;
+ Type *remapVector(VectorType *VT) override;
+};
+} // namespace
+
+// This code is adapted from the type remapper in lib/Linker/IRMover.cpp
+Type *BufferFatPtrTypeLoweringBase::remapTypeImpl(
+ Type *Ty, SmallPtrSetImpl<StructType *> &Seen) {
+ Type **Entry = &Map[Ty];
+ if (*Entry)
+ return *Entry;
+ if (auto *PT = dyn_cast<PointerType>(Ty)) {
+ if (PT->getAddressSpace() == AMDGPUAS::BUFFER_FAT_POINTER) {
+ return *Entry = remapScalar(PT);
+ }
+ }
+ if (auto *VT = dyn_cast<VectorType>(Ty)) {
+ auto *PT = dyn_cast<PointerType>(VT->getElementType());
+ if (PT && PT->getAddressSpace() == AMDGPUAS::BUFFER_FAT_POINTER) {
+ return *Entry = remapVector(VT);
+ }
+ return *Entry = Ty;
+ }
+  // Whether the type is one that is structurally uniqued - that is, any type
+  // other than a named struct (the only kind of type where multiple
+  // structurally identical types can have distinct `Type*`s).
+ StructType *TyAsStruct = dyn_cast<StructType>(Ty);
+ bool IsUniqued = !TyAsStruct || TyAsStruct->isLiteral();
+ // Base case for ints, floats, opaque pointers, and so on, which don't
+ // require recursion.
+ if (Ty->getNumContainedTypes() == 0 && IsUniqued)
+ return *Entry = Ty;
+ if (!IsUniqued) {
+ // Create a dummy type for recursion purposes.
+ if (!Seen.insert(TyAsStruct).second) {
+ StructType *Placeholder = StructType::create(Ty->getContext());
+ return *Entry = Placeholder;
+ }
+ }
+ bool Changed = false;
+ SmallVector<Type *> ElementTypes;
+ ElementTypes.reserve(Ty->getNumContainedTypes());
+ for (unsigned int I = 0, E = Ty->getNumContainedTypes(); I < E; ++I) {
+ Type *OldElem = Ty->getContainedType(I);
+ Type *NewElem = remapTypeImpl(OldElem, Seen);
+ ElementTypes.push_back(NewElem);
+ Changed |= (OldElem != NewElem);
+ }
+ if (!Changed) {
+ return *Entry = Ty;
+ }
+ if (auto *ArrTy = dyn_cast<ArrayType>(Ty))
+ return *Entry = ArrayType::get(ElementTypes[0], ArrTy->getNumElements());
+ if (auto *FnTy = dyn_cast<FunctionType>(Ty))
+ return *Entry = FunctionType::get(ElementTypes[0],
+ ArrayRef(ElementTypes).slice(1),
+ FnTy->isVarArg());
+ if (auto *STy = dyn_cast<StructType>(Ty)) {
+ // Genuine opaque types don't have a remapping.
+ if (STy->isOpaque())
+ return *Entry = Ty;
+ bool IsPacked = STy->isPacked();
+ if (IsUniqued)
+ return *Entry = StructType::get(Ty->getContext(), ElementTypes, IsPacked);
+ SmallString<16> Name(STy->getName());
+ STy->setName("");
+ Type **RecursionEntry = &Map[Ty];
+ if (*RecursionEntry) {
+ auto *Placeholder = cast<StructType>(*RecursionEntry);
+ Placeholder->setBody(ElementTypes, IsPacked);
+ Placeholder->setName(Name);
+ return *Entry = Placeholder;
+ }
+ return *Entry = StructType::create(Ty->getContext(), ElementTypes, Name,
+ IsPacked);
+ }
+ llvm_unreachable("Unknown type of type that contains elements");
+}
+
+Type *BufferFatPtrTypeLoweringBase::remapType(Type *SrcTy) {
+ SmallPtrSet<StructType *, 2> Visited;
+ return remapTypeImpl(SrcTy, Visited);
+}
+
+Type *BufferFatPtrToStructTypeMap::remapScalar(PointerType *PT) {
+ LLVMContext &Ctx = PT->getContext();
+ return StructType::get(PointerType::get(Ctx, AMDGPUAS::BUFFER_RESOURCE),
+ IntegerType::get(Ctx, BufferOffsetWidth));
+}
+
+Type *BufferFatPtrToStructTypeMap::remapVector(VectorType *VT) {
+ ElementCount EC = VT->getElementCount();
+ LLVMContext &Ctx = VT->getContext();
+ Type *RsrcVec =
+ VectorType::get(PointerType::get(Ctx, AMDGPUAS::BUFFER_RESOURCE), EC);
+ Type *OffVec = VectorType::get(IntegerType::get(Ctx, BufferOffsetWidth), EC);
+ return StructType::get(RsrcVec, OffVec);
+}
+
+static bool isBufferFatPtrOrVector(Type *Ty) {
+ if (auto *PT = dyn_cast<PointerType>(Ty->getScalarType()))
+ return PT->getAddressSpace() == AMDGPUAS::BUFFER_FAT_POINTER;
+ return false;
+}
+
+// True if the type is {ptr addrspace(8), i32} or a struct containing vectors of
+// those types. Used to quickly skip instructions we don't need to process.
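+// For example, both {ptr addrspace(8), i32} and
+// {<4 x ptr addrspace(8)>, <4 x i32>} are split fat pointer types in this
+// sense.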
+static bool isSplitFatPtr(Type *Ty) {
+ auto *ST = dyn_cast<StructType>(Ty);
+ if (!ST)
+ return false;
+ if (!ST->isLiteral() || ST->getNumElements() != 2)
+ return false;
+ auto *MaybeRsrc =
+ dyn_cast<PointerType>(ST->getElementType(0)->getScalarType());
+ auto *MaybeOff =
+ dyn_cast<IntegerType>(ST->getElementType(1)->getScalarType());
+ return MaybeRsrc && MaybeOff &&
+ MaybeRsrc->getAddressSpace() == AMDGPUAS::BUFFER_RESOURCE &&
+ MaybeOff->getBitWidth() == BufferOffsetWidth;
+}
+
+// True if the result type or any argument types are buffer fat pointers.
+static bool isBufferFatPtrConst(Constant *C) {
+ Type *T = C->getType();
+ return isBufferFatPtrOrVector(T) ||
+ llvm::any_of(C->operands(), [](const Use &U) {
+ return isBufferFatPtrOrVector(U.get()->getType());
+ });
+}
+
+namespace {
+/// Convert [vectors of] buffer fat pointers to integers when they are read from
+/// or stored to memory. This ensures that these pointers will have the same
+/// memory layout as before they are lowered, even though they will no longer
+/// have their previous layout in registers/in the program (they'll be broken
+/// down into resource and offset parts). This has the downside of imposing
+/// marshalling costs when reading or storing these values, but since placing
+/// such pointers into memory is an uncommon operation at best, we feel that
+/// this cost is acceptable for better performance in the common case.
+class StoreFatPtrsAsIntsVisitor
+ : public InstVisitor<StoreFatPtrsAsIntsVisitor, bool> {
+ BufferFatPtrToIntTypeMap *TypeMap;
+
+ ValueToValueMapTy ConvertedForStore;
+
+ IRBuilder<> IRB;
+
+  // Convert all the buffer fat pointers within the input value to integers
+  // so that it can be stored in memory.
+ Value *fatPtrsToInts(Value *V, Type *From, Type *To, const Twine &Name);
+  // Convert all the i160s that need to be buffer fat pointers (as specified
+ // by the To type) into those pointers to preserve the semantics of the rest
+ // of the program.
+ Value *intsToFatPtrs(Value *V, Type *From, Type *To, const Twine &Name);
+
+public:
+ StoreFatPtrsAsIntsVisitor(BufferFatPtrToIntTypeMap *TypeMap, LLVMContext &Ctx)
+ : TypeMap(TypeMap), IRB(Ctx) {}
+ bool processFunction(Function &F);
+
+ bool visitInstruction(Instruction &I) { return false; }
+ bool visitAllocaInst(AllocaInst &I);
+ bool visitLoadInst(LoadInst &LI);
+ bool visitStoreInst(StoreInst &SI);
+ bool visitGetElementPtrInst(GetElementPtrInst &I);
+};
+} // namespace
+
+Value *StoreFatPtrsAsIntsVisitor::fatPtrsToInts(Value *V, Type *From, Type *To,
+ const Twine &Name) {
+ if (From == To)
+ return V;
+ ValueToValueMapTy::iterator Find = ConvertedForStore.find(V);
+ if (Find != ConvertedForStore.end())
+ return Find->second;
+ if (isBufferFatPtrOrVector(From)) {
+ Value *Cast = IRB.CreatePtrToInt(V, To, Name + ".int");
+ ConvertedForStore[V] = Cast;
+ return Cast;
+ }
+ if (From->getNumContainedTypes() == 0)
+ return V;
+ // Structs, arrays, and other compound types.
+ Value *Ret = PoisonValue::get(To);
+ if (auto *AT = dyn_cast<ArrayType>(From)) {
+ Type *FromPart = AT->getArrayElementType();
+ Type *ToPart = cast<ArrayType>(To)->getElementType();
+ for (uint64_t I = 0, E = AT->getArrayNumElements(); I < E; ++I) {
+ Value *Field = IRB.CreateExtractValue(V, I);
+ Value *NewField =
+ fatPtrsToInts(Field, FromPart, ToPart, Name + "." + Twine(I));
+ Ret = IRB.CreateInsertValue(Ret, NewField, I);
+ }
+ } else {
+ for (auto [Idx, FromPart, ToPart] :
+ enumerate(From->subtypes(), To->subtypes())) {
+ Value *Field = IRB.CreateExtractValue(V, Idx);
+ Value *NewField =
+ fatPtrsToInts(Field, FromPart, ToPart, Name + "." + Twine(Idx));
+ Ret = IRB.CreateInsertValue(Ret, NewField, Idx);
+ }
+ }
+ ConvertedForStore[V] = Ret;
+ return Ret;
+}
+
+Value *StoreFatPtrsAsIntsVisitor::intsToFatPtrs(Value *V, Type *From, Type *To,
+ const Twine &Name) {
+ if (From == To)
+ return V;
+ if (isBufferFatPtrOrVector(To)) {
+ Value *Cast = IRB.CreateIntToPtr(V, To, Name + ".ptr");
+ return Cast;
+ }
+ if (From->getNumContainedTypes() == 0)
+ return V;
+ // Structs, arrays, and other compound types.
+ Value *Ret = PoisonValue::get(To);
+ if (auto *AT = dyn_cast<ArrayType>(From)) {
+ Type *FromPart = AT->getArrayElementType();
+ Type *ToPart = cast<ArrayType>(To)->getElementType();
+ for (uint64_t I = 0, E = AT->getArrayNumElements(); I < E; ++I) {
+ Value *Field = IRB.CreateExtractValue(V, I);
+ Value *NewField =
+ intsToFatPtrs(Field, FromPart, ToPart, Name + "." + Twine(I));
+ Ret = IRB.CreateInsertValue(Ret, NewField, I);
+ }
+ } else {
+ for (auto [Idx, FromPart, ToPart] :
+ enumerate(From->subtypes(), To->subtypes())) {
+ Value *Field = IRB.CreateExtractValue(V, Idx);
+ Value *NewField =
+ intsToFatPtrs(Field, FromPart, ToPart, Name + "." + Twine(Idx));
+ Ret = IRB.CreateInsertValue(Ret, NewField, Idx);
+ }
+ }
+ return Ret;
+}
+
+bool StoreFatPtrsAsIntsVisitor::processFunction(Function &F) {
+ bool Changed = false;
+ // The visitors will mutate GEPs and allocas, but will push loads and stores
+ // to the worklist to avoid invalidation.
+ for (Instruction &I : make_early_inc_range(instructions(F))) {
+ Changed |= visit(I);
+ }
+ ConvertedForStore.clear();
+ return Changed;
+}
+
+bool StoreFatPtrsAsIntsVisitor::visitAllocaInst(AllocaInst &I) {
+ Type *Ty = I.getAllocatedType();
+ Type *NewTy = TypeMap->remapType(Ty);
+ if (Ty == NewTy)
+ return false;
+ I.setAllocatedType(NewTy);
+ return true;
+}
+
+bool StoreFatPtrsAsIntsVisitor::visitGetElementPtrInst(GetElementPtrInst &I) {
+ Type *Ty = I.getSourceElementType();
+ Type *NewTy = TypeMap->remapType(Ty);
+ if (Ty == NewTy)
+ return false;
+ // We'll be rewriting the type `ptr addrspace(7)` out of existence soon, so
+ // make sure GEPs don't have different semantics with the new type.
+ I.setSourceElementType(NewTy);
+ I.setResultElementType(TypeMap->remapType(I.getResultElementType()));
+ return true;
+}
+
+bool StoreFatPtrsAsIntsVisitor::visitLoadInst(LoadInst &LI) {
+ Type *Ty = LI.getType();
+ Type *IntTy = TypeMap->remapType(Ty);
+ if (Ty == IntTy)
+ return false;
+
+ IRB.SetInsertPoint(&LI);
+ auto *NLI = cast<LoadInst>(LI.clone());
+ NLI->mutateType(IntTy);
+ NLI = IRB.Insert(NLI);
+ copyMetadataForLoad(*NLI, LI);
+ NLI->takeName(&LI);
+
+ Value *CastBack = intsToFatPtrs(NLI, IntTy, Ty, NLI->getName());
+ LI.replaceAllUsesWith(CastBack);
+ LI.eraseFromParent();
+ return true;
+}
+
+bool StoreFatPtrsAsIntsVisitor::visitStoreInst(StoreInst &SI) {
+ Value *V = SI.getValueOperand();
+ Type *Ty = V->getType();
+ Type *IntTy = TypeMap->remapType(Ty);
+ if (Ty == IntTy)
+ return false;
+
+ IRB.SetInsertPoint(&SI);
+ Value *IntV = fatPtrsToInts(V, Ty, IntTy, V->getName());
+ for (auto *Dbg : at::getAssignmentMarkers(&SI))
+ Dbg->setValue(IntV);
+
+ SI.setOperand(0, IntV);
+ return true;
+}
+
+/// Return the ptr addrspace(8) and i32 (resource and offset parts) in a lowered
+/// buffer fat pointer constant.
+static std::pair<Constant *, Constant *>
+splitLoweredFatBufferConst(Constant *C) {
+ if (auto *AZ = dyn_cast<ConstantAggregateZero>(C))
+ return std::make_pair(AZ->getStructElement(0), AZ->getStructElement(1));
+ if (auto *SC = dyn_cast<ConstantStruct>(C))
+ return std::make_pair(SC->getOperand(0), SC->getOperand(1));
+ llvm_unreachable("Conversion should've created a {p8, i32} struct");
+}
+
+namespace {
+/// Handle the remapping of ptr addrspace(7) constants.
+class FatPtrConstMaterializer final : public ValueMaterializer {
+ BufferFatPtrToStructTypeMap *TypeMap;
+ BufferFatPtrToIntTypeMap *IntTypeMap;
+ // An internal mapper that is used to recurse into the arguments of constants.
+ // While the documentation for `ValueMapper` specifies not to use it
+ // recursively, examination of the logic in mapValue() shows that it can
+  // safely be used recursively when handling constants, just as the mapper
+  // itself does internally.
+ ValueMapper InternalMapper;
+
+ Constant *materializeBufferFatPtrConst(Constant *C);
+
+ const DataLayout &DL;
+
+public:
+ // UnderlyingMap is the value map this materializer will be filling.
+ FatPtrConstMaterializer(BufferFatPtrToStructTypeMap *TypeMap,
+ ValueToValueMapTy &UnderlyingMap,
+ BufferFatPtrToIntTypeMap *IntTypeMap,
+ const DataLayout &DL)
+ : TypeMap(TypeMap), IntTypeMap(IntTypeMap),
+ InternalMapper(UnderlyingMap, RF_None, TypeMap, this), DL(DL) {}
+ virtual ~FatPtrConstMaterializer() = default;
+
+ Value *materialize(Value *V) override;
+};
+} // namespace
+
+Constant *FatPtrConstMaterializer::materializeBufferFatPtrConst(Constant *C) {
+ Type *SrcTy = C->getType();
+ auto *NewTy = dyn_cast<StructType>(TypeMap->remapType(SrcTy));
+ if (C->isNullValue())
+ return ConstantAggregateZero::getNullValue(NewTy);
+ if (isa<PoisonValue>(C))
+ return ConstantStruct::get(NewTy,
+ {PoisonValue::get(NewTy->getElementType(0)),
+ PoisonValue::get(NewTy->getElementType(1))});
+ if (isa<UndefValue>(C))
+ return ConstantStruct::get(NewTy,
+ {UndefValue::get(NewTy->getElementType(0)),
+ UndefValue::get(NewTy->getElementType(1))});
+
+ if (isa<GlobalValue>(C))
+ report_fatal_error("Global values containing ptr addrspace(7) (buffer "
+ "fat pointer) values are not supported");
+
+ if (auto *VC = dyn_cast<ConstantVector>(C)) {
+ if (Constant *S = VC->getSplatValue()) {
+ Constant *NewS = InternalMapper.mapConstant(*S);
+ if (!NewS)
+ return nullptr;
+ auto [Rsrc, Off] = splitLoweredFatBufferConst(NewS);
+ auto EC = VC->getType()->getElementCount();
+ return ConstantStruct::get(NewTy, {ConstantVector::getSplat(EC, Rsrc),
+ ConstantVector::getSplat(EC, Off)});
+ }
+ SmallVector<Constant *> Rsrcs;
+ SmallVector<Constant *> Offs;
+ for (Value *Op : VC->operand_values()) {
+ auto *NewOp = dyn_cast_or_null<Constant>(InternalMapper.mapValue(*Op));
+ if (!NewOp)
+ return nullptr;
+ auto [Rsrc, Off] = splitLoweredFatBufferConst(NewOp);
+ Rsrcs.push_back(Rsrc);
+ Offs.push_back(Off);
+ }
+ Constant *RsrcVec = ConstantVector::get(Rsrcs);
+ Constant *OffVec = ConstantVector::get(Offs);
+ return ConstantStruct::get(NewTy, {RsrcVec, OffVec});
+ }
+
+ // Constant expressions. This code mirrors how we fix up the equivalent
+ // instructions later.
+ auto *CE = dyn_cast<ConstantExpr>(C);
+ if (!CE)
+ return nullptr;
+ if (auto *GEPO = dyn_cast<GEPOperator>(C)) {
+ Constant *RemappedPtr =
+ InternalMapper.mapConstant(*cast<Constant>(GEPO->getPointerOperand()));
+ auto [Rsrc, Off] = splitLoweredFatBufferConst(RemappedPtr);
+ Type *OffTy = Off->getType();
+ bool InBounds = GEPO->isInBounds();
+
+ MapVector<Value *, APInt> VariableOffs;
+ APInt NewConstOffVal = APInt::getZero(BufferOffsetWidth);
+ if (!GEPO->collectOffset(DL, BufferOffsetWidth, VariableOffs,
+ NewConstOffVal))
+ report_fatal_error(
+ "Scalable vector or unsized struct in fat pointer GEP");
+ Constant *OffAccum = nullptr;
+ // Accumulate offsets together before adding to the base in order to
+ // preserve as many of the inbounds properties as possible.
+ for (auto [Arg, Multiple] : VariableOffs) {
+ Constant *NewArg = InternalMapper.mapConstant(*cast<Constant>(Arg));
+ NewArg = ConstantFoldIntegerCast(NewArg, OffTy, /*IsSigned=*/true, DL);
+ if (Multiple.isPowerOf2()) {
+ NewArg = ConstantExpr::getShl(
+ NewArg,
+ CE->getIntegerValue(OffTy,
+ APInt(BufferOffsetWidth, Multiple.logBase2())),
+ /*hasNUW=*/InBounds, /*HasNSW=*/InBounds);
+ } else {
+ NewArg =
+ ConstantExpr::getMul(NewArg, CE->getIntegerValue(OffTy, Multiple),
+ /*hasNUW=*/InBounds, /*hasNSW=*/InBounds);
+ }
+ if (OffAccum) {
+ OffAccum = ConstantExpr::getAdd(OffAccum, NewArg, /*hasNUW=*/InBounds,
+ /*hasNSW=*/InBounds);
+ } else {
+ OffAccum = NewArg;
+ }
+ }
+ Constant *NewConstOff = CE->getIntegerValue(OffTy, NewConstOffVal);
+ if (OffAccum)
+ OffAccum = ConstantExpr::getAdd(OffAccum, NewConstOff,
+ /*hasNUW=*/InBounds, /*hasNSW=*/InBounds);
+ else
+ OffAccum = NewConstOff;
+ bool HasNonNegativeOff = false;
+ if (auto *CI = dyn_cast<ConstantInt>(OffAccum)) {
+ HasNonNegativeOff = !CI->isNegative();
+ }
+ Constant *NewOff = ConstantExpr::getAdd(
+ Off, OffAccum, /*hasNUW=*/InBounds && HasNonNegativeOff,
+ /*hasNSW=*/false);
+ return ConstantStruct::get(NewTy, {Rsrc, NewOff});
+ }
+
+ if (auto *PI = dyn_cast<PtrToIntOperator>(CE)) {
+ Constant *Parts =
+ InternalMapper.mapConstant(*cast<Constant>(PI->getPointerOperand()));
+ auto [Rsrc, Off] = splitLoweredFatBufferConst(Parts);
+ // Here, we take advantage of the fact that ptrtoint has a built-in
+ // zero-extension behavior.
+ unsigned FatPtrWidth =
+ DL.getPointerSizeInBits(AMDGPUAS::BUFFER_FAT_POINTER);
+ Constant *RsrcInt = CE->getPtrToInt(Rsrc, SrcTy);
+ unsigned Width = SrcTy->getScalarSizeInBits();
+ Constant *Shift =
+ CE->getIntegerValue(SrcTy, APInt(Width, BufferOffsetWidth));
+ Constant *OffCast =
+ ConstantFoldIntegerCast(Off, SrcTy, /*IsSigned=*/false, DL);
+ Constant *RsrcHi = ConstantExpr::getShl(
+ RsrcInt, Shift, Width >= FatPtrWidth, Width > FatPtrWidth);
+ // This should be an or, but those got recently removed.
+ Constant *Result = ConstantExpr::getAdd(RsrcHi, OffCast, true, true);
+ return Result;
+ }
+
+ if (CE->getOpcode() == Instruction::IntToPtr) {
+ auto *Arg = cast<Constant>(CE->getOperand(0));
+ unsigned FatPtrWidth =
+ DL.getPointerSizeInBits(AMDGPUAS::BUFFER_FAT_POINTER);
+ unsigned RsrcPtrWidth = DL.getPointerSizeInBits(AMDGPUAS::BUFFER_RESOURCE);
+ auto *WantedTy = Arg->getType()->getWithNewBitWidth(FatPtrWidth);
+ Arg = ConstantFoldIntegerCast(Arg, WantedTy, /*IsSigned=*/false, DL);
+
+ Constant *Shift =
+ CE->getIntegerValue(WantedTy, APInt(FatPtrWidth, BufferOffsetWidth));
+ Type *RsrcIntType = WantedTy->getWithNewBitWidth(RsrcPtrWidth);
+ Type *RsrcTy = NewTy->getElementType(0);
+ Type *OffTy = WantedTy->getWithNewBitWidth(BufferOffsetWidth);
+ Constant *RsrcInt = CE->getTrunc(
+ ConstantFoldBinaryOpOperands(Instruction::LShr, Arg, Shift, DL),
+ RsrcIntType);
+ Constant *Rsrc = CE->getIntToPtr(RsrcInt, RsrcTy);
+ Constant *Off = ConstantFoldIntegerCast(Arg, OffTy, /*isSigned=*/false, DL);
+
+ return ConstantStruct::get(NewTy, {Rsrc, Off});
+ }
+
+ if (auto *AC = dyn_cast<AddrSpaceCastOperator>(CE)) {
+ unsigned SrcAS = AC->getSrcAddressSpace();
+ unsigned DstAS = AC->getDestAddressSpace();
+ auto *Arg = cast<Constant>(AC->getPointerOperand());
+ auto *NewArg = InternalMapper.mapConstant(*Arg);
+ if (!NewArg)
+ return nullptr;
+ if (SrcAS == AMDGPUAS::BUFFER_FAT_POINTER &&
+ DstAS == AMDGPUAS::BUFFER_FAT_POINTER)
+ return NewArg;
+ if (SrcAS == AMDGPUAS::BUFFER_RESOURCE &&
+ DstAS == AMDGPUAS::BUFFER_FAT_POINTER) {
+ auto *NullOff = CE->getNullValue(NewTy->getElementType(1));
+ return ConstantStruct::get(NewTy, {NewArg, NullOff});
+ }
+ report_fatal_error(
+ "Unsupported address space cast for a buffer fat pointer");
+ }
+ return nullptr;
+}
+
+Value *FatPtrConstMaterializer::materialize(Value *V) {
+ Constant *C = dyn_cast<Constant>(V);
+ if (!C)
+ return nullptr;
+ if (auto *GEPO = dyn_cast<GEPOperator>(C)) {
+ // As a special case, adjust GEP constants that have a ptr addrspace(7) in
+ // their source types here, since the earlier local changes didn't handle
+    // this.
+ Type *SrcTy = GEPO->getSourceElementType();
+ Type *NewSrcTy = IntTypeMap->remapType(SrcTy);
+ if (SrcTy != NewSrcTy) {
+ SmallVector<Constant *> Ops;
+ Ops.reserve(GEPO->getNumOperands());
+ for (const Use &U : GEPO->operands())
+ Ops.push_back(cast<Constant>(U.get()));
+ auto *NewGEP = ConstantExpr::getGetElementPtr(
+ NewSrcTy, Ops[0], ArrayRef<Constant *>(Ops).slice(1),
+ GEPO->isInBounds(), GEPO->getInRangeIndex());
+ LLVM_DEBUG(dbgs() << "p7-getting GEP: " << *GEPO << " becomes " << *NewGEP
+ << "\n");
+ Value *FurtherMap = materialize(NewGEP);
+ return FurtherMap ? FurtherMap : NewGEP;
+ }
+ }
+ // Structs and other types that happen to contain fat pointers get remapped
+ // by the mapValue() logic.
+ if (!isBufferFatPtrConst(C))
+ return nullptr;
+ return materializeBufferFatPtrConst(C);
+}
+
+using PtrParts = std::pair<Value *, Value *>;
+namespace {
+// The visitor returns the resource and offset parts for an instruction if they
+// can be computed, or (nullptr, nullptr) for cases that don't have a meaningful
+// value mapping.
+class SplitPtrStructs : public InstVisitor<SplitPtrStructs, PtrParts> {
+ ValueToValueMapTy RsrcParts;
+ ValueToValueMapTy OffParts;
+
+ // Track instructions that have been rewritten into a user of the component
+ // parts of their ptr addrspace(7) input. Instructions that produced
+ // ptr addrspace(7) parts should **not** be RAUW'd before being added to this
+ // set, as that replacement will be handled in a post-visit step. However,
+  // instructions that yield values that aren't fat pointers (e.g. ptrtoint)
+ // should RAUW themselves with new instructions that use the split parts
+ // of their arguments during processing.
+ DenseSet<Instruction *> SplitUsers;
+
+ // Nodes that need a second look once we've computed the parts for all other
+ // instructions to see if, for example, we really need to phi on the resource
+ // part.
+ SmallVector<Instruction *> Conditionals;
+ // Temporary instructions produced while lowering conditionals that should be
+ // killed.
+ SmallVector<Instruction *> ConditionalTemps;
+
+ // Subtarget info, needed for determining what cache control bits to set.
+ const TargetMachine *TM;
+ const GCNSubtarget *ST;
+
+ IRBuilder<> IRB;
+
+ // Copy metadata between instructions if applicable.
+ void copyMetadata(Value *Dest, Value *Src);
+
+ // Get the resource and offset parts of the value V, inserting appropriate
+ // extractvalue calls if needed.
+ PtrParts getPtrParts(Value *V);
+
+  // Given an instruction that could produce multiple resource parts (a PHI or
+  // select), collect the set of values that could have provided its resource
+  // part (the `Roots`) and the set of
+ // conditional instructions visited during the search (`Seen`). If, after
+ // removing the root of the search from `Seen` and `Roots`, `Seen` is a subset
+ // of `Roots` and `Roots - Seen` contains one element, the resource part of
+ // that element can replace the resource part of all other elements in `Seen`.
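+  //
+  // For example (an illustrative sketch), for
+  //   %p = phi ptr addrspace(7) [ %base, %entry ], [ %p.next, %loop ]
+  // where %p.next is a GEP on %p, Roots = {%base, %p} and Seen = {%p};
+  // after removing %p from both sets, %base is the unique possible root.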
+ void getPossibleRsrcRoots(Instruction *I, SmallPtrSetImpl<Value *> &Roots,
+ SmallPtrSetImpl<Value *> &Seen);
+ void processConditionals();
+
+  // If an instruction has been split into resource and offset parts,
+  // delete that instruction. If any of its uses have not themselves been split
+  // into parts (for example, an insertvalue), construct the struct that the
+  // type rewrites declared the dying instruction should produce, and use
+  // that.
+ // Also, kill the temporary extractvalue operations produced by the two-stage
+ // lowering of PHIs and conditionals.
+ void killAndReplaceSplitInstructions(SmallVectorImpl<Instruction *> &Origs);
+
+ void setAlign(CallInst *Intr, Align A, unsigned RsrcArgIdx);
+ void insertPreMemOpFence(AtomicOrdering Order, SyncScope::ID SSID);
+ void insertPostMemOpFence(AtomicOrdering Order, SyncScope::ID SSID);
+ Value *handleMemoryInst(Instruction *I, Value *Arg, Value *Ptr, Type *Ty,
+ Align Alignment, AtomicOrdering Order,
+ bool IsVolatile, SyncScope::ID SSID);
+
+public:
+ SplitPtrStructs(LLVMContext &Ctx, const TargetMachine *TM)
+ : TM(TM), ST(nullptr), IRB(Ctx) {}
+
+ void processFunction(Function &F);
+
+ // The collected set of intrinsic declarations that have had their type
+ // mangled and that can be deleted as unneeded.
+ SmallPtrSet<Function *, 4> IntrinsicDeclsToRemove;
+
+ PtrParts visitInstruction(Instruction &I);
+ PtrParts visitLoadInst(LoadInst &LI);
+ PtrParts visitStoreInst(StoreInst &SI);
+ PtrParts visitAtomicRMWInst(AtomicRMWInst &AI);
+ PtrParts visitAtomicCmpXchgInst(AtomicCmpXchgInst &AI);
+ PtrParts visitGetElementPtrInst(GetElementPtrInst &GEP);
+
+ PtrParts visitPtrToIntInst(PtrToIntInst &PI);
+ PtrParts visitIntToPtrInst(IntToPtrInst &IP);
+ PtrParts visitAddrSpaceCastInst(AddrSpaceCastInst &I);
+ PtrParts visitICmpInst(ICmpInst &Cmp);
+ PtrParts visitFreezeInst(FreezeInst &I);
+
+ PtrParts visitExtractElementInst(ExtractElementInst &I);
+ PtrParts visitInsertElementInst(InsertElementInst &I);
+ PtrParts visitShuffleVectorInst(ShuffleVectorInst &I);
+
+ PtrParts visitPHINode(PHINode &PHI);
+ PtrParts visitSelectInst(SelectInst &SI);
+
+ PtrParts visitIntrinsicInst(IntrinsicInst &II);
+};
+} // namespace
+
+void SplitPtrStructs::copyMetadata(Value *Dest, Value *Src) {
+ auto *DestI = dyn_cast<Instruction>(Dest);
+ auto *SrcI = dyn_cast<Instruction>(Src);
+
+ if (!DestI || !SrcI)
+ return;
+
+ DestI->copyMetadata(*SrcI);
+}
+
+PtrParts SplitPtrStructs::getPtrParts(Value *V) {
+ assert(isSplitFatPtr(V->getType()) && "it's not meaningful to get the parts "
+ "of something that wasn't rewritten");
+ auto *RsrcEntry = &RsrcParts[V];
+ auto *OffEntry = &OffParts[V];
+ if (*RsrcEntry && *OffEntry)
+ return {*RsrcEntry, *OffEntry};
+
+ if (auto *C = dyn_cast<Constant>(V)) {
+ auto [Rsrc, Off] = splitLoweredFatBufferConst(C);
+ return {*RsrcEntry = Rsrc, *OffEntry = Off};
+ }
+
+ IRBuilder<>::InsertPointGuard Guard(IRB);
+ if (auto *I = dyn_cast<Instruction>(V)) {
+ LLVM_DEBUG(dbgs() << "Recursing to split parts of " << *I << "\n");
+ auto [Rsrc, Off] = visit(*I);
+ if (Rsrc && Off)
+ return {*RsrcEntry = Rsrc, *OffEntry = Off};
+ // We'll be creating the new values after the relevant instruction.
+ // This instruction generates a value and so isn't a terminator.
+ IRB.SetInsertPoint(*I->getInsertionPointAfterDef());
+ IRB.SetCurrentDebugLocation(I->getDebugLoc());
+ } else if (auto *A = dyn_cast<Argument>(V)) {
+ IRB.SetInsertPointPastAllocas(A->getParent());
+ IRB.SetCurrentDebugLocation(DebugLoc());
+ }
+ Value *Rsrc = IRB.CreateExtractValue(V, 0, V->getName() + ".rsrc");
+ Value *Off = IRB.CreateExtractValue(V, 1, V->getName() + ".off");
+ return {*RsrcEntry = Rsrc, *OffEntry = Off};
+}
+
+/// Returns the instruction that defines the resource part of the value V.
+/// Note that this is not getUnderlyingObject(), since that looks through
+/// operations like ptrmask which might modify the resource part.
+///
+/// We can limit ourselves to just looking through GEPs followed by looking
+/// through addrspacecasts because only those two operations preserve the
+/// resource part, and because operations on an `addrspace(8)` (which is the
+/// legal input to this addrspacecast) would produce a different resource part.
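+///
+/// For example (illustrative), given
+///   %q = getelementptr i8, ptr addrspace(7) %p, i64 %n
+/// rsrcPartRoot(%q) is %p, since a GEP can only change the offset part.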
+static Value *rsrcPartRoot(Value *V) {
+ while (auto *GEP = dyn_cast<GEPOperator>(V))
+ V = GEP->getPointerOperand();
+ while (auto *ASC = dyn_cast<AddrSpaceCastOperator>(V))
+ V = ASC->getPointerOperand();
+ return V;
+}
+
+void SplitPtrStructs::getPossibleRsrcRoots(Instruction *I,
+ SmallPtrSetImpl<Value *> &Roots,
+ SmallPtrSetImpl<Value *> &Seen) {
+ if (auto *PHI = dyn_cast<PHINode>(I)) {
+ if (!Seen.insert(I).second)
+ return;
+ for (Value *In : PHI->incoming_values()) {
+ In = rsrcPartRoot(In);
+ Roots.insert(In);
+ if (isa<PHINode, SelectInst>(In))
+ getPossibleRsrcRoots(cast<Instruction>(In), Roots, Seen);
+ }
+ } else if (auto *SI = dyn_cast<SelectInst>(I)) {
+ if (!Seen.insert(SI).second)
+ return;
+ Value *TrueVal = rsrcPartRoot(SI->getTrueValue());
+ Value *FalseVal = rsrcPartRoot(SI->getFalseValue());
+ Roots.insert(TrueVal);
+ Roots.insert(FalseVal);
+ if (isa<PHINode, SelectInst>(TrueVal))
+ getPossibleRsrcRoots(cast<Instruction>(TrueVal), Roots, Seen);
+ if (isa<PHINode, SelectInst>(FalseVal))
+ getPossibleRsrcRoots(cast<Instruction>(FalseVal), Roots, Seen);
+ } else {
+ llvm_unreachable("getPossibleRsrcParts() only works on phi and select");
+ }
+}
+
+void SplitPtrStructs::processConditionals() {
+ SmallDenseMap<Instruction *, Value *> FoundRsrcs;
+ SmallPtrSet<Value *, 4> Roots;
+ SmallPtrSet<Value *, 4> Seen;
+ for (Instruction *I : Conditionals) {
+ // These have to exist by now because we've visited these nodes.
+ Value *Rsrc = RsrcParts[I];
+ Value *Off = OffParts[I];
+ assert(Rsrc && Off && "must have visited conditionals by now");
+
+ std::optional<Value *> MaybeRsrc;
+ auto MaybeFoundRsrc = FoundRsrcs.find(I);
+ if (MaybeFoundRsrc != FoundRsrcs.end()) {
+ MaybeRsrc = MaybeFoundRsrc->second;
+ } else {
+ IRBuilder<>::InsertPointGuard Guard(IRB);
+ Roots.clear();
+ Seen.clear();
+ getPossibleRsrcRoots(I, Roots, Seen);
+ LLVM_DEBUG(dbgs() << "Processing conditional: " << *I << "\n");
+#ifndef NDEBUG
+ for (Value *V : Roots)
+ LLVM_DEBUG(dbgs() << "Root: " << *V << "\n");
+ for (Value *V : Seen)
+ LLVM_DEBUG(dbgs() << "Seen: " << *V << "\n");
+#endif
+ // If we are our own possible root, then we shouldn't block our
+ // replacement with a valid incoming value.
+ Roots.erase(I);
+ // We don't want to block the optimization for conditionals that don't
+ // refer to themselves but did see themselves during the traversal.
+ Seen.erase(I);
+
+ if (set_is_subset(Seen, Roots)) {
+ auto Diff = set_difference(Roots, Seen);
+ if (Diff.size() == 1) {
+ Value *RootVal = *Diff.begin();
+ // Handle the case where previous loops already looked through
+ // an addrspacecast.
+ if (isSplitFatPtr(RootVal->getType()))
+ MaybeRsrc = std::get<0>(getPtrParts(RootVal));
+ else
+ MaybeRsrc = RootVal;
+ }
+ }
+ }
+
+ if (auto *PHI = dyn_cast<PHINode>(I)) {
+ Value *NewRsrc;
+ StructType *PHITy = cast<StructType>(PHI->getType());
+ IRB.SetInsertPoint(*PHI->getInsertionPointAfterDef());
+ IRB.SetCurrentDebugLocation(PHI->getDebugLoc());
+ if (MaybeRsrc) {
+ NewRsrc = *MaybeRsrc;
+ } else {
+ Type *RsrcTy = PHITy->getElementType(0);
+ auto *RsrcPHI = IRB.CreatePHI(RsrcTy, PHI->getNumIncomingValues());
+ RsrcPHI->takeName(Rsrc);
+ for (auto [V, BB] : llvm::zip(PHI->incoming_values(), PHI->blocks())) {
+ Value *VRsrc = std::get<0>(getPtrParts(V));
+ RsrcPHI->addIncoming(VRsrc, BB);
+ }
+ copyMetadata(RsrcPHI, PHI);
+ NewRsrc = RsrcPHI;
+ }
+
+ Type *OffTy = PHITy->getElementType(1);
+ auto *NewOff = IRB.CreatePHI(OffTy, PHI->getNumIncomingValues());
+ NewOff->takeName(Off);
+ for (auto [V, BB] : llvm::zip(PHI->incoming_values(), PHI->blocks())) {
+ assert(OffParts.count(V) && "An offset part had to be created by now");
+ Value *VOff = std::get<1>(getPtrParts(V));
+ NewOff->addIncoming(VOff, BB);
+ }
+ copyMetadata(NewOff, PHI);
+
+      // Note: We don't eraseFromParent() the temporaries because we don't want
+      // to put the corrections maps in an inconsistent state. That'll be
+      // handled during the rest of the killing. Also, `ValueToValueMapTy`
+      // guarantees that references in that map will be updated as well.
+ ConditionalTemps.push_back(cast<Instruction>(Rsrc));
+ ConditionalTemps.push_back(cast<Instruction>(Off));
+ Rsrc->replaceAllUsesWith(NewRsrc);
+ Off->replaceAllUsesWith(NewOff);
+
+ // Save on recomputing the cycle traversals in known-root cases.
+ if (MaybeRsrc)
+ for (Value *V : Seen)
+ FoundRsrcs[cast<Instruction>(V)] = NewRsrc;
+ } else if (auto *SI = dyn_cast<SelectInst>(I)) {
+ if (MaybeRsrc) {
+ ConditionalTemps.push_back(cast<Instruction>(Rsrc));
+ Rsrc->replaceAllUsesWith(*MaybeRsrc);
+ for (Value *V : Seen)
+ FoundRsrcs[cast<Instruction>(V)] = *MaybeRsrc;
+ }
+ } else {
+ llvm_unreachable("Only PHIs and selects go in the conditionals list");
+ }
+ }
+}
+
+void SplitPtrStructs::killAndReplaceSplitInstructions(
+ SmallVectorImpl<Instruction *> &Origs) {
+ for (Instruction *I : ConditionalTemps)
+ I->eraseFromParent();
+
+ for (Instruction *I : Origs) {
+ if (!SplitUsers.contains(I))
+ continue;
+
+ SmallVector<DbgValueInst *> Dbgs;
+ findDbgValues(Dbgs, I);
+ for (auto *Dbg : Dbgs) {
+ IRB.SetInsertPoint(Dbg);
+ auto &DL = I->getModule()->getDataLayout();
+ assert(isSplitFatPtr(I->getType()) &&
+ "We should've RAUW'd away loads, stores, etc. at this point");
+ auto *OffDbg = cast<DbgValueInst>(Dbg->clone());
+ copyMetadata(OffDbg, Dbg);
+ auto [Rsrc, Off] = getPtrParts(I);
+
+ int64_t RsrcSz = DL.getTypeSizeInBits(Rsrc->getType());
+ int64_t OffSz = DL.getTypeSizeInBits(Off->getType());
+
+ std::optional<DIExpression *> RsrcExpr =
+ DIExpression::createFragmentExpression(Dbg->getExpression(), 0,
+ RsrcSz);
+ std::optional<DIExpression *> OffExpr =
+ DIExpression::createFragmentExpression(Dbg->getExpression(), RsrcSz,
+ OffSz);
+ if (OffExpr) {
+ OffDbg->setExpression(*OffExpr);
+ OffDbg->replaceVariableLocationOp(I, Off);
+ IRB.Insert(OffDbg);
+ } else {
+ OffDbg->deleteValue();
+ }
+ if (RsrcExpr) {
+ Dbg->setExpression(*RsrcExpr);
+ Dbg->replaceVariableLocationOp(I, Rsrc);
+ } else {
+ Dbg->replaceVariableLocationOp(I, UndefValue::get(I->getType()));
+ }
+ }
+
+ Value *Poison = PoisonValue::get(I->getType());
+ I->replaceUsesWithIf(Poison, [&](const Use &U) -> bool {
+ if (const auto *UI = dyn_cast<Instruction>(U.getUser()))
+ return SplitUsers.contains(UI);
+ return false;
+ });
+
+ if (I->use_empty()) {
+ I->eraseFromParent();
+ continue;
+ }
+ IRB.SetInsertPoint(*I->getInsertionPointAfterDef());
+ IRB.SetCurrentDebugLocation(I->getDebugLoc());
+ auto [Rsrc, Off] = getPtrParts(I);
+ Value *Struct = PoisonValue::get(I->getType());
+ Struct = IRB.CreateInsertValue(Struct, Rsrc, 0);
+ Struct = IRB.CreateInsertValue(Struct, Off, 1);
+ copyMetadata(Struct, I);
+ Struct->takeName(I);
+ I->replaceAllUsesWith(Struct);
+ I->eraseFromParent();
+ }
+}
+
+void SplitPtrStructs::setAlign(CallInst *Intr, Align A, unsigned RsrcArgIdx) {
+ LLVMContext &Ctx = Intr->getContext();
+ Intr->addParamAttr(RsrcArgIdx, Attribute::getWithAlignment(Ctx, A));
+}
+
+void SplitPtrStructs::insertPreMemOpFence(AtomicOrdering Order,
+ SyncScope::ID SSID) {
+ switch (Order) {
+ case AtomicOrdering::Release:
+ case AtomicOrdering::AcquireRelease:
+ case AtomicOrdering::SequentiallyConsistent:
+ IRB.CreateFence(AtomicOrdering::Release, SSID);
+ break;
+ default:
+ break;
+ }
+}
+
+void SplitPtrStructs::insertPostMemOpFence(AtomicOrdering Order,
+ SyncScope::ID SSID) {
+ switch (Order) {
+ case AtomicOrdering::Acquire:
+ case AtomicOrdering::AcquireRelease:
+ case AtomicOrdering::SequentiallyConsistent:
+ IRB.CreateFence(AtomicOrdering::Acquire, SSID);
+ break;
+ default:
+ break;
+ }
+}
+
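+// Lower a load, store, or atomicrmw on a buffer fat pointer to a call to the
+// corresponding raw buffer intrinsic. As a sketch of the resulting pattern, a
+// release atomic store becomes a `fence release` followed by a call to
+// `@llvm.amdgcn.raw.ptr.buffer.store` with the GLC bit set in its cache
+// policy argument.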
+Value *SplitPtrStructs::handleMemoryInst(Instruction *I, Value *Arg, Value *Ptr,
+ Type *Ty, Align Alignment,
+ AtomicOrdering Order, bool IsVolatile,
+ SyncScope::ID SSID) {
+ IRB.SetInsertPoint(I);
+
+ auto [Rsrc, Off] = getPtrParts(Ptr);
+ SmallVector<Value *, 5> Args;
+ if (Arg)
+ Args.push_back(Arg);
+ Args.push_back(Rsrc);
+ Args.push_back(Off);
+ insertPreMemOpFence(Order, SSID);
+  // soffset is always 0 for these cases, where we always want any offset to be
+  // part of bounds checking and we don't know which parts of the GEP are
+  // uniform.
+ Args.push_back(IRB.getInt32(0));
+
+ uint32_t Aux = 0;
+ bool IsInvariant =
+ (isa<LoadInst>(I) && I->getMetadata(LLVMContext::MD_invariant_load));
+ bool IsNonTemporal = I->getMetadata(LLVMContext::MD_nontemporal);
+ // Atomic loads and stores need glc, atomic read-modify-write doesn't.
+ bool IsOneWayAtomic =
+ !isa<AtomicRMWInst>(I) && Order != AtomicOrdering::NotAtomic;
+ if (IsOneWayAtomic)
+ Aux |= AMDGPU::CPol::GLC;
+ if (IsNonTemporal && !IsInvariant)
+ Aux |= AMDGPU::CPol::SLC;
+ if (isa<LoadInst>(I) && ST->getGeneration() == AMDGPUSubtarget::GFX10)
+ Aux |= (Aux & AMDGPU::CPol::GLC ? AMDGPU::CPol::DLC : 0);
+ if (IsVolatile)
+ Aux |= AMDGPU::CPol::VOLATILE;
+ Args.push_back(IRB.getInt32(Aux));
+
+ Intrinsic::ID IID = Intrinsic::not_intrinsic;
+ if (isa<LoadInst>(I))
+ // TODO: Do we need to do something about atomic loads?
+ IID = Intrinsic::amdgcn_raw_ptr_buffer_load;
+ else if (isa<StoreInst>(I))
+ IID = Intrinsic::amdgcn_raw_ptr_buffer_store;
+ else if (auto *RMW = dyn_cast<AtomicRMWInst>(I)) {
+ switch (RMW->getOperation()) {
+ case AtomicRMWInst::Xchg:
+ IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_swap;
+ break;
+ case AtomicRMWInst::Add:
+ IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_add;
+ break;
+ case AtomicRMWInst::Sub:
+ IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_sub;
+ break;
+ case AtomicRMWInst::And:
+ IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_and;
+ break;
+ case AtomicRMWInst::Or:
+ IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_or;
+ break;
+ case AtomicRMWInst::Xor:
+ IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_xor;
+ break;
+ case AtomicRMWInst::Max:
+ IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_smax;
+ break;
+ case AtomicRMWInst::Min:
+ IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_smin;
+ break;
+ case AtomicRMWInst::UMax:
+ IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_umax;
+ break;
+ case AtomicRMWInst::UMin:
+ IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_umin;
+ break;
+ case AtomicRMWInst::FAdd:
+ IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_fadd;
+ break;
+ case AtomicRMWInst::FMax:
+ IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_fmax;
+ break;
+ case AtomicRMWInst::FMin:
+ IID = Intrinsic::amdgcn_raw_ptr_buffer_atomic_fmin;
+ break;
+ case AtomicRMWInst::FSub: {
+ report_fatal_error("atomic floating point subtraction not supported for "
+ "buffer resources and should've been expanded away");
+ break;
+ }
+ case AtomicRMWInst::Nand:
+ report_fatal_error("atomic nand not supported for buffer resources and "
+ "should've been expanded away");
+ break;
+ case AtomicRMWInst::UIncWrap:
+ case AtomicRMWInst::UDecWrap:
+ report_fatal_error("wrapping increment/decrement not supported for "
+ "buffer resources and should've ben expanded away");
+ break;
+ case AtomicRMWInst::BAD_BINOP:
+ llvm_unreachable("Not sure how we got a bad binop");
+ }
+ }
+
+ auto *Call = IRB.CreateIntrinsic(IID, Ty, Args);
+ copyMetadata(Call, I);
+ setAlign(Call, Alignment, Arg ? 1 : 0);
+ Call->takeName(I);
+
+ insertPostMemOpFence(Order, SSID);
+ // The "no moving p7 directly" rewrites ensure that this load or store won't
+ // itself need to be split into parts.
+ SplitUsers.insert(I);
+ I->replaceAllUsesWith(Call);
+ return Call;
+}
+
+PtrParts SplitPtrStructs::visitInstruction(Instruction &I) {
+ return {nullptr, nullptr};
+}
+
+PtrParts SplitPtrStructs::visitLoadInst(LoadInst &LI) {
+ if (!isSplitFatPtr(LI.getPointerOperandType()))
+ return {nullptr, nullptr};
+ handleMemoryInst(&LI, nullptr, LI.getPointerOperand(), LI.getType(),
+ LI.getAlign(), LI.getOrdering(), LI.isVolatile(),
+ LI.getSyncScopeID());
+ return {nullptr, nullptr};
+}
+
+PtrParts SplitPtrStructs::visitStoreInst(StoreInst &SI) {
+ if (!isSplitFatPtr(SI.getPointerOperandType()))
+ return {nullptr, nullptr};
+ Value *Arg = SI.getValueOperand();
+ handleMemoryInst(&SI, Arg, SI.getPointerOperand(), Arg->getType(),
+ SI.getAlign(), SI.getOrdering(), SI.isVolatile(),
+ SI.getSyncScopeID());
+ return {nullptr, nullptr};
+}
+
+PtrParts SplitPtrStructs::visitAtomicRMWInst(AtomicRMWInst &AI) {
+ if (!isSplitFatPtr(AI.getPointerOperand()->getType()))
+ return {nullptr, nullptr};
+ Value *Arg = AI.getValOperand();
+ handleMemoryInst(&AI, Arg, AI.getPointerOperand(), Arg->getType(),
+ AI.getAlign(), AI.getOrdering(), AI.isVolatile(),
+ AI.getSyncScopeID());
+ return {nullptr, nullptr};
+}
+
+// Unlike load, store, and RMW, cmpxchg needs special handling to account
+// for the boolean success value in its returned struct.
+PtrParts SplitPtrStructs::visitAtomicCmpXchgInst(AtomicCmpXchgInst &AI) {
+ Value *Ptr = AI.getPointerOperand();
+ if (!isSplitFatPtr(Ptr->getType()))
+ return {nullptr, nullptr};
+ IRB.SetInsertPoint(&AI);
+
+ Type *Ty = AI.getNewValOperand()->getType();
+ AtomicOrdering Order = AI.getMergedOrdering();
+ SyncScope::ID SSID = AI.getSyncScopeID();
+ bool IsNonTemporal = AI.getMetadata(LLVMContext::MD_nontemporal);
+
+ auto [Rsrc, Off] = getPtrParts(Ptr);
+ insertPreMemOpFence(Order, SSID);
+
+ uint32_t Aux = 0;
+ if (IsNonTemporal)
+ Aux |= AMDGPU::CPol::SLC;
+ if (AI.isVolatile())
+ Aux |= AMDGPU::CPol::VOLATILE;
+ auto *Call =
+ IRB.CreateIntrinsic(Intrinsic::amdgcn_raw_ptr_buffer_atomic_cmpswap, Ty,
+ {AI.getNewValOperand(), AI.getCompareOperand(), Rsrc,
+ Off, IRB.getInt32(0), IRB.getInt32(Aux)});
+ copyMetadata(Call, &AI);
+ setAlign(Call, AI.getAlign(), 2);
+ Call->takeName(&AI);
+ insertPostMemOpFence(Order, SSID);
+
+ Value *Res = PoisonValue::get(AI.getType());
+ Res = IRB.CreateInsertValue(Res, Call, 0);
+ if (!AI.isWeak()) {
+ Value *Succeeded = IRB.CreateICmpEQ(Call, AI.getCompareOperand());
+ Res = IRB.CreateInsertValue(Res, Succeeded, 1);
+ }
+ SplitUsers.insert(&AI);
+ AI.replaceAllUsesWith(Res);
+ return {nullptr, nullptr};
+}
+
+PtrParts SplitPtrStructs::visitGetElementPtrInst(GetElementPtrInst &GEP) {
+ Value *Ptr = GEP.getPointerOperand();
+ if (!isSplitFatPtr(Ptr->getType()))
+ return {nullptr, nullptr};
+ IRB.SetInsertPoint(&GEP);
+
+ auto [Rsrc, Off] = getPtrParts(Ptr);
+ Type *OffTy = Off->getType();
+ const DataLayout &DL = GEP.getModule()->getDataLayout();
+ bool InBounds = GEP.isInBounds();
+
+ // In order to call collectOffset() and thus not have to reimplement it,
+  // we need the GEP's pointer operand to have ptr addrspace(7) type.
+ GEP.setOperand(GEP.getPointerOperandIndex(),
+ PoisonValue::get(IRB.getPtrTy(AMDGPUAS::BUFFER_FAT_POINTER)));
+ MapVector<Value *, APInt> VariableOffs;
+ APInt ConstOffVal = APInt::getZero(BufferOffsetWidth);
+ if (!GEP.collectOffset(DL, BufferOffsetWidth, VariableOffs, ConstOffVal))
+ report_fatal_error("Scalable vector or unsized struct in fat pointer GEP");
+ GEP.setOperand(GEP.getPointerOperandIndex(), Ptr);
+ Value *OffAccum = nullptr;
+ // Accumulate offsets together before adding to the base in order to preserve
+ // as many of the inbounds properties as possible.
+ for (auto [Arg, Multiple] : VariableOffs) {
+ if (auto *OffVecTy = dyn_cast<VectorType>(OffTy))
+ if (!Arg->getType()->isVectorTy())
+ Arg = IRB.CreateVectorSplat(OffVecTy->getElementCount(), Arg);
+ Arg = IRB.CreateIntCast(Arg, OffTy, /*isSigned=*/true);
+ if (Multiple.isPowerOf2())
+ Arg = IRB.CreateShl(Arg, BufferOffsetWidth, "", /*hasNUW=*/InBounds,
----------------
piotrAMD wrote:
I do not understand this line, and I can't see this being exercised in any of the tests. Surely, the accumulated offset is not meant to be shifted by its width.
https://github.com/llvm/llvm-project/pull/77952