[PATCH] D158463: [AMDGPU] Add IR-level pass to rewrite away address space 7

Wed Sep 13 04:45:08 PDT 2023

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:1
+//===-- AMDGPULowerBufferFatPointers.cpp ------------------*- C++ -*-=//
+//
----------------
Not padded out to same column as the later header lines 

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:13
+//
+// This pass returns, doing nothing, on r600.
+//
----------------
I wouldn't bother mentioning this and just not add the pass for r600 

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:43
+// `ptr addrspace(8) %x.rsrc` and `ptr addrspace(6) %x.off`, which will be
+// combined into `{ptr addrspace(8), ptr addrspace(6)} %x = {%x.rsrc, %x.off}`
+// if needed. Similarly, `vector<Nxp7>` becomes `{vector<Nxp8>, vector<Nxp6>}`
----------------
addrspace(6) is not just 32-bit, it's 32-bit constant. We shouldn't be assuming these are only const accesses. Should the offset just not be a pointer at all?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:53
+//
+// The first phase is to rewrite away all loads and stors of `ptr addrspace(7)`
+// containing values to corresponding `i160` values. This is handled by
----------------

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:54
+// The first phase is to rewrite away all loads and stors of `ptr addrspace(7)`
+// containing values to corresponding `i160` values. This is handled by
+// `StoreFatPtrsAsIntsVisitor` , which visits loads, stores, and allocas
----------------
"handled by containing values"? Not sure what this means

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:63
+// Such a transformation allows the later phases of the pass to not need
+// to handle buffer fat pointers moving to and from memory, where we loud
+// have to handle the incompatibility between a `{Nxp8, Nxp6}` representation
----------------
where we 'loud'?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:67
+// of resources and vectors of offsets are concatentated before being stored to
+// memory) are handled through ilpmententing `inttoptr` and `ptrtoint` only.
+//
----------------

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:69-70
+//
+// Atomics involving `ptr addrspace(7)` are not supported, as such values are
+// too large to be atomically processed in any case.
+//
----------------
atomic load and store should work fine 

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:79
+// This uses a `BufferFatPtrToStructTypeMap` and a `FatPtrConstMaterializer`
+// to, usually by way of `setType`ing values. Constans are handled here
+// because there isn't a good way to fix them up later.
----------------

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:107
+// Split users do not necessarily need to produce parts themselves (
+// a `load f32, ptr addrspace(7)` does not, for example), but, if they do not
+// generate fat buffer pointers, they must RAUW in their replacement
----------------

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:165
+//    fragments of the underlying source language variable.
+// 2. All uses that were themselves split are replaced by an `undef` of the
+//    struct type, as they will themselves be erased soon. This rule, combined
----------------

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:240-241
+protected:
+  virtual Type *remapScalar(PointerType *PT) = 0;
+  virtual Type *remapVector(VectorType *VT) = 0;
+
----------------
What's the point in having these be separate?

Also not sure why these are virtual 

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:259
+  Type *remapScalar(PointerType *PT) override {
+    return Type::getIntNTy(PT->getContext(), DL.getPointerTypeSizeInBits(PT));
+  }
----------------
This is just getIntPtrType

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:263
+    return VT->getWithNewType(
+        Type::getIntNTy(VT->getContext(), DL.getPointerTypeSizeInBits(VT)));
+  }
----------------
This is just getIntPtrType

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:283
+    Type *Ty, SmallPtrSetImpl<StructType *> &Seen) {
+  Type **Entry = &Map[Ty];
+  if (*Entry)
----------------
Type *&

Could you also move all the core logic to a separate function, such that every return doesn't need to consider the assignment to the map?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:298
+  }
+  bool IsUniqued = !isa<StructType>(Ty) || cast<StructType>(Ty)->isLiteral();
+  // Base case for ints, floats, opaque pointers, and so on, which don't
----------------
dyn_cast? Also could make a predicate function and clarify what uniqued means 

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:348
+  llvm_unreachable("Unknown type of type that contains elements");
+  return nullptr;
+}
----------------
remove dead return 

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:360
+      PointerType::get(Ctx, AMDGPUAS::BUFFER_RESOURCE),
+      PointerType::get(Ctx, AMDGPUAS::CONSTANT_ADDRESS_32BIT));
+}
----------------
Don't really think CONSTANT_ADDRESS_32BIT is appropriate for this 

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:412
+/// down into resource and offset parts). This has the downside of imposing
+/// marshaling costs when reading or storing these values, but since placing
+/// such pointers into memory is an uncommon operation at best, we feel that
----------------

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:587
+  if (auto *AZ = dyn_cast<ConstantAggregateZero>(C))
+    return std::make_pair(AZ->getStructElement(0), AZ->getStructElement(1));
+  if (auto *SC = dyn_cast<ConstantStruct>(C))
----------------
can use {}?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULowerBufferFatPointers.cpp:1831
+    F->replaceAllUsesWith(NewF);
+    F->setName("");
+    M.getFunctionList().insert(F->getIterator(), NewF);
----------------
Could also create an anonymous function and use takeName. Are you getting an added suffix now?

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:14604-14605

-    if (AMDGPU::isFlatGlobalAddrSpace(AS) &&
+    if ((AMDGPU::isFlatGlobalAddrSpace(AS) ||
+         AS == AMDGPUAS::BUFFER_FAT_POINTER) &&
         Subtarget->hasAtomicFaddNoRtnInsts()) {
----------------
Probably should have an additional predicate function for this. It's separate but not really from global 

================
Comment at: llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-p7-in-memory.ll:34
+; CHECK-SAME: (ptr [[A:%.*]], ptr [[B:%.*]]) #[[ATTR0]] {
+; CHECK-NEXT:    [[X:%.*]] = load <4 x i160>, ptr [[A]], align 128
+; CHECK-NEXT:    [[TMP1:%.*]] = lshr <4 x i160> [[X]], <i160 32, i160 32, i160 32, i160 32>
----------------
Is this alignment correct? I thought p7 was 256 bit / align 32

================
Comment at: llvm/test/CodeGen/AMDGPU/lower-buffer-fat-pointers-p7-in-memory.ll:161
+  ret void
+}
----------------
Also load of array and load of unpacked struct?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158463/new/

https://reviews.llvm.org/D158463