[PATCH] D119216: [AMDGPU] replace hostcall module flag with function attribute

Sameer Sahasrabuddhe via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Feb 8 09:01:44 PST 2022

sameerds added inline comments.

Comment at: llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp:198
+    const auto *STI = TM.getMCSubtargetInfo();
+    return llvm::AMDGPU::getHostcallImplicitArgPosition(STI);
+  }
arsenm wrote:
> The ABI should not be a property of the subtarget, and the global subtarget shouldn't be used
I don't understand what the objection is. Existing functions that check the ABI version clearly do so by accessing the subtarget. I am merely following existing practice.

I have now rearranged the code a bit. Maybe this works? To be honest, I am not very familiar with how ABI information is tracked. I will heartily welcome any advice on how to retrieve the ABI version and then determine the location of the hostcall implicit arg.

Comment at: llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp:521
+      while (!WorkList.empty()) {
+        auto UseInfo = WorkList.back();
arsenm wrote:
> Can you use checkForAllUses instead of creating your own worklist?
checkForAllUses does not track sufficient state to catch every load that happens to overlap with the hostcall pointer. The only pattern likely to happen in real life is "GEP to offset 24, typecast to i64*, load 8 bytes". But unless we can guarantee that this is the only pattern, we need something robust.
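
To make "overlap" concrete, here is a minimal sketch of the byte-range test involved. It is not code from the patch: `ByteRange`, `overlaps` and `touchesHostcallPtr` are hypothetical names, and the slot position (8 bytes at offset 24) is simply taken from the pattern described above.

```cpp
#include <cstdint>

// Hypothetical half-open byte range [Offset, Offset + Size), measured
// relative to the implicit argument base pointer.
struct ByteRange {
  int64_t Offset;
  int64_t Size;
};

// Two half-open ranges overlap iff each one starts before the other ends.
constexpr bool overlaps(ByteRange A, ByteRange B) {
  return A.Offset < B.Offset + B.Size && B.Offset < A.Offset + A.Size;
}

// Assumption for illustration: the hostcall pointer occupies 8 bytes at
// offset 24, matching the "GEP to offset 24, load 8 bytes" pattern.
constexpr ByteRange HostcallPtr{24, 8};

constexpr bool touchesHostcallPtr(ByteRange Access) {
  return overlaps(Access, HostcallPtr);
}

static_assert(touchesHostcallPtr({24, 8}), "the expected i64 load");
static_assert(touchesHostcallPtr({28, 4}), "a narrower load inside the slot");
static_assert(touchesHostcallPtr({16, 16}), "a wider load straddling the slot");
static_assert(!touchesHostcallPtr({16, 8}), "the neighbouring slot only");
```

The second and third cases are the ones a matcher for the single "expected" pattern would miss, which is why the check has to work on offsets and sizes rather than on one instruction sequence.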

@jdoerfert pointed me in the correct direction to make full use of existing machinery, but that is now dependent on D119249.

Comment at: llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp:581-584
+      if (ArgUsedToRetrieveHostcallPtr(I)) {
+        return false;
+      }
+      return true;
jdoerfert wrote:
> I'd suggest switching the return value of ArgUsed... so it matches this function's (and other callbacks').
Agreed. In fact switching the return value allowed me to merge the two callbacks into one. I am still keeping the positive name on the outermost function "funcRetrievesHostcallPtr" because it matches the sense near the callsite.
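
Roughly, the convention being matched looks like the sketch below. This is a simplified stand-in and not the Attributor API: `checkForAllAccesses`, `Access` and the function body are hypothetical, and only the name `funcRetrievesHostcallPtr` comes from the patch. The callback returns true to continue and false to abort, while the outermost function keeps its positive sense by negating the result.

```cpp
#include <functional>
#include <vector>

// Hypothetical model of a memory access: byte offset and size relative to
// the implicit argument base pointer.
struct Access {
  long Offset;
  long Size;
};

// Hypothetical stand-in for a checkForAll*-style helper: visits every access
// and stops at the first one for which the callback returns false.
bool checkForAllAccesses(const std::vector<Access> &Accesses,
                         const std::function<bool(const Access &)> &Pred) {
  for (const Access &A : Accesses)
    if (!Pred(A))
      return false; // The callback aborted the traversal.
  return true;      // Every access was accepted.
}

// Positive name at the outermost level; the callback itself uses the
// "true means keep going" sense. Assumes the slot is 8 bytes at offset 24.
bool funcRetrievesHostcallPtr(const std::vector<Access> &Accesses) {
  auto DoesNotTouchHostcallPtr = [](const Access &A) {
    return A.Offset + A.Size <= 24 || A.Offset >= 32;
  };
  return !checkForAllAccesses(Accesses, DoesNotTouchHostcallPtr);
}
```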

Comment at: llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp:566
+      return false;
+    };
jdoerfert wrote:
> jdoerfert wrote:
> > sameerds wrote:
> > > jdoerfert wrote:
> > > > You can use AAPointerInfo for the call site return IRPosition. It will (through the iterations) gather all accesses and put them into "bins" based on offset and size. It deals with uses in calls, etc. and if there is stuff missing it is better to add it in one place so we benefit throughout. 
> > > I am not following what you have in mind. "implicitarg_ptr" is a pointer returned by an intrinsic that reads an ABI-defined register. I need to check that for a given call-graph, a particular range of bytes relative to that base pointer is never accessed. The above DFS on the uses conservatively assumes that such a load exists unless it can conclusively trace every use of the base pointer. This may include the pointer being passed to an extern function or being stored into a different memory location (although we don't expect ABI registers being captured this way). I am not seeing how to construct this around AAPointerInfo. As far as I can see, the public interface only talks about uses that are recognized as loads and stores.
> > Not actually tested, replaces the function body. Depends on D119249.
> > ```
> > const auto &PointerInfoAA = A.getAAFor<AAPointerInfo>(*this, IRPosition::callsite_returned(cast<CallBase>(Ptr)), DepClassTy::REQUIRED);
> > if (!PointerInfoAA.getState().isValidState())
> >   return true; // Abort (which is weird as false is abort in the other CB).
> > AAPointerInfo::OffsetAndSize OAS(*Position, /* probably look pointer width up in DL */ 8);
> > return !PointerInfoAA.forallInterferingAccesses(OAS, [](const AAPointerInfo::Access &Acc, bool IsExact) {
> >    return Acc.getRemoteInst()->isDroppable(); });
> > ```
> You don't actually need the state check.
> And as I said, this will take care of following pointers passed into callees or through memory to other places, all while ignoring dead code, etc.
I see now. forallInterferingAccesses() does check for valid state on entry, which is sufficient to take care of all the opaque uses like a call to an extern function or a complicated phi or a capturing store. Thanks a ton ... this has been very educational!

  rG LLVM Github Monorepo


