[PATCH] D92483: AMDGPU - Use MUBUF instructions for global address space access

praveen velliengiri via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 16 08:54:58 PST 2020


pvellien added a comment.

Comments inline



================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:192
     TargetTriple(TT),
-    Gen(TT.getOS() == Triple::AMDHSA ? SEA_ISLANDS : SOUTHERN_ISLANDS),
+    Gen(!GPU.contains("generic") ? SOUTHERN_ISLANDS :
+        (TT.getOS() == Triple::AMDHSA ? SEA_ISLANDS : SOUTHERN_ISLANDS)),
----------------
rampitec wrote:
> pvellien wrote:
> > rampitec wrote:
> > > pvellien wrote:
> > > > rampitec wrote:
> > > > > pvellien wrote:
> > > > > > rampitec wrote:
> > > > > > > pvellien wrote:
> > > > > > > > t-tye wrote:
> > > > > > > > > Is this asking if the code contains use of the generic address space?
> > > > > > > > No, this isn't a check for generic address space. The GPU string is set to a string "generic" by default when no target cpu is specified. It has nothing to do with address spaces access.
> > > > > > > > when OS is HSA it is set to "generic-hsa" and in all other cases it is set to "generic".
> > > > > > > I would probably just error out if we have a generic CPU + amdhsa requested. It seems cleaner.
> > > > > > Thanks for your comment, this patch preserves the earlier functionality except the use of mubuf. Perhaps we can address your suggestion in a separate patch with some added context, and exploring the possibilities of reporting the error in other target-triple, cpu pairs? 
> > > > > It is OK to have it in a separate commit, but I do not follow logic here. You are setting SI for any NON-generic chip.
> > > > For a non-generic cpu, the correct Gen member is set by the ParseSubtargetFeatures function call (generated by tablegen) in initializeSubtargetDependencies function. It is similar to the previous version of this code, where Gen is set to SI if OS is not HSA and set to CI if OS is HSA and correct Gen is set by the call to ParseSubtargetFeatures function. 
> > > Then this line is a no-op? You say: "for a non-generic CPU gen is initialized elsewhere". An this change only forces SI for a non-generic CPU.
> > Actually there are bunch of lit tests which assumes Gen as CI if cpu isn't specified with AMDHSA OS. This change just means for HSA OS if the cpu isn't specified it initializes to CI, for non-HSA if cpu isn't specified it initializes to SI. 
> > Do I want to change that  if cpu isn't specified, then all mtriple assumes to SI? 
> > 
> I still do not understand how does that correspond to the condition. That is the old condition:
> 
> ```
> TT.getOS() == Triple::AMDHSA ? SEA_ISLANDS : SOUTHERN_ISLANDS
> ```
> And that is literally what you have described: "This change just means for HSA OS if the cpu isn't specified it initializes to CI, for non-HSA if cpu isn't specified it initializes to SI".
> 
> At the same time this code:
> 
> ```
> !GPU.contains("generic") ? SOUTHERN_ISLANDS
> ```
> Reads: use SI for any CPU besides generic regardless of the triple.
Sorry for the delay. I think I didn't explain it very well in my previous comments. So I take an attempt to describe here: 
The problem is that gfx600 processors does not support flat instructions for global load/stores. But the compiler tries to use flat, which result in a crash in instruction selection. The root cause of this problem is due to the initialization in the code:
```
TT.getOS() == Triple::AMDHSA ? SEA_ISLANDS : SOUTHERN_ISLANDS
```
for HSA OS type and -mcpu=gfx600 the above code forces Gen member variable to be SEA_ISLANDS where it should have been SOUTHERN_ISLANDS. This bug only shows up when you try to use HSA OS for -mcpu=gfx600 processors. 

What I'm trying to do is initialize the Gen member variable in following cases:
when OS is HSA
   if mcpu is specified, set Gen to the correct generation that cpu belongs to, by initializing it with SOUTHERN_ISLANDS.
   if mcpu is not specified (in this case GPU = "generic-hsa"), set Gen to the SEA_LANDS to be in parity with previous code

I didn't see how I can pick up the correct generation that the processor belong to without forcing this in constructor. I would like to know you thoughts on this? 
Please let me know if there is any other way to solve this? 



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92483/new/

https://reviews.llvm.org/D92483



More information about the llvm-commits mailing list