[PATCH] D81638: AMDGPU/GlobalISel: Fix 96-bit local loads

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jun 12 07:31:22 PDT 2020


arsenm added a comment.

In D81638#2089681 <https://reviews.llvm.org/D81638#2089681>, @mbrkusanin wrote:

> Yes, it basically avoids problems of not being able to select 3x32 for local address space. SDag was breaking these down to a ds_read_b64 and ds_read_b32 so I did the same thing for GlobalISel.
>
> I've looked at .td files and it seems that the following pattern can be added so ds_read_b96 can be selected
>
>   foreach vt = VReg_64.RegTypes in {
>   defm : DSReadPat_mc <DS_READ_B64, vt, "load_alignX_local">;
>   }
>
>
>   but I'm not sure what should the minimal alignment (X) be for this specific instruction. Any idea? For alignment of 4 every test will pass but, otherwise we'll need to break some cases to b64, b32 pairs.


I'm not sure what the alignment requirement is (I think it also depends on whether unaligned DS access is supported, which we never fully implemented in the compiler).

FYI I am working on completely rewriting the load/store legalization rules and moving this into custom lowering



================
Comment at: llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:275
   case 96:
-    if (!ST.hasDwordx3LoadStores())
+    if (!ST.hasDwordx3LoadStores() || AS == AMDGPUAS::LOCAL_ADDRESS)
       return false;
----------------
mbrkusanin wrote:
> arsenm wrote:
> > The address space doesn't make this special? This willl break SI?
> SI will be fine because hasDwordx3LoadStores will be false. Local address space uses ds_read and ds_write so the name is slightly confusing.
> 
I mean all the logic here should work without special casing the address space here


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81638/new/

https://reviews.llvm.org/D81638





More information about the llvm-commits mailing list