[PATCH] D81638: AMDGPU/GlobalISel: Fix 96 and 128 local loads and stores

Mon Jun 29 10:48:16 PDT 2020

mbrkusanin marked 3 inline comments as done.
mbrkusanin added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructions.td:507
+def store_align4_local: PatFrag<(ops node:$val, node:$ptr),
+                                (store_local node:$val, node:$ptr)>, Aligned<4> {
+  let IsStore = 1;
----------------
arsenm wrote:
> I think the Aligned<> subclasses didn't actually work for some reason, but I only half fixed the patterns maybe?
I changed it so now Aligned<> subclasses are used for both load and store. They seem to work fine.

================
Comment at: llvm/lib/Target/AMDGPU/DSInstructions.td:683-705
+let SubtargetPredicate = isGFX7GFX8 in {
+
+foreach vt = VReg_96.RegTypes in {
+defm : DSReadPat_mc <DS_READ_B96, vt, "load_align16_local">;
+}
+
+foreach vt = VReg_128.RegTypes in {
----------------
arsenm wrote:
> You shouldn't need to re-consider the legalization logic. The selector can mostly assume legal inputs. If the less aligned version wasn't legal, it should have been broken down. This also depends more specifically on the unaligned features, rather than gfx78
I couldn't get rid of subtarget predicates because of the way SDag uses allowsMisalignedMemoryAccessesImpl. For example on gfx7/8, ds_read_b128 requires alignment of 16, but we need to say that alignment of 8 is also okay because we can pick ds_read2_b64. GISel however just sees that alignment of 8 is okay and picks ds_read_b128 instead of ds_read2_b64. If both are acceptable according to DSInstructions.td then GIsel will pick the first one (If i change the order in .td file and move it up it will actually pick ds_read2_b64 but that breaks any structure that file had).

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:1365-1386
+    if (Size == 64) {
+      // ds_read/write_b64 require 8-byte alignment, but we can do a 4 byte
+      // aligned, 8 byte access in a single operation using ds_read2/write2_b32
+      // with adjacent offsets.
+      bool AlignedBy4 = (Align % 4 == 0);
+      if (IsFast)
+        *IsFast = AlignedBy4;
----------------
arsenm wrote:
> Can you commit this as a separate patch? This changes the DAG path too. I also don't think "unaligned" here means requires 4 byte alignment
Moved to D82788

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81638/new/

https://reviews.llvm.org/D81638