[PATCH] D81638: AMDGPU/GlobalISel: Fix 96 and 128 local loads and stores

Mon Jun 22 12:54:43 PDT 2020

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructions.td:507
+def store_align4_local: PatFrag<(ops node:$val, node:$ptr),
+                                (store_local node:$val, node:$ptr)>, Aligned<4> {
+  let IsStore = 1;
----------------
I think the Aligned<> subclasses didn't actually work for some reason, but I only half fixed the patterns maybe?

================
Comment at: llvm/lib/Target/AMDGPU/DSInstructions.td:683-705
+let SubtargetPredicate = isGFX7GFX8 in {
+
+foreach vt = VReg_96.RegTypes in {
+defm : DSReadPat_mc <DS_READ_B96, vt, "load_align16_local">;
+}
+
+foreach vt = VReg_128.RegTypes in {
----------------
You shouldn't need to re-consider the legalization logic. The selector can mostly assume legal inputs. If the less aligned version wasn't legal, it should have been broken down. This also depends more specifically on the unaligned features, rather than gfx78

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:1365-1386
+    if (Size == 64) {
+      // ds_read/write_b64 require 8-byte alignment, but we can do a 4 byte
+      // aligned, 8 byte access in a single operation using ds_read2/write2_b32
+      // with adjacent offsets.
+      bool AlignedBy4 = (Align % 4 == 0);
+      if (IsFast)
+        *IsFast = AlignedBy4;
----------------
Can you commit this as a separate patch? This changes the DAG path too. I also don't think "unaligned" here means requires 4 byte alignment

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81638/new/

https://reviews.llvm.org/D81638