[PATCH] D81638: AMDGPU/GlobalISel: Fix 96 and 128 local loads and stores

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 22 12:54:43 PDT 2020


arsenm added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUInstructions.td:507
+def store_align4_local: PatFrag<(ops node:$val, node:$ptr),
+                                (store_local node:$val, node:$ptr)>, Aligned<4> {
+  let IsStore = 1;
----------------
I think the Aligned<> subclasses didn't actually work for some reason, but I only half fixed the patterns maybe?


================
Comment at: llvm/lib/Target/AMDGPU/DSInstructions.td:683-705
+let SubtargetPredicate = isGFX7GFX8 in {
+
+foreach vt = VReg_96.RegTypes in {
+defm : DSReadPat_mc <DS_READ_B96, vt, "load_align16_local">;
+}
+
+foreach vt = VReg_128.RegTypes in {
----------------
You shouldn't need to re-consider the legalization logic. The selector can mostly assume legal inputs. If the less aligned version wasn't legal, it should have been broken down. This also depends more specifically on the unaligned features, rather than gfx78


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:1365-1386
+    if (Size == 64) {
+      // ds_read/write_b64 require 8-byte alignment, but we can do a 4 byte
+      // aligned, 8 byte access in a single operation using ds_read2/write2_b32
+      // with adjacent offsets.
+      bool AlignedBy4 = (Align % 4 == 0);
+      if (IsFast)
+        *IsFast = AlignedBy4;
----------------
Can you commit this as a separate patch? This changes the DAG path too. I also don't think "unaligned" here means requires 4 byte alignment


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81638/new/

https://reviews.llvm.org/D81638





More information about the llvm-commits mailing list