[PATCH] D105253: GlobalISel: Handle lowering non-power-of-2 extloads

Thu Jul 1 11:20:07 PDT 2021

arsenm added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-load-global.mir:15995-16020
+    ; GFX9-MESA: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 8
+    ; GFX9-MESA: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[LOAD]], [[C1]](s32)
+    ; GFX9-MESA: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
+    ; GFX9-MESA: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[LOAD]], [[C2]](s32)
+    ; GFX9-MESA: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 24
+    ; GFX9-MESA: [[LSHR2:%[0-9]+]]:_(s32) = G_LSHR [[LOAD]], [[C3]](s32)
+    ; GFX9-MESA: [[DEF:%[0-9]+]]:_(s32) = G_IMPLICIT_DEF
----------------
foad wrote:
> Why is this so much more convoluted than the GFX9-HSA case? The loads look the same, it's just all the shifting and ORing afterwards that looks crazy here.
This is an artifact from the terrible way we currently handle unaligned accesses. We treat an unaligned access as a narrowScalar action, which doesn't really make sense. I'm trying to move towards making widenScalar/narrowScalar only touch the register size and leave the memory access alone. Unaligned access decomposition is a kind of lowering, and is only tangentially related to the register types needed after legalization.

The HSA case enables unaligned access and the Mesa case doesn't, so for Mesa we start out by reporting that the s32 result needs to be narrowed to s24. Legalizing that narrowed load is what produces this mess. Once lowering handles alignment decomposition, the two outputs should look the same.
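As a rough illustration of what "unaligned access decomposition" has to compute, the sketch below models in plain C++ the value a 3-byte (s24) extending load into s32 must produce when it is broken into narrower accesses and reassembled with shifts and ORs. It assumes a little-endian layout (as on AMDGPU). The function names are hypothetical, and this is not GlobalISel code; it does not reproduce the exact MIR the legalizer emits, only the semantics any correct decomposition must preserve.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: zero-extending 24-bit load into 32 bits, assuming a
// little-endian layout. Every access is a single byte, so no alignment is
// required; this models the fully decomposed form (byte loads + shifts + ORs).
uint32_t zextload24_bytes(const uint8_t *p) {
  assert(p != nullptr);
  uint32_t b0 = p[0];
  uint32_t b1 = p[1];
  uint32_t b2 = p[2];
  return b0 | (b1 << 8) | (b2 << 16);
}

// Hypothetical helper: the same load split into a 16-bit piece and an 8-bit
// piece. The 16-bit piece is assembled from bytes here only to keep the
// sketch independent of the host's endianness and alignment rules.
uint32_t zextload24_split(const uint8_t *p) {
  assert(p != nullptr);
  uint32_t lo16 = static_cast<uint32_t>(p[0]) |
                  (static_cast<uint32_t>(p[1]) << 8);
  uint32_t hi8 = p[2];
  return lo16 | (hi8 << 16);
}

// Hypothetical helper: sign-extending variant. Sign-extends from bit 23 with
// a shift-left / arithmetic-shift-right pair, in the style of a G_SEXT_INREG
// lowering. (The signed conversion and right shift are well defined in C++20
// and behave this way on mainstream compilers before that.)
int32_t sextload24_bytes(const uint8_t *p) {
  uint32_t v = zextload24_bytes(p);
  return static_cast<int32_t>(v << 8) >> 8;
}
```

Both zero-extending variants must agree for every input, whichever access widths the target can legally use; only the number of loads and the shift/OR glue differ, which is the difference visible between the GFX9-HSA and GFX9-MESA check lines.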


https://reviews.llvm.org/D105253