[PATCH] D134635: [AMDGPU][GlobalISel] Add Shift/Shufflevector Combine

Tue Sep 27 03:14:44 PDT 2022

foad added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp:336-338
+  // Fold (G_LSHR (G_BITCAST (G_SHUFFLE_VECTOR %a, %b, shufflemask(1, ?))), K)
+  // into a simple G_BITCAST(%a) if the shift amount (K) is 1/2 of the
+  // destination type and the vector types have 2 elements.
----------------
Pierre-vh wrote:
> foad wrote:
> > This doesn't make sense to me:
> > - The original has 0s in its high bits but the replacement does not.
> > - AMDGPU is little-endian so `(G_BITCAST (G_SHUFFLE_VECTOR %a, %b, shufflemask(1, ?)))` puts the //high// bits of %a in the //low// bits of the result - and then the G_LSHR shifts them out completely.
> This is really a headache to understand for me, but IIRC endianness doesn't affect values themselves, but how they're represented in memory? (https://stackoverflow.com/questions/7184789/does-bit-shift-depend-on-endianness)
> 
> So I think that G_SHUFFLE_VECTOR (1,?) puts the low bits in the high bits of the results, and the LSHR puts them back in place.
> 
> ```
> %a:(<2 x s16>) = [0xDEAD, 0xBEEF]
> %b:(<2 x s16>) = undef
> %c = G_SHUFFLE_VECTOR %a, %b, shufflemask(1,0)
> ; %c = [0xBEEF, 0xDEAD] 
> %d = G_LSHR %c, 16
> ; %d = [0x0000, 0xBEEF] 
> ```
> 
> So the lower 16 words stay the same, but indeed a difference before/after the combine is that the high words have 0000 if we leave the combine, but now they have "junk".
> 
> Maybe an additional trunc is needed in between or something?
> 
> What do you think @arsenm ?
> 
I think you need to check for a mask of (?,0) instead of (1,?).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134635/new/

https://reviews.llvm.org/D134635