[PATCH] D134635: [AMDGPU][GlobalISel] Add Shift/Shufflevector Combine

Tue Sep 27 03:28:45 PDT 2022

Pierre-vh added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp:336-338
+  // Fold (G_LSHR (G_BITCAST (G_SHUFFLE_VECTOR %a, %b, shufflemask(1, ?))), K)
+  // into a simple G_BITCAST(%a) if the shift amount (K) is 1/2 of the
+  // destination type and the vector types have 2 elements.
----------------
foad wrote:
> Pierre-vh wrote:
> > foad wrote:
> > > This doesn't make sense to me:
> > > - The original has 0s in its high bits but the replacement does not.
> > > - AMDGPU is little-endian so `(G_BITCAST (G_SHUFFLE_VECTOR %a, %b, shufflemask(1, ?)))` puts the //high// bits of %a in the //low// bits of the result - and then the G_LSHR shifts them out completely.
> > This is really a headache to understand for me, but IIRC endianness doesn't affect values themselves, but how they're represented in memory? (https://stackoverflow.com/questions/7184789/does-bit-shift-depend-on-endianness)
> > 
> > So I think that G_SHUFFLE_VECTOR (1,?) puts the low bits in the high bits of the results, and the LSHR puts them back in place.
> > 
> > ```
> > %a:(<2 x s16>) = [0xDEAD, 0xBEEF]
> > %b:(<2 x s16>) = undef
> > %c = G_SHUFFLE_VECTOR %a, %b, shufflemask(1,0)
> > ; %c = [0xBEEF, 0xDEAD] 
> > %d = G_LSHR %c, 16
> > ; %d = [0x0000, 0xBEEF] 
> > ```
> > 
> > So the lower 16 words stay the same, but indeed a difference before/after the combine is that the high words have 0000 if we leave the combine, but now they have "junk".
> > 
> > Maybe an additional trunc is needed in between or something?
> > 
> > What do you think @arsenm ?
> > 
> I think you need to check for a mask of (?,0) instead of (1,?).
I think the mask is fine, for instance, in the test this is trying to fix, there's those two shufflevectors that need to be folded away to match the DAG codegen
```
  %3:_(<2 x s16>) = G_SHUFFLE_VECTOR %0:_(<2 x s16>), %4:_, shufflemask(1, 0)
  %37:_(s32) = G_BITCAST %3:_(<2 x s16>)
  %33:_(s32) = G_CONSTANT i32 16
  %38:_(s32) = G_LSHR %37:_, %33:_(s32)
  %29:_(s16) = G_TRUNC %38:_(s32)

  %5:_(<2 x s16>) = G_SHUFFLE_VECTOR %2:_(<2 x s16>), %4:_, shufflemask(1, 1)
  %32:_(s32) = G_BITCAST %5:_(<2 x s16>)
  %34:_(s32) = G_LSHR %32:_, %33:_(s32)
  %21:_(s16) = G_TRUNC %34:_(s32)
``` 

Though to make this fully correct I think I need to take the trunc into account, so I will do that.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134635/new/

https://reviews.llvm.org/D134635