[PATCH] D134635: [AMDGPU][GlobalISel] Add Shift/Shufflevector Combine

Jay Foad via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Sep 29 01:37:17 PDT 2022


foad added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPURegBankCombiner.cpp:336-338
+  // Fold (G_LSHR (G_BITCAST (G_SHUFFLE_VECTOR %a, %b, shufflemask(1, ?))), K)
+  // into a simple G_BITCAST(%a) if the shift amount (K) is 1/2 of the
+  // destination type and the vector types have 2 elements.
----------------
Pierre-vh wrote:
> foad wrote:
> > Pierre-vh wrote:
> > > foad wrote:
> > > > This doesn't make sense to me:
> > > > - The original has 0s in its high bits but the replacement does not.
> > > > - AMDGPU is little-endian so `(G_BITCAST (G_SHUFFLE_VECTOR %a, %b, shufflemask(1, ?)))` puts the //high// bits of %a in the //low// bits of the result - and then the G_LSHR shifts them out completely.
> > > This is really a headache for me to understand, but IIRC endianness doesn't affect values themselves, only how they're represented in memory? (https://stackoverflow.com/questions/7184789/does-bit-shift-depend-on-endianness)
> > > 
> > > So I think that G_SHUFFLE_VECTOR (1,?) puts the low bits in the high bits of the result, and the LSHR puts them back in place.
> > > 
> > > ```
> > > %a:(<2 x s16>) = [0xDEAD, 0xBEEF]
> > > %b:(<2 x s16>) = undef
> > > %c = G_SHUFFLE_VECTOR %a, %b, shufflemask(1,0)
> > > ; %c = [0xBEEF, 0xDEAD] 
> > > %d = G_LSHR %c, 16
> > > ; %d = [0x0000, 0xBEEF] 
> > > ```
> > > 
> > > So the low 16 bits stay the same, but indeed a difference before/after the combine is that the high 16 bits are 0x0000 if we leave the combine out, whereas with it they contain "junk".
> > > 
> > > Maybe an additional trunc is needed in between or something?
> > > 
> > > What do you think @arsenm ?
> > > 
> > I think you need to check for a mask of (?,0) instead of (1,?).
> I think the mask is fine. For instance, in the test this is trying to fix, there are these two shufflevectors that need to be folded away to match the DAG codegen:
> ```
>   %3:_(<2 x s16>) = G_SHUFFLE_VECTOR %0:_(<2 x s16>), %4:_, shufflemask(1, 0)
>   %37:_(s32) = G_BITCAST %3:_(<2 x s16>)
>   %33:_(s32) = G_CONSTANT i32 16
>   %38:_(s32) = G_LSHR %37:_, %33:_(s32)
>   %29:_(s16) = G_TRUNC %38:_(s32)
> 
>   %5:_(<2 x s16>) = G_SHUFFLE_VECTOR %2:_(<2 x s16>), %4:_, shufflemask(1, 1)
>   %32:_(s32) = G_BITCAST %5:_(<2 x s16>)
>   %34:_(s32) = G_LSHR %32:_, %33:_(s32)
>   %21:_(s16) = G_TRUNC %34:_(s32)
> ``` 
> 
> Though to make this fully correct I think I need to take the trunc into account, so I will do that.
> 
> 
> This is really a headache for me to understand, but IIRC endianness doesn't affect values themselves, only how they're represented in memory?

It affects bitcasts. See https://llvm.org/docs/LangRef.html#vector-type:

> A bitcast from a vector type to a scalar integer type will see the elements being packed together (without padding). The order in which elements are inserted in the integer depends on endianness. For little endian element zero is put in the least significant bits of the integer, and for big endian element zero is put in the most significant bits.




Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134635/new/

https://reviews.llvm.org/D134635


