[PATCH] D87174: [GlobalISel] Add `X,Y<dead> = G_UNMERGE Z` -> X = G_TRUNC Z

Tue Sep 8 11:51:03 PDT 2020

arsenm added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-llvm.amdgcn.image.store.2d.d16.ll:164
+  ; PACKED:   [[CONCAT_VECTORS1:%[0-9]+]]:_(<6 x s16>) = G_CONCAT_VECTORS [[BITCAST1]](<2 x s16>), [[BITCAST2]](<2 x s16>), [[DEF]](<2 x s16>)
+  ; PACKED:   [[EXTRACT:%[0-9]+]]:_(<3 x s16>) = G_EXTRACT [[CONCAT_VECTORS1]](<6 x s16>), 0
   ; PACKED:   [[BUILD_VECTOR1:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32)
----------------
qcolombet wrote:
> @arsenm At first glance, all the AMDGPU changes seem fine except this one.
> 
> Looking at when the transformation kicks in, the input is:
> ```
>   %16:_(<6 x s16>) = G_CONCAT_VECTORS %13:_(<2 x s16>), %14:_(<2 x s16>), %15:_(<2 x s16>)
>   %3:_(<3 x s16>), %17:_(<3 x s16>) = G_UNMERGE_VALUES %16:_(<6 x s16>)
>   G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.2d), %3:_(<3 x s16>), 7, %1:_(s32), %2:_(s32), %0:_(<8 x s32>), 0, 0 :: (dereferenceable store 6 into custom "TargetCustom8", align 8)
>   S_ENDPGM 0
> ```
> And the output is:
> ```
>   %16:_(<6 x s16>) = G_CONCAT_VECTORS %13:_(<2 x s16>), %14:_(<2 x s16>), %15:_(<2 x s16>)
>   %19:_(s96) = G_BITCAST %16:_(<6 x s16>)
>   %20:_(s48) = G_TRUNC %19:_(s96)
>   %3:_(<3 x s16>) = G_BITCAST %20:_(s48)
>   G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.2d), %3:_(<3 x s16>), 7, %1:_(s32), %2:_(s32), %0:_(<8 x s32>), 0, 0 :: (dereferenceable store 6 into custom "TargetCustom8", align 8)
>   S_ENDPGM 0
> ```
> 
> So far so good.
> 
> The craziness appears after the legalizer:
> ```
>   %16:_(<6 x s16>) = G_CONCAT_VECTORS %13:_(<2 x s16>), %14:_(<2 x s16>), %15:_(<2 x s16>)
>   %19:_(s96) = G_BITCAST %16:_(<6 x s16>)
>   %28:_(s32), %29:_(s32), %30:_(s32) = G_UNMERGE_VALUES %19:_(s96)
>   %35:_(s32) = G_CONSTANT i32 16
>   %36:_(s32) = G_LSHR %28:_, %35:_(s32)
>   %37:_(s32) = G_LSHR %29:_, %35:_(s32)
>   %46:_(s32) = G_CONSTANT i32 65535
>   %49:_(s32) = COPY %28:_(s32)
>   %40:_(s32) = G_AND %49:_, %46:_
>   %48:_(s32) = COPY %36:_(s32)
>   %41:_(s32) = G_AND %48:_, %46:_
>   %42:_(s32) = G_SHL %41:_, %35:_(s32)
>   %38:_(s32) = G_OR %40:_, %42:_
>   %32:_(<2 x s16>) = G_BITCAST %38:_(s32)
>   %47:_(s32) = COPY %29:_(s32)
>   %43:_(s32) = G_AND %47:_, %46:_
>   %44:_(s32) = G_CONSTANT i32 0
>   %45:_(s32) = G_SHL %44:_, %35:_(s32)
>   %39:_(s32) = G_OR %43:_, %45:_
>   %33:_(<2 x s16>) = G_BITCAST %39:_(s32)
>   %34:_(<6 x s16>) = G_CONCAT_VECTORS %32:_(<2 x s16>), %33:_(<2 x s16>), %15:_(<2 x s16>)
>   %3:_(<3 x s16>) = G_EXTRACT %34:_(<6 x s16>), 0
>   %21:_(<2 x s32>) = G_BUILD_VECTOR %1:_(s32), %2:_(s32)
>   G_AMDGPU_INTRIN_IMAGE_STORE intrinsic(@llvm.amdgcn.image.store.2d), %3:_(<3 x s16>), 7, %21:_(<2 x s32>), $noreg, %0:_(<8 x s32>), 0, 0, 0 :: (dereferenceable store 6 into custom "TargetCustom8", align 8)
>   S_ENDPGM 0
> ```
> 
> Do you think the AMDGPU target is missing something or should I disable the combine for vector types, at least for now?
This is fine. <3 x s16> is problematic, and I'm working on eliminating all uses of it now.
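
For readers following the thread, here is a minimal Python sketch of why the combined sequence (G_BITCAST to s96, G_TRUNC to s48, G_BITCAST to <3 x s16>) should yield the same value as the first result of the original G_UNMERGE_VALUES. It is not LLVM code. It assumes lane 0 occupies the least significant bits of the bitcast integer (a little-endian layout), and every function name in it is hypothetical.

```python
# Hypothetical model of the combine discussed above; not LLVM code.
# Assumption: lane 0 of a vector sits in the least significant bits
# of the integer produced by G_BITCAST (little-endian lane layout).

LANE_BITS = 16


def pack_lanes(lanes, lane_bits=LANE_BITS):
    """Model G_BITCAST from <N x sK> to s(N*K)."""
    mask = (1 << lane_bits) - 1
    value = 0
    for index, lane in enumerate(lanes):
        value |= (lane & mask) << (index * lane_bits)
    return value


def unpack_lanes(value, count, lane_bits=LANE_BITS):
    """Model G_BITCAST from s(N*K) back to <N x sK>."""
    mask = (1 << lane_bits) - 1
    return [(value >> (index * lane_bits)) & mask for index in range(count)]


def unmerge_first_result(lanes):
    """Model `X, Y<dead> = G_UNMERGE_VALUES Z`: keep only X, the low half."""
    return lanes[: len(lanes) // 2]


def trunc_via_bitcast(lanes, lane_bits=LANE_BITS):
    """Model the rewritten form: bitcast, truncate, bitcast back."""
    half = len(lanes) // 2
    wide = pack_lanes(lanes, lane_bits)                # <6 x s16> -> s96
    narrow = wide & ((1 << (half * lane_bits)) - 1)    # s96 -> s48
    return unpack_lanes(narrow, half, lane_bits)       # s48 -> <3 x s16>


if __name__ == "__main__":
    z = [0x1111, 0x2222, 0x3333, 0x4444, 0x5555, 0x6666]
    print(unmerge_first_result(z) == trunc_via_bitcast(z))
```

Under that layout assumption the two forms agree lane for lane, which matches the "so far so good" assessment of the pre-legalizer output. The extra instructions appear only once the legalizer has to break down the s48 and <3 x s16> types.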

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D87174/new/

https://reviews.llvm.org/D87174