[PATCH] D87174: [GlobalISel] Add `X,Y<dead> = G_UNMERGE Z` -> X = G_TRUNC Z
Matt Arsenault via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Sep 8 11:51:03 PDT 2020
arsenm added inline comments.
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-llvm.amdgcn.image.store.2d.d16.ll:164
+ ; PACKED: [[CONCAT_VECTORS1:%[0-9]+]]:_(<6 x s16>) = G_CONCAT_VECTORS [[BITCAST1]](<2 x s16>), [[BITCAST2]](<2 x s16>), [[DEF]](<2 x s16>)
+ ; PACKED: [[EXTRACT:%[0-9]+]]:_(<3 x s16>) = G_EXTRACT [[CONCAT_VECTORS1]](<6 x s16>), 0
; PACKED: [[BUILD_VECTOR1:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[COPY8]](s32), [[COPY9]](s32)
----------------
qcolombet wrote:
> @arsenm At first glance, all the changes in AMDGPU seem fine except this one.
>
> Looking at when the transformation kicks in, the input is:
> ```
> %16:_(<6 x s16>) = G_CONCAT_VECTORS %13:_(<2 x s16>), %14:_(<2 x s16>), %15:_(<2 x s16>)
> %3:_(<3 x s16>), %17:_(<3 x s16>) = G_UNMERGE_VALUES %16:_(<6 x s16>)
> G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.2d), %3:_(<3 x s16>), 7, %1:_(s32), %2:_(s32), %0:_(<8 x s32>), 0, 0 :: (dereferenceable store 6 into custom "TargetCustom8", align 8)
> S_ENDPGM 0
> ```
> And the output is:
> ```
> %16:_(<6 x s16>) = G_CONCAT_VECTORS %13:_(<2 x s16>), %14:_(<2 x s16>), %15:_(<2 x s16>)
> %19:_(s96) = G_BITCAST %16:_(<6 x s16>)
> %20:_(s48) = G_TRUNC %19:_(s96)
> %3:_(<3 x s16>) = G_BITCAST %20:_(s48)
> G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.2d), %3:_(<3 x s16>), 7, %1:_(s32), %2:_(s32), %0:_(<8 x s32>), 0, 0 :: (dereferenceable store 6 into custom "TargetCustom8", align 8)
> S_ENDPGM 0
> ```
>
> So far so good.
>
> The craziness only shows up after the legalizer:
> ```
> %16:_(<6 x s16>) = G_CONCAT_VECTORS %13:_(<2 x s16>), %14:_(<2 x s16>), %15:_(<2 x s16>)
> %19:_(s96) = G_BITCAST %16:_(<6 x s16>)
> %28:_(s32), %29:_(s32), %30:_(s32) = G_UNMERGE_VALUES %19:_(s96)
> %35:_(s32) = G_CONSTANT i32 16
> %36:_(s32) = G_LSHR %28:_, %35:_(s32)
> %37:_(s32) = G_LSHR %29:_, %35:_(s32)
> %46:_(s32) = G_CONSTANT i32 65535
> %49:_(s32) = COPY %28:_(s32)
> %40:_(s32) = G_AND %49:_, %46:_
> %48:_(s32) = COPY %36:_(s32)
> %41:_(s32) = G_AND %48:_, %46:_
> %42:_(s32) = G_SHL %41:_, %35:_(s32)
> %38:_(s32) = G_OR %40:_, %42:_
> %32:_(<2 x s16>) = G_BITCAST %38:_(s32)
> %47:_(s32) = COPY %29:_(s32)
> %43:_(s32) = G_AND %47:_, %46:_
> %44:_(s32) = G_CONSTANT i32 0
> %45:_(s32) = G_SHL %44:_, %35:_(s32)
> %39:_(s32) = G_OR %43:_, %45:_
> %33:_(<2 x s16>) = G_BITCAST %39:_(s32)
> %34:_(<6 x s16>) = G_CONCAT_VECTORS %32:_(<2 x s16>), %33:_(<2 x s16>), %15:_(<2 x s16>)
> %3:_(<3 x s16>) = G_EXTRACT %34:_(<6 x s16>), 0
> %21:_(<2 x s32>) = G_BUILD_VECTOR %1:_(s32), %2:_(s32)
> G_AMDGPU_INTRIN_IMAGE_STORE intrinsic(@llvm.amdgcn.image.store.2d), %3:_(<3 x s16>), 7, %21:_(<2 x s32>), $noreg, %0:_(<8 x s32>), 0, 0, 0 :: (dereferenceable store 6 into custom "TargetCustom8", align 8)
> S_ENDPGM 0
> ```
>
> Do you think the AMDGPU target is missing something or should I disable the combine for vector types, at least for now?
This is fine. <3 x s16> is a problematic type, and I'm working on eliminating all uses of it now.
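
For anyone following along, here is a rough Python sketch (not LLVM code) of why the three forms in the dumps above should agree on the value of %3. It models registers as plain integers and assumes little-endian lane order, i.e. lane 0 of a vector sits in the low bits and G_UNMERGE_VALUES yields its lowest piece first. All helper names are made up for illustration.

```python
# Toy bit-level model; every helper here is hypothetical, not an LLVM API.
# Assumption: lane 0 of a vector occupies the least-significant bits.

def bitcast_vec_to_int(lanes, lane_bits):
    """Pack vector lanes into one integer, lane 0 in the low bits."""
    mask = (1 << lane_bits) - 1
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & mask) << (i * lane_bits)
    return value

def bitcast_int_to_vec(value, lane_bits, num_lanes):
    """Split an integer back into lanes, low bits first."""
    mask = (1 << lane_bits) - 1
    return [(value >> (i * lane_bits)) & mask for i in range(num_lanes)]

def unmerge(value, total_bits, num_parts):
    """Model of G_UNMERGE_VALUES: equal pieces, lowest bits first."""
    part_bits = total_bits // num_parts
    mask = (1 << part_bits) - 1
    return [(value >> (i * part_bits)) & mask for i in range(num_parts)]

def trunc(value, dst_bits):
    """Model of G_TRUNC: keep only the low dst_bits."""
    return value & ((1 << dst_bits) - 1)

def legalized_low_half(lanes):
    """Follow the shift/mask sequence from the post-legalizer dump."""
    s96 = bitcast_vec_to_int(lanes, 16)              # %19
    w0, w1, _w2 = unmerge(s96, 96, 3)                # %28, %29, %30
    hi0 = w0 >> 16                                   # %36
    word0 = (w0 & 0xFFFF) | ((hi0 & 0xFFFF) << 16)   # %38
    word1 = (w1 & 0xFFFF) | (0 << 16)                # %39, high half is 0
    # The dump concatenates with %15 as well; G_EXTRACT at offset 0 only
    # reads the first three s16 lanes, so %15 is left out of this model.
    rebuilt = bitcast_int_to_vec(word0, 16, 2) + bitcast_int_to_vec(word1, 16, 2)
    return rebuilt[:3]

lanes = [0x1111, 0x2222, 0x3333, 0x4444, 0x5555, 0x6666]
z = bitcast_vec_to_int(lanes, 16)

x_unmerge = bitcast_int_to_vec(unmerge(z, 96, 2)[0], 16, 3)  # original input
x_trunc = bitcast_int_to_vec(trunc(z, 48), 16, 3)            # after the combine
x_legal = legalized_low_half(lanes)                          # after the legalizer
```

In this model all three results are the first three lanes of the input, so the post-legalizer sequence looks verbose rather than wrong.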
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D87174/new/
https://reviews.llvm.org/D87174