[PATCH] D134433: [AMDGPU][GISel] Enable Matching of V2S16 G_BUILD_VECTOR

Petar Avramovic via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Sep 30 02:03:12 PDT 2022


Petar.Avramovic added a comment.

Just to point out the most notable regression:

  define amdgpu_ps <3 x half> @min3(<3 x half> %src0, <3 x half> %src1) {
    %min3 = call <3 x half> @llvm.minnum.v3f16(<3 x half> %src0, <3 x half> %src1)
    ret <3 x half> %min3
  }
  
  declare <3 x half> @llvm.minnum.v3f16(<3 x half>, <3 x half>)

goes from

  	v_pk_min_f16 v0, v0, v2
  	v_pk_min_f16 v1, v1, v3

to

  	v_min_f16_e32 v4, v0, v2
  	v_min_f16_sdwa v0, v0, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
  	v_min_f16_e32 v1, v1, v3
  	v_lshl_or_b32 v0, v0, 16, v4
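
For readers less familiar with the last instruction: the extra work in the regressed sequence is repacking the two separately computed halves into one 32-bit register. Below is a minimal integer-only sketch of that repacking. It assumes `v_lshl_or_b32` computes `(S0 << S1) | S2` and that the upper 16 bits of the low-half result (`v4`) are zero. The helper name `repack_v2s16` is hypothetical, and the f16 min itself is not modelled.

```cpp
#include <cstdint>

// Integer model of v_lshl_or_b32: D = (S0 << S1[4:0]) | S2.
constexpr std::uint32_t v_lshl_or_b32(std::uint32_t s0, std::uint32_t s1,
                                      std::uint32_t s2) {
  return (s0 << (s1 & 31u)) | s2;
}

// Hypothetical helper mirroring "v_lshl_or_b32 v0, v0, 16, v4":
// `hi` stands for the SDWA result (high-lane min, held in the low 16 bits),
// `lo` stands for v4 (low-lane min, upper 16 bits assumed to be zero).
constexpr std::uint32_t repack_v2s16(std::uint16_t lo, std::uint16_t hi) {
  return v_lshl_or_b32(hi, 16u, lo);
}

// The high lane ends up in bits 31:16 and the low lane in bits 15:0.
static_assert(repack_v2s16(0x1234, 0xABCD) == 0xABCD1234u,
              "high lane is shifted into the upper half");
```

The packed `v_pk_min_f16` form needs none of this, since both lanes are processed in place.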

D134354 <https://reviews.llvm.org/D134354> does the mad_mix combine for straightforward test cases, same as sdag. The test you mentioned here, `v_mad_mix_v3f32_clamp_postcvt`,
selects mad_mix in the same way in both cases. The difference you pointed out is the clamp that failed to combine, i.e. you don't need this patch in order to select mad_mix.

To get the same result as sdag you will need:
- a combine for clamp for v2f16 where one element is undef
- maybe not lowering build_vector_trunc in regbankselect, so that it is easier to look through; it is easy to select.

Overall, it is probably best to go with D134354 <https://reviews.llvm.org/D134354> first and then optimize the code.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D134433/new/

https://reviews.llvm.org/D134433


