[PATCH] D109228: [AMDGPU][GlobalISel] Legalize G_MUL for non-standard types

Fri Sep 3 07:41:52 PDT 2021

mbrkusanin added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll:628-651
 ; GFX10-LABEL: v_mul_i96:
 ; GFX10:       ; %bb.0:
 ; GFX10-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX10-NEXT:    s_waitcnt_vscnt null, 0x0
-; GFX10-NEXT:    v_mul_lo_u32 v6, v1, v3
-; GFX10-NEXT:    v_mul_lo_u32 v7, v0, v4
-; GFX10-NEXT:    v_mul_hi_u32 v8, v0, v3
-; GFX10-NEXT:    v_mul_lo_u32 v9, v1, v4
 ; GFX10-NEXT:    v_mul_lo_u32 v2, v2, v3
+; GFX10-NEXT:    v_mul_lo_u32 v6, v1, v4
+; GFX10-NEXT:    v_mul_lo_u32 v8, v0, v4
----------------
Now s96 is widened to s128 and then truncated back down to s96, which is why those add3 instructions are gone. They are only selected for the most significant register/bits, and here those registers end up dead after the trunc.

A rule to widen to the next multiple of 32 might be better than widening to the next power of 2 (it might not make sense for scalars smaller than 16, because we want s16 in some cases). This way scalars in the range (65,96) will be widened to 96, not 128. The same goes for anything above 128: we don't need to go from 4x32 to 8x32.

So a rule like widenToNextMultipleOf32, followed by the clampScalar(0, S16, S32) that is already there, should do the trick.
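
To make the register-count argument concrete, here is a rough standalone sketch of the width arithmetic only. The helper names below are hypothetical and are not LLVM's LegalizeRuleSet API; they just model what each widening rule would do to a scalar bit width, and they ignore the sub-16-bit case mentioned above.

```cpp
// Hypothetical helpers that model the bit-width arithmetic only.
// They are NOT part of LLVM; names are chosen for illustration.

// Smallest power of 2 that is >= Bits (models the power-of-2 widening rule).
constexpr unsigned modelWidenToNextPow2(unsigned Bits) {
  unsigned Result = 1;
  while (Result < Bits)
    Result <<= 1;
  return Result;
}

// Smallest multiple of 32 that is >= Bits (models the proposed rule).
constexpr unsigned modelWidenToNextMultipleOf32(unsigned Bits) {
  return (Bits + 31) / 32 * 32;
}

// Number of 32-bit registers needed to hold a scalar of the given width.
constexpr unsigned numDwords(unsigned Bits) { return (Bits + 31) / 32; }

// s96: power-of-2 widening costs a fourth register, multiple-of-32 does not.
static_assert(numDwords(modelWidenToNextPow2(96)) == 4, "s96 -> s128");
static_assert(numDwords(modelWidenToNextMultipleOf32(96)) == 3, "s96 stays s96");

// Just above 128: 8x32 versus 5x32.
static_assert(numDwords(modelWidenToNextPow2(129)) == 8, "s129 -> s256");
static_assert(numDwords(modelWidenToNextMultipleOf32(129)) == 5, "s129 -> s160");
```

For s96 the multiple-of-32 rule leaves the type alone, so no extra high register is computed and then discarded by the trunc.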

What do you think @foad?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109228/new/

https://reviews.llvm.org/D109228