[llvm] [AMDGPU] Vectorize more 16 bit shuffles (PR #90648)
Jeffrey Byrnes via llvm-commits
llvm-commits at lists.llvm.org
Tue May 7 12:39:37 PDT 2024
================
@@ -211,6 +267,18 @@ define <2 x i32> @ssub_sat_v2i32(<2 x i32> %arg0, <2 x i32> %arg1) {
; GCN-NEXT: [[INS_1:%.*]] = insertelement <2 x i32> [[INS_0]], i32 [[ADD_1]], i64 1
; GCN-NEXT: ret <2 x i32> [[INS_1]]
;
+; GFX9-LABEL: @ssub_sat_v2i32(
+; GFX9-NEXT: bb:
+; GFX9-NEXT: [[ARG0_0:%.*]] = extractelement <2 x i32> [[ARG0:%.*]], i64 0
+; GFX9-NEXT: [[ARG0_1:%.*]] = extractelement <2 x i32> [[ARG0]], i64 1
+; GFX9-NEXT: [[ARG1_0:%.*]] = extractelement <2 x i32> [[ARG1:%.*]], i64 0
+; GFX9-NEXT: [[ARG1_1:%.*]] = extractelement <2 x i32> [[ARG1]], i64 1
+; GFX9-NEXT: [[ADD_0:%.*]] = call i32 @llvm.ssub.sat.i32(i32 [[ARG0_0]], i32 [[ARG1_0]])
----------------
jrbyrnes wrote:
> Why did i32 cases change?

They changed because I changed the way the checks were generated. Fixed.

> Gfx940 has some packed 32-bit ops but I'm not sure this cost model was ever updated to account for that

It looks like the cost model has accurate widths and costs for PackedFP32: https://github.com/llvm/llvm-project/blob/7115ed0fff027b65fa76fdfae215ed1382ed1473/llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp#L609
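
For context on the packed-op point, here is a minimal sketch of how packing might affect a vector op's cost. It is a toy model, not the code at the link above: `ToySubtarget`, `ElemTy`, and `toyVectorFPCost` are hypothetical names, and it assumes every instruction costs 1 and that a packed instruction covers exactly two lanes.

```cpp
#include <cassert>

// Hypothetical toy model, not the LLVM TargetTransformInfo API.
enum class ElemTy { F16, F32 };

struct ToySubtarget {
  bool HasPacked16;   // assumption: one instruction handles two f16 lanes
  bool HasPackedFP32; // assumption: one instruction handles two f32 lanes
};

// Number of unit-cost instructions needed for an element-wise FP op
// over NElts lanes of type Ty on the given toy subtarget.
unsigned toyVectorFPCost(const ToySubtarget &ST, ElemTy Ty, unsigned NElts) {
  assert(NElts > 0 && "expects a non-empty vector");
  bool Packed = (Ty == ElemTy::F16 && ST.HasPacked16) ||
                (Ty == ElemTy::F32 && ST.HasPackedFP32);
  // Packed ops process lanes in pairs, so round the pair count up.
  return Packed ? (NElts + 1) / 2 : NElts;
}
```

Under these assumptions a two-lane f32 op costs 1 on a subtarget with packed FP32 and 2 without it, which is the kind of difference that would make a vectorizer prefer the vector form.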
https://github.com/llvm/llvm-project/pull/90648