[PATCH] D24125: [AMDGPU] Promote uniform i16 ops to i32 ops
Tom Stellard via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 23 12:37:05 PDT 2016
tstellarAMD added inline comments.
================
Comment at: lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:257-263
@@ +256,9 @@
+
+ if (isSigned(I)) {
+ ExtOp1 = Builder.CreateSExt(I.getOperand(1), I32Ty);
+ ExtOp2 = Builder.CreateSExt(I.getOperand(2), I32Ty);
+ } else {
+ ExtOp1 = Builder.CreateZExt(I.getOperand(1), I32Ty);
+ ExtOp2 = Builder.CreateZExt(I.getOperand(2), I32Ty);
+ }
+ ExtRes = Builder.CreateSelect(I.getOperand(0), ExtOp1, ExtOp2);
----------------
I think you can always zero-extend the select operands, since the truncate back to i16 discards the high bits anyway.
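To illustrate the point, here is a minimal sketch that models the promoted select with plain C++ integers instead of the pass's `IRBuilder` calls. The function names are hypothetical and not part of the patch. Whether the i16 operands are sign- or zero-extended, truncating the i32 select result back to 16 bits gives the same value.

```cpp
#include <cassert>           // for the usage check after this block
#include <cstdint>
#include <initializer_list>  // for the range-for over {false, true}

// Hypothetical model: sext both i16 operands, select in i32, trunc to i16.
std::uint16_t selectViaSExt(bool cond, std::uint16_t a, std::uint16_t b) {
  std::int32_t ea = static_cast<std::int16_t>(a);  // sext i16 -> i32
  std::int32_t eb = static_cast<std::int16_t>(b);
  std::int32_t r = cond ? ea : eb;                 // select i1, i32, i32
  return static_cast<std::uint16_t>(r);            // trunc keeps the low 16 bits
}

// Hypothetical model: zext both i16 operands, select in i32, trunc to i16.
std::uint16_t selectViaZExt(bool cond, std::uint16_t a, std::uint16_t b) {
  std::uint32_t ea = a;                            // zext i16 -> i32
  std::uint32_t eb = b;
  std::uint32_t r = cond ? ea : eb;
  return static_cast<std::uint16_t>(r);
}

// Checks every 16-bit a against a few edge-case b values, for both conditions.
bool zextAgreesWithSExt() {
  const std::uint16_t edges[] = {0x0000, 0x0001, 0x7FFF, 0x8000, 0xFFFF};
  for (std::uint32_t a = 0; a <= 0xFFFF; ++a)
    for (std::uint16_t b : edges)
      for (bool c : {false, true})
        if (selectViaSExt(c, static_cast<std::uint16_t>(a), b) !=
            selectViaZExt(c, static_cast<std::uint16_t>(a), b))
          return false;
  return true;
}
```

`assert(zextAgreesWithSExt());` should hold: both extensions preserve the low 16 bits of each operand, and select only forwards one of them unchanged. In the pass this would presumably mean dropping the `isSigned(I)` branch for selects and always using `Builder.CreateZExt`.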
================
Comment at: test/CodeGen/AMDGPU/mul_uint24.ll:39-42
@@ -35,1 +38,6 @@
+; FUNC-LABEL: {{^}}test_umul24_2xi16_sext:
+; GCN: v_mul_u32_u24_e{{(32|64)}} [[MUL:v[0-9]]], {{[sv][0-9], [sv][0-9]}}
+; GCN: v_bfe_i32 v{{[0-9]}}, [[MUL]], 0, 16
+define void @test_umul24_2xi16_sext(<2 x i32> addrspace(1)* %out, <2 x i16> %a, <2 x i16> %b) {
+entry:
----------------
Was this meant to be a duplicate of the test above? If so, I think it would be better to load %a and %b from a global pointer passed in as a kernel argument, to guarantee that the operands are in VGPRs.
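For reference, the CHECK lines in this test expect `v_mul_u32_u24` followed by `v_bfe_i32 ..., 0, 16`. A minimal sketch in plain C++ (helper names are hypothetical, and the instruction semantics are modeled as: u24 multiply of the low 24 bits of each operand keeping the low 32 bits, then sign-extend the low 16 bits) shows why that sequence matches `mul i16` followed by `sext` to i32: the low 16 bits of the product depend only on the low 16 bits of each operand.

```cpp
#include <cassert>  // for the usage check after this block
#include <cstdint>

// Model of v_mul_u32_u24: low 24 bits of each source, low 32 bits of product.
std::uint32_t mulU32U24(std::uint32_t s0, std::uint32_t s1) {
  std::uint64_t p =
      static_cast<std::uint64_t>(s0 & 0xFFFFFFu) * (s1 & 0xFFFFFFu);
  return static_cast<std::uint32_t>(p);
}

// Model of v_bfe_i32 with offset 0 and width 16: sign-extend the low 16 bits.
std::int32_t bfeI32Low16(std::uint32_t v) {
  return static_cast<std::int16_t>(static_cast<std::uint16_t>(v));
}

// Reference semantics: mul i16 (wraps mod 2^16), then sext i16 -> i32.
std::int32_t mulI16Sext(std::uint16_t a, std::uint16_t b) {
  std::uint16_t prod = static_cast<std::uint16_t>(std::uint32_t{a} * b);
  return static_cast<std::int16_t>(prod);
}

// Compares the modeled GCN sequence with the reference on edge-case inputs,
// using both zero- and sign-extended 32-bit operands.
bool gcnSequenceMatches() {
  const std::uint16_t vals[] = {0x0000, 0x0001, 0x00FF, 0x1234,
                                0x7FFF, 0x8000, 0xFFFF};
  for (std::uint16_t a : vals)
    for (std::uint16_t b : vals) {
      std::uint32_t sa = static_cast<std::uint32_t>(
          static_cast<std::int32_t>(static_cast<std::int16_t>(a)));
      std::uint32_t sb = static_cast<std::uint32_t>(
          static_cast<std::int32_t>(static_cast<std::int16_t>(b)));
      std::int32_t expected = mulI16Sext(a, b);
      if (bfeI32Low16(mulU32U24(a, b)) != expected) return false;
      if (bfeI32Low16(mulU32U24(sa, sb)) != expected) return false;
    }
  return true;
}
```

`assert(gcnSequenceMatches());` should pass; for example, 0xFFFF * 0xFFFF gives 0xFFFE0001, whose low 16 bits sign-extend to 1 on both paths.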
https://reviews.llvm.org/D24125