[PATCH] D24125: [AMDGPU] Promote uniform i16 ops to i32 ops

Fri Sep 23 12:37:05 PDT 2016

tstellarAMD added inline comments.

================
Comment at: lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:257-263
@@ +256,9 @@
+
+  if (isSigned(I)) {
+    ExtOp1 = Builder.CreateSExt(I.getOperand(1), I32Ty);
+    ExtOp2 = Builder.CreateSExt(I.getOperand(2), I32Ty);
+  } else {
+    ExtOp1 = Builder.CreateZExt(I.getOperand(1), I32Ty);
+    ExtOp2 = Builder.CreateZExt(I.getOperand(2), I32Ty);
+  }
+  ExtRes = Builder.CreateSelect(I.getOperand(0), ExtOp1, ExtOp2);
----------------
I think you can always zero-extend for select, since the truncate discards the high bits anyway.
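
To illustrate the reasoning, here is a minimal standalone C++ sketch. It is not LLVM code: the helpers zext16, sext16 and trunc16 are hypothetical stand-ins for the IR zext, sext and trunc operations, and it assumes the promoted select result is immediately truncated back to i16, as in the patch. It checks exhaustively that both extensions give the same 16-bit result.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical models of the IR casts between i16 and i32 (not LLVM APIs).
static uint32_t zext16(uint16_t V) { return V; }

static uint32_t sext16(uint16_t V) {
  // Flip the sign bit, then subtract it again; the unsigned wrap-around
  // replicates bit 15 into the upper 16 bits.
  return (static_cast<uint32_t>(V) ^ 0x8000u) - 0x8000u;
}

static uint16_t trunc16(uint32_t V) { return static_cast<uint16_t>(V); }

// A 32-bit select on zero-extended operands, truncated back to 16 bits.
static uint16_t selectViaZExt(bool Cond, uint16_t A, uint16_t B) {
  return trunc16(Cond ? zext16(A) : zext16(B));
}

// The same select on sign-extended operands.
static uint16_t selectViaSExt(bool Cond, uint16_t A, uint16_t B) {
  return trunc16(Cond ? sext16(A) : sext16(B));
}

// Counts the inputs for which the two promotions disagree after truncation.
// Every 16-bit value is tried in both operand positions and with both
// condition values.
static unsigned countMismatches() {
  unsigned Mismatches = 0;
  for (uint32_t V = 0; V <= 0xFFFFu; ++V) {
    const uint16_t A = static_cast<uint16_t>(V);
    const uint16_t B = static_cast<uint16_t>(~V);
    for (int Cond = 0; Cond < 2; ++Cond) {
      if (selectViaZExt(Cond != 0, A, B) != selectViaSExt(Cond != 0, A, B))
        ++Mismatches;
    }
  }
  return Mismatches;
}

// Convenience check: aborts if any input distinguishes the two promotions.
static void checkSelectPromotion() { assert(countMismatches() == 0); }
```

The select only forwards one of its operands unchanged, so the low 16 bits of the result never depend on how the upper bits were filled. This argument is specific to select followed by a truncate; it says nothing about operations whose low bits depend on the high bits of their inputs.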

================
Comment at: test/CodeGen/AMDGPU/mul_uint24.ll:39-42
@@ -35,1 +38,6 @@
 
+; FUNC-LABEL: {{^}}test_umul24_2xi16_sext:
+; GCN: v_mul_u32_u24_e{{(32|64)}} [[MUL:v[0-9]]], {{[sv][0-9], [sv][0-9]}}
+; GCN: v_bfe_i32 v{{[0-9]}}, [[MUL]], 0, 16
+define void @test_umul24_2xi16_sext(<2 x i32> addrspace(1)* %out, <2 x i16> %a, <2 x i16> %b) {
+entry:
----------------
Was this meant to be a duplicate of the test above?  If so, I think it would be better to load %a and %b from a global pointer passed in as a kernel argument, to guarantee that the operands are in VGPRs.



https://reviews.llvm.org/D24125