[PATCH] D24125: [AMDGPU] Promote uniform i16 ops to i32 ops

Fri Sep 16 13:53:07 PDT 2016

arsenm added a comment.

There are various integer intrinsics that can also be handled but that can be a later patch


================
Comment at: lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:122
@@ +121,3 @@
+  assert(I.getType()->isIntegerTy(16) && "Op must be 16 bits");
+  assert(DA->isUniform(&I) && "Op must be uniform");
+
----------------
I don't think you need this assert. It's not like the code will be incorrect if this is too aggressive

================
Comment at: lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:144
@@ +143,3 @@
+  I.replaceAllUsesWith(TruncRes);
+  I.dropAllReferences();
+  I.eraseFromParent();
----------------
You shouldn't need this

================
Comment at: lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:153
@@ +152,3 @@
+  assert(I.getOperand(1)->getType()->isIntegerTy(16) && "Op1 must be 16 bits");
+  assert(DA->isUniform(&I) && "Op must be uniform");
+
----------------
Ditto

================
Comment at: lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:173
@@ +172,3 @@
+  I.replaceAllUsesWith(NewICmp);
+  I.dropAllReferences();
+  I.eraseFromParent();
----------------
ditto

================
Comment at: lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:296-298
@@ +295,5 @@
+
+  // TODO: Should we promote smaller types that will be legalized to i16?
+  if (I.getType()->isIntegerTy(16) && DA->isUniform(&I))
+    Changed |= promoteUniformI16OpToI32Op(I);
+
----------------
I think all of these should be skipped if the target doesn't have i16 instructions

================
Comment at: test/CodeGen/AMDGPU/amdgpu-codegenprepare.ll:234-243
@@ -233,1 +233,12 @@
 
+; CHECK-LABEL: @promote_uniform_add_i16(
+; CHECK: %[[A_32:[0-9]+]] = zext i16 %a to i32
+; CHECK: %[[B_32:[0-9]+]] = zext i16 %b to i32
+; CHECK: %[[R_32:[0-9]+]] = add i32 %[[A_32]], %[[B_32]]
+; CHECK: %[[R_16:[0-9]+]] = trunc i32 %[[R_32]] to i16
+; CHECK: ret i16 %[[R_16]]
+define i16 @promote_uniform_add_i16(i16 %a, i16 %b) {
+  %r = add i16 %a, %b
+  ret i16 %r
+}
+
----------------
I think we should probably split this test and rename the existing one since this is testing something very different from the fdiv handling

================
Comment at: test/CodeGen/AMDGPU/amdgpu-codegenprepare.ll:241
@@ +240,3 @@
+define i16 @promote_uniform_add_i16(i16 %a, i16 %b) {
+  %r = add i16 %a, %b
+  ret i16 %r
----------------
There should be tests ensuring that nsw/nuw are preserved as well (I think that should be correct)

================
Comment at: test/CodeGen/AMDGPU/amdgpu-codegenprepare.ll:267-276
@@ +266,12 @@
+
+; CHECK-LABEL: @promote_uniform_udiv_i16(
+; CHECK: %[[A_32:[0-9]+]] = zext i16 %a to i32
+; CHECK: %[[B_32:[0-9]+]] = zext i16 %b to i32
+; CHECK: %[[R_32:[0-9]+]] = udiv i32 %[[A_32]], %[[B_32]]
+; CHECK: %[[R_16:[0-9]+]] = trunc i32 %[[R_32]] to i16
+; CHECK: ret i16 %[[R_16]]
+define i16 @promote_uniform_udiv_i16(i16 %a, i16 %b) {
+  %r = udiv i16 %a, %b
+  ret i16 %r
+}
+
----------------
I'm not sure we want division to be promoted. At least right now it's going to end up emitting the same i32 code

================
Comment at: test/CodeGen/AMDGPU/amdgpu-codegenprepare.ll:340
@@ +339,3 @@
+define i16 @promote_uniform_ashr_i16(i16 %a, i16 %b) {
+  %r = ashr i16 %a, %b
+  ret i16 %r
----------------
There should be a test that shows exact is preserved

================
Comment at: test/CodeGen/AMDGPU/amdgpu-codegenprepare.ll:521-525
@@ +520,7 @@
+; CHECK: ret i16 %[[SEL_16]]
+define i16 @promote_uniform_select_sle_i16(i16 %a, i16 %b) {
+  %cmp = icmp sle i16 %a, %b
+  %sel = select i1 %cmp, i16 %a, i16 %b
+  ret i16 %sel
+}
+
----------------
There should also be tests that this happens with vectors


https://reviews.llvm.org/D24125