[PATCH] D24125: [AMDGPU] Promote uniform i16 ops to i32 ops
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 16 13:53:07 PDT 2016
arsenm added a comment.
There are various integer intrinsics that can also be handled but that can be a later patch
================
Comment at: lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:122
@@ +121,3 @@
+ assert(I.getType()->isIntegerTy(16) && "Op must be 16 bits");
+ assert(DA->isUniform(&I) && "Op must be uniform");
+
----------------
I don't think you need this assert. It's not like the code will be incorrect if this is too aggressive
================
Comment at: lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:144
@@ +143,3 @@
+ I.replaceAllUsesWith(TruncRes);
+ I.dropAllReferences();
+ I.eraseFromParent();
----------------
You shouldn't need this
================
Comment at: lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:153
@@ +152,3 @@
+ assert(I.getOperand(1)->getType()->isIntegerTy(16) && "Op1 must be 16 bits");
+ assert(DA->isUniform(&I) && "Op must be uniform");
+
----------------
Ditto
================
Comment at: lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:173
@@ +172,3 @@
+ I.replaceAllUsesWith(NewICmp);
+ I.dropAllReferences();
+ I.eraseFromParent();
----------------
ditto
================
Comment at: lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp:296-298
@@ +295,5 @@
+
+ // TODO: Should we promote smaller types that will be legalized to i16?
+ if (I.getType()->isIntegerTy(16) && DA->isUniform(&I))
+ Changed |= promoteUniformI16OpToI32Op(I);
+
----------------
I think all of these should be skipped if the target doesn't have i16 instructions
================
Comment at: test/CodeGen/AMDGPU/amdgpu-codegenprepare.ll:234-243
@@ -233,1 +233,12 @@
+; CHECK-LABEL: @promote_uniform_add_i16(
+; CHECK: %[[A_32:[0-9]+]] = zext i16 %a to i32
+; CHECK: %[[B_32:[0-9]+]] = zext i16 %b to i32
+; CHECK: %[[R_32:[0-9]+]] = add i32 %[[A_32]], %[[B_32]]
+; CHECK: %[[R_16:[0-9]+]] = trunc i32 %[[R_32]] to i16
+; CHECK: ret i16 %[[R_16]]
+define i16 @promote_uniform_add_i16(i16 %a, i16 %b) {
+ %r = add i16 %a, %b
+ ret i16 %r
+}
+
----------------
I think we should probably split this test and rename the existing one since this is testing something very different from the fdiv handling
================
Comment at: test/CodeGen/AMDGPU/amdgpu-codegenprepare.ll:241
@@ +240,3 @@
+define i16 @promote_uniform_add_i16(i16 %a, i16 %b) {
+ %r = add i16 %a, %b
+ ret i16 %r
----------------
There should be tests ensuring that nsw/nuw are preserved as well (I think that should be correct)
================
Comment at: test/CodeGen/AMDGPU/amdgpu-codegenprepare.ll:267-276
@@ +266,12 @@
+
+; CHECK-LABEL: @promote_uniform_udiv_i16(
+; CHECK: %[[A_32:[0-9]+]] = zext i16 %a to i32
+; CHECK: %[[B_32:[0-9]+]] = zext i16 %b to i32
+; CHECK: %[[R_32:[0-9]+]] = udiv i32 %[[A_32]], %[[B_32]]
+; CHECK: %[[R_16:[0-9]+]] = trunc i32 %[[R_32]] to i16
+; CHECK: ret i16 %[[R_16]]
+define i16 @promote_uniform_udiv_i16(i16 %a, i16 %b) {
+ %r = udiv i16 %a, %b
+ ret i16 %r
+}
+
----------------
I'm not sure we want division to be promoted. At least right now it's going to end up emitting the same i32 code
================
Comment at: test/CodeGen/AMDGPU/amdgpu-codegenprepare.ll:340
@@ +339,3 @@
+define i16 @promote_uniform_ashr_i16(i16 %a, i16 %b) {
+ %r = ashr i16 %a, %b
+ ret i16 %r
----------------
There should be a test that shows exact is preserved
================
Comment at: test/CodeGen/AMDGPU/amdgpu-codegenprepare.ll:521-525
@@ +520,7 @@
+; CHECK: ret i16 %[[SEL_16]]
+define i16 @promote_uniform_select_sle_i16(i16 %a, i16 %b) {
+ %cmp = icmp sle i16 %a, %b
+ %sel = select i1 %cmp, i16 %a, i16 %b
+ ret i16 %sel
+}
+
----------------
There should also be tests that this happens with vectors
https://reviews.llvm.org/D24125
More information about the llvm-commits
mailing list