[PATCH] D46760: [InstCombine] Enhance narrowUDivURem.

Wed Jun 6 16:04:45 PDT 2018

bixia marked an inline comment as done.
bixia added a comment.

@spatel Thanks for your comment! I need to revise my example to better match the real case I am looking at; in particular, to show that the optimization will increase the number of zext instructions. In this example, there are only two zext instructions. However, if we narrow all the arithmetic operations to i32, we will need four zext instructions. In one of your previous comments, you suggested extending TruncInstCombine in aggressive-instcombine. I see the following issues: (1) TruncInstCombine doesn't increase the number of zext/sext instructions. (2) TruncInstCombine currently only handles operations with the property truncate(op(i64), i32) == op(truncate(i64, i32)); div/rem/right-shift, which are needed here, aren't among such operations. We can probably add value range analysis to resolve this, though. (3) TruncInstCombine is driven by the truncate instructions in the IR, and there is no such truncate instruction in the case here. To handle the case here (such as the pattern "lshr followed by shl" you mentioned), is it acceptable to add a new optimization to aggressive-instcombine that can increase the number of zext/sext instructions?
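
To illustrate point (2), here is a minimal Python sketch (not LLVM code) of the property truncate(op(i64), i32) == op(truncate(i64, i32)). It models unsigned i64/i32 values with bit masks; the helper names `trunc32` and `commutes_with_trunc` are made up for this illustration. It suggests why add/mul satisfy the property unconditionally, while udiv/urem/lshr only do so when the upper 32 bits are known to be zero, which is what a value range analysis would have to establish.

```python
import operator

MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def trunc32(x):
    """Model of 'trunc i64 to i32' on an unsigned value."""
    return x & MASK32


def commutes_with_trunc(op, a, b):
    """True if truncating the i64 result equals doing the op at i32."""
    wide = trunc32(op(a, b) & MASK64)                   # op at i64, then trunc
    narrow = op(trunc32(a), trunc32(b)) & MASK32        # trunc, then op at i32
    return wide == narrow


big = 1 << 32  # a value whose upper 32 bits are not zero

# add and mul commute with truncation even for wide inputs.
add_ok = commutes_with_trunc(operator.add, big + 5, 7)
mul_ok = commutes_with_trunc(operator.mul, big + 5, 3)

# udiv, urem and lshr do not, in general.
udiv_ok = commutes_with_trunc(operator.floordiv, big, 2)
urem_ok = commutes_with_trunc(operator.mod, big + 1, 3)
lshr_ok = commutes_with_trunc(operator.rshift, big, 1)

# If the operands are known to fit in 32 bits, udiv is fine again.
udiv_small_ok = commutes_with_trunc(operator.floordiv, 646360, 128)

print(add_ok, mul_ok, udiv_ok, urem_ok, lshr_ok, udiv_small_ok)
```

The last check is the situation in the example: the inputs come from zext of range-limited i32 values, so the division could be done at i32 even though the operation itself is not truncation-safe in general.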

; Function Attrs: nounwind
declare i32 @get_number() #0
declare i32 @use64_3(i64, i64, i64)

define void @narrow_long_chain_with_udiv_urem_2(i64* %result) {
  %num1 = call i32 @get_number(), !range !0
  %block_id = zext i32 %num1 to i64
  %num2 = call i32 @get_number(), !range !0
  %thread_id = zext i32 %num2 to i64
  %tmp = mul nuw nsw i64 %block_id, 64
  %linear_index = add nuw nsw i64 %tmp, %thread_id
  %tmp1 = udiv i64 %linear_index, 1
  %x = urem i64 %tmp1, 128
  %tmp2 = udiv i64 %tmp1, 128
  %y = urem i64 %tmp2, 32
  %z = udiv i64 %tmp2, 32
  call i32 @use64_3(i64 %x, i64 %y, i64 %z)
  %warp_id = udiv i64 %x, 32
  %lane_id = urem i64 %x, 32
  %tmp3 = mul nsw i64 %warp_id, 8
  %tmp4 = add nsw i64 7, %tmp3
  %tmp5 = mul nsw i64 32, %tmp4
  %tmp6 = add nsw i64 %lane_id, %tmp5
  store i64 %tmp6, i64* %result
  ret void
}
attributes #0 = { nounwind }
!0 = !{i32 0, i32 9945}
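
As a sanity check on the range reasoning behind this test case, the following Python sketch (an illustration only, not part of the patch) evaluates the same chain of operations at 64 and 32 bits. It assumes both calls return values in [0, 9945), as given by !range !0; the function name `chain` and the sample inputs are hypothetical.

```python
RANGE_HI = 9945  # exclusive upper bound taken from !range !0


def chain(num1, num2, bits):
    """Evaluate the udiv/urem chain with unsigned arithmetic of the given width."""
    mask = (1 << bits) - 1
    linear_index = (num1 * 64 + num2) & mask
    tmp1 = linear_index // 1
    x = tmp1 % 128
    tmp2 = tmp1 // 128
    y = tmp2 % 32
    z = tmp2 // 32
    warp_id = x // 32
    lane_id = x % 32
    tmp6 = (lane_id + 32 * (7 + 8 * warp_id)) & mask
    return x, y, z, tmp6


# Largest value %linear_index can take under the range metadata.
max_linear_index = (RANGE_HI - 1) * 64 + (RANGE_HI - 1)
fits_in_i32 = max_linear_index < (1 << 32)

# Spot-check a few inputs, including the extremes of the range.
samples = [(0, 0), (1, 0), (0, 1), (1234, 567),
           (RANGE_HI - 1, 0), (0, RANGE_HI - 1), (RANGE_HI - 1, RANGE_HI - 1)]
agree = all(chain(a, b, 64) == chain(a, b, 32) for a, b in samples)

print(max_linear_index, fits_in_i32, agree)
```

Under these assumptions every intermediate value stays well below 2^32, so computing the chain at i32 gives the same results as at i64 on the sampled inputs. The spot check is not a proof; the bound on `max_linear_index` is what carries the argument.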

Repository:
  rL LLVM

https://reviews.llvm.org/D46760