[PATCH] D150587: [KnownBits] Make shl/lshr/ashr implementations optimal

Mon May 15 09:37:44 PDT 2023

nikic added inline comments.

================
Comment at: llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll:224
 ; SI-PROMOTE-VECT: s_lshr_b32 [[SREG:s[0-9]+]], 0x10000, [[SCALED_IDX]]
-; SI-PROMOTE-VECT: s_and_b32 s{{[0-9]+}}, [[SREG]], 0xffff
+; SI-PROMOTE-VECT: s_and_b32 s{{[0-9]+}}, [[SREG]], 1
 define amdgpu_kernel void @short_array(ptr addrspace(1) %out, i32 %index) #0 {
----------------
I believe this is correct, because SCALED_IDX is `IDX << 4` and as such at least 16. As such, `0x10000 >> SCALED_IDX` is either zero or one and the and mask can be narrowed to 1.

================
Comment at: llvm/test/Transforms/InstCombine/not-add.ll:175
 ; CHECK-NEXT:    [[B15:%.*]] = srem i32 ashr (i32 65536, i32 or (i32 zext (i1 icmp eq (ptr @g, ptr null) to i32), i32 65537)), [[XOR]]
-; CHECK-NEXT:    [[B12:%.*]] = add nuw nsw i32 [[B15]], ashr (i32 65536, i32 or (i32 zext (i1 icmp eq (ptr @g, ptr null) to i32), i32 65537))
+; CHECK-NEXT:    [[B12:%.*]] = add nsw i32 [[B15]], ashr (i32 65536, i32 or (i32 zext (i1 icmp eq (ptr @g, ptr null) to i32), i32 65537))
 ; CHECK-NEXT:    [[B:%.*]] = xor i32 [[B12]], -1
----------------
A nominal regression because I did not try to preserve the exact behavior for "always poison" and always return unknown. If we switch to returning zero this whole code folds away (as well as code in many other tests).

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150587/new/

https://reviews.llvm.org/D150587