[PATCH] D65530: [InstCombine] foldXorOfICmps(): don't give up on non-single-use ICmp's if all users are freely invertible

Mon Aug 12 09:04:31 PDT 2019

spatel added a comment.

Sorry for the delay in looking at this. What do the motivating examples look like for codegen? Are we getting the optimal codegen for these clamps, or would we be better off trying to create min/max and/or saturating intrinsics?

Please check that I didn't introduce a typo in this translation, but I think I'm seeing extra instructions for vectors with this transform on x86 and AArch64:

  define <4 x i32> @t4_select_cond_xor_v0_vec_before(<4 x i32> %X) {
    %need_to_clamp_positive = icmp sgt <4 x i32> %X, <i32 32767, i32 32767, i32 32767, i32 32767>
    %dont_need_to_clamp_negative = icmp sgt <4 x i32> %X, <i32 -32768, i32 -32768, i32 -32768, i32 -32768>
    %clamp_limit = select <4 x i1> %need_to_clamp_positive, <4 x i32> <i32 32767, i32 32767, i32 32767, i32 32767>, <4 x i32> <i32 -32768, i32 -32768, i32 -32768, i32 -32768>
    %dont_need_to_clamp = xor <4 x i1> %need_to_clamp_positive, %dont_need_to_clamp_negative
    %R = select <4 x i1> %dont_need_to_clamp, <4 x i32> %X, <4 x i32> %clamp_limit
    ret <4 x i32> %R
  }

  define <4 x i32> @t4_select_cond_xor_v0_vec_after(<4 x i32> %X) {
    %t1 = icmp slt <4 x i32> %X, <i32 32768, i32 32768, i32 32768, i32 32768>
    %t2 = select <4 x i1> %t1, <4 x i32> <i32 -32768, i32 -32768, i32 -32768, i32 -32768>, <4 x i32> <i32 32767, i32 32767, i32 32767, i32 32767>
    %t3 = add <4 x i32> %X, <i32 32767, i32 32767, i32 32767, i32 32767>
    %t4 = icmp ult <4 x i32> %t3, <i32 65535, i32 65535, i32 65535, i32 65535>
    %t5 = select <4 x i1> %t4, <4 x i32> %X, <4 x i32> %t2
    ret <4 x i32> %t5
  }
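
As a sanity check on the translation above, here is a minimal per-lane sketch in Python that models the two IR functions on scalar i32 values and compares them with a plain min/max clamp. The function names (`clamp_before`, `clamp_after`, `clamp_minmax`) and the sampled input range are hypothetical, chosen only for illustration; the model assumes two's-complement wraparound for the `add` and an unsigned compare for `icmp ult`. It says nothing about codegen quality, only that the three forms agree on the sampled inputs.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1

def clamp_before(x):
    # Scalar model of @t4_select_cond_xor_v0_vec_before (one lane).
    need_to_clamp_positive = x > 32767
    dont_need_to_clamp_negative = x > -32768
    clamp_limit = 32767 if need_to_clamp_positive else -32768
    # xor of two i1 values is modeled as boolean inequality.
    dont_need_to_clamp = need_to_clamp_positive != dont_need_to_clamp_negative
    return x if dont_need_to_clamp else clamp_limit

def clamp_after(x):
    # Scalar model of @t4_select_cond_xor_v0_vec_after (one lane).
    t1 = x < 32768
    t2 = -32768 if t1 else 32767
    # i32 add wraps; masking gives the unsigned bit pattern used by icmp ult.
    t3 = (x + 32767) & 0xFFFFFFFF
    t4 = t3 < 65535
    return x if t4 else t2

def clamp_minmax(x):
    # Assumed min/max formulation of the same clamp to [-32768, 32767].
    return max(min(x, 32767), -32768)

# Sample around both limits plus the i32 extremes (not an exhaustive proof).
samples = list(range(-70000, 70001)) + [INT32_MIN, INT32_MIN + 1, INT32_MAX - 1, INT32_MAX]
mismatches = [x for x in samples
              if not (clamp_before(x) == clamp_after(x) == clamp_minmax(x))]
print("mismatches:", len(mismatches))
```

If the three forms do agree, the remaining question is purely which shape the backends lower best.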

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65530/new/

https://reviews.llvm.org/D65530