[PATCH] D65530: [InstCombine] foldXorOfICmps(): don't give up on non-single-use ICmp's if all users are freely invertible
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 12 09:04:31 PDT 2019
spatel added a comment.
Sorry for the delay in looking at this. What do the motivating examples look like for codegen? Are we getting the optimal codegen for these clamps, or would we be better off trying to create min/max and/or saturating intrinsics?
Please double-check that I didn't make a typo in this translation, but I think I'm seeing extra instructions for vectors with this transform on x86 and aarch64:
define <4 x i32> @t4_select_cond_xor_v0_vec_before(<4 x i32> %X) {
  %need_to_clamp_positive = icmp sgt <4 x i32> %X, <i32 32767, i32 32767, i32 32767, i32 32767>
  %dont_need_to_clamp_negative = icmp sgt <4 x i32> %X, <i32 -32768, i32 -32768, i32 -32768, i32 -32768>
  %clamp_limit = select <4 x i1> %need_to_clamp_positive, <4 x i32> <i32 32767, i32 32767, i32 32767, i32 32767>, <4 x i32> <i32 -32768, i32 -32768, i32 -32768, i32 -32768>
  %dont_need_to_clamp = xor <4 x i1> %need_to_clamp_positive, %dont_need_to_clamp_negative
  %R = select <4 x i1> %dont_need_to_clamp, <4 x i32> %X, <4 x i32> %clamp_limit
  ret <4 x i32> %R
}

define <4 x i32> @t4_select_cond_xor_v0_vec_after(<4 x i32> %X) {
  %t1 = icmp slt <4 x i32> %X, <i32 32768, i32 32768, i32 32768, i32 32768>
  %t2 = select <4 x i1> %t1, <4 x i32> <i32 -32768, i32 -32768, i32 -32768, i32 -32768>, <4 x i32> <i32 32767, i32 32767, i32 32767, i32 32767>
  %t3 = add <4 x i32> %X, <i32 32767, i32 32767, i32 32767, i32 32767>
  %t4 = icmp ult <4 x i32> %t3, <i32 65535, i32 65535, i32 65535, i32 65535>
  %t5 = select <4 x i1> %t4, <4 x i32> %X, <4 x i32> %t2
  ret <4 x i32> %t5
}
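
As a sanity check on the translation itself (not on the codegen question), one lane of each function can be modeled in scalar form and compared. This is only a rough Python sketch: the names `before`, `after`, `lanes_agree` and `SAMPLES` are made up for illustration, Python integers stand in for i32 values, and the wrapping `add` plus unsigned compare are emulated with a 32-bit mask.

```python
MASK32 = 0xFFFFFFFF  # used to emulate i32 wraparound for the add / icmp ult pair


def before(x):
    """One lane of @t4_select_cond_xor_v0_vec_before (x is a signed i32 value)."""
    need_to_clamp_positive = x > 32767
    dont_need_to_clamp_negative = x > -32768
    clamp_limit = 32767 if need_to_clamp_positive else -32768
    dont_need_to_clamp = need_to_clamp_positive ^ dont_need_to_clamp_negative
    return x if dont_need_to_clamp else clamp_limit


def after(x):
    """One lane of @t4_select_cond_xor_v0_vec_after (x is a signed i32 value)."""
    t1 = x < 32768
    t2 = -32768 if t1 else 32767
    t3 = (x + 32767) & MASK32  # wrapping add, then viewed as unsigned
    t4 = t3 < 65535            # icmp ult
    return x if t4 else t2


def lanes_agree(values):
    """True if both models return the same result for every input value."""
    return all(before(v) == after(v) for v in values)


# Boundary values around both clamp limits plus the i32 extremes.
SAMPLES = [-2**31, -2**31 + 1, -32769, -32768, -32767, -1, 0, 1,
           32766, 32767, 32768, 2**31 - 1]

if __name__ == "__main__":
    print(lanes_agree(SAMPLES))
```

If the two models disagree on any sample, the hand translation (rather than the transform) would be the first thing to suspect. The check says nothing about which form lowers to fewer instructions.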
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D65530/new/
https://reviews.llvm.org/D65530