[PATCH] D108049: [InstCombine] Canonicalize saturate with shift and xor to min/max clamp
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 31 07:10:37 PDT 2021
spatel added a comment.
That's a big fold!
The larger the pattern match, the more fragile the optimization tends to be because we might eventually find sub-patterns that can be reduced.
Does it make things harder or easier if we fold that icmp? We're checking if the top N/2 + 1 bits (9 of the 16 bits here) are all set or all clear, so I visualized it like this:
define i1 @src(i16 %x) {
  %t0 = lshr i16 %x, 8
  %conv.i = trunc i16 %t0 to i8
  %conv1.i = trunc i16 %x to i8
  %shr2.i = ashr i8 %conv1.i, 7
  %r = icmp eq i8 %shr2.i, %conv.i
  ret i1 %r
}

define i1 @tgt(i16 %x) {
  %mask = ashr i16 %x, 7
  %ones = icmp eq i16 %mask, -1
  %zero = icmp eq i16 %mask, 0
  %r = or i1 %ones, %zero
  ret i1 %r
}
https://alive2.llvm.org/ce/z/reQjDv
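Separately from the Alive2 proof, the i16 case is small enough to brute-force. The following is a minimal Python sketch, not LLVM code: the helper names (`to_signed`, `src`, `tgt`) are made up for illustration, and the IR operations are modeled by hand on plain integers.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def src(x):
    """Hand-written model of @src for an unsigned 16-bit input."""
    conv = (x >> 8) & 0xFF                     # lshr i16 %x, 8 ; trunc to i8
    conv1 = x & 0xFF                           # trunc i16 %x to i8
    shr2 = (to_signed(conv1, 8) >> 7) & 0xFF   # ashr i8 %conv1.i, 7
    return shr2 == conv                        # icmp eq i8


def tgt(x):
    """Hand-written model of @tgt for an unsigned 16-bit input."""
    mask = (to_signed(x, 16) >> 7) & 0xFFFF    # ashr i16 %x, 7
    return mask == 0xFFFF or mask == 0         # top 9 bits all set or all clear


# Collect every i16 value on which the two models disagree.
mismatches = [x for x in range(1 << 16) if src(x) != tgt(x)]
print("mismatches:", len(mismatches))
```

Python's `>>` on a negative integer is an arithmetic shift, which is why `to_signed` is applied before each modeled `ashr`.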
But existing combines get that down to just 2 instructions:
define i1 @tgt(i16 %x) {
  %x.off = add i16 %x, 65408
  %r = icmp ugt i16 %x.off, 65279
  ret i1 %r
}
https://alive2.llvm.org/ce/z/8Fh23s
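One way to read the 2-instruction form: 65408 is -128 as an i16 and 65279 is -257, so the add/icmp pair is an offset range check for "%x, read as signed, fits in i8". A minimal Python sketch of that reading follows; the names `tgt_folded` and `in_i8_range` are hypothetical, and the wrapping add is modeled by masking to 16 bits.

```python
BITS = 16
MASK = (1 << BITS) - 1


def tgt_folded(x):
    """Hand-written model of the add + icmp ugt form on an unsigned 16-bit input."""
    x_off = (x + 65408) & MASK   # add i16 %x, 65408 (wraps modulo 2**16)
    return x_off > 65279         # icmp ugt i16 %x.off, 65279


def in_i8_range(x):
    """True if x, reinterpreted as a signed i16, lies in [-128, 127]."""
    signed = x - (1 << BITS) if x & 0x8000 else x
    return -128 <= signed <= 127


# Collect every i16 value on which the two predicates disagree.
mismatches = [x for x in range(1 << BITS) if tgt_folded(x) != in_i8_range(x)]
print("mismatches:", len(mismatches))
```

This only restates the i16 instance; it does not suggest what the general fold should look like.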
I don't know exactly what the generalization for this will be, but it seems like we should try that first?
https://reviews.llvm.org/D108049