[PATCH] D108049: [InstCombine] Canonicalize saturate with shift and xor to min/max clamp

Tue Aug 31 07:10:37 PDT 2021

spatel added a comment.

That's a big fold!
The larger the pattern match, the more fragile the optimization tends to be, because we might eventually find sub-patterns that can be reduced on their own, and then the larger match no longer fires.
Does it make things harder or easier if we fold that icmp first? We're checking whether the top N/2 + 1 bits are all set or all clear (the top 9 bits for i16), so I visualized it like this:

  define i1 @src(i16 %x) {
    %t0 = lshr i16 %x, 8
    %conv.i = trunc i16 %t0 to i8       ; high byte of %x
    %conv1.i = trunc i16 %x to i8       ; low byte of %x
    %shr2.i = ashr i8 %conv1.i, 7       ; splat the low byte's sign bit: 0 or -1
    %r = icmp eq i8 %shr2.i, %conv.i
    ret i1 %r
  }

  define i1 @tgt(i16 %x) {
    %mask = ashr i16 %x, 7              ; top 9 bits of %x, sign-extended
    %ones = icmp eq i16 %mask, -1       ; top 9 bits all set
    %zero = icmp eq i16 %mask, 0        ; top 9 bits all clear
    %r = or i1 %ones, %zero
    ret i1 %r
  }

https://alive2.llvm.org/ce/z/reQjDv
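
For anyone who wants to sanity-check this outside Alive2, here is a minimal brute-force sketch in Python. It is not part of the patch; the helper names (to_signed, src, tgt) are made up for illustration and only model the i16/i8 semantics of the two functions above.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def src(x):
    """Model of @src: high byte compared against the sign-splat of the low byte."""
    conv = (x >> 8) & 0xFF                    # lshr i16 %x, 8, then trunc to i8
    conv1 = x & 0xFF                          # trunc i16 %x to i8
    shr2 = (to_signed(conv1, 8) >> 7) & 0xFF  # ashr i8 %conv1.i, 7
    return shr2 == conv


def tgt(x):
    """Model of @tgt: the top 9 bits are all ones or all zeros."""
    mask = (to_signed(x, 16) >> 7) & 0xFFFF   # ashr i16 %x, 7
    return mask == 0xFFFF or mask == 0


# Exhaustive check over every i16 bit pattern.
mismatches = [x for x in range(1 << 16) if src(x) != tgt(x)]
matching = sum(1 for x in range(1 << 16) if src(x))
print(len(mismatches), matching)
```

The inputs that pass are exactly the i16 values that survive a round trip through i8, which is why the count of passing inputs is 256.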

But existing combines get that down to just 2 instructions:

  define i1 @tgt(i16 %x) {
    %x.off = add i16 %x, 65408          ; 65408 is -128 as i16
    %r = icmp ugt i16 %x.off, 65279     ; true iff %x is in [-128, 127]
    ret i1 %r
  }

https://alive2.llvm.org/ce/z/8Fh23s
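
The offset-and-compare form is an ordinary signed range check, which can be confirmed with a similar brute-force sketch. Again this is only an illustrative Python model with made-up helper names (tgt_ashr, tgt_add_icmp), not code from the patch.

```python
MASK16 = 0xFFFF


def tgt_ashr(x):
    """ashr/icmp/or form: the top 9 bits of x are all ones or all zeros."""
    signed = x - 0x10000 if x & 0x8000 else x
    mask = (signed >> 7) & MASK16             # ashr i16 %x, 7
    return mask == MASK16 or mask == 0


def tgt_add_icmp(x):
    """add/icmp form produced by the existing combines."""
    x_off = (x + 65408) & MASK16              # add i16 %x, 65408 (wraps mod 2^16)
    return x_off > 65279                      # icmp ugt i16 %x.off, 65279


# Both forms agree on every i16 bit pattern.
all_equal = all(tgt_ashr(x) == tgt_add_icmp(x) for x in range(1 << 16))

# Collect the accepted inputs as signed values to see the range being tested.
accepted = sorted(x - 0x10000 if x & 0x8000 else x
                  for x in range(1 << 16) if tgt_add_icmp(x))
print(all_equal, accepted[0], accepted[-1], len(accepted))
```

Subtracting 128 maps the signed range [-128, 127] onto the unsigned range [65280, 65535], so a single unsigned compare covers both ends.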

I don't know exactly what the generalization for this will be, but it seems like we should try that first?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108049/new/

https://reviews.llvm.org/D108049