[PATCH] D25987: [X86] New pattern to generate PSUBUS from SELECT

Tue May 2 07:26:54 PDT 2017

yulia_koval added a comment.

In https://reviews.llvm.org/D25987#742378, @spatel wrote:

> I think we need to take a step back here. The tests are assuming a specific form of IR, but I'm not sure it's canonical. If we change this upstream (in InstCombine), the transform may fail to activate when you hoped it would. It's also possible that generic DAG combines could interfere with the pattern you're expecting. See https://reviews.llvm.org/rL301781 as an example. In either case, this seems like a fragile solution because the pattern can change above this.
>
> http://rise4fun.com/Alive/5e4
>  Unless I made a mistake in transcribing that, the 1st form is what you're expecting in these tests. The 2nd form might be considered better since it includes a canonical 'max'.
>
> Can you test this patch with more examples like this to see if it behaves as expected:
>
>   define i16 @scalar(i16 %x, i32 %y) nounwind {
>     %lhs = zext i16 %x to i32
>     %cond = icmp ugt i32 %lhs, %y
>     %sub = sub i32 %lhs, %y
>     %truncsub = trunc i32 %sub to i16
>     %res = select i1 %cond, i16 %truncsub, i16 zeroinitializer
>     ret i16 %res
>   }
>  
>   define i16 @scalar_max(i16 %x, i32 %y) nounwind {
>     %lhs = zext i16 %x to i32
>     %cond = icmp ugt i32 %lhs, %y
>     %umax = select i1 %cond, i32 %lhs, i32 %y
>     %sub = sub i32 %umax, %y
>     %truncsub = trunc i32 %sub to i16
>     ret i16 %truncsub
>   }
>  
>   define <8 x i16> @test15(<8 x i16> %x, <8 x i32> %y) nounwind {
>     %lhs = zext <8 x i16> %x to <8 x i32>
>     %cond = icmp ugt <8 x i32> %lhs, %y
>     %sub = sub <8 x i32> %lhs, %y
>     %truncsub = trunc <8 x i32> %sub to <8 x i16>
>     %res = select <8 x i1> %cond, <8 x i16> %truncsub, <8 x i16> zeroinitializer
>     ret <8 x i16> %res
>   }
>  
>   define <8 x i16> @test15_max(<8 x i16> %x, <8 x i32> %y) nounwind {
>     %lhs = zext <8 x i16> %x to <8 x i32>
>     %cond = icmp ugt <8 x i32> %lhs, %y
>     %umax = select <8 x i1> %cond, <8 x i32> %lhs, <8 x i32> %y
>     %sub = sub <8 x i32> %umax, %y
>     %truncsub = trunc <8 x i32> %sub to <8 x i16>
>     ret <8 x i16> %truncsub
>   }
>  
>   define <8 x i8> @test15_narrow(<8 x i8> %x, <8 x i16> %y) nounwind {
>     %lhs = zext <8 x i8> %x to <8 x i16>
>     %cond = icmp ugt <8 x i16> %lhs, %y
>     %sub = sub <8 x i16> %lhs, %y
>     %truncsub = trunc <8 x i16> %sub to <8 x i8>
>     %res = select <8 x i1> %cond, <8 x i8> %truncsub, <8 x i8> zeroinitializer
>     ret <8 x i8> %res
>   }
>  
>   define <8 x i8> @test15_narrow_max(<8 x i8> %x, <8 x i16> %y) nounwind {
>     %lhs = zext <8 x i8> %x to <8 x i16>
>     %cond = icmp ugt <8 x i16> %lhs, %y
>     %umax = select <8 x i1> %cond, <8 x i16> %lhs, <8 x i16> %y
>     %sub = sub <8 x i16> %umax, %y
>     %truncsub = trunc <8 x i16> %sub to <8 x i8>
>     ret <8 x i8> %truncsub
>   }
>  
>

Well, first 2 tests don't work with my patch because PSUBUS is a vector instruction. Test15 is working perfectly. Test15_max doesn't work, because I didn't try to recognize this pattern(the one you showed in a link), after optimization, because currently(for at least a year) llvm generates the code like in my tests, for example for psubus.c:

void foo(unsigned short *p, int max, int n)
{
int i;
unsigned m;
for (i = 0; i < n; i++) {

  m = *--p;
  *p = (unsigned short)(m >= max ? m-max : 0);

}
}

IR will be:

  %10 = zext <8 x i16> %reverse to <8 x i32>
  %11 = icmp ugt <8 x i32> %broadcast.splat16, %10
  %12 = sub <8 x i32> %10, %broadcast.splat16
  %13 = trunc <8 x i32> %12 to <8 x i16>
  %14 = select <8 x i1> %11, <8 x i16> zeroinitializer, <8 x i16> %13

I see, that optimization you provided in http://rise4fun.com/Alive/5e4 is valid, but it extends from i16 to i32 which may trigger some regressions, because it switches to wider operations. Maybe something like this will appear in the future, but currently the pattern above is generated. Also, whether the pattern generated from this particular C code will be changed or not, this patch will still be useful for this IR pattern, if it appears. I can add recognition of the test15_max pattern as well, should I do it?

Test15_narrow converted to psubus before my patch and test15_narrow_max converted to max before my patch. The only test, changed by my patch is @test15.

Looks like https://reviews.llvm.org/rL301781 doesn't affect my patch at any tests..
Also if some optimization replace select(sub) with sub(select), won't it ruin any other psubus pattern as well?

https://reviews.llvm.org/D25987