[PATCH] D40984: [InstCombine] canonicalize shifty abs(): ashr+add+xor --> cmp+neg+sel

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Dec 15 08:30:05 PST 2017


spatel added a comment.

In https://reviews.llvm.org/D40984#956312, @hfinkel wrote:

> LGTM (please take a quick look at CodeGen for other non-X86 targets (e.g., PowerPC, AArch64) and make sure that looks reasonable too).


Sure - re-reading the description I realize that was not at all clear. Let me try to clarify here and in the commit log (although the code itself is a mess, so it's hard to describe cleanly):

1. DAGCombiner has a generic transform for all targets to convert the **scalar** cmp+sel variant of abs into the shift variant. This is the opposite of this IR canonicalization.
2. DAGCombiner has a generic transform for all targets to convert the **vector** cmp+sel variant of abs into either an ABS node or the shift variant. This is again the opposite of this IR canonicalization.
3. DAGCombiner has a generic transform for all targets to convert the exact scalar shift variant produced by #1 into an ISD::ABS node. Note: It would be an efficiency improvement if we had #1 go directly to an ABS node when that's legal/custom.
4. The pattern matching above is incomplete, so it is possible to escape the intended/optimal codegen in a variety of ways (a rough IR sketch of these escapes follows the codegen output below).
  1. For #2, the vector path is missing the case for setlt with a '1' constant.
  2. For #3, we are missing a match for commuted versions of the scalar shift variant.
  3. There is no vector equivalent at all for #3.
5. Therefore, this IR canonicalization can only help get us to the optimal codegen. The cmp+sel form produced by this patch will be recognized in the DAG and converted to an ABS node when possible, or to the shift sequence when not.
6. In the following examples with this patch applied, we may get conditional moves rather than the shift sequence produced by the generic DAGCombiner transforms. That choice is made by target-specific lowering for each target, and whether it is optimal for a particular subtarget may be up for debate.

  define i32 @abs_shifty(i32 %x) {
    %signbit = ashr i32 %x, 31 
    %add = add i32 %signbit, %x  
    %abs = xor i32 %signbit, %add 
    ret i32 %abs
  }
  
  define i32 @abs_cmpsubsel(i32 %x) {
    %cmp = icmp slt i32 %x, 0
    %sub = sub i32 0, %x
    %abs = select i1 %cmp, i32 %sub, i32 %x
    ret i32 %abs
  }
  
  define <4 x i32> @abs_shifty_vec(<4 x i32> %x) {
    %signbit = ashr <4 x i32> %x, <i32 31, i32 31, i32 31, i32 31> 
    %add = add <4 x i32> %signbit, %x  
    %abs = xor <4 x i32> %signbit, %add 
    ret <4 x i32> %abs
  }
  
  define <4 x i32> @abs_cmpsubsel_vec(<4 x i32> %x) {
    %cmp = icmp slt <4 x i32> %x, zeroinitializer
    %sub = sub <4 x i32> zeroinitializer, %x
    %abs = select <4 x i1> %cmp, <4 x i32> %sub, <4 x i32> %x
    ret <4 x i32> %abs
  }



  > $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=x86_64 -mattr=avx 
  > abs_shifty:
  > 	movl	%edi, %eax
  > 	negl	%eax
  > 	cmovll	%edi, %eax
  > 	retq
  > 
  > abs_cmpsubsel:
  > 	movl	%edi, %eax
  > 	negl	%eax
  > 	cmovll	%edi, %eax
  > 	retq
  > 
  > abs_shifty_vec:
  > 	vpabsd	%xmm0, %xmm0
  > 	retq
  > 
  > abs_cmpsubsel_vec:
  > 	vpabsd	%xmm0, %xmm0
  > 	retq
  > 
  > $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=aarch64
  > abs_shifty:
  > 	cmp	w0, #0                  // =0
  > 	cneg	w0, w0, mi
  > 	ret
  > 
  > abs_cmpsubsel: 
  > 	cmp	w0, #0                  // =0
  > 	cneg	w0, w0, mi
  > 	ret
  >                                        
  > abs_shifty_vec: 
  > 	abs	v0.4s, v0.4s
  > 	ret
  > 
  > abs_cmpsubsel_vec: 
  > 	abs	v0.4s, v0.4s
  > 	ret
  > 
  > $ ./opt -instcombine shiftyabs.ll -S | ./llc -o - -mtriple=powerpc64le 
  > abs_shifty:  
  > 	srawi 4, 3, 31
  > 	add 3, 3, 4
  > 	xor 3, 3, 4
  > 	blr
  > 
  > abs_cmpsubsel:
  > 	srawi 4, 3, 31
  > 	add 3, 3, 4
  > 	xor 3, 3, 4
  > 	blr
  > 
  > abs_shifty_vec:   
  > 	vspltisw 3, -16
  > 	vspltisw 4, 15
  > 	vsubuwm 3, 4, 3
  > 	vsraw 3, 2, 3
  > 	vadduwm 2, 2, 3
  > 	xxlxor 34, 34, 35
  > 	blr
  > 
  > abs_cmpsubsel_vec: 
  > 	vspltisw 3, -16
  > 	vspltisw 4, 15
  > 	vsubuwm 3, 4, 3
  > 	vsraw 3, 2, 3
  > 	vadduwm 2, 2, 3
  > 	xxlxor 34, 34, 35
  > 	blr
  > 


https://reviews.llvm.org/D40984




