Please review: Optimize vector multiply on X86

Mon Jun 17 04:33:09 PDT 2013

Hi Elena,

> From: "Demikhovsky, Elena" <elena.demikhovsky at intel.com>
> I added an optimization that converts vector operation multiply by 
> const to SHIFT.
> I did this optimization for X86 only.
> I’m wondering why it was not implemented for all targets.  I saw a 
> proposal sent by Andrea about a month ago, but it was probably 
> rejected.

If you mean the change about "Simplify multiplications by vectors whose 
elements are powers of 2", then it has been committed at r183005.
http://llvm.org/viewvc/llvm-project?view=revision&revision=183005 

The combine rules introduced in r183005 are of course triggered only when 
instcombine is run.

For example, if I apply your patch to file avx2-arith.ll and then run:
opt -instcombine avx2-arith.ll  -S

I get the following IR as output:

; Function Attrs: nounwind readnone
define <8 x i32> @mul_const1(<8 x i32> %x) #0 {
  %y = shl <8 x i32> %x, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  ret <8 x i32> %y
}

; Function Attrs: nounwind readnone
define <4 x i64> @mul_const2(<4 x i64> %x) #0 {
  %y = shl <4 x i64> %x, <i64 2, i64 2, i64 2, i64 2>
  ret <4 x i64> %y
}

; Function Attrs: nounwind readnone
define <16 x i16> @mul_const3(<16 x i16> %x) #0 {
  %y = shl <16 x i16> %x, <i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3>
  ret <16 x i16> %y
}

; Function Attrs: nounwind readnone
define <4 x i64> @mul_const4(<4 x i64> %x) #0 {
  %y = sub <4 x i64> zeroinitializer, %x
  ret <4 x i64> %y
}

; Function Attrs: nounwind readnone
define <8 x i32> @mul_const5(<8 x i32> %x) #0 {
  ret <8 x i32> zeroinitializer
}

; Function Attrs: nounwind readnone
define <8 x i32> @mul_const6(<8 x i32> %x) #0 {
  %y = mul <8 x i32> %x, <i32 0, i32 0, i32 0, i32 2, i32 0, i32 2, i32 0, i32 0>
  ret <8 x i32> %y
}

; Function Attrs: nounwind readnone
define <8 x i64> @mul_const7(<8 x i64> %x) #0 {
  %y = shl <8 x i64> %x, <i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1>
  ret <8 x i64> %y
}

; Function Attrs: nounwind readnone
define <8 x i16> @mul_const8(<8 x i16> %x) #0 {
  %y = shl <8 x i16> %x, <i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3>
  ret <8 x i16> %y
}

attributes #0 = { nounwind readnone }

Each multiply by a vector of constant powers of 2 has been converted into 
a vector shift.
The only missing case is function `mul_const6', where the multiply is not 
optimized into a shift. 
I think that is because the current implementation of function 
`getLogBase2Vector' (in InstCombineMulDivRem.cpp) uses method 
APInt::isPowerOf2(), which returns true only if the value is a power of 
two greater than 0. The constant vector in `mul_const6' contains zero 
elements, so the check fails for that function.
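
To illustrate the point about zero elements, here is a minimal standalone 
C++17 sketch of a per-element power-of-two check. It is not the code from 
InstCombineMulDivRem.cpp: the helper names `isPowerOf2' and 
`logBase2Vector' below are hypothetical, and plain 64-bit integers stand 
in for APInt. It only shows why a single zero element is enough to block 
the whole-vector conversion to a shift.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Illustrative stand-in for an APInt::isPowerOf2()-style check (assumption:
// plain uint64_t instead of APInt). True only when exactly one bit is set,
// so 0 is rejected.
static bool isPowerOf2(std::uint64_t v) {
    return v != 0 && (v & (v - 1)) == 0;
}

// Hypothetical helper, not LLVM's implementation: returns the per-element
// shift amounts if every element is a power of two, otherwise std::nullopt.
static std::optional<std::vector<unsigned>>
logBase2Vector(const std::vector<std::uint64_t> &elts) {
    std::vector<unsigned> shifts;
    shifts.reserve(elts.size());
    for (std::uint64_t v : elts) {
        if (!isPowerOf2(v))
            return std::nullopt;   // one bad element blocks the transform
        unsigned log2 = 0;
        while ((v >>= 1) != 0)     // count how many times v can be halved
            ++log2;
        shifts.push_back(log2);
    }
    return shifts;
}
```

With this sketch, a splat of 8 (as in `mul_const3') yields shift amounts 
of 3, while the constant from `mul_const6' yields no result because of its 
zero elements.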

Except for that one case, all multiplies by a vector of constant powers of 2 
are correctly optimized before we reach X86ISelLowering.

I hope this helps,
Andrea Di Biagio
SN Systems - Sony Computer Entertainment Group
