Please review: Optimize vector multiply on X86

Mon Jun 17 05:16:30 PDT 2013

Benjamin already explained to me that this is done as an IR-level optimization.
I'm proposing to do the same at the DAG level because:
1) We have this optimization for scalars but not for vectors.
2) The MUL node can be produced by DAG-level transformations of other nodes.
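
To make the rewrite under discussion concrete, here is a minimal standalone sketch in plain C++. It does not use the SelectionDAG API; the type `V8i32` and the helpers `mulSplat` and `shlSplat` are hypothetical names chosen for illustration. It only shows that a lane-wise multiply by a splat power of two gives the same result as a lane-wise left shift.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an <8 x i32> value.
typedef std::array<uint32_t, 8> V8i32;

// Lane-wise multiply by the same constant in every lane.
// Unsigned arithmetic wraps on overflow, like an IR `mul` without nsw/nuw.
static V8i32 mulSplat(const V8i32 &x, uint32_t c) {
  V8i32 r;
  for (std::size_t i = 0; i < x.size(); ++i)
    r[i] = x[i] * c;
  return r;
}

// Lane-wise left shift by the same amount in every lane (amt must be < 32).
static V8i32 shlSplat(const V8i32 &x, unsigned amt) {
  V8i32 r;
  for (std::size_t i = 0; i < x.size(); ++i)
    r[i] = x[i] << amt;
  return r;
}
```

With these helpers, `mulSplat(x, 2)` and `shlSplat(x, 1)` should compare equal for any input, including lanes that overflow.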

-  Elena

-----Original Message-----
From: Andrea_DiBiagio at sn.scee.net [mailto:Andrea_DiBiagio at sn.scee.net] 
Sent: Monday, June 17, 2013 14:33
To: Demikhovsky, Elena
Cc: Benjamin Kramer; llvm-commits at cs.uiuc.edu; llvm-commits-bounces at cs.uiuc.edu; Nadav Rotem <nrotem at apple.com> (nrotem at apple.com)
Subject: RE: Please review: Optimize vector multiply on X86

Hi Elena,

> From: "Demikhovsky, Elena" <elena.demikhovsky at intel.com>
> I added an optimization that converts a vector multiply by a constant
> into a SHIFT.
> I did this optimization for X86 only.
> I'm wondering why it was not implemented for all targets. I saw a
> proposal sent by Andrea about a month ago, but it was probably
> rejected.

If you mean the change about "Simplify multiplications by vectors whose elements are powers of 2", then it has been committed at r183005.
http://llvm.org/viewvc/llvm-project?view=revision&revision=183005 

The combine rules introduced in r183005 are of course triggered only when instcombine is run.

For example, if I apply your patch to file avx2-arith.ll and then run:
opt -instcombine avx2-arith.ll  -S

I get the following IR as output:

; Function Attrs: nounwind readnone
define <8 x i32> @mul_const1(<8 x i32> %x) #0 {
  %y = shl <8 x i32> %x, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  ret <8 x i32> %y
}

; Function Attrs: nounwind readnone
define <4 x i64> @mul_const2(<4 x i64> %x) #0 {
  %y = shl <4 x i64> %x, <i64 2, i64 2, i64 2, i64 2>
  ret <4 x i64> %y
}

; Function Attrs: nounwind readnone
define <16 x i16> @mul_const3(<16 x i16> %x) #0 {
  %y = shl <16 x i16> %x, <i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3>
  ret <16 x i16> %y
}

; Function Attrs: nounwind readnone
define <4 x i64> @mul_const4(<4 x i64> %x) #0 {
  %y = sub <4 x i64> zeroinitializer, %x
  ret <4 x i64> %y
}

; Function Attrs: nounwind readnone
define <8 x i32> @mul_const5(<8 x i32> %x) #0 {
  ret <8 x i32> zeroinitializer
}

; Function Attrs: nounwind readnone
define <8 x i32> @mul_const6(<8 x i32> %x) #0 {
  %y = mul <8 x i32> %x, <i32 0, i32 0, i32 0, i32 2, i32 0, i32 2, i32 0, i32 0>
  ret <8 x i32> %y
}

; Function Attrs: nounwind readnone
define <8 x i64> @mul_const7(<8 x i64> %x) #0 {
  %y = shl <8 x i64> %x, <i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1>
  ret <8 x i64> %y
}

; Function Attrs: nounwind readnone
define <8 x i16> @mul_const8(<8 x i16> %x) #0 {
  %y = shl <8 x i16> %x, <i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3>
  ret <8 x i16> %y
}

attributes #0 = { nounwind readnone }

Each multiply by a vector of constant powers of 2 has been converted into a vector shift.
The only missing case is function `mul_const6', where the multiply is not optimized into a shift.
I think that is because the current implementation of function `getLogBase2Vector' (in InstCombineMulDivRem.cpp) uses method
APInt::isPowerOf2(), which returns true only if the value is a power of two greater than 0, so the zero elements in the `mul_const6' constant fail the check.
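
The suspected cause can be sketched in a few lines of standalone C++. This is not the code in InstCombineMulDivRem.cpp; `isPowerOfTwo` and `logBase2Vector` are hypothetical helpers that only mimic the behaviour described above, assuming the check is applied to every element and the whole vector is rejected if any element fails.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-element test: true only when exactly one bit is set,
// so zero is rejected.
static bool isPowerOfTwo(uint64_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Hypothetical helper: on success, stores log2 of each element in `shifts`
// and returns true. Returns false, leaving `shifts` untouched, if any
// element is not a power of two.
static bool logBase2Vector(const std::vector<uint64_t> &elts,
                           std::vector<unsigned> &shifts) {
  std::vector<unsigned> result;
  for (uint64_t v : elts) {
    if (!isPowerOfTwo(v))
      return false;
    unsigned log2 = 0;
    while ((v >>= 1) != 0)
      ++log2;
    result.push_back(log2);
  }
  shifts = result;
  return true;
}
```

Under these assumptions, a constant like the one in `mul_const1' (all elements equal to 2) yields a shift amount of 1 per lane, while the constant in `mul_const6' (a mix of 0 and 2) is rejected as a whole, which matches the output shown above.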

Except for that one case, all multiplies by a vector of constant powers of 2 are correctly optimized before we reach X86ISelLowering.

I hope this helps,
Andrea Di Biagio
SN Systems - Sony Computer Entertainment Group
