Please review: Optimize vector multiply on X86
Andrea_DiBiagio at sn.scee.net
Mon Jun 17 04:33:09 PDT 2013
Hi Elena,
> From: "Demikhovsky, Elena" <elena.demikhovsky at intel.com>
> I added an optimization that converts vector operation multiply by
> const to SHIFT.
> I did this optimization for X86 only.
> I’m wondering why it was not implemented for all targets. I saw a
> proposal sent by Andrea about month ago, but it is, probably, was
> rejected.
If you mean the change about "Simplify multiplications by vectors whose
elements are powers of 2", then it has been committed at r183005.
http://llvm.org/viewvc/llvm-project?view=revision&revision=183005
The combine rules introduced in r183005 are of course triggered only when
instcombine is run.
For example, if I apply your patch to file avx2-arith.ll and then run:
opt -instcombine avx2-arith.ll -S
I get the following IR as output:
; Function Attrs: nounwind readnone
define <8 x i32> @mul_const1(<8 x i32> %x) #0 {
%y = shl <8 x i32> %x, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
ret <8 x i32> %y
}
; Function Attrs: nounwind readnone
define <4 x i64> @mul_const2(<4 x i64> %x) #0 {
%y = shl <4 x i64> %x, <i64 2, i64 2, i64 2, i64 2>
ret <4 x i64> %y
}
; Function Attrs: nounwind readnone
define <16 x i16> @mul_const3(<16 x i16> %x) #0 {
%y = shl <16 x i16> %x, <i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3>
ret <16 x i16> %y
}
; Function Attrs: nounwind readnone
define <4 x i64> @mul_const4(<4 x i64> %x) #0 {
%y = sub <4 x i64> zeroinitializer, %x
ret <4 x i64> %y
}
; Function Attrs: nounwind readnone
define <8 x i32> @mul_const5(<8 x i32> %x) #0 {
ret <8 x i32> zeroinitializer
}
; Function Attrs: nounwind readnone
define <8 x i32> @mul_const6(<8 x i32> %x) #0 {
%y = mul <8 x i32> %x, <i32 0, i32 0, i32 0, i32 2, i32 0, i32 2, i32 0, i32 0>
ret <8 x i32> %y
}
; Function Attrs: nounwind readnone
define <8 x i64> @mul_const7(<8 x i64> %x) #0 {
%y = shl <8 x i64> %x, <i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1, i64 1>
ret <8 x i64> %y
}
; Function Attrs: nounwind readnone
define <8 x i16> @mul_const8(<8 x i16> %x) #0 {
%y = shl <8 x i16> %x, <i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3, i16 3>
ret <8 x i16> %y
}
attributes #0 = { nounwind readnone }
Each multiply by a vector of constant powers of 2 has been converted into
a vector shift.
The only missing case is function `mul_const6' where the multiply is not
optimized into a shift.
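I think that is because the current implementation of function
`getLogBase2Vector' (in InstCombineMulDivRem.cpp) uses method
APInt::isPowerOf2(), which returns true only if the value is a power of
two greater than 0. The constant vector in `mul_const6' contains zero
elements, so the check fails and the multiply is left as is.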
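To make the per-element check concrete, here is a small Python model of it.
This is only a sketch and not the code in InstCombineMulDivRem.cpp: the names
is_power_of_2 and log_base2_vector are hypothetical stand-ins for
APInt::isPowerOf2() and getLogBase2Vector, and I am assuming that the
transform is abandoned as soon as one element fails the check.

```python
def is_power_of_2(value: int) -> bool:
    """Model of a strict power-of-two test: true for 1, 2, 4, ... but not for 0."""
    return value > 0 and (value & (value - 1)) == 0


def log_base2_vector(elements):
    """Return the per-element shift amounts for a constant vector.

    Returns None when any element is not a power of two (assumed behaviour
    of this model: the whole multiply is then left untouched).
    """
    shifts = []
    for value in elements:
        if not is_power_of_2(value):
            return None
        shifts.append(value.bit_length() - 1)  # exact log2 of a power of two
    return shifts


if __name__ == "__main__":
    # Constant from mul_const1: every element is 2, so every shift is 1.
    print(log_base2_vector([2] * 8))
    # Constant from mul_const6: the zero elements fail the check.
    print(log_base2_vector([0, 0, 0, 2, 0, 2, 0, 0]))
```

With the constant from `mul_const1' the model yields a shift amount of 1 for
every lane, while the constant from `mul_const6' yields None because of its
zero elements.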
Except for that one case, all multiplies by a vector of constant powers of 2
are correctly optimized before we reach X86ISelLowering.
I hope this helps,
Andrea Di Biagio
SN Systems - Sony Computer Entertainment Group