Please review: Optimize vector multiply on X86

Thu Jun 20 06:15:29 PDT 2013

Hi Elena,

> From: "Demikhovsky, Elena" <elena.demikhovsky at intel.com>
> Hi Andrea,
> 
> You are right that 'isSplatVector' returns true if elements are not 
> the same. But later I check the slat value for 0, -1, pow2
> And in the tests I check everything:

What I am saying is that the size in bit of the splat value returned by 
'isSplatVector' matters. For example, it may be bigger than the vector 
element size in bits.

Example, given the following <4 x i64> constant vectors:
a) <i64 2, i64 0, i64 0, i64 0>
b) <i64 2, i64 0, i64 2, i64 0>
c) <i64 2, i64 2, i64 2, i64 2>

They are all considered to be valid vector splats according to 
'isSplatVector' and the splat value is the same for all of them. What 
changes however is the SplatBitSize:
a) SplatValue = 2 ;; SplatBitSize = 256;
b) SplatValue = 2 ;; SplatBitSize = 128;
c) SplatValue = 2 ;; SplatBitSize = 64;

> ==== Multiple 2 gives add
> +define <8 x i32> @mul_const1(<8 x i32> %x) {
> +  %y = mul <8 x i32> %x, <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2,
> i32 2, i32 2>
> +  ret <8 x i32> %y
> +}

In the example above:
if you substitute the second operand of the multiply instruction with any 
of the following constant vectors:

 - <i32 2, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
 - <i32 2, i32 0, i32 0, i32 0, i32 2, i32 0, i32 0, i32 0>
 - <i32 2, i32 0, i32 2, i32 0, i32 2, i32 0, i32 2, i32 0>

You will find out that your new combine transformation always optimizes 
the multiply into an add, and that causes incorrect codegen.
That is because the compiler applies the following sequence of rules:
 1) fold (mul x, (1 << c)) -> x << c  (in DAGCombiner::visitMul())
 2) (shl V, 1) -> add V, V (in X86ISelLowering.cpp, function 
PerformSHLCombine())

> === non-spalt int value is not optimized
> +define <8 x i32> @mul_const6(<8 x i32> %x) {
> +  %y = mul <8 x i32> %x, <i32 0, i32 0, i32 0, i32 2, i32 0, i32 2,
> i32 0, i32 0>
> +  ret <8 x i32> %y
> +}

That is true for the example above but it is not always true in general.
If you change the code above into:

define <8 x i32> @mul_const6(<8 x i32> %x) {
  %y = mul <8 x i32> %x, <i32 2, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, 
i32 0>
  ret <8 x i32> %y
}

You will get that the multiply is wrongly converted into an add.

Andrea Di Biagio
SN Systems - Sony Computer Entertainment Group

**********************************************************************
This email and any files transmitted with it are confidential and intended 
solely for the use of the individual or entity to whom they are addressed. 
If you have received this email in error please notify postmaster at scee.net
This footnote also confirms that this email message has been checked for 
all known viruses.
Sony Computer Entertainment Europe Limited
Registered Office: 10 Great Marlborough Street, London W1F 7LP, United 
Kingdom
Registered in England: 3277793
**********************************************************************

P Please consider the environment before printing this e-mail