Please review: Optimize vector multiply on X86

Sun Jun 23 04:00:17 PDT 2013

I caught your point. I also fixed the same bug on SDIV lowering. This is the new patch.

-  Elena

-----Original Message-----
From: Andrea_DiBiagio at sn.scee.net [mailto:Andrea_DiBiagio at sn.scee.net] 
Sent: Thursday, June 20, 2013 16:15
To: Demikhovsky, Elena
Cc: Benjamin Kramer; llvm-commits at cs.uiuc.edu; llvm-commits-bounces at cs.uiuc.edu
Subject: RE: Please review: Optimize vector multiply on X86

Hi Elena,

> From: "Demikhovsky, Elena" <elena.demikhovsky at intel.com> Hi Andrea,
> 
> You are right that 'isSplatVector' returns true if elements are not 
> the same. But later I check the slat value for 0, -1, pow2 And in the 
> tests I check everything:

What I am saying is that the size in bit of the splat value returned by 'isSplatVector' matters. For example, it may be bigger than the vector element size in bits.

Example, given the following <4 x i64> constant vectors:
a) <i64 2, i64 0, i64 0, i64 0>
b) <i64 2, i64 0, i64 2, i64 0>
c) <i64 2, i64 2, i64 2, i64 2>

They are all considered to be valid vector splats according to 'isSplatVector' and the splat value is the same for all of them. What changes however is the SplatBitSize:
a) SplatValue = 2 ;; SplatBitSize = 256;
b) SplatValue = 2 ;; SplatBitSize = 128;
c) SplatValue = 2 ;; SplatBitSize = 64;

> ==== Multiple 2 gives add
> +define <8 x i32> @mul_const1(<8 x i32> %x) {
> +  %y = mul <8 x i32> %x, <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2,
> i32 2, i32 2>
> +  ret <8 x i32> %y
> +}

In the example above:
if you substitute the second operand of the multiply instruction with any of the following constant vectors:

 - <i32 2, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
 - <i32 2, i32 0, i32 0, i32 0, i32 2, i32 0, i32 0, i32 0>
 - <i32 2, i32 0, i32 2, i32 0, i32 2, i32 0, i32 2, i32 0>

You will find out that your new combine transformation always optimizes the multiply into an add, and that causes incorrect codegen.
That is because the compiler applies the following sequence of rules:
 1) fold (mul x, (1 << c)) -> x << c  (in DAGCombiner::visitMul())
 2) (shl V, 1) -> add V, V (in X86ISelLowering.cpp, function
PerformSHLCombine())

> === non-spalt int value is not optimized
> +define <8 x i32> @mul_const6(<8 x i32> %x) {
> +  %y = mul <8 x i32> %x, <i32 0, i32 0, i32 0, i32 2, i32 0, i32 2,
> i32 0, i32 0>
> +  ret <8 x i32> %y
> +}

That is true for the example above but it is not always true in general.
If you change the code above into:

define <8 x i32> @mul_const6(<8 x i32> %x) {
  %y = mul <8 x i32> %x, <i32 2, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0,
i32 0>
  ret <8 x i32> %y
}

You will get that the multiply is wrongly converted into an add.

Andrea Di Biagio
SN Systems - Sony Computer Entertainment Group

**********************************************************************
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. 
If you have received this email in error please notify postmaster at scee.net This footnote also confirms that this email message has been checked for all known viruses.
Sony Computer Entertainment Europe Limited Registered Office: 10 Great Marlborough Street, London W1F 7LP, United Kingdom Registered in England: 3277793
**********************************************************************

P Please consider the environment before printing this e-mail
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mul3.diff
Type: application/octet-stream
Size: 11181 bytes
Desc: mul3.diff
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130623/72b4546e/attachment.obj>