[PATCH] Optimization for certain shufflevector by using insertps.

Demikhovsky, Elena elena.demikhovsky at intel.com
Thu Apr 24 00:18:03 PDT 2014

Hi Filipe,

1) You can check the INSERTPS mask outside NormalizeVectorShuffle(), in the place where all the other masks are checked.
2) You can use insertps for v4i32 as well.

3) I think that folding the load into insertps is not fully correct.

You translate this IR into an (add + insertps) sequence:

  %0 = load <4 x float>* %pb, align 16
  %vecinit6 = shufflevector <4 x float> %a, <4 x float> %0, <4 x i32> <i32 0, i32 1, i32 2, i32 4>

insertps loads 4 bytes instead of the 16 that the <4 x float> load reads, so you lose the exceptions that the full load could raise. That is OK for OpenCL, but other compilers can't ignore exceptions.

And in general, I'm not sure that add + insertps (load form) is better than load + insertps.
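
To make point 3 concrete, here is a small Python model of the two lowerings. It is only a sketch under stated assumptions: TrackedMemory, insertps_imm, insertps_reg and insertps_mem are hypothetical helpers written for this illustration, not code from the patch or from LLVM. It assumes the documented INSERTPS immediate layout (bits 7:6 source lane, bits 5:4 destination lane, bits 3:0 zero mask) and that the memory form reads a single 32-bit element.

```python
class TrackedMemory:
    """Hypothetical memory model: a list of floats that counts bytes read."""

    def __init__(self, floats):
        self.floats = list(floats)
        self.bytes_read = 0

    def load(self, index, count):
        # Each float element is 4 bytes wide.
        self.bytes_read += 4 * count
        return self.floats[index:index + count]


def insertps_imm(src_lane, dst_lane, zero_mask=0):
    """Build the imm8: bits 7:6 source lane, 5:4 destination lane, 3:0 zero mask."""
    return (src_lane << 6) | (dst_lane << 4) | zero_mask


def _apply(dst, value, imm):
    # Write the selected value into the destination lane, then apply the zero mask.
    out = list(dst)
    out[(imm >> 4) & 3] = value
    return [0.0 if (imm >> i) & 1 else v for i, v in enumerate(out)]


def insertps_reg(dst, src, imm):
    """Register form: the source lane is selected by imm bits 7:6."""
    return _apply(dst, src[(imm >> 6) & 3], imm)


def insertps_mem(dst, mem, index, imm):
    """Memory form: reads one 32-bit element; imm bits 7:6 are not used."""
    (value,) = mem.load(index, 1)
    return _apply(dst, value, imm)


a = [1.0, 2.0, 3.0, 4.0]
pb = [10.0, 20.0, 30.0, 40.0]
imm = insertps_imm(src_lane=0, dst_lane=3)

# load + insertps: the whole 16-byte vector is read first.
full = TrackedMemory(pb)
via_full_load = insertps_reg(a, full.load(0, 4), imm)

# insertps with a folded load: only the inserted element is read.
folded = TrackedMemory(pb)
via_folded_load = insertps_mem(a, folded, 0, imm)
```

Both paths compute the same <a0, a1, a2, b0> result for the mask <0, 1, 2, 4>, but the folded form touches only 4 of the 16 bytes, so a fault on the remaining 12 bytes would never be observed.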

4) About the tests: why do you check X32 and X64 separately?
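
Regarding points 1 and 2, the kind of mask check I have in mind can be sketched as below. This is a minimal illustration in Python, not the patch's code: match_insertps_mask is a hypothetical name, and the rule it implements (exactly one lane taken from the second vector, all other lanes left in place from the first) is my assumption about the pattern being matched. It only looks at the mask, so the same check applies to v4f32 and v4i32.

```python
def match_insertps_mask(mask):
    """Return (src_lane, dst_lane) if a 4-element shuffle mask is a
    single-element insert from the second vector, else None.

    Mask indices 0..3 select from the first vector, 4..7 from the second.
    """
    if len(mask) != 4 or any(m < 0 or m > 7 for m in mask):
        return None

    # Lanes whose value comes from the second vector.
    from_second = [i for i, m in enumerate(mask) if m >= 4]
    if len(from_second) != 1:
        return None

    dst_lane = from_second[0]
    # Every other lane must keep the first vector's element in place.
    for i, m in enumerate(mask):
        if i != dst_lane and m != i:
            return None

    return (mask[dst_lane] - 4, dst_lane)
```

For the mask <0, 1, 2, 4> from the IR above, this returns source lane 0 and destination lane 3.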

-  Elena

-----Original Message-----
From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Nadav Rotem
Sent: Thursday, April 24, 2014 07:51
To: filcab+llvm.phabricator at gmail.com; nrotem at apple.com
Cc: llvm-commits at cs.uiuc.edu
Subject: Re: [PATCH] Optimization for certain shufflevector by using insertps.

I did not review the patch carefully, but from a quick look it looks fine. Andrea, what do you think?

