[llvm-commits] Please Review: AVX code optimization

Nick Lewycky nlewycky at google.com
Fri Jul 13 11:42:43 PDT 2012


On 11 July 2012 03:34, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:

> I'm not sure that all architectures will see a performance gain.
> While building shuffles, I know that each shuffle will be replaced with
> one machine instruction.
> I also know that a shuffle is cheaper (1 cycle) than an extract (3 cycles)
> and an insert (2 cycles).
> I know that a blend is better than other shuffles. This information is
> specific to X86 and is documented in the IA optimization guide.
>
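Elena's distinction between a blend and other shuffles can be made concrete in terms of shuffle masks. The following is a minimal sketch, assuming the usual definition of a blend: result lane i always comes from lane i of one of the two inputs, as with the AVX blend instructions. `is_blend_mask` is a hypothetical helper, not an LLVM API, and undef lanes are written as `None`.

```python
from typing import Optional, Sequence


def is_blend_mask(mask: Sequence[Optional[int]]) -> bool:
    """Return True if every defined lane i selects element i of either
    the first operand (index i) or the second operand (index i + n).

    Assumption: mask indices follow shufflevector numbering, where
    0..n-1 name the first operand and n..2n-1 name the second.
    """
    n = len(mask)
    for i, m in enumerate(mask):
        if m is None:  # undef lane: any source is acceptable
            continue
        if m != i and m != i + n:  # element moved to a different lane
            return False
    return True


# Lanes 1 and 3 come from the second operand, each in place: a blend.
print(is_blend_mask([0, 5, 2, 7]))  # True
# Lane 1 takes element 0 of the second operand (index 4): not in place.
print(is_blend_mask([0, 4, 2, 3]))  # False
```

A mask that fails this check needs some cross-lane movement, which is why it cannot be expressed as a single blend.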

The IR-level optimizers already do transform your testcases into
shufflevector instructions. Here's the result after opt -O2:

define <8 x i32> @test20(<8 x i32> %a, <8 x i32> %b) nounwind readnone {
  %d = shufflevector <8 x i32> %b, <8 x i32> %a, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 9>
  ret <8 x i32> %d
}

define <8 x i32> @test21(<8 x i32> %a, <8 x i32> %b) nounwind readnone {
  %d = shufflevector <8 x i32> %b, <8 x i32> %a, <8 x i32> <i32 0, i32 1, i32 9, i32 3, i32 4, i32 5, i32 6, i32 7>
  ret <8 x i32> %d
}

define <4 x i64> @test22(<4 x i64> %a, <4 x i64> %b) nounwind readnone {
  %d = shufflevector <4 x i64> %b, <4 x i64> %a, <4 x i32> <i32 0, i32 1, i32 7, i32 3>
  ret <4 x i64> %d
}

define <4 x i64> @test23(<4 x i64> %a, <4 x i64> %b) nounwind readnone {
  %d = shufflevector <4 x i64> %b, <4 x i64> %a, <4 x i32> <i32 0, i32 1, i32 7, i32 3>
  ret <4 x i64> %d
}
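To read these masks: with operands (%b, %a), mask index i < n selects lane i of %b, and index n + j selects lane j of %a. The sketch below models that rule; `shufflevector` here is a hypothetical Python stand-in for the IR instruction (undef mask elements are not modelled), and the lane names `a0`, `b0`, ... are placeholders for the function arguments.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shufflevector(v1: Sequence[T], v2: Sequence[T], mask: Sequence[int]) -> List[T]:
    """Model of shufflevector: indices 0..n-1 select from v1 and
    n..2n-1 select from v2, i.e. they index the concatenation v1 ++ v2."""
    n = len(v1)
    if len(v2) != n:
        raise ValueError("both operands must have the same vector type")
    concat = list(v1) + list(v2)
    result = []
    for m in mask:
        if not 0 <= m < 2 * n:  # reject out-of-range (and negative) indices
            raise ValueError(f"mask index {m} out of range")
        result.append(concat[m])
    return result


# Symbolic lanes standing in for %a and %b of test20.
a = [f"a{i}" for i in range(8)]
b = [f"b{i}" for i in range(8)]

# test20: operands are (%b, %a), so index 9 is lane 1 of %a.
print(shufflevector(b, a, [0, 1, 2, 3, 4, 5, 6, 9]))
# ['b0', 'b1', 'b2', 'b3', 'b4', 'b5', 'b6', 'a1']
```

So test20 is %b with its last lane replaced by %a[1], which is what a single extractelement/insertelement pair computes; test22 likewise replaces lane 2 of %b with %a[3].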

In what case does the patch you sent in improve generated code? Running the
optimizing code generator on unoptimized IR? Or is this a pattern that
parts of the backend will produce internally where the IR optimizers
couldn't see it?
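Wherever it lives (a DAGCombine in the backend or an IR-level pass), the fold under discussion turns a chain of extract/insert pairs into one shuffle mask. A minimal sketch of that mask computation follows; it is not the code from the patch, and `fold_to_mask`, the `"base"`/`"other"` source names, and the tuple encoding are hypothetical. It ignores undef lanes, cost models, and target legality.

```python
from typing import List, Optional, Sequence, Tuple

# One step of the chain, modelling:
#   %v = insertelement %v, (extractelement %src, src_lane), dst_lane
# src_name is "base" (the original vector being inserted into) or
# "other" (a second vector). Encoded as (dst_lane, src_name, src_lane).
Insert = Tuple[int, str, int]


def fold_to_mask(n: int, inserts: Sequence[Insert]) -> Optional[List[int]]:
    """Fold the chain into one shufflevector mask over (base, other).

    Returns None when the chain cannot be expressed as a single
    two-operand shuffle (a third source vector or a bad lane).
    """
    mask = list(range(n))  # start from the identity shuffle of base
    for dst_lane, src_name, src_lane in inserts:
        if not (0 <= dst_lane < n and 0 <= src_lane < n):
            return None  # out-of-range lane: leave the chain alone
        if src_name == "base":
            # SSA: extracting from base reads the ORIGINAL base value.
            mask[dst_lane] = src_lane
        elif src_name == "other":
            mask[dst_lane] = n + src_lane
        else:
            return None  # a third vector needs more than one shuffle
    return mask


# Inserting %a[1] into lane 7 of %b reproduces test20's mask.
print(fold_to_mask(8, [(7, "other", 1)]))  # [0, 1, 2, 3, 4, 5, 6, 9]
```

Later inserts to the same lane simply overwrite earlier ones, mirroring how the last insertelement wins.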

Nick


> - Elena
> -----Original Message-----
> From: Nick Lewycky [mailto:nicholas at mxc.ca]
> Sent: Wednesday, July 11, 2012 11:47
> To: Demikhovsky, Elena
> Cc: Commit Messages and Patches for LLVM
> Subject: Re: [llvm-commits] Please Review: AVX code optimization
>
> Demikhovsky, Elena wrote:
> > I wrote an optimization for extractelement/insertelement sequences.
> > Please review.
>
> It looks like this is a dagcombine to turn insertelement+extractelement
> pairs into vector shuffles. Perhaps I'm missing a good reason, but why not
> do this as an IR optimization?
>
> Nick