<div class="gmail_quote">On 11 July 2012 03:34, Demikhovsky, Elena <span dir="ltr"><<a href="mailto:elena.demikhovsky@intel.com" target="_blank">elena.demikhovsky@intel.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


I'm not sure that all architectures will see a performance gain.<br>

While building shuffles, I know that each shuffle will be replaced with one machine instruction.<br>

I also know that a shuffle is cheaper (1 cycle) than an extract (3 cycles) and an insert (2 cycles).<br>

I know that a blend is better than other shuffles. This information is specific to X86 and is documented in the IA optimization guide.<br></blockquote><div><br></div><div>The IR-level optimizers already do transform your testcases into shufflevector instructions. Here's the result after opt -O2:</div>


<div><br></div><div><div>define <8 x i32> @test20(<8 x i32> %a, <8 x i32> %b) nounwind readnone {</div><div>  %d = shufflevector <8 x i32> %b, <8 x i32> %a, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 9></div>


<div>  ret <8 x i32> %d</div><div>}</div><div><br></div><div>define <8 x i32> @test21(<8 x i32> %a, <8 x i32> %b) nounwind readnone {</div><div>  %d = shufflevector <8 x i32> %b, <8 x i32> %a, <8 x i32> <i32 0, i32 1, i32 9, i32 3, i32 4, i32 5, i32 6, i32 7></div>


<div>  ret <8 x i32> %d</div><div>}</div><div><br></div><div>define <4 x i64> @test22(<4 x i64> %a, <4 x i64> %b) nounwind readnone {</div><div>  %d = shufflevector <4 x i64> %b, <4 x i64> %a, <4 x i32> <i32 0, i32 1, i32 7, i32 3></div>


<div>  ret <4 x i64> %d</div><div>}</div><div><br></div><div>define <4 x i64> @test23(<4 x i64> %a, <4 x i64> %b) nounwind readnone {</div><div>  %d = shufflevector <4 x i64> %b, <4 x i64> %a, <4 x i32> <i32 0, i32 1, i32 7, i32 3></div>


<div>  ret <4 x i64> %d</div><div>}</div></div><div><br></div><div>In what case does the patch you sent in improve generated code? Running the optimizing code generator on unoptimized IR? Or is this a pattern that parts of the backend will produce internally where the IR optimizers couldn't see it?</div>
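<div><br></div><div>For reference, here is a rough sketch in Python (not LLVM code) of how the shufflevector masks in the functions above pick their lanes: each mask index selects from the concatenation of the two operands, so with 8-lane operands index 9 means lane 1 of the second operand. The helper name shufflevector and the lane labels a0..a7 / b0..b7 are made up for illustration, and undef mask elements are not modelled.</div>

```python
def shufflevector(first, second, mask):
    """Model the lane selection of LLVM's shufflevector.

    Each mask index selects from the concatenation first + second.
    Illustrative helper only; undef mask elements are not handled.
    """
    if len(first) != len(second):
        raise ValueError("operands must have the same number of lanes")
    combined = list(first) + list(second)
    return [combined[i] for i in mask]


# Hypothetical lane labels, chosen only to make the results readable.
a = [f"a{i}" for i in range(8)]
b = [f"b{i}" for i in range(8)]

# Mask from @test20. The operands are (%b, %a), so index 9 is lane 1 of %a.
test20 = shufflevector(b, a, [0, 1, 2, 3, 4, 5, 6, 9])

# Mask from @test21. Only lane 2 of the result comes from %a.
test21 = shufflevector(b, a, [0, 1, 9, 3, 4, 5, 6, 7])

# Mask from @test22 (4 lanes). Index 7 is lane 3 of %a.
test22 = shufflevector(b[:4], a[:4], [0, 1, 7, 3])

print(test20)
print(test21)
print(test22)
```

<div>In each case exactly one lane comes from %a and the rest from %b, which is the single-element insert pattern the testcases started from.</div>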


<div><br></div><div>Nick</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<span class="HOEnZb"><font color="#888888"><br>

- Elena<br>

</font></span><div class="im HOEnZb">-----Original Message-----<br>

From: Nick Lewycky [mailto:<a href="mailto:nicholas@mxc.ca">nicholas@mxc.ca</a>]<br>

Sent: Wednesday, July 11, 2012 11:47<br>

To: Demikhovsky, Elena<br>

Cc: Commit Messages and Patches for LLVM<br>

Subject: Re: [llvm-commits] Please Review: AVX code optimization<br>

<br>

Demikhovsky, Elena wrote:<br>

> I wrote an optimization for extractelement - insertelement sequences.<br>

> Please review.<br>

<br>

It looks like this is a dagcombine to turn insertelement+extractelement pairs into vector shuffles. Perhaps I'm missing a good reason, but why not do this as an IR optimization?<br>

<br>

Nick<br>

</div><div class="im HOEnZb">

</div><div class="HOEnZb"><div class="h5">

</div></div></blockquote></div><br>