[llvm-commits] Please Review: AVX code optimization

Craig Topper craig.topper at gmail.com
Wed Jul 18 23:32:37 PDT 2012


These are all stylistic comments. Contentwise it looks fine.

+      !isa<ConstantSDNode> (N1.getOperand(1)))

Remove the space before the parenthesis.

+    if (!isSymmetric && (NumElts==8) ) {

Remove the extra space between the parentheses. Also capitalize isSymmetric and
distance.

+      }
+      else if ((IdxVal >= NumElts/2) && (ExtractIdxVal< NumElts/2)) {

Put the else on the same line as the closing brace.

+    // The insert-extract pair is symmetric when we extract element "5" and
+    // insert it in "1".
+    // If the pair is not symmetric and extracted and inserted elements are not

A couple of trailing spaces on these lines.

+  if (NewOp.getNode())
+    return NewOp;

Trailing space on the if line.

On Wed, Jul 18, 2012 at 2:03 AM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:

> Hi, I’d like to commit this patch; sending the patch again.
>
> - Elena
>
> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Demikhovsky, Elena
> Sent: Monday, July 16, 2012 09:50
> To: Nick Lewycky
> Cc: Commit Messages and Patches for LLVM
> Subject: Re: [llvm-commits] Please Review: AVX code optimization
>
> I checked my optimization at the codegen level against -O2 optimization. This
> is the code for comparison:
>
> IR before -O2:
>
>   %c = extractelement <8 x i32> %a, i32 1
>   %d = insertelement <8 x i32> %b, i32 %c, i32 7
>
> The code:
>
>         vunpcklps       %ymm0, %ymm0, %ymm0     ## ymm0 = ymm0[0,0,1,1,4,4,5,5]
>         vperm2f128      $0, %ymm0, %ymm0, %ymm0 ## ymm0 = ymm0[0,1,0,1]
>         vblendps        $128, %ymm0, %ymm1, %ymm0
>
> After -O2:
>
>   %d = shufflevector <8 x i32> %b, <8 x i32> %a, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 9>
>
> The code:
>
>         vextractf128    $1, %ymm1, %xmm2
>         vshufps $33, %xmm2, %xmm0, %xmm0        ## xmm0 = xmm0[1,0],xmm2[2,0]
>         vshufps $36, %xmm0, %xmm2, %xmm0        ## xmm0 = xmm2[0,1],xmm0[2,0]
>         vinsertf128     $1, %xmm0, %ymm1, %ymm0
>
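For anyone following the comparison: the -O2 form is the same insert/extract pair written as a shuffle mask. Here is a minimal C++ sketch of that mapping, assuming the shuffle operands are ordered (%b, %a) as in the IR above. insertExtractToShuffleMask is a hypothetical helper for illustration only, not an LLVM function.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of LLVM): build the shufflevector mask that is
// equivalent to
//   %c = extractelement <N x T> %a, i32 ExtractIdx
//   %d = insertelement  <N x T> %b, T %c, i32 InsertIdx
// assuming the shuffle operands are ordered (%b, %a).
// Mask values 0..N-1 select from the first operand (%b); values N..2N-1
// select from the second operand (%a).
std::vector<int> insertExtractToShuffleMask(int NumElts, int ExtractIdx,
                                            int InsertIdx) {
  assert(NumElts > 0 && ExtractIdx >= 0 && ExtractIdx < NumElts &&
         InsertIdx >= 0 && InsertIdx < NumElts && "index out of range");
  std::vector<int> Mask(NumElts);
  for (int I = 0; I < NumElts; ++I)
    Mask[I] = I;                          // keep %b's element in every position
  Mask[InsertIdx] = NumElts + ExtractIdx; // except the inserted one, taken from %a
  return Mask;
}
```

With NumElts = 8, ExtractIdx = 1 and InsertIdx = 7 this gives <0, 1, 2, 3, 4, 5, 6, 9>, which is the mask in the shufflevector quoted above.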
> --------------------------
>
> IR before -O2:
>
>   %c = extractelement <4 x i64> %a, i32 3
>   %d = insertelement <4 x i64> %b, i64 %c, i32 2
>
> vunpckhpd       %ymm0, %ymm0, %ymm0 ## ymm0 = ymm0[1,1,3,3]
> vblendpd        $4, %ymm0, %ymm1, %ymm0
>
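A note on the blend immediates in the two listings above: bit i of the vblendps/vblendpd immediate chooses which source supplies element i, so replacing a single element needs a single set bit. The following is a small sketch of that arithmetic. singleElementBlendImm is a hypothetical name used for illustration, not something from the patch or from LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): immediate for a vblendps/vblendpd
// that takes exactly one element, InsertIdx, from the second source and all
// other elements from the first source.
std::uint8_t singleElementBlendImm(unsigned NumElts, unsigned InsertIdx) {
  assert(NumElts <= 8 && InsertIdx < NumElts && "blend immediate has 8 bits");
  // One set bit, at the position of the element being replaced.
  return static_cast<std::uint8_t>(1u << InsertIdx);
}
```

singleElementBlendImm(8, 7) is 128 and singleElementBlendImm(4, 2) is 4, matching the $128 and $4 immediates above.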
> IR after -O2:
>
>   %d = shufflevector <4 x i64> %b, <4 x i64> %a, <4 x i32> <i32 0, i32 1, i32 7, i32 3>
>
> vextractf128    $1, %ymm1, %xmm2
> vextractf128    $1, %ymm0, %xmm0
> vpunpckhqdq     %xmm2, %xmm0, %xmm0 ## xmm0 = xmm0[1],xmm2[1]
> vinsertf128     $1, %xmm0, %ymm1, %ymm0
>
> I have to say that the code, as generated now in my own branch, requires
> more changes in X86ISelLowering. I plan to send the patches one by one.
>
> And to answer your question:
>
> > Or is this a pattern that parts of the backend will produce
> > internally where the IR optimizers couldn't see it?
>
> Right, the optimizer does not see this pattern; our backend generates it
> later.
>
> - Elena
>
> From: Nick Lewycky [mailto:nlewycky at google.com]
> Sent: Friday, July 13, 2012 21:43
> To: Demikhovsky, Elena
> Cc: Nick Lewycky; Commit Messages and Patches for LLVM
> Subject: Re: [llvm-commits] Please Review: AVX code optimization
>
> On 11 July 2012 03:34, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
>
> I'm not sure that all architectures will see a performance gain.
> While building shuffles, I know that each shuffle will be replaced with
> one machine instruction.
> I also know that a shuffle is cheaper (1 cycle) than an extract (3 cycles) and
> an insert (2 cycles).
> I know that a blend is better than other shuffles. This information is
> specific to X86 and is documented in the IA optimization guide.
>
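To make the cost argument above concrete, here is a rough sketch that uses only the cycle counts quoted in the message. These are the author's figures and were not measured here. The function names are hypothetical, and charging every shuffle-class instruction the same cost is a simplification: real latencies vary by instruction and microarchitecture.

```cpp
#include <cassert>

// Cycle counts quoted in the thread (the author's figures, not measured here).
constexpr int ShuffleCycles = 1;
constexpr int ExtractCycles = 3;
constexpr int InsertCycles = 2;

// Cost of one extractelement + insertelement pair under those figures.
constexpr int extractInsertPairCycles() { return ExtractCycles + InsertCycles; }

// Cost of a lowering made of NumShuffles shuffle-class instructions, assuming
// (as a simplification) that each one costs ShuffleCycles.
constexpr int shuffleSequenceCycles(int NumShuffles) {
  return NumShuffles * ShuffleCycles;
}
```

Under these assumptions the extract/insert pair costs 5 cycles, so a lowering of up to four shuffle-class instructions would still come out ahead.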
> The IR-level optimizers already do transform your testcases into
> shufflevector instructions. Here's the result after opt -O2:
>
> define <8 x i32> @test20(<8 x i32> %a, <8 x i32> %b) nounwind readnone {
>   %d = shufflevector <8 x i32> %b, <8 x i32> %a, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 9>
>   ret <8 x i32> %d
> }
>
> define <8 x i32> @test21(<8 x i32> %a, <8 x i32> %b) nounwind readnone {
>   %d = shufflevector <8 x i32> %b, <8 x i32> %a, <8 x i32> <i32 0, i32 1, i32 9, i32 3, i32 4, i32 5, i32 6, i32 7>
>   ret <8 x i32> %d
> }
>
> define <4 x i64> @test22(<4 x i64> %a, <4 x i64> %b) nounwind readnone {
>   %d = shufflevector <4 x i64> %b, <4 x i64> %a, <4 x i32> <i32 0, i32 1, i32 7, i32 3>
>   ret <4 x i64> %d
> }
>
> define <4 x i64> @test23(<4 x i64> %a, <4 x i64> %b) nounwind readnone {
>   %d = shufflevector <4 x i64> %b, <4 x i64> %a, <4 x i32> <i32 0, i32 1, i32 7, i32 3>
>   ret <4 x i64> %d
> }
>
> In what case does the patch you sent in improve generated code? Running
> the optimizing code generator on unoptimized IR? Or is this a pattern that
> parts of the backend will produce internally where the IR optimizers
> couldn't see it?
>
> Nick
>
> - Elena
>
> -----Original Message-----
> From: Nick Lewycky [mailto:nicholas at mxc.ca]
> Sent: Wednesday, July 11, 2012 11:47
> To: Demikhovsky, Elena
> Cc: Commit Messages and Patches for LLVM
> Subject: Re: [llvm-commits] Please Review: AVX code optimization
>
> Demikhovsky, Elena wrote:
> > I wrote an optimization for extractelement - insertelement sequences.
> > Please review.
>
> It looks like this is a dagcombine to turn insertelement+extractelement
> pairs into vector shuffles. Perhaps I'm missing a good reason, but why not
> do this as an IR optimization?
>
> Nick
>
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits


-- 
~Craig