[llvm-commits] [PATCH] AVX vmovaps + vxorps + vinsertf128 DAG combine to vmovaps
Bruno Cardoso Lopes
bruno.cardoso at gmail.com
Fri Dec 30 13:07:21 PST 2011
Hi Chad,
On Thu, Dec 22, 2011 at 12:12 AM, Chad Rosier <mcrosier at apple.com> wrote:
> This patch is for an AVX-specific DAG combine optimization.
>
> The following code:
>
> #include <immintrin.h>
>
> __m256 foo(float *f) {
>   return _mm256_castps128_ps256(_mm_load_ps(f));
> }
>
> generates this assembly:
>
> vmovaps (%rdi), %xmm0
> vxorps %ymm1, %ymm1, %ymm1
> vinsertf128 $0, %xmm0, %ymm1, %ymm0
>
> On AVX-enabled processors, a 128-bit vmovaps zeroes the upper bits (255:128) of the corresponding YMM register. The vxorps and vinsertf128 instructions are therefore unnecessary.
>
> This patch implements a DAG combine that removes the unnecessary vxorps and vinsertf128 instructions. Currently it only works as an enhancement to one of Bruno's DAG combines (r135727), but I plan to make it more general in the future.
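
If I'm reading the patch right, the idea is roughly the following. This is a
loose paraphrase, not the actual hunks; SubVec here is just a placeholder for
the 128-bit operand being inserted into the low half of the zero vector:

  // The value being inserted is a plain 128-bit load, and a 128-bit
  // vmovaps already clears bits 255:128, so a single zero-extending
  // vector load replaces the whole xor + insertf128 sequence.
  if (ISD::isNormalLoad(SubVec.getNode()) && SubVec.hasOneUse()) {
    LoadSDNode *Ld = cast<LoadSDNode>(SubVec);
    SDVTList Tys = DAG.getVTList(MVT::v4i64, MVT::Other);
    SDValue Ops[] = { Ld->getChain(), Ld->getBasePtr() };
    SDValue ResNode = DAG.getMemIntrinsicNode(X86ISD::VZEXT_LOAD128, dl, Tys,
                                              Ops, 2, Ld->getMemoryVT(),
                                              Ld->getMemOperand());
    return DAG.getNode(ISD::BITCAST, dl, VT, ResNode);
  }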
LGTM, just a few comments:
+ return DAG.getNode(ISD::BITCAST, dl, VT, ResNode);
Since you're returning early here,
+ } else {
+ // Emit a zeroed vector and insert the desired subvector on its
+ // first half.
+ SDValue Zeros = getZeroVector(VT, true /* HasXMMInt */, DAG, dl);
+ SDValue InsV = Insert128BitVector(Zeros, V1.getOperand(0),
+ DAG.getConstant(0, MVT::i32), DAG, dl);
+ return DCI.CombineTo(N, InsV);
+ }
you don't need the "else", per the LLVM coding style.
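In other words, since the if-path returns, the tail can simply read:

  // Emit a zeroed vector and insert the desired subvector on its
  // first half.
  SDValue Zeros = getZeroVector(VT, true /* HasXMMInt */, DAG, dl);
  SDValue InsV = Insert128BitVector(Zeros, V1.getOperand(0),
                                    DAG.getConstant(0, MVT::i32), DAG, dl);
  return DCI.CombineTo(N, InsV);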
+def X86vzload128 : SDNode<"X86ISD::VZEXT_LOAD128", SDTLoad,
+ [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
+ // VZEXT_LOAD128 - Load vector and zero extend.
+ VZEXT_LOAD128,
+
Why not reuse the existing X86vzload and VZEXT_LOAD instead of adding a new
node? You can use them and still match this correctly by using v4i64 in the pattern.
--
Bruno Cardoso Lopes
http://www.brunocardoso.cc