[llvm-commits] [PATCH] AVX vmovaps + vxorps + vinsertf128 DAG combine to vmovaps

Chad Rosier mcrosier at apple.com
Tue Jan 3 13:11:37 PST 2012


Thanks, Bruno.  Committed llvm revision 147481 with the suggested changes.

 Chad

On Dec 30, 2011, at 1:07 PM, Bruno Cardoso Lopes wrote:

> Hi Chad,
> 
> On Thu, Dec 22, 2011 at 12:12 AM, Chad Rosier <mcrosier at apple.com> wrote:
>> This patch is for an AVX-specific DAG combine optimization.
>> 
>> The following code:
>> 
>> __m256 foo(float *f) {
>>    return _mm256_castps128_ps256 (_mm_load_ps(f));
>> }
>> 
>> generates this assembly:
>> 
>>        vmovaps (%rdi), %xmm0
>>        vxorps  %ymm1, %ymm1, %ymm1
>>        vinsertf128     $0, %xmm0, %ymm1, %ymm0
>> 
>> On AVX-enabled processors, the vmovaps zeroes the upper bits (255:128) of the corresponding YMM register.  Therefore, the vxorps and vinsertf128 instructions are not necessary.
>> 
>> This patch implements a DAG combine that removes the unnecessary vxorps and vinsertf128 instructions.  Currently, this only works as an enhancement to one of Bruno's DAG combines (r135727), but I plan to make it more general in the future.
> 
> LGTM, just a few comments:
> 
> +      return DAG.getNode(ISD::BITCAST, dl, VT, ResNode);
> 
> Since you're early returning here,
> 
> +    } else {
> +      // Emit a zeroed vector and insert the desired subvector on its
> +      // first half.
> +      SDValue Zeros = getZeroVector(VT, true /* HasXMMInt */, DAG, dl);
> +      SDValue InsV = Insert128BitVector(Zeros, V1.getOperand(0),
> +                                        DAG.getConstant(0, MVT::i32), DAG, dl);
> +      return DCI.CombineTo(N, InsV);
> +    }
> 
> to follow the LLVM coding style, you don't need the "else".
> 
> +def X86vzload128  : SDNode<"X86ISD::VZEXT_LOAD128", SDTLoad,
> +                            [SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
> 
> +      // VZEXT_LOAD128 - Load vector and zero extend.
> +      VZEXT_LOAD128,
> +
> 
> Why not use the previous X86vzload and VZEXT_LOAD instead? You can
> use it and still match correctly by using v4i64 in the pattern.
> 
> 
> -- 
> Bruno Cardoso Lopes
> http://www.brunocardoso.cc
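
As a rough illustration of the equivalence described in the quoted patch
description, here is a small scalar model in C++. It is not LLVM code and
uses no intrinsics. The types Xmm and Ymm and the functions vexLoad128,
zeroThenInsertLow, and sequencesAgree are hypothetical names made up for this
sketch. It assumes only what the description states: the 128-bit vmovaps
load zeroes bits 255:128 of the destination YMM register.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar models of the registers: 4 and 8 float lanes.
using Xmm = std::array<float, 4>;
using Ymm = std::array<float, 8>;

// Model of the single-instruction form: a 128-bit load that fills the
// low four lanes and leaves the upper four lanes (bits 255:128) zeroed.
Ymm vexLoad128(const float *p) {
  Ymm result{}; // value-initialization zeroes every lane
  for (std::size_t i = 0; i < 4; ++i)
    result[i] = p[i];
  return result;
}

// Model of the original three-instruction sequence:
// vmovaps (load), vxorps (zero a YMM), vinsertf128 $0 (insert low half).
Ymm zeroThenInsertLow(const float *p) {
  Xmm low{};
  for (std::size_t i = 0; i < 4; ++i)
    low[i] = p[i];            // vmovaps (%rdi), %xmm0
  Ymm zeros{};                // vxorps %ymm1, %ymm1, %ymm1
  Ymm result = zeros;
  for (std::size_t i = 0; i < 4; ++i)
    result[i] = low[i];       // vinsertf128 $0, %xmm0, %ymm1, %ymm0
  return result;
}

// True when both models produce the same 256-bit value for this input.
bool sequencesAgree(const float *p) {
  return vexLoad128(p) == zeroThenInsertLow(p);
}
```

In this model the two functions return the same value for any four input
floats, which is the property the combine relies on when it drops the
vxorps and vinsertf128.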



