[llvm-commits] [PATCH] Fix instruction selection for vduplane from v4f32 to v2f32

Tue Oct 2 08:28:40 PDT 2012

Hi David,

Thanks for the comments. The register coalescer should remove
the unnecessary copy instructions. We just need to make sure
that the index at which we are inserting is right for this
purpose.

I've attached the updated patch.

Cheers,
Silviu

> -----Original Message-----
> From: David Peixotto [mailto:dpeixott at codeaurora.org]
> Sent: 01 October 2012 18:06
> To: Silviu Baranga; James Molloy
> Cc: llvm-commits at cs.uiuc.edu
> Subject: RE: [llvm-commits] [PATCH] Fix instruction selection for
> vduplane from v4f32 to v2f32
> 
> Hi Silviu,
> 
> Thanks for the patch!
> 
> It looks pretty good to me. I have a question about how we ensure the
> copy gets eliminated (e.g. the INSERT_ELEMENT into the undef vector).
> Does this rely on a downstream dag combine optimization, or is it done
> in instruction selection? Can you add a note to your comment saying
> where you expect this node will be eliminated?
> 
> I have a few minor inline comments below. Other than that it looks good
> to me.
> 
> 
> Index: test/CodeGen/ARM/vdup.ll
> ===================================================================
> --- test/CodeGen/ARM/vdup.ll	(revision 164943)
> +++ test/CodeGen/ARM/vdup.ll	(working copy)
> @@ -295,3 +295,39 @@
>    %4 = insertelement <4 x i32> %3, i32 255, i32 3
>    ret <4 x i32> %4
>  }
> +
> +define <2 x float> @check_f32(<4 x float> %v) nounwind {
> +;CHECK: check_f32
> 
> Should have a : after the check_f32 to ensure that we have found the
> function symbol. I learned this tip after I made the first patch. Sorry
> I missed it the first time around.
> 
> +;CHECK: vdup.32 {{.*}}, d{{..}}[1]
> +  %x = extractelement <4 x float> %v, i32 3
> +  %1 = insertelement  <2 x float> undef, float %x, i32 0
> +  %2 = insertelement  <2 x float> %1, float %x, i32 1
> +  ret <2 x float> %2
> +}
> +
> +define <2 x i32> @check_i32(<4 x i32> %v) nounwind {
> +;CHECK: check_i32
> 
> Add :
> 
> +;CHECK: vdup.32 {{.*}}, d{{..}}[1]
> +  %x = extractelement <4 x i32> %v, i32 3
> +  %1 = insertelement  <2 x i32> undef, i32 %x, i32 0
> +  %2 = insertelement  <2 x i32> %1, i32 %x, i32 1
> +  ret <2 x i32> %2
> +}
> +
> +define <4 x i16> @check_i16(<8 x i16> %v) nounwind {
> +;CHECK: check_i16
> 
> Add :
> 
> +;CHECK: vdup.16 {{.*}}, d{{..}}[3]
> +  %x = extractelement <8 x i16> %v, i32 3
> +  %1 = insertelement  <4 x i16> undef, i16 %x, i32 0
> +  %2 = insertelement  <4 x i16> %1, i16 %x, i32 1
> +  ret <4 x i16> %2
> +}
> +
> +define <8 x i8> @check_i8(<16 x i8> %v) nounwind {
> +;CHECK: check_i8
> 
> Add :
> 
> +;CHECK: vdup.8 {{.*}}, d{{..}}[3]
> +  %x = extractelement <16 x i8> %v, i32 3
> +  %1 = insertelement  <8  x i8> undef, i8 %x, i32 0
> +  %2 = insertelement  <8  x i8> %1, i8 %x, i32 1
> +  ret <8 x i8> %2
> +}
> Index: lib/Target/ARM/ARMISelLowering.cpp
> ===================================================================
> --- lib/Target/ARM/ARMISelLowering.cpp	(revision 164943)
> +++ lib/Target/ARM/ARMISelLowering.cpp	(working copy)
> @@ -4215,9 +4319,26 @@
>        // If we are VDUPing a value that comes directly from a vector,
> that will
>        // cause an unnecessary move to and from a GPR, where instead we
> could
>        // just use VDUPLANE.
> -      if (Value->getOpcode() == ISD::EXTRACT_VECTOR_ELT)
> -        N = DAG.getNode(ARMISD::VDUPLANE, dl, VT,
> +      if (Value->getOpcode() == ISD::EXTRACT_VECTOR_ELT) {
> +        // We need to create a new undef vector to use for the
> VDUPLANE if the
> +        // size of the vector from which we get the value is different
> than the
> +        // size of the vector that we need to create. We will insert
> the element
> +        // such that we won't produce unnecessary copies.
> 
> Extra space at the end of the sentence.
> 
> +        if (VT != Value->getOperand(0).getValueType()) {
> +          ConstantSDNode *constIndex;
> +          constIndex = dyn_cast<ConstantSDNode>(Value->getOperand(1));
> +          assert(constIndex && "The index is not a constant!");
> +          unsigned index = constIndex-
> >getAPIntValue().getLimitedValue() %
> +                             VT.getVectorNumElements();
> +          N =  DAG.getNode(ARMISD::VDUPLANE, dl, VT,
> +                 DAG.getNode(ISD::INSERT_VECTOR_ELT, dl, VT,
> DAG.getUNDEF(VT),
> 
> Extra space at the end of the line above.
> 
> +                        Value, DAG.getConstant(index, MVT::i32)),
> +                           DAG.getConstant(index, MVT::i32));
> +        } else {
> +          N = DAG.getNode(ARMISD::VDUPLANE, dl, VT,
>                          Value->getOperand(0), Value->getOperand(1));
> +        }
> +      }
>        else
>          N = DAG.getNode(ARMISD::VDUP, dl, VT, Value);
> 
> 
> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation
> 
> 
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix.diff
Type: application/octet-stream
Size: 3209 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20121002/e5f21009/attachment.obj>