[llvm-commits] [PATCH] Fix instruction selection for vduplane from v4f32 to v2f32

Thu Oct 4 04:07:36 PDT 2012

Hi all,

It would be good to get another review for this. David
started this thread so I don't know how OK it is to have
him as the reviewer.

Thanks,
Silviu

> -----Original Message-----
> From: Silviu Baranga [mailto:silbar01 at arm.com]
> Sent: 02 October 2012 16:29
> To: 'David Peixotto'
> Cc: llvm-commits at cs.uiuc.edu
> Subject: RE: [llvm-commits] [PATCH] Fix instruction selection for
> vduplane from v4f32 to v2f32
> 
> Hi David,
> 
> Thanks for the comments. The register coalescer should remove
> the unnecessary copy instructions. We just need to make sure
> that the index at which we are inserting is right for this
> purpose.
> 
> I've attached the updated patch.
> 
> Cheers,
> Silviu
> 
> > -----Original Message-----
> > From: David Peixotto [mailto:dpeixott at codeaurora.org]
> > Sent: 01 October 2012 18:06
> > To: Silviu Baranga; James Molloy
> > Cc: llvm-commits at cs.uiuc.edu
> > Subject: RE: [llvm-commits] [PATCH] Fix instruction selection for
> > vduplane from v4f32 to v2f32
> >
> > Hi Silviu,
> >
> > Thanks for the patch!
> >
> > It looks pretty good to me. I have a question about how we ensure the
> > copy gets eliminated (e.g. the INSERT_ELEMENT into the undef vector).
> > Does this rely on a downstream dag combine optimization, or is it
> done
> > in instruction selection? Can you add a note to your comment saying
> > where you expect this node will be eliminated?
> >
> > I have a few minor inline comments below. Other than that it looks
> good
> > to me.
> >
> >
> > Index: test/CodeGen/ARM/vdup.ll
> > ===================================================================
> > --- test/CodeGen/ARM/vdup.ll	(revision 164943)
> > +++ test/CodeGen/ARM/vdup.ll	(working copy)
> > @@ -295,3 +295,39 @@
> >    %4 = insertelement <4 x i32> %3, i32 255, i32 3
> >    ret <4 x i32> %4
> >  }
> > +
> > +define <2 x float> @check_f32(<4 x float> %v) nounwind {
> > +;CHECK: check_f32
> >
> > Should have a : after the check_f32 to ensure that we have found the
> > function symbol. I learned this tip after I made the first patch.
> Sorry
> > I missed it the first time around.
> >
> > +;CHECK: vdup.32 {{.*}}, d{{..}}[1]
> > +  %x = extractelement <4 x float> %v, i32 3
> > +  %1 = insertelement  <2 x float> undef, float %x, i32 0
> > +  %2 = insertelement  <2 x float> %1, float %x, i32 1
> > +  ret <2 x float> %2
> > +}
> > +
> > +define <2 x i32> @check_i32(<4 x i32> %v) nounwind {
> > +;CHECK: check_i32
> >
> > Add :
> >
> > +;CHECK: vdup.32 {{.*}}, d{{..}}[1]
> > +  %x = extractelement <4 x i32> %v, i32 3
> > +  %1 = insertelement  <2 x i32> undef, i32 %x, i32 0
> > +  %2 = insertelement  <2 x i32> %1, i32 %x, i32 1
> > +  ret <2 x i32> %2
> > +}
> > +
> > +define <4 x i16> @check_i16(<8 x i16> %v) nounwind {
> > +;CHECK: check_i16
> >
> > Add :
> >
> > +;CHECK: vdup.16 {{.*}}, d{{..}}[3]
> > +  %x = extractelement <8 x i16> %v, i32 3
> > +  %1 = insertelement  <4 x i16> undef, i16 %x, i32 0
> > +  %2 = insertelement  <4 x i16> %1, i16 %x, i32 1
> > +  ret <4 x i16> %2
> > +}
> > +
> > +define <8 x i8> @check_i8(<16 x i8> %v) nounwind {
> > +;CHECK: check_i8
> >
> > Add :
> >
> > +;CHECK: vdup.8 {{.*}}, d{{..}}[3]
> > +  %x = extractelement <16 x i8> %v, i32 3
> > +  %1 = insertelement  <8  x i8> undef, i8 %x, i32 0
> > +  %2 = insertelement  <8  x i8> %1, i8 %x, i32 1
> > +  ret <8 x i8> %2
> > +}
> > Index: lib/Target/ARM/ARMISelLowering.cpp
> > ===================================================================
> > --- lib/Target/ARM/ARMISelLowering.cpp	(revision 164943)
> > +++ lib/Target/ARM/ARMISelLowering.cpp	(working copy)
> > @@ -4215,9 +4319,26 @@
> >        // If we are VDUPing a value that comes directly from a
> vector,
> > that will
> >        // cause an unnecessary move to and from a GPR, where instead
> we
> > could
> >        // just use VDUPLANE.
> > -      if (Value->getOpcode() == ISD::EXTRACT_VECTOR_ELT)
> > -        N = DAG.getNode(ARMISD::VDUPLANE, dl, VT,
> > +      if (Value->getOpcode() == ISD::EXTRACT_VECTOR_ELT) {
> > +        // We need to create a new undef vector to use for the
> > VDUPLANE if the
> > +        // size of the vector from which we get the value is
> different
> > than the
> > +        // size of the vector that we need to create. We will insert
> > the element
> > +        // such that we won't produce unnecessary copies.
> >
> > Extra space at the end of the sentence.
> >
> > +        if (VT != Value->getOperand(0).getValueType()) {
> > +          ConstantSDNode *constIndex;
> > +          constIndex = dyn_cast<ConstantSDNode>(Value-
> >getOperand(1));
> > +          assert(constIndex && "The index is not a constant!");
> > +          unsigned index = constIndex-
> > >getAPIntValue().getLimitedValue() %
> > +                             VT.getVectorNumElements();
> > +          N =  DAG.getNode(ARMISD::VDUPLANE, dl, VT,
> > +                 DAG.getNode(ISD::INSERT_VECTOR_ELT, dl, VT,
> > DAG.getUNDEF(VT),
> >
> > Extra space at the end of the line above.
> >
> > +                        Value, DAG.getConstant(index, MVT::i32)),
> > +                           DAG.getConstant(index, MVT::i32));
> > +        } else {
> > +          N = DAG.getNode(ARMISD::VDUPLANE, dl, VT,
> >                          Value->getOperand(0), Value->getOperand(1));
> > +        }
> > +      }
> >        else
> >          N = DAG.getNode(ARMISD::VDUP, dl, VT, Value);
> >
> >
> > -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> > hosted by The Linux Foundation
> >
> >
> >