[llvm-commits] [PATCH] Fix instruction selection for vduplane from v4f32 to v2f32

Mon Oct 1 10:06:09 PDT 2012

Hi Silviu,

Thanks for the patch!

It looks pretty good to me. I have a question about how we ensure the copy gets eliminated (e.g. the INSERT_ELEMENT into the undef vector). Does this rely on a downstream dag combine optimization, or is it done in instruction selection? Can you add a note to your comment saying where you expect this node will be eliminated?

I have a few minor inline comments below. Other than that it looks good to me.


Index: test/CodeGen/ARM/vdup.ll
===================================================================

--- test/CodeGen/ARM/vdup.ll	(revision 164943)
+++ test/CodeGen/ARM/vdup.ll	(working copy)
@@ -295,3 +295,39 @@
   %4 = insertelement <4 x i32> %3, i32 255, i32 3
   ret <4 x i32> %4
 }
+
+define <2 x float> @check_f32(<4 x float> %v) nounwind {
+;CHECK: check_f32

Should have a : after the check_f32 to ensure that we have found the function symbol. I learned this tip after I made the first patch. Sorry I missed it the first time around.

+;CHECK: vdup.32 {{.*}}, d{{..}}[1]
+  %x = extractelement <4 x float> %v, i32 3
+  %1 = insertelement  <2 x float> undef, float %x, i32 0
+  %2 = insertelement  <2 x float> %1, float %x, i32 1
+  ret <2 x float> %2
+}
+
+define <2 x i32> @check_i32(<4 x i32> %v) nounwind {
+;CHECK: check_i32

Add :

+;CHECK: vdup.32 {{.*}}, d{{..}}[1]
+  %x = extractelement <4 x i32> %v, i32 3
+  %1 = insertelement  <2 x i32> undef, i32 %x, i32 0
+  %2 = insertelement  <2 x i32> %1, i32 %x, i32 1
+  ret <2 x i32> %2
+}
+
+define <4 x i16> @check_i16(<8 x i16> %v) nounwind {
+;CHECK: check_i16

Add :

+;CHECK: vdup.16 {{.*}}, d{{..}}[3]
+  %x = extractelement <8 x i16> %v, i32 3
+  %1 = insertelement  <4 x i16> undef, i16 %x, i32 0
+  %2 = insertelement  <4 x i16> %1, i16 %x, i32 1
+  ret <4 x i16> %2
+}
+
+define <8 x i8> @check_i8(<16 x i8> %v) nounwind {
+;CHECK: check_i8

Add :

+;CHECK: vdup.8 {{.*}}, d{{..}}[3]
+  %x = extractelement <16 x i8> %v, i32 3
+  %1 = insertelement  <8  x i8> undef, i8 %x, i32 0
+  %2 = insertelement  <8  x i8> %1, i8 %x, i32 1
+  ret <8 x i8> %2
+}
Index: lib/Target/ARM/ARMISelLowering.cpp
===================================================================
--- lib/Target/ARM/ARMISelLowering.cpp	(revision 164943)
+++ lib/Target/ARM/ARMISelLowering.cpp	(working copy)
@@ -4215,9 +4319,26 @@
       // If we are VDUPing a value that comes directly from a vector, that will
       // cause an unnecessary move to and from a GPR, where instead we could
       // just use VDUPLANE.
-      if (Value->getOpcode() == ISD::EXTRACT_VECTOR_ELT)
-        N = DAG.getNode(ARMISD::VDUPLANE, dl, VT,
+      if (Value->getOpcode() == ISD::EXTRACT_VECTOR_ELT) {
+        // We need to create a new undef vector to use for the VDUPLANE if the
+        // size of the vector from which we get the value is different than the
+        // size of the vector that we need to create. We will insert the element
+        // such that we won't produce unnecessary copies. 

Extra space at the end of the sentence.

+        if (VT != Value->getOperand(0).getValueType()) {
+          ConstantSDNode *constIndex;
+          constIndex = dyn_cast<ConstantSDNode>(Value->getOperand(1));
+          assert(constIndex && "The index is not a constant!");
+          unsigned index = constIndex->getAPIntValue().getLimitedValue() %
+                             VT.getVectorNumElements();
+          N =  DAG.getNode(ARMISD::VDUPLANE, dl, VT,
+                 DAG.getNode(ISD::INSERT_VECTOR_ELT, dl, VT, DAG.getUNDEF(VT), 

Extra space at the end of the line above.

+                        Value, DAG.getConstant(index, MVT::i32)),
+                           DAG.getConstant(index, MVT::i32));
+        } else {
+          N = DAG.getNode(ARMISD::VDUPLANE, dl, VT,
                         Value->getOperand(0), Value->getOperand(1));
+        }
+      }
       else
         N = DAG.getNode(ARMISD::VDUP, dl, VT, Value);


-- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation



> -----Original Message-----
> From: Silviu Baranga [mailto:silbar01 at arm.com]
> Sent: Monday, October 01, 2012 9:01 AM
> To: 'David Peixotto'; James Molloy
> Cc: llvm-commits at cs.uiuc.edu
> Subject: RE: [llvm-commits] [PATCH] Fix instruction selection for vduplane
> from v4f32 to v2f32
> 
> Hi,
> 
> I'm attaching a patch to fix this issue (the ARM target specific BUILD_VECTOR
> lowering code is creating an invalid VDUPLANE node because it doesn't check
> that the input vector type is correct).
> 
> The patch modifies the ARM BUILD_VECTOR lowering code and checks if the
> VDUPLANE input vector type is the same as the output type. If not, we insert
> the lane that needs to be duplicated in an Undef vector with the correct
> type, and we perform the VDUPLANE on that.
> 
> Please review!
> 
> Thanks,
> Silviu
> 
> > -----Original Message-----
> > From: David Peixotto [mailto:dpeixott at codeaurora.org]
> > Sent: 28 September 2012 17:50
> > To: James Molloy
> > Cc: Silviu Baranga; llvm-commits at cs.uiuc.edu
> > Subject: RE: [llvm-commits] [PATCH] Fix instruction selection for
> > vduplane from v4f32 to v2f32
> >
> > Hi James,
> >
> > Thanks for the reply. Your plan sounds good to me. I definitely don't
> > want to lose this feature, but I just wasn't sure when I would have a
> > chance to fix the way Silviu was suggesting.
> >
> > I doubt I can get to it before next week, so if you have a chance that
> > would be wonderful :) My patch contains a proper test case that you
> > should be able to reuse for your fix.
> >
> > Thanks again for your help!
> >
> > David
> >
> > -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> > hosted by The Linux Foundation
> >
> >
> > > -----Original Message-----
> > > From: James Molloy [mailto:james.molloy at arm.com]
> > > Sent: Friday, September 28, 2012 9:26 AM
> > > To: David Peixotto
> > > Cc: Silviu Baranga; llvm-commits at cs.uiuc.edu
> > > Subject: RE: [llvm-commits] [PATCH] Fix instruction selection for
> > vduplane
> > > from v4f32 to v2f32
> > >
> > > Hi David,
> > >
> > > [Myself and Silviu worked on this patch.]
> > >
> > > I don't think that reverting this patch is the right course of
> > > action
> > to take.
> > > You've found a bug in LLVM - the right action is to fix the bug, not
> > just go in
> > > and rip out the thing that caused it*. Else we'd never make any
> > progress; all
> > > bugs are related to some change in some way.
> > >
> > > I think the right course of action is in general to follow  the
> > > usual
> > process of
> > > raising a PR (which you've done). That way someone will get around
> > > to
> > fixing
> > > it before the next release.
> > >
> > > For this particular issue, myself or Silviu can look at this next
> > week, but if you
> > > have time/inclination to fix it beforehand, that'd be a bonus :)
> > >
> > > Cheers,
> > >
> > > James
> > >
> > > *: Assuming the patch in question passed all pre/post commit checks,
> > which
> > > for LLVM involve at least one code review, regression tests passing
> > and no
> > > test-suite regressions.
> > >
> > > On Fri, 2012-09-28 at 17:02 +0100, David Peixotto wrote:
> > > > Hi Silviu,
> > > >
> > > >
> > > >
> > > > Thanks for the explanation. Now I understand your concerns.
> > > >
> > > >
> > > >
> > > > I agree that it seems like a good idea to keep the DAG in a form
> > that
> > > > the instruction selector expects. Unfortunately, I don’t have a
> > > > lot
> > of
> > > > time to make that change right now. I’ll see if I can get to it,
> > but
> > > > I’m not sure I will have the bandwidth anytime soon.
> > > >
> > > >
> > > >
> > > > Would it be a good idea to revert the commit that introduced the
> > > > problem? Would it make sense to apply this patch for now until I
> > can
> > > > move the fix to the higher level?
> > > >
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > David
> > > >
> > > >
> > > >
> > > > -- Qualcomm Innovation Center, Inc. is a member of Code Aurora
> > Forum,
> > > > hosted by The Linux Foundation
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > From: Silviu Baranga [mailto:silbar01 at arm.com]
> > > > Sent: Thursday, September 27, 2012 10:38 AM
> > > > To: 'David Peixotto'; James Molloy; llvm-commits at cs.uiuc.edu
> > > > Subject: RE: [llvm-commits] [PATCH] Fix instruction selection for
> > > > vduplane from v4f32 to v2f32
> > > >
> > > >
> > > >
> > > >
> > > > Hi David,
> > > >
> > > >
> > > >
> > > > The main problem is that the code written so far (before r163304)
> > is
> > > > assuming
> > > >
> > > > that type of the input value for vduplane is the same as the
> > > > output type. This is natural,
> > > >
> > > > since it reflects the actual vdup instruction.
> > > >
> > > >
> > > >
> > > > The pattern solution is identical to what I suggest, only it is in
> > a
> > > > different place.
> > > >
> > > > There could be problems that we don’t know about between the
> > > > build_vector lowering
> > > >
> > > > code and the instruction selection phase.
> > > >
> > > >
> > > >
> > > > Cheers,
> > > >
> > > > Silviu
> > > >
> > > >
> > > >
> > > > From: David Peixotto [mailto:dpeixott at codeaurora.org]
> > > > Sent: 27 September 2012 17:41
> > > > To: Silviu Baranga; James Molloy; llvm-commits at cs.uiuc.edu
> > > > Subject: RE: [llvm-commits] [PATCH] Fix instruction selection for
> > > > vduplane from v4f32 to v2f32
> > > >
> > > >
> > > >
> > > >
> > > > Hi Silviu,
> > > >
> > > >
> > > >
> > > > Thanks for the suggestion. That sounds like another way to fix the
> > > > issue. We started seeing failures after r163304 that added extra
> > > > support for vduplane
> > > > (http://llvm.org/viewvc/llvm-project?view=rev&revision=163304).
> > That
> > > > change was a nice feature and I thought it had just missed a few
> > > > patterns for going from Q to D vectors.
> > > >
> > > >
> > > >
> > > > Is there a disadvantage to adding these patterns? If I understand
> > > > correctly, I think the patterns are essentially doing what you
> > > > suggest, but doing it in the instruction selector itself rather
> > than
> > > > with code.
> > > >
> > > >
> > > >
> > > > -- Qualcomm Innovation Center, Inc. is a member of Code Aurora
> > Forum,
> > > > hosted by The Linux Foundation
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > From: Silviu Baranga [mailto:silbar01 at arm.com]
> > > > Sent: Thursday, September 27, 2012 2:02 AM
> > > > To: 'David Peixotto'; James Molloy; llvm-commits at cs.uiuc.edu
> > > > Subject: RE: [llvm-commits] [PATCH] Fix instruction selection for
> > > > vduplane from v4f32 to v2f32
> > > >
> > > >
> > > >
> > > >
> > > > Hi David,
> > > >
> > > >
> > > >
> > > > Wouldn’t it be better to fix the original problem?
> > > >
> > > >
> > > >
> > > > It seems that the problem is that the BUILD_VECTOR lowering code
> > > > is creating a VDUPLANE
> > > >
> > > > node that takes a v4i32/v4f32 and creates a v2i32/v2f32. It
> > shouldn’t
> > > > do that. It should extract
> > > >
> > > > the D subreg and do a VDUPLANE on that in this case.
> > > >
> > > >
> > > >
> > > > Cheers,
> > > >
> > > > Silviu
> > > >
> > > > From: llvm-commits-bounces at cs.uiuc.edu
> > > > [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of David
> > Peixotto
> > > > Sent: 27 September 2012 00:55
> > > > To: llvm-commits at cs.uiuc.edu
> > > > Subject: [llvm-commits] [PATCH] Fix instruction selection for
> > vduplane
> > > > from v4f32 to v2f32
> > > >
> > > >
> > > >
> > > >
> > > > We are seeing some instruction selection failures for vduplane
> > > > from larger vectors to smaller vectors. See PR13938:
> > > > http://llvm.org/bugs/show_bug.cgi?id=13938.
> > > >
> > > >
> > > >
> > > > A patch to fix this issue is attached.
> > > >
> > > >
> > > >
> > > > This patch fixes an instruction selection failure for
> > > >
> > > > ARMISD::VDUPLANE. The selector was missing patterns for using vdup
> > > >
> > > > to initialize a vector when the source vector is a Q-reg and the
> > > >
> > > > destination vector is a D-reg. The fix is to add new patterns to
> > > >
> > > > allow the selector to properly select this pattern.
> > > >
> > > >
> > > >
> > > > -- Qualcomm Innovation Center, Inc. is a member of Code Aurora
> > Forum,
> > > > hosted by The Linux Foundation
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> >
> >