[PATCH] Fix the failure exposed by checking constant integer argument range

Tue Nov 26 19:53:55 PST 2013

>
> With so many test changes, it's a little difficult to see exactly what
> defect this is trying to fix. Can you give a simple example of code
> that breaks currently? Or is suboptimal?
>
>
The C test case changes can show this clearly. Previously, all the test
case only pass constant 0 or 1 as the lane argument. Now for a single ACLE
intrinsic function, we have two test cases, one is to check upbound lane
number, and the other is check 0. For example,

 int16x4_t test_vmla_laneq_s16(int16x4_t a, int16x4_t b, int16x8_t v) {
   // CHECK: test_vmla_laneq_s16
-  return vmla_laneq_s16(a, b, v, 1);
-  // CHECK: mla {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[1]
+  return vmla_laneq_s16(a, b, v, 7);
+  // CHECK: mla {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[7]
 }

+int16x4_t test_vmla_laneq_s16_0(int16x4_t a, int16x4_t b, int16x8_t v) {
+  // CHECK: test_vmla_laneq_s16_0
+  return vmla_laneq_s16(a, b, v, 0);
+  // CHECK: mla {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.h[0]
+}

Without the patch, the function test_vmlaq_laneq_s16 would fail to generate mla
    v0.4h, v1.4h, v2.h[7], but two instructions,

        dup     d2, v2.d[1]
        mla     v0.4h, v1.4h, v2.h[3]

> The LowerVECTOR_SHUFFLE is to cover the following two cases,
> > 1) the 1st operand of VDUPlane is EXTRACT_SUBVECTOR
>
> OK, I see why this one is useful: if the extract index is non-zero
> then instead of "INS, OP" we can fold the INS or whatever into the
> main operation.
>
> > 2) the 1st operand of VDUPlane is CONCAT_VECTORS, for which the 2nd
> operand
> > is UNDEF.
>
> This one is more puzzling to me. I'd expect the version with the
> CONCAT to be more useful. Instructions accepting a NEON_VDUPLANE want
> an Rm with 128-bits don't they? If no such instruction exists to make
> use of it then there should be a pattern converting (CONCAT x,
> (IMPLICIT_DEF)) into a simple SUBREG_TO_REG as a last ditch effort.
>
>
First, we already have the pattern you mentioned, but it can't solve my
problem.
Second, looking at the DAG as below,

            0x21dd900: v2f32 = fneg 0x21dd800 [ORD=1] [ID=13]
            0x21dda00: v2f32 = undef [ID=4]
          0x21ddb00: v4f32 = concat_vectors 0x21dd900, 0x21dda00 [ORD=2]
[ID=14]
          0x21ddc00: v4f32 = undef [ID=5]
        0x21ddd00: v4f32 = vector_shuffle 0x21ddb00, 0x21ddc00<1,1,1,1>
[ORD=2] [ID=15]
            0x21af288: <multiple use>
            0x21dd300: f128 = Register %vreg1 [ID=2]
          0x21dd400: f128,ch = CopyFromReg 0x21af288, 0x21dd300 [ID=8]
        0x21dd500: v4f32 = bitcast 0x21dd400 [ID=11]
            0x21af288: <multiple use>
            0x21dd000: f128 = Register %vreg0 [ID=1]
          0x21dd100: f128,ch = CopyFromReg 0x21af288, 0x21dd000 [ID=7]
        0x21dd200: v4f32 = bitcast 0x21dd100 [ID=10]
      0x21dde00: v4f32 = fma 0x21ddd00, 0x21dd500, 0x21dd200 [ORD=3] [ID=16]

Here, if we don't remove the concat_vectors at lowering stage, we would
have to explicitly describe concat_vectors in the pattern being matched.
Otherwise, we would fail to combine fneg and fma to be a fms. But
obviously, when concat_vectors(xxx, undef) is followed by a VDUPLANE, which
is still a vector_shuffle before lowering, this concat_vectors is
meaningless and can be completely removed. With this optimization, we
needn't to match concat_vectors in the patterns at all, and the .md file
would be more neat, right? This is why you can see the code changes like,

-  def : NI_2VE_lane<!cast<Instruction>(subop # "_4s4s"), neon_uimm1_bare,
-                    op, VPR128, VPR128, VPR64, v4i32, v4i32, v2i32,
-                    BinOpFrag<(Neon_vduplane
-                                (Neon_combine_4S node:$LHS, undef),
-                                 node:$RHS)>>;

   def : NI_2VEswap_lane<!cast<Instruction>(subop # "_4s4s"),
                         neon_uimm1_bare, op, VPR128, VPR64, v4f32, v2f32,
-                        BinOpFrag<(Neon_vduplane
-                                    (Neon_combine_4f (fneg node:$LHS),
undef),
-                                    node:$RHS)>>;
+                        BinOpFrag<(Neon_vduplane (fneg node:$LHS),
node:$RHS)>>;

You can reproduce this problem with this piece of LLVM IR code,

declare <4 x float> @llvm.fma.v4f32(<4 x float>, <4 x float>, <4 x float>)
define <4 x float> @test_vfmsq_lane_f32(<4 x float> %a, <4 x float> %b, <2
x float> %v) {
; CHECK: test_vfmsq_lane_f32:
; CHECK: fmls {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.s[1]
entry:
  %sub = fsub <2 x float> <float -0.000000e+00, float -0.000000e+00>, %v
  %lane = shufflevector <2 x float> %sub, <2 x float> undef, <4 x i32> <i32
1, i32 1, i32 1, i32 1>
  %0 = tail call <4 x float> @llvm.fma.v4f32(<4 x float> %lane, <4 x float>
%b, <4 x float> %a)
  ret <4 x float> %0
}

> Could you take me through an example this elision improves?
>
> Cheers.
>
> Tim.
>

-- 
Thanks,
-Jiangning
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20131127/75e0797d/attachment.html>