[llvm] r233693 - [AArch64] Add v8.1a "Rounding Double Multiply Add/Subtract" extension

Vladimir Sukharev Vladimir.Sukharev at arm.com
Tue Mar 31 09:14:47 PDT 2015


Hi, that's a very strange picture: http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental_check/6923/testReport/junit/LLVM/CodeGen_AArch64/arm64_neon_v8_1a_ll/
The test passes on my side, and it is odd that the output shows Apple syntax " sqrdmulh.4h v1, v1, v2", because the "CHECK-V8a" prefix means that run was compiled without the " -aarch64-neon-syntax=apple" flag.

Could you please send me the output of each of these commands?
/Users/buildslave/jenkins/sharedspace/incremental@2/clang-build/./bin/llc < /Users/buildslave/jenkins/sharedspace/incremental@2/llvm/test/CodeGen/AArch64/arm64-neon-v8.1a.ll -verify-machineinstrs -march=arm64
/Users/buildslave/jenkins/sharedspace/incremental@2/clang-build/./bin/llc < /Users/buildslave/jenkins/sharedspace/incremental@2/llvm/test/CodeGen/AArch64/arm64-neon-v8.1a.ll -verify-machineinstrs -march=arm64 -mattr=+v8.1a
/Users/buildslave/jenkins/sharedspace/incremental@2/clang-build/./bin/llc < /Users/buildslave/jenkins/sharedspace/incremental@2/llvm/test/CodeGen/AArch64/arm64-neon-v8.1a.ll -verify-machineinstrs -march=arm64 -mattr=+v8.1a -aarch64-neon-syntax=apple
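
Once the three outputs are available, a small script along the following lines could help tell which NEON syntax each run actually emitted. This is only a sketch: the function names neon_syntax and summarize and the regular expressions are illustrative assumptions, not part of LLVM, and they assume that llc prints one instruction per line in the two forms quoted above (arrangement suffix on the mnemonic for Apple syntax, on each vector register otherwise).

```python
import re

# Apple syntax carries the arrangement on the mnemonic:
#   sqrdmulh.4h v1, v1, v2
APPLE_RE = re.compile(
    r"^\s*[a-z][a-z0-9]*\.(8b|16b|4h|8h|2s|4s|1d|2d)\s+v\d+\b")

# Generic syntax carries it on each vector register:
#   sqrdmulh v1.4h, v1.4h, v2.4h
GENERIC_RE = re.compile(
    r"^\s*[a-z][a-z0-9]*\s+v\d+\.(8b|16b|4h|8h|2s|4s|1d|2d)\b")


def neon_syntax(line):
    """Return "apple", "generic", or None for one line of assembly."""
    if APPLE_RE.match(line):
        return "apple"
    if GENERIC_RE.match(line):
        return "generic"
    return None


def summarize(asm_text, mnemonic_prefix="sqrdm"):
    """Count syntax flavours among lines whose mnemonic starts with the prefix."""
    counts = {"apple": 0, "generic": 0}
    for line in asm_text.splitlines():
        if not line.strip().startswith(mnemonic_prefix):
            continue
        flavour = neon_syntax(line)
        if flavour is not None:
            counts[flavour] += 1
    return counts
```

Feeding the saved output of the first command to summarize should report only "generic" lines if the default syntax is the generic one; any "apple" count there would suggest the build defaults to Apple syntax even without the flag.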

Thanks, Vladimir


-----Original Message-----
From: Justin Bogner [mailto:justin at justinbogner.com] On Behalf Of Justin Bogner
Sent: 31 March 2015 17:07
To: Vladimir Sukharev
Cc: llvm-commits at cs.uiuc.edu
Subject: Re: [llvm] r233693 - [AArch64] Add v8.1a "Rounding Double Multiply Add/Subtract" extension

Vladimir Sukharev <vladimir.sukharev at arm.com> writes:
> Author: vsukharev
> Date: Tue Mar 31 08:15:48 2015
> New Revision: 233693
>
> URL: http://llvm.org/viewvc/llvm-project?rev=233693&view=rev
> Log:
> [AArch64] Add v8.1a "Rounding Double Multiply Add/Subtract" extension

The CodeGen/AArch64/arm64-neon-v8.1a.ll test fails after this, both
locally and on this bot:

http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental_check/6923/

Can you fix this right away, or should I revert?

> Reviewers: t.p.northover, jmolloy
>
> Subscribers: llvm-commits
>
> Differential Revision: http://reviews.llvm.org/D8502
>
> Added:
>     llvm/trunk/test/CodeGen/AArch64/arm64-neon-v8.1a.ll
>     llvm/trunk/test/MC/AArch64/armv8.1a-rdma.s   (with props)
>     llvm/trunk/test/MC/Disassembler/AArch64/armv8.1a-rdma.txt   (with props)
> Modified:
>     llvm/trunk/lib/Target/AArch64/AArch64InstrFormats.td
>     llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.td
>
> Modified: llvm/trunk/lib/Target/AArch64/AArch64InstrFormats.td
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64InstrFormats.td?rev=233693&r1=233692&r2=233693&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/AArch64/AArch64InstrFormats.td (original)
> +++ llvm/trunk/lib/Target/AArch64/AArch64InstrFormats.td Tue Mar 31 08:15:48 2015
> @@ -5300,6 +5300,27 @@ class BaseSIMDThreeScalar<bit U, bits<2>
>    let Inst{4-0}   = Rd;
>  }
>
> +let mayStore = 0, mayLoad = 0, hasSideEffects = 0 in
> +class BaseSIMDThreeScalarTied<bit U, bits<2> size, bit R, bits<5> opcode,
> +                        dag oops, dag iops, string asm,
> +            list<dag> pattern>
> +  : I<oops, iops, asm, "\t$Rd, $Rn, $Rm", "$Rd = $dst", pattern>,
> +    Sched<[WriteV]> {
> +  bits<5> Rd;
> +  bits<5> Rn;
> +  bits<5> Rm;
> +  let Inst{31-30} = 0b01;
> +  let Inst{29}    = U;
> +  let Inst{28-24} = 0b11110;
> +  let Inst{23-22} = size;
> +  let Inst{21}    = R;
> +  let Inst{20-16} = Rm;
> +  let Inst{15-11} = opcode;
> +  let Inst{10}    = 1;
> +  let Inst{9-5}   = Rn;
> +  let Inst{4-0}   = Rd;
> +}
> +
>  multiclass SIMDThreeScalarD<bit U, bits<5> opc, string asm,
>                              SDPatternOperator OpNode> {
>    def v1i64  : BaseSIMDThreeScalar<U, 0b11, opc, FPR64, asm,
> @@ -5327,6 +5348,16 @@ multiclass SIMDThreeScalarHS<bit U, bits
>    def v1i16  : BaseSIMDThreeScalar<U, 0b01, opc, FPR16, asm, []>;
>  }
>
> +multiclass SIMDThreeScalarHSTied<bit U, bit R, bits<5> opc, string asm,
> +                                 SDPatternOperator OpNode = null_frag> {
> +  def v1i32: BaseSIMDThreeScalarTied<U, 0b10, R, opc, (outs FPR32:$dst),
> +                                     (ins FPR32:$Rd, FPR32:$Rn, FPR32:$Rm),
> +                                     asm, []>;
> +  def v1i16: BaseSIMDThreeScalarTied<U, 0b01, R, opc, (outs FPR16:$dst),
> +                                     (ins FPR16:$Rd, FPR16:$Rn, FPR16:$Rm),
> +                                     asm, []>;
> +}
> +
>  multiclass SIMDThreeScalarSD<bit U, bit S, bits<5> opc, string asm,
>                               SDPatternOperator OpNode = null_frag> {
>    let mayLoad = 0, mayStore = 0, hasSideEffects = 0 in {
> @@ -8519,6 +8550,174 @@ multiclass SIMDLdSt4SingleAliases<string
>  } // end of 'let Predicates = [HasNEON]'
>
>  //----------------------------------------------------------------------------
> +// AdvSIMD v8.1 Rounding Double Multiply Add/Subtract
> +//----------------------------------------------------------------------------
> +
> +let Predicates = [HasNEON, HasV8_1a] in {
> +
> +class BaseSIMDThreeSameVectorTiedR0<bit Q, bit U, bits<2> size, bits<5> opcode,
> +                                    RegisterOperand regtype, string asm,
> +                                    string kind, list<dag> pattern>
> +  : BaseSIMDThreeSameVectorTied<Q, U, size, opcode, regtype, asm, kind,
> +                                pattern> {
> +  let Inst{21}=0;
> +}
> +multiclass SIMDThreeSameVectorSQRDMLxHTiedHS<bit U, bits<5> opc, string asm,
> +                                             SDPatternOperator Accum> {
> +  def v4i16 : BaseSIMDThreeSameVectorTiedR0<0, U, 0b01, opc, V64, asm, ".4h",
> +    [(set (v4i16 V64:$dst),
> +          (Accum (v4i16 V64:$Rd),
> +                 (v4i16 (int_aarch64_neon_sqrdmulh (v4i16 V64:$Rn),
> +                                                   (v4i16 V64:$Rm)))))]>;
> +  def v8i16 : BaseSIMDThreeSameVectorTiedR0<1, U, 0b01, opc, V128, asm, ".8h",
> +    [(set (v8i16 V128:$dst),
> +          (Accum (v8i16 V128:$Rd),
> +                 (v8i16 (int_aarch64_neon_sqrdmulh (v8i16 V128:$Rn),
> +                                                   (v8i16 V128:$Rm)))))]>;
> +  def v2i32 : BaseSIMDThreeSameVectorTiedR0<0, U, 0b10, opc, V64, asm, ".2s",
> +    [(set (v2i32 V64:$dst),
> +          (Accum (v2i32 V64:$Rd),
> +                 (v2i32 (int_aarch64_neon_sqrdmulh (v2i32 V64:$Rn),
> +                                                   (v2i32 V64:$Rm)))))]>;
> +  def v4i32 : BaseSIMDThreeSameVectorTiedR0<1, U, 0b10, opc, V128, asm, ".4s",
> +    [(set (v4i32 V128:$dst),
> +          (Accum (v4i32 V128:$Rd),
> +                 (v4i32 (int_aarch64_neon_sqrdmulh (v4i32 V128:$Rn),
> +                                                   (v4i32 V128:$Rm)))))]>;
> +}
> +
> +multiclass SIMDIndexedSQRDMLxHSDTied<bit U, bits<4> opc, string asm,
> +                                     SDPatternOperator Accum> {
> +  def v4i16_indexed : BaseSIMDIndexedTied<0, U, 0, 0b01, opc,
> +                                          V64, V64, V128_lo, VectorIndexH,
> +                                          asm, ".4h", ".4h", ".4h", ".h",
> +    [(set (v4i16 V64:$dst),
> +          (Accum (v4i16 V64:$Rd),
> +                 (v4i16 (int_aarch64_neon_sqrdmulh
> +                          (v4i16 V64:$Rn),
> +                          (v4i16 (AArch64duplane16 (v8i16 V128_lo:$Rm),
> +                                                    VectorIndexH:$idx))))))]> {
> +    bits<3> idx;
> +    let Inst{11} = idx{2};
> +    let Inst{21} = idx{1};
> +    let Inst{20} = idx{0};
> +  }
> +
> +  def v8i16_indexed : BaseSIMDIndexedTied<1, U, 0, 0b01, opc,
> +                                          V128, V128, V128_lo, VectorIndexH,
> +                                          asm, ".8h", ".8h", ".8h", ".h",
> +    [(set (v8i16 V128:$dst),
> +          (Accum (v8i16 V128:$Rd),
> +                 (v8i16 (int_aarch64_neon_sqrdmulh
> +                          (v8i16 V128:$Rn),
> +                          (v8i16 (AArch64duplane16 (v8i16 V128_lo:$Rm),
> +                                                   VectorIndexH:$idx))))))]> {
> +    bits<3> idx;
> +    let Inst{11} = idx{2};
> +    let Inst{21} = idx{1};
> +    let Inst{20} = idx{0};
> +  }
> +
> +  def v2i32_indexed : BaseSIMDIndexedTied<0, U, 0, 0b10, opc,
> +                                          V64, V64, V128, VectorIndexS,
> +                                          asm, ".2s", ".2s", ".2s", ".s",
> +    [(set (v2i32 V64:$dst),
> +        (Accum (v2i32 V64:$Rd),
> +               (v2i32 (int_aarch64_neon_sqrdmulh
> +                        (v2i32 V64:$Rn),
> +                        (v2i32 (AArch64duplane32 (v4i32 V128:$Rm),
> +                                                 VectorIndexS:$idx))))))]> {
> +    bits<2> idx;
> +    let Inst{11} = idx{1};
> +    let Inst{21} = idx{0};
> +  }
> +
> +  // FIXME: it would be nice to use the scalar (v1i32) instruction here, but
> +  // an intermediate EXTRACT_SUBREG would be untyped.
> +  // FIXME: direct EXTRACT_SUBREG from v2i32 to i32 is illegal, that's why we
> +  // got it lowered here as (i32 vector_extract (v4i32 insert_subvector(..)))
> +  def : Pat<(i32 (Accum (i32 FPR32Op:$Rd),
> +                       (i32 (vector_extract
> +                               (v4i32 (insert_subvector
> +                                       (undef),
> +                                        (v2i32 (int_aarch64_neon_sqrdmulh
> +                                                 (v2i32 V64:$Rn),
> +                                                 (v2i32 (AArch64duplane32
> +                                                          (v4i32 V128:$Rm),
> +                                                          VectorIndexS:$idx)))),
> +                                      (i32 0))),
> +                               (i64 0))))),
> +            (EXTRACT_SUBREG
> +                (v2i32 (!cast<Instruction>(NAME # v2i32_indexed)
> +                          (v2i32 (INSERT_SUBREG (v2i32 (IMPLICIT_DEF)),
> +                                                FPR32Op:$Rd,
> +                                                ssub)),
> +                          V64:$Rn,
> +                          V128:$Rm,
> +                          VectorIndexS:$idx)),
> +                ssub)>;
> +
> +  def v4i32_indexed : BaseSIMDIndexedTied<1, U, 0, 0b10, opc,
> +                                          V128, V128, V128, VectorIndexS,
> +                                          asm, ".4s", ".4s", ".4s", ".s",
> +    [(set (v4i32 V128:$dst),
> +          (Accum (v4i32 V128:$Rd),
> +                 (v4i32 (int_aarch64_neon_sqrdmulh
> +                          (v4i32 V128:$Rn),
> +                          (v4i32 (AArch64duplane32 (v4i32 V128:$Rm),
> +                                                   VectorIndexS:$idx))))))]> {
> +    bits<2> idx;
> +    let Inst{11} = idx{1};
> +    let Inst{21} = idx{0};
> +  }
> +
> +  // FIXME: it would be nice to use the scalar (v1i32) instruction here, but
> +  // an intermediate EXTRACT_SUBREG would be untyped.
> +  def : Pat<(i32 (Accum (i32 FPR32Op:$Rd),
> +                        (i32 (vector_extract
> +                               (v4i32 (int_aarch64_neon_sqrdmulh
> +                                        (v4i32 V128:$Rn),
> +                                        (v4i32 (AArch64duplane32
> +                                                 (v4i32 V128:$Rm),
> +                                                 VectorIndexS:$idx)))),
> +                               (i64 0))))),
> +            (EXTRACT_SUBREG
> +                (v4i32 (!cast<Instruction>(NAME # v4i32_indexed)
> +                         (v4i32 (INSERT_SUBREG (v4i32 (IMPLICIT_DEF)),
> +                                               FPR32Op:$Rd,
> +                                               ssub)),
> +                         V128:$Rn,
> +                         V128:$Rm,
> +                         VectorIndexS:$idx)),
> +                ssub)>;
> +
> +  def i16_indexed : BaseSIMDIndexedTied<1, U, 1, 0b01, opc,
> +                                        FPR16Op, FPR16Op, V128_lo,
> +                                        VectorIndexH, asm, ".h", "", "", ".h",
> +                                        []> {
> +    bits<3> idx;
> +    let Inst{11} = idx{2};
> +    let Inst{21} = idx{1};
> +    let Inst{20} = idx{0};
> +  }
> +
> +  def i32_indexed : BaseSIMDIndexedTied<1, U, 1, 0b10, opc,
> +                                        FPR32Op, FPR32Op, V128, VectorIndexS,
> +                                        asm, ".s", "", "", ".s",
> +    [(set (i32 FPR32Op:$dst),
> +          (Accum (i32 FPR32Op:$Rd),
> +                 (i32 (int_aarch64_neon_sqrdmulh
> +                        (i32 FPR32Op:$Rn),
> +                        (i32 (vector_extract (v4i32 V128:$Rm),
> +                                             VectorIndexS:$idx))))))]> {
> +    bits<2> idx;
> +    let Inst{11} = idx{1};
> +    let Inst{21} = idx{0};
> +  }
> +}
> +} // let Predicates = [HasNeon, HasV8_1a]
> +
> +//----------------------------------------------------------------------------
>  // Crypto extensions
>  //----------------------------------------------------------------------------
>
>
> Modified: llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.td
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.td?rev=233693&r1=233692&r2=233693&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.td (original)
> +++ llvm/trunk/lib/Target/AArch64/AArch64InstrInfo.td Tue Mar 31 08:15:48 2015
> @@ -2778,6 +2778,10 @@ defm UQSUB    : SIMDThreeSameVector<1,0b
>  defm URHADD   : SIMDThreeSameVectorBHS<1,0b00010,"urhadd", int_aarch64_neon_urhadd>;
>  defm URSHL    : SIMDThreeSameVector<1,0b01010,"urshl", int_aarch64_neon_urshl>;
>  defm USHL     : SIMDThreeSameVector<1,0b01000,"ushl", int_aarch64_neon_ushl>;
> +defm SQRDMLAH : SIMDThreeSameVectorSQRDMLxHTiedHS<1,0b10000,"sqrdmlah",
> +                                                  int_aarch64_neon_sqadd>;
> +defm SQRDMLSH : SIMDThreeSameVectorSQRDMLxHTiedHS<1,0b10001,"sqrdmlsh",
> +                                                    int_aarch64_neon_sqsub>;
>
>  defm AND : SIMDLogicalThreeVector<0, 0b00, "and", and>;
>  defm BIC : SIMDLogicalThreeVector<0, 0b01, "bic",
> @@ -2994,6 +2998,20 @@ defm UQSHL    : SIMDThreeScalarBHSD<1, 0
>  defm UQSUB    : SIMDThreeScalarBHSD<1, 0b00101, "uqsub", int_aarch64_neon_uqsub>;
>  defm URSHL    : SIMDThreeScalarD<   1, 0b01010, "urshl", int_aarch64_neon_urshl>;
>  defm USHL     : SIMDThreeScalarD<   1, 0b01000, "ushl", int_aarch64_neon_ushl>;
> +let Predicates = [HasV8_1a] in {
> +  defm SQRDMLAH : SIMDThreeScalarHSTied<1, 0, 0b10000, "sqrdmlah">;
> +  defm SQRDMLSH : SIMDThreeScalarHSTied<1, 0, 0b10001, "sqrdmlsh">;
> +  def : Pat<(i32 (int_aarch64_neon_sqadd
> +                   (i32 FPR32:$Rd),
> +                   (i32 (int_aarch64_neon_sqrdmulh (i32 FPR32:$Rn),
> +                                                   (i32 FPR32:$Rm))))),
> +            (SQRDMLAHv1i32 FPR32:$Rd, FPR32:$Rn, FPR32:$Rm)>;
> +  def : Pat<(i32 (int_aarch64_neon_sqsub
> +                   (i32 FPR32:$Rd),
> +                   (i32 (int_aarch64_neon_sqrdmulh (i32 FPR32:$Rn),
> +                                                   (i32 FPR32:$Rm))))),
> +            (SQRDMLSHv1i32 FPR32:$Rd, FPR32:$Rn, FPR32:$Rm)>;
> +}
>
>  def : InstAlias<"cmls $dst, $src1, $src2",
>                  (CMHSv1i64 FPR64:$dst, FPR64:$src2, FPR64:$src1), 0>;
> @@ -4324,6 +4342,10 @@ defm SQDMLAL : SIMDIndexedLongSQDMLXSDTi
>                                             int_aarch64_neon_sqadd>;
>  defm SQDMLSL : SIMDIndexedLongSQDMLXSDTied<0, 0b0111, "sqdmlsl",
>                                             int_aarch64_neon_sqsub>;
> +defm SQRDMLAH : SIMDIndexedSQRDMLxHSDTied<1, 0b1101, "sqrdmlah",
> +                                          int_aarch64_neon_sqadd>;
> +defm SQRDMLSH : SIMDIndexedSQRDMLxHSDTied<1, 0b1111, "sqrdmlsh",
> +                                          int_aarch64_neon_sqsub>;
>  defm SQDMULL : SIMDIndexedLongSD<0, 0b1011, "sqdmull", int_aarch64_neon_sqdmull>;
>  defm UMLAL   : SIMDVectorIndexedLongSDTied<1, 0b0010, "umlal",
>      TriOpFrag<(add node:$LHS, (int_aarch64_neon_umull node:$MHS, node:$RHS))>>;
>
> Added: llvm/trunk/test/CodeGen/AArch64/arm64-neon-v8.1a.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/AArch64/arm64-neon-v8.1a.ll?rev=233693&view=auto
> ==============================================================================
> --- llvm/trunk/test/CodeGen/AArch64/arm64-neon-v8.1a.ll (added)
> +++ llvm/trunk/test/CodeGen/AArch64/arm64-neon-v8.1a.ll Tue Mar 31 08:15:48 2015
> @@ -0,0 +1,456 @@
> +; RUN: llc < %s -verify-machineinstrs -march=arm64                                          | FileCheck %s --check-prefix=CHECK-V8a
> +; RUN: llc < %s -verify-machineinstrs -march=arm64 -mattr=+v8.1a                            | FileCheck %s --check-prefix=CHECK-V81a
> +; RUN: llc < %s -verify-machineinstrs -march=arm64 -mattr=+v8.1a -aarch64-neon-syntax=apple | FileCheck %s --check-prefix=CHECK-V81a-apple
> +
> +declare <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16>, <4 x i16>)
> +declare <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16>, <8 x i16>)
> +declare <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32>, <2 x i32>)
> +declare <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32>, <4 x i32>)
> +declare i32 @llvm.aarch64.neon.sqrdmulh.i32(i32, i32)
> +declare i16 @llvm.aarch64.neon.sqrdmulh.i16(i16, i16)
> +
> +declare <4 x i16> @llvm.aarch64.neon.sqadd.v4i16(<4 x i16>, <4 x i16>)
> +declare <8 x i16> @llvm.aarch64.neon.sqadd.v8i16(<8 x i16>, <8 x i16>)
> +declare <2 x i32> @llvm.aarch64.neon.sqadd.v2i32(<2 x i32>, <2 x i32>)
> +declare <4 x i32> @llvm.aarch64.neon.sqadd.v4i32(<4 x i32>, <4 x i32>)
> +declare i32 @llvm.aarch64.neon.sqadd.i32(i32, i32)
> +declare i16 @llvm.aarch64.neon.sqadd.i16(i16, i16)
> +
> +declare <4 x i16> @llvm.aarch64.neon.sqsub.v4i16(<4 x i16>, <4 x i16>)
> +declare <8 x i16> @llvm.aarch64.neon.sqsub.v8i16(<8 x i16>, <8 x i16>)
> +declare <2 x i32> @llvm.aarch64.neon.sqsub.v2i32(<2 x i32>, <2 x i32>)
> +declare <4 x i32> @llvm.aarch64.neon.sqsub.v4i32(<4 x i32>, <4 x i32>)
> +declare i32 @llvm.aarch64.neon.sqsub.i32(i32, i32)
> +declare i16 @llvm.aarch64.neon.sqsub.i16(i16, i16)
> +
> +;-----------------------------------------------------------------------------
> +; RDMA Vector
> +; test for SIMDThreeSameVectorSQRDMLxHTiedHS
> +
> +define <4 x i16> @test_sqrdmlah_v4i16(<4 x i16> %acc, <4 x i16> %mhs, <4 x i16> %rhs) {
> +; CHECK-LABEL: test_sqrdmlah_v4i16:
> +   %prod = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %mhs,  <4 x i16> %rhs)
> +   %retval =  call <4 x i16> @llvm.aarch64.neon.sqadd.v4i16(<4 x i16> %acc,  <4 x i16> %prod)
> +; CHECK-V8a:        sqrdmulh    v1.4h, v1.4h, v2.4h
> +; CHECK-V81a:       sqrdmlah    v0.4h, v1.4h, v2.4h
> +; CHECK-V81a-apple: sqrdmlah.4h v0,    v1,    v2
> +   ret <4 x i16> %retval
> +}
> +
> +define <8 x i16> @test_sqrdmlah_v8i16(<8 x i16> %acc, <8 x i16> %mhs, <8 x i16> %rhs) {
> +; CHECK-LABEL: test_sqrdmlah_v8i16:
> +   %prod = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> %mhs, <8 x i16> %rhs)
> +   %retval =  call <8 x i16> @llvm.aarch64.neon.sqadd.v8i16(<8 x i16> %acc, <8 x i16> %prod)
> +; CHECK-V8a:        sqrdmulh    v1.8h, v1.8h, v2.8h
> +; CHECK-V81a:       sqrdmlah    v0.8h, v1.8h, v2.8h
> +; CHECK-V81a-apple: sqrdmlah.8h v0, v1, v2
> +   ret <8 x i16> %retval
> +}
> +
> +define <2 x i32> @test_sqrdmlah_v2i32(<2 x i32> %acc, <2 x i32> %mhs, <2 x i32> %rhs) {
> +; CHECK-LABEL: test_sqrdmlah_v2i32:
> +   %prod = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> %mhs, <2 x i32> %rhs)
> +   %retval =  call <2 x i32> @llvm.aarch64.neon.sqadd.v2i32(<2 x i32> %acc, <2 x i32> %prod)
> +; CHECK-V8a:        sqrdmulh    v1.2s, v1.2s, v2.2s
> +; CHECK-V81a:       sqrdmlah    v0.2s, v1.2s, v2.2s
> +; CHECK-V81a-apple: sqrdmlah.2s v0,    v1,    v2
> +   ret <2 x i32> %retval
> +}
> +
> +define <4 x i32> @test_sqrdmlah_v4i32(<4 x i32> %acc, <4 x i32> %mhs, <4 x i32> %rhs) {
> +; CHECK-LABEL: test_sqrdmlah_v4i32:
> +   %prod = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> %mhs, <4 x i32> %rhs)
> +   %retval =  call <4 x i32> @llvm.aarch64.neon.sqadd.v4i32(<4 x i32> %acc, <4 x i32> %prod)
> +; CHECK-V8a:        sqrdmulh    v1.4s, v1.4s, v2.4s
> +; CHECK-V81a:       sqrdmlah    v0.4s, v1.4s, v2.4s
> +; CHECK-V81a-apple: sqrdmlah.4s v0,    v1,    v2
> +   ret <4 x i32> %retval
> +}
> +
> +define <4 x i16> @test_sqrdmlsh_v4i16(<4 x i16> %acc, <4 x i16> %mhs, <4 x i16> %rhs) {
> +; CHECK-LABEL: test_sqrdmlsh_v4i16:
> +   %prod = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %mhs,  <4 x i16> %rhs)
> +   %retval =  call <4 x i16> @llvm.aarch64.neon.sqsub.v4i16(<4 x i16> %acc, <4 x i16> %prod)
> +; CHECK-V8a:        sqrdmulh    v1.4h, v1.4h, v2.4h
> +; CHECK-V81a:       sqrdmlsh    v0.4h, v1.4h, v2.4h
> +; CHECK-V81a-apple: sqrdmlsh.4h v0,    v1,    v2
> +   ret <4 x i16> %retval
> +}
> +
> +define <8 x i16> @test_sqrdmlsh_v8i16(<8 x i16> %acc, <8 x i16> %mhs, <8 x i16> %rhs) {
> +; CHECK-LABEL: test_sqrdmlsh_v8i16:
> +   %prod = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> %mhs, <8 x i16> %rhs)
> +   %retval =  call <8 x i16> @llvm.aarch64.neon.sqsub.v8i16(<8 x i16> %acc, <8 x i16> %prod)
> +; CHECK-V8a:        sqrdmulh    v1.8h, v1.8h, v2.8h
> +; CHECK-V81a:       sqrdmlsh    v0.8h, v1.8h, v2.8h
> +; CHECK-V81a-apple: sqrdmlsh.8h v0,    v1,    v2
> +   ret <8 x i16> %retval
> +}
> +
> +define <2 x i32> @test_sqrdmlsh_v2i32(<2 x i32> %acc, <2 x i32> %mhs, <2 x i32> %rhs) {
> +; CHECK-LABEL: test_sqrdmlsh_v2i32:
> +   %prod = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> %mhs, <2 x i32> %rhs)
> +   %retval =  call <2 x i32> @llvm.aarch64.neon.sqsub.v2i32(<2 x i32> %acc, <2 x i32> %prod)
> +; CHECK-V8a:        sqrdmulh    v1.2s, v1.2s, v2.2s
> +; CHECK-V81a:       sqrdmlsh    v0.2s, v1.2s, v2.2s
> +; CHECK-V81a-apple: sqrdmlsh.2s v0,    v1,    v2
> +   ret <2 x i32> %retval
> +}
> +
> +define <4 x i32> @test_sqrdmlsh_v4i32(<4 x i32> %acc, <4 x i32> %mhs, <4 x i32> %rhs) {
> +; CHECK-LABEL: test_sqrdmlsh_v4i32:
> +   %prod = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> %mhs, <4 x i32> %rhs)
> +   %retval =  call <4 x i32> @llvm.aarch64.neon.sqsub.v4i32(<4 x i32> %acc, <4 x i32> %prod)
> +; CHECK-V8a:        sqrdmulh    v1.4s, v1.4s, v2.4s
> +; CHECK-V81a:       sqrdmlsh    v0.4s, v1.4s, v2.4s
> +; CHECK-V81a-apple: sqrdmlsh.4s v0,    v1,    v2
> +   ret <4 x i32> %retval
> +}
> +
> +;-----------------------------------------------------------------------------
> +; RDMA Vector, by element
> +; tests for vXiYY_indexed in SIMDIndexedSQRDMLxHSDTied
> +
> +define <4 x i16> @test_sqrdmlah_lane_s16(<4 x i16> %acc, <4 x i16> %x, <4 x i16> %v) {
> +; CHECK-LABEL: test_sqrdmlah_lane_s16:
> +entry:
> +  %shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
> +  %prod = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %x, <4 x i16> %shuffle)
> +  %retval =  call <4 x i16> @llvm.aarch64.neon.sqadd.v4i16(<4 x i16> %acc, <4 x i16> %prod)
> +; CHECK-V8a:        sqrdmulh    v1.4h, v1.4h, v2.h[3]
> +; CHECK-V81a:       sqrdmlah    v0.4h, v1.4h, v2.h[3]
> +; CHECK-V81a-apple: sqrdmlah.4h v0,    v1,    v2[3]
> +  ret <4 x i16> %retval
> +}
> +
> +define <8 x i16> @test_sqrdmlahq_lane_s16(<8 x i16> %acc, <8 x i16> %x, <8 x i16> %v) {
> +; CHECK-LABEL: test_sqrdmlahq_lane_s16:
> +entry:
> +  %shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <8 x i32> <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>
> +  %prod = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> %x, <8 x i16> %shuffle)
> +  %retval =  call <8 x i16> @llvm.aarch64.neon.sqadd.v8i16(<8 x i16> %acc, <8 x i16> %prod)
> +; CHECK-V8a:        sqrdmulh    v1.8h, v1.8h, v2.h[2]
> +; CHECK-V81a:       sqrdmlah    v0.8h, v1.8h, v2.h[2]
> +; CHECK-V81a-apple: sqrdmlah.8h v0,    v1,    v2[2]
> +  ret <8 x i16> %retval
> +}
> +
> +define <2 x i32> @test_sqrdmlah_lane_s32(<2 x i32> %acc, <2 x i32> %x, <2 x i32> %v) {
> +; CHECK-LABEL: test_sqrdmlah_lane_s32:
> +entry:
> +  %shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
> +  %prod = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> %x, <2 x i32> %shuffle)
> +  %retval =  call <2 x i32> @llvm.aarch64.neon.sqadd.v2i32(<2 x i32> %acc, <2 x i32> %prod)
> +; CHECK-V8a:        sqrdmulh    v1.2s, v1.2s, v2.s[1]
> +; CHECK-V81a:       sqrdmlah    v0.2s, v1.2s, v2.s[1]
> +; CHECK-V81a-apple: sqrdmlah.2s v0,    v1,    v2[1]
> +  ret <2 x i32> %retval
> +}
> +
> +define <4 x i32> @test_sqrdmlahq_lane_s32(<4 x i32> %acc,<4 x i32> %x, <4 x i32> %v) {
> +; CHECK-LABEL: test_sqrdmlahq_lane_s32:
> +entry:
> +  %shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> zeroinitializer
> +  %prod = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> %x, <4 x i32> %shuffle)
> +  %retval =  call <4 x i32> @llvm.aarch64.neon.sqadd.v4i32(<4 x i32> %acc, <4 x i32> %prod)
> +; CHECK-V8a:        sqrdmulh    v1.4s, v1.4s, v2.s[0]
> +; CHECK-V81a:       sqrdmlah    v0.4s, v1.4s, v2.s[0]
> +; CHECK-V81a-apple: sqrdmlah.4s v0,    v1,    v2[0]
> +  ret <4 x i32> %retval
> +}
> +
> +define <4 x i16> @test_sqrdmlsh_lane_s16(<4 x i16> %acc, <4 x i16> %x, <4 x i16> %v) {
> +; CHECK-LABEL: test_sqrdmlsh_lane_s16:
> +entry:
> +  %shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
> +  %prod = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %x, <4 x i16> %shuffle)
> +  %retval =  call <4 x i16> @llvm.aarch64.neon.sqsub.v4i16(<4 x i16> %acc, <4 x i16> %prod)
> +; CHECK-V8a:        sqrdmulh    v1.4h, v1.4h, v2.h[3]
> +; CHECK-V81a:       sqrdmlsh    v0.4h, v1.4h, v2.h[3]
> +; CHECK-V81a-apple: sqrdmlsh.4h v0,    v1,    v2[3]
> +  ret <4 x i16> %retval
> +}
> +
> +define <8 x i16> @test_sqrdmlshq_lane_s16(<8 x i16> %acc, <8 x i16> %x, <8 x i16> %v) {
> +; CHECK-LABEL: test_sqrdmlshq_lane_s16:
> +entry:
> +  %shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <8 x i32> <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>
> +  %prod = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> %x, <8 x i16> %shuffle)
> +  %retval =  call <8 x i16> @llvm.aarch64.neon.sqsub.v8i16(<8 x i16> %acc, <8 x i16> %prod)
> +; CHECK-V8a:        sqrdmulh    v1.8h, v1.8h, v2.h[2]
> +; CHECK-V81a:       sqrdmlsh    v0.8h, v1.8h, v2.h[2]
> +; CHECK-V81a-apple: sqrdmlsh.8h v0,    v1,    v2[2]
> +  ret <8 x i16> %retval
> +}
> +
> +define <2 x i32> @test_sqrdmlsh_lane_s32(<2 x i32> %acc, <2 x i32> %x, <2 x i32> %v) {
> +; CHECK-LABEL: test_sqrdmlsh_lane_s32:
> +entry:
> +  %shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> <i32 1, i32 1>
> +  %prod = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> %x, <2 x i32> %shuffle)
> +  %retval =  call <2 x i32> @llvm.aarch64.neon.sqsub.v2i32(<2 x i32> %acc, <2 x i32> %prod)
> +; CHECK-V8a:        sqrdmulh    v1.2s, v1.2s, v2.s[1]
> +; CHECK-V81a:       sqrdmlsh    v0.2s, v1.2s, v2.s[1]
> +; CHECK-V81a-apple: sqrdmlsh.2s v0,    v1,    v2[1]
> +  ret <2 x i32> %retval
> +}
> +
> +define <4 x i32> @test_sqrdmlshq_lane_s32(<4 x i32> %acc,<4 x i32> %x, <4 x i32> %v) {
> +; CHECK-LABEL: test_sqrdmlshq_lane_s32:
> +entry:
> +  %shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> zeroinitializer
> +  %prod = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> %x, <4 x i32> %shuffle)
> +  %retval =  call <4 x i32> @llvm.aarch64.neon.sqsub.v4i32(<4 x i32> %acc, <4 x i32> %prod)
> +; CHECK-V8a:        sqrdmulh    v1.4s, v1.4s, v2.s[0]
> +; CHECK-V81a:       sqrdmlsh    v0.4s, v1.4s, v2.s[0]
> +; CHECK-V81a-apple: sqrdmlsh.4s v0,    v1,    v2[0]
> +  ret <4 x i32> %retval
> +}
> +
> +;-----------------------------------------------------------------------------
> +; RDMA Vector, by element, extracted
> +; i16 tests are for vXi16_indexed in SIMDIndexedSQRDMLxHSDTied, with IR in ACLE style
> +; i32 tests are for   "def : Pat" in SIMDIndexedSQRDMLxHSDTied
> +
> +define i16 @test_sqrdmlah_extracted_lane_s16(i16 %acc,<4 x i16> %x, <4 x i16> %v) {
> +; CHECK-LABEL: test_sqrdmlah_extracted_lane_s16:
> +entry:
> +  %shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1,i32 1,i32 1,i32 1>
> +  %prod = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %x, <4 x i16> %shuffle)
> +  %acc_vec = insertelement <4 x i16> undef, i16 %acc, i64 0
> +  %retval_vec =  call <4 x i16> @llvm.aarch64.neon.sqadd.v4i16(<4 x i16> %acc_vec, <4 x i16> %prod)
> +  %retval = extractelement <4 x i16> %retval_vec, i64 0
> +; CHECK-V8a:        sqrdmulh    {{v[0-9]+}}.4h, v0.4h, v1.h[1]
> +; CHECK-V81a:       sqrdmlah    {{v[2-9]+}}.4h, v0.4h, v1.h[1]
> +; CHECK-V81a-apple: sqrdmlah.4h {{v[2-9]+}},    v0,    v1[1]
> +  ret i16 %retval
> +}
> +
> +define i16 @test_sqrdmlahq_extracted_lane_s16(i16 %acc,<8 x i16> %x, <8 x i16> %v) {
> +; CHECK-LABEL: test_sqrdmlahq_extracted_lane_s16:
> +entry:
> +  %shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <8 x i32> <i32 1,i32 1,i32 1,i32 1, i32 1,i32 1,i32 1,i32 1>
> +  %prod = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> %x, <8 x i16> %shuffle)
> +  %acc_vec = insertelement <8 x i16> undef, i16 %acc, i64 0
> +  %retval_vec =  call <8 x i16> @llvm.aarch64.neon.sqadd.v8i16(<8 x i16> %acc_vec, <8 x i16> %prod)
> +  %retval = extractelement <8 x i16> %retval_vec, i64 0
> +; CHECK-V8a:        sqrdmulh    {{v[0-9]+}}.8h, v0.8h, v1.h[1]
> +; CHECK-V81a:       sqrdmlah    {{v[2-9]+}}.8h, v0.8h, v1.h[1]
> +; CHECK-V81a-apple: sqrdmlah.8h {{v[2-9]+}},    v0,    v1[1]
> +  ret i16 %retval
> +}
> +
> +define i32 @test_sqrdmlah_extracted_lane_s32(i32 %acc,<2 x i32> %x, <2 x i32> %v) {
> +; CHECK-LABEL: test_sqrdmlah_extracted_lane_s32:
> +entry:
> +  %shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> zeroinitializer
> +  %prod = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> %x, <2 x i32> %shuffle)
> +  %extract = extractelement <2 x i32> %prod, i64 0
> +  %retval =  call i32 @llvm.aarch64.neon.sqadd.i32(i32 %acc, i32 %extract)
> +; CHECK-V8a:        sqrdmulh    v0.2s, v0.2s, v1.s[0]
> +; CHECK-V81a:       sqrdmlah    v2.2s, v0.2s, v1.s[0]
> +; CHECK-V81a-apple: sqrdmlah.2s v2,    v0,    v1[0]
> +  ret i32 %retval
> +}
> +
> +define i32 @test_sqrdmlahq_extracted_lane_s32(i32 %acc,<4 x i32> %x, <4 x i32> %v) {
> +; CHECK-LABEL: test_sqrdmlahq_extracted_lane_s32:
> +entry:
> +  %shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> zeroinitializer
> +  %prod = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> %x, <4 x i32> %shuffle)
> +  %extract = extractelement <4 x i32> %prod, i64 0
> +  %retval =  call i32 @llvm.aarch64.neon.sqadd.i32(i32 %acc, i32 %extract)
> +; CHECK-V8a:        sqrdmulh    v0.4s, v0.4s, v1.s[0]
> +; CHECK-V81a:       sqrdmlah    v2.4s, v0.4s, v1.s[0]
> +; CHECK-V81a-apple: sqrdmlah.4s v2,    v0,    v1[0]
> +  ret i32 %retval
> +}
> +
> +define i16 @test_sqrdmlsh_extracted_lane_s16(i16 %acc,<4 x i16> %x, <4 x i16> %v) {
> +; CHECK-LABEL: test_sqrdmlsh_extracted_lane_s16:
> +entry:
> +  %shuffle = shufflevector <4 x i16> %v, <4 x i16> undef, <4 x i32> <i32 1,i32 1,i32 1,i32 1>
> +  %prod = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %x, <4 x i16> %shuffle)
> +  %acc_vec = insertelement <4 x i16> undef, i16 %acc, i64 0
> +  %retval_vec =  call <4 x i16> @llvm.aarch64.neon.sqsub.v4i16(<4 x i16> %acc_vec, <4 x i16> %prod)
> +  %retval = extractelement <4 x i16> %retval_vec, i64 0
> +; CHECK-V8a:        sqrdmulh    {{v[0-9]+}}.4h, v0.4h, v1.h[1]
> +; CHECK-V81a:       sqrdmlsh    {{v[2-9]+}}.4h, v0.4h, v1.h[1]
> +; CHECK-V81a-apple: sqrdmlsh.4h {{v[2-9]+}},    v0,    v1[1]
> +  ret i16 %retval
> +}
> +
> +define i16 @test_sqrdmlshq_extracted_lane_s16(i16 %acc,<8 x i16> %x, <8 x i16> %v) {
> +; CHECK-LABEL: test_sqrdmlshq_extracted_lane_s16:
> +entry:
> +  %shuffle = shufflevector <8 x i16> %v, <8 x i16> undef, <8 x i32> <i32 1,i32 1,i32 1,i32 1, i32 1,i32 1,i32 1,i32 1>
> +  %prod = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> %x, <8 x i16> %shuffle)
> +  %acc_vec = insertelement <8 x i16> undef, i16 %acc, i64 0
> +  %retval_vec =  call <8 x i16> @llvm.aarch64.neon.sqsub.v8i16(<8 x i16> %acc_vec, <8 x i16> %prod)
> +  %retval = extractelement <8 x i16> %retval_vec, i64 0
> +; CHECK-V8a:        sqrdmulh    {{v[0-9]+}}.8h, v0.8h, v1.h[1]
> +; CHECK-V81a:       sqrdmlsh    {{v[2-9]+}}.8h, v0.8h, v1.h[1]
> +; CHECK-V81a-apple: sqrdmlsh.8h {{v[2-9]+}},    v0,    v1[1]
> +  ret i16 %retval
> +}
> +
> +define i32 @test_sqrdmlsh_extracted_lane_s32(i32 %acc,<2 x i32> %x, <2 x i32> %v) {
> +; CHECK-LABEL: test_sqrdmlsh_extracted_lane_s32:
> +entry:
> +  %shuffle = shufflevector <2 x i32> %v, <2 x i32> undef, <2 x i32> zeroinitializer
> +  %prod = call <2 x i32> @llvm.aarch64.neon.sqrdmulh.v2i32(<2 x i32> %x, <2 x i32> %shuffle)
> +  %extract = extractelement <2 x i32> %prod, i64 0
> +  %retval =  call i32 @llvm.aarch64.neon.sqsub.i32(i32 %acc, i32 %extract)
> +; CHECK-V8a:        sqrdmulh    v0.2s, v0.2s, v1.s[0]
> +; CHECK-V81a:       sqrdmlsh    v2.2s, v0.2s, v1.s[0]
> +; CHECK-V81a-apple: sqrdmlsh.2s v2,    v0,    v1[0]
> +  ret i32 %retval
> +}
> +
> +define i32 @test_sqrdmlshq_extracted_lane_s32(i32 %acc,<4 x i32> %x, <4 x i32> %v) {
> +; CHECK-LABEL: test_sqrdmlshq_extracted_lane_s32:
> +entry:
> +  %shuffle = shufflevector <4 x i32> %v, <4 x i32> undef, <4 x i32> zeroinitializer
> +  %prod = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> %x, <4 x i32> %shuffle)
> +  %extract = extractelement <4 x i32> %prod, i64 0
> +  %retval =  call i32 @llvm.aarch64.neon.sqsub.i32(i32 %acc, i32 %extract)
> +; CHECK-V8a:        sqrdmulh    v0.4s, v0.4s, v1.s[0]
> +; CHECK-V81a:       sqrdmlsh    v2.4s, v0.4s, v1.s[0]
> +; CHECK-V81a-apple: sqrdmlsh.4s v2,    v0,    v1[0]
> +  ret i32 %retval
> +}
> +
> +;-----------------------------------------------------------------------------
> +; RDMA Scalar
> +; test for "def : Pat" near SIMDThreeScalarHSTied in AArch64InstInfo.td
> +
> +define i16 @test_sqrdmlah_v1i16(i16 %acc, i16 %x, i16 %y) {
> +; CHECK-LABEL: test_sqrdmlah_v1i16:
> +  %x_vec = insertelement <4 x i16> undef, i16 %x, i64 0
> +  %y_vec = insertelement <4 x i16> undef, i16 %y, i64 0
> +  %prod_vec = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %x_vec,  <4 x i16> %y_vec)
> +  %acc_vec = insertelement <4 x i16> undef, i16 %acc, i64 0
> +  %retval_vec =  call <4 x i16> @llvm.aarch64.neon.sqadd.v4i16(<4 x i16> %acc_vec,  <4 x i16> %prod_vec)
> +  %retval = extractelement <4 x i16> %retval_vec, i64 0
> +; CHECK-V8a:        sqrdmulh    {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.4h
> +; CHECK-V81a:       sqrdmlah    {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.4h
> +; CHECK-V81a-apple: sqrdmlah.4h {{v[0-9]+}},    {{v[0-9]+}},    {{v[0-9]+}}
> +  ret i16 %retval
> +}
> +
> +define i32 @test_sqrdmlah_v1i32(i32 %acc, i32 %x, i32 %y) {
> +; CHECK-LABEL: test_sqrdmlah_v1i32:
> +  %x_vec = insertelement <4 x i32> undef, i32 %x, i64 0
> +  %y_vec = insertelement <4 x i32> undef, i32 %y, i64 0
> +  %prod_vec = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> %x_vec,  <4 x i32> %y_vec)
> +  %acc_vec = insertelement <4 x i32> undef, i32 %acc, i64 0
> +  %retval_vec =  call <4 x i32> @llvm.aarch64.neon.sqadd.v4i32(<4 x i32> %acc_vec,  <4 x i32> %prod_vec)
> +  %retval = extractelement <4 x i32> %retval_vec, i64 0
> +; CHECK-V8a:        sqrdmulh    {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.4s
> +; CHECK-V81a:       sqrdmlah    {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.4s
> +; CHECK-V81a-apple: sqrdmlah.4s {{v[0-9]+}},    {{v[0-9]+}},    {{v[0-9]+}}
> +  ret i32 %retval
> +}
> +
> +
> +define i16 @test_sqrdmlsh_v1i16(i16 %acc, i16 %x, i16 %y) {
> +; CHECK-LABEL: test_sqrdmlsh_v1i16:
> +  %x_vec = insertelement <4 x i16> undef, i16 %x, i64 0
> +  %y_vec = insertelement <4 x i16> undef, i16 %y, i64 0
> +  %prod_vec = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %x_vec,  <4 x i16> %y_vec)
> +  %acc_vec = insertelement <4 x i16> undef, i16 %acc, i64 0
> +  %retval_vec =  call <4 x i16> @llvm.aarch64.neon.sqsub.v4i16(<4 x i16> %acc_vec,  <4 x i16> %prod_vec)
> +  %retval = extractelement <4 x i16> %retval_vec, i64 0
> +; CHECK-V8a:        sqrdmulh    {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.4h
> +; CHECK-V81a:       sqrdmlsh    {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, {{v[0-9]+}}.4h
> +; CHECK-V81a-apple: sqrdmlsh.4h {{v[0-9]+}},    {{v[0-9]+}},    {{v[0-9]+}}
> +  ret i16 %retval
> +}
> +
> +define i32 @test_sqrdmlsh_v1i32(i32 %acc, i32 %x, i32 %y) {
> +; CHECK-LABEL: test_sqrdmlsh_v1i32:
> +  %x_vec = insertelement <4 x i32> undef, i32 %x, i64 0
> +  %y_vec = insertelement <4 x i32> undef, i32 %y, i64 0
> +  %prod_vec = call <4 x i32> @llvm.aarch64.neon.sqrdmulh.v4i32(<4 x i32> %x_vec,  <4 x i32> %y_vec)
> +  %acc_vec = insertelement <4 x i32> undef, i32 %acc, i64 0
> +  %retval_vec =  call <4 x i32> @llvm.aarch64.neon.sqsub.v4i32(<4 x i32> %acc_vec,  <4 x i32> %prod_vec)
> +  %retval = extractelement <4 x i32> %retval_vec, i64 0
> +; CHECK-V8a:        sqrdmulh    {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.4s
> +; CHECK-V81a:       sqrdmlsh    {{v[0-9]+}}.4s, {{v[0-9]+}}.4s, {{v[0-9]+}}.4s
> +; CHECK-V81a-apple: sqrdmlsh.4s {{v[0-9]+}},    {{v[0-9]+}},    {{v[0-9]+}}
> +  ret i32 %retval
> +}
> +define i32 @test_sqrdmlah_i32(i32 %acc, i32 %mhs, i32 %rhs) {
> +; CHECK-LABEL: test_sqrdmlah_i32:
> +  %prod = call i32 @llvm.aarch64.neon.sqrdmulh.i32(i32 %mhs,  i32 %rhs)
> +  %retval =  call i32 @llvm.aarch64.neon.sqadd.i32(i32 %acc,  i32 %prod)
> +; CHECK-V8a:        sqrdmulh {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}
> +; CHECK-V81a:       sqrdmlah {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}
> +; CHECK-V81a-apple: sqrdmlah {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}
> +  ret i32 %retval
> +}
> +
> +define i32 @test_sqrdmlsh_i32(i32 %acc, i32 %mhs, i32 %rhs) {
> +; CHECK-LABEL: test_sqrdmlsh_i32:
> +  %prod = call i32 @llvm.aarch64.neon.sqrdmulh.i32(i32 %mhs,  i32 %rhs)
> +  %retval =  call i32 @llvm.aarch64.neon.sqsub.i32(i32 %acc,  i32 %prod)
> +; CHECK-V8a:        sqrdmulh {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}
> +; CHECK-V81a:       sqrdmlsh {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}
> +; CHECK-V81a-apple: sqrdmlsh {{s[0-9]+}}, {{s[0-9]+}}, {{s[0-9]+}}
> +  ret i32 %retval
> +}
> +
> +;-----------------------------------------------------------------------------
> +; RDMA Scalar, by element
> +; i16 tests are performed via tests in above chapter, with IR in ACLE style
> +; i32 tests are for i32_indexed in SIMDIndexedSQRDMLxHSDTied
> +
> +define i16 @test_sqrdmlah_extract_i16(i16 %acc, i16 %x, <4 x i16> %y_vec) {
> +; CHECK-LABEL: test_sqrdmlah_extract_i16:
> +  %shuffle = shufflevector <4 x i16> %y_vec, <4 x i16> undef, <4 x i32> <i32 1,i32 1,i32 1,i32 1>
> +  %x_vec = insertelement <4 x i16> undef, i16 %x, i64 0
> +  %prod = call <4 x i16> @llvm.aarch64.neon.sqrdmulh.v4i16(<4 x i16> %x_vec, <4 x i16> %shuffle)
> +  %acc_vec = insertelement <4 x i16> undef, i16 %acc, i64 0
> +  %retval_vec =  call <4 x i16> @llvm.aarch64.neon.sqadd.v4i16(<4 x i16> %acc_vec, <4 x i16> %prod)
> +  %retval = extractelement <4 x i16> %retval_vec, i32 0
> +; CHECK-V8a:        sqrdmulh    {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, v0.h[1]
> +; CHECK-V81a:       sqrdmlah    {{v[0-9]+}}.4h, {{v[0-9]+}}.4h, v0.h[1]
> +; CHECK-V81a-apple: sqrdmlah.4h {{v[0-9]+}},    {{v[0-9]+}}, v0[1]
> +  ret i16 %retval
> +}
> +
> +define i32 @test_sqrdmlah_extract_i32(i32 %acc, i32 %mhs, <4 x i32> %rhs) {
> +; CHECK-LABEL: test_sqrdmlah_extract_i32:
> +  %extract = extractelement <4 x i32> %rhs, i32 3
> +  %prod = call i32 @llvm.aarch64.neon.sqrdmulh.i32(i32 %mhs,  i32 %extract)
> +  %retval =  call i32 @llvm.aarch64.neon.sqadd.i32(i32 %acc,  i32 %prod)
> +; CHECK-V8a:        sqrdmulh   {{s[0-9]+}}, {{s[0-9]+}}, v0.s[3]
> +; CHECK-V81a:       sqrdmlah   {{s[0-9]+}}, {{s[0-9]+}}, v0.s[3]
> +; CHECK-V81a-apple: sqrdmlah.s {{s[0-9]+}}, {{s[0-9]+}}, v0[3]
> +  ret i32 %retval
> +}
> +
> +define i16 @test_sqrdmlshq_extract_i16(i16 %acc, i16 %x, <8 x i16> %y_vec) {
> +; CHECK-LABEL: test_sqrdmlshq_extract_i16:
> +  %shuffle = shufflevector <8 x i16> %y_vec, <8 x i16> undef, <8 x i32> <i32 1,i32 1,i32 1,i32 1,i32 1,i32 1,i32 1,i32 1>
> +  %x_vec = insertelement <8 x i16> undef, i16 %x, i64 0
> +  %prod = call <8 x i16> @llvm.aarch64.neon.sqrdmulh.v8i16(<8 x i16> %x_vec, <8 x i16> %shuffle)
> +  %acc_vec = insertelement <8 x i16> undef, i16 %acc, i64 0
> +  %retval_vec =  call <8 x i16> @llvm.aarch64.neon.sqsub.v8i16(<8 x i16> %acc_vec, <8 x i16> %prod)
> +  %retval = extractelement <8 x i16> %retval_vec, i32 0
> +; CHECK-V8a:        sqrdmulh    {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, v0.h[1]
> +; CHECK-V81a:       sqrdmlsh    {{v[0-9]+}}.8h, {{v[0-9]+}}.8h, v0.h[1]
> +; CHECK-V81a-apple: sqrdmlsh.8h {{v[0-9]+}},    {{v[0-9]+}}, v0[1]
> +  ret i16 %retval
> +}
> +
> +define i32 @test_sqrdmlsh_extract_i32(i32 %acc, i32 %mhs, <4 x i32> %rhs) {
> +; CHECK-LABEL: test_sqrdmlsh_extract_i32:
> +  %extract = extractelement <4 x i32> %rhs, i32 3
> +  %prod = call i32 @llvm.aarch64.neon.sqrdmulh.i32(i32 %mhs,  i32 %extract)
> +  %retval =  call i32 @llvm.aarch64.neon.sqsub.i32(i32 %acc,  i32 %prod)
> +; CHECK-V8a:        sqrdmulh   {{s[0-9]+}}, {{s[0-9]+}}, v0.s[3]
> +; CHECK-V81a:       sqrdmlsh   {{s[0-9]+}}, {{s[0-9]+}}, v0.s[3]
> +; CHECK-V81a-apple: sqrdmlsh.s {{s[0-9]+}}, {{s[0-9]+}}, v0[3]
> +  ret i32 %retval
> +}
>
> Added: llvm/trunk/test/MC/AArch64/armv8.1a-rdma.s
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/MC/AArch64/armv8.1a-rdma.s?rev=233693&view=auto
> ==============================================================================
> --- llvm/trunk/test/MC/AArch64/armv8.1a-rdma.s (added)
> +++ llvm/trunk/test/MC/AArch64/armv8.1a-rdma.s Tue Mar 31 08:15:48 2015
> @@ -0,0 +1,154 @@
> +// RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+v8.1a -show-encoding < %s 2> %t | FileCheck %s
> +// RUN: FileCheck --check-prefix=CHECK-ERROR < %t %s
> +  .text
> +
> +  //AdvSIMD RDMA vector
> +  sqrdmlah v0.4h, v1.4h, v2.4h
> +  sqrdmlsh v0.4h, v1.4h, v2.4h
> +  sqrdmlah v0.2s, v1.2s, v2.2s
> +  sqrdmlsh v0.2s, v1.2s, v2.2s
> +  sqrdmlah v0.4s, v1.4s, v2.4s
> +  sqrdmlsh v0.4s, v1.4s, v2.4s
> +  sqrdmlah v0.8h, v1.8h, v2.8h
> +  sqrdmlsh v0.8h, v1.8h, v2.8h
> +// CHECK: sqrdmlah  v0.4h, v1.4h, v2.4h // encoding: [0x20,0x84,0x42,0x2e]
> +// CHECK: sqrdmlsh  v0.4h, v1.4h, v2.4h // encoding: [0x20,0x8c,0x42,0x2e]
> +// CHECK: sqrdmlah  v0.2s, v1.2s, v2.2s // encoding: [0x20,0x84,0x82,0x2e]
> +// CHECK: sqrdmlsh  v0.2s, v1.2s, v2.2s // encoding: [0x20,0x8c,0x82,0x2e]
> +// CHECK: sqrdmlah  v0.4s, v1.4s, v2.4s // encoding: [0x20,0x84,0x82,0x6e]
> +// CHECK: sqrdmlsh  v0.4s, v1.4s, v2.4s // encoding: [0x20,0x8c,0x82,0x6e]
> +// CHECK: sqrdmlah  v0.8h, v1.8h, v2.8h // encoding: [0x20,0x84,0x42,0x6e]
> +// CHECK: sqrdmlsh  v0.8h, v1.8h, v2.8h // encoding: [0x20,0x8c,0x42,0x6e]
> +
> +  sqrdmlah v0.2h, v1.2h, v2.2h
> +  sqrdmlsh v0.2h, v1.2h, v2.2h
> +  sqrdmlah v0.8s, v1.8s, v2.8s
> +  sqrdmlsh v0.8s, v1.8s, v2.8s
> +  sqrdmlah v0.2s, v1.4h, v2.8h
> +  sqrdmlsh v0.4s, v1.8h, v2.2s
> +// CHECK-ERROR: error: invalid vector kind qualifier
> +// CHECK-ERROR:   sqrdmlah v0.2h, v1.2h, v2.2h
> +// CHECK-ERROR:            ^
> +// CHECK-ERROR: error: invalid vector kind qualifier
> +// CHECK-ERROR:   sqrdmlah v0.2h, v1.2h, v2.2h
> +// CHECK-ERROR:                   ^
> +// CHECK-ERROR: error: invalid vector kind qualifier
> +// CHECK-ERROR:   sqrdmlah v0.2h, v1.2h, v2.2h
> +// CHECK-ERROR:                          ^
> +// CHECK-ERROR: error: invalid operand for instruction
> +// CHECK-ERROR:   sqrdmlah v0.2h, v1.2h, v2.2h
> +// CHECK-ERROR:            ^
> +// CHECK-ERROR: error: invalid vector kind qualifier
> +// CHECK-ERROR:   sqrdmlsh v0.2h, v1.2h, v2.2h
> +// CHECK-ERROR:            ^
> +// CHECK-ERROR: error: invalid vector kind qualifier
> +// CHECK-ERROR:   sqrdmlsh v0.2h, v1.2h, v2.2h
> +// CHECK-ERROR:                   ^
> +// CHECK-ERROR: error: invalid vector kind qualifier
> +// CHECK-ERROR:   sqrdmlsh v0.2h, v1.2h, v2.2h
> +// CHECK-ERROR:                          ^
> +// CHECK-ERROR: error: invalid operand for instruction
> +// CHECK-ERROR:   sqrdmlsh v0.2h, v1.2h, v2.2h
> +// CHECK-ERROR:            ^
> +// CHECK-ERROR: error: invalid vector kind qualifier
> +// CHECK-ERROR:   sqrdmlah v0.8s, v1.8s, v2.8s
> +// CHECK-ERROR:            ^
> +// CHECK-ERROR: error: invalid vector kind qualifier
> +// CHECK-ERROR:   sqrdmlah v0.8s, v1.8s, v2.8s
> +// CHECK-ERROR:                   ^
> +// CHECK-ERROR: error: invalid vector kind qualifier
> +// CHECK-ERROR:   sqrdmlah v0.8s, v1.8s, v2.8s
> +// CHECK-ERROR:                          ^
> +// CHECK-ERROR: error: invalid operand for instruction
> +// CHECK-ERROR:   sqrdmlah v0.8s, v1.8s, v2.8s
> +// CHECK-ERROR:            ^
> +// CHECK-ERROR: error: invalid vector kind qualifier
> +// CHECK-ERROR:   sqrdmlsh v0.8s, v1.8s, v2.8s
> +// CHECK-ERROR:            ^
> +// CHECK-ERROR: error: invalid vector kind qualifier
> +// CHECK-ERROR:   sqrdmlsh v0.8s, v1.8s, v2.8s
> +// CHECK-ERROR:                   ^
> +// CHECK-ERROR: error: invalid vector kind qualifier
> +// CHECK-ERROR:   sqrdmlsh v0.8s, v1.8s, v2.8s
> +// CHECK-ERROR:                          ^
> +// CHECK-ERROR: error: invalid operand for instruction
> +// CHECK-ERROR:   sqrdmlsh v0.8s, v1.8s, v2.8s
> +// CHECK-ERROR:            ^
> +// CHECK-ERROR: error: invalid operand for instruction
> +// CHECK-ERROR:   sqrdmlah v0.2s, v1.4h, v2.8h
> +// CHECK-ERROR:                   ^
> +// CHECK-ERROR: error: invalid operand for instruction
> +// CHECK-ERROR:   sqrdmlsh v0.4s, v1.8h, v2.2s
> +// CHECK-ERROR:                   ^
> +
> +  //AdvSIMD RDMA scalar
> +  sqrdmlah h0, h1, h2
> +  sqrdmlsh h0, h1, h2
> +  sqrdmlah s0, s1, s2
> +  sqrdmlsh s0, s1, s2
> +// CHECK: sqrdmlah h0, h1, h2  // encoding: [0x20,0x84,0x42,0x7e]
> +// CHECK: sqrdmlsh h0, h1, h2  // encoding: [0x20,0x8c,0x42,0x7e]
> +// CHECK: sqrdmlah s0, s1, s2  // encoding: [0x20,0x84,0x82,0x7e]
> +// CHECK: sqrdmlsh s0, s1, s2  // encoding: [0x20,0x8c,0x82,0x7e]
> +
> +  //AdvSIMD RDMA vector by-element
> +  sqrdmlah v0.4h, v1.4h, v2.h[3]
> +  sqrdmlsh v0.4h, v1.4h, v2.h[3]
> +  sqrdmlah v0.2s, v1.2s, v2.s[1]
> +  sqrdmlsh v0.2s, v1.2s, v2.s[1]
> +  sqrdmlah v0.8h, v1.8h, v2.h[3]
> +  sqrdmlsh v0.8h, v1.8h, v2.h[3]
> +  sqrdmlah v0.4s, v1.4s, v2.s[3]
> +  sqrdmlsh v0.4s, v1.4s, v2.s[3]
> +// CHECK: sqrdmlah v0.4h, v1.4h, v2.h[3]  // encoding: [0x20,0xd0,0x72,0x2f]
> +// CHECK: sqrdmlsh v0.4h, v1.4h, v2.h[3]  // encoding: [0x20,0xf0,0x72,0x2f]
> +// CHECK: sqrdmlah v0.2s, v1.2s, v2.s[1]  // encoding: [0x20,0xd0,0xa2,0x2f]
> +// CHECK: sqrdmlsh v0.2s, v1.2s, v2.s[1]  // encoding: [0x20,0xf0,0xa2,0x2f]
> +// CHECK: sqrdmlah v0.8h, v1.8h, v2.h[3]  // encoding: [0x20,0xd0,0x72,0x6f]
> +// CHECK: sqrdmlsh v0.8h, v1.8h, v2.h[3]  // encoding: [0x20,0xf0,0x72,0x6f]
> +// CHECK: sqrdmlah v0.4s, v1.4s, v2.s[3]  // encoding: [0x20,0xd8,0xa2,0x6f]
> +// CHECK: sqrdmlsh v0.4s, v1.4s, v2.s[3]  // encoding: [0x20,0xf8,0xa2,0x6f]
> +
> +  sqrdmlah v0.4s, v1.2s, v2.s[1]
> +  sqrdmlsh v0.2s, v1.2d, v2.s[1]
> +  sqrdmlah v0.8h, v1.8h, v2.s[3]
> +  sqrdmlsh v0.8h, v1.8h, v2.h[8]
> +// CHECK-ERROR: error: invalid operand for instruction
> +// CHECK-ERROR:   sqrdmlah v0.4s, v1.2s, v2.s[1]
> +// CHECK-ERROR:                   ^
> +// CHECK-ERROR: error: invalid operand for instruction
> +// CHECK-ERROR:   sqrdmlsh v0.2s, v1.2d, v2.s[1]
> +// CHECK-ERROR:                   ^
> +// CHECK-ERROR: error: invalid operand for instruction
> +// CHECK-ERROR:   sqrdmlah v0.8h, v1.8h, v2.s[3]
> +// CHECK-ERROR:                          ^
> +// CHECK-ERROR: error: vector lane must be an integer in range [0, 7].
> +// CHECK-ERROR:   sqrdmlsh v0.8h, v1.8h, v2.h[8]
> +// CHECK-ERROR:                              ^
> +
> +  //AdvSIMD RDMA scalar by-element
> +  sqrdmlah h0, h1, v2.h[3]
> +  sqrdmlsh h0, h1, v2.h[3]
> +  sqrdmlah s0, s1, v2.s[3]
> +  sqrdmlsh s0, s1, v2.s[3]
> +// CHECK: sqrdmlah h0, h1, v2.h[3]  // encoding: [0x20,0xd0,0x72,0x7f]
> +// CHECK: sqrdmlsh h0, h1, v2.h[3]  // encoding: [0x20,0xf0,0x72,0x7f]
> +// CHECK: sqrdmlah s0, s1, v2.s[3]  // encoding: [0x20,0xd8,0xa2,0x7f]
> +// CHECK: sqrdmlsh s0, s1, v2.s[3]  // encoding: [0x20,0xf8,0xa2,0x7f]
> +
> +  sqrdmlah b0, h1, v2.h[3]
> +  sqrdmlah s0, d1, v2.s[3]
> +  sqrdmlsh h0, h1, v2.s[3]
> +  sqrdmlsh s0, s1, v2.s[4]
> +// CHECK-ERROR: error: invalid operand for instruction
> +// CHECK-ERROR:   sqrdmlah b0, h1, v2.h[3]
> +// CHECK-ERROR:            ^
> +// CHECK-ERROR: error: invalid operand for instruction
> +// CHECK-ERROR:   sqrdmlah s0, d1, v2.s[3]
> +// CHECK-ERROR:                ^
> +// CHECK-ERROR: error: invalid operand for instruction
> +// CHECK-ERROR:   sqrdmlsh h0, h1, v2.s[3]
> +// CHECK-ERROR:                    ^
> +// CHECK-ERROR: error: vector lane must be an integer in range [0, 3].
> +// CHECK-ERROR:   sqrdmlsh s0, s1, v2.s[4]
> +// CHECK-ERROR:                        ^
>
> Propchange: llvm/trunk/test/MC/AArch64/armv8.1a-rdma.s
> ------------------------------------------------------------------------------
>     svn:eol-style = native
>
> Propchange: llvm/trunk/test/MC/AArch64/armv8.1a-rdma.s
> ------------------------------------------------------------------------------
>     svn:keywords = Rev Date Author URL Id
>
> Added: llvm/trunk/test/MC/Disassembler/AArch64/armv8.1a-rdma.txt
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/MC/Disassembler/AArch64/armv8.1a-rdma.txt?rev=233693&view=auto
> ==============================================================================
> --- llvm/trunk/test/MC/Disassembler/AArch64/armv8.1a-rdma.txt (added)
> +++ llvm/trunk/test/MC/Disassembler/AArch64/armv8.1a-rdma.txt Tue Mar 31 08:15:48 2015
> @@ -0,0 +1,129 @@
> +# RUN: not llvm-mc -triple aarch64-none-linux-gnu -mattr=+v8.1a --disassemble < %s 2>&1 | FileCheck %s
> +
> +[0x20,0x84,0x02,0x2e] # sqrdmlah  v0.8b, v1.8b, v2.8b
> +[0x20,0x8c,0x02,0x2e] # sqrdmlsh  v0.8b, v1.8b, v2.8b
> +[0x20,0x84,0xc2,0x2e] # sqrdmlah  v0.1d, v1.1d, v2.1d
> +[0x20,0x8c,0xc2,0x2e] # sqrdmlsh  v0.1d, v1.1d, v2.1d
> +[0x20,0x84,0x02,0x6e] # sqrdmlah  v0.16b, v1.16b, v2.16b
> +[0x20,0x8c,0x02,0x6e] # sqrdmlsh  v0.16b, v1.16b, v2.16b
> +[0x20,0x84,0xc2,0x6e] # sqrdmlah  v0.2d, v1.2d, v2.2d
> +[0x20,0x8c,0xc2,0x6e] # sqrdmlsh  v0.2d, v1.2d, v2.2d
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0x84,0x02,0x2e]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0x8c,0x02,0x2e]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0x84,0xc2,0x2e]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0x8c,0xc2,0x2e]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0x84,0x02,0x6e]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0x8c,0x02,0x6e]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0x84,0xc2,0x6e]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0x8c,0xc2,0x6e]
> +
> +[0x20,0x84,0x02,0x7e] # sqrdmlah b0, b1, b2
> +[0x20,0x8c,0x02,0x7e] # sqrdmlsh b0, b1, b2
> +[0x20,0x84,0xc2,0x7e] # sqrdmlah d0, d1, d2
> +[0x20,0x8c,0xc2,0x7e] # sqrdmlsh d0, d1, d2
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0x84,0x02,0x7e]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0x8c,0x02,0x7e]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0x84,0xc2,0x7e]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0x8c,0xc2,0x7e]
> +
> +[0x20,0xd0,0x32,0x2f] # sqrdmlah v0.8b, v1.8b, v2.b[3]
> +[0x20,0xf0,0x32,0x2f] # sqrdmlsh v0.8b, v1.8b, v2.b[3]
> +[0x20,0xd0,0xe2,0x2f] # sqrdmlah v0.1d, v1.1d, v2.d[1]
> +[0x20,0xf0,0xe2,0x2f] # sqrdmlsh v0.1d, v1.1d, v2.d[1]
> +[0x20,0xd0,0x32,0x6f] # sqrdmlah v0.16b, v1.16b, v2.b[3]
> +[0x20,0xf0,0x32,0x6f] # sqrdmlsh v0.16b, v1.16b, v2.b[3]
> +[0x20,0xd8,0xe2,0x6f] # sqrdmlah v0.2d, v1.2d, v2.d[3]
> +[0x20,0xf8,0xe2,0x6f] # sqrdmlsh v0.2d, v1.2d, v2.d[3]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0xd0,0x32,0x2f]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0xf0,0x32,0x2f]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0xd0,0xe2,0x2f]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0xf0,0xe2,0x2f]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0xd0,0x32,0x6f]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0xf0,0x32,0x6f]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0xd8,0xe2,0x6f]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0xf8,0xe2,0x6f]
> +
> +[0x20,0xd0,0x32,0x7f] # sqrdmlah b0, b1, v2.b[3]
> +[0x20,0xf0,0x32,0x7f] # sqrdmlsh b0, b1, v2.b[3]
> +[0x20,0xd8,0xe2,0x7f] # sqrdmlah d0, d1, v2.d[3]
> +[0x20,0xf8,0xe2,0x7f] # sqrdmlsh d0, d1, v2.d[3]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0xd0,0x32,0x7f]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0xf0,0x32,0x7f]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0xd8,0xe2,0x7f]
> +# CHECK: warning: invalid instruction encoding
> +# CHECK: [0x20,0xf8,0xe2,0x7f]
> +
> +[0x20,0x84,0x42,0x2e]
> +[0x20,0x8c,0x42,0x2e]
> +[0x20,0x84,0x82,0x2e]
> +[0x20,0x8c,0x82,0x2e]
> +[0x20,0x84,0x42,0x6e]
> +[0x20,0x8c,0x42,0x6e]
> +[0x20,0x84,0x82,0x6e]
> +[0x20,0x8c,0x82,0x6e]
> +# CHECK: sqrdmlah  v0.4h, v1.4h, v2.4h
> +# CHECK: sqrdmlsh  v0.4h, v1.4h, v2.4h
> +# CHECK: sqrdmlah  v0.2s, v1.2s, v2.2s
> +# CHECK: sqrdmlsh  v0.2s, v1.2s, v2.2s
> +# CHECK: sqrdmlah  v0.8h, v1.8h, v2.8h
> +# CHECK: sqrdmlsh  v0.8h, v1.8h, v2.8h
> +# CHECK: sqrdmlah  v0.4s, v1.4s, v2.4s
> +# CHECK: sqrdmlsh  v0.4s, v1.4s, v2.4s
> +
> +[0x20,0x84,0x42,0x7e]
> +[0x20,0x8c,0x42,0x7e]
> +[0x20,0x84,0x82,0x7e]
> +[0x20,0x8c,0x82,0x7e]
> +# CHECK: sqrdmlah h0, h1, h2
> +# CHECK: sqrdmlsh h0, h1, h2
> +# CHECK: sqrdmlah s0, s1, s2
> +# CHECK: sqrdmlsh s0, s1, s2
> +
> +0x20,0xd0,0x72,0x2f
> +0x20,0xf0,0x72,0x2f
> +0x20,0xd0,0xa2,0x2f
> +0x20,0xf0,0xa2,0x2f
> +0x20,0xd0,0x72,0x6f
> +0x20,0xf0,0x72,0x6f
> +0x20,0xd8,0xa2,0x6f
> +0x20,0xf8,0xa2,0x6f
> +# CHECK: sqrdmlah v0.4h, v1.4h, v2.h[3]
> +# CHECK: sqrdmlsh v0.4h, v1.4h, v2.h[3]
> +# CHECK: sqrdmlah v0.2s, v1.2s, v2.s[1]
> +# CHECK: sqrdmlsh v0.2s, v1.2s, v2.s[1]
> +# CHECK: sqrdmlah v0.8h, v1.8h, v2.h[3]
> +# CHECK: sqrdmlsh v0.8h, v1.8h, v2.h[3]
> +# CHECK: sqrdmlah v0.4s, v1.4s, v2.s[3]
> +# CHECK: sqrdmlsh v0.4s, v1.4s, v2.s[3]
> +
> +0x20,0xd0,0x72,0x7f
> +0x20,0xf0,0x72,0x7f
> +0x20,0xd8,0xa2,0x7f
> +0x20,0xf8,0xa2,0x7f
> +# CHECK: sqrdmlah h0, h1, v2.h[3]
> +# CHECK: sqrdmlsh h0, h1, v2.h[3]
> +# CHECK: sqrdmlah s0, s1, v2.s[3]
> +# CHECK: sqrdmlsh s0, s1, v2.s[3]
>
> Propchange: llvm/trunk/test/MC/Disassembler/AArch64/armv8.1a-rdma.txt
> ------------------------------------------------------------------------------
>     svn:eol-style = native
>
> Propchange: llvm/trunk/test/MC/Disassembler/AArch64/armv8.1a-rdma.txt
> ------------------------------------------------------------------------------
>     svn:keywords = Rev Date Author URL Id
>
>






