[PATCH] [AArch64] Add v8.1a RDMA extension
    Vladimir Sukharev 
    vladimir.sukharev at arm.com
       
    Tue Mar  3 13:00:43 PST 2015
    
    
  
Hi Tim, 
thank you for your warm feedback.
1. **I don't think these new intrinsics are needed. The instructions are effectively "(int_aarch64_neon_sqadd $acc, (int_aarch64_neon_sqrdmulh $LHS, $RHS))".**
OK, but now I have hit a serious problem that may be well known for v1iNN types. If so, could you please give me a hint?
For scalar operations, I have temporarily added the following:
          lib/Target/AArch64/AArch64InstrFormats.td:
  let mayStore = 0, mayLoad = 0, hasSideEffects = 0 in
  class BaseSIMDThreeScalar_cstr<bit U, bits<2> size, bits<5> opcode,
                          dag oops, dag iops, string asm, string cstr, 
                          list<dag> pattern>
    : I<oops, iops, asm,
        "\t$Rd, $Rn, $Rm", cstr, pattern>,
      Sched<[WriteV]> {
    bits<5> Rd;
    bits<5> Rn;
    bits<5> Rm;
    let Inst{31-30} = 0b01;
    let Inst{29}    = U;
    let Inst{28-24} = 0b11110;
    let Inst{23-22} = size;
    let Inst{21}    = 1;
    let Inst{20-16} = Rm;
    let Inst{15-11} = opcode;
    let Inst{10}    = 1;
    let Inst{9-5}   = Rn;
    let Inst{4-0}   = Rd;
  }
  class BaseSIMDThreeScalarExtRDMA<bit U, bits<2> size, bits<5> opcode,
                          dag oops, dag iops, string asm,
                          list<dag> pattern>
    : BaseSIMDThreeScalar_cstr<U, size, opcode, oops, iops,
                               asm, "$Rd = $dst", pattern> {
    let Inst{21} = 0;
  }
  multiclass SIMDThreeScalarHSExtRDMA<bit U, bits<5> opc, string asm,
                                 SDPatternOperator OpNode = null_frag> {
    def i32  : BaseSIMDThreeScalarExtRDMA<U, 0b10, opc, (outs FPR32:$dst), (ins FPR32:$Rd, FPR32:$Rn, FPR32:$Rm), asm, []>;
    def i16  : BaseSIMDThreeScalarExtRDMA<U, 0b01, opc, (outs FPR16:$dst), (ins FPR16:$Rd, FPR16:$Rn, FPR16:$Rm), asm, []>;
  //  def v1i16  : BaseSIMDThreeScalarExtRDMA<U, 0b01, opc, (outs FPR16:$dst), (ins FPR16:$Rd, FPR16:$Rn, FPR16:$Rm), asm, []>;
  }
  
  
          lib/Target/AArch64/AArch64InstrInfo.td:
  defm SQRDMLAH : SIMDThreeScalarHSExtRDMA<1, 0b10000, "sqrdmlah">;
  def : Pat<(i16 (int_aarch64_neon_sqadd (i16 FPR16:$Rd),
                    (i16 (int_aarch64_neon_sqrdmulh (i16 FPR16:$Rn),
                                                      (i16 FPR16:$Rm))))),
            (SQRDMLAHi16 FPR16:$Rd, FPR16:$Rn, FPR16:$Rm)>;
  //def : Pat<(v1i16 (int_aarch64_neon_sqadd (v1i16 FPR16:$Rd),
  //                  (v1i16 (int_aarch64_neon_sqrdmulh (v1i16 FPR16:$Rn),
  //                                                    (v1i16 FPR16:$Rm))))),
  //          (SQRDMLAHv1i16 FPR16:$Rd, FPR16:$Rn, FPR16:$Rm)>;
  def : Pat<(i32 (int_aarch64_neon_sqadd (i32 FPR32:$Rd),
                    (i32 (int_aarch64_neon_sqrdmulh (i32 FPR32:$Rn),
                                                      (i32 FPR32:$Rm))))),
            (SQRDMLAHi32 FPR32:$Rd, FPR32:$Rn, FPR32:$Rm)>;
SQRDMLAHi32 works fine, but for the i16 version I get a TableGen error:
  anonymous_1513: 	(intrinsic_wo_chain:<empty> 117:<empty>, FPR16:i16:$Rd, (intrinsic_wo_chain:i16 124:<empty>, FPR16:i16:$Rn, FPR16:i16:$Rm))
  Included from /work/llvm/lib/Target/AArch64/AArch64.td:58:
  /work/llvm/lib/Target/AArch64/AArch64InstrInfo.td:2992:1: error: In anonymous_1513: Type inference contradiction found, merging '{i32:i64:v8i8:v16i8:v4i16:v8i16:v2i32:v4i32:v1i64:v2i64}' into 'i16'
  def : Pat<(i16 (int_aarch64_neon_sqadd (i16 FPR16:$Rd),
  ^
  anonymous_1513: 	(SQRDMLAHi16:f16 FPR16:f16:$Rd, FPR16:<empty>:$Rn, FPR16:f16:$Rm)
  Included from /work/llvm/lib/Target/AArch64/AArch64.td:58:
  /work/llvm/lib/Target/AArch64/AArch64InstrInfo.td:2992:1: error: In anonymous_1513: Type inference contradiction found, merging 'i16' into 'f16'
  def : Pat<(i16 (int_aarch64_neon_sqadd (i16 FPR16:$Rd),
  ^
  anonymous_1513: 	(intrinsic_wo_chain:<empty> 117:<empty>, FPR16:i16:$Rd, (intrinsic_wo_chain:i16 124:<empty>, FPR16:i16:$Rn, FPR16:i16:$Rm))
The same happens for v1i16:
  anonymous_1513: 	(intrinsic_wo_chain:<empty> 117:<empty>, FPR16:v1i16:$Rd, (intrinsic_wo_chain:v1i16 124:<empty>, FPR16:v1i16:$Rn, FPR16:v1i16:$Rm))
  Included from /work/llvm/lib/Target/AArch64/AArch64.td:58:
  /work/llvm/lib/Target/AArch64/AArch64InstrInfo.td:2996:1: error: In anonymous_1513: Type inference contradiction found, merging '{i32:i64:v8i8:v16i8:v4i16:v8i16:v2i32:v4i32:v1i64:v2i64}' into 'v1i16'
  def : Pat<(v1i16 (int_aarch64_neon_sqadd (v1i16 FPR16:$Rd),
  ^
  anonymous_1513: 	(SQRDMLAHv1i16:f16 FPR16:f16:$Rd, FPR16:<empty>:$Rn, FPR16:f16:$Rm)
  Included from /work/llvm/lib/Target/AArch64/AArch64.td:58:
  /work/llvm/lib/Target/AArch64/AArch64InstrInfo.td:2996:1: error: In anonymous_1513: Type inference contradiction found, merging 'v1i16' into 'f16'
  def : Pat<(v1i16 (int_aarch64_neon_sqadd (v1i16 FPR16:$Rd),
  ^
  anonymous_1513: 	(intrinsic_wo_chain:<empty> 117:<empty>, FPR16:v1i16:$Rd, (intrinsic_wo_chain:v1i16 124:<empty>, FPR16:v1i16:$Rn, FPR16:v1i16:$Rm))
It's not the first time I have seen LLVM have problems with i16, which is why in my first patch I skipped both the implementation and the tests for SQRDMLAHi16.
Previously, I was able to implement at least SQRDMLAHv1i16, but now I cannot do it even for v1i16.
What would you suggest here?
Note: I first tried a different implementation, like the following snippet for the vector variant. It failed due to incomplete type inference between the two intrinsics.
  lib/Target/AArch64/AArch64InstrFormats.td:
  multiclass SIMDThreeSameVectorExtRDMA<bit U, bits<5> opc, string asm,
                                 SDPatternOperator OpNode> {
    def v4i16 : BaseSIMDThreeSameVectorExtRDMA<0, U, 0b01, opc, V64,
                                        asm, ".4h",
          [(set (v4i16 V64:$dst),
              (OpNode (v4i16 V64:$Rd), (v4i16 V64:$Rn), (v4i16 V64:$Rm)))]>;
  }
  
  
  lib/Target/AArch64/AArch64InstrInfo.td:
  defm SQRDMLAH : SIMDThreeSameVectorExtRDMA<1,0b10000,"sqrdmlah",
           TriOpFrag<(int_aarch64_neon_sqadd node:$LHS,
       (int_aarch64_neon_sqrdmulh node:$MHS, node:$RHS))> >;
  
  SQRDMLAHv4i16:  (set V64:v4i16:$dst, (intrinsic_wo_chain:v4i16 117:iPTR, V64:v4i16:$Rd, (intrinsic_wo_chain:{i32:i64:v8i8:v16i8:v4i16:v8i16:v2i32:v4i32:v1i64:v2i64} 124:iPTR, V64:v4i16:$Rn, V64:v4i16:$Rm)))
  Included from /work/llvm/lib/Target/AArch64/AArch64.td:58:
  /work/llvm/lib/Target/AArch64/AArch64InstrInfo.td:2730:1: error: In SQRDMLAHv4i16: Could not infer all types in pattern!
However, when I changed "sqadd" to "add", it worked. So I have abandoned this "TriOpFrag" approach.
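One possible alternative for the vector case, given here only as a sketch and not built against this tree: leave the instruction definition without a pattern and add an out-of-line Pat in which the inner intrinsic node carries an explicit result type, so that TableGen does not have to infer it across two overloaded intrinsics. The instruction name SQRDMLAHv4i16 and the operand order follow the definitions above; having a pattern-free variant of the vector multiclass is an assumption.

```tablegen
// Sketch only (untested assumption): the explicit (v4i16 ...) around the
// inner sqrdmulh node pins the type that the TriOpFrag version left as an
// unresolved overload set.
def : Pat<(v4i16 (int_aarch64_neon_sqadd
                     (v4i16 V64:$Rd),
                     (v4i16 (int_aarch64_neon_sqrdmulh (v4i16 V64:$Rn),
                                                       (v4i16 V64:$Rm))))),
          (SQRDMLAHv4i16 V64:$Rd, V64:$Rn, V64:$Rm)>;
```

This mirrors the shape of the scalar i32 pattern that already works, with V64 in place of FPR32.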
2. **it'd be much better to keep things as hierarchical as possible and just add a HasV8_1 predicate, or perhaps FeatureFPARMv8_1,**
At the time these predicates were introduced downstream, we were not sure whether there might be partial v8.05 implementations. Nowadays we are almost sure there will not be, but not 100%. However, I agree with you: if some partial implementation unexpectedly appears in the future, we can split the HasV8_1 predicate then.
I'll replace HasRDMA with HasV8_1 in the next revision.
Fortunately, no FeatureFPARMv8_1 will be required: FP has not been changed.
http://reviews.llvm.org/D7998