[PATCH] D41792: [AArch64] Add ARMv8.2-A FP16 scalar intrinsics

Thu Jan 11 01:35:12 PST 2018

SjoerdMeijer added inline comments.

================
Comment at: clang/include/clang/Basic/arm_fp16.td:58
+class IInst<string n, string p, string t> : Inst<n, p, t, OP_NONE> {}
+
+// ARMv8.2-A FP16 intrinsics.
----------------
az wrote:
> SjoerdMeijer wrote:
> > There's a little bit of duplication here: the definitions above are the same as in arm_neon.td. Would it be easy to share this, with e.g. an include?
> The duplication is small compared to the overall infrastructure/data structure needed to automatically generate the intrinsics. There are 3 ways to do this: 1) copy only the needed data structure in arm_fp16.td (this is what was done in original review) 2) put all data structure in a newly created file and include it in arm_neon.td and arm_fp16.td (done here). 3) put only the duplication in a new file and include it. I did not go for this one given that we create a new file for the only purpose of avoiding a small duplication but I am fine of going with 3 too. Note that some of the duplicated structure in the original arm_fp16.td was a stripped down version of the copied one.
Given that the duplication is tiny, I don't have strong opinions to be honest. Would be nice to share these definitions if that's easy to do, otherwise we can perfectly live with this I think.

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:4102
   NEONMAP1(vuqadds_s32, aarch64_neon_suqadd, Add1ArgType),
+  // FP16 scalar intrinisics go here.
+  NEONMAP1(vabdh_f16, aarch64_sisd_fabd, Add1ArgType),
----------------
az wrote:
> SjoerdMeijer wrote:
> > Looks like  a few intrinsic descriptions are missing here. For example, the first 2-operand intrinsic vaddh_f16 is missing, but there are also more. Is this intentional, or might they have slipped through the cracks (or am I missing something)?
> I agree that this is confusing. For the intrinsics listed in this table, code generation happens in a generic way based on the info in the table. The ones not listed in this table are addressed in a more specific way below in a the function called EmitAArch64BuiltinExpr. While I do not like how few things were implemented in generating the intrinsics, I am in general following the approach taken for arm_neon instead of introducing a new approach. 
Ah, I see, I somehow missed that. Fair enough.

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:6511
                                         "vgetq_lane");
+   case NEON::BI__builtin_neon_vaddh_f16:
+     Ops.push_back(EmitScalarExpr(E->getArg(1)));
----------------
nit: indentation seems off by 1 for these case statements.

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:6531
+     Value *F = CGM.getIntrinsic(Intrinsic::fma, HalfTy);
+    Value *Zero = llvm::ConstantFP::getZeroValueForNegation(HalfTy);
+     Value* Sub = Builder.CreateFSub(Zero, EmitScalarExpr(E->getArg(1)), "vsubh");
----------------
nit: indentation seems off by 1

================
Comment at: clang/utils/TableGen/NeonEmitter.cpp:2464
+        " * Permission is hereby granted, free of charge, to any person "
+        "obtaining "
+        "a copy\n"
----------------
more nits: I see this is copied from above, but I think this and the next line can be on the same line, just increasing readability a tiny bit.

================
Comment at: clang/utils/TableGen/NeonEmitter.cpp:2468
+        "\"Software\"),"
+        " to deal\n"
+        " * in the Software without restriction, including without limitation "
----------------
same here

================
Comment at: clang/utils/TableGen/NeonEmitter.cpp:2470
+        " * in the Software without restriction, including without limitation "
+        "the "
+        "rights\n"
----------------
and same here

================
Comment at: llvm/include/llvm/IR/IntrinsicsAArch64.td:250
   def int_aarch64_neon_umax : AdvSIMD_2VectorArg_Intrinsic;
-  def int_aarch64_neon_fmax : AdvSIMD_2VectorArg_Intrinsic;
+  def int_aarch64_neon_fmax : AdvSIMD_2FloatArg_Intrinsic;
   def int_aarch64_neon_fmaxnmp : AdvSIMD_2VectorArg_Intrinsic;
----------------
There's a scalar and vector variant of FMAX and thus I am wondering if we don't need two definitions here: one using AdvSIMD_2FloatArg_Intrinsic and the other AdvSIMD_2VectorArg_Intrinsic?

================
Comment at: llvm/include/llvm/IR/IntrinsicsAArch64.td:262
   def int_aarch64_neon_umin : AdvSIMD_2VectorArg_Intrinsic;
-  def int_aarch64_neon_fmin : AdvSIMD_2VectorArg_Intrinsic;
+  def int_aarch64_neon_fmin : AdvSIMD_2FloatArg_Intrinsic;
   def int_aarch64_neon_fminnmp : AdvSIMD_2VectorArg_Intrinsic;
----------------
Same here for FMIN

https://reviews.llvm.org/D41792