[PATCH] D67158: [ARM] Add IR intrinsics for a sample of MVE instructions.

Mon Sep 9 02:51:17 PDT 2019

dmgreen added inline comments.

================
Comment at: llvm/include/llvm/IR/IntrinsicsARM.td:806
+
+defm int_arm_mve_minv: IntrinsicSignSuffix<[llvm_i32_ty],
+   [llvm_i32_ty, llvm_anyvector_ty], [IntrNoMem]>;
----------------
A vminv is very close to a llvm.experimental.vector.reduce.umin/llvm.experimental.vector.reduce.smin, although that doesn't contain the first parameter. (I'm just mentioning this for general interest mostly, I don't think we should be using the thing yet. It may be more interesting in the future if llvm started optimising the vector.reduce better, but for the moment I think that the arm intrinsic is a good idea).

================
Comment at: llvm/include/llvm/IR/IntrinsicsARM.td:816
+
+def int_arm_mve_fltnarrow: Intrinsic<[llvm_v8f16_ty],
+   [llvm_v8f16_ty, llvm_v4f32_ty, llvm_i32_ty], [IntrNoMem]>;
----------------
Any reason this is called fltnarrow? As opposed to something closer to the name of the instruction?

================
Comment at: llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp:212
+  template <typename SDValueVector>
+  void MVE_Predicated(SDValueVector &Ops, SDLoc Loc, SDValue PredicateMask);
+  template <typename SDValueVector>
----------------
Maybe call this something like  "AddMVEPredicateToOps"?

================
Comment at: llvm/lib/Target/ARM/ARMInstrMVE.td:283
+// foreach or multiclass definitions.
+class mkpred<ValueType VT> {
+  ValueType p = !cond(!eq(VT.Value, v16i8.Value): v16i1,
----------------
The intel backend has a different way of doing this in the X86InstrAVX512.td file. It defined a set of "X86VectorVTInfo" classes for each type, that can contains extra information about that vector type. That way we could store the predicate type with the other info, and instead of adding the ValueType to the MVE_VADDSUB, for example, we could add this class data. (And potentially add extra stuff like signed_suffix and unsigned_suffix, etc)

I'm not sure if that method is better of worse than this, either from an efficiency of tblgen, or if it's simpler to read.

================
Comment at: llvm/lib/Target/ARM/ARMInstrMVE.td:695
+    def : Pat<(i32 (int_arm_mve_minv_s (i32 rGPR:$prev), (vtype MQPR:$vec))),
+              (i32 (MVE_VMINVs8 (i32 rGPR:$prev), (vtype MQPR:$vec)))>;
+    def : Pat<(i32 (int_arm_mve_minv_u (i32 rGPR:$prev), (vtype MQPR:$vec))),
----------------
Are the instructions correct here? Should they be s8/s16/s32?

================
Comment at: llvm/lib/Target/ARM/ARMInstrMVE.td:1498
+  foreach ptype = [mkpred<vtype>.p] in {
+  def : Pat<(vtype (add (vtype MQPR:$Qm), (vtype MQPR:$Qn))),
+            (vtype (instr (vtype MQPR:$Qm), (vtype MQPR:$Qn)))>;
----------------
I feel like this deserves at least a little indentation. Maybe not for each level of for, but something to make it easier to parse. The last 2 levels are really just defining variables, right? And could be written in-place.

================
Comment at: llvm/lib/Target/ARM/ARMInstrMVE.td:5246
+foreach vi1 = [ v16i1, v8i1, v4i1 ] in {
+  def : Pat<(vi1 (int_arm_mve_pred_i2v (i32 GPR:$pred))),
+            (vi1 (COPY_TO_REGCLASS GPR:$pred, VCCR))>;
----------------
There is an isel node called predicate_cast that does the same thing as this. It may be possible/beneficial to convert this earlier (during lowering, but I'm not sure that would do anything yet). The patterns are further up, near a lot of the vcmp patterns.

================
Comment at: llvm/test/CodeGen/Thumb2/mve-intrinsics/vadc.ll:2
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
+; RUN: llc -mtriple=thumbv8.1m.main -mattr=+mve.fp -o - %s | FileCheck %s
+
----------------
-verify-machineinstrs is probably worth putting on most of these tests.

================
Comment at: llvm/test/CodeGen/Thumb2/mve-intrinsics/vadc.ll:22
+
+declare { <4 x i32>, i32 } @llvm.arm.mve.vadc.v4i32(<4 x i32>, <4 x i32>, i32) #1
+
----------------
The #1 can be removed?

================
Comment at: llvm/test/CodeGen/Thumb2/mve-intrinsics/vcvt.ll:10
+entry:
+  %0 = tail call <8 x half> @llvm.arm.mve.fltnarrow(<8 x half> %a, <4 x float> %b, i32 1)
+  ret <8 x half> %0
----------------
Can you add test for 0 too

================
Comment at: llvm/test/CodeGen/Thumb2/mve-intrinsics/vminvq.ll:4
+
+define arm_aapcs_vfpcc i32 @test_vminvq_u32(i32 %a, <4 x i32> %b) {
+; CHECK-LABEL: test_vminvq_u32:
----------------
Test other VT's too, please.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67158/new/

https://reviews.llvm.org/D67158