[PATCH] D74620: [ARM,MVE] Add vector-scalar intrinsics

Mon Feb 17 00:22:23 PST 2020

dmgreen added a comment.

I like how this uses a splat for all the register arguments. That sounds like a good idea.
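As a sanity check of the idea, here is a minimal sketch in portable C. It assumes the vector-scalar form `vaddq_n_s32(a, s)` behaves lane-for-lane like `vaddq_s32(a, vdupq_n_s32(s))`. The type `v4i32` and the helpers `splat_i32`, `vadd_i32` and `vadd_n_i32` are hypothetical stand-ins so the equivalence can run on any host. They are not the ACLE intrinsics or the patch's code.

```c
#include <stdint.h>

/* Hypothetical portable model of a 128-bit MVE vector with four 32-bit lanes. */
typedef struct { int32_t lane[4]; } v4i32;

/* Broadcast one scalar into every lane (the role vdupq_n_s32 plays on MVE). */
v4i32 splat_i32(int32_t s) {
    v4i32 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = s;
    return r;
}

/* Lane-wise add. It is computed in uint32_t so that overflow wraps modulo
   2^32, as the hardware add does, instead of hitting signed-overflow UB. */
v4i32 vadd_i32(v4i32 a, v4i32 b) {
    v4i32 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = (int32_t)((uint32_t)a.lane[i] + (uint32_t)b.lane[i]);
    return r;
}

/* Vector-scalar add, expressed as a vector-vector add of a splat.
   This is the shape the backend patterns would need to match. */
v4i32 vadd_n_i32(v4i32 a, int32_t s) {
    return vadd_i32(a, splat_i32(s));
}
```

For example, `vadd_n_i32((v4i32){{1, 2, 3, 4}}, 10)` yields lanes `11, 12, 13, 14`. Modelling the scalar form as "splat + vector op" is the reason one set of patterns can serve every register-argument variant.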

The ones that worry me are the floating-point instructions. Last time we tried those, they actually caused performance regressions because of extra sp->gpr movs left in the loop.

If this is just the backend patterns, though, and not also the sinking of splats into loops, then I think it should be OK. On its own I don't think it will usually cause problems, and some quick tests seem to confirm that.

================
Comment at: clang/test/CodeGen/arm-mve-intrinsics/vaddq.c:2
 // NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
-// RUN: %clang_cc1 -triple thumbv8.1m.main-arm-none-eabi -target-feature +mve.fp -mfloat-abi hard -fallow-half-arguments-and-returns -O0 -disable-O0-optnone -S -emit-llvm -o - %s | opt -S -mem2reg | FileCheck %s
-// RUN: %clang_cc1 -triple thumbv8.1m.main-arm-none-eabi -target-feature +mve.fp -mfloat-abi hard -fallow-half-arguments-and-returns -O0 -disable-O0-optnone -DPOLYMORPHIC -S -emit-llvm -o - %s | opt -S -mem2reg | FileCheck %s
+// RUN: %clang_cc1 -triple thumbv8.1m.main-arm-none-eabi -target-feature +mve.fp -mfloat-abi hard -fallow-half-arguments-and-returns -O0 -disable-O0-optnone -S -emit-llvm -o - %s | opt -S -O1 | FileCheck %s
+// RUN: %clang_cc1 -triple thumbv8.1m.main-arm-none-eabi -target-feature +mve.fp -mfloat-abi hard -fallow-half-arguments-and-returns -O0 -disable-O0-optnone -DPOLYMORPHIC -S -emit-llvm -o - %s | opt -S -O1 | FileCheck %s
----------------
Why is this running the entire -O1 pass pipeline? These tests deliberately use a limited subset of passes so that they don't need adjusting with every midend LLVM change, while also not being littered with clang's verbose IR output.

I'm guessing the half arguments are being a pain again. Is this something to do with halfs?

================
Comment at: llvm/lib/Target/ARM/ARMInstrMVE.td:4496
+                            UnpredSign)),
+              (VTI.Vec (inst (VTI.Vec MQPR:$Qm), (i32 GPR:$val)))>;
+    // Predicated version
----------------
These GPRs can use the same register class as the instruction: rGPR in this case, I think?

================
Comment at: llvm/lib/Target/ARM/ARMInstrMVE.td:4566
+                          0b0, VTI.Unsigned>;
+  defvar unpred_op = !if(VTI.Unsigned, unpred_op_u, unpred_op_s);
+  defm : MVE_vec_scalar_int_pat_m<!cast<Instruction>(NAME), VTI,
----------------
I find all these ifs at different levels a little hard to follow. It looks OK, but is it possible to rearrange things so that one isn't needed here?

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D74620