[all-commits] [llvm/llvm-project] 45a994: [ARM, MVE] Add ACLE intrinsics for the vminv/vmaxv ...

Fri Mar 20 08:44:04 PDT 2020

  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: 45a9945b9ea95bd065d3c4e08d9089a309b24a23
      https://github.com/llvm/llvm-project/commit/45a9945b9ea95bd065d3c4e08d9089a309b24a23
  Author: Simon Tatham <simon.tatham at arm.com>
  Date:   2020-03-20 (Fri, 20 Mar 2020)

  Changed paths:
    M clang/include/clang/Basic/arm_mve.td
    M clang/test/CodeGen/arm-mve-intrinsics/vminvq.c
    M llvm/include/llvm/IR/IntrinsicsARM.td
    M llvm/lib/Target/ARM/ARMISelLowering.cpp
    M llvm/lib/Target/ARM/ARMInstrMVE.td
    M llvm/test/CodeGen/Thumb2/mve-intrinsics/vminvq.ll

  Log Message:
  -----------
  [ARM,MVE] Add ACLE intrinsics for the vminv/vmaxv family.

Summary:
I've implemented these as target-specific IR intrinsics, because
they're not //quite// enough like @llvm.experimental.vector.reduce.min
(which doesn't take the extra scalar parameter). Also this keeps the
predicated and unpredicated versions looking similar, and the
floating-point minnm/maxnm versions fold into the same schema.

We had a couple of min/max reductions already implemented, from the
initial pathfinding exercise in D67158. Those were done by having
separate IR intrinsic names for the signed and unsigned integer
versions; as part of this commit, I've changed them to use a flag
parameter indicating signedness, which is how we ended up deciding
that the rest of the MVE intrinsics family ought to work. So now
hopefully the ewhole lot is consistent.

In the new llc test, the output code from the `v8f16` test functions
looks quite unpleasant, but most of it is PCS lowering (you can't pass
a `half` directly in or out of a function). In other circumstances,
where you do something else with your `half` in the same function, it
doesn't look nearly as nasty.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76490

  Commit: 1adfa4c99169733dedb67b4f7ab03d2fbb196162
      https://github.com/llvm/llvm-project/commit/1adfa4c99169733dedb67b4f7ab03d2fbb196162
  Author: Simon Tatham <simon.tatham at arm.com>
  Date:   2020-03-20 (Fri, 20 Mar 2020)

  Changed paths:
    M clang/include/clang/Basic/arm_mve.td
    A clang/test/CodeGen/arm-mve-intrinsics/vaddv.c
    M llvm/include/llvm/IR/IntrinsicsARM.td
    M llvm/lib/Target/ARM/ARMISelLowering.cpp
    M llvm/lib/Target/ARM/ARMISelLowering.h
    M llvm/lib/Target/ARM/ARMInstrMVE.td
    A llvm/test/CodeGen/Thumb2/mve-intrinsics/vaddv.ll

  Log Message:
  -----------
  [ARM,MVE] Add ACLE intrinsics for the vaddv/vaddlv family.

Summary:
I've implemented them as target-specific IR intrinsics rather than
using `@llvm.experimental.vector.reduce.add`, on the grounds that the
'experimental' intrinsic doesn't currently have much code generation
benefit, and my replacements encapsulate the sign- or zero-extension
so that you don't expose the illegal MVE vector type (`<4 x i64>`) in
IR.

The machine instructions come in two versions: with and without an
input accumulator. My new IR intrinsics, like the 'experimental' one,
don't take an accumulator parameter: we represent that by just adding
on the input value using an ordinary i32 or i64 add. So if you write
the `vaddvaq` C-language intrinsic with an input accumulator of zero,
it can be optimised to VADDV, and conversely, if you write something
like `x += vaddvq(y)` then that can be combined into VADDVA.

Most of this is achieved in isel lowering, by converting these IR
intrinsics into the existing `ARMISD::VADDV` family of custom SDNode
types. For the difficult case (64-bit accumulators), isel lowering
already implements the optimization of folding an addition into a
VADDLV to make a VADDLVA; so once we've made a VADDLV, our job is
already done, except that I had to introduce a parallel set of ARMISD
nodes for the //predicated// forms of VADDLV.

For the simpler VADDV, we handle the predicated form by just leaving
the IR intrinsic alone and matching it in an ordinary dag pattern.

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76491

Compare: https://github.com/llvm/llvm-project/compare/eddede9d5184...1adfa4c99169