[all-commits] [llvm/llvm-project] 1b4529: [ARM] Begin adding IR intrinsics for MVE instructi...

Simon Tatham via All-commits all-commits at lists.llvm.org
Thu Oct 24 08:34:58 PDT 2019


  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: 1b45297e013e1edae5028d844ca7cb591c79b07d
      https://github.com/llvm/llvm-project/commit/1b45297e013e1edae5028d844ca7cb591c79b07d
  Author: Simon Tatham <simon.tatham at arm.com>
  Date:   2019-10-24 (Thu, 24 Oct 2019)

  Changed paths:
    M llvm/include/llvm/IR/IntrinsicsARM.td
    M llvm/lib/Target/ARM/ARMISelLowering.cpp
    M llvm/lib/Target/ARM/ARMInstrMVE.td
    A llvm/test/CodeGen/Thumb2/mve-intrinsics/vaddq.ll
    A llvm/test/CodeGen/Thumb2/mve-intrinsics/vcvt.ll
    A llvm/test/CodeGen/Thumb2/mve-intrinsics/vminvq.ll

  Log Message:
  -----------
  [ARM] Begin adding IR intrinsics for MVE instructions.

This commit, together with the next few, will add a representative
sample of the kind of IR intrinsics that we'll need in order to
implement the user-facing ACLE intrinsics for MVE. Supporting all of
them will take more work; the intention of this initial series of
commits is to implement an intrinsic or two from lots of different
categories, as examples and proofs of concept.

This initial commit introduces a small number of IR intrinsics for
instructions simple enough that they can use Tablegen ISel patterns:
the predicated versions of the VADD and VSUB instructions (both
integer and FP), VMIN and VMAX, and the float->half VCVT instruction
(predicated and unpredicated).

When using VPT-predicated instructions in automatic code generation,
it will be convenient to specify the predicate value as a vector of
the appropriate number of i1. To make it easy to specify all sizes of
an instruction in one go and give each one the matching predicate
vector type, I've added a system of Tablegen informational records
describing MVE's vector types: each one gives the underlying LLVM IR
ValueType (which may not be the same if the MVE vector is of
explicitly signed or unsigned integers) and an appropriate vNi1 to use
as the predicate vector.

(Also, those info records include the usual encoding for the types, so
that as we add associations between each instruction encoding and one
of the new `MVEVectorVTInfo` records, we can remove some of the
existing template parameters and replace them with references to the
vector type info's fields.)

The user-facing ACLE intrinsics will receive a predicate mask as a
16-bit integer, so I've also provided a pair of intrinsics i2v and
v2i, to convert between an integer and a vector of i1 by just changing
the register class.

Reviewers: dmgreen, miyuki, ostannard

Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67158


  Commit: ceeff95ca48f0c1460c8feb4eebced9a5cd12b58
      https://github.com/llvm/llvm-project/commit/ceeff95ca48f0c1460c8feb4eebced9a5cd12b58
  Author: Simon Tatham <simon.tatham at arm.com>
  Date:   2019-10-24 (Thu, 24 Oct 2019)

  Changed paths:
    M llvm/include/llvm/IR/IntrinsicsARM.td
    M llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
    A llvm/test/CodeGen/Thumb2/mve-intrinsics/scalar-shifts.ll
    A llvm/test/CodeGen/Thumb2/mve-intrinsics/vadc.ll
    A llvm/test/CodeGen/Thumb2/mve-intrinsics/vldr.ll

  Log Message:
  -----------
  [ARM] Add some sample IR MVE intrinsics with C++ isel.

This adds some initial example IR intrinsics for MVE instructions that
deliver multiple output values, and hence, have to be instruction-
selected by custom C++ code instead of Tablegen patterns.

I've added the writeback gather load instructions (taking a vector of
base addresses and a single common offset, returning a vector of
loaded values and an updated vector of base addresses); one example
from the long shift family (taking and returning a 64-bit value in two
GPRs); and the VADC instruction (which propagates a carry bit from
each vector-lane addition to the next, taking an input carry flag in
FPSCR and outputting the final one in FPSCR as well).

To support the VPT-predicated forms of these instructions, I've
written some helper functions to add the cluster of MVE predicate
operands to the end of a MachineInstr. `AddMVEPredicateToOps` is used
when the instruction actually is predicated (so it takes a predicate
mask argument), and `AddEmptyMVEPredicateToOps` is for when the
instruction is unpredicated (so it fills in $noreg for the mask). Each
one comes in a form suitable for `vpred_n`, and one for `vpred_r`
which takes the extra 'inactive' parameter.

For VADC, the representation of the carry flag in the IR intrinsic is
a word intended to be moved directly to and from `FPSCR_nzcvqc`, i.e.
with the carry flag in bit 29 of the word. (The user-facing ACLE
intrinsic will want it to be in bit 0, but I'll do that on the clang
side.)

Reviewers: dmgreen, miyuki, ostannard

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68699


  Commit: e0ef4ebe2f6ac3523ee25081b36c114c0f0ea695
      https://github.com/llvm/llvm-project/commit/e0ef4ebe2f6ac3523ee25081b36c114c0f0ea695
  Author: Simon Tatham <simon.tatham at arm.com>
  Date:   2019-10-24 (Thu, 24 Oct 2019)

  Changed paths:
    M llvm/include/llvm/IR/IntrinsicsARM.td
    M llvm/lib/Target/ARM/ARMISelDAGToDAG.cpp
    M llvm/lib/Target/ARM/ARMInstrMVE.td
    A llvm/test/CodeGen/Thumb2/mve-intrinsics/vld24.ll

  Log Message:
  -----------
  [ARM] Add IR intrinsics for MVE VLD[24] and VST[24].

The VST2 and VST4 instructions take two or four vector registers as
input, and store part of each register to memory in an interleaved
pattern. They come in variants indicating which part of each register
they store (VST20 and VST21; VST40 to VST43 inclusive); the intention
is that issuing each of those variants in turn has the combined effect
of loading or storing the whole set of registers to a memory block of
equal size. The corresponding VLD2 and VLD4 instructions load from
memory in the same interleaved format: each one overwrites only part
of its output register set, and again, the idea is that if you use
VLD4{0,1,2,3} or VLD2{0,1} together, you end up having written to the
whole of each register.

I've implemented the stores and loads quite differently. The loads
were easiest to implement as a single intrinsic that expands to all
four VLD4x instructions or both VLD2x, delivering four complete output
registers. (Implementing each individual load as a separate
instruction taking four input registers to partially overwrite is
possible in theory, but pointless, and when I tried it, I found it
would need extra work to get the register allocation not to be
horrible.) Since that intrinsic delivers multiple outputs, it has to
be instruction-selected in custom C++.

But the store instructions are easier to model individually, because
they don't overwrite any register at all and you can write a DAG Isel
pattern in Tablegen for each one.

Hence, my new intrinsic `int_arm_mve_vld4q` expands to four load
instructions, delivers four full output vectors, and is handled by C++
code, whereas `int_arm_mve_vst4q` expands to just one store
instruction, takes four input vectors and a constant indicating which
lanes to store, and is handled entirely in Tablegen. (And similarly
for vld2q/vst2q.) This is asymmetric, but it was the easiest way to do
each one.

Reviewers: dmgreen, miyuki, ostannard

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68700


  Commit: 7c11da0cfd3396150c13657a2217f2044f314734
      https://github.com/llvm/llvm-project/commit/7c11da0cfd3396150c13657a2217f2044f314734
  Author: Simon Tatham <simon.tatham at arm.com>
  Date:   2019-10-24 (Thu, 24 Oct 2019)

  Changed paths:
    M clang/include/clang/Basic/Attr.td
    M clang/include/clang/Basic/AttrDocs.td
    M clang/include/clang/Basic/DiagnosticSemaKinds.td
    M clang/lib/AST/Decl.cpp
    M clang/lib/Sema/SemaDeclAttr.cpp
    M clang/test/Misc/pragma-attribute-supported-attributes-list.test
    A clang/test/Sema/arm-mve-alias-attribute.c

  Log Message:
  -----------
  [clang] New __attribute__((__clang_arm_mve_alias)).

This allows you to declare a function with a name of your choice (say
`foo`), but have clang treat it as if it were a builtin function (say
`__builtin_foo`), by writing

  static __inline__ __attribute__((__clang_arm_mve_alias(__builtin_foo)))
  int foo(args);

I'm intending to use this for the ACLE intrinsics for MVE, which have
to be polymorphic on their argument types and also need to be
implemented by builtins. To avoid having to implement the polymorphism
with several layers of nested _Generic and make error reporting
hideous, I want to make all the user-facing intrinsics correspond
directly to clang builtins, so that after clang resolves
__attribute__((overloadable)) polymorphism it's already holding the
right BuiltinID for the intrinsic it selected.

However, this commit itself just introduces the new attribute, and
doesn't use it for anything.

To avoid unanticipated side effects if this attribute is used to make
aliases to other builtins, there's a restriction mechanism: only
(BuiltinID, alias) pairs that are approved by the function
ArmMveAliasValid() will be permitted. At present, that function
doesn't permit anything, because the Tablegen that will generate its
list of valid pairs isn't yet implemented. So the only test of this
facility is one that checks that an unapproved builtin _can't_ be
aliased.

Reviewers: dmgreen, miyuki, ostannard

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D67159


  Commit: 08074cc96557dcb7ec91d7cd84c412414fa9a516
      https://github.com/llvm/llvm-project/commit/08074cc96557dcb7ec91d7cd84c412414fa9a516
  Author: Simon Tatham <simon.tatham at arm.com>
  Date:   2019-10-24 (Thu, 24 Oct 2019)

  Changed paths:
    M clang/include/clang/Basic/BuiltinsARM.def
    M clang/include/clang/Basic/CMakeLists.txt
    M clang/include/clang/Basic/DiagnosticSemaKinds.td
    A clang/include/clang/Basic/arm_mve.td
    A clang/include/clang/Basic/arm_mve_defs.td
    M clang/include/clang/Sema/Sema.h
    M clang/lib/CodeGen/CGBuiltin.cpp
    M clang/lib/CodeGen/CodeGenFunction.h
    M clang/lib/Headers/CMakeLists.txt
    M clang/lib/Sema/SemaChecking.cpp
    M clang/lib/Sema/SemaDeclAttr.cpp
    M clang/lib/Sema/SemaType.cpp
    A clang/test/CodeGen/arm-mve-intrinsics/scalar-shifts.c
    A clang/test/CodeGen/arm-mve-intrinsics/vadc.c
    A clang/test/CodeGen/arm-mve-intrinsics/vaddq.c
    A clang/test/CodeGen/arm-mve-intrinsics/vcvt.c
    A clang/test/CodeGen/arm-mve-intrinsics/vld24.c
    A clang/test/CodeGen/arm-mve-intrinsics/vldr.c
    A clang/test/CodeGen/arm-mve-intrinsics/vminvq.c
    M clang/utils/TableGen/CMakeLists.txt
    A clang/utils/TableGen/MveEmitter.cpp
    M clang/utils/TableGen/TableGen.cpp
    M clang/utils/TableGen/TableGenBackends.h

  Log Message:
  -----------
  [clang,ARM] Initial ACLE intrinsics for MVE.

This commit sets up the infrastructure for auto-generating <arm_mve.h>
and doing clang-side code generation for the builtins it relies on,
and demonstrates that it works by implementing a representative sample
of the ACLE intrinsics, more or less matching the ones introduced in
LLVM IR by D67158,D68699,D68700.

Like NEON, that header file will provide a set of vector types like
uint16x8_t and C functions with names like vaddq_u32(). Unlike NEON,
the ACLE spec for <arm_mve.h> includes a polymorphism system, so that
you can write plain vaddq() and disambiguate by the vector types you
pass to it.

Unlike the corresponding NEON code, I've arranged to make every user-
facing ACLE intrinsic into a clang builtin, and implement all the code
generation inside clang. So <arm_mve.h> itself contains nothing but
typedefs and function declarations, with the latter all using the new
`__attribute__((__clang_builtin))` system to arrange that the user-
facing function names correspond to the right internal BuiltinIDs.

So the new MveEmitter tablegen system specifies the full sequence of
IRBuilder operations that each user-facing ACLE intrinsic should
translate into. Where possible, the ACLE intrinsics map to standard IR
operations such as vector-typed `add` and `fadd`; where no standard
representation exists, I call down to the sample IR intrinsics
introduced in an earlier commit.

Doing it like this means that you get the polymorphism for free just
by using __attribute__((overloadable)): the clang overload resolution
decides which function declaration is the relevant one, and _then_ its
BuiltinID is looked up, so by the time we're doing code generation,
that's all been resolved by the standard system. It also means that
you get really nice error messages if the user passes the wrong
combination of types: clang will show the declarations from the header
file and explain why each one doesn't match.

(The obvious alternative approach would be to have wrapper functions
in <arm_mve.h> which pass their arguments to the underlying builtins.
But that doesn't work in the case where one of the arguments has to be
a constant integer: the wrapper function can't pass the constantness
through. So you'd have to do that case using a macro instead, and then
use C11 `_Generic` to handle the polymorphism. Then you have to add
horrible workarounds because `_Generic` requires even the untaken
branches to type-check successfully, and //then// if the user gets the
types wrong, the error message is totally unreadable!)

Reviewers: dmgreen, miyuki, ostannard

Subscribers: mgorny, javed.absar, kristof.beyls, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D67161


  Commit: e5f485c3bd9c719b4d78524a5b18c1d2524b62bf
      https://github.com/llvm/llvm-project/commit/e5f485c3bd9c719b4d78524a5b18c1d2524b62bf
  Author: Simon Tatham <simon.tatham at arm.com>
  Date:   2019-10-24 (Thu, 24 Oct 2019)

  Changed paths:
    M llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
    A llvm/test/CodeGen/Thumb2/mve-intrinsics/vadc-multiple.ll

  Log Message:
  -----------
  [InstCombine] Known-bits optimization for ARM MVE VADC.

The MVE VADC instruction reads and writes the carry bit at bit 29 of
the FPSCR register. The corresponding ACLE intrinsic is specified to
work with an integer in which the carry bit is stored at bit 0. So if
a user writes a code sequence in C that passes the carry from one VADC
to the next, like this,

    s0 = vadcq_u32(a0, b0, &carry);
    s1 = vadcq_u32(a1, b1, &carry);

then clang will generate IR for each of those operations that shifts
the carry bit up into bit 29 before the VADC, and after it, shifts it
back down and masks off all but the low bit. But in this situation
what you really wanted was two consecutive VADC instructions, so that
the second one directly reads the value left in FPSCR by the first,
without wasting several instructions on pointlessly clearing the other
flag bits in between.

This commit explains to InstCombine that the other bits of the flags
operand don't matter, and adds a test that demonstrates that all the
code between the two VADC instructions can be optimized away as a
result.

Reviewers: dmgreen, miyuki, ostannard

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67162


Compare: https://github.com/llvm/llvm-project/compare/b2a65f0d70f5...e5f485c3bd9c


More information about the All-commits mailing list