[llvm-dev] Scalable Vector Types in IR - Next Steps?

Graham Hunter via llvm-dev llvm-dev at lists.llvm.org
Tue Mar 12 04:35:47 PDT 2019


Hi all,

Thanks Renato for the prod.

We (Arm) have had more off-line discussions with some members of the
community, who have expressed reservations about adding scalable vectors
as a first-class type. They have proposed an alternative that would enable
support for C-level intrinsics and autovectorization for SVE.

While Arm's preference is still to support VLA autovec in LLVM (and not just
for SVE; we'll continue the discussion around the RFC), we are evaluating the
details of this alternative -- SVE-capable hardware will begin shipping within
the next couple of years, so we would like to support at least some
autovectorization as well as the C intrinsics by the time that happens.

This alternative proposal has two parts:

  * For the SVE ACLE (C-language extension intrinsics), use an opaque type
    (similar to x86_mmx, but unsized) and just pass intrinsics straight
    through to the backend. This would give users the ability to write
    vector length agnostic (VLA) code for SVE without resorting to assembly.
  * For SVE autovectorization, use fixed length autovec -- either for a
    user-specified length, or multiversioned to different fixed lengths.

I've spent some time over the last month prototyping an opaque-type SVE C
intrinsic implementation to see what it would look like; here are my notes so far:

  * I initially tried to use a single unsized opaque type.

  * I ran into problems with using just a single type: predicates use
    different registers, and I couldn't find a clean way of reassigning all
    the register classes.
    - As a result, I added a second opaque type to represent predicates.
    - This could be avoided if we added subtype info to the opaque type
      (essentially the minimum element count and the element type); this
      would mean either representing the count and element type in a
      serialized IR form, or having the IR reader reconstruct the types
      from the intrinsic names.
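For the second option, the reconstruction could piggy-back on intrinsic
name mangling, along these (hypothetical) lines:

```llvm
; Hypothetical: minimum element count and element type encoded in the
; intrinsic name, so the IR reader could rebuild subtype information
; ('nxv4f32' = a scalable vector of at least 4 x float).
declare %sve.vec @llvm.aarch64.sve.add.nxv4f32(%sve.pred, %sve.vec, %sve.vec)
```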

  * I ran into a problem with the opaque types being unsized -- the C
    intrinsic variables are declared as locals, and clang emits
    alloca/load/store IR instructions for them.
    - Could special case alloca/load/store for these types, but that's very
      intrusive and liable to break in future
    - Could introduce a special 'alloca intrinsic', but would require quite
      a bit of code in clang to diverge as well as a custom mem2reg-like
      pass just for these types
    - I ended up making them sized, but with a size of 0. I don't know if
      there's a problem I'll run into later on by doing this.
    - While 'load' and 'store' IR instructions are fine for spill/fill
      memory operations on the stack, we need to use intrinsics for
      everything else, since those need to know the size of individual
      elements -- there may not be many big-endian systems in operation,
      but we still need to support them.
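Concretely, the kind of IR clang emits for a local ACLE vector variable
looks roughly like this (hypothetical type and intrinsic names again);
making the types sized, even at 0 bytes, is what keeps the
alloca/load/store legal:

```llvm
%sve.vec  = type opaque
%sve.pred = type opaque

declare %sve.vec @llvm.aarch64.sve.ld1(%sve.pred, float*)

define void @local_var(%sve.vec %x, %sve.pred %pg, float* %p) {
  ; Spill/fill of a local: fine as plain alloca/load/store once the
  ; opaque type reports a size (here, 0).
  %slot = alloca %sve.vec
  store %sve.vec %x, %sve.vec* %slot
  %y = load %sve.vec, %sve.vec* %slot

  ; Element-aware memory access must stay an intrinsic: a plain IR load
  ; of an opaque type can't describe per-element byte swapping on
  ; big-endian targets.
  %z = call %sve.vec @llvm.aarch64.sve.ld1(%sve.pred %pg, float* %p)
  ret void
}
```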

  * I reused the same (clang-level) builtin type mechanism that OpenCL uses
    for the SVE C-level types, and just codegen those to the two LLVM types.

I now have a minimal end-to-end implementation for a small set of SVE C
intrinsics. I have some additional observations based on our downstream
intrinsic implementation:

  * Our initial downstream implementation attempted to do everything in
    intrinsics, so would be similar to the opaque type version. However,
    we found that we missed several optimizations in the process. Part of
    this is due to the intrinsics being higher-level than the instructions
    -- things like addressing modes are not represented in the intrinsics,
    and with a pure intrinsic approach we miss things like LSR
    optimizations.
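For example (hypothetical intrinsic, same caveats as above), a load written
as normal IR exposes its addressing structure, while the pure intrinsic
form only ever sees a flat pointer, so addressing-mode selection and LSR's
induction-variable rewrites have nothing to work with:

```llvm
; Normal IR: the GEP exposes base + scaled index, which LSR and ISel
; addressing-mode selection understand.
%addr = getelementptr float, float* %base, i64 %i
%v    = load float, float* %addr

; Pure intrinsic form: the same access via an opaque intrinsic taking a
; pre-computed pointer; the addressing structure is hidden from LSR.
%flat = getelementptr float, float* %base, i64 %i
%w    = call %sve.vec @llvm.aarch64.sve.ld1(%sve.pred %pg, float* %flat)
```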

  * We also expected the need for custom optimization extensions (e.g.
    instcombine on SVE intrinsics) to be reduced, since someone using the
    intrinsics is already going to the trouble of hand-optimizing their
    code. We hadn't appreciated, however, how common it would be to use
    C++ templates with constant parameters and other forms of code
    generation. As a result, we now have user requests that operations
    like 'svmul(X, 1.0)' be recognized and folded away, and we are trying
    to find better representations, including lowering to normal IR
    operations in some cases.

  * Some operations can't be represented cleanly in current IR, but should
    work well with Simon Moll's vector predication proposal.

Any feedback? I've posted my (very rough) initial work to phabricator:

clang: https://reviews.llvm.org/D59245
llvm: https://reviews.llvm.org/D59246

-Graham

> On 8 Mar 2019, at 16:08, Renato Golin <rengolin at gmail.com> wrote:
>
> Hi folks,
>
> We seem to be converging on how the representation of scalable vectors
> will be implemented in IR, and we also have support for such vectors
> in the AArch64 back-end. We're also fresh out of the release process
> and have a good number of months to hash out potential problems until
> next release. What are the next steps to get this merged into trunk?
>
> Given this is a major change to IR, we need more core developers
> reviews before approving. The current quasi-consensus means now it's
> the time for you to look closer. :)
>
> This change per se shouldn't change how any of the passes or lowering
> behave, but it will introduce the ability to break things in the
> future. Unlike the "new pass manager", we can't create a "new IR", so
> it needs to be a change that everyone is conscious and willing to take
> on the project to stabilize it until the next release.
>
> Here are some of the reviews on the matter, mostly agreed upon by the
> current reviewers:
> https://reviews.llvm.org/D32530
> https://reviews.llvm.org/D53137
> https://reviews.llvm.org/D47770
>
> And the corresponding RFC threads:
> http://lists.llvm.org/pipermail/llvm-dev/2016-November/106819.html
> http://lists.llvm.org/pipermail/llvm-dev/2017-March/110772.html
> http://lists.llvm.org/pipermail/llvm-dev/2017-June/113587.html
> http://lists.llvm.org/pipermail/llvm-dev/2018-April/122517.html
> http://lists.llvm.org/pipermail/llvm-dev/2018-June/123780.html
>
> There is also an ongoing discussion about vector predication, which is
> related but not depending on the scalable representation in IR:
> https://reviews.llvm.org/D57504
>
> And the corresponding RFC thread:
> http://lists.llvm.org/pipermail/llvm-dev/2019-January/129791.html
>
> cheers,
> --renato

