[llvm-dev] An update on scalable vectors in LLVM

Wed Nov 11 14:06:23 PST 2020

Hi all,

It's been a while since we've given an update on scalable vector support in LLVM. Over the last 12 months a lot of work has been done to make LLVM cope with scalable vectors. This effort is now starting to bear fruit with LLVM gaining more capabilities, including an intrinsics interface for AArch64 SVE/SVE2, LLVM IR Codegen for scalable vectors, and several loop-vectorization prototypes that show the ability to vectorize with scalable VFs.

Not everyone is following this effort closely, but many people will undoubtedly have seen some of the related changes in the code base, so here is a brief update to give this effort some wider visibility.

This email is structured as follows:
* Regular Sync-up meetings
* Changes made to represent scalable vectors in the C++ codebase
  * ElementCount and VectorType type hierarchy
  * Migrating to TypeSize
  * StackOffset
* What works for scalable vectors today?
* What’s next?
* Concluding
* Acknowledgements

Regular Sync-up meetings:
=========================

Adding scalable vector support to Clang and LLVM is an effort spanning multiple people and organizations. Over the past year, we have been meeting every two weeks to discuss plans, progress, issues, questions and design choices while we try to find suitable ways to make LLVM work with scalable vectors.

The meeting initially started as a platform to discuss support for scalable vectors in the context of Arm’s Scalable Vector Extensions (SVE/SVE2), but the topics now cover scalable vectors in broader contexts. Engineers from SiFive and Barcelona’s Supercomputer Centre who work on the RISC-V vector extension are active participants.

The next meeting is scheduled for tomorrow, Thursday 12 November, at 3pm GMT / 7am PST.

If you want to participate, sign up here:

  https://docs.google.com/document/d/1SODSKta18QHofMaZIZWn1PIkieHQTqOGk3W1zu4fnjQ/edit?usp=sharing

The meeting invite for the next meeting can be found here:

  https://docs.google.com/document/d/1IEsoRMGC-f6Gg585naWSbKx-lqfGf4rkmLrzDt0XBHo/edit?usp=sharing

Changes made to represent scalable vectors in the C++ codebase:
===============================================================

ElementCount and VectorType type hierarchy:
-------------------------------------------

To represent the number of elements in a vector, the class ElementCount is used to describe N lanes of a fixed-width vector or vscale * N lanes of a scalable vector. Initially, we added the `ElementCount VectorType::getElementCount()` interface alongside `unsigned VectorType::getNumElements()`.
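To make the distinction concrete, here is a minimal sketch of the information an ElementCount-like value carries: a known minimum lane count plus a flag saying whether that count is multiplied by vscale. The class `ElementCountSketch` and its methods are hypothetical and only loosely modeled on LLVM's ElementCount, whose real interface differs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified element count: either exactly MinLanes lanes
// (fixed-width) or vscale * MinLanes lanes (scalable).
class ElementCountSketch {
public:
  static ElementCountSketch getFixed(unsigned Lanes) { return {Lanes, false}; }
  static ElementCountSketch getScalable(unsigned MinLanes) { return {MinLanes, true}; }

  unsigned getKnownMinValue() const { return MinLanes; }
  bool isScalable() const { return Scalable; }

  // vscale is an unknown positive integer at compile time, so the actual
  // lane count can only be computed once a value for it is supplied.
  uint64_t getLanesForVScale(unsigned VScale) const {
    assert(VScale > 0 && "vscale is always at least 1");
    return Scalable ? uint64_t(MinLanes) * VScale : MinLanes;
  }

private:
  ElementCountSketch(unsigned M, bool S) : MinLanes(M), Scalable(S) {}
  unsigned MinLanes;
  bool Scalable;
};
```

For example, `<4 x i32>` corresponds to `getFixed(4)` and always has 4 lanes, while `<vscale x 4 x i32>` corresponds to `getScalable(4)` and has 16 lanes on hardware where vscale is 4.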

In March this year, Christopher Tetreault took on the monumental task of distinguishing the two kinds of vector types in the codebase by modifying the VectorType type hierarchy (http://lists.llvm.org/pipermail/llvm-dev/2020-March/139811.html). Chris added the FixedVectorType and ScalableVectorType classes and worked to deprecate `unsigned VectorType::getNumElements()`, because that method is only meaningful for FixedVectorType. In our meetings we discussed a deprecation cycle so that downstream projects can adapt to the new changes. On August 31st that method was marked as deprecated (https://reviews.llvm.org/D78127), and it will be removed after the release branch for LLVM 12 is created.
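The shape of that split can be sketched with plain C++ classes. The names below are hypothetical stand-ins, and `dynamic_cast` is used where LLVM code would use its own `dyn_cast`; the point is only that an exact element count lives on the fixed-width subclass, so code asking for it must first establish that the type is fixed.

```cpp
#include <cassert>

// Hypothetical miniature of the hierarchy: the base only exposes what is
// meaningful for both kinds of vector.
struct VectorTypeSketch {
  explicit VectorTypeSketch(unsigned MinElts) : MinElts(MinElts) {
    assert(MinElts > 0 && "vectors have at least one element");
  }
  virtual ~VectorTypeSketch() = default;
  virtual bool isScalable() const = 0;
  // Valid for both kinds: the known minimum number of elements.
  unsigned getMinNumElements() const { return MinElts; }

protected:
  unsigned MinElts;
};

struct FixedVectorTypeSketch : VectorTypeSketch {
  using VectorTypeSketch::VectorTypeSketch;
  bool isScalable() const override { return false; }
  // An exact element count only exists for fixed-width vectors.
  unsigned getNumElements() const { return MinElts; }
};

struct ScalableVectorTypeSketch : VectorTypeSketch {
  using VectorTypeSketch::VectorTypeSketch;
  bool isScalable() const override { return true; }
};

// Returns the exact element count, or 0 when no compile-time count exists.
unsigned exactNumElementsOrZero(const VectorTypeSketch &VT) {
  if (auto *FVT = dynamic_cast<const FixedVectorTypeSketch *>(&VT))
    return FVT->getNumElements();
  return 0;
}
```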

Migrating to TypeSize:
----------------------

Similar to ElementCount, we need a way to represent sizes of scalable types. For this reason the class TypeSize was added. We have changed `EVT::getSizeInBits()` and `DataLayout::getType*SizeInBits()` to return a `TypeSize` instead of an `unsigned`.
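A TypeSize can be thought of in the same way as ElementCount, but measured in bits: a known minimum size plus a scalable flag. The sketch below is a hypothetical simplification (`TypeSizeSketch` and `vectorSizeInBits` are not LLVM's API) showing why a single `unsigned` is not enough: `<vscale x 4 x i32>` is 128 bits multiplied by vscale.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a size that may be scaled by vscale.
struct TypeSizeSketch {
  uint64_t MinSize; // size in bits when vscale == 1
  bool IsScalable;  // true if the real size is vscale * MinSize

  static TypeSizeSketch getFixed(uint64_t Bits) { return {Bits, false}; }
  static TypeSizeSketch getScalable(uint64_t MinBits) { return {MinBits, true}; }

  uint64_t getKnownMinSize() const { return MinSize; }
  bool isScalable() const { return IsScalable; }

  // The exact size only becomes known once vscale is known (at run time).
  uint64_t getSizeForVScale(uint64_t VScale) const {
    assert(VScale > 0 && "vscale is always at least 1");
    return IsScalable ? MinSize * VScale : MinSize;
  }
};

// Size of a vector of MinLanes elements of EltBits each, e.g.
// <4 x i32> -> fixed 128 bits, <vscale x 4 x i32> -> scalable 128 bits.
TypeSizeSketch vectorSizeInBits(uint64_t EltBits, uint64_t MinLanes, bool Scalable) {
  return Scalable ? TypeSizeSketch::getScalable(EltBits * MinLanes)
                  : TypeSizeSketch::getFixed(EltBits * MinLanes);
}
```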

Because querying sizes is so common in CodeGen passes, and there is essentially only one interface for querying the sizes of the ‘flat’ EVT/MVT type structures, a lot more code needs migrating to handle the new TypeSize. To avoid having to make this change in one go, we added an implicit conversion from TypeSize to `uint64_t`. This means that existing code that assumes a non-scalable size still compiles, e.g.:

  unsigned Size = VT.getSizeInBits();
  // ... existing code doing something with Size ...

https://reviews.llvm.org/D75297 added a warning that the compiler prints when this implicit conversion is applied to a scalable size, as happens in the code above if VT is a scalable vector. Code like this is written with the assumption that Size is not scalable, although in many cases that doesn’t mean the code is actually broken for scalable vectors, since reasoning about the ‘minimum known size’ is often sufficient. See for example https://reviews.llvm.org/D86697, where most of the changes are about using a different type and interface, even though the behaviour doesn't really change.
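A rough sketch of the idea, assuming a hypothetical `TypeSizeSketch` rather than the actual code from D75297: the implicit conversion keeps legacy code compiling and still yields the known minimum size, but reports each place where a scalable size is silently treated as fixed. The static counter exists only to make the behaviour observable in this example.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>

// Hypothetical stand-in: keeps `unsigned Size = ...` code compiling while
// flagging every implicit use of a scalable size as a fixed one.
struct TypeSizeSketch {
  uint64_t MinSize;
  bool IsScalable;

  inline static unsigned NumImplicitScalableConversions = 0; // illustration only

  // Implicit conversion used by legacy code. For a scalable size it still
  // returns the known minimum, but reports that an assumption was made.
  operator uint64_t() const {
    if (IsScalable) {
      ++NumImplicitScalableConversions;
      std::cerr << "warning: implicit assumption that TypeSize is not scalable\n";
    }
    return MinSize;
  }
};

// "Legacy" code written before scalable vectors existed.
unsigned legacyHalfSize(TypeSizeSketch TS) {
  unsigned Size = TS; // implicit conversion, as in `unsigned Size = VT.getSizeInBits();`
  return Size / 2;
}
```

For a scalable size the result (64 for a 128-bit minimum) is still correct as a minimum, which is why such code is not necessarily broken, only unverified.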

To identify common code-paths that lead to such warnings or erroneous behaviour, we have been compiling code using the C/C++ intrinsics for AArch64 SVE/SVE2. Using this approach, we’ve already fixed many code-paths in LLVM to either cope with scalable vectors (e.g. https://reviews.llvm.org/D80139), to bail out early (e.g. https://reviews.llvm.org/D87439), or to use the proper interfaces as in D86697. Here using the proper interfaces means asking for `getFixedSize()` if the code is only supposed to work for fixed-sized types, and asking for `getKnownMinSize()` if it works for both fixed and scalable types (and thus only relative sizes are required for the algorithm to work).
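The distinction between the two queries can be illustrated with a small, hypothetical sketch (`TypeSizeSketch`, `isTwiceAsWide` and `bytesForFixedBuffer` are invented for this example). A relative question only needs the known minimum sizes, because when both sizes are fixed, or both are scalable, the common factor vscale cancels out. An absolute question, such as the byte size of a fixed buffer, should ask for the fixed size and assert that there is one.

```cpp
#include <cassert>
#include <cstdint>

struct TypeSizeSketch {
  uint64_t MinSize; // bits when vscale == 1
  bool IsScalable;

  uint64_t getKnownMinSize() const { return MinSize; }
  // Only meaningful for fixed-size types; asserts otherwise.
  uint64_t getFixedSize() const {
    assert(!IsScalable && "requested fixed size of a scalable type");
    return MinSize;
  }
};

// Relative question: is A exactly twice as wide as B?
bool isTwiceAsWide(TypeSizeSketch A, TypeSizeSketch B) {
  // Mixed fixed/scalable: the answer depends on vscale, so it cannot be
  // proven at compile time; answer conservatively.
  if (A.IsScalable != B.IsScalable)
    return false;
  return A.getKnownMinSize() == 2 * B.getKnownMinSize();
}

// Absolute question: only makes sense for fixed-size types.
uint64_t bytesForFixedBuffer(TypeSizeSketch TS) { return TS.getFixedSize() / 8; }
```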

The plan is to remove the implicit conversion entirely in the future, at which point we can probably state that LLVM supports scalable vectors.

StackOffset:
------------

Scalable `alloca`s and vector spills/fills live on the same stack frame as fixed-sized objects. To support that, a stack offset needs to consist of a fixed part and a scalable part. For that reason we created a new class, StackOffset (https://reviews.llvm.org/D88982).
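Conceptually, such an offset is a pair that resolves to `Fixed + Scalable * vscale` bytes once vscale is known. The struct below is a hypothetical simplification for illustration, not LLVM's StackOffset interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stack offset: Fixed bytes plus Scalable * vscale bytes.
struct StackOffsetSketch {
  int64_t Fixed = 0;
  int64_t Scalable = 0;

  static StackOffsetSketch get(int64_t F, int64_t S) { return {F, S}; }

  // Both parts are added component-wise; they never mix at compile time.
  StackOffsetSketch operator+(StackOffsetSketch O) const {
    return {Fixed + O.Fixed, Scalable + O.Scalable};
  }
  StackOffsetSketch operator-(StackOffsetSketch O) const {
    return {Fixed - O.Fixed, Scalable - O.Scalable};
  }

  // The concrete byte offset is only known once vscale is known.
  int64_t resolve(int64_t VScale) const {
    assert(VScale > 0 && "vscale is always at least 1");
    return Fixed + Scalable * VScale;
  }
};
```

For example, an object 16 fixed bytes below a scalable area of 32 bytes per unit of vscale is at offset (-16, -32), which resolves to -48 bytes when vscale is 1 and -80 bytes when vscale is 2.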

AArch64 uses Stack-IDs to keep fixed-size and scalable stack objects apart, but AArch64FrameLowering ultimately allocates them together in the regular stack frame. StackOffset is returned by, for example, `getFrameIndexReference`, and is used in AArch64FrameLowering/AArch64RegisterInfo to calculate and resolve frame offsets.
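To illustrate how separating objects by stack ID leads to two-part offsets, here is a hypothetical layout sketch. It is not AArch64's actual frame layout and ignores alignment and callee-saved registers: all scalable objects are placed directly below the frame pointer, and fixed-size objects below that scalable area, so every fixed object's offset also has a scalable component.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class StackID { Default, ScalableVector }; // hypothetical IDs

struct FrameObject {
  int64_t Size; // bytes; for ScalableVector objects, bytes per unit of vscale
  StackID ID;
};

struct StackOffsetSketch {
  int64_t Fixed = 0;
  int64_t Scalable = 0;
};

// Offsets are relative to the frame pointer and grow downwards (negative).
std::vector<StackOffsetSketch> assignOffsets(const std::vector<FrameObject> &Objs) {
  // First pass: total size of the scalable area, in bytes per unit of vscale.
  int64_t ScalableAreaSize = 0;
  for (const auto &O : Objs) {
    assert(O.Size > 0 && "frame objects have a non-zero size");
    if (O.ID == StackID::ScalableVector)
      ScalableAreaSize += O.Size;
  }

  // Second pass: each object gets an offset within its own region.
  std::vector<StackOffsetSketch> Offsets;
  int64_t NextScalable = 0, NextFixed = 0;
  for (const auto &O : Objs) {
    if (O.ID == StackID::ScalableVector) {
      NextScalable += O.Size;
      Offsets.push_back({0, -NextScalable});
    } else {
      NextFixed += O.Size;
      // Below the whole scalable area, so both parts are non-zero.
      Offsets.push_back({-NextFixed, -ScalableAreaSize});
    }
  }
  return Offsets;
}
```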

What works for scalable vectors today?
======================================

Today AArch64 SVE/SVE2 is probably the target with the most complete support, although Roger Ferrer recently shared a proposal for adding codegen for the RISC-V V extension (http://lists.llvm.org/pipermail/llvm-dev/2020-October/145850.html).

* For AArch64 SVE/SVE2 we implemented support for Arm’s C/C++ intrinsics (https://developer.arm.com/documentation/100987/). This required implementing the calling convention and spilling/filling of scalable registers, adding many of the intrinsics to LLVM and Clang, and fixing up many code paths that didn’t yet support scalable vectors. Most of that functionality already made it into LLVM 11.

* In anticipation of auto-vectorization, we started adding support for SVE/SVE2 code generation from regular LLVM IR instructions. Most common ISD nodes are now supported, with MGATHER/MSCATTER and strict FP reductions still a work in progress. Along the way we’ve needed to fix up type legalization in SelectionDAG to work on scalable vectors; see for example https://reviews.llvm.org/D85754 and https://reviews.llvm.org/D79587.

* To emit code for fixed-width vectors, we can reuse most of the code generator and patterns that we use for scalable vectors. We can use this capability when targeting a specific CPU, or a range of CPUs when a flag is passed that specifies the minimum/maximum vector width. For an example of how this was implemented, see https://reviews.llvm.org/D85117, where the lowering code generates a predicate for the specified VL and passes it to the instruction.

* We extended the meaning of ISD::EXTRACT_SUBVECTOR and ISD::INSERT_SUBVECTOR in the context of scalable vectors, so that it is possible to insert/extract a fixed-width vector into or from a scalable vector. (https://reviews.llvm.org/D79806)

* We added an intrinsic for vscale (`llvm.vscale`) and a corresponding ISD::VSCALE node (https://reviews.llvm.org/D68203).
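The last three points fit together as follows, shown here as a hypothetical model with plain arrays rather than anything from the LLVM code base. For SVE, vscale is the vector length in bits divided by 128, so a `<vscale x 4 x i32>` register holds 4 * vscale lanes. A fixed-width operation can run on that register under a predicate that enables only its first lanes, and a fixed-width vector can be inserted into a scalable one at an index that is a multiple of the subvector length. The helper names are invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// For SVE, vscale = vector length in bits / 128.
unsigned vscaleForVectorBits(unsigned VLBits) { return VLBits / 128; }

// Lanes in a <vscale x MinLanes x i32> register.
unsigned lanesPerRegister(unsigned VScale, unsigned MinLanes = 4) {
  return VScale * MinLanes;
}

// Predicate enabling only the first FixedLanes lanes, so a fixed-width
// operation can execute on the wider scalable register.
std::vector<bool> predicateForFixedLength(unsigned VScale, unsigned FixedLanes) {
  std::vector<bool> Pred(lanesPerRegister(VScale), false);
  for (unsigned I = 0; I < FixedLanes && I < Pred.size(); ++I)
    Pred[I] = true;
  return Pred;
}

// Insert a fixed-width subvector into a scalable vector at lane Idx.
// If the subvector does not fit at run time, this model returns
// std::nullopt (for the real node the result is then undefined).
std::optional<std::vector<int32_t>> insertSubvector(std::vector<int32_t> Vec,
                                                    const std::vector<int32_t> &Sub,
                                                    unsigned Idx) {
  assert(!Sub.empty() && Idx % Sub.size() == 0 &&
         "index must be a multiple of the subvector length");
  if (Idx + Sub.size() > Vec.size())
    return std::nullopt;
  for (std::size_t I = 0; I < Sub.size(); ++I)
    Vec[Idx + I] = Sub[I];
  return Vec;
}
```

On a 512-bit implementation (vscale = 4) an 8 x i32 operation would use a predicate with the first 8 of 16 lanes active.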

What’s next?
============

The next big thing - and probably the most exciting one - will be auto-vectorization using scalable vectors.

Vineet Kumar already did an excellent job describing the styles of vectorization for scalable vectors in his recent proposal (http://lists.llvm.org/pipermail/llvm-dev/2020-November/146319.html).

To summarise, there are three styles of vectorization:

1. Unpredicated vector body, scalar tail.
2. Predicated vector body, with scalar tail loop folded into the vector body.
3. Unpredicated vector body, predicated vector tail.
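The three styles can be simulated with ordinary scalar C++. In this hypothetical sketch each inner lane loop stands in for one vector instruction over VF lanes, where VF (vscale * 4 for 32-bit elements on SVE) is only known at run time; the `if` on the lane index plays the role of the predicate. All three functions compute `C[i] = A[i] + B[i]`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Style 1: unpredicated vector body, scalar tail.
void addStyle1(const Vec &A, const Vec &B, Vec &C, std::size_t VF) {
  assert(VF > 0 && "VF must be positive");
  std::size_t N = A.size(), I = 0;
  for (; I + VF <= N; I += VF) // full vectors only, no predication
    for (std::size_t L = 0; L < VF; ++L)
      C[I + L] = A[I + L] + B[I + L];
  for (; I < N; ++I) // scalar remainder loop
    C[I] = A[I] + B[I];
}

// Style 2: predicated vector body; the tail is folded into the body by
// disabling lanes whose index is out of range.
void addStyle2(const Vec &A, const Vec &B, Vec &C, std::size_t VF) {
  assert(VF > 0 && "VF must be positive");
  std::size_t N = A.size();
  for (std::size_t I = 0; I < N; I += VF)
    for (std::size_t L = 0; L < VF; ++L)
      if (I + L < N) // predicate: is this lane active?
        C[I + L] = A[I + L] + B[I + L];
}

// Style 3: unpredicated vector body, one predicated vector iteration for the tail.
void addStyle3(const Vec &A, const Vec &B, Vec &C, std::size_t VF) {
  assert(VF > 0 && "VF must be positive");
  std::size_t N = A.size(), I = 0;
  for (; I + VF <= N; I += VF)
    for (std::size_t L = 0; L < VF; ++L)
      C[I + L] = A[I + L] + B[I + L];
  if (I < N) // single predicated tail iteration
    for (std::size_t L = 0; L < VF; ++L)
      if (I + L < N)
        C[I + L] = A[I + L] + B[I + L];
}
```

Style 1 needs no predication at all in the vector body, which is part of what makes it the simplest starting point.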

We (Arm) prefer to start by adding support for style 1 in upstream LLVM, because it is the easiest to support and gives a lot of ‘bang for the buck’, which will help us incrementally add more scalable auto-vectorization capabilities to the vectorizer. A proof of concept of what this style of vectorization requires was recently shared on Phabricator: https://reviews.llvm.org/D90343.

The Barcelona Supercomputer Centre shared a proof of concept for style 2 that uses the Vector Predication intrinsics proposed by Simon Moll (VP: https://reviews.llvm.org/D57504, link to the POC: https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi). Arm previously shared an alternative implementation of style 2 that predates the Vector Predication intrinsics (https://reviews.llvm.org/D87056).

The three vectorization approaches are complementary, and in the end we want to support all of them. For now, they have a lot in common, since they all require much of the same ‘plumbing’ in the vectorizer: propagating ElementCount, considering scalable VFs in VPlan, and modifying the cost model to work with scalable vectors.

Some preparatory work to return an Invalid cost value (needed, for example, when the cost model is asked whether an operation with a scalable VF can be scalarized) was proposed by David Sherwood last week (http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html).
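As an illustration of why an explicit Invalid state helps, here is a hypothetical cost type, not the interface from that proposal. Invalid is sticky under addition and sorts after every valid cost, so a cost model that picks the cheapest option never selects a plan that would require scalarizing with a scalable VF, where the lane count is unknown at compile time.

```cpp
#include <cassert>
#include <cstdint>

class CostSketch {
public:
  static CostSketch valid(int64_t V) { return CostSketch(V, true); }
  static CostSketch invalid() { return CostSketch(0, false); }

  bool isValid() const { return Valid; }
  int64_t value() const {
    assert(Valid && "querying the value of an invalid cost");
    return Value;
  }

  // Invalid is sticky: any sum involving an invalid cost is invalid.
  CostSketch operator+(const CostSketch &O) const {
    if (!Valid || !O.Valid)
      return invalid();
    return valid(Value + O.Value);
  }

  // Every valid cost orders before every invalid cost.
  bool operator<(const CostSketch &O) const {
    if (Valid != O.Valid)
      return Valid;
    return Valid && Value < O.Value;
  }

private:
  CostSketch(int64_t V, bool IsValid) : Value(V), Valid(IsValid) {}
  int64_t Value;
  bool Valid;
};

// Hypothetical scalarization cost: one unit per lane, which can only be
// computed when the lane count is a compile-time constant.
CostSketch scalarizationCost(unsigned MinLanes, bool Scalable) {
  if (Scalable)
    return CostSketch::invalid();
  return CostSketch::valid(MinLanes);
}
```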

Concluding:
===========

Looking back at where we were a year ago, we are now starting to see that auto-vectorization capabilities are not that far away. Hopefully in a couple of months we’ll be able to slowly enable more scalable vectorization and work towards building LNT with scalable vectors enabled. When that becomes sufficiently stable, we can consider gearing up a BuildBot to help guard any new changes we make for scalable vectors.

Acknowledgements:
=================

I want to take a moment to thank everyone who has been involved in this effort so far. A big shout-out to Christopher Tetreault for his work on VectorType, and to Eli Friedman for many of the code reviews! Many thanks also to Paul Walker for sharing an early prototype for fixed-width codegen, and to Cameron McInally for taking on the vector reductions and other ISD nodes for fixed-width codegen for SVE/SVE2! Thanks also to Muhammad Asif Manzoor for help with codegen for FP operations, and to David Greene for looking into new register constraints for SVE. Thanks to John McCall for engaging with us on having a deprecation cycle for the changes to VectorType. And a shout-out to some of the other reviewers, including Sjoerd Meijer, David Green, Richard Sandiford, Oliver Stannard, Vineet Kumar and Renato Golin!

And last but not least, the members of our Arm team (David Sherwood, Cullen Rhodes, Kerry McLaughlin, Francesco Petrogalli, Caroline Concatto, Graham Hunter, Andrzej Warzynski), for their contributions over the past 12+ months!

Apologies if I have forgotten anyone else who has contributed, whether with patches, reviews or other feedback. There has been so much work going on over the past year, but I can guarantee your contributions have been appreciated!

Thank you,

Sander