[llvm-dev] [RFC] NEC SX-Aurora VE backend

Thu Apr 4 15:02:55 PDT 2019

Hello,

We would like to propose the integration of a new backend into LLVM: NEC
SX-Aurora TSUBASA Vector Engine (VE). We hope to get some feedback here
and at EuroLLVM on the right path and procedure for merging it.

The SX-Aurora VE is a vector CPU on a PCIe accelerator card. It has 48GB
of memory in six HBM2 stacks, accessible with 1.2TB/s bandwidth, and 8
cores that share a last level cache (LLC). Each core has a conventional
scalar unit with 64 scalar registers and two levels of caches, as well
as a vector unit with 64 long vector registers (256 x 64 bits each), 16
vector mask registers and a vector length register. The VE can run HPC
and AI workloads with high efficiency provided that the code is well
vectorized.
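
To illustrate what the vector length register means for code shape, here
is a minimal sketch in plain C of a strip-mined loop that processes at
most 256 elements per step, matching the 256-element vector registers
described above. The names VE_MVL, next_vl and daxpy_stripmined are
hypothetical and chosen for this example only; the inner loop stands in
for what would be a few vector instructions on real hardware and is not
output of the proposed backend.

```c
#include <assert.h>
#include <stddef.h>

/* Maximum vector length: 256 elements of 64 bits per vector register,
   as stated in the hardware description above. */
#define VE_MVL 256

/* Hypothetical helper: the number of elements the next vector step
   would process, i.e. the value a compiler would place in the vector
   length register. */
static size_t next_vl(size_t remaining) {
    return remaining < VE_MVL ? remaining : VE_MVL;
}

/* Computes y[i] += a * x[i] in chunks of at most VE_MVL elements.
   Returns the number of chunks, i.e. how often the vector length
   would have been set. */
size_t daxpy_stripmined(size_t n, double a, const double *x, double *y) {
    size_t chunks = 0;
    for (size_t i = 0; i < n; ) {
        size_t vl = next_vl(n - i);
        assert(vl > 0 && vl <= VE_MVL);
        /* Scalar stand-in for one vector multiply-add over vl elements. */
        for (size_t j = 0; j < vl; ++j)
            y[i + j] += a * x[i + j];
        i += vl;
        ++chunks;
    }
    return chunks;
}
```

The point of the sketch is that the last, partial chunk needs no
separate scalar remainder loop: it is handled by simply using a shorter
vector length.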

The VE was released officially in March 2018 and is currently available
integrated into servers containing 1, 2, 4 or 8 VEs
(https://www.nec.com/en/global/solutions/hpc/sx/). An SDK is provided,
containing optimized mathematical libraries, proprietary C, C++ and
Fortran compilers as well as a proprietary MPI implementation capable of
communicating over PCIe and, via PeerDirect, over InfiniBand. The main
programming model involves running code natively on the VEs,
parallelized with OpenMP or MPI, with system call execution offloaded to
the host. Hybrid programming is possible by offloading kernels from the
main program running on the host CPU to the VEs (accelerator model), or
by offloading execution from VE programs to the host (reverse
offloading).

In addition to the proprietary compilers, an effort has been started to
create a VE backend for LLVM (https://github.com/as-aurora-dev/llvm).
Initially aimed at compiler and vectorization research, it has reached a
level of maturity at which we would like to merge it into the upstream
LLVM repositories. The state of this development and a sketch of the
development plan are summarized below:

1. Assembler, VE CodeGen infrastructure (registers, instruction
encodings): We plan to use the assembler and linker from the
freely-available binutils-ve package. With them LLVM becomes the first
freely-available compiler for VE, and it enables users to use VEs
without any proprietary software. As a next step, we will implement the
assembler and linker (lld) support in LLVM, as other backends do.

2. Scalar code backend that produces correct code (capable of handling
vector intrinsics): A complete scalar code backend for C/C++ is already
implemented. It passes all regression tests (check-llvm and check-clang)
and almost all of the test-suite. We are now working to pass the full
test-suite. In this backend we provide vector intrinsic functions that
cover almost all of the SX-Aurora vector ISA, including masked vector
instructions. These intrinsics help experienced programmers to write
highly optimized code.

3. Merging LLVM-VP developments (https://reviews.llvm.org/D57504) for
proper vectorization: We plan to use LLVM-VP as the core infrastructure
for vectorization. (Saarland University is working on the LLVM-VP
backend for VE).

4. Vectorization improvements: We welcome everybody who is interested in
explicit vector length ISAs and would like to develop and work on an
off-the-shelf machine. Detailed information about the ISA and
architecture is available at https://www.hpc.nec/documents/. We would
like to support researchers who will develop new vectorization
technology for true vector processors.
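
Masked vector instructions (item 2) and an explicit vector length
(item 4) combine naturally in one operation. As a rough sketch of those
semantics only, the following scalar C model computes an add in which a
lane is active when its index is below the explicit vector length and
its mask bit is set. The function name vp_add_model and the one byte
per lane mask layout are assumptions made for this example; they are
neither the intrinsic signatures proposed in D57504 nor the VE
intrinsics of our backend. Leaving inactive lanes untouched is likewise
a modelling choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Maximum vector length: 256 elements per vector register. */
#define VE_MVL 256

/* Scalar model of a predicated vector add (hypothetical helper).
   Lane i is computed only if i < evl and mask[i] is nonzero.
   Inactive lanes of dst are left untouched (a choice of this model).
   Returns the number of active lanes. */
size_t vp_add_model(int64_t *dst, const int64_t *a, const int64_t *b,
                    const uint8_t *mask, size_t evl) {
    assert(evl <= VE_MVL);
    size_t active = 0;
    for (size_t i = 0; i < evl; ++i) {
        if (mask[i]) {
            dst[i] = a[i] + b[i];
            ++active;
        }
    }
    return active;
}
```

In this model the explicit vector length bounds how many lanes are
considered at all, while the mask selects individual lanes within that
range, so loop remainders and if-converted code can share one
operation.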

Finally: In order to facilitate the integration into the LLVM tree, we
intend to give the LLVM community access to an Aurora build server for
nightly builds and regression testing.

Erich Focht (NEC), Kazuhisa Ishizaka (NEC), Kazushi Marukawa (NEC),
Simon Moll (CDL Saarland University)