[llvm-dev] RFC: Adding Support For Vectorcall Calling Convention

Ben Simhon, Oren via llvm-dev llvm-dev at lists.llvm.org
Wed Nov 30 07:20:48 PST 2016


Adding Support For Vectorcall Calling Convention
=====================================================

Vectorcall Calling Convention for x64
----------------------------------------------------
The __vectorcall calling convention specifies that arguments to
functions are to be passed in registers, when possible. __vectorcall
uses more registers for arguments than __fastcall or the default x64
calling convention use. The __vectorcall calling convention is only
supported in native code on x86 and x64 processors that include
Streaming SIMD Extensions 2 (SSE2) and above.

The Definition of HVA Types
--------------------------------------
A Homogeneous Vector Aggregate (HVA) type is a composite type of up
to four data members that have identical vector types. An HVA type has
the same alignment requirement as the vector type of its members.

For example:
    typedef struct {
    __m256 x;
    __m256 y;
    __m256 z;
    } hva3; // HVA type with 3 __m256 elements

Vectorcall Extension
----------------------------
Vectorcall extends the standard x64 calling convention while adding
support for HVA and vector types.

There are four main differences:
-  Floating-point types are considered vector types just like __m128,
       __m256 and __m512. The first 6 vector typed arguments are
       saved in physical registers XMM0/YMM0/ZMM0 until XMM5/YMM5/ZMM5.
-  After vector types and integer types are allocated, HVA types are
       allocated, in ascending order, to unused vector registers
       XMM0/YMM0/ZMM0 to XMM5/YMM5/ZMM5.
-  Just like in the default x65 CC, Shadow space is allocated for
       vector/HVA types. The size is fixed to 8 bytes per argument.
-  HVA types are returned in XMM0/YMM0/ZMM0 to XMM3/YMM3/ZMM3 while
       vector types are returned in XMM0/YMM0/ZMM0 and integers in RAX

For more information or examples please see also:
https://msdn.microsoft.com/en-us/library/dn375768.aspx

Observations
------------------
-  LLVM IR must preserve the original position of the arguments.
-  Since HVA structures are allocated in lower priority than vector
       types, the vector types should be allocated first. Hence, one
       pass on the argument list is not sufficient anymore, because HVA
       structures are allocated on a second pass.

Issues in Clang
--------------------
Structure Expansion
~~~~~~~~~~~~~~~~~~~
The current clang implementation expends HVA structures into multiple
vector types.

For example:
C code: int __vectorcall foo(hva3 a);
LLVM IR Output: define x86_vectorcallcc i32 @foo(__m256 %a.0, __m256 %a.1, __m256 %a.2);
*The example omits the decoration that is added to the function name

Thus the backend can't differentiate between expended HVA structures and
simple vector types, and doesn't know the original position of each
parameter in the argument list.

We cannot rely on debug information or updated argument names to
identify HVA structures.

HVA Classification
~~~~~~~~~~~~~~~~~~
Clang should understand if each HVA should be expended. In other words,
the FE should know if an HVA structure should be passed by value (by
codegen) or passed indirect.

The current implementation doesn't follow the two argument list rounds
concept of vectorcall, in which Clang first goes over integer and vector
types and only after that over the HVA types. As a result the HVA
structures are passed incorrectly.

Proposed Solution
--------------------------
The ABI in LLVM IR must provide argument position. The information is
important in order to allocate the correct physical register.

The information can be achieved by passing HVA structures by value. It
will replace the existing expansion of the HVA structure arguments.

For Example:
Instead of: define x86_vectorcallcc i32 @foo(__m256 %a.0, __m256 %a.1, __m256 %a.2);
Pass the following: define x86_vectorcallcc i32 @foo(%struct.hva3 %a);

CodeGen needs to know if the structure is an HVA.
There are four possible ways to solve that:

1. CodeGen will analyze the structures just like currently done in clang
   in order to identify HVA structures

2. CodeGen can assume that structure arguments passed by value (not
   expended) are HVA structures

3. Clang will use an existing attribute that will mark that this HVA
   should be passed in registers.

4. Clang will pass a new attribute that will indicate if this is an HVA
   structure that should be expended and passed in register

I propose to use the third option.
The existing attribute "InReg" has similar meaning (argument should be
saved in register) and is defined to be target specific.

Other reasons why I prefer this option are:
- Avoiding code duplication between clang and codegen
- Avoiding making assumptions that are not necessarily true (for example
"long double _Complex" type that is passed by structure as well) or
might be violated in the future
- Avoiding adding new keywords that are not necessary.

In case we encounter a structure passed by value with an InReg flag set,
we can surely assume that this is an HVA.

I will be happy to get your comments or inputs on vectorcall calling convention and
the suggested solution.

Thanks,
Oren

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161130/45b4e4dc/attachment.html>


More information about the llvm-dev mailing list