[llvm-dev] RFC: Adding Support For Vectorcall Calling Convention

Reid Kleckner via llvm-dev llvm-dev at lists.llvm.org
Wed Nov 30 08:42:37 PST 2016


Don't we already implement this correctly on Windows?

I agree Clang should do the HVA classification. LLVM just doesn't have the
information. Right now, Clang splits HVAs passed in registers and passes
other structs or HVAs that don't fit in the available vector registers with
byval.

Traditionally, Clang has tried very hard to split aggregates passed by
value to make LLVM's job easier. Your proposal undoes a lot of that, but
that seems to be the direction we're going today. See ARM and AArch64,
which pass HVAs as arrays.

I think either your suggestion of the array suggestion are improvements
over the current situation. One problem with passing the LLVM struct type
directly and marking it inreg is that it might be hard for the backend to
figure out what the HVA element type is. The array convention solves this
because the element type is obvious.

On Wed, Nov 30, 2016 at 7:20 AM, Ben Simhon, Oren via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Adding Support For Vectorcall Calling Convention
>
> =====================================================
>
>
>
> Vectorcall Calling Convention for x64
>
> ----------------------------------------------------
>
> The __vectorcall calling convention specifies that arguments to
>
> functions are to be passed in registers, when possible. __vectorcall
>
> uses more registers for arguments than __fastcall or the default x64
>
> calling convention use. The __vectorcall calling convention is only
>
> supported in native code on x86 and x64 processors that include
>
> Streaming SIMD Extensions 2 (SSE2) and above.
>
>
>
> The Definition of HVA Types
>
> --------------------------------------
>
> A Homogeneous Vector Aggregate (HVA) type is a composite type of up
>
> to four data members that have identical vector types. An HVA type has
>
> the same alignment requirement as the vector type of its members.
>
>
>
> For example:
>
>     typedef struct {
>
>     __m256 x;
>
>     __m256 y;
>
>     __m256 z;
>
>     } hva3; // HVA type with 3 __m256 elements
>
>
>
> Vectorcall Extension
>
> ----------------------------
>
> Vectorcall extends the standard x64 calling convention while adding
>
> support for HVA and vector types.
>
>
>
> There are four main differences:
>
> -  Floating-point types are considered vector types just like __m128,
>
>        __m256 and __m512. The first 6 vector typed arguments are
>
>        saved in physical registers XMM0/YMM0/ZMM0 until XMM5/YMM5/ZMM5.
>
> -  After vector types and integer types are allocated, HVA types are
>
>        allocated, in ascending order, to unused vector registers
>
>        XMM0/YMM0/ZMM0 to XMM5/YMM5/ZMM5.
>
> -  Just like in the default x65 CC, Shadow space is allocated for
>
>        vector/HVA types. The size is fixed to 8 bytes per argument.
>
> -  HVA types are returned in XMM0/YMM0/ZMM0 to XMM3/YMM3/ZMM3 while
>
>        vector types are returned in XMM0/YMM0/ZMM0 and integers in RAX
>
>
>
> For more information or examples please see also:
>
> https://msdn.microsoft.com/en-us/library/dn375768.aspx
>
>
>
> Observations
>
> ------------------
>
> -  LLVM IR must preserve the original position of the arguments.
>
> -  Since HVA structures are allocated in lower priority than vector
>
>        types, the vector types should be allocated first. Hence, one
>
>        pass on the argument list is not sufficient anymore, because HVA
>
>        structures are allocated on a second pass.
>
>
>
> Issues in Clang
>
> --------------------
>
> Structure Expansion
>
> ~~~~~~~~~~~~~~~~~~~
>
> The current clang implementation expends HVA structures into multiple
>
> vector types.
>
>
>
> For example:
>
> C code: int __vectorcall foo(hva3 a);
>
> LLVM IR Output: define x86_vectorcallcc i32 @foo(__m256 %a.0, __m256 %a.1,
> __m256 %a.2);
>
> *The example omits the decoration that is added to the function name
>
>
>
> Thus the backend can't differentiate between expended HVA structures and
>
> simple vector types, and doesn't know the original position of each
>
> parameter in the argument list.
>
>
>
> We cannot rely on debug information or updated argument names to
>
> identify HVA structures.
>
>
>
> HVA Classification
>
> ~~~~~~~~~~~~~~~~~~
>
> Clang should understand if each HVA should be expended. In other words,
>
> the FE should know if an HVA structure should be passed by value (by
>
> codegen) or passed indirect.
>
>
>
> The current implementation doesn’t follow the two argument list rounds
>
> concept of vectorcall, in which Clang first goes over integer and vector
>
> types and only after that over the HVA types. As a result the HVA
>
> structures are passed incorrectly.
>
>
>
> Proposed Solution
>
> --------------------------
>
> The ABI in LLVM IR must provide argument position. The information is
>
> important in order to allocate the correct physical register.
>
>
>
> The information can be achieved by passing HVA structures by value. It
>
> will replace the existing expansion of the HVA structure arguments.
>
>
>
> For Example:
>
> Instead of: define x86_vectorcallcc i32 @foo(__m256 %a.0, __m256 %a.1,
> __m256 %a.2);
>
> Pass the following: define x86_vectorcallcc i32 @foo(%struct.hva3 %a);
>
>
>
> CodeGen needs to know if the structure is an HVA.
>
> There are four possible ways to solve that:
>
>
>
> 1. CodeGen will analyze the structures just like currently done in clang
>
>    in order to identify HVA structures
>
>
>
> 2. CodeGen can assume that structure arguments passed by value (not
>
>    expended) are HVA structures
>
>
>
> 3. Clang will use an existing attribute that will mark that this HVA
>
>    should be passed in registers.
>
>
>
> 4. Clang will pass a new attribute that will indicate if this is an HVA
>
>    structure that should be expended and passed in register
>
>
>
> I propose to use the third option.
>
> The existing attribute "InReg" has similar meaning (argument should be
>
> saved in register) and is defined to be target specific.
>
>
>
> Other reasons why I prefer this option are:
>
> - Avoiding code duplication between clang and codegen
>
> - Avoiding making assumptions that are not necessarily true (for example
>
> "long double _Complex" type that is passed by structure as well) or
>
> might be violated in the future
>
> - Avoiding adding new keywords that are not necessary.
>
>
>
> In case we encounter a structure passed by value with an InReg flag set,
>
> we can surely assume that this is an HVA.
>
>
>
> I will be happy to get your comments or inputs on vectorcall calling
> convention and
>
> the suggested solution.
>
>
>
> Thanks,
>
> Oren
>
>
>
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161130/567e4216/attachment.html>


More information about the llvm-dev mailing list