[cfe-dev] [llvm-dev] [RFC] Expose user provided vector function for auto-vectorization.

Simon Moll via cfe-dev cfe-dev at lists.llvm.org
Tue Jun 4 05:27:13 PDT 2019


Hi,

I think this is going in the right direction. Please do not tie the
vector-variant mechanism too closely to either VectorABI or OpenMP. We
already know that there is more we could do beyond "map"-like SIMD
functions. Besides, I guess it makes sense to compile a list of use
cases to validate the current design and look ahead. It is easy to lose
track of the requirements if it's just a trail of emails.

On 6/3/19 7:59 PM, Francesco Petrogalli via cfe-dev wrote:
> Hi All,
>
> The original intent of this thread is to "Expose user provided vector function for auto-vectorization."
>
> I originally proposed to use OpenMP `declare variant` for the sake of using something that is defined by a standard. The RFC itself is not about fully implementing the `declare variant` directive. In fact, given the amount of complication it is bringing, I would like to move the discussion away from `declare variant`. Therefore, I kindly ask to move any further discussion about `declare variant` to a separate thread.
>
> I believe that to "Expose user provided vector function for auto-vectorization" we need three components.
>
> 1. The main component is the IR representation we want to give to this information. My proposal is to use the `vector-variant` attribute with custom symbol redirection.
>
> 	vector-variant = {“_ZGVnN2v_f(custon_vector_f_2), _ZGVnN4v_f(custon_vector_f_4)”}
>
> The names here are made of the Vector Function ABI mangled name, plus custom symbol redirection in parentheses. I believe that the names mangled according to the Vector Function ABI have all the information needed to build the signature of the vector function and the properties of its parameters (linear, uniform, aligned…). This format will cover most (if not all) of the cases that are needed for auto-vectorization. I am not aware of any situation in which this information might not be sufficient. Please provide such an example if you know of any.
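
To make the name format concrete, here is a minimal sketch of how one
entry of the attribute string could be decoded. The struct and function
names (VariantEntry, takeVlen, parseVariantEntry) are made up for
illustration and are not part of the RFC or of LLVM. The parameter
tokens are kept as an opaque string and are not decoded.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical record for one entry of the `vector-variant` attribute.
struct VariantEntry {
  char isa = 0;           // ISA token, e.g. 'n' for Advanced SIMD
  bool masked = false;    // 'M' = masked, 'N' = unmasked
  std::string vlen;       // number of lanes, or "x" for scalable
  std::string params;     // raw parameter tokens, e.g. "v" (not decoded)
  std::string scalarName; // scalar function name
  std::string vectorName; // custom symbol from "(...)", empty if absent
};

// Reads "<digits>" or "x" starting at pos and advances pos past it.
static std::string takeVlen(const std::string &s, std::size_t &pos) {
  assert(pos <= s.size());
  if (pos < s.size() && s[pos] == 'x')
    return std::string(1, s[pos++]);
  std::string digits;
  while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos])))
    digits += s[pos++];
  return digits;
}

// Parses "_ZGV<isa><mask><vlen><params>_<scalar>[(<custom>)]".
// Returns false if the entry does not follow that shape.
bool parseVariantEntry(const std::string &entry, VariantEntry &out) {
  const std::string prefix = "_ZGV";
  if (entry.compare(0, prefix.size(), prefix) != 0)
    return false;
  std::size_t pos = prefix.size();
  if (entry.size() < pos + 2)
    return false;

  VariantEntry v;
  v.isa = entry[pos++];
  const char mask = entry[pos++];
  if (mask != 'N' && mask != 'M')
    return false;
  v.masked = (mask == 'M');

  v.vlen = takeVlen(entry, pos);
  if (v.vlen.empty())
    return false;

  // The first '_' after the tokens separates them from the scalar name.
  const std::size_t sep = entry.find('_', pos);
  if (sep == std::string::npos || sep == pos)
    return false;
  v.params = entry.substr(pos, sep - pos);

  const std::string rest = entry.substr(sep + 1);
  const std::size_t open = rest.find('(');
  if (open == std::string::npos) {
    v.scalarName = rest;
  } else {
    if (rest.back() != ')')
      return false;
    v.scalarName = rest.substr(0, open);
    v.vectorName = rest.substr(open + 1, rest.size() - open - 2);
  }
  if (v.scalarName.empty())
    return false;
  out = v;
  return true;
}
```

For the first entry quoted above, this would yield isa 'n', unmasked,
vlen "2", params "v", scalar name "f" and the custom symbol given in
the parentheses.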

Does the vector variant inherit all function and parameter attributes
from the scalar function? This should work OK for map-like SIMD
arithmetic. However, in light of functions beyond SIMD arithmetic, I
think the RFC should specify clearly what we may assume about a
vector-variant given its name and scalar function declaration.

By building the vector-variant mechanism around the current 
VectorABI/OpenMP, we are also inheriting their limitations, such as:

1. A vector variant may return a scalar value with a property (linear,
uniform, aligned, ..). For example, this may be the case for custom
reduction functions (my_reduction_operator(<8 x double> %v) --> double).

2. User-defined vector variants may take the mask at a different 
parameter position than required by VectorABI. LLVM-VP solves this by 
introducing the "mask" attribute for function parameters 
(https://reviews.llvm.org/D57504).

3. Upcoming Vector/SIMD ISAs such as the RISC-V V-extension and NEC
SX-Aurora have an active vector length besides just the mask. Whatever
solution comes out of this RFC should accommodate that. Just as for the
mask, LLVM-VP provides a parameter attribute for the vector length,
"vlen".

4. For SIMD functions beyond "map", the behavior of a SIMD function may
significantly depend on the mask. In this case, even the scalar
function would need to be marked as "convergent" (but only if the code
is actually going to be vectorized). E.g., memory accesses (store_f64 ->
store_v8f64(<8 x i1> %M)) or a function that simply returns the mask.
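
To illustrate points 3 and 4, here is a scalar emulation of what a
masked store with an active vector length would do. This is only a
sketch: the types, the helper laneActive and the function
activeLaneCount are hypothetical, and the parameter order is an
arbitrary choice, not something VectorABI or LLVM-VP prescribes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 8;
using Vec = std::array<double, kLanes>;
using Mask = std::array<bool, kLanes>;

// Lane i is active only if it is below the active vector length and its
// mask bit is set.
inline bool laneActive(const Mask &mask, std::size_t vlen, std::size_t i) {
  return i < vlen && mask[i];
}

// Scalar emulation of a masked store: inactive lanes leave memory untouched.
void store_v8f64(double *dst, const Vec &value, const Mask &mask,
                 std::size_t vlen) {
  assert(dst != nullptr && vlen <= kLanes);
  for (std::size_t i = 0; i < kLanes; ++i)
    if (laneActive(mask, vlen, i))
      dst[i] = value[i];
}

// A function whose result depends only on the mask and vector length.
std::size_t activeLaneCount(const Mask &mask, std::size_t vlen) {
  std::size_t n = 0;
  for (std::size_t i = 0; i < kLanes; ++i)
    n += laneActive(mask, vlen, i) ? 1 : 0;
  return n;
}
```

Neither function can be described as a lane-wise "map" of a scalar
function: the observable effect changes with the set of active lanes,
which is why the mask and the vector length have to be visible in the
variant's signature.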

This is the same issue that the GPU folks are discussing for 
thread-group semantics: 
http://lists.llvm.org/pipermail/llvm-dev/2018-December/128662.html

ISPC (http://ispc.github.io/) and the more general Region Vectorizer 
(https://github.com/cdl-saarland/rv) are examples of frameworks that 
actually implement thread-group semantics for vectorization, including 
"wavefront" intrinsics, etc.

> We can attach the IR attribute to call instructions (preferred for avoiding conflicts when merging modules that don’t see the same attributes) or to function declarations, or both.
>
> 2. The second component is a tool that other parts of LLVM (for example, the loop vectorizer) can use to query the availability of the vector function, the SVFS I have described in the original post of the RFC, which is based on interpreting the `vector-variant` attribute.

The SVFS seems similar to the function resolver API in RV 
(https://github.com/cdl-saarland/rv/blob/master/include/rv/resolver/resolver.h). 
To clarify, RV's resolver API is all about flexibility, e.g., we use it
to implement inter-procedural vectorization, OpenMP declare simd and
SLEEF vector math. However, it does not commit to a specific
order/prioritization of vector variants.

You also mentioned splitting vector functions when no vector variant for
the full vectorization factor is available. I suggest not hiding this
split call in an opaque wrapper function. In particular, the cost model
of the SLP vectorizer would benefit from this information, and by
extension so would future versions of the loop/function vectorizer.
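
As a rough sketch of what "not hiding the split" could mean, the
vectorizer could be handed an explicit list of the calls that a split
would produce. The function planSplitCall below is hypothetical and
uses a simple greedy strategy (widest available variant first, scalar
calls for any remainder). It is not what the SVFS or RV actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Returns the widths of the calls needed to cover `vf` lanes, given the
// widths of the available vector variants. A width of 1 denotes a call to
// the scalar function.
std::vector<unsigned> planSplitCall(unsigned vf,
                                    std::vector<unsigned> available) {
  // Try wider variants first.
  std::sort(available.begin(), available.end(), std::greater<unsigned>());
  std::vector<unsigned> plan;
  unsigned remaining = vf;
  while (remaining > 0) {
    unsigned width = 1; // scalar fallback
    for (unsigned w : available) {
      if (w > 1 && w <= remaining) {
        width = w;
        break;
      }
    }
    assert(width <= remaining);
    plan.push_back(width);
    remaining -= width;
  }
  return plan;
}
```

With variants for 2 and 4 lanes and a vectorization factor of 8, the
plan is two 4-wide calls. A cost model can then price the individual
calls instead of guessing what an opaque wrapper does.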

> The final component is the one that seems to have generated most of the controversies discussed in the thread, and for which I decided to move away from `declare variant`.
>
> 3. The third component is a set of descriptors that can be attached to the scalar function declaration / definition in the C/C++ source file, to inform the compiler about the availability of associated vector functions that can be used when / if needed.
>
> As someone has suggested, we should use a custom attribute. Because the mangling scheme of the Vector Function ABI provides all the information about the shape and properties of the vector function, I propose the approach exemplified in the following code:
>
>
> ```
> // AArch64 Advanced SIMD compilation
> double foo(double) __attribute__((simd_variant("nN2v", "neon_foo")));
> float64x2_t neon_foo(float64x2_t x) {…}
>
> // x86 SSE compilation
> double foo(double) __attribute__((simd_variant("aN2v", "sse_foo")));
> __m128d sse_foo(__m128d x) {…}
> ```
>
> The attribute would use the “core” tokens of the mangled names (without the _ZGV prefix and the scalar function name suffix) to describe the vector function provided in the redirection.

Since this attribute implies the "_ZGV" prefix, shouldn't it rather be
called "vectorabi_variant"?

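
To spell out what the attribute implies: the front end would have to
reassemble the full entry of the IR attribute from the core tokens, the
scalar function name and the custom symbol. A minimal sketch, assuming
the entry format quoted earlier in this mail (the function name
vectorVariantEntry is made up):

```cpp
#include <cassert>
#include <string>

// Builds "_ZGV<core>_<scalar>(<custom>)" from the pieces given in the
// source-level attribute. The redirection is omitted if customName is empty.
std::string vectorVariantEntry(const std::string &coreTokens,
                               const std::string &scalarName,
                               const std::string &customName) {
  assert(!coreTokens.empty() && !scalarName.empty());
  std::string entry = "_ZGV" + coreTokens + "_" + scalarName;
  if (!customName.empty())
    entry += "(" + customName + ")";
  return entry;
}
```

For the AArch64 example, the tokens "nN2v" on foo with the redirection
neon_foo would become "_ZGVnN2v_foo(neon_foo)".
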
> Formal syntax:
>
> ```
> __attribute__((simd_variant("<isa><mask><VLEN><par_type_list>", "custom_vector_name")))
>
> <isa> := “a” (SSE), “b” (AVX) , …, “n” (NEON), “s” (SVE) (from the vector function ABI specifications of each of the targets that support this, for now AArch64 and x86)
>
> <mask> := “N” for no mask, or “M” for masking
>
> <VLEN> := number of lanes in a vector | “x” for scalable vectorization (defined in the AArch64 Vector function ABI).
>
> <par_type_list> := “v” | “l” | … all these tokens are defined in the Vector Function ABI of the target (which gets selected by the <isa>). FWIW, they are the same for x86 and AArch64.
> ```
>
> Please let me know what you think about this proposal. I will rework the proposal if it makes it easier to follow and submit a new RFC about it, but before getting into rewriting everything I want to have some feedback on this change.
>
> Kind regards,
>
> Francesco
>
>> On May 31, 2019, at 8:17 PM, Doerfert, Johannes <jdoerfert at anl.gov> wrote:
>>
>> On 06/01, Saito, Hideki wrote:
>>> Page 22 of OpenMP 5.0 specification (Lines 13/14):
>>>
>>> 	When any thread encounters a simd construct, the iterations of the loop associated with the
>>> 	construct may be executed concurrently using the SIMD lanes that are available to the thread
>>>
>>> This is the Execution Model. The word here is "may" i.e., not "must".

As long as this reads "may" and there are no clear semantics for
"concurrent execution using the SIMD lanes", "pragma omp simd" is
precluded from advancing from "vectorize this loop" to an SPMD-like
programming model for vectorization, as is commonplace in the GPU domain.

Thanks!

Simon

-- 

Simon Moll
Researcher / PhD Student

Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.31

Tel. +49 (0)681 302-57521 : moll at cs.uni-saarland.de
Fax. +49 (0)681 302-3065  : http://compilers.cs.uni-saarland.de/people/moll



