[cfe-dev] RFC: Adding vscale vector types to C and C++

Fri Jun 7 08:20:54 PDT 2019

Hi,

Thanks for the reply and Phabricator review.

JinGu Kang <jingu at codeplay.com> writes:
>> Using intrinsics might seem old-fashioned when there are various
>> frameworks that express data-parallel algorithms in a more abstract way,
>> or libraries like P0214 (std::simd) that provide mostly performance-
>> portable vector interfaces.  But in practice, each vector architecture
>> has its own quirks and unique features that aren't easy for the compiler
>> to use automatically and aren't performance-portable enough to be part
>> of a generic interface.  So even though target-neutral approaches are a
>> very welcome development, they're not a complete solution.  Intrinsics
>> are still vital when you really want to hand-optimise a routine for a
>> particular architecture.  And that's still a common requirement.
>>
>> For example, Arm has been porting various codebases that already support
>> AArch64 AdvSIMD intrinsics to SVE2.  Even though AdvSIMD and SVE2 have
>> some features in common, the routines for the two architectures are
>> often significantly different from each other (and in ways that can't be
>> abstracted by interfaces like std::simd).  We need to have direct access
>> to SVE2 features for this kind of work.
>
> I am +1000 on using intrinsic functions. Internally, we have discussed
> supporting this type. For instance, how could we implement vector swizzles
> like ".xyz" or "hi/lo"? At the moment, Clang uses shufflevector to implement
> them. I guess we would want to swizzle the vector per vector unit, which is
> unknown at compile time. I am not sure we can implement that efficiently with
> LLVM's current IR vector operations. We could miss instruction-combine or
> other optimization opportunities. However, I guess it would not be easy for
> the passes to handle this type's operations. If I have missed something,
> please let me know.

Yeah, the initial vscale patch that was applied to the LLVM repo only
supported two kinds of index mask for a shufflevector on scalable
vectors: zeroinitializer or undef.  Arm's internal implementation
supports much more than that, but this is still an area that needs to
be agreed with the community.
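
As a rough illustration of why those two masks are the easy cases,
here is a small C++ model.  It is only a sketch and is not taken from
the patch: std::vector stands in for a scalable vector, and kMinLanes,
make_scalable, shuffle_zero_mask and reverse_lanes are hypothetical
names made up for the example.  A zeroinitializer mask means the same
thing for every vscale, whereas a general permute needs indices that
depend on the run-time lane count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a scalable vector: the lane count is
// vscale * kMinLanes, and vscale is only known at run time.
constexpr std::size_t kMinLanes = 4;

// Builds a model vector with lanes 10, 11, 12, ... for a given vscale.
std::vector<int> make_scalable(std::size_t vscale) {
    std::vector<int> v(vscale * kMinLanes);
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] = static_cast<int>(i) + 10;
    return v;
}

// Models a shuffle whose mask is zeroinitializer: every result lane
// selects lane 0 of the input, so the mask does not need to know how
// many lanes there are.
std::vector<int> shuffle_zero_mask(const std::vector<int>& in) {
    assert(!in.empty());
    return std::vector<int>(in.size(), in[0]);
}

// Models a lane reversal.  Result lane i reads input lane (n - 1 - i),
// which depends on the run-time lane count n, so the indices cannot be
// written down as a fixed-length constant mask.
std::vector<int> reverse_lanes(const std::vector<int>& in) {
    return std::vector<int>(in.rbegin(), in.rend());
}
```

Calling shuffle_zero_mask(make_scalable(3)) gives twelve copies of
lane 0, and the same function works unchanged for any other vscale.
reverse_lanes has to compute its indices from the actual length, which
is the kind of operation that a constant mask cannot describe.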

In general, the Clang implementation of the SVE built-in functions
uses a combination of generic IR operations and target-specific LLVM
intrinsics.  At the moment the permute-like functions use intrinsics.

Thanks,
Richard