[llvm-commits] Better type-legalization for SIMD registers with subregisters.

Tue Jun 19 18:19:31 PDT 2012

Your approach makes sense to me, but I'm not certain that getCommonSIMDElementType(), as it is now, is the right target hook. It seems like you want a more flexible target hook that can return different types for different element type / width combinations. That is, something like getTypeToTransformTo().
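
To make the distinction concrete, here is a minimal sketch using toy types rather than the real TargetLowering interface. Everything in it (VecType, ToyTarget, PromotionHook, promote, commonI32, perCombination, toyAVX) is a hypothetical name made up for illustration, and the list of legal types is an assumption, not the actual X86 backend table. It only shows that a single "common element type" is one special case of a hook that sees the whole type:

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-in for a vector value type such as v8i16.
// This is NOT LLVM's MVT/EVT.
struct VecType {
  unsigned NumElts;
  unsigned EltBits;
};

bool operator==(VecType A, VecType B) {
  return A.NumElts == B.NumElts && A.EltBits == B.EltBits;
}

// Toy target: just a list of legal vector types.
struct ToyTarget {
  std::vector<VecType> LegalTypes;
  bool isLegal(VecType T) const {
    for (const VecType &L : LegalTypes)
      if (L == T)
        return true;
    return false;
  }
};

// Assumed AVX-like target: XMM types plus two YMM integer types.
ToyTarget toyAVX() {
  return ToyTarget{{{16, 8}, {8, 16}, {4, 32}, {2, 64}, {8, 32}, {4, 64}}};
}

// A target hook that may state a preferred promoted type for T.
using PromotionHook = std::optional<VecType> (*)(VecType);

// Baseline behaviour: keep the element count and take the first
// wider element type that yields a legal vector.
std::optional<VecType> promoteFirstLegal(const ToyTarget &TT, VecType T) {
  static const unsigned Widths[] = {8, 16, 32, 64};
  for (unsigned Bits : Widths) {
    VecType Cand{T.NumElts, Bits};
    if (Bits > T.EltBits && TT.isLegal(Cand))
      return Cand;
  }
  return std::nullopt;
}

// Ask the hook first; fall back to the baseline if the hook has no
// opinion or its answer is not a legal widening of T.
std::optional<VecType> promote(const ToyTarget &TT, VecType T,
                               PromotionHook Hook) {
  if (Hook) {
    std::optional<VecType> Pref = Hook(T);
    if (Pref && Pref->NumElts == T.NumElts && Pref->EltBits > T.EltBits &&
        TT.isLegal(*Pref))
      return Pref;
  }
  return promoteFirstLegal(TT, T);
}

// Policy A: one common element type (i32) for every vector.
std::optional<VecType> commonI32(VecType T) {
  return VecType{T.NumElts, 32};
}

// Policy B: the answer depends on the element type / width combination.
// The rule here (only 8-element vectors prefer i32) is made up.
std::optional<VecType> perCombination(VecType T) {
  if (T.NumElts == 8)
    return VecType{8, 32};
  return std::nullopt;
}
```

With this toy target, promote(toyAVX(), {8, 8}, nullptr) picks v8i16, while passing commonI32 or perCombination picks v8i32. For v2i8, commonI32 asks for v2i32, which is not legal here, so the baseline search takes over and returns v2i64.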

In the future, it would make sense for tablegen to derive this kind of target information.

Evan

On Jun 9, 2012, at 2:10 PM, Rotem, Nadav wrote:

> Hi, 
> 
> I attached a patch which modifies the behavior of the type-legalizer for SIMD registers with subregisters. I'd appreciate feedback. 
> 
> Vectorizing compilers choose a vectorization factor based on the width of the SIMD unit and the data types that are used in the program. Programs that use multiple data types are vectorized and turn into programs that contain "illegal" data types (which don't fit into SIMD registers).
> It is the role of the type-legalizer to modify these types into "legal" data types, in the most efficient way. 
> 
> The SIMD type-legalizer attempts to promote vector elements to the *first* legal type it finds.  There are a number of cases where this behavior is suboptimal.
> For example, AVX has 8-wide SIMD units: a 256-bit register (YMM) that contains the smaller 128-bit (XMM) subregister.  On this target, a vectorizer may vectorize chars into the type v8i8. 
> The type-legalizer would try to promote this type to the first legal register type, which is v8i16 (XMM register). This is bad news because the instruction set for the type v8i16 is sparse. A much better choice would be v8i32 (YMM register). 
> 
> I propose a small change to the legalization strategy. 
> Targets would be able to declare the 'common' vector element type, which is the data type for which the instruction set is dense. I imagine that 90% of the time it would be i32. 
> The type-legalizer would prefer to promote element types to the 'common' element type before trying other register types. 
> 
> I ran some tests on AVX and I noticed the following code improvements (geomean 4% on Sandybridge):
> 1. Compare/Select pairs generate the <8 x i1> type. Currently the mask is legalized to v8i16; after the change it becomes v8i32.
> 2. The vector zext/sext instructions (for example v8i8 to v8i32) no longer require cross-lane operations (XMM to upper YMM).
> 3. Legalization of shuffle-vector of <8 x i8> produces much better code as v8i32 (and not v8i16), despite the danger of cross-lane operations.
> 
> However, I also noticed that some ARM vector-trunc optimizations broke (see the patch).
> 
> Thanks,
> Nadav
> 
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
> <promote_i32.patch>