[LLVMdev] [LLVM, llc] TypeLegalization, DAGCombining, vectors loading

Rotem, Nadav nadav.rotem at intel.com
Wed Dec 14 00:53:48 PST 2011


Dan, 

I completely agree with you.  The vectorizer (or whoever generates this vector code) should be aware of the target instruction set and choose the vectorization factor (VF) accordingly.  When our vectorizer[1] decides on the vectorization factor, it takes into account the available instruction set as well as the operations used in the program.
For example, AVX1 focuses on floating-point operations, so vectorizing integer code with VF=8 would generate suboptimal code: the op legalizer would have to unpack and repack the operands of every operation that falls into a 'hole' in the instruction set.
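
To make the AVX1 example concrete, here is a minimal sketch of the idea. It is not our vectorizer's code: the table `NATIVE_VECTOR_BITS` and the helper `pick_vf` are hypothetical names used only for illustration, and it looks at register width alone, whereas a real vectorizer also weighs the operations used in the program.

```python
# Hypothetical table: widest vector (in bits) that each ISA level handles
# natively for floating-point and for integer arithmetic.
NATIVE_VECTOR_BITS = {
    "sse2": {"float": 128, "int": 128},
    "avx":  {"float": 256, "int": 128},  # AVX1: 256-bit FP, integer ops stay 128-bit
    "avx2": {"float": 256, "int": 256},
}

def pick_vf(isa, elem_bits, is_float):
    """Return the largest VF whose vector still fits one native register."""
    kind = "float" if is_float else "int"
    return NATIVE_VECTOR_BITS[isa][kind] // elem_bits

print(pick_vf("avx", 32, True))   # float code on AVX1
print(pick_vf("avx", 32, False))  # i32 code on AVX1
```

Under these assumptions, 32-bit float code on AVX1 gets VF=8 while 32-bit integer code gets VF=4, which avoids asking the legalizer to split every integer operation.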

Thanks,
Nadav



[1] Intel's OpenCL SDK Vectorizer


-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Dan Gohman
Sent: Tuesday, December 13, 2011 23:21
To: Stepan Dyatkovskiy
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] [LLVM, llc] TypeLegalization, DAGCombining, vectors loading

On Dec 13, 2011, at 11:37 AM, Stepan Dyatkovskiy wrote:

> Please ignore my concurrent post :-) Let's proceed in this branch.
> 
>> do you understand what it means in the non-vector case?
> I'm beginning to understand it now. It means the type as it should be
> in abstract VM memory, doesn't it? The main question about MemoryVT is:
> should it always be the original type (as defined in the .ll) or not?
> 
> About vectors with element size less than 8 bits: this topic
> interests me, and I would like to work on it. What is the best place
> to discuss it: llvmdev or bug #1784 (vectors of i1 and vectors of x86
> long double don't work)?


I tried to fix PR1784 multiple times. I have since had
some insights which have changed my mind.

<4 x i32> on a machine with <8 x i32> vectors misses out on
50% of the theoretical performance. <8 x i32> on a machine
with only <4 x i32> takes on unneeded code bloat and register
pressure. No amount of heroism in LegalizeTypes can change
this basic situation.
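
As a rough illustration of that arithmetic, the sketch below counts how many native vector operations one IR vector operation needs and what fraction of the native lanes it fills. The helper name `legalization_cost` is hypothetical, and the model considers register width only (no register pressure, no shuffles).

```python
def legalization_cost(num_elems, elem_bits, native_bits):
    """Return (native_ops, lane_utilization) for one vector operation."""
    vector_bits = num_elems * elem_bits
    # Ceiling division: a vector wider than the register is split into pieces.
    native_ops = (vector_bits + native_bits - 1) // native_bits
    # Fraction of the native register bits that carry useful data.
    lane_utilization = vector_bits / (native_ops * native_bits)
    return native_ops, lane_utilization

# <4 x i32> on a machine with 256-bit vectors: one op, half the lanes idle.
print(legalization_cost(4, 32, 256))
# <8 x i32> on a machine with 128-bit vectors: every op is split in two.
print(legalization_cost(8, 32, 128))
```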

The further you go, either in the conceptual distance
between code and target machine, or in diversity of target
machines, the worse the problem gets.

Also, all of the proposed solutions for fixing exotic
vector types have substantial downsides.

So in addition to asking "why doesn't <2 x i5> work?", it's
also useful to ask "who is producing <2 x i5> values, and
what am I expecting to get out of letting them do that?"

Dan
