[llvm-commits] [llvm] r171798 - in /llvm/trunk: lib/Transforms/Vectorize/LoopVectorize.cpp test/Transforms/LoopVectorize/X86/unroll-small-loops.ll

Tue Jan 8 00:35:10 PST 2013

On Jan 7, 2013, at 9:21 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
> IMHO, it is not always possible to statically determine if it's beneficial to vectorize a
> loop with small(tiny?) trip count. Here are two examples:

| Here's another way of trying to say the same thing: if we don't need a scalar cleanup loop (e.g. because the vectorization factor of a loop is known to evenly divide the constant trip count), isn't it
| always beneficial to do the vectorization, even if the new trip count is low?

One of Shuxin's points is about _very, very_ small loops whose values are also accessed, in a non-SIMD-izable way, by code before and after the loop. A compiler may be able to fully unroll such a loop and propagate those values into the surrounding code. If it vectorizes instead, that optimization is blocked, and you may also pay the penalty of moving data to and from the vector unit. So there are certainly cases where vectorizing small loops gives worse performance. However, I'm not convinced that forbidding vectorization of small loops based on trip count alone is a better approach than trying to determine whether vectorization actually gives worse results than scalar compilation.
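To make the unrolling point concrete, here is a minimal C illustration. The function names are hypothetical and this is not how LLVM models the transformation internally. It compares a two-iteration loop whose results are consumed only by scalar code with its fully unrolled form. In the unrolled form the temporaries become plain locals that flow straight into the surrounding computation, with no packing into or extraction from a vector register.

```c
/* Loop form: a tiny loop whose results are only used by scalar code after it. */
static double loop_form(const double k[2], double x) {
    double t[2];
    for (int i = 0; i < 2; i++)
        t[i] = x * k[i];
    return t[0] - t[1];          /* non-SIMD use of the loop's results */
}

/* Fully unrolled form: t[0] and t[1] become ordinary locals that feed the
   subtraction directly; nothing is packed into or extracted from a vector. */
static double unrolled_form(const double k[2], double x) {
    double t0 = x * k[0];
    double t1 = x * k[1];
    return t0 - t1;
}
```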

Regards,
Dave

>
> e.g. 1: suppose the HW has 16-byte SIMD support.
>
>    double a[];
>    for (i = 0; i < 3; i++)
>        a[i] = ....
>
>   There are two ways to vectorize this loop:
>  vect1:
>    a[0:1] = ...
>    a[2] = ...
>
>  vect2:
>    a[0] = ...
>    a[1:2] = ...
>
>  Unless we know the alignment of array <a> with respect to a 16-byte
> boundary, we cannot determine which one works better. If we are unlucky and
> pick the one with the unaligned access, the performance may be worse than
> the un-vectorized version.
>
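The following minimal C sketch illustrates the choice in e.g. 1. It assumes 2-wide double vectors and an array that is at least 8-byte aligned, as doubles normally are. The helper names and the right-hand side a[i] = i * s are hypothetical stand-ins for the elided "...". The 16-byte vector store is modelled with memcpy of a two-lane struct rather than real SIMD intrinsics, so the sketch runs anywhere. Both splits compute the same values. Which one gets the aligned vector store depends only on whether a or a+1 sits on a 16-byte boundary, and that is only known once the address is known.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for one 16-byte SIMD register holding two doubles. */
typedef struct { double lane[2]; } vec2;

/* Hypothetical helper: returns 1 if vect1 (vector part a[0:1]) gets an
   aligned vector store, 2 if vect2 (vector part a[1:2]) does.
   Assumes a is at least 8-byte aligned. */
static int pick_split(const double *a) {
    return ((uintptr_t)a % 16 == 0) ? 1 : 2;
}

/* vect1: a[0:1] as one vector store, a[2] as a scalar tail. */
static void vect1(double *a, double s) {
    vec2 v = {{0.0 * s, 1.0 * s}};
    memcpy(&a[0], &v, sizeof v);
    a[2] = 2.0 * s;
}

/* vect2: a[0] as a scalar head, a[1:2] as one vector store. */
static void vect2(double *a, double s) {
    a[0] = 0.0 * s;
    vec2 v = {{1.0 * s, 2.0 * s}};
    memcpy(&a[1], &v, sizeof v);
}
```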
> e.g. 2:
>    for (i = 0; i < very-small-num; i++)  {
>       a[i] = ..
>              = a[i-1]
>     }
>
>    If it is vectorized, we have
>      for (...) {
>         a[i:i+1] =
>                     = a[i-1:i]
>      }
>      [ remainder scalar loop]
>
>     In the vectorized version, the load and store cannot be scalar-replaced;
> therefore, each memory location has to be accessed twice, and one of those
> accesses is bound to be unaligned.
>
>     In contrast, in the un-vectorized version, "a[i]" and "a[i-1]" can be
> scalar-replaced, so each memory location is accessed only once.
>
>    It is very difficult to tell whether SIMD wins. It depends on the
> neighboring code, the humidity, the outdoor temperature, etc.
>
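The scalar replacement described in e.g. 2 can be sketched in C as follows. This is a minimal illustration under stated assumptions. The right-hand side 2.0 * x[i] and the running sum that consumes a[i-1] are hypothetical stand-ins for the elided "..." parts. The loop starts at 1 so that a[i-1] stays in bounds. In the scalar-replaced form, the value stored in one iteration is carried in a local into the next, so each element of a is written once and only a[0] is ever loaded. A two-wide vector version cannot do this because its a[i-1:i] load overlaps the previous store.

```c
#include <stddef.h>

/* Straightforward loop: each iteration stores a[i] and reloads a[i-1]. */
static double naive(double *a, const double *x, size_t n) {
    double sum = 0.0;
    for (size_t i = 1; i < n; i++) {
        a[i] = 2.0 * x[i];
        sum += a[i - 1];              /* reloads the value stored last iteration */
    }
    return sum;
}

/* Scalar-replaced: a[i-1] is carried in a local instead of being reloaded. */
static double scalar_replaced(double *a, const double *x, size_t n) {
    double sum = 0.0;
    double prev = (n > 0) ? a[0] : 0.0;  /* the only load from a */
    for (size_t i = 1; i < n; i++) {
        double cur = 2.0 * x[i];
        a[i] = cur;                   /* each element written exactly once */
        sum += prev;
        prev = cur;
    }
    return sum;
}
```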
>   In my humble experience with another compiler, if I set the trip-count
> threshold below 4, performance starts to fluctuate slightly. But I think a
> threshold of "trip-count=16" is a bit conservative.
>
>
> On 01/07/2013 05:02 PM, Chris Lattner wrote:
>> On Jan 7, 2013, at 1:54 PM, Nadav Rotem <nrotem at apple.com> wrote:
>>
>>> Author: nadav
>>> Date: Mon Jan  7 15:54:51 2013
>>> New Revision: 171798
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=171798&view=rev
>>> Log:
>>> LoopVectorizer: When we vectorize and widen loops we process many elements at once. This is a good thing, except for
>>> small loops. On small loops, the post-loop that handles scalars (and runs slower) can take more time to execute than the
>>> rest of the loop. This patch disables widening of loops with a small static trip count.
>> Isn't it still (extremely) valuable to vectorize loops whose trip count is a multiple of the vectorization factor? Turning a loop that adds 4-element arrays into a single SIMD add is a pretty nice win and requires no cleanup loop.
>>
>> -Chris
>>
>>
>
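The trip-count policy under discussion can be sketched in C. This sketch assumes a known constant trip count and a vectorization factor vf > 0. The threshold of 16 is the value Shuxin calls a bit conservative. The function names are hypothetical, and this is not the code from r171798. The sketch keeps the commit's rule of skipping loops with a small static trip count, but it adds Chris's exception: when vf evenly divides the trip count, the scalar remainder is zero, so the loop becomes straight vector code with no cleanup loop.

```c
#include <stdbool.h>

/* Illustrative threshold taken from the discussion, not from LLVM's source. */
#define TINY_TRIP_COUNT 16u

/* Split a known trip count into full vector iterations and a scalar tail.
   Both assume vf > 0. */
static unsigned vector_iters(unsigned trip, unsigned vf) { return trip / vf; }
static unsigned scalar_tail(unsigned trip, unsigned vf) { return trip % vf; }

/* Hypothetical policy: vectorize large loops; vectorize small ones only if
   there is at least one full vector iteration and no scalar cleanup loop. */
static bool should_vectorize(unsigned trip, unsigned vf) {
    if (trip >= TINY_TRIP_COUNT)
        return true;
    return vector_iters(trip, vf) > 0 && scalar_tail(trip, vf) == 0;
}
```

With vf = 4, for example, a loop over 4-element arrays (trip count 4) is accepted as a single vector iteration. A trip count of 6 is rejected because it leaves a two-iteration scalar tail.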

_______________________________________________
llvm-commits mailing list
llvm-commits at cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
