[LLVMdev] Vectorization of loops with conditional dereferencing

Nadav Rotem nrotem at apple.com
Thu Nov 14 07:03:08 PST 2013


I think that the best way to move forward with the vectorization of this loop is to make progress on the vectorization pragmas.  The LoopVectorizer is already prepared for handling pragmas and we just need to add the clang-side support. Is anyone planning to work on this ?


On Nov 14, 2013, at 2:18 AM, Renato Golin <renato.golin at linaro.org> wrote:

> On 1 November 2013 13:40, Hal Finkel <hfinkel at anl.gov> wrote:
> Done = false;
> FirstI = -1, LastI = -1;
> while (!Done) {
>   for (I = FirstI+1; I < N; ++I)
>     if (r[i] > 0) {
>       FirstI = I;
>       break;
>     }
> 
>   for (; I < N && !page_bound(&m[i]) && ...; ++I) {
>     if (r[i] > 0)
>       LastI = I;
>   }
> 
>   Done = I == N;
> 
>   for (I = FirstI; I <= LastI; ...) {
>     // Run the scalar/vector loop sequence as before.
>   }
> }
> 
> Hi Hal,
> 
> Even if you could do something like this, there is no guarantee that (Last - First) will be multiple of VF, or even bigger than VF, and you'll have to cope with boundary conditions on every external loop, which, depending on the distribution of r[] could be as many as half the size of r[] itself.
> 
> I can't see an algorithmically (compile-time or run-time) way to guarantee that the number of clusters will be small without scanning the whole array. So, even if you add such a check in run-time and only vectorize if the number of clusters is small, the worst case scenario will run twice as many times, (the initial check can be vectorized, as it's a read-only loop), but the later cannot. in the best case scenario, you'll have only a handful of loops, which will run in parallel.
> 
> Worst case: n/VF + n
> Best case: n/VF + ammortized n/VF
> 
> For VF == 2,
>  * best case is as fast as scalar code, considering overheads.
>  * worst case is 50% slower
> 
> For VF == 4,
>  * best case is 50% faster than scalar code
>  * worst case is 25% slower
> 
> And all that, depends on each workload, so it'll change for every different set of arguments, which in an n-body simulation, changes dynamically.
> 
> cheers,
> --renato

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131114/a52a3df7/attachment.html>


More information about the llvm-dev mailing list