[PATCH] Add #pragma vectorize enable/disable to LLVM

Wed Dec 4 10:22:29 PST 2013

On Dec 4, 2013, at 11:51 AM, Renato Golin <renato.golin at linaro.org> wrote:

> On 4 December 2013 17:20, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
>> Ultimately, I think, we want to call the functionality that the vectorizer requires from those passes from the vectorizer only on the subset of BB’s modified by the vectorizer (instcombine as a library function ...).
>> It matters for -O1 and -Oz which did not pay this penalty before. In the short term, I don’t have a good answer here.
> 
> Yes, that'd be ideal.
> 
> In the meantime, is the compile time penalty a worry? Or can we tackle
> this later?
> 
> 
>> +  if (!LateVectorize)
>> +      MPM.add(createLoopVectorizePass(DisableUnrollLoops, LoopVectorize));
>> 
>> Let’s fix this whitespace error while we are here.
> 
> Sure!
> 
> 
>> I think that the “vectorizer.enable” flag should enable “aggressive” vectorization at -Os, i.e., disable the size heuristic we use.
> 
> That's a good point. I have no strong opinions here, if everyone
> agrees this is correct.
> 
> PS: I'll have to write up some basic rules about the semantics, so
> that people know what to expect, as I'm already getting confused. ;)

#pragma vectorize enable

Enables vectorization using the vectorizer’s aggressive heuristics. Vectorization is still performed when runtime memory checks or a remainder loop are required. (Usually, at -Os, we would not vectorize such loops.)

We would not vectorize the following loop at -Os: its trip count n is only known at run time, so it requires a scalar remainder loop.

int example(int *A, int n) {
  int r = 0;

  for (int i = 0; i < n; i++)
    r += A[i];

  return r;
}

However, with the pragma, we will vectorize at -Os and emit a scalar remainder loop.

int example(int *A, int n) {
  int r = 0;

  #pragma vectorize enable
  for (int i = 0; i < n; i++)
    r += A[i];

  return r;
}
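
To make "scalar remainder loop" concrete, here is a hand-written sketch of roughly what the transformed loop looks like. This is not vectorizer output: the function name, the accumulator array, and the vectorization factor of 4 are assumptions for illustration only; the real factor is picked by the cost model.

```c
/* Hand-written sketch of the loop shape after vectorization.
 * VF = 4 is an assumed vectorization factor, not what the
 * vectorizer is guaranteed to choose. */
int example_vectorized_shape(const int *A, int n) {
  enum { VF = 4 };
  int acc[VF] = {0, 0, 0, 0};
  int i = 0;

  /* Main body: VF elements per iteration. The inner loop stands in
   * for a single vector add. */
  for (; i + VF <= n; i += VF)
    for (int lane = 0; lane < VF; lane++)
      acc[lane] += A[i + lane];

  /* Horizontal reduction of the partial sums. */
  int r = acc[0] + acc[1] + acc[2] + acc[3];

  /* Scalar remainder loop: handles the last n % VF elements.
   * This extra copy of the loop is the code size cost at -Os. */
  for (; i < n; i++)
    r += A[i];

  return r;
}
```

The second loop is the part the size heuristic normally avoids paying for.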

I think this is one of the main use cases of this flag: people who care about code size (and speed) but want to selectively enable vectorization of loops that have symbolic trip counts or that need runtime checks.
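
For the runtime check case, a loop like the following sketch would be a candidate. The function name is hypothetical, and the pragma spelling is the one proposed in this thread; compilers that do not implement it ignore the unknown pragma (possibly with a warning). Because A and B may overlap, vectorizing the loop needs a runtime memory check with a scalar fallback, which is extra code we would usually not emit at -Os.

```c
/* Hypothetical example: A and B are not known to be disjoint, so a
 * vectorized version needs a runtime overlap check plus a scalar
 * fallback loop. */
void scale_add(int *A, const int *B, int n) {
  /* Pragma spelling as proposed in this thread. */
  #pragma vectorize enable
  for (int i = 0; i < n; i++)
    A[i] += 2 * B[i];
}
```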

Thanks,
Arnold