[llvm-dev] Working on FP SCEV Analysis

Wed May 18 11:40:29 PDT 2016

>There needs to be some actual motivating case to make it worth even writing the code for.

This goes back into “priority to implement” question. If there aren’t any customers, priority goes down, by a lot.

>So under that paradigm - followed religiously - one would plug in any loop transformation, polyhedral or non-polyhedral, cost models, etc. to morph code into a vectorizable form

I won’t comment on other transformations. A powerful vectorizer certainly helps other optimizations make their case,
and it sometimes requires more optimizations before its benefit can be fully appreciated.

>This might be a good paradigm to follow from the peak performance angle, but not so from the compile-time or code size angle.
>It seems best to pursue a paradigm like this with a peak performance library rather than mainstream llvm.

This should be evaluated feature-by-feature. I fully understand that LLVM is also used as a JIT compiler.
I don’t think FP induction adds significantly more compile time or code size than integer induction does.

>So i suggest y'all start from: "Here are the cases we care about making faster, and why we care about making them faster”.
>+1

This was our thinking before the paradigm shift.

The following code vectorizes when TTT is int (SCEV might need a small extension) but not when TTT is float/double (unless FP induction
analysis is available). Adding two lines of code like this to a 1000-line loop suddenly stops the entire loop from being vectorized. These are the things that
greatly irritate programmers. Resolving programmer frustration is as important as performance. In this case, a robust vectorizer should
either 1) vectorize the FP induction or 2) tell the programmer that the FP induction needs to be converted to an integer induction. Either way, FP induction
analysis is needed. Showing a backward dependence edge on “x” would certainly help, but it would not be as helpful as 1) or 2). ICC Vectorizer customers
appreciate improved “loop was not vectorized” messaging as much as functional and performance enhancements of the vectorizer.
In general, investing in making the vectorizer “robust” pays off very well, through performance and/or programmer satisfaction.

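/* TTT stands for the element type: int, float, or double. */
void foo(TTT *a, int N, TTT x, TTT y){
    int i;
    for (i=0;i<N;i++){
        a[i] = x;
        x+=y;   /* x is the induction variable, stepped by y each iteration */
    }
}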
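
To make option 2) concrete, here is a minimal sketch of what "converting FP induction to integer induction" could look like when done by hand. The function names foo_fp_induction and foo_int_induction are made up for illustration, and float is picked arbitrarily for TTT. Note that the two forms are not guaranteed to produce bit-identical results in general, because repeated addition and x + i*y round differently; that is why a compiler typically needs relaxed floating-point semantics before doing this on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: x is a floating-point induction variable, so the value
   stored in iteration i depends on the x produced by iteration i-1. */
void foo_fp_induction(float *a, int n, float x, float y) {
    assert(a != NULL || n <= 0);
    for (int i = 0; i < n; i++) {
        a[i] = x;
        x += y;
    }
}

/* Hand-converted form (illustrative only): the sole induction variable is
   the integer i, and each element is computed independently as x + i*y.
   Rounding may differ from the loop above for values such as y = 0.1f. */
void foo_int_induction(float *a, int n, float x, float y) {
    assert(a != NULL || n <= 0);
    for (int i = 0; i < n; i++) {
        a[i] = x + (float)i * y;
    }
}
```

With inputs that are exactly representable (for example x = 1.0f, y = 0.5f, and a small N), both functions fill the array with the same values.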

FYI, I have a customer asking for an extension of OpenMP linear for non-POD types (I won’t bother getting into that discussion in llvm-dev).
When the vectorizer becomes stronger, more feature requests will come. ☺

Thanks,
Hideki

From: ghoflehner at apple.com [mailto:ghoflehner at apple.com]
Sent: Tuesday, May 17, 2016 7:07 PM
To: Daniel Berlin <dberlin at dberlin.org>; Saito, Hideki <hideki.saito at intel.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] Working on FP SCEV Analysis

On May 17, 2016, at 6:14 PM, Daniel Berlin via llvm-dev <llvm-dev at lists.llvm.org> wrote:

On Tue, May 17, 2016 at 5:17 PM, Saito, Hideki via llvm-dev <llvm-dev at lists.llvm.org> wrote:

>What situations are they common in?

ICC Vectorizer made a paradigm shift a while ago.
If there isn’t a clear reason why something can’t be vectorized, we should try our best to vectorize it.
The rest is a performance modeling (and priority to implement) question, not a capability question.
We believe this is a good paradigm to follow in vectorizer development.

In some sense, yes, but not at all possible costs.
There needs to be some actual motivating case to make it worth even writing the code for.

This paradigm can have far-reaching consequences. The vectorizer is the performance cow to milk at the IR level. So under that paradigm - followed religiously - one would plug in any loop transformation, polyhedral or non-polyhedral, cost models, etc. to morph code into a vectorizable form. And when that is not sufficient, one would probably start adding large numbers of run-time checks, multi-versioned code, etc. This might be a good paradigm to follow from the peak performance angle, but not so from the compile-time or code size angle. It seems best to pursue a paradigm like this with a peak performance library rather than mainstream llvm.

It was a big departure from
“vectorize when all things look nice to vectorizer”.

These are not diametrically opposed.

I mean, it may not be worth the cost of maintaining the *compiler code* to do it.
This isn't the same as "when things look nice to the vectorizer", it's more "we're willing to vectorize whatever we can, as long as someone is going to actually use it".

Nobody here has provided a useful set of cases/applications/etc. that suggests it should be done. I'm not saying there are none, I'm saying, literally, nobody has motivated this use case yet :)

We shouldn’t give up vectorizing simply because a programmer wrote FP induction code.(*)

We shouldn't add code to the compiler just because we can.

I would similarly be against, for example, vectorizing loops with binary coded decimal induction variables, and adding an entire BCD SCEV infrastructure, without some motivating case *somewhere*.

So i suggest y'all start from: "Here are the cases we care about making faster, and why we care about making them faster”.
+1  I think a lot of people would be very interested in non-toy examples that show big performance differences between icc and clang. That would also allow us to dig deeper into questions like whether it is “vectorizer capability, dependence analysis and/or supporting transformations and/or ???” that explains the gap.

In compilers, building infrastructure first, then finding customers works a lot worse than figuring out what customers want, and then building infrastructure for them :)

