[llvm-dev] Working on FP SCEV Analysis

Saito, Hideki via llvm-dev llvm-dev at lists.llvm.org
Tue May 17 17:17:21 PDT 2016


>What situations are they common in?

The ICC vectorizer made a paradigm shift a while ago:
unless there is a clear reason why something can’t be vectorized, we should try our best to vectorize it.
The rest is a performance-modeling (and implementation-priority) question, not a capability question.
We believe this is a good paradigm to follow in vectorizer development. It was a big departure from
“vectorize only when all things look nice to the vectorizer”.

We shouldn’t give up vectorizing simply because the programmer wrote FP induction code. (*)
The next question, then, is what the best solution to that problem is, and extending SCEV
appears to be one of the obvious directions to explore.

Thanks,
Hideki Saito
Intel Compilers and Languages
----------------------
(*) Quick (and dirty) overview of vectorization legality
Vectorization is a cross-iteration optimization, so we need a solution for cross-iteration dependences.
Forward dependences are considered “safe for vectorization” since the vector execution order naturally satisfies them.
Backward dependences are unsafe, unless the vectorizer knows how to “break” them. Induction is a cyclic dependence
by nature and as such is considered unsafe for vectorization, unless the vectorizer knows how to break it.
[Given a CFG that executes from top to bottom, a forward dependence is a downward data-dependence edge.]
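
A small, made-up illustration of these cases (arbitrary array names, not from any real benchmark):

  // Forward dependence (safe): the d[i-1] read by S2 was written by S1 of the
  // previous iteration, and S1 sits above S2, so executing all S1 lanes of a
  // vector and then all S2 lanes still produces d[i-1] before it is needed.
  void forward(float *a, float *b, float *d, int n) {
    for (int i = 1; i < n; i++) {
      d[i] = b[i] * 2.0f;      // S1
      a[i] = d[i-1] + 1.0f;    // S2
    }
  }

  // Backward dependence (unsafe unless "broken"): the value S1 needs is
  // produced below it, by S2 of the previous iteration; naive vector execution
  // reads d[i-1] for most lanes before S2 has written it.
  void backward(float *a, float *b, float *d, int n) {
    for (int i = 1; i < n; i++) {
      a[i] = d[i-1] + 1.0f;    // S1
      d[i] = b[i] * 2.0f;      // S2
    }
  }

  // Induction as a cyclic dependence: x feeds its own next value.  The
  // vectorizer breaks the cycle by rewriting x in closed form,
  // x = start + i * delta, which for a float x is exactly the FP SCEV
  // question being discussed in this thread.
  void induction(float *a, float start, float delta, int n) {
    float x = start;
    for (int i = 0; i < n; i++) {
      a[i] = x;
      x += delta;
    }
  }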

_____________________________________________
From: Demikhovsky, Elena
Sent: Tuesday, May 17, 2016 3:15 AM
To: Sanjoy Das <sanjoy at playingwithpointers.com>; Chandler Carruth <chandlerc at google.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>; Hal Finkel (hfinkel at anl.gov) <hfinkel at anl.gov>; Adam Nemet (anemet at apple.com) <anemet at apple.com>; Andrew Trick <atrick at apple.com>; mzolotukhin at apple.com; Zaks, Ayal <ayal.zaks at intel.com>; Saito, Hideki <hideki.saito at intel.com>
Subject: RE: [llvm-dev] Working on FP SCEV Analysis


Hi Sanjoy,

Please see my answers below:

  - Core motivation: why do we even care about optimizing floating
    point induction variables?  What situations are they common in?  Do
    programmers _expect_ compilers to optimize them well?  (I haven't
    worked on our vectorizers so pardon the possibly stupid question)
    in the example you gave, why do you need SCEV to analyze the
    increment to vectorize the loop (i.e how does it help)?  What are
    some other concrete cases you'll want to optimize?

[Demikhovsky, Elena] I gave an example of a loop that can be vectorized in fast-math mode. The ICC compiler vectorizes loops with both *primary* and *secondary* FP IVs.
This is an example of a *primary* induction:

(1) for (float i = 0.5; i < 0.75; i += 0.05) {}   → i is a "primary" IV

And of a *secondary* one:

(2) float x = start; for (int i = 0; i < N; i++, x += delta) {}   → x is a "secondary" IV

For now I'm working only on (2).
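
To make (2) concrete, here is a hand-written sketch (mine, not code generated by any compiler) of what breaking that secondary FP induction looks like under fast-math with VF = 4; the inner per-lane loop stands in for a single vector operation:

  // Original scalar loop: x is a secondary FP IV.
  void scalar(float *a, float start, float delta, int n) {
    float x = start;
    for (int i = 0; i < n; i++, x += delta)
      a[i] = x;
  }

  // Sketch of the vectorized form (VF = 4).  Each lane takes its value from
  // the closed form start + delta * sitofp(i + lane), i.e. from the add
  // recurrence the FP SCEV work is meant to recognize.  Reassociating the
  // repeated x += delta into this form is only legal under fast-math.
  void vectorized_sketch(float *a, float start, float delta, int n) {
    int i = 0;
    for (; i + 4 <= n; i += 4)
      for (int lane = 0; lane < 4; lane++)   // stands in for one vector store
        a[i + lane] = start + delta * (float)(i + lane);
    // Scalar remainder loop for the leftover iterations.
    for (float x = start + delta * (float)i; i < n; i++, x += delta)
      a[i] = x;
  }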

  - I presume you'll want SCEV expressions for `sitofp` and `uitofp`.

[Demikhovsky, Elena] I'm adding these expressions, of course. They are similar to "truncate" and "zext", in terms of implementation.

    (The most important question:) With these in the game, what is the
    canonical representation of SCEV expressions that can be expressed
    as, say, both `sitofp(A + B)` and `sitofp(A) + sitofp(B)`?
[Demikhovsky, Elena] For now I have (start + delta * sitofp(i)).
I don't know how far we can go with FP simplification, or under which flags. The first implementation does not assume that sitofp(A + B) is equal to sitofp(A) + sitofp(B).
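
As a tiny illustration (mine) of why that equality cannot be assumed without flags: once the operands get near 2^24, float rounding makes the two sides differ.

  #include <stdio.h>
  #include <stdint.h>

  int main(void) {
    int32_t A = (1 << 24) + 1;           // 16777217 is not exactly representable as float
    int32_t B = 1;
    float lhs = (float)(A + B);          // sitofp(A + B)          -> 16777218.0
    float rhs = (float)A + (float)B;     // sitofp(A) + sitofp(B)  -> 16777216.0
    printf("%.1f vs %.1f\n", lhs, rhs);  // the two results differ
    return 0;
  }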


    Will we have a way to mark expressions (like we have `nsw` and
    `nuw` for `sext` and `zext`) which we can distribute `sitofp` and
    `uitofp` over?
[Demikhovsky, Elena] I assume that sitofp and uitofp should be two different operations.

    Same questions for `fptosi` and `fptoui`.
[Demikhovsky, Elena] The same answer as above, because I don’t want to combine these operations.

  - How will you partition the logic between floating and integer
    expressions in SCEV-land?  Will you have, say, `SCEVAddExpr` do
    different things based on type, or will you split it into
    `SCEVIAddExpr` and `SCEVFAddExpr`? [0]

[Demikhovsky, Elena] Yes, I’m introducing SCEVFAddExpr and SCEVFMulExpr, e.g. for (start + delta * sitofp(i)).

    * There are likely to be similarities too -- e.g. the "inductive"
      or "control flow" aspect of `SCEVAddRecExpr` is likely to be
      common between floating point add recurrences[1], and integer add
      recurrences; and part of figuring out the partitioning is also
      figuring out how to re-use these bits of logic.
[Demikhovsky, Elena] I’m adding SCEVFAddRecExpr to describe the recurrence of an FP IV.
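
For what it's worth, here is a standalone toy model (deliberately not the real LLVM classes, and the names are made up) of the value such an FP add recurrence {start,+,delta} describes, and of evaluating it at iteration i:

  #include <stdio.h>

  // Toy stand-in for an FP add recurrence {Start,+,Step}: the IV's value at
  // iteration i is Start + Step * sitofp(i), mirroring the
  // (start + delta * sitofp(i)) form mentioned above.  This is not LLVM code.
  struct FPAddRec {
    float Start;
    float Step;
    float evaluateAtIteration(long i) const {
      return Start + Step * (float)i;
    }
  };

  int main() {
    FPAddRec X = {0.5f, 0.05f};   // models: for (float x = 0.5; ...; x += 0.05)
    for (long i = 0; i < 4; i++)
      printf("iteration %ld: x = %f\n", i, X.evaluateAtIteration(i));
    return 0;
  }

Note that under strict FP semantics this closed form is not bit-for-bit identical to accumulating x += delta one iteration at a time (the roundings differ), which is why the whole transformation is gated on fast-math.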


[0]: I'll prefer the latter since e.g. integer addition is associative, but floating point addition isn't; and it is better to force programmers to handle the two operations differently.
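
A concrete instance of the non-associativity, added purely as illustration:

  #include <stdio.h>

  int main(void) {
    float a = 1.0e20f, b = -1.0e20f, c = 1.0f;
    printf("%f\n", (a + b) + c);   // prints 1.000000
    printf("%f\n", a + (b + c));   // prints 0.000000: c is absorbed when added to b
    return 0;
  }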

[1]: For instance, things like this:
https://github.com/llvm-mirror/llvm/blob/master/lib/Analysis/ScalarEvolution.cpp#L7564
are likely to stay common between floating point and integer add recs.

-- Sanjoy
