[llvm-commits] Using SCEVExpander for Polly (was Re: [PATCH] Multidimensional Array Index Delinearization Analysis)

Mon Oct 1 11:00:01 PDT 2012

On Oct 1, 2012, at 1:59 AM, Tobias Grosser <tobias at grosser.es> wrote:

> On 09/29/2012 04:34 AM, Andrew Trick wrote:
>> 
>> On Sep 27, 2012, at 3:11 AM, Tobias Grosser <tobias at grosser.es> wrote:
>> 
>>> On 09/27/2012 11:29 AM, Sameer Sahasrabuddhe wrote:
>>>> 
>>>> Hi Hal,
>>>> 
>>>> I tried the version from Delinearization-20120926.patch on a Fortran
>>>> loopnest, with -O3 before invoking delinearization. See attached file
>>>> "m.pre.ll".
>>>> 
>>>> The delinearizer misses out on the negation expression implemented as
>>>> an XOR (the value "%not"):
>>>> 
>>>>     ~n = -n - 1
>>>> 
>>>> As a result, the "n" above is not available as a possible term in a GCD.
>>>> It worked when I manually substituted that negation with its expansion.
>>>> I am not sure if this should be handled as an additional method
>>>> "addPolysForXor()", or the IR itself should be modified as a precursor
>>>> to delinearization. This expansion is similar to what happens in
>>>> ScalarEvolution::getNotSCEV().
>>> 
>>> It seems we would need to mirror a lot of pattern matching from ScalarEvolution. Working directly on SCEVs could avoid this.
>>> 
>>> Another remark coming from a similar angle. The current analysis is not on demand, but always iterates over all instructions. I have the feeling that within LLVM, people try to perform analysis on demand (e.g. Preston's Dependency Analysis, but also ScalarEvolution). Working directly on SCEVs would make it easy to do an on-demand analysis that is only called for the SCEVs used by memory access instructions.
>>> 
>>> @Andrew: I remember you mentioned that ScalarEvolution has some design problems. Could you elaborate on them and if they would cause issues in this context?
> 
> Hi Andrew,
> 
>> If we ignore SCEVExpander and only consider SCEV-the-analysis, then we have a pretty robust system. There are a few issues:
>> - LCSSA form artificially limits analysis.
>> - SCEV inherently cannot preserve nsw/nuw flags. When it attempts to do it, the results can depend on the query order.
>> - Expressions with sext/zext/trunc do not have a canonical form.
>> - SCEV queries can take time exponential in the expression depth (mainly a problem because of sext/zext/trunc).
>> 
>> SCEV should be just fine within a single loop nest with simple induction variable expressions.
> 
> What is a single loop nest? Or better, what would be more complex than a single loop nest?
> 
> Is this a single loop nest?
> 
> | for i
> |   for j
> 
> This as well?
> 
> | for i
> |  for j
> |
> | for k
> 
> Or just if it is surrounded by a common loop? Do you have an example for something that is not a single loop nest?
> 
> I am interested, as we use SCEV to derive loop bounds and array subscripts in Polly. We basically base our high level loop transformations on SCEV. Until now, I did not see any problems. This may be due to the constraints we put on the kind of loop nests we optimize, but also because we have been lucky so far. Do you think it was a bad decision to base Polly on SCEV?
> 
> I was actually even thinking of "improving" Polly by using SCEVExpander
> extensively. When recreating a new loop structure we move basic blocks from the old loop nest to a new one. For now, we use the old indvars pass to rewrite all loops to single induction variable loops. This allows us to rewrite the blocks by replacing the explicitly available old induction variables with the ones of the new loop nest. To remove the need for the old indvars pass, I was thinking of rewriting based on SCEVs. I would basically take the SCEV of a value that I want to rewrite
> and I would replace all AddRec expressions that reference an old iv, with a reference to the newly calculated ivs. I would then use the SCEVExpander to code generate these 'updated' SCEVs. Does this sound like a good idea or what kind of problems would I face with this approach?

I think it's a good decision to base Polly on SCEV. The need to support SCEVExpander does introduce complexity that I don't like, so I'd prefer standard passes avoid SCEVExpander (while continuing to use SCEV). However, SCEVExpander was designed for the situation where you want to restructure and rewrite the entire loop. So it may actually be the best approach for you.

In addition to the SCEV issues above, there are problems that you may run into with SCEVExpander. There are a couple of specific, somewhat minor issues:

- Rematerializing geps from an expression involving pointer arithmetic. The expression may refer to multiple pointer-type expressions, so we don't know the gep base.

- Hoisting operations that may trap (UDiv)
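
To make the UDiv point concrete, here is a small standalone sketch in plain C++ (not LLVM code; the function names are made up for illustration). The division is loop-invariant, but in the original form it only executes when the loop body runs, so it can only be hoisted behind a guard equivalent to the loop-entry test:

```cpp
#include <cassert>
#include <cstdint>

// Original form: a / b is evaluated only when the loop body runs, so
// b == 0 is harmless whenever n == 0.
uint32_t sumInLoop(uint32_t n, uint32_t a, uint32_t b) {
  uint32_t sum = 0;
  for (uint32_t i = 0; i < n; ++i)
    sum += a / b + i;
  return sum;
}

// Hoisted form: the invariant quotient is computed once, but only behind
// a guard equivalent to the loop-entry test. Without the guard, the call
// sumHoisted(0, a, 0) would divide by zero although the loop never runs.
uint32_t sumHoisted(uint32_t n, uint32_t a, uint32_t b) {
  if (n == 0)
    return 0;
  assert(b != 0 && "the original loop would divide by zero here as well");
  uint32_t q = a / b;
  uint32_t sum = 0;
  for (uint32_t i = 0; i < n; ++i)
    sum += q + i;
  return sum;
}
```

Both functions agree on every input where the original is defined, including the zero-trip case with a zero divisor. An expander that places the division in the preheader unconditionally loses that property.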

More generally, I think that SCEVExpander is a bad fit for incremental optimization because there's no clear way to associate a SCEV with the IR that produced it. We may end up rewriting induction variables in other loops just to perform the most minor transformation. I added logic to reuse induction variables as much as possible, but it gets very tricky, especially with a mix of pointers and integers of different sizes.

A related problem is that we don't know where the boundary of SCEV analysis should be. It is designed as a global analysis, but the client usually wants an expression only involving the current loop nest. LCSSA bounds the analysis in some cases, but it is entirely arbitrary.

So, for your sibling loops:
for i
  for j
for k

Recurrences over k may or may not contain recurrences over i.

You should probably look carefully at the SCEVs produced in the cases that you care about to see what kind of problems you may have. If you are only expanding some known set of recurrences that match the new induction variables that you have selected, then SCEVExpander will probably work for you.
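
As a rough illustration of that last point, here is a toy model in plain C++. It does not use the ScalarEvolution API; Expr, addRec, usesOnlyLoops, rewriteLoops and evaluate are all hypothetical names, and only affine recurrences {start,+,step}<loop> with a constant step are modeled. The sketch first checks that an expression refers only to the loops being replaced, then renames each recurrence's loop to the new induction variable:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Toy expression: either a constant or an affine recurrence
// {start,+,step}<loop>. This is a simplification, not llvm::SCEV.
struct Expr;
using ExprPtr = std::shared_ptr<const Expr>;

struct Expr {
  bool isAddRec;
  int64_t value;     // the constant, or the step when isAddRec is true
  ExprPtr start;     // start expression (recurrences only)
  std::string loop;  // loop the recurrence belongs to (recurrences only)
};

ExprPtr constant(int64_t v) {
  return std::make_shared<const Expr>(Expr{false, v, nullptr, ""});
}

ExprPtr addRec(ExprPtr start, int64_t step, std::string loop) {
  return std::make_shared<const Expr>(
      Expr{true, step, std::move(start), std::move(loop)});
}

// True if every recurrence in e belongs to one of the known loops.
bool usesOnlyLoops(const ExprPtr &e, const std::set<std::string> &known) {
  if (!e->isAddRec)
    return true;
  return known.count(e->loop) != 0 && usesOnlyLoops(e->start, known);
}

// Rebuild e with each recurrence's loop renamed according to oldToNew;
// recurrences over loops not in the map are kept unchanged.
ExprPtr rewriteLoops(const ExprPtr &e,
                     const std::map<std::string, std::string> &oldToNew) {
  if (!e->isAddRec)
    return e;
  auto it = oldToNew.find(e->loop);
  return addRec(rewriteLoops(e->start, oldToNew), e->value,
                it == oldToNew.end() ? e->loop : it->second);
}

// Value of e for the given induction-variable values (one per loop).
int64_t evaluate(const ExprPtr &e, const std::map<std::string, int64_t> &ivs) {
  if (!e->isAddRec)
    return e->value;
  auto it = ivs.find(e->loop);
  assert(it != ivs.end() && "no value supplied for this loop");
  return evaluate(e->start, ivs) + e->value * it->second;
}
```

For example, {{0,+,100}<i>,+,4}<j> evaluates to 212 at i = 2, j = 3, and after renaming i and j to new induction variables it evaluates to the same value at the corresponding new iteration. A recurrence over k whose start still mentions i would fail a usesOnlyLoops check against {"k"}, which is the kind of case worth looking for before expanding.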

-Andy