[LLVMdev] Area for improvement

Mon Feb 21 23:25:55 PST 2005

Well, what about this?

  for (i)
    for (j)
      A[i][j].X = 0;
      if (func(j))
        A[i][j].Y = 0;
      else
        A[i][j].Z = 0;

This is CSE across multiple basic blocks, i.e. GCSE.

Or even:

  for (i)
    for (j)
      A[i][j].X = 0;
      if (func(j))
        A[i-1][j].Y = 0;
      else
        A[i+1][j].Z = 0;

Much easier if the GEPs have been decomposed first and the usual global 
opts already performed :)

Chris Lattner wrote:

>> Consider the bytecode I originally posted for the nested for loop.  
>> If the getelementptrs haven't been replaced with something simpler, 
>> LSR will have to take them apart itself.  The multiplication cannot 
>> be replaced with addition without the two-dimensional access being 
>> replaced with one-dimensional access plus some pointer arithmetic.  I 
>> would guess that they will not have been decomposed before LSR is 
>> run, correct?
>
>
> No, this is exactly what LSR should do in this case.
>
>> It also appears the current LSR reduces each GEP in isolation of all 
>> the others.  That clearly is bad for this example, as all the GEPs in 
>> this loop do the same thing with the induction variable.  Creating a 
>> separate pointer for each one only makes things worse.  Ordinarily 
>> the addressing is explicit so the redundancy would have been removed 
>> and the LSR algorithm doesn't have to worry about it, but here it 
>> does.  Bottom line is it has to strength reduce operations that are 
>> implicit, so it first has to determine what those operations are and 
>> detect common subexpressions among those operations itself (possibly 
>> across multiple basic blocks).
>
>
> Yes, if you mean something like this:
>
>   for (i)
>     for (j)
>       A[i][j].X = 0;
>       A[i][j].Y = 0;
>
> It should be sufficient to reduce this to (just considering the inner 
> loop):
>
>   for (i)
>     p = &A[i];   // this would be futher reduced later.
>     for (j)
>       p[j].X = 0;
>       p[j].Y = 0;
>
> (this needs to do the auto-cse of p into the two GEPs).  Further 
> strength reduction should simplify this to:
>
>   for (i)
>     p = &A[i];   // this would be futher reduced later.
>     for (j)
>       p->X = 0;
>       p->Y = 0;
>       ++p;
>
> ... which is what we want.
>
>> It doesn't appear to handle explicit multiplications at all (probably 
>> doesn't add much value to do so).
>
>
> True, this can be added later.  Other operations, including div and 
> rem, can be strength reduced as well.
>
>> It's a challenge :)
>
>
> :)  I would suggest continuing Nate's approach.  Start with the most 
> trivial of simple loops (e.g. for (i) A[i] = 0; ), and work up from 
> there, adding complexity as necessary.
>
> The idea should be to always have a pass that works, but continue to 
> add new capabilities to it.
>
> -Chris
>