[LLVMdev] Landing my new development on the trunk ...

Brian West bnwest at rice.edu
Thu Oct 28 09:38:36 PDT 2010


On 10/27/10 8:34 PM, Eli Friedman wrote:
> On Wed, Oct 27, 2010 at 1:29 PM, Brian West<bnwest at rice.edu>  wrote:
>> Here is the patch for the new Operator Strength Reduction optimization
>> pass that I have written.  The bulk of the code is in
>>
>> lib/Transforms/Scalar/OperatorStrengthReduce.cpp
>>
>> The algorithm finds reduction opportunities in both array accesses and
>> explicit multiplications within loops.
>>
>> Next, I plan on writing the regression tests.
>
> Comments:
> 1. Please don't top-post.
> 2. Instruction::getOpcodeName instead of your OpcodeToString function.

Can do. I did not see getOpcodeName() in 
lib/Target/CBackend/CBackend.cpp, so I wrote my own. FWIW I am only 
using OpcodeToString() for debug output.


> 3. LLVM already has a significant amount of infrastructure for loop
> passes; why does this pass have its own code for finding loops?
I saw the loop infrastructure for CFG loops. This algorithm finds loops 
in the data flow (more precisely: strongly-connected components in the 
SSA-graph), e.g.

bb5:
  %20 = add nsw i32 %i.0, 1
  br label %bb6

bb6:
  %i.0 = phi i32 [ 0, %entry ], [ %20, %bb5 ]

There is a data flow loop between %20 and %i.0. The OSR paper has a nice 
figure showing data flow loops.

Here is a small excerpt from the OSR paper:

"OSR is driven by a simple depth-first search of the SSA-graph, using 
Tarjan’s strongly-connected component finder. The SCC-finder lets OSR 
discover induction variables in topological order and process them as 
they are discovered. As the SCCs are discovered, they are processed by a 
set of mutually-recursive routines that test for induction variables and 
region constants, and then reduce the appropriate operations."


> 4. What's the use case for this pass?  Can this pass improve
> performance of generated code beyond the existing -loop-reduce pass?
> If not, could it usefully act as a faster replacement for -loop-reduce
> for fast code generation? (Perhaps someone doing fast code generation
> with LLVM could comment on whether -loop-reduce is useful, and whether
> the performance is an issue.)

-loop-reduce (LSR) only looks at implicit multiplication in array accesses. The following code has an explicit multiplication within a loop that LSR will not reduce:


extern float
loop4(float* a, int start, int stop)
{
    int i;
    float sum;
    float *pEntry;
    char *p;
    unsigned long offset;
    const unsigned int float_size = sizeof(float);

    sum = 0.0;
    p = (char*)a;
    for (i = start-1; i < stop; ++i) {
        offset = i * float_size;
        pEntry = (float*) (p + offset);
        sum += *pEntry;
    }

    return sum;
}

OSR also finds reductions in some esoteric array accesses that LSR 
misses. I found these LSR misses while writing less-than-real-world code 
to break OSR. I can include these examples, if requested.

For array accesses, OSR and LSR produce similar LLVM IR, and I would 
guess that their execution times are similar.

Empirically, OSR is faster at compile time than LSR. I have also 
noticed that LSR has more "analysis" requirements: Induction Variable 
Users, Natural Loop Information, Canonicalize natural loops, and Scalar 
Evolution Analysis. Both OSR and LSR require the Dominator Tree 
Construction analysis pass.

I did not mention in the original email (and should have) that OSR needs 
-instcombine to be run after it for cleanup. Also, -licm, -reassociate, 
-gvn, and -sccp can be enabling optimizations for OSR.


> -Eli
Eli,

Thanks for your comments.

Brian West



