[LLVMdev] Landing my new development on the trunk ...
Brian West
bnwest at rice.edu
Thu Oct 28 09:38:36 PDT 2010
On 10/27/10 8:34 PM, Eli Friedman wrote:
> On Wed, Oct 27, 2010 at 1:29 PM, Brian West <bnwest at rice.edu> wrote:
>> Here is the patch for the new Operator Strength Reduction optimization
>> pass that I have written. The bulk of the code is in
>>
>> lib/Transforms/Scalar/OperatorStrengthReduce.cpp
>>
>> The algorithm finds reduction opportunities in both array accesses and
>> explicit multiplications within loops.
>>
>> Next, I plan on writing the regression tests.
>
> Comments:
> 1. Please don't top-post.
> 2. Instruction::getOpcodeName instead of your OpcodeToString function.
Can do. I did not see getOpcodeName() in
lib/Target/CBackend/CBackend.cpp, so I wrote my own. FWIW I am only
using OpcodeToString() for debug output.
> 3. LLVM already has a significant amount of infrastructure for loop
> passes; why does this pass have its own code for finding loops?
I saw the loop infrastructure for CFG loops. This algorithm finds loops
in the data flow (more precisely: strongly-connected components in the
SSA-graph), e.g.
bb5:
%20 = add nsw i32 %i.0, 1
br label %bb6
bb6:
%i.0 = phi i32 [ 0, %entry ], [ %20, %bb5 ]
There is a data flow loop between %20 and %i.0. The OSR paper has a nice
figure showing data flow loops.
Here is a small excerpt from the OSR paper:
"OSR is driven by a simple depth-first search of the SSA-graph, using
Tarjan’s strongly-connected component finder. The SCC-finder lets OSR
discover induction variables in topological order and process them as
they are discovered. As the SCCs are discovered, they are processed by a
set of mutually-recursive routines that test for induction variables and
region constants, and then reduce the appropriate operations."
> 4. What's the use case for this pass? Can this pass improve
> performance of generated code beyond the existing -loop-reduce pass?
> If not, could it usefully act as a faster replacement for -loop-reduce
> for fast code generation? (Perhaps someone doing fast code generation
> with LLVM could comment on whether -loop-reduce is useful, and whether
> the performance is an issue.)
-loop-reduce (LSR) only looks at implicit multiplication in array accesses. The following code has an explicit multiplication within a loop that LSR will not reduce:
extern float
loop4(float* a, int start, int stop)
{
    int i;
    float sum;
    float *pEntry;
    char *p;
    unsigned long offset;
    const unsigned int float_size = sizeof(float);

    sum = 0.0;
    p = (char*)a;
    for (i = start-1; i < stop; ++i) {
        offset = i * float_size;
        pEntry = (float*) (p + offset);
        sum += *pEntry;
    }
    return sum;
}
OSR also finds reductions in some esoteric array accesses that LSR
misses. I found these LSR misses while writing less-than-real-world code
to break OSR. I can include these examples, if requested.
For array accesses, OSR and LSR produce similar LLVM IR, and I would
guess their execution times are similar as well.
Empirically the OSR optimization is compile-time faster than LSR. I have
also noticed that LSR has more "analysis" requirements: Induction
Variable Users, Natural Loop Information, Canonicalize natural loops, and
Scalar Evolution Analysis. Both OSR and LSR require the Dominator Tree
Construction analysis pass.
I did not mention in the original email (and should have) that OSR needs
-instcombine to be run after it for cleanup. Also -licm, -reassociate,
-gvn and -sccp can be enabling optimizations for OSR.
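For concreteness, the pass ordering I have in mind would look like this on the opt command line; -osr here is a placeholder for whatever flag the new pass registers, and the enabling passes are optional:

```
opt -licm -reassociate -gvn -sccp -osr -instcombine input.bc -o output.bc
```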
> -Eli
Eli,
Thanks for your comments.
Brian West