[PATCH/RFC] Pre-increment preparation pass

Mon Feb 4 20:42:19 PST 2013

On Feb 4, 2013, at 7:33 PM, Hal Finkel <hfinkel at anl.gov> wrote:

>> Here are some possible next steps for improving pre/post inc
>> generation:
>> 
>> - Fix DAGCombine so that it preserves the IV chains formed at
>> IR-level.
> 
> If you're talking about what I think you're talking about, then at least for the case where all of the offsets are constants, I've already worked on this. I have a patch on the list; see my e-mail titled "Constant folding around pre-increment loads and stores." The problem does not generally prevent pre-increment formation, but without the fix it makes the result less useful.
> 
>> 
>> - Modify LSR to make use of target hooks to detect IV chains that will
>>  result in pre/post-inc ld/st formation. Use that information to
>>  guide heuristics so that we generate those chains in more cases,
>>  rather than purely attempting to reduce register pressure. Handle
>>  the cases that matter to you without regressing other
>>  targets. Possibly add some detection of common idioms if that makes
>>  it easier.
>> 
>> - Add a very simple straight-line address-chain formation pass after LSR
>>  to clean up simple ld/st sequences. This would need to form phis. It
>>  also probably could be done without SCEV.
> 
> If you don't mind, I'd appreciate some more specific advice. First, is the current implementation of LSR capable of performing this transformation:
>>> for (int i = 0; i < N; ++i) {
>>>   x[i] = y[i];
>>> }
>>> needs to be transformed to look more like this:
>>> T *a = &x[-1], *b = &y[-1];
>>> for (int i = 0; i < N; ++i) {
>>>   *++a = *++b;
>>> }
> or is this what the "straight-line address-chain formation pass" you imagine would do? If LSR can do this, what contributes to the decision of whether or not it should be done? In some sense, this is the most important part because this is what enables using the pre-increment forms in the first place. Convincing LSR to otherwise form the chains in unrolled loops seems to be a function of the chain cost functions. Where should I start looking to see how to modify those?

I hope that LSR can handle any case involving unrolled inner loops. 
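
To make the two loop shapes in the question concrete, here is a minimal C sketch of the indexed form and the pre-increment form. The function names and the check helper are hypothetical, chosen only for illustration. One assumption matters at the source level: forming a pointer one element before the start of an array is not defined in C, so the sketch requires callers to pass pointers into the interior of a larger buffer. A compiler working on IR does not have that restriction.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Indexed form: every access computes base + i. */
void copy_indexed(int *x, const int *y, size_t n) {
    for (size_t i = 0; i < n; ++i)
        x[i] = y[i];
}

/* Pre-increment form: both pointers start one element before the data
 * and are bumped before each access, which maps onto pre-increment
 * load/store instructions on targets that have them.
 * Assumption: x - 1 and y - 1 are valid pointers, i.e. the caller
 * passes pointers into the interior of larger arrays. */
void copy_preinc(int *x, const int *y, size_t n) {
    int *a = x - 1;
    const int *b = y - 1;
    for (size_t i = 0; i < n; ++i)
        *++a = *++b;
}

/* Hypothetical helper: run both forms and check they agree. */
void check_copy_forms(void) {
    int src[5] = {0, 1, 2, 3, 4};  /* src[0] is padding */
    int d1[5] = {0};               /* d1[0] is padding  */
    int d2[5] = {0};               /* d2[0] is padding  */
    copy_indexed(d1 + 1, src + 1, 4);
    copy_preinc(d2 + 1, src + 1, 4);
    assert(memcmp(d1, d2, sizeof d1) == 0);
    assert(d2[0] == 0 && d2[4] == 4);
}
```

Both functions store the same values; the only difference is how the addresses are produced, which is the part LSR's chain heuristics would have to decide.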

By "straight-line" address chains, I mean not involving a phi and not requiring any induction variable recognition.
This new pass would be very simple, but could cover a series of ld/st operations that don't involve a loop index, like structure initialization. Naturally, it would also work on outer loops, non-loops, or non-simplified loops.
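
As a rough illustration of what such a straight-line chain could look like, the C sketch below initializes a four-word record in two ways: once with each store addressed as base plus a constant offset, and once with each address derived from the previous one. The function names and the record layout are hypothetical, and this only mimics at source level what the pass would do on IR.

```c
#include <assert.h>
#include <string.h>

/* Offset form: each store is addressed independently as rec + constant. */
void init_offsets(int *rec) {
    rec[0] = 1;
    rec[1] = 2;
    rec[2] = 3;
    rec[3] = 4;
}

/* Chained form: each address is the previous address plus one element,
 * so every store after the first can use a pre-increment addressing
 * mode. No loop, phi, or induction variable is involved. */
void init_chained(int *rec) {
    int *p = rec;
    *p = 1;
    *++p = 2;
    *++p = 3;
    *++p = 4;
}

/* Hypothetical helper: both forms must produce the same record. */
void check_init_forms(void) {
    int r1[4], r2[4];
    init_offsets(r1);
    init_chained(r2);
    assert(memcmp(r1, r2, sizeof r1) == 0);
    assert(r2[0] == 1 && r2[3] == 4);
}
```

Whether the chained form is actually smaller or faster depends on the target's addressing modes, so this is only a sketch of the shape, not a claim about profitability.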

I actually don't think this would be useful to you. I was just pointing out that it's a missing optimization and might be useful in reducing code size.

-Andy