[PATCH/RFC] Pre-increment preparation pass
Hal Finkel
hfinkel at anl.gov
Tue Jan 29 13:31:23 PST 2013
Hello again,
When targeting PPC A2 cores, use of pre-increment loads and stores is very important for performance. The fundamental problem with the generation of pre-increment instructions is that most loops are not naturally written to take advantage of them. For example, a loop written as:
for (int i = 0; i < N; ++i) {
  x[i] = y[i];
}
needs to be transformed to look more like this:
T *a = &x[-1], *b = &y[-1];
for (int i = 0; i < N; ++i) {
  *++a = *++b;
}
I've attached a pass that performs this transformation. For unrolled loops (or other situations with multiple accesses to the same array), the lowest-offset use is transformed into the pre-increment access, and the other accesses are rewritten to address relative to the pre-increment access. I schedule this pass after LSR (loop strength reduction) is run.
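To make the unrolled case concrete, here is a rough source-level sketch of the before and after shapes for a copy loop unrolled by two. This is only an illustration, not code from the patch: the function names are hypothetical, n is assumed to be even, and the second version forms pointers two elements before x and y, so as plain C it is only valid when those elements actually exist (the pass itself does not work at the source level).

```c
#include <stddef.h>

/* Hypothetical example: loop unrolled by two, with both accesses
   addressed from the induction variable. n is assumed to be even. */
void copy_unrolled(double *x, const double *y, size_t n) {
  for (size_t i = 0; i < n; i += 2) {
    x[i] = y[i];
    x[i + 1] = y[i + 1];
  }
}

/* Same loop in the prepared shape. The lowest-offset access (offset 0)
   becomes the pre-increment access with a stride of two elements; the
   other access uses a constant offset from the updated pointer.
   Assumption: x - 2 and y - 2 point at valid elements of the same arrays. */
void copy_unrolled_preinc(double *x, const double *y, size_t n) {
  double *a = x - 2;
  const double *b = y - 2;
  for (size_t i = 0; i < n; i += 2) {
    *(a += 2) = *(b += 2); /* pre-increment load and store */
    a[1] = b[1];           /* addressed off the pre-incremented pointers */
  }
}
```

Both functions should produce the same result; the second just exposes one pointer update per array per iteration, which is the pattern a pre-increment load or store can absorb.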
This seems to work pretty well, but I don't know if this is the best way of doing what I'd like. Should this really be part of LSR somehow?
In case you're curious, for inner loops (where this really matters), the induction variable is often moved into the special loop counter register, and so removing the dependence of the addressing on the induction variable is also a good thing.
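As a sketch of what that looks like, the loop below keeps a trip counter that is used only to decide when to stop, while all addressing goes through the pre-incremented pointers. Again this is a hypothetical source-level illustration (the name copy_counted is made up), and it assumes one valid element exists before x and y so that forming x - 1 and y - 1 is legal C.

```c
#include <stddef.h>

/* Hypothetical example: the counter 'remaining' never feeds an address
   computation, so it could live entirely in a dedicated loop counter
   register. Assumption: x - 1 and y - 1 point at valid elements. */
void copy_counted(int *x, const int *y, size_t n) {
  int *a = x - 1;
  const int *b = y - 1;
  for (size_t remaining = n; remaining != 0; --remaining) {
    *++a = *++b; /* addressing depends only on a and b */
  }
}
```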
Thanks again,
Hal
--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: incamprep.patch
Type: text/x-patch
Size: 19886 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130129/d6112103/attachment.bin>