[PATCH] Cleanup / consolidation of small loop unroll logic
Arnold Schwaighofer
aschwaighofer at apple.com
Tue Jan 28 19:57:19 PST 2014
On Jan 28, 2014, at 6:29 PM, Chandler Carruth <chandlerc at gmail.com> wrote:
> Does this look OK? I just wanted to double-check that there wasn't a specific reason to keep these two things separate. Hoping to get some good benchmarks on the new stuff you added soon.
>
I am a little concerned about this part:
+  if (!Legal->getRuntimePointerCheck()->Need &&
       LoopCost < SmallLoopCost) {
This now guards all unrolling, so we no longer unroll vectorized code that needs a runtime check, which we did before. For example, code like the following snippet does not get unrolled:
void foo_func(float *A, float *B, int N) {
  for (int i = 0; i < N; ++i)
    A[i] *= B[i];
}
For optimal throughput we might want to vectorize and unroll this loop, and we don't mind paying for the runtime check.
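To make the trade-off concrete, here is a rough hand-written approximation of what vectorizing and unrolling foo_func behind a runtime check amounts to: an overlap test on A and B selects a fast path that loads each chunk of four elements before storing it (standing in for vector loads and stores), with a scalar loop as the fallback. This is a sketch, not the vectorizer's actual output; the name foo_func_versioned and the unroll factor of 4 are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical hand-versioned equivalent of foo_func. The fast path stands in
// for "vectorized and unrolled" code: each chunk of 4 elements is loaded
// before it is stored, which is only correct if A and B do not overlap.
void foo_func_versioned(float *A, const float *B, int N) {
  assert(N >= 0 && "negative trip count");
  int i = 0;
  // Runtime pointer check: do [A, A+N) and [B, B+N) overlap?
  std::uintptr_t a = reinterpret_cast<std::uintptr_t>(A);
  std::uintptr_t b = reinterpret_cast<std::uintptr_t>(B);
  std::uintptr_t Bytes = static_cast<std::uintptr_t>(N) * sizeof(float);
  bool NoOverlap = a + Bytes <= b || b + Bytes <= a;
  if (NoOverlap) {
    // Fast path, unrolled by an assumed factor of 4.
    for (; i + 4 <= N; i += 4) {
      float a0 = A[i], a1 = A[i + 1], a2 = A[i + 2], a3 = A[i + 3];
      float b0 = B[i], b1 = B[i + 1], b2 = B[i + 2], b3 = B[i + 3];
      A[i] = a0 * b0;
      A[i + 1] = a1 * b1;
      A[i + 2] = a2 * b2;
      A[i + 3] = a3 * b3;
    }
  }
  // Scalar remainder, or the whole loop if the check failed.
  for (; i < N; ++i)
    A[i] *= B[i];
}
```

When the arrays overlap, the check fails and the function degrades to the original scalar loop, so results always match foo_func; the check itself is only a couple of comparisons per call.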
I did not want unrolling for the load/store ports to create an extra runtime check if there wasn't one already, so I added the "!Legal->getRuntimePointerCheck()->Need" guard. Initially this code was in the if (VF == 1) section, and that distinction got lost when I ported the patch. It should have been something like:
  if (EnableLoadStoreRuntimeUnroll &&
      (VF > 1 || !Legal->getRuntimePointerCheck()->Need) &&
      LoopCost < SmallLoopCost) {
    // Unroll until store/load ports (estimated by max unroll factor) are
    // saturated.
    unsigned UnrollStores = UF / (Legal->NumStores ? Legal->NumStores : 1);
    unsigned UnrollLoads = UF / (Legal->NumLoads ? Legal->NumLoads : 1);
    UF = std::max(std::min(UnrollStores, UnrollLoads), 1u);
    return UF;
  }
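The corrected guard and the port-saturation formula can be exercised in isolation. The sketch below is a hypothetical free function, not LoopVectorize code: loadStorePortUF and its parameters are assumptions, with plain integers and a bool standing in for UF, VF, Legal->NumStores, Legal->NumLoads and Legal->getRuntimePointerCheck()->Need, and the EnableLoadStoreRuntimeUnroll flag assumed on and omitted. It returns std::nullopt when the guard does not apply, so a caller would fall through to the other heuristics.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>

// Hypothetical stand-alone version of the guarded load/store-port heuristic.
// MaxUF plays the role of the incoming UF (the target's max unroll factor).
std::optional<unsigned> loadStorePortUF(unsigned MaxUF, unsigned VF,
                                        unsigned NumStores, unsigned NumLoads,
                                        bool NeedRuntimeCheck,
                                        unsigned LoopCost,
                                        unsigned SmallLoopCost) {
  assert(MaxUF >= 1 && VF >= 1 && "factors must be positive");
  // For VF > 1 any runtime check is already required by vectorization, so
  // unrolling adds no extra check. For VF == 1 unrolling alone would
  // introduce one, so such loops are skipped.
  if (!((VF > 1 || !NeedRuntimeCheck) && LoopCost < SmallLoopCost))
    return std::nullopt; // guard not met: fall through to other heuristics
  // Unroll until the load/store ports (estimated by the max unroll factor)
  // are saturated; never go below 1.
  unsigned UnrollStores = MaxUF / (NumStores ? NumStores : 1);
  unsigned UnrollLoads = MaxUF / (NumLoads ? NumLoads : 1);
  return std::max(std::min(UnrollStores, UnrollLoads), 1u);
}
```

For example, loadStorePortUF(8, 4, 2, 1, true, 10, 20) yields 4 (8 / 2 stores), so a vectorized loop like foo_func stays eligible despite its runtime check, while the same call with VF = 1 is rejected.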
Or for your patch:
-  if (EnableLoadStoreRuntimeUnroll &&
-      !Legal->getRuntimePointerCheck()->Need &&
+  // We want to unroll small loops in order to reduce the loop overhead and
+  // potentially expose ILP opportunities.
+  DEBUG(dbgs() << "LV: Loop cost is " << LoopCost << '\n');
+  if ((VF > 1 || !Legal->getRuntimePointerCheck()->Need) &&
       LoopCost < SmallLoopCost) {
+    // We assume that the cost overhead is 1 and we use the cost model
+    // to estimate the cost of the loop and unroll until the cost of the
+    // loop overhead is about 5% of the cost of the loop.
+    unsigned SmallUF = std::min(UF, (unsigned)PowerOf2Floor(SmallLoopCost / LoopCost));
+
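The overhead heuristic in the new hunk reduces to simple arithmetic. Assuming the loop overhead costs 1, unrolling by about SmallLoopCost / LoopCost makes the unrolled body cost roughly SmallLoopCost, so the overhead is about 1 / SmallLoopCost of it; that matches the comment's 5% if SmallLoopCost is 20, a value assumed here for illustration. The sketch reimplements this outside LLVM: powerOf2Floor is a hand-written stand-in for LLVM's PowerOf2Floor, and smallLoopUF is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>

// Stand-in for llvm::PowerOf2Floor: largest power of two <= X, or 0 for X == 0.
unsigned powerOf2Floor(unsigned X) {
  unsigned P = 0;
  // Bit becomes 0 after shifting out the top bit, which ends the loop.
  for (unsigned Bit = 1; Bit != 0 && Bit <= X; Bit <<= 1)
    P = Bit;
  return P;
}

// Hypothetical helper mirroring the hunk: cap the target's unroll factor UF
// by the power-of-two floor of SmallLoopCost / LoopCost.
unsigned smallLoopUF(unsigned UF, unsigned LoopCost, unsigned SmallLoopCost) {
  assert(LoopCost > 0 && LoopCost < SmallLoopCost && "only for small loops");
  return std::min(UF, powerOf2Floor(SmallLoopCost / LoopCost));
}
```

With LoopCost = 3 and SmallLoopCost = 20, 20 / 3 is 6, which rounds down to 4, so an incoming UF of 8 is capped at 4. Because the guard ensures LoopCost < SmallLoopCost, the quotient is at least 1 and the result never drops below 1 for UF >= 1.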
> Also, unrelated, but can you commit the register pressure tweak yet? Still need more benchmarking?
Yes, this still needs benchmarking. If you want, I can commit it behind a flag.
More information about the llvm-commits mailing list