[LLVMdev] Separating loop nests based on profile information?

Sun Jan 11 22:17:15 PST 2015

On Thu, Jan 8, 2015 at 3:33 PM, Philip Reames <listmail at philipreames.com>
wrote:

>
> On 01/07/2015 05:33 PM, Chandler Carruth wrote:
>
>> How does this compare with classical approaches of loop peeling,
>> partitioning, fission, or whatever you might call it?
>>
> I'm still developing a good sense for this, but let me lay out some
> observations.  Fair warning, this is going to be long and rambling.
>
> Let's start with a toy example:
> while(c) {
>   x = this->x;
>   y = this->y;
>   if (x == y) {
>     rare();
>   }
> }
>
>

> If we could tell x and y were loop invariant, we could unswitch this
> loop.  However, the rare call clobbers our view of memory, so LICM fails,
> and thus unswitch fails.
>
>

> We'd like to apply PRE here and push the reload into the loop preheader
> and the rare case.  This currently fails for two reasons: 1) We don't allow
> code duplication during PRE,

?????
If we don't, we aren't doing real PRE, so I'm not entirely sure what you
mean here.

> and 2) the canonical loop-simplify form of having a single latch means
> that PRE doesn't actually see the rare block at all, it sees the preheader
> and the join point after the if block.

>
> I think both of the previous are fixable:
>

GCC's PRE already does the above.
It doesn't do profile-guided duplication, and we aren't doing anything
special with these blocks.

Here is the code I used to test:

struct a {
  int x;
  int y;
};
extern void rare();
int mai(int c, struct a *this)
{
  int d = 0;
  while (c) {
    int x = this->x;
    int y = this->y;
    d += x + y;
    if (x == y) {
      rare();
    }
  }
  return d;
}

It does exactly what you'd expect; the loop is transformed into:

struct a {
  int x;
  int y;
};
extern void rare();
int mai(int c, struct a *this)
{
  int d = 0;
  /* Preheader: the loads are hoisted out of the loop. */
  int pretemp1 = this->x;
  int pretemp2 = this->y;

  while (c) {
    /* In SSA form, the values used here are the merges
       pretemp1phi = phi(rare block pretemp1, preheader pretemp1);
       pretemp2phi = phi(rare block pretemp2, preheader pretemp2); */
    d += pretemp1 + pretemp2;
    if (pretemp1 == pretemp2) {
      rare();
      /* Rare block: reload, since rare() may clobber memory. */
      pretemp1 = this->x;
      pretemp2 = this->y;
    }
  }
  return d;
}
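
To check that the rewritten loop computes the same result as the original, here is a small C sketch comparing the two side by side. It makes assumptions that are not in the message: `while (c)` is replaced by a bounded trip count `n` (the original never exits when `c` is nonzero), and the hypothetical `rare()` clobbers memory by incrementing `y`, so anything loaded before the call is stale after it. The names `loop_original`, `loop_pre` and `rare_target` are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

struct a {
  int x;
  int y;
};

/* Hypothetical stand-in for rare(): it clobbers memory by
   incrementing the y field of the struct being scanned, so any
   value loaded before the call is stale after it. */
static struct a *rare_target = NULL;

static void rare(void)
{
  assert(rare_target != NULL);
  rare_target->y++;
}

/* The test loop from the message, with while (c) replaced by a
   bounded trip count n (assumption: the original never exits
   when c is nonzero). */
int loop_original(int n, struct a *p)
{
  int d = 0;
  rare_target = p;
  for (int i = 0; i < n; i++) {
    int x = p->x;          /* loaded on every iteration */
    int y = p->y;
    d += x + y;
    if (x == y) {
      rare();
    }
  }
  return d;
}

/* The same loop after load PRE: the loads are hoisted to the
   preheader and repeated only on the rare path, after the call. */
int loop_pre(int n, struct a *p)
{
  int d = 0;
  rare_target = p;
  int pretemp1 = p->x;     /* preheader loads */
  int pretemp2 = p->y;
  for (int i = 0; i < n; i++) {
    d += pretemp1 + pretemp2;
    if (pretemp1 == pretemp2) {
      rare();
      pretemp1 = p->x;     /* reload in the rare block */
      pretemp2 = p->y;
    }
  }
  return d;
}
```

For example, starting from `{3, 3}` with `n = 5`, both functions return 34: the first iteration adds 6 and calls `rare()`, which bumps `y` to 4, and the remaining four iterations add 7 each. The only store to memory happens inside `rare()`, and the PRE version reloads right after it, so the hoisted values are never stale.
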
I don't see why profile-guided duplication is necessary here. This is a
basic load PRE case, and it is handled by the first version of GVN-based
load PRE I wrote for GCC. It is always a win.

Looking at what LLVM does, the failure on the PRE side is that our PRE/GVN
models are not strong enough to handle this. I'm not at all sure why we
think anything else is necessary. It certainly doesn't require special
code-duplication heuristics, etc.

So either you are thinking of a case that is different from the above, or I
am seriously confused :)