[LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass

Tue Jul 30 21:13:58 PDT 2013

Hi Tobias and Sven,

Thanks for your discussion and suggestion. 

@Sven: ISL actually allows users to have different identifiers with the same name. The problem that we have discussed is caused by incorrect usage of isl_space in Polly, so please do not worry about ISL library. You can skip the following information related to Polly implementation.

@Tobias and Polly developers:
I have attached a patch file to fix this problem. The key idea is to keep ISL library always seeing the same single parameter for those memory accesses that use the same index variables even though they have different constant base values.

For the example we have seen before:
for(i = 0; i < 8; i++) { for (ctr = 0; ctr < 8; ctr++) { x1 = input[i*64 + ctr*8 + 1] ; x0 = input[i*64 + ctr*8 + 0] ; input[i*64 + ctr*8 + 0] = x0 - x1; input[i*64 + ctr*8 + 1] = x0 + x1; input[i*64 + ctr*8 + 2] = x0 * x1; }

Without this patch file, Polly would produce the Context as follows:

  Context:
[p_0, p_1, p_2] -> { : p_0 >= -9223372036854775808 and p_0 <= 9223372036854775807 and p_1 >= -9223372036854775808 and p_1 <= 9223372036854775807 and p_2 >= -9223372036854775808 and p_2 <= 9223372036854775807 } p0: {0,+,128}<%for.cond2.preheader> p1: {2,+,128}<%for.cond2.preheader> p2: {4,+,128}<%for.cond2.preheader> Statements { Stmt_for_body6 Domain := [p_0, p_1, p_2] -> { Stmt_for_body6[i0] : i0 >= 0 and i0 <= 7 }; Scattering := [p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> scattering[0, i0, 0] }; ReadAccess := [p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_0 + 16i0 }; ReadAccess := [p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_1 + 16i0 }; MustWriteAccess := [p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_0 + 16i0 }; MustWriteAccess := [p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_1 + 16i0 }; MustWriteAccess := [p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_2 + !
 16i0 }; }

Furthermore, it will produce very complex RAW, WAW and WAR dependence as follows:

RAW dependences: [p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> Stmt_for_body6[o0] : (exists (e0 = [(p_1)/2], e1 = [(7p_1 + p_2 + 112i0)/16]: 16o0 = -p_0 + p_1 + 16i0 and 2e0 = p_1 and 16i0 >= -p_1 + p_2 and 16i0 <= 112 + p_0 - p_1 and p_1 >= 16 + p_0 and i0 >= 0 and 16e1 >= -15 + 7p_1 + p_2 + 112i0 and 16e1 <= -1 + 7p_1 + p_2 + 112i0 and p_2 >= 16 + p_0)) or ... or (exists (e0 = [(p_1)/2], e1 = [(-p_1 + p_2)/16]: 16o0 = p_0 - p_1 + 16i0 and 2e0 = p_1 and 16e1 = -p_1 + p_2 and 16i0 >= -p_0 + p_2 and 16i0 <= 112 - p_0 + p_1 and p_1 <= -16 + p_0 and p_2 >= 16 + p_0)) }

It not only leads to significant compile-time overhead, but also produces incorrect dependence results.  As we can see from the source code, there should be no RAW, WAW and WAR across loop iterations at all. 

With this patch file, Polly would produce the following Context and RAW, WAW, WAR dependence:

Context:
    [p_0] -> {  : p_0 >= -9223372036854775808 and p_0 <= 9223372036854775807 }
    p0: {0,+,128}<%for.cond2.preheader>
    Statements {
    Stmt_for_body6
            Domain :=
                [p_0] -> { Stmt_for_body6[i0] : i0 >= 0 and i0 <= 7 };
            Scattering :=
                [p_0] -> { Stmt_for_body6[i0] -> scattering[0, i0, 0] };
            ReadAccess := 
                [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_0 + 16i0 };
            ReadAccess := 
                [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = 2 + p_0 + 16i0 };
            MustWriteAccess := 
                [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_0 + 16i0 };
            MustWriteAccess := 
                [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = 2 + p_0 + 16i0 };
            MustWriteAccess := 
                [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = 4 + p_0 + 16i0 };
    }

RAW dependences:
[p_0] -> {  }
WAR dependences:
[p_0] -> {  }
WAW dependences:
[p_0] -> {  }

We can see it only creates a single parameter and it can catch the exact RAR, WAW and WAW dependence.

To see whether our patch file could catch normal data dependence, I have constructed another example as follows:

  for(i = 0; i < 8; i++) {
    for (ctr = 0; ctr < 8; ctr++) {
      x0 = input[i*64 + ctr + 0] ; 
      x1 = input[i*64 + ctr + 1] ;
      input[i*64 + ctr + 0] = x0 - x1;
      input[i*64 + ctr + 1] = x0 + x1;
      input[i*64 + ctr + 2] = x0 * x1;
    }

The original Polly would produce similar complex results. However, with the attached patch file, Polly would produce the following Context and RAW, WAW, WAR dependence:
Context:
    [p_0] -> {  : p_0 >= -2147483648 and p_0 <= 2147483647 }
    p0: {0,+,128}<%for.cond2.preheader>
    Statements {
    Stmt_for_body6
            Domain :=
                [p_0] -> { Stmt_for_body6[i0] : i0 >= 0 and i0 <= 7 };
            Scattering :=
                [p_0] -> { Stmt_for_body6[i0] -> scattering[0, i0, 0] };
            ReadAccess := 
                [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_0 + 2i0 };
            ReadAccess := 
                [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = 2 + p_0 + 2i0 };
            MustWriteAccess := 
                [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_0 + 2i0 };
            MustWriteAccess := 
                [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = 2 + p_0 + 2i0 };
            MustWriteAccess := 
                [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = 4 + p_0 + 2i0 };
    }

RAW dependences:
[p_0] -> { Stmt_for_body6[i0] -> Stmt_for_body6[1 + i0] : exists (e0 = [(p_0)/2]: 2e0 = p_0 and i0 >= 0 and i0 <= 6) }
WAR dependences:
[p_0] -> {  }
WAW dependences:
[p_0] -> { Stmt_for_body6[i0] -> Stmt_for_body6[1 + i0] : exists (e0 = [(p_0)/2]: 2e0 = p_0 and i0 >= 0 and i0 <= 6) 

Results show that it still creates a single parameter and catches the exact RAR, WAW and WAW dependence.  Of course, this patch file makes the compiling very faster, i.e., compile-time is reduced from several minutes to less then 1 second.

@Tobias:
>This is obviously a hack. The base is not always a constant.
>You can probably just call use something like,
>isl_pw_aff *BaseValue = visit(AR->getOperand(0))
>Affine = isl_pw_aff_sum(Affine, BaseValue);
Currently, I only handle constant base values because I have not found a good way to handle general base values. isl_pw_aff_add requires two isl_pw_aff parameters, but Affine is actually of isl_aff type. Perhaps we could first commit a patch file to handle common cases, then we can considering submitting another patch file to handle general cases.
>I think this is the right idea, but probably the wrong place to put it.
>I would put this into SCEVValidator::visitAddRecExpr. This function 
>always adds the AddRecExpr itself as a parameter, whenever it is found 
>to be parametric. However, what we should do is to create a new ScevExpr
>that starts at zero and is otherwise identical. We then add this as a 
>parameter. When doing this, we now also need to keep all the parameters
>that have been found previously in the base expression.
A lot of Polly functions access ParameterIds and Parameters using existing ScevExpr. If we create new ScevExpr and put them into Parameters and ParameterIds, it may require some extra tricks to handle the mapping from existing ScevExpr to newly created ScevExpr. I think we can consider fixing it later.
Cheers,
Star Tan

At 2013-07-30 00:27:27,"Tobias Grosser" <tobias at grosser.es> wrote:
>On 07/29/2013 09:15 AM, Sven Verdoolaege wrote:
>> On Mon, Jul 29, 2013 at 07:37:14AM -0700, Tobias Grosser wrote:
>>> On 07/29/2013 03:18 AM, Sven Verdoolaege wrote:
>>>> On Sun, Jul 28, 2013 at 04:42:25PM -0700, Tobias Grosser wrote:
>>>>> Sven: In terms of making the behaviour of isl easier to understand,
>>>>> it may make sense to fail/assert in case operands have parameters that
>>>>> are named identical, but that refer to different pointer values.
>>>>
>>>> No, you are allowed to have different identifiers with the same name.
>>>> I could optionally print the pointer values, but then I'd have
>>>> to think about what to do with them when reading a textual
>>>> representation of a set with such pointer values in them.
>>>
>>> Yes, this is how it is today.
>>
>> No, the pointer values are currently not printed.
>
>I was referring to the first sentence. I do not think printing pointer 
>values is what we want. It would make the output unpredictable not only 
>when address space randomisation is involved.
>
>>> I wondered if there is actually a need to
>>> allow the use of different identifiers with the same name (except all being
>>> unnamed?). I personally do not see such a need and would prefer isl to
>>> assert/fail in case someone tries to do so. This may avoid confusions as
>>> happened here. Do you see a reason why isl should allow this?
>>
>> Removing this feature would break existing users.
>
>Even if it would, the benefits for future users may outweigh this.
>Also, are you aware of a user that actually breaks?
>
>Anyway, on the Polly side we know the behaviour and can handle it. So 
>this is nothing I am very strong about. I just mentioned it as it seemed 
>to be a good idea.
>
>Cheers,
>Tobias
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130731/274bbbb4/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-ScopInfo-Create-shared-parameter-for-memory-accesses.patch
Type: application/octet-stream
Size: 6694 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130731/274bbbb4/attachment.obj>