[PATCH] D35761: [Polly][WIP] Use SCEV information for the second level aliasing

Sat Aug 5 02:19:32 PDT 2017

gareevroman updated this revision to Diff 109861.
gareevroman added a comment.

> 1. Why do we need to use SCEV? Should we not be able to tell from our information which base pointers are expected to be identical?

I think we should be able to tell that. However, it seems we don't do it.

The test case swaps the following access relations (the original JSCoP can be found in the new version of the patch):

  {
    "kind" : "read",
    "relation" : "{ Stmt_for_body6[i0, i1, i2] -> MemRef_C1[0] }"
  }

  {
    "kind" : "read",
    "relation" : "{ Stmt_for_body6[i0, i1, i2] -> MemRef_C[i0, i1] }"
  }

Subsequently, for some reason, Polly generates two different forms of access to the same elements of the matrix C. For example, in the case of C[3][7], Polly generates the following code:

    %459 = mul nsw i64 96, %polly.indvar98
    %460 = mul nsw i64 4, %polly.indvar134
    %461 = add nsw i64 %459, %460
    %462 = add nsw i64 %461, 3
    %463 = mul nsw i64 8, %polly.indvar128
    %464 = add nsw i64 %463, 7
    %465 = mul nsw i64 256, %polly.indvar
    %466 = add nsw i64 %465, %polly.indvar140
    br label %polly.stmt.for.body6940

  polly.stmt.for.body6940:                          ; preds = %polly.stmt.for.body6914
    %polly.access.cast.C941 = bitcast [1024 x double]* %C to double*
    %467 = mul nsw i64 96, %polly.indvar98
    %468 = mul nsw i64 4, %polly.indvar134
    %469 = add nsw i64 %467, %468
    %470 = add nsw i64 %469, 3
    %polly.access.mul.C942 = mul nsw i64 %470, 1024
    %471 = mul nsw i64 8, %polly.indvar128
    %472 = add nsw i64 %471, 7
    %polly.access.add.C943 = add nsw i64 %polly.access.mul.C942, %472
    %polly.access.C944 = getelementptr double, double* %polly.access.cast.C941, i64 %polly.access.add.C943
    %tmp_p_scalar_945 = load double, double* %polly.access.C944, align 8, !alias.scope !136, !noalias !137
    %polly.access.cast.Packed_A946 = bitcast [24 x [256 x [4 x double]]]* %Packed_A to double*
    %polly.access.mul.Packed_A947 = mul nsw i64 %polly.indvar134, 256
    %polly.access.add.Packed_A948 = add nsw i64 %polly.access.mul.Packed_A947, %polly.indvar140
    %polly.access.mul.Packed_A949 = mul nsw i64 %polly.access.add.Packed_A948, 4
    %polly.access.add.Packed_A950 = add nsw i64 %polly.access.mul.Packed_A949, 3
    %polly.access.Packed_A951 = getelementptr double, double* %polly.access.cast.Packed_A946, i64 %polly.access.add.Packed_A950
    %tmp1_p_scalar_952 = load double, double* %polly.access.Packed_A951, align 8, !alias.scope !7, !noalias !10
    %polly.access.cast.Packed_B953 = bitcast [256 x [256 x [8 x double]]]* %Packed_B to double*
    %polly.access.mul.Packed_B954 = mul nsw i64 %polly.indvar128, 256
    %polly.access.add.Packed_B955 = add nsw i64 %polly.access.mul.Packed_B954, %polly.indvar140
    %polly.access.mul.Packed_B956 = mul nsw i64 %polly.access.add.Packed_B955, 8
    %polly.access.add.Packed_B957 = add nsw i64 %polly.access.mul.Packed_B956, 7
    %polly.access.Packed_B958 = getelementptr double, double* %polly.access.cast.Packed_B953, i64 %polly.access.add.Packed_B957
    %tmp2_p_scalar_959 = load double, double* %polly.access.Packed_B958, align 8, !alias.scope !6, !noalias !8
    %p_mul960 = fmul double %tmp1_p_scalar_952, %tmp2_p_scalar_959
    %p_add961 = fadd double %tmp_p_scalar_945, %p_mul960
    %polly.access.C1962 = getelementptr double, double* %C1, i64 0
    %tmp3_p_scalar_963 = load double, double* %polly.access.C1962, align 8, !alias.scope !3, !noalias !13
    %p_add18964 = fadd double %tmp3_p_scalar_963, %p_add961
    %scevgep965 = getelementptr [1024 x double], [1024 x double]* %C, i64 %462, i64 %464
    store double %p_add18964, double* %scevgep965, align 8, !alias.scope !138, !noalias !139
    %polly.indvar_next141 = add nsw i64 %polly.indvar140, 1
    %polly.loop_cond142 = icmp sle i64 %polly.indvar_next141, 255
    br i1 %polly.loop_cond142, label %polly.loop_header137, label %polly.loop_exit139

Let's consider how accesses to the matrix C are formed. Indices of the first dimension are equal:

…

  %459 = mul nsw i64 96, %polly.indvar98
  %460 = mul nsw i64 4, %polly.indvar134
  %461 = add nsw i64 %459, %460
  %462 = add nsw i64 %461, 3

…

  %467 = mul nsw i64 96, %polly.indvar98
  %468 = mul nsw i64 4, %polly.indvar134
  %469 = add nsw i64 %467, %468
  %470 = add nsw i64 %469, 3

…

Indices of the second dimension are equal too:

…

  %463 = mul nsw i64 8, %polly.indvar128
  %464 = add nsw i64 %463, 7

…

  %471 = mul nsw i64 8, %polly.indvar128
  %472 = add nsw i64 %471, 7

…

However, to store the value, we access the two-dimensional array:

  %scevgep965 = getelementptr [1024 x double], [1024 x double]* %C, i64 %462, i64 %464
  store double %p_add18964, double* %scevgep965, align 8

To read the value, we access the one-dimensional array:

  %polly.access.cast.C941 = bitcast [1024 x double]* %C to double*

…

  %polly.access.mul.C942 = mul nsw i64 %470, 1024

…

  %polly.access.add.C943 = add nsw i64 %polly.access.mul.C942, %472
  %polly.access.C944 = getelementptr double, double* %polly.access.cast.C941, i64 %polly.access.add.C943
  %tmp_p_scalar_945 = load double, double* %polly.access.C944, align 8
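To make it easier to see that both forms address the same element, here is a small C++ sketch of the offset arithmetic. It is only a model of the two getelementptr forms above, not Polly code. The struct Indvars and all function names are hypothetical, and the element size of 8 bytes for double is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Induction variables that appear in the IR above (hypothetical container).
struct Indvars {
  std::int64_t indvar98;
  std::int64_t indvar134;
  std::int64_t indvar128;
};

const std::int64_t kElemBytes = 8;   // assumption: size of double in bytes
const std::int64_t kRowElems = 1024; // from the type [1024 x double]

// Row index, as computed for %462 and %470.
inline std::int64_t rowIndex(const Indvars &V) {
  return 96 * V.indvar98 + 4 * V.indvar134 + 3;
}

// Column index, as computed for %464 and %472.
inline std::int64_t colIndex(const Indvars &V) {
  return 8 * V.indvar128 + 7;
}

// Store form: getelementptr [1024 x double], [1024 x double]* %C, row, col.
// The first index steps over whole rows, the second over single elements.
inline std::int64_t storeByteOffset(const Indvars &V) {
  return rowIndex(V) * (kRowElems * kElemBytes) + colIndex(V) * kElemBytes;
}

// Load form: cast to double*, then one linear index row * 1024 + col.
inline std::int64_t loadByteOffset(const Indvars &V) {
  return (rowIndex(V) * kRowElems + colIndex(V)) * kElemBytes;
}

// Both forms must yield the same byte offset from %C.
inline void checkSameLocation(const Indvars &V) {
  assert(storeByteOffset(V) == loadByteOffset(V));
}
```

Calling checkSameLocation with any values of the induction variables should pass, since the two expressions differ only in how the multiplication by the row size is distributed.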

Consequently, as far as I understand, we have two different base pointers that point to the same location. Since we compare raw pointers to determine a second level alias set, Polly generates different second level alias sets for these read and write accesses to the matrix C. In the case of DeLiCM, we have a similar situation. However, I haven't managed to get a reduced test case.

It would probably be good to fix the redundant code generation in Polly itself. Unfortunately, I'm busy at the moment and can't do it. In any case, I think it makes sense for the second level aliasing to use the SCEV information, since that makes it more robust.

> 2. What exactly must be checked in the new test case here.

Check that we don't create different alias sets for the same location when it is represented by different raw pointers.

> What was generated before, why was this wrong,

Previously, we generated 64 second level alias sets instead of 32, since we compared raw pointers to determine second level alias sets.

> and what does the new code generate.

32 second level alias sets.

> Also, I may have missed this earlier. Why are the number of elements in the check lines growing so much. Is this expected?

> CHECK-NEXT: !43 = !{!3, !4, !0, !6, !7, !11, !14, !16, !18, !20, !22, !24, !26, !28, !30, !32, !34, !36, !38, !40}

I think this is expected. The innermost loop computes 32 different elements of the matrix C. Consequently, 32 different second level alias sets are generated.
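The difference between 64 and 32 sets can be reproduced with a toy model. The sketch below is not the patch and does not use SCEV or any LLVM API. The struct Access and the function names are hypothetical, a (base, byte offset) pair stands in for a canonical pointer expression, and the 4 x 8 tile shape is an assumption inferred from the constants 4 and 8 in the IR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one memory access; not a Polly or LLVM type.
struct Access {
  std::string RawPointer;  // stands in for the identity of the pointer value
  std::string Base;        // underlying array, here always "C"
  std::int64_t ByteOffset; // offset of the element from the base
};

// Assumption: a 4 x 8 tile of C, each element read once and written once,
// with the read and the write using two distinct pointer values.
inline std::vector<Access> microKernelAccesses() {
  std::vector<Access> Accesses;
  for (std::int64_t Row = 0; Row < 4; ++Row) {
    for (std::int64_t Col = 0; Col < 8; ++Col) {
      const std::int64_t Offset = (Row * 1024 + Col) * 8;
      const std::string Suffix =
          std::to_string(Row) + "." + std::to_string(Col);
      Accesses.push_back({"polly.access.C." + Suffix, "C", Offset}); // load
      Accesses.push_back({"scevgep." + Suffix, "C", Offset});        // store
    }
  }
  return Accesses;
}

// One set per distinct raw pointer, as in the comparison of raw pointers.
inline std::size_t countSetsByRawPointer(const std::vector<Access> &Accesses) {
  std::set<std::string> Keys;
  for (const Access &A : Accesses)
    Keys.insert(A.RawPointer);
  return Keys.size();
}

// One set per distinct location, keyed by a canonical (base, offset) pair.
inline std::size_t countSetsByLocation(const std::vector<Access> &Accesses) {
  std::set<std::pair<std::string, std::int64_t>> Keys;
  for (const Access &A : Accesses)
    Keys.insert(std::make_pair(A.Base, A.ByteOffset));
  return Keys.size();
}
```

In this model, keying by raw pointer yields one set per access, while keying by location merges the read and the write of each element into one set.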

P.S.: Sorry, I've found out that we check the wrong output. I've updated the test case.

https://reviews.llvm.org/D35761

Files:
  include/polly/CodeGen/IRBuilder.h
  lib/CodeGen/IRBuilder.cpp
  test/ScheduleOptimizer/kernel_gemm___%for.body---%for.end24.jscop
  test/ScheduleOptimizer/kernel_gemm___%for.body---%for.end24.jscop.transformed
  test/ScheduleOptimizer/pattern-matching-based-opts_12.ll
