[PATCH] D81416: [LV] Interleave to expose ILP for small loops with scalar reductions.

Fri Aug 21 12:33:45 PDT 2020

fhahn added a comment.

In D81416#2230857 <https://reviews.llvm.org/D81416#2230857>, @AaronLiu wrote:

> In D81416#2230628 <https://reviews.llvm.org/D81416#2230628>, @fhahn wrote:
>
>> Just to double check, wouldn't it be sufficient if loop-unroll would unroll the loop? Or is this not happening? It seems like `loop-unroll` would unroll the loop in the test-case.
>
> It would unroll the loop in the testcase, but it will not break dependence and expose ILP, will not help performance, and actually unroll hurt performance in this case.

I didn't took a very close look, but wouldnt unrolling generate similar code as interleaving, modulo different order of instructions,  but with similar compute instructions trees?

Also, it seems like the loop already gets interleaved on current trunk (With IC=2) and this patch makes it more aggressive. Might be good to describe how the IC gets boosted by this option.

================
Comment at: llvm/test/Transforms/LoopVectorize/PowerPC/interleave_IC.ll:5
+;void fun(Vector<double>       &MatrixB,
+;         const Vector<double> &MatrixA,
+;         const unsigned int * const start,
----------------
AaronLiu wrote:
> fhahn wrote:
> > If C++ code is included, it would be good if it would be self-contained and build-able. Otherwise I am not sure what value it adds?
> We want to show the original problem, but do not want to copy the original code. For easy to understand, we use kind of pseudo C++ code, to show the characteristics of the original code which has reductions inside of nested loops, and indirect references for induction variables and reduction operands, etc.  The original self-contained and build-able codes are too complex to show here.
Right, I guess it involves guessing what the types and operators do. I think updating the IR to use more descriptive names rather than `tmp*` can also go a long why to make things easier to follow.

================
Comment at: llvm/test/Transforms/LoopVectorize/PowerPC/interleave_IC.ll:55
+%14 = type { %15, i64 }
+%15 = type { i32, %15*, %15*, %15* }
+%16 = type { i32 (...)**, i8* }
----------------
AaronLiu wrote:
> fhahn wrote:
> > Area all those types necessary? Would be good to clean up the test, including the GEPs with null/undef, otherwise the test might be painful to update in the future.
> The testcase is extracted and reduced from a real application which has very complex and nested data structures. This is the reason why it has those types defined in the IR.
sure, unfortunately the reduction tools are not perfect. Still, ideally the test would be as small as necessary to illustrate the problem. Otherwise it will potentially become a burden when making further changes. There are only a few memory accesses in the test and it should be possible to update them to use regular types. 

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81416/new/

https://reviews.llvm.org/D81416