[PATCH] D54517: [CGP] Limit Complex Addressing mode by number of BasicBlocks to traverse

Serguei Katkov via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 20 19:57:31 PST 2018


skatkov added a comment.

In https://reviews.llvm.org/D54517#1304803, @bjope wrote:

> Basically I think the code looks ok. However, I'm not so familiar with this algorithm so it is hard to comment about the actual solution.
>
> My understanding is that you introduce a threshold, and if the size of the TraverseOrder vector grows past the threshold we bail out from findCommon. So what is the impact of this?
>  I assume it means that we limit the amount of "Complex Addressing mode" optimizations somehow. Is this limit only hit for "large" programs? When compiling a program that hits the threshold, do we lose some optimizations or will the amount of Complex Addressing mode optimizations in such a program be reduced significantly?
>
> How did you choose the current threshold? (that is probably something people want to know when trying to finetune this 4 years from now, so it could be nice to say something about it in the commit msg even if it just is an estimate)


Hi Bjorn, thank you for looking into this patch.

The optimization itself does the following:
Suppose we have a load from a pointer p. We traverse all paths from p back to the original pointers, skipping phi nodes and selects. Suppose we find that p can actually be p.1, p.2, .., p.N, where each p.i = gv.i + base.i + index.i * scale.i + offset.i.
If all parts are the same for every i, we apply the optimization (move the pointer computation close to the load to create a complex addressing load).
If the p.i differ in only one field (gv, base or index), we go to the complex case which is targeted by this patch.
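As a hypothetical sketch (the names and types are invented, not taken from the patch), the complex case looks roughly like this in IR: two incoming pointers that differ only in their base, merged by a phi and then loaded from:

```llvm
; Hypothetical example: %p differs only in the base across predecessors.
define i64 @example(i1 %c, i64* %base1, i64* %base2, i64 %idx) {
entry:
  br i1 %c, label %bb1, label %bb2
bb1:
  %p1 = getelementptr inbounds i64, i64* %base1, i64 %idx
  br label %merge
bb2:
  %p2 = getelementptr inbounds i64, i64* %base2, i64 %idx
  br label %merge
merge:
  %p = phi i64* [ %p1, %bb1 ], [ %p2, %bb2 ]
  %v = load i64, i64* %p
  ret i64 %v
}
```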

Say the difference is in the base. Generally we want to build a number of phi nodes so that we have the right base at the load.
The size of TraverseOrder is actually approximately the number of basic blocks between the load's BB and all the p.i's BBs, traversing by predecessors from the load's BB.
The best benefit is when all the phi nodes for the base already exist: then we do not create new phi nodes, and we probably remove redundant ones (those used to compute p).
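A hypothetical sketch of the intended result (invented names, same caveat as above): instead of a phi over the computed pointers, the rewrite keeps one phi over the bases and sinks the address computation next to the load, where the target can fold it into a base+index*scale addressing mode:

```llvm
; Fragment of the merge block after the transform: the phi is now
; over the bases, and the GEP sits next to the load.
merge:
  %b = phi i64* [ %base1, %bb1 ], [ %base2, %bb2 ]
  %p = getelementptr inbounds i64, i64* %b, i64 %idx
  %v = load i64, i64* %p
  ret i64 %v
```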

The number 100 has no specific meaning; it is simply a threshold large enough to accept a reasonably big CFG.
PR39625 contains a test which seems to be a corner case for this optimization: the distance between the load and the original p.i reaches values of about 16000 there. I did several runs with different values for this threshold and chose 100 as giving reasonable compile time even in a debug build.

I hope it helps.


https://reviews.llvm.org/D54517
