[PATCH] D15722: [WIP][Polly] SSA Codegen

Thu Apr 7 04:37:01 PDT 2016

Meinersbur added a comment.

> > Is it correct that this gives us O(n*m) generated PHIs (n=number of MK_PHIKind, m=number of statements)?

> 

> 

> Mh, it is unfortunatly not that simple. First, "n" should also include MK_Value but that is not even the problem.

Right; didn't think about MK_Value at that moment.

> The problem is with "m" which is not clearly defined. "m" has to be the number of "join-points" in the generated CFG. This number can grow exponentially with the number of statements/blocks in the generated CFG (which can be arbitrarily higher than the number of statements in the SCoP).

I was thinking about statements in the isl_ast, but meant the number of nodes in there. For each isl_ast_node (isl_ast_node_for and isl_ast_node_if), can there be at most one "join-point"?

> Thus, one can achieve O(n*2^m) PHIs, but that also means the optimized version has 2^m different paths through the region which will cause explosions in various other places too. That said, I am not even sure this kind of analysis makes sense because:

> 

> We will now only generate a PHI (for one particular SSA-value) if it is "needed" to join different versions of the original "value". While these PHIs can be dead, the total number of PHIs is linear in the number of paths.

The number of paths is already 2^m? Per join-point we can also have has many PHIs as scalars, no?

> Additionally, if a path does not introduce a new version of the SSA-value there won't be a PHI once it is joined and if it does contain a new version of an SSA-value we would have placed a store before and mem2reg would have placed the PHI later. The main difference in the complexity comes from dead PHIs that are (as mentioned) limited by the number of paths and could be removed easily after code generation by us.

Just exploring whether we can have a case where the number of unused PHIs grows excessively such that it would introduce another scaling problem.

> In short: BBMaps are "intra-statement" maps for __all__ local ssa-values but ScalarMaps are (control-)scope-aware "inter-statement" maps for __some__ ssa-values.

> 

> Summarized:

> 

> 1. ScalarMaps are used to remember mappings between statements

> 2. ScalarMaps are (control-)scope-aware, thus each (control-) scope (here loop or conditional) has it's own ScalarMap that is initialized with the ScalarMap of the parent (control-) scope.

> 3. ScalarMaps do not contain all ssa-value mappings but only those that are interesting, thus have a SAI object.

Thank you for the explanation.

================
Comment at: include/polly/Support/ScopHelper.h:176
@@ +175,3 @@
+///
+/// @returns The merged mappings in @p MergeMap.
+void createMergePHIs(PollyIRBuilder &Builder,
----------------
jdoerfert wrote:
> Meinersbur wrote:
> > This function returns void
> I know, therefor the comment in the @returns clause.
`@return`(s) is intended to document the function's return value, but this function doesn't return anything. MergeMap should be documented using eg.
```
  /// @param MergeMap Receives the merged mappings in @p MergeMap.
``` 

================
Comment at: lib/CodeGen/BlockGenerators.cpp:265
@@ +264,3 @@
+Value *BlockGenerator::createLoopCarriedPHI(Value *V, ValueMapT &BBMap) {
+  auto *PHI = PHINode::Create(V->getType(), 2, V->getName() + ".polly.lc");
+
----------------
jdoerfert wrote:
> Meinersbur wrote:
> > What is the reason to not insert the PHI into the generated BB at this point (but to keep them "floating")?
> This function and all callers are in the BlockGenerator, but the block where these PHIs are supposed to reside in [the loop header of a Polly generated loop] is created in the IslNodeBuilder (or more precise the LoopGenerator). Thus, we either keep them floating or remember loop header blocks somewhere in the BlockGenerator. I choose the first as we can easily place them in the IslNodeBuilder if they are actually needed or delete them otherwise.
OK

================
Comment at: lib/CodeGen/BlockGenerators.cpp:442
@@ +441,3 @@
+    // In the second case we map both, the PHICopy as well as the BaseAddr to
+    // the NewValue to allow uses textually before as well as after the scalar
+    // definition.
----------------
jdoerfert wrote:
> Meinersbur wrote:
> > "textually" doesn't seem the right word; we are not doing text processing here. Should be something about processing order.
> AFAIK this is a standard term that we also used in our discussions and emails before. Processing order would not meet the requirements anyway as the actual dynamically executed order is not what this is about. "textually" means the order in which you statically write/see/discover things. In other words the order in which you first see the statements in a linearized/dumped AST if you read it like a page (from top to bottom).
OK, but never seen this personally.

================
Comment at: lib/CodeGen/IslNodeBuilder.cpp:523-524
@@ +522,4 @@
+  auto PreMap = ScalarMap;
+  auto PreLCPHIs = LCPHIs;
+  LCPHIs.clear();
+
----------------
jdoerfert wrote:
> Meinersbur wrote:
> > std::move
> That would "destory" LCPHIs here, wouldn't it? AFAIK you shall not access something that you passed to std::move. We still need the container but just without its content.
std::swap then?
```
ValueMapT PreLCPHIs;
LCPHIs.swap(PreLCPHIs);
```
Although I'd agree this doesn't really make the intend clearer; It's up to you.

================
Comment at: lib/Support/ScopHelper.cpp:468
@@ +467,3 @@
+    if (!IncomingVal)
+      IncomingVal = UndefValue::get(Val->getType());
+
----------------
jdoerfert wrote:
> Meinersbur wrote:
> > If the incoming value is not found in ScalarMap, I can think of two reasons:
> > 1) It's defined before the Scop
> > 2) We forgot to put in there
> > 3) It was defined only in one incoming branch.
> > In none of these cases "undef" would make sense.  In 3) it cannot be used without an explicit PHI in the original code. What other case is there?
> 1) and 2) cannot/should never happen or would at least be bugs. 2) is clearly a bug and 1) should always be present.
> 
> The reason for the undef here is the same as above, thus 3).
> 
> 
> Starting with:
> ```
>    if (a) {
> S:   x = A[i];
> P:   A[i] = x;
>    }
> ```
> 
> the scheduler can generate:
> 
> ```
>    if (a) {
> S:   x = A[i];
>    }
> 
>    if (a) {
> P:   A[i] = x;
>    }
> ```
> 
> which needs a PHI (for x after the conditional containing S) even though the original code did not contain/need any. Additionally, there is only one path to that PHI that defines a value for x but we need an operand for the other path reaching the PHI. As I explained Sebastian, these undef's were basically present before, as "content" of the alloca slots we generated. And if you look at the code we generate at the momement for the example abvoe and run mem2reg you will see the same udnef's again.
Thank you for the explanation with example

================
Comment at: lib/Support/ScopHelper.cpp:479
@@ +478,3 @@
+
+  if (Values.size() == 1) {
+    MergePHI->eraseFromParent();
----------------
jdoerfert wrote:
> Meinersbur wrote:
> > Because Value.size() will be equal to BBs.size(), this condition can be tested before creating an not used MergePHI.
> It is not necessarily equal to BBs.size() as the mappings in several BBs can be the same. If that happens to be the case for all BBs we can remove the PHI immediatly but we do not know that until we looked up all mappings.
Thank you; I missed that Values is a set that 'collapses' equal values.

http://reviews.llvm.org/D15722