[PATCH] D28961: [SLP] Fix for PR31690: Allow using of extra values in horizontal instructions.

Wed Jan 25 02:03:29 PST 2017

ABataev added inline comments.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:4249
+            // TreeN is a candidate for extra args itself, mark it.
+            ExtraArgs.erase(TreeN);
+            auto &P = Stack.back();
----------------
mkuper wrote:
> Let me make sure I understand what's going on here.
> What happens is that we've added an operand to the tree, but it turns out that we shouldn't have, and we should actually be using it as an "extra value". So we're undoing it. Right?
> Why don't we catch it before we add it to the tree? Is it because it happens to have the same opcode as the reduction, and we only realize we're "stuck" once we looks at its operands?
> 
> In any case, regardless of whether that's entirely correct, this needs to be documented more clearly.
> Also, is this checked by any of the tests you added?
> 
I'll try to explain what's going on here.
Imagine we have something
```
%add1 = fadd fast float %extra_val1, %extra_val2
%add2 = fadd fast float %red_val1, %add1
%add3 = fadd fast float %red_val2, %add2
```
Here we have reduced values `%red_val1` and `%red_val2` and the reduction opcode is `fadd`.
Analysis of the horizontal reduction is from bottom to top and from left to right. So, we starting building the tree from the `%add3` operation. It is marked as a root. `%red_val2` is marked as reduced value. Then `%add2` op is marked as the next node of the tree. `%red_val1` is marked as next reduced value. Then `%add1` is marked as the next possible reduction operation. But during analysis of its argument `%extra_val1` it is detected that it is neither reduction operation, not reduced value. We mark `%extra_val1` as an `extra value` for the operation `%add1`. Then we start analysis of `%extra_val2`. Again, it is neither reductiomn operation, nor reduced value. But `%add1` already marked as (possible) reduction operation with an `extra value`. In this case we understand, that the whole `%add1` is an `extra value` itself for `%add2` reduction operation. We unmark `%add1` operation as a reduction operation with an `extra value` and mark `%add1` as an `extra value` for `%add2` reduction operation.
The main problem here that currently we're able to catch only 1-level operations with `extra values` (because of `if` in line 4251). I'll try to make it more universal.

Yes, this functionality is tested. See the test `test/Transforms/SLPVectorizer/X86/horizontal-list.ll`, function `@extra_args`. Analysis for the instruction `[[ADD:%.*]] = fadd fast float [[CONV]], 3.000000e+00` works this way.

Actually, I had to add this functionality on Monday, because something changed in codegen/optimization passes after I published this patch on Friday. Before Monday `clang` at `-O2/-O3` generated a bit different code and this piece of code was not required. But then some changes were made (maybe in InstCombiner) and the IR code has changed, so I just had to make this changes to make it detect such kind of constructs. 

https://reviews.llvm.org/D28961