[llvm-dev] [RFC] Add IR level interprocedural outliner for code size.

Sean Silva via llvm-dev llvm-dev at lists.llvm.org
Tue Aug 1 12:28:07 PDT 2017


On Tue, Aug 1, 2017 at 10:28 AM, Daniel Berlin <dberlin at dberlin.org> wrote:

>
>>
>> Also as a side note, I think in the original MachineOutliner RFC thread
>> there was some confusion as to whether it was possible to solve the code
>> folding outlining problem exactly as a graph problem on SSA using standard
>> value numbering algorithms in polynomial time.
>>
>
>
>> I can elaborate further, but
>> 1. it is easy to see that you can map an arbitrary dag to an isomorphic
>> data flow graph in an SSA IR e.g. in LLVM IR or pre-RA MIR
>>
>
>
>> 2. Given two dags, you can create their respective isomorphic data flow
>> graphs (say, put them into two separate functions)
>> 3. An exact graph based code folding outliner would be able to discover
>> if the two dataflow graphs are isomorphic (that is basically what I mean by
>> exact) and outline them.
>> 4. Thus, graph isomorphism on dags can be solved with such an algorithm
>> and thus the outlining problem is GI-hard and a polynomial time solution
>> would be a big breakthrough in CS.
>>
>
> First, you'd have to reduce them in both directions to prove that ;)
>
> All that's been shown is that you can reduce it to a hard problem.
> You can also reduce it to 3-sat., but that doesn't mean anything unless
> you can reduce 3-sat to it.
>

Only one direction is needed to prove it is GI-hard; I think you are
thinking of proving it GI-complete. Solving a GI-hard problem in
polynomial time proves that GI is in P, which would be a major breakthrough
in CS.
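
To spell that reduction out, here is a rough Python sketch; `exact_fold` is
just a stand-in for the hypothetical exact code folding outliner (not any
real API), and `dag_to_ssa_block` is the construction I describe further
down in this mail:

```
# Rough sketch of the GI-hardness reduction.  `exact_fold` stands in for the
# assumed "exact" outliner: it folds the two bodies into a single outlined
# function iff their whole data flow graphs are isomorphic.  Not a real API.
def dags_isomorphic(dag_a, dag_b, exact_fold):
    body_a = dag_to_ssa_block(dag_a)  # construction sketched further down
    body_b = dag_to_ssa_block(dag_b)
    # If exact_fold ran in polynomial time, this would decide graph
    # isomorphism in polynomial time; one direction is enough for hardness.
    return exact_fold(body_a, body_b)
```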


>
>> 5. The actual problem the outliner is trying to solve is actually more
>> like finding subgraphs that are isomorphic, making the problem even harder
>> (something like "given dags A and B does there exist a subgraph of A that
>> is isomorphic to a subgraph of B")
>>
>>
> This assumes, strongly, that this reduction is the way to do it, and also
> that SSA/etc imparts no structure in the reduction that enables you to
> solve it faster (IE restricts the types of graphs, etc)
>
> FWIW, the exact argument above holds for value numbering, and since the
> days of kildall, it was believed not solvable in a complete fashion in less
> than exponential time due to the need to do graph isomorphism on large
> value graphs at join points.
>
> Except, much like RA and other things, it turns out this is not right for
> SSA, and in the past few years, it was proved you could do complete  value
> numbering in polynomial time on SSA.
>

Can you explain what structure SSA imparts besides just modeling a dag? The
description I gave above is phrased solely in terms of a data flow graph (a
dag of, say, `add` instructions) in a single BB; there are no phi nodes or
anything.

I'm thinking something like:

For each vertex v (with label %foo) in the dag, in topological order:
  append to the BB an instruction `add %foo <- %bar, %baz, %qux, ...`,
where `%bar, %baz, %qux, ...` are the labels of all vertices u such that
an edge u->v exists.

The BB generated this way is guaranteed to be SSA.

(for simplicity I've assumed an N-ary add instruction, but it is obvious
you can isomorphically desugar that into two-operand instructions)
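
Here is a minimal runnable Python sketch of that construction (the textual
`add %x <- ...` form and the `...` leaves are just the informal notation
used in this mail, not real IR):

```
from graphlib import TopologicalSorter

# Map an arbitrary dag (vertex -> list of operand vertices) to a textual
# single-BB body in the notation used in this mail.  Leaves (vertices with
# no operands) become live-in definitions.
def dag_to_ssa_block(dag):
    lines = []
    # static_order() yields every vertex after all of its operands, so each
    # name is defined before it is used and the block is valid SSA.
    order = TopologicalSorter({v: set(ops) for v, ops in dag.items()}).static_order()
    for v in order:
        if dag.get(v):
            lines.append(f"add %{v} <- " + ", ".join(f"%{u}" for u in dag[v]))
        else:
            lines.append(f"%{v} = ...")  # leaf / live-in vertex
    return "\n".join(lines)

# Prints the same shape as the run of adds in the GVN example further down.
print(dag_to_ssa_block({"0": [], "1": ["0", "0"], "2": ["1", "1"], "3": ["2", "2"]}))
```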

> So with all due respect to quentin, i don't buy it yet.
>
> Without more, i'd bet using similar techniques to solve value numbering in
> polynomial time to SSA could be applied here.
>
> This is because the complete polynomial value numbering techniques are in
> fact, value graph isomorphism ....
>

My understanding is that standard complete GVN algorithms are looking for:

%0 = ...
add %1 <- %0, %0
add %2 <- %1, %1
add %3 <- %2, %2
...
add %1 <- %0, %0
add %2 <- %1, %1
add %3 <- %2, %2
...

and identifying the repeated
```
add %1 <- %0, %0
add %2 <- %1, %1
add %3 <- %2, %2
```


However, a code folding outliner like the one discussed here wants to be
able to identify the two sequences as equivalent even if the second run of
adds doesn't have %0 as an input.

%0 = ...
add %1 <- %0, %0
add %2 <- %1, %1
add %3 <- %2, %2
...
%foo0 = ...
add %foo1 <- %foo0, %foo0
add %foo2 <- %foo1, %foo1
add %foo3 <- %foo2, %foo2
...

A code folding outliner wants to find:

outlinedFunc(%arg) {
add %1 <- %arg, %arg
add %2 <- %1, %1
add %3 <- %2, %2
}

In other words, I think the algorithms you are thinking of prove equivalent
*values*, whereas what the code folding outliner is looking for is
equivalent *computations* (which may have different inputs at the different
sites where they occur, and so do not necessarily compute identical values
at run time). For example, %3 and %foo3 have different values at run time.

As another example:

%0 = ...
add %1 <- %0, %0
mul %2 <- %1, %1
add %3 <- %2, %2

%foo0 = ...
add %foo1 <- %foo0, %foo0
mul %foo2 <- %foo1, %foo1

%bar1 = ...
mul %bar2 <- %bar1, %bar1
add %bar3 <- %bar2, %bar2

The algorithm we want for the code folding outliner should identify that
```
add %1 <- %0, %0
mul %2 <- %1, %1
```
is the same computation as the `foo` computation
and
```
mul %2 <- %1, %1
add %3 <- %2, %2
```
is the same computation as the `bar` computation.
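
Continuing the sketch from above, those overlapping matches fall out of
comparing canonical keys of contiguous sub-runs (again just an illustration
over the toy tuple encoding):

```
# Using canonicalize() from the sketch above: slide over contiguous sub-runs
# of the main sequence and compare canonical keys against the foo/bar runs.
main = [("add", "%1", ("%0", "%0")),
        ("mul", "%2", ("%1", "%1")),
        ("add", "%3", ("%2", "%2"))]
foo = [("add", "%foo1", ("%foo0", "%foo0")),
       ("mul", "%foo2", ("%foo1", "%foo1"))]
bar = [("mul", "%bar2", ("%bar1", "%bar1")),
       ("add", "%bar3", ("%bar2", "%bar2"))]

assert canonicalize(main[0:2]) == canonicalize(foo)  # add+mul prefix matches foo
assert canonicalize(main[1:3]) == canonicalize(bar)  # mul+add suffix matches bar
```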



Or to put it yet another way, if you have:

foo(a,b,c) {
  a single BB with data flow graph G
}

bar(d,e,f) {
  a single BB with a data flow graph isomorphic to G, subject to d=b, e=a, c=f
}

This is easy to do with a standard VN algorithm if you knew a priori that
d=b, e=a, c=f and just seeded the initial congruence classes accordingly.
The hard part, I think, manifests in discovering that correspondence. (And
in general the number of such live-in values to the subgraphs being compared
is O(the size of the graph), so the algorithm must be polynomial in the
number of such live-in values.)

How does the algorithm you are thinking of infer the equivalence of those
live-in values?
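
To illustrate where the cost shows up, here is a naive Python sketch of that
discovery step, using a purely structural hash-based VN (no commutativity or
operand canonicalization) over the same made-up tuple encoding as above:
once a live-in correspondence is fixed, checking is cheap, but the naive
search tries every bijection between the live-in sets, and the number of
live-ins can grow with the size of the graph.

```
from itertools import permutations

# Hash-based "value number" of `result` in a straight-line body, with the
# live-ins pre-seeded: seeded live-ins that share a number are treated as
# congruent from the start.
def value_number(body, result, seeds):
    vn = dict(seeds)   # name -> value number
    for opcode, dst, operands in body:
        vn[dst] = (opcode, tuple(vn[op] for op in operands))
    return vn[result]

# Brute-force discovery: try every bijection between the two argument lists
# and keep the first one under which the results become congruent.  This is
# O(n!) in the number of live-ins.
def find_live_in_correspondence(body_a, args_a, ret_a, body_b, args_b, ret_b):
    if len(args_a) != len(args_b):
        return None
    for perm in permutations(args_b):
        seeds_a = {a: i for i, a in enumerate(args_a)}
        seeds_b = {b: i for i, b in enumerate(perm)}
        if value_number(body_a, ret_a, seeds_a) == value_number(body_b, ret_b, seeds_b):
            return dict(zip(args_a, perm))
    return None

# A tiny concrete stand-in for the foo/bar example above (made up, but
# consistent with d=b, e=a, c=f), assuming each function returns one value:
foo_body = [("add", "%x", ("a", "b")), ("mul", "%r", ("%x", "c"))]
bar_body = [("add", "%y", ("e", "d")), ("mul", "%s", ("%y", "f"))]
print(find_live_in_correspondence(foo_body, ["a", "b", "c"], "%r",
                                  bar_body, ["d", "e", "f"], "%s"))
# -> {'a': 'e', 'b': 'd', 'c': 'f'}, i.e. d=b, e=a, f=c
```

(A real algorithm would obviously try to prune that search; the point is
just that the seeded-VN check is the easy half and finding the
correspondence is the hard half.)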



Sorry for really digging in here, but I've thought about this a lot since we
talked about it at the social; I can't seem to find a problem with my
reasoning for the GI-hardness proof and am genuinely curious.

-- Sean Silva



>
> So i'd assume the opposite (it can be done in polynomial time) without
> more.
>
>
>