[PATCH] D153493: [dataflow] avoid more accidental copies of Environment

Mon Jun 26 13:26:01 PDT 2023

sammccall added a comment.

In D153493#4450191 <https://reviews.llvm.org/D153493#4450191>, @ymandel wrote:

> I don't want to block the progress you're making elsewhere and I think the concerns here are sufficiently localized to revisit another time. So, feel free to reify any suggestions/disagreements in FIXMEs and submit.
>
> In D153493#4449993 <https://reviews.llvm.org/D153493#4449993>, @sammccall wrote:
>
>>> Regarding the design. This looks like an optimal solution in terms of copying but introduces lifetime risks and complexity that I'm uncomfortable with.
>>
>> FWIW I agree with the complexity (not the lifetime risks).
>> I think it ultimately stems from having too many interacting types & functions with complicated and inconsistent signatures & constraints (TypeErasedDataflowAnalysisState, Environment, TypeErasedLattice, Lattice, the various join functions, etc) so we resort to case-bashing.
>> However this system is difficult to make changes to, and getting consensus as to what changes are good also doesn't seem easy. So I think we need to be able to fix bugs within the existing design (where copies need to be avoided ad-hoc).
>
> I realize the complexity is frustrating, but I don't see how that's related to the issue here. The complexity of the `JoinedStateBuilder` is caused by the desire to minimize copies. The multiple cases come directly from the algorithm itself, most particularly that we may or may not be in an initial state, and the previous block may or may not have an environment we care about. At least, that is the specific issue I'm concerned with.

We have 9 state transitions (3 states x 3 operations). Each would be less than half as complicated if it didn't have to deal with Lattice's join being mutating and Environment's being non-mutating.

I believe further simplifications are possible (essentially: back at the callsite track `vector<const Env*> All` plus `deque<Environment> Owned` for owned copies, special case `All.size() == 0` and `All.size() == 1` and otherwise `joinVec(All)`.
However with Lattice's mutating join involved this no longer works, and after addressing that it's no longer simpler.

>>> There's a lot of flexibility here and I wonder if we can reduce that without (substantially?) impacting the amount of copying. Specifically, I wonder if we can have the builder *always* own an environment, which simplifies the handling of CurrentState and, generally, the number of cases to consider. It costs more in principle, but maybe negligible in practice?
>>
>> I don't understand the specific suggestion:
>>
>> - if we create the builder lazily, we're just moving handling this extra state into the callsites
>> - if the builder owns a copy of initenv, its FC token will end up in the result FC
>> - if the builder owns a copy of initenv and detects when it should discard it, that just sounds like nullptr with extra steps
>>
>> can you clarify?
>
> I suppose I'm thinking along the lines of:
>
>> - if the builder owns a copy of initenv, its FC token will end up in the result FC
>
> Can you expand on why this is a problem? That would help motivate the complexity.

The purpose of this change is to stop introducing meaningless variables into the FC, because they impede debugging and slow down the SAT solver.
An extra join with a copy of InitEnv introduces up to 3 extra variables: one for InitEnv, one for the copy, one for the join.

> "join with InitEnv" should reduce to the identity function, so if its not, that feels like a problem we're sweeping under the rug.

Only in a mathematical sense - in reality humans need to read these SAT systems, and we're using a solver that can't perform that reduction.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153493/new/

https://reviews.llvm.org/D153493