[Patch] GVN fold conditional-branch on-the-fly

Fri Sep 6 22:25:48 PDT 2013

>
>> As far as I can tell,  what  GVN at llvm really need to improve is detecting
>> mem-op redundancy. But this is not GVN's fault.
>>
> Actually, it is, in part.  GVN does not detect a lot of  opportunities
> because it cannot detect the memory operations are the same.  This is
> because it does not understand how to do things like value number
> loads and stores for real, etc.   It knows how to pull values out of a
> direct must-alias store/load dependence, which is not the same.  It
> relies on having eliminated everything else in the way, and
> canonicalized everything in the world up to that point.   It does not
> really understand stores and loads at all.
>
In particular, GVN's memory-related slowness (but not its lack of
power), and the memdep calls, are really symptoms of a larger
problem.
It comes from the fact that GVN is really solving a dataflow problem
in the most expensive way possible.  It's not really value numbering,
it's part of redundancy elimination.

The problem it is currently solving is:

Given this store, what values does it generate (including various
possible permutations and coercions), and how far down the CFG are
these values still available?
Given a load, what stored values are still available at this point?

It's not actually asking whether they are the *same* (which would be
value numbering), just whether one can be replaced with the other
(which is redundancy elimination). It doesn't care that they are the
same, only whether they are must-dep and one can be coerced into the
other.
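
To make "coerced" concrete, here is a minimal sketch of pulling a
narrower load's value out of a wider stored constant. It is a toy
model, not LLVM code: the function name and parameters are made up for
illustration, and it assumes a must-alias base pointer, a constant
stored value, and byte-granular offsets.

```python
def coerce_stored_value(value, store_bytes, load_offset, load_bytes,
                        byteorder="little"):
    """Return the integer a load would see, or None if it cannot be
    satisfied from the store (hypothetical helper, toy model only).

    value       -- non-negative integer that was stored
    store_bytes -- width of the store in bytes
    load_offset -- byte offset of the load from the store's address
    load_bytes  -- width of the load in bytes
    """
    # The load must be fully covered by the bytes the store wrote.
    if load_offset < 0 or load_offset + load_bytes > store_bytes:
        return None
    raw = value.to_bytes(store_bytes, byteorder)
    return int.from_bytes(raw[load_offset:load_offset + load_bytes],
                          byteorder)


# A 1-byte load at offset 0 of a 4-byte little-endian store.
low_byte = coerce_stored_value(0x11223344, 4, 0, 1)
```

Note that the two values are not the *same* here (an i32 and an i8),
which is why this is a replacement question and not a value numbering
question.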

The way GVN computes this is:
 for each load
   ask memdep for all dependences
   if the dependence is local
      see what values are generated by this dependence, and whether
      they can be coerced into what the load wants
   if the dependence is non-local
      look at up to 100 dependences to see what values are generated
      by the dependences, and whether they can all be coerced into
      what the load wants (see AnalyzeLoadAvailability, which is a
      misnomer, it's really computing value availability)
      if this fails due to unavailable values, also see if it can
      do a cheap form of PRE
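
A rough sketch of the non-local part of that loop, to show where the
cost comes from. This is a toy model and not memdep's API: the CFG is
a dict of predecessor lists, `last_store` is a made-up map from block
to the last value stored per address, and there is no aliasing. The
only number taken from GVN is the limit of 100.

```python
def nonlocal_store_deps(preds, last_store, start, addr, limit=100):
    """Walk backwards from `start`, collecting the nearest store to
    `addr` on every path (hypothetical helper, toy model only).

    Returns (deps, blocks_visited). deps is None if the walk gives up:
    either more than `limit` dependences were found, or some path
    reaches a block with no predecessors without seeing a store.
    """
    deps = []
    seen = set()
    stack = list(preds[start])
    visited = 0
    while stack:
        block = stack.pop()
        if block in seen:
            continue
        seen.add(block)
        visited += 1
        value = last_store.get(block, {}).get(addr)
        if value is not None:
            deps.append((block, value))
            if len(deps) > limit:
                return None, visited  # too many dependences, give up
            continue  # this path is resolved, do not walk past the store
        if not preds[block]:
            return None, visited  # value unavailable along this path
        stack.extend(preds[block])
    return deps, visited


# Diamond CFG: entry -> {left, right} -> join.
preds = {"entry": [], "left": ["entry"], "right": ["entry"],
         "join": ["left", "right"]}
last_store = {"left": {"p": 1}, "right": {"p": 1}}

# Every load in "join" repeats the same backward walk from scratch.
deps_a, cost_a = nonlocal_store_deps(preds, last_store, "join", "p")
deps_b, cost_b = nonlocal_store_deps(preds, last_store, "join", "p")
```

Nothing is shared between the two calls; the total work is the sum of
the per-load walks.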

There are three problems with doing it this way:

1. It ends up repeatedly computing what each store/load/meminst
generates in terms of values.
2. For non-local dependences, this involves walking the CFG backwards a
*lot* to gather the entire set of non-local dependences (it gives up
when there are more than 100).  It proceeds to do this for *every
load*, instead of computing a dataflow set and propagating it through
blocks (since memory is not in SSA in LLVM, you can't do this
sparsely) like a normal dataflow problem.
Because of this, it does repeated follow-all-paths CFG walking inside
memdep, and unnecessary dependence finding, because it doesn't
actually care about all the dependences (and in fact, on finding
certain types of dependences, it could short-circuit everything
and just give up).
3. The repeated invalidation of memdep caches makes the above two much worse.
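
For contrast, here is a minimal sketch of the "normal dataflow
problem" formulation: compute the set of available stored values once
per block and propagate it forward, intersecting at merge points. It
is a toy model under loud assumptions: addresses are plain names that
never alias, a fact is an (address, value) pair, and all function and
variable names are made up. It is not how LLVM or GCC implement this.

```python
from collections import deque


def available_stores(succs, stores, entry):
    """Forward must-availability over a toy CFG (illustrative only).

    succs  -- block -> list of successor blocks
    stores -- block -> ordered list of (addr, value) stores
    Returns block -> frozenset of (addr, value) available on entry.
    """
    blocks = list(succs)
    preds = {b: [] for b in blocks}
    for b, ss in succs.items():
        for s in ss:
            preds[s].append(b)
    universe = frozenset(f for b in blocks for f in stores.get(b, []))

    def transfer(block, facts):
        out = set(facts)
        for addr, value in stores.get(block, []):
            out = {f for f in out if f[0] != addr}  # store kills old facts
            out.add((addr, value))                  # and generates a new one
        return frozenset(out)

    # Optimistic start: everything available, except at the entry.
    avail_in = {b: frozenset() if b == entry else universe for b in blocks}
    avail_out = {b: transfer(b, avail_in[b]) for b in blocks}
    work = deque(blocks)
    while work:
        b = work.popleft()
        if b == entry or not preds[b]:
            new_in = frozenset()
        else:
            new_in = universe
            for p in preds[b]:
                new_in &= avail_out[p]  # must hold on every incoming path
        new_out = transfer(b, new_in)
        if new_in != avail_in[b] or new_out != avail_out[b]:
            avail_in[b], avail_out[b] = new_in, new_out
            for s in succs[b]:
                if s not in work:
                    work.append(s)
    return avail_in


# Diamond CFG: "q" is overwritten on one side only, so only "p" survives.
succs = {"entry": ["left", "right"], "left": ["join"],
         "right": ["join"], "join": []}
stores = {"entry": [("p", 1), ("q", 2)], "left": [("q", 3)]}
avail = available_stores(succs, stores, "entry")
```

Every load in a block then consults the same precomputed set instead
of triggering its own backward walk.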

Another way of looking at it is that it doesn't really want the
dependences, it wants the value availability, which needs memdep info
to compute, but is not the same problem.

GCC's GVN-PRE solves this dataflow problem as part of PRE, because it
also computes "could I make this stored value available again here",
not just "is there a store with this value available".

As mentioned, LLVM's GVN does a very simple form of load PRE that
solves parts of this problem for some loads, in some cases.

In any case, if you rewrote how it uses memdep to solve the dataflow
problem it's really trying to solve, the way everyone else solves it,
it would likely be a lot faster :)
This would not increase its power, but it would increase the speed.
Increasing its power would involve actually having to value number
loads and stores, which is basically pointless if you treat
every phi node as a new value, like it does currently.