[PATCH] D26224: NewGVN

Mon Dec 12 17:52:54 PST 2016

dberlin added inline comments.

================
Comment at: lib/Transforms/Scalar/NewGVN.cpp:10
+/// \file
+/// This file implements the new LLVM's Global Value Numbering pass.
+///
----------------
davide wrote:
> silvas wrote:
> > This file comment needs to be significantly expanded. Remember, lots of people looking at this class might be e.g. people taking a compiler class that want to look at a "real" GVN implementation. Let's make sure they come away impressed so that they will want to join LLVM!
> > 
> > At the very least, some citations for the relevant paper and stuff, along with a summary of which exact variant of the algorithm is implemented would be good. Also, I think the high-level idea of GVN is simple enough that a high-level from-scratch description would be appropriate. I can help with writing this if you want.
> > 
> > In theory, we can expand this later, but when have you seen a commit improving a file-level comment? The only one I remember was in response to post-commit review asking for an improved file-level comment. So getting it right the first time is actually pretty important.
> I agree. Tried to expand it a bit.

I'm happy to describe the sparse predicated algorithm a bit if you want to add it.  I'll touch on the predication/etc bits when we add them.

Traditional GVN algorithms fall into two categories: Congruence partitioning and Hash based.

Hash based GVNs hash the operation performed by an instruction in some fashion, and look it up in a hash table.  Anything that hashes the same and is otherwise "congruent" is considered equal.  A hash based value numbering is optimistic if it assumes that everything not in the table is congruent to everything else, and pessimistic if it assumes that everything not in the table is not congruent to anything else.
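
To make the hash based flavor concrete, here is a minimal sketch of the pessimistic variant over straight-line code.  It is not taken from the patch: the Inst struct, numberValues, and the string-keyed expression table are all made up for illustration, and real implementations hash structured expressions rather than strings.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy instruction: result = opcode(lhs, rhs).
struct Inst {
  std::string result, opcode, lhs, rhs;
};

// Pessimistic hash based numbering over straight-line code. A name with no
// table entry gets a fresh number, i.e. it is assumed congruent to nothing.
std::unordered_map<std::string, int>
numberValues(const std::vector<Inst> &insts) {
  std::unordered_map<std::string, int> valueNumber;  // value name -> number
  std::unordered_map<std::string, int> exprTable;    // expression -> number
  int next = 0;
  auto numberOf = [&](const std::string &name) {
    auto it = valueNumber.find(name);
    if (it == valueNumber.end())
      it = valueNumber.emplace(name, next++).first;
    return it->second;
  };
  for (const Inst &inst : insts) {
    assert(!inst.result.empty() && "every instruction defines a value");
    int lhs = numberOf(inst.lhs);
    int rhs = numberOf(inst.rhs);
    // The expression is the opcode plus the value numbers of its operands.
    std::string key =
        inst.opcode + " " + std::to_string(lhs) + " " + std::to_string(rhs);
    auto it = exprTable.find(key);
    if (it == exprTable.end())
      it = exprTable.emplace(key, next++).first;
    valueNumber[inst.result] = it->second;
  }
  return valueNumber;
}
```

Two "add a, b" instructions end up with the same number here, while a "mul a, b" gets a different one.  Because unknown operands get fresh numbers, this sketch cannot see through cycles, which is exactly what the optimistic variant is for.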

Congruence partitioning based GVNs start with every value in a single partition, and split the partition as they discover values that are not equal.
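
A rough sketch of the partitioning idea, assuming a toy Inst type and a naive refinement loop (published algorithms use a much smarter splitting strategy, and this sketch starts from one class per opcode rather than literally one partition):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy instruction: result = opcode(operands...).
struct Inst {
  std::string result, opcode;
  std::vector<std::string> operands;
};

// Naive partition refinement: values start grouped by opcode, and a class is
// split whenever its members disagree on the classes of their operands.
std::map<std::string, int> partitionValues(const std::vector<Inst> &insts) {
  std::map<std::string, int> cls;  // value name -> class id
  std::map<std::string, int> ids;  // signature -> class id
  auto idFor = [&ids](const std::string &sig) {
    return ids.emplace(sig, static_cast<int>(ids.size())).first->second;
  };
  for (const Inst &inst : insts)
    cls[inst.result] = idFor("op " + inst.opcode);
  // Names never defined by an instruction (constants, arguments) are leaves,
  // each in a class of its own.
  std::vector<std::string> leaves;
  for (const Inst &inst : insts)
    for (const std::string &op : inst.operands)
      if (!cls.count(op)) {
        cls[op] = idFor("leaf " + op);
        leaves.push_back(op);
      }
  for (;;) {
    std::size_t before = ids.size();
    ids.clear();
    std::map<std::string, int> next;
    for (const std::string &leaf : leaves)
      next[leaf] = idFor("leaf " + leaf);
    for (const Inst &inst : insts) {
      // Signature: opcode, current class, and the classes of all operands.
      std::string sig = inst.opcode + " " + std::to_string(cls.at(inst.result));
      for (const std::string &op : inst.operands)
        sig += " " + std::to_string(cls.at(op));
      next[inst.result] = idFor(sig);
    }
    assert(ids.size() >= before && "refinement only ever splits classes");
    cls = next;
    if (ids.size() == before)
      return cls;  // no class was split, so the partition is stable
  }
}
```

On a loop with two identical induction variables (i0 = phi(0, i1), i1 = add(i0, 1), and the same for j), i0 and j0 stay in one class because nothing ever distinguishes them, while an unrelated add(i0, 2) is split off from the other adds.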

Optimistic hash based GVN and congruence partitioning GVN will discover the same set of congruences.

Most compilers nowadays use optimistic hash based approaches.  The downside to optimistic hash based value numbering is that it requires reprocessing the entire routine again and again until the hash table stops changing.  This is because value dependences are not tracked well enough to know what must be reprocessed, and values can be involved in cycles (meaning there is no single order in which you can process the function once and get a correct result).  This makes these algorithms non-sparse.  There are refinements to these algorithms, such as SCC based value numbering, which only requires iterating the SCCs of the SSA graph, but most compilers use the hash table approach.
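
The "reprocess until the hash table stops changing" loop looks roughly like this.  This is only a sketch under simplifying assumptions: Inst and optimisticNumbering are invented names, the instructions are assumed to be given in reverse post order, all phis are assumed to live in the same block, and an empty string stands in for the optimistic "not yet known" value.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy instruction: result = opcode(operands...).
struct Inst {
  std::string result, opcode;
  std::vector<std::string> operands;
};

// Optimistic hash based numbering: maps each value to the name of its class
// leader. The whole routine is rescanned until no leader changes.
std::unordered_map<std::string, std::string>
optimisticNumbering(const std::vector<Inst> &rpo) {
  std::unordered_map<std::string, std::string> leader;
  for (const Inst &inst : rpo)
    leader[inst.result] = "";  // "" plays the role of "not yet known"
  bool changed = true;
  while (changed) {
    changed = false;
    std::unordered_map<std::string, std::string> table;  // rebuilt every pass
    for (const Inst &inst : rpo) {
      std::string key = inst.opcode;
      for (const std::string &op : inst.operands) {
        auto it = leader.find(op);
        // Names without a defining instruction (constants, arguments) are
        // their own leaders.
        std::string vn = it == leader.end() ? op : it->second;
        if (vn.empty()) {
          assert(inst.opcode == "phi" &&
                 "only phis may see a not-yet-numbered operand");
          continue;  // optimistically ignore the unknown back edge value
        }
        key += " " + vn;
      }
      auto entry = table.emplace(key, inst.result).first;
      if (leader[inst.result] != entry->second) {
        leader[inst.result] = entry->second;
        changed = true;
      }
    }
  }
  return leader;
}
```

Note that every pass visits every instruction, even the ones whose operands did not change.  That is the non-sparseness described above.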

By contrast, this algorithm is more like the sparse conditional constant propagation algorithm, and uses a worklist of instructions to process.  Dependencies between values and instructions are tracked finely enough (through the CongruenceClass structure) that when the value of an operation changes, we add the possibly dependent instructions to the worklist and keep going.
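
A sketch of the worklist idea, with heavy caveats: Inst and sparseNumbering are invented for illustration, and a plain operand-to-users map stands in for the dependency tracking that the pass does through CongruenceClass.  A real implementation also has to move members between classes and pick new leaders, which this toy skips.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy instruction: result = opcode(operands...).
struct Inst {
  std::string result, opcode;
  std::vector<std::string> operands;
};

// Worklist-driven numbering: an instruction is revisited only when the
// leader of one of its operands has changed.
std::unordered_map<std::string, std::string>
sparseNumbering(const std::vector<Inst> &insts) {
  std::unordered_map<std::string, std::string> leader;  // "" == not yet known
  std::unordered_map<std::string, std::vector<std::size_t>> users;
  for (std::size_t i = 0; i < insts.size(); ++i) {
    leader[insts[i].result] = "";
    for (const std::string &op : insts[i].operands)
      users[op].push_back(i);
  }
  std::unordered_map<std::string, std::string> table;  // expression -> leader
  std::deque<std::size_t> worklist;
  std::vector<bool> queued(insts.size(), true);
  for (std::size_t i = 0; i < insts.size(); ++i)
    worklist.push_back(i);  // seed with everything, in program order

  while (!worklist.empty()) {
    std::size_t i = worklist.front();
    worklist.pop_front();
    queued[i] = false;
    const Inst &inst = insts[i];
    std::string key = inst.opcode;
    for (const std::string &op : inst.operands) {
      auto it = leader.find(op);
      std::string vn = it == leader.end() ? op : it->second;
      if (vn.empty()) {
        assert(inst.opcode == "phi" &&
               "only phis may see a not-yet-numbered operand");
        continue;
      }
      key += " " + vn;
    }
    const std::string &newLeader =
        table.emplace(key, inst.result).first->second;
    std::string &current = leader[inst.result];
    if (current == newLeader)
      continue;  // nothing changed, so no user needs another look
    current = newLeader;
    for (std::size_t user : users[inst.result])
      if (!queued[user]) {
        queued[user] = true;
        worklist.push_back(user);
      }
  }
  return leader;
}
```

The point of the design is the early "continue": once an instruction's leader is stable, nothing downstream of it is touched again, so the work done is proportional to what actually changed rather than to the size of the routine.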

Memory locations are also value numbered by this algorithm.  For memory, the goal of the algorithm is to discover the values stored at various memory locations (instead of just which loads are equivalent).  Because of this, loads and stores are value numbered together (while they are different expression classes, the hash ensures this occurs).  MemorySSA is used to value number memory state.

To give a concrete example, given:
1 = MemoryDef(0)
store %a, %ptr

and

MemoryUse (1)
load %ptr

These will be value numbered into the same congruence class, as the memory is the same location with the same value.

This also enables the algorithm to discover equivalences that alias analysis cannot easily do.

A trivial example:

1 = MemoryDef(0)
store %a, %ptr
MemoryUse(1)
load %ptr
2 = MemoryDef(1)
store %a, %ptr
MemoryUse(2)
load %ptr

These loads are equivalent, but a simple value numbering will not discover this.

The algorithm we use will discover that the stores store the same value, and thus will say that 1 and 2 are equivalent memory states.

It will then value number
MemoryUse(2)
load %ptr

as if it were
MemoryUse(1)
load %ptr

This enables the algorithm to discover fairly advanced (and even cyclic) equivalences between memory locations, much as it will do for scalars.
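
The two memory examples above can be mimicked with a small toy.  Everything here is an assumption made for illustration (the Access struct, numberMemory, and the "(state, pointer) -> value" table); it is not how the pass represents expressions, and it ignores aliasing entirely by treating a location as unknown after any store that really changes memory state.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy access, loosely mirroring MemoryDef/MemoryUse.
struct Access {
  bool isStore;       // true: "id = MemoryDef(prev)", false: "MemoryUse(prev)"
  int id;             // memory state defined by a store; unused for loads
  int prev;           // memory state this access is based on
  std::string ptr;    // pointer operand
  std::string value;  // stored value for a store, result name for a load
};

// Returns the leader value for each load result.
std::map<std::string, std::string>
numberMemory(const std::vector<Access> &accesses) {
  std::map<int, int> stateLeader;  // memory state -> equivalent leader state
  std::map<std::pair<int, std::string>, std::string> contents;
  std::map<std::string, std::string> loadLeader;
  auto leaderOf = [&](int state) {
    auto it = stateLeader.find(state);
    return it == stateLeader.end() ? state : it->second;
  };
  for (const Access &a : accesses) {
    int prev = leaderOf(a.prev);
    auto known = contents.find(std::make_pair(prev, a.ptr));
    if (a.isStore) {
      assert(a.id != a.prev && "a def must create a new memory state");
      // A stored load result is replaced by the value it was numbered to.
      auto ld = loadLeader.find(a.value);
      std::string stored = ld == loadLeader.end() ? a.value : ld->second;
      if (known != contents.end() && known->second == stored) {
        // Storing what is already there: the new state equals the old one.
        stateLeader[a.id] = prev;
      } else {
        stateLeader[a.id] = a.id;
        contents[std::make_pair(a.id, a.ptr)] = stored;
      }
    } else if (known != contents.end()) {
      loadLeader[a.value] = known->second;  // load sees the known value
    } else {
      loadLeader[a.value] = a.value;  // unknown contents: load leads itself
      contents[std::make_pair(prev, a.ptr)] = a.value;
    }
  }
  return loadLeader;
}
```

Feeding it the four accesses from the trivial example (store, load, store of the same value, load) numbers both loads to %a, because memory state 2 is recorded as equivalent to state 1.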

The algorithm used also performs unreachable code elimination/etc, similar to how sparse conditional constant propagation works. It optimistically assumes edges are unreachable until proven otherwise, and ignores unreachable values when value numbering phi nodes to create a maximal answer to value equivalence.
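
For the phi handling specifically, the rule can be sketched as below.  Incoming and numberPhi are hypothetical names, and returning the phi's own name when no incoming edge is reachable is a simplification made for this toy.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical incoming phi operand: the leader of the incoming value and
// whether the edge it arrives on has been proven reachable so far.
struct Incoming {
  std::string valueLeader;
  bool edgeReachable;
};

// A phi is congruent to a value if every operand arriving over a reachable
// edge has that value as its leader. Unreachable edges are ignored.
std::string numberPhi(const std::string &phiName,
                      const std::vector<Incoming> &incoming) {
  std::string common;
  for (const Incoming &in : incoming) {
    if (!in.edgeReachable)
      continue;
    assert(!in.valueLeader.empty() && "reachable operands must be numbered");
    if (common.empty())
      common = in.valueLeader;
    else if (common != in.valueLeader)
      return phiName;  // two different reachable values: phi leads itself
  }
  return common.empty() ? phiName : common;
}
```

So phi(a over a reachable edge, b over an unreachable edge) is numbered as a, and only becomes its own value if the second edge is later proven reachable.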

In addition to the above, this algorithm supports forward propagation, global reassociation, and predication.

https://reviews.llvm.org/D26224