[PATCH] D130489: [GVN] MemorySSA for GVN (alternative algorithm)

Mon Jul 25 07:47:31 PDT 2022

chill created this revision.
Herald added subscribers: kosarev, jeroen.dobbelaere, ormris, wenlei, kerbowa, steven_wu, hiraditya, jvesely.
Herald added a project: All.
chill requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

This is an alternative algorithm for using MemorySSA for Redundant Load Elimination (RLE) in GVN. This one does not need to scan
whole basic blocks for memory accesses. It passes bootstrap and llvm-test-suite. I have also squashed it into one commit for
easier initial review.

Description of the algorithm:

The RLE algorithm in GVN expects as input, for a given load instruction, the
set of all the basic blocks (and specific instructions, if they can be
determined) that may define the content of the load location, such that there
exists a path from the memory definition to the load along which the location is
not modified.  For the purpose of GVN, such memory definitions are instructions
which may write to the location or instructions that must read (a superset of)
the bits of the location, in other words "may alias" writes and "must alias"
reads.

Broadly, this algorithm has three parts (more details below):

- quick check for a local dependency
- discover the set of basic blocks (and MemoryDefs, a.k.a. clobbers) which could affect the load location, as well as all the blocks that lie on a path from a clobbering block to the load instruction but do not modify the memory location ("transparent" blocks)
- examine the uses of the clobbers which belong to one of the basic blocks found in the previous step and find ones which alias the memory location

First, we do a quick check for a local dependency, i.e. a memory read or write
in the same basic block as the initial load instruction. Doing this first allows
us to avoid having to disambiguate between the parts of the initial basic block
before and after the original load instruction (when entered from a backedge).

We find the clobbering set of blocks by doing a walk of the reverse CFG
(iterative DFS with a worklist), stopping (i.e. not going to predecessors) at
blocks/instructions which definitely modify the memory location.
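
A minimal sketch of the shape of this walk, assuming a toy CFG in which blocks are plain integers and the "definitely modifies the location" answer is precomputed per block. `ToyCFG` and `collectReachingBlocks` are hypothetical names for illustration and are not taken from the patch, which works on MemorySSA accesses rather than a per-block flag:

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy CFG (hypothetical): Preds[b] lists the predecessors of block b and
// Clobbers[b] says whether block b definitely modifies the load location.
struct ToyCFG {
  std::vector<std::vector<int>> Preds;
  std::vector<bool> Clobbers;
};

// Walks the reverse CFG from the predecessors of Start with an explicit
// worklist (iterative DFS). Clobbering blocks are recorded, but the walk
// does not continue to their predecessors.
std::set<int> collectReachingBlocks(const ToyCFG &G, int Start) {
  assert(Start >= 0 && Start < static_cast<int>(G.Preds.size()));
  std::set<int> Visited; // doubles as the result set
  std::vector<int> Worklist;
  for (int P : G.Preds[Start])
    if (Visited.insert(P).second)
      Worklist.push_back(P);
  while (!Worklist.empty()) {
    int BB = Worklist.back();
    Worklist.pop_back();
    if (G.Clobbers[BB])
      continue; // clobbering block: stop here
    for (int P : G.Preds[BB])
      if (Visited.insert(P).second)
        Worklist.push_back(P);
  }
  return Visited;
}
```

For a straight-line CFG 0 -> 1 -> 2 -> 3 with the load in block 3 and a clobber in block 1, the walk collects blocks 2 and 1 and never reaches block 0.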

With each basic block, we obtain a few pieces of associated information:

- a memory address
- an initial clobbering memory access
- a final clobbering memory access

The worklist is seeded with the predecessors of the starting basic block (the
one that contains the initial load instruction). For each of these blocks, the
memory address is the pointer operand of the load, possibly phi-translated, and
the initial clobbering memory access is the defining access of the load, again
possibly phi-translated. The final clobbering memory access is at first the same
as the initial one. These blocks are added to the set.

On each iteration we fetch and examine the next basic block. There are a few
cases to consider:

- if the memory address is unknown, it means phi-translation failed; do nothing at this point, later the block will be treated as modifying the memory location in an unknown way
- we have a (final) clobbering memory access that is an actual instruction (not a MemoryPhi) in the current block and indeed clobbers the memory address; we record the dependency information for later use, stop further traversal from this block and continue with the next iteration
- we have a (final) clobbering memory access that is an actual instruction (not a MemoryPhi) in the current block and does not actually clobber the memory address; we record its defining memory access as the new final clobbering memory access, re-queue the block, and continue with the next iteration
- if none of the above clauses triggers, we know that the block is "transparent", i.e. the memory location is not modified when execution passes through this block. We add the predecessors of the block to the set. If needed, we phi-translate the address and the final clobbering memory access. Then we add the predecessors to the worklist and proceed to the next iteration.
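
The four cases above can be sketched as a single classification step. This is an illustration under stated assumptions, not the patch's code: `ToyAccess`, `BlockState`, `Step` and `classify` are hypothetical names, the alias query is reduced to a precomputed `Aliases` flag, and the initial clobbering access is left out because the case analysis only looks at the final one.

```cpp
#include <cassert>
#include <optional>
#include <vector>

enum class AccessKind { LiveOnEntry, Def, Phi };

// Toy stand-in for a MemorySSA access (hypothetical).
struct ToyAccess {
  AccessKind Kind;
  int Block;    // block that contains the access
  int Defining; // index of the defining access (Defs only), -1 otherwise
  bool Aliases; // assumed alias-query result against the load location
};

// Per-block state carried on the worklist.
struct BlockState {
  std::optional<int> Addr; // nullopt: phi-translation failed
  int Final;               // index of the final clobbering access
};

enum class Step { UnknownAddress, RecordClobber, Requeue, Transparent };

// Decides what to do with block BB. In the Requeue case the final access
// is advanced to its defining access, as in the third bullet.
Step classify(const std::vector<ToyAccess> &Accesses, int BB, BlockState &S) {
  if (!S.Addr)
    return Step::UnknownAddress;
  assert(S.Final >= 0 && S.Final < static_cast<int>(Accesses.size()));
  const ToyAccess &A = Accesses[S.Final];
  if (A.Kind == AccessKind::Def && A.Block == BB) {
    if (A.Aliases)
      return Step::RecordClobber; // real clobber: stop the traversal here
    S.Final = A.Defining;         // skip the non-aliasing def
    return Step::Requeue;
  }
  return Step::Transparent; // a MemoryPhi, or an access in another block
}
```

Calling `classify` repeatedly on a re-queued block steps over its non-aliasing stores one at a time until it either finds a clobber or leaves the block.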

There are a few subtle points when transitioning to predecessors:

- if the address needs to, but cannot be phi-translated, we consider the memory location as modified in an unknown way just before entering the block; since we know the block is transparent, for uniformity and without loss of information we can treat this situation as if the memory location was modified immediately after exiting the block
- if a predecessor is a block already visited with a different address, we mark the current block as modifying the memory location in an unknown way. The dependency information that RLE in GVN expects is per block and it cannot represent several dependencies coming from the same block.
- if a predecessor is an already visited (with the same address) block, just skip it at this point
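
As an illustration of the last two bullets only, here is a small sketch that assumes the visited set is kept as a map from block to the address the block was first visited with. Addresses are modelled as integers, and `PredAction` and `classifyPredecessor` are hypothetical names, not identifiers from the patch. The failed phi-translation case from the first bullet is deliberately not modelled.

```cpp
#include <map>

enum class PredAction { Enqueue, Skip, MarkCurrentUnknown };

// Visited maps a block to the (toy, integer) address it was visited with.
// TranslatedAddr is the address valid on entry from predecessor Pred.
PredAction classifyPredecessor(const std::map<int, int> &Visited, int Pred,
                               int TranslatedAddr) {
  auto It = Visited.find(Pred);
  if (It == Visited.end())
    return PredAction::Enqueue; // first visit: add to set and worklist
  if (It->second == TranslatedAddr)
    return PredAction::Skip; // already visited with the same address
  // Visited with a different address: per-block dependency information
  // cannot express both, so the current block becomes an unknown clobber.
  return PredAction::MarkCurrentUnknown;
}
```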

At this phase of the algorithm, the set we're building serves as the "visited"
set so as to prevent infinite loops.

After this first phase of the algorithm finishes, we have obtained the needed
set of basic blocks as well as the memory-writing instructions that could
modify our location. For each block, the initial clobbering memory access is the
one we enter the block with, and there is a use-def chain from it to the final
clobbering memory access in which every access is non-aliasing, except for the
final one.

We still have to determine if there are memory reads that act as definitions
(see above). Once again we do an iterative DFS on the reverse CFG.

On each iteration we look for memory uses in the current block. For this we
first construct a list of the memory clobbers that could be a defining memory
access for our memory location. This list begins with the initial clobbering
memory access for the current block and follows the clobbers along the use-def
chain up to the final clobbering memory access. If the final clobbering memory
access belongs to the same block and is not a MemoryPhi, that finishes building
the list: there cannot be an aliasing use in the block that is not a use of the
final clobbering memory access. Otherwise, we look to extend the list in a
couple of ways:

- if the final clobbering memory access is in another block, we transition to this other block and repeat the procedure
- if the final clobbering memory access is in the same block and it's a MemoryPhi, we transition to the immediate dominator of the current block and repeat the procedure

In both cases, transition occurs if and only if the new block is already among
the collected set of clobbering blocks.
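
The list construction can be sketched roughly as follows, assuming the first phase left an initial/final pair for every collected block and that immediate dominators are available as a plain array. All names (`ToyAccess`, `BlockInfo`, `buildClobberList`) are hypothetical, and the `SeenBlocks` guard exists only to keep the toy terminating on malformed input:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

struct ToyAccess {
  bool IsPhi;
  int Block;    // block that contains the access
  int Defining; // index of the defining access, -1 for phis/live-on-entry
};

struct BlockInfo {
  int Initial; // access the block is entered with
  int Final;   // final clobbering access found by the first phase
};

std::vector<int> buildClobberList(const std::vector<ToyAccess> &Accesses,
                                  const std::map<int, BlockInfo> &Collected,
                                  const std::vector<int> &IDom, int BB) {
  std::vector<int> List;
  std::set<int> SeenBlocks;
  auto It = Collected.find(BB);
  // Continue only while the next block is among the collected blocks.
  while (It != Collected.end() && SeenBlocks.insert(It->first).second) {
    const BlockInfo &Info = It->second;
    // Follow the use-def chain from the initial to the final access.
    for (int A = Info.Initial;; A = Accesses[A].Defining) {
      assert(A >= 0 && "chain must reach the final access");
      if (List.empty() || List.back() != A)
        List.push_back(A);
      if (A == Info.Final)
        break;
    }
    const ToyAccess &F = Accesses[Info.Final];
    int Cur = It->first;
    if (F.Block == Cur && !F.IsPhi)
      break; // a real clobber in this block ends the list
    // Final access elsewhere: go to its block. MemoryPhi here: go to idom.
    int Next = F.Block != Cur ? F.Block : IDom[Cur];
    It = Collected.find(Next);
  }
  return List;
}
```

The list is ordered from the access nearest to the current block towards the dominating ones, which is the order in which the next step examines uses.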

Once we have the list of clobbers, we look in turn at uses of each clobber in
the list, that are in the current block, looking for memory reads which could
provide (a superset of) the bits of the memory location. For those reads, we
choose the one closest to the end of the block. A following (dominating) clobber
in the list is processed only if no usable memory read was found so far.
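
A small sketch of this selection, assuming each memory use is described by the clobber it is attached to, its block, its position within the block, and a precomputed flag saying whether it provides the needed bits. `ToyUse` and `findDefiningRead` are hypothetical names, and the flag stands in for the real alias and size checks:

```cpp
#include <optional>
#include <vector>

struct ToyUse {
  int Clobber;   // clobber (defining access) this read is a use of
  int Block;     // block that contains the read
  int Position;  // order within the block, larger means closer to the end
  bool Provides; // assumed: the read covers the bits of the load location
};

// Returns the index into Uses of the chosen read, if any. Clobbers are
// tried in list order; a later (dominating) clobber is considered only
// when no usable read was found for the earlier ones.
std::optional<int> findDefiningRead(const std::vector<int> &ClobberList,
                                    const std::vector<ToyUse> &Uses, int BB) {
  for (int C : ClobberList) {
    std::optional<int> Best;
    for (int I = 0; I < static_cast<int>(Uses.size()); ++I) {
      const ToyUse &U = Uses[I];
      if (U.Clobber != C || U.Block != BB || !U.Provides)
        continue;
      // Keep the usable read closest to the end of the block.
      if (!Best || U.Position > Uses[*Best].Position)
        Best = I;
    }
    if (Best)
      return Best;
  }
  return std::nullopt;
}
```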

If an interesting memory use was not found, we look for a clobber in the
current block. This was already determined in the phase which collected the set
of clobbering blocks. If there's a clobber, we include it in the list of
results; otherwise the block is "transparent" and we add its predecessors to the
worklist.

https://reviews.llvm.org/D130489

Files:
  llvm/include/llvm/Transforms/IPO/PassManagerBuilder.h
  llvm/include/llvm/Transforms/Scalar/GVN.h
  llvm/lib/Passes/PassBuilder.cpp
  llvm/lib/Passes/PassRegistry.def
  llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
  llvm/lib/Transforms/Scalar/GVN.cpp
  llvm/lib/Transforms/Scalar/GVNHoist.cpp
  llvm/test/Analysis/MemoryDependenceAnalysis/InvariantLoad.ll
  llvm/test/Analysis/TypeBasedAliasAnalysis/gvn-nonlocal-type-mismatch.ll
  llvm/test/Analysis/ValueTracking/gep-negative-issue.ll
  llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
  llvm/test/Other/new-pm-defaults.ll
  llvm/test/Other/new-pm-lto-defaults.ll
  llvm/test/Other/new-pm-print-pipeline.ll
  llvm/test/Other/new-pm-thinlto-defaults.ll
  llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
  llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
  llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
  llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
  llvm/test/Transforms/GVN/PRE/rle.ll
  llvm/test/Transforms/GVN/condprop-memdep-invalidation.ll
  llvm/test/Transforms/GVN/no-mem-dep-info.ll
  llvm/test/Transforms/LoopVectorize/X86/metadata-enable.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D130489.447337.patch
Type: text/x-patch
Size: 89206 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220725/8dc21e86/attachment.bin>