[all-commits] [llvm/llvm-project] dc8477: [GlobalISel] Add a store-merging optimization pass...

Mon Nov 15 21:10:58 PST 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: dc84770d559b3305524fc86b945a697a803fc5c7
      https://github.com/llvm/llvm-project/commit/dc84770d559b3305524fc86b945a697a803fc5c7
  Author: Amara Emerson <amara at apple.com>
  Date:   2021-11-15 (Mon, 15 Nov 2021)

  Changed paths:
    A llvm/include/llvm/CodeGen/GlobalISel/LoadStoreOpt.h
    M llvm/include/llvm/InitializePasses.h
    M llvm/lib/CodeGen/GlobalISel/CMakeLists.txt
    M llvm/lib/CodeGen/GlobalISel/GlobalISel.cpp
    A llvm/lib/CodeGen/GlobalISel/LoadStoreOpt.cpp
    M llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
    M llvm/test/CodeGen/AArch64/GlobalISel/gisel-commandline-option.ll
    A llvm/test/CodeGen/AArch64/GlobalISel/store-merging.ll
    A llvm/test/CodeGen/AArch64/GlobalISel/store-merging.mir
    M llvm/unittests/CodeGen/GlobalISel/CMakeLists.txt
    A llvm/unittests/CodeGen/GlobalISel/GISelAliasTest.cpp

  Log Message:
  -----------
  [GlobalISel] Add a store-merging optimization pass and enable for AArch64.

This is a first attempt at a pass that merges consecutive stores of constant
values, a counterpart to the DAGCombiner's store merging optimization.

The high level goals of this pass:

* Have a simple and efficient algorithm, as close to linear time as we can get.
  This prioritizes scalability of the algorithm over merging every corner case
  we can find. The DAGCombiner's store merging code has been a source of
  compile-time and complexity issues in the past, and I wanted to avoid that.
* Don't introduce any new data structures for ordering memory operations. In MIR,
  we don't have the concept of chains like we do in the DAG, and the instruction
  order is stricter than enforcing ordering with graph edges. Although I
  considered adding something similar, I couldn't justify the overhead.

The pass is currently split into 3 main parts. The main store merging code
identifies candidate stores and manages the candidate group that's under
consideration for merging. Analyzing the addressing of stores is potentially
complex, and for now there's just a basic implementation that identifies the
easy cases. Finally, the other main source of complexity is the alias analysis,
which tries to follow the same logic as the DAG's AA.
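
As a rough illustration of how candidate grouping and a conservative alias
check can fit into a single walk over a block, here is a minimal standalone
sketch. It is not the code in LoadStoreOpt.cpp: MemOp, mayAlias and
findMergeGroups are hypothetical names, memory operations are modelled as
(base, offset, size) triples, and the forward walk direction is an assumption
made for readability.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a memory instruction; this is not LLVM's MachineInstr.
struct MemOp {
  bool IsStore;   // false models a load
  int Base;       // id of the base pointer
  int64_t Offset; // byte offset from the base
  unsigned Size;  // access size in bytes
};

// Conservative alias check: accesses with different bases are assumed to
// alias; accesses with the same base alias only if their byte ranges overlap.
bool mayAlias(const MemOp &A, const MemOp &B) {
  assert(A.Size > 0 && B.Size > 0 && "accesses must be non-empty");
  if (A.Base != B.Base)
    return true;
  return A.Offset < B.Offset + (int64_t)B.Size &&
         B.Offset < A.Offset + (int64_t)A.Size;
}

// Walk the block once and return groups (indices into Block) of same-sized
// stores to adjacent, ascending addresses. A group is closed when a store
// does not extend it or when a load may alias one of its members.
std::vector<std::vector<std::size_t>>
findMergeGroups(const std::vector<MemOp> &Block) {
  std::vector<std::vector<std::size_t>> Groups;
  std::vector<std::size_t> Cur; // candidate group under consideration
  auto Flush = [&] {
    if (Cur.size() > 1) // a single store is not worth merging
      Groups.push_back(Cur);
    Cur.clear();
  };
  for (std::size_t I = 0; I < Block.size(); ++I) {
    const MemOp &Op = Block[I];
    if (Op.IsStore) {
      if (!Cur.empty()) {
        const MemOp &Last = Block[Cur.back()];
        bool Adjacent = Op.Base == Last.Base && Op.Size == Last.Size &&
                        Op.Offset == Last.Offset + (int64_t)Last.Size;
        if (!Adjacent)
          Flush();
      }
      Cur.push_back(I);
      continue;
    }
    // A load that may read bytes written by a pending candidate ends the group.
    for (std::size_t Idx : Cur) {
      if (mayAlias(Op, Block[Idx])) {
        Flush();
        break;
      }
    }
  }
  Flush();
  return Groups;
}
```

The sketch gives up on a group as soon as anything looks doubtful, which
mirrors the stated preference for scalability over catching every corner case.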

Currently this implementation only supports merging of constant stores. Merging
stores of arbitrary variables is technically possible with a very small change,
but the DAG chooses not to do this. Doing so here makes most code worse, since
there's extra overhead in merging values into wider registers.
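
To illustrate why constants are the cheap case, the sketch below folds several
narrow constants into one wide constant entirely at compile time, so the
merged store needs only a single wider immediate. For variables, the same
combination would have to be done with extra instructions at run time. The
helper name mergeConstantsLE is hypothetical, and the sketch assumes a
little-endian layout, equally sized elements given in ascending address order,
and a result of at most 64 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Combine equally sized constants, ordered by ascending address, into one
// wider constant for a little-endian target: the value at the lowest address
// ends up in the least significant bits.
uint64_t mergeConstantsLE(const std::vector<uint64_t> &Vals,
                          unsigned BitsPerElt) {
  assert(BitsPerElt > 0 && BitsPerElt * Vals.size() <= 64 &&
         "merged value must fit in 64 bits");
  // Mask off any bits above the element width before shifting into place.
  const uint64_t Mask =
      BitsPerElt == 64 ? ~0ULL : ((1ULL << BitsPerElt) - 1);
  uint64_t Wide = 0;
  for (std::size_t I = 0; I < Vals.size(); ++I)
    Wide |= (Vals[I] & Mask) << (I * BitsPerElt);
  return Wide;
}
```

For example, four byte stores of 0x11, 0x22, 0x33 and 0x44 to ascending
addresses would become one 32-bit store of 0x44332211 under these assumptions.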

On AArch64 -Os, this optimization results in very minor savings on CTMark.

Differential Revision: https://reviews.llvm.org/D109131