[all-commits] [llvm/llvm-project] cfc607: [GlobalISel] Combine (a[0]) | (a[1] << k1) | ...| ...
Jessica Paquette via All-commits
all-commits at lists.llvm.org
Tue Jan 19 10:25:07 PST 2021
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: cfc60730179042a93cb9cb338982e71d20707a24
https://github.com/llvm/llvm-project/commit/cfc60730179042a93cb9cb338982e71d20707a24
Author: Jessica Paquette <jpaquette at apple.com>
Date: 2021-01-19 (Tue, 19 Jan 2021)
Changed paths:
M llvm/include/llvm/CodeGen/GlobalISel/CombinerHelper.h
M llvm/include/llvm/CodeGen/TargetLowering.h
M llvm/include/llvm/Target/GlobalISel/Combine.td
M llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
M llvm/lib/CodeGen/TargetLoweringBase.cpp
A llvm/test/CodeGen/AArch64/GlobalISel/prelegalizer-combiner-load-or-pattern-align.mir
A llvm/test/CodeGen/AArch64/GlobalISel/prelegalizer-combiner-load-or-pattern.mir
Log Message:
-----------
[GlobalISel] Combine (a[0]) | (a[1] << k1) | ...| (a[m] << kn) into a wide load
This is a restricted version of the combine in `DAGCombiner::MatchLoadCombine`.
(See D27861)
This tries to recognize patterns like below (assuming a little-endian target):
```
s8* x = ...
s32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
->
s32 val = *((i32)a)
s8* x = ...
s32 val = a[3] | (a[2] << 8) | (a[1] << 16) | (a[0] << 24)
->
s32 val = BSWAP(*((s32)a))
```
(This patch also handles the big-endian target case as well, in which the first
example above has a BSWAP, and the second example above does not.)
To recognize the pattern, this searches from the last G_OR in the expression
tree.
E.g.
```
Reg Reg
\ /
OR_1 Reg
\ /
OR_2
\ Reg
.. /
Root
```
Each non-OR register in the tree is put in a list. Each register in the list is
then checked to see if it's an appropriate load + shift logic.
If every register is a load + potentially a shift, the combine checks if those
loads + shifts, when OR'd together, are equivalent to a wide load (possibly with
a BSWAP.)
To simplify things, this patch
(1) Only handles G_ZEXTLOADs (which appear to be the common case)
(2) Only works in a single MachineBasicBlock
(3) Only handles G_SHL as the bit twiddling to stick the small load into a
specific location
An IR example of this is here: https://godbolt.org/z/4sP9Pj (lifted from
test/CodeGen/AArch64/load-combine.ll)
At -Os on AArch64, this is a 0.5% code size improvement for CTMark/sqlite3,
and a 0.4% improvement for CTMark/7zip-benchmark.
Also fix a bug in `isPredecessor` which caused it to fail whenever `DefMI` was
the first instruction in the block.
Differential Revision: https://reviews.llvm.org/D94350
More information about the All-commits
mailing list