[PATCH] D117406: [DAGCombiner] Adjust some checks in DAGCombiner::reduceLoadWidth

Sat Jan 15 14:20:31 PST 2022

bjope created this revision.
bjope added reviewers: spatel, nemanjai, samparker.
Herald added subscribers: ecnelises, steven.zhang, hiraditya.
bjope requested review of this revision.
Herald added a project: LLVM.

In code review for D117104 <https://reviews.llvm.org/D117104> two slightly weird checks were found
in DAGCombiner::reduceLoadWidth. They were typically checking
if BitsA was a multiple of BitsB by looking at (BitsA & (BitsB - 1)),
but such a comparison actually only makes sense if BitsB is a power
of two.
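
To make the problem with the mask trick concrete, here is a minimal sketch comparing it with a plain remainder test. The helper names isMultipleByMask and isMultipleByRem are hypothetical and chosen for illustration only; they do not appear in DAGCombiner.cpp.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not taken from DAGCombiner.cpp.

// Mask-based test: only meaningful when BitsB is a power of two.
constexpr bool isMultipleByMask(uint64_t BitsA, uint64_t BitsB) {
  return (BitsA & (BitsB - 1)) == 0;
}

// Remainder-based test: valid for any non-zero BitsB.
constexpr bool isMultipleByRem(uint64_t BitsA, uint64_t BitsB) {
  return BitsA % BitsB == 0;
}

// Power-of-two BitsB: both tests agree.
static_assert(isMultipleByMask(64, 32) && isMultipleByRem(64, 32),
              "64 is a multiple of 32");
static_assert(!isMultipleByMask(96, 64) && !isMultipleByRem(96, 64),
              "96 is not a multiple of 64");

// BitsB = 24 is not a power of two, so the mask is 23 (0b10111).
// 48 & 23 == 16, so the mask test rejects 48 although 48 == 2 * 24.
static_assert(!isMultipleByMask(48, 24) && isMultipleByRem(48, 24),
              "false negative from the mask test");
// 8 & 23 == 0, so the mask test accepts 8 although 8 is not a multiple of 24.
static_assert(isMultipleByMask(8, 24) && !isMultipleByRem(8, 24),
              "false positive from the mask test");
```

With a non-power-of-two BitsB the mask test can be wrong in both directions, which is why it cannot be read as a general "is a multiple of" check.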

The checks were related to the code that attempted to shrink a load
based on the fact that the loaded value would be right shifted.

Afaict the legality of the value types is checked later (typically in
isLegalNarrowLdSt), so the existing checks were both overly
conservative and wrong whenever ExtVTBits wasn't a
power of two. The latter was a situation triggered by a number of
lit tests (so we could not just assert on ExtVTBits being a power of
two).

When attempting to simply remove the checks I found some problems
that seem to have been guarded by the checks (maybe just by
luck). A typical example would be a pattern like this:

  t1 = load i96* ptr
  t2 = srl t1, 64
  t3 = truncate t2 to i64

When DAGCombine is visiting the truncate, reduceLoadWidth is called
attempting to narrow the load to 64 bits (ExtVT := MVT::i64). Then
the SRL is detected and we set ShAmt to 64.

In the past we've bailed out due to 96 not being a multiple of 64.
If we simply remove that check then we would end up replacing the
load with a new load that would read 64 bits but with a base pointer
adjusted by 64 bits. So we would read 32 bits that weren't accessed by
the original load.

This patch will instead utilize the fact that the logical right shift
can be folded away by using a zextload. Thus, the pattern above will
now be combined into

  t3 = load i32* ptr+offset, zext to i64
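
The arithmetic behind the i96 example can be sketched as follows. This is only an illustration of the width calculation under stated assumptions, not the code in the patch: the struct NarrowLoad and the function narrowForSrlTrunc are hypothetical names, offsets are counted in bits from the least significant end of the loaded value, and ShAmt is assumed to be smaller than MemBits.

```cpp
#include <cstdint>

// Hypothetical description of a narrowed load; not an LLVM type.
struct NarrowLoad {
  unsigned OffsetBits; // Bits skipped, counted from the least significant bit.
  unsigned LoadBits;   // Bits actually read from memory.
  bool NeedsZext;      // True if the result must be zero-extended to ExtBits.
};

// For "truncate (srl (load MemBits), ShAmt) to ExtBits", only
// MemBits - ShAmt bits of the original load remain above the shift.
// Reading more than that would touch memory the original load never read.
// Assumes ShAmt < MemBits.
constexpr NarrowLoad narrowForSrlTrunc(unsigned MemBits, unsigned ShAmt,
                                       unsigned ExtBits) {
  return NarrowLoad{ShAmt,
                    (MemBits - ShAmt < ExtBits) ? MemBits - ShAmt : ExtBits,
                    MemBits - ShAmt < ExtBits};
}

// i96 load, srl by 64, truncate to i64: a naive 64-bit load at bit
// offset 64 would cover bits [64, 128), 32 bits past the original load.
static_assert(64 + 64 > 96, "naive narrowing reads past the original load");

// Clamping to the remaining 32 bits and zero-extending stays in bounds.
static_assert(narrowForSrlTrunc(96, 64, 64).LoadBits == 32, "load i32");
static_assert(narrowForSrlTrunc(96, 64, 64).NeedsZext, "zext to i64");
```

The zero extension is sound because the srl already fills the upper bits of the shifted value with zeros, so a zextload of the remaining 32 bits produces the same i64.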

Another example is found in test/CodeGen/PowerPC/pr39478.ll:

  t7: i64,ch = load<(load (s64) from %ir.p64)> t0, ...
  t27: i64 = srl t7, Constant:i32<8>
  t17: i32 = truncate t27

Here the shift count isn't a multiple of 32. With this patch the
narrowing kicks in and we get an early combine into

  t30: i32,ch = load<(load (s32) from %ir.p64 + 1, align 1)> t0, ..
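
As a sanity check of why this combine preserves the value, the sketch below models memory as little-endian bytes (an assumption, suggested by the "+ 1" byte offset in the combined load) and compares the wide pattern with the narrowed load. The helpers byteAt, loadLE32 and narrowedLoadMatches are hypothetical names used only for this illustration.

```cpp
#include <cstdint>

// Models the byte stored at address p64 + Addr when the 64-bit value V
// is stored at p64 on a little-endian target (assumption; Addr < 8).
constexpr uint8_t byteAt(uint64_t V, unsigned Addr) {
  return static_cast<uint8_t>(V >> (8 * Addr));
}

// Models a little-endian 32-bit load from p64 + Off (requires Off + 4 <= 8).
constexpr uint32_t loadLE32(uint64_t V, unsigned Off) {
  return static_cast<uint32_t>(byteAt(V, Off)) |
         (static_cast<uint32_t>(byteAt(V, Off + 1)) << 8) |
         (static_cast<uint32_t>(byteAt(V, Off + 2)) << 16) |
         (static_cast<uint32_t>(byteAt(V, Off + 3)) << 24);
}

// Wide form:   truncate (srl (load i64), 8) to i32
// Narrow form: load i32 from p64 + 1
constexpr bool narrowedLoadMatches(uint64_t V) {
  return static_cast<uint32_t>(V >> 8) == loadLE32(V, 1);
}

static_assert(narrowedLoadMatches(0x1122334455667788ULL),
              "both forms yield 0x44556677");
```

A shift by 8 bits is a whole number of bytes, so it can be absorbed into the address even though 8 is not a multiple of the 32-bit result width. The narrowed load reads bytes 1 through 4, which lie inside the original 8-byte access.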

Unfortunately that exposes some shortcomings in the PPC lowering of
BSWAP, and the end result in pr39478.ll looks like a regression.
However, in general, this should be beneficial for PPC as well.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D117406

Files:
  llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
  llvm/test/CodeGen/ARM/shift-combine.ll
  llvm/test/CodeGen/PowerPC/pr39478.ll
