[PATCH] D138186: InstCombine: Simplify vector load based on demanded elements

Thu Nov 17 06:18:27 PST 2022

spatel requested changes to this revision.
spatel added a comment.
This revision now requires changes to proceed.

In D138186#3933180 <https://reviews.llvm.org/D138186#3933180>, @piotr wrote:

> I've always thought we didn't have such a transformation in the generic instcombine, because it may hurt some targets, but I don't immediately see how. (We do have a similar one for intrinsic loads in the AMDGPU-specific file).

This is correct - this is not universally beneficial/recoverable.

Looking at the diffs here for example, we can see that a <4 x float> load (legal on x86) is replaced by a <3 x float> load (not legal on x86). There is no way to recover the wider load at any later point in optimization because there is nothing in the IR that says it is safe to load the extra bytes. A single 128-bit load reduced to 96-bit load could then have to be split into a 64-bit load and 32-bit load, and that's likely worse for perf.

We've been trying to solve bugs like that (the source is written in a way that is hard to optimize) for a long time, so this would make things worse. Example:
https://github.com/llvm/llvm-project/issues/17113

I recommend trying this in VectorCombine or AggressiveInstCombine with TTI legality checks (assuming it needs to happen early in IR to unlock other transforms).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138186/new/

https://reviews.llvm.org/D138186