[PATCH] D138766: [InstCombine] If loading from small alloca, load whole alloca and perform variable extraction

Wed Dec 14 13:17:30 PST 2022

lebedev.ri added a comment.

Thank you for looking into it!

In D138766#3996126 <https://reviews.llvm.org/D138766#3996126>, @nlopes wrote:

> In D138766#3995966 <https://reviews.llvm.org/D138766#3995966>, @lebedev.ri wrote:
>
>> In D138766#3995896 <https://reviews.llvm.org/D138766#3995896>, @nlopes wrote:
>>
>>> FWIW, I've discovered today that GVN does a similar optimization (but without the freeze..).
>>> See here (scroll to the bottom): https://web.ist.utl.pt/nuno.lopes/alive2/index.php?hash=aed14c64378404c9&test=Transforms%2FPhaseOrdering%2FX86%2Fvec-load-combine.ll
>>
>> That seems to be with constant indexes, though?
>
> True.
> So it could use a simple extractelement rather than bit masking.

Define "could"? Define "simple"?
I've looked at alternative lowerings (`shufflevector`, or a chain of `extractelement`s),
and they all result in worse codegen. We cannot use a single `extractelement`,
because the byte offset may not be a multiple of the element size.
The shift is the optimal lowering here; any alternative lowering
would need to be canonicalized into it, at which point why bother?
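
To make the shift-based extraction concrete, here is a rough C++ model of it.
This is only a sketch, not the code from the patch: the 8-byte slot size, the
16-bit load width, the function names, and the little-endian host are all
assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model: the whole 8-byte "alloca" is loaded once as a single
// integer, and the requested 2-byte value is extracted with a variable shift
// followed by a truncation. Assumes a little-endian host.
inline uint16_t load_u16_via_shift(const unsigned char (&slot)[8],
                                   unsigned byte_offset) {
    // The narrow load must lie entirely within the slot.
    assert(byte_offset + sizeof(uint16_t) <= sizeof(slot));
    uint64_t whole;
    std::memcpy(&whole, slot, sizeof(whole));        // one wide load
    return static_cast<uint16_t>(whole >> (8 * byte_offset));  // shift + trunc
}

// Reference behaviour: the original narrow load at a variable byte offset.
inline uint16_t load_u16_direct(const unsigned char (&slot)[8],
                                unsigned byte_offset) {
    assert(byte_offset + sizeof(uint16_t) <= sizeof(slot));
    uint16_t value;
    std::memcpy(&value, slot + byte_offset, sizeof(value));
    return value;
}
```

Note that with an odd byte offset (say, 1) the two requested bytes straddle two
lanes if the slot is viewed as four 16-bit elements, so no single element index
selects them, whereas the shift handles every in-bounds offset the same way.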

(Yes, I will look into doing this in SROA.)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138766/new/

https://reviews.llvm.org/D138766