[llvm-dev] LLVM-IR store-load propagation

Fri Jun 19 01:32:25 PDT 2020

On Fri, Jun 19, 2020 at 5:08 AM Johannes Doerfert
<johannesdoerfert at gmail.com> wrote:
>
> Hi Matteo,
>
>
> I think Roman (CC'ed) had a similar problem recently.
Similar, yes.

> IIRC, the solution that was discussed was to enable memory promotion of the alloca into a vector.
I'm working on that :)
But i'm not sure if we will be okay doing that unconditionally,
is spilling going to be able to undo all that?

> Once that happen the existing instcombine logic might already create your select.
>
> If the above turns out to be not sufficient for a more complex case, you might want to
> consider building something on top of https://reviews.llvm.org/D80991, or maybe Shinji (CC'ed)
> might even ;)
>
> Cheers,
>   Johannes
Roman

> On 6/18/20 7:28 PM, Matteo Favaro via llvm-dev wrote:
>
> Hello everyone,
>
> This week I was looking into the following example (
> https://godbolt.org/z/uhgQcq) where two constants are written to a local
> array and an input argument, masked and shifted, is used to select between
> them. The possible values for the CC variable are 0 and 1, so I'm expecting
> that at the maximum level of optimizations the two constants are actually
> propagated, resulting in the return value being a 'select' controlled by CC
> and returning either one or the other.
>
> Although, I quickly realized that the implementation in the function 'src'
> was not going to be optimized any further, resulting in the generation of
> two 'store' instructions and one 'load' instruction, apparently hindering
> the constant propagation pass.
>
> I then decided to explicitly access the local buffer with constant indexes
> and see if LLVM would have been able to identify that CC could have been
> either 0 or 1 (effectively avoiding the 'default' case of the switch and
> therefore the '0xdeadc0de' constant). As a result the function 'tgt' is
> optimized in the way I would expect it to be.
>
> This also seemed to be a good exercise for Alive2, so I fed it with the
> unoptimized 'src' and the optimized 'tgt' functions to prove their
> equivalence, obtaining the result 'Transformation seems to be correct'. As
> a counter-proof I tampered with the logic or modified the constants,
> obtaining a valid proof of why the transformation wasn't correct
> (effectively showing that the original 'src' and 'tgt' functions may
> actually be semantically equivalent).
>
> To replicate the Alive2 result at https://alive2.llvm.org, the following
> input can be used:
>
> define i64 @_Z3srcm(i64 %Flags) {
> entry:
>   %Memory = alloca [2 x i64], align 16
>   %and = lshr i64 %Flags, 6
>   %shr = and i64 %and, 1
>   %0 = bitcast [2 x i64]* %Memory to i8*
>   %arrayidx = getelementptr inbounds [2 x i64], [2 x i64]* %Memory, i64 0,
> i64 0
>   store i64 5369966919, i64* %arrayidx, align 16
>   %arrayidx1 = getelementptr inbounds [2 x i64], [2 x i64]* %Memory, i64 0,
> i64 1
>   store i64 5369966790, i64* %arrayidx1, align 8
>   %arrayidx2 = getelementptr inbounds [2 x i64], [2 x i64]* %Memory, i64 0,
> i64 %shr
>   %1 = load i64, i64* %arrayidx2, align 8
>   ret i64 %1
> }
>
> define i64 @_Z3tgtm(i64 %Flags) {
> entry:
>   %0 = and i64 %Flags, 64
>   %trunc = icmp eq i64 %0, 0
>   %. = select i1 %trunc, i64 5369966919, i64 5369966790
>   ret i64 %.
> }
>
> At this point I decided to replicate the 'tgt' function logic and coded a
> quick LLVM pass that:
>
>    1. uses the known/unknown computed bits information to identify a
>    non-volatile 'load' instruction that uses an index proved to have only two
>    possible values;
>    2.  check if there's at least one 'store' to the accessed buffer using
>    one of the two indexes;
>    3. converts the single 'load' instruction into two 'load' instructions
>    using the concrete indexes;
>    4. generates a 'select' instruction that returns one of the two loaded
>    values, using as condition a check on the index.
>
> The pass seems to be working fine, but I'm left wondering if LLVM is
> purposefully avoiding such an optimization, and if so what is the reason to
> do so (e.g. hard to prove that the optimization is actually going to
> improve the quality of the code, the logic I'm using is completely off the
> rails).
>
> Assuming the logic it's correct and this could be seen as a new custom
> optimization pass, what would be the suggested way to implement it in a
> solid and generic fashion?
>
> Thanks for any insight,
> Matteo
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev