[llvm-dev] LLVM-IR store-load propagation

Johannes Doerfert via llvm-dev llvm-dev at lists.llvm.org
Thu Jun 18 19:07:08 PDT 2020


Hi Matteo,


I think Roman (CC'ed) had a similar problem recently.

IIRC, the solution that was discussed was to enable memory promotion of 
the alloca into a vector.

Once that happen the existing instcombine logic might already create 
your select.


If the above turns out to be not sufficient for a more complex case, you 
might want to

consider building something on top of https://reviews.llvm.org/D80991, 
or maybe Shinji (CC'ed)

might even ;)


Cheers,

   Johannes


On 6/18/20 7:28 PM, Matteo Favaro via llvm-dev wrote:
> Hello everyone,
>
> This week I was looking into the following example (
> https://godbolt.org/z/uhgQcq) where two constants are written to a local
> array and an input argument, masked and shifted, is used to select between
> them. The possible values for the CC variable are 0 and 1, so I'm expecting
> that at the maximum level of optimizations the two constants are actually
> propagated, resulting in the return value being a 'select' controlled by CC
> and returning either one or the other.
>
> Although, I quickly realized that the implementation in the function 'src'
> was not going to be optimized any further, resulting in the generation of
> two 'store' instructions and one 'load' instruction, apparently hindering
> the constant propagation pass.
>
> I then decided to explicitly access the local buffer with constant indexes
> and see if LLVM would have been able to identify that CC could have been
> either 0 or 1 (effectively avoiding the 'default' case of the switch and
> therefore the '0xdeadc0de' constant). As a result the function 'tgt' is
> optimized in the way I would expect it to be.
>
> This also seemed to be a good exercise for Alive2, so I fed it with the
> unoptimized 'src' and the optimized 'tgt' functions to prove their
> equivalence, obtaining the result 'Transformation seems to be correct'. As
> a counter-proof I tampered with the logic or modified the constants,
> obtaining a valid proof of why the transformation wasn't correct
> (effectively showing that the original 'src' and 'tgt' functions may
> actually be semantically equivalent).
>
> To replicate the Alive2 result at https://alive2.llvm.org, the following
> input can be used:
>
> define i64 @_Z3srcm(i64 %Flags) {
> entry:
>    %Memory = alloca [2 x i64], align 16
>    %and = lshr i64 %Flags, 6
>    %shr = and i64 %and, 1
>    %0 = bitcast [2 x i64]* %Memory to i8*
>    %arrayidx = getelementptr inbounds [2 x i64], [2 x i64]* %Memory, i64 0,
> i64 0
>    store i64 5369966919, i64* %arrayidx, align 16
>    %arrayidx1 = getelementptr inbounds [2 x i64], [2 x i64]* %Memory, i64 0,
> i64 1
>    store i64 5369966790, i64* %arrayidx1, align 8
>    %arrayidx2 = getelementptr inbounds [2 x i64], [2 x i64]* %Memory, i64 0,
> i64 %shr
>    %1 = load i64, i64* %arrayidx2, align 8
>    ret i64 %1
> }
>
> define i64 @_Z3tgtm(i64 %Flags) {
> entry:
>    %0 = and i64 %Flags, 64
>    %trunc = icmp eq i64 %0, 0
>    %. = select i1 %trunc, i64 5369966919, i64 5369966790
>    ret i64 %.
> }
>
> At this point I decided to replicate the 'tgt' function logic and coded a
> quick LLVM pass that:
>
>     1. uses the known/unknown computed bits information to identify a
>     non-volatile 'load' instruction that uses an index proved to have only two
>     possible values;
>     2.  check if there's at least one 'store' to the accessed buffer using
>     one of the two indexes;
>     3. converts the single 'load' instruction into two 'load' instructions
>     using the concrete indexes;
>     4. generates a 'select' instruction that returns one of the two loaded
>     values, using as condition a check on the index.
>
> The pass seems to be working fine, but I'm left wondering if LLVM is
> purposefully avoiding such an optimization, and if so what is the reason to
> do so (e.g. hard to prove that the optimization is actually going to
> improve the quality of the code, the logic I'm using is completely off the
> rails).
>
> Assuming the logic it's correct and this could be seen as a new custom
> optimization pass, what would be the suggested way to implement it in a
> solid and generic fashion?
>
> Thanks for any insight,
> Matteo
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200618/9ace600e/attachment.html>


More information about the llvm-dev mailing list