[llvm-dev] LLVM-IR store-load propagation
Roman Lebedev via llvm-dev
llvm-dev at lists.llvm.org
Fri Jun 19 01:32:25 PDT 2020
On Fri, Jun 19, 2020 at 5:08 AM Johannes Doerfert
<johannesdoerfert at gmail.com> wrote:
>
> Hi Matteo,
>
>
> I think Roman (CC'ed) had a similar problem recently.
Similar, yes.
> IIRC, the solution that was discussed was to enable memory promotion of the alloca into a vector.
I'm working on that :)
But i'm not sure if we will be okay doing that unconditionally,
is spilling going to be able to undo all that?
> Once that happen the existing instcombine logic might already create your select.
>
> If the above turns out to be not sufficient for a more complex case, you might want to
> consider building something on top of https://reviews.llvm.org/D80991, or maybe Shinji (CC'ed)
> might even ;)
>
> Cheers,
> Johannes
Roman
> On 6/18/20 7:28 PM, Matteo Favaro via llvm-dev wrote:
>
> Hello everyone,
>
> This week I was looking into the following example (
> https://godbolt.org/z/uhgQcq) where two constants are written to a local
> array and an input argument, masked and shifted, is used to select between
> them. The possible values for the CC variable are 0 and 1, so I'm expecting
> that at the maximum level of optimizations the two constants are actually
> propagated, resulting in the return value being a 'select' controlled by CC
> and returning either one or the other.
>
> Although, I quickly realized that the implementation in the function 'src'
> was not going to be optimized any further, resulting in the generation of
> two 'store' instructions and one 'load' instruction, apparently hindering
> the constant propagation pass.
>
> I then decided to explicitly access the local buffer with constant indexes
> and see if LLVM would have been able to identify that CC could have been
> either 0 or 1 (effectively avoiding the 'default' case of the switch and
> therefore the '0xdeadc0de' constant). As a result the function 'tgt' is
> optimized in the way I would expect it to be.
>
> This also seemed to be a good exercise for Alive2, so I fed it with the
> unoptimized 'src' and the optimized 'tgt' functions to prove their
> equivalence, obtaining the result 'Transformation seems to be correct'. As
> a counter-proof I tampered with the logic or modified the constants,
> obtaining a valid proof of why the transformation wasn't correct
> (effectively showing that the original 'src' and 'tgt' functions may
> actually be semantically equivalent).
>
> To replicate the Alive2 result at https://alive2.llvm.org, the following
> input can be used:
>
> define i64 @_Z3srcm(i64 %Flags) {
> entry:
> %Memory = alloca [2 x i64], align 16
> %and = lshr i64 %Flags, 6
> %shr = and i64 %and, 1
> %0 = bitcast [2 x i64]* %Memory to i8*
> %arrayidx = getelementptr inbounds [2 x i64], [2 x i64]* %Memory, i64 0,
> i64 0
> store i64 5369966919, i64* %arrayidx, align 16
> %arrayidx1 = getelementptr inbounds [2 x i64], [2 x i64]* %Memory, i64 0,
> i64 1
> store i64 5369966790, i64* %arrayidx1, align 8
> %arrayidx2 = getelementptr inbounds [2 x i64], [2 x i64]* %Memory, i64 0,
> i64 %shr
> %1 = load i64, i64* %arrayidx2, align 8
> ret i64 %1
> }
>
> define i64 @_Z3tgtm(i64 %Flags) {
> entry:
> %0 = and i64 %Flags, 64
> %trunc = icmp eq i64 %0, 0
> %. = select i1 %trunc, i64 5369966919, i64 5369966790
> ret i64 %.
> }
>
> At this point I decided to replicate the 'tgt' function logic and coded a
> quick LLVM pass that:
>
> 1. uses the known/unknown computed bits information to identify a
> non-volatile 'load' instruction that uses an index proved to have only two
> possible values;
> 2. check if there's at least one 'store' to the accessed buffer using
> one of the two indexes;
> 3. converts the single 'load' instruction into two 'load' instructions
> using the concrete indexes;
> 4. generates a 'select' instruction that returns one of the two loaded
> values, using as condition a check on the index.
>
> The pass seems to be working fine, but I'm left wondering if LLVM is
> purposefully avoiding such an optimization, and if so what is the reason to
> do so (e.g. hard to prove that the optimization is actually going to
> improve the quality of the code, the logic I'm using is completely off the
> rails).
>
> Assuming the logic it's correct and this could be seen as a new custom
> optimization pass, what would be the suggested way to implement it in a
> solid and generic fashion?
>
> Thanks for any insight,
> Matteo
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
More information about the llvm-dev
mailing list