<div dir="ltr">Thanks Johannes and Roman for pointing me to those patch proposals (D80991, D68934). They seem to approach the problem as a way more generic solution and I'll surely keep an eye on them to see if they'll be merged. From a quick glance it looks like that after running that analysis pass one would be able to know the potential values associated with a variable at a specific point in the code, which would mean the pass I described would be possible "for free", right? If that's the case I think that more or less I'm using the llvm::KnownBits in a weakest form to spot the unknown bits in the index (an integer) and generate the possible values.<div><br></div><div>The idea of promoting the 'alloca' to be a vector also sounds really interesting, my only fear is that doing that unconditionally would result in a non-optimizable representation that would hinder some custom passes that I have in my pipeline. That's mainly the reason why in the dummy pass I described in the first email I check if there's at least a chance (specifically a 'store' to one of the known indexes) for the code to be optimized. I would gladly read the discussion where the "alloca promotion" solution has been pointed out if it's available on the mailing list, how can I find it?</div><div><br></div><div>Cheers,</div><div>Matteo</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Il giorno ven 19 giu 2020 alle ore 10:33 Roman Lebedev <<a href="mailto:lebedev.ri@gmail.com">lebedev.ri@gmail.com</a>> ha scritto:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Fri, Jun 19, 2020 at 5:08 AM Johannes Doerfert<br>

<<a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>> wrote:<br>

><br>

> Hi Matteo,<br>

><br>

><br>

> I think Roman (CC'ed) had a similar problem recently.<br>

Similar, yes.<br>

<br>

> IIRC, the solution that was discussed was to enable memory promotion of the alloca into a vector.<br>

I'm working on that :)<br>

But i'm not sure if we will be okay doing that unconditionally,<br>

is spilling going to be able to undo all that?<br>

<br>

> Once that happen the existing instcombine logic might already create your select.<br>

><br>

> If the above turns out to be not sufficient for a more complex case, you might want to<br>

> consider building something on top of <a href="https://reviews.llvm.org/D80991" rel="noreferrer" target="_blank">https://reviews.llvm.org/D80991</a>, or maybe Shinji (CC'ed)<br>

> might even ;)<br>

><br>

> Cheers,<br>

>   Johannes<br>

Roman<br>

<br>

> On 6/18/20 7:28 PM, Matteo Favaro via llvm-dev wrote:<br>

><br>

> Hello everyone,<br>

><br>

> This week I was looking into the following example (<br>

> <a href="https://godbolt.org/z/uhgQcq" rel="noreferrer" target="_blank">https://godbolt.org/z/uhgQcq</a>) where two constants are written to a local<br>

> array and an input argument, masked and shifted, is used to select between<br>

> them. The possible values for the CC variable are 0 and 1, so I'm expecting<br>

> that at the maximum level of optimizations the two constants are actually<br>

> propagated, resulting in the return value being a 'select' controlled by CC<br>

> and returning either one or the other.<br>

><br>

> Although, I quickly realized that the implementation in the function 'src'<br>

> was not going to be optimized any further, resulting in the generation of<br>

> two 'store' instructions and one 'load' instruction, apparently hindering<br>

> the constant propagation pass.<br>

><br>

> I then decided to explicitly access the local buffer with constant indexes<br>

> and see if LLVM would have been able to identify that CC could have been<br>

> either 0 or 1 (effectively avoiding the 'default' case of the switch and<br>

> therefore the '0xdeadc0de' constant). As a result the function 'tgt' is<br>

> optimized in the way I would expect it to be.<br>

><br>

> This also seemed to be a good exercise for Alive2, so I fed it with the<br>

> unoptimized 'src' and the optimized 'tgt' functions to prove their<br>

> equivalence, obtaining the result 'Transformation seems to be correct'. As<br>

> a counter-proof I tampered with the logic or modified the constants,<br>

> obtaining a valid proof of why the transformation wasn't correct<br>

> (effectively showing that the original 'src' and 'tgt' functions may<br>

> actually be semantically equivalent).<br>

><br>

> To replicate the Alive2 result at <a href="https://alive2.llvm.org" rel="noreferrer" target="_blank">https://alive2.llvm.org</a>, the following<br>

> input can be used:<br>

><br>

> define i64 @_Z3srcm(i64 %Flags) {<br>

> entry:<br>

>   %Memory = alloca [2 x i64], align 16<br>

>   %and = lshr i64 %Flags, 6<br>

>   %shr = and i64 %and, 1<br>

>   %0 = bitcast [2 x i64]* %Memory to i8*<br>

>   %arrayidx = getelementptr inbounds [2 x i64], [2 x i64]* %Memory, i64 0,<br>

> i64 0<br>

>   store i64 5369966919, i64* %arrayidx, align 16<br>

>   %arrayidx1 = getelementptr inbounds [2 x i64], [2 x i64]* %Memory, i64 0,<br>

> i64 1<br>

>   store i64 5369966790, i64* %arrayidx1, align 8<br>

>   %arrayidx2 = getelementptr inbounds [2 x i64], [2 x i64]* %Memory, i64 0,<br>

> i64 %shr<br>

>   %1 = load i64, i64* %arrayidx2, align 8<br>

>   ret i64 %1<br>

> }<br>

><br>

> define i64 @_Z3tgtm(i64 %Flags) {<br>

> entry:<br>

>   %0 = and i64 %Flags, 64<br>

>   %trunc = icmp eq i64 %0, 0<br>

>   %. = select i1 %trunc, i64 5369966919, i64 5369966790<br>

>   ret i64 %.<br>

> }<br>

><br>

> At this point I decided to replicate the 'tgt' function logic and coded a<br>

> quick LLVM pass that:<br>

><br>

>    1. uses the known/unknown computed bits information to identify a<br>

>    non-volatile 'load' instruction that uses an index proved to have only two<br>

>    possible values;<br>

>    2.  check if there's at least one 'store' to the accessed buffer using<br>

>    one of the two indexes;<br>

>    3. converts the single 'load' instruction into two 'load' instructions<br>

>    using the concrete indexes;<br>

>    4. generates a 'select' instruction that returns one of the two loaded<br>

>    values, using as condition a check on the index.<br>

><br>

> The pass seems to be working fine, but I'm left wondering if LLVM is<br>

> purposefully avoiding such an optimization, and if so what is the reason to<br>

> do so (e.g. hard to prove that the optimization is actually going to<br>

> improve the quality of the code, the logic I'm using is completely off the<br>

> rails).<br>

><br>

> Assuming the logic it's correct and this could be seen as a new custom<br>

> optimization pass, what would be the suggested way to implement it in a<br>

> solid and generic fashion?<br>

><br>

> Thanks for any insight,<br>

> Matteo<br>

><br>

><br>

> _______________________________________________<br>

> LLVM Developers mailing list<br>

> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

</blockquote></div>