<div dir="ltr">Thanks Johannes and Roman for pointing me to those patch proposals (D80991, D68934). They seem to approach the problem with a much more generic solution, and I'll certainly keep an eye on them to see whether they get merged. From a quick glance, it looks like after running that analysis pass one would know the potential values associated with a variable at a specific point in the code, which would mean the pass I described would be possible "for free", right? If that's the case, I think I'm more or less using llvm::KnownBits as a weaker form of it, to spot the unknown bits in the index (an integer) and generate the possible values.<div><br></div><div>The idea of promoting the 'alloca' to a vector also sounds really interesting. My only fear is that doing so unconditionally would result in a non-optimizable representation that would hinder some custom passes I have in my pipeline. That's mainly why, in the dummy pass I described in the first email, I check whether there's at least a chance (specifically, a 'store' to one of the known indexes) for the code to be optimized. I would gladly read the discussion where the "alloca promotion" solution was proposed, if it's available on the mailing list. How can I find it?</div><div><br></div><div>Cheers,</div><div>Matteo</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Jun 19, 2020 at 10:33 AM Roman Lebedev <<a href="mailto:lebedev.ri@gmail.com">lebedev.ri@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Fri, Jun 19, 2020 at 5:08 AM Johannes Doerfert<br>
<<a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>> wrote:<br>
><br>
> Hi Matteo,<br>
><br>
><br>
> I think Roman (CC'ed) had a similar problem recently.<br>
Similar, yes.<br>
<br>
> IIRC, the solution that was discussed was to enable memory promotion of the alloca into a vector.<br>
I'm working on that :)<br>
But I'm not sure if we will be okay doing that unconditionally,<br>
is spilling going to be able to undo all that?<br>
<br>
> Once that happens, the existing instcombine logic might already create your select.<br>
><br>
> If the above turns out to be not sufficient for a more complex case, you might want to<br>
> consider building something on top of <a href="https://reviews.llvm.org/D80991" rel="noreferrer" target="_blank">https://reviews.llvm.org/D80991</a>, or maybe Shinji (CC'ed)<br>
> might even ;)<br>
><br>
> Cheers,<br>
> Johannes<br>
Roman<br>
<br>
> On 6/18/20 7:28 PM, Matteo Favaro via llvm-dev wrote:<br>
><br>
> Hello everyone,<br>
><br>
> This week I was looking into the following example (<br>
> <a href="https://godbolt.org/z/uhgQcq" rel="noreferrer" target="_blank">https://godbolt.org/z/uhgQcq</a>) where two constants are written to a local<br>
> array and an input argument, masked and shifted, is used to select between<br>
> them. The possible values for the CC variable are 0 and 1, so I'm expecting<br>
> that at the maximum level of optimizations the two constants are actually<br>
> propagated, resulting in the return value being a 'select' controlled by CC<br>
> and returning either one or the other.<br>
><br>
> However, I quickly realized that the implementation in the function 'src'<br>
> was not going to be optimized any further, resulting in the generation of<br>
> two 'store' instructions and one 'load' instruction, apparently hindering<br>
> the constant propagation pass.<br>
><br>
> I then decided to explicitly access the local buffer with constant indexes<br>
> and see if LLVM would be able to identify that CC could only be<br>
> either 0 or 1 (effectively avoiding the 'default' case of the switch and<br>
> therefore the '0xdeadc0de' constant). As a result the function 'tgt' is<br>
> optimized in the way I would expect it to be.<br>
><br>
> This also seemed to be a good exercise for Alive2, so I fed it with the<br>
> unoptimized 'src' and the optimized 'tgt' functions to prove their<br>
> equivalence, obtaining the result 'Transformation seems to be correct'. As<br>
> a sanity check I tampered with the logic and modified the constants,<br>
> obtaining a counterexample showing why the transformation wasn't correct<br>
> (effectively showing that the original 'src' and 'tgt' functions may<br>
> actually be semantically equivalent).<br>
><br>
> To replicate the Alive2 result at <a href="https://alive2.llvm.org" rel="noreferrer" target="_blank">https://alive2.llvm.org</a>, the following<br>
> input can be used:<br>
><br>
> define i64 @_Z3srcm(i64 %Flags) {<br>
> entry:<br>
> %Memory = alloca [2 x i64], align 16<br>
> %and = lshr i64 %Flags, 6<br>
> %shr = and i64 %and, 1<br>
> %0 = bitcast [2 x i64]* %Memory to i8*<br>
> %arrayidx = getelementptr inbounds [2 x i64], [2 x i64]* %Memory, i64 0,<br>
> i64 0<br>
> store i64 5369966919, i64* %arrayidx, align 16<br>
> %arrayidx1 = getelementptr inbounds [2 x i64], [2 x i64]* %Memory, i64 0,<br>
> i64 1<br>
> store i64 5369966790, i64* %arrayidx1, align 8<br>
> %arrayidx2 = getelementptr inbounds [2 x i64], [2 x i64]* %Memory, i64 0,<br>
> i64 %shr<br>
> %1 = load i64, i64* %arrayidx2, align 8<br>
> ret i64 %1<br>
> }<br>
><br>
> define i64 @_Z3tgtm(i64 %Flags) {<br>
> entry:<br>
> %0 = and i64 %Flags, 64<br>
> %trunc = icmp eq i64 %0, 0<br>
> %. = select i1 %trunc, i64 5369966919, i64 5369966790<br>
> ret i64 %.<br>
> }<br>
><br>
> At this point I decided to replicate the 'tgt' function logic and coded a<br>
> quick LLVM pass that:<br>
><br>
> 1. uses the known/unknown computed bits information to identify a<br>
> non-volatile 'load' instruction that uses an index proved to have only two<br>
> possible values;<br>
> 2. checks if there's at least one 'store' to the accessed buffer using<br>
> one of the two indexes;<br>
> 3. converts the single 'load' instruction into two 'load' instructions<br>
> using the concrete indexes;<br>
> 4. generates a 'select' instruction that returns one of the two loaded<br>
> values, using as condition a check on the index.<br>
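><br>
> A minimal sketch of the four steps above, written as self-contained C++ and not<br>
> against the LLVM API. KnownBitsModel is a hypothetical stand-in for<br>
> llvm::KnownBits, and possibleValues, src and rewritten are illustrative names,<br>
> not code from the actual pass. It only models the idea: enumerate the index<br>
> values allowed by the known bits, then replace the single indexed load with two<br>
> loads at concrete indexes and a select on the index.<br>
><br>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for llvm::KnownBits: a bit set in Zero is known to be
// 0, a bit set in One is known to be 1, every other bit is unknown.
struct KnownBitsModel {
  uint64_t Zero;
  uint64_t One;
};

// Step 1: enumerate the values compatible with the known bits. Returns an
// empty vector when more than MaxUnknownBits bits are unknown.
std::vector<uint64_t> possibleValues(const KnownBitsModel &KB,
                                     unsigned MaxUnknownBits) {
  const uint64_t Unknown = ~(KB.Zero | KB.One);
  unsigned Count = 0;
  for (uint64_t M = Unknown; M != 0; M &= M - 1)
    ++Count; // clears the lowest set bit on each iteration
  if (Count > MaxUnknownBits)
    return {};

  std::vector<uint64_t> Values;
  uint64_t Sub = 0;
  do {
    Values.push_back(KB.One | Sub);
    // Step to the next subset of the Unknown mask (wraps back to 0 at the end).
    Sub = (Sub - Unknown) & Unknown;
  } while (Sub != 0);
  return Values;
}

// Reference behaviour: two stores followed by one load with a variable index.
uint64_t src(uint64_t Flags) {
  uint64_t Memory[2];
  Memory[0] = 5369966919ULL;
  Memory[1] = 5369966790ULL;
  return Memory[(Flags >> 6) & 1];
}

// Steps 3 and 4: two loads at the concrete indexes, then a select on the index.
// Step 2 (a store to one of the candidate indexes) holds trivially here.
uint64_t rewritten(uint64_t Flags) {
  uint64_t Memory[2];
  Memory[0] = 5369966919ULL;
  Memory[1] = 5369966790ULL;
  const uint64_t Index = (Flags >> 6) & 1;
  // Every bit of Index except bit 0 is known to be zero.
  const KnownBitsModel KB{~uint64_t{1}, 0};
  const std::vector<uint64_t> Indexes = possibleValues(KB, 1);
  assert(Indexes.size() == 2 && "expected exactly two candidate indexes");
  const uint64_t First = Memory[Indexes[0]];
  const uint64_t Second = Memory[Indexes[1]];
  return Index == Indexes[0] ? First : Second;
}
```

> Under these assumptions, rewritten(Flags) should agree with src(Flags) for every<br>
> input, which is the same equivalence Alive2 checks between 'src' and 'tgt'.<br>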
><br>
> The pass seems to be working fine, but I'm left wondering if LLVM is<br>
> purposefully avoiding such an optimization, and if so what is the reason to<br>
> do so (e.g. hard to prove that the optimization is actually going to<br>
> improve the quality of the code, or the logic I'm using is completely off the<br>
> rails).<br>
><br>
> Assuming the logic is correct and this could be seen as a new custom<br>
> optimization pass, what would be the suggested way to implement it in a<br>
> solid and generic fashion?<br>
><br>
> Thanks for any insight,<br>
> Matteo<br>
><br>
><br>
> _______________________________________________<br>
> LLVM Developers mailing list<br>
> <a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
</blockquote></div>