[LLVMdev] some superoptimizer results
Philip Reames
listmail at philipreames.com
Thu Jul 23 10:52:24 PDT 2015
On 07/22/2015 09:27 PM, John Regehr wrote:
>> I just noticed: most of the results in this batch seem to be about
>> exploiting `[zs]ext i1` having cost 1
>> in order to replace a select of cost 3.
>> Could you do a run where select has cost 1 and [zs]ext i1 (and trunc
>> to i1) has cost 2 or 3?
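For concreteness, the rewrite in question replaces a select between the
constants 1 and 0 with a zero extension of the condition bit. A minimal
illustrative fragment in LLVM IR (hand-written here, not actual Souper
output):

    %r = select i1 %c, i32 1, i32 0   ; select, cost 3 in the model above
      =>
    %r = zext i1 %c to i32            ; zext of i1, cost 1

The sext case is analogous: select i1 %c, i32 -1, i32 0 becomes
sext i1 %c to i32.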
>
> I tried this (or something quite similar) earlier and it caused Souper
> to introduce *a lot* of selects. So the problem is that Souper's
> preferences become skewed too far in the other direction.
>
> How about for the next run I give everything (except maybe div/rem)
> cost 1? Then the only thing Souper will tell us about is when it can
> eliminate instructions, which seems to be a generally desirable thing.
>
> As I said in the blog post, in the next batch Souper will exploit
> known bits. I'm excited to see what kind of results come out of
> Philip's value-tracking-dom-conditions.
Your timing for mentioning this is slightly ironic. I'm in the process
of deciding whether I should just delete all the code in question. It's
been in tree for a while now and I haven't been able to justify the time
to make it ready for production.
At this point, I'm not really seeing much remaining benefit from
enabling the dominating condition work in ValueTracking. I've got one
benchmark (one!) that shows a very slight improvement with the code
enabled. There are also a number of crashing bugs with the option
enabled which I haven't yet had time to track down. I suspect these are
miscompiles. The good news is that all of them have the same
"signature" so I suspect it's only one or two specific issues. In
theory, these crashes could be hiding performance improvements, but I
have little evidence of that.
Unless someone else steps up to help drive this work, I'm likely to
delete the code from tree for now. It's not a failed experiment per se,
just one that I don't have the time to take to completion. I don't want
the code in tree and bit rotting if it's not serving any purpose.
For the purposes of documentation, let me summarize a few lessons learned:
- The use-list-based implementation appears to be *very* slightly faster
than the dom-walk-based one. Both implementations scale to tens of
thousands of (uses, blocks) with around a 2-3% compile time impact. The
measurements collected were fairly noisy, so allow another 1-2% for
possible measurement error.
- Given the simplicity - both of the implementation and of the concept -
I now think the dominance-based approach should be preferred. It also
has room for more performance tweaking and makes caching more obvious.
- Increasing the depth of the known bits search at which dominating
conditions are applied increases compile time without material
improvement in result quality.
- Enabling this seems to tickle a lot of unrelated bugs. Taking the
time to triage those might be worthwhile purely from a robustness
standpoint, since in *theory* this is just returning a result the known
bits analysis could have figured out otherwise.
- At least for me, the ability to reason about non-nullness is more
important than the value of particular bits. If I were to start this
from scratch, I'd probably start by making isKnownNonNull
dominating-condition aware (see the first sketch after this list).
Given that starting place, a separate pass (i.e. not InstCombine) might
make far more sense and would help decouple the code further.
- I suspect that many of the cases found by this approach could be
implemented as special cases. To use nullness as an example,
recognizing the pattern { if (!p) p = malloc(X) } is much easier than
handling general control flow (see the second sketch after this list).
The general framework might be most useful in finding the special
patterns that could be manually implemented. My suspicion is that
handling the top-N cases will get 95% of the gain.
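To make the non-nullness point concrete, here is a minimal illustrative
IR fragment (the function names are made up for this summary, not taken
from an actual test case). A dominating-condition-aware isKnownNonNull
could answer "non-null" at the marked call:

    define void @f(i8* %p) {
    entry:
      %is.null = icmp eq i8* %p, null
      br i1 %is.null, label %bail, label %use

    bail:                       ; %p is null on this path
      ret void

    use:                        ; reached only when %is.null is false,
      call void @use_p(i8* %p)  ; so %p is provably non-null here
      ret void
    }

    declare void @use_p(i8*)

Nothing useful is known about the individual bits of %p in @f; the only
derivable fact is non-nullness.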
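And the { if (!p) p = malloc(X) } pattern lowers to roughly the
following (again hand-written for illustration; concluding that the
result is non-null also leans on treating the allocation as returning a
non-null pointer):

    define i8* @get_buffer(i8* %p) {
    entry:
      %is.null = icmp eq i8* %p, null
      br i1 %is.null, label %alloc, label %join

    alloc:
      %fresh = call i8* @malloc(i64 64)
      br label %join

    join:
      ; %p is non-null on the edge from entry, because the branch
      ; tested exactly that; %fresh covers the other edge
      %result = phi i8* [ %p, %entry ], [ %fresh, %alloc ]
      ret i8* %result
    }

    declare noalias i8* @malloc(i64)

Matching this specific phi-of-tested-pointer-and-fresh-allocation shape
is a far smaller job than consulting arbitrary dominating conditions,
which is the sense in which the top-N special cases might capture most
of the gain.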
>
> John