[LLVMdev] some superoptimizer results
Philip Reames
listmail at philipreames.com
Thu Jul 23 10:52:24 PDT 2015
On 07/22/2015 09:27 PM, John Regehr wrote:
>> I just noticed: most of the results in this batch seem to be about
>> exploiting `[zs]ext i1` having cost 1
>> in order to replace a select of cost 3.
>> Could you do a run where select has cost 1 and [zs]ext i1 (and trunc
>> to i1) has cost 2 or 3?
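For concreteness, the rewrite in question replaces a select between the
constants 1 and 0 with a zero extension of the condition bit. A minimal
illustrative fragment in LLVM IR (hand-written here, not actual Souper
output):

    %r = select i1 %c, i32 1, i32 0   ; select, cost 3 in the model above
      =>
    %r = zext i1 %c to i32            ; zext of i1, cost 1

The sext case is analogous: select i1 %c, i32 -1, i32 0 becomes
sext i1 %c to i32.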
>
> I tried this (or something quite similar) earlier and it caused Souper
> to introduce *a lot* of selects. So the problem is that Souper's
> preferences become skewed too far in the other direction.
>
> How about for the next run I give everything (except maybe div/rem)
> cost 1? Then the only thing Souper will tell us about is when it can
> eliminate instructions, which seems to be a generally desirable thing.
>
> As I said in the blog post, in the next batch Souper will exploit
> known bits. I'm excited to see what kind of results come out of
> Philip's value-tracking-dom-conditions.
Your timing for mentioning this is slightly ironic. I'm in the process
of deciding whether I should just delete all the code in question. It's
been in tree for a while now and I haven't been able to justify the time
to make it ready for production.
At this point, I'm not really seeing much remaining benefit from
enabling the dominating condition work in ValueTracking. I've got one
benchmark (one!) that shows a very slight improvement with the code
enabled. There are also a number of crashing bugs with the option
enabled which I haven't yet had time to track down. I suspect these are
miscompiles. The good news is that all of them have the same
"signature" so I suspect it's only one or two specific issues. In
theory, these crashes could be hiding performance improvements, but I
have little evidence of that.
Unless someone else steps up to help drive this work, I'm likely to
delete the code from tree for now. It's not a failed experiment per se,
just one that I don't have the time to take to completion. I don't want
the code in tree and bit rotting if it's not serving any purpose.
For the purposes of documentation, let me summarize a few lessons learned:
- The use-list-based implementation appears to be *very* slightly faster
than the dom-walk-based one. Both implementations scale to tens of
thousands of (uses, blocks) with around a 2-3% compile time impact. The
measurements collected were fairly noisy, so allow another 1-2% for
possible measurement error.
- Given the simplicity - both of the implementation and of the concept -
I now think the dominance-based approach should be preferred. It also
has room for more performance tweaking and makes caching more obvious.
- Increasing the depth of the known bits search at which dominating
conditions are applied increases compile time without material
improvement in result quality.
- Enabling this seems to tickle a lot of unrelated bugs. Taking the
time to triage those might be worthwhile purely from a robustness
standpoint, since in *theory* this is just returning a result the known
bits analysis could have figured out otherwise.
- At least for me, the ability to reason about non-nullness is more
important than the value of particular bits. If I were to start this
from scratch, I'd probably start by making isKnownNonNull
dominating-condition aware (see the first sketch after this list).
Given that starting place, a separate pass (i.e. not InstCombine) might
make far more sense and would help decouple the code further.
- I suspect that many of the cases found by this approach could be
implemented as special cases. To use nullness as an example,
recognizing the pattern { if (!p) p = malloc(X) } is much easier than
handling general control flow (see the second sketch after this list).
The general framework might be most useful in finding the special
patterns that could be manually implemented. My suspicion is that
handling the top-N cases will get 95% of the gain.
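To make the non-nullness point concrete, here is a minimal illustrative
IR fragment (the function names are made up for this summary, not taken
from an actual test case). A dominating-condition-aware isKnownNonNull
could answer "non-null" at the marked call:

    define void @f(i8* %p) {
    entry:
      %is.null = icmp eq i8* %p, null
      br i1 %is.null, label %bail, label %use

    bail:                       ; %p is null on this path
      ret void

    use:                        ; reached only when %is.null is false,
      call void @use_p(i8* %p)  ; so %p is provably non-null here
      ret void
    }

    declare void @use_p(i8*)

Nothing useful is known about the individual bits of %p in @f; the only
derivable fact is non-nullness.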
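And the { if (!p) p = malloc(X) } pattern lowers to roughly the
following (again hand-written for illustration; concluding that the
result is non-null also leans on treating the allocation as returning a
non-null pointer):

    define i8* @get_buffer(i8* %p) {
    entry:
      %is.null = icmp eq i8* %p, null
      br i1 %is.null, label %alloc, label %join

    alloc:
      %fresh = call i8* @malloc(i64 64)
      br label %join

    join:
      ; %p is non-null on the edge from entry, because the branch
      ; tested exactly that; %fresh covers the other edge
      %result = phi i8* [ %p, %entry ], [ %fresh, %alloc ]
      ret i8* %result
    }

    declare noalias i8* @malloc(i64)

Matching this specific phi-of-tested-pointer-and-fresh-allocation shape
is a far smaller job than consulting arbitrary dominating conditions,
which is the sense in which the top-N special cases might capture most
of the gain.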
>
> John