[PATCH] D19439: Optimization bisect support in X86-specific passes

Andy Kaylor via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 26 10:18:16 PDT 2016


andrew.w.kaylor added a comment.

In http://reviews.llvm.org/D19439#412264, @DavidKreitzer wrote:

> My comments about vzeroupper make me wonder whether we want skipFunction to be able to make the distinction between skipping a pass for OptBisect and skipping a pass for -O0. I can certainly imagine wanting to run some functionally optional "optimization" passes at -O0, e.g. a pass that can improve the size/performance of the generated code w/o negatively affecting debuggability.


I think you're right about having "optimization" passes that we want to run at -O0.  I would expect that to be uncommon enough and limited enough in scope that it's probably OK not to involve those passes in the bisect, the benefit being that -O0, "optnone" and the bisect all follow the same rules.  We could revisit that later if it turns out to limit the usefulness of the bisect.


================
Comment at: lib/Target/X86/X86FixupBWInsts.cpp:138
@@ -137,3 +137,3 @@
 bool FixupBWInstPass::runOnMachineFunction(MachineFunction &MF) {
-  if (!FixupBWInsts)
+  if (skipFunction(*MF.getFunction()) || !FixupBWInsts)
     return false;
----------------
DavidKreitzer wrote:
> I see that you chose to put the skipFunction calls first before other pass-skipping checks. Did you give any thought to that placement and/or do this intentionally? The effect of putting the call earlier is that the bisection counts will be higher.
> 
> For these skipFunction calls, it probably doesn't matter too much, but when we get around to adding shouldRunCase calls, we tend to want to delay the calls as long as possible to avoid wasting bisect numbers on optimization cases that get filtered out by subsequent safety/performance checks.
> 
> Along those lines, you might want to make LastBisectNum in OptBisect.h a 64-bit integer rather than a 32-bit integer. It would not surprise me if a 32-bit counter overflows for a large LTO compilation, especially if we are not careful with our calls to shouldRunCase.
I started out always making the skip check be the first thing that happens so that it doesn't creep in to the logic of the run functions, but as I've been making more of these changes I've started putting it behind more trivial checks like this to trim the extra call in cases where bisection isn't being done.  Given the logarithmic order of the bisection I wouldn't think it will make much difference at this level, but I suppose once we get into having all the cases instrumented those will add up quickly.

It's definitely best to be consistent and I don't have any reason not to put the skip check behind trivial checks like this, so I'll standardize on that.

================
Comment at: lib/Target/X86/X86VZeroUpper.cpp:258
@@ -257,1 +257,3 @@
 bool VZeroUpperInserter::runOnMachineFunction(MachineFunction &MF) {
+  if (skipFunction(*MF.getFunction()))
+    return false;
----------------
DavidKreitzer wrote:
> skipFunction will return true at -O0, right? That is not the behavior we want for the vzeroupper insertion pass. Leaving the machine in a VEX-256 "dirty" state at function call/return boundaries is a very bad idea, because it will result in AVX<=>SSE transition penalties for any subsequent transitions until the next vzeroupper.
> 
> The vzeroupper strategy that we defined at the initial implementation of AVX was to assume and preserve a zeroupper state at function entry/call/return boundaries so that AVX128<=>SSE transitions would incur no penalties. Think of it as a kind of "performance ABI". Having just one call to one routine that violates these rules can crater performance for the lifetime of a program.
> 
> So perhaps the best approach is to just make VZeroUpperInserter a required pass and document the rationale. I would have no objection to hooking this pass up to the optimization bisector for debugging purposes since the pass is optional from a functional perspective. But that might cause more confusion than it's worth.
> 
It turns out that there is a lit test which verifies that nothing is skipped by the "optnone" attribute check that would otherwise be run at -O0 and I think it makes sense for opt bisect to follow the same rule.  So given that we think VZeroUpper should be run at O0 I'll remove this check.


Repository:
  rL LLVM

http://reviews.llvm.org/D19439





More information about the llvm-commits mailing list