[PATCH] D19439: Optimization bisect support in X86-specific passes

David Kreitzer via llvm-commits <llvm-commits at lists.llvm.org>
Tue Apr 26 09:44:49 PDT 2016


DavidKreitzer added a comment.

My comments about vzeroupper make me wonder whether we want skipFunction to be able to distinguish between skipping a pass for OptBisect and skipping a pass for -O0. I can certainly imagine wanting to run some functionally optional "optimization" passes at -O0, e.g. a pass that improves the size/performance of the generated code without negatively affecting debuggability.
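
A rough sketch of what I have in mind (none of this is an existing API; the enum and query are purely hypothetical) would let such a pass keep running at -O0 while still honoring the bisector:

  // Hypothetical API sketch only, not something that exists today:
  enum class SkipReason { None, OptNone, OptBisect };

  bool MyCodeSizePass::runOnMachineFunction(MachineFunction &MF) {
    // Keep running at -O0, but still allow -opt-bisect-limit to disable us.
    if (getSkipReason(*MF.getFunction()) == SkipReason::OptBisect)
      return false;
    // ... pass body ...
  }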


================
Comment at: lib/Target/X86/X86FixupBWInsts.cpp:138
@@ -137,3 +137,3 @@
 bool FixupBWInstPass::runOnMachineFunction(MachineFunction &MF) {
-  if (!FixupBWInsts)
+  if (skipFunction(*MF.getFunction()) || !FixupBWInsts)
     return false;
----------------
I see that you chose to put the skipFunction calls before the other pass-skipping checks. Did you give any thought to that placement, or was it intentional? The effect of making the call earlier is that the bisection counts will be higher.

For these skipFunction calls, it probably doesn't matter too much, but when we get around to adding shouldRunCase calls, we will want to delay those calls as long as possible to avoid wasting bisect numbers on optimization cases that get filtered out by subsequent safety/performance checks.

Along those lines, you might want to make LastBisectNum in OptBisect.h a 64-bit integer rather than a 32-bit integer. It would not surprise me if a 32-bit counter overflows for a large LTO compilation, especially if we are not careful with our calls to shouldRunCase.
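
To make the ordering concern concrete, here is roughly the shape I would expect an eventual per-case check to take (the helper names here are made up for the example; shouldRunCase is the proposed hook, not a current API). The bisect query comes last, after the cheap legality/profitability checks, so that filtered-out cases never consume a bisect number:

  bool FixupBWInstPass::tryReplaceInstr(MachineInstr &MI) {
    if (!isSafeToRewrite(MI))        // hypothetical legality check
      return false;
    if (!isProfitableToRewrite(MI))  // hypothetical profitability check
      return false;
    // Only now consume a bisect number for this case.
    if (!shouldRunCase(MI))          // proposed per-case bisect hook
      return false;
    rewriteInstr(MI);                // hypothetical transform
    return true;
  }

And the counter change I'm suggesting is just:

  uint64_t LastBisectNum = 0; // 64 bits, so a large LTO compile can't overflow it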

================
Comment at: lib/Target/X86/X86VZeroUpper.cpp:258
@@ -257,1 +257,3 @@
 bool VZeroUpperInserter::runOnMachineFunction(MachineFunction &MF) {
+  if (skipFunction(*MF.getFunction()))
+    return false;
----------------
skipFunction will return true at -O0, right? That is not the behavior we want for the vzeroupper insertion pass. Leaving the machine in a VEX-256 "dirty" state at function call/return boundaries is a very bad idea, because it will result in AVX<=>SSE transition penalties for any subsequent transitions until the next vzeroupper.

The vzeroupper strategy that we defined when AVX was first implemented was to assume and preserve a zeroupper state at function entry/call/return boundaries so that AVX128<=>SSE transitions would incur no penalties. Think of it as a kind of "performance ABI". A single call to a single routine that violates these rules can crater performance for the rest of the program's lifetime.

So perhaps the best approach is to just make VZeroUpperInserter a required pass and document the rationale. I would have no objection to hooking this pass up to the optimization bisector for debugging purposes since the pass is optional from a functional perspective. But that might cause more confusion than it's worth.
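
For illustration only, the "required pass, but still bisectable for debugging" variant might look something like the following (the bisect-only query is hypothetical, per my earlier comment about separating the two skip reasons):

  bool VZeroUpperInserter::runOnMachineFunction(MachineFunction &MF) {
    // Deliberately not skipped at -O0: omitting vzeroupper leaves the upper
    // YMM state dirty across call/return boundaries and triggers AVX<=>SSE
    // transition penalties, violating the vzeroupper "performance ABI".
    //
    // Optionally still let the bisector disable the pass for debugging:
    // if (skipFunctionForBisectOnly(*MF.getFunction()))  // hypothetical
    //   return false;

    // ... existing vzeroupper insertion logic unchanged ...
  }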



Repository:
  rL LLVM

http://reviews.llvm.org/D19439




