[LLVMdev] X86VZeroUpper optimization question.

Thu Mar 13 15:36:11 PDT 2014

Hi Bruno,

I'm looking at a test case where we're failing to insert a vzeroupper
between an instruction that dirties the YMM regs and a call that uses SSE
regs. No test case yet - I'm still trying to reduce it to something sane. I
can see where the logic in the X86VZeroUpper optimization goes off the
rails though: The entry state for the basic block is ST_UNKNOWN, and the
optimization contains the following logic:

if (CurState == ST_DIRTY) {
  // Only insert the VZEROUPPER in case the entry state isn't unknown.
  // When unknown, only compute the information within the block to have
  // it available in the exit if possible, but don't change the block.
  if (EntryState != ST_UNKNOWN) {
    BuildMI(BB, I, dl, TII->get(X86::VZEROUPPER));
    ++NumVZU;
  }
  // After the inserted VZEROUPPER the state becomes clean again, but
  // other YMM may appear before other subsequent calls or even before
  // the end of the BB.
  CurState = ST_CLEAN;
}

If CurState == ST_DIRTY and EntryState == ST_UNKNOWN, then some instruction
in this basic block has dirtied the YMM regs. In that case, why would you
want to avoid putting a vzeroupper instruction in? Is it just to avoid
inserting duplicate vzerouppers when the block is revisited? If that's the
case then I think the problem is actually in runOnMachineFunction, which
contains the comment: "Each BB state depends on all predecessors, loop over
until everything converges.  (Once we converge, we can implicitly mark
everything that is still ST_UNKNOWN as ST_CLEAN.)". We do iterate to
convergence, but we don't mark anything as clean afterwards, nor do we make
a final re-visit of the basic blocks that previously had ST_UNKNOWN entry
states. Is that an oversight?
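
For illustration, here is a minimal sketch of the two-phase scheme that comment seems to describe, written against a toy model rather than the real pass. Everything in it is an assumption made for the example: the Block and Op types, the merge rule, the scan/solve/makeExample helpers, and the idea that a block with no predecessors is entered clean. With TreatUnknownAsClean set to false it mimics the snippet above and skips blocks whose entry state never resolves; with true it does the final re-visit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; these types are hypothetical, not LLVM's.
enum State { ST_UNKNOWN, ST_CLEAN, ST_DIRTY };
enum Op { OP_DIRTY_YMM, OP_CALL };

struct Block {
  std::vector<Op> Ops;
  std::vector<std::size_t> Preds;        // indices of predecessor blocks
  State Entry = ST_UNKNOWN;
  State Exit = ST_UNKNOWN;
  std::vector<std::size_t> InsertBefore; // op indices of calls needing a vzeroupper
};

// Assumed merge rule: dirty wins, then unknown, otherwise clean.
State merge(State A, State B) {
  if (A == ST_DIRTY || B == ST_DIRTY) return ST_DIRTY;
  if (A == ST_UNKNOWN || B == ST_UNKNOWN) return ST_UNKNOWN;
  return ST_CLEAN;
}

// Walk one block from the given entry state and return its exit state.
// When Record is set, remember each call reached with dirty upper halves.
State scan(Block &B, State Entry, bool Record) {
  State Cur = Entry;
  for (std::size_t I = 0; I < B.Ops.size(); ++I) {
    if (B.Ops[I] == OP_DIRTY_YMM) {
      Cur = ST_DIRTY;
    } else if (Cur == ST_DIRTY) { // OP_CALL reached while dirty
      if (Record) B.InsertBefore.push_back(I);
      Cur = ST_CLEAN;
    }
  }
  return Cur;
}

// Returns the number of vzeroupper insertion points found.
std::size_t solve(std::vector<Block> &Blocks, bool TreatUnknownAsClean) {
  // Phase 1: iterate to convergence without recording any insertions.
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (Block &B : Blocks) {
      State Entry = ST_CLEAN; // identity for merge; no predecessors means clean
      for (std::size_t P : B.Preds) {
        assert(P < Blocks.size() && "predecessor index out of range");
        Entry = merge(Entry, Blocks[P].Exit);
      }
      State Exit = scan(B, Entry, /*Record=*/false);
      if (Entry != B.Entry || Exit != B.Exit) {
        B.Entry = Entry;
        B.Exit = Exit;
        Changed = true;
      }
    }
  }
  // Phase 2: one final visit of every block to record insertion points.
  std::size_t Inserted = 0;
  for (Block &B : Blocks) {
    if (B.Entry == ST_UNKNOWN) {
      if (!TreatUnknownAsClean) continue; // mirrors the snippet: leave block alone
      B.Entry = ST_CLEAN;                 // still unknown after convergence
    }
    scan(B, B.Entry, /*Record=*/true);
    Inserted += B.InsertBefore.size();
  }
  return Inserted;
}

// Block 0: empty entry block. Block 1: empty block that loops on itself,
// so its state never resolves. Block 2: dirties YMM, then makes a call.
std::vector<Block> makeExample() {
  std::vector<Block> Blocks(3);
  Blocks[1].Preds = {0, 1};
  Blocks[2].Preds = {1};
  Blocks[2].Ops = {OP_DIRTY_YMM, OP_CALL};
  return Blocks;
}
```

In this toy CFG, block 2 keeps an unknown entry state after convergence, so solve(Blocks, false) records nothing for it even though the block itself dirties the registers before its call, while solve(Blocks, true) records one insertion point before that call.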

Cheers,
Lang.