[LLVMdev] X86VZeroUpper optimization question.
Lang Hames
lhames at gmail.com
Thu Mar 13 15:36:11 PDT 2014
Hi Bruno,
I'm looking at a test case where we're failing to insert a vzeroupper
between an instruction that dirties the YMM regs and a call that uses SSE
regs. No test case yet - I'm still trying to reduce it to something sane. I
can see where the logic in the X86VZeroUpper optimization goes off the
rails though: The entry state for the basic block is ST_UNKNOWN, and the
optimization contains the following logic:
    if (CurState == ST_DIRTY) {
      // Only insert the VZEROUPPER in case the entry state isn't unknown.
      // When unknown, only compute the information within the block to have
      // it available in the exit if possible, but don't change the block.
      if (EntryState != ST_UNKNOWN) {
        BuildMI(BB, I, dl, TII->get(X86::VZEROUPPER));
        ++NumVZU;
      }
      // After the inserted VZEROUPPER the state becomes clean again, but
      // other YMM may appear before other subsequent calls or even before
      // the end of the BB.
      CurState = ST_CLEAN;
    }
If CurState == ST_DIRTY and EntryState == ST_UNKNOWN, then some instruction
in this basic block has dirtied the YMM regs. In that case, why would you
want to avoid putting a vzeroupper instruction in? Is it just to avoid
inserting duplicate vzerouppers when the block is revisited? If that's the
case then I think the problem is actually in runOnMachineFunction, which
contains the comment: "Each BB state depends on all predecessors, loop over
until everything converges. (Once we converge, we can implicitly mark
everything that is still ST_UNKNOWN as ST_CLEAN.)". We do iterate to
convergence, but afterwards we neither mark anything as clean nor revisit
the basic blocks whose entry states were still ST_UNKNOWN. Is that an
oversight?
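To make the question concrete, here is a rough sketch of the behaviour I
would have expected: iterate to convergence without modifying any block,
then treat whatever entry state is still unknown as clean and do one final
pass that actually inserts. This is a toy model, not the pass itself: Block,
Op, entryState, walk and insertVZeroUppers are all made-up names, the merge
rule and the "function entry is clean" rule are my assumptions, and nothing
here uses the real MachineFunction API.

```cpp
#include <cstddef>
#include <vector>

// Toy model only: every name here is hypothetical, not LLVM API.
enum State { ST_UNKNOWN, ST_CLEAN, ST_DIRTY };
enum Op { OP_OTHER, OP_YMM, OP_CALL, OP_VZEROUPPER };

struct Block {
  std::vector<Op> Ops;             // instructions, reduced to what matters here
  std::vector<std::size_t> Preds;  // indices of predecessor blocks
  State Exit = ST_UNKNOWN;         // exit state computed so far
};

// Assumed merge rule: any dirty predecessor makes the entry dirty; otherwise
// any still-unknown predecessor makes it unknown; otherwise it is clean.
// A block without predecessors is assumed to start clean.
static State entryState(const std::vector<Block> &F, std::size_t Idx) {
  if (F[Idx].Preds.empty())
    return ST_CLEAN;
  bool SawUnknown = false;
  for (std::size_t P : F[Idx].Preds) {
    if (F[P].Exit == ST_DIRTY)
      return ST_DIRTY;
    if (F[P].Exit == ST_UNKNOWN)
      SawUnknown = true;
  }
  return SawUnknown ? ST_UNKNOWN : ST_CLEAN;
}

// Walk one block and return its exit state. When Insert is set, a
// VZEROUPPER is placed before every call that is reached in the dirty state.
// When it is not set, the block is left untouched and only the state is
// tracked, as if the VZEROUPPER were going to be inserted later.
static State walk(Block &B, State Entry, bool Insert, unsigned &NumVZU) {
  State Cur = Entry;
  std::vector<Op> Out;
  for (Op O : B.Ops) {
    if (O == OP_YMM) {
      Cur = ST_DIRTY;
    } else if (O == OP_VZEROUPPER) {
      Cur = ST_CLEAN;
    } else if (O == OP_CALL && Cur == ST_DIRTY) {
      if (Insert) {
        Out.push_back(OP_VZEROUPPER);
        ++NumVZU;
      }
      Cur = ST_CLEAN;
    }
    Out.push_back(O);
  }
  if (Insert)
    B.Ops = Out;
  return Cur;
}

// Phase 1: iterate exit states to a fixed point without changing any block.
// Phase 2: treat entry states that are still unknown as clean and insert.
// Returns the number of VZEROUPPERs inserted.
static unsigned insertVZeroUppers(std::vector<Block> &F) {
  unsigned NumVZU = 0;
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t I = 0; I < F.size(); ++I) {
      State Exit = walk(F[I], entryState(F, I), /*Insert=*/false, NumVZU);
      if (Exit != F[I].Exit) {
        F[I].Exit = Exit;
        Changed = true;
      }
    }
  }
  for (std::size_t I = 0; I < F.size(); ++I) {
    State Entry = entryState(F, I);
    if (Entry == ST_UNKNOWN)
      Entry = ST_CLEAN;  // the "implicitly mark as ST_CLEAN" step
    walk(F[I], Entry, /*Insert=*/true, NumVZU);
  }
  return NumVZU;
}
```

In this model, a block whose only predecessor is a self-looping block that
never touches YMM keeps an unknown entry state at convergence. If that block
contains a YMM instruction followed by a call, the final pass still inserts
the VZEROUPPER between them, which is the case I am seeing get missed.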
Cheers,
Lang.