<div dir="ltr">Hi Bruno,<div><br></div><div>I'm looking at a test case where we fail to insert a vzeroupper between an instruction that dirties the YMM regs and a call that uses SSE regs. No test case yet - I'm still trying to reduce it to something sane. I can see where the logic in the X86VZeroUpper optimization goes off the rails though: the entry state for the basic block is ST_UNKNOWN, and the optimization contains the following logic:</div>
<div><br></div><div><span style="font-family:'courier new',monospace;font-size:x-small">if (CurState == ST_DIRTY) {</span><br></div><div><font face="courier new, monospace" size="1"> // Only insert the VZEROUPPER in case the entry state isn't unknown.</font><div>
<font face="courier new, monospace" size="1"> // When unknown, only compute the information within the block to have</font><div><font face="courier new, monospace" size="1"> // it available in the exit if possible, but don't change the block.</font><div>
<font face="courier new, monospace" size="1"> if (EntryState != ST_UNKNOWN) {</font><div><font face="courier new, monospace" size="1"> BuildMI(BB, I, dl, TII->get(X86::VZEROUPPER));</font><div><font face="courier new, monospace" size="1"> ++NumVZU;</font><div>
<font face="courier new, monospace" size="1"> }</font><div><font face="courier new, monospace" size="1"> // After the inserted VZEROUPPER the state becomes clean again, but</font><div><font face="courier new, monospace" size="1"> // other YMM may appear before other subsequent calls or even before</font><div>
<font face="courier new, monospace" size="1"> // the end of the BB.</font><div><font face="courier new, monospace" size="1"> CurState = ST_CLEAN;</font><div><font face="courier new, monospace" size="1">}</font><div><br>
</div></div></div></div></div></div></div></div></div></div></div></div></div><div>If CurState == ST_DIRTY and EntryState == ST_UNKNOWN, then some instruction in this basic block has dirtied the YMM regs. In that case, why would you want to avoid inserting a vzeroupper? Is it just to avoid inserting duplicate vzerouppers when the block is revisited? If so, then I think the problem is actually in runOnMachineFunction, which contains the comment: "Each BB state depends on all predecessors, loop over until everything converges. (Once we converge, we can implicitly mark everything that is still ST_UNKNOWN as ST_CLEAN.)". We do iterate to convergence, but we don't mark anything as clean afterwards, nor do we make a final revisit of the basic blocks that previously had ST_UNKNOWN entry states. Is that an oversight?</div>
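<div><br></div><div>To make the question concrete, here is a rough standalone sketch of what I think is happening. It is a toy model, not the pass itself: only the ST_* state names come from the code above, and Op, processBlock and processBlockAfterConvergence are names I made up for illustration. The second function shows the kind of final revisit I'd expect after convergence, assuming that treating a still-unknown entry state as clean really is the intent of that comment.</div>

```cpp
#include <vector>

// Toy model only. The state names mirror the ones quoted above;
// everything else here is hypothetical and not taken from LLVM.
enum State { ST_UNKNOWN, ST_CLEAN, ST_DIRTY };

// Simplified instruction kinds for a basic block.
enum Op { OTHER, YMM_DEF, SSE_CALL };

// Walks one block and returns how many vzerouppers would be inserted.
// Mirrors the quoted logic: at a call with a dirty state, insert only
// if the block's entry state is known; either way the state becomes clean.
int processBlock(const std::vector<Op> &Block, State EntryState) {
  State CurState = EntryState;
  int NumInserted = 0;
  for (Op I : Block) {
    if (I == YMM_DEF) {
      CurState = ST_DIRTY;
    } else if (I == SSE_CALL && CurState == ST_DIRTY) {
      if (EntryState != ST_UNKNOWN)
        ++NumInserted;
      CurState = ST_CLEAN;
    }
  }
  return NumInserted;
}

// Assumed fix-up: once the dataflow has converged, treat a still-unknown
// entry state as clean and visit the block one more time.
int processBlockAfterConvergence(const std::vector<Op> &Block,
                                 State EntryState) {
  if (EntryState == ST_UNKNOWN)
    EntryState = ST_CLEAN;
  return processBlock(Block, EntryState);
}
```

<div>With a block of {OTHER, YMM_DEF, SSE_CALL} and an ST_UNKNOWN entry state, processBlock inserts nothing, which matches the failure I'm seeing, while processBlockAfterConvergence inserts one vzeroupper before the call.</div>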
<div><br></div><div>Cheers,</div><div>Lang.</div></div>