Hello,<div><br></div><div>This patch removes a badly scaling code path from RescheduleMIBelowKill in the twoaddr pass.</div><div><br></div><div>The problem is essentially that we walk the instructions of the basic blocks of a function and call RescheduleMIBelowKill for a large number of them. It in turn calls findLocalKill, which then walks *the entire* use/def chain of the register across the whole function. For functions with a very large number of uses of a register, this scales very poorly. =]</div>
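<div><br></div><div>As a rough illustration of the scaling, here is a hypothetical cost model (not the actual LLVM code). It assumes that every instruction reaching findLocalKill pays for one full walk of its register's use/def chain. The made-up function oldApproachWork simply adds up those walk lengths, so when all N instructions use the same register the total work is N*N.</div>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical toy cost model, not LLVM's data structures:
// instruction i reads register InstrRegs[i].
std::size_t oldApproachWork(const std::vector<unsigned> &InstrRegs) {
  // Length of each register's (function-wide) use chain.
  std::map<unsigned, std::size_t> UseCount;
  for (unsigned Reg : InstrRegs)
    ++UseCount[Reg];

  // Charge every instruction a walk over the *entire* use chain of its
  // register, which is what the old findLocalKill search amounted to.
  std::size_t Work = 0;
  for (unsigned Reg : InstrRegs)
    Work += UseCount[Reg];
  return Work;
}
```

<div>In this model, 1000 instructions that all read one register cost 1,000,000 steps, while 4 instructions reading 4 distinct registers cost 4.</div>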

<div><br></div><div>There is a better algorithm to use here. Right after findLocalKill returns, we walk the instructions linearly from the original instruction to the kill, checking whether moving the instruction would violate any dependency. That walk already has a cap to prevent n^2 complexity on very large basic blocks.</div>

<div><br></div><div>If we fuse the search for the kill with the dependency check, the two can share a single linear walk and a single cap, which prevents n^2 behavior.</div><div><br></div><div>This requires a bit of reshuffling, and I've started pulling out helper functions here. However, a lot more refactoring could be done, and in particular we should factor this enough to let RescheduleKillAboveMI use the same algorithm and share most of the code. I'd prefer to do that in follow-up patches, since RescheduleKillAboveMI doesn't show up as terribly hot in my profiles.</div>
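<div><br></div><div>A minimal sketch of the fused walk, under heavily simplified assumptions. ToyInstr, conflicts and findSinkPoint are hypothetical names, instructions are plain lists of used, defined and killed registers, and the dependency rule is a simplified one: a later instruction may not read or redefine a register MI defines, or redefine a register MI reads. This is not the code in the patch. A single forward loop from MI both looks for the kill of Reg and bails out on a conflict, and one Cap bounds the whole thing.</div>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical toy instruction (not LLVM's MachineInstr).
struct ToyInstr {
  std::vector<unsigned> Uses;  // registers read
  std::vector<unsigned> Defs;  // registers written
  std::vector<unsigned> Kills; // subset of Uses whose live range ends here
};

static bool contains(const std::vector<unsigned> &V, unsigned R) {
  return std::find(V.begin(), V.end(), R) != V.end();
}

// Simplified rule: true if sinking MI past Other would break a dependency.
static bool conflicts(const ToyInstr &MI, const ToyInstr &Other) {
  for (unsigned D : MI.Defs)
    if (contains(Other.Uses, D) || contains(Other.Defs, D))
      return true; // Other reads or overwrites a value MI produces
  for (unsigned U : MI.Uses)
    if (contains(Other.Defs, U))
      return true; // Other clobbers a value MI reads
  return false;
}

// Fused search: one forward walk from MI that both looks for the local kill
// of Reg and checks dependencies, bounded by a single Cap. Returns the index
// of the kill if MI could be sunk below it, std::nullopt otherwise.
std::optional<std::size_t> findSinkPoint(const std::vector<ToyInstr> &Block,
                                         std::size_t MIIdx, unsigned Reg,
                                         std::size_t Cap) {
  const ToyInstr &MI = Block[MIIdx];
  std::size_t Visited = 0;
  for (std::size_t I = MIIdx + 1; I < Block.size(); ++I) {
    if (++Visited > Cap)
      return std::nullopt; // too far away: give up, bounding the walk
    const ToyInstr &Cur = Block[I];
    if (conflicts(MI, Cur))
      return std::nullopt; // a dependency would be violated
    if (contains(Cur.Kills, Reg))
      return I; // reached the kill with no conflict on the way
  }
  return std::nullopt; // Reg is not killed later in this block
}
```

<div>On a three-instruction block where the kill of Reg sits two instructions after MI and nothing in between conflicts, findSinkPoint(Block, 0, Reg, 10) returns index 2, while a Cap of 1 gives up before reaching the kill. Hitting the cap, hitting a conflict, and finding the kill all terminate the same loop, so in this sketch the cost is bounded by Cap no matter how long the register's use/def chain is.</div>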

<div><br></div><div>For some of the test cases from PR13225, after applying the patch I mailed out earlier to generate basic blocks in the natural CFG order, this patch moves findLocalKill (and its callers) from over 60% of the CPU profile to below 0.25%. ;] The constant factors in this one were really, really high.</div>

<div><br></div><div>The only remaining performance issue in these profiles is MarkVirtRegAliveInBlock.</div>