<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Jan 3, 2011, at 9:42 PM, Cameron Zwarich wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div>On Jan 3, 2011, at 9:29 PM, Jakob Stoklund Olesen wrote:</div></blockquote><blockquote type="cite"><div><blockquote type="cite"><font class="Apple-style-span" color="#144FAE"><br></font></blockquote><blockquote type="cite">It seems like a good idea to avoid allocating and freeing a DenseMap for every bitcast and cmp instruction. 403.gcc has 157000 of those.<br></blockquote><br>Good idea. I'll test that now. It's a bit annoying that you can't rely on RAII with all of these early returns, but I guess we could have a little .clear() helper RAII object. ;-)<br></div></blockquote><div><br></div><div>Don't bother. Just clear the maps before using them instead of after. Nobody will notice.</div><div><br></div><div>Note that 3 of the maps can share a single DenseMap&lt;BasicBlock*, Instruction*&gt;.</div><br><blockquote type="cite"><div><blockquote type="cite">Is the fix-point loop in CodeGenPrepare still necessary? When critical edge splitting is disabled?<br></blockquote><br>I've just been running some experiments on this. The fixed point loop is probably necessary for the 'ext' optimizations, as a lot of 'ext' casts get optimized after other instructions have been sunk into their block. On all of test-suite + SPEC2000 &amp; SPEC2006, there are only 4 noop copies optimized in a later iteration (these don't really matter as they will be eliminated by the coalescer later), but there are 15 memory instructions that have their addressing code sunk into their BB in a later iteration. 
I was thinking of just iterating the ext optimizations afterwards, possibly based on a worklist, but it would be nice to know why these memory instructions have sinkable addressing code after the first iteration.<br></div></blockquote><div><br></div><div>It is probably chained bitcast / ext / gep instructions getting lowered one at a time.</div><div><br></div><div>If that is the case, you could probably get away with iterating over each basic block separately instead of re-checking the whole function. That assumes that the chains to be lowered were already in the same basic block. I have no idea if that is generally true.</div><div><br></div><div>It would be safer and faster to add the operands of lowered instructions to a worklist, but that is a bit more work to implement.</div><div><br></div><blockquote type="cite"><div>As an aside, I tried adding another pass of CFG optimizations (which is probably not there because it would reverse the critical edge splitting), and it merges a decent number of extra blocks.<br></div></blockquote></div><br><div>That makes sense.</div><div><br></div><div>I see critical edge splitting during register allocation in our future ;-)</div><div><br></div><div>/jakob</div><div><br></div></body></html>