<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Jul 8, 2016, at 11:12 AM, vivek pandya &lt;<a href="mailto:vivekvpandya@gmail.com" class="">vivekvpandya@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><blockquote type="cite" style="color:rgb(80,0,80);font-size:13px" class=""><div dir="ltr" class="">Hello LLVM Developers,<div class=""><br class=""></div><div class="">I have a thought on improving IPRA, and I would like to summarize the IRC discussion about it so that we can develop the idea further if it really helps.</div><div class=""><br class=""></div><div class="">The idea is to have more callee-saved registers in infrequently called leaf procedures, and thereby provide more registers to procedures in the upper region of the call graph. As Quentin pointed out, this optimization may help in the context of "true" IPRA, but in our case we may not need it. I think, however, that it can improve performance in the current IPRA. I explain both arguments (Quentin's and mine) with the following example.</div><div class=""><br class=""></div><div class="">Consider the call sequence A->B->C, where C is a rarely called leaf procedure, A is called frequently, and B may call C depending on some condition. When we propagate the actual register usage information from C up to A, almost all of the registers end up marked as clobbered. In this case, per Quentin's point, we do not hurt performance because we fall back to the calling convention (CC), but I think we can improve performance as follows:</div><div class="">We could mark every register as preserved by C (i.e., add more spills and reloads at C's entry and exit) if doing so helps at A. 
Suppose A requires more distinct registers than the CC can provide and, if it does not get them, will spill variables to memory. If we can provide more registers at A by having more spills at C, then we save spills at A. This can be beneficial because A is called frequently while C is called infrequently, so the total number of spills/restores during program execution goes down.</div><div class=""><br class=""></div><div class="">However, the effect of this optimization will again be limited by the scope of the current IPRA (i.e., one Module only), because we can't really propagate the details about the additional callee-saved registers to a caller that is defined in another module. Still, it may be helpful.</div><div class=""><br class=""></div><div class="">Any thoughts on this?</div></div></blockquote></div></div></blockquote><div><br class=""></div><div><br class=""></div><div>I think it is interesting. Have you considered:</div><div><br class=""></div><div>- the code size impact? (C will have a lot of spills)</div><div>- what if C is cold but all (most) of its call sites are located in different modules?</div><div>- an alternative approach where we would break the CGSCC ordering to codegen B and A before C, so we would be able to spill minimally when performing the codegen for C?</div><div><br class=""></div><div><br class=""></div><div>-- </div><div>Mehdi</div><div><br class=""></div></div></body></html>