<div dir="ltr">I observed that LLVM fails to mark calls as "tail" in some simple cases. In particular, it seems that only one pass does this, TailRecursionElimination (TRE), and TRE skips the entire function if any call argument is derived from an alloca or a byval argument.<div>


<br></div><div><div>I've implemented a patch which does the full, expensive analysis. It looks at every instruction and makes note of allocas, byval arguments, and all values potentially derived from them. Calls which never receive any of those values as input are marked tail. Calls which do receive alloca-derived values and could write them into memory "poison" all calls to non-readnone functions that are reachable after they run, so those calls are not marked. This is surely O(n^2), with an expensive "isPotentiallyReachable" call at every step, but I didn't notice any slowdown, though I haven't measured it with any profiling tools.<br>
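<div>To make the marking rule concrete, here is a minimal sketch on a toy IR. It is not the attached patch and uses no LLVM APIs: the names Inst, Kind, mayCapture and markTails are hypothetical, the function body is assumed to be straight-line code (so "reachable after" just means "later in the list", where the real analysis would ask isPotentiallyReachable on the CFG), and escapes through plain stores are not modelled.</div>

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Toy IR for illustration only (hypothetical, not LLVM's data structures).
enum class Kind { Alloca, Derive, Call };

struct Inst {
    Kind kind;
    std::string name;                   // value defined by this instruction
    std::vector<std::string> operands;  // values used by this instruction
    bool readnone = false;              // call does not access memory
    bool mayCapture = false;            // call could write a pointer argument into memory
    bool tail = false;                  // output of the analysis
};

// Assumes straight-line code: every later instruction is reachable from every
// earlier one, which stands in for the CFG reachability query.
void markTails(std::vector<Inst>& body, const std::vector<std::string>& byvalArgs) {
    // Values that are, or may be derived from, an alloca or byval argument.
    std::unordered_set<std::string> local(byvalArgs.begin(), byvalArgs.end());
    bool poisoned = false;  // has a local pointer possibly been written to memory?

    for (Inst& inst : body) {
        bool usesLocal = false;
        for (const std::string& op : inst.operands) {
            if (local.count(op)) usesLocal = true;
        }

        if (inst.kind == Kind::Alloca) {
            local.insert(inst.name);
        } else if (inst.kind == Kind::Derive) {
            if (usesLocal) local.insert(inst.name);
        } else {  // Kind::Call
            // A call is marked tail only if it receives no local-derived value
            // and either cannot touch memory or runs before any poisoning call.
            inst.tail = !usesLocal && (inst.readnone || !poisoned);
            if (usesLocal) {
                local.insert(inst.name);  // conservatively treat the result as derived
                if (inst.mayCapture) poisoned = true;
            }
        }
    }
}
```

<div>In this sketch, a call made before any alloca-derived pointer escapes is marked tail; a call that receives such a pointer is not; a later non-readnone call is poisoned and left unmarked; a later readnone call is still marked.</div>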


</div></div><div><br></div><div><div>Roughly 80,000 additional calls are marked tail in a bootstrap of clang. Sadly, this doesn't translate into actual "jmp" instructions, due to what appear to be further optimizer deficiencies.</div>


</div><div><br></div><div>I have attached my patch for review. Are there ways this could be done more efficiently? Are there places where we're redoing work that could be shared? What is a sensible set of limits that would prevent runaway optimizer time? Should it be part of TRE or moved to a separate pass? Or should we land it as-is and find out what breaks later?</div>


<div><br></div><div>Nick</div><div><br></div></div>