<br><br><div class="gmail_quote">On Fri, Dec 16, 2011 at 2:35 PM, Chris Lattner <span dir="ltr"><<a href="mailto:clattner@apple.com">clattner@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><div><div class="im"><div>On Dec 16, 2011, at 2:27 PM, Kostya Serebryany wrote:</div><blockquote type="cite"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><div>This is a good question. Would it be possible for ASan to do its instrumentation earlier? </div>
</div></blockquote><div><br></div><div>It would be possible but undesirable. </div><div>First, asan blows up the IR, so running asan early will increase compile time. </div><div>Second, asan greatly benefits from all optimizations running before it because it needs to instrument fewer memory accesses. </div>
<div>It actually benefits from load widening too: in the test case above asan instruments only one load instead of two. </div></div></blockquote><div><br></div></div><div>You certainly wouldn't want to run asan before mem2reg/SRoA, but after that, the benefits are probably small. I'd guess that there is some non-zero value to exposing the code generated by asan to the optimizer. Have you looked at that at all?</div>
</div></div></blockquote><div><br></div><div>This is the usual phase ordering problem. </div><div>If asan is done early, you get all of the optimizations cleaning up after asan. </div><div>If you run asan late, you instrument a cleaner IR. </div>
<div>Asan should run after the loop invariant loads are hoisted up, common subexpressions eliminated, loads widened/combined, dead stores eliminated, memsets merged, etc. </div><div>Otherwise the optimizer will have a hard time optimizing the original code *and* the instrumentation. </div>
<div><br></div><div>I have not looked into placing asan somewhere else in LLVM (this is in my TODO list somewhere at the bottom). </div><div>But I had to place asan early in gcc (before the loop optimizations) and it was a disaster (4x compile-time slowdown, 20% poorer run-time performance). </div>
<div>This is not necessarily relevant to LLVM of course.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">
<div><div class="im"><br><blockquote type="cite"><div class="gmail_quote"><div>In this case, we have an array of 22 bytes which is 16-aligned. </div><div>
I suspect that load widening changed the alignment of the alloca instruction to make the transformation legal. Right? </div><div>Can we change the load widening algorithm to also change the size of the alloca instruction to be divisible by 16? </div>
<div>This will solve the problem, at least the variant I observe now. </div></div></blockquote><br></div></div><div>I believe it is 16-byte aligned based on ABI requirements for x86-64, though you're right that the optimizer will increase alignment in other cases. In any case, we don't want to increase the size of the object, because that would prevent packing some other data in after it. For example, a 2-byte aligned 10 byte object can be placed after it in memory if we keep it 22-bytes in size.</div>
</div></blockquote><div><br></div><div>ok.</div><div><br></div><div>I wonder if the load widening can attach some metadata to the stack objects, accesses to which it has modified? </div><div>Then asan will increase the alloca size as appropriate (it does it anyway).</div>
<div><br></div><div>Why don't you like the idea of disabling or restricting the widening when asan is on?</div><div><br></div><div>--kcc </div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><span class="HOEnZb"><font color="#888888"><div><br></div><div>-Chris</div><div><br></div></font></span></div>
</blockquote></div><br>
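<div>The size arithmetic discussed above (a 22-byte, 16-aligned array, and a 2-byte-aligned 10-byte object packed after it) can be sketched in a few lines of C. This is only an illustration of the numbers in the thread, not code from LLVM or asan; the helper names round_up and end_of_next are hypothetical.</div>

```c
/* Hypothetical helper: round size up to the next multiple of align.
   Assumes align is non-zero. */
static unsigned long round_up(unsigned long size, unsigned long align) {
    return (size + align - 1) / align * align;
}

/* Hypothetical helper: offset one past the end of an object of
   next_size bytes with alignment next_align, placed at the first
   suitably aligned offset at or after prev_end. */
static unsigned long end_of_next(unsigned long prev_end,
                                 unsigned long next_size,
                                 unsigned long next_align) {
    return round_up(prev_end, next_align) + next_size;
}
```

<div>With these helpers, round_up(22, 16) gives 32, the size the array would have if it were padded to a multiple of 16. Keeping the array at 22 bytes lets the 10-byte object start at offset 22 and end at 32, whereas padding first pushes it to start at 32 and end at 42, which is the packing cost mentioned in the quoted reply.</div>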