<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Wed, Dec 7, 2016 at 9:02 AM Robinson, Paul &lt;<a href="mailto:paul.robinson@sony.com">paul.robinson@sony.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="EN-US" link="blue" vlink="purple" class="gmail_msg"><div class="m_-6647814819049291282WordSection1 gmail_msg">

<p class="MsoNormal gmail_msg" style="margin-left:.5in">I don't see how ASan and debuggers are different. It feels like both need reasonably accurate source location attribution for any instruction. ASan just happens to care more about loads and stores than interactive

 stepping debuggers.<u class="gmail_msg"></u><u class="gmail_msg"></u></p>

<p class="MsoNormal gmail_msg"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d" class="gmail_msg"><u class="gmail_msg"></u> <u class="gmail_msg"></u></span></p>

</div></div><div lang="EN-US" link="blue" vlink="purple" class="gmail_msg"><div class="m_-6647814819049291282WordSection1 gmail_msg"><p class="MsoNormal gmail_msg"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d" class="gmail_msg">Actually they are pretty different in their requirements.</span></p></div></div></blockquote><div><br></div><div>I think they're closer than they appear below.</div><div> <span style="color:rgb(31,73,125);font-family:Calibri,sans-serif;font-size:11pt"> </span></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="EN-US" link="blue" vlink="purple" class="gmail_msg"><div class="m_-6647814819049291282WordSection1 gmail_msg">

<p class="MsoNormal gmail_msg"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d" class="gmail_msg">ASan cares about *accurate* source location info for *specific* instructions, the ones that do something ASan cares about.  The source attributions for any

 other instruction are irrelevant to ASan.  The source attributions for these instructions *must* survive optimization.</span></p></div></div></blockquote><div><br>Kostya can correct me if I'm wrong, but I don't believe there's a requirement that they must survive any more than debug info locations do.<br><br>I believe the sanitizers operate under similar requirements about impact on optimizations: they probably don't want to adversely perturb optimizations by adding a stricter, undroppable location tracking system (maybe I'm wrong here), the way intrinsics would. I think this is perhaps the critical point: if ASan has the same "don't mess with optimization" requirement as debug info, and it needs high accuracy, that accuracy can be no higher than debug info /can/ be (even if debug info is not that accurate now). If that's the case, then we should endeavor to make debug info (if only for the instructions ASan cares about) as accurate as ASan needs, and that benefits all debug info consumers.<br><br>Now, if there are competing needs for which information is kept (as I brought up in this thread), hopefully we can have a conversation about what those competing needs look like and how to address them (whether we can reconcile the different needs, or need different tuning modes, etc.).</div><div> <span style="color:rgb(31,73,125);font-family:Calibri,sans-serif;font-size:11pt"> </span></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="EN-US" link="blue" vlink="purple" class="gmail_msg"><div class="m_-6647814819049291282WordSection1 gmail_msg">

<p class="MsoNormal gmail_msg"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d" class="gmail_msg">Debuggers care about *useful* source location info for *sets* of instructions, i.e. the instructions related to some particular source statement.  If that set

 is only 90% complete/accurate, instead of 100%, generally that doesn't adversely affect the user experience.  If you step past statement A, and happen to execute one or two instructions from the next statement B before you actually stop, generally that is

 not important to the user.  Debuggers are able to tolerate a moderate amount of slop in the source attributions, because absolute accuracy is not critical to correct operation of the debugger.  This is why optimizations can get away with dropping attributions

 that are difficult to represent accurately.<u class="gmail_msg"></u><u class="gmail_msg"></u></span></p>

<p class="MsoNormal gmail_msg"><a name="m_-6647814819049291282__MailEndCompose" class="gmail_msg"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d" class="gmail_msg"><u class="gmail_msg"></u> <u class="gmail_msg"></u></span></a></p>

<p class="MsoNormal gmail_msg"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d" class="gmail_msg">ASan should be able to encode source info for just the instructions it cares about, e.g. pass an index or other encoded representation to the RT calls.  Being

 actual parameters, they will survive any correct optimization, unlike today's situation where multiple calls might be merged by an optimization, damaging the correctness of ASan reports.  (We've seen this exact thing happen.)  ASan does not need a line table

 mapping all instructions back to their source; it needs a parameter at each call (more or less).  It does need a file table, that's the main bit of redundancy with debug info that I see happening.<u class="gmail_msg"></u><u class="gmail_msg"></u></span></p>
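
<p class="MsoNormal gmail_msg">As a rough illustration of the index-passing idea, here is a minimal Python sketch. Everything in it is hypothetical: check_access, describe_location, FILE_TABLE and LOCATION_TABLE are made-up names for this example, not the real ASan runtime interface. The point is only that the location travels as an ordinary argument and is decoded through a small file table.</p>

<pre class="gmail_msg">
```python
# Hypothetical sketch: none of these names come from the real ASan runtime.

# The file table the instrumentation would still need.
FILE_TABLE = ["a.c", "b.c"]

# One entry per instrumented access: (file index, line, column).
LOCATION_TABLE = [
    (0, 10, 5),  # index 0: a.c:10:5
    (0, 12, 9),  # index 1: a.c:12:9
    (1, 3, 1),   # index 2: b.c:3:1
]


def describe_location(loc_index):
    """Decode a location index back into a file:line:column string."""
    file_index, line, column = LOCATION_TABLE[loc_index]
    return "{}:{}:{}".format(FILE_TABLE[file_index], line, column)


def check_access(address, size, valid_ranges, loc_index):
    """Stand-in for a runtime check call.

    valid_ranges is a list of (start, end) pairs with end exclusive.
    loc_index is an ordinary argument, so two calls that carry different
    indices are different calls as far as the optimizer is concerned.
    Returns None if the access is in bounds, else a report string.
    """
    for start, end in valid_ranges:
        # The access fits if it starts at or after start and ends by end.
        if address in range(start, end - size + 1):
            return None
    return "bad access of size {} at {}".format(
        size, describe_location(loc_index))
```
</pre>

<p class="MsoNormal gmail_msg">With these tables, check_access(106, 4, [(100, 108)], 1) reports the access against a.c:12:9, whatever happened to the debug locations of the surrounding instructions.</p>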

<p class="MsoNormal gmail_msg"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d" class="gmail_msg">--paulr<u class="gmail_msg"></u><u class="gmail_msg"></u></span></p>

<p class="MsoNormal gmail_msg"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d" class="gmail_msg"><u class="gmail_msg"></u> <u class="gmail_msg"></u></span></p>

<div style="border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt" class="gmail_msg">

<div class="gmail_msg">

<div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0in 0in 0in" class="gmail_msg">

<p class="MsoNormal gmail_msg"><b class="gmail_msg"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" class="gmail_msg">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"" class="gmail_msg"> Reid Kleckner [mailto:<a href="mailto:rnk@google.com" class="gmail_msg" target="_blank">rnk@google.com</a>]

<br class="gmail_msg">

<b class="gmail_msg">Sent:</b> Wednesday, December 07, 2016 8:23 AM<br class="gmail_msg">

<b class="gmail_msg">To:</b> Robinson, Paul<br class="gmail_msg">

<b class="gmail_msg">Cc:</b> Hal Finkel; David Blaikie; <a href="mailto:llvm-dev@lists.llvm.org" class="gmail_msg" target="_blank">llvm-dev@lists.llvm.org</a><br class="gmail_msg">

<b class="gmail_msg">Subject:</b> Re: [llvm-dev] Debug Locations for Optimized Code<u class="gmail_msg"></u><u class="gmail_msg"></u></span></p>

</div>

</div></div></div></div><div lang="EN-US" link="blue" vlink="purple" class="gmail_msg"><div class="m_-6647814819049291282WordSection1 gmail_msg"><div style="border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt" class="gmail_msg">

<p class="MsoNormal gmail_msg"><u class="gmail_msg"></u> <u class="gmail_msg"></u></p>

<div class="gmail_msg">

<div class="gmail_msg">

<div class="gmail_msg">

<p class="MsoNormal gmail_msg">On Wed, Dec 7, 2016 at 7:39 AM, Robinson, Paul via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" class="gmail_msg" target="_blank">llvm-dev@lists.llvm.org</a>&gt; wrote:</p>

<p class="MsoNormal gmail_msg">When we are looking at a situation where an instruction is merely *moved*<br class="gmail_msg">

from one place to another, retaining the source location and having a<br class="gmail_msg">

less naïve statement-marking tactic could help the debugging experience<br class="gmail_msg">

without perturbing other consumers (although one still wonders whether<br class="gmail_msg">

profiles will get messed up in cases where e.g. a loop invariant gets<br class="gmail_msg">

hoisted out of a cold loop into a hot predecessor).<br class="gmail_msg">

<br class="gmail_msg">

When we are looking at a situation where two instructions are *merged* or<br class="gmail_msg">

*combined* into one, and the original two instructions had different<br class="gmail_msg">

source locations, that's a separate problem.  In that case there is no<br class="gmail_msg">

single correct source location for the new instruction, and typically<br class="gmail_msg">

erasing the source location will give a better debugging experience (also<br class="gmail_msg">

a less misleading profile).<br class="gmail_msg">

<br class="gmail_msg">

My personal opinion is that having sanitizers *rely* on debug info for<br class="gmail_msg">

accurate source attribution is just asking for trouble.  It happens to<br class="gmail_msg">

work at -O0 but cannot be considered reliable in the face of optimization.<br class="gmail_msg">

IMO this is a fundamental design flaw; debug info is best-effort and full<br class="gmail_msg">

of ambiguities, as shown above. Sanitizers need a more reliable<br class="gmail_msg">

source-of-truth, i.e. they should encode source info into their own<br class="gmail_msg">

instrumentation.<u class="gmail_msg"></u><u class="gmail_msg"></u></p>
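
<p class="MsoNormal gmail_msg">The merged case described above can be sketched in a few lines of Python. This is a toy model, not code from any LLVM pass: merged_location is a hypothetical helper, and locations are modelled as (file, line) tuples, with None standing for "no location".</p>

<pre class="gmail_msg">
```python
def merged_location(loc_a, loc_b):
    """Pick a location for an instruction formed by merging two others.

    Locations are (file, line) tuples, or None for "no location".
    The location is kept only when both inputs agree; otherwise it is
    erased, since neither original location is correct for the result.
    """
    if loc_a == loc_b:
        return loc_a
    return None
```
</pre>

<p class="MsoNormal gmail_msg">Two instructions from the same line merge to that line, while instructions from different lines merge to None rather than to an arbitrary pick of one of them.</p>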

<div class="gmail_msg">

<p class="MsoNormal gmail_msg"><u class="gmail_msg"></u> <u class="gmail_msg"></u></p>

</div>

<div class="gmail_msg">

<p class="MsoNormal gmail_msg">I don't see how ASan and debuggers are different. It feels like both need reasonably accurate source location attribution for any instruction. ASan just happens to care more about loads and stores than interactive stepping debuggers.<u class="gmail_msg"></u><u class="gmail_msg"></u></p>

</div>

<div class="gmail_msg">

<p class="MsoNormal gmail_msg"><u class="gmail_msg"></u> <u class="gmail_msg"></u></p>

</div>

<div class="gmail_msg">

<p class="MsoNormal gmail_msg">It really doesn't make sense for ASan to invent another mechanism to track source location information. Any mechanism we build would be so redundant with debug info that, as an implementation detail, we would find a way to make them use

 the same storage when possible. With that in mind, maybe we should really find a way to mark source locations as "hoisted" or "sunk" so that we can suppress them from our line tables or do something else clever.<u class="gmail_msg"></u><u class="gmail_msg"></u></p>
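
<p class="MsoNormal gmail_msg">One way to picture the "hoisted"/"sunk" idea is the Python sketch below. It is a toy model under stated assumptions: the "moved" marker and build_line_table are hypothetical names invented for this example. Marked instructions keep their own location for other consumers but contribute no row to the line table.</p>

<pre class="gmail_msg">
```python
def build_line_table(instructions):
    """Build (address, line) rows, skipping instructions flagged as moved.

    Each instruction is a dict with the keys "address", "line" and
    "moved", where "moved" is None, "hoisted" or "sunk" (a hypothetical
    marker). The location stays on the instruction itself; only the
    line-table row is suppressed.
    """
    rows = []
    for inst in instructions:
        if inst["moved"] is not None:
            continue  # no row is emitted for a hoisted or sunk instruction
        rows.append((inst["address"], inst["line"]))
    return rows
```
</pre>

<p class="MsoNormal gmail_msg">For three instructions at addresses 0, 4 and 8 on lines 10, 14 and 11, with the middle one marked "hoisted", the table has rows for lines 10 and 11 only, while the instruction at address 4 still records line 14 for a consumer that reads locations directly.</p>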

</div>

</div>

</div>

</div>

</div></div></div></blockquote></div></div>