<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Dec 4, 2013 at 11:40 PM, David Blaikie <span dir="ltr"><<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class=""><div class="h5">On Wed, Dec 4, 2013 at 6:24 AM, Alexey Samsonov <<a href="mailto:samsonov@google.com">samsonov@google.com</a>> wrote:<br>


><br>

> On Tue, Dec 3, 2013 at 9:21 PM, David Blaikie <<a href="mailto:dblaikie@gmail.com">dblaikie@gmail.com</a>> wrote:<br>

>><br>

>> On Tue, Dec 3, 2013 at 8:27 AM, Alexey Samsonov <<a href="mailto:samsonov@google.com">samsonov@google.com</a>><br>

>> wrote:<br>

>> ><br>

>> > On Tue, Dec 3, 2013 at 11:16 AM, David Blaikie <<a href="mailto:dblaikie@gmail.com">dblaikie@gmail.com</a>><br>

>> > wrote:<br>

>> >><br>

>> >> On Mon, Dec 2, 2013 at 8:00 AM, Alexey Samsonov <<a href="mailto:samsonov@google.com">samsonov@google.com</a>><br>

>> >> wrote:<br>

>> >> > Hi David,<br>

>> >> ><br>

>> >> > Let me outline some ideas I got after reading this code as a<br>

>> >> > sanity-check-request - I have too little knowledge of DBG_VALUE<br>

>> >> > machine<br>

>> >> > instructions, working with machine functions, etc.<br>

>> >> ><br>

>> >> > On Fri, Nov 1, 2013 at 1:46 AM, David Blaikie <<a href="mailto:dblaikie@gmail.com">dblaikie@gmail.com</a>><br>

>> >> > wrote:<br>

>> >> >><br>

>> >> >> To document some of my own state on this issue (I'm happy to take<br>

>> >> >> this<br>

>> >> >> into another thread or bug to discuss if people think there's a<br>

>> >> >> better<br>

>> >> >> context for such discussion)...<br>

>> >> >><br>

>> >> >> In DwarfDebug::beginFunction there is a bunch of code (that could<br>

>> >> >> probably<br>

>> >> >> be refactored into some named functions to make it more documented)<br>

>> >> >> but<br>

>> >> >> here's the basic idea:<br>

>> >> >><br>

>> >> >> For all MBB<br>

>> >> >>   For all MI<br>

>> >> >>     if it's a debug value<br>

>> >> >>       record the location in the 'history' for this variable<br>

>> >> >>       mark the register as in-use by a certain variable (what<br>

>> >> >> happens<br>

>> >> >> if<br>

>> >> >> we end up with two variables in the same register? I don't think<br>

>> >> >> this<br>

>> >> >> code<br>

>> >> >> can handle that)<br>

>> >> ><br>

>> >> ><br>

>> >> > I agree - there are real-life cases when several variables refer to<br>

>> >> > the<br>

>> >> > same<br>

>> >> > register - e.g. in ASan mode addresses of all local variables are<br>

>> >> > defined as<br>

>> >> > offsets from $rsp via DBG_VALUE instruction.<br>

>> >> > So, instead of [Register]->[Debug variable] mapping we need to have<br>

>> >> > [Register] -> [Vector of debug variables] here.<br>

>> >><br>

>> >> Fair point. Though are we ever going to indirect through transient<br>

>> >> registers? (eg: "variable is in deref(reg+offset) where 'reg' isn't<br>

>> >> just the frame register live across the whole function)<br>

>> ><br>

>> ><br>

>> > Don't know, we might. In ASan we build all code with use-after-return<br>

>> > detection support:<br>

>> ><br>

>> > define i32 @_Z7my_funci(i32 %a) #0 {<br>

>> > entry:<br>

>> >   %MyAlloca = alloca [224 x i8], align 32<br>

>> > //<--------------------------------------- actual stack frame<br>

>> >   %0 = ptrtoint [224 x i8]* %MyAlloca to i64<br>

>> >   %1 = load i32* @__asan_option_detect_stack_use_after_return<br>

>> >   %2 = icmp ne i32 %1, 0<br>

>> >   br i1 %2, label %3, label %5<br>

>> ><br>

>> > ; <label>:3                                       ; preds = %entry<br>

>> >   %4 = call i64 @__asan_stack_malloc_2(i64 224, i64 %0)   <-------------<br>

>> > "fake stack frame" returned by ASan runtime<br>

>> >   br label %5<br>

>> ><br>

>> > ; <label>:5                                       ; preds = %entry, %3<br>

>> >   %6 = phi i64 [ %0, %entry ], [ %4, %3 ]<br>

>> > <------------------------------<br>

>> > pointer to actual or fake stack frame, where all the local variables are<br>

>> > located.<br>

>> > <...><br>

>> >   %9 = add i64 %6, 96<br>

>> >   %10 = inttoptr i64 %9 to i32*<br>

>> > <--------------------------------------------- user variable, given as<br>

>> > offset into either actual or fake stack frame<br>

>> > <...><br>

>> >   call void @llvm.dbg.declare(metadata !{i32* %10}, metadata !18)<br>

>> ><br>

>> > I can imaging the situation where "%6" won't be spilled on stack, and<br>

>> > we'll<br>

>> > have to report variable location<br>

>> > as "reg storing %6 + offset" (I didn't verify this happens, though).<br>

>><br>

>> OK - could be interesting. Be good to have a test case of this.<br>

>><br>

>> >> If the only indirection is for stack cases, then we don't need to<br>

>> >> worry about clobbers I think - well, not in the traditional sense the<br>

>> >> way this algorithm is written. We might need to worry about stack<br>

>> >> reuse... which could get interesting. I'm not sure how that (lifetime<br>

>> >> markers) is modeled.<br>

>> >><br>

>> >><br>

>> >> > The code in this branch also inserts a range-terminator instruction<br>

>> >> > between<br>

>> >> > a pair DBG_VALUE instrs for the same variable if instructions in this<br>

>> >> > pair<br>

>> >> > are from different basic blocks (I assume, this is done for the same<br>

>> >> > purpose as inserting range-terminators in the second loop, and it can<br>

>> >> > hurt<br>

>> >> > us as well).<br>

>> >><br>

>> >> Right - they're both parts of the same general algorithm to terminate<br>

>> >> at the end of BBs. There's no actual DBG_VALUE to terminate here, it's<br>

>> >> just implicit in this algorithm that the live ranges terminate at the<br>

>> >> BB boundary.<br>

>> >><br>

>> >> >>     else<br>

>> >> >>       if this instruction clobbers any register<br>

>> >> >>         if this instruction is in the same MBB as the current<br>

>> >> >> defining<br>

>> >> >> debug value for the variable<br>

>> >> >>           record this instruction in the history (it will act as a<br>

>> >> >> debug<br>

>> >> >> loc range terminator) of the variable<br>

>> >> >><br>

>> >> >> For all the variables<br>

>> >> >>   If the end of the history is debug value (not a non-debug<br>

>> >> >> instruction<br>

>> >> >> which would terminate the range)<br>

>> >> >>     If that debug value is in a different basic block than the final<br>

>> >> >> basic<br>

>> >> >> block in the function<br>

>> >> >>       add the last instruction in the debug values basic block to<br>

>> >> >> the<br>

>> >> >> history to terminate the range<br>

>> >> >><br>

>> >> ><br>

>> >> > We terminate ranges too early - some registers may not in fact be<br>

>> >> > clobbered<br>

>> >> > between two different basic blocks. However, looks like we *have* all<br>

>> >> > the<br>

>> >> > information we need - DBG_VALUE instructions and<br>

>> >> > register-clobbering instructions - in DwarfDebug::beginFunction(),<br>

>> >> > and<br>

>> >> > should just properly compute location ranges. It is nontrivial,<br>

>> >> > though,<br>

>> >> > and<br>

>> >> > I think the main purpose range-terminator instructions were added is<br>

>> >> > to deal with confusion when MBB has more than one predecessor. We<br>

>> >> > need<br>

>> >> > to<br>

>> >> > somehow implement "phi-nodes" for variable locations...<br>

>> >> > Suppose we have:<br>

>> >> ><br>

>> >> > MBB1:<br>

>> >> >   dbg_loc("x") = %rax<br>

>> >> >   jmp MBB3<br>

>> >> > MBB2:<br>

>> >> >   dbg_loc("x") = %rdx<br>

>> >> >   jmp MBB3<br>

>> >> > MBB3:<br>

>> >> >   // where is "x" now?<br>

>> >> ><br>

>> >> > currently we add dbg_loc("x") = undef at the end of MBB1 and MBB2 for<br>

>> >> > safety, but in general we should do a proper analysis.<br>

>> >><br>

>> >> My thinking is that we shouldn't need to perform an analysis here - we<br>

>> >> should rely on the register allocator to tell us what it already knows<br>

>> >> about liveness. Recomputing this from an MBB analysis seems like we'd<br>

>> >> be redoing work that's already been performed.<br>

>> ><br>

>> ><br>

>> > I don't understand what liveness information you mean here. Looking at<br>

>> > MBB's, we know<br>

>> > which instruction clobbers which (physical) registers, sets of live-ins<br>

>> > for<br>

>> > MBBs,<br>

>> > and we know registers each dbg_value instruction refers to. Looks like<br>

>> > this<br>

>> > is the granularity<br>

>> > we need - location ranges for each variable in general start and end at<br>

>> > some<br>

>> > instructions in the<br>

>> > middle of machine basic blocks. That is, I can't imagine what kind of<br>

>> > data<br>

>> > we<br>

>> > may need from register allocator that would simplify the location range<br>

>> > building.<br>

>><br>

>> If we have live-ins in the MBB, does that include rsp?<br>

><br>

><br>

> They don't, but I think we may safely assume that $rsp won't change in the<br>

> function.<br>

<br>

</div></div>Right - that's one of my points. The frame pointer is beyond analysis<br>

- it's a core assumption of the whole system. So we might need to bake<br>

such assumptions into debug info as well, though I'm not sure that'll<br>

be the case. It'd be nice if we could bake it into the register<br>

allocator and have it exposed in some generic way.<br>

<div class="im"><br>

>> If so, then<br>

>> sure - we can use that to avoid termination of variables trivially by<br>

>> looking at the live-ins of the following block, right?<br>

><br>

><br>

> No. Live-ins are related to execution order (predecessor/successor MBBs),<br>

> while<br>

> location ranges are related to the the order in which MBBs are written to an<br>

> object file,<br>

> so I doubt we may use live-ins here (I guess there are not many pieces in<br>

> early stages<br>

> of LLVM CodeGen that actually care about the ordering of MBBs, but we do).<br>

><br>

> One more (theoretical example):<br>

> MBB1:<br>

>   dbg_loc("a") = %r8<br>

>   jmp to MBB3<br>

> MBB2 (live-in set contains %r8 and %r9):<br>

>   <...>  <-------------------------------------- where is "a"?<br>

> MBB3:<br>

>   dbg_loc("a") = %r9<br>

>   jmp to MBB2<br>

><br>

> In this example we can't continue the location-range for "a" even though the<br>

> register it's stored in is in the live-set of the<br>

> following basic block.<br>

<br>

</div>Ah, right - thanks for the example. Because we don't have a phi-like<br>

structure that says "this live in is this variable".<br>

<br>

So the only thing for it would be to put new dbg_value intrinsics at<br>

the beginning of every block, which seems painful (we've already had<br>

some people complaining about explosions of debug_value intrinsics<br>

that we emit/need today, adding more would be unfortunate). I'd<br>

certainly like to avoid doing that for anything in a stack slot if we<br>

could.<br>

<div><div class="h5"><br>

> On the contrary, if we're clever enough and MBB2 can<br>

> only be reached from MBB3, we may create a location-range<br>

> that would specify that "a" is located in %r9 even in MBB2.<br>

><br>

><br>

>><br>

>> So we wouldn't<br>

>> need to do any graph walking, etc, which it sounded like you were<br>

>> suggesting.<br>

>><br>

>><br>

>> > Alternatively, we may move calculation of these ranges and insertion of<br>

>> > location-range-terminatiors earlier, but IMO it's easier to this in<br>

>> > AsmPrinter - after all location<br>

>> > ranges rely on labels, so we don't want to do any transformation of<br>

>> > MachineFunction once we<br>

>> > calculated location ranges for variables.<br>

>> >><br>

>> >><br>

>> >> Thing is, things like the frame pointer are sort of "beyond" register<br>

>> >> allocation - their blessing and immutability is an underlying<br>

>> >> assumption of the whole system. So I'm wondering if the right thing to<br>

>> >> do here is to never model frame index and frame pointer reg+offsets as<br>

>> >> dbg_values and instead model them in the same way we model allocas<br>

>> >> (indeed we could generalize the alloca handling, hopefully).<br>

>> ><br>

>> ><br>

>> > IIUC, you don't like the following situation happening now:<br>

>> ><br>

>> > MOV64mr %RSP, 1, %noreg, 48, %noreg, %R8<kill>; mem:ST8[FixedStack6]<br>

>> > DBG_VALUE %RSP, 48, !"c"; line no:2<br>

>> ><br>

>> > Why build a dbg_value and location range for the address "c" (deref($rsp<br>

>> > +<br>

>> > 48)), if this<br>

>> > address is spilled into a slot with frame index 6?<br>

>><br>

>> I don't have a strong idea about how this should or shouldn't work -<br>

>> honestly I know very little about this code. So don't take my<br>

>> suggestions as dictates, I'm just tossing around ideas.<br>

>><br>

>> But yes, I'm suggesting it might be possible to address the issue by<br>

>> avoiding DBG_VALUEs in that case.<br>

>><br>

>> > Maybe it's possible to<br>

>> > avoid this,<br>

>> > but this means we'll have to dive into CodeGen, find the place where it<br>

>> > generates<br>

>> > mentioned dbg_value instruction, and instead of generating tell that<br>

>> > location of "c" can<br>

>> > be found using the given stack slot.<br>

>><br>

>> That's what I had in mind, yes - go find where we special case allocas<br>

>> and put their locations into some side table and use that machinery.<br>

><br>

><br>

> Yep, allocas are put in a special table in MachineModuleInfo early at<br>

> SelectionDAG building,<br>

> and are handled separately (basically, we map each variable to a stack<br>

> slot).<br>

> But if we're going to use this for (a subset of) varaibles currently tracked<br>

> with dbg_value,<br>

> we'll probably have to distinguish "stack slot storing the variable" and<br>

> "stack slot storing the address of<br>

> the variable". Ew.<br>

<br>

</div></div>For the non-trivial-pass-by-value cases you mean? Yes, that's one<br>

unfortunate part of this.<br>

<div class="im"><br>

>> Alternatively, if we're going to support reg offsets and we have<br>

>> liveness/live-in information that allows us to power all this stuff<br>

>> well, perhaps we could swing the other way and remove the existing<br>

>> alloca special case and track those with MI DBG_VALUEs too once we've<br>

>> addressed the problem. That would be a nice cleanup.<br>

><br>

><br>

> I see, so if we're going to make dbg_value -> location-range transformation<br>

> work<br>

> "properly", we may later remove the side table for allocas. I'm still not<br>

> convinced we<br>

> should gather additional information from CodeGen to build location ranges<br>

> in DwarfDebug,<br>

> so I guess I'll try to do a proof-of-concept rewrite of the current<br>

> algorithm and see if<br>

> it doesn't break anything and fix existing ASan problems (and Clang test you<br>

> provided).<br>

<br>

</div>I'm not sure exactly what change to the algorithm you have in mind -<br>

if you're just talking about the algorithm that takes the current<br>

debug_value MI intrinsics and produces live ranges, I'm not sure how<br>

you're going to improve it with only those current debug_value<br>

intrinsics. I hope it's not too much work for you to do, given how<br>

much we're kind of "in the dark" here.<br></blockquote><div><br></div><div>OK, proof of concept turned out to be happy 400-lines-of-code monster:</div><div><a href="http://llvm-reviews.chandlerc.com/D2658">http://llvm-reviews.chandlerc.com/D2658</a></div>

<div>Sorry.</div><div><br></div><div>Still, I think the analysis there can allow us to be much more precise in generating</div><div>location ranges for variables w/o modifying the CodeGen at all, and help us in general case (i.e.</div>

<div>not only for variables that are addressed using the frame pointer).</div><div><br></div><div>I've verified that my patch works for the trivial cases of debug info broken with ASan.</div><div>Do you have an example which is currently broken for plain Clang?</div>

</div><div><br></div>-- <br><div>Alexey Samsonov, MSK</div>

</div></div>