<div dir="ltr">Yeah, I already whipped up some changes locally that do almost exactly that.</div><br><div class="gmail_quote"><div dir="ltr">On Tue, Mar 8, 2016 at 3:17 PM Greg Clayton <<a href="mailto:gclayton@apple.com">gclayton@apple.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">If the PDB line tables have sizes you could do:<br>
<br>
PDBLineEntry curr = pdb->GetLineEntry(n);<br>
PDBLineEntry next = pdb->GetLineEntry(n+1); // only valid if n+1 exists; the last entry always needs a terminal entry<br>
<br>
line_entries.insert(curr.addr, curr.line, curr.file, false /*is_terminal*/);<br>
<br>
if (curr.addr + curr.size != next.addr)<br>
{<br>
// Insert terminal entry for end of curr<br>
line_entries.insert(curr.addr + curr.size, curr.line, curr.file, true /*is_terminal*/);<br>
}<br>
<br>
Above is pseudo code, but you get the idea...<br>
<br>
> On Mar 8, 2016, at 12:56 PM, Zachary Turner <<a href="mailto:zturner@google.com" target="_blank">zturner@google.com</a>> wrote:<br>
><br>
> Let's suppose I've got this function (ignore the operands to the branch instructions; I disassembled a real function and manually adjusted only the addresses on the left side to create a contrived example).<br>
><br>
> infinite-dwarf.exe`main at infinite.cpp:5<br>
> 4<br>
> 5 int main(int argc, char **argv) {<br>
> 6 int n = 0;<br>
> infinite-dwarf.exe`main:<br>
> infinite-dwarf.exe[0x410000] <+0>: 55 push ebp<br>
> infinite-dwarf.exe[0x410001] <+1>: 89 e5 mov ebp, esp<br>
> infinite-dwarf.exe[0x410003] <+3>: 83 ec 18 sub esp, 0x18<br>
> infinite-dwarf.exe[0x410006] <+6>: 8b 45 0c mov eax, dword ptr [ebp + 0xc]<br>
> infinite-dwarf.exe[0x410009] <+9>: 8b 4d 08 mov ecx, dword ptr [ebp + 0x8]<br>
> infinite-dwarf.exe[0x41000c] <+12>: c7 45 fc 00 00 00 00 mov dword ptr [ebp - 0x4], 0x0<br>
> infinite-dwarf.exe[0x410013] <+19>: 89 45 f8 mov dword ptr [ebp - 0x8], eax<br>
> infinite-dwarf.exe[0x410016] <+22>: 89 4d f4 mov dword ptr [ebp - 0xc], ecx<br>
> infinite-dwarf.exe`main + 25 at infinite.cpp:6<br>
> 5 int main(int argc, char **argv) {<br>
> 6 int n = 0;<br>
> 7 while (n < 10) {<br>
> infinite-dwarf.exe[0x410019] <+25>: c7 45 f0 00 00 00 00 mov dword ptr [ebp - 0x10], 0x0<br>
> infinite-dwarf.exe`main + 32 at infinite.cpp:7<br>
> 6 int n = 0;<br>
> 7 while (n < 10) {<br>
> 8 std::cout << n << std::endl;<br>
> infinite-dwarf.exe[0x410020] <+32>: 83 7d f0 0a cmp dword ptr [ebp - 0x10], 0xa<br>
> infinite-dwarf.exe`main + 36 at infinite.cpp:7<br>
> 6 int n = 0;<br>
> 7 while (n < 10) {<br>
> 8 std::cout << n << std::endl;<br>
> infinite-dwarf.exe[0x410024] <+36>: 0f 8d 4a 00 00 00 jge 0x410074<br>
> infinite-dwarf.exe`main + 42 at infinite.cpp:8<br>
> 7 while (n < 10) {<br>
> 8 std::cout << n << std::endl;<br>
> 9 Sleep(1000);<br>
> infinite-dwarf.exe[0x41002a] <+42>: 8b 45 f0 mov eax, dword ptr [ebp - 0x10]<br>
> infinite-dwarf.exe`main + 45 at infinite.cpp:8<br>
> 7 while (n < 10) {<br>
> 8 std::cout << n << std::endl;<br>
> 9 Sleep(1000);<br>
> infinite-dwarf.exe[0x41002d] <+45>: 89 e1 mov ecx, esp<br>
> infinite-dwarf.exe[0x41002f] <+47>: 89 01 mov dword ptr [ecx], eax<br>
> infinite-dwarf.exe[0x410031] <+49>: b9 80 c1 40 00 mov ecx, 0x40c180<br>
> infinite-dwarf.exe[0x410036] <+54>: e8 55 0a 00 00 call 0x410a90<br>
> infinite-dwarf.exe[0x41003b] <+59>: 83 ec 04 sub esp, 0x4<br>
> infinite-dwarf.exe`main + 62 at infinite.cpp:8<br>
> 7 while (n < 10) {<br>
> 8 std::cout << n << std::endl;<br>
> 9 Sleep(1000);<br>
> infinite-dwarf.exe[0x41003e] <+62>: 89 e1 mov ecx, esp<br>
> infinite-dwarf.exe[0x410040] <+64>: c7 01 50 0d 41 00 mov dword ptr [ecx], 0x410d50<br>
> infinite-dwarf.exe[0x410046] <+70>: 89 c1 mov ecx, eax<br>
> infinite-dwarf.exe[0x410048] <+72>: e8 e3 0c 00 00 call 0x410d30<br>
> infinite-dwarf.exe[0x41004d] <+77>: 83 ec 04 sub esp, 0x4<br>
><br>
><br>
> ; function becomes discontiguous here<br>
><br>
><br>
><br>
> infinite-dwarf.exe`main + 80 at infinite.cpp:9<br>
> 8 std::cout << n << std::endl;<br>
> 9 Sleep(1000);<br>
> 10 n++;<br>
> infinite-dwarf.exe[0x510050] <+80>: 89 e1 mov ecx, esp<br>
> infinite-dwarf.exe[0x510052] <+82>: c7 01 e8 03 00 00 mov dword ptr [ecx], 0x3e8<br>
> infinite-dwarf.exe[0x510058] <+88>: 8b 0d 04 93 43 00 mov ecx, dword ptr [0x439304]<br>
> infinite-dwarf.exe[0x51005e] <+94>: 89 45 ec mov dword ptr [ebp - 0x14], eax<br>
> infinite-dwarf.exe[0x510061] <+97>: ff d1 call ecx<br>
> infinite-dwarf.exe[0x510063] <+99>: 83 ec 04 sub esp, 0x4<br>
> infinite-dwarf.exe`main + 102 at infinite.cpp:10<br>
> 9 Sleep(1000);<br>
> 10 n++;<br>
> 11 }<br>
> infinite-dwarf.exe[0x510066] <+102>: 8b 45 f0 mov eax, dword ptr [ebp - 0x10]<br>
> infinite-dwarf.exe[0x510069] <+105>: 83 c0 01 add eax, 0x1<br>
> infinite-dwarf.exe[0x51006c] <+108>: 89 45 f0 mov dword ptr [ebp - 0x10], eax<br>
> infinite-dwarf.exe`main + 111 at infinite.cpp:7<br>
> 6 int n = 0;<br>
> 7 while (n < 10) {<br>
> 8 std::cout << n << std::endl;<br>
> infinite-dwarf.exe[0x51006f] <+111>: e9 ac ff ff ff jmp 0x410020<br>
> infinite-dwarf.exe[0x510074] <+116>: 31 c0 xor eax, eax<br>
> infinite-dwarf.exe`main + 118 at infinite.cpp:13<br>
> 12<br>
> 13 return 0;<br>
> 14 }<br>
> infinite-dwarf.exe[0x510076] <+118>: 83 c4 18 add esp, 0x18<br>
> infinite-dwarf.exe[0x510079] <+121>: 5d pop ebp<br>
> infinite-dwarf.exe[0x51007a] <+122>: c3 ret<br>
><br>
><br>
> About halfway down, the addresses suddenly increase by 0x100000. So, for some strange reason, while unrolling the loop the compiler decided to start placing code somewhere else entirely. Am I correct in saying that 0x410050 should be a terminal entry in this example?<br>
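The arithmetic behind 0x410050: the line-8 entry at 0x41003e (main + 62) covers five instructions of 2, 6, 2, 5 and 3 bytes, going by the encodings in the listing, so it ends at 0x41003e + 0x12 = 0x410050, while the next entry starts at 0x510050. A compile-time check of that sum, as a small C++14 sketch (the constant names are made up for illustration):

```cpp
#include <cstdint>

// Start of the line-8 entry (main + 62) in the listing.
constexpr uint64_t kEntryStart = 0x41003e;

// Byte lengths of its five instructions, read off the encodings:
// 89 e1 | c7 01 50 0d 41 00 | 89 c1 | e8 e3 0c 00 00 | 83 ec 04
constexpr uint64_t kInsnSizes[] = {2, 6, 2, 5, 3};

// Start address plus the sum of the instruction lengths.
constexpr uint64_t EntryEnd() {
  uint64_t end = kEntryStart;
  for (uint64_t size : kInsnSizes)
    end += size;
  return end;
}

// 0x41003e + 0x12 = 0x410050, but the next entry starts at 0x510050.
static_assert(EntryEnd() == 0x410050, "line-8 entry ends at 0x410050");
static_assert(EntryEnd() != 0x510050, "the next entry does not follow it");
```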
><br>
> On Mon, Mar 7, 2016 at 3:31 PM Greg Clayton <<a href="mailto:gclayton@apple.com" target="_blank">gclayton@apple.com</a>> wrote:<br>
><br>
> > On Mar 7, 2016, at 3:21 PM, Zachary Turner <<a href="mailto:zturner@google.com" target="_blank">zturner@google.com</a>> wrote:<br>
> ><br>
> > Does DWARF not store this information? It seems like it could be stored efficiently in an interval tree; the question is just whether converting what DWARF stores into that format is efficient.<br>
><br>
> No, it stores it just like we do, but in a compressed format that is useless for searching.<br>
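To illustrate why that layout is searchable once decoded, here is a hedged C++ sketch of a lookup over rows sorted by address, where a terminal row closes a contiguous sequence. `Row` and `LookupLine` are hypothetical simplifications, not LLDB's `LineTable` API. A binary search finds the last row at or below the address; if that row is terminal, the address falls in a gap and has no line info. The sketch assumes the table ends with a terminal row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Simplified row: covers [addr, next row's addr). A terminal row only marks
// where the preceding sequence ends.
struct Row {
  uint64_t addr;
  uint32_t line;
  bool is_terminal;
};

// Returns the line for pc, or std::nullopt if pc precedes the table or lies
// in a gap opened by a terminal row.
std::optional<uint32_t> LookupLine(const std::vector<Row> &rows, uint64_t pc) {
  assert(std::is_sorted(rows.begin(), rows.end(),
                        [](const Row &a, const Row &b) { return a.addr < b.addr; }));
  // First row whose address is greater than pc...
  auto it = std::upper_bound(rows.begin(), rows.end(), pc,
                             [](uint64_t value, const Row &r) { return value < r.addr; });
  if (it == rows.begin())
    return std::nullopt;  // pc is below the first row
  --it;  // ...so this is the last row starting at or below pc
  if (it->is_terminal)
    return std::nullopt;  // pc is past the end of a sequence
  return it->line;
}
```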
><br>
> > PDB returns line entries in the format I described, with a start address and a byte length, so to determine whether something is a terminal entry, I have to add the entries to some kind of data structure that collapses ranges and then manually scan it for breaks in the continuity of the range.<br>
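The collapse-and-scan step can be sketched as a standard interval merge. `Range` and `CollapseRanges` are hypothetical names, and each range is assumed to be [start, start + length) built from one PDB record. After merging, the end of every run is an address where a terminal entry would go.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical [begin, end) range from a PDB record's start and byte length.
struct Range {
  uint64_t begin;
  uint64_t end;
};

// Sorts the ranges by start, then extends the current run whenever the next
// range starts at (or before) the run's end; otherwise a gap starts a new run.
std::vector<Range> CollapseRanges(std::vector<Range> ranges) {
  std::sort(ranges.begin(), ranges.end(),
            [](const Range &a, const Range &b) { return a.begin < b.begin; });
  std::vector<Range> runs;
  for (const Range &r : ranges) {
    assert(r.begin <= r.end);
    if (!runs.empty() && r.begin <= runs.back().end)
      runs.back().end = std::max(runs.back().end, r.end);  // contiguous: extend
    else
      runs.push_back(r);  // first range or a gap: start a new run
  }
  return runs;
}
```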
> ><br>
> > Is there some way we can make this more generic so that it's efficient for both DWARF and PDB?<br>
><br>
> We need an efficient in-memory format that LLDB can search, which is how things are currently done: all plug-ins are expected to parse the debug info and produce a series of lldb_private::LineTable::Entry structs.<br>
><br>
> We could instead defer this functionality to the plug-ins directly, where you would always have to say "hey SymbolFile, here is a section offset address, please get me the lldb_private::LineEntry":<br>
><br>
> bool<br>
> SymbolFile::GetLineEntryForAddress (const lldb_private::Address &addr, lldb_private::LineEntry &line_entry);<br>
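A minimal sketch of what such a per-plugin query could look like, using simplified, hypothetical types: `LineEntry` here is a plain struct keyed by file address, not `lldb_private::LineEntry`, and `LineLookup`/`PdbStyleLookup` are made up for illustration. A PDB-style plug-in could answer directly from start/length records without any terminal entries.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical simplified entry covering [addr, addr + size).
struct LineEntry {
  uint64_t addr;
  uint64_t size;
  uint32_t line;
};

// Sketch of the proposed query: each plug-in answers from its native format.
class LineLookup {
public:
  virtual ~LineLookup() = default;
  virtual bool GetLineEntryForAddress(uint64_t addr, LineEntry &entry) const = 0;
};

// Toy PDB-style plug-in: records keyed by start address, each with a length.
class PdbStyleLookup : public LineLookup {
public:
  void Add(const LineEntry &e) {
    assert(e.size > 0);  // zero-length records cover no addresses
    entries_[e.addr] = e;
  }

  bool GetLineEntryForAddress(uint64_t addr, LineEntry &entry) const override {
    auto it = entries_.upper_bound(addr);  // first record starting after addr
    if (it == entries_.begin())
      return false;
    --it;  // last record starting at or before addr
    if (addr >= it->second.addr + it->second.size)
      return false;  // addr falls in a gap after that record
    entry = it->second;
    return true;
  }

private:
  std::map<uint64_t, LineEntry> entries_;
};
```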
><br>
> What I don't like about this approach, where we don't specify the format we want the line tables to be in, is that it makes it quite painful to iterate over all line table entries for a compile unit. You would need to get the address range for every function in the compile unit, then write a loop that iterates through all of those addresses and looks up each one to find its lldb_private::LineEntry. Right now we just get the LineTable from the compile unit and call "bool LineTable::GetLineEntryAtIndex(uint32_t idx, LineEntry &line_entry);".<br>
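A sketch of that enumeration using only an address-based query shows the cost: every byte of every function range costs one lookup, and an entry is recorded only when the lookup lands on its start address. `EnumerateByAddress`, `AddressLookup`, `LineEntry` and `FunctionRange` are hypothetical; with an index-based table, the same job is just a loop over `GetLineEntryAtIndex(idx, entry)`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical simplified types: an entry covering [addr, addr + size) and a
// function's [begin, end) address range.
struct LineEntry {
  uint64_t addr;
  uint64_t size;
  uint32_t line;
};
struct FunctionRange {
  uint64_t begin;
  uint64_t end;
};

// Stand-in for an address-only query such as GetLineEntryForAddress.
using AddressLookup = std::function<bool(uint64_t addr, LineEntry &entry)>;

// Walks each function range one byte at a time, recording an entry the first
// time a looked-up address matches the entry's start.
std::vector<LineEntry> EnumerateByAddress(const std::vector<FunctionRange> &funcs,
                                          const AddressLookup &lookup) {
  std::vector<LineEntry> out;
  for (const FunctionRange &f : funcs) {
    assert(f.begin <= f.end);
    for (uint64_t addr = f.begin; addr < f.end; ++addr) {
      LineEntry e;
      if (lookup(addr, e) && e.addr == addr)
        out.push_back(e);
    }
  }
  return out;
}
```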
><br>
><br>
<br>
</blockquote></div>