<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>
</head>
<body dir="ltr">
<div id="divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri,Arial,Helvetica,sans-serif;" dir="ltr">
<blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;">
<p><i>> <span style="font-family: Calibri, Arial, Helvetica, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols; font-size: 13.3333px;">could you get a backtrace of lldb-server when it is in the
"stuck"</span></i></p>
<p><span style="font-family: Calibri, Arial, Helvetica, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols; font-size: 13.3333px;"><i>state (just attach with lldb/gdb after it hangs and do "bt")?</i></span></p>
</blockquote>
<p><span style="font-family: Calibri, Arial, Helvetica, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols; font-size: 13.3333px;"><br>
</span></p>
<p><span style="font-family: Calibri, Arial, Helvetica, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols; font-size: 13.3333px;"><span style="font-size: 12pt;">You wish
</span><img class="EmojiInsert" id="OWAEmoji35851" alt="☹" style="vertical-align: bottom; user-select: none;" src="cid:eb2d8a09-d75b-49ee-b1d4-c804a990d92a"><span style="font-size: 12pt;"> The </span><span style="font-size: 12pt;">lldb-server does not react
to any signals including SIGSTOP, so gdb just hangs forever.</span></span></p>
<p><br>
</p>
<blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;">
<p><i>> <span style="font-family: Calibri, Arial, Helvetica, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols; font-size: 13.3333px;">If you can get me reasonably detailed repro steps, I can
try to </span></i><span style="font-family: Calibri, Arial, Helvetica, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols; font-size: 13.3333px;"><i>investigate</i></span></p>
</blockquote>
<p><br>
</p>
<p>Unfortunately I do not have a repro myself. It happens semi-randomly on some machines, and I need to borrow a machine that shows the problem. Here are some details from my records:</p>
<p></p>
<ul>
<li>It is a pretty big piece of RX memory; /proc/&lt;pid&gt;/maps shows this: <br>
409701000-40b49c000 r-xp 00000000 00:00 0</li><li>Writing into some locations within that VMA works.</li><li>When it repros, it is pretty consistent, but changes in the target may shift it, i.e. make it stop reproducing or fail at a different address.</li></ul>
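<p>A minimal sketch of how one might check where an address sits inside that mapping. It assumes the WriteMemory address 0x409d5a7d0 from the log excerpt in this message belongs to the same session as the maps line, and that pages are 4096 bytes; the helper names parse_maps_line and locate are made up for illustration.</p>
<pre>
```python
def parse_maps_line(line):
    """Split one /proc/PID/maps line into (start, end, perms)."""
    fields = line.split()
    start_hex, end_hex = fields[0].split("-")
    return int(start_hex, 16), int(end_hex, 16), fields[1]


def locate(addr, line, page_size=4096):
    """Describe where addr falls inside the mapping, or return None."""
    start, end, perms = parse_maps_line(line)
    if addr not in range(start, end):
        return None
    offset = addr - start
    return {
        "perms": perms,
        "offset": offset,
        "page_index": offset // page_size,
        "offset_in_page": offset % page_size,
    }


# The mapping reported above and the address from the WriteMemory log line.
MAPS_LINE = "409701000-40b49c000 r-xp 00000000 00:00 0"
info = locate(0x409D5A7D0, MAPS_LINE)
```
</pre>
<p>Under those assumptions the address lies inside the r-xp mapping, and the 8-byte word starting at it does not cross a page boundary.</p>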
<p></p>
<blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;">
<p><i>> </i><span style="font-family: Calibri, Arial, Helvetica, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", NotoColorEmoji, "Segoe UI Symbol", "Android Emoji", EmojiSymbols; font-size: 13.3333px;"><i>are you able to still reproduce the bug with logging
enabled?</i></span></p>
</blockquote>
<p><br>
</p>
<p></p>
<div>Yes. Here are the last few lines from the log:</div>
<div><br>
</div>
<div><span style="font-family: "Courier New", monospace; font-size: 8pt;">1481139040.768961000 0x7fff253c9780 Communication::Write (src = 0x7fff253c8f48, src_len = 7) connection = 0x24a6bd0</span></div>
<div><span style="font-family: "Courier New", monospace; font-size: 8pt;">1481139040.768963000 0x24a6bd0 ConnectionFileDescriptor::Write (src = 0x7fff253c8f48, src_len = 7)</span></div>
<div><span style="font-family: "Courier New", monospace; font-size: 8pt;">1481139040.768973000 0x24a6cc0 Socket::Write() (socket = 6, src = 0x7fff253c8f48, src_len = 7, flags = 0) => 7 (error = (null))</span></div>
<div><span style="font-family: "Courier New", monospace; font-size: 8pt;">1481139040.768976000 0x24a6bd0 ConnectionFileDescriptor::Write(fd = 6, src = 0x7fff253c8f48, src_len = 7) => 7 (error = (null))</span></div>
<div><span style="font-family: "Courier New", monospace; font-size: 8pt;">1481139040.768979000 0x7fff253c9780 Communication::Read (dst = 0x7fff253c7140, dst_len = 8192, timeout = 0 usec) connection = 0x24a6bd0</span></div>
<div><span style="font-family: "Courier New", monospace; font-size: 8pt;">1481139040.768982000 0x24a6bd0 ConnectionFileDescriptor::BytesAvailable (timeout_usec = 0)</span></div>
<div><span style="font-family: "Courier New", monospace; font-size: 8pt;">1481139040.768984000 0x24a6bd0 ConnectionFileDescriptor::BytesAvailable() ::select (nfds=7, fds={6, 4}, NULL, NULL, timeout=0x7fff253c6d80)...</span></div>
<div><span style="font-family: "Courier New", monospace; font-size: 8pt;">1481139040.768986000 0x24a6bd0 ConnectionFileDescriptor::BytesAvailable() ::select (nfds=7, fds={6, 4}, NULL, NULL, timeout=0x7fff253c6d80) => 0, error = (null)</span></div>
<div><span style="font-family: "Courier New", monospace; font-size: 8pt;">1481139090.788317000 0x7fff253c9780 Communication::Read (dst = 0x7fff253c7140, dst_len = 8192, timeout = 0 usec) connection = 0x24a6bd0</span></div>
<div><span style="font-family: "Courier New", monospace; font-size: 8pt;">1481139090.788356000 0x24a6bd0 ConnectionFileDescriptor::BytesAvailable (timeout_usec = 0)</span></div>
<div><span style="font-family: "Courier New", monospace; font-size: 8pt;">1481139090.788364000 0x24a6bd0 ConnectionFileDescriptor::BytesAvailable() ::select (nfds=7, fds={6, 4}, NULL, NULL, timeout=0x7fff253c6d80)...</span></div>
<div><span style="font-family: "Courier New", monospace; font-size: 8pt;">1481139090.788368000 0x24a6bd0 ConnectionFileDescriptor::BytesAvailable() ::select (nfds=7, fds={6, 4}, NULL, NULL, timeout=0x7fff253c6d80) => 1, error = (null)</span></div>
<div><span style="font-family: "Courier New", monospace; font-size: 8pt;">1481139090.788378000 0x24a6cc0 Socket::Read() (socket = 6, src = 0x7fff253c7140, src_len = 25, flags = 0) => 25 (error = (null))</span></div>
<div><span style="font-family: "Courier New", monospace; font-size: 8pt;">1481139090.788382000 0x24a6bd0 ConnectionFileDescriptor::Read() fd = 6, dst = 0x7fff253c7140, dst_len = 8192) => 25, error = (null)</span></div>
<div><span style="font-family: "Courier New", monospace; font-size: 8pt;">1481139090.788395000 NativeProcessLinux::WriteMemory(0x409d5a7d0, 0x25271d0, 4)</span></div>
<div><span style="font-family: "Courier New", monospace; font-size: 8pt;">1481139090.788409000 NativeProcessLinux::ReadMemory using process_vm_readv to read 8 bytes from inferior address 0x409d5a7d0: Success</span></div>
<div><span style="font-family: "Courier New", monospace; font-size: 8pt;">1481139090.788414000 PTRACE_POKEDATA [1][0][0][0][57][41][54][41]</span></div>
<div><br>
</div>
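<p>The last three log lines look like a read-modify-write of one 8-byte word: a 4-byte write is requested, 8 bytes are read back from the inferior, and a full word is then poked. A minimal sketch of that merge follows. It assumes the log prints each byte in hex, that the bytes being written are 01 00 00 00, and that the tail of the word is preserved from the read; the first four "old" bytes are not in the log, so the values used here are placeholders, and the helper names are hypothetical.</p>
<pre>
```python
def merged_word(old_word, new_bytes):
    """Overlay new_bytes on the start of old_word, keeping the tail."""
    if len(new_bytes) > len(old_word):
        raise ValueError("new_bytes does not fit in one word")
    return bytes(new_bytes) + bytes(old_word[len(new_bytes):])


def format_like_log(word):
    """Render bytes as [xx][yy]... using hex without padding (assumed format)."""
    return "".join("[%x]" % b for b in word)


# Placeholder first four bytes (unknown), tail taken from the POKEDATA line.
old_word = bytes([0xAA, 0xBB, 0xCC, 0xDD, 0x57, 0x41, 0x54, 0x41])
new_bytes = bytes([0x01, 0x00, 0x00, 0x00])

poked = merged_word(old_word, new_bytes)
rendered = format_like_log(poked)
```
</pre>
<p>With these inputs, rendered matches the bracketed byte list on the PTRACE_POKEDATA line, which is the last thing the log records.</p>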
<p></p>
<p>Thanks,</p>
<p>Eugene</p>
<p><br>
</p>
<br>
<br>
<div style="color: rgb(0, 0, 0);">
<div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="x_divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> Pavel Labath &lt;labath@google.com&gt;<br>
<b>Sent:</b> Wednesday, December 7, 2016 2:34 AM<br>
<b>To:</b> Eugene Birukov<br>
<b>Cc:</b> LLDB<br>
<b>Subject:</b> Re: [lldb-dev] Lldb-server spins forever in ptrace with 100% CPU on Linux Ubuntu 16.04</font>
<div> </div>
</div>
</div>
<font size="2"><span style="font-size:10pt;">
<div class="PlainText">Hello Eugene,<br>
<br>
this sounds troubling, and I'd like to get to the bottom of this. If<br>
you can get me a bit more information about this, I believe we can<br>
figure it out:<br>
<br>
- could you get a backtrace of lldb-server when it is in the "stuck"<br>
state (just attach with lldb/gdb after it hangs and do "bt")? I want<br>
to see where it is spinning, as I don't see any obvious infinite<br>
loop there.<br>
<br>
- are you able to still reproduce the bug with logging enabled? If so,<br>
I'd like to see the log file to understand this better. (You can<br>
enable logging by starting lldb-server with: --log-file XXX.log<br>
--log-channels "lldb all:linux all". If you're starting it via lldb<br>
client you can set the LLDB_DEBUGSERVER_LOG_FILE and<br>
LLDB_SERVER_LOG_CHANNELS environment vars to achieve this)<br>
<br>
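For the client-side route, a minimal Python sketch of building such an<br>
environment is below. The helper name lldb_server_log_env and the log<br>
path are hypothetical; only the two variable names and the channel<br>
string come from the text above.<br>
<pre>
```python
import os


def lldb_server_log_env(log_path, channels="lldb all:linux all", base_env=None):
    """Return a copy of the environment with lldb-server logging requested."""
    env = dict(os.environ if base_env is None else base_env)
    env["LLDB_DEBUGSERVER_LOG_FILE"] = log_path
    env["LLDB_SERVER_LOG_CHANNELS"] = channels
    return env


# Hypothetical log path; pass the result as env= when launching the client.
env = lldb_server_log_env("/tmp/lldb-server.log", base_env={})
```
</pre>
The resulting mapping can be handed to whatever launches the lldb<br>
client, for example as the env argument of subprocess.run.<br>
<br>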
- If you can get me reasonably detailed repro steps, I can try to<br>
investigate (I am fine with the first step being "install ubuntu 16.04<br>
in virtualbox")<br>
<br>
On 6 December 2016 at 23:41, Eugene Birukov via lldb-dev<br>
&lt;lldb-dev@lists.llvm.org&gt; wrote:<br>
> Hi,<br>
> 1. I believe that lldb-server spins inside ptrace. I put breakpoint on the<br>
> highlighted line, and it does not hit. If I put breakpoint on line before,<br>
> it hits but lldb-server hangs.<br>
<br>
Do you mean actually inside the ptrace(2) syscall? Your description<br>
would certainly fit that, but that sounds scary, as it would mean a<br>
kernel bug. If that's the case, then we have to start looking in the<br>
kernel. I have some experience with that, but if we can boil this down<br>
to a simple use case, we can also ask the kernel ptrace folks for<br>
help.<br>
<br>
<br>
> 2. It seems that hang is caused by the client trying to read response too<br>
> fast. I mean, if I step through the client code it works - i.e. there is<br>
> significant delay between client writing into pipe and issuing ::select to<br>
> wait for response.<br>
<br>
I am not sure how this fits in with the item above. I find it hard to<br>
believe that the presence of select(2) in one process would affect the<br>
outcome of ptrace() in another. Unless we are actually encountering a<br>
kernel scheduler bug, which I find unlikely. Hopefully we can get more<br>
insight here with additional logging information.<br>
<br>
<br>
> Any advice how to deal with the situation except putting random sleeps in<br>
> random places?<br>
Inserting sleeps in various places is a valid (albeit very slow)<br>
strategy for debugging races. If you can push the sleep down, you will<br>
eventually reach the place where it will be obvious what is racing<br>
(or, at least, which component is to blame). Hopefully we can do<br>
something smarter though.<br>
<br>
If you are suspecting a kernel bug, I'd recommend recreating it in a<br>
simple standalone application (fork, trace the child, write its<br>
memory), as then it is easy to ask for help.<br>
<br>
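A rough sketch of such a standalone test (fork, trace the child, write<br>
its memory) is below. It is written in Python with ctypes for brevity<br>
rather than C, assumes Linux and its ptrace request numbers, and pokes<br>
an ordinary writable buffer, so it only shows the skeleton and does not<br>
recreate the r-x mapping from the report. The function name<br>
poke_child_word and the data values are made up for illustration.<br>
<pre>
```python
import ctypes
import os
import signal

# ptrace request numbers as defined in sys/ptrace.h on Linux (assumed platform).
PTRACE_TRACEME, PTRACE_PEEKDATA, PTRACE_POKEDATA, PTRACE_CONT = 0, 2, 5, 7

libc = ctypes.CDLL(None, use_errno=True)
libc.ptrace.restype = ctypes.c_long
libc.ptrace.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_void_p, ctypes.c_void_p]


def poke_child_word(new_word=0x41544157):
    """Fork a child, trace it, overwrite one word of its memory, and report."""
    # The buffer keeps the same virtual address in the child after fork().
    target = ctypes.c_ulong(0x1122334455667788)
    addr = ctypes.addressof(target)

    pid = os.fork()
    if pid == 0:
        # Child: ask to be traced, then stop so the parent can poke us.
        if libc.ptrace(PTRACE_TRACEME, 0, None, None) == -1:
            os._exit(2)
        os.kill(os.getpid(), signal.SIGSTOP)
        # Resumed by the tracer: exit 0 only if the poked value is visible.
        os._exit(0 if target.value == new_word else 1)

    _, status = os.waitpid(pid, 0)
    if not os.WIFSTOPPED(status):
        raise RuntimeError("child did not stop under ptrace")
    before = libc.ptrace(PTRACE_PEEKDATA, pid, addr, None)
    if libc.ptrace(PTRACE_POKEDATA, pid, addr, new_word) == -1:
        err = ctypes.get_errno()
        os.kill(pid, signal.SIGKILL)
        os.waitpid(pid, 0)
        raise OSError(err, "PTRACE_POKEDATA failed")
    after = libc.ptrace(PTRACE_PEEKDATA, pid, addr, None)
    # Passing no data means the pending SIGSTOP is not redelivered.
    libc.ptrace(PTRACE_CONT, pid, None, None)
    _, status = os.waitpid(pid, 0)
    return before, after, os.WEXITSTATUS(status)
```
</pre>
Calling poke_child_word() returns the word before the write, the word<br>
after it, and the child's exit status. A hang like the one described<br>
would show up as the PTRACE_POKEDATA call never returning.<br>
<br>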
pl<br>
</div>
</span></font></div>
</div>
</body>
</html>