[LLVMdev] MCJIT RemoteMemoryManager Failures on ARM

Kaylor, Andrew andrew.kaylor at intel.com
Tue Nov 26 15:29:05 PST 2013


Looking at the code, one obvious source of intermittent failure is that the Linux implementations of ReadBytes and WriteBytes don't check for EINTR.  I doubt that's the failure you're seeing, because it would be more randomly distributed, but it's something that should be fixed.

A more likely cause of the failure in your case is that read is returning fewer bytes than requested.  In theory, this can happen if we read one end of the pipe while the other end is being written, but the current code doesn't check for it.  A race condition like this seems more likely than a code generation problem.

I'm attaching a patch (which I haven't even tried to compile) that I think addresses these issues.  Can you try it out and see if it fixes this problem for you?
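
For illustration, the two problems described above (EINTR and short reads/writes) are usually handled with a retry loop along the following lines.  This is only a minimal sketch, not the contents of the attached patch: readFully and writeFully are hypothetical helper names, and the sketch assumes plain blocking POSIX file descriptors such as the two ends of a pipe.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper (not from the attached patch): read exactly Size
// bytes from FD.  Returns true only if all Size bytes were read.
bool readFully(int FD, void *Data, size_t Size) {
  assert(Data != nullptr || Size == 0);
  char *Ptr = static_cast<char *>(Data);
  size_t Remaining = Size;
  while (Remaining > 0) {
    ssize_t N = read(FD, Ptr, Remaining);
    if (N < 0) {
      if (errno == EINTR)
        continue;   // Interrupted by a signal; retry the call.
      return false; // Genuine I/O error.
    }
    if (N == 0)
      return false; // EOF: the other end closed before sending everything.
    // Short read: keep what arrived and ask for the rest.
    Ptr += N;
    Remaining -= static_cast<size_t>(N);
  }
  return true;
}

// Hypothetical helper: write exactly Size bytes to FD, with the same
// handling of EINTR and partial writes.
bool writeFully(int FD, const void *Data, size_t Size) {
  assert(Data != nullptr || Size == 0);
  const char *Ptr = static_cast<const char *>(Data);
  size_t Remaining = Size;
  while (Remaining > 0) {
    ssize_t N = write(FD, Ptr, Remaining);
    if (N < 0) {
      if (errno == EINTR)
        continue;   // Interrupted by a signal; retry the call.
      return false; // Genuine I/O error.
    }
    // Partial write: advance past the bytes that were accepted.
    Ptr += N;
    Remaining -= static_cast<size_t>(N);
  }
  return true;
}
```

With helpers like these, a caller treats a false return as a communication failure instead of silently continuing with a partially filled buffer.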

If this doesn't do the trick, stepping through the remote case in the debugger should show you what the communication is leading up to the failure.  From there it should be relatively simple to use just the RemoteTargetExternal class to create a test driver that communicates with the child process in the same way.  This ought to give you a failing test case completely independent of any significant part of LLVM (unless the failure is entirely timing dependent).

Thanks,
Andy


From: Renato Golin [mailto:renato.golin at linaro.org]
Sent: Tuesday, November 26, 2013 2:44 PM
To: Kaylor, Andrew
Cc: NAKAMURA Takumi; LLVM Dev
Subject: Re: MCJIT RemoteMemoryManager Failures on ARM

On 26 November 2013 19:05, Kaylor, Andrew <andrew.kaylor at intel.com> wrote:
I would also note that the failure isn't actually in anything MCJIT-specific.  Aside from the fact that it seems to be clang-specific, the code that is failing is specific to the lli remote implementation.  It's not clear to me why it would fail under aggressive optimization with clang, but I wouldn't characterize that code as particularly robust.

I agree. I think this is more likely a codegen fault on Clang's side that crashes the client, rather than a fault in the remote implementation, which, crude as it is, has very little room for failure of that magnitude.


I just updated the bugzilla report with a few comments about the failure.  The short of it is that there's nothing MCJIT-specific about this failure.  It's most likely a pipe I/O problem.  I think it's possible that the clang optimizations are just exposing a timing-related vulnerability in the pipe handling.

Ok, I'll disable those tests for ARM for now and will look into the bug.

I don't know much about how MCJIT works, so creating the reduced test case will prove difficult. But I'll progress, because I do want MCJIT to work well on ARM, and disabling tests is the wrong way to head. ;)

cheers,
--renato
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lli-remote-comm.patch
Type: application/octet-stream
Size: 2997 bytes
Desc: lli-remote-comm.patch
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131126/77118bd3/attachment.obj>

