[llvm-dev] [ORC JIT][MLIR] GDBRegistrationListener "second attempt to perform debug registration" assert

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Wed May 20 19:59:49 PDT 2020


+Lang

On Wed, May 20, 2020 at 4:44 PM Straw, Adam D via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi all,
>
> Attention:  Lang Hames
>
>
>
> I am developing the nGraph MLIR
> <https://github.com/NervanaSystems/ngraph/tree/master/src/contrib/mlir>
> implementation and hitting the following assert while running nGraph unit
> tests:
>
>
>
>   assert(ObjectBufferMap.find(K) == ObjectBufferMap.end() &&
>
>          "Second attempt to perform debug registration.");
>
>
>
> Here is a permalink
> <https://github.com/llvm/llvm-project/blob/3d5360a4398bfa6878f94ca9ac55bc568692c765/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp#L163>
> for that line of code on GitHub.  The failure only occurs when running
> multiple unit tests back-to-back.  Inevitably, if I rerun the failing unit
> test, it passes.  The failure also tends to move around with a different
> unit tests failing on successive runs.
>
>
>
> I am able to hit the failure in GDB.  Here is a partial backtrace:
>
>
>
> #3  0x00007fffec4cf412 in __GI___assert_fail (assertion=assertion at entry=0x7ffff1d145e0
> "ObjectBufferMap.find(K) == ObjectBufferMap.end() && \"Second attempt to
> perform debug registration.\"",
>
>     file=file at entry=0x7ffff1d144d8
> "/localdisk/adstraw/ngraph/build/mlir_project/llvm-project/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp",
> line=line at entry=164,
>
>     function=function at entry=0x7ffff1d15520 <(anonymous
> namespace)::GDBJITRegistrationListener::notifyObjectLoaded(unsigned long,
> llvm::object::ObjectFile const&, llvm::RuntimeDyld::LoadedObjectInfo
> const&)::__PRETTY_FUNCTION__> "virtual void
> {anonymous}::GDBJITRegistrationListener::notifyObjectLoaded(llvm::JITEventListener::ObjectKey,
> const llvm::object::ObjectFile&, const
> llvm::RuntimeDyld::LoadedObjectInfo&)")
>
>     at assert.c:101
>
> #4  0x00007ffff01ed4d3 in (anonymous
> namespace)::GDBJITRegistrationListener::notifyObjectLoaded
> (this=0x5555591a9270, K=93825060893584, Obj=..., L=...)
>
>     at
> /localdisk/adstraw/ngraph/build/mlir_project/llvm-project/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp:163
>
> #5  0x00007ffff01b0eca in std::function<void (unsigned long,
> llvm::object::ObjectFile const&, llvm::RuntimeDyld::LoadedObjectInfo
> const&)>::operator()(unsigned long, llvm::object::ObjectFile const&,
> llvm::RuntimeDyld::LoadedObjectInfo const&) const (__args#2=...,
> __args#1=..., __args#0=<optimized out>, this=0x5555594e09b8) at
> /usr/include/c++/7/bits/std_function.h:706
>
> #6  llvm::orc::RTDyldObjectLinkingLayer::onObjLoad (this=<optimized out>,
> K=3, R=..., Obj=..., MemMgr=0x5555591c4160,
> LoadedObjInfo=std::unique_ptr<llvm::RuntimeDyld::LoadedObjectInfo> = {...},
>
>     Resolved=std::map with 8 elements = {...}, InternalSymbols=std::set
> with 0 elements)
>
>     at
> /localdisk/adstraw/ngraph/build/mlir_project/llvm-project/llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:267
>
>
>
> A little debug context:
>
>
>
>    - *Frame #6* with K=3 we are about to execute the `NotifyLoaded`
>    callback
>    - *Frame #5* we execute the `NotifyLoaded` callback which apparently
>    transforms K from 3 to 93825060893584 (0x5555596CF390) which looks like a
>    pointer.  I don’t know the code well enough to grok what’s going on here.
>    I’m not even sure where this callback resides, but it could be here
>    <https://github.com/llvm/llvm-project/blob/c66f89005f6d23b6885d8f93f33ff27dc60ce7dd/llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp#L312>
>    in RTDyldObjectLinkingLater.cpp where we are using the address of the
>    `MemMgr` as K.
>    - *Frame #4 *now in `notifyObjectLoaded` with K=93825060893584 which
>    will go on to assert in *frame #3* due to the fact that that key (K)
>    already exists in the map
>
>
>
> All of this leads me to believe there is some sort of race.  Theory:  with
> two unit tests A and B running back to back…
>
>
>
>    1. Unit test A allocates and deallocates a `MemMgr` (or whatever other
>    object we are using for a key) at address X
>    2. Unit test A calls ` notifyFreeingObject` to free the object at K=X
>    but gets stuck (does not lock) the ` JITDebugLock` mutex here
>    <https://github.com/llvm/llvm-project/blob/3d5360a4398bfa6878f94ca9ac55bc568692c765/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp#L181>
>    for whatever reason
>    3. Meanwhile unit test B allocates a `MemMgr` (or whatever other
>    object we are using for a key) at newly freed address X
>    4. Unit test B calls `notifyObjectLoaded` to register the object at
>    K=X and locks the ` JITDebugLock` here
>    <https://github.com/llvm/llvm-project/blob/3d5360a4398bfa6878f94ca9ac55bc568692c765/llvm/lib/ExecutionEngine/GDBRegistrationListener.cpp#L162>
>    with unit test A still waiting in step #2 above
>    5. This goes on to assert as the key was not erased (step #2) before
>    it was added (step #4)
>
>
>
> I could use a little help to debug the error further.  Curiously, we do
> hit this issue in continuous integration (CI) --- only when running on
> development systems “by hand”.  I am investigating the differences (perhaps
> single vs. multi threaded?) as to why this might be the case.
>
>
>
> Note that I am using a slightly old version of MLIR master, I’m at this
> commit:
>
>
>
> commit 2f8b4545f4960778e37114c024073d208751ca89
>
> Author: Adam Straw <adam.d.straw at intel.com>
>
> Date:   Tue Apr 14 22:49:18 2020 +0300
>
>
>
>     [mlir] Fix assert on signed integer type in EDSC
>
>
>
>     Integer type in Std dialect is signless so we should be checking
>
>     for signless integer type instead of signed integer type in EDSC.
>
>
>
>     Differential Revision: https://reviews.llvm.org/D78144
>
>
>
> Thanks in advance for the help.
>
>
>
> Adam
>
>
>
> Note:  I am out of office this Fri and the following Mon but otherwise
> should be prompt with email replies.
>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200520/d07a4fcb/attachment-0001.html>


More information about the llvm-dev mailing list