<div dir="ltr">I need to take off until later this evening. So far I've found the following:<div><br></div><div>The assertion failure I'm seeing is due to the to the addGarbageObject call in:</div><div> void ilist_traits<MachineInstr>::removeNodeFromList(MachineInstr *N)</div><div>in MachineBasicBlock.cpp . This is somewhat puzzling as the destructor for MachineBasicBlock calls removeGarbageObject so it should be impossible for new to return that pointer again. The only explanation I can come up with is that this class does not have a virtual destructor and the object in question is being deleted with a different type than MachineBasicBlock.<br></div><div><br></div><div>Also, changing removeGarbage in LeaksContext.h to:</div><div><br></div><div><div> void removeGarbage(const T* o) {</div><div> if (o == Cache)</div><div> Cache = nullptr; // Cache hit</div><div> else {</div><div> assert(Ts.count(o) == 1 && "Removing Object not in set!");</div><div> Ts.erase(o);</div><div> }</div><div> }</div></div><div><br></div><div>causes the added assertion to fail in my code. I'm not sure if the leaks detector is supposed to disallow removing an object that was never added, but if so, it seems worth adding the assertion. (And if not, perhaps explicitly documenting that it is allowed to remove and object that was never added or has already been removed.)</div><div><br></div><div>And while I don't expect anyone to pull and build Halide to repro this, here's how to do it:</div><div><br></div><div>1) Clone <a href="https://github.com/halide/Halide.git">https://github.com/halide/Halide.git</a> to a directory I'll call "halide".</div><div>2) Symlink halide/llvm to your llvm build. (This may not be necessary as llvm-config may take care of everything.)</div><div>3) Make sure llvm-config from the right development build is on your path.</div><div>4) Type "make test_boundary_conditions" in the halide dir. (Or "make OPTIMIZE-g test_boundary_conditions" )</div><div><br></div><div>This should work on Linux or Mac. I've been debugging on Linux, but I expect this repros on Mac OS X as well.</div><div><br></div><div>Thanks,</div><div>-Z-</div><div><br></div><div><br></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Dec 10, 2014 at 5:27 PM, Duncan P. N. Exon Smith <span dir="ltr"><<a href="mailto:dexonsmith@apple.com" target="_blank">dexonsmith@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">+<a href="mailto:zalman@google.com">zalman@google.com</a><br>
<br>
> On 2014 Dec 10, at 15:57, Duncan P. N. Exon Smith <<a href="mailto:dexonsmith@apple.com">dexonsmith@apple.com</a>> wrote:<br>
><br>
>><br>
>> On 2014 Dec 10, at 14:08, Tom Stellard <<a href="mailto:tom@stellard.net">tom@stellard.net</a>> wrote:<br>
>><br>
>> On Wed, Dec 10, 2014 at 11:21:08AM -0800, Duncan P. N. Exon Smith wrote:<br>
>>><br>
>>>> On 2014 Dec 10, at 08:40, Tom Stellard <<a href="mailto:tom@stellard.net">tom@stellard.net</a>> wrote:<br>
>>>><br>
>>>> On Tue, Dec 09, 2014 at 09:22:16PM -0800, Duncan P. N. Exon Smith wrote:<br>
>>>>> The `Metadata`/`Value` split (PR21532) landed in r223802 -- at least, the<br>
>>>>> C++ side of it. This was a rocky day, but I suppose that's what I get<br>
>>>>> for failing to stage the change in smaller pieces.<br>
>>>>><br>
>>>>> As of r223916 (lldb), I'm not aware of any remaining (in-tree) breakage,<br>
>>>>> so if I've missed some problem in the sea of buildbot errors, please<br>
>>>>> flag me down.<br>
>>>>><br>
>>>>> I'll follow up soon with bitcode and assembly changes!<br>
>>>><br>
>>>> Hi Duncan,<br>
>>>><br>
>>>> I started getting random assertion failures in some tests yesterday, and I think<br>
>>>> it may be related to this change. Here is the stack trace:<br>
>>>><br>
>>>> #0 0x00007ffff59f4c39 in raise () from /lib64/libc.so.6<br>
>>>> #1 0x00007ffff59f6348 in abort () from /lib64/libc.so.6<br>
>>>> #2 0x00007ffff59edb96 in __assert_fail_base () from /lib64/libc.so.6<br>
>>>> #3 0x00007ffff59edc42 in __assert_fail () from /lib64/libc.so.6<br>
>>>> #4 0x00007ffff3a30e92 in llvm::LeakDetectorImpl<void>::addGarbage(void const*) [clone .part.19] () from /opt/buildbot/lib/<a href="http://libLLVM-3.6svn.so" target="_blank">libLLVM-3.6svn.so</a><br>
>>>> #5 0x00007ffff3a30fd3 in llvm::LeakDetector::addGarbageObjectImpl(void*) () from /opt/buildbot/lib/<a href="http://libLLVM-3.6svn.so" target="_blank">libLLVM-3.6svn.so</a><br>
>>>> #6 0x00007ffff3a40eed in llvm::MDNode::getTemporary(llvm::LLVMContext&, llvm::ArrayRef<llvm::Metadata*>) () from /opt/buildbot/lib/<a href="http://libLLVM-3.6svn.so" target="_blank">libLLVM-3.6svn.so</a><br>
>>>> #7 0x00007ffff3426b3f in MapValueImpl(llvm::Metadata const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()<br>
>>>> from /opt/buildbot/lib/<a href="http://libLLVM-3.6svn.so" target="_blank">libLLVM-3.6svn.so</a><br>
>>>> #8 0x00007ffff3426bd6 in MapValueImpl(llvm::Metadata const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()<br>
>>>> from /opt/buildbot/lib/<a href="http://libLLVM-3.6svn.so" target="_blank">libLLVM-3.6svn.so</a><br>
>>>> #9 0x00007ffff3426bd6 in MapValueImpl(llvm::Metadata const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()<br>
>>>> from /opt/buildbot/lib/<a href="http://libLLVM-3.6svn.so" target="_blank">libLLVM-3.6svn.so</a><br>
>>>> #10 0x00007ffff3426eed in llvm::MapValue(llvm::Metadata const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()<br>
>>>> from /opt/buildbot/lib/<a href="http://libLLVM-3.6svn.so" target="_blank">libLLVM-3.6svn.so</a><br>
>>>> #11 0x00007ffff3426f39 in llvm::MapValue(llvm::MDNode const*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()<br>
>>>> from /opt/buildbot/lib/<a href="http://libLLVM-3.6svn.so" target="_blank">libLLVM-3.6svn.so</a><br>
>>>> #12 0x00007ffff3427174 in llvm::RemapInstruction(llvm::Instruction*, llvm::ValueMap<llvm::Value const*, llvm::WeakVH, llvm::ValueMapConfig<llvm::Value const*, llvm::sys::SmartMutex<false> > >&, llvm::RemapFlags, llvm::ValueMapTypeRemapper*, llvm::ValueMaterializer*) ()<br>
>>>> from /opt/buildbot/lib/<a href="http://libLLVM-3.6svn.so" target="_blank">libLLVM-3.6svn.so</a><br>
>>>> #13 0x00007ffff3755786 in (anonymous namespace)::ModuleLinker::linkGlobalValueBody(llvm::GlobalValue&) () from /opt/buildbot/lib/<a href="http://libLLVM-3.6svn.so" target="_blank">libLLVM-3.6svn.so</a><br>
>>>> #14 0x00007ffff375767f in llvm::Linker::linkInModule(llvm::Module*) () from /opt/buildbot/lib/<a href="http://libLLVM-3.6svn.so" target="_blank">libLLVM-3.6svn.so</a><br>
>>>> #15 0x00007ffff3758cfb in llvm::Linker::LinkModules(llvm::Module*, llvm::Module*, std::function<void (llvm::DiagnosticInfo const&)>) ()<br>
>>>> from /opt/buildbot/lib/<a href="http://libLLVM-3.6svn.so" target="_blank">libLLVM-3.6svn.so</a><br>
>>>> #16 0x00007ffff6c9d8cf in clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) () from /opt/buildbot/lib/libOpenCL.so.1<br>
>>>> #17 0x00007ffff6e61f23 in clang::ParseAST(clang::Sema&, bool, bool) () from /opt/buildbot/lib/libOpenCL.so.1<br>
>>>> #18 0x00007ffff6c9e6bb in clang::CodeGenAction::ExecuteAction() () from /opt/buildbot/lib/libOpenCL.so.1<br>
>>>> #19 0x00007ffff6b7ead6 in clang::FrontendAction::Execute() () from /opt/buildbot/lib/libOpenCL.so.1<br>
>>>> #20 0x00007ffff6b5d179 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) () from /opt/buildbot/lib/libOpenCL.so.1<br>
>>>> #21 0x00007ffff6b1282c in (anonymous namespace)::compile_llvm (llvm_ctx=...,<br>
>>>> source="\n__kernel void test_fn(__local float *sSharedStorage, __global float *srcValues, __global uint *offsets, __global float *destBuffer, uint alignmentOffset )\n{\n int tid = get_global_id( 0 );\n sSha"..., headers=..., name="<a href="http://input.cl" target="_blank">input.cl</a>", triple="r600--", processor="verde", opts="",<br>
>>>> address_spaces=..., optimization_level=@0x7fffffff21cc: 2, r_log=...) at llvm/invocation.cpp:255<br>
>>>> #22 0x00007ffff6b140c8 in clover::compile_program_llvm (source=..., headers=..., ir=ir@entry=PIPE_SHADER_IR_NATIVE, target=..., opts=..., r_log=...)<br>
>>>> at llvm/invocation.cpp:710<br>
>>>> #23 0x00007ffff6b0a371 in clover::program::build (this=this@entry=0x23a0530, devs=..., opts=opts@entry=0x7ffff793dc0d "", headers=...)<br>
>>>> at core/program.cpp:63<br>
>>>> #24 0x00007ffff6af31c4 in clBuildProgram (d_prog=0x23a0538, num_devs=0, d_devs=0x0, p_opts=<optimized out>, pfn_notify=0x0, user_data=0x0)<br>
>>>> at api/program.cpp:182<br>
>>>><br>
>>>> Does this look related? If so, let me know what other information you need to<br>
>>>> try to debug this issue.<br>
>>><br>
>>> This could be related; I'm not sure.<br>
>>><br>
>><br>
>> I'm pretty sure that this commit is the cause of the regression.<br>
>><br>
>> r223801 works and r223810 does not, and I don't think any of the other<br>
>> commits in that range could cause this.<br>
>><br>
>>> It looks like a leak detection assertion, and I didn't need to change<br>
>>> that logic at all. `ValueMap` calls `MDNode::getTemporary()` and<br>
>>> `MDNode::deleteTemporary()` in the same ways it used to (and I didn't<br>
>>> touch the implementation of those).<br>
>>><br>
>>> Can you reproduce this with `llvm-link`? If so, that sounds like the<br>
>>> best place to start.<br>
>><br>
>> I can't reproduce this using llvm-link unfortunately. Any other ideas?<br>
><br>
> (Continuing via email, since Tom stepped away IRC.)<br>
><br>
> Tom, from the trace [1], it the problematic pointer (0x27e4c80) only shows<br>
> up once.<br>
><br>
> [1]: <a href="http://people.freedesktop.org/~tstellar/md-crash.out" target="_blank">http://people.freedesktop.org/~tstellar/md-crash.out</a><br>
><br>
> That means that something *else* -- other than `MDNode::getTemporary()`<br>
> -- must be adding that address to garbage and failing to remove it.<br>
><br>
> I just dug into `LeakDetector::addGarbageObject()` and it stores *all*<br>
> calls to `addGarbage()` in the same place. There are a fair number of<br>
> these in the IR:<br>
><br>
> $ git grep -e addGarbageObject -w -- lib/IR/<br>
> lib/IR/BasicBlock.cpp: LeakDetector::addGarbageObject(this);<br>
> lib/IR/BasicBlock.cpp: LeakDetector::addGarbageObject(this);<br>
> lib/IR/Function.cpp: LeakDetector::addGarbageObject(this);<br>
> lib/IR/Function.cpp: LeakDetector::addGarbageObject(this);<br>
> lib/IR/Function.cpp: LeakDetector::addGarbageObject(this);<br>
> lib/IR/Function.cpp: LeakDetector::addGarbageObject(this);<br>
> lib/IR/Globals.cpp: LeakDetector::addGarbageObject(this);<br>
> lib/IR/Globals.cpp: LeakDetector::addGarbageObject(this);<br>
> lib/IR/Globals.cpp: LeakDetector::addGarbageObject(this);<br>
> lib/IR/Globals.cpp: LeakDetector::addGarbageObject(this);<br>
> lib/IR/Globals.cpp: LeakDetector::addGarbageObject(this);<br>
> lib/IR/Instruction.cpp: LeakDetector::addGarbageObject(this);<br>
> lib/IR/Instruction.cpp: LeakDetector::addGarbageObject(this);<br>
> lib/IR/Instruction.cpp: if (!P) LeakDetector::addGarbageObject(this);<br>
> lib/IR/Metadata.cpp: LeakDetector::addGarbageObject(N);<br>
><br>
> I think the next step is to identify who called `addGarbageObject()` with<br>
> the problematic address, and what the stack trace was.<br>
><br>
> The weird thing is, I also noticed a semantic change I made here<br>
> accidentally. `addGarbageObject()` has two overloads: `void*` and<br>
> `const Value*`. `MDNode::getTemporary()` used to match the latter, but<br>
> now it matches the former.<br>
><br>
> The weird part: all the other calls to `addGarbageObject()` look like they<br>
> send in a `Value *`.<br>
><br>
> Do you have any other calls to `addGarbageObject()`? Do they match<br>
> `void *`?<br>
><br>
> Also, what happens with the attached patch? (If this fixes your problem,<br>
> I think it's just papering over something...)<br>
><br>
> <0001-IR-Detect-Metadata-leaks-separately-from-generic-obj.patch><br>
<br>
Zalman also had a reproduction, and he's been able to track it down to<br>
an `addGarbageObject()` call from `MachineBasicBlock`. It looks like<br>
the MBB gets deallocated but `removeGarbageObject()` isn't called yet.<br>
<br>
CC'ing him here so he can join the thread once he's gotten a little<br>
further.<br>
<br>
BTW, if this is blocking anyone, I can commit the patch I attached to<br>
my previous email (or you can apply it locally). I think it's probably<br>
the right thing eventually -- it improves the output when there *is* an<br>
issue -- but I haven't committed it yet since it'll cover up the<br>
problem.</blockquote></div><br></div>