[llvm-commits] preserve pointer roots in globals

Fri Jun 29 02:01:51 PDT 2012

On Fri, Jun 29, 2012 at 12:34 AM, Duncan Sands <baldrick at free.fr> wrote:

> Hi Nick,
>
> > The attached patch improves the interaction between GlobalOpt and leak
> checkers.
> > One of the things that GlobalOpt will do is delete all the stores to a
> global
> > that isn't loaded. If for some reason we don't mop up the malloc/new
> which was
> > being stored in that global, a leak checker will complain.
>
> if the malloc cleanup code sees that the malloc result is stored to a
> global,
> then won't that prevent it from removing the malloc?  In short, isn't it
> important to remove such dead stores exactly because you then have a
> chance of
> deleting the malloc call?
>
> Also, isn't the leak checker right to complain, and the fact that it
> doesn't
> complain when the value is stored to the global just a weakness of the leak
> checker?  After all, if without the store the memory leaked, then with the
> store the memory leaked too, since the global is never read.  So aren't you
> just hiding real leaks here?
>
> This patch seems completely backwards to how it should be to me.
>

This is actually I think everyone's first reaction... and it's not
unreasonable. We've been debating this pretty heatedly ourselves, so let me
try to give some background.

First, how the heap leak checking strategy we're interested in works in
brief[1] and I suspect greatly simplified, so don't take this as a literal
explanation:

1) It instruments the malloc library and records every allocation in a set
of pointers.
2) It removes each freed allocation from the set.
3) At the end of the program's execution, it removes from the set all
"live" pointers.

The remaining pointers are considered "leaked".

Why is this strategy useful? Why wouldn't you insist that *everything* not
freed is leaked? Because of threads.

In C++ (even the latest C++ standard), there is no reasonable and
thread-safe way to free memory associated with global variables on
shutdown. This stems from two strange realities of the system: there is no
way to safely and correctly join *all* threads in a program, and the
destructors (and thus memory deallocation) for global variables are run
after main is exited. Because we can't join all threads before we exit
main, it is always possible to exit main with a thread still alive and
running and using a global variable.

The result is that every global variable shared between threads with a
non-trivial destructor is technically thread-hostile, and if the destructor
deallocates memory, it is definitely thread-hostile. The solution is to
always "leak" global allocations that might be referenced by some thread
still running past the end of main. These leaks are categorically different
from others -- they are necessarily bounded (they are all named) and don't
represent an un-known heap growth in the program.

I think the idea of GC roots is the best way to think about it -- the
globals simply form a set of roots of allocations that are still live right
to the end of the program, and thus have not yet been "leaked" per-se.

Now, the more interesting part is the part about whether these allocations
are *actually* unused, and whether they can still be folded. If globalopt
is going to be able to *remove* the allocation, then by all means we should
let it. I think in many cases it will do this in the presence of that last
store, but Nick would know for sure. If it doesn't, we can certainly teach
it to do so.

The other case is the interesting one: when the allocation is itself used
(handed into a registry or some other subsystem by the code in the globals
initializer), but the actual global variable that holds the pointer is not
used. Here, we clearly must keep the allocation (it is used, and can't be
simulated by globalopt), but prior to Nick's patch LLVM would delete the
global referring to the result. This removes a root that the leak checker
uses to determine that in fact, this entity remains live.

The heap leak checking strategy has proven over many years in active use to
be extremely high value, even in optimized programs, so we're quite
interested in making sure that such a use case is well supported by LLVM.
(It happens to be well supported by GCC, and we've worked to keep it that
way.)  On the flip side, we've seen thousands of tests fail due to "leaks"
that were entirely correct code following this pattern.

That said, if you or others can figure out a way to do heap leak checking
that *doesn't* require preserving global allocation roots, that would be
fantastic. None of us have been able to come up with anything remotely
satisfying that has minimal performance impact, catches as many leaks, and
avoids issues with intentional "leaks" through globals for the sake of
thread-safety.

-Chandler

[1]: http://google-perftools.googlecode.com/svn/trunk/doc/heap_checker.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120629/ce8469b0/attachment.html>