[llvm-commits] preserve pointer roots in globals

Fri Jun 29 03:00:44 PDT 2012

Chandler Carruth wrote:
> On Fri, Jun 29, 2012 at 12:34 AM, Duncan Sands <baldrick at free.fr
> <mailto:baldrick at free.fr>> wrote:
>
>     Hi Nick,
>
>      > The attached patch improves the interaction between GlobalOpt and
>     leak checkers.
>      > One of the things that GlobalOpt will do is delete all the stores
>     to a global
>      > that isn't loaded. If for some reason we don't mop up the
>     malloc/new which was
>      > being stored in that global, a leak checker will complain.
>
>     if the malloc cleanup code sees that the malloc result is stored to
>     a global,
>     then won't that prevent it from removing the malloc?  In short, isn't it
>     important to remove such dead stores exactly because you then have a
>     chance of
>     deleting the malloc call?
>
>     Also, isn't the leak checker right to complain, and the fact that it
>     doesn't
>     complain when the value is stored to the global just a weakness of
>     the leak
>     checker?  After all, if without the store the memory leaked, then
>     with the
>     store the memory leaked too, since the global is never read.  So
>     aren't you
>     just hiding real leaks here?
>
>     This patch seems completely backwards to how it should be to me.
>
>
> This is actually I think everyone's first reaction... and it's not
> unreasonable. We've been debating this pretty heatedly ourselves, so let
> me try to give some background.
>
> First, how the heap leak checking strategy we're interested in works in
> brief[1] and I suspect greatly simplified, so don't take this as a
> literal explanation:
>
> 1) It instruments the malloc library and records every allocation in a
> set of pointers.
> 2) It removes each freed allocation from the set.
> 3) At the end of the program's execution, it removes from the set all
> "live" pointers.
>
> The remaining pointers are considered "leaked".
>
> Why is this strategy useful? Why wouldn't you insist that *everything*
> not freed is leaked? Because of threads.
>
> In C++ (even the latest C++ standard), there is no reasonable and
> thread-safe way to free memory associated with global variables on
> shutdown. This stems from two strange realities of the system: there is
> no way to safely and correctly join *all* threads in a program, and the
> destructors (and thus memory deallocation) for global variables are run
> after main is exited. Because we can't join all threads before we exit
> main, it is always possible to exit main with a thread still alive and
> running and using a global variable.
>
> The result is that every global variable shared between threads with a
> non-trivial destructor is technically thread-hostile, and if the
> destructor deallocates memory, it is definitely thread-hostile. The
> solution is to always "leak" global allocations that might be referenced
> by some thread still running past the end of main. These leaks are
> categorically different from others -- they are necessarily bounded
> (they are all named) and don't represent an un-known heap growth in the
> program.
>
> I think the idea of GC roots is the best way to think about it -- the
> globals simply form a set of roots of allocations that are still live
> right to the end of the program, and thus have not yet been "leaked" per-se.
>
>
> Now, the more interesting part is the part about whether these
> allocations are *actually* unused, and whether they can still be folded.
> If globalopt is going to be able to *remove* the allocation, then by all
> means we should let it. I think in many cases it will do this in the
> presence of that last store, but Nick would know for sure. If it
> doesn't, we can certainly teach it to do so.

Indeed, Duncan is exactly right. With this patch in place as-is we will 
no longer eliminate any mallocs that feed into a global variable. This 
will prevent us from eliminating the allocations at global initializer 
time, something we were previously very good at.

Previously we'd just speculatively delete the store of ptr to global 
then let the entire -O2 optz'n stack try to eliminate the remaining 
malloc/new. Often it would, sometimes it wouldn't.

With this patch in, we would be in the position of teaching globalopt 
whatever tricks we think we need for the cases we find (I've started 
with none at all in this patch). It's flipping around the "never miss a 
chance to optimize" situation into a "check whether we can prove it'll 
lead to an optz'n and don't transform unless we're sure".

Ultimately, the issue is pragmatic. I don't think a leak checker in the 
style that we use at Google is a google-specific technology, or else I 
wouldn't have tried sending it upstream. I expect that other users of 
llvm do (or will) run into the same problem. And it won't be a problem 
for them to continue to use gcc since gcc doesn't do this optimization 
anyhow.

Nick

> The other case is the interesting one: when the allocation is itself
> used (handed into a registry or some other subsystem by the code in the
> globals initializer), but the actual global variable that holds the
> pointer is not used. Here, we clearly must keep the allocation (it is
> used, and can't be simulated by globalopt), but prior to Nick's patch
> LLVM would delete the global referring to the result. This removes a
> root that the leak checker uses to determine that in fact, this entity
> remains live.
>
> The heap leak checking strategy has proven over many years in active use
> to be extremely high value, even in optimized programs, so we're quite
> interested in making sure that such a use case is well supported by
> LLVM. (It happens to be well supported by GCC, and we've worked to keep
> it that way.)  On the flip side, we've seen thousands of tests fail due
> to "leaks" that were entirely correct code following this pattern.
>
> That said, if you or others can figure out a way to do heap leak
> checking that *doesn't* require preserving global allocation roots, that
> would be fantastic. None of us have been able to come up with anything
> remotely satisfying that has minimal performance impact, catches as many
> leaks, and avoids issues with intentional "leaks" through globals for
> the sake of thread-safety.
>
> -Chandler
>
> [1]: http://google-perftools.googlecode.com/svn/trunk/doc/heap_checker.html
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits