[PATCH] A new HeapToStack allocation promotion pass

Mon Oct 7 08:31:42 PDT 2013

----- Original Message -----
> On Oct 5, 2013, at 3:29 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> >>> Yep, Nick also pointed this out; thanks for confirming!
> >> 
> >> No problem. Here's one with longjmp().
> > 
> > It seems in general that we have two situations to deal with:
> > 
> > 1. If the pointer (or some alias) is captured, and we (or some
> > function we call) have some indefinite loop (including the use of
> > operating-system-assisted synchronization primitives), then some
> > other thread might free the memory. Maybe I could call safe
> > functions in this regard 'non-blocking'?
> 
> Have you considered changing your approach, to base it on nocapture
> instead?
> 
> From one of your emails, you mentioned that you're mostly interested
> in the template case where the callee graph is pretty well known.
>  Given that, you should be able to turn this into a simple function
> pass that doesn't require interprocedural knowledge: only allow it
> to be passed to no-capture calls.  This is a very simple form of
> escape analysis.

Yes, I thought about that. Unfortunately, we may be too far down the rabbit hole already ;)

If the value is captured, then I need to make sure that there are no blocking (synchronizing) calls along the execution path. This includes analyzing the function containing the malloc/free. I could just reject captured malloc values, but, unless I'm going to reject any execution path with any function call, then I need to know if the functions will return normally (regardless of whether or not the malloc is captured). And since I need to know if the functions on the execution path return normally anyway, I might as well look for the problematic loops (and atomic/volatile accesses) while I'm at it.
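
To make the shape of that check concrete, here is a rough sketch in plain C++. It is a toy model, not the patch and not LLVM's API: OpKind, PathOp and canPromote are hypothetical names, and the rule that loops and atomic/volatile accesses only matter in the captured case is just one reading of the reasoning above.

```cpp
#include <vector>

// Hypothetical, simplified model of one operation on the execution path
// between the malloc and its matching free. This is not LLVM IR.
enum class OpKind { Plain, Call, Loop, AtomicOrVolatile };

struct PathOp {
  OpKind kind;
  bool calleeReturnsNormally; // only meaningful for Call
  bool calleeMaySynchronize;  // only meaningful for Call
  bool tripCountKnown;        // only meaningful for Loop
};

// Returns true if, under the assumptions of this toy model, the
// allocation could be promoted to the stack.
bool canPromote(const std::vector<PathOp> &path, bool captured) {
  for (const PathOp &op : path) {
    switch (op.kind) {
    case OpKind::Plain:
      break;
    case OpKind::Call:
      // Needed whether or not the pointer is captured: the callee must
      // return normally so that the matching free is actually reached.
      if (!op.calleeReturnsNormally)
        return false;
      // Only a concern once the pointer has been captured: a blocking
      // callee could let another thread free the memory.
      if (captured && op.calleeMaySynchronize)
        return false;
      break;
    case OpKind::Loop:
      // Assumption: an indefinite loop is treated as potentially blocking.
      if (captured && !op.tripCountKnown)
        return false;
      break;
    case OpKind::AtomicOrVolatile:
      // Assumption: these are treated as possible synchronization points.
      if (captured)
        return false;
      break;
    }
  }
  return true;
}
```

The point of the sketch is only that the "returns normally" test sits on the same walk as the blocking tests, so dropping the captured case does not remove the need to look at the calls.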

I could certainly start with a version that just rejects all function calls, but it is not clear to me that the rest of it is particularly complicated; and I would not be satisfied with such a solution: it would miss a bunch of low-hanging fruit.

I am somewhat concerned about the compile-time impact if looking for loops with an indeterminate iteration count involves running SE (ScalarEvolution) on everything, but as far as I can tell, I'd actually get pretty far by just rejecting all loops (that would miss much less of the low-hanging fruit). So if we need a fall-back position, I'd rather go there, maybe with some additional use-case-driven logic on top of that if the overhead can be kept sufficiently low.
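
For what it's worth, the "reject all loops" fall-back needs no trip-count analysis at all; it amounts to asking whether the control-flow graph has any cycle. A minimal sketch, assuming the CFG is given as a plain adjacency list of block indices (the Cfg alias and the function names here are hypothetical, and a real pass would use the existing loop analysis rather than this):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CFG model: cfg[b] lists the successor blocks of block b.
// Every successor index is assumed to be a valid index into cfg.
using Cfg = std::vector<std::vector<std::size_t>>;

enum class Mark { Unvisited, OnStack, Done };

// Depth-first search from 'block'; returns true if it finds an edge back
// to a block that is still on the current DFS stack, i.e. a cycle.
static bool reachesCycle(const Cfg &cfg, std::size_t block,
                         std::vector<Mark> &marks) {
  marks[block] = Mark::OnStack;
  for (std::size_t succ : cfg[block]) {
    if (marks[succ] == Mark::OnStack)
      return true;
    if (marks[succ] == Mark::Unvisited && reachesCycle(cfg, succ, marks))
      return true;
  }
  marks[block] = Mark::Done;
  return false;
}

// Conservative fall-back: any cycle at all counts as a loop, regardless
// of whether its iteration count could be determined.
bool hasAnyLoop(const Cfg &cfg) {
  std::vector<Mark> marks(cfg.size(), Mark::Unvisited);
  for (std::size_t b = 0; b < cfg.size(); ++b)
    if (marks[b] == Mark::Unvisited && reachesCycle(cfg, b, marks))
      return true;
  return false;
}
```

This visits each block and edge once, so the cost is linear in the size of the function, which is why it looks attractive as a fall-back compared with running SE everywhere.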

Thanks again,
Hal

> 
> -Chris
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory