[PATCH] D33688: [Polly] Heap allocation for new arrays

Fri Jun 23 02:27:45 PDT 2017

niosega added inline comments.

================
Comment at: lib/CodeGen/IslNodeBuilder.cpp:1430-1434
+      auto *CreatedArray = CallInst::CreateMalloc(
+          &*InstIt, IntPtrTy, SAI->getElementType(),
+          ConstantInt::get(Type::getInt64Ty(Ctx), Size),
+          ConstantInt::get(Type::getInt64Ty(Ctx), ArraySizeInt), nullptr,
+          SAI->getName());
----------------
Meinersbur wrote:
> simbuerg wrote:
> > Meinersbur wrote:
> > > philip.pfaffe wrote:
> > > > Meinersbur wrote:
> > > > > simbuerg wrote:
> > > > > > Meinersbur wrote:
> > > > > > > Where is the memory free'd?
> > > > > > Good Catch. We just discussed possible locations for the free. The easiest way would be the exit(s) of the SCoP. However, to keep it simple, we would have to do a copy-in/-out to the original array base pointer, right? Everything else (e.g. calculating lexicographic maximal accesses) would give us trouble with non-polyhedral accesses between SCoPs.
> > > > > copy-in/out is not specific to heap arrays. Such things can be done using with additional copy statements.
> > > > > 
> > > > > For scalar expansion, the scalar are not available after the SCoP anyway, with the exception of 'escaping' scalars. In that case only a single value is available to the outside. I suggest to exclude escaping scalars at the moment.
> > > > > 
> > > > > Here, alloca/malloc are generated when entering the Scop. I think free'ing it when exiting it is the most obvious choice.
> > > > > 
> > > > > @gareevroman
> > > > > Btw, for alloca this location looks to be the wrong choice. alloca's should be in a function's entry block. The SCoP itself could be within the loop, meaning that everytime the SCoP is executed, the stack grows, up to a possible stack overflow.
> > > > Couldn't that create use-after-free if the SCoP is within a loop?
> > > @gareevroman
> > > I was wrong, the alloca is actually added to the entry block.
> > > 
> > > This means that also the `malloc` is in the entry block, but the call to `free` below is added into last block of the //orginal// region (where it is not even used). We either have to
> > > 
> > > 1. Add the malloc to the entry block and the free at every `ret` (or non-returning function call)
> > > 
> > > - or -
> > > 
> > > 2. Add the malloc to the start of the generated code (`polly.start`) and the free at the end of it (`polly.exiting`).
> > Wait, if you malloc at function entry and free at every ret of the function you will end up with problems in code that can't be modeled as a SCoP because you cannot map to the correct
> > 'last write' in the expanded array that is required for a memory access outside of the SCoP. Therefore, you will have to do copy-in/-out before/after the SCoP to stay correct, hence you can malloc/free right at the SCoP boundary (because you need to do the copying anyway).
> > 
> > Use-After-Free wouldn't become an issue as soon as you malloc/free at the SCoP boundary.
> We have no control about memory accesses outside of a SCoP and should not try to modify them to read from the expanded array itself.
> 
> For scalar uses, a single LoadInst is enough for outside uses.
> 
> For array uses, one has to copy the data to the original array in any case. It might captured by a function call and/or used after the function containing the SCoP.
> 
> ```
> void function_with_scop(float A[]) {
> #pragma scop
> for (int i = 0; i < 128; i+=1) 
>   for (int j = 0; j < 128; j+=1) 
>     A[j] = ...
> #pragma endscop
>   print(A[getIndexToPrint()]);
> }
> 
> int main() {
>   float A[128];
>   function_with_scop(A[128]);
>   print(A[5]);
> }
> ```
> How do you expand A to two dimensions without a copy-back?
> 
> I suggest to consider only uses where the array/scalar is entire contained within the SCoP at first. Escaping scalars is easy to add. Copy-back can be implemented afterwards. For special cases we could avoid copy-back like this:
> ```
> void function_with_scop() {
> { // generated
> A_expaned = malloc(128*128*sizeof(float))
> for (int i = 0; i < 128; i+=1) 
>   for (int j = 0; j < 128; j+=1) 
>     A_expaned[i][j] = ...
> A = &A[127];
> }
>   use(A[5]);
> }
> ```
> for which we have to be able to replace all base ptrs of A.
> 
> Choice of alternatives:
> ```
> void scop_in_loop() {
>   for (int i = 0; i < 128; i+=1) {
> #pragma scop
>     for (int j = 0; j < 128; j+=1) 
>       A[i] = ...
> #pragma endscops
>   }
> }
> ```
> 
> we could generate 
> 1.
> ```
> void scop_in_loop() {
>   for (int i = 0; i < 128; i+=1)
> { // generated
>   A_expanded = malloc(...);
>   ...
>   free(A_generated);
> }
> }
> ```
> 
> or
> 
> 2.
> ```
> void scop_in_loop() {
>   A_expanded = malloc(...);
>   for (int i = 0; i < 128; i+=1)
> { // generated
>   ...
> }
>   free(A_generated);
> }
> ```
> 
> 1. allocates memory multiple times.
> 2. allocates memory even if it is never used (e.g. SCoP is in an if-conditional). The memory necessary might also depend on a parameter, which is not known at the entry of a function;
> ```
> void scop_in_loop() {
>   int n = array->size();
>   for (int i = 0; i < n; i+=1) {
> #pragma scop
>     for (int j = 0; j < 128; j+=1) 
>       A[i] = ...
> #pragma endscops
>   }
> }
> ```
> 
> There is also alternative 3 where we create a global of the required size:
> ```
> static float A_expanded[128][128];
> ```
> The would be memory inside an executable's `.bss` segment which most operating systems do not assign physical memory to until its first use. No free'ing required here, but we also cannot make the size dependent on some parameter. You are also polluting the host's virtual address space and get issues with threading.
> 
> I tend to go towards alternative 2.
I am currently working on inserting the malloc and the free at the right place (polly.start and polly.exiting) .
But I am having trouble to find how to get the BasicBloc or the Instruction before which I must insert these two calls. 
Does anybody know how to get a pointer to them ?

https://reviews.llvm.org/D33688