[PATCH] D82470: [OpenMP][IRBuilder] Support allocas in nested parallel regions

Fady Ghanim via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jul 3 01:35:27 PDT 2020


fghanim added inline comments.


================
Comment at: llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h:282
+  IRBuilder<> AllocaBuilder;
+
   /// Map to remember source location strings
----------------
jdoerfert wrote:
> fghanim wrote:
> > jdoerfert wrote:
> > > fghanim wrote:
> > > > jdoerfert wrote:
> > > > > fghanim wrote:
> > > > > > What's the benefit of this over just maintaining an alloca insertion point?
> > > > > > 
> > > > > > With this we will have 3 `IRBuilder`s to maintain and keep track of: the clang (or flang) `IRBuilder`, the OMP `IRBuilder` and the `allocaBuilder`
> > > > > No functional difference. Basically the same reason why clang has an AllocIRBuilder (I think), we can use it independently of the other one. The alternative is to save the builder IP, set it to the alloca IP, do alloca stuff, reset the builder IP, when you use the AllocaIRBuilder below. Again, no functional difference, just less setting/resetting of IPs.
> > > > I understand where you are coming from, and I understand that functionally there is mostly no change - for now. But I prefer we do a small struct with some RAII, over adding an entire builder. I think it's a bit overkill and will be a source to lots of trouble.
> > > > 
> > > > Clang maintains an instruction `AllocaInsertPt`, not a specific builder. Speaking of which, This is completely from memory (i.e. check to make sure); I think we already save the previous `AllocaInsertionPt` in clang as part of the `bodyGenCB`. In which case, this is not needed; a nested `parallel` will always be generated by clang as part of the `bodyGenCB`, which in turn is going to save the Alloca insertion point for us. 
> > > > To enable the creation of the allocas for `TID` and `ZeroAddr` in the outer function, Why not pass current `AllocaInsertionPt` as an arg. to `createParallel`
> > > > I understand where you are coming from, and I understand that functionally there is mostly no change - for now. But I prefer we do a small struct with some RAII, over adding an entire builder. I think it's a bit overkill and will be a source to lots of trouble.
> > > 
> > > Could you explain what kind of overkill you see here? And maybe also the lots of trouble you expect from a builder versus a insertion point? It is the same information, just packed in a different struct, right?
> > > 
> > > 
> > > > Clang maintains an instruction AllocaInsertPt, not a specific builder. 
> > > 
> > > Right.
> > > 
> > > 
> > > > Speaking of which, This is completely from memory (i.e. check to make sure); I think we already save the previous AllocaInsertionPt in clang as part of the bodyGenCB. 
> > > 
> > > I think so.
> > > 
> > > 
> > > > In which case, this is not needed; 
> > > 
> > > Not strictly, no. 
> > > 
> > > > a nested parallel will always be generated by clang as part of the bodyGenCB,
> > > 
> > > Yes.
> > > 
> > > > which in turn is going to save the Alloca insertion point for us.
> > > 
> > > Yes.
> > > 
> > > > To enable the creation of the allocas for TID and ZeroAddr in the outer function, Why not pass current AllocaInsertionPt as an arg. to createParallel
> > > 
> > > So far, because this will not be the only one that needs to create allocas. If you want to add the allocaIP to all runtime calls that create allocas, I'm fine with that too.
> > > Could you explain what kind of overkill you see here?
> > Using an IRBuilder when an insertion point suffices
> > 
> > > And maybe also the lots of trouble you expect from a builder versus a insertion point?
> > Currently, when using the OMPBuilder you need to juggle two IRBuilders, let's not make them 3 :)
> > 
> > > It is the same information, just packed in a different struct, right?
> > no it is not. `InsetrionPoint` vs  struct that contains `Insertionpoint` + other things. But that's beside the point
> > 
> > I mean, is the `AllocaBuilder` providing a new functionality other than a convenient way to keep track of the Alloca `InsertionPoint`?
> > Personally, I prefer some suitable structure to your needs and a stack (i.e. the way clang does it) . This should also resolve the todo below. Alternatively, pass an allocation point as an argument as suggested earlier. I am open to any third way if you have any.
> > Personally, I prefer some suitable structure to your needs and a stack (i.e. the way clang does it).
> 
> The stack is already here, implicitly, and actually on the stack. With
> `IRBuilder<>::InsertPointGuard AIPG(AllocaBuilder);`
> we have all the stack we need without leaking the state.
> The updates to the insertion point are done for use and the AllocaBuilder is always ready to be used. No need to create a builder or update an insertion point (other than the places that create the new alloca insertion points). Note that the suitable structure is the builder, as it is the only interaction with the insertion point, e.g. we do not pass the insertion point to other functions.
> 
> 
> > InsetrionPoint vs struct that contains Insertionpoint + other things. But that's beside the point
> 
> I disagree. The "other things" is the only interaction we have with the insertion point. So storing or accepting an insertion point is fine for the only purpose of creating a Builder and interacting with the builder. The difference is that we need to add more churn to update and reset the insertion point (what happens behind the scenes with the one line I quoted above).
> 
> 
> > Alternatively, pass an allocation point as an argument as suggested earlier.
> 
> I did not go this route because I think it complicates the API with little gain. I was also unsure what the flang situation looks like here. Do we have an alloca insertion point there too?
> At the end of the day, this makes it easier for the user.
But we haven't solved a problem, we just kicked it down the road with the TODO you added, with the only trade-off being convenience now. The `AllocaBuilder` is not usable by the frontend - the frontend doesn't & **shouldn't** know or care about any builders other than its own. However, an insertion point can be passed to the frontend (i.e. `bodyCB`). So keeping track of an Alloca insertion point across nests -similar to how clang does it- is what we need. I still prefer having a stack to push to - maybe even piggy back on the finalization stack by creating a struct that contains all relevant info per nest level. But you don't like it, fine.

Then just pass an argument for allocaIP and be done with it. Flang (or clang or whichever other frontend) are required to emit allocas into function entry block by all LLVM passes that follow. As such, we can safely assume the outermost call is always going to have the AllocaIP in the entry block of the enclosing function. Which means the first time we call `CreateParallel` they can pass that as arg.. or pass an empty insertion point, in which case we already detect it.

Adding a builder for every insertion point that we need to pass across nests or regions is not something we want to do for the reasons I mentioned earlier. Whether we like it or not, it is a builder we have to juggle along the other two, no matter how much we try to specialize its use, or keep it "hidden".



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82470/new/

https://reviews.llvm.org/D82470





More information about the llvm-commits mailing list