[PATCH] D82470: [OpenMP][IRBuilder] Support nested parallel regions

Wed Jun 24 22:14:09 PDT 2020

jdoerfert marked 3 inline comments as done.
jdoerfert added inline comments.

================
Comment at: llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h:282
+  IRBuilder<> AllocaBuilder;
+
   /// Map to remember source location strings
----------------
fghanim wrote:
> jdoerfert wrote:
> > fghanim wrote:
> > > What's the benefit of this over just maintaining an alloca insertion point?
> > > 
> > > With this we will have 3 `IRBuilder`s to maintain and keep track of: the clang (or flang) `IRBuilder`, the OMP `IRBuilder` and the `allocaBuilder`
> > No functional difference. Basically the same reason why clang has an AllocIRBuilder (I think), we can use it independently of the other one. The alternative is to save the builder IP, set it to the alloca IP, do alloca stuff, reset the builder IP, when you use the AllocaIRBuilder below. Again, no functional difference, just less setting/resetting of IPs.
> I understand where you are coming from, and I understand that functionally there is mostly no change - for now. But I prefer we do a small struct with some RAII, over adding an entire builder. I think it's a bit overkill and will be a source to lots of trouble.
> 
> Clang maintains an instruction `AllocaInsertPt`, not a specific builder. Speaking of which, This is completely from memory (i.e. check to make sure); I think we already save the previous `AllocaInsertionPt` in clang as part of the `bodyGenCB`. In which case, this is not needed; a nested `parallel` will always be generated by clang as part of the `bodyGenCB`, which in turn is going to save the Alloca insertion point for us. 
> To enable the creation of the allocas for `TID` and `ZeroAddr` in the outer function, Why not pass current `AllocaInsertionPt` as an arg. to `createParallel`
> I understand where you are coming from, and I understand that functionally there is mostly no change - for now. But I prefer we do a small struct with some RAII, over adding an entire builder. I think it's a bit overkill and will be a source to lots of trouble.

Could you explain what kind of overkill you see here? And maybe also the lots of trouble you expect from a builder versus a insertion point? It is the same information, just packed in a different struct, right?

> Clang maintains an instruction AllocaInsertPt, not a specific builder. 

Right.

> Speaking of which, This is completely from memory (i.e. check to make sure); I think we already save the previous AllocaInsertionPt in clang as part of the bodyGenCB. 

I think so.

> In which case, this is not needed; 

Not strictly, no. 

> a nested parallel will always be generated by clang as part of the bodyGenCB,

Yes.

> which in turn is going to save the Alloca insertion point for us.

Yes.

> To enable the creation of the allocas for TID and ZeroAddr in the outer function, Why not pass current AllocaInsertionPt as an arg. to createParallel

So far, because this will not be the only one that needs to create allocas. If you want to add the allocaIP to all runtime calls that create allocas, I'm fine with that too.

================
Comment at: llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h:294
     PostOutlineCBTy PostOutlineCB;
+    BasicBlock *EntryBB, *ExitBB;
+
----------------
fghanim wrote:
> jdoerfert wrote:
> > fghanim wrote:
> > > I see two benefits of passing entry and exit blocks, as opposed to what we used to do:
> > > 1. less memory, but in return we collect blocks twice (i.e. O(N) mem & O(N+E) work vs O(1) mem and 2 * O(N+E) work ). Do you expect that the vector is likely to become large enough where it is a problem? if not, what's the benefit of the change?
> > > 
> > > 2. If some blocks are added later, then this becomes a correctness issue. Which is unlikely since it happens after the body codegen is complete. However, if I am mistaken, shouldn't we also delay searching for inputs/outputs?
> > It is 2. Here, finalization steps will outline the inner region and introduce split blocks in the process. Those were not in the outer regions blocks vector which caused problems. The inputs/outputs are not allowed to change though, that invariant seems to hold fine.
> > (Once we do OpenMP loop transformations on this level other new blocks would be introduced during finalize.)
> Oh right. Thanks for explaining that. :)
> Nit: if possible, add a simple assert to check that inputs/outputs didn't change
We can only if we remember them. I'll do it in debug builds I guess.

================
Comment at: llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h:299
+    void collectBlocks(SmallPtrSetImpl<BasicBlock *> &BlockSet,
+                       SmallVectorImpl<BasicBlock *> &BlockVector);
   };
----------------
fghanim wrote:
> jdoerfert wrote:
> > fghanim wrote:
> > > What is the benefit of passing `blockSet` when it is exclusively used inside of `collectBlocks`? 
> > > I don't think I saw a usage of it in calling functions. am I missing something?
> > It's used in line 684 outside of collectBlocks.
> Oh, Thanks. I missed that. :)
> In this case; I think both the Blockset, and BlockVector have the same contents, correct? can't we use the vector instead on line 684, and keep the set local?
We could but we don't want to: 
(1) that will make the currently logarithmic query in 684 linear, and 
(2) we already compute the set during the vector construction because the we would get the same linear instead of logarithmic behavior there.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82470/new/

https://reviews.llvm.org/D82470