[PATCH] D82470: [OpenMP][IRBuilder] Support nested parallel regions

Wed Jun 24 16:52:22 PDT 2020

fghanim added inline comments.

================
Comment at: llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h:282
+  IRBuilder<> AllocaBuilder;
+
   /// Map to remember source location strings
----------------
jdoerfert wrote:
> fghanim wrote:
> > What's the benefit of this over just maintaining an alloca insertion point?
> > 
> > With this we will have 3 `IRBuilder`s to maintain and keep track of: the clang (or flang) `IRBuilder`, the OMP `IRBuilder` and the `allocaBuilder`
> No functional difference. Basically the same reason why clang has an AllocIRBuilder (I think), we can use it independently of the other one. The alternative is to save the builder IP, set it to the alloca IP, do alloca stuff, reset the builder IP, when you use the AllocaIRBuilder below. Again, no functional difference, just less setting/resetting of IPs.
I understand where you are coming from, and I understand that functionally there is mostly no change - for now. But I prefer we do a small struct with some RAII, over adding an entire builder. I think it's a bit overkill and will be a source to lots of trouble.

Clang maintains an instruction `AllocaInsertPt`, not a specific builder. Speaking of which, This is completely from memory (i.e. check to make sure); I think we already save the previous `AllocaInsertionPt` in clang as part of the `bodyGenCB`. In which case, this is not needed; a nested `parallel` will always be generated by clang as part of the `bodyGenCB`, which in turn is going to save the Alloca insertion point for us. 
To enable the creation of the allocas for `TID` and `ZeroAddr` in the outer function, Why not pass current `AllocaInsertionPt` as an arg. to `createParallel`

================
Comment at: llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h:294
     PostOutlineCBTy PostOutlineCB;
+    BasicBlock *EntryBB, *ExitBB;
+
----------------
jdoerfert wrote:
> fghanim wrote:
> > I see two benefits of passing entry and exit blocks, as opposed to what we used to do:
> > 1. less memory, but in return we collect blocks twice (i.e. O(N) mem & O(N+E) work vs O(1) mem and 2 * O(N+E) work ). Do you expect that the vector is likely to become large enough where it is a problem? if not, what's the benefit of the change?
> > 
> > 2. If some blocks are added later, then this becomes a correctness issue. Which is unlikely since it happens after the body codegen is complete. However, if I am mistaken, shouldn't we also delay searching for inputs/outputs?
> It is 2. Here, finalization steps will outline the inner region and introduce split blocks in the process. Those were not in the outer regions blocks vector which caused problems. The inputs/outputs are not allowed to change though, that invariant seems to hold fine.
> (Once we do OpenMP loop transformations on this level other new blocks would be introduced during finalize.)
Oh right. Thanks for explaining that. :)
Nit: if possible, add a simple assert to check that inputs/outputs didn't change

================
Comment at: llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h:299
+    void collectBlocks(SmallPtrSetImpl<BasicBlock *> &BlockSet,
+                       SmallVectorImpl<BasicBlock *> &BlockVector);
   };
----------------
jdoerfert wrote:
> fghanim wrote:
> > What is the benefit of passing `blockSet` when it is exclusively used inside of `collectBlocks`? 
> > I don't think I saw a usage of it in calling functions. am I missing something?
> It's used in line 684 outside of collectBlocks.
Oh, Thanks. I missed that. :)
In this case; I think both the Blockset, and BlockVector have the same contents, correct? can't we use the vector instead on line 684, and keep the set local?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82470/new/

https://reviews.llvm.org/D82470