[PATCH] D109749: Experimental Partial Mem2Reg

Graham Hunter via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 14 02:32:40 PDT 2021


huntergr created this revision.
huntergr added reviewers: chandlerc, jdoerfert, kiranchandramohan.
Herald added subscribers: mgrang, hiraditya, mgorny.
huntergr requested review of this revision.
Herald added a subscriber: sstefan1.
Herald added a project: LLVM.

Clang's current lowering for OpenMP parallel worksharing loops with a reduction clause blocks many optimizations: the address of the reduction's stack variable is passed to an OpenMP runtime function after the loop, so SROA/mem2reg declines to promote the variable to SSA form.

This work aims to partially promote the reduction variable to SSA form in the code that precedes the runtime call, so that optimizations such as vectorization can be applied to a loop like the following:

  int loop(int data[restrict 128U]) {
    int retval = 0;
  
  #pragma omp parallel for simd schedule(simd:static) default(none) shared(data) reduction(+:retval)
    for (int i = 0; i < 128; i++) {
      int n = 0;
  
      if (data[i]) {
        n = 1;
        retval += n;
      }
    }
    return retval;
  }
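
To make the problem concrete, here is a rough hand-written C picture of the two shapes involved. It is only a sketch, not Clang's actual lowering and not the output of this patch: `fake_runtime_reduce` is a hypothetical stand-in for the OpenMP runtime call, and `count_escaping` and `count_partially_promoted` are illustrative names. The first function keeps the accumulator in an address-taken stack slot for its whole lifetime; the second models partial promotion by accumulating in a plain local and storing to the slot once, just before the call.

```c
/* Hypothetical stand-in for the OpenMP runtime reduction call. All that
 * matters here is that it takes the ADDRESS of the private accumulator. */
static void fake_runtime_reduce(int *shared, const int *priv) {
    *shared += *priv;
}

/* Shape of the problem: `priv` has its address taken by the call after
 * the loop, so the whole variable is treated as living in memory, and
 * the loop updates it through loads and stores. */
static int count_escaping(const int *data, int n) {
    int shared = 0;
    int priv = 0;
    for (int i = 0; i < n; i++) {
        if (data[i])
            priv += 1;
    }
    fake_runtime_reduce(&shared, &priv);
    return shared;
}

/* Hand-written model of partial promotion: the loop works on `acc`,
 * whose address is never taken, and the memory slot `priv` is written
 * once, immediately before the use that needs an address. */
static int count_partially_promoted(const int *data, int n) {
    int shared = 0;
    int priv;
    int acc = 0;
    for (int i = 0; i < n; i++) {
        if (data[i])
            acc += 1;
    }
    priv = acc; /* single store ahead of the escaping use */
    fake_runtime_reduce(&shared, &priv);
    return shared;
}
```

Both functions compute the same count; the difference is only whether the loop body touches memory, which is what stands between the loop and later passes such as the vectorizer.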

The code in its current form was written to minimize conflicts with existing code and so reduce downstream maintenance costs. I expect it will need considerable refactoring, but I would like to hear from reviewers before undertaking that work.

I have a few questions to resolve first:

- Is this feature something the community wants, or am I just overcomplicating things? Is there an easier way to get the above loop to vectorize?
- I've been conservative about ensuring ordering here and used the PostDominatorTree. I think the same result may be achievable with a modification to the IDF algorithm used in mem2reg, but I haven't worked through it yet. Can anyone with more experience of it offer guidance?
- This is currently a separate pass, but could be implemented as part of the normal SROA/mem2reg optimization pass. Would this be preferred? Does the outcome of the previous question about PostDom trees affect that?


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D109749

Files:
  llvm/include/llvm/InitializePasses.h
  llvm/include/llvm/Transforms/Scalar.h
  llvm/include/llvm/Transforms/Scalar/PartialMemToReg.h
  llvm/include/llvm/Transforms/Utils/PromoteMemToReg.h
  llvm/lib/Passes/PassBuilder.cpp
  llvm/lib/Passes/PassRegistry.def
  llvm/lib/Transforms/Scalar/CMakeLists.txt
  llvm/lib/Transforms/Scalar/PartialMemToReg.cpp
  llvm/lib/Transforms/Scalar/Scalar.cpp
  llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp
  llvm/test/Transforms/Mem2Reg/partial-mem2reg.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D109749.372440.patch
Type: text/x-patch
Size: 44599 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210914/7a46bb14/attachment.bin>