[PATCH] D81036: [OpenMP] Begin Adding Analysis Remarks for OpenMP Best Practises.

Joseph Huber via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 4 10:27:07 PDT 2020


jhuber6 marked an inline comment as done.
jhuber6 added inline comments.


================
Comment at: llvm/lib/Transforms/IPO/OpenMPOpt.cpp:413
+    return false;
+  }
+
----------------
jdoerfert wrote:
> jdoerfert wrote:
> > jhuber6 wrote:
> > > jdoerfert wrote:
> > > > Early exits are usually easier to read, thus:
> > > > ```
> > > > if (!UV)
> > > >  return;
> > > > ```
> > > > 
> > > > ---
> > > > 
> > > > Maybe we should be more descriptive, telling people that fixed numbers are not future compatible. *And* we should not issue the remark if we find a worksharing loop with a constant trip count (that is a multiple of this constant). I guess the latter part can just be a TODO for now, maybe also a test case:
> > > > ```
> > > > #pragma omp parallel for num_threads(8)
> > > > for (int i = 0; i < 8; ++i)
> > > >   body(i);
> > > > ```
> > > > That reminds me that we should determine the loop trip count for worksharing loops as well. We could even use that value to feed it into "set_num_threads" to avoid waking more threads than needed.
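> > > > E.g., conceptually (just a sketch; `N` stands for the trip count we would compute):
> > > > ```
> > > > // When N is smaller than the default team size, capping the
> > > > // team avoids waking threads that would have no iterations:
> > > > #pragma omp parallel for num_threads(N)
> > > > for (int i = 0; i < N; ++i)
> > > >   body(i);
> > > > ```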
> > > Might be helpful, something like "Use of an OpenMP parallel region with a constant number of threads is not future compatible. Consider a dynamic thread count instead."
> > > 
> > > And are there any general ways to extract the loop trip count in LLVM? Depending on where you are in the optimization, the loop could take many different forms, as far as I'm aware.
> > "Consider using a scalable thread count." + ", or remove the `num_threads` clause.
> > > And are there any general ways to extract the loop trip count in LLVM? Depending on where you are in the optimization, the loop could take many different forms, as far as I'm aware.
> 
> There is, and yes, loops have different forms. The usual way is to ask ScalarEvolution for the trip count; however, that won't work for worksharing loops.
> 
> Though, worksharing loops are actually easy to predict. Look for the `__kmpc_for_init` (or similar) call and follow the lower and upper bound pointers. They should each be written exactly once, and the difference between those two values is the trip count, at least for clang, since we normalize the loops. More generally, we also need to follow the stride argument and do a slightly more complex computation, but it is still doable. You can even use ScalarEvolution to represent the bounds, stride, and difference. That approach also lets you materialize an expression for the value in the code (if that is needed later), and it will simplify the expression when possible, e.g., constant fold it.
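> To make that concrete, a rough sketch (the helper name is made up, and the argument positions assume clang's `__kmpc_for_static_init_4` emission, where arguments 4 and 5 are the lower/upper bound pointers):
> ```
> #include "llvm/ADT/APInt.h"
> #include "llvm/ADT/Optional.h"
> #include "llvm/Analysis/ScalarEvolution.h"
> #include "llvm/Analysis/ScalarEvolutionExpressions.h"
> #include "llvm/IR/Instructions.h"
> using namespace llvm;
>
> // Recover the trip count of a clang-normalized worksharing loop from
> // its __kmpc_for_static_init_4 call. Returns None if the bounds are
> // not each written exactly once or the count is not constant.
> static Optional<APInt> getWSLoopTripCount(CallInst *InitCI,
>                                           ScalarEvolution &SE) {
>   auto GetSingleStoredValue = [](Value *Ptr) -> Value * {
>     StoreInst *OnlyStore = nullptr;
>     for (User *U : Ptr->users())
>       if (auto *SI = dyn_cast<StoreInst>(U)) {
>         if (OnlyStore)
>           return nullptr; // Written more than once, bail out.
>         OnlyStore = SI;
>       }
>     return OnlyStore ? OnlyStore->getValueOperand() : nullptr;
>   };
>
>   Value *LB = GetSingleStoredValue(InitCI->getArgOperand(4));
>   Value *UB = GetSingleStoredValue(InitCI->getArgOperand(5));
>   if (!LB || !UB)
>     return None;
>
>   // Bounds are inclusive, so TripCount = UB - LB + 1; ScalarEvolution
>   // constant folds this when both bounds are compile-time constants.
>   const SCEV *Diff = SE.getMinusSCEV(SE.getSCEV(UB), SE.getSCEV(LB));
>   const SCEV *TC = SE.getAddExpr(Diff, SE.getOne(Diff->getType()));
>   if (auto *C = dyn_cast<SCEVConstant>(TC))
>     return C->getAPInt();
>   return None; // Non-constant, though the SCEV itself is still usable.
> }
> ```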
The loop body where `__kmpc_for_init` is called is inside the callback function, which might also be separated by another function. I'm guessing the way to handle this would be to start with the function containing the `__kmpc_for_init` call and go backwards in the call graph until you find the function that contains the `__kmpc_fork` call, then check if there's a corresponding `__kmpc_push_num_threads` call. If there is, and the found trip count isn't a multiple of the constant thread count, emit a message. Is that about right? I haven't seen any examples that show traversing the call graph in this way, just iterating over all the functions.
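Something like the following is roughly what I have in mind, at least for the simple case where the init call sits directly in the outlined function (`findNumThreadsPush` is a made-up name, and clang may pass the outlined function to the fork call through a bitcast, which this ignores):
```
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Given the outlined parallel region, find the __kmpc_push_num_threads
// call that precedes its __kmpc_fork_call site, if any.
static CallInst *findNumThreadsPush(Function &Outlined) {
  for (User *U : Outlined.users()) {
    auto *Fork = dyn_cast<CallInst>(U);
    if (!Fork)
      continue;
    Function *Callee = Fork->getCalledFunction();
    if (!Callee || Callee->getName() != "__kmpc_fork_call")
      continue;
    // Scan backwards in the block for the matching push call.
    for (Instruction *I = Fork->getPrevNode(); I; I = I->getPrevNode())
      if (auto *Push = dyn_cast<CallInst>(I))
        if (Function *PC = Push->getCalledFunction())
          if (PC->getName() == "__kmpc_push_num_threads")
            return Push;
  }
  return nullptr;
}
```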


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81036/new/

https://reviews.llvm.org/D81036