[PATCH] D140264: [OpenMP] Improve AMDGPU Plugin

Kevin Sala Penadés via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Dec 19 14:38:30 PST 2022


kevinsala added inline comments.


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:802
+      } else if (ActionFunction == releaseSignalAction) {
+        if (auto Err = releaseSignalAction(&ActionArgs))
+          return Err;
----------------
jdoerfert wrote:
> kevinsala wrote:
> > We could pass a specialized pointer to the arguments too:
> > 
> > ```
> > if (auto Err = releaseSignalAction((ReleaseSignalArgsTy *) &ActionArgs))
> >   return Err;
> > ```
> > 
> > This would remove the cast on all action functions.
> The cast is a no-op, is it not? And the explicit one in your example is just implicitly done by the compiler in my code. `void* -> arg_ty *` is convertible.
Yes, no performance difference. It was to pass directly the corresponding pointer type and improve readability on action functions. It's a detail with no importance


================
Comment at: openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp:1546
+
     GridValues.GV_Max_Teams = GridMaxDim.x / GridValues.GV_Max_WG_Size;
     if (GridValues.GV_Max_Teams == 0)
----------------
jdoerfert wrote:
> kevinsala wrote:
> > I think here we're still overwriting the value of max teams determined by `OMP_NUM_TEAMS` in the `GenericDeviceTy` constructor. The same for `GV_Max_WG_Size` and  `OMP_TEAMS_THREAD_LIMIT`. To fix it, we could move the code below from the `GenericDeviceTy` constructor to the `GenericDeviceTy::init`:
> > 
> > 
> > ```
> > Error GenericDeviceTy::init(GenericPluginTy &Plugin) {
> >   // NOTE: This call will initialize GV_Max_Teams and GV_Max_WG_Size with device's maximum values.
> >   if (auto Err = initImpl(Plugin))
> >     return Err;
> > 
> >   // Moved code. Originally in GenericDeviceTy constructor.
> >   if (OMP_NumTeams > 0)
> >     GridValues.GV_Max_Teams =
> >         std::min(GridValues.GV_Max_Teams, uint32_t(OMP_NumTeams));
> > 
> >   if (OMP_TeamsThreadLimit > 0)
> >     GridValues.GV_Max_WG_Size =
> >         std::min(GridValues.GV_Max_WG_Size, uint32_t(OMP_TeamsThreadLimit));
> > ```
> I don't follow completely. Can you make it a follow up?
The version on main branch has a problem because we are ignoring the envars `OMP_NUM_TEAMS` and `OMP_TEAMS_THREAD_LIMIT`. It's not related to this patch, but we could fix it here, or in a separate patch. Basically, the current code is:

```
GenericDeviceTy::GenericDeviceTy(int32_t DeviceId, int32_t NumDevices,
                                 const llvm::omp::GV &OMPGridValues)
    : MemoryManager(nullptr), OMP_TeamLimit("OMP_TEAM_LIMIT"),
      OMP_NumTeams("OMP_NUM_TEAMS"),
      OMP_TeamsThreadLimit("OMP_TEAMS_THREAD_LIMIT"),
      /*...  */ {
  // Initialize GV_Max_Teams and GV_Max_WG_Size
  if (OMP_NumTeams > 0)
    GridValues.GV_Max_Teams =
        std::min(GridValues.GV_Max_Teams, uint32_t(OMP_NumTeams));

  if (OMP_TeamsThreadLimit > 0)
    GridValues.GV_Max_WG_Size =
        std::min(GridValues.GV_Max_WG_Size, uint32_t(OMP_TeamsThreadLimit));
}

Error GenericDeviceTy::init(GenericPluginTy &Plugin) {
  // Both CUDA and AMDGPU overwrite the previous values of GV_Max_Teams and
  // GV_Max_WG_Size, ignoring the values of OMP_NUM_TEAMS and OMP_TEAMS_THREAD_LIMIT
  if (auto Err = initImpl(Plugin))
    return Err;

  // ...
}
```

We initialize `GV_Max_Teams` and `GV_Max_WG_Size` considering the envars, but then, the plugin-specific code (in `initImpl`) overwrites them with the device maximums.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140264/new/

https://reviews.llvm.org/D140264



More information about the llvm-commits mailing list