[Openmp-commits] [PATCH] D119313: [OpenMP][CUDA] Set the hard team limit to 2^31-1

Wed Feb 9 15:08:06 PST 2022

tianshilei1992 added a comment.

In D119313#3309288 <https://reviews.llvm.org/D119313#3309288>, @jdoerfert wrote:

> We ask the device how many blocks/teams it supports, do we not?

Yes, we do. After that query, we also cap the result:

  // Query attributes to determine number of threads/block and blocks/grid.
  int MaxGridDimX;
  Err = cuDeviceGetAttribute(&MaxGridDimX, CU_DEVICE_ATTRIBUTE_MAX_GRID_DIM_X,
                             Device);
  if (Err != CUDA_SUCCESS) {
    DP("Error getting max grid dimension, use default value %d\n",
       DeviceRTLTy::DefaultNumTeams);
    DeviceData[DeviceId].BlocksPerGrid = DeviceRTLTy::DefaultNumTeams;
  } else if (MaxGridDimX <= DeviceRTLTy::HardTeamLimit) {
    DP("Using %d CUDA blocks per grid\n", MaxGridDimX);
    DeviceData[DeviceId].BlocksPerGrid = MaxGridDimX;
  } else {
    DP("Max CUDA blocks per grid %d exceeds the hard team limit %d, capping "
       "at the hard limit\n",
       MaxGridDimX, DeviceRTLTy::HardTeamLimit);
    DeviceData[DeviceId].BlocksPerGrid = DeviceRTLTy::HardTeamLimit;
  }

So I think we don't even need this "hard" limit. The spec does not define such a limit, and I don't know why it was there in the first place.
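To make the point concrete, here is a minimal standalone sketch of the capping decision. It is not the plugin code: `capBlocksPerGrid` and `checkCapping` are hypothetical names, the error path of the snippet above is omitted, the limit is assumed to be compared as a plain `int`, and 65536 is only an illustrative smaller limit.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper, not part of the plugin: mirrors the two success
// branches of the snippet above (use the queried value, or cap it).
constexpr int capBlocksPerGrid(int MaxGridDimX, int HardTeamLimit) {
  return MaxGridDimX <= HardTeamLimit ? MaxGridDimX : HardTeamLimit;
}

constexpr int IntMax = std::numeric_limits<int>::max();
static_assert(IntMax == 2147483647, "sketch assumes a 32-bit int");

// With a limit of 2^31-1, even the largest int the query can return
// passes through unchanged, so the cap has no effect.
static_assert(capBlocksPerGrid(IntMax, IntMax) == IntMax, "cap is a no-op");

// With a smaller, purely illustrative limit, the same value is capped.
static_assert(capBlocksPerGrid(IntMax, 65536) == 65536, "value is capped");

// Runtime spot checks of the same behavior.
inline void checkCapping() {
  assert(capBlocksPerGrid(1024, 65536) == 1024);    // below the limit: kept
  assert(capBlocksPerGrid(65536, 65536) == 65536);  // equal to the limit: kept
  assert(capBlocksPerGrid(65537, 65536) == 65536);  // above the limit: capped
}
```

Under those assumptions, once the limit is 2^31-1 the `else` branch can never be taken for an `int` query result, which is why the limit looks redundant.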

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119313/new/

https://reviews.llvm.org/D119313