[Openmp-commits] [llvm] [mlir] [openmp] [Flang][OpenMP] Add support for schedule clause for GPU (PR #81618)

Tue Feb 20 11:41:53 PST 2024

================
@@ -685,17 +685,22 @@ template <typename Ty> class StaticLoopChunker {
     Ty KernelIteration = NumBlocks * BlockChunk;
----------------
jdoerfert wrote:

Do we need `* ThreadChunk` here too?

Let's say we have 5 blocks, and each block does a chunk of 3.
Each block has 11 threads, and each thread does a chunk of 2.
What I'd expect the kernel to work on in one iteration of the do loop below is:
```
Iteration   : 0     1     2     3     ...  20     21
Block/Thread: B0T0, B0T0, B0T1, B0T1, ..., B0T10, B0T10
Iteration   : 66    67    68    69    ...  86     87
Block/Thread: B1T0, B1T0, B1T1, B1T1, ..., B1T10, B1T10
...
Iteration   : 264   265   266   267   ...  284    285
Block/Thread: B4T0, B4T0, B4T1, B4T1, ..., B4T10, B4T10
```
So, 2 * 11 = 22 iterations for a block and 5 * 22 = 110 iterations for the kernel.
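
To make the arithmetic above easy to check, here is a minimal sketch of the mapping I have in mind. The struct and function names are hypothetical and are not taken from the DeviceRTL sources. The block stride of `BlockChunk * NumThreads * ThreadChunk` is only inferred from the table (B1 starts at 66), so treat it as an assumption rather than what `StaticLoopChunker` does today.

```cpp
#include <cstdint>

// Hypothetical helper type for illustration only; not part of the DeviceRTL.
struct LoopShape {
  std::int64_t NumBlocks;   // 5 in the example
  std::int64_t BlockChunk;  // 3 in the example
  std::int64_t NumThreads;  // 11 threads per block in the example
  std::int64_t ThreadChunk; // 2 in the example
};

// Iterations a single block covers in one pass of the do loop:
// every thread handles ThreadChunk consecutive iterations.
constexpr std::int64_t blockIterationsPerPass(LoopShape S) {
  return S.NumThreads * S.ThreadChunk;
}

// Iterations the whole kernel covers in one pass of the do loop.
constexpr std::int64_t kernelIterationsPerPass(LoopShape S) {
  return S.NumBlocks * blockIterationsPerPass(S);
}

// Assumed distance between the first iterations of consecutive blocks,
// inferred from the table (B0 starts at 0, B1 at 66, B4 at 264).
constexpr std::int64_t blockStride(LoopShape S) {
  return S.BlockChunk * S.NumThreads * S.ThreadChunk;
}

// First iteration that thread `Thread` of block `Block` works on
// in the first pass of the do loop, under the assumptions above.
constexpr std::int64_t firstIteration(LoopShape S, std::int64_t Block,
                                      std::int64_t Thread) {
  return Block * blockStride(S) + Thread * S.ThreadChunk;
}

// The numbers from the example.
constexpr LoopShape Example{5, 3, 11, 2};
static_assert(blockIterationsPerPass(Example) == 22, "2 * 11");
static_assert(kernelIterationsPerPass(Example) == 110, "5 * 22");
```

With these definitions, `firstIteration(Example, 1, 0)` evaluates to 66 and `firstIteration(Example, 4, 10)` to 284, which matches the B1T0 and B4T10 columns of the table.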


https://github.com/llvm/llvm-project/pull/81618