[llvm] [mlir] [openmp] [Flang][OpenMP] Add support for schedule clause for GPU (PR #81618)

Tue Feb 20 15:23:02 PST 2024

================
@@ -711,8 +716,8 @@ template <typename Ty> class StaticLoopChunker {
 
         ++IV;
       }
-
-      IV += KernelIteration;
+      // Start the new kernel iteration before the first thread chunk
+      IV += (KernelIteration - EffectiveThreadChunk);
----------------
jdoerfert wrote:

Now that I look at this again, I see why you need to adjust this.
We might need a second loop, which is unfortunate, but it's OK as long as the default values allow us to fold it away.
I think we might need to make KernelIteration larger, as described above.
However, as you noted, we certainly need to cover the block chunk gap (iteration 22 to 65 in the example above).
So, in this loop IV would be incremented by EffectiveThreadChunk.
This happens in an outer loop BlockChunk times; then we would move on and advance IV to the start of the next block chunk.

We really need a test that tracks which thread and block executed which iteration.
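
One way such a test could look, as a host-side sketch rather than actual DeviceRTL code: simulate a chunked static schedule and record, per iteration, which block and thread ran it and how often. The names `IterationRecord`, `simulateStaticChunked`, and `everyIterationRunsOnce` are hypothetical, and the mapping used here (block chunks dealt round-robin to blocks, thread chunks dealt round-robin to threads within each block chunk) is an assumption about the intended semantics, not something taken from `StaticLoopChunker`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical log entry: who executed an iteration, and how many times.
struct IterationRecord {
  int Block = -1;
  int Thread = -1;
  int Count = 0;
};

// Host-side simulation of an assumed two-level static schedule.
// Block chunks are assigned round-robin to blocks; inside each block chunk,
// thread chunks are assigned round-robin to the threads of that block.
std::vector<IterationRecord>
simulateStaticChunked(int64_t NumIters, int NumBlocks, int NumThreads,
                      int64_t BlockChunk, int64_t ThreadChunk) {
  std::vector<IterationRecord> Log(NumIters);
  const int64_t BlockStride = BlockChunk * NumBlocks;
  const int64_t ThreadStride = ThreadChunk * NumThreads;

  for (int B = 0; B < NumBlocks; ++B) {
    for (int T = 0; T < NumThreads; ++T) {
      // Outer loop: walk over the block chunks owned by block B.
      for (int64_t BlockStart = B * BlockChunk; BlockStart < NumIters;
           BlockStart += BlockStride) {
        const int64_t BlockEnd = std::min(BlockStart + BlockChunk, NumIters);
        // Middle loop: walk over the thread chunks owned by thread T
        // inside the current block chunk.
        for (int64_t ThreadStart = BlockStart + T * ThreadChunk;
             ThreadStart < BlockEnd; ThreadStart += ThreadStride) {
          const int64_t ThreadEnd =
              std::min(ThreadStart + ThreadChunk, BlockEnd);
          // Inner loop: the iterations of one thread chunk.
          for (int64_t IV = ThreadStart; IV < ThreadEnd; ++IV) {
            Log[IV].Block = B;
            Log[IV].Thread = T;
            ++Log[IV].Count;
          }
        }
      }
    }
  }
  return Log;
}

// True if no iteration was skipped or executed more than once.
bool everyIterationRunsOnce(const std::vector<IterationRecord> &Log) {
  return std::all_of(Log.begin(), Log.end(),
                     [](const IterationRecord &R) { return R.Count == 1; });
}
```

For example, with 20 iterations, 2 blocks, 2 threads, a block chunk of 8, and a thread chunk of 2, this mapping gives iterations 0-1 to block 0/thread 0, 2-3 to block 0/thread 1, and 8-9 to block 1/thread 0. A real test would fill the same kind of log from inside the target region and compare it against the expected owners.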


https://github.com/llvm/llvm-project/pull/81618