[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.

Ethan Stewart via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Apr 29 11:12:34 PDT 2021


estewart08 added a comment.

In D99432#2726060 <https://reviews.llvm.org/D99432#2726060>, @ABataev wrote:

> In D99432#2726050 <https://reviews.llvm.org/D99432#2726050>, @estewart08 wrote:
>
>> In D99432#2726025 <https://reviews.llvm.org/D99432#2726025>, @ABataev wrote:
>>
>>> In D99432#2726019 <https://reviews.llvm.org/D99432#2726019>, @estewart08 wrote:
>>>
>>>> In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not see how this helps SPMD mode with team privatization of declarations in-between target teams and parallel regions.
>>>
>>> Did you try the reproducer with the applied patch?
>>
>> Yes, I still saw the test fail, although I was not using the latest llvm-project. Are you saying the reproducer passes for you?
>
> I don't have CUDA installed, but from what I see in the LLVM IR it should pass. Do you have a debug log? Does it crash or produce incorrect results?

This is on AMDGPU, but I assume the behavior would be similar for NVPTX.

It produces incorrect/incomplete results in dist[0] after a manual reduction, and in turn the final global gpu_results array is incorrect.
When thread 0 does the reduction into dist[0], it has no knowledge of dist[1] having been updated by thread 1, which tells me the array is still thread-private.
Adding some printfs and looking at one team's output:

SPMD mode:

  Thread 0: dist[0]: 1
  Thread 0: dist[1]: 0  // This should be 1
  After reduction into dist[0]: 1  // This should be 2
  gpu_results = [1,1]  // [2,2] expected

Generic Mode:

  Thread 0: dist[0]: 1
  Thread 0: dist[1]: 1
  After reduction into dist[0]: 2
  gpu_results = [2,2]
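
For reference, the pattern being discussed can be sketched roughly as below. This is a hypothetical minimal example, not the actual reproducer attached to PR48851: the function name team_local_reduction and the constant THREADS_PER_TEAM are made up for illustration, and the sketch does not by itself guarantee that a given compiler picks SPMD or generic mode for the region. It only shows an array declared between the teams and parallel regions that every thread of the team is expected to share.

```c
#include <assert.h>

/* Hypothetical team size, chosen to match the two-element dist[] above. */
enum { THREADS_PER_TEAM = 2 };

/* Illustrative sketch only (assumed names, not the PR48851 reproducer).
 * Each team declares dist[] between the teams and parallel regions,
 * lets the team's threads fill it, then reduces it into dist[0]. */
void team_local_reduction(int *gpu_results, int num_teams) {
  assert(gpu_results != 0 && num_teams > 0);

#pragma omp target teams distribute map(from : gpu_results[0 : num_teams])
  for (int team = 0; team < num_teams; ++team) {
    /* Team-local array: should be shared by all threads of this team. */
    int dist[THREADS_PER_TEAM] = {0};

#pragma omp parallel for num_threads(THREADS_PER_TEAM)
    for (int i = 0; i < THREADS_PER_TEAM; ++i)
      dist[i] = 1; /* each thread updates its own slot */

    /* Manual reduction after the parallel region: it only sees the
     * other threads' updates if dist[] is truly shared within the team. */
    for (int i = 1; i < THREADS_PER_TEAM; ++i)
      dist[0] += dist[i];

    gpu_results[team] = dist[0];
  }
}
```

Called with two teams, a correct implementation should leave 2 in every element of gpu_results, matching the generic-mode output above. If dist[] ended up thread-private, the reduction would miss the other thread's update and produce 1, which is the SPMD symptom I described.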


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99432/new/

https://reviews.llvm.org/D99432


