[PATCH] D99432: [OPENMP]Fix PR48851: the locals are not globalized in SPMD mode.
Ethan Stewart via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Apr 29 12:36:14 PDT 2021
estewart08 added a comment.
In D99432#2726391 <https://reviews.llvm.org/D99432#2726391>, @ABataev wrote:
> In D99432#2726337 <https://reviews.llvm.org/D99432#2726337>, @estewart08 wrote:
>
>> In D99432#2726060 <https://reviews.llvm.org/D99432#2726060>, @ABataev wrote:
>>
>>> In D99432#2726050 <https://reviews.llvm.org/D99432#2726050>, @estewart08 wrote:
>>>
>>>> In D99432#2726025 <https://reviews.llvm.org/D99432#2726025>, @ABataev wrote:
>>>>
>>>>> In D99432#2726019 <https://reviews.llvm.org/D99432#2726019>, @estewart08 wrote:
>>>>>
>>>>>> In reference to https://bugs.llvm.org/show_bug.cgi?id=48851, I do not see how this helps SPMD mode with team privatization of declarations in-between target teams and parallel regions.
>>>>>
>>>>> Did you try the reproducer with the applied patch?
>>>>
>>>> Yes, I still saw the test fail, although it was not with the latest llvm-project. Are you saying the reproducer passes for you?
>>>
>>> I don't have CUDA installed, but from what I see in the LLVM IR it should pass. Do you have a debug log? Does it crash or produce incorrect results?
>>
>> This is on an AMDGPU but I assume the behavior would be similar for NVPTX.
>>
>> It produces incorrect/incomplete results in the dist[0] index after a manual reduction, and in turn the final global gpu_results array is incorrect.
>> When thread 0 does a reduction into dist[0], it has no knowledge of dist[1] having been updated by thread 1, which tells me the array is still thread private.
>> Adding some printfs and looking at one team's output:
>>
>> SPMD Mode:
>>
>> Thread 0: dist[0]: 1
>> Thread 0: dist[1]: 0 // This should be 1
>> After reduction into dist[0]: 1 // This should be 2
>> gpu_results = [1,1] // [2,2] expected
>>
>> Generic Mode:
>>
>> Thread 0: dist[0]: 1
>> Thread 0: dist[1]: 1
>> After reduction into dist[0]: 2
>> gpu_results = [2,2]
>
> Hmm, I would expect a crash if the array was allocated in the local memory. Could you try to add some more printfs (with data and addresses of the array) to check the results? Maybe there is a data race somewhere in the code?
As a reminder, each thread updates a unique index in the dist array and each team updates a unique index in gpu_results.
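The access pattern described above can be sketched in C as follows. This is a simplified illustration, not the PR48851 reproducer: the function name run_teams, the constants NUM_TEAMS and NUM_THREADS, and the use of a distribute loop with a nested parallel for are assumptions chosen so that the sketch also runs on the host. It makes no claim about which execution mode a given compiler selects for it.

```c
#define NUM_TEAMS 2
#define NUM_THREADS 2

/* Hypothetical helper (not from the bug report): fills results[NUM_TEAMS]. */
void run_teams(int *results) {
  /* One loop iteration per team; each team writes a unique index of results. */
  #pragma omp target teams distribute num_teams(NUM_TEAMS) \
      thread_limit(NUM_THREADS) map(tofrom: results[0:NUM_TEAMS])
  for (int team = 0; team < NUM_TEAMS; ++team) {
    /* Declared between the teams and parallel levels: the expectation
       discussed in this thread is one copy shared by the team's threads. */
    int dist[NUM_THREADS] = {0};

    /* Each iteration (one per thread slot) updates a unique index of dist. */
    #pragma omp parallel for num_threads(NUM_THREADS)
    for (int t = 0; t < NUM_THREADS; ++t)
      dist[t] = 1;

    /* Manual reduction into dist[0] after the parallel region has joined. */
    for (int t = 1; t < NUM_THREADS; ++t)
      dist[0] += dist[t];

    results[team] = dist[0];
  }
}
```

When dist is shared within a team, every entry of results ends up equal to NUM_THREADS (2 here), which matches the [2,2] expected above. If dist were private to each thread, thread 0 would see only its own update and the reduction would yield 1.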
SPMD: each thread has a unique address for the dist array
Team 0 Thread 1: dist[0]: 0, 0x7f92e24a8bf8
Team 0 Thread 1: dist[1]: 1, 0x7f92e24a8bfc
Team 0 Thread 0: dist[0]: 1, 0x7f92e24a8bf0
Team 0 Thread 0: dist[1]: 0, 0x7f92e24a8bf4
Team 0 Thread 0: After reduction into dist[0]: 1
Team 0 Thread 0: gpu_results address: 0x7f92a5000000
--------------------------------------------------
Team 1 Thread 1: dist[0]: 0, 0x7f92f9ec5188
Team 1 Thread 1: dist[1]: 1, 0x7f92f9ec518c
Team 1 Thread 0: dist[0]: 1, 0x7f92f9ec5180
Team 1 Thread 0: dist[1]: 0, 0x7f92f9ec5184
Team 1 Thread 0: After reduction into dist[0]: 1
Team 1 Thread 0: gpu_results address: 0x7f92a5000000
gpu_results[0]: 1
gpu_results[1]: 1
Generic: the threads in each team share a single dist array address
Team 0 Thread 1: dist[0]: 1, 0x7fac01938880
Team 0 Thread 1: dist[1]: 1, 0x7fac01938884
Team 0 Thread 0: dist[0]: 1, 0x7fac01938880
Team 0 Thread 0: dist[1]: 1, 0x7fac01938884
Team 0 Thread 0: After reduction into dist[0]: 2
Team 0 Thread 0: gpu_results address: 0x7fabc5000000
--------------------------------------------------
Team 1 Thread 1: dist[0]: 1, 0x7fac19354e10
Team 1 Thread 1: dist[1]: 1, 0x7fac19354e14
Team 1 Thread 0: dist[0]: 1, 0x7fac19354e10
Team 1 Thread 0: dist[1]: 1, 0x7fac19354e14
Team 1 Thread 0: After reduction into dist[0]: 2
Team 1 Thread 0: gpu_results address: 0x7fabc5000000
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D99432/new/
https://reviews.llvm.org/D99432