[Openmp-dev] Runtime error when executing multiple target regions within a target data region

Joachim Protze via Openmp-dev openmp-dev at lists.llvm.org
Fri Jul 13 09:48:24 PDT 2018


Hi Alexey,

I just realized that the pull from master failed for the openmp 
repository. I rebuilt everything again and it works now.

Thanks
Joachim

On 07/13/2018 06:26 PM, Joachim Protze wrote:
> Hi Alexey,
> 
> I pulled and rebuilt everything. I still see the same issue.
> 
> I am trying to offload to a Tesla P100 SXM2 16GB.
> Also, we built everything with CUDA 9.1.
> 
> I did some experiments with num_teams: 57 teams or more run into the 
> issue, while with 56 teams the code runs through. We checked the 
> documentation, and this specific card has 56 SMs, so starting at 57 
> teams some SMs need to be reused.
> 
> The execution with 57 teams shows a large variation in how many target 
> regions complete successfully: I have seen 66 successful iterations 
> before the fallback to the host, but also a fallback as early as the 
> 19th iteration.
> 
> -> This nondeterministic behavior in particular suggests some kind of 
> data race.
> 
> When the num_teams and thread_limit clauses are removed, the runtime 
> chooses 128 teams and 96 threads. With that number of teams the 
> execution reliably falls back during the 10th iteration.
> 
> 
> 
> This is the libomptarget debug output; you can see the print from the 
> target region (I changed the condition so that the last iteration 
> prints) and then the error messages:
> 
> 64, 992, 0
> Target CUDA RTL --> Kernel execution error at 0x0000000000d93910!
> Target CUDA RTL --> CUDA error(700) is: an illegal memory access was 
> encountered
> Libomptarget --> Looking up mapping(HstPtrBegin=0x00007fda4a068010, 
> Size=0)...
> Libomptarget --> Mapping exists with HstPtrBegin=0x00007fda4a068010, 
> TgtPtrBegin=0x00007fd988000000, Size=0, updated RefCount=1
> Libomptarget --> There are 0 bytes allocated at target address 
> 0x00007fd988000000 - is not last
> Libomptarget --> Looking up mapping(HstPtrBegin=0x00007fda4541c010, 
> Size=0)...
> Libomptarget --> Mapping exists with HstPtrBegin=0x00007fda4541c010, 
> TgtPtrBegin=0x00007fd982000000, Size=0, updated RefCount=1
> Libomptarget --> There are 0 bytes allocated at target address 
> 0x00007fd982000000 - is not last
> OMP: Warning #96: Cannot form a team with 64 threads, using 48 instead.
> OMP: Hint Consider unsetting KMP_DEVICE_THREAD_LIMIT (KMP_ALL_THREADS), 
> KMP_TEAMS_THREAD_LIMIT, and OMP_THREAD_LIMIT (if any are set).
> 48, 2147483647, 1
> Libomptarget --> Unloading target library!
> Libomptarget --> Image 0x00000000006020c0 is compatible with RTL 
> 0x000000000063f8b0!
> Libomptarget --> Unregistered image 0x00000000006020c0 from RTL 
> 0x000000000063f8b0!
> Libomptarget --> Done unregistering images!
> Libomptarget --> Removing translation table for descriptor 
> 0x000000000062ba70
> Libomptarget --> Done unregistering library!
> Target CUDA RTL --> Error when unloading CUDA module
> Target CUDA RTL --> CUDA error(700) is: an illegal memory access was 
> encountered
> 
> 
> Best
> Joachim
> 
> On 07/13/2018 04:37 PM, Alexey Bataev wrote:
>> Hi Joachim, I tried to compile your example with the latest version of
>> the compiler and it works for me without any changes. Could you also try
>> it with the updated version?
>>
>> -------------
>> Best regards,
>> Alexey Bataev
>>
>> 13.07.2018 5:50, Joachim Protze via Openmp-dev wrote:
>>> Hi all,
>>>
>>> we experience strange errors when we try to launch multiple target
>>> regions within a data region; see the attached code. The result when
>>> using unstructured data mapping is similar. We are using clang built
>>> from trunk this week.
>>>
>>> When we map the data for each iteration (as in line 23), the whole
>>> code runs through. When we use a larger value for TEAMS, the execution
>>> falls back to the host in an earlier iteration (for 1024, in the second
>>> iteration instead of the 7th as shown below).
>>>
>>> So there seems to be an issue with the allocation of teams when the
>>> data region stays open. Any ideas how this can be fixed?
>>>
>>> Best,
>>> Joachim
>>>
>>>
>>> Output when running the attached code (num_teams, thread_limit,
>>> is_initial_device):
>>>
>>> 256, 992, 0
>>> 0
>>> 256, 992, 0
>>> 1
>>> 256, 992, 0
>>> 2
>>> 256, 992, 0
>>> 3
>>> 256, 992, 0
>>> 4
>>> 256, 992, 0
>>> 5
>>> 256, 992, 0
>>> OMP: Warning #96: Cannot form a team with 256 threads, using 48 instead.
>>> OMP: Hint Consider unsetting KMP_DEVICE_THREAD_LIMIT
>>> (KMP_ALL_THREADS), KMP_TEAMS_THREAD_LIMIT, and OMP_THREAD_LIMIT (if
>>> any are set).
>>> 48, 2147483647, 1
>>> 6
>>> 48, 2147483647, 1
>>> 7
>>> 48, 2147483647, 1
>>> 8
>>> 48, 2147483647, 1
>>> 9
>>>
>>>
>>> _______________________________________________
>>> Openmp-dev mailing list
>>> Openmp-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
>>
>>
> 


