[Openmp-dev] Runtime error when executing multiple target regions within a target data region

Joachim Protze via Openmp-dev openmp-dev at lists.llvm.org
Fri Jul 13 09:26:43 PDT 2018


Hi Alexey,

I pulled and rebuilt everything. I still see the same issue.

I am trying to offload to a Tesla P100-SXM2-16GB.
Also, we built everything with CUDA 9.1.

I did some experiments on the number of teams: with 57 teams or more we 
run into the issue, while with 56 teams the code runs through. We checked 
the documentation, and this specific card has 56 SMs, so starting at 57 
teams some SMs need to be reused.
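The sweep over team counts can be scripted like this (the source file name 
repro.c and the -DTEAMS compile-time switch are assumptions about the 
attached code, which was not included in this message; the flags match a 
trunk clang build with CUDA offloading):

```shell
# Rebuild and run the reproducer for several team counts; on this P100,
# counts up to 56 passed and 57 or more hit the CUDA error 700.
for t in 48 56 57 64 128; do
  clang -O2 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
        -DTEAMS=$t repro.c -o repro
  echo "== TEAMS=$t =="
  ./repro
done
```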

With 57 teams, the number of successful repetitions of the target region 
varies widely: I have seen 66 successful iterations before the fallback 
to the host, but also runs where already the 19th iteration fell back.

-> This randomness in particular suggests some kind of data race.

When the num_teams and thread_limit clauses are removed, the runtime 
chooses 128 teams and 96 threads. With that many teams, the execution 
reliably falls back during the 10th iteration.



This is the libomptarget debug output; you can see the print from the 
target region (I changed the condition so that the last iteration prints) 
followed by the error messages:

64, 992, 0
Target CUDA RTL --> Kernel execution error at 0x0000000000d93910!
Target CUDA RTL --> CUDA error(700) is: an illegal memory access was 
encountered
Libomptarget --> Looking up mapping(HstPtrBegin=0x00007fda4a068010, 
Size=0)...
Libomptarget --> Mapping exists with HstPtrBegin=0x00007fda4a068010, 
TgtPtrBegin=0x00007fd988000000, Size=0, updated RefCount=1
Libomptarget --> There are 0 bytes allocated at target address 
0x00007fd988000000 - is not last
Libomptarget --> Looking up mapping(HstPtrBegin=0x00007fda4541c010, 
Size=0)...
Libomptarget --> Mapping exists with HstPtrBegin=0x00007fda4541c010, 
TgtPtrBegin=0x00007fd982000000, Size=0, updated RefCount=1
Libomptarget --> There are 0 bytes allocated at target address 
0x00007fd982000000 - is not last
OMP: Warning #96: Cannot form a team with 64 threads, using 48 instead.
OMP: Hint Consider unsetting KMP_DEVICE_THREAD_LIMIT (KMP_ALL_THREADS), 
KMP_TEAMS_THREAD_LIMIT, and OMP_THREAD_LIMIT (if any are set).
48, 2147483647, 1
Libomptarget --> Unloading target library!
Libomptarget --> Image 0x00000000006020c0 is compatible with RTL 
0x000000000063f8b0!
Libomptarget --> Unregistered image 0x00000000006020c0 from RTL 
0x000000000063f8b0!
Libomptarget --> Done unregistering images!
Libomptarget --> Removing translation table for descriptor 
0x000000000062ba70
Libomptarget --> Done unregistering library!
Target CUDA RTL --> Error when unloading CUDA module
Target CUDA RTL --> CUDA error(700) is: an illegal memory access was 
encountered


Best
Joachim

On 07/13/2018 04:37 PM, Alexey Bataev wrote:
> Hi Joachim, I tried to compile your example with the latest version of
> the compiler and it works for me without any changes. Could you also try
> it with the updated version?
> 
> -------------
> Best regards,
> Alexey Bataev
> 
> 13.07.2018 5:50, Joachim Protze via Openmp-dev wrote:
>> Hi all,
>>
>> we see strange errors when we try to launch multiple target regions
>> within a data region (see the attached code). The behavior when using
>> unstructured data mapping is similar. We are using clang built from
>> trunk this week.
>>
>> When we map the data for each iteration (as in line 23), the whole
>> code runs through. When we use a larger value for TEAMS, the execution
>> falls back to the host in an earlier iteration (for 1024 in the second
>> iteration instead of the 7th as shown below).
>>
>> So there seems to be an issue with the allocation of teams when the
>> data region stays open. Any ideas how this can be fixed?
>>
>> Best,
>> Joachim
>>
>>
>> Output when running the attached code (num_teams, thread_limit,
>> is_initial_device):
>>
>> 256, 992, 0
>> 0
>> 256, 992, 0
>> 1
>> 256, 992, 0
>> 2
>> 256, 992, 0
>> 3
>> 256, 992, 0
>> 4
>> 256, 992, 0
>> 5
>> 256, 992, 0
>> OMP: Warning #96: Cannot form a team with 256 threads, using 48 instead.
>> OMP: Hint Consider unsetting KMP_DEVICE_THREAD_LIMIT
>> (KMP_ALL_THREADS), KMP_TEAMS_THREAD_LIMIT, and OMP_THREAD_LIMIT (if
>> any are set).
>> 48, 2147483647, 1
>> 6
>> 48, 2147483647, 1
>> 7
>> 48, 2147483647, 1
>> 8
>> 48, 2147483647, 1
>> 9
>>
>>
>> _______________________________________________
>> Openmp-dev mailing list
>> Openmp-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
> 
> 
