[Openmp-commits] [PATCH] D132532: [libomptarget] Avoid deleting the same entry multiple times
Ye Luo via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Wed Aug 24 08:17:28 PDT 2022
ye-luo added a comment.
In D132532#3745744 <https://reviews.llvm.org/D132532#3745744>, @gValarini wrote:
> In D132532#3745645 <https://reviews.llvm.org/D132532#3745645>, @ye-luo wrote:
>
>> In D132532#3745640 <https://reviews.llvm.org/D132532#3745640>, @jdoerfert wrote:
>>
>>> I thought about this but IIRC, this doesn't work either. Now you have no way to prevent races between the copy back and copy to since the entry is gone while the copy back runs.
>>
>> I thought about the racing. If race to write the same data, there is no harm. If race to write different data, the source of race is user program which needs to protect the race anyway.
>
> I believe what @jdoerfert is trying to say is that the race will happen between a thread deleting the device data and another making the data transfer. Imagine the following case:
>
> - Thread A: is not the "owner" (`IsOwned == false`) of the entry but will do a data transfer.
> - Thread B: is the "owner" (`IsOwned == true`) of the entry and will delete it.
>
> If A is switched out of execution right after `getTgtPtrBegin` returns, B might remove the data before A starts any data transfer. This will lead to a segfault on the device side. Although it seems to be a problem with the user program, I think it should lead to data corruption and not a segfault. I also believe this may happen on correct programs, but I cannot think of a simple example right now.
>
> I was planning to send a patch with this same fix this week. It was quite similar to this one, but it also had a new reference counter designed to track threads using the entry on the data end function. This way, multiple threads might get the entry, but only the last one would delete the buffer and the entry data.
Your example is valid. I think we probably need to distinguish the transfer not caused by refcount to 0 and transfer caused by refcount to 0.
transfer not caused by refcount to 0 needs to be placed before actually adjusting the recount. I feel we need probably write down a few rules before devising a scheme.
Another more conservative route I though about is only schedule ahead transfers of refcount=INF.
tracking threads. My quick take will be another can of worms. I need to think about it when your patch shows up.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D132532/new/
https://reviews.llvm.org/D132532
More information about the Openmp-commits
mailing list