[Openmp-commits] [PATCH] D62199: [OPENMP][NVPTX]Fix barriers and parallel level counters, NFC.

Alexey Bataev via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Fri May 24 06:44:45 PDT 2019

ABataev added a comment.

In D62199#1515159 <https://reviews.llvm.org/D62199#1515159>, @grokos wrote:

> Let me address what has been written so far from my perspective and request some changes since there has been demand for them:
> 1. No doubt there are 2 changes here, unrelated to one another. Yes, they could be split into 2 patches to strictly comply with community guidelines. And I would prefer it if the author had submitted two distinct patches. On the other hand, I don't like missing the forest for the trees. This is a 4-line patch, I figured out it is OK to let both changes in at once. Sure, the small size of a patch is not a free pass to doing whatever we want, but in this case committing both changes in one go doesn't do any harm, at least none that I can think of so that I would make a big deal out of it. There's the letter of the law and the spirit of the law, in cases where these two collide I tend to go with the latter.
> 2. My bad that I didn't request a test for cache flushing, I was predisposed due to the NFC tag. I agree there is no way to test whether the cache has been flushed, but @ABataev you could provide a test case which fails without cache flushing so that you can demonstrate that this patch does indeed work. There must be such a test case, after all how did you figure out there is a problem there?

This change was not caused by the test failure or something similar, I just read the standard carefully and noticed that implicit flushing should happen each time the barrier construct is executed. I can try to add the test, anyway. Or provide a reference to a test, if it exist already in the codenase.

> 1. The NFC bit does not apply to the barrier fix. If we keep both changes in one patch, that should be reflected in the title. I agree that not being able to test a change does not make it NFC. You can call it NTC (non-testable change), but not NFC.
> 2. In case of volatile, it seems there's a gray line here between what constitutes a NFC patch. The way I see it, "volatile" prevents compiler optimizations, it doesn't change anything in the way this code works, the algorithms/data structures used etc., the only change is how the code is compiled. On the other hand, we could argue that since the compiled library behaves differently there is indeed a functional change. Once again, to me this is a gray line and there is no absolute answer. However, since here there are more objections than approvals, @ABataev can you change the patch's title and remove the NFC part altogether?

I will revert the commit and split the patch into 3 parts.

> 1. Regarding justification for the need for volatile, I agree that the explanation of the problem is very short, but the solution makes sense given the nature of that problem - at least to me. I could have blocked the patch until the author finds a way to describe the issue in detail - in the meantime there would be a correctness problem in the library which could cause inexplicable failures/wrong results to the users. Even if the "fix" is more of a hack - which I don't believe - I would rather let this hack in than leave the correctness problem lying around indefinitely. From the description I understand that a thread may cache values from the `parallelLevel` array but at the same time those values change in RAM without the thread knowing about it. This leads to corrupt states where some threads have a stale view of the parallel level. The only way to demonstrate the problem is to post ptx code, which I think is too much in the scope of a review, that's why I don't see the point in providing more details. The author could point to two places in the code, one where a thread caches a value from that array and another one where another thread modifies the global value, but it is not always possible to track where those changes occur. This is in case I understood the problem correctly. If not, then @ABataev I am also requesting a detailed explanation of the problem because the current description leads to wrong conclusions.
> 2. 1, 4 and 5 represent my point of view, which is of course subjective, anyone could have a different mentality. But calling my reviews "reviews" and my activity here "suspicious" doesn't contribute at all. In pretty much the same way the "this is your problem, I've got the owner's approval" reply is inappropriate (and in no way do I endorse this "argument" as a reply to reviewers' requests).
> 3. Since February 2018 I have been the owner of the libomptarget project. The issue of libomptarget having no owner came up in one of our biweekly multi-company telecons, I was the only one who volunteered and at a subsequent F2F there was consensus for my appointment since I had written most of the code for the base library and the CUDA and generic-elf64 plugins.
> 4. Regarding the nvptx device runtime, certainly it is a different project from the rest of libomptarget. I think the RTL is still considered "coupled" with libomptarget for historical reasons only (both the base library, the plugins and the RTL were all written by the same people at the same company in the scope of the same project). But in terms of further development and expertise needed, it is a completely different kettle of fish. From my current position I can only contribute to the RTL by reviewing patches, it is true that I no longer do any development in this RTL. If anyone feels that the nvptx RTL repository should be managed/owned separately from the rest of libomptarget, please speak up and I will raise the issue at the next community telecon.




More information about the Openmp-commits mailing list