[Openmp-commits] [PATCH] D64217: [OpenMP][NFCI] Cleanup the target state queue implementation

Thu Jul 18 09:17:16 PDT 2019

ABataev added a comment.

> If they are uninitialized in LLVM-IR, it doesn't matter what happens at the PTX level and lower, LLVM could simply get rid of the accesses.

They are initialized in LLVM IR, by zeroinitializer, they are not initialized in PTX file explicitly. But it is ok for PTX since, according to the documents, the memory implicitly initialized by zeros.

> Why do you fight so hard against zero-cost initialization that *might be* needed for correctness?

I'm not fighting against `zero-cost` initialization. Just it is not possible to implement `zero-cost` initialization.

> Do I need/want the code and results of a correct and faster implementation? Yes, please share the results and the code so we can improve things. I don't know why you think I would not want improvements?

The code is not implemented in full yet. I'm doing it step by step. As soon as I commit the patches with threadid fixes etc. I will continue working on code and performance improvement.
I don't have full implementation at the moment as I'm busy with other things, like bug fixing.
The results are just the times of our internal test suite.
Here they are:

  real 315.65
  user 63.56
  sys 104.82

  real 310.98
  user 62.94
  sys 105.07

  real 310.91
  user 54.78
  sys 104.05

  real 301.62
  user 59.57
  sys 105.11

It is the execution results with my previous patches for NVPTX runtime. The last patch I sent for the review gives something about `299` secs `real` time.

> I am not aware of any test case / system configuration under which this patch causes problems. If it does, we split it in two, one to do the refactoring, one to remove volatile.

I'm telling you that `volatile` stuff is required at least for CUDA 8. Make it a conditional, so the volatile modifier could be applied to cuda 8 at least. Any test with full runtime reveals the problem.
If we're going to drop the support for cuda 8 it is a completely different story. In this case, I think, we can try to remove all the volatile functionality since, seems to me, the problems were fixed in cuda9+. But I'm not sure about this, we need to check this very carefully. I don't want to end up with non-functional runtime that works only from time to time, in some particular systems and only with a limited set of code.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D64217/new/

https://reviews.llvm.org/D64217