[PATCH] D69922: [OpenMP] Use the OpenMP-IR-Builder

Fri Nov 8 13:38:48 PST 2019

jdoerfert marked an inline comment as done.
jdoerfert added inline comments.

================
Comment at: clang/test/OpenMP/barrier_codegen.cpp:22
+// CLANGCG-NOT: readonly
+// IRBUILDER:      ; Function Attrs: nofree nosync nounwind readonly
+// IRBUILDER-NEXT: declare i32 @__kmpc_global_thread_num(%struct.ident_t*)
----------------
ABataev wrote:
> jdoerfert wrote:
> > ABataev wrote:
> > > jdoerfert wrote:
> > > > ABataev wrote:
> > > > > jdoerfert wrote:
> > > > > > ABataev wrote:
> > > > > > > Not sure about correct use of `nosync` and `readonly` attributes. OpenMP runtime uses lazy initialization of the runtime library and when any runtime function is called, the inner parts of the OpenMP runtime are initialized automatically. It may use some sync primitives and may modify memory, I assume. Same about `nofree`.
> > > > > > There are two versions of these functions, host and device. I assume host functions are not inlined but device functions might be. This is basically all the modes we support right now.
> > > > > > 
> > > > > > If we do not inline the function (host) we don't necessarily care what they do but what effect the user can expect.
> > > > > > The user can not expect to synchronize through `__kmpc_global_thread_num` calls in a defined way, thus `nosync`.
> > > > > > Similarly, from the users perspective there is no way to determine if something was written or freed, no matter how many of these calls I issue and under which control conditions, all I see is the number as a result. Thus, `readonly` and `nofree`. I believe `readnone` is even fine here but it might not work for the device version (see below) so I removed it.
> > > > > > 
> > > > > > If we do inline the function (device) we need to make sure the attributes are compatible with the inlined code to not cause UB. The code of `__kmpc_global_thread_num` at least does not write anything and only reads stuff (as far as I can tell).
> > > > > > 
> > > > > > Please correct me if I overlooked something. 
> > > > > But someone may try to inline the host-based runtime or try to use LTO with it.
> > > > > The question is not about the user expectations but about the optimizations which can be triggered with these attributes.
> > > > This is our runtime and we have supported and unsupported usage models.
> > > Hmm, I don't think this the right approach. Plus, you still did not answer about optimizations. Maybe, currently, these attributes won't trigger dangerous optimizations but they can do this in the future and it may lead to unpredictable results. I would use the pessimistic model here rather than over-optimistic.
> > I did (try to) describe why there cannot be any problems wrt. optimizations:
> > The specified behavior of the runtime call is _as if_ it is `readonly`, `nofree`, and `nosync`.
> > That is, from the perspective of the compiler this is true and optimizations are allowed to use that fact.
> >  
> > The fact that the first ever runtime call initializes the runtime is neither part of the specification nor of the observable behavior. If we change the order between two call to `__kmpc_global_thread_num`, or similar calls, we cannot observe if/which one initialized the runtime and which read only stuff.
> Here is the code of this function from the libomp:
> ```
>   int gtid;
> 
>   if (!__kmp_init_serial) {
>     gtid = KMP_GTID_DNE;
>   } else
> #ifdef KMP_TDATA_GTID
>       if (TCR_4(__kmp_gtid_mode) >= 3) {
>     KA_TRACE(1000, ("*** __kmp_get_global_thread_id_reg: using TDATA\n"));
>     gtid = __kmp_gtid;
>   } else
> #endif
>       if (TCR_4(__kmp_gtid_mode) >= 2) {
>     KA_TRACE(1000, ("*** __kmp_get_global_thread_id_reg: using keyed TLS\n"));
>     gtid = __kmp_gtid_get_specific();
>   } else {
>     KA_TRACE(1000,
>              ("*** __kmp_get_global_thread_id_reg: using internal alg.\n"));
>     gtid = __kmp_get_global_thread_id();
>   }
> 
>   /* we must be a new uber master sibling thread */
>   if (gtid == KMP_GTID_DNE) {
>     KA_TRACE(10,
>              ("__kmp_get_global_thread_id_reg: Encountered new root thread. "
>               "Registering a new gtid.\n"));
>     __kmp_acquire_bootstrap_lock(&__kmp_initz_lock);
>     if (!__kmp_init_serial) {
>       __kmp_do_serial_initialize();
>       gtid = __kmp_gtid_get_specific();
>     } else {
>       gtid = __kmp_register_root(FALSE);
>     }
>     __kmp_release_bootstrap_lock(&__kmp_initz_lock);
>     /*__kmp_printf( "+++ %d\n", gtid ); */ /* GROO */
>   }
> 
>   KMP_DEBUG_ASSERT(gtid >= 0);
> 
>   return gtid;
> 
> ```
> 
> As you can see, it is very complex. It has locks, in the debug mode it writes something, etc. It is not only about the initialization, there are some other calls (like `__kmp_get_global_thread_id()`, which really writes something). Do you really think it is safe to mark this as readonly and nosync?
>  Do you really think it is safe to mark this as readonly and nosync?

Yes. I am arguing why in the last few comments.

The central question is not what the function does but how it behaves (as long as we do not inline it, then what it does leaks).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69922/new/

https://reviews.llvm.org/D69922