[PATCH] D28691: Add OpenCL 2.0 atomic builtin functions as Clang builtin

Thu Aug 3 06:30:31 PDT 2017

yaxunl marked 6 inline comments as done.
yaxunl added inline comments.

================
Comment at: include/clang/Basic/Builtins.def:717
+ATOMIC_BUILTIN(__opencl_atomic_fetch_max, "v.", "t")
+
 #undef ATOMIC_BUILTIN
----------------
t-tye wrote:
> Will the OpenCL 2.0 memory fences also be supported which also have a memory order and memory scope?
I am considering supporting it with a separate patch since this patch is already quite large.

================
Comment at: include/clang/Basic/SyncScope.h:23
+enum class SyncScope {
+  OpenCLWorkItem = 0,
+  OpenCLWorkGroup = 1,
----------------
t-tye wrote:
> The OpenCL workitem scope is only used for image fences and does not apply to atomic operations so not sure that it should be in this enumeration which is used only for memory atomics.
You are right. I think we should drop it from this enum for now.

================
Comment at: lib/CodeGen/CGAtomic.cpp:896
+        return V;
+      auto DestAS = getContext().getTargetAddressSpace(LangAS::opencl_generic);
+      auto T = V->getType();
----------------
rjmccall wrote:
> You can sink this line and the next.
will do.

================
Comment at: lib/CodeGen/CGCall.cpp:3911
+          V = Builder.CreateBitOrPointerCast(V,
+                  IRFuncTy->getParamType(FirstIRArg));

----------------
rjmccall wrote:
> No.  Callers should ensure that they've added the right argument type, at least at the level of address spaces.
Will remove.

================
Comment at: lib/CodeGen/TargetInfo.cpp:7554-7555
+  switch (S) {
+  case SyncScope::OpenCLWorkItem:
+    Name = "singlethread";
+    break;
----------------
t-tye wrote:
> OpenCL only uses workitem for image fences which are not the same as atomic memory fences.
> 
> For image fences I don't think it would map to singlethread which is intended for signal handler support to ensure a signal handler has visibility of the updates done by a thread which is more of an optimization barrier. In contrast an image fence may need to flush caches to make the image and vector access paths coherent in a single thread.
> 
> Since this patch does not support fences probably want to leave workitem scope out. Current AMDGPU targets do not need to do anything for an OpenCL image fence, but other targets may need to generate an intrinsic since there is no LLVMIR instruction for this.
Thanks. I will remove this for now.

================
Comment at: lib/Headers/opencl-c.h:13145-13150
   memory_scope_work_item,
   memory_scope_work_group,
   memory_scope_device,
   memory_scope_all_svm_devices,
+#if defined(cl_intel_subgroups) || defined(cl_khr_subgroups)
   memory_scope_sub_group
----------------
t-tye wrote:
> Do these have to have the same values as the SycnScope enumeration? Should that be ensured in a similar way to the memory_order enumeration?
It is desirable to have the same value as SyncScope enumeration, otherwise the library has to do the translation when calling __opencl_atomic_* builtins.

Will do.

================
Comment at: lib/Sema/SemaChecking.cpp:3160
+            Op == AtomicExpr::AO__opencl_atomic_load)
+                ? 0
+                : 1);
----------------
Anastasia wrote:
> Could we merge this and the line after please.
will do

================
Comment at: lib/Sema/SemaChecking.cpp:3103
+        Diag(Scope->getLocStart(),
+             diag::err_atomic_op_has_non_constant_synch_scope)
+            << Scope->getSourceRange();
----------------
rjmccall wrote:
> t-tye wrote:
> > IIRC OpenCL allows the scope to be a runtime value. So will doing this will likely cause failures in conformance?
> Ah, if that's true, you'll need to emit a switch in IRGen, the same way we handle non-constant memory orders.
Will support it in another patch since this one is already quite large.

https://reviews.llvm.org/D28691