[PATCH] D95560: [CUDA][HIP] Fix function scope static variable

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jan 27 16:39:12 PST 2021


tra added a comment.

> A static variable in device and global functions is supposed to have
> implicit device attribute. Currently it does not. This causes incorrect 
> diagnostics about host variables accessed by device functions.

Correct diagnostics sevice-side local static vars is a valid concern.
Could you elaborate on why are static variables in device functions are supposed to be `__device__`? I'm not quite sure that it's been established. At least not as a full `__device__` variable, with runtime registration and the host-side shadow.

Judging by the tests and the comments, it may be better to rephrase the purpose of this patch along the lines that it allows treating a subset of the static variables for which the host may need to know device-side address  as `__device__`, with all the overhead it entails. static vars that can't be created in the host code, remain purely static on device. When I read the patch description for the first time, it sounded more invasive than it actually is.



================
Comment at: clang/lib/CodeGen/CodeGenModule.cpp:101
+// does that.
+class CUDAStaticDeviceVarEmitter
+    : public StmtVisitor<CUDAStaticDeviceVarEmitter> {
----------------
Nit. "This class does that" could be dropped. I'd generally follow a `"<this thing> does <that> for <this reason>"` structure.
E.g something along these lines:
```
Helper class for emitting device-side static variables created in host-side functions. While we do not emit host-side functions on device, we still need to emit the static variables the host code will expect to see on the device.
```


================
Comment at: clang/lib/Sema/SemaCUDA.cpp:533-540
+      // isConstantInitializer cannot be called with dependent value, therefore
+      // we skip checking dependent value here. This is OK since
+      // checkAllowedCUDAInitializer is called again when the template is
+      // instantiated.
       AllowedInit =
-          ((VD->getType()->isDependentType() || Init->isValueDependent()) &&
-           VD->isConstexpr()) ||
+          (VD->getType()->isDependentType() || Init->isValueDependent()) ||
           Init->isConstantInitializer(Context,
----------------
This does not seem to be directly relevant for this patch. Perhaps move it into a separate patch?


================
Comment at: clang/lib/Sema/SemaDecl.cpp:7247-7250
+  // CUDA/HIP: Function-scope static variables in device or global functions
+  // have implicit device or constant attribute. Function-scope static variables
+  // in host device functions have implicit device or constant attribute in
+  // device compilation only.
----------------
This is somewhat confusing. I guess the issue is that we're conflating all the functionality implied by the `__device__` attribute and the `accessible on device` which is a subset of it. For the static vars in D functions you only need for it to be accessible on device, IMO. For HD functions, you do need the full `__device__` functionality, with host shadow and runtime registration.

While adding implicit `__device__` works for statics in the device-only functions, it's a bit of an overkill. It also gives us a somewhat different AST between host/device compilations.

Perhaps we can handle statics in device-only functions w/o adding implicit `__device__`. Can we check the parent of the variable instead when we check whether we're allowed to reference the variable? 


================
Comment at: clang/test/CodeGenCUDA/func-scope-static-var.cu:54
+// NORDC: @_ZZ4fun2vE1b = dso_local addrspace(1) global i32 2
+// RDC: @_ZZ4fun2vE1b = internal addrspace(1) global i32 2
+// HOST: @_ZZ4fun2vE1b = internal global i32 2
----------------
What's the reason for externalizing the variables for no-rdc only?
If we do not externalize them, then we'll potentially have a problem with the host code attempting to get variable's device-side address and fail at runtime, because it's not visible on device.

I think the right thing to do here is to always externalize them, but add unique suffix for RDC.


================
Comment at: clang/test/CodeGenCUDA/func-scope-static-var.cu:87
+// In host device function, explicit static device variables are externalized
+// if used and registered. Default static variables are implicit device
+// variables in device compilation and host variables in host compilation,
----------------
Nit: `static variables w/o attributes are implicitly __device__`. Or `By default, static variables are implicitly __device__`.

It's also not clear what you mean by `which are independent`.  It may be better to add more details in a separate sentence.


================
Comment at: clang/test/CodeGenCUDA/func-scope-static-var.cu:126-127
+
+// In kernels, static device variables are not externalized nor shadowed.
+// Static managed variable behaves like a normal static device variable.
+
----------------
We could use an explanation why we're not externalizing or shadowing them.



CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95560/new/

https://reviews.llvm.org/D95560



More information about the cfe-commits mailing list