[PATCH] D70172: [CUDA][HIP][OpenMP] Emit deferred diagnostics by a post-parsing AST travese

Yaxun Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Mar 18 14:07:50 PDT 2020


yaxunl marked 4 inline comments as done.
yaxunl added inline comments.


================
Comment at: clang/lib/Sema/Sema.cpp:1514
+  void visitUsedDecl(SourceLocation Loc, Decl *D) {
+    if (auto *TD = dyn_cast<TranslationUnitDecl>(D)) {
+      for (auto *DD : TD->decls()) {
----------------
rjmccall wrote:
> yaxunl wrote:
> > rjmccall wrote:
> > > bader wrote:
> > > > yaxunl wrote:
> > > > > rjmccall wrote:
> > > > > > yaxunl wrote:
> > > > > > > rjmccall wrote:
> > > > > > > > erichkeane wrote:
> > > > > > > > > rjmccall wrote:
> > > > > > > > > > erichkeane wrote:
> > > > > > > > > > > Note that when recommitting this (if you choose to), this needs to also handle NamespaceDecl.  We're a downstream and discovered that this doesn't properly handle functions or records handled in a namespace.
> > > > > > > > > > > 
> > > > > > > > > > > It can be implemented identically to TranslationUnitDecl.
> > > > > > > > > > Wait, what?  We shouldn't be doing this for TranslationUnitDecl either.   I don't even know how we're "using" a TranslationUnitDecl, but neither this case not the case for `NamespaceDecl` should be recursively using every declaration declared inside it.  If there's a declaration in a namespace that's being used, it should be getting visited as part of the actual use of it.
> > > > > > > > > > 
> > > > > > > > > > The logic for `RecordDecl` has the same problem.  
> > > > > > > > > Despite the name, this seems to be more of a home-written ast walking class.  The entry point is the 'translation unit' which seems to walk through everything in an attempt to find all the functions (including those that are 'marked' as used by an attribute).
> > > > > > > > > 
> > > > > > > > > You'll see the FunctionDecl section makes this assumption as well (not necessarily that we got to a function via a call). IMO, this approach is strange, and we should register entry points in some manner (functions marked as emitted to the device in some fashion), then just follow its call-graph (via the clang::CallGraph?) to emit all of these functions.
> > > > > > > > > 
> > > > > > > > > It seemed really odd to see this approach here, but it seemed well reviewed by the time I noticed it (via a downstream bug) so I figured I'd lost my chance to disagree with the approach.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > Sure, but `visitUsedDecl` isn't the right place to be entering the walk.  `visitUsedDecl` is supposed to be the *callback* from the walk.  If they need to walk all the global declarations to find kernels instead of tracking the kernels as they're encountered (which would be a *much* better approach), it should be done as a separate function.
> > > > > > > > 
> > > > > > > > I just missed this in the review.
> > > > > > > The deferred diagnostics could be initiated by non-kernel functions or even host functions.
> > > > > > > 
> > > > > > > Let's consider a device code library where no kernels are defined. A device function is emitted, which calls a host device function which has a deferred diagnostic. All device functions that are emitted need to be checked.
> > > > > > > 
> > > > > > > Same with host functions that are emitted, which may call a host device function which has deferred diagnostic.
> > > > > > > 
> > > > > > > Also not just function calls need to be checked. A function address may be taken then called through function pointer. Therefore any reference to a function needs to be followed.
> > > > > > > 
> > > > > > > In the case of OpenMP, the initialization of a global function pointer which refers a function may trigger a deferred diangostic. There are tests for that.
> > > > > > Right, I get that emitting deferred diagnostics for a declaration D needs to trigger any deferred diagnostics in declarations used by D, recursively.  You essentially have a graph of lazily-emitted declarations (which may or may not have deferred diagnostics) and a number of eagerly-emitted "root" declarations with use-edges leading into that graph.  Any declaration that's reachable from a root will need to be emitted and so needs to have any deferred diagnostics emitted as well.  My question is why you're finding these roots with a retroactive walk of the entire translation unit instead of either building a list of roots as you go or (better yet) building a list of lazily-emitted declarations that are used by those roots.  You can unambiguously identify at the point of declaration whether an entity will be eagerly or lazily emitted, right?  If you just store those initial edges into the lazily-emitted declarations graph and then initiate the recursive walk from them at the end of the translation unit, you'll only end up walking declarations that are actually relevant to your compilation, so you'll have much better locality and (if this matters to you) you'll naturally work a lot better with PCH and modules.
> > > > > I will try the approach you suggested. Basically I will record the emitted functions and variables during parsing and use them as starting point for the final traversal.
> > > > > 
> > > > > This should work for CUDA/HIP. However it may be tricky for OpenMP since the emission of some entities depending on pragmas. Still it may be doable. If I encounter difficulty I will come back for discussion.
> > > > > 
> > > > > I will post the change for review.
> > > > > 
> > > > > Thanks.
> > > > FYI: SYCL is also using deferred diagnostics engine to emit device side diagnostics, although this part hasn't been up-streamed yet, but we are tracking changes in this area.
> > > > SYCL support implementation should be quite similar to CUDA/HIP.
> > > Okay, thank you.  Do you still need all the cases in here for records, templates, and so on?  It looks to me like you should always end up here with exactly the variables and functions that are being used, and you should never need to make special efforts to e.g. visit all the specializations of a template or visit all the methods of a class.
> > I can remove handling of templates and records. However I have to keep the handling of CapturedDecl. It is generated from code like
> > 
> > ```
> > void t1(int r) {}
> > 
> > int main() {
> > #pragma omp target
> >   {
> >     t1(0);
> >   }
> >   return 0;
> > }
> > 
> > ```
> > And it is like a function decl embeded in function main, e.g.
> > 
> > 
> > ```
> > -FunctionDecl 0x86f7c70 <line:8:1, line:15:1> line:8:5 main 'int ()'
> >   `-CompoundStmt 0x873c3f8 <col:12, line:15:1>
> >     |-OMPTargetDirective 0x873c3a0 <line:9:1, col:19>
> >     | `-CapturedStmt 0x873c378 <line:10:3, line:13:3>
> >     |   `-CapturedDecl 0x873bd18 <<invalid sloc>> <invalid sloc> nothrow
> >     |     |-CapturedStmt 0x873c350 <line:10:3, line:13:3>
> >     |     | `-CapturedDecl 0x873c198 <<invalid sloc>> <invalid sloc> nothrow
> >     |     |   |-CompoundStmt 0x873c338 <line:10:3, line:13:3>
> >     |     |   | `-CallExpr 0x873c310 <line:12:5, col:9> 'void'
> >     |     |   |   |-ImplicitCastExpr 0x873c2f8 <col:5> 'void (*)(int)' <FunctionToPointerDecay>
> >     |     |   |   | `-DeclRefExpr 0x873c290 <col:5> 'void (int)' Function 0x86f7b18 't1' 'void (int)'
> >     |     |   |   `-IntegerLiteral 0x873c2b0 <col:8> 'int' 0
> >     |     |   `-ImplicitParamDecl 0x873c228 <line:9:1> col:1 implicit __context 'struct (anonymous at nvptx_va_arg_delayed_diags2.c:9:1) *const restrict'
> >     |     |-AlwaysInlineAttr 0x873c040 <<invalid sloc>> Implicit __forceinline
> >     |     |-ImplicitParamDecl 0x873bda0 <col:1> col:1 implicit .global_tid. 'const int'
> >     |     |-ImplicitParamDecl 0x873be08 <col:1> col:1 implicit .part_id. 'const int *const restrict'
> >     |     |-ImplicitParamDecl 0x873be70 <col:1> col:1 implicit .privates. 'void *const restrict'
> >     |     |-ImplicitParamDecl 0x873bed8 <col:1> col:1 implicit .copy_fn. 'void (*const restrict)(void *const restrict, ...)'
> >     |     |-ImplicitParamDecl 0x873bf40 <col:1> col:1 implicit .task_t. 'void *const'
> >     |     |-ImplicitParamDecl 0x873bfd8 <col:1> col:1 implicit __context 'struct (anonymous at nvptx_va_arg_delayed_diags2.c:9:1) *const restrict'
> >     |     |-RecordDecl 0x873c098 <col:1> col:1 implicit struct definition
> >     |     | `-CapturedRecordAttr 0x873c140 <<invalid sloc>> Implicit
> >     |     `-CapturedDecl 0x873c198 <<invalid sloc>> <invalid sloc> nothrow
> >     |       |-CompoundStmt 0x873c338 <line:10:3, line:13:3>
> >     |       | `-CallExpr 0x873c310 <line:12:5, col:9> 'void'
> >     |       |   |-ImplicitCastExpr 0x873c2f8 <col:5> 'void (*)(int)' <FunctionToPointerDecay>
> >     |       |   | `-DeclRefExpr 0x873c290 <col:5> 'void (int)' Function 0x86f7b18 't1' 'void (int)'
> >     |       |   `-IntegerLiteral 0x873c2b0 <col:8> 'int' 0
> >     |       `-ImplicitParamDecl 0x873c228 <line:9:1> col:1 implicit __context 'struct (anonymous at nvptx_va_arg_delayed_diags2.c:9:1) *const restrict'
> >     `-ReturnStmt 0x873c3e8 <line:14:3, col:10>
> >       `-IntegerLiteral 0x873c3c8 <col:10> 'int' 0
> > 
> > ```
> > If I do not handle it, I will not be able to reach the call of t1().
> Sure, although I wonder if it might be more reasonable to just make UsedDeclVisitor walk into `CapturedDecl`s (and `BlockDecl`s) when it sees the corresponding statements/expressions.  Unlike other declaration references, those are never "cross-references"; they're just local code tied to a declaration for representational reasons.
done


================
Comment at: clang/lib/Sema/Sema.cpp:1540
+    } else if (auto *VD = dyn_cast<VarDecl>(D)) {
+      if (auto *Init = VD->getInit()) {
+        auto DevTy = OMPDeclareTargetDeclAttr::getDeviceType(VD);
----------------
rjmccall wrote:
> yaxunl wrote:
> > rjmccall wrote:
> > > Can there also be deferred diagnostics associated with this initializer?
> > Yes. A global variable may be marked by omp declare target directive to be emitted on device. If the global var is initialized with the address of a function, the function will be emitted on device. If the device function calls a host device function which contains a deferred diag, that diag will be emitted. This can only be known after everything is parsed.
> I meant directly with the initializer.  Is there a way today to defer a diagnostic that you would emit while processing an initializer expression?  If so, this needs to trigger that.
I don't think the initializer itself (without a target declare directive) will cause a deferred diagnostic since it does not cause change of emission states of functions. 


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172





More information about the cfe-commits mailing list