[PATCH] D70172: [CUDA][HIP] Fix assertion due to dtor check on windows

Yaxun Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jan 9 10:23:52 PST 2020


yaxunl added a comment.

In D70172#1809571 <https://reviews.llvm.org/D70172#1809571>, @rjmccall wrote:

> I thought you were saying that the destructor decl hadn't been created yet, but I see now that you're saying something more subtle.
>
> `CurContext` is set to the destructor because the standard says in [class.dtor]p13:
>
>   At the point of definition of a virtual destructor (including an implicit definition), the non-array deallocation function is determined as if for the expression `delete this` appearing in a non-virtual destructor of the destructor’s class.
>   
>
> Which is to say that, semantically, the context is as if it were within the destructor, to the extent that this affects access control and so on.
>
> I can see why this causes problems for your call graph (really a use graph), since it's a use in the apparent context of the destructor at a point where the destructor is not being defined.  A similar thing happens with default arguments, but because we don't consider uses from default arguments to be true ODR-uses until the default argument is used, that probably doesn't cause problems for you.
>
> I don't think the destructor -> deallocation function edge is actually interesting for your use graph.  It'd be more appropriate to treat the deallocation function as used by the v-table than by the destructor; I don't know whether you make any attempt to model v-tables as nodes in your use graph.  You might consider finding a simple way to suppress adding this edge, like just not adding edges from a destructor that's not currently being defined (`D->willHaveBody()`).
>
> With all that said, maintaining a use graph for all the functions you might emit in the entire translation unit seems very expensive and brittle.  Have you considered doing this walk in a final pass?   You could just build up a set of all the functions you know you're going to emit and then walk their bodies looking for uses of lazy-emitted entities.  If we don't already have a function that calls a callback for every declaration ODR-used by a function body, we should.


The deferred diagnostic mechanism is shared between CUDA/HIP and OpenMP. The diagnostic messages not only depend on the callee, but also depend on the caller, the caller information needs to be kept. Also if a caller is to be emitted, all the deferred diagnostics associated with the direct or indirect callees need to be emitted. Therefore a call graph is needed for this mechanism.

If we ignore the dtor->deallocation edge in the call graph, we may miss diagnostics, e.g.

  static __device__ __host__ void f(__m256i *p) {
    __asm__ volatile("vmovaps  %0, %%ymm0" ::"m"(*(__m256i *)p)
                   : "r0"); // MS-error{{unknown register name 'r0' in asm}}
  }
  struct CFileStream {
    void operator delete(void *p) {
      f(0);  // MS-note{{called by 'operator delete'}}
    }
    CFileStream();
    virtual ~CFileStream();  // MS-note{{called by '~CFileStream'}}
  };
  
  struct CMultiFileStream {
    CFileStream m_fileStream;
    ~CMultiFileStream();
  };
  
  // This causes vtable emitted so that deleting dtor is emitted for MS.
  CFileStream::CFileStream() {}

Assuming the host compilation is on windows.

Here f() is a host device function which is unknown to be emitted, therefore the inline assembly error results in a delayed diagnostic. When f() is checked in the delete operator body, a 'delete operator -> f' edge is added to the call graph since f() is unknown to be emitted.

Since CFileStream::CFileStream is defined, clang sets vtbl to be emitted and does an explicit dtor check even though dtor is not defined. clang knows that this dtor check is for deleting dtor and will check delete operator as referenced, which causes `dtor -> delete operator' to be added to the call graph. Then clang checks dtor as referenced. Since deleting dtor will be emitted together with vtbl, clang should assume dtor is to be emitted. Then clang will found the callees 'delete operator' and f(), and emits the delayed diagnostics associated with them.

If we do not add 'dtor -> delete operator' edge to the call graph, the diagnostic msg in f() will not be emitted.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172





More information about the cfe-commits mailing list