[PATCH] D70172: [CUDA][HIP] Fix assertion due to dtor check on windows

Thu Nov 14 13:02:16 PST 2019

yaxunl added a comment.

In D70172#1745998 <https://reviews.llvm.org/D70172#1745998>, @rnk wrote:

> Are we sure using both Itanium and MS C++ ABIs at the same time is really the best way forward here? What are the constraints on CUDA that require the Itanium ABI? I'm sure there are real reasons you can't just use the MS ABI as is, but I'm curious what they are. Was there some RFC or design showing that this is the right way forward?
>
> I wonder if it would be more productive to add new, more expansive attributes, similar to `__attribute__((ms_struct))`, that tag class or function decls as MS or Itanium C++ ABI. CUDA could then leverage this as needed, and it would be much easier to construct test cases for MS/Itanium interop. This is an expansion in scope, but it seems like it could be generally useful, and if we're already going to enter the crazy world of multiple C++ ABIs in a single TU, we might as well bite the bullet and do it in a way that isn't specific to CUDA.

We are not using Itanium ABI when we do host compilation of CUDA/HIP on windows. During the host compilation on windows only MS C++ ABI is used.

This issue is not due to mixing MS ABI with Itanium ABI.

This issue arises from the delayed diagnostics for CUDA/HIP. Basically we do not want to emit certain diagnostics (e.g. error in inline assembly code) in `__host__` `__device__` functions to avoid clutter. We only want to emit such diagnostics once we are certain these functions will be emitted in IR.

To implement this, clang maintains a call graph. For each reference to a function, clang checks the current context. If it is evaluating context and it is a function, clang assumes the referenced function is callee and its context is the caller. Clang checks if the caller is known to be emitted (if it has body and external linkage). If not, clang adds this caller/callee pair to the call graph. If the caller is known to be emitted, clang will check if the callee is known to be emitted. If so, do nothing. If the callee is not known to be emitted, clang will eliminate it and all its callee from the call graph, and emits the delayed diagnostics associated with them.

You can see a caller is added to the call graph only if it is not known to be emitted. Therefore clang has an assert that if a callee is known to be emitted, it should not be in the call graph.

On windows, when vtable is known to be emitted for a class, clang does a body check for dtor of the class. It makes the dtor as the context, then checks the dtor. I think it is to emulate the situation that a deleting dtor is calling a normal dtor. This happens if the dtor is not defined since otherwise the dtor has already been checked. Since dtor is not defined yet, it is not known to be emitted and put into call graph. Later on, if the dtor is defined, it will be checked again. This time it is known to be emitted, then clang finds that it is in the call graph, then the assert fails.

So the issue is that clang incorrectly assume the dtor is not known to be emitted in the first check and put it in the call graph. To fix that, a map is added to Sema to tell clang that it is checking a deleting dtor which is supposed to be emitted even if it is not defined.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172