[PATCH] D83906: [CodeGen] Emit a call instruction instead of an invoke if the called llvm function is marked nounwind

Wed Mar 8 17:43:34 PST 2023

dexonsmith added a comment.

In D83906#4179512 <https://reviews.llvm.org/D83906#4179512>, @wlei wrote:

> Hi @ahatanak
>
> We recently hit an issue of inconsistent codegen related to this optimization. In one build, the Clang frontend generates different LLVM IR for the same function, which originally comes from one header file. It turned out that this optimization gives different results depending on the order of function definitions, which is inherently unstable.
>
> See these two repro programs:
>
> p1.cpp: https://godbolt.org/z/bavTYEG1x
>
>   void foo() {};
>   void bar() noexcept {foo();};
>
> p2.cpp: https://godbolt.org/z/zfsnzPrE6
>
>   void foo();
>   void bar() noexcept {foo();};
>   void foo(){};
>
> Note that the codegen for bar differs between the two. In p2.cpp, the definition of the callee (foo) comes after the caller (bar), so foo is not yet known to be `nounwind` when bar is emitted; Clang only marks it after seeing foo's definition, so it still generates the `invoke`.
>
> This inconsistency affected AutoFDO: part of our work assigns consecutive numeric IDs to the basic blocks (BBs) of the CFG, so the unstable CFGs cause the BB IDs to mismatch and a lot of samples are lost.
>
> We would like to hear your feedback. We are wondering whether the FE can handle this perfectly, or whether we should just leave it to the BE. Thank you in advance!
>
> cc @hoy @modimo @wenlei

To be clear, there's no miscompile, correct?

(Also, can the backend safely turn an `invoke` of a `linkonce_odr` function that's `nounwind` into a `call`? I thought it couldn't, in case the function is de-refined to a version that's not `nounwind`. But the frontend can do it since it has access to the source and knows it can't be de-refined in that way?)
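
To make the `linkonce_odr` question concrete, here is a minimal sketch. It assumes Clang's usual behavior of emitting C++ inline functions with `linkonce_odr` linkage; the names `increment` and `caller` are hypothetical and not taken from the patch. Every copy of `increment` comes from the same source, so a conclusion drawn from that source holds for whichever copy the linker keeps, whereas a conclusion drawn from one optimized copy of the body might not.

```cpp
// Hypothetical header-style code; `increment` and `caller` are illustrative
// names and are not part of D83906.

// An inline function can be emitted in every translation unit that uses it,
// and the linker keeps a single copy. Nothing in this body can throw.
inline int increment(int x) { return x + 1; }

// A noexcept caller. If the callee might unwind, the call has to be routed
// to a terminate handler (an `invoke` in LLVM IR); if the callee is known
// not to unwind, a plain `call` is enough.
int caller(int x) noexcept { return increment(x); }
```

Which form `caller` gets can be inspected by compiling the snippet with `clang++ -S -emit-llvm` and reading the resulting IR.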

In any case, let's say the backend can do this optimization.

I wonder if this is just one example, and there could be various other (header-related) peepholes that cause similar problems for stable output. IIRC, the usual Clang approach is to emit IR that is as close to optimal as possible up front, but maybe in some situations it's desirable to delay optimizations to improve stability. Another application where that could be useful is caching.

Maybe the high level principle deserves a broader discussion on the forums. Do we want IRGen to prefer stable IR, or optimized IR? Should there be a `-cc1` flag to decide (which AutoFDO could set)?

@rjmccall, any thoughts?

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D83906