[llvm-dev] Incorrect code generation when using -fprofile-generate on code which contains exception handling (Windows target)

Chrulski, Christopher M via llvm-dev llvm-dev at lists.llvm.org
Tue Jan 14 10:58:59 PST 2020

Thanks for the link. I agree, it looks like another instance of the same class of problem. Since the compiler still requires funclet operand bundles, it sounds like one of the three options below will be needed until the longer-term project of eliminating them is done. Does anybody have a strong preference?


-----Original Message-----
From: Shoaib Meenai <smeenai at fb.com> 
Sent: Monday, January 13, 2020 1:10 AM
To: Chrulski, Christopher M <christopher.m.chrulski at intel.com>; llvm-dev at lists.llvm.org
Cc: Reid Kleckner <rnk at google.com>
Subject: Re: [llvm-dev] Incorrect code generation when using -fprofile-generate on code which contains exception handling (Windows target)

I think this is the same underlying issue as https://bugs.llvm.org/show_bug.cgi?id=40320. CCing Reid, who's had a bunch of thoughts on this in the past.

On 1/11/20, 10:25 AM, "llvm-dev on behalf of Chrulski, Christopher M via llvm-dev" <llvm-dev-bounces at lists.llvm.org on behalf of llvm-dev at lists.llvm.org> wrote:

    I've run into a bug in the LLVM backend that generates incorrect code when -fprofile-generate is used on programs that contain C++ exception handling and target Windows.
    The problem occurs when value profiling inserts function calls into exception handling blocks. The instrumentation inserts value profiling intrinsic calls, and these are subsequently lowered into target library calls. However, these library calls do not get a funclet operand bundle associated with them. This causes the Windows Exception Handling Preparation Pass to drop all the instructions within the exception handler starting from the PGO instrumentation call, and replace them with 'unreachable'. This is done by the function removeImplausibleInstructions (WinEHPrepare.cpp).
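The pruning described above can be illustrated with a toy model. This is a minimal sketch in plain C++, not LLVM's implementation: `ToyInst` and `pruneImplausible` are hypothetical names, pad tokens are modeled as plain integers, and the real removeImplausibleInstructions handles further cases (such as non-throwing intrinsics) that are ignored here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an IR instruction (hypothetical, not an LLVM type).
struct ToyInst {
  std::string text;  // printable form of the instruction
  bool isCall;       // true for call instructions
  int funcletToken;  // pad named by the "funclet" bundle, or -1 if none
};

// Models the pruning for one block that belongs to the funclet `pad`:
// the first call whose "funclet" bundle is missing or names another pad
// is replaced by `unreachable`, and everything after it is dropped.
std::vector<ToyInst> pruneImplausible(const std::vector<ToyInst>& block, int pad) {
  assert(pad >= 0 && "pad ids are non-negative in this toy model");
  std::vector<ToyInst> kept;
  for (const ToyInst& inst : block) {
    if (inst.isCall && inst.funcletToken != pad) {
      kept.push_back(ToyInst{"unreachable", false, -1});
      break;
    }
    kept.push_back(inst);
  }
  return kept;
}
```

In this model, a handler block containing a profiling call without a bundle followed by a call that carries the bundle loses both calls; giving the profiling call the matching token leaves the block intact.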
    Below is a simple reproducer that leads to incorrect code for the method test::run(). The virtual function call inside the exception handler triggers the bug when -fprofile-generate is used.
      #include <stdexcept>
      #include <iostream>
      extern void may_throw(int);
      class base {
      public:
        base() : x(0) {};
        int get_x() const { return x; }
        virtual void update() { x++; }
        int x;
      };
      class derived : public base {
      public:
        derived() {}
        virtual void update() { x--; }
      };
      class test {
      public:
        void run(base* b, int count) {
          try {
            for (int i = 0; i < count; ++i)
              may_throw(i);
          }
          catch (std::exception& e) {
            // Virtual function call in exception handler for value profiling.
            b->update();
          }
        }
      };
      void run_test() {
        test tester;
        base *obj = new derived;
        tester.run(obj, 100);
        std::cout << "Value in obj (should be -1): " << obj->get_x() << "\n";
        if (obj->get_x() == -1)
          std::cout << "test passed\n";
        else
          std::cout << "test failed\n";
      }
      int main() {
        // Without PGO, test runs and prints result.
        // With -fprofile-generate, program seg-faults without printing.
        run_test();
        return 0;
      }
      void may_throw(int x) {
        if (x > 10)
          throw std::range_error("value out of range");
      }
    On Windows, build with: clang -O2 -fprofile-generate test.cpp
    When profiling is enabled, the program segfaults without printing anything. Without the -fprofile-generate flag, the program runs successfully.
    The compiler problem is as follows: Prior to the Windows Exception Handling Preparation Pass, the IR for the function "test::run" contains the following:
    19:                                               ; preds = %17
      %20 = catchpad within %18 [%rtti.TypeDescriptor19* @"??_R0?AVexception@std@@@8", i32 8, %"class.std::exception"** %6]
      %21 = load i64, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @"__profc_?run@test@@QEAAXPEAVbase@@H@Z", i64 0, i64 2), align 8
      %22 = add i64 %21, 1
      store i64 %22, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @"__profc_?run@test@@QEAAXPEAVbase@@H@Z", i64 0, i64 2), align 8
      %23 = bitcast %class.base* %1 to void (%class.base*)***
      %24 = load void (%class.base*)**, void (%class.base*)*** %23, align 8, !tbaa !9
      %25 = load void (%class.base*)*, void (%class.base*)** %24, align 8
      %26 = ptrtoint void (%class.base*)* %25 to i64
      call void @__llvm_profile_instrument_target(i64 %26, i8* bitcast ({ i64, i64, i64*, i8*, i8*, i32, [2 x i16] }* @"__profd_?run@test@@QEAAXPEAVbase@@H@Z" to i8*), i32 0)
      call void %25(%class.base* %1) [ "funclet"(token %20) ]
      call void @_CxxThrowException(i8* null, %eh.ThrowInfo* null) #15 [ "funclet"(token %20) ]
    After this pass, the block has been reduced to the following, which breaks the original program. This happens because the instrumentation call to "__llvm_profile_instrument_target" does not carry the funclet operand bundle [ "funclet"(token %20) ].
    19:                                               ; preds = %17
      %20 = catchpad within %18 [%rtti.TypeDescriptor19* @"??_R0?AVexception@std@@@8", i32 8, %"class.std::exception"** %6]
      %21 = load i64, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @"__profc_?run@test@@QEAAXPEAVbase@@H@Z", i64 0, i64 2), align 8
      %22 = add i64 %21, 1
      store i64 %22, i64* getelementptr inbounds ([3 x i64], [3 x i64]* @"__profc_?run@test@@QEAAXPEAVbase@@H@Z", i64 0, i64 2), align 8
      unreachable
    Possible solutions:
      1) Avoid value profiling of calls within exception handling blocks
        Pros: Solves the problem
        Cons: Could lose some cases of value profiling, but since the exception code is not supposed to be the primary execution path, this should not be a significant performance issue.
      2) Propagate the funclet information onto the value profiling intrinsics that are created, and then also propagate it to the library routines these intrinsics are lowered into.
          For indirect function calls, the funclet information can be copied from the original function call.
          However, MemIntrinsic calls whose operands are value profiled do not have funclet operand bundles attached by the front end. (I'm not sure it is possible to attach them, because the interfaces used to create these calls do not take operand bundles.) Therefore, PGO would need to run colorEHFunclets to determine which funclet operand bundle to attach to the instrumentation calls. Unfortunately, because a basic block can be associated with multiple funclets, or with both a funclet and code outside any funclet, this may also require cloning some basic blocks, similar to the WinEHPrepare.cpp routine cloneCommonBlocks(), before computing the instrumentation.
        Pros: does not disable value profiling opportunities.
        Cons: complex to implement due to the need to determine the appropriate funclet to place on the memory operand value profiling calls. The same cloning would also have to be done in the PGO use compilation.
      3) Teach the Windows Exception Preparation Pass about the value profiling library functions. Currently this pass ignores LLVM intrinsic functions that are marked with the 'does not throw' attribute, but by this point the value profiling intrinsic calls have already been lowered into calls to target-specific runtime library functions.
        Pros: does not disable value profiling opportunities
        Cons: requires exposing function names from InstrProf.h to the WinEHPrepare.cpp file, or requires a new attribute on the function calls to identify them as instrumentation library calls. Also, the IR would not accurately reflect the funclet operand bundle state of the PGO-inserted function calls.
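The decision each option implies can be sketched as a toy model. This is a hedged illustration in plain C++ rather than LLVM code: `Policy`, `Action`, `Plan`, `planSite`, and `isProfilingRuntimeCall` are hypothetical names, funclet colors are modeled as integers with 0 standing for code outside any funclet, and the only runtime function name used is the one that appears in the IR above.

```cpp
#include <cassert>
#include <set>
#include <string>

// Color used in this toy model for code that is not inside any funclet.
const int kNormalCode = 0;

enum class Policy { SkipInHandlers, PropagateBundle };  // options 1 and 2
enum class Action { InstrumentPlain, Skip, AttachBundle, CloneFirst };

struct Plan {
  Action action;
  int pad;  // pad to name in the "funclet" bundle, or -1 when not applicable
};

// `colors` is the set of funclet colors of the block holding the candidate
// site, as a coloring analysis might report it (assumed representation).
Plan planSite(const std::set<int>& colors, Policy policy) {
  assert(!colors.empty() && "a reachable block has at least one color here");
  bool onlyNormal = colors.size() == 1 && *colors.begin() == kNormalCode;
  if (onlyNormal) return Plan{Action::InstrumentPlain, -1};
  // Option 1: give up on value profiling inside handlers.
  if (policy == Policy::SkipInHandlers) return Plan{Action::Skip, -1};
  // Option 2: an ambiguous block must be cloned before a bundle can be chosen.
  if (colors.size() > 1) return Plan{Action::CloneFirst, -1};
  return Plan{Action::AttachBundle, *colors.begin()};
}

// Option 3: a name-based check the preparation pass could consult before
// treating a bundle-less call as implausible.
bool isProfilingRuntimeCall(const std::string& callee) {
  return callee == "__llvm_profile_instrument_target";
}
```

The `CloneFirst` outcome is what makes option 2 costly: it only arises for blocks shared between funclets or between a funclet and normal code, but handling it requires the same cloning in both the instrumentation and the use compilation.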
    For option 2 or 3 to work, the PGO indirect call promotion pass used for -fprofile-use must also preserve the 'funclet' operand bundle on the specialized direct call it inserts. Fortunately, that pass clones the original indirect call, so the 'funclet' operand bundle is preserved.
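Why cloning preserves the bundle can be shown with a small toy model. This is only a sketch under stated assumptions: `ToyCall`, `promoteByCloning`, and `promoteFromScratch` are hypothetical names, and the promotion pass itself is not reproduced here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a call instruction (hypothetical, not an LLVM type).
struct ToyCall {
  std::string callee;                // "" models an indirect call
  std::vector<std::string> bundles;  // operand bundle tags, e.g. "funclet"
};

// Promotion by cloning: copy the whole call, then retarget the copy.
ToyCall promoteByCloning(const ToyCall& indirect, const std::string& target) {
  assert(!target.empty() && "promotion needs a concrete direct target");
  ToyCall direct = indirect;  // bundles are copied with everything else
  direct.callee = target;
  return direct;
}

// Contrast: building the direct call from scratch loses the bundles.
ToyCall promoteFromScratch(const ToyCall&, const std::string& target) {
  return ToyCall{target, {}};
}
```

A promoted call produced by `promoteByCloning` still carries the "funclet" tag, while the from-scratch variant would reintroduce the problem described in this thread.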
    Any thoughts on which of these options should be taken, or other suggestions for resolving this problem?
