<table border="1" cellspacing="0" cellpadding="8">

    <tr>

        <th>Issue</th>

        <td>

            <a href="https://github.com/llvm/llvm-project/issues/60996">60996</a>

        </td>

    </tr>

    <tr>

        <th>Summary</th>

        <td>

            [Modules] Disappointing O(n^2) scaling of compile times with modules and optimizations

        </td>

    </tr>

    <tr>

      <th>Labels</th>

      <td>

            new issue

      </td>

    </tr>

    <tr>

      <th>Assignees</th>

      <td>

      </td>

    </tr>

    <tr>

      <th>Reporter</th>

      <td>

          davidstone

      </td>

    </tr>

</table>

<pre>

    Given the following set of translation units:

```cpp

export module a;

#define REPEAT8(x) \

        (x), \

        (x), \

        (x), \

        (x), \

        (x), \

        (x), \

        (x), \

        (x)

export unsigned a() {

        unsigned x = 0;

        REPEAT8(REPEAT8(REPEAT8(REPEAT8(REPEAT8(REPEAT8(++x))))));

        return x;

}

```

```cpp

export module b;

import a;

export unsigned b() {

        return a();

}

```

```cpp

export module c;

import b;

unsigned c() {

        return b();

}

```

Compiling with optimizations turned on (say, `-O3`) takes a few seconds for `a`, which is reasonable (`a` is designed to be expensive to compile and optimize in this example). But then it also takes a few seconds for `b`, and a few seconds for `c`. My expectation was that the compiler would compile and optimize the function `a` once, when compiling `module a`, and that importing modules would then be fast. The net effect of this problem is that after converting a program to use modules (and not using any module partitions), the last few modules each take about 30 minutes to compile, as each high-level module optimizes almost my entire program serially.
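
To make concrete why `a` is expensive: each `REPEAT8` level multiplies the number of `++x` expressions by 8, so six nested levels produce 8^6 = 262144 of them. The sketch below checks that arithmetic at compile time. The helper names `increments_for_depth` and `two_level_demo` are made up for illustration and are not part of the reproducer.

```cpp
// Same REPEAT8 as in module a, written on one line.
#define REPEAT8(x) (x), (x), (x), (x), (x), (x), (x), (x)

// Hypothetical helper: number of `++x` copies produced by `depth`
// nested uses of REPEAT8 (8 multiplied by itself `depth` times).
constexpr unsigned long long increments_for_depth(unsigned depth) {
        unsigned long long n = 1;
        for (unsigned i = 0; i < depth; ++i) {
                n *= 8;
        }
        return n;
}

// Two levels of nesting, small enough to evaluate cheaply.
constexpr unsigned two_level_demo() {
        unsigned x = 0;
        REPEAT8(REPEAT8(++x));
        return x;
}

static_assert(two_level_demo() == 64, "two levels expand to 8 * 8 increments");
static_assert(increments_for_depth(6) == 262144, "six levels, as in module a");
```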

This is effectively quadratic scaling of compile times -- `a` does all the work of `a`, `b` does all the work of `a` + `b`, `c` does all the work of `a` + `b` + `c`, etc.
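
As a rough sketch of that scaling, here is a toy cost model. It is an assumption for illustration, not a measurement of Clang: `own_cost[i]` stands for the time needed to optimize only the functions defined in module `i` of a linear import chain, and both function names are hypothetical.

```cpp
#include <numeric>
#include <vector>

// Toy model of what I observe: module i re-optimizes everything it
// transitively imports, so it pays for modules 0..i.
unsigned long long total_with_reoptimization(const std::vector<unsigned long long>& own_cost) {
        unsigned long long total = 0;
        unsigned long long prefix = 0;  // cost of modules 0..i
        for (unsigned long long c : own_cost) {
                prefix += c;
                total += prefix;
        }
        return total;
}

// Toy model of what I expected: each module's functions are optimized once.
unsigned long long total_optimize_once(const std::vector<unsigned long long>& own_cost) {
        return std::accumulate(own_cost.begin(), own_cost.end(), 0ULL);
}
```

With 100 modules of unit cost in a chain, the first model gives 100 * 101 / 2 = 5050 units of work and the second gives 100.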

I understand how we got to this point -- we want the generation of the .pcm file to happen as quickly as possible, because then any downstream consumer that doesn't care about optimizations can get access to that data quickly (just precompiling `b` took about 300 milliseconds on my machine, and precompiling `c` took about 15 milliseconds). However, this seems to sell out the common case of compiling your program.

</pre>
