<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/123262>123262</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [clang++][aarch64] help optimize __builtin_mul_overflow performance
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            clang
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          eric-yq
      </td>
    </tr>
</table>

<pre>
    Hi team, I have sample code that runs about 10 times slower when compiled with clang++ than with g++.
The main performance issue is in `__builtin_mul_overflow` under clang++.
Could you give some suggestions? I do not want to use both g++ and clang++ in my CI/CD pipeline.

Compile commands and `Time taken` comparison (0.22 seconds vs. 0.02 seconds):
```shell
# Ubuntu 24.04, g++ 13.3 and clang++ 18.1.3
# Server: AWS c7g.xlarge (AWS Graviton3, Neoverse-V1)

# g++ -std=c++17 -O3 -march=armv8-a+crc testint.cpp -o testint-g++
# ./testint-g++ 
Time taken for 10000000 iterations: 0.0208047 seconds
Sum of results: 9747553088193654009

# clang++ -std=c++17 -O3 -march=armv8-a+crc testint.cpp -o testint-clang++ --rtlib=compiler-rt
# ./testint-clang++ 
Time taken for 10000000 iterations: 0.226598 seconds //// ( 0.22 seconds vs. 0.02 seconds. )
Sum of results: 18269431752893742105
```

Sample code: testint.cpp
```cpp
#include <iostream>
#include <chrono>
#include <random>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>
// Define a 128-bit integer type (if the compiler supports it)
using int128_t = __int128;
// Function under benchmark
inline bool int128_mul_overflow(int128_t a, int128_t b, volatile int128_t* c) {
    return __builtin_mul_overflow(a, b, c);
}
// Generate a random 128-bit integer
int128_t generate_random_int128() {
    static std::mt19937_64 rng(std::random_device{}());
    std::uniform_int_distribution<uint64_t> dist(0, std::numeric_limits<uint64_t>::max());
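    // Generate two 64-bit integers and combine them into one 128-bit integer.
    // Combine in unsigned arithmetic: shifting a signed value out of range is undefined in C++17.
    unsigned __int128 high = dist(rng);
    unsigned __int128 low = dist(rng);
    return static_cast<int128_t>((high << 64) | low);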
}
// Generate random data and store it in a vector
std::vector<std::pair<int128_t, int128_t>> generate_random_data(int count) {
    std::vector<std::pair<int128_t, int128_t>> data;
    data.reserve(count);
    for (int i = 0; i < count; ++i) {
        int128_t a = generate_random_int128();
        int128_t b = generate_random_int128();
        data.emplace_back(a, b);
    }
    return data;
}
// Benchmark function
void benchmark_int128_mul_overflow(const std::vector<std::pair<int128_t, int128_t>>& data) {
    int128_t c = 0; 
    unsigned __int128 sum = 0; // Accumulate results (unsigned, so overflow wraps instead of being undefined)
    auto start = std::chrono::high_resolution_clock::now();
    for (const auto& pair : data) {
        if (int128_mul_overflow(pair.first, pair.second, &c)) {
            sum += c; // Accumulate results so the loop is not optimized away
        }
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> duration = end - start;
    std::cout << "Time taken for " << data.size() << " iterations: " << duration.count() << " seconds\n";
    std::cout << "Sum of results: " << static_cast<uint64_t>(sum) << "\n"; // Print the accumulated result
}
int main() {
    int iterations = 10000000; // Adjust the iteration count as needed
    auto data = generate_random_data(iterations);
    benchmark_int128_mul_overflow(data);
    return 0;
}
```
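A possible workaround, sketched under assumptions: it is an assumption (not confirmed in this report) that clang on AArch64 lowers the signed 128-bit `__builtin_mul_overflow` to a runtime-library call in compiler-rt (such as `__muloti4`), which would be consistent with the need for `--rtlib=compiler-rt` above. The hypothetical helper `mul_overflow_i128` below follows the builtin's contract. It stores the wrapped product using plain unsigned 128-bit multiplication and detects overflow from the 64-bit halves of the operand magnitudes, so it should not need a library call. The helper `check_against_builtin` is also hypothetical and only compares the two versions. Whether this is actually faster on Graviton3 has not been measured here.

```cpp
#include <cassert>
#include <cstdint>

using i128 = __int128;
using u128 = unsigned __int128;

// Hypothetical replacement for __builtin_mul_overflow(a, b, res) on __int128.
// *res always receives the low 128 bits of the exact product (two's-complement
// wrap), and the return value is true iff the exact product does not fit.
inline bool mul_overflow_i128(i128 a, i128 b, i128* res) {
    // Unsigned multiplication yields the low 128 bits of the exact product for any signs.
    *res = static_cast<i128>(static_cast<u128>(a) * static_cast<u128>(b));

    const bool negative = (a < 0) != (b < 0);
    // Magnitudes in unsigned arithmetic, so |INT128_MIN| = 2^127 is representable.
    const u128 ua = a < 0 ? u128(0) - static_cast<u128>(a) : static_cast<u128>(a);
    const u128 ub = b < 0 ? u128(0) - static_cast<u128>(b) : static_cast<u128>(b);

    const uint64_t a_hi = static_cast<uint64_t>(ua >> 64), a_lo = static_cast<uint64_t>(ua);
    const uint64_t b_hi = static_cast<uint64_t>(ub >> 64), b_lo = static_cast<uint64_t>(ub);

    // Both high halves nonzero: |a * b| >= 2^128.
    if (a_hi != 0 && b_hi != 0) return true;

    // At most one cross term is nonzero, so this sum cannot wrap.
    const u128 cross = static_cast<u128>(a_hi) * b_lo + static_cast<u128>(a_lo) * b_hi;
    if ((cross >> 64) != 0) return true;  // cross * 2^64 >= 2^128

    const u128 low = static_cast<u128>(a_lo) * b_lo;
    const u128 mag = low + (cross << 64);
    if (mag < low) return true;  // carry out of the 128-bit addition

    // INT128_MAX = 2^127 - 1; a negative result may reach magnitude 2^127.
    const u128 limit = (u128(1) << 127) - (negative ? u128(0) : u128(1));
    return mag > limit;
}

// Hypothetical self-check: the sketch must agree with the builtin on one input pair.
inline void check_against_builtin(i128 a, i128 b) {
    i128 expected = 0, actual = 0;
    const bool expected_ovf = __builtin_mul_overflow(a, b, &expected);
    const bool actual_ovf = mul_overflow_i128(a, b, &actual);
    assert(expected_ovf == actual_ovf);
    assert(expected == actual);
}
```

To try it in the benchmark, `int128_mul_overflow` could call `mul_overflow_i128` with a non-volatile temporary and then copy that temporary into `*c`. Both binaries would then need to be timed again.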
</pre>