<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/78956>78956</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [PSTL] High memory usage for parallel `std::sort`
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          ruben-laso
      </td>
    </tr>
</table>

<pre>
    The memory usage of parallel `std::sort` is very high compared to the sequential version and even other parallel implementations.

The code in this [repo](https://github.com/ruben-laso/parallel-std-sort-report) is a simple test case to compare the memory usage of parallel `std::sort`, `tbb::parallel_sort` and sequential `std::sort`.
The test case has been replicated in several systems with versions of GCC 10, 11 and 12.

An example of the results (and max. memory usage according to `/usr/bin/time`) is shown in the following table:

| Executable         | Size        | Time     | Max Resident Memory |
| ------------------ | ----------- | -------- | ------------------- |
| ./pstl_sort.out    | 33554432    | 0:00.23  | 423776k             |
| ./tbb_sort.out     | 33554432    | 0:00.44  | 143952k             |
| ./seq_sort.out     | 33554432    | 0:03.32  | 134836k             |
| ./pstl_sort.out    | 1073741824  | 0:05.68  | 14236656k           |
| ./tbb_sort.out     | 1073741824  | 0:13.02  | 4207680k            |
| ./seq_sort.out     | 1073741824  | 2:07.38  | 4198124k            |

*Note: specs and details of the benchmark are in the aforementioned [repo](https://github.com/ruben-laso/parallel-std-sort-report).*

In the example, the parallel `std::sort` (`pstl_sort`) uses ~3 times more memory than the `tbb::parallel_sort` (`tbb_sort`) and the sequential `std::sort` (`seq_sort`).
It also runs faster, though.

Despite the [docs](https://github.com/llvm/llvm-project/blob/main/pstl/README.md) saying that `std::sort`` "require additional O(n) memory space for parallel execution", it seems that the memory overhead is higher than what could be expected.

Is that high memory usage a deliberate trade-off for performance?
Is the algorithm still in development to improve memory usage?
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJysVtGSmzYU_Rr55Y4ZEGDgwQ_OOrSZadpOkveOQNdGjUBEV3h3-9Bv70iY2N5sstmZ8mCPhHx0zpHuuRZE6jggbln-huX7lZhcZ-zWTg0Oay3IrBojH7efOoQee2MfYSJxRDAHGIUVWqMGtonJSZbuWLojYx3bxKAITmgfoVPHDlrTj8KiBGfAdQiEXyYcnBLaLyJlBhCDBDzhAMZ1aC_Yqh819jg44ZQZKGLxnsW7-dOTao1EUAO4ThGw_I3F0bB8z3jZOTeSJ8Vrxuujct3URK3pGa8v6hivl63W5OTa0197DOsYr7wKARQ4gENy0ApCr-KsKKh5jS-M3_lp1zTz9LL2r8U378OVPc9ARBfxF0qdIGgQB7A4atUKh9K7QnhCKzTQIznsCe6V6xbLyXP95e4OksAqScLeCb-xeDcAPoig3xyCWos0aUfAeOnX9-IhunVAtK2xUg1H71OQXE9kGa8bNTBeO9VjmA3uUmfuh_n8EA5Ga3MffikajV71FRVW3MHbB2yn8BKWx09_VP_g9fiT6vHr4L14gA9ISuLg4P3MlBV3F9D1Nw88mb4ZP315PX8BjfzNIjcfbGQmt9BJ0zzPspQv45iluziOeDoPM54WxeYzXD_fALumucH9EXCWzcMkS6ucvwBM-OUngdPITwTgNCvTlxg_a0USF2mRJSXProDzaFMujHm62eQ30D9nxXPASRrFfPE4LjZl_Pn1VjwF5p5xEaVnxllSlQnPvgM8f_Ld78b5uw00Ykuh6CQ6oTQtJdbg0Ha9sJ_BR8y5OMTB2BCEygwo__eoixi_qbZ3867n4vcB4Yc_jHzGS7aJv570ucgnQoJ_U_B1T9Ab-zUwXSfmTX4YiDPqcsxnUG_ak0byfUbLSc6_PcfbOwdCkwE7DQQHQQ7tLNJMx-4mAvdIo3Jz1LP8jTQtvei61qflaz1a8ze2zgegNg3jdS9CEHqjGK8_vN3t37-Neul1kXgM8dcJ92z3CJq4xS-TsghCSuWvg9DwB-Pl4BHO3tIoWh-oV50UQ3YqMzDOvVTlgNA3hbDbVSMzJ7QdCunz2bdvtPNR3ft1rZm0hMbfjBFbh_LGq3dntND1b7sCSNSqQSu8lVZIXJvDYWaI9mBsL4YWWVpfASEIfTRWua4HckprXwsST6jN6CvBNxjVj9acbpswS-uV3KaySiuxwm1SxHnF47wqV902aYsCD1XFedEkeGjyJI6rRMj8EKe8qpqV2vKYZ3HCeRJnVV5GbZHmpYybrIh5micJy2LshdKRP93I2ONKEU24Lcoq36y0aFBT-DfF-YD3EF56y_P9ym7DjWimI7Es1oocXVCccjr8Dfvz46ffWL6HX78x8eY8n7kfq8nq7auvZWBIjNdBwX8BAAD__9jvBCI">