<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/78485">78485</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            OpenMP excessive power consumption for waiting threads
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          marioroy
      </td>
    </tr>
</table>

<pre>
    Re-posting from https://forums.developer.nvidia.com/t/openmp-excessive-power-consumption-for-waiting-threads/279272

"The OpenMP power consumption test passes the `-p` argument to `primes1` or `primes3`, which produces ordered output: one thread writes at a time while the other threads wait their turn. I expect the waiting threads to be idle or to use little CPU. That is not the case: I see the full 6400% CPU utilization (AMD Threadripper 3970X, 64 logical CPU threads) while printing prime numbers to /dev/null. GNU GCC, by contrast, uses just 173% for the same test."

I also see near 6400% CPU utilization with clang for the same power consumption test, during ordered output.

**[Prime Demos](https://github.com/marioroy/mce-sandbox/tree/main/demos)**

```text
gcc -o primes1.gcc -O3 -fopenmp -I../src primes1.c -lm
clang -o primes1.clang -O3 -fopenmp -I../src primes1.c -lm
nvc -o primes1.nvc -O3 -mp=multicore -I../src primes1.c -lm

gcc -o primes3.gcc -O3 -fopenmp -I../src primes3.c -L/usr/local/lib64 -lprimesieve -lm
clang -o primes3.clang -O3 -fopenmp -I../src primes3.c -L/usr/local/lib64 -lprimesieve -lm
nvc -o primes3.nvc -O3 -mp=multicore -I../src primes3.c -L/usr/local/lib64 -lprimesieve -lm
```

**OpenMP Ordered Power Consumption Test**

```text
Threadripper 3970X idle (browser NV forums page) 120 watts

./primes1.gcc   1e10 -p >/dev/null   10.173 secs, 201 watts
./primes1.clang 1e10 -p >/dev/null   12.729 secs, 288 watts
./primes1.nvc   1e10 -p >/dev/null   21.346 secs, 322 watts

./primes3.gcc   1e10 -p >/dev/null    7.092 secs, 181 watts
./primes3.clang 1e10 -p >/dev/null    8.876 secs, 274 watts
./primes3.nvc   1e10 -p >/dev/null   11.080 secs, 361 watts
```

**OpenMP Performance Test**

```text
Threadripper 3970X idle (browser NV forums page) 120 watts

./primes1.gcc   1e12                 16.168 secs, 399 watts
./primes1.clang 1e12                 16.274 secs, 395 watts
./primes1.nvc   1e12                 14.780 secs, 393 watts

./primes3.gcc   1e12                  5.762 secs, 437 watts
./primes3.clang 1e12                  6.277 secs, 434 watts
./primes3.nvc   1e12                  5.755 secs, 442 watts
```

I first witnessed the power consumption issue using Codon.

https://github.com/exaloop/codon/issues/456

Is it okay for waiting threads to spin the CPU during ordered or exclusive blocks? I wonder whether cloud customers may be paying for extra power consumption simply because threads are waiting their turn. The Intel oneAPI compilers are also impacted.
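
For reference, the pattern in question can be reduced to a minimal sketch like the one below. This is a hypothetical reproducer, not code from `primes1.c` or `primes3.c`: the names `chunk_work` and `ordered_fill` are made up for illustration, and the summing loop merely stands in for the sieving work. Each iteration computes in parallel, then waits for its turn inside the `ordered` region.

```c
/* Hypothetical reduced reproducer; not taken from primes1.c or primes3.c. */

/* Stand-in for the per-chunk work (sieving in the real demos). */
static long chunk_work(int i)
{
    long sum = 0;
    for (int k = 0; k <= i; k++)
        sum += k;
    return sum;
}

/* Fills out[0..n-1] strictly in iteration order; returns the count written. */
int ordered_fill(long *out, int n)
{
    int pos = 0;
    #pragma omp parallel for ordered schedule(static, 1)
    for (int i = 0; i < n; i++) {
        long r = chunk_work(i);   /* computed in parallel */
        #pragma omp ordered
        {
            out[pos++] = r;       /* one thread at a time, in loop order */
        }
    }
    return pos;
}
```

Threads that finish `chunk_work` early have to wait at the `ordered` region, and that wait is where the CPU utilization difference would show up. The standard `OMP_WAIT_POLICY=passive` environment variable asks the runtime to prefer sleeping over spinning; whether it changes the behavior of this particular wait in each runtime is an assumption I have not verified.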




</pre>