<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/74673>74673</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[OpenMP] Unsound deduplication: omp_get_thread_limit
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
MattPD
</td>
</tr>
</table>
<pre>
Here's a simple example:
```
#include <omp.h>
#include <stdio.h>
int main()
{
#pragma omp target thread_limit(4)
{
printf("\n1:target thread_limit: %d\n", omp_get_thread_limit());
}
#pragma omp target thread_limit(3)
{
printf("\n2:target thread_limit: %d\n", omp_get_thread_limit());
}
return 0;
}
```
GCC-compiled binary execution: https://gcc.godbolt.org/z/Pjv3TWoTq
```
1:target thread_limit: 4
2:target thread_limit: 3
```
Clang/LLVM-compiled binary execution: https://clang.godbolt.org/z/zdPbrdMPn
```
1:target thread_limit: 4
2:target thread_limit: 4
```
By my reading of the OpenMP spec GCC does the right thing here; cf. <https://www.openmp.org/spec-html/5.2/openmpse12.html#x34-330002.4>:
> If a target construct with a thread_limit clause is encountered, the thread-limit-var ICV from the data environment of the generated initial task is instead set to an implementation defined value between one and the value specified in the clause.
The common subexpression elimination (CSE) of the second call to `omp_get_thread_limit` by LLVM does not seem to be correct, as it's not an available expression at any program point(s) (in the scope of the clause in question) after the second target construct with a `thread_limit` clause is encountered.
Compiling with `-Rpass=openmp-opt -Rpass-analysis=openmp-opt -Rpass-missed=openmp-opt` we have:
https://clang.godbolt.org/z/G7dfhP7jh
```
<source>:8:42: remark: OpenMP runtime call omp_get_thread_limit deduplicated. [OMP170] [-Rpass=openmp-opt]
8 | printf("\n1:target thread_limit: %d\n",omp_get_thread_limit());
| ^
```
OMP170 has the following explanation: https://openmp.llvm.org/remarks/OMP170.html
> This optimization remark indicates that a call to an OpenMP runtime call was replaced with the result of an existing one. This occurs when the compiler knows that the result of a runtime call is immutable. Removing duplicate calls is done by replacing all calls to that function with the result of the first call. This cannot be done automatically by the compiler because the implementations of the OpenMP runtime calls live in a separate library the compiler cannot see.
This optimization will trigger for known OpenMP runtime calls whose return value will not change.
What's interesting is that the above example for OMP170 shows `omp_get_thread_limit`, too (notably without any intervening changes to thread_limit, like in the reproducer in question).
At the same time I do not believe we have an analysis checking whether this precondition holds here: "This occurs when the compiler knows that the result of a runtime call is immutable."
I presume that in order to determine the "immutable" part we'd need a form of flow-sensitive analysis that would enable us to check for and implement the invalidation ("kill" in the dataflow analysis sense) of the-result-of-the-call-to`omp_get_thread_limit` as an available expression (or other functions listed in `DeduplicableRuntimeCallIDs`)--in other words, anything that would allow us to express that prior calls (like `__kmpc_set_thread_limit`) affect the result of subsequent calls of `omp_get_thread_limit`; cf. the commit implementing the ICV, https://github.com/llvm/llvm-project/commit/d7b12004bd7d6d9a592f1773101cbedd9daf8492
AFAICT, such analysis doesn't appear to exist in the original patch introducing deduplication, either:
- https://github.com/llvm/llvm-project/commit/9548b74a831ea005649465797f359e0521f3b8a9
- https://reviews.llvm.org/D69930
> This initial patch deduplicates `__kmpc_global_thread_num` and `omp_get_thread_num` calls. We can also identify arguments that are equivalent to such a call result and use it instead. Later we can determine "gtid" arguments based on the use in kernel functions etc.
In particular, cf. `deduplicateRuntimeCalls(Function &F, RuntimeFunctionInfo &RFI, Value *ReplVal = nullptr)` in "llvm/lib/Transforms/IPO/OpenMPOpt.cpp". AFAICT, the checked precondition boils down to the presence of _use_:
```
auto &Uses = RFI.UsesMap[&F];
if (Uses.size() + (ReplVal != nullptr) < 2)
return false;
```
If the required flow-sensitive analysis does exists, we could perhaps invalidate the relevant entry (if that's supported?) in `OMPInformationCache::RuntimeFunctionInfo &RFI`; cf. the use of `OMPInfoCache.RFIs` in `deduplicateRuntimeCalls`
https://github.com/llvm/llvm-project/blob/2b76e20ea782790a78ec58d5f94ce88a173bab7f/llvm/lib/Transforms/IPO/OpenMPOpt.cpp#L1467
FWIW, there is an optimization (constant folding) that seems somewhat more precise (i.e., checking for the presence of the `OMPThreadLimitClause` such that `ThreadLimit` from that clause `isIntegerConstantExpr`), cf.
- [OpenMP] Creating the `omp_target_num_teams` and `omp_target_thread_limit` attributes to outlined functions
- https://reviews.llvm.org/D106298
- https://github.com/llvm/llvm-project/commit/0276db14167b9348904322084e7fc1a04cc72452
The relevant logic being this bit:
```
const Expr *CGOpenMPRuntime::getNumThreadsExprForTargetDirective(
CodeGenFunction &CGF, const OMPExecutableDirective &D,
int32_t &DefaultVal)
. . .
Expr *ThreadLimit = nullptr;
Expr *NumThreads = nullptr;
if (D.hasClausesOfKind<OMPThreadLimitClause>()) {
const auto *ThreadLimitClause = D.getSingleClause<OMPThreadLimitClause>();
ThreadLimit = ThreadLimitClause->getThreadLimit();
if (ThreadLimit->isIntegerConstantExpr(CGF.getContext()))
if (auto Constant =
ThreadLimit->getIntegerConstantExpr(CGF.getContext()))
DefaultVal = Constant->getExtValue();
}
```
This is further refined in [OpenMP] Folding threadLimit and numThreads when single value in kernels, https://reviews.llvm.org/D106033
That being said, this is a more special case of _constant_ folding, so there may not be a comparable analysis for (non-constant) available expressions in general.
I do not believe we'd lose the constant folding optimization if we were to revert deduplication, and if the deduplication has only been implemented for immutable functions I don't see how it can be correct (with the particular case of ThreadLimit being already covered by the aforementioned constant folding, the actually constant case appears to be subsumed, too).
One idea for the fix in this context: Treat `omp_get_thread_limit` exactly like `omp_get_max_threads`.
What this means in practice is to remove it from `DeduplicableRuntimeCallIDs`, effectively reverting the addition in this commit (noting that `omp_get_max_threads` is not present in `DeduplicableRuntimeCallIDs`, so it's possible this addition was incorrect in the first place):
- [OpenMP][Opt] Annotate known runtime functions and deduplicate more,
- https://github.com/llvm/llvm-project/commit/e28936f6137c5a9c4f7673e248c192a9811543b6#diff-de101c82aff66b2bda2d1f53fde3dde7b0d370f14f1ff37b7919ce38531230dfR123
Another idea--if this is possible and correct here--would be to special-case the particular case of `omp_get_thread_limit` (possibly applicable to other similar functions?) in `deduplicateRuntimeCalls` to only perform the deduplication within a more limited scope (perhaps a basic block--or basic blocks--corresponding to the nested scope of the applicable target construct with a thread_limit clause that's in effect?).
Thoughts?
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJy0Wt1y2zqSfhr6BiUVBUqieOELR7ayrk02rownuXSBRFPCGAQYAJSs8_Rb3SD1Y8ueZM-OzyknEkB0o3--_roZ4b1aG4DrZPYpmd1eiS5srLv-KkJ4uL0qrdxf_xc4SHjumWBeNa0GBi8C_0yymyS9TdKbZJ72_8ePPFOm0p0ElmRL27TjTZLdXVrzQSp7ukq_lQmsEcokfJHwol_JPw1bioRnrRPrRjDbtCwIt4bAwsaBkE9aNSokfDE9PpkWZw-nReuUCTWdzpPZ0kyS7ObSKdkNS_hM4hbcyZco72kN4emVMJTFiyQ7kZLfnt7oN5XO_kBp_h9VunAQOmdYelw_LL5yNv3-vFyOKtu0SoNkpTLC7Rm8QNUFZQ3qtAmh9RgwfJXw1bqqxmsrS6vD2Lp1wld_JXz18K9t9vjTPv66KOgDL03jjg9Mkn2g-1ILgyp8-fLj659cosLnLlzjL_lQOvn1wfz_X2P6wTU-7VmzZ7hfmTWzNQsbYN9aMF8fmG-hYp-XSyYteFpwar3B83HvBhM8-8SqeoxJeX7L3W43ti2Ypu2viGeNNqHRCV_Nxjzhq7jsYcLH8fvsJZuOsixNUz6eYnIfgCK7Y_c1E0P8V9b44LoqsJ0KG_z-5MKs0qLzwJRnYCrbmQAOJEY0XiDuHNHO0VY4dr_8wWpnG1qVIggGZqucNQ2YMNhjDQacCCCZMioooVkQ_hlFKOMDCMk8mt0yYRhhHT4s0P9MQnK3Sm6WyWJiQLKt0B2wEsIOwDBrgAkjSUZcQTOpwxMkkFbjpcbs1HWP-L1tGmuY70p4aR14jzIBr2ei_IQvlv-4S3gx3MVDZY1kldAaNU7m6cVMn6es3DMM7uh9YwPzAA0-U6Jc56AKaFbhGSJDHvcIw8RWKC1KQvyDTgKX9qx1du1Ew1qrDMKJR80Svuiv6SvbwqDp4EjDfnXgKZt4wUQdwJ3e5L2gSObp6xtdDI3xWVZTImN40ynJPB19b4X3SXYb43Vk28DidyNhhN57dXmxUd6DPFtCFXbANmJ7LIK_Bw6fc1lvHvJ_bS5ncrb0tnMVxKRZJNnNFOGAOWiEe8a_9RntOhNUA9H7l_zOJMiu1arCaB-zZPbp29eHSZ4ms1v8cMEYyaxHeMYYW7AkX7LXP__nyvl7NSgKuST4N3-S2d0HCBkNwDYiQmBttbY7jBB4abWIafYW5nvw03rb9H6MzvAJX8UTI-qdlvvsjj1ulGe2DapRf8UEjo8xZSQ5BZXAXDoksDAXnbsTnjlotahAxlgm_AbfaYI1YRi8KB8I9Q2Me8lV1TnPdhvoYSfWNceejd31ol-dcy4WIbFpuoDpP2bfobFblHCIKdrlcZtE8Cv3vZK4CZ-Py8FGUXVnKjLChQuQL5TzgZ7p9a-EQRAqIR4vumAbERTu2KOwszuVUBEc4JfnqO1fFcLTK3qm1ZZQSTAPrcC6wLQqHdb9s_N7ZTzAeADs187dKXSiU-s1OFbbaOeLDkWnWI8GIIYVqwU9jjKqjTBrOEOynxsRUVkRzEVPqxMfitJuD6SchPeR7jfo6_cLA9VSaxG2jUVP78k_tosQT_K2YFBe1Kt36GkOL5lWzzBUNwets7KrwL1C-7Mb3US9vWiAkV3umbQs-lsr2MIArlSEemxm1QaqZwL0DYQNlQ7lWeuoeCjywsZq6Xs-gxDE_wO5gIB2cpd71MB3eBM8RxlmnUTlLJMQwDXKxMBMOD8cknDOWuEC22FnJZkBkEyg6xoUX2u7G3kwXgVFRugtQBJ2ttOSgaG63JFHyDLkeOQghwyI-WC2Qit54BAJ589Ka9Sg9xlyJZR4lIOi4cg0RtEwI1uP8BOaZBTsB3xD-HfZQ8IX1jFL_htAARPRR06GwXo7FK5Sw_fohqXQ-v7WU8gWoxEamU7YWSc9URezj1z2xEQC4b23UK9BXG6dsq7PxYQvKH6Tefr09Ny01ZO_lCfIVmqoXseJ70oPvzq0dTzO1h-lW8-x-xjEIn3wVdQdkMXifV61SypsunJc2SbhKyxE_R-j1tl_EXlbxfMSvpJ5OeFpOi1lLueyELOC15M8zybppCpBykKKejEtzmL4ZnVzv3xEub6rNsc4QL5oEp4HJtoWhIuWVD4MoWOdWisjNGtFqDYIGJT9VCgO9IMAYMlAocuOrQD9Hv2Nmxaz6aLMp2KRTUCk6Ww-LabzWV7kdTYrIJ3xSZ2VC1FcluRgq2DnT-v67bwosvRiIR-6hXjPE2rlTyJnrW0p9OB10zWUC0ZeCIl-kYJmzH4izhgmtLdMSYyGes-EW3cYGQNPcMDgV6e2QlNq295XEaH6kERhxIvD0M-M2ReBNHsXRRwhKeF8HZREHDhKKoUHyWx0bk_Zn8EZ0CfJCqE6Q_N7Q1imqk4Lh56mPnKenljpJI19whergQ0kfL7CJ_r14ft7U2NRmn9f3ePqDyqRCb_5Dq3-ITRLsltmOq3b4JA-zlNCDs6HgFFlwlePThiPiIpE7f7hG9I1qsbf2jCu2jbhfMyOkU9JiTgK8rymlFZpTIWdicUPCPDBVNTePHUent4bhCElReaCd_mnx1jJbtn31f0YP3wVbTL7RBaY3Z5RYFUjLOGesVd_QeTJLOGf8OuDDfjk3AzYujN-mCBFQtyTjFpoD8dRziVyfF_32ParUw7kuzWIekgCAQJejCtC2xbcRrT-WG-gP0_DVpjAwAS3pwaxpoAmRuO7trUuYHe1wivEAvDt6wNGgGsIO5ai2lCXld18ECbn6IqhG6G4P4sOGX9f3fshWt6Pz8Euf4xMpbYYeLzM58BTEPmC50Uq8gVUs4Wc1cW0gsVCTPKsFGVen5zzmwGbfZlM5_mp11Y_73_20euoHRbmnJcmfEH9NLqgtloqs0ZDE6R4gMYzbxvY4cfGOortSnkgR42R7CyPvAsJxuv4J2ZDZn4kZPuCtW5J3TlamjCKhCXz9GQHrvXDGnEY9CTzVPl7E2ANbtkrfffSuliBe2Q5rx7Y1pKNsK1dOhCHQtpjbuxOEW-fAojGv4LkfvkNgQnBqbILkfDaLmhlMCcGDPz9kjJJ57xY_P1il_J8LsvJdDLPyyKbLop0mnGeLqaQ19VEpNOqyvl0xl-PlQ75p-1aVayEaB_lWUl9-kVEoIhhaHpE3eXnaOI-SWIqriH8T9dEj3rcubLukYx5qxxUiBmIWwcoWloJn8GcIv_yM2F_FPbt68MdjVqR9R2OwG23CV-eYKMJGX8KtAC16HT4IfQZ6I3xv-PH4RYnwXdWP06Bd9h7vNm7WyNG3443wsdo99_q_1ZGJtnyYi5kd4dxBztM9iNGRwP0deLmzaOkwu14DeEfyqw1DAf-GznnM5XXt3_z4CjJ7tYQTlP00jnx2ie78LnLScsXy88rVHtpTYCXk3HPeYkaDiUDDCegkuebhp9XstcQ_p5wxo5xRKYZzulPv3sJxD8umOPjVyKRN3pWd456FQc1gQgWn1PUWkVU7tvq6CJEKHMMQupcPXm_nxcceJl_2y28A0Nplp2rJ0IPB16ofqweNRaxFNAEW2hWiVhNn4ZC8nSsJEvmbV98GrHvW3gkpbZphaPm78AcsHzQpMGMhpOosbrQJiKN6Gf1-pxrvh0UUAOtbT_-eV3szquhqpGw7FDdYJmDLbjwtlOhJjqWtrM1mhxao_esBDh5P4CVwbrjiOCEK6O-sYPyAGxjd0jNkYgfJ-9ok8NU7EikD2Y_Td3oL6Hxmz2r7BaQqvXjMFFbF3tJi3H2tuxHkiuq0NEU7bCBJMUOz_dvBbCx7Zr-bYu1rwc43wxgoyIOlKBWL7EnVB7PpZTLbtgjVuSP3kzAi6iC3rOhBx_2NeKl34sV-804LApqQMRIaZ2ogqqIAJFfG7ulJogoxr8fKSwZUHevtqD3fVgMPELIvg84Xo869zgzO4wc3tUddcKAjaQp_N6Mg9KqfxfTWu8VRhVJP6izE3j1IYb6fjyOUWlWTGB18z5bor8HBKAbY2xAvh5HlsPk6xjEmA8nZJnQ4VCU_w6pAb4osnk9n2R5NRNFNa3zeZ4Bny6qScFFsZhMZtOsnCc8k6quRxIm6aRacFHX83nJSym4nNSzrJaQSQl5mcosT-vJtJ7UdZaXeTEpKsgWs2zCs1TW3yf8DANvTBwjYSiPRpTyEQEPNserDzZGlBuN4mypJADpEXJECfRO-n4Q_Alf9IL2mH59OBDjJLW8ahQedaSdp13S--0LnYAw1YKjkeJbJEPAocE3IT1pBLJ_aYdq9c2cYKXwSBu1rZ5HI-tOP_vRiEzjW-yVMRFie2zAHw_rW4TT6_3Bq99wHH33KRpNMD6vZLZbbwJZh764kteZLLJCXMH1JE8nRZoV-eJqc13OZAG1mMlUcJ6XdVFnU5mmdVov0nwqF1fqmqc8m_B0zrN0PsnHOaRlOZM1gICJ5LNkmkIjlD4U1yvlfQfX-XSeZ1dalKA9_VMazg3sGC3Su7LbK3dNaVB2a59MU40N9PGUoIKmf4Nz5AX_NN52Z5nXv566FE5XndPXf5yIpB52m6T-_wYAAP__QoLFCg">