<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/61903>61903</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Use llvm.nvvm.read.ptx.sreg.warpsize() in generated IR instead of inserting constant value of 32
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
soconne
</td>
</tr>
</table>
<pre>
Currently, the LLVM IR generated for uses of warpSize in CUDA kernels hard-codes a constant value of 32. The comment in __clang_cuda_builtin_vars.h indicates that there is no built-in to retrieve it.
https://github.com/llvm/llvm-project/blob/main/clang/lib/Headers/__clang_cuda_builtin_vars.h#L114
However, PTX has a dedicated special register for this (%WARP_SZ), which LLVM exposes through the intrinsic llvm.nvvm.read.ptx.sreg.warpsize().
https://llvm.org/docs/NVPTXUsage.html#llvm-nvvm-ptr-to-gen-intrinsics
While current NVIDIA GPU architectures all use a warp size of 32 lanes, this may not always be the case, and code already compiled with the hard-coded constant could then break.
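
A minimal reproducer sketch for the behavior described above. The kernel name lane_id and the file name repro.cu are hypothetical, and sm_70 is only an assumed target architecture; any supported one should do.

```cuda
// repro.cu (hypothetical file name)
// Illustrative kernel: computes each thread's lane index from warpSize.
__global__ void lane_id(int *out) {
    // warpSize is the CUDA built-in variable this issue is about.
    out[threadIdx.x] = threadIdx.x % warpSize;
}
```

Assuming a working CUDA installation, the device-side IR can be inspected with:
clang++ -x cuda --cuda-device-only --cuda-gpu-arch=sm_70 -S -emit-llvm repro.cu -o -
Per the description above, the modulus is expected to show up with the literal 32 (or a folded equivalent) rather than a call to llvm.nvvm.read.ptx.sreg.warpsize().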
</pre>