<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/122891>122891</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Running lowermodulelds multiple times needs to be thinlto only, breaking fortran at present
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
arsenm,
ergawy,
JonChesterfield,
Pierre-vh,
jhuber6
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
JonChesterfield
</td>
</tr>
</table>
<pre>
As of https://github.com/llvm/llvm-project/pull/85626 and https://github.com/llvm/llvm-project/pull/75333 the lowermodulelds codegen pass is run as part of LTO. That doesn't work: the pass was designed to run once, as part of codegen, where it can globally allocate variables to the LDS space.
There is a narrow exemption carved out to unblock ThinLTO: provided the IR module is split into independent modules prior to codegen, with no calls or references between them, running the allocator on each subgraph works, and a second run during codegen over the entire module is a no-op. There's a partial check to notice when that invariant is breached: if the input IR has some allocated variables and some non-allocated variables, the pass aborts. That abort is the error message which @ergawy reported to me for Fortran.
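The partial check described above can be pictured with a small toy model. This is only a sketch of the stated invariant, not the pass's actual implementation: LdsVariable, its allocated flag and classify_module are hypothetical names invented for illustration and are not LLVM APIs.

```python
# Toy model only: every name here is hypothetical, none of this is LLVM code.
from dataclasses import dataclass


@dataclass
class LdsVariable:
    name: str
    allocated: bool  # True if an earlier run of the pass already placed it


def classify_module(variables):
    """Return what one run of the pass would do with this set of LDS variables."""
    states = {v.allocated for v in variables}
    if not variables or states == {True}:
        return "no-op"      # nothing left to do, e.g. the second run in codegen
    if states == {False}:
        return "allocate"   # first run: allocate everything in one go
    return "abort"          # mixed input: the invariant has been breached


# Splicing an already-lowered subgraph onto untouched IR gives the mixed case.
spliced = [LdsVariable("a", True), LdsVariable("b", False)]
print(classify_module(spliced))
```

Under these assumptions the mixed case is the only one that aborts, which matches the situation where separately lowered IR is linked with IR that has not been through the pass.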
I think the pass should be added to the ThinLTO pipeline and removed from the full LTO pipeline, so that in general it runs only once, during codegen, with the ThinLTO case as the sole exception.
Arguments could be made that the pass should cope with being run multiple times on various bits of IR which are then spliced together. The problem with running on subgraphs is that the reachability test can't be done: we don't know whether a call to an external function accesses some visible LDS, or whether a function can be called by an external kernel. A correct lowering then looks like a lot of table lookups and overallocation, at which point the user is likely to discover they've run out of LDS and/or have occupancy problems.
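To illustrate why the reachability test degrades on a subgraph, here is a minimal sketch under simplifying assumptions. The call graph is a plain dictionary, EXTERNAL is a hypothetical marker for a callee defined outside the subgraph, and reachable_lds is an invented helper, not anything from LLVM. It only covers calls out to external functions, not the converse problem of external kernels calling in.

```python
# Toy model only: hypothetical names, not LLVM code.
EXTERNAL = "external"  # stands for any callee whose body is not in this subgraph


def reachable_lds(kernel, calls, uses, all_visible_lds):
    """LDS variables a kernel may access, walking the call graph from it."""
    seen = set()
    stack = [kernel]
    result = set()
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue
        seen.add(fn)
        if fn == EXTERNAL:
            # Unknown callee: it might touch any visible LDS variable,
            # so the only safe answer is all of them.
            return set(all_visible_lds)
        result.update(uses.get(fn, set()))
        stack.extend(calls.get(fn, []))
    return result


uses = {"f": {"a"}}
visible = {"a", "b", "c"}
whole = reachable_lds("k", {"k": ["f"]}, uses, visible)
partial = reachable_lds("k", {"k": ["f", EXTERNAL]}, uses, visible)
print(sorted(whole), sorted(partial))
```

With the whole program visible the kernel only needs the variable it actually reaches. With one unknown callee it has to be treated as reaching every visible variable, which is the overallocation described above.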
Tagging Joseph as well since I think he moved the OpenMP pipeline to use LTO at some point, in which case that's also vulnerable to this pattern. I suspect we're getting by at present because O0 doesn't get much use and most compilation flows are straightforward, e.g. they don't run opt on bits of IR manually.
</pre>