<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/117078>117078</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Register allocation results in many copies of undef lanes for cross class copy
</td>
</tr>
<tr>
<th>Labels</th>
<td>
backend:AMDGPU,
llvm:regalloc
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
arsenm
</td>
</tr>
</table>
<pre>
This testcase demonstrates that the net result of the register allocation pipeline is 14 instructions that copy undefined lanes.
[many-copies-of-undef-lanes.ll.zip](https://github.com/user-attachments/files/17837630/many-copies-of-undef-lanes.ll.zip)
```
# Output of -stop-after=register-coalescer
---
name: copies_of_undef_lanes_tuple_copy
tracksRegLiveness: true
machineFunctionInfo:
isEntryFunction: true
stackPtrOffsetReg: '$sgpr32'
occupancy: 8
body: |
bb.0:
undef %41.sub0:sgpr_64 = S_MOV_B32 0
undef %42.sub9:areg_512_align2 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
%42.sub8:areg_512_align2 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
%41.sub1:sgpr_64 = COPY %41.sub0
%43:vreg_64_align2 = COPY %41
%47:sreg_64 = S_AND_B64 $exec, -1, implicit-def dead $scc
bb.1:
undef %67.sub1:vreg_64_align2 = COPY %42.sub9
%67.sub0:vreg_64_align2 = COPY %42.sub8
undef %52.sub0_sub1:vreg_512_align2 = nofpexcept V_PK_MUL_F32 8, %67, 0, 0, 0, 0, 0, 0, 0, implicit $mode, implicit $exec
%42:areg_512_align2 = COPY %52
%42:areg_512_align2 = V_MFMA_F32_32X32X8F16_mac_e64 %43, %43, %42, 0, 0, 0, implicit $mode, implicit $exec
$vcc = COPY %47
S_CBRANCH_VCCNZ %bb.1, implicit killed $vcc
S_BRANCH %bb.2
bb.2:
S_ENDPGM 0
...
```
The problem is `%42:areg_512_align2 = COPY %52`. This expands to a set of 16 instructions, one per 32-bit lane, even though only `sub0_sub1` of `%52` is ever defined, so 14 of them copy undefined lanes. The copy could be rewritten to `undef %42.sub0_sub1:areg_512_align2 = COPY %52`, which should result in only 2 instructions for the live lanes.
In the general case, we should expand partially undefined copies the way SplitKit does, as a sequence of copies covering only the minimum set of live lanes. I'm not sure where this should go; I guess the coalescer could do it?
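
To make the instruction counts concrete, here is a toy model of the proposed expansion. It is only a sketch and not LLVM code: `expand_copy` is a hypothetical helper, the tuple is modeled as 16 independent 32-bit lanes, and the emitted strings merely imitate MIR syntax.

```python
def expand_copy(dst, src, num_lanes, live_lanes):
    """Return one per-lane copy for each live lane, in lane order.

    Hypothetical helper for illustration; lanes are plain integers
    standing in for 32-bit subregister indices (sub0, sub1, ...).
    """
    copies = []
    for lane in range(num_lanes):
        if lane not in live_lanes:
            continue  # undefined lane: nothing meaningful to copy
        # The first partial def of dst carries `undef`, as in the MIR above.
        prefix = "undef " if not copies else ""
        copies.append(f"{prefix}{dst}.sub{lane} = COPY {src}.sub{lane}")
    return copies


# Copying every lane of a 512-bit tuple versus only the two defined lanes.
full = expand_copy("%42", "%52", 16, set(range(16)))
minimal = expand_copy("%42", "%52", 16, {0, 1})

print(len(full), len(minimal), len(full) - len(minimal))
for line in minimal:
    print(line)
```

With only lanes 0 and 1 live, the model emits 2 copies instead of 16, which matches the 14 redundant instructions seen in the testcase.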
</pre>