<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/117078">117078</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Register allocation results in many copies of undef lanes for cross class copy
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:AMDGPU,
            llvm:regalloc
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          arsenm
      </td>
    </tr>
</table>

<pre>
    This testcase demonstrates that the net result of the register allocation pipeline is 14 instructions that copy undefined lanes.

[many-copies-of-undef-lanes.ll.zip](https://github.com/user-attachments/files/17837630/many-copies-of-undef-lanes.ll.zip)


```
# Output of -stop-after=register-coalescer
---
name: copies_of_undef_lanes_tuple_copy
tracksRegLiveness: true
machineFunctionInfo:
  isEntryFunction: true
  stackPtrOffsetReg: '$sgpr32'
  occupancy:       8
body: |
  bb.0:
    undef %41.sub0:sgpr_64 = S_MOV_B32 0
    undef %42.sub9:areg_512_align2 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
    %42.sub8:areg_512_align2 = V_ACCVGPR_WRITE_B32_e64 0, implicit $exec
    %41.sub1:sgpr_64 = COPY %41.sub0
    %43:vreg_64_align2 = COPY %41
    %47:sreg_64 = S_AND_B64 $exec, -1, implicit-def dead $scc

  bb.1:
    undef %67.sub1:vreg_64_align2 = COPY %42.sub9
    %67.sub0:vreg_64_align2 = COPY %42.sub8
    undef %52.sub0_sub1:vreg_512_align2 = nofpexcept V_PK_MUL_F32 8, %67, 0, 0, 0, 0, 0, 0, 0, implicit $mode, implicit $exec
    %42:areg_512_align2 = COPY %52
    %42:areg_512_align2 = V_MFMA_F32_32X32X8F16_mac_e64 %43, %43, %42, 0, 0, 0, implicit $mode, implicit $exec
    $vcc = COPY %47
    S_CBRANCH_VCCNZ %bb.1, implicit killed $vcc
    S_BRANCH %bb.2

  bb.2:
    S_ENDPGM 0

...
```

The problem is `%42:areg_512_align2 = COPY %52`. This expands to 16 instructions, one per 32-bit lane, even though only `sub0_sub1` of `%52` is defined. It could instead be rewritten to `undef %42.sub0_sub1:areg_512_align2 = COPY %52`, which should result in only 2 instructions, one for each live lane.

In the general case, we should expand partially undefined copies similarly to what SplitKit does, emitting a sequence of copies that covers only the minimum set of live lanes. I'm not sure where this should go; I guess the coalescer could do it?
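
As a rough illustration of the expansion being asked for, here is a standalone Python model. It is not LLVM code: `lanes_to_copy`, the lane numbering, and the set-of-lanes encoding are all assumptions made up for this sketch. It only shows that skipping undefined lanes turns the 16-copy expansion into 2 copies for this testcase.

```python
# Hypothetical model, not an LLVM API: a 512-bit register is treated as
# 16 32-bit lanes numbered 0..15, and `defined` is the set of lanes that
# the copy source actually defines.

def lanes_to_copy(defined, num_lanes=16):
    """Return the 32-bit lanes that need a copy, skipping undefined ones."""
    return [lane for lane in range(num_lanes) if lane in defined]

# Copying every lane, as the current expansion does.
full_expansion = lanes_to_copy(set(range(16)))

# In the testcase, %52 only defines sub0_sub1, i.e. lanes 0 and 1.
minimal_expansion = lanes_to_copy({0, 1})

print(len(full_expansion), len(minimal_expansion))  # prints "16 2"
```

The difference between the two counts is the 14 copies of undefined lanes. A real implementation would presumably work from the lane liveness information already available in the backend rather than an explicit set like this.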
</pre>