<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/122706>122706</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Global ISel packing earlier with many masks
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            llvm:globalisel,
            mlir:amdgpu
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          tpopp
      </td>
    </tr>
</table>

<pre>
    I've reduced the function a lot to hopefully make the information more useful, but in the full function the problem is more noticeably excessive. This is using commands like `llc -O3 -march=amdgcn -mcpu=gfx942 -mtriple amdgcn-amd-amdhsa -global-isel={true,false}`.

Both cases pack inputs and use `v_pk_fma_f16` instructions, but GlobalISel does the packing early and then masks the packed values to get the high/low words for various other instructions, resulting in a lot of extra masking computations, while SelectionDAG ISel inserts the packing just before the FMA instructions. I haven't yet checked whether there is some heuristic that could be tweaked to trade off the cost of the extra masking.
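
To make the "extra masking" concrete, here is a minimal Python sketch of what recovering the high/low words from a packed value involves. It is only an illustration of the bit manipulation, not the generated ISA; the helper names `pack`, `low_lane` and `high_lane` are hypothetical, and the two f16 bit patterns are borrowed from constants in the reduced IR.

```python
# Illustrative model only: a 32-bit register holding two packed 16-bit lanes.
LANE_BITS = 16
LANE_MASK = 0xFFFF


def pack(lo: int, hi: int) -> int:
    """Pack two 16-bit lane values into one 32-bit word (lo in bits 0-15)."""
    return ((hi & LANE_MASK) << LANE_BITS) | (lo & LANE_MASK)


def low_lane(word: int) -> int:
    """Recover the low 16-bit lane with a mask."""
    return word & LANE_MASK


def high_lane(word: int) -> int:
    """Recover the high 16-bit lane with a shift and a mask."""
    return (word >> LANE_BITS) & LANE_MASK


# f16 bit patterns 0x3C00 and 0x7E00 appear as constants in the reduced IR.
packed = pack(0x3C00, 0x7E00)
lo, hi = low_lane(packed), high_lane(packed)
```

Every non-packed instruction that needs a single lane pays for one of these extractions once the value has already been packed, which is why packing late (right before the packed FMA) avoids the extra work.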

```
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-p7:160:256:256:32-p8:128:128-p9:192:256:256:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:7:8:9"
target triple = "amdgcn-amd-amdhsa"

define amdgpu_kernel void @"main$async_dispatch_157_elementwise_2x1024x5120_f16xf16xf16xf32xi8"(<4 x half> %i37) {
bb:
  %i53 = fcmp olt <4 x half> %i37, zeroinitializer
  %i54 = select <4 x i1> %i53, <4 x half> zeroinitializer, <4 x half> splat (half 0xH9AC3)
  %i55 = select <4 x i1> %i53, <4 x half> splat (half 0xH3C00), <4 x half> zeroinitializer
  %i57 = select <4 x i1> %i53, <4 x half> zeroinitializer, <4 x half> splat (half 0xH95CA)
  %i59 = select <4 x i1> %i53, <4 x half> zeroinitializer, <4 x half> splat (half 0xH7E00)
  %i63 = select <4 x i1> %i53, <4 x half> zeroinitializer, <4 x half> splat (half 0xH3C00)
  %i66 = tail call <4 x half> @llvm.fma.v4f16(<4 x half> zeroinitializer, <4 x half> %i59, <4 x half> %i57)
  %i67 = tail call <4 x half> @llvm.fma.v4f16(<4 x half> zeroinitializer, <4 x half> %i66, <4 x half> %i55)
  %i68 = tail call <4 x half> @llvm.fma.v4f16(<4 x half> zeroinitializer, <4 x half> %i67, <4 x half> %i54)
  %i74 = fadd <4 x half> %i63, %i68
  %i87 = tail call <4 x half> @llvm.roundeven.v4f16(<4 x half> %i74)
  %.inv = fcmp oge <4 x half> %i87, splat (half 0xHD800)
  %i88 = select <4 x i1> %.inv, <4 x half> %i87, <4 x half> splat (half 0xHD800)
  %i90 = fptosi <4 x half> %i88 to <4 x i8>
  store <4 x i8> %i90, ptr addrspace(1) null, align 1
  ret void
}

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare <4 x half> @llvm.roundeven.v4f16(<4 x half>) #0

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare <4 x half> @llvm.fma.v4f16(<4 x half>, <4 x half>, <4 x half>) #0

; uselistorder directives
uselistorder ptr @llvm.fma.v4f16, { 2, 1, 0 }

attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
```

[reduced.gisel.txt](https://github.com/user-attachments/files/18397297/reduced.gisel.txt)
[reduced.sdisel.txt](https://github.com/user-attachments/files/18397295/reduced.sdisel.txt)
[reduced.txt](https://github.com/user-attachments/files/18397296/reduced.txt)
</pre>