<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/63606>63606</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [SelectionDAG] Poor codegen involving CONCAT_VECTORS
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:WebAssembly,
            llvm:codegen
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          tlively
      </td>
    </tr>
</table>

<pre>
In https://reviews.llvm.org/D154124, I have a test case with this LLVM IR involving a vector reduction:

```llvm
target triple = "wasm32-unknown-emscripten"

declare i1 @llvm.vector.reduce.or.v64i1(<64 x i1>)

define i1 @test_any_v64i8(<64 x i8> %x) {
  %bits = trunc <64 x i8> %x to <64 x i1>
  %ret = call i1 @llvm.vector.reduce.or.v64i1(<64 x i1> %bits)
  ret i1 %ret
}
```

Compiling it for WebAssembly with SIMD enabled (`llc -mattr=+simd128`) produces the following initial SelectionDAG:

```
    t0: ch,glue = EntryToken
              t2: v16i8 = WebAssemblyISD::ARGUMENT TargetConstant:i32<0>
              t4: v16i8 = WebAssemblyISD::ARGUMENT TargetConstant:i32<1>
              t6: v16i8 = WebAssemblyISD::ARGUMENT TargetConstant:i32<2>
              t8: v16i8 = WebAssemblyISD::ARGUMENT TargetConstant:i32<3>
            t9: v64i8 = concat_vectors t2, t4, t6, t8
          t10: v64i1 = truncate t9
        t11: i64 = bitcast t10
      t14: i1 = setcc t11, Constant:i64<0>, setne:ch
    t15: i32 = any_extend t14
  t16: ch = WebAssemblyISD::RETURN t0, t15

```
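For reference, the scalar meaning of nodes t9 through t14 can be modeled in plain C. This is a minimal sketch, not backend code: the function name `dag_model_any_v64i8` is hypothetical, and it assumes little-endian lane-to-bit numbering for the `v64i1` to `i64` bitcast (the final `setne 0` test gives the same answer for any bit ordering).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar model of t9..t14 in the initial DAG.
   x holds the 64 lanes of t9 (t2, t4, t6, t8 laid end to end). */
bool dag_model_any_v64i8(const uint8_t x[64]) {
    uint64_t mask = 0;
    for (int i = 0; i < 64; ++i) {
        /* t10: truncate to i1 keeps only bit 0 of each i8 lane. */
        uint64_t bit = x[i] & 1u;
        /* t11: bitcast packs the 64 i1 lanes into one i64 (assumed lane i -> bit i). */
        mask |= bit << i;
    }
    /* t14: setcc setne against 0 is true iff at least one lane bit is set. */
    return mask != 0;
}
```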

This all looks reasonable so far, but subsequent type legalization starts down a path of terrible codegen choices, culminating in more than 300 instructions to implement this function. (With only 128-bit SIMD available, neither `v64i8` nor `v64i1` is a legal type, so the legalizer has to break the concatenated vector apart again.) I could add custom handling in the WebAssembly backend to fix this, but it looks like it may be a more target-independent problem. Would it be beneficial for the target-independent SelectionDAG code to hoist the `concat_vectors` above the rest of the operations here? If it did, we could efficiently compute the truncation and reduction independently on each of the four original input vectors, then cheaply combine the results at the end.
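The hoisting idea can be sanity-checked at the scalar level: because OR is associative, reducing each 16-lane argument separately and OR-ing the four `i1` results gives the same answer as reducing the concatenated 64-lane vector. The sketch below is a plain C model of that equivalence, not SelectionDAG code; the function names are hypothetical. How each per-chunk step would map onto 128-bit WebAssembly SIMD instructions is an assumption left open here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reference model of the IR: trunc <64 x i8> to <64 x i1>
   keeps bit 0 of each lane, then vector.reduce.or ORs all 64 lanes. */
bool any_v64i8_reference(const uint8_t x[64]) {
    bool acc = false;
    for (int i = 0; i < 64; ++i)
        acc = acc || (x[i] & 1u) != 0;
    return acc;
}

/* Reduce one 16-lane (v16i8-sized) chunk: OR the bytes together, then
   test bit 0, which equals OR-ing the truncated i1 lanes of the chunk. */
static bool any_v16i8(const uint8_t chunk[16]) {
    uint8_t bits = 0;
    for (int i = 0; i < 16; ++i)
        bits |= chunk[i];
    return (bits & 1u) != 0;
}

/* "Hoisted" form: reduce each of the four 16-lane arguments (t2, t4, t6, t8)
   independently, then combine the four i1 results with a scalar OR. */
bool any_v64i8_split(const uint8_t x[64]) {
    return any_v16i8(x) || any_v16i8(x + 16) ||
           any_v16i8(x + 32) || any_v16i8(x + 48);
}
```

Both functions agree on every input, which is the property a target-independent combine would rely on when moving the reduction above the `concat_vectors`.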
</pre>