<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/54630>54630</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Missed vectorization in ldc2.
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          bartek-siudeja
      </td>
    </tr>
</table>

<pre>
Cross-posting two issues:
- https://github.com/ldc-developers/ldc/issues/3950
- https://github.com/AuburnSounds/intel-intrinsics/issues/86

I am not really sure where these belong, as I have practically no experience with LLVM or compiler internals. However, the pattern looks similar in both, and LLVM was mentioned in both.

A godbolt sample for the second case: https://godbolt.org/z/aKh4rWY83

Looking at the LLVM IR panel on Godbolt, the function
```
auto sqrt(double2 a)
{
    // pragma(inline, false);
    a.ptr[0] = llvm_sqrt(a.array[0]);
    a.ptr[1] = llvm_sqrt(a.array[1]);
    return a;
}
```
compiles into vectorized IR:
```
define <2 x double> @_D7example4sqrtFNaNbNiNhG2dZQg(<2 x double> %a_arg) local_unnamed_addr #0 !dbg !5 {
  %1 = call <2 x double> @llvm.sqrt.v2f64(<2 x double> %a_arg), !dbg !7 ; [#uses = 1] [debug line = app/example.d:8:5]
  ret <2 x double> %1, !dbg !8                    ; [debug line = app/example.d:10:5]
}
```
Next, the helper
```
auto get_low(double4 d4)
{
    return __ir_pure!(
        `%r = shufflevector <4 x double> %0, <4 x double> undef,
                    <2 x i32> <i32 0, i32 1>
        ret <2 x double> %r`, double2)(d4);
}
```
compiles into a single shuffle:
```
define <2 x double> @_D7example7get_lowFNaNbNiNfNhG4dZNhG2d(<4 x double> %d4_arg) local_unnamed_addr #3 !dbg !11 {
  %r.i = shufflevector <4 x double> %d4_arg, <4 x double> undef, <2 x i32> <i32 0, i32 1>, !dbg !12 ; [#uses = 1] [debug line = app/example.d:22:5]
  ret <2 x double> %r.i, !dbg !12                 ; [debug line = app/example.d:22:5]
}
```
Yet their composition
```
auto sqrt(double4 d4)
{
    return sqrt(get_low(d4));
}
```
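is not vectorized: after `sqrt(double2)` is inlined, each lane is extracted, passed to `@llvm.sqrt.f64` separately, and inserted back: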
```
define <2 x double> @_D7example4sqrtFNaNbNiNhG4dZNhG2d(<4 x double> %d4_arg) local_unnamed_addr #0 !dbg !13 {
  %a.0.vec.extract.i = extractelement <4 x double> %d4_arg, i32 0, !dbg !14 ; [#uses = 1] [debug line = app/example.d:8:5 @[ app/example.d:30:5 ]]
  %1 = tail call double @llvm.sqrt.f64(double %a.0.vec.extract.i) #1, !dbg !14 ; [#uses = 1] [debug line = app/example.d:8:5 @[ app/example.d:30:5 ]]
  %a.0.vec.insert.i = insertelement <2 x double> poison, double %1, i32 0, !dbg !14 ; [#uses = 1] [debug line = app/example.d:8:5 @[ app/example.d:30:5 ]]
  %a.8.vec.extract.i = extractelement <4 x double> %d4_arg, i32 1, !dbg !16 ; [#uses = 1] [debug line = app/example.d:9:5 @[ app/example.d:30:5 ]]
  %2 = tail call double @llvm.sqrt.f64(double %a.8.vec.extract.i) #1, !dbg !16 ; [#uses = 1] [debug line = app/example.d:9:5 @[ app/example.d:30:5 ]]
  %a.8.vec.insert.i = insertelement <2 x double> %a.0.vec.insert.i, double %2, i32 1, !dbg !16 ; [#uses = 1] [debug line = app/example.d:9:5 @[ app/example.d:30:5 ]]
  ret <2 x double> %a.8.vec.insert.i, !dbg !17    ; [debug line = app/example.d:30:5]
}
```
The final assembly is also unvectorized, on both AArch64 and x86 (x86-64 shown):
```
pure nothrow @nogc __vector(double[2]) example.sqrt(__vector(double[4])):
        sqrtsd  xmm1, xmm0
        unpckhpd        xmm0, xmm0
        sqrtsd  xmm0, xmm0
        unpcklpd        xmm1, xmm0
        movapd  xmm0, xmm1
        ret
```
If I uncomment `pragma(inline, false)` in `sqrt(double2)`, this function becomes a single `jmp` to the `double2` version. Expected output:
```
pure nothrow @nogc @safe __vector(double[2]) example.sqrt2(__vector(double[4])):
        sqrtpd  xmm0, xmm0
        ret
```
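To reproduce this without LDC, a hand-reduced LLVM IR sketch of the same pattern may help. It is an approximation, not part of the original report: the names `@sqrt2` and `@sqrt4_low` are made up, and `@sqrt2` is written in the extract/sqrt/insert form suggested by the inlined IR above rather than as LDC's exact front-end output.

```
; Hypothetical reduced repro (hand-written approximation of the D code above).
; Suggested invocation: opt -mtriple=x86_64-unknown-linux-gnu -O2 -S repro.ll

; Scalar per-lane sqrt, analogous to sqrt(double2) before vectorization.
define <2 x double> @sqrt2(<2 x double> %a) {
  %a0 = extractelement <2 x double> %a, i32 0
  %s0 = call double @llvm.sqrt.f64(double %a0)
  %v0 = insertelement <2 x double> %a, double %s0, i32 0
  %a1 = extractelement <2 x double> %a, i32 1
  %s1 = call double @llvm.sqrt.f64(double %a1)
  %v1 = insertelement <2 x double> %v0, double %s1, i32 1
  ret <2 x double> %v1
}

; Low half of a <4 x double>, then sqrt, analogous to sqrt(double4).
define <2 x double> @sqrt4_low(<4 x double> %d4) {
  %lo = shufflevector <4 x double> %d4, <4 x double> poison, <2 x i32> <i32 0, i32 1>
  %r = call <2 x double> @sqrt2(<2 x double> %lo)
  ret <2 x double> %r
}

declare double @llvm.sqrt.f64(double)
```

If the optimizer handled this as hoped, the optimized `@sqrt4_low` would reduce to the `shufflevector` feeding a single `@llvm.sqrt.v2f64` call, mirroring the `sqrtpd` output above.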
</pre>