<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/97580>97580</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [X86] Vector-Vector dot product not reduced to corresponding single instruction
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Hendiadyoin1
      </td>
    </tr>
</table>

<pre>
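Given the following C++ code snippets:
```c++
// Assumed definition (f32x4 is not defined in the original snippet):
// a GCC/Clang vector of 4 floats.
typedef float f32x4 __attribute__((vector_size(16)));

// Full 4-element dot product, returned as a scalar
float simple_dot_product(f32x4 a, f32x4 b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// Full dot product, broadcast to all 4 lanes
f32x4 dot_product_broadcast(f32x4 a, f32x4 b) {
    float d = a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
    f32x4 r = {d, d, d, d};
    return r;
}

// Dot product over elements 0, 2 and 3 only
float selective_dot_product(f32x4 a, f32x4 b) {
    return a[0] * b[0] + a[2] * b[2] + a[3] * b[3];
}

// Dot product over elements 0, 2 and 3, written to lanes 0, 1 and 3 (lane 2 is 0)
f32x4 selective_dot_product_selective_broadcast(f32x4 a, f32x4 b) {
    float d = a[0] * b[0] + a[2] * b[2] + a[3] * b[3];
    f32x4 r = {d, d, 0, d};
    return r;
}
```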
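
For reference, the DPPS entry on felixcloutier.com describes an 8-bit immediate: bits 4-7 select which element products enter the sum (masked-out products count as +0.0), and bits 0-3 select which result lanes receive the sum (the other lanes are zeroed), with the sum formed pairwise as (p0 + p1) + (p2 + p3). The scalar model below is a sketch of that pseudocode, not an Intel or LLVM API; `dpps_model` is a hypothetical name, and the comments list the immediates that would cover each of the four functions above.

```c++
// Assumed, as above: GCC/Clang vector of 4 floats.
typedef float f32x4 __attribute__((vector_size(16)));

// Hypothetical scalar model of DPPS.
//   imm bits 4..7: which element products enter the sum (masked out -> +0.0f)
//   imm bits 0..3: which result lanes receive the sum (other lanes -> 0.0f)
f32x4 dpps_model(f32x4 a, f32x4 b, int imm) {
    float p[4];
    for (int i = 0; i < 4; ++i)
        p[i] = ((imm >> (4 + i)) & 1) ? a[i] * b[i] : 0.0f;

    // Pairwise summation order from the pseudocode: (p0 + p1) + (p2 + p3)
    float sum = (p[0] + p[1]) + (p[2] + p[3]);

    f32x4 r = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0; i < 4; ++i)
        if ((imm >> i) & 1)
            r[i] = sum;
    return r;
}

// Immediates that would match the four functions above:
//   simple_dot_product                        -> 0xF1 (all products, lane 0)
//   dot_product_broadcast                     -> 0xFF (all products, all lanes)
//   selective_dot_product                     -> 0xD1 (products 0, 2, 3; lane 0)
//   selective_dot_product_selective_broadcast -> 0xDB (products 0, 2, 3; lanes 0, 1, 3)
```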

Clang/LLVM fails to reduce these to single `dpps` (Dot Product of Packed Single-Precision Floating-Point Values) instructions when SSE4.2 is enabled (`dpps` itself is an SSE4.1 instruction). The same may be true for the `double` case (`dppd`).

Godbolt link with hopefully correct targets:
https://godbolt.org/z/od5ezWM19

Note that this might be affected by flags that change floating-point accuracy, such as `-fassociative-math` or `-ffp-contract=*`, since using the dot product instruction might yield higher accuracy (looking at https://www.felixcloutier.com/x86/dpps, it's a bit unclear whether intermediate rounding is performed or whether this acts as a sort of multiply-add).
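
Going by the pseudocode on that page, DPPS performs each multiply and each addition as a separately rounded operation, but groups the additions pairwise, (p0 + p1) + (p2 + p3), whereas the source above adds left to right, ((p0 + p1) + p2) + p3. If that reading is right, emitting `dpps` here is a reassociation rather than a fused operation, which is the kind of change `-fassociative-math` permits. The sketch below (hypothetical helper names) shows the two orders disagreeing for one input; it assumes x86-64 SSE float arithmetic (`FLT_EVAL_METHOD == 0`) and no `-ffast-math`.

```c++
// Assumed, as above: GCC/Clang vector of 4 floats.
typedef float f32x4 __attribute__((vector_size(16)));

// Order written in simple_dot_product: ((p0 + p1) + p2) + p3
float dot_source_order(f32x4 a, f32x4 b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// Order in the DPPS pseudocode: (p0 + p1) + (p2 + p3)
float dot_pairwise_order(f32x4 a, f32x4 b) {
    return (a[0] * b[0] + a[1] * b[1]) + (a[2] * b[2] + a[3] * b[3]);
}
```

With `a = {1e8f, 1.0f, -1e8f, 1.0f}` and `b` all ones, the left-to-right version returns `1.0f` and the pairwise version returns `0.0f`: float spacing near 1e8 is 8, so `1e8f + 1.0f` and `-1e8f + 1.0f` both round back to `1e8f` and `-1e8f`.
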
Also note that pre-multiplying `a` and `b` yields better codegen without `-ffast-math` or the like, as seen in the linked collection.
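
A minimal sketch of the pre-multiplied form, assuming "pre-multiplying" means computing the element-wise product `a * b` as one vector first and then summing its lanes; the exact variants in the Godbolt collection are not reproduced here, and `simple_dot_product_premul` is a hypothetical name. The lanes are still added left to right, so the summation order matches `simple_dot_product`.

```c++
// Assumed, as above: GCC/Clang vector of 4 floats.
typedef float f32x4 __attribute__((vector_size(16)));

// Hypothetical pre-multiplied variant of simple_dot_product
float simple_dot_product_premul(f32x4 a, f32x4 b) {
    f32x4 p = a * b;                  // element-wise vector multiply
    return p[0] + p[1] + p[2] + p[3]; // same left-to-right order as the original
}
```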
</pre>