<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/97580>97580</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[X86] Vector-Vector dot product not reduced to corresponding single instruction
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Hendiadyoin1
</td>
</tr>
</table>
<pre>
Given the following C++ snippets:
```c++
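// Assumed definition (GCC/Clang vector extension); the Godbolt link below has the original.
typedef float f32x4 __attribute__((vector_size(16)));

// Full 4-lane dot product, scalar result.
float simple_dot_product(f32x4 a, f32x4 b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}
// Full 4-lane dot product, result broadcast to every lane.
f32x4 dot_product_broadcast(f32x4 a, f32x4 b) {
    float d = a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
    f32x4 r = {d,d,d,d};
    return r;
}
// Dot product over lanes 0, 2 and 3 only, scalar result.
float selective_dot_product(f32x4 a, f32x4 b) {
    return a[0] * b[0] + a[2] * b[2] + a[3] * b[3];
}
// Dot product over lanes 0, 2 and 3, result written to lanes 0, 1 and 3.
f32x4 selective_dot_product_selective_broadcast(f32x4 a, f32x4 b) {
    float d = a[0] * b[0] + a[2] * b[2] + a[3] * b[3];
    f32x4 r = {d,d,0,d};
    return r;
}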
```
Clang/LLVM fails to reduce each of these to a single `dpps` (dot product of packed single-precision floats) instruction when SSE4.2 is enabled (`dpps` itself is available from SSE4.1). The same might be true for the `double` case (`dppd`).
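To make the expected lowering concrete, here is a rough scalar model of what one `dpps` does with its 8-bit immediate, going by the pseudocode in the instruction reference. `Vec4`, `dpps_model` and the `k...` constants are hypothetical names used only for illustration; the real intrinsic is `_mm_dp_ps(a, b, imm8)` from `smmintrin.h`.

```c++
// Scalar model of a single `dpps`; an illustration only, not LLVM code.
struct Vec4 { float lane[4]; };   // hypothetical stand-in for f32x4

// imm8 bits 4..7 select which products enter the sum,
// imm8 bits 0..3 select which result lanes receive it (the rest become 0).
Vec4 dpps_model(Vec4 a, Vec4 b, unsigned imm8) {
    float p[4];
    for (int i = 0; i != 4; ++i)
        p[i] = ((imm8 >> (4 + i)) & 1u) ? a.lane[i] * b.lane[i] : 0.0f;
    // Summation order as given in the instruction pseudocode.
    float sum = (p[0] + p[1]) + (p[2] + p[3]);
    Vec4 r;
    for (int i = 0; i != 4; ++i)
        r.lane[i] = ((imm8 >> i) & 1u) ? sum : 0.0f;
    return r;
}

// Immediates that would correspond to the four functions above.
const unsigned kSimple             = 0xF1; // all products, result in lane 0
const unsigned kBroadcast          = 0xFF; // all products, result in all lanes
const unsigned kSelective          = 0xD1; // products 0, 2, 3, result in lane 0
const unsigned kSelectiveBroadcast = 0xDB; // products 0, 2, 3, result in lanes 0, 1, 3
```

Under this reading each of the four snippets maps to one `dpps` with a different immediate, with the scalar variants reading lane 0 of the result.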
Godbolt link with hopefully correct targets:
https://godbolt.org/z/od5ezWM19
Note that this might depend on flags that affect floating-point accuracy, such as `-fassociative-math` or `-ffp-contract=*`, as using the dot product instruction might yield higher accuracy. Looking at https://www.felixcloutier.com/x86/dpps, it's a bit unclear whether intermediate rounding is performed or whether this acts as a sort of fused multiply-add.
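As far as I can tell from the pseudocode on that page, the products are summed pairwise, `(p0 + p1) + (p2 + p3)`, while the snippets above sum left to right. A minimal sketch of why that matters without reassociation being allowed (the function names are made up for illustration, and this assumes `float` arithmetic is evaluated in single precision):

```c++
// Order used by the source snippets: ((p0 + p1) + p2) + p3.
float sum_left_to_right(const float p[4]) {
    return ((p[0] + p[1]) + p[2]) + p[3];
}

// Order in the dpps pseudocode, as I read it: (p0 + p1) + (p2 + p3).
float sum_pairwise(const float p[4]) {
    return (p[0] + p[1]) + (p[2] + p[3]);
}
```

For the products `{1e8f, 1.0f, -1e8f, 1.0f}` the left-to-right sum gives `1.0f` and the pairwise sum gives `0.0f`, so the two orders are not interchangeable under strict floating-point semantics.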
Also note that pre-multiplying `a` and `b` (a vector multiply before the lane-wise sum) yields better codegen even without `-ffast-math` or the like, as seen in the linked collection.
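For reference, the pre-multiplied form looks roughly like this. The typedef and the function name are assumptions for illustration; the exact variants are in the Godbolt link.

```c++
// Assumed definition of the vector type (GCC/Clang vector extension).
typedef float f32x4 __attribute__((vector_size(16)));

// One vector multiply up front, then a horizontal sum over the four lanes.
float premultiplied_dot_product(f32x4 a, f32x4 b) {
    f32x4 p = a * b;
    return p[0] + p[1] + p[2] + p[3];
}
```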
</pre>