<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/136368>136368</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Inefficient codegen for `copysign(known_zero_sign_bit, x)`
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
dzaima
</td>
</tr>
</table>
<pre>
The code:
```c
double native(double x) {
double bound = fabs(x) > M_PI/2 ? M_PI/2 : 0;
return copysign(bound, x);
}
```
via `-O3 -march=haswell` generates:
```asm
.LCPI0_1:
.quad 0x3ff921fb54442d18
.LCPI0_2:
.quad 0x7fffffffffffffff
native:
vmovddup xmm1, qword ptr [rip + .LCPI0_2] ; 0x7fffffffffffffff
vandpd xmm2, xmm0, xmm1
vmovsd xmm3, qword ptr [rip + .LCPI0_1] ; PI/2
vcmpltsd xmm2, xmm3, xmm2
vandpd xmm2, xmm2, xmm3 ; xmm2 == bound
vandnpd xmm0, xmm1, xmm0
vandpd xmm1, xmm2, xmm1 ; unnecessary! could be just xmm2
vorpd xmm0, xmm1, xmm0
ret
```
which has an extraneous `vandpd` masking out the sign bit of `bound`, even though that bit is always 0. Moreover, manually doing the more efficient bitwise arithmetic still results in the same suboptimal code.
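For illustration, a manual bitwise version could look roughly like the sketch below. This is not necessarily the exact code behind the Compiler Explorer link; the names `manual` and `HALF_PI` are hypothetical, and `HALF_PI` is assumed to equal `M_PI/2`. Since `bound` is either `+0.0` or `+PI/2`, its sign bit is already clear, so the sign of `x` can be ORed in without masking `bound` first:
```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constant, assumed equal to M_PI/2 (bits 0x3ff921fb54442d18). */
#define HALF_PI 1.5707963267948966

/* Reference version, same logic as the snippet in the report. */
double native(double x) {
    double bound = fabs(x) > HALF_PI ? HALF_PI : 0.0;
    return copysign(bound, x);
}

/* Manual version: bound is never negative, so only the sign bit of x
   needs to be extracted and ORed into bound's bit pattern. */
double manual(double x) {
    double bound = fabs(x) > HALF_PI ? HALF_PI : 0.0;
    uint64_t xbits, bbits;
    memcpy(&xbits, &x, sizeof xbits);       /* bit-cast without aliasing UB */
    memcpy(&bbits, &bound, sizeof bbits);
    bbits |= xbits & UINT64_C(0x8000000000000000); /* copy sign of x */
    memcpy(&bound, &bbits, sizeof bound);
    return bound;
}
```
Both functions should return identical results for every input, including negative zero and small negative values, where the result is `-0.0`.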
The better assembly would be:
```asm
vandpd xmm1, xmm0, xmmword ptr [rip + .LCPI2_0] ; extract sign
vandpd xmm0, xmm0, xmmword ptr [rip + .LCPI2_1] ; mask out sign
vmovsd xmm2, qword ptr [rip + .LCPI2_2] ; PI/2
vcmpltsd xmm0, xmm2, xmm0
vandpd xmm0, xmm0, xmm2
vorpd xmm0, xmm1, xmm0
ret
```
Compiler explorer link, plus the manual impl: https://godbolt.org/z/dv14nn39x (as an aside, there's an easily-avoidable `vmovq xmm1, xmm0` there; perhaps from the inline assembly workaround messing with register allocation?)
</pre>