<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/56737">56737</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[RISCV] Generalize performFP_TO_INTCombine to vectors
</td>
</tr>
<tr>
<th>Labels</th>
<td>
backend:RISC-V,
performance
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
preames
</td>
</tr>
</table>
<pre>
We have custom lowering for (fp_to_int (ftrunc X)) and variants in the scalar domain. Unless I'm missing something, we can extend this handling into the vector domain as well.
Unlike the scalar domain, we don't have static rounding modes on vector ops for all the variants; ftrunc is the only one directly supported via static rounding (vfcvt.rtz.*). For all the others, we'd have to save the dynamic rounding mode (frm), change it, then restore it. That's relatively expensive, but I think it's probably still worthwhile.
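As a rough scalar analogy for the "preserve, change, change back" sequence described above, here is a minimal C++ sketch using the standard <cfenv> interface. It models an ffloor + fp_to_int pair; the helper name floor_to_si_via_mode is hypothetical and this is not the RISC-V lowering itself, just the shape of the work involved.

```cpp
#include <cfenv>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not an LLVM API): converts with round-toward-negative-
// infinity by temporarily switching the dynamic rounding mode.
std::int64_t floor_to_si_via_mode(double x) {
  volatile double input = x;            // keep the compiler from folding the conversion
  const int saved = std::fegetround();  // 1. preserve the current rounding mode
  std::fesetround(FE_DOWNWARD);         // 2. change it to round-down
  // 3. convert using the dynamic mode; the volatile store keeps the
  //    conversion ahead of the restore below.
  volatile long long result = std::llrint(input);
  std::fesetround(saved);               // 4. change it back
  return result;
}
```

The two extra mode writes around a single conversion are the cost being weighed against the longer expansion of the rounding op.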
Here's a test with current codegen inline:
define <vscale x 1 x i64> @trunc_nxv1f64_to_si(<vscale x 1 x double> %x) {
; CHECK-LABEL: trunc_nxv1f64_to_si:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetvli a0, zero, e64, m1, ta, mu
; CHECK-NEXT: vfcvt.rtz.x.f.v v9, v8
; CHECK-NEXT: vfcvt.f.x.v v9, v9
; CHECK-NEXT: lui a0, %hi(.LCPI15_0)
; CHECK-NEXT: fld ft0, %lo(.LCPI15_0)(a0)
; CHECK-NEXT: vmflt.vv v0, v9, v8
; CHECK-NEXT: lui a0, %hi(.LCPI15_1)
; CHECK-NEXT: fld ft1, %lo(.LCPI15_1)(a0)
; CHECK-NEXT: vfadd.vf v10, v9, ft0
; CHECK-NEXT: vmerge.vvm v9, v9, v10, v0
; CHECK-NEXT: vfabs.v v10, v8
; CHECK-NEXT: vmflt.vf v0, v10, ft1
; CHECK-NEXT: vfsgnj.vv v9, v9, v8
; CHECK-NEXT: vmerge.vvm v8, v8, v9, v0
; CHECK-NEXT: vfcvt.rtz.x.f.v v8, v8
; CHECK-NEXT: ret
  %a = call <vscale x 1 x double> @llvm.trunc.nxv1f64(<vscale x 1 x double> %x)
  %b = fptosi <vscale x 1 x double> %a to <vscale x 1 x i64>
  ret <vscale x 1 x i64> %b
}
declare <vscale x 1 x double> @llvm.trunc.nxv1f64(<vscale x 1 x double>)
define <vscale x 1 x i64> @si_only(<vscale x 1 x double> %x) {
; CHECK-LABEL: si_only:
; CHECK: # %bb.0:
; CHECK-NEXT: vsetvli a0, zero, e64, m1, ta, mu
; CHECK-NEXT: vfcvt.rtz.x.f.v v8, v8
; CHECK-NEXT: ret
  %b = fptosi <vscale x 1 x double> %x to <vscale x 1 x i64>
  ret <vscale x 1 x i64> %b
}
Unless I'm missing something, we should be able to emit the same codegen for these two.
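The reason the two functions above should produce the same code is that fptosi already rounds toward zero, so the preceding trunc is redundant. A minimal scalar C++ sketch of that equivalence follows; the function names are made up for illustration, and it assumes inputs that fit in a 64-bit signed integer (out-of-range inputs are undefined behavior in C++ and poison in LLVM IR, so they are out of scope either way).

```cpp
#include <cmath>
#include <cstdint>

// Models `fptosi (llvm.trunc x)`: round toward zero, then convert.
std::int64_t trunc_then_fptosi(double x) {
  return static_cast<std::int64_t>(std::trunc(x));
}

// Models a bare `fptosi x`: the conversion itself already rounds toward zero.
std::int64_t fptosi_only(double x) {
  return static_cast<std::int64_t>(x);
}
```

For any in-range x the two helpers agree, which is what lets the combine drop the explicit trunc.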
Note there's also a saturating (SAT) version of this transform. We should do the same there.
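For the saturating case, the same redundancy argument applies. Below is a hedged scalar C++ model, assuming the semantics of llvm.fptosi.sat (NaN becomes 0, out-of-range values clamp to the integer limits); fptosi_sat_model is a hypothetical name used only for this sketch and says nothing about how the backend would lower it.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical scalar model of a saturating signed conversion:
// NaN -> 0, out-of-range values clamp, in-range values round toward zero.
std::int64_t fptosi_sat_model(double x) {
  if (std::isnan(x))
    return 0;
  if (x >= 9223372036854775808.0)   // 2^63, the smallest double above INT64_MAX
    return std::numeric_limits<std::int64_t>::max();
  if (x <= -9223372036854775808.0)  // -2^63 == INT64_MIN
    return std::numeric_limits<std::int64_t>::min();
  return static_cast<std::int64_t>(x);
}
```

Since trunc preserves NaN, infinities, and which side of the range a value falls on, fptosi_sat_model(std::trunc(x)) matches fptosi_sat_model(x) for every input in this model.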
</pre>