<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/129538>129538</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[RISCV][EVL] Improve sdiv/udiv code generation for tail folding by EVL.
</td>
</tr>
<tr>
<th>Labels</th>
<td>
backend:RISC-V
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Mel-Chen
</td>
</tr>
</table>
<pre>
After https://github.com/llvm/llvm-project/pull/127180, the vectorizer emits vp.merge followed by a regular sdiv/udiv instead of vp.sdiv/vp.udiv when tail folding by EVL.
However, using vp.sdiv/vp.udiv may yield better performance. The improvement could come from fewer vsetvli instructions and lower vector register pressure.
The current IR and assembly for sdiv: https://godbolt.org/z/YvPhGa8df
The vp intrinsic IR and assembly for sdiv: https://godbolt.org/z/1achsE3Wo
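
For readers unfamiliar with the two forms, here is a toy scalar model in Python of why they agree on the active lanes. It is only a sketch: the helper names (sdiv, merge_then_sdiv, vp_sdiv) are made up for illustration, lists stand in for vectors, and it assumes the merge fills every lane at or past EVL with a divisor of 1.

```python
def sdiv(a, b):
    """Signed division truncating toward zero, like LLVM's sdiv.
    Raises ZeroDivisionError when b == 0 (stand-in for undefined behavior)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def merge_then_sdiv(x, y, evl):
    """Model of vp.merge + regular sdiv (hypothetical helper).
    Tail lanes get a safe divisor of 1, then every lane is divided."""
    safe_y = y[:evl] + [1] * (len(y) - evl)
    return [sdiv(a, b) for a, b in zip(x, safe_y)]

def vp_sdiv(x, y, evl):
    """Model of vp.sdiv (hypothetical helper).
    Only the first evl lanes are computed; tail lanes are left unspecified (None)."""
    head = [sdiv(a, b) for a, b in zip(x[:evl], y[:evl])]
    return head + [None] * (len(x) - evl)

x = [7, -7, 9, 5]
y = [2, 2, 0, 0]  # tail lanes hold zero divisors
evl = 2

print(merge_then_sdiv(x, y, evl))  # active lanes: 3, -3
print(vp_sdiv(x, y, evl))          # active lanes: 3, -3
```

Both forms avoid dividing by the zero divisors in the tail. The merge form does extra work to build the safe divisor and keeps it live in a vector register, which is the overhead this issue is about.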
It is not yet clear at which stage this optimization should be applied; this needs more discussion.
Labeling this as a RISC-V backend issue for now.
</pre>