[PATCH] D136764: [MachineCombiner][RISCV] Add fmadd/fmsub/fnmsub instructions patterns
Anton Sidorenko via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 16 05:02:40 PST 2022
asi-sc added a comment.
In D136764#3930123 <https://reviews.llvm.org/D136764#3930123>, @asi-sc wrote:
> I do have a draft work that will improve machine combiner logic to deal with this problem. In my opinion, combining instructions that are separated by a call is a doubtful from performance point of view thing and we must do additional check in these situations. However, I don't see how we can fix it exactly in this patch. Do we agree that sometimes additional moves are inserted, their origin is clear, and the problem is likely to be addressed in further patches? I don't expect major performance issues caused by exactly this behavior and didn't observe any. Or maybe there are any suggestions on how to address it in this patch?
I updated my draft patch to handle the situation described. Now locally there is no difference in asm instructions for both Whets-N5 (sin/cos) and Whets-N8(exp/sqrt) comparing to the baseline version. However, there is still the same execution time difference. So, it seems to me as architecture-related thing and not instructions combining issue.
Also, according to llvm-mca report (`llvm-mca -march=riscv64 -mcpu=sifive-u74`) for the unrolled loop body of N5 (sin/cos), there should not be so dramatic difference (diff in cycles is less than 1%):
baseline
Iterations: 100
Instructions: 15300
Total Cycles: 380004
Total uOps: 18500
Dispatch Width: 2
uOps Per Cycle: 0.05
IPC: 0.04
Block RThroughput: 560.0
this patch
Iterations: 100
Instructions: 16000
Total Cycles: 383504
Total uOps: 19200
Dispatch Width: 2
uOps Per Cycle: 0.05
IPC: 0.04
Block RThroughput: 567.0
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D136764/new/
https://reviews.llvm.org/D136764
More information about the llvm-commits
mailing list