[PATCH] D136764: [MachineCombiner][RISCV] Add fmadd/fmsub/fnmsub instructions patterns

Wed Nov 16 05:02:40 PST 2022

asi-sc added a comment.

In D136764#3930123 <https://reviews.llvm.org/D136764#3930123>, @asi-sc wrote:

> I do have a draft work that will improve machine combiner logic to deal with this problem. In my opinion, combining instructions that are separated by a call is a doubtful from performance point of view thing and we must do additional check in these situations. However, I don't see how we can fix it exactly in this patch. Do we agree that sometimes additional moves are inserted, their origin is clear, and the problem is likely to be addressed in further patches? I don't expect major performance issues caused by exactly this behavior and didn't observe any. Or maybe there are any suggestions on how to address it in this patch?

I updated my draft patch to handle the situation described. Now locally there is no difference in asm instructions for both Whets-N5 (sin/cos) and Whets-N8(exp/sqrt) comparing to the baseline version. However, there is still the same execution time difference. So, it seems to me as architecture-related thing and not instructions combining issue.

Also, according to llvm-mca report (`llvm-mca -march=riscv64 -mcpu=sifive-u74`) for the unrolled loop body of N5 (sin/cos), there should not be so dramatic difference (diff in cycles is less than 1%):
baseline

  Iterations:        100      
  Instructions:      15300    
  Total Cycles:      380004   
  Total uOps:        18500    

  Dispatch Width:    2        
  uOps Per Cycle:    0.05     
  IPC:               0.04     
  Block RThroughput: 560.0    

this patch

  Iterations:        100
  Instructions:      16000  
  Total Cycles:      383504 
  Total uOps:        19200  

  Dispatch Width:    2
  uOps Per Cycle:    0.05
  IPC:               0.04
  Block RThroughput: 567.0  

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136764/new/

https://reviews.llvm.org/D136764