[PATCH] D140542: [MachineCombiner] Support local strategy for traces

Anton Sidorenko via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jan 27 08:49:31 PST 2023


asi-sc added a comment.

In D140542#4083878 <https://reviews.llvm.org/D140542#4083878>, @spatel wrote:

> I don't see anything wrong with this patch, but I haven't looked at MachineCombiner in a long time, so I'm likely not the best reviewer.
>
> I do want to note a potential alternative - have you looked at using enableAggressiveFMAFusion() in DAGCombiner?
> If you flip that switch for RISCV, the `sub3` calculation in the last block would become:
>
>   	fmadd.d	ft1, ft0, fa1, fa0
>   	fnmsub.d	fa0, ft0, fa1, ft1
>
> If the fma instruction timing is the same as plain fmul on your target, then is that even better than the 1 fmadd produced on the test here?

@spatel , yeah, there are almost no recent contributions to machine combiner, so I had to find people who contributed to it years ago.

Thanks for the suggestion, I agree it can fix the provided test, but it won't solve the problem in general: when machine combiner estimates profitability, it bases the decision on the trace gathered according to some strategy. The current implementation supports only one strategy that choses multiple blocks from a function. So, when we ask machine combiner to check whether the suggested pattern is good or bad, an instruction that was really far away (let's say 4 BBs and 400 instructions above) may dramatically affect its decision (when it is on the critical path). It works fine for CPUs with big OOO buffer, but not for a tiny CPU that executes instructions in-order. My experiments show that for such CPUs we should calculate critical path separately for each BB as instructions that were in other BBs usually has almost no effect. Does it sound reasonable?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140542/new/

https://reviews.llvm.org/D140542



More information about the llvm-commits mailing list