[PATCH] D132828: Add new optimization pass of Tree Height Reduction

Tue Aug 30 02:36:46 PDT 2022

fhahn added a comment.

Thanks for updating the patch! Have you considered implementing this as a MachineFunctionPass instead of an LLVM IR pass? Doing the transformation on MachineIR would allow for more precise cost estimates, including more accurate information about register usage, selected instructions and processor resource usage. `MachineCombiner.cpp` might be an interesting example to look at for similar (although simpler) transformations with relatively accurate uarch-driven cost modeling.
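
To make the trade-off concrete, here is a minimal sketch of the kind of reassociation a tree height reduction performs. It is only an illustration, not code from the patch: the helper names (`chainHeight`, `balancedHeight`, `sumChain`, `sumBalanced`, `checkExamples`) are hypothetical, and it assumes every operation has unit latency, which is exactly the simplification a MachineIR-level cost model would replace.

```cpp
#include <cassert>

// Critical-path length, in operations, of a left-leaning chain
// ((x0 op x1) op x2) ... over n operands. Assumes unit latency per op.
constexpr unsigned chainHeight(unsigned n) { return n > 1 ? n - 1 : 0; }

// Critical-path length after rebalancing the same n operands into a
// balanced binary tree: ceil(log2(n)), again assuming unit latency.
constexpr unsigned balancedHeight(unsigned n) {
  unsigned h = 0;
  for (unsigned span = 1; span < n; span *= 2)
    ++h;
  return h;
}

// Serial form: each add depends on the previous one (height 3).
constexpr int sumChain(int a, int b, int c, int d) { return ((a + b) + c) + d; }

// Rebalanced form: (a + b) and (c + d) are independent, so a core with
// two suitable pipelines can issue them together (height 2).
constexpr int sumBalanced(int a, int b, int c, int d) { return (a + b) + (c + d); }

// Small sanity checks. The integer results agree here; for floating
// point, reassociation can change the result, so it needs fast-math-style
// permission.
inline void checkExamples() {
  assert(chainHeight(4) == 3 && balancedHeight(4) == 2);
  assert(sumChain(1, 2, 3, 4) == sumBalanced(1, 2, 3, 4));
}
```

The rebalanced form shortens the dependency chain but keeps more values live at once, which is why register pressure and per-instruction latencies matter for deciding when it is profitable.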

> I run C/C++ benchmarks in SPECspeed 2017 on Fujitsu A64FX processor, which has two pipelines for integer operations and SIMD/FP operations each. 600.perlbench_s and 619.lbm_s had 3% improvement. Other benchmarks (602, 605, 620, 623, 625, 631, 641, 644, 657) were within 1% up/down. In these runs, to emphasize the performance improvement, the number of OpenMP threads is limited to one.

It might also be interesting to run SPECrate, rather than only SPECspeed with a single thread.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132828/new/

https://reviews.llvm.org/D132828