[PATCH] D114361: [MachineCSE] Add an option to enable global CSE

wangpc via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Nov 26 03:44:32 PST 2021


pcwang-thead added a comment.

> OK, that's a good start. I was expected something among the lines of "I have tested RISCV on the llvm test suite or some other large codebase under Oz and it reduced the total codesize by 0.16%".
>
> My experiments on ARM and AArch64 are not as great. This seems to increase codesize more than it reduces it, especially on ARM.  The AArch64 numbers were dominated by one large increase, with some of the smaller cases being smaller. I would be interested in what the tests in-tree showed too.
>
> You might want to check X86 as it's easy to run. If I was making target independent changed like this I would expect to test at least a couple of architecture combos (say, X86 with Arm and AArch64 for 32bit and 64bit variants), and potentially add target overrides where needed. In this case the default should maybe be kept as before, unless we have some evidence this is beneficial across most architectures.

Thank you for the helpful advice.

I have tested RISCV on SPECINT 2006 under `Oz`. Here are the results:

                  code size change
  400.perlbench    +0.438%
  401.bzip2        0%
  403.gcc          -1.128%
  429.mcf          0%
  445.gobmk        -0.221%
  456.hmmer        -1.682%
  458.sjeng        0%
  462.libquantum   0%
  464.h264ref      -0.858%
  471.omnetpp      -0.616%
  473.astar        0%

Only `perlbench` increased in code size.

Since results from outdated benchmarks may not be convincing, I also tested the change on the OpenCV codebase.

Most executables and libraries showed no code size change, while some large files got smaller, for example:

  opencv_perf_imgproc  -0.069%
  opencv_perf_video    -0.288%
  opencv_test_core     -0.407%
  opencv_test_core     -0.249%
  opencv_test_dnn      -0.182%
  opencv_test_imgproc  -0.246%
  libopencv_imgproc.so -0.247%
  ……

In addition, third-party libraries used by OpenCV (such as `libquirc`, `libwebp`, `libjpeg-turbo`, `libtiff`, etc.) got smaller.
Some small OpenCV examples grew by a few bytes as a result of increased register pressure.

I have disabled aggressive MachineCSE by default; targets may override this if it is profitable for them.
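
As a rough illustration of the default-off hook with a per-target override, here is a minimal standalone sketch. Only the hook name `enableAggressiveMachineCSE` comes from the patch; the `...Sketch` types, the `OptForMinSize` field, and the choice to opt in only when minimizing size are assumptions made for this example and are not LLVM's actual classes.

```cpp
#include <cassert>

// Hypothetical stand-in for a machine function; NOT the real LLVM class.
struct MachineFunctionSketch {
  bool OptForMinSize = false; // models a function compiled under Oz
};

// Hypothetical stand-in for the target-independent instruction info.
struct TargetInstrInfoSketch {
  virtual ~TargetInstrInfoSketch() = default;

  // Default: aggressive MachineCSE stays off, so targets that do not
  // override the hook keep their previous behaviour.
  virtual bool
  enableAggressiveMachineCSE(const MachineFunctionSketch &) const {
    return false;
  }
};

// A target that measured a benefit opts in. Restricting the opt-in to
// size-optimized functions is an assumption of this sketch.
struct RISCVInstrInfoSketch : TargetInstrInfoSketch {
  bool
  enableAggressiveMachineCSE(const MachineFunctionSketch &MF) const override {
    return MF.OptForMinSize;
  }
};
```

With this shape, a caller holding a `TargetInstrInfoSketch` reference gets `false` unless the concrete target has overridden the hook, which matches the "default kept as before" request from the review.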

In fact, I think this workaround could be made more elegant by using live interval analysis, as @shchenz said. At the very least, we should do CSE on extended basic blocks instead of only on local or adjacent blocks.
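
For readers unfamiliar with the term, an extended basic block is a tree of blocks in which only the head may have several predecessors; every other block has exactly one predecessor, which is also in the tree. The sketch below groups the blocks of a small CFG by head. The function name and the predecessor-list representation are hypothetical, and the code assumes block 0 is the entry and every block is reachable from it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Preds[B] lists the predecessor indices of block B; block 0 is the entry.
// Returns, for each block, the index of the block heading its extended
// basic block. Assumes all blocks are reachable from the entry.
std::vector<std::size_t>
computeEBBHeads(const std::vector<std::vector<std::size_t>> &Preds) {
  const std::size_t N = Preds.size();
  std::vector<std::size_t> Head(N);
  for (std::size_t B = 0; B < N; ++B) {
    std::size_t Cur = B;
    // Walk up while the current block is not the entry and has exactly one
    // predecessor. The step bound guards against malformed input.
    for (std::size_t Steps = 0;
         Steps < N && Cur != 0 && Preds[Cur].size() == 1; ++Steps)
      Cur = Preds[Cur][0];
    Head[B] = Cur;
  }
  return Head;
}
```

For a diamond `0 -> {1, 2} -> 3 -> 4`, blocks 0, 1 and 2 share head 0, while the join block 3 starts a new extended basic block that also contains 4. Within one such group, a value computed in an ancestor block is available on the only path into each descendant, which is what makes the region a natural unit for CSE.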



================
Comment at: llvm/lib/CodeGen/MachineCSE.cpp:440
+      TII->enableAggressiveMachineCSE(*MI->getMF()))
+    return true;
+
----------------
shchenz wrote:
> If the register pressure is increased, doing more CSEs may introduce register spill/reload and thus it will generate worse code even for optimization for size?
Yes, you are right.

The `AggressiveMachineCSE` check should be placed after the `MayIncreasePressure` check.
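
One possible reading of that reordering, as a deliberately simplified standalone model: the aggressive shortcut is consulted only once the pressure check has passed. The struct, its fields, and the function name are hypothetical, and the real heuristics in `MachineCSE.cpp` are more involved than this.

```cpp
#include <cassert>

// Hypothetical summary of one CSE candidate; not LLVM's data structures.
struct CSECandidateSketch {
  bool MayIncreasePressure;     // CSE could lengthen a live range and spill
  bool AggressiveEnabled;       // result of the target hook
  bool PassesDefaultHeuristics; // outcome of the pre-existing checks
};

// Simplified model of the decision order discussed in the review.
bool isProfitableToCSESketch(const CSECandidateSketch &C) {
  // Pressure first: a spill/reload pair can cost more than CSE saves,
  // even when optimizing for size.
  if (C.MayIncreasePressure)
    return false;
  // Only now may aggressive mode bypass the remaining heuristics.
  if (C.AggressiveEnabled)
    return true;
  return C.PassesDefaultHeuristics;
}
```

In this model, a candidate that may raise register pressure is rejected even when the target has enabled aggressive mode.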


================
Comment at: llvm/lib/CodeGen/MachineCSE.cpp:468
     if (CSBB != BB && !CSBB->isSuccessor(BB))
       return false;
   }
----------------
shchenz wrote:
> Can we estimate the register pressure here to do a more aggressive CSE? If so, we should not limit this only for "optimization for size".
Absolutely!

IMO, the key point is that we should do some live range analysis here.
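
To make the idea concrete, here is a toy model of such an estimate, not a proposal for the actual implementation. It assumes instructions are numbered linearly, each value has a single closed live range, and there is one register class; all names are hypothetical, and LLVM's `LiveIntervals` is considerably richer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// A value is live from instruction index Def up to and including LastUse.
struct LiveRangeSketch {
  int Def;
  int LastUse;
};

// Maximum number of ranges live at the same point. For closed intervals the
// maximum overlap is always reached at the Def of some range.
int maxPressure(const std::vector<LiveRangeSketch> &Ranges) {
  int Max = 0;
  for (const LiveRangeSketch &R : Ranges) {
    int Live = 0;
    for (const LiveRangeSketch &O : Ranges)
      if (O.Def <= R.Def && R.Def <= O.LastUse)
        ++Live;
    Max = std::max(Max, Live);
  }
  return Max;
}

// Models CSE of Ranges[Dup] into Ranges[Kept]: the duplicate disappears and
// the kept value must stay live until the duplicate's last use.
std::vector<LiveRangeSketch> afterCSE(std::vector<LiveRangeSketch> Ranges,
                                      std::size_t Kept, std::size_t Dup) {
  assert(Kept != Dup && Kept < Ranges.size() && Dup < Ranges.size());
  Ranges[Kept].LastUse = std::max(Ranges[Kept].LastUse, Ranges[Dup].LastUse);
  Ranges.erase(Ranges.begin() + static_cast<std::ptrdiff_t>(Dup));
  return Ranges;
}

// Accept the CSE only if the estimated pressure still fits in NumRegs.
bool cseFitsInRegisters(const std::vector<LiveRangeSketch> &Ranges,
                        std::size_t Kept, std::size_t Dup, int NumRegs) {
  return maxPressure(afterCSE(Ranges, Kept, Dup)) <= NumRegs;
}
```

For example, with ranges `{0,2}`, `{3,5}`, `{4,6}` and a duplicate `{8,9}` of the first value, the pressure before CSE is 2. Merging the duplicate stretches the first range to `{0,9}` and raises the pressure to 3, so the transformation would be rejected with two registers and accepted with three.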



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114361/new/

https://reviews.llvm.org/D114361


