[llvm] [AArch64] Replace 64-bit MADD with [SU]MADDL when possible (PR #135926)

Yuri Gribov via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 21 08:42:44 PDT 2025


yugr wrote:

Thanks for the prompt feedback.

> Do you have examples of where SDAG does not handle the mul->smull transform?

I've studied several most optimized files and it turns out that all replaced MADDs have one or both of their multiplied operands coming from a different block and one of them being a `PHI`. For example in ldecod/context_ini.c :
```
bb.2.for.cond33.preheader:
  ...
  %388:gpr64all = COPY $xzr
  %378:gpr64all = COPY %388:gpr64all
  ...
  %395:gpr64 = SUBREG_TO_REG 0, %394:gpr32, %subreg.sub_32
  ...

bb.36.for.cond36.preheader:
  ...
  %25:gpr64 = PHI %378:gpr64all, %bb.2, %713:gpr64all, %bb.64
  ...

bb.38.if.then44:
  ...
  # THIS IS REPLACED
  %397:gpr64 = nuw nsw MADDXrrr %25:gpr64, %395:gpr64, %379:gpr64common
  ...

bb.64.for.inc72.8:
  ...
  %713:gpr64all = SUBREG_TO_REG 0, %389:gpr32, %subreg.sub_32
  ...
```

BTW it turns out that this optimization is infrequent - most of the replacements are just in few files:
```
5 ./MultiSource/Benchmarks/tramp3d-v4/CMakeFiles/tramp3d-v4.dir/tramp3d-v4.cpp.o                                                                                                                            
9 ./MultiSource/Applications/JM/ldecod/CMakeFiles/ldecod.dir/macroblock.c.o
10 ./MicroBenchmarks/LCALS/SubsetALambdaLoops/CMakeFiles/lcalsALambda.dir/__/LCALSStats.cxx.o
10 ./MicroBenchmarks/LCALS/SubsetARawLoops/CMakeFiles/lcalsARaw.dir/__/LCALSStats.cxx.o
10 ./MicroBenchmarks/LCALS/SubsetBLambdaLoops/CMakeFiles/lcalsBLambda.dir/__/LCALSStats.cxx.o
10 ./MicroBenchmarks/LCALS/SubsetBRawLoops/CMakeFiles/lcalsBRaw.dir/__/LCALSStats.cxx.o
10 ./MicroBenchmarks/LCALS/SubsetCLambdaLoops/CMakeFiles/lcalsCLambda.dir/__/LCALSStats.cxx.o
10 ./MicroBenchmarks/LCALS/SubsetCRawLoops/CMakeFiles/lcalsCRaw.dir/__/LCALSStats.cxx.o
10 ./MultiSource/Benchmarks/Bullet/CMakeFiles/bullet.dir/btAxisSweep3.cpp.o
17 ./MultiSource/Applications/JM/lencod/CMakeFiles/lencod.dir/context_ini.c.o
21 ./MultiSource/Benchmarks/Ptrdist/bc/CMakeFiles/bc.dir/execute.c.o
59 ./MultiSource/Applications/JM/ldecod/CMakeFiles/ldecod.dir/context_ini.c.o
```
> Ideally this kind of thing would be fixed by GISel, but it still feels a long way off, and I have thought of adding more to combine inside MIPeephole opt (mostly as I was experimenting with autogenerating peephole optimizations). I had come to the conclusion that inventing another place to do known-bits based transformations was likely not the best for maintenance and correctness though. Maybe it would be OK if we limit the scope.

Got it, so SDAG is the preferred place to do such optimization then ? I guess this should be closed then.

https://github.com/llvm/llvm-project/pull/135926


More information about the llvm-commits mailing list