[PATCH] D147800: [SystemZ] Enable MachineCombiner for FP reassociation.

Jonas Paulsson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu May 18 01:32:45 PDT 2023


jonpa updated this revision to Diff 523291.
jonpa added a comment.

- MachineSink behaved differently with the new pseudos that clobber CC. This is fixed by a patch to MachineSink, and by making sure the CC def is marked as dead on the instructions newly created in MachineCombiner.

- Previously I tried selecting the _CCPseudo:s with a pattern given added complexity, plus an even higher complexity for MDEBR. It seemed better to instead predicate the reg/mem pattern on "no reassociation flags". The _CCPseudo is then selected when the reassociation flags are present, and the target instruction otherwise. This is what the patch now does. Both alternatives gave identical output on SPEC.
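
As a rough illustration, the selection rule in the previous item can be modelled outside of TableGen. This is a toy sketch, not the actual SystemZ patterns. All names are hypothetical, and treating "reassociation flags" as a single boolean is an assumption.

```cpp
#include <cassert>

// Toy model only: these names are hypothetical and do not correspond to
// the real SystemZ TableGen patterns or SelectionDAG nodes.
struct FPBinOpNode {
  bool HasReassocFlags;   // assumption: the required fast-math flags as one bool
  bool RHSIsFoldableLoad; // the second operand is a load that could be folded
};

enum class Selected {
  RegMem,   // e.g. ADB: reg/mem form with the load folded in
  CCPseudo, // reg/reg _CCPseudo that MachineCombiner may reassociate
  RegReg    // the ordinary reg/reg target instruction
};

// The reg/mem pattern is predicated on "no reassociation flags". With the
// flags present, the _CCPseudo is selected; otherwise the target instruction.
Selected selectFPBinOp(const FPBinOpNode &N) {
  if (N.HasReassocFlags)
    return Selected::CCPseudo;
  return N.RHSIsFoldableLoad ? Selected::RegMem : Selected::RegReg;
}
```

With this predicate, a flagged node never matches the reg/mem pattern. The choice no longer depends on relative pattern complexity.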

- Experimented with MDEBR. Those cases seemed rare, and no additional reassociation was done on the benchmarks, so I removed the folding (ldebr; wfmdb -> mdebr) that I had working in FinalizeReassociation.

- PeepholeOptimizer does not fold loads across basic blocks, but doing so in FinalizeReassociation seems beneficial. I first tried this with only constant-pool loads, but it seemed even better to do it for any load.

- Removed the check in optimizeLoad() that, when folding into reg/mem, there is no other user in the MBB. With this restriction, the counts were:

                                   main                patch   change
  mdb            :                 9667                 5507    -4160
  meeb           :                 8838                 4831    -4007
  adb            :                10787                 8591    -2196
  aeb            :                 7322                 5534    -1788
  sdb            :                 4271                 4409     +138
  seb            :                 4706                 4094     -612
  Copies         :              1006170               999666    -6504

Without it (as the patch is now), the number of reg/mem instructions is much closer to main:

                                   main                patch   change
  mdb            :                 9667                 9061     -606
  meeb           :                 8838                 8210     -628
  adb            :                10787                 9467    -1320
  aeb            :                 7322                 6992     -330
  sdb            :                 4271                 4637     +366
  seb            :                 4706                 4697       -9
  Copies         :              1006170              1006497     +327
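
To put the two tables side by side, the reg/mem rows (mdb, meeb, adb, aeb, sdb, seb; Copies excluded) can be summed. This small helper only re-adds the numbers already listed. Main has 45591 such instructions. The restricted variant has 32966, which is 12625 fewer. The current patch has 43064, which is 2527 fewer.

```cpp
#include <array>
#include <cassert>
#include <numeric>

// Reg/mem instruction counts copied from the two tables, in row order
// mdb, meeb, adb, aeb, sdb, seb (the Copies row is left out).
constexpr std::array<int, 6> MainCounts = {9667, 8838, 10787, 7322, 4271, 4706};
constexpr std::array<int, 6> WithRestriction = {5507, 4831, 8591, 5534, 4409, 4094};
constexpr std::array<int, 6> WithoutRestriction = {9061, 8210, 9467, 6992, 4637, 4697};

// Total number of reg/mem instructions in one column.
int total(const std::array<int, 6> &Column) {
  return std::accumulate(Column.begin(), Column.end(), 0);
}
```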

As the number of register moves (copies) in the output shows, folding into the 2-address reg/mem form comes at the price of copying the source register. The lower number of copies with the restriction did not seem to matter for performance. With the extra folding I see a large improvement on f538.imagick_r (~15%). This is probably the same improvement as from disabling the pre-RA machine scheduler, so the increased spilling there seems to be avoided this way as well. LBM also gains another 2% with this (now ~20%), so this looks preferable, at least for the moment. If the scheduler is improved so that it consistently reduces register pressure, this could perhaps be reevaluated.
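
Why can the 2-address form cost a copy? In a reg/mem instruction such as ADB, the destination is tied to the first source operand. If that source value is still needed afterwards, it must be copied before the instruction overwrites it. The toy model below illustrates this. It covers straight-line SSA code only, its names are hypothetical, and it is not LLVM's TwoAddressInstructionPass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy instruction over SSA virtual registers.
struct Inst {
  int Def;               // virtual register defined
  std::vector<int> Uses; // virtual registers read; Uses[0] is the tied source
};

// True if Reg is read by any instruction after position Pos.
bool isUsedAfter(const std::vector<Inst> &Seq, std::size_t Pos, int Reg) {
  for (std::size_t I = Pos + 1; I < Seq.size(); ++I)
    for (int U : Seq[I].Uses)
      if (U == Reg)
        return true;
  return false;
}

// Rewriting Seq[Pos] ("%d = WFADB %a, %v") into a 2-address reg/mem form
// ("%d = ADB %a(tied), mem") needs a copy of %a if %a is still live after.
bool needsCopyForTiedSource(const std::vector<Inst> &Seq, std::size_t Pos) {
  assert(Pos < Seq.size() && !Seq[Pos].Uses.empty());
  return isUsedAfter(Seq, Pos, Seq[Pos].Uses.front());
}
```

For example, in `%2 = WFADB %0, %1; %3 = ... %0`, the value %0 is read again, so folding into ADB would require `%2 = COPY %0` first.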

- With nightly full runs I now see three big improvements on z15:

  Improvements:
  0.794: f519.lbm_r 
  0.855: f538.imagick_r 
  0.906: f510.parest_r 
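
These figures can be read as runtime ratios (patch time / main time, lower is better). That reading is an assumption, but it is consistent with the ~20% (lbm) and ~15% (imagick) mentioned earlier. Under it, they correspond to runtime reductions of about 20.6%, 14.5% and 9.4%.

```cpp
#include <cassert>
#include <cmath>

// Assumption: Ratio = patch runtime / main runtime (lower is better).
// Returns the percentage of runtime saved by the patch.
double percentReduction(double Ratio) {
  assert(Ratio > 0.0);
  return (1.0 - Ratio) * 100.0;
}
```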

I will now try to find further improvements with fused add/sub and multiply.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147800/new/

https://reviews.llvm.org/D147800

Files:
  llvm/include/llvm/CodeGen/TargetInstrInfo.h
  llvm/lib/Target/SystemZ/CMakeLists.txt
  llvm/lib/Target/SystemZ/SystemZ.h
  llvm/lib/Target/SystemZ/SystemZFinalizeReassociation.cpp
  llvm/lib/Target/SystemZ/SystemZISelDAGToDAG.cpp
  llvm/lib/Target/SystemZ/SystemZInstrFP.td
  llvm/lib/Target/SystemZ/SystemZInstrFormats.td
  llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
  llvm/lib/Target/SystemZ/SystemZInstrInfo.h
  llvm/lib/Target/SystemZ/SystemZInstrVector.td
  llvm/lib/Target/SystemZ/SystemZOperators.td
  llvm/lib/Target/SystemZ/SystemZScheduleZ13.td
  llvm/lib/Target/SystemZ/SystemZScheduleZ14.td
  llvm/lib/Target/SystemZ/SystemZScheduleZ15.td
  llvm/lib/Target/SystemZ/SystemZScheduleZ16.td
  llvm/lib/Target/SystemZ/SystemZTargetMachine.cpp
  llvm/lib/Target/X86/X86InstrInfo.cpp
  llvm/lib/Target/X86/X86InstrInfo.h
  llvm/test/CodeGen/SystemZ/fp-mul-02.ll
  llvm/test/CodeGen/SystemZ/machine-combiner-reassoc-fp.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D147800.523291.patch
Type: text/x-patch
Size: 64970 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20230518/7d45be52/attachment-0001.bin>

