[PATCH] D100242: [SystemZ / TII] Peephole optimization of zero-extension of i1.

Jonas Paulsson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 23 03:26:52 PDT 2021


jonpa updated this revision to Diff 339965.
jonpa added a comment.

I did some further experiments with a 3-address pseudo, relying on the regalloc hints towards 2-address opcodes that are already in place, and then handling the 3-address pseudos during post-RA pseudo expansion.
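As a rough illustration of the 3-address pseudo idea, here is a toy model in Python. It is not the patch's code: the `Pseudo3` type and the `op2` / `copy` mnemonics are hypothetical placeholders, and the sketch assumes that a pseudo whose destination and source ended up in different registers is lowered to a copy followed by the 2-address operation.

```python
from typing import List, NamedTuple


class Pseudo3(NamedTuple):
    """Hypothetical 3-address pseudo: dst = op(src, imm)."""
    dst: str
    src: str
    imm: int


def expand_post_ra(p: Pseudo3) -> List[str]:
    """Lower the pseudo to 2-address form once physical registers are known."""
    if p.dst == p.src:
        # The register-allocation hint was honored, so no copy is needed.
        return [f"op2 {p.dst}, {p.imm}"]
    # The hint was not honored: copy first, then apply the 2-address operation.
    return [f"copy {p.dst}, {p.src}", f"op2 {p.dst}, {p.imm}"]


print(expand_post_ra(Pseudo3("r2", "r2", 1)))
print(expand_post_ra(Pseudo3("r3", "r2", 1)))
```

In this model the extra register move only appears when the allocator could not tie the two operands together, which is the case the hints try to avoid.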

I am not sure which variant is really the best (see the tables below). The "single user only / 3-address pseudo" variant is nice: it is less aggressive but seems to have only positive effects. On the other hand, preliminary benchmark runs slightly favor the "rewrite register (no pseudo)" versions. It would be nice if there really were any benefits, but I suspect these differences are so small that we should instead expect no measurable improvement and judge the variants mostly by the patch/opcode differences.

**Opcode diffs, SPEC 2017:**

trunk <> patch, multiple users

  lhi            :               226044               221311    -4733   // ~7000: highest number of eliminated immediate-loads
  lghi           :               445650               443171    -2479
  lr             :                61854                62741     +887   // Increase in register moves
  lgr            :               853624               854133     +509
  ltr            :                 6173                 6387     +214
  lochilh        :                 9162                 9361     +199
  cih            :                 8103                 7935     -168
  ltgr           :                 9394                 9548     +154
  chi            :                53571                53420     -151
  lochie         :                13917                13794     -123
  cghi           :                14071                13954     -117
  iihf           :                 4263                 4163     -100
  l              :               177406               177487      +81
  ...

trunk <> patch, single user

  lhi            :               226044               222775    -3269  // ~5000
  lghi           :               445650               443624    -2026
  lgr            :               853624               854050     +426  // some new register moves near calls/argument copies.
  cih            :                 8103                 7974     -129
  lochilh        :                 9162                 9265     +103
  stg            :               375242               375320      +78
  ...

trunk <> patch, multiple users, emit a 3-address pseudo

  lhi            :               226044               222916    -3128  // ~4000
  lg             :               986749               985733    -1016  // - Mostly due to a single file where many reloads
  lghi           :               445650               444656     -994  //   turned into (slightly fewer) copies instead.
  cghsi          :                32839                32395     -444
  cih            :                 8103                 7721     -382
  cgijle         :                 7698                 8059     +361
  jle            :                36186                35849     -337
  chsi           :                57211                57445     +234
  lochilh        :                 9162                 9317     +155
  jlh            :               164726               164574     -152
  mvghi          :                57787                57638     -149
  l              :               177406               177554     +148
  lochie         :                13917                13779     -138
  cijlh          :                78622                78745     +123
  cgije          :               118679               118798     +119
  je             :               336281               336165     -116
  ltg            :               157803               157693     -110
  lgr            :               853624               853724     +100
  risbhg         :                 6313                 6413     +100
  ...

trunk <> patch, single user, emit a 3-address pseudo

  lhi            :               226044               223291    -2753  // ~4000
  lghi           :               445650               444558    -1092
  lg             :               986749               986531     -218
  lgr            :               853624               853467     -157
  ltg            :               157803               157663     -140
  cgije          :               118679               118792     +113
  je             :               336281               336172     -109
  ...
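The "~7000", "~5000" and "~4000" annotations on the lhi rows look like the combined lhi + lghi reduction for each variant. That reading is an inference from the numbers, not something stated in the tables. A small Python check, using the trunk and patch counts copied from the tables above:

```python
# (trunk, patch) counts for lhi and lghi, copied from the four tables.
COUNTS = {
    "multiple users": {"lhi": (226044, 221311), "lghi": (445650, 443171)},
    "single user": {"lhi": (226044, 222775), "lghi": (445650, 443624)},
    "multiple users, 3-address pseudo": {"lhi": (226044, 222916), "lghi": (445650, 444656)},
    "single user, 3-address pseudo": {"lhi": (226044, 223291), "lghi": (445650, 444558)},
}


def eliminated_immediate_loads(variant: str) -> int:
    """Sum of (trunk - patch) over the lhi and lghi rows of one variant."""
    return sum(trunk - patch for trunk, patch in COUNTS[variant].values())


TOTALS = {variant: eliminated_immediate_loads(variant) for variant in COUNTS}
for variant, total in TOTALS.items():
    print(f"{variant}: {total}")
```

This gives 7212, 5295, 4122 and 3845 eliminated immediate loads respectively, in line with the rounded annotations.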

**Benchmark measurements (quick runs):**

These show only slight variations in performance. It's hard to say which one is really best, if any. I have previously made full runs with just B and C, where both seemed slightly profitable and B was possibly the better one...

**z14:**
========

  Overall results (integral over benchmarks):
  Build:                                                                    Result       Improvements Regressions
  2017_C_PeepLOCI_peep_multiple_users_false                                 0.986        0.960        1.026
  2017_B_PeepLOCI                                                           0.989        0.958        1.031
  2017_D_PeepLOCI_peep_pseudo                                               0.996        0.962        1.034
  2017_E_PeepLOCI_peep_pseudo_peep_multiple_users_false                     1.007        0.985        1.022
  
  Overall results (by average over benchmarks):
  Build:                                                                    Average result
  2017_C_PeepLOCI_peep_multiple_users_false                                 99.927 %
  2017_B_PeepLOCI                                                           99.942 %
  2017_D_PeepLOCI_peep_pseudo                                               99.980 %
  2017_E_PeepLOCI_peep_pseudo_peep_multiple_users_false                     100.038 %



**z15:**
========

  Overall results (integral over benchmarks):
  Build:                                                                    Result       Improvements Regressions
  2017_D_PeepLOCI_peep_pseudo                                               0.989        0.967        1.021
  2017_B_PeepLOCI                                                           0.998        0.966        1.032
  2017_E_PeepLOCI_peep_pseudo_peep_multiple_users_false                     0.999        0.980        1.019
  2017_C_PeepLOCI_peep_multiple_users_false                                 1.007        0.979        1.028
  
  Overall results (by average over benchmarks):
  Build:                                                                    Average result
  2017_D_PeepLOCI_peep_pseudo                                               99.941 %
  2017_B_PeepLOCI                                                           99.990 %
  2017_E_PeepLOCI_peep_pseudo_peep_multiple_users_false                     99.997 %
  2017_C_PeepLOCI_peep_multiple_users_false                                 100.039 %
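To put the "slight variations" in numbers, the following Python sketch ranks the four builds per machine using the "integral over benchmarks" Result column. It assumes that lower values are better (the Improvements column being below 1.0 suggests this, but the tables do not say so explicitly), and it uses the letters B to E from the build names as keys.

```python
# "Result" column (integral over benchmarks), copied from the tables above.
RESULTS = {
    "z14": {"B": 0.989, "C": 0.986, "D": 0.996, "E": 1.007},
    "z15": {"B": 0.998, "C": 1.007, "D": 0.989, "E": 0.999},
}


def ranking(machine: str) -> list:
    """Builds ordered from lowest to highest result (assumed best first)."""
    scores = RESULTS[machine]
    return sorted(scores, key=scores.get)


def spread(machine: str) -> float:
    """Difference between the highest and lowest result on one machine."""
    values = RESULTS[machine].values()
    return round(max(values) - min(values), 3)


for machine in RESULTS:
    print(machine, ranking(machine), spread(machine))
```

The order comes out as C, B, D, E on z14 and D, B, E, C on z15, with a spread of about 0.02 on both machines, so no variant is consistently ahead.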


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100242/new/

https://reviews.llvm.org/D100242

Files:
  llvm/lib/Target/SystemZ/SystemZInstrFormats.td
  llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
  llvm/lib/Target/SystemZ/SystemZInstrInfo.h
  llvm/lib/Target/SystemZ/SystemZInstrInfo.td
  llvm/test/CodeGen/SystemZ/setcc-05.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D100242.339965.patch
Type: text/x-patch
Size: 12208 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210423/74ff4513/attachment.bin>