[PATCH] D98905: [SystemZ] Reuse known zeros/ones after zero-extension of i1.

Jonas Paulsson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Mar 18 16:02:53 PDT 2021


jonpa created this revision.
jonpa added a reviewer: uweigand.
Herald added a subscriber: hiraditya.
jonpa requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

This is an optimization for zero extensions of i1 values, which came out of looking into the perl regression against GCC. I noticed a lot of LHI 0, LHI 1, LOC sequences, which GCC did not seem to produce.

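Basically, after an ICMP NE 0 the compared operand is known to be 0 whenever the comparison fails, and after an ICMP EQ 1 it is known to be 1 whenever the comparison succeeds. Those known constants are already the ones needed as the CC result, so there is no need to load them with LHI (LGHI).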
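
As a rough illustration of the idea (a C++ model of the value flow, not code from the patch; all function names are made up for this sketch), the baseline and the reusing variants of the two i32 cases can be written as:

```cpp
#include <cassert>
#include <cstdint>

// Baseline for zext(x != 0): materialize 0, then conditionally load 1.
// Roughly models "lhi r, 0; lochilh r, 1" after the compare.
uint32_t zextNe0Baseline(uint32_t x) {
  uint32_t r = 0;
  if (x != 0) r = 1;
  return r;
}

// Reuse for zext(x != 0): if the compare fails, x is known to be 0,
// so the register holding x already contains the "false" result.
uint32_t zextNe0Reuse(uint32_t x) {
  uint32_t r = x;       // no separate load of 0
  if (x != 0) r = 1;    // models "lochilh r, 1"
  return r;
}

// Baseline for zext(x == 1): materialize 0, then conditionally load 1.
// Roughly models "lhi r, 0; lochie r, 1" after the compare.
uint32_t zextEq1Baseline(uint32_t x) {
  uint32_t r = 0;
  if (x == 1) r = 1;
  return r;
}

// Reuse for zext(x == 1): if the compare succeeds, x is known to be 1,
// so only the failing path needs a constant.
uint32_t zextEq1Reuse(uint32_t x) {
  uint32_t r = x;       // no separate load of 1
  if (x != 1) r = 0;    // models "lochilh r, 0"
  return r;
}

// Checks that the reusing variants agree with the baselines on a few
// sample inputs, including values with the high bit set.
void checkReuseMatchesBaseline() {
  const uint32_t samples[] = {0u, 1u, 2u, 7u, 0x80000000u, 0xFFFFFFFFu};
  for (uint32_t x : samples) {
    assert(zextNe0Reuse(x) == zextNe0Baseline(x));
    assert(zextEq1Reuse(x) == zextEq1Baseline(x));
  }
}
```

In both reusing variants one constant load disappears because the compared register doubles as one arm of the select.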

The i32 cases were quite straightforward, but doing the same for i64 took a bit more effort. Since we always return i32 from getSetCCResultType(), these cases needed different handling depending on the user. If an i64 use is needed, then I chose to promote the setcc result in combineZERO_EXTEND(). For the i32 user (of an i64 comparison), the reused operand was instead truncated.

For the i64 user case, a lot of llgfr instructions remained without the handling in combineZERO_EXTEND():

  Optimized legalized selection DAG: %bb.0 'prototype_p:bb'
  SelectionDAG has 15 nodes:
    t0: ch = EntryToken
    t5: i64,ch = load<(load 8 from `%0** undef`)> t0, undef:i64, undef:i64
          t25: i32 = truncate t5
          t24: i32 = SystemZISD::ICMP t5, Constant:i64<0>, TargetConstant:i32<0>
        t29: i32 = SystemZISD::SELECT_CCMASK Constant:i32<1>, t25, TargetConstant:i32<14>, TargetConstant:i32<6>, t24
      t20: i64 = zero_extend t29
    t12: ch,glue = CopyToReg t0, Register:i64 $r2d, t20
    t13: ch = SystemZISD::RET_FLAG t12, Register:i64 $r2d, t12:1
  
  ->
          ltg     %r0, 0(%r1)
          lochilh %r0, 1
          llgfr   %r2, %r0
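
To make the i64 handling a bit more concrete, here is a small C++ model of the three value flows for an i64 comparison against 0 (again only a sketch with hypothetical function names, not the combine itself): the i64 user without the promotion, the i64 user with the select promoted to i64, and the i32 user with a truncated operand.

```cpp
#include <cassert>
#include <cstdint>

// i64 compare, i64 user, select kept in i32: the reused operand is
// truncated, selected in 32 bits, and zero-extended afterwards.
// The final cast models the leftover llgfr.
uint64_t ne0SelectInI32ThenExtend(uint64_t x) {
  uint32_t r = static_cast<uint32_t>(x);  // truncate of the loaded value
  if (x != 0) r = 1;                      // models "lochilh r, 1"
  return static_cast<uint64_t>(r);        // models "llgfr"
}

// i64 compare, i64 user, select promoted to i64: the 64-bit register
// is reused directly and no extension is needed.
uint64_t ne0SelectInI64(uint64_t x) {
  uint64_t r = x;
  if (x != 0) r = 1;                      // models "locghilh r, 1"
  return r;
}

// i64 compare, i32 user: the reused operand is truncated to i32.
// This is safe because it is only used when x is known to be 0.
uint32_t ne0TruncatedForI32User(uint64_t x) {
  uint32_t r = static_cast<uint32_t>(x);
  if (x != 0) r = 1;
  return r;
}

// Checks that all three flows compute zext(x != 0), including an input
// whose low 32 bits are zero while the full value is not.
void checkI64Variants() {
  const uint64_t samples[] = {0ull, 1ull, 5ull, 0x100000000ull,
                              0xFFFFFFFFFFFFFFFFull};
  for (uint64_t x : samples) {
    const uint64_t expected = (x != 0) ? 1 : 0;
    assert(ne0SelectInI32ThenExtend(x) == expected);
    assert(ne0SelectInI64(x) == expected);
    assert(ne0TruncatedForI32User(x) == expected);
  }
}
```

The first and second functions return the same value for every input; the difference is only that the second never leaves the 64-bit register, which is what removes the extension.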

The general effects on SPEC'17 output:

1. Just the i32/i32 cases of NE 0:

  lhi            :               225081               221486    -3595
  lghi           :               445509               444420    -1089
  locghilh       :                 3717                 2676    -1041
  chsi           :                57297                56385     -912
  lt             :                13672                14348     +676
  lochilh        :                 8796                 9424     +628
  llgfr          :                90010                90533     +523
  risbgn         :               137540               137980     +440
  tmll           :                53266                53693     +427
  ltr            :                 6140                 6550     +410
  lgr            :               849527               849890     +363
  llc            :                39671                39994     +323
  locrlh         :                 1492                 1807     +315
  ...



2. Also the i64 cases, compared to (1):



  lghi           :               444420               441828    -2592
  lhi            :               221486               219782    -1704
  lr             :                62223                62878     +655
  cghsi          :                32665                32175     -490
  tmll           :                53693                54181     +488
  llgfr          :                90533                90066     -467
  ltg            :               157760               158133     +373
  risbgn         :               137980               138334     +354
  jne            :                42684                42990     +306
  lg             :               982786               982931     +145
  je             :               335154               335281     +127
  cije           :               107363               107237     -126
  lgfr           :                91442                91565     +123
  lgr            :               849890               850006     +116
  ltgr           :                10951                11067     +116
  ...



3. Also EQ 1 (both i32 and i64), compared to (2):



  lochilh        :                 9492                 9787     +295
  lhi            :               219782               219567     -215
  lochie         :                14183                13975     -208
  chi            :                53350                53448      +98
  chsi           :                56337                56259      -78
  lr             :                62878                62950      +72
  tmll           :                54181                54243      +62
  lghi           :               441828               441770      -58
  locghie        :                 7174                 7116      -58
  risbgn         :               138334               138383      +49
  ...

In total, master <> (3):

  lhi            :               225081               219567    -5514
  lghi           :               445509               441770    -3739
  locghilh       :                 3717                 2673    -1044
  chsi           :                57297                56259    -1038
  lochilh        :                 8796                 9787     +991
  tmll           :                53266                54243     +977
  risbgn         :               137540               138383     +843
  lr             :                62152                62950     +798
  lt             :                13672                14343     +671
  jne            :                42430                43028     +598
  cghsi          :                32644                32170     -474
  lgr            :               849527               849994     +467
  ltr            :                 6140                 6557     +417
  ...

I see some more LR instructions, which I think appear when the reused constant also has another user. I did not manage to avoid these cases when working with the DAGs (local only), so some kind of pseudo-expander might be more powerful here. That probably requires more work, and I am not sure that trading an LHI for an LR is bad, since the comparison does not clobber the register...

There are fewer comparisons with memory: the value is now loaded, compared and reused (see fun9 below).

Does this seem like a good idea to try?

New tests, master <> patched (skipped functions identical):

  fun0:                                   fun0:
          chi     %r2, 0                          chi     %r2, 0
          lhi     %r2, 0                <
          lochilh %r2, 1                          lochilh %r2, 1
                                        >
          br      %r14                            br      %r14
  
  
  fun4:                                   fun4:
          cghi    %r2, 0                          cghi    %r2, 0
          lhi     %r2, 0                <
          lochilh %r2, 1                          lochilh %r2, 1
                                        >
          br      %r14                            br      %r14
  
  
  fun5:                                   fun5:
          cghsi   0(%r1), 0             |         ltg     %r0, 0(%r1)
          lghi    %r0, 0                <
          locghilh        %r0, 1                  locghilh        %r0, 1
          stg     %r0, 0(%r1)                     stg     %r0, 0(%r1)
          br      %r14                            br      %r14
  
  
  fun6:                                   fun6:
          chi     %r2, 1                          chi     %r2, 1
          lhi     %r2, 0                |         lochilh %r2, 0
          lochie  %r2, 1                |
          br      %r14                            br      %r14
  
  
  fun8:                                   fun8:
          cghi    %r2, 1                          cghi    %r2, 1
          lhi     %r2, 0                |         lochilh %r2, 0
          lochie  %r2, 1                |
          br      %r14                            br      %r14
  
  
  fun9:                                   fun9:
          cghsi   0(%r1), 1             |         lg      %r0, 0(%r1)
          lghi    %r0, 0                |         cghi    %r0, 1
          locghie %r0, 1                |         locghilh        %r0, 0
          stg     %r0, 0(%r1)                     stg     %r0, 0(%r1)
          br      %r14                            br      %r14




https://reviews.llvm.org/D98905

Files:
  llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
  llvm/lib/Target/SystemZ/SystemZISelLowering.h
  llvm/test/CodeGen/SystemZ/int-cmp-59.ll
  llvm/test/CodeGen/SystemZ/setcc-05.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D98905.331705.patch
Type: text/x-patch
Size: 10588 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210318/576c486d/attachment.bin>