[PATCH] D120531: [SystemZ] Use VREP for storing replicated regs/immediates.

Jonas Paulsson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Feb 24 18:55:37 PST 2022


jonpa created this revision.
jonpa added a reviewer: uweigand.
Herald added subscribers: steven.zhang, hiraditya.
jonpa requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.

When a replicated register or immediate value is stored right away, it is better to use a Vector Replicate (VREP) than a scalar multiplication (e.g. by 0x0101) or two immediate loads into a GPR.
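The scalar sequence works because multiplying a zero-extended element by a constant that has a 1 in the low bit of every element slot copies the element into each slot. That is the same value a single vector replicate produces per element. The following standalone C++ sketch uses plain integer arithmetic, not LLVM code, and the function names are hypothetical. It shows the equivalence for i8-in-i16, i8-in-i64 and i16-in-i64:

```cpp
#include <cassert>
#include <cstdint>

// Constant with a single 1 at the bottom of each EltBits-wide slot of a
// TotalBits-wide word, e.g. (8, 16) -> 0x0101, (8, 32) -> 0x01010101.
uint64_t replicationMultiplier(unsigned EltBits, unsigned TotalBits) {
  assert(EltBits > 0 && EltBits < TotalBits && TotalBits <= 64 &&
         TotalBits % EltBits == 0 && "unsupported widths");
  uint64_t M = 0;
  for (unsigned Shift = 0; Shift < TotalBits; Shift += EltBits)
    M |= uint64_t(1) << Shift;
  return M;
}

// Scalar approach: multiply the (zero-extended) element by the constant.
uint64_t replicateByMul(uint64_t Elt, unsigned EltBits, unsigned TotalBits) {
  assert(EltBits < 64 && (Elt >> EltBits) == 0 &&
         "element must be zero-extended from EltBits");
  return Elt * replicationMultiplier(EltBits, TotalBits);
}

// Reference result: an explicit copy of Elt in every slot.
uint64_t replicateByShiftOr(uint64_t Elt, unsigned EltBits, unsigned TotalBits) {
  uint64_t R = 0;
  for (unsigned Shift = 0; Shift < TotalBits; Shift += EltBits)
    R |= Elt << Shift;
  return R;
}
```

For example, `replicateByMul(0xAB, 8, 16)` gives 0xABAB, and `replicateByMul(0x1234, 16, 64)` gives 0x1234123412341234, the same as the shift/or reference.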

This patch transforms such stores in SystemZ::combineSTORE() before type legalization.

I also tried doing this after legalization, but that seems to be more work without any benefit (actually, the i128 case is better handled after splitting). If the types have to be legal, the right vector type has to be produced, with an extracted element. If that element is i16 (not uncommon), an i32 first needs to be extracted, and the store must then do the truncation itself. The type legalizer takes care of all of this if the transformation is done before it.

What's more, the zero-extend node that the patch relies on to be sure that the multiply produces a replicated value is easy to detect on the initial DAG, but it is later removed by DAGCombine. And computeKnownBits() does not necessarily help (at least not for i16, it seems).
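Plain integer arithmetic shows why the zero-extend matters. Multiplying by 0x0101 only yields a replicated i16 when the operand has no bits set above bit 7. The C++ sketch below is not the patch's code. `replicationEltBits`, `isSplat` and `mulTruncated` are hypothetical helpers that model the two facts such a combine has to establish: the multiplier is a 0x01..01-style constant, and the other operand fits in one element.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Mask with the low Bits bits set.
uint64_t lowMask(unsigned Bits) {
  assert(Bits >= 1 && Bits <= 64 && "bad mask width");
  return Bits == 64 ? ~uint64_t(0) : (uint64_t(1) << Bits) - 1;
}

// If C (TotalBits wide) has exactly one 1 at the bottom of every
// EltBits-wide slot (0x0101, 0x00010001, ...), return EltBits.
std::optional<unsigned> replicationEltBits(uint64_t C, unsigned TotalBits) {
  for (unsigned EltBits = 8; EltBits < TotalBits; EltBits *= 2) {
    uint64_t M = 0;
    for (unsigned Shift = 0; Shift < TotalBits; Shift += EltBits)
      M |= uint64_t(1) << Shift;
    if (C == M)
      return EltBits;
  }
  return std::nullopt;
}

// True if every EltBits-wide slot of V (TotalBits wide) equals the lowest one.
bool isSplat(uint64_t V, unsigned EltBits, unsigned TotalBits) {
  uint64_t Elt = V & lowMask(EltBits);
  for (unsigned Shift = EltBits; Shift < TotalBits; Shift += EltBits)
    if (((V >> Shift) & lowMask(EltBits)) != Elt)
      return false;
  return true;
}

// The value a TotalBits-wide store of (X * C) would write.
uint64_t mulTruncated(uint64_t X, uint64_t C, unsigned TotalBits) {
  return (X * C) & lowMask(TotalBits);
}
```

With a zero-extended byte, `mulTruncated(0xAB, 0x0101, 16)` is 0xABAB, a byte splat. Without that guarantee, `mulTruncated(0x1FF, 0x0101, 16)` is 0x00FF, which is not a splat. In that case the store cannot be rewritten as a replicate, so the combine needs proof that the high bits are zero.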

I also don't think the DAGCombiner or the legalizer will produce these patterns, so unless there is an argument against it, I think it is probably simplest to do this on the initial DAG?

- Maybe detect the immediate splat with SystemZVectorConstantInfo, and perhaps also check whether other immediates could be built and stored this way instead of via GPRs?

- I tried avoiding the extra LAYs, but that did not seem to be better on benchmarks. The scalar multiply is slower than the LAY, so this should always be better anyway, right?

- Is it OK to do this only before legalization, or should I continue working on it for Combine2?
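Detecting an immediate splat amounts to finding the smallest element width whose value repeats across the stored word. The C++ sketch below is a hypothetical helper written for illustration. It is not SystemZVectorConstantInfo and does not model which immediates a VREPI variant can actually encode.

```cpp
#include <cassert>
#include <cstdint>

// Element width in bits and element value of a detected splat.
struct SplatInfo {
  unsigned EltBits;
  uint64_t Elt;
};

// Find the smallest element width (8, 16, 32, ...) such that the
// TotalBits-wide immediate Imm is that element repeated. Returns
// {TotalBits, Imm} when no narrower splat exists.
SplatInfo findSplat(uint64_t Imm, unsigned TotalBits = 64) {
  assert((TotalBits == 16 || TotalBits == 32 || TotalBits == 64) &&
         "unexpected store width");
  if (TotalBits < 64)
    Imm &= (uint64_t(1) << TotalBits) - 1;
  for (unsigned EltBits = 8; EltBits < TotalBits; EltBits *= 2) {
    uint64_t Mask = (uint64_t(1) << EltBits) - 1;
    uint64_t Elt = Imm & Mask;
    bool Splat = true;
    for (unsigned Shift = EltBits; Shift < TotalBits && Splat; Shift += EltBits)
      Splat = ((Imm >> Shift) & Mask) == Elt;
    if (Splat)
      return {EltBits, Elt};
  }
  return {TotalBits, Imm};
}
```

For instance, a 64-bit store of 0x0001000100010001 is a splat of the i16 value 1. Such a store is a candidate for a vector replicate rather than being built in a GPR, while 0x0123456789ABCDEF has no narrower repeating element.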

SPEC (opcode counts before and after the patch, and the difference):

  vsteg          :                 5918                 6372     +454
  stg            :               370142               369696     -446
  vrepif         :                 7729                 8077     +348
  llihl          :                 7265                 6940     -325
  oill           :                18574                18254     -320
  vsteh          :                 2557                 2875     +318
  vstef          :                 5796                 6105     +309
  vlrepb         :                  207                  509     +302
  llc            :                39072                38771     -301
  sth            :                25741                25463     -278
  mhi            :                 6070                 5802     -268
  lay            :                55017                55271     +254
  st             :               127451               127280     -171
  msfi           :                 7106                 6974     -132
  sty            :                 3627                 3514     -113
  iilf           :                 6305                 6212      -93
  vrepih         :                 1133                 1211      +78
  vreph          :                  191                  259      +68
  vlvgp          :                 8511                 8576      +65
  ...
  OPCDIFFS: -251
  ...
  Spill|Reload   :               607848               607780      -68
  Copies         :               995492               995481      -11

Some benchmark improvements on z14; results are more neutral on z15.


https://reviews.llvm.org/D120531

Files:
  llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
  llvm/test/CodeGen/SystemZ/store-replicated-vals.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D120531.411285.patch
Type: text/x-patch
Size: 15742 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220225/0fee27d9/attachment-0001.bin>

