[PATCH] D120531: [SystemZ] Use VREP for storing replicated regs/immediates.
Jonas Paulsson via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 24 18:55:37 PST 2022
jonpa created this revision.
jonpa added a reviewer: uweigand.
Herald added subscribers: steven.zhang, hiraditya.
jonpa requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.
When a replicated register / immediate is immediately stored, it is better to use a Vector Replicate rather than a scalar multiplication (by e.g. 0x0101), or two immediate loads into a GPR.
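For readers unfamiliar with the idiom, here is a minimal sketch of the scalar replication being replaced. It is plain C++ for illustration, not code from the patch, and the function names are made up for this example. It shows that multiplying a zero-extended byte by 0x01010101 yields the same word as copying the byte into each byte position, which is the value a vector replicate followed by an element store would write.

```cpp
#include <cstdint>

// Scalar idiom: zero-extend the byte, then multiply by 0x01010101.
// Each partial product lands in its own byte, so no carries occur.
constexpr uint32_t replicateByteByMul(uint8_t B) {
  return static_cast<uint32_t>(B) * 0x01010101u;
}

// Reference result: copy the byte into every byte position explicitly.
constexpr uint32_t replicateByteByShift(uint8_t B) {
  uint32_t V = B;
  return V | (V << 8) | (V << 16) | (V << 24);
}

// The same idea for a halfword replicated into a 64-bit doubleword.
constexpr uint64_t replicateHalfByMul(uint16_t H) {
  return static_cast<uint64_t>(H) * 0x0001000100010001ull;
}
```

Both byte variants agree for every input, for example 0xAB becomes 0xABABABAB.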
This patch transforms such stores in SystemZ::combineSTORE() before type legalization.
I also tried doing this after legalization, but that seems to be more work without any benefit (actually the i128 case is better handled after splitting). If the types have to be legal, the right vector type has to be produced, with an extracted element. If that element is i16 (not uncommon), an i32 first needs to be extracted and then the store must truncate itself. This is all taken care of by the type legalizer if the transformation is done before it.
What's more, the zero-extend node which the patch depends on in order to be sure that the multiply produces a replicated word is easy to detect on the initial DAG, but it is removed later by DAGCombine. And computeKnownBits() does not necessarily work (at least not on the i16, it seems).
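To illustrate why the zero-extend matters, here is a small sketch in plain C++ (the helper names are hypothetical and not part of the patch). The multiply by 0x0101 only produces a replicated halfword when the upper byte of the input is known to be zero; otherwise the upper byte leaks into the result.

```cpp
#include <cstdint>

// True if both bytes of the halfword are equal, i.e. it is a byte splat.
constexpr bool isByteSplat16(uint16_t V) { return (V >> 8) == (V & 0xFF); }

// Multiply by 0x0101 and truncate to 16 bits, as an i16 multiply would.
constexpr uint16_t mulBy0x0101(uint16_t X) {
  return static_cast<uint16_t>(static_cast<uint32_t>(X) * 0x0101u);
}
```

With a zero-extended input such as 0x0034 the product is 0x3434, a splat. With 0x1234 the truncated product is 0x4634, which is not a splat, so the transformation would be wrong without the known-zero upper bits.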
I also don't think the DAGCombiner / legalizer will produce these patterns, so if there is no other argument, I think it is probably simplest to do this with the initial DAG..?
- Maybe detect the immediate splat with SystemZVectorConstantInfo, and perhaps also see if there are other immediates to be built/stored instead of via GPRs?
- I tried avoiding the extra LAY:s but it did not seem to be better on benchmarks - the scalar multiply is slower than the LAY so it should always be better anyway, right?
- OK to do before legalize only, or continue work for Combine2?
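As a rough illustration of the first question above, detecting that a stored immediate is a splat could look something like the following. This is a standalone sketch under my own assumptions: `splatElementBits` is a hypothetical helper, not the SystemZVectorConstantInfo interface, and it only handles store widths of 16, 32 or 64 bits.

```cpp
#include <cstdint>

// Return the smallest element width in bits (8, 16 or 32) such that the low
// `Bits` bits of Imm consist of that element repeated, or 0 if Imm is not a
// splat of any narrower element. Bits is assumed to be 16, 32 or 64.
constexpr unsigned splatElementBits(uint64_t Imm, unsigned Bits) {
  for (unsigned EltBits = 8; EltBits < Bits; EltBits *= 2) {
    const uint64_t EltMask = (uint64_t(1) << EltBits) - 1;
    const uint64_t Elt = Imm & EltMask;
    bool AllEqual = true;
    // Compare every higher element against the lowest one.
    for (unsigned Shift = EltBits; Shift < Bits; Shift += EltBits)
      if (((Imm >> Shift) & EltMask) != Elt)
        AllEqual = false;
    if (AllEqual)
      return EltBits;
  }
  return 0;
}
```

For example, a 64-bit store of 0x1234123412341234 would report a 16-bit element, while a 32-bit store of 0x12345678 would report 0 and stay on the GPR path.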
SPEC:
vsteg : 5918 6372 +454
stg : 370142 369696 -446
vrepif : 7729 8077 +348
llihl : 7265 6940 -325
oill : 18574 18254 -320
vsteh : 2557 2875 +318
vstef : 5796 6105 +309
vlrepb : 207 509 +302
llc : 39072 38771 -301
sth : 25741 25463 -278
mhi : 6070 5802 -268
lay : 55017 55271 +254
st : 127451 127280 -171
msfi : 7106 6974 -132
sty : 3627 3514 -113
iilf : 6305 6212 -93
vrepih : 1133 1211 +78
vreph : 191 259 +68
vlvgp : 8511 8576 +65
...
OPCDIFFS: -251
...
Spill|Reload : 607848 607780 -68
Copies : 995492 995481 -11
Some improvements on benchmarks on z14, but more neutral on z15.
https://reviews.llvm.org/D120531
Files:
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
llvm/test/CodeGen/SystemZ/store-replicated-vals.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D120531.411285.patch
Type: text/x-patch
Size: 15742 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220225/0fee27d9/attachment-0001.bin>