[PATCH] D120277: [SystemZ] Expand some memcpys/memsets into Load/Store sequences.
Jonas Paulsson via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Feb 21 12:33:54 PST 2022
jonpa created this revision.
jonpa added a reviewer: uweigand.
Herald added subscribers: ctetreau, steven.zhang, dmgreen, hiraditya.
jonpa requested review of this revision.
Herald added a project: LLVM.
Herald added a subscriber: llvm-commits.
For sizes over 16 bytes MVC is not always efficient, so up to a certain limit it would be better to use a Load/Store sequence.
This is still experimental with two different approaches included for now, to show what they look like.
Approach 1:
New TLI hook prefersVectorSplatForMemset(), to avoid the mandatory scalar multiplication as a means of replication, but instead directly generate a splat vector. I think this makes sense, and it is probably even better to just do this whenever target returns a vector type from getOptimalMemOpType().
Approach 2:
Detect replicated bytes in SystemZTargetLowering::combineSTORE().
If getMemsetStores() generates the multiplies (as it does now unaltered), they need to be combined in combineSTORE() . This is used like '-memset-splat=false -byterepl-fix'. This seems to work also, although it is more LOCs. It does seem though as it would be good also on its own - I see if using this *without* expanding any memcpy/memsets:
vsteh : 2557 2875 +318
vlrepb : 187 475 +288
llc : 39057 38771 -286
sth : 25792 25515 -277
mhi : 6009 5741 -268 // multiply
stg : 371885 371803 -82
vstef : 5779 5859 +80
st : 122692 122620 -72
lay : 54734 54797 +63
vsteg : 6106 6159 +53
lg : 987456 987405 -51
sthy : 1054 1014 -40
vlvgp : 8300 8339 +39
vrepb : 95 134 +39
vrepib : 283 320 +37
msgrkc : 6741 6707 -34 // multiply
iilf : 6397 6364 -33
msfi : 7106 7082 -24
vl : 107362 107381 +19
...
Spill|Reload : 611703 611679 -24
Copies : 1002825 1002832 +7
Example:
- llc %r0, 0(%r4)
- msrkc %r0, %r0, %r0
- st %r0, 0(%r1)
+ vlrepb %v0, 0(%r4)
+ vstef %v0, 0(%r1), 0
Maybe this could be done in the common DAGCombiner even...
LegalAMVecTy and GEPOffsSplit are experimental options I have played with to see how to best fix the problem where a memcpy address is >U12 range, and now we instead get multiple VL/VSTs which all are out of range. This is a problem that needs to be fixed before using this patch, I think. I see for instance:
stg %r2, 8696(%r15 stg %r2, 8696(%r15
lg %r2, 8816(%r15 lg %r2, 8816(%r15
> vl %v0, 0(%r2), 3
lay %r1, 8712(%r15 lay %r1, 8712(%r15
mvc 0(44,%r1), 0(% | vst %v0, 0(%r1), 3
> vl %v0, 16(%r2),
> lay %r1, 8728(%r15
> vst %v0, 0(%r1), 3
> vl %v0, 28(%r2)
> lay %r1, 8740(%r15
> vst %v0, 0(%r1)
The problem right now is that if the vector type is set to reject long displacements in isLegalAddressingMode(), LSR generates worse code for some loops :-/
The MVI_TYPEFIX code is probably not needed above 16 bytes, but in some cases there were immediates of type i64 that first got loaded into a register instead of used directly with MVI.
The test cases for now include full ranges of interesting sizes of memcpy/memset.
https://reviews.llvm.org/D120277
Files:
llvm/include/llvm/CodeGen/TargetLowering.h
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
llvm/lib/Target/SystemZ/SystemZISelLowering.h
llvm/test/CodeGen/SystemZ/memcpy-03.ll
llvm/test/CodeGen/SystemZ/memset-08.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D120277.410362.patch
Type: text/x-patch
Size: 63394 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220221/21e0fe29/attachment.bin>
More information about the llvm-commits
mailing list