[PATCH] D103865: [SystemZ] Generate XC loop for memset 0 of variable length.

Jonas Paulsson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 23 17:15:24 PDT 2021


jonpa updated this revision to Diff 354114.
jonpa added a comment.

> Aren't there already two opcodes, one with 4 operands and one with 5? Can't you just use the one with 4 operands instead?

Ah, yes, I guess I was looking at the names: 'Length' vs 'Loop'...

> That said, maybe it actually is better to use the version with 5 operands, and actually pass in two values: the length (minus one), and the loop trip count (length / 256). These two values can easily be computed at the SelectionDAG level already, and if they are, that might open opportunities to optimize / reuse the values at that level. (Also, then it is more obvious to re-use the same opcode, since the operands do in fact have the same semantics then, it's just that in one case they're immediates and in the other they are registers.)

I updated the patch to create the length-1 and trip count on the DAG instead.

I see that this means that they are initially placed before the zero-length check, while before they were put in the preheader of the loop. The SRLG is actually moved to the preheader by MachineSink later, and this could probably be done for the AGHI as well if the comparison would be made with the AGHI source against 0.

To keep things simple, I just changed the comparison to be against -1 instead for now. What do you think about that - is the zero-length case rare enough to ignore where the AGHI is placed? We could do the extra work of finding the AGHI/-1 and comparing with 0 of that source reg instead, and expect the AGHI will be sunk later if there are no other users I would hope.

> Hmm, that's not quite what I had in mind, sorry. This code still has two completely separate code paths, just now in one function. What I had been thinking of is to actually share the same code path that e.g. emits the loop -- they should be identical except that the trip count is an immediate vs. a register.
>
> The point is to merge the two paths to the extent where there is no longer any point in extracting buildMemMemLoop as a subroutine as it actually is used only once.

I gave it another try: First let the reg/immediate cases set up the needed MBBs and then build the loop afterwards for both cases in one place. I guess that seems more simple than having a separate class for creating the loop...

> That makes sense. It should be straightforward to sort and de-duplicate the target instructions at final emission time.

AsmPrinter::EmitToStreamer() calls getSubtargetInfo() which needs an MF, so it did not seem quite simple to emit all the EXRL targets in emitEndOfAsmFile(). That method seems to be intended for other things than emitting instructions. Does it seem right to try to create a new section at that point after all else and emit the target instructions?

For now, I tried just improving the output per function. This saves 4110 XC target instructions (prior count was ~9000). Worst case is now 168 in one file, and 38 files have >20, so it is an improvement, but still some room for improvement if reusing per file...


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103865/new/

https://reviews.llvm.org/D103865

Files:
  llvm/lib/Target/SystemZ/SystemZAsmPrinter.cpp
  llvm/lib/Target/SystemZ/SystemZAsmPrinter.h
  llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
  llvm/lib/Target/SystemZ/SystemZInstrFormats.td
  llvm/lib/Target/SystemZ/SystemZInstrInfo.td
  llvm/lib/Target/SystemZ/SystemZSelectionDAGInfo.cpp
  llvm/test/CodeGen/SystemZ/memset-05.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D103865.354114.patch
Type: text/x-patch
Size: 22623 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210624/b3dc7639/attachment.bin>


More information about the llvm-commits mailing list