[PATCH] D103865: [SystemZ] Generate XC loop for memset 0 of variable length.

Ulrich Weigand via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 22 06:07:42 PDT 2021


uweigand added a comment.

In D103865#2831600 <https://reviews.llvm.org/D103865#2831600>, @jonpa wrote:

>> One thing that is odd is the extra always-zero parameter to the LoopVarLen insns. This really should be removed.
>
> Done - it was easy to remove on the MachineInstr, but on the SystemZISD node it seems reasonable to keep a dummy zero there and reuse the existing opcode, or?

Aren't there already two opcodes, one with 4 operands and one with 5?   Can't you just use the one with 4 operands instead?

That said, maybe it actually is better to use the version with 5 operands, and actually pass in two values: the length (minus one), and the loop trip count (length / 256).   These two values can easily be computed at the SelectionDAG level already, and if they are, that might open opportunities to optimize / reuse the values at that level.    (Also, then it is more obvious to re-use the same opcode, since the operands do in fact have the same semantics then, it's just that in one case they're immediates and in the other they are registers.)

>> As a suggestion that might remove a bit of the duplication between emitMemMemWrapper and emitMemMemLoopVarLen: the (old) routine is currently called for both the Sequence and Loop insns, and it distinguishes between them based on the operand count. However, for the LoopVarLen case, your patch now introduces a separate new routine. I think it might be simpler to continue to use the same routine, and just internally distinguish between the cases based on whether the length operand is an immediate or a register.
>
> OK - I merged the two functions instead.

Hmm, that's not quite what I had in mind, sorry.   This code still has two completely separate code paths, just now in one function.   What I had been thinking of is to actually share the same code path that e.g. emits the loop -- they should be identical except that the trip count is an immediate vs. a register.

The point is to merge the two paths to the extent where there is no longer any point in extracting buildMemMemLoop as a subroutine as it actually is used only once.

> On spec, I see now 8869 exrl instructions, each with its own xc target instruction. I wonder if we should try to avoid duplicating target instructions. I see in some files hundreds of identical xc targets (worst case is 845 identical ones in one file: f510.parest_r/build/vectors.s !). 67 files have more than 20 identical xc targets...

That makes sense.  It should be straightforward to sort and de-duplicate the target instructions at final emission time.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103865/new/

https://reviews.llvm.org/D103865



More information about the llvm-commits mailing list