[PATCH] D76055: [SystemZ] Improve foldMemoryOperandImpl().

Mon Mar 23 09:48:43 PDT 2020

uweigand added a comment.

In D76055#1921620 <https://reviews.llvm.org/D76055#1921620>, @jonpa wrote:

> >   logical compares (CLFHSI, CLGHSI) -- those should be trivial to add
>
> Patch updated to include these as well, with a check for an uint<16> immediate. This gives +1800 CLGHSI and +200 CLFHSI compared to before.

OK, good!

>>   addition (ASI, AGSI, potentially also ALSI, ALGSI) -- it should be possible to add them, I think (it may be a bit different since it would replace a read-modify-write)
> 
> These are already mostly handled, right? Exception is AHIMuxK which I couldn't commit since it is causing a test case to get a lot of uncoalesced COPYs as reported on Bugzilla. AGHIK could also be handled the same way.

Ah right, I missed those.

A(G)HIK is an interesting case; we already have code to check whether we can replace a three-operand with a two-operand opcode, but that is done *after* the A(G)HI check.  I guess in the past, this wasn't very important since most operations were two-operand; but a while back, we changed that to start out everything as three-operand, and only switch back to two-operand later -- this might have caused a regression for foldMemoryOperandImpl ...

Not sure I understand the AHIMuxK problem.  In order to do foldMemoryOperandImpl, we'd have do first to the three-operand to two-operand conversion like for A(G)HIK.  And then it doesn't matter whether the spilled register is GR32 or GRH32 as we'd always leave it in memory and replace it with an ASI, right?

>> Then there are a number of instructions that operate on a byte or halfword level in memory:
> 
> The only thing I can see is just a few cases of:
>  NIFMux -> NI (just a few cases with "small" immediate)
>  OILMux -> OI few cases with small immediate
>  XIFMux -> XI 10 cases with small immediate
> 
> Given it is rare it seems it can probably wait..

Agreed.  Thanks for checking!

> What remains (from looking at benchmarks) seems to be:
> 
> - We can inspect the allocated physregs in foldMemoryOperandImpl(), so we could try to handle W... FP instructions. We could experiment with doing this only if there is a present allocation that fits, or what happens if we constrain it in case it is unallocated (to FP registers 0-15):
> 
>   ``` WFA[SDX]B op 1/2 -> two address A[DEX]BR if low16 vecregs -> A[DEX]B if dst same...

Well ... even for the first step you already need the dst to be the same ... A[DEX]BR themselves are two-operand.

In any case, that seems a similar case to the three- vs. two-operand issue, except that now we'd need to constrain the regclass, so that might be an overall loss.  Hard to say.  Might be interesting to experiment.

> - MS(G)RKC -> MS(G) reg/mem mapping seems missing (4000 potential cases)

OK, need to watch out for CC though.

> - VLVGP (Ops 1, 2)  (justa a dozen cases)  -> 2 x VLEG
> - VLVGP32 (Ops 1, 2 same reg) (just ~20 cases) -> VLREPF ?
> - LFER / LEFR (88 cases)

Not sure whether this is worthwhile, but OK.

> - When where 'Ops.size() == 2', for ops [0, 1]: A(G)RK, S(G)RK, NRK, ORK, WF[AS][DS]B, MS(G)RKC
> 
>   Many of these have the same virt reg in operands 0 and 1, but not all. Since we now don't use a reg/mem instruction, we do <tmp> = reload <tmp> = ARK  <tmp> store <tmp>
> 
>   If we could use a temporary register in foldMemoryOperandImpl(), I think we instead could do <tmp> = A store <tmp>
> 
>   Not sure if that's allowed...

So this is the case where we have a three-operand instruction, and the output and one of the inputs are spilled (but not the same), right?

Can't we do the folding just on the input then, and leave the output as spilled?   Not sure how this works with the foldMemoryOperandImpl logic ...

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76055/new/

https://reviews.llvm.org/D76055