[PATCH] D100242: [SystemZ / TII] Peephole optimization of zero-extension of i1.
Jonas Paulsson via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Apr 16 05:38:38 PDT 2021
jonpa updated this revision to Diff 338071.
jonpa added a comment.
- Only check for opcodes in analyzeSelect() and avoid common-code changes by returning a new instruction from optimizeSelect().
- It seemed best to use MRI to find the LHIMux / LGHI (as opposed to looking for it in MBB). Even if there is a load-immediate that has several users, the LOC is 2-address so it is still beneficial.
- I tried a simplistic search to handle cases with multiple users but where the LOC use should be the kill ("last user"). The kill-flags did not really help much, so this is not trivial to handle. This gave just some additional eliminations (~5300 -> ~6000), so it is probably acceptable to just check for a single use and not worry about those extra cases. Possibly some simple check could be used rather than a full CFG-search...
- I found out that the new extra LGR/LR instructions relate very much to physregs/calls: the LOC-imm serves as a "natural" change of registers if the immediate is loaded into a register, whereas if the compare operand is reused a COPY is needed when, for instance, the compare register comes from a COPY of a physreg while the LOC-def needs to be live across a call. This is also not trivial to detect - I had to use slow experimental functions to determine if the vregs (and connected vregs) crossed a call. So far this got rid of most of the extra moves, but not all.
- I also tried another idea: instead of detecting the cases to avoid (per previous point), do all cases but return false in SystemZRegisterInfo::shouldCoalesce() for the COPY created by TwoAddress for the LOC(G)HI. In many cases regalloc can then eliminate the COPY without the help of RegisterCoalescer, and in the remaining cases SystemZInstrInfo::copyPhysReg() could then lower the COPY with a L(G)HI instead of L(G)R.
This, however, didn't seem as good as I had hoped:
- With this, there are many cases of CGHI+LGHI which previously became LTGR (not all of those COPYs become coalesced on trunk to begin with, so many of them get the LGR which then becomes an LTGR in SystemZElimCompare).
- So far this didn't really eliminate the extra COPYs (LGR/LR:s), but it may be possible to investigate further in shouldCoalesce() using the LiveIntervals that are available there and fine-tune this even more.
With the patch as it is we trade ~5000 immediate loads for ~500 register moves, which seems good just looking at the instruction count, but not if a register move is potentially more costly than an immediate load?
In summary:
- There are relatively few extra cases to be handled if the interesting multiple use cases are searched for.
- It is relatively hard to get rid of the extra L(G)R:s as it depends on presence of calls in the function (global search).
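To illustrate the single-use check discussed above (finding the LHIMux / LGHI through the register's definition rather than scanning the MBB), here is a minimal standalone sketch. It is a toy model and not the code in the patch: the Inst struct, countUses() and findSingleUseLoadImm() are hypothetical names, and only the opcode names are taken from the discussion. In LLVM itself the corresponding queries would presumably go through MachineRegisterInfo, e.g. getUniqueVRegDef() and hasOneNonDBGUse().

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Toy stand-ins for machine opcodes; only the names come from the patch.
enum class Opcode { LHIMux, LGHI, CHI, LOCHI, Other };

// Hypothetical, simplified instruction: at most one def, any number of uses.
struct Inst {
  Opcode Op;
  int Def;               // virtual register defined, or -1 for none
  std::vector<int> Uses; // virtual registers read
};

// Count how many operands across all instructions read Reg.
int countUses(const std::vector<Inst> &Insts, int Reg) {
  int N = 0;
  for (const Inst &I : Insts)
    for (int U : I.Uses)
      if (U == Reg)
        ++N;
  return N;
}

// Return the index of the instruction defining Reg if Reg has a unique def,
// that def is a load-immediate (LHIMux or LGHI), and Reg has exactly one use.
// Otherwise return std::nullopt.
std::optional<std::size_t> findSingleUseLoadImm(const std::vector<Inst> &Insts,
                                                int Reg) {
  std::optional<std::size_t> DefIdx;
  for (std::size_t I = 0; I < Insts.size(); ++I) {
    if (Insts[I].Def != Reg)
      continue;
    if (DefIdx)
      return std::nullopt; // more than one def, give up
    DefIdx = I;
  }
  if (!DefIdx)
    return std::nullopt; // no def found
  Opcode Op = Insts[*DefIdx].Op;
  if (Op != Opcode::LHIMux && Op != Opcode::LGHI)
    return std::nullopt; // def is not a load-immediate
  if (countUses(Insts, Reg) != 1)
    return std::nullopt; // multiple users: the case left unhandled above
  return DefIdx;
}
```

With this model, a load-immediate feeding only a LOCHI is found, while adding a second user of the same register makes the lookup fail, which corresponds to the multiple-use cases that are deliberately not handled.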
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D100242/new/
https://reviews.llvm.org/D100242
Files:
llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp
llvm/lib/Target/SystemZ/SystemZInstrInfo.h
llvm/lib/Target/SystemZ/SystemZInstrInfo.td
llvm/test/CodeGen/SystemZ/setcc-05.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D100242.338071.patch
Type: text/x-patch
Size: 8945 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210416/dd72040d/attachment.bin>