[PATCH] D20531: MachineCopyPropagation: remove some more copys when they are not needed
Jonas Paulsson via llvm-commits
llvm-commits at lists.llvm.org
Mon May 23 11:11:23 PDT 2016
jonpa created this revision.
jonpa added a reviewer: MatzeB.
jonpa added a subscriber: llvm-commits.
I was experimenting with the Loop Unroller, and found a benchmark that regressed for no obvious reason as relating to the unrolling of loops. However, I found in a loop with four outer loops, an extra unneeded copy of a register.
It was clear that the COPY was not needed, however not so clear how it should be handled. Comparing the two compilations, the regcoalescer results were identical. It was the greedy regalloc that behaved a bit differently (there were more live intervals in the unrolled loop), for one reason or other.
I looked at MCP, and found that it was missing this very simple case. I am not sure if MCP should really be a simple pass and instead the regalloc should be fixed, but I guessed that MCP has some purpose of cleaning up after regalloc, and experimented with extending it to handle my regression. Here is my patch:
1) If a register use is defined by a COPY, and the COPY source reg could be used directly, use it instead if it seems that the COPY may be removable then.
2) Whenever an instruction defines (clobbers) any register, remove any COPY that is tracked in MaybeDeadCopies, just like it was done for RegMask operands.
This was needed to handle:
%R4D<def> = LGR %R12D
%R7D<def> = LGR %R9D<kill>
%R1D<def> = SLLG %R14D, %noreg, 2
%R0H<def> = LFH %R4D<kill>, 0, %R1D<kill>; mem:LD4[%arrayidx35.i.i](tbaa=!6)
%R1D<def> = LGR %R5D
%R4D<def> = LGR %R3D
The COPY of R12D to R4D is unnecessary. R4D could be replaced with R12D in the LFH instruction (first point above). The COPY itself can be removed when the last instruction redefines R4D (point two above).
Why is this pass so simple? It is not trusting live-in lists, and not recomputing liveness -- is this something "todo", or? My first thought was that liveness could be checked with a reaching phys defs algorithm (see http://reviews.llvm.org/D17257), and basically the pass could be doing a bit more work then. After working with this benchmark, it became clear that even a single COPY could make a noticable difference (SPEC/bzip2).
Greatful for any feedback on this!
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 7607 bytes
Desc: not available
More information about the llvm-commits