[PATCH] D29295: Move core RDF files from lib/Target/Hexagon to CodeGen
Krzysztof Parzyszek via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 31 08:54:43 PST 2017
kparzysz added a comment.
In https://reviews.llvm.org/D29295#661310, @hfinkel wrote:
> We should definitely address the question of "Why is this needed currently?". If you're using this for copy propagation and DCE, why do you have dead code post-RA that was not eliminated by our pre-RA DCE? Why do you have copy-propagation opportunities that present themselves only after RA? Does this mean that you/we're missing opportunities to make better RA decisions?
CP/DCE was part of the original motivation for this effort. The idea of doing some post-RA optimizations had been floating around for quite some time before this code was written. Then, after it was written, it took some time before it was upstreamed. Back then, the code coming out of the register allocator was not great: there were quite a few cases of register copies left over after PHI elimination, which we thought could be removed. We were receiving bug reports from our customers about redundant register assignments (crossing basic block boundaries), etc., and we thought that having a general framework would help solve that, as well as provide a way to address any future concerns related to register allocation specifics. Eventually, the register allocator became a lot better, and so that issue was no longer the most important.
The CP/DCE code uses target-specific hooks, which, on Hexagon, allow us to do some cleanup in cases that the register allocator cannot handle. For example, we can convert a post-increment load into a regular load where the address register is not used afterwards (i.e. the "increment" part of the instruction is "dead"). We could also recognize certain Hexagon instructions as copies, for example "r1:0 = combine(r2, r3)", which is equivalent to "r1 = r2; r0 = r3". In a more specific case, like "r1:0 = combine(r1, r3)", it could be replaced with "r0 = r3". Those are not sources of major improvements in code size or performance, but they still do some useful things that help certain benchmarks. Another role that they play at the moment is helping to test developments in the graph construction and liveness computations.
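To make the combine examples above concrete, here is a minimal sketch in Python. It is a toy model and not the RDF code or any LLVM API: registers are plain strings, and the function name expand_combine is hypothetical. It only shows how a combine can be viewed as two copies, and how a copy of a register onto itself drops out.

```python
# Toy model, NOT LLVM's RDF API. Registers are plain strings and the
# function name is made up for illustration.

def expand_combine(dst_hi, dst_lo, src_hi, src_lo):
    """View 'dst_hi:dst_lo = combine(src_hi, src_lo)' as two copies.

    The copies are parallel (both sources are read before either
    destination is written), so the result is a set of copies, not a
    sequence that can always be emitted in order. A copy whose source
    equals its destination is a no-op and is dropped.
    """
    copies = [(dst_hi, src_hi), (dst_lo, src_lo)]
    return [(dst, src) for dst, src in copies if dst != src]


# r1:0 = combine(r2, r3)  is  r1 = r2; r0 = r3
print(expand_combine("r1", "r0", "r2", "r3"))
# r1:0 = combine(r1, r3)  reduces to  r0 = r3
print(expand_combine("r1", "r0", "r1", "r3"))
```

In the actual code this recognition is done through the target-specific hooks mentioned above; the sketch only illustrates the equivalence.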
One of the existing use cases for CP is the situation where a register copy is immediately followed by uses of the destination register. This creates a flow dependency and restricts packetization. Even in the absence of DCE, CP itself could make a difference. The note  also refers to such cases. I don't have a testcase handy that shows this, but it still does happen in our code.
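The copy-followed-by-use case can be sketched as follows. This is a simplified, single-basic-block illustration in Python with a made-up instruction encoding (dst, op, srcs); it does not reflect how the RDF-based CP is implemented, which works on the data-flow graph.

```python
# Toy local copy propagation over one basic block. The instruction
# format (dst, op, srcs) is an assumption made for this sketch.

def propagate_copies(block):
    """Rewrite uses of a copy's destination to the copy's source, as
    long as neither register has been redefined in between."""
    avail = {}  # copy destination -> copy source
    out = []
    for dst, op, srcs in block:
        # Replace each use with the original source when one is known.
        srcs = tuple(avail.get(s, s) for s in srcs)
        # Writing dst invalidates any copy that defines or reads dst.
        avail = {d: s for d, s in avail.items() if d != dst and s != dst}
        if op == "copy" and srcs[0] != dst:
            avail[dst] = srcs[0]
        out.append((dst, op, srcs))
    return out


block = [
    ("r1", "copy", ("r10",)),      # r1 = r10
    ("r2", "add", ("r1", "r3")),   # r2 = add(r1, r3), depends on the copy
]
for insn in propagate_copies(block):
    print(insn)
```

After propagation the add reads r10 directly, so it no longer has a flow dependency on the copy, and the two instructions are no longer kept apart by that dependency when forming packets. The copy itself stays unless a later DCE step finds r1 dead.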
Not all such issues are correctable in the allocator itself, as not all of them qualify as "bugs". For example, having a pair of copies "r1 = r10; r2 = r18" takes two instructions, but a pair "r1 = r10; r0 = r18" could be replaced with a single instruction "r1:0 = combine(r10, r18)". This form of register renaming is not implemented yet, but it would be a good use case for this framework. So far most of the time has been spent on developing the framework itself, and not so much on actually utilizing it (although we do have users other than CP/DCE).
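As a rough illustration of why the choice of destination registers matters in the example above, the sketch below checks whether two consecutive copies could be folded into one combine. It is hypothetical Python, not an implementation of the (not yet written) renaming: the helper names are made up, and it assumes that a register pair consists of an even-numbered register and the next odd one, as in "r1:0".

```python
# Hypothetical legality check, not part of the RDF framework.
# Assumes pairs are r(2n+1):r(2n), e.g. r1:0, r3:2.

def reg_num(reg):
    """'r18' -> 18"""
    return int(reg[1:])


def try_combine(first, second):
    """first and second are (dst, src) copies in program order.
    Return the equivalent combine as a string, or None."""
    # combine reads both sources before writing, so the second copy
    # must not read the register that the first copy writes.
    if second[1] == first[0]:
        return None
    hi, lo = sorted([first, second], key=lambda c: reg_num(c[0]), reverse=True)
    hi_n, lo_n = reg_num(hi[0]), reg_num(lo[0])
    if lo_n % 2 == 0 and hi_n == lo_n + 1:
        return f"r{hi_n}:{lo_n} = combine({hi[1]}, {lo[1]})"
    return None


print(try_combine(("r1", "r10"), ("r0", "r18")))  # destinations form r1:0
print(try_combine(("r1", "r10"), ("r2", "r18")))  # r2:1 is not a valid pair
```

The renaming step itself would be the part that rewrites r2 to r0 when r0 is free; that requires liveness information and is not modeled here.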
Other things being equal, this wouldn't make much of a difference. The benefit of doing it is to relax restrictions on post-RA scheduling (which on Hexagon includes packetization).