[llvm-dev] Redundant copies

Sjoerd Meijer via llvm-dev llvm-dev at lists.llvm.org
Thu Mar 12 12:50:03 PDT 2020


+ Sam

Hi Roger,

FWIW: we have observed redundant copies/moves too. They have been annoying us for some time now, but we haven't got round to looking at them. I'm not sure if we are looking at exactly the same problem, but I guess so.

Treating the symptoms with post-RA dead code elimination might be very effective, but it might also be worth just having a look at the source of the problem (regalloc?) to see if we are missing something obvious.

Regarding a post-RA pass: you may want to have a look at the ARM hardware-loop pass. To make that beneficial, we have to do quite a lot of dead code elimination post RA, both inside loops and in preheaders; see e.g. ARMLowOverheadLoops::IterationCountDCE. This uses ReachingDefAnalysis (RDA), which Sam has extended and made more generic to support this, and which was also going to be the subject of his EuroLLVM talk: http://llvm.org/devmtg/2020-04/talks.html#LightningTalk_26. End of advertisement. ;-) Basically, what I want to say is that this should provide most of the things you'll need.

Cheers,
Sjoerd.



________________________________
From: llvm-dev <llvm-dev-bounces at lists.llvm.org> on behalf of Roger Ferrer Ibáñez via llvm-dev <llvm-dev at lists.llvm.org>
Sent: 12 March 2020 18:06
To: LLVM-Dev <llvm-dev at lists.llvm.org>
Subject: [llvm-dev] Redundant copies

Hi all,

we have encountered a case of redundant copies still left in the final code and we would like to, at least, mitigate it. The original motivating case comes from a context where we have large vector registers. In that context, copies are expensive and we would like to avoid them as much as possible.

This small testcase in C, similar to the original vector case, exposes the issue using scalars.

long a, b;
long fn1();
long fn2() {
  long c = a, d = c;
  for (; b;) {
    long e = fn1();
    d = d + e;
  }
  long f = d - c;
  return f;
}

For instance, on RISC-V we emit something like this, but other backends such as ARM or X86 show the same behaviour.

add s0, zero, s2   # ← copy
beqz a0, .LBB0_3
# %bb.1:                                # %for.body.preheader
add s0, zero, s2  # ← not needed
.LBB0_2:                                # %for.body

Has anyone encountered a similar issue in the past?

We are looking into removing these copies with a post-RA pass that addresses the most obvious case: if we see a copy with the same physregs in dest and source as an earlier one, and the reaching definitions of the dest and source registers are one and the same for both copies, then that copy should be redundant.

This might be too specific though, so perhaps there are better approaches?
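
To make the check described above concrete, here is a minimal sketch on a toy model. It is not LLVM code: Inst, makeCopy, makeOther and findRedundantCopies are hypothetical names invented for illustration, and the model assumes a single straight-line path of instructions (for example a block followed by its only successor), with no sub-registers and no joins from multiple predecessors. A real pass would have to query a proper reaching-definition analysis instead.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction (hypothetical, not an LLVM type).
struct Inst {
  bool isCopy;                    // true for "dst = src"
  std::string dst, src;           // physical registers of a copy
  std::vector<std::string> defs;  // registers written by a non-copy
};

Inst makeCopy(const std::string &dst, const std::string &src) {
  return Inst{true, dst, src, {}};
}
Inst makeOther(const std::vector<std::string> &defs) {
  return Inst{false, "", "", defs};
}

// Returns the indices of copies that repeat an earlier identical copy whose
// destination and source still hold the same definitions.
std::vector<std::size_t> findRedundantCopies(const std::vector<Inst> &path) {
  const int LiveIn = -1;               // "defined before the path starts"
  std::map<std::string, int> lastDef;  // register -> index of reaching def
  std::vector<int> srcDefAtCopy(path.size(), LiveIn);
  std::vector<std::size_t> redundant;

  auto reachingDef = [&](const std::string &reg) {
    auto it = lastDef.find(reg);
    return it == lastDef.end() ? LiveIn : it->second;
  };

  for (std::size_t i = 0; i < path.size(); ++i) {
    const Inst &in = path[i];
    if (!in.isCopy) {
      // Any other instruction simply becomes the new reaching def.
      for (const std::string &reg : in.defs)
        lastDef[reg] = static_cast<int>(i);
      continue;
    }
    assert(!in.dst.empty() && !in.src.empty());
    int dstDef = reachingDef(in.dst);
    int srcDef = reachingDef(in.src);
    // dst must still be defined by an earlier copy from the same source...
    bool sameCopy = dstDef != LiveIn && path[dstDef].isCopy &&
                    path[dstDef].dst == in.dst && path[dstDef].src == in.src;
    // ...and src must not have been redefined since that earlier copy.
    if (sameCopy && srcDefAtCopy[dstDef] == srcDef) {
      redundant.push_back(i);
      continue;  // treat the copy as deleted: it creates no new definition
    }
    srcDefAtCopy[i] = srcDef;
    lastDef[in.dst] = static_cast<int>(i);
  }
  return redundant;
}
```

On the RISC-V snippet above, modelled as copy s0 <- s2, a branch that defines nothing, and a second copy s0 <- s2, only the second copy is reported. If anything in between writes s0 or s2, nothing is reported, which keeps the check conservative.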

Thanks!

--
Roger Ferrer Ibáñez