<div dir="ltr"><div>Hi all,</div><div><br></div><div>We have encountered a case where redundant copies are still left in the final code, and we would like to at least mitigate it. The original motivating case comes from a context where we have large vector registers. In that context, copies are expensive and we would like to avoid them as much as possible.<br></div><div><br></div><div>This small testcase in C, similar to the original vector case, exposes the issue using scalars.<br></div><div><br></div><div style="margin-left:40px">long a, b;<br>long fn1();<br>long fn2() {<br>  long c = a, d = c;<br>  for (; b;) {<br>    long e = fn1();<br>    d = d + e;<br>  }<br>  long f = d - c;<br>  return f;<br>}</div><div><br></div><div>For RISC-V, for instance, we emit something like the following; other backends, such as ARM and X86, show the same behaviour.<br></div><div><br></div><div style="margin-left:40px">      add     s0, zero, s2   # ← copy<br>   beqz    a0, .LBB0_3<br># %bb.1:                                # %for.body.preheader<br>    add     s0, zero, s2  # ← not needed<br>.LBB0_2:                                # %for.body<br></div><div><br></div><div>Has anyone encountered a similar issue in the past?</div><div><br></div><div>We are looking into removing these copies with a post-RA pass that addresses the most obvious case: if a copy has the same destination and source physregs as an earlier copy, and the reaching definitions of the destination and source registers are one and the same, then the later copy should be redundant.</div><div><br></div><div>This might be too specific though, so perhaps there are better approaches?<br></div><div><br></div><div>Thanks!<br></div><br>-- <br><div dir="ltr" data-smartmail="gmail_signature">Roger Ferrer Ibáñez<br></div></div>
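<div><br></div><div>P.S. To make the proposed check more concrete, here is a minimal sketch of it in C. It is only an illustration under simplifying assumptions, not an actual pass: the <code>Instr</code> type, the <code>OP_*</code> kinds, <code>NUM_REGS</code> and <code>find_redundant_copies</code> are hypothetical names and not part of any real compiler API. It also assumes a linear instruction sequence in which every instruction is reached only from the one before it (for example, a block followed by its single-predecessor fallthrough successor), and that anything clobbering a register, such as a call, is modelled as a plain definition of that register.</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified model of post-RA code (not a real compiler API). */
enum { NUM_REGS = 32 };

typedef enum { OP_COPY, OP_DEF, OP_OTHER } OpKind;

typedef struct {
    OpKind kind;
    int dst; /* physreg written by OP_COPY / OP_DEF; ignored for OP_OTHER */
    int src; /* physreg read by OP_COPY; ignored otherwise */
} Instr;

/*
 * Scans a linear instruction sequence and sets redundant[i] for every copy
 * that repeats an earlier copy with the same dst and src, provided that
 * neither register has been redefined in between.
 * Returns the number of redundant copies found (0 if allocation fails).
 */
size_t find_redundant_copies(const Instr *code, size_t n, bool *redundant)
{
    int reaching[NUM_REGS]; /* index of the last writer of each register, -1 = live-in */
    size_t count = 0;

    for (size_t i = 0; i < n; ++i)
        redundant[i] = false;

    /* src_def[i]: definition of the source register that reached copy i. */
    int *src_def = malloc((n ? n : 1) * sizeof *src_def);
    if (!src_def)
        return 0;

    for (int r = 0; r < NUM_REGS; ++r)
        reaching[r] = -1;

    for (size_t i = 0; i < n; ++i) {
        const Instr *in = &code[i];
        if (in->kind == OP_OTHER)
            continue; /* writes no register in this model */

        assert(in->dst >= 0 && in->dst < NUM_REGS);
        if (in->kind == OP_COPY) {
            assert(in->src >= 0 && in->src < NUM_REGS);
            int prev = reaching[in->dst];
            /* dst was last written by a copy from the same src, and src
             * still has the definition it had at that earlier copy. */
            if (prev >= 0 && code[prev].kind == OP_COPY &&
                code[prev].src == in->src &&
                src_def[prev] == reaching[in->src]) {
                redundant[i] = true; /* dst keeps its earlier definition */
                ++count;
                continue;
            }
            src_def[i] = reaching[in->src];
        }
        reaching[in->dst] = (int)i; /* this instruction now defines dst */
    }

    free(src_def);
    return count;
}
```

<div>Using the RISC-V register numbers (s0 = x8, s2 = x18), the sequence from the example above becomes copy 8 ← 18, an <code>OP_OTHER</code> for the <code>beqz</code>, and copy 8 ← 18 again, and the sketch flags only the second copy. A real pass would have to handle control-flow joins, sub- and super-registers and implicit clobbers, all of which this sketch ignores.</div>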