[PATCH] D30751: [MachineCopyForwarding] Add new pass to do register COPY forwarding at end of register allocation.

Tue Mar 14 14:01:15 PDT 2017

gberry added a comment.

@javed.absar The purpose of this pass is not to reduce register pressure (since it runs just after register allocation), but to allow more scheduling flexibility and, to a lesser degree, to remove some redundant COPYs.  I'll elaborate on this in my response to Quentin.
As for your question about why more ARM tests aren't affected, I don't have a good answer, but my guess is that there are simply more X86 lit test cases, both in general and among those that are sensitive to changes in register allocation.

================
Comment at: test/CodeGen/AArch64/arm64-zero-cycle-regmov.ll:7
 ; CHECK-LABEL: t:
-; CHECK: mov x0, [[REG1:x[0-9]+]]
-; CHECK: mov x1, [[REG2:x[0-9]+]]
+; CHECK: mov [[REG2:x[0-9]+]], x3
+; CHECK: mov [[REG1:x[0-9]+]], x2
----------------
javed.absar wrote:
> Would it be better to rewrite these as MIR tests?
I'm not sure how that would help.  In this test, similar to the one Hal asked about before, the newly checked 'mov's aren't new; I just needed to add them to the CHECK lines to capture the new register numbers.  Here are the full diffs of the generated code for this test case:

```
 _t:                                     ; @t
 ; BB#0:                                 ; %entry
 	stp	x20, x19, [sp, #-32]!   ; 8-byte Folded Spill
 	stp	x29, x30, [sp, #16]     ; 8-byte Folded Spill
 	mov	 x19, x3
 	mov	 x20, x2
-	mov	 x0, x20
-	mov	 x1, x19
+	mov	 x0, x2
+	mov	 x1, x3
 	bl	_foo
 	mov	 x0, x20
 	mov	 x1, x19
 	bl	_foo

```
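To make the diff above concrete, here is a minimal toy sketch of COPY forwarding over a single basic block. It is not the pass's actual implementation: the tuple encoding of instructions, the `forward_copies` name, and the assumption that a call clobbers x0-x18 are all illustrative simplifications.

```python
# Toy model of COPY forwarding within one basic block; not LLVM's code.
# Each instruction is (opcode, dest, src); a call is ("bl", target, None).

# Assumption for this sketch: a call clobbers x0-x18.
CALL_CLOBBERED = {f"x{i}" for i in range(19)}


def forward_copies(block):
    copies = {}  # dest register -> original source, for copies still valid
    out = []
    for op, dst, src in block:
        if op == "bl":
            # Drop any tracked copy whose source or dest the call may clobber.
            copies = {d: s for d, s in copies.items()
                      if d not in CALL_CLOBBERED and s not in CALL_CLOBBERED}
            out.append((op, dst, src))
            continue
        # Forward: read the original source instead of the copied register.
        src = copies.get(src, src)
        # Writing dst invalidates every tracked copy that involves dst.
        copies = {d: s for d, s in copies.items() if dst not in (d, s)}
        if op == "mov" and dst != src:
            copies[dst] = src
        out.append((op, dst, src))
    return out


block = [
    ("mov", "x19", "x3"),
    ("mov", "x20", "x2"),
    ("mov", "x0", "x20"),
    ("mov", "x1", "x19"),
    ("bl", "_foo", None),
    ("mov", "x0", "x20"),
    ("mov", "x1", "x19"),
    ("bl", "_foo", None),
]
forwarded = forward_copies(block)
```

In this model only the two copies before the first call are rewritten to read x2 and x3 directly. The copies after the call still read x20 and x19, because x2 and x3 are no longer known to hold the original values at that point.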

================
Comment at: test/CodeGen/AArch64/neg-imm.ll:9
 ; CHECK_LABEL: test:
 ; CHECK_LABEL: %entry
+; CHECK: subs [[REG0:w[0-9]+]],
----------------
javed.absar wrote:
> Would it be better adding new/separate test file instead of changing the purpose of this one ?
Again, I'm not trying to change the purpose of this test.  My change just caused things to be scheduled slightly differently.  The test is still checking that the condition is computed by a 'subs' feeding a 'csel'.  Here are the full diffs:

```
test:                                   // @test
 	str	x20, [sp, #-32]!        // 8-byte Folded Spill
 	stp	x19, x30, [sp, #16]     // 8-byte Folded Spill
+	subs	w8, w0, #1              // =1
 	mov	 w19, w0
-	subs	w8, w19, #1             // =1
 	csel	w20, wzr, w8, lt
 .LBB0_1:                                // %for.body
                                         // =>This Inner Loop Header: Depth=1
 	cmp		w19, w20
 	b.eq	.LBB0_3
 // BB#2:                                // %if.then3
                                         //   in Loop: Header=BB0_1 Depth=1
 	mov	 w0, w20
 	bl	foo
 .LBB0_3:                                // %for.inc
                                         //   in Loop: Header=BB0_1 Depth=1
 	cmp		w20, w19
 	add	w20, w20, #1            // =1
 	b.le	.LBB0_1
 // BB#4:                                // %for.cond.cleanup
 	ldp	x19, x30, [sp, #16]     // 8-byte Folded Reload
 	ldr	x20, [sp], #32          // 8-byte Folded Reload
 	ret

```
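As a rough illustration of why the 'subs' is free to move above the 'mov' once its source has been forwarded, the sketch below checks register dependencies between two instructions. The (defs, uses) encoding and the `depends_on` helper are hypothetical and only model register dependencies, not the scheduler's real logic.

```python
def depends_on(later, earlier):
    """True if `later` must stay after `earlier` (RAW, WAR, or WAW on a register)."""
    l_defs, l_uses = later
    e_defs, e_uses = earlier
    return bool(e_defs & l_uses or e_uses & l_defs or e_defs & l_defs)


# Each instruction is modeled as (registers defined, registers used).
mov = ({"w19"}, {"w0"})                   # mov  w19, w0
subs_before = ({"w8", "nzcv"}, {"w19"})   # subs w8, w19, #1
subs_after = ({"w8", "nzcv"}, {"w0"})     # subs w8, w0, #1

blocked_before = depends_on(subs_before, mov)  # reads w19, which the mov writes
blocked_after = depends_on(subs_after, mov)    # reads w0 directly, no dependency
```

With the forwarded operand, the 'subs' no longer reads the register written by the 'mov', so in this model the two instructions can be issued in either order.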

https://reviews.llvm.org/D30751