[PATCH] D9107: Fix CombineToPostIndexedLoadStore in DAGCombiner.cpp

Tue Jul 7 13:50:45 PDT 2015

Hi François,

A couple more comments.

It seems your patch exposed a few short coming in the load store optimizer for ARM. I am not sure we can proceed with this commit without fixing the load store optimizer first, otherwise we may regress a bunch of thing… Which is unfortunate since your patch seems a good general improvement to me.

Did you happen to run benchmark on ARM with/without this change?
That would help making our mind.

Anyway, let me look at the XFAILed test cases.

Cheers,
-Quentin


================
Comment at: test/CodeGen/ARM/automod-test.ll:27
@@ +26,3 @@
+;	bne	.LBB0_1
+; ======================================================
+
----------------
Looks like we could improve the load/store optimizer to catch those cases as well.

================
Comment at: test/CodeGen/ARM/avoid-cpsr-rmw.ll:31
@@ -30,3 +30,2 @@
-; CHECK-NOT: muls
   %ptr1.addr.09 = phi i32* [ %add.ptr, %while.body ], [ %ptr1, %entry ]
   %ptr2.addr.08 = phi i32* [ %incdec.ptr, %while.body ], [ %ptr2, %entry ]
----------------
qcolombet wrote:
> Could you comment why this check is failing now?
> I am not saying this is wrong, I’d like to check we are not missing something.
Never mind me, I found the answer in an exchange we had offline.

For the record, the assembly code generated without and with your patch are the following :
 Without the patch
 -----------------
	ldr.w	r9, [r0]
	ldr	r3, [r0, #4]
	ldr	r2, [r0, #8]
	ldr.w	r12, [r0, #12]
	adds	r0, #16 <— after this point, having muls would be wrong.
	mul	r3, r3, r9
	mul	r2, r3, r2
	mul	r2, r2, r12

 With the patch
 --------------
	ldr.w	r12, [r0, #4]
	ldr	r3, [r0, #8]
	ldr.w	r9, [r0, #12]
	ldr	r2, [r0], #16
	mul	r2, r12, r2
	muls	r2, r3, r2 <— Now the muls is necessary, since we got rid of the adds.
	mul	r2, r2, r9


http://reviews.llvm.org/D9107