[PATCH] D9107: Fix CombineToPostIndexedLoadStore in DAGCombiner.cpp

Tue Jul 7 16:21:27 PDT 2015

qcolombet requested changes to this revision.

This revision now requires changes to proceed.

Hi François,

Looks like the XFAILed test cases were just us being lucky in the previous version.
This is unfortunate, but like I said, I do not think we can land the patch as is, since it may regress some stuff.

Looking at the test cases makes me feel like we may not want to form the pre/post addressing mode that early, but instead rely on later passes to catch those. The rational is that I would prefer we grab the multiple load/store first since they impact the performance, whereas pre/post increment are mainly size optimization and thus less important.

The bottom line is that IMHO, we should investigate that it gives to completely get rid of that DAG combine and do the pre/post optimization later.

Let me know what do you think?

Cheers,
-Quentin


================
Comment at: test/CodeGen/ARM/2012-10-04-AAPCS-byval-align8.ll:33
@@ -33,1 +32,3 @@
+; CHECK: ldr    r2, {{\[}}[[BASE]]{{\]}}, #4
+; CHECK: ldr    [[TMP:r[0-9]+]], {{\[}}[[BASE]]{{\]}}
 ; CHECK: movw r0, #555
----------------
This one is funny because we change:
ld [@addr]
ld [@addr, 4]

into:
ld [@addr], #4
ld [@addr]
{reg used for @addr is redefined}

And that prevents the load store optimizer to catch it.

================
Comment at: test/CodeGen/ARM/2013-01-21-PR14992.ll:18
@@ -18,1 +17,3 @@
+;CHECK:  ldr [[REG:r[0-9]+]], [{{r[0-9]+}}
+;CHECK-NOT: ldr [[REG]], [{{r[0-9]+}}
   tail call void @bar(i32* %add.ptr) nounwind optsize
----------------
This one is worrisome as I expect it may expose runtime regressions and because it seems plausible to happen frequently.

Without:
	ldm	r0!, {r4, r5, r6}
	bl	bar

With:
	ldr	r4, [r0, #4]
	ldr	r5, [r0, #8]
	ldr	r6, [r0], #12
	bl	bar

The problem here is that after register allocation, without recoloring and with the support of PRE and POST automod, we won’t be able to form a ldm since the registers are not in the right order (r6, r4, r5, instead of r4, r5, r6). The pre- regalloc load/store optimizer could be taught to reorder them to make that possible but that does not seem trivial work.

================
Comment at: test/CodeGen/ARM/wrong-t2stmia-size-opt.ll:21
@@ +20,2 @@
+; CHECK: str r1
+; CHECK: str.w lr
----------------
We miss that we can express this as:
[@addr], #8
[@addr, -#4]

That said, the stored registers are not in the right order after reg alloc to be able to recognize that this is stm @addr!, {r1, r2}.


http://reviews.llvm.org/D9107