[PATCH] D29591: [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine

Eli Friedman via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 14 10:48:24 PST 2017


efriedma added inline comments.


================
Comment at: test/CodeGen/ARM/fp16-promote.ll:857
 ; CHECK-VFP: add sp, sp, #8
-; CHECK-VFP: bx lr
+; CHECK-VFP: pop
 ; CHECK-NOVFP: ldrh
----------------
RKSimon wrote:
> filcab wrote:
> > apilipenko wrote:
> > > filcab wrote:
> > > > This looks sketchy. Does `pop` change the PC on ARM? Did it appear but wasn't there?
> > > > If so, shouldn't `bx lr` still be there, to go back to the caller? (I'm not very familiar with ARM asm)
> > > Here is the codegen for this test case:
> > > ```
> > > test_extractelement:
> > > 	.fnstart
> > > @ BB#0:
> > > 	.save	{r4, r5, r11, lr}
> > > 	push	{r4, r5, r11, lr}
> > > 	.pad	#8
> > > 	sub	sp, sp, #8
> > > 	ldrd	r4, r5, [r1]
> > > 	and	r1, r2, #3
> > > 	mov	r2, sp
> > > 	orr	r1, r2, r1, lsl #1
> > > 	stm	sp, {r4, r5}
> > > 	ldrh	r1, [r1]
> > > 	strh	r1, [r0]
> > > 	add	sp, sp, #8
> > > 	pop	{r4, r5, r11, pc}
> > > ```
> > > So, in fact we push `lr` and then pop it to `pc`. I changed the pattern matching slightly to make it more clear.
> > Ah. That makes sense, yes. I didn't remember you could pop multiple registers in that way.
> > What doesn't make sense is that code.
> > @rengolin Can you take a look at the post-patch `test_extractelement`?
> > It's loading stuff onto registers only to spill it again and reload the actual data we want. It's also saving r11 for no reason.
> > 
> @rengolin @jmolloy Please can you advise on the ARM codegen in test_extractelement? Is it something to worry about?
The loads and stores are not ideal, but they're expected: we're scalarizing the <4 x half>, and the default expansion of extractelement involves a stack temporary.  The reason the prologue and epilogue are changing is that load merging allows the backend to form ldrd: ldrd requires an even/odd register pair, so the register allocator chooses to use use r4 and r5, which therefore requires spilling r4/r5.

The end result looks fine.


https://reviews.llvm.org/D29591





More information about the llvm-commits mailing list