[PATCH] D29591: [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine

Filipe Cabecinhas via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 7 08:44:00 PST 2017


filcab added a subscriber: rengolin.
filcab added a comment.

No problems on my side, but let's see if Renato has comments on the ARM codegen.



================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:4465
+    if (Index >= NarrowByteWidth)
+      return Op.getOpcode() == ISD::ZERO_EXTEND
+                 ? Optional<ByteProvider>(ByteProvider::getConstantZero())
----------------
apilipenko wrote:
> filcab wrote:
> > `ANY_EXTEND` mentions that the bits are undefined. Maybe we can include it too (probably there's no difference in most cases, though)?
> calculateByteProvider can return either memory location (ByteProvider with memory location specified), zero constant (zero constant ByteProvider) or unknown (None). We can only express the origin of extend bytes for `ZERO_EXTEND`. In other cases including `ANY_EXTEND` it's unknown. 
> 
> Or you meant something different?
I was aiming for an "a read of undef can take any value" approach. But I'm not sure if "the value is undefined", in SDAG, has the same semantics. I don't think this is expected to be much of a benefit (or at all), so I think we might as well drop this question.


================
Comment at: test/CodeGen/ARM/fp16-promote.ll:857
 ; CHECK-VFP: add sp, sp, #8
-; CHECK-VFP: bx lr
+; CHECK-VFP: pop
 ; CHECK-NOVFP: ldrh
----------------
apilipenko wrote:
> filcab wrote:
> > This looks sketchy. Does `pop` change the PC on ARM? Did it appear but wasn't there?
> > If so, shouldn't `bx lr` still be there, to go back to the caller? (I'm not very familiar with ARM asm)
> Here is the codegen for this test case:
> ```
> test_extractelement:
> 	.fnstart
> @ BB#0:
> 	.save	{r4, r5, r11, lr}
> 	push	{r4, r5, r11, lr}
> 	.pad	#8
> 	sub	sp, sp, #8
> 	ldrd	r4, r5, [r1]
> 	and	r1, r2, #3
> 	mov	r2, sp
> 	orr	r1, r2, r1, lsl #1
> 	stm	sp, {r4, r5}
> 	ldrh	r1, [r1]
> 	strh	r1, [r0]
> 	add	sp, sp, #8
> 	pop	{r4, r5, r11, pc}
> ```
> So, in fact we push `lr` and then pop it to `pc`. I changed the pattern matching slightly to make it more clear.
Ah. That makes sense, yes. I didn't remember you could pop multiple registers in that way.
What doesn't make sense is that code.
@rengolin Can you take a look at the post-patch `test_extractelement`?
It's loading stuff onto registers only to spill it again and reload the actual data we want. It's also saving r11 for no reason.



https://reviews.llvm.org/D29591





More information about the llvm-commits mailing list