[PATCH] D29591: [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
Filipe Cabecinhas via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Feb 7 08:44:00 PST 2017
filcab added a subscriber: rengolin.
filcab added a comment.
No problems on my side, but let's see if Renato has comments on the ARM codegen.
================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:4465
+ if (Index >= NarrowByteWidth)
+ return Op.getOpcode() == ISD::ZERO_EXTEND
+ ? Optional<ByteProvider>(ByteProvider::getConstantZero())
----------------
apilipenko wrote:
> filcab wrote:
> > `ANY_EXTEND` mentions that the bits are undefined. Maybe we can include it too (probably there's no difference in most cases, though)?
> calculateByteProvider can return either memory location (ByteProvider with memory location specified), zero constant (zero constant ByteProvider) or unknown (None). We can only express the origin of extend bytes for `ZERO_EXTEND`. In other cases including `ANY_EXTEND` it's unknown.
>
> Or you meant something different?
I was aiming for an "a read of undef can take any value" approach. But I'm not sure if "the value is undefined", in SDAG, has the same semantics. I don't think this is expected to be much of a benefit (or at all), so I think we might as well drop this question.
================
Comment at: test/CodeGen/ARM/fp16-promote.ll:857
; CHECK-VFP: add sp, sp, #8
-; CHECK-VFP: bx lr
+; CHECK-VFP: pop
; CHECK-NOVFP: ldrh
----------------
apilipenko wrote:
> filcab wrote:
> > This looks sketchy. Does `pop` change the PC on ARM? Did it appear but wasn't there?
> > If so, shouldn't `bx lr` still be there, to go back to the caller? (I'm not very familiar with ARM asm)
> Here is the codegen for this test case:
> ```
> test_extractelement:
> .fnstart
> @ BB#0:
> .save {r4, r5, r11, lr}
> push {r4, r5, r11, lr}
> .pad #8
> sub sp, sp, #8
> ldrd r4, r5, [r1]
> and r1, r2, #3
> mov r2, sp
> orr r1, r2, r1, lsl #1
> stm sp, {r4, r5}
> ldrh r1, [r1]
> strh r1, [r0]
> add sp, sp, #8
> pop {r4, r5, r11, pc}
> ```
> So, in fact we push `lr` and then pop it to `pc`. I changed the pattern matching slightly to make it more clear.
Ah. That makes sense, yes. I didn't remember you could pop multiple registers in that way.
What doesn't make sense is that code.
@rengolin Can you take a look at the post-patch `test_extractelement`?
It's loading stuff onto registers only to spill it again and reload the actual data we want. It's also saving r11 for no reason.
https://reviews.llvm.org/D29591
More information about the llvm-commits
mailing list