[PATCH] D70487: [DAGCombiner] Allow zextended load combines.

Thu Nov 21 02:50:49 PST 2019

dmgreen added inline comments.

================
Comment at: llvm/test/CodeGen/ARM/load-combine-big-endian.ll:637
+; CHECK-NEXT:    lsl r0, r0, #16
+; CHECK-NEXT:    orr r0, r1, r0, lsr #24
 ; CHECK-NEXT:    mov pc, lr
----------------
courbet wrote:
> dmgreen wrote:
> > courbet wrote:
> > > RKSimon wrote:
> > > > regression
> > > Yes, that's something I meant to highlight in thsi patch, but forgot to send the comment:
> > > 
> > > I've seen in other tests for ARM we're OK paying quite a large number of instructions to avoid loads, e.g.
> > > 
> > > ```
> > > define i32 @load_i32_by_i8_bswap(i32* %arg) {
> > > ; BSWAP is not supported by 32 bit target
> > > ; CHECK-LABEL: load_i32_by_i8_bswap:
> > > ; CHECK:       @ %bb.0:
> > > ; CHECK-NEXT:    ldr r0, [r0]
> > > ; CHECK-NEXT:    mov r1, #65280
> > > ; CHECK-NEXT:    mov r2, #16711680
> > > ; CHECK-NEXT:    and r1, r1, r0, lsr #8
> > > ; CHECK-NEXT:    and r2, r2, r0, lsl #8
> > > ; CHECK-NEXT:    orr r1, r1, r0, lsr #24
> > > ; CHECK-NEXT:    orr r0, r2, r0, lsl #24
> > > ; CHECK-NEXT:    orr r0, r0, r1
> > > ; CHECK-NEXT:    mov pc, lr
> > > ```
> > > 
> > > I'm not very familiar with these targets, where do we draw the line ?
> > This appears to be running under a very old architecture where bswap is not legal.
> > 
> > Any idea why it is trying to perform this optimisation when there is no bswap to make it efficient?
> ```
> // Before legalize we can introduce illegal bswaps which will be later
> // converted to an explicit bswap sequence. This way we end up with a single
> // load and byte shuffling instead of several loads and byte shuffling.
> if (NeedsBswap && LegalOperations && !TLI.isOperationLegal(ISD::BSWAP, VT))
>   return SDValue();
> ```
> 
> I can always bail out on zext:
> 
> ```
> if (NeedsBswap && (LegalOperations || NeedsZext) && !TLI.isOperationLegal(ISD::BSWAP, VT))
>   return SDValue();
> ```
> 
> But it's not clear to me what's worth between a load and 4 arithmetic instructions on ARM.
> 
I think:
- On any remotely modern ARM system we will have a rev instruction, so I'm not very worried about the regressions here. They do show that an architecture without bswap may get worse though, if that is a thing that comes up in other places (I think all the archs here will have a bswap of some sort).
- I think counting instructions can be as good a measure as any on some of the smaller arm devices. Maybe something like 2:1 for loads to arithmetic.
- I'm not sure why we would do this at-all if we need a bswap but don't have one. But if the VT is still an illegal type, it will likely always look illegal even if it is eventually expanded into something sane. 

Can you add these two targets to the two test files:
-mtriple=thumbv6m-none-eabi
-mtriple=thumbv7m-none-eabi

If they are OK (along with the armv6 case) then I, from an ARM perspective, am happy with this.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70487/new/

https://reviews.llvm.org/D70487