[PATCH] D65580: [ARM] Tighten up VLDRH.32 with low alignments

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Aug 7 23:02:37 PDT 2019


dmgreen marked an inline comment as done.
dmgreen added a comment.

Thanks!



================
Comment at: llvm/test/CodeGen/Thumb2/mve-ldst-offset.ll:756
+; CHECK-NEXT:    .pad #8
+; CHECK-NEXT:    sub sp, #8
+; CHECK-NEXT:    ldr.w r3, [r0, #7]
----------------
samparker wrote:
> simon_tatham wrote:
> > samparker wrote:
> > > I am so confused by this, can you explain it for me please?
> > (Drive-by comment since this crossed my inbox)
> > 
> > I think what's going on here is:
> > 
> > `VLDRH.S32` means: load 8 bytes of memory, regard them as 4 16-bit halfwords (`H`), and sign-extend each one into a 32-bit lane (`S32`) of the output vector register.
> > 
> > But it requires alignment of at least 2 on the memory it's loading from. So in order to apply it to 8 bytes starting at an odd address, the generated code is copying the 8 source bytes to an aligned 8-byte stack slot, and then pointing the `VLDRH.S32` at that instead.
> > 
> > I assume this run of `llc` is in a mode where it assumes unaligned access support on the ordinary `LDR` instruction has been enabled in the hardware configuration. (If I remember, that's the default – to generate code compatible with a CPU that has that turned _off_ you have to say `-mno-unaligned-access` in clang, or whatever llc's equivalent option is.)
> Bah, thanks! For some reason I wasn't thinking about the need to widen, all the loads really threw me.
Yep, this is the default fallback of "align it via the stack and load it again". It's obviously not very efficient, but I don't believe it will come up often (it's only for unaligned 16-bit loads). If it does, we may be able to do something better, perhaps by splitting out the extend.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65580/new/

https://reviews.llvm.org/D65580
