[PATCH] D80524: [ARM] Extra MVE VMLAV reduction patterns

Tue May 26 02:07:29 PDT 2020

dmgreen added a comment.

In D80524#2053955 <https://reviews.llvm.org/D80524#2053955>, @efriedma wrote:

> > There are some tests that end up looking worse, but are quite artificial due to passing half vector types through a call boundary. I would not expect the vmull to realistically come up like that, and a vmlava is likely better a lot of the time.
>
> Looking at the tests, I think the key distinction in the cases that get "worse" is that sign/zero-extend can be folded into the multiply.  It's not really related to the calling convention.  That said, not sure how likely that is to come up in practice... I guess if it's produced by a load, we can sign/zero-extend for free?

Yes. Specifically, it needs to be sign extended from something that already places the lanes in the correct places. MVE doesn't have a normal sign extend instruction like NEON does (from the bottom 8 lanes of a v16i8 to a v8i16, for example). It can only use top/bottom vmovl's, which need the lanes to already be in the correct place. A <8 x i8> passed through a call boundary is actually (apparently) a 128-bit vector with widened lanes, hence my comment about the calling convention. Otherwise the extend wouldn't match and we wouldn't produce a vmull anyway. A vmovlb is really:

  %s = shufflevector <16 x i8> %src, <16 x i8> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
  %ext = sext <8 x i8> %s to <8 x i16>

Until we do lane interleaving (which is in the works, but we don't do it yet), I wouldn't expect this to come up in practice. Like you said, a load/store will likely do the extend for free in most code.
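
For anyone who wants to see the lane selection spelled out, here is a rough scalar sketch in C of the difference between the MVE top/bottom extends and a NEON-style extend of the low half. The function names are made up for illustration; they are not LLVM, ACLE, or intrinsic names, and the model assumes little-endian lane numbering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar models; names are invented for illustration only. */

/* vmovlb.s8 model: sign-extend the bottom (even-numbered) i8 lane
   of each 16-bit lane, i.e. source lanes 0, 2, 4, ..., 14. */
void model_vmovlb_s8(const int8_t src[16], int16_t dst[8]) {
    for (int i = 0; i < 8; ++i)
        dst[i] = (int16_t)src[2 * i];
}

/* vmovlt.s8 model: sign-extend the top (odd-numbered) i8 lane
   of each 16-bit lane, i.e. source lanes 1, 3, 5, ..., 15. */
void model_vmovlt_s8(const int8_t src[16], int16_t dst[8]) {
    for (int i = 0; i < 8; ++i)
        dst[i] = (int16_t)src[2 * i + 1];
}

/* NEON-style model: sign-extend the contiguous low half,
   i.e. source lanes 0, 1, 2, ..., 7. */
void model_neon_extend_low_s8(const int8_t src[16], int16_t dst[8]) {
    for (int i = 0; i < 8; ++i)
        dst[i] = (int16_t)src[i];
}

/* Small self-check showing which source lane feeds result lane 1. */
void check_lane_selection(void) {
    int8_t src[16];
    int16_t out[8];
    for (int i = 0; i < 16; ++i)
        src[i] = (int8_t)(i - 8);      /* lanes hold -8 .. 7 */

    model_vmovlb_s8(src, out);
    assert(out[1] == -6);              /* read from src[2] */

    model_vmovlt_s8(src, out);
    assert(out[1] == -5);              /* read from src[3] */

    model_neon_extend_low_s8(src, out);
    assert(out[1] == -7);              /* read from src[1] */
}
```

The bottom/top models only match a plain sext if the data is already laid out with one narrow element per wide lane, which is the "widened lanes" case above.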

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80524/new/

https://reviews.llvm.org/D80524