[PATCH] D75667: [ARM][MVE] Enable *SHRN* for tail predication
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Mar 6 02:46:24 PST 2020
dmgreen added a comment.
OK. If I am understanding correctly, I'm not sure that is enough. What if we load a v8i16, extend it into two v4i32s using something like a VMULL, then narrow those back into a single v8i16? I don't think this is something that autovec will produce (yet), but it could come up from intrinsics, in a way that people are likely to write. Something like this:
#include <arm_mve.h>

void test(short *x, short *y, short *z, int n) {
  while (n > 0) {
    // Predicate covering the remaining 16-bit elements.
    mve_pred16_t pred = vctp16q(n);
    int16x8_t a = vldrhq_z_s16(x, pred);
    int16x8_t b = vldrhq_z_s16(y, pred);
    // Widening multiplies: each produces four full 32-bit products.
    int32x4_t top = vmulltq_int(a, b);
    int32x4_t bot = vmullbq_int(a, b);
    // Narrow back to 16 bits, keeping the high half of each product.
    int16x8_t rtop = vqshrnbq(vuninitializedq_s16(), bot, 16);
    int16x8_t rbot = vqshrntq(rtop, top, 16);
    vstrhq_p_s16(z, rbot, pred);
    x += 8;
    y += 8;
    z += 8;
    n -= 8;
  }
}
I'm pretty sure that tail predicating this would not be valid, as the top bits of one of the multiplies could be cut off.
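To make the concern concrete, here is a small portable C model of what I think happens. It is only a sketch and is not LLVM or MVE code. It assumes that predication applies per byte of the vector, that lanes are little-endian, and that the destination register held zero beforehand. The names model_vctp16, model_predicated_write32 and model_high_half are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a VCTP16-style predicate: one bit per byte of a 128-bit
   vector, so two bits are set for each of the first n 16-bit elements. */
static uint16_t model_vctp16(int n) {
    assert(n >= 0);
    if (n >= 8)
        return 0xFFFF;
    return (uint16_t)((1u << (2 * n)) - 1u);
}

/* Model of a predicated write to one 32-bit lane (0..3): only bytes whose
   predicate bit is set take the new value, the rest keep the old one.
   Assumes little-endian lanes (byte 0 is the least significant). */
static uint32_t model_predicated_write32(uint32_t old_val, uint32_t new_val,
                                         uint16_t pred, int lane) {
    uint32_t result = old_val;
    for (int byte = 0; byte < 4; ++byte) {
        if (pred & (1u << (4 * lane + byte))) {
            uint32_t mask = 0xFFu << (8 * byte);
            result = (result & ~mask) | (new_val & mask);
        }
    }
    return result;
}

/* High 16 bits of the widened product a*b as they would appear in 32-bit
   lane `lane` when the write is predicated on the first n 16-bit elements.
   Assumption: the destination lane previously held zero. */
static uint16_t model_high_half(int16_t a, int16_t b, int n, int lane) {
    uint32_t full = (uint32_t)((int32_t)a * (int32_t)b);
    uint32_t written =
        model_predicated_write32(0u, full, model_vctp16(n), lane);
    return (uint16_t)(written >> 16);
}
```

With n = 3, 32-bit lane 1 is only half active in this model, so model_high_half(0x4000, 0x4000, 3, 1) gives 0 where the full-width multiply gives 0x1000. That lane feeds 16-bit element 2 of the narrowed result, which is still active and still gets stored.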
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D75667/new/
https://reviews.llvm.org/D75667