[PATCH] D65884: [ARM] MVE Tail Predication
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 8 07:14:26 PDT 2019
dmgreen added a comment.
In D65884#1620474 <https://reviews.llvm.org/D65884#1620474>, @samparker wrote:
> > Why does the llvm_arm_vctp32 not return a <4xi1> directly?
>
> The vctp family are defined like that because the ACLE specifies that they return a mve_pred16_t and I'm assuming this is a scalar - but I can't find a definition! I think that all the user facing predicate generators will produce a scalar and we will need to do the conversion to make it nice and LLVMy.
Sure, the ACLE intrinsic needs to return an i16, but does that mean the IR intrinsic needs to? It could be expanded to two instructions, llvm_arm_vctp32 and llvm_arm_vmrs, with the i16 coming from the vmrs. This kind of thing sounds like it would already be useful for things like masked loads. In other words, can we invert where the conversion happens?
So if we started with ACLE:
mve_pred16_t pred = vctp8q(i)
l = vldrbq_z_s8(a, pred)
It would get expanded to become:
// vctp8q
<16 x i1> t1 = llvm.arm.vctp(i)
i16 pred = llvm.arm.vmrs(t1)
// vldrbq_z_s8
<16 x i1> t2 = llvm.arm.vmsr(pred)
l = llvm.masked.load(a, t2)
And you could use instcombine to fold away the conversions (vmsr(vmrs(a)) == a), giving:
t1 = llvm.arm.vctp(i)
llvm.masked.load(a, t1)
It would work even better for compares that already produce predicates that LLVM knows about. The whole thing would just become LLVM IR and we can let it optimise away. This is getting a bit far into intrinsic design, though, which isn't this patch's problem!
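As a rough illustration of the fold described above, here is a toy Python model of the rewrite vmsr(vmrs(a)) == a. It is only a sketch and not LLVM code: the Call node and the fold_converts function are hypothetical, and the vctp/vmrs/vmsr names are the proposed intrinsics from this comment rather than existing ones. It also ignores vector types, so it does not check that both conversions use the same lane count.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Call:
    """Toy IR node (hypothetical): an intrinsic name applied to operands."""
    name: str
    args: Tuple["Value", ...]


# A plain string stands for an opaque value such as "i" or "a".
Value = Union[Call, str]


def fold_converts(v: Value) -> Value:
    """Rewrite vmsr(vmrs(x)) to x, working bottom-up through the operands."""
    if isinstance(v, str):
        return v
    args = tuple(fold_converts(a) for a in v.args)
    if v.name == "vmsr" and len(args) == 1:
        inner = args[0]
        if isinstance(inner, Call) and inner.name == "vmrs" and len(inner.args) == 1:
            # The i16 round trip is redundant, so use the predicate directly.
            return inner.args[0]
    return Call(v.name, args)


# Expanded form of vctp8q followed by vldrbq_z_s8, as in the comment above.
t1 = Call("vctp", ("i",))
pred = Call("vmrs", (t1,))
t2 = Call("vmsr", (pred,))
load = Call("masked.load", ("a", t2))

folded = fold_converts(load)
print(folded)
```

After folding, the masked load takes the vctp result directly, matching the two-line form shown above. A real instcombine rule would also need to confirm that the predicate types on both sides of the round trip agree.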
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D65884/new/
https://reviews.llvm.org/D65884