[LLVMdev] Is there pass to break down <4 x float> to scalars

Fri Oct 25 05:53:35 PDT 2013

On 25 October 2013 11:06, Richard Sandiford <rsandifo at linux.vnet.ibm.com> wrote:

> I wanted the same thing for SystemZ, which doesn't have vectors,
> in order to improve the llvmpipe code.
>

Hi Richard,

This is a nice patch. I was wondering how hard it'd be to do that, and it
seems that you're catching lots of corner cases.

My interest is also due to converting odd vectors into scalars, but to
convert them again to CPU vectors, say from OpenCL to NEON code.

> It would also need some TargetTransformInfo hooks to decide which
> vectors should be decomposed.
>

If I got it right, this may not be necessary, or it may even be harmful.

Say you decide that <4 x i32> vectors should be left alone, so that your
pass only scalarises the others. But when the vectorizer passes again (to
try and use CPU vector instructions), it might not match the scalarised
version with the vector, and you end up with data movement between scalar
and vector pipelines, which normally slows down CPUs (at least in ARM's
case). Also, problematic cases like <5 x i32> could be better split into
3+2 pairs, rather than 4+1.
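
To make the 3+2 versus 4+1 point concrete, here is a minimal sketch of the
two splitting strategies. The helper names (splitGreedy, splitBalanced) and
the nativeLanes parameter are made up for illustration; they are not LLVM
API, and a real pass would consult the target before choosing.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helpers, not LLVM API: two ways to split a vector of
// `width` lanes into two parts.

// Greedy: fill one native-width register first and leave the remainder.
// For width = 5 and nativeLanes = 4 this yields 4+1.
std::pair<unsigned, unsigned> splitGreedy(unsigned width, unsigned nativeLanes) {
    assert(nativeLanes > 0 && width > nativeLanes);
    return {nativeLanes, width - nativeLanes};
}

// Balanced: halve the vector, rounding the first part up.
// For width = 5 this yields 3+2.
std::pair<unsigned, unsigned> splitBalanced(unsigned width) {
    assert(width >= 2);
    unsigned first = (width + 1) / 2;
    return {first, width - first};
}
```

With width 5 and a 4-lane target, the greedy split leaves a single lane on
its own, while the balanced split keeps both halves multi-lane.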

If you scalarise everything, then the vectorizers will have a better chance
of spotting patterns and vectorising the whole lot, based on target
transform info.
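
As a rough picture of what "scalarise everything" means, the sketch below
writes a 4-lane float add in both forms using plain C++. Vec4, addVector and
addScalarised are illustrative names only, and the loop merely stands in for
a single vector instruction; it is not how the pass itself is implemented.

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<float, 4>;

// "Vector" form: one operation over all lanes, standing in for a
// single <4 x float> add.
Vec4 addVector(const Vec4 &a, const Vec4 &b) {
    Vec4 r{};
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = a[i] + b[i];
    return r;
}

// Scalarised form: four independent scalar adds, one per lane. This is
// the shape a later vectorizer would have to recognise and regroup.
Vec4 addScalarised(const Vec4 &a, const Vec4 &b) {
    float r0 = a[0] + b[0];
    float r1 = a[1] + b[1];
    float r2 = a[2] + b[2];
    float r3 = a[3] + b[3];
    return {r0, r1, r2, r3};
}
```

Both functions compute the same result; the difference is only in how the
work is expressed to later passes.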

Is that what you had in mind?

cheers,
--renato