[llvm-dev] [RFC] Allow loop vectorizer to choose vector widths that generate illegal types

Wed Jun 22 08:45:21 PDT 2016

On 15 June 2016 at 23:47, Michael Kuperstein via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Step 1: Make -vectorizer-maximize-bandwidth the default. This should improve
> the performance of loops that contain mixed-width types.

Hi Michael,

If this is enabled per target, after investigation, I think it is
perfectly fine.

> Step 2: Remove the artificial width limitation altogether, and base the
> vectorization factor decision purely on the cost model. This should allow us
> to get rid of the interleaving code in the loop vectorizer, and get
> interleaving for "free" from the legalizer instead.

I'm slightly worried about this one, though.

The legalizer is a very large mess, with many unknown (or long
forgotten) inter-dependencies and intra-dependencies (with isel,
regalloc, back-end opt passes, etc), which were mostly annealed
into working by heuristics and hack-fixing stuff. The multiple
attempts at re-writing instruction selection are one demonstration
of that problem...

So, while I agree with Hal that this will put healthy pressure on
improving the cost model (as well as the intra-dependencies), and
that's something very positive, I fear that if the jump becomes too
far, we'll either break the world or not jump at all. For example,
FastISel.

I'm not saying we shouldn't do it, but if/when we do it, it would be
*very* beneficial to provide a multi-step migration path for future
targets to move in, not just a multi-step initial migration for the
primary target.

Another thing to consider is that the SLP vectorizer can use non-SIMD
FP co-processors (VFP on ARM), which have different costs than SIMD,
but may share the same decision path, especially if we move the
decision lower down into the legalizer.

Also, there are hidden costs between the different units, in sharing
registers or moving values between them, and those are not fully
mapped into the current cost model (only via heuristics). This may
not be a problem for Intel, but it certainly will be for ARM/AArch64.
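
To make that concern a bit more concrete, here is a toy sketch of a
purely cost-driven width choice. This is not LLVM code: ToyTarget,
ToyLoop, pickVF and all the numbers are made up for illustration, and
the real cost model and legalizer are far more involved. The
assumption is that an illegal wide vector gets split into legal-width
parts, that each part pays for its ops and any cross-unit moves, and
that running out of vector registers costs spills.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical target description (not an LLVM type).
struct ToyTarget {
  unsigned LegalVectorBits;   // widest legal vector register, in bits
  unsigned NumVectorRegs;     // vector registers available
  unsigned OpCost;            // cost of one legal-width vector op
  unsigned CrossUnitMoveCost; // cost of moving one register between units
  unsigned SpillCost;         // cost of one register that does not fit
  unsigned LoopOverhead;      // per-iteration branch/induction cost
};

// Hypothetical loop summary (not an LLVM type).
struct ToyLoop {
  unsigned ElemBits;          // element width, in bits
  unsigned NumOps;            // vector ops per iteration
  unsigned NumLiveValues;     // vector values live at the same time
  unsigned NumCrossUnitMoves; // values crossing between units per iteration
};

// How many legal registers a VF x ElemBits vector would be split into.
unsigned numLegalParts(const ToyTarget &T, unsigned VF, unsigned ElemBits) {
  unsigned Bits = VF * ElemBits;
  return std::max(1u, (Bits + T.LegalVectorBits - 1) / T.LegalVectorBits);
}

// Cost per scalar iteration, scaled by 1000 to stay in integers.
unsigned costPerLaneScaled(const ToyTarget &T, const ToyLoop &L, unsigned VF) {
  assert(VF > 0 && "vectorization factor must be non-zero");
  unsigned Parts = numLegalParts(T, VF, L.ElemBits);
  unsigned PerPart =
      L.NumOps * T.OpCost + L.NumCrossUnitMoves * T.CrossUnitMoveCost;
  unsigned RegsNeeded = Parts * L.NumLiveValues;
  unsigned Spills =
      RegsNeeded > T.NumVectorRegs ? RegsNeeded - T.NumVectorRegs : 0;
  unsigned Total = T.LoopOverhead + Parts * PerPart + Spills * T.SpillCost;
  return Total * 1000 / VF;
}

// Pick the power-of-two VF with the lowest per-lane cost (smallest on ties).
unsigned pickVF(const ToyTarget &T, const ToyLoop &L, unsigned MaxVF) {
  unsigned BestVF = 1;
  unsigned BestCost = costPerLaneScaled(T, L, 1);
  for (unsigned VF = 2; VF <= MaxVF; VF *= 2) {
    unsigned Cost = costPerLaneScaled(T, L, VF);
    if (Cost < BestCost) {
      BestCost = Cost;
      BestVF = VF;
    }
  }
  return BestVF;
}
```

With a 32-bit loop of 4 ops and 4 live values, a made-up target with
128-bit vectors and 16 registers picks VF 16 in this model, while the
same target with only 8 registers picks VF 8. The wider choice only
pays off while the split parts still fit, and any cost the model does
not know about (like the cross-unit moves) silently skews the answer.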

I had a plan 3 years ago to look into that, but never got around to
doing it. Maybe it's about time I did... :)

Finally, if you need pre-testing and benchmarking, let me know and I
can spare some time to help you. I'll be glad to be copied on the
reviews and will do my best to help.

All in all, I don't think we'll get anything for free with this change.
There will be a cost, and it will be different on different targets,
but it may very well be a cost worth taking. I don't know enough yet
to have an opinion.

cheers,
--renato