[LLVMdev] SelectionDAG scalarizes vector operations.
ayal.zaks at intel.com
Wed Feb 8 13:46:34 PST 2012
> Hi Dave,
> >> We generate xEXT nodes in many cases. Unlike GCC which vectorizes
> >> inner loops, we vectorize the implicit outermost loop of
> >> data-parallel workloads (also called whole function vectorization).
Just to clarify: GCC vectorizes innermost and next-to-innermost (aka outer) loops, packing instances of the same original scalar instruction from different iterations into a single vector instruction. It also vectorizes within basic blocks (aka SLP), packing distinct scalar instructions into vectors. It does the latter while taking a (possible) enclosing loop into account, both to hoist loop-invariant code out of that loop and to unroll the enclosing loop if and as needed to fill the vectors. In either case, it creates fully vectorized code regions, with scalar code used only for supporting computations such as addressing, loop induction variable handling, reduction epilogues, etc.
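As a rough illustration of the basic-block (SLP) packing mentioned above, the sketch below uses the GCC/Clang vector extension (the `vector_size` attribute) to show four isomorphic scalar additions next to the single vector addition they could be packed into. The function names and the 4 x int shape are hypothetical choices for illustration; the packed form is written by hand here, not taken from actual vectorizer output.

```c
#include <string.h>

/* A 128-bit vector of four ints, using the GCC/Clang vector extension. */
typedef int v4si __attribute__((vector_size(16)));

/* Scalar form: four distinct but isomorphic statements in one basic block. */
void add4_scalar(int *a, const int *b, const int *c) {
    a[0] = b[0] + c[0];
    a[1] = b[1] + c[1];
    a[2] = b[2] + c[2];
    a[3] = b[3] + c[3];
}

/* Hand-packed form: the four additions become one vector addition.
   memcpy is used so the arrays need not be 16-byte aligned. */
void add4_packed(int *a, const int *b, const int *c) {
    v4si vb, vc, va;
    memcpy(&vb, b, sizeof vb);
    memcpy(&vc, c, sizeof vc);
    va = vb + vc;              /* element-wise add of all four lanes */
    memcpy(a, &va, sizeof va);
}
```

Both functions compute the same result; the difference is that the packed form issues one vector add instead of four scalar ones.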
> >> We vectorize code even if the user uses xEXT instructions, uses mixed
> >> types, etc.
GCC does vectorize code which contains multiple data types, by choosing the vectorization factor according to the smallest type, and using multiple vectors to hold larger types.
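A minimal sketch of the type-size arithmetic described above, assuming a hypothetical target with 16-byte (128-bit) vector registers. The function names are illustrative only and are not GCC internals: the vectorization factor is taken from the smallest type, and each larger type is spread over several vectors.

```c
/* Hypothetical target: 16-byte (128-bit) vector registers. */
#define VECTOR_BYTES 16

/* Pick the vectorization factor from the smallest scalar type (in bytes)
   used in the loop: that many elements of the smallest type fill one vector. */
int vectorization_factor(const int *type_sizes, int n) {
    int smallest = type_sizes[0];
    for (int i = 1; i < n; i++)
        if (type_sizes[i] < smallest)
            smallest = type_sizes[i];
    return VECTOR_BYTES / smallest;
}

/* Number of vectors needed to hold vf elements of a type of the given size. */
int vectors_per_type(int vf, int type_size) {
    return vf * type_size / VECTOR_BYTES;
}
```

For a loop mixing 1-byte `char` and 4-byte `int` values, this gives VF = 16 / 1 = 16, so the 16 chars fit in one vector while the 16 ints need 16 * 4 / 16 = 4 vectors.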
> >> We choose a vectorization factor which is likely to generate more
> >> legal vector types, but if the user mixes types then we are forced to
> >> make a decision. We rely on the LLVM code generator to produce
> >> quality code. To my understanding, the GCC vectorizer does not
> >> vectorize code if it thinks that it misses a single operation.
Right. It queries whether the target supports a vectorized form (at the desired vectorization factor) of each scalar instruction in the loop or region. There is no scalarization: code is either fully vectorized in a way that survives code generation, or else the vectorizer gives up and leaves the relevant scalar code unmodified. This may indeed not always be the optimal decision; but even so, there are cases where it's better not to vectorize.
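The all-or-nothing check described above can be sketched as follows. The opcode set and the `struct target` capability table are hypothetical stand-ins, not GCC's actual interfaces; the point is only that a single unsupported instruction rejects the whole region, and that recording which one failed is what makes a "why not vectorized" report possible.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcode set; illustrative only. */
enum opcode { OP_ADD, OP_MUL, OP_DIV, OP_SHIFT, OP_COUNT };

/* Hypothetical target description: supported[op] is true if the target
   has a vector form of op at the chosen vectorization factor. */
struct target {
    bool supported[OP_COUNT];
};

/* All-or-nothing decision: vectorize only if every scalar instruction in
   the region has a supported vector form. Nothing is left to be scalarized
   later; on the first unsupported op the region is left untouched, and its
   index is stored so the caller can report why. */
bool can_vectorize_region(const struct target *t,
                          const enum opcode *insns, size_t n,
                          size_t *first_unsupported) {
    for (size_t i = 0; i < n; i++) {
        if (!t->supported[insns[i]]) {
            if (first_unsupported)
                *first_unsupported = i;
            return false;
        }
    }
    return true;
}
```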
> > My experience is similar to Nadav's. The Cray vectorizer vectorizes
> > much more code than the gcc vectorizer. Things are much more
> > complicated than gcc vector code would lead one to believe.
> I think it is important we produce non-scalarized code for the IR produced by
> the GCC vectorizer, since we know it can be done (otherwise GCC wouldn't
> have produced it). It is of course important to produce decent code in the
> most common cases coming from other vectorizers too. However it seems
> sensible to me to start with the case where you know you can easily get
> perfect results (GCC vectorizer output) and then try to progressively extend
> the goodness to the more problematic cases coming from other vectorizers.
BTW, the GCC vectorizer can also tell you why it did not vectorize, e.g., because some instruction was not available in vector form.
Note also that the vectorizer performs any unrolling it needs on its own and does not rely on a separate unroll pass. It does rely on a separate if-conversion pass, designed specifically to eliminate if-then-else hammocks in the relevant regions (loops), which runs right before the vectorizer kicks in. This if-conversion may need to be undone when an if-converted loop is not vectorized and the target does not support the resulting predicated scalar instructions.
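A small before/after sketch of the if-conversion described above, assuming a simple clamp loop chosen purely for illustration (the function names are hypothetical). In C the ternary is still formally a conditional; it stands in here for the select that an if-converted loop body would use at the IR level, giving a single straight-line block that can map onto vector compare and select/blend instructions.

```c
/* Before if-conversion: an if-then-else "hammock" inside the loop body.
   The branch splits the body into several basic blocks. */
void clamp_branchy(int *a, const int *b, int n, int limit) {
    for (int i = 0; i < n; i++) {
        if (b[i] > limit)
            a[i] = limit;
        else
            a[i] = b[i];
    }
}

/* After if-conversion: the condition is computed as a value and a select
   picks the result, so the loop body is one straight-line block. */
void clamp_ifconverted(int *a, const int *b, int n, int limit) {
    for (int i = 0; i < n; i++) {
        int cond = b[i] > limit;
        a[i] = cond ? limit : b[i];
    }
}
```

If such a loop ends up not being vectorized and the target lacks a cheap scalar conditional-move or predicated form, the select may be worse than the original branch, which is why the if-conversion may have to be undone.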
Hope this helps. I had the pleasure of working with the GCC autovect guys (or rather gals) from the start, before recently joining Nadav et al.
> Ciao, Duncan.
Intel Israel (74) Limited