[LLVMdev] Vectorization Cost Models and Multi-Instruction Patterns?

Mon Jan 19 15:50:38 PST 2015

Hi all,

While tinkering with saturation instructions, I hit problems with the
cost model calculations.

The loop vectorizer cost model accumulates the individual TTI cost
model of each instruction.  For saturating arithmetic, this is a gross
overestimate, since you have 2 sexts (inputs), 2 icmps + 2 selects
(for the saturation), and a truncate (output); these all fold alway.
With an intrinsic, you'd end up with a better estimate; however, I'm
trying to see what problems we would encounter without intrinsics, and
I think this is the biggest one.
Note that AFAICT, costs for min/max patterns (icmp+iselect) are also
overestimated, but not as much as saturate.

Proposal:

Add a method, part of the vector API of TargetTransformInfo, for
multi-instruction cost computation.  It would take a scalar
Instruction, and a reference to a set of Instruction.  If it's able to
match a min/max/saturate/.., it adds all the matched instructions to
the set, so the caller (say LoopVectorizationCostModel) can ignore
them.

But:
-  this all seems icky: a very blunt hammer.
-  what, if anything, should we do about legality checks?  The
expanded IR equivalent of a saturate uses larger types than necessary,
so this might prevent vectorization.  In practice it doesn't, because
only load/store/PHI types are checked there.
-  is this useful in other cases, beyond min/max (maybe abs ?)  and saturate?

Thanks!

-Ahmed