[LLVMdev] Vectorization Cost Models and Multi-Instruction Patterns?
Ahmed Bougacha
ahmed.bougacha at gmail.com
Mon Jan 19 15:50:38 PST 2015
Hi all,
While tinkering with saturation instructions, I hit problems with the
cost model calculations.
The loop vectorizer cost model sums the individual TTI cost of each
instruction. For saturating arithmetic, this is a gross overestimate,
since you have 2 sexts (inputs), 2 icmps + 2 selects (for the
saturation), and a truncate (output); these all fold away.
With an intrinsic, you'd end up with a better estimate; however, I'm
trying to see what problems we would encounter without intrinsics, and
I think this is the biggest one.
Note that AFAICT, costs for min/max patterns (icmp+select) are also
overestimated, but not by as much as for saturate.
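As a rough illustration of the pattern described above, here is a scalar
C++ sketch of a saturating operation written the way the expanded IR
looks. The choice of a signed 16-bit add and the name sat_add_i16 are
assumptions made for the example; the comments map each step to the
instructions counted by the cost model.

```cpp
#include <cstdint>

// Saturating signed 16-bit add, spelled out step by step.
// Hypothetical example: the element type and the add are assumptions.
constexpr int16_t sat_add_i16(int16_t a, int16_t b) {
  int32_t wa = a;                                  // sext (input 1)
  int32_t wb = b;                                  // sext (input 2)
  int32_t sum = wa + wb;                           // add in the wider type
  int32_t lo = sum < INT16_MIN ? INT16_MIN : sum;  // icmp + select (lower clamp)
  int32_t hi = lo > INT16_MAX ? INT16_MAX : lo;    // icmp + select (upper clamp)
  return static_cast<int16_t>(hi);                 // trunc (output)
}
```

Costed one instruction at a time, this is seven extra operations around
the add, several of them on the wider type, even though a target with a
saturating add can do the whole thing in one instruction.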
Proposal:
Add a method, part of the vector API of TargetTransformInfo, for
multi-instruction cost computation. It would take a scalar
Instruction, and a reference to a set of Instructions. If it's able to
match a min/max/saturate/etc. pattern, it adds all the matched
instructions to the set, so the caller (say
LoopVectorizationCostModel) can ignore them.
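To make the shape of the proposal concrete, here is a minimal toy model
of it in C++. Nothing here is the real LLVM API: Inst, Op,
getInstructionCost, getPatternCost and blockCost are all hypothetical
names, every cost is a placeholder of 1, and only the min/max
(icmp + select) pattern is matched.

```cpp
#include <set>
#include <vector>

// Toy IR: just enough structure to express icmp + select.
enum class Op { Load, ICmp, Select, Store };

struct Inst {
  Op op;
  std::vector<const Inst*> operands;
};

// Hypothetical per-instruction cost (placeholder: everything costs 1).
unsigned getInstructionCost(const Inst&) { return 1; }

// Hypothetical multi-instruction hook. If Root is select(icmp(a, b), a, b)
// or select(icmp(a, b), b, a), record the matched instructions and report
// one combined cost. Simplification: does not check that the icmp has no
// other users.
bool getPatternCost(const Inst& Root, std::set<const Inst*>& Matched,
                    unsigned& Cost) {
  if (Root.op != Op::Select || Root.operands.size() != 3)
    return false;
  const Inst* Cmp = Root.operands[0];
  if (Cmp->op != Op::ICmp || Cmp->operands.size() != 2)
    return false;
  const Inst* A = Cmp->operands[0];
  const Inst* B = Cmp->operands[1];
  bool Same = Root.operands[1] == A && Root.operands[2] == B;
  bool Swapped = Root.operands[1] == B && Root.operands[2] == A;
  if (!Same && !Swapped)
    return false;
  Matched.insert(&Root);
  Matched.insert(Cmp);
  Cost = 1;  // placeholder cost of a single min/max instruction
  return true;
}

// Caller side: sum costs, skipping anything a pattern already covered.
unsigned blockCost(const std::vector<const Inst*>& Block, bool UsePatterns) {
  std::set<const Inst*> Matched;
  unsigned Total = 0;
  // Walk bottom-up so a pattern root is seen before what it absorbs.
  for (auto It = Block.rbegin(); It != Block.rend(); ++It) {
    const Inst* I = *It;
    if (Matched.count(I))
      continue;
    unsigned Cost = 0;
    if (UsePatterns && getPatternCost(*I, Matched, Cost))
      Total += Cost;
    else
      Total += getInstructionCost(*I);
  }
  return Total;
}
```

In this sketch the caller walks the block bottom-up, which is one way to
make sure the select is visited before the icmp it absorbs; a real cost
model might instead run a matching pass first.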
But:
- this all seems icky: a very blunt hammer.
- what, if anything, should we do about legality checks? The
expanded IR equivalent of a saturate uses larger types than necessary,
so this might prevent vectorization. In practice it doesn't, because
only load/store/PHI types are checked there.
- is this useful in other cases, beyond min/max (maybe abs?) and saturate?
Thanks!
-Ahmed