[LLVMdev] Vectorization Cost Models and Multi-Instruction Patterns?

Tue Jan 20 08:37:31 PST 2015

> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of Das, Dibyendu
> Sent: Tuesday, January 20, 2015 2:42 PM
> To: Ahmed Bougacha; LLVM Dev
> Subject: Re: [LLVMdev] Vectorization Cost Models and Multi-Instruction
> Patterns?
> 
> I guess it will help SAD generation also.
> 
> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of Ahmed Bougacha
> Sent: Tuesday, January 20, 2015 5:21 AM
> To: LLVM Dev
> Subject: [LLVMdev] Vectorization Cost Models and Multi-Instruction
> Patterns?
> 
> Hi all,
> 
> While tinkering with saturation instructions, I hit problems with the cost
> model calculations.
> 
> The loop vectorizer cost model accumulates the individual TTI cost model of
> each instruction.  For saturating arithmetic, this is a gross overestimate, since
> you have 2 sexts (inputs), 2 icmps + 2 selects (for the saturation), and a
> truncate (output); these all fold away.
> With an intrinsic, you'd end up with a better estimate; however, I'm trying to
> see what problems we would encounter without intrinsics, and I think this is
> the biggest one.
> Note that AFAICT, costs for min/max patterns (icmp+select) are also
> overestimated, but not as much as saturate.
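
For reference, the scalar shape being costed looks roughly like the sketch below. It is only an illustration: the function name sat_add_i8 and the choice of i8 inputs widened to i16 are assumptions, not taken from the patch under discussion. Each statement is annotated with the IR instruction it would typically expand to.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical example: signed saturating add on i8, written the way the
// expanded (non-intrinsic) IR computes it. The i16 intermediate type is an
// assumption chosen for illustration.
inline int8_t sat_add_i8(int8_t a, int8_t b) {
  int16_t wa = a;                                   // sext (input)
  int16_t wb = b;                                   // sext (input)
  int16_t sum = static_cast<int16_t>(wa + wb);      // add in the wider type
  int16_t lo = sum < INT8_MIN ? INT8_MIN : sum;     // icmp + select (lower clamp)
  int16_t hi = lo > INT8_MAX ? INT8_MAX : lo;       // icmp + select (upper clamp)
  return static_cast<int8_t>(hi);                   // trunc (output)
}
```

Seven of these eight operations (everything but the add) exist only to express the saturation, which is why summing their individual costs overestimates a target that has a single saturating-add instruction.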
> 
> 
> Proposal:
> 
> Add a method, part of the vector API of TargetTransformInfo, for multi-
> instruction cost computation.  It would take a scalar Instruction, and a
> reference to a set of Instructions.  If it's able to match a min/max/saturate/...,

Are you trying to match the multi-instruction pattern to library calls or intrinsics?
If it is an intrinsic, you may also need to check legality before matching.

> it adds all the matched instructions to the set, so the caller (say
> LoopVectorizationCostModel) can ignore them.
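
To make the proposed interface concrete, here is a toy model of it. Nothing below is LLVM API: Inst, instCost, matchSaturatingAdd, loopCost and demoCosts are all hypothetical stand-ins, the unit cost per instruction and the fused cost of 1 are assumptions, and a select is simplified to (condition, value). It only shows the flow of "match from a root, record the matched instructions, let the caller skip them".

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Toy stand-ins for llvm::Instruction and TTI; every name here is hypothetical.
enum class Op { Load, Store, SExt, Add, ICmp, Select, Trunc };

struct Inst {
  Op op;
  std::vector<const Inst *> operands;
};

// Per-instruction cost as summed today (unit cost is an assumption).
unsigned instCost(const Inst &) { return 1; }

static bool isOp(const Inst *I, Op op, std::size_t minOps) {
  return I && I->op == op && I->operands.size() >= minOps;
}

// Hypothetical multi-instruction hook: if Root is the trunc of a clamped add
// of two sexts, add every matched instruction to Matched and report one
// fused cost. Selects are modelled as {condition, value}.
bool matchSaturatingAdd(const Inst *Root, std::set<const Inst *> &Matched,
                        unsigned &Cost) {
  if (!isOp(Root, Op::Trunc, 1)) return false;
  const Inst *Hi = Root->operands[0];
  if (!isOp(Hi, Op::Select, 2) || !isOp(Hi->operands[0], Op::ICmp, 0)) return false;
  const Inst *Lo = Hi->operands[1];
  if (!isOp(Lo, Op::Select, 2) || !isOp(Lo->operands[0], Op::ICmp, 0)) return false;
  const Inst *Sum = Lo->operands[1];
  if (!isOp(Sum, Op::Add, 2)) return false;
  const Inst *A = Sum->operands[0], *B = Sum->operands[1];
  if (!isOp(A, Op::SExt, 0) || !isOp(B, Op::SExt, 0)) return false;
  Matched.insert({Root, Hi, Hi->operands[0], Lo, Lo->operands[0], Sum, A, B});
  Cost = 1; // assumed cost of a single saturating-add instruction
  return true;
}

// Caller side: sum per-instruction costs, skipping anything a pattern claimed.
unsigned loopCost(const std::vector<const Inst *> &Body, bool UsePatterns) {
  std::set<const Inst *> Matched;
  unsigned Total = 0;
  if (UsePatterns) {
    for (const Inst *I : Body) {
      unsigned C = 0;
      if (matchSaturatingAdd(I, Matched, C)) Total += C;
    }
  }
  for (const Inst *I : Body)
    if (!Matched.count(I)) Total += instCost(*I);
  return Total;
}

// Builds load, load, sext, sext, add, 2x(icmp, select), trunc, store and
// returns {cost without the hook, cost with the hook}.
std::pair<unsigned, unsigned> demoCosts() {
  Inst LA{Op::Load, {}}, LB{Op::Load, {}};
  Inst XA{Op::SExt, {&LA}}, XB{Op::SExt, {&LB}};
  Inst Sum{Op::Add, {&XA, &XB}};
  Inst C1{Op::ICmp, {&Sum}}, Lo{Op::Select, {&C1, &Sum}};
  Inst C2{Op::ICmp, {&Lo}}, Hi{Op::Select, {&C2, &Lo}};
  Inst Tr{Op::Trunc, {&Hi}}, St{Op::Store, {&Tr}};
  std::vector<const Inst *> Body = {&LA, &LB, &XA, &XB, &Sum, &C1,
                                    &Lo, &C2, &Hi, &Tr, &St};
  return {loopCost(Body, false), loopCost(Body, true)};
}
```

Under these assumed costs, the 11-instruction body is costed at 11 when summed naively and at 4 (two loads, one store, one fused saturating add) once the matched instructions are skipped. A real matcher would also have to verify the compare predicates and constants, and that the matched values have no other users.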
> 
> But:
> -  this all seems icky: a very blunt hammer.
> -  what, if anything, should we do about legality checks?  The expanded IR
> equivalent of a saturate uses larger types than necessary, so this might
> prevent vectorization.  In practice it doesn't, because only load/store/PHI
> types are checked there.

IMO, it will still prevent vectorization because the value used by min/max/saturate
might have been defined by load/store/phi.

> -  is this useful in other cases, beyond min/max (maybe abs ?)  and saturate?
Yes, Sum-Of-Absolute-Difference is another case.
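
As a rough illustration of that case, a scalar SAD loop has the shape sketched below. The name sad_u8 and the u8 element type are assumptions for the example; the comments show which per-element operations a per-instruction cost model would count separately.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical example: sum of absolute differences over two u8 arrays.
uint32_t sad_u8(const uint8_t *a, const uint8_t *b, std::size_t n) {
  uint32_t acc = 0;
  for (std::size_t i = 0; i < n; ++i) {
    // zext + zext + sub in a wider type
    int32_t d = static_cast<int32_t>(a[i]) - static_cast<int32_t>(b[i]);
    // abs expressed as icmp + neg + select, followed by the accumulating add
    acc += static_cast<uint32_t>(d < 0 ? -d : d);
  }
  return acc;
}
```

On targets with a dedicated SAD instruction, the whole per-element chain would fold into it, so it runs into the same overestimate as saturate.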

Regards,
Shahid
> 
> Thanks!
> 
> -Ahmed
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 