[llvm] r204884 - [X86][Vectorizer Cost Model] Correct vectorization cost model for v2i64->v2f64

Thu Mar 27 10:54:03 PDT 2014

On Mar 27, 2014, at 9:32 AM, Quentin Colombet <qcolombet at apple.com> wrote:

> 
> On Mar 27, 2014, at 8:52 AM, Adam Nemet <anemet at apple.com> wrote:
> 
>> 
>> On Mar 26, 2014, at 5:52 PM, Quentin Colombet <qcolombet at apple.com> wrote:
>> 
>>> Author: qcolombet
>>> Date: Wed Mar 26 19:52:16 2014
>>> New Revision: 204884
>>> 
>>> URL: http://llvm.org/viewvc/llvm-project?rev=204884&view=rev
>>> Log:
>>> [X86][Vectorizer Cost Model] Correct vectorization cost model for v2i64->v2f64
>>> and v4i64->v4f64.
>>> 
>>> The new costs match what we did for SSE2 and reflect the reality of our codegen.
>>> 
>>> <rdar://problem/16381225>
>>> 
>>> Added:
>>>   llvm/trunk/test/Transforms/LoopVectorize/X86/uint64_to_fp64-cost-model.ll
>>> Modified:
>>>   llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
>>> 
>>> Modified: llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp?rev=204884&r1=204883&r2=204884&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp (original)
>>> +++ llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp Wed Mar 26 19:52:16 2014
>>> @@ -512,6 +512,8 @@ unsigned X86TTI::getCastInstrCost(unsign
>>>    { ISD::UINT_TO_FP,  MVT::v4f64, MVT::v4i8,  2 },
>>>    { ISD::UINT_TO_FP,  MVT::v4f64, MVT::v4i16, 2 },
>>>    { ISD::UINT_TO_FP,  MVT::v4f64, MVT::v4i32, 6 },
>>> +    { ISD::UINT_TO_FP,  MVT::v2f64, MVT::v2i64, 2*10 },
>>> +    { ISD::UINT_TO_FP,  MVT::v4f64, MVT::v4i64, 4*10 },
>> 
>> I think that a comment would be in order here.  If this node is codegen’ed as Expand rather than by the backend then for example why doesn’t the generic cost estimate (BasicTTI) give the right result?
> BasicTTI estimates the cost of VF 2 to 4, and VF 4 to 10. One can argue that this is a very optimistic computation based on the actual codegen!
> 
> In  X86TTI::getCastInstrCost, there is already a comment about that for SSE2:
>     // These are somewhat magic numbers justified by looking at the output of
>     // Intel's IACA, running some kernels and making sure when we take
>     // legalization into account the throughput will be overestimated.
>     { ISD::UINT_TO_FP, MVT::v2f64, MVT::v2i64, 2*10 },
> 
> I’ve talked to Arnold and he said we basically need to put something expensive (10) and multiply it by the vector width.
> I’ve matched the behavior of SSE2 here.
> Do you think a comment along those lines would do?

I think that it would be good to point out where you think the generic code gets the number wrong.  Then if we improve the generic estimate we’d know which cost estimates to revisit in the backend.

As you know I am working on a very similar bug and in my case the estimate for scalarization overhead is optimistic because it does not take the length of the dependence chain into consideration.  I am planning to point this out in the comment.

Adam

> 
> 
> ** Details on the Expansion Computation **
> 
> We are indeed failing in the expand case, which is the related code:
> 
>  BasicTargetTransformInfo.cpp:339
>     // If we are converting vectors and the operation is illegal, or
>     // if the vectors are legalized to different types, estimate the
>     // scalarization costs.
>     unsigned Num = Dst->getVectorNumElements();
> The cast for the scalar type is legal and returns 1.
> 
>     unsigned Cost = TopTTI->getCastInstrCost(Opcode, Dst->getScalarType(),
>                                              Src->getScalarType());
> 
>     // Return the cost of multiple scalar invocation plus the cost of
>     // inserting and extracting the values.
> Then the scalarization overhead is estimated to 2 for VF 2 and 6 for VF 4:
> The cost of Index 0 is free (per X86TTI::getVectorInstrCost) and the cost of other indexes is 2 (1 insert, 1 extract).
> I.e.,
> VF N = 0 + 2 * (N - 1) + N * 1 = 3 * N - 2
> VF 2 = 3 * 2 - 2 = 4
> VF 4 = 3 * 4 - 2 = 10
>     return getScalarizationOverhead(Dst, true, true) + Num * Cost;
> 
> Thanks,
> -Quentin
>> 
>> Adam
>> 
>>> 
>>>    { ISD::FP_TO_SINT,  MVT::v8i8,  MVT::v8f32, 7 },
>>>    { ISD::FP_TO_SINT,  MVT::v4i8,  MVT::v4f32, 1 },
>>> 
>>> Added: llvm/trunk/test/Transforms/LoopVectorize/X86/uint64_to_fp64-cost-model.ll
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/X86/uint64_to_fp64-cost-model.ll?rev=204884&view=auto
>>> ==============================================================================
>>> --- llvm/trunk/test/Transforms/LoopVectorize/X86/uint64_to_fp64-cost-model.ll (added)
>>> +++ llvm/trunk/test/Transforms/LoopVectorize/X86/uint64_to_fp64-cost-model.ll Wed Mar 26 19:52:16 2014
>>> @@ -0,0 +1,26 @@
>>> +; RUN: opt < %s  -loop-vectorize -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx -S -debug-only=loop-vectorize 2>&1 | FileCheck %s
>>> +; REQUIRES: asserts
>>> +
>>> +target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
>>> +target triple = "x86_64-apple-macosx10.8.0"
>>> +
>>> +
>>> +; CHECK: cost of 20 for VF 2 For instruction:   %conv = uitofp i64 %tmp to double
>>> +; CHECK: cost of 40 for VF 4 For instruction:   %conv = uitofp i64 %tmp to double
>>> +define void @uint64_to_double_cost(i64* noalias nocapture %a, double* noalias nocapture readonly %b) nounwind {
>>> +entry:
>>> +  br label %for.body
>>> +for.body:
>>> +  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
>>> +  %arrayidx = getelementptr inbounds i64* %a, i64 %indvars.iv
>>> +  %tmp = load i64* %arrayidx, align 4
>>> +  %conv = uitofp i64 %tmp to double
>>> +  %arrayidx2 = getelementptr inbounds double* %b, i64 %indvars.iv
>>> +  store double %conv, double* %arrayidx2, align 4
>>> +  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
>>> +  %exitcond = icmp eq i64 %indvars.iv.next, 256
>>> +  br i1 %exitcond, label %for.end, label %for.body
>>> +
>>> +for.end:
>>> +  ret void
>>> +}
>>> 
>>> 
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140327/6d44e350/attachment.html>