[llvm] r204880 - X86: Correct vectorization cost model for v8f32->v8i8.

Thu Mar 27 13:24:12 PDT 2014

On Thu, Mar 27, 2014 at 11:33 AM, Nadav Rotem <nrotem at apple.com> wrote:
>
> On Mar 27, 2014, at 11:06 AM, Eric Christopher <echristo at gmail.com> wrote:
>
>> On Thu, Mar 27, 2014 at 11:03 AM, Nadav Rotem <nrotem at apple.com> wrote:
>>> This looks like a codegen problem and not an ISA problem.  Can we convert
>>> v8f32->v8i32 efficiently? Yes.  Can we convert v8i32->v8i8 efficiently? Yes.
>>> Can we do it in less than 7 instructions? Probably.
>>
>> How many? I haven't looked at it in detail, but the codegen out of the
>> compiler seems to be cleaned up pretty well, post legalization. Do you
>> have an idea of how few we could do it in?
>>
>
> We already get "fptosi <8 x float> %x to <8 x i32>" right, and it generates vcvttps2dq.  To get from <8 x i32> to <8 x i8> I think that vpacksi32 (v packed i32) can do the trick.
>
> If that does not work then we can probably shuffle twice and blend.
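To make Nadav's suggestion concrete, here is a scalar C++ model of one plausible AVX1 lowering for `fptosi <8 x float> to <8 x i8>`. The sequence is `vcvttps2dq` on the full ymm register, then `vextractf128` to split off the upper four lanes (AVX1 has no 256-bit integer packs), then `vpackssdw` and `vpacksswb`. It assumes "vpacksi32" refers to the signed-saturating PACKSS* instructions. This is a sketch, not the sequence LLVM emits, and it has not been measured. `cvttps2dqLane`, `saturate`, and `fpToSintV8i8` are hypothetical names that model each instruction's documented semantics. Saturating is a legal choice because the IR result of `fptosi` is undefined for values that do not fit in i8, and in-range values are still converted exactly.

```cpp
#include <array>
#include <cmath>
#include <cstdint>
#include <limits>

// Scalar model of CVTTPS2DQ for one lane: truncate toward zero; NaN and
// out-of-range inputs yield the "integer indefinite" value 0x80000000.
std::int32_t cvttps2dqLane(float f) {
  if (std::isnan(f) || f >= 2147483648.0f || f < -2147483648.0f)
    return std::numeric_limits<std::int32_t>::min();
  return static_cast<std::int32_t>(f);
}

// Signed saturation, as performed per element by PACKSSDW / PACKSSWB.
template <typename To, typename From>
To saturate(From v) {
  const From lo = std::numeric_limits<To>::min();
  const From hi = std::numeric_limits<To>::max();
  return static_cast<To>(v < lo ? lo : (v > hi ? hi : v));
}

// Hypothetical AVX1 lowering, one step per instruction:
//   vcvttps2dq ymm0, ymm0
//   vextractf128 xmm1, ymm0, 1
//   vpackssdw xmm0, xmm0, xmm1
//   vpacksswb xmm0, xmm0, xmm0   (result in the low 8 bytes)
std::array<std::int8_t, 8> fpToSintV8i8(const std::array<float, 8>& in) {
  // vcvttps2dq: all eight lanes converted at once.
  std::array<std::int32_t, 8> dw{};
  for (int i = 0; i < 8; ++i) dw[i] = cvttps2dqLane(in[i]);

  // vextractf128 + vpackssdw: lanes 0-3 (low half) then 4-7 (high half)
  // become eight i16 words, in order, with signed saturation.
  std::array<std::int16_t, 8> w{};
  for (int i = 0; i < 8; ++i) w[i] = saturate<std::int16_t>(dw[i]);

  // vpacksswb x, x: eight i16 -> eight i8 in the low half of the register.
  std::array<std::int8_t, 8> b{};
  for (int i = 0; i < 8; ++i) b[i] = saturate<std::int8_t>(w[i]);
  return b;
}
```

If codegen did emit this four-instruction sequence, the table entry could plausibly be lowered from 7 to match it.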

Interesting. If you can come up with a likely candidate, could you
update the cost table there?

Thanks!

-eric

>
>
>> -eric
>>
>>>
>>>
>>> On Mar 27, 2014, at 9:40 AM, Eric Christopher <echristo at gmail.com> wrote:
>>>
>>>
>>> On Mar 27, 2014 8:41 AM, "Jim Grosbach" <grosbach at apple.com> wrote:
>>>>
>>>>
>>>>
>>>>> On Mar 26, 2014, at 10:32 PM, Eric Christopher <echristo at gmail.com> wrote:
>>>>>
>>>>>> On Wed, Mar 26, 2014 at 5:04 PM, Jim Grosbach <grosbach at apple.com> wrote:
>>>>>> Author: grosbach
>>>>>> Date: Wed Mar 26 19:04:11 2014
>>>>>> New Revision: 204880
>>>>>>
>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=204880&view=rev
>>>>>> Log:
>>>>>> X86: Correct vectorization cost model for v8f32->v8i8.
>>>>>>
>>>>>> Fix the cost model to reflect the reality of our codegen.
>>>>>
>>>>> Reality of our codegen or reality of the processors?
>>>>>
>>>>
>>>> The latter, though the cost model should be accurately reflecting both.
>>>
>>> It should, but in the latter a bug should be filed and a comment to that
>>> effect listed. "Realities of our CodeGen" really sounds like a deficiency
>>> we're papering over.
>>>
>>> -eric
>>>
>>> PS. Reading more of the patches I'll just CC Quentin on this response too. :)
>>>
>>>>> -eric
>>>>>
>>>>>>
>>>>>> rdar://16370633
>>>>>>
>>>>>> Added:
>>>>>>   llvm/trunk/test/Transforms/LoopVectorize/X86/fp_to_sint8-cost-model.ll
>>>>>> Modified:
>>>>>>   llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
>>>>>>
>>>>>> Modified: llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
>>>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp?rev=204880&r1=204879&r2=204880&view=diff
>>>>>>
>>>>>> ==============================================================================
>>>>>> --- llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp (original)
>>>>>> +++ llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp Wed Mar 26 19:04:11 2014
>>>>>> @@ -513,7 +513,7 @@ unsigned X86TTI::getCastInstrCost(unsign
>>>>>>    { ISD::UINT_TO_FP,  MVT::v4f64, MVT::v4i16, 2 },
>>>>>>    { ISD::UINT_TO_FP,  MVT::v4f64, MVT::v4i32, 6 },
>>>>>>
>>>>>> -    { ISD::FP_TO_SINT,  MVT::v8i8,  MVT::v8f32, 1 },
>>>>>> +    { ISD::FP_TO_SINT,  MVT::v8i8,  MVT::v8f32, 7 },
>>>>>>    { ISD::FP_TO_SINT,  MVT::v4i8,  MVT::v4f32, 1 },
>>>>>>  };
>>>>>>
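The patch edits a single row of a static conversion-cost table. As a minimal sketch, assuming a plain linear scan, the code below models how such a table is consulted: an exact match on (opcode, destination type, source type) returns the listed cost. The enums, `ConvCostEntry`, `kAVXConvCosts`, and `lookupConvCost` are hypothetical stand-ins for LLVM's `ISD`/`MVT` types and its cost-table helpers, not their actual API. The rows are copied from the hunk above with the post-patch cost of 7.

```cpp
#include <cstddef>

// Hypothetical stand-ins for ISD opcodes and MVT simple value types.
enum class Op { FP_TO_SINT, UINT_TO_FP };
enum class VT { v4i8, v8i8, v4i16, v4i32, v4f32, v8f32, v4f64 };

// One row: converting Src to Dst with opcode ISD costs Cost units.
struct ConvCostEntry {
  Op ISD;
  VT Dst;
  VT Src;
  unsigned Cost;
};

// Rows taken from the patched hunk (FP_TO_SINT v8f32 -> v8i8 now costs 7).
const ConvCostEntry kAVXConvCosts[] = {
  { Op::UINT_TO_FP, VT::v4f64, VT::v4i16, 2 },
  { Op::UINT_TO_FP, VT::v4f64, VT::v4i32, 6 },
  { Op::FP_TO_SINT, VT::v8i8,  VT::v8f32, 7 },
  { Op::FP_TO_SINT, VT::v4i8,  VT::v4f32, 1 },
};

// Linear scan for an exact (opcode, dst, src) match; nullptr means no row.
template <std::size_t N>
const ConvCostEntry* lookupConvCost(const ConvCostEntry (&Tbl)[N], Op ISD,
                                    VT Dst, VT Src) {
  for (const ConvCostEntry& E : Tbl)
    if (E.ISD == ISD && E.Dst == Dst && E.Src == Src)
      return &E;
  return nullptr;
}
```

A miss returns `nullptr`; in the real cost model a miss roughly corresponds to falling through to a more generic, target-independent estimate.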
>>>>>>
>>>>>> Added: llvm/trunk/test/Transforms/LoopVectorize/X86/fp_to_sint8-cost-model.ll
>>>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/X86/fp_to_sint8-cost-model.ll?rev=204880&view=auto
>>>>>>
>>>>>> ==============================================================================
>>>>>> --- llvm/trunk/test/Transforms/LoopVectorize/X86/fp_to_sint8-cost-model.ll (added)
>>>>>> +++ llvm/trunk/test/Transforms/LoopVectorize/X86/fp_to_sint8-cost-model.ll Wed Mar 26 19:04:11 2014
>>>>>> @@ -0,0 +1,24 @@
>>>>>> +; RUN: opt < %s  -loop-vectorize -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx -S -debug-only=loop-vectorize 2>&1 | FileCheck %s
>>>>>> +
>>>>>> +target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
>>>>>> +target triple = "x86_64-apple-macosx10.8.0"
>>>>>> +
>>>>>> +
>>>>>> +; CHECK: cost of 7 for VF 8 For instruction:   %conv = fptosi float %tmp to i8
>>>>>> +define void @float_to_sint8_cost(i8* noalias nocapture %a, float* noalias nocapture readonly %b) nounwind {
>>>>>> +entry:
>>>>>> +  br label %for.body
>>>>>> +for.body:
>>>>>> +  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
>>>>>> +  %arrayidx = getelementptr inbounds float* %b, i64 %indvars.iv
>>>>>> +  %tmp = load float* %arrayidx, align 4
>>>>>> +  %conv = fptosi float %tmp to i8
>>>>>> +  %arrayidx2 = getelementptr inbounds i8* %a, i64 %indvars.iv
>>>>>> +  store i8 %conv, i8* %arrayidx2, align 4
>>>>>> +  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
>>>>>> +  %exitcond = icmp eq i64 %indvars.iv.next, 256
>>>>>> +  br i1 %exitcond, label %for.end, label %for.body
>>>>>> +
>>>>>> +for.end:
>>>>>> +  ret void
>>>>>> +}
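The test checks the debug line `cost of 7 for VF 8`. As a simplified sketch of why that number matters, assume the vectorizer ranks each candidate width by its cost per scalar lane (cost divided by VF) and keeps the cheapest, which is roughly what LoopVectorize's width selection did at the time. `VFCost` and `pickVF` are hypothetical names, and the real model sums the costs of every instruction in the loop body rather than looking at the conversion alone.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One candidate vectorization factor and the estimated cost of a single
// iteration at that width.
struct VFCost {
  unsigned VF;
  unsigned Cost;
};

// Simplified width selection: compare cost per scalar lane (Cost / VF) and
// keep the cheapest. The first candidate must be the scalar loop (VF 1);
// the strict comparison keeps the smaller width on ties.
unsigned pickVF(const std::vector<VFCost>& candidates) {
  assert(!candidates.empty() && candidates.front().VF == 1);
  unsigned bestVF = candidates.front().VF;
  double bestPerLane = candidates.front().Cost;
  for (std::size_t i = 1; i < candidates.size(); ++i) {
    double perLane =
        static_cast<double>(candidates[i].Cost) / candidates[i].VF;
    if (perLane < bestPerLane) {
      bestPerLane = perLane;
      bestVF = candidates[i].VF;
    }
  }
  return bestVF;
}
```

For the conversion alone, the patch moves the per-lane cost at VF 8 from 1/8 to 7/8, so the vectorized conversion no longer looks almost free next to the rest of the loop body.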
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> llvm-commits mailing list
>>>>>> llvm-commits at cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>>
>>>
>