[cfe-commits] r166743 - in /cfe/trunk: lib/Headers/xmmintrin.h test/CodeGen/sse-builtins.c
manman ren
mren at apple.com
Wed Oct 31 12:45:13 PDT 2012
On Oct 30, 2012, at 4:58 PM, "Liao, Michael" <michael.liao at intel.com> wrote:
> Thanks. That would be great to model SSE variants of SQRTSS almost the same as AVX variants with tied input/output operand.
Committed r167064.
Thanks,
Manman
>
> Yours
> - Michael
>
> -----Original Message-----
> From: manman ren [mailto:mren at apple.com]
> Sent: Tuesday, October 30, 2012 4:52 PM
> To: Liao, Michael
> Cc: Eli Friedman; cfe-commits at cs.uiuc.edu Commits
> Subject: Re: [cfe-commits] r166743 - in /cfe/trunk: lib/Headers/xmmintrin.h test/CodeGen/sse-builtins.c
>
>
> On Oct 30, 2012, at 4:21 PM, Michael Liao <michael.liao at intel.com> wrote:
>
>> On Tue, 2012-10-30 at 16:16 -0700, manman ren wrote:
>>> On Oct 30, 2012, at 4:13 PM, Michael Liao <michael.liao at intel.com> wrote:
>>>
>>>> These unary insns are not modelled accurately to capture the
>>>> hardware behavior (high parts of result is preserved in SSE
>>>> variant). Even though the fix works here, it'd better to fix them in
>>>> X86 backend. Bug is filed in PR14221.
>>> I actually have a fix in backend, but thought it is cleaner to fix it
>>> in the front-end :)
>>
>> Hmm, I think we'd better fix BE and model them correctly to not only
>> fix this issue (correctness) but also potential performance issue.
>> Without modelling the destination register is an input as well, RA may
>> choose a bad candidate which cause partial register stall issue.
>
> The correctness issue only happens with vector inputs, and currently we select the vector variant for intrinsics only.
> My fix in BE actually will force register allocator to choose the same register for both operands.
> So it should avoid the partial register stall issue for the vector variant.
> Thanks for pointing it out, I will check in my BE fix for rsqrt and rcp.
> The definition of builtins for rsqrtss, rcpss will be updated to be the same as the intrinsics.
> sqrtss and sqrtsd are not included yet.
>
> Manman
>
>>
>> Yours
>> - Michael
>>
>>>
>>> Thanks,
>>> Manman
>>>
>>>>
>>>> Yours
>>>> - Michael
>>>>
>>>> On Tue, 2012-10-30 at 10:57 -0700, manman ren wrote:
>>>>> On Oct 29, 2012, at 6:01 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>>>>
>>>>>> On Thu, Oct 25, 2012 at 8:23 PM, Manman Ren <mren at apple.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>> On Oct 25, 2012, at 7:54 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>>>>>>
>>>>>>>> On Thu, Oct 25, 2012 at 7:35 PM, Manman Ren <mren at apple.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sent from my iPhone
>>>>>>>>>
>>>>>>>>> On Oct 25, 2012, at 6:13 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Thu, Oct 25, 2012 at 5:25 PM, Manman Ren <mren at apple.com> wrote:
>>>>>>>>>>> Author: mren
>>>>>>>>>>> Date: Thu Oct 25 19:25:10 2012 New Revision: 166743
>>>>>>>>>>>
>>>>>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=166743&view=rev
>>>>>>>>>>> Log:
>>>>>>>>>>> X86 SSE Intrinsics: update header for sqrt_ss, rsqrt_ss and rcp_ss.
>>>>>>>>>>>
>>>>>>>>>>> There intrinsics pass through the upper FP values from the input.
>>>>>>>>>>> rdar://12558838
>>>>>>>>>>>
>>>>>>>>>>> Modified:
>>>>>>>>>>> cfe/trunk/lib/Headers/xmmintrin.h
>>>>>>>>>>> cfe/trunk/test/CodeGen/sse-builtins.c
>>>>>>>>>>>
>>>>>>>>>>> Modified: cfe/trunk/lib/Headers/xmmintrin.h
>>>>>>>>>>> URL:
>>>>>>>>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/xmm
>>>>>>>>>>> intrin.h?rev=166743&r1=166742&r2=166743&view=diff
>>>>>>>>>>> =============================================================
>>>>>>>>>>> =================
>>>>>>>>>>> --- cfe/trunk/lib/Headers/xmmintrin.h (original)
>>>>>>>>>>> +++ cfe/trunk/lib/Headers/xmmintrin.h Thu Oct 25 19:25:10
>>>>>>>>>>> +++ 2012
>>>>>>>>>>> @@ -95,7 +95,8 @@
>>>>>>>>>>> static __inline__ __m128 __attribute__((__always_inline__,
>>>>>>>>>>> __nodebug__))
>>>>>>>>>>> _mm_sqrt_ss(__m128 a)
>>>>>>>>>>> {
>>>>>>>>>>> - return __builtin_ia32_sqrtss(a);
>>>>>>>>>>> + __m128 c = __builtin_ia32_sqrtss(a); return (__m128) {
>>>>>>>>>>> + c[0], a[1], a[2], a[3] };
>>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> What does exactly does __builtin_ia32_sqrtss return, if not
>>>>>>>>>> the result of a sqrtss instruction?
>>>>>>>>> builtin returns the result of sqrtss instruction, which only updates the lowest FP. mm_sqrt_ss has the upper FPs pass through to the output.
>>>>>>>>
>>>>>>>> What exactly is in the top 96 bits of the return value of
>>>>>>>> __builtin_ia32_sqrtss, then?
>>>>>>> Those bits are undefined with the current implementation, same goes for sqrt_sd.
>>>>>>
>>>>>> Why are the intrinsics defined to take vectors in the first place, then?
>>>>> Not sure why the builtins for sqrtss, rsqrtss, rcpss and sqrtsd take vectors.
>>>>> Their top bits are undefined since they directly map to the corresponding instructions which have the top bits from the un-initialized destination registers.
>>>>>
>>>>> Thanks,
>>>>> Manman
>>>>>>
>>>>>> -Eli
>>>>>
>>>>> _______________________________________________
>>>>> cfe-commits mailing list
>>>>> cfe-commits at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>>>
>>>>
>>>
>>
>>
>
More information about the cfe-commits
mailing list