[cfe-commits] r166743 - in /cfe/trunk: lib/Headers/xmmintrin.h test/CodeGen/sse-builtins.c

Wed Oct 31 12:45:13 PDT 2012

On Oct 30, 2012, at 4:58 PM, "Liao, Michael" <michael.liao at intel.com> wrote:

> Thanks. That would be great to model SSE variants of SQRTSS almost the same as AVX variants with tied input/output operand.

Committed r167064.

Thanks,
Manman

> 
> Yours
> - Michael
> 
> -----Original Message-----
> From: manman ren [mailto:mren at apple.com] 
> Sent: Tuesday, October 30, 2012 4:52 PM
> To: Liao, Michael
> Cc: Eli Friedman; cfe-commits at cs.uiuc.edu Commits
> Subject: Re: [cfe-commits] r166743 - in /cfe/trunk: lib/Headers/xmmintrin.h test/CodeGen/sse-builtins.c
> 
> 
> On Oct 30, 2012, at 4:21 PM, Michael Liao <michael.liao at intel.com> wrote:
> 
>> On Tue, 2012-10-30 at 16:16 -0700, manman ren wrote:
>>> On Oct 30, 2012, at 4:13 PM, Michael Liao <michael.liao at intel.com> wrote:
>>> 
>>>> These unary insns are not modelled accurately to capture the 
>>>> hardware behavior (high parts of result is preserved in SSE 
>>>> variant). Even though the fix works here, it'd better to fix them in 
>>>> X86 backend. Bug is filed in PR14221.
>>> I actually have a fix in backend, but thought it is cleaner to fix it 
>>> in the front-end :)
>> 
>> Hmm, I think we'd better fix BE and model them correctly to not only 
>> fix this issue (correctness) but also potential performance issue. 
>> Without modelling the destination register is an input as well, RA may 
>> choose a bad candidate which cause partial register stall issue.
> 
> The correctness issue only happens with vector inputs, and currently we select the vector variant for intrinsics only.
> My fix in BE actually will force register allocator to choose the same register for both operands.
> So it should avoid the partial register stall issue for the vector variant.
> Thanks for pointing it out, I will check in my BE fix for rsqrt and rcp.
> The definition of builtins for rsqrtss, rcpss will be updated to be the same as the intrinsics.
> sqrtss and sqrtsd are not included yet.
> 
> Manman 
> 
>> 
>> Yours
>> - Michael
>> 
>>> 
>>> Thanks,
>>> Manman
>>> 
>>>> 
>>>> Yours
>>>> - Michael
>>>> 
>>>> On Tue, 2012-10-30 at 10:57 -0700, manman ren wrote:
>>>>> On Oct 29, 2012, at 6:01 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>>>> 
>>>>>> On Thu, Oct 25, 2012 at 8:23 PM, Manman Ren <mren at apple.com> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> Sent from my iPhone
>>>>>>> 
>>>>>>> On Oct 25, 2012, at 7:54 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>>>>>> 
>>>>>>>> On Thu, Oct 25, 2012 at 7:35 PM, Manman Ren <mren at apple.com> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Sent from my iPhone
>>>>>>>>> 
>>>>>>>>> On Oct 25, 2012, at 6:13 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> On Thu, Oct 25, 2012 at 5:25 PM, Manman Ren <mren at apple.com> wrote:
>>>>>>>>>>> Author: mren
>>>>>>>>>>> Date: Thu Oct 25 19:25:10 2012 New Revision: 166743
>>>>>>>>>>> 
>>>>>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=166743&view=rev
>>>>>>>>>>> Log:
>>>>>>>>>>> X86 SSE Intrinsics: update header for sqrt_ss, rsqrt_ss and rcp_ss.
>>>>>>>>>>> 
>>>>>>>>>>> There intrinsics pass through the upper FP values from the input.
>>>>>>>>>>> rdar://12558838
>>>>>>>>>>> 
>>>>>>>>>>> Modified:
>>>>>>>>>>> cfe/trunk/lib/Headers/xmmintrin.h 
>>>>>>>>>>> cfe/trunk/test/CodeGen/sse-builtins.c
>>>>>>>>>>> 
>>>>>>>>>>> Modified: cfe/trunk/lib/Headers/xmmintrin.h
>>>>>>>>>>> URL: 
>>>>>>>>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/xmm
>>>>>>>>>>> intrin.h?rev=166743&r1=166742&r2=166743&view=diff
>>>>>>>>>>> =============================================================
>>>>>>>>>>> =================
>>>>>>>>>>> --- cfe/trunk/lib/Headers/xmmintrin.h (original)
>>>>>>>>>>> +++ cfe/trunk/lib/Headers/xmmintrin.h Thu Oct 25 19:25:10 
>>>>>>>>>>> +++ 2012
>>>>>>>>>>> @@ -95,7 +95,8 @@
>>>>>>>>>>> static __inline__ __m128 __attribute__((__always_inline__, 
>>>>>>>>>>> __nodebug__))
>>>>>>>>>>> _mm_sqrt_ss(__m128 a)
>>>>>>>>>>> {
>>>>>>>>>>> -  return __builtin_ia32_sqrtss(a);
>>>>>>>>>>> +  __m128 c = __builtin_ia32_sqrtss(a);  return (__m128) { 
>>>>>>>>>>> + c[0], a[1], a[2], a[3] };
>>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> What does exactly does __builtin_ia32_sqrtss return, if not 
>>>>>>>>>> the result of a sqrtss instruction?
>>>>>>>>> builtin returns the result of sqrtss instruction, which only updates the lowest FP. mm_sqrt_ss has the upper FPs pass through to the output.
>>>>>>>> 
>>>>>>>> What exactly is in the top 96 bits of the return value of 
>>>>>>>> __builtin_ia32_sqrtss, then?
>>>>>>> Those bits are undefined with the current implementation, same goes for sqrt_sd.
>>>>>> 
>>>>>> Why are the intrinsics defined to take vectors in the first place, then?
>>>>> Not sure why the builtins for sqrtss, rsqrtss, rcpss and sqrtsd take vectors.
>>>>> Their top bits are undefined since they directly map to the corresponding instructions which have the top bits from the un-initialized destination registers.
>>>>> 
>>>>> Thanks,
>>>>> Manman
>>>>>> 
>>>>>> -Eli
>>>>> 
>>>>> _______________________________________________
>>>>> cfe-commits mailing list
>>>>> cfe-commits at cs.uiuc.edu
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>>> 
>>>> 
>>> 
>> 
>> 
>