[cfe-commits] r166743 - in /cfe/trunk: lib/Headers/xmmintrin.h test/CodeGen/sse-builtins.c

Wed Oct 31 16:34:18 PDT 2012

If the backend is fixed, shall we revert this patch to FE? It may
generate inefficient code.

Yours
- Michael

On Wed, 2012-10-31 at 12:45 -0700, manman ren wrote:
> On Oct 30, 2012, at 4:58 PM, "Liao, Michael" <michael.liao at intel.com> wrote:
> 
> > Thanks. That would be great to model SSE variants of SQRTSS almost the same as AVX variants with tied input/output operand.
> 
> Committed r167064.
> 
> Thanks,
> Manman
> 
> > 
> > Yours
> > - Michael
> > 
> > -----Original Message-----
> > From: manman ren [mailto:mren at apple.com] 
> > Sent: Tuesday, October 30, 2012 4:52 PM
> > To: Liao, Michael
> > Cc: Eli Friedman; cfe-commits at cs.uiuc.edu Commits
> > Subject: Re: [cfe-commits] r166743 - in /cfe/trunk: lib/Headers/xmmintrin.h test/CodeGen/sse-builtins.c
> > 
> > 
> > On Oct 30, 2012, at 4:21 PM, Michael Liao <michael.liao at intel.com> wrote:
> > 
> >> On Tue, 2012-10-30 at 16:16 -0700, manman ren wrote:
> >>> On Oct 30, 2012, at 4:13 PM, Michael Liao <michael.liao at intel.com> wrote:
> >>> 
> >>>> These unary insns are not modelled accurately to capture the 
> >>>> hardware behavior (high parts of result is preserved in SSE 
> >>>> variant). Even though the fix works here, it'd better to fix them in 
> >>>> X86 backend. Bug is filed in PR14221.
> >>> I actually have a fix in backend, but thought it is cleaner to fix it 
> >>> in the front-end :)
> >> 
> >> Hmm, I think we'd better fix BE and model them correctly to not only 
> >> fix this issue (correctness) but also potential performance issue. 
> >> Without modelling the destination register is an input as well, RA may 
> >> choose a bad candidate which cause partial register stall issue.
> > 
> > The correctness issue only happens with vector inputs, and currently we select the vector variant for intrinsics only.
> > My fix in BE actually will force register allocator to choose the same register for both operands.
> > So it should avoid the partial register stall issue for the vector variant.
> > Thanks for pointing it out, I will check in my BE fix for rsqrt and rcp.
> > The definition of builtins for rsqrtss, rcpss will be updated to be the same as the intrinsics.
> > sqrtss and sqrtsd are not included yet.
> > 
> > Manman 
> > 
> >> 
> >> Yours
> >> - Michael
> >> 
> >>> 
> >>> Thanks,
> >>> Manman
> >>> 
> >>>> 
> >>>> Yours
> >>>> - Michael
> >>>> 
> >>>> On Tue, 2012-10-30 at 10:57 -0700, manman ren wrote:
> >>>>> On Oct 29, 2012, at 6:01 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> >>>>> 
> >>>>>> On Thu, Oct 25, 2012 at 8:23 PM, Manman Ren <mren at apple.com> wrote:
> >>>>>>> 
> >>>>>>> 
> >>>>>>> Sent from my iPhone
> >>>>>>> 
> >>>>>>> On Oct 25, 2012, at 7:54 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> >>>>>>> 
> >>>>>>>> On Thu, Oct 25, 2012 at 7:35 PM, Manman Ren <mren at apple.com> wrote:
> >>>>>>>>> 
> >>>>>>>>> 
> >>>>>>>>> Sent from my iPhone
> >>>>>>>>> 
> >>>>>>>>> On Oct 25, 2012, at 6:13 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> >>>>>>>>> 
> >>>>>>>>>> On Thu, Oct 25, 2012 at 5:25 PM, Manman Ren <mren at apple.com> wrote:
> >>>>>>>>>>> Author: mren
> >>>>>>>>>>> Date: Thu Oct 25 19:25:10 2012 New Revision: 166743
> >>>>>>>>>>> 
> >>>>>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=166743&view=rev
> >>>>>>>>>>> Log:
> >>>>>>>>>>> X86 SSE Intrinsics: update header for sqrt_ss, rsqrt_ss and rcp_ss.
> >>>>>>>>>>> 
> >>>>>>>>>>> There intrinsics pass through the upper FP values from the input.
> >>>>>>>>>>> rdar://12558838
> >>>>>>>>>>> 
> >>>>>>>>>>> Modified:
> >>>>>>>>>>> cfe/trunk/lib/Headers/xmmintrin.h 
> >>>>>>>>>>> cfe/trunk/test/CodeGen/sse-builtins.c
> >>>>>>>>>>> 
> >>>>>>>>>>> Modified: cfe/trunk/lib/Headers/xmmintrin.h
> >>>>>>>>>>> URL: 
> >>>>>>>>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/xmm
> >>>>>>>>>>> intrin.h?rev=166743&r1=166742&r2=166743&view=diff
> >>>>>>>>>>> =============================================================
> >>>>>>>>>>> =================
> >>>>>>>>>>> --- cfe/trunk/lib/Headers/xmmintrin.h (original)
> >>>>>>>>>>> +++ cfe/trunk/lib/Headers/xmmintrin.h Thu Oct 25 19:25:10 
> >>>>>>>>>>> +++ 2012
> >>>>>>>>>>> @@ -95,7 +95,8 @@
> >>>>>>>>>>> static __inline__ __m128 __attribute__((__always_inline__, 
> >>>>>>>>>>> __nodebug__))
> >>>>>>>>>>> _mm_sqrt_ss(__m128 a)
> >>>>>>>>>>> {
> >>>>>>>>>>> -  return __builtin_ia32_sqrtss(a);
> >>>>>>>>>>> +  __m128 c = __builtin_ia32_sqrtss(a);  return (__m128) { 
> >>>>>>>>>>> + c[0], a[1], a[2], a[3] };
> >>>>>>>>>>> }
> >>>>>>>>>> 
> >>>>>>>>>> What does exactly does __builtin_ia32_sqrtss return, if not 
> >>>>>>>>>> the result of a sqrtss instruction?
> >>>>>>>>> builtin returns the result of sqrtss instruction, which only updates the lowest FP. mm_sqrt_ss has the upper FPs pass through to the output.
> >>>>>>>> 
> >>>>>>>> What exactly is in the top 96 bits of the return value of 
> >>>>>>>> __builtin_ia32_sqrtss, then?
> >>>>>>> Those bits are undefined with the current implementation, same goes for sqrt_sd.
> >>>>>> 
> >>>>>> Why are the intrinsics defined to take vectors in the first place, then?
> >>>>> Not sure why the builtins for sqrtss, rsqrtss, rcpss and sqrtsd take vectors.
> >>>>> Their top bits are undefined since they directly map to the corresponding instructions which have the top bits from the un-initialized destination registers.
> >>>>> 
> >>>>> Thanks,
> >>>>> Manman
> >>>>>> 
> >>>>>> -Eli
> >>>>> 
> >>>>> _______________________________________________
> >>>>> cfe-commits mailing list
> >>>>> cfe-commits at cs.uiuc.edu
> >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
> >>>> 
> >>>> 
> >>> 
> >> 
> >> 
> > 
>