[cfe-commits] r166743 - in /cfe/trunk: lib/Headers/xmmintrin.h test/CodeGen/sse-builtins.c
Michael Liao
michael.liao at intel.com
Wed Oct 31 16:34:18 PDT 2012
If the backend is fixed, shall we revert this patch to FE? It may
generate inefficient code.
Yours
- Michael
On Wed, 2012-10-31 at 12:45 -0700, manman ren wrote:
> On Oct 30, 2012, at 4:58 PM, "Liao, Michael" <michael.liao at intel.com> wrote:
>
> > Thanks. That would be great to model SSE variants of SQRTSS almost the same as AVX variants with tied input/output operand.
>
> Committed r167064.
>
> Thanks,
> Manman
>
> >
> > Yours
> > - Michael
> >
> > -----Original Message-----
> > From: manman ren [mailto:mren at apple.com]
> > Sent: Tuesday, October 30, 2012 4:52 PM
> > To: Liao, Michael
> > Cc: Eli Friedman; cfe-commits at cs.uiuc.edu Commits
> > Subject: Re: [cfe-commits] r166743 - in /cfe/trunk: lib/Headers/xmmintrin.h test/CodeGen/sse-builtins.c
> >
> >
> > On Oct 30, 2012, at 4:21 PM, Michael Liao <michael.liao at intel.com> wrote:
> >
> >> On Tue, 2012-10-30 at 16:16 -0700, manman ren wrote:
> >>> On Oct 30, 2012, at 4:13 PM, Michael Liao <michael.liao at intel.com> wrote:
> >>>
> >>>> These unary insns are not modelled accurately to capture the
> >>>> hardware behavior (high parts of result is preserved in SSE
> >>>> variant). Even though the fix works here, it'd better to fix them in
> >>>> X86 backend. Bug is filed in PR14221.
> >>> I actually have a fix in backend, but thought it is cleaner to fix it
> >>> in the front-end :)
> >>
> >> Hmm, I think we'd better fix BE and model them correctly to not only
> >> fix this issue (correctness) but also potential performance issue.
> >> Without modelling the destination register is an input as well, RA may
> >> choose a bad candidate which cause partial register stall issue.
> >
> > The correctness issue only happens with vector inputs, and currently we select the vector variant for intrinsics only.
> > My fix in BE actually will force register allocator to choose the same register for both operands.
> > So it should avoid the partial register stall issue for the vector variant.
> > Thanks for pointing it out, I will check in my BE fix for rsqrt and rcp.
> > The definition of builtins for rsqrtss, rcpss will be updated to be the same as the intrinsics.
> > sqrtss and sqrtsd are not included yet.
> >
> > Manman
> >
> >>
> >> Yours
> >> - Michael
> >>
> >>>
> >>> Thanks,
> >>> Manman
> >>>
> >>>>
> >>>> Yours
> >>>> - Michael
> >>>>
> >>>> On Tue, 2012-10-30 at 10:57 -0700, manman ren wrote:
> >>>>> On Oct 29, 2012, at 6:01 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> >>>>>
> >>>>>> On Thu, Oct 25, 2012 at 8:23 PM, Manman Ren <mren at apple.com> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> Sent from my iPhone
> >>>>>>>
> >>>>>>> On Oct 25, 2012, at 7:54 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> >>>>>>>
> >>>>>>>> On Thu, Oct 25, 2012 at 7:35 PM, Manman Ren <mren at apple.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Sent from my iPhone
> >>>>>>>>>
> >>>>>>>>> On Oct 25, 2012, at 6:13 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> On Thu, Oct 25, 2012 at 5:25 PM, Manman Ren <mren at apple.com> wrote:
> >>>>>>>>>>> Author: mren
> >>>>>>>>>>> Date: Thu Oct 25 19:25:10 2012 New Revision: 166743
> >>>>>>>>>>>
> >>>>>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=166743&view=rev
> >>>>>>>>>>> Log:
> >>>>>>>>>>> X86 SSE Intrinsics: update header for sqrt_ss, rsqrt_ss and rcp_ss.
> >>>>>>>>>>>
> >>>>>>>>>>> There intrinsics pass through the upper FP values from the input.
> >>>>>>>>>>> rdar://12558838
> >>>>>>>>>>>
> >>>>>>>>>>> Modified:
> >>>>>>>>>>> cfe/trunk/lib/Headers/xmmintrin.h
> >>>>>>>>>>> cfe/trunk/test/CodeGen/sse-builtins.c
> >>>>>>>>>>>
> >>>>>>>>>>> Modified: cfe/trunk/lib/Headers/xmmintrin.h
> >>>>>>>>>>> URL:
> >>>>>>>>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/xmm
> >>>>>>>>>>> intrin.h?rev=166743&r1=166742&r2=166743&view=diff
> >>>>>>>>>>> =============================================================
> >>>>>>>>>>> =================
> >>>>>>>>>>> --- cfe/trunk/lib/Headers/xmmintrin.h (original)
> >>>>>>>>>>> +++ cfe/trunk/lib/Headers/xmmintrin.h Thu Oct 25 19:25:10
> >>>>>>>>>>> +++ 2012
> >>>>>>>>>>> @@ -95,7 +95,8 @@
> >>>>>>>>>>> static __inline__ __m128 __attribute__((__always_inline__,
> >>>>>>>>>>> __nodebug__))
> >>>>>>>>>>> _mm_sqrt_ss(__m128 a)
> >>>>>>>>>>> {
> >>>>>>>>>>> - return __builtin_ia32_sqrtss(a);
> >>>>>>>>>>> + __m128 c = __builtin_ia32_sqrtss(a); return (__m128) {
> >>>>>>>>>>> + c[0], a[1], a[2], a[3] };
> >>>>>>>>>>> }
> >>>>>>>>>>
> >>>>>>>>>> What does exactly does __builtin_ia32_sqrtss return, if not
> >>>>>>>>>> the result of a sqrtss instruction?
> >>>>>>>>> builtin returns the result of sqrtss instruction, which only updates the lowest FP. mm_sqrt_ss has the upper FPs pass through to the output.
> >>>>>>>>
> >>>>>>>> What exactly is in the top 96 bits of the return value of
> >>>>>>>> __builtin_ia32_sqrtss, then?
> >>>>>>> Those bits are undefined with the current implementation, same goes for sqrt_sd.
> >>>>>>
> >>>>>> Why are the intrinsics defined to take vectors in the first place, then?
> >>>>> Not sure why the builtins for sqrtss, rsqrtss, rcpss and sqrtsd take vectors.
> >>>>> Their top bits are undefined since they directly map to the corresponding instructions which have the top bits from the un-initialized destination registers.
> >>>>>
> >>>>> Thanks,
> >>>>> Manman
> >>>>>>
> >>>>>> -Eli
> >>>>>
> >>>>> _______________________________________________
> >>>>> cfe-commits mailing list
> >>>>> cfe-commits at cs.uiuc.edu
> >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
> >>>>
> >>>>
> >>>
> >>
> >>
> >
>
More information about the cfe-commits
mailing list