[cfe-commits] r166743 - in /cfe/trunk: lib/Headers/xmmintrin.h test/CodeGen/sse-builtins.c

Michael Liao michael.liao at intel.com
Tue Oct 30 16:21:15 PDT 2012


On Tue, 2012-10-30 at 16:16 -0700, manman ren wrote:
> On Oct 30, 2012, at 4:13 PM, Michael Liao <michael.liao at intel.com> wrote:
> 
> > These unary insns are not modelled accurately to capture the hardware
> > behavior (high parts of result is preserved in SSE variant). Even though
> > the fix works here, it'd better to fix them in X86 backend. Bug is filed
> > in PR14221.
> I actually have a fix in backend, but thought it is cleaner to fix it in the front-end :)

Hmm, I think we'd better fix BE and model them correctly to not only fix
this issue (correctness) but also potential performance issue. Without
modelling the destination register is an input as well, RA may choose a
bad candidate which cause partial register stall issue.

Yours
- Michael

> 
> Thanks,
> Manman
> 
> > 
> > Yours
> > - Michael
> > 
> > On Tue, 2012-10-30 at 10:57 -0700, manman ren wrote:
> >> On Oct 29, 2012, at 6:01 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> >> 
> >>> On Thu, Oct 25, 2012 at 8:23 PM, Manman Ren <mren at apple.com> wrote:
> >>>> 
> >>>> 
> >>>> Sent from my iPhone
> >>>> 
> >>>> On Oct 25, 2012, at 7:54 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> >>>> 
> >>>>> On Thu, Oct 25, 2012 at 7:35 PM, Manman Ren <mren at apple.com> wrote:
> >>>>>> 
> >>>>>> 
> >>>>>> Sent from my iPhone
> >>>>>> 
> >>>>>> On Oct 25, 2012, at 6:13 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> >>>>>> 
> >>>>>>> On Thu, Oct 25, 2012 at 5:25 PM, Manman Ren <mren at apple.com> wrote:
> >>>>>>>> Author: mren
> >>>>>>>> Date: Thu Oct 25 19:25:10 2012
> >>>>>>>> New Revision: 166743
> >>>>>>>> 
> >>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=166743&view=rev
> >>>>>>>> Log:
> >>>>>>>> X86 SSE Intrinsics: update header for sqrt_ss, rsqrt_ss and rcp_ss.
> >>>>>>>> 
> >>>>>>>> There intrinsics pass through the upper FP values from the input.
> >>>>>>>> rdar://12558838
> >>>>>>>> 
> >>>>>>>> Modified:
> >>>>>>>> cfe/trunk/lib/Headers/xmmintrin.h
> >>>>>>>> cfe/trunk/test/CodeGen/sse-builtins.c
> >>>>>>>> 
> >>>>>>>> Modified: cfe/trunk/lib/Headers/xmmintrin.h
> >>>>>>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Headers/xmmintrin.h?rev=166743&r1=166742&r2=166743&view=diff
> >>>>>>>> ==============================================================================
> >>>>>>>> --- cfe/trunk/lib/Headers/xmmintrin.h (original)
> >>>>>>>> +++ cfe/trunk/lib/Headers/xmmintrin.h Thu Oct 25 19:25:10 2012
> >>>>>>>> @@ -95,7 +95,8 @@
> >>>>>>>> static __inline__ __m128 __attribute__((__always_inline__, __nodebug__))
> >>>>>>>> _mm_sqrt_ss(__m128 a)
> >>>>>>>> {
> >>>>>>>> -  return __builtin_ia32_sqrtss(a);
> >>>>>>>> +  __m128 c = __builtin_ia32_sqrtss(a);
> >>>>>>>> +  return (__m128) { c[0], a[1], a[2], a[3] };
> >>>>>>>> }
> >>>>>>> 
> >>>>>>> What does exactly does __builtin_ia32_sqrtss return, if not the result
> >>>>>>> of a sqrtss instruction?
> >>>>>> builtin returns the result of sqrtss instruction, which only updates the lowest FP. mm_sqrt_ss has the upper FPs pass through to the output.
> >>>>> 
> >>>>> What exactly is in the top 96 bits of the return value of
> >>>>> __builtin_ia32_sqrtss, then?
> >>>> Those bits are undefined with the current implementation, same goes for sqrt_sd.
> >>> 
> >>> Why are the intrinsics defined to take vectors in the first place, then?
> >> Not sure why the builtins for sqrtss, rsqrtss, rcpss and sqrtsd take vectors.
> >> Their top bits are undefined since they directly map to the corresponding instructions which have the top bits from the un-initialized destination registers.
> >> 
> >> Thanks,
> >> Manman
> >>> 
> >>> -Eli
> >> 
> >> _______________________________________________
> >> cfe-commits mailing list
> >> cfe-commits at cs.uiuc.edu
> >> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
> > 
> > 
> 





More information about the cfe-commits mailing list