[llvm] r271510 - [X86][SSE] Replace (V)CVTTPS2DQ and VCVTTPD2DQ truncating (round to zero) f32/f64 to i32 with generic IR (llvm)
Eli Friedman via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 6 13:27:58 PDT 2016
On Wed, Jul 6, 2016 at 1:09 PM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote:
> So that patch was mainly about matching the behaviour of _mm_cvttsd_epi32
> (cvttsd2si) which already did the scalar equivalent.
> It turns out there is a problem if an out-of-range value gets constant
> folded: LangRef says the result should be undefined (although the folding
> actually sets the result to zero), but the cvttsd2si/cvttps2dq/cvttpd2dq
> instructions guarantee that the result is actually 0x80000000.
> I'll prepare a patch that reverts these changes (including for the
> scalar case) and performs a fast-math-only combine instead.
I just looked into it, and it turns out I screwed this up back in 2009
(r72979). Apparently nobody noticed. :(
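
To make the mismatch concrete, here is a rough sketch in portable C of the
result the hardware gives for the scalar case. The helper name
model_cvttsd2si is made up for illustration and is not anything in LLVM; it
only models the documented x86 behaviour, where an out-of-range or NaN input
produces the "integer indefinite" value 0x80000000.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical helper (not an LLVM or intrinsic API): models what x86
 * cvttsd2si produces for a 32-bit destination. */
static int32_t model_cvttsd2si(double x)
{
    /* The truncated value fits in int32_t only when
     * -2147483649.0 < x < 2147483648.0. NaN and everything outside
     * that window yield the "integer indefinite" value 0x80000000. */
    if (isnan(x) || x >= 2147483648.0 || x <= -2147483649.0)
        return INT32_MIN; /* bit pattern 0x80000000 */

    /* In range: a plain C cast truncates toward zero, like the instruction. */
    return (int32_t)x;
}
```

With this model, an input such as 3e9 gives 0x80000000, whereas a generic
conversion that treats the out-of-range case as undefined is free to fold it
to zero or anything else, so the two only agree for in-range inputs.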