[llvm-commits] [llvm] r123135 - /llvm/trunk/lib/Target/README.txt
Chandler Carruth
chandlerc at gmail.com
Sun Jan 9 17:12:40 PST 2011
On Sun, Jan 9, 2011 at 4:39 PM, Chris Lattner <clattner at apple.com> wrote:
> Chandler, I don't see what the issue is here. While it "would be nice" to
> have generic rounding mode support in the IR, there is no problem with
> having an intrinsic here. llvm.x86.sse2.cvtsd2si is a readnone function, so
> it should be optimized just about as well as fptosi. What specifically are
> we missing?
>
> If you're concerned about the extraneous mov + xor in:
> + xorps %xmm1, %xmm1
> + movsd %xmm0, %xmm1
> + cvtsd2sil %xmm1, %eax
>
> Then the right fix is to teach SimplifyDemandedVectorElts that
> llvm.x86.sse2.cvtsd2si does not demand the top element. This will allow the
> IR to be optimized to remove the insertion of the 0.0.
>
Interesting. The other, and probably more important, thing I was seeing is
code like:
int a() { return f(1.1) + g(2.2); }
After inlining, constant folding turns 'g(2.2)' into 2, but we're still
left with an intrinsic call with a constant argument of 1.1:
define i32 @_Z1av() nounwind readnone {
entry:
  %0 = tail call i32 @llvm.x86.sse2.cvtsd2si(<2 x double> <double 1.100000e+00, double 0.000000e+00>) nounwind
  %add = add nsw i32 %0, 2
  ret i32 %add
}
Perhaps the right way to solve this is along the same lines, though: teach
a pass to fold constant arguments to that intrinsic. I don't know how long
the list of such transformations will end up being. If constant propagation
is enough, maybe this is the best way to go.