[llvm-commits] [llvm] r123135 - /llvm/trunk/lib/Target/README.txt

Sun Jan 9 17:12:40 PST 2011

On Sun, Jan 9, 2011 at 4:39 PM, Chris Lattner <clattner at apple.com> wrote:

> Chandler, I don't see what the issue is here.  While it "would be nice" to
> have generic rounding mode support in the IR, there is no problem with
> having an intrinsic here.  llvm.x86.sse2.cvtsd2si is a readnone function, so
> it should be optimized just about as well as fptosi.  What specifically are
> we missing?
>
> If you're concerned about the extraneous mov + xor in:
> +        xorps   %xmm1, %xmm1
> +        movsd   %xmm0, %xmm1
> +        cvtsd2sil       %xmm1, %eax
>
> Then the right fix is to teach SimplifyDemandedVectorElts that
> llvm.x86.sse2.cvtsd2si does not demand a top element.  This will allow the
> IR to be optimized to remove the insertion of the 0.0.
>

Interesting. The other, and probably more important, thing I was seeing is
code like:

int a() { return f(1.1) + g(2.2); }

After inlining, constant folding turns 'g(2.2)' into 2, but we're still
left with an intrinsic call with a constant argument of 1.1:

define i32 @_Z1av() nounwind readnone {
entry:
  %0 = tail call i32 @llvm.x86.sse2.cvtsd2si(<2 x double> <double 1.100000e+00, double 0.000000e+00>) nounwind
  %add = add nsw i32 %0, 2
  ret i32 %add
}

Perhaps the right way to solve this is along the same lines, though: teach
a pass to fold constant arguments to that intrinsic. I don't know how long
the list of such transformations will be. If constant propagation is
enough, maybe this is the best way to go.
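
As a rough illustration of what such a fold would have to compute, here is a minimal standalone C++ sketch. It is not an LLVM API: `foldCvtsd2si` is a hypothetical helper, and it assumes the default rounding mode (round to nearest, ties to even), which a real fold would also have to assume. NaN and out-of-range inputs map to the "integer indefinite" value 0x80000000, as the instruction does.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper: the i32 that cvtsd2si would produce for a constant
// low lane, assuming the default round-to-nearest-even mode.
std::int32_t foldCvtsd2si(double low) {
  // "Integer indefinite" result used for NaN and out-of-range inputs.
  const std::int32_t indefinite = std::numeric_limits<std::int32_t>::min();
  if (std::isnan(low))
    return indefinite;

  // The model is only valid under the default rounding mode.
  assert(std::fegetround() == FE_TONEAREST);

  // nearbyint rounds using the current rounding mode.
  const double rounded = std::nearbyint(low);
  if (rounded < -2147483648.0 || rounded > 2147483647.0)
    return indefinite;
  return static_cast<std::int32_t>(rounded);
}
```

With this model, `foldCvtsd2si(1.1)` evaluates to 1, so the `add nsw i32 %0, 2` in the IR above would fold to the constant 3.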