[PATCH] ARM NEON Lowering: Merge extractelt, bitcast, sitofp sequence
Renato Golin
renato.golin at linaro.org
Thu Feb 14 16:04:04 PST 2013
Hi Arnold,
Nice catch! I'm no expert in TableGen, especially the patterns, so I'll
wait for others to comment.
Typo in:
+// Fix scalarized uitof/sitof 2xf64 to no use intermediate scalar registers.
-> "to NOT use"
cheers,
--renato
On 14 February 2013 23:47, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> A vectorized sitofp to doubles will get scalarized to a sequence of an
> extract_element of <2 x i32>, a bitcast to f32 and a sitofp.
> Due to the extract_element and the bitcast, we unnecessarily generate
> moves between the ARM core registers and the VFP/NEON registers.
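A minimal C model of one scalarized lane may make the point concrete. It is illustrative only, not code from the patch; the helper names are hypothetical and it assumes a 32-bit float. Because the bitcast only reinterprets bits, the integer reaches the conversion unchanged, so the round trip through the core registers adds moves without changing the value.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative model of one scalarized lane of
   `sitofp <2 x i32> %v to <2 x double>`. Not code from the patch;
   all names here are hypothetical. */

_Static_assert(sizeof(float) == sizeof(int32_t), "assumes a 32-bit float");

/* extract_element <2 x i32> %v, lane */
static int32_t extract_lane(const int32_t v[2], unsigned lane) {
    return v[lane];
}

/* bitcast i32 -> f32: the 32 bits are unchanged; only the type changes
   (in the backend, the register file that holds the value). */
static float bitcast_i32_to_f32(int32_t x) {
    float f;
    memcpy(&f, &x, sizeof f);
    return f;
}

/* The conversion reads the S register's 32 bits as a signed integer,
   so reinterpret them back before converting to double. */
static double sitofp_from_sreg(float sreg) {
    int32_t x;
    memcpy(&x, &sreg, sizeof x);
    return (double)x;
}

/* extract_element -> bitcast -> sitofp, as in the scalarized DAG. */
double scalarized_lane(const int32_t v[2], unsigned lane) {
    return sitofp_from_sreg(bitcast_i32_to_f32(extract_lane(v, lane)));
}
```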
>
> The patch fixes this by using COPY_TO_REGCLASS and EXTRACT_SUBREG instead.
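The fix relies on register aliasing: on ARMv7 VFP/NEON, D0-D15 overlap S0-S31, with D<n> made of S<2n> and S<2n+1>, so a 32-bit lane of a D register can be read directly as an S register. The sketch below models that mapping with a hypothetical helper, assuming a little-endian target; it is not how the backend encodes it (there, EXTRACT_SUBREG with single-precision sub-register indices presumably does the job). D16-D31 have no S aliases, which is presumably why the value is first constrained with COPY_TO_REGCLASS to a class of D registers that do.

```c
/* Hypothetical helper, not from the patch or the LLVM backend.
   On ARMv7 VFP/NEON only D0-D15 have single-precision aliases:
   D<n> = {S<2n>, S<2n+1>}. On a little-endian target, lane 0 of a
   <2 x i32> held in D<n> lives in S<2n> and lane 1 in S<2n+1>.
   Returns that S register number, or -1 if there is none. */
int s_subreg_for_lane(unsigned d_reg, unsigned lane) {
    if (d_reg >= 16 || lane > 1)
        return -1;  /* D16-D31 have no S aliases; a D register has 2 lanes */
    return (int)(2 * d_reg + lane);
}
```

For a vector loaded into d0 this gives s0 for lane 0 and s1 for lane 1, which are the operands the new vcvt.f64.s32 instructions read.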
>
> Example:
>
> define void @vsitofp_double(<2 x i32>* %loadaddr,
>                             <2 x double>* %storeaddr) {
>   %v0 = load <2 x i32>* %loadaddr
>   %r = sitofp <2 x i32> %v0 to <2 x double>
>   store <2 x double> %r, <2 x double>* %storeaddr
>   ret void
> }
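For reference, a C function with the same semantics as @vsitofp_double: load two 32-bit signed integers, convert each to double, store both. This is a sketch under an assumption; the patch only shows the IR, and whether a particular front end and vectorizer produce exactly that IR from this source is not claimed.

```c
#include <stdint.h>

/* Same semantics as the IR function @vsitofp_double (illustrative). */
void vsitofp_double(const int32_t *loadaddr, double *storeaddr) {
    for (int i = 0; i < 2; ++i)
        storeaddr[i] = (double)loadaddr[i];  /* sitofp i32 -> double */
}
```

Every int32_t value is exactly representable as a double (53-bit significand), so each lane's conversion is exact and no rounding is involved.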
>
> We used to generate:
> vldr d16, [r0]
> vmov.32 r2, d16[1]
> vmov.32 r0, d16[0]
> vmov s0, r2
> vmov s2, r0
> vcvt.f64.s32 d17, s0
> vcvt.f64.s32 d16, s2
> vst1.32 {d16, d17}, [r1]
>
> Now we generate:
> vldr d0, [r0]
> vcvt.f64.s32 d17, s1
> vcvt.f64.s32 d16, s0
> vst1.32 {d16, d17}, [r1]
More information about the llvm-commits mailing list