[PATCH] ARM NEON Lowering: Merge extractelt, bitcast, sitofp sequence
Arnold Schwaighofer
aschwaighofer at apple.com
Thu Feb 14 15:47:41 PST 2013
A vectorized sitofp producing doubles gets scalarized into a sequence of an
extract_element of <2 x i32>, a bitcast to f32, and a sitofp.
Because of the extract_element and the bitcast, we unnecessarily generate
moves between scalar and vector registers.
The patch fixes this by using COPY_TO_REGCLASS and EXTRACT_SUBREG instead.
Example:
define void @vsitofp_double(<2 x i32>* %loadaddr,
                            <2 x double>* %storeaddr) {
  %v0 = load <2 x i32>* %loadaddr
  %r = sitofp <2 x i32> %v0 to <2 x double>
  store <2 x double> %r, <2 x double>* %storeaddr
  ret void
}
We used to generate:
  vldr d16, [r0]
  vmov.32 r2, d16[1]
  vmov.32 r0, d16[0]
  vmov s0, r2
  vmov s2, r0
  vcvt.f64.s32 d17, s0
  vcvt.f64.s32 d16, s2
  vst1.32 {d16, d17}, [r1]
Now we generate:
  vldr d0, [r0]
  vcvt.f64.s32 d17, s1
  vcvt.f64.s32 d16, s0
  vst1.32 {d16, d17}, [r1]
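
The new sequence needs no vmov because s0 and s1 are the low and high 32-bit
halves of d0, so both lanes are already addressable as S registers once the
value sits in d0. (Only d0 through d15 have S-register aliases, which is why
the load now targets d0 rather than d16.) A minimal C model of that aliasing
is sketched below. The union type and function name are hypothetical and not
part of the patch; the model only illustrates the semantics, not the actual
instruction selection code.

```c
#include <stdint.h>

/* Hypothetical model of one 64-bit D register (e.g. d0).
 * lane[0] and lane[1] stand in for the overlapping S registers (s0, s1);
 * which lane is the "low" half of bits depends on host endianness. */
typedef union {
    uint64_t bits;
    int32_t  lane[2];
} dreg_model;

/* Models the two vcvt.f64.s32 instructions: each one reads a 32-bit lane
 * in place and converts it to double, with no copy through a core register. */
void sitofp_v2i32_to_v2f64(const dreg_model *src, double out[2]) {
    out[0] = (double)src->lane[0];
    out[1] = (double)src->lane[1];
}
```

In this model the "extract" of a lane is only a different view of the same
storage, which corresponds to what EXTRACT_SUBREG expresses at the register
level.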
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-ARM-Lowering-Merge-extractelt-bitcast-sitofp-sequenc.patch
Type: application/octet-stream
Size: 3270 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130214/e0f0da6d/attachment.obj>