[PATCH] ARM NEON Lowering: Merge extractelt, bitcast, sitofp sequence

Arnold Schwaighofer aschwaighofer at apple.com
Thu Feb 14 15:47:41 PST 2013


A vectorized sitofp on doubles will get scalarized to a sequence of an
extract_element of <2 x i32>, a bitcast to f32, and a sitofp.
Due to the extract_element and the bitcast, we will unnecessarily generate
moves between scalar and vector registers.

The patch fixes this by using COPY_TO_REGCLASS and EXTRACT_SUBREG instead.

Example:

define void @vsitofp_double(<2 x i32>* %loadaddr,
                            <2 x double>* %storeaddr) {
  %v0 = load <2 x i32>* %loadaddr
  %r = sitofp <2 x i32> %v0 to <2 x double>
  store <2 x double> %r, <2 x double>* %storeaddr
  ret void
}

We used to generate:
        vldr    d16, [r0]
        vmov.32 r2, d16[1]
        vmov.32 r0, d16[0]
        vmov    s0, r2
        vmov    s2, r0
        vcvt.f64.s32    d17, s0
        vcvt.f64.s32    d16, s2
        vst1.32 {d16, d17}, [r1]
Now we generate:
        vldr    d0, [r0]
        vcvt.f64.s32    d17, s1
        vcvt.f64.s32    d16, s0
        vst1.32 {d16, d17}, [r1]
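
For reference, the IR example above corresponds roughly to the C function
below. This is only an illustrative sketch and not part of the patch: the C
signature is an assumption that mirrors the IR, and whether a given compiler
turns it into a sitofp on <2 x i32> depends on its version and flags. It is
mainly useful for checking that both code sequences compute the same values.

```c
#include <stdint.h>

/* Hypothetical C counterpart of @vsitofp_double: convert two signed
 * 32-bit lanes to double, one lane at a time. */
void vsitofp_double(const int32_t loadaddr[2], double storeaddr[2]) {
    for (int i = 0; i < 2; ++i) {
        /* One iteration corresponds to one extracted lane followed by a
         * signed int-to-double conversion in the scalarized form. */
        storeaddr[i] = (double)loadaddr[i];
    }
}
```

Each lane is converted independently, which is why the two vcvt.f64.s32
instructions can read s0 and s1 directly once the loaded value lives in d0.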



-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-ARM-Lowering-Merge-extractelt-bitcast-sitofp-sequenc.patch
Type: application/octet-stream
Size: 3270 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130214/e0f0da6d/attachment.obj>

