[PATCH] ARM NEON Lowering: Merge extractelt, bitcast, sitofp sequence
David Peixotto
dpeixott at codeaurora.org
Fri Feb 15 09:26:12 PST 2013
LGTM. I would just modify the test to add a CHECK for the function symbol
(e.g. CHECK: vsitofp_double:) to ensure that the following checks match in
the correct part of the output.
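
Something along these lines (the RUN line and the checks other than the
symbol check are just for illustration, based on the codegen quoted below,
not the actual test from the patch):

  ; RUN: llc < %s -march=arm -mattr=+neon | FileCheck %s

  ; CHECK: vsitofp_double:
  ; CHECK: vldr
  ; CHECK-NOT: vmov.32
  ; CHECK: vcvt.f64.s32
  ; CHECK: vcvt.f64.s32
  ; CHECK: vst1.32
  define void @vsitofp_double(<2 x i32>* %loadaddr,
                              <2 x double>* %storeaddr) {
    %v0 = load <2 x i32>* %loadaddr
    %r = sitofp <2 x i32> %v0 to <2 x double>
    store <2 x double> %r, <2 x double>* %storeaddr
    ret void
  }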
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu
> [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Arnold
> Sent: Friday, February 15, 2013 5:07 AM
> To: Anton Korobeynikov
> Cc: Commit Messages and Patches for LLVM; Jakob Olesen
> Subject: Re: [PATCH] ARM NEON Lowering: Merge extractelt, bitcast, sitofp
> sequence
>
> No, those are not generic copies (i.e. COPY, INSERT_SUBREG, etc.) but
> opaque machine instructions by the time they get to coalescing/allocation;
> that is why the coalescer/allocator cannot reason about them.
>
> What I am doing in this patch is to use instructions that the
> coalescer/allocator can understand.
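>
> Schematically (hand-written pseudo machine code for illustration, not
> actual compiler dumps; predicate operands etc. are omitted):
>
>   ; Before: target instructions the coalescer treats as opaque
>   %r2 = VGETLNi32 %d16, 1      ; vmov.32 r2, d16[1]
>   %s0 = VMOVSR %r2             ; vmov s0, r2
>
>   ; After: a generic COPY of an S sub-register, which the coalescer
>   ; understands and can fold away entirely by assigning the value to
>   ; a D register whose S sub-registers are directly addressable
>   %s1 = COPY %d0.ssub_1
>   %d17 = VSITOD %s1            ; vcvt.f64.s32 d17, s1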
>
> Sent from my iPhone
>
> On Feb 15, 2013, at 2:42 AM, Anton Korobeynikov
> <anton at korobeynikov.info> wrote:
>
> > Arnold,
> >
> > The patterns look OK to me, but the problem itself looks like some
> > regalloc deficiency. I'd expect these no-op copies to be coalesced
> > out.
> >
> > Maybe Jakob knows more :)
> >
> > On Fri, Feb 15, 2013 at 3:47 AM, Arnold Schwaighofer
> > <aschwaighofer at apple.com> wrote:
> >> A vectorized sitofp on doubles will get scalarized into a sequence of
> >> an extract_element of <2 x i32>, a bitcast to f32, and a sitofp.
> >> Due to the extract_element and the bitcast we will unnecessarily
> >> generate moves between scalar and vector registers.
> >>
> >> The patch fixes this by using COPY_TO_REGCLASS and EXTRACT_SUBREG
> >> instead.
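> >>
> >> Roughly, the new pattern has this shape (sketched from memory; names
> >> like SSubReg_f32_reg and DPR_VFP2 are existing ARM backend definitions,
> >> but see the patch for the exact form):
> >>
> >>   // Match extract_element + bitcast into a direct S sub-register
> >>   // extraction. COPY_TO_REGCLASS constrains the D register to
> >>   // DPR_VFP2 (d0-d15) so that S sub-registers exist.
> >>   def : Pat<(f32 (bitconvert (i32 (extractelt (v2i32 DPR:$src),
> >>                                               imm:$lane)))),
> >>             (EXTRACT_SUBREG
> >>                (COPY_TO_REGCLASS (v2i32 DPR:$src), DPR_VFP2),
> >>                (SSubReg_f32_reg imm:$lane))>;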
> >>
> >> Example:
> >>
> >> define void @vsitofp_double(<2 x i32>* %loadaddr,
> >>                             <2 x double>* %storeaddr) {
> >>   %v0 = load <2 x i32>* %loadaddr
> >>   %r = sitofp <2 x i32> %v0 to <2 x double>
> >>   store <2 x double> %r, <2 x double>* %storeaddr
> >>   ret void
> >> }
> >>
> >> We used to generate:
> >>   vldr    d16, [r0]
> >>   vmov.32 r2, d16[1]
> >>   vmov.32 r0, d16[0]
> >>   vmov    s0, r2
> >>   vmov    s2, r0
> >>   vcvt.f64.s32  d17, s0
> >>   vcvt.f64.s32  d16, s2
> >>   vst1.32 {d16, d17}, [r1]
> >>
> >> Now we generate:
> >>   vldr    d0, [r0]
> >>   vcvt.f64.s32  d17, s1
> >>   vcvt.f64.s32  d16, s0
> >>   vst1.32 {d16, d17}, [r1]
> >>
> >
> >
> >
> > --
> > With best regards, Anton Korobeynikov
> > Faculty of Mathematics and Mechanics, Saint Petersburg State
> > University