[PATCH] ARM NEON Lowering: Merge extractelt, bitcast, sitofp sequence
David Peixotto
dpeixott at codeaurora.org
Fri Feb 15 09:26:12 PST 2013
LGTM. I would just modify the test to add a CHECK for the function symbol
(e.g. CHECK: vsitofp_double:) to ensure that the following checks match in
the correct part of the output.
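
Something along these lines (the RUN line and the checks other than the
symbol check are just for illustration, based on the codegen quoted below,
not the actual test from the patch):

  ; RUN: llc < %s -march=arm -mattr=+neon | FileCheck %s

  ; CHECK: vsitofp_double:
  ; CHECK: vldr
  ; CHECK-NOT: vmov.32
  ; CHECK: vcvt.f64.s32
  ; CHECK: vcvt.f64.s32
  ; CHECK: vst1.32
  define void @vsitofp_double(<2 x i32>* %loadaddr,
                              <2 x double>* %storeaddr) {
    %v0 = load <2 x i32>* %loadaddr
    %r = sitofp <2 x i32> %v0 to <2 x double>
    store <2 x double> %r, <2 x double>* %storeaddr
    ret void
  }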
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu
> [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Arnold
> Sent: Friday, February 15, 2013 5:07 AM
> To: Anton Korobeynikov
> Cc: Commit Messages and Patches for LLVM; Jakob Olesen
> Subject: Re: [PATCH] ARM NEON Lowering: Merge extractelt, bitcast, sitofp
> sequence
>
> No, those are not generic copies (i.e. COPY, INSERT_SUBREG, etc.) but
> opaque machine instructions by the time they get to coalescing/allocation;
> that is why the coalescer/allocator cannot reason about them.
>
> What I am doing in this patch is to use instructions that the
> coalescer/allocator can understand.
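>
> Schematically (hand-written pseudo machine code for illustration, not
> actual compiler dumps; predicate operands etc. are omitted):
>
>   ; Before: target instructions the coalescer treats as opaque
>   %r2 = VGETLNi32 %d16, 1      ; vmov.32 r2, d16[1]
>   %s0 = VMOVSR %r2             ; vmov s0, r2
>
>   ; After: a generic COPY of an S sub-register, which the coalescer
>   ; understands and can fold away entirely by assigning the value to
>   ; a D register whose S sub-registers are directly addressable
>   %s1 = COPY %d0.ssub_1
>   %d17 = VSITOD %s1            ; vcvt.f64.s32 d17, s1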
>
> Sent from my iPhone
>
> On Feb 15, 2013, at 2:42 AM, Anton Korobeynikov
> <anton at korobeynikov.info> wrote:
>
> > Arnold,
> >
> > The patterns look OK to me, but the problem itself looks like some
> > regalloc deficiency. I'd expect these no-op copies to be coalesced
> > out.
> >
> > Maybe Jakob knows more :)
> >
> > On Fri, Feb 15, 2013 at 3:47 AM, Arnold Schwaighofer
> > <aschwaighofer at apple.com> wrote:
> >> A vectorized sitofp on doubles will get scalarized into a sequence of
> >> an extract_element of <2 x i32>, a bitcast to f32, and a sitofp.
> >> Due to the extract_element and the bitcast we will unnecessarily
> >> generate moves between scalar and vector registers.
> >>
> >> The patch fixes this by using COPY_TO_REGCLASS and EXTRACT_SUBREG
> >> instead.
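> >>
> >> Roughly, the new pattern has this shape (sketched from memory; names
> >> like SSubReg_f32_reg and DPR_VFP2 are existing ARM backend definitions,
> >> but see the patch for the exact form):
> >>
> >>   // Match extract_element + bitcast into a direct S sub-register
> >>   // extraction. COPY_TO_REGCLASS constrains the D register to
> >>   // DPR_VFP2 (d0-d15) so that S sub-registers exist.
> >>   def : Pat<(f32 (bitconvert (i32 (extractelt (v2i32 DPR:$src),
> >>                                               imm:$lane)))),
> >>             (EXTRACT_SUBREG
> >>                (COPY_TO_REGCLASS (v2i32 DPR:$src), DPR_VFP2),
> >>                (SSubReg_f32_reg imm:$lane))>;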
> >>
> >> Example:
> >>
> >> define void @vsitofp_double(<2 x i32>* %loadaddr,
> >>                             <2 x double>* %storeaddr) {
> >>   %v0 = load <2 x i32>* %loadaddr
> >>   %r = sitofp <2 x i32> %v0 to <2 x double>
> >>   store <2 x double> %r, <2 x double>* %storeaddr
> >>   ret void
> >> }
> >>
> >> We used to generate:
> >>   vldr    d16, [r0]
> >>   vmov.32 r2, d16[1]
> >>   vmov.32 r0, d16[0]
> >>   vmov    s0, r2
> >>   vmov    s2, r0
> >>   vcvt.f64.s32  d17, s0
> >>   vcvt.f64.s32  d16, s2
> >>   vst1.32 {d16, d17}, [r1]
> >>
> >> Now we generate:
> >>   vldr    d0, [r0]
> >>   vcvt.f64.s32  d17, s1
> >>   vcvt.f64.s32  d16, s0
> >>   vst1.32 {d16, d17}, [r1]
> >>
> >
> >
> >
> > --
> > With best regards, Anton Korobeynikov
> > Faculty of Mathematics and Mechanics, Saint Petersburg State
> > University