[PATCH] ARM NEON Lowering: Merge extractelt, bitcast, sitofp sequence

Fri Feb 15 09:29:31 PST 2013

Yes, absolutely. I remember thinking this and then ended up forgetting to do it ;).
Thanks for catching it!
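
For reference, a sketch of how the test could look with that label check
added. The RUN line and the individual CHECK lines are assumptions based on
the output quoted below, not copied from the patch, and the IR uses the same
typed-pointer syntax as the example in this thread:

```llvm
; Assumed invocation; the actual test may use a different triple or flags.
; RUN: llc < %s -march=arm -mattr=+neon | FileCheck %s

; Anchor the following checks to this function's part of the output.
; CHECK: vsitofp_double:
define void @vsitofp_double(<2 x i32>* %loadaddr,
                            <2 x double>* %storeaddr) {
  %v0 = load <2 x i32>* %loadaddr
  %r = sitofp <2 x i32> %v0 to <2 x double>
  store <2 x double> %r, <2 x double>* %storeaddr
  ret void
; The two conversions are expected to follow the load directly, with no
; vmov between core and VFP/NEON registers in between.
; CHECK: vldr
; CHECK-NEXT: vcvt.f64.s32
; CHECK-NEXT: vcvt.f64.s32
; CHECK-NEXT: vst1
}
```

Without the label check, the vldr/vcvt checks could match output from an
earlier function in the same test file.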

On Feb 15, 2013, at 11:26 AM, David Peixotto <dpeixott at codeaurora.org> wrote:

> LGTM. I would just modify the test to add a CHECK for the function symbol
> (e.g. CHECK: vsitofp_double:) to ensure that the following checks are
> falling in the correct part of the output. 
> 
> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
> by The Linux Foundation
> 
> 
>> -----Original Message-----
>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-
>> bounces at cs.uiuc.edu] On Behalf Of Arnold
>> Sent: Friday, February 15, 2013 5:07 AM
>> To: Anton Korobeynikov
>> Cc: Commit Messages and Patches for LLVM; Jakob Olesen
>> Subject: Re: [PATCH] ARM NEON Lowering: Merge extractelt, bitcast, sitofp
>> sequence
>> 
>> No, those are not generic copies (i.e. COPY, INSERT_SUBREG, etc.) but
>> opaque machine instructions by the time they reach coalescing/allocation,
>> which is why the coalescer/allocator cannot reason about them.
>> 
>> What I am doing in this patch is to use instructions that the
>> coalescer/allocator can understand.
>> 
>> Sent from my iPhone
>> 
>> On Feb 15, 2013, at 2:42 AM, Anton Korobeynikov
>> <anton at korobeynikov.info> wrote:
>> 
>>> Arnold,
>>> 
>>> The patterns look OK to me, but the problem itself looks like a
>>> regalloc deficiency. I'd expect these no-op copies to be coalesced
>>> out.
>>> 
>>> Maybe Jakob knows more :)
>>> 
>>> On Fri, Feb 15, 2013 at 3:47 AM, Arnold Schwaighofer
>>> <aschwaighofer at apple.com> wrote:
>>>> A vectorized sitofp on doubles will get scalarized to a sequence of an
>>>> extract_element of <2 x i32>, a bitcast to f32 and a sitofp.
>>>> Due to the extract_element and the bitcast, we unnecessarily
>>>> generate moves between scalar and vector registers.
>>>> 
>>>> The patch fixes this by using COPY_TO_REGCLASS and EXTRACT_SUBREG
>>>> instead.
>>>> 
>>>> Example:
>>>> 
>>>> define void @vsitofp_double(<2 x i32>* %loadaddr,
>>>>                             <2 x double>* %storeaddr) {
>>>>   %v0 = load <2 x i32>* %loadaddr
>>>>   %r = sitofp <2 x i32> %v0 to <2 x double>
>>>>   store <2 x double> %r, <2 x double>* %storeaddr
>>>>   ret void
>>>> }
>>>> 
>>>> We used to generate:
>>>>       vldr    d16, [r0]
>>>>       vmov.32 r2, d16[1]
>>>>       vmov.32 r0, d16[0]
>>>>       vmov    s0, r2
>>>>       vmov    s2, r0
>>>>       vcvt.f64.s32    d17, s0
>>>>       vcvt.f64.s32    d16, s2
>>>>       vst1.32 {d16, d17}, [r1]
>>>> Now we generate:
>>>>       vldr    d0, [r0]
>>>>       vcvt.f64.s32    d17, s1
>>>>       vcvt.f64.s32    d16, s0
>>>>       vst1.32 {d16, d17}, [r1]
>>>> 
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>> 
>>> 
>>> 
>>> --
>>> With best regards, Anton Korobeynikov
>>> Faculty of Mathematics and Mechanics, Saint Petersburg State
>>> University