[PATCH] ARM NEON Lowering: Merge extractelt, bitcast, sitofp sequence
Arnold Schwaighofer
aschwaighofer at apple.com
Fri Feb 15 09:29:31 PST 2013
Yes, absolutely. I remember thinking this and then ended up forgetting to do it ;).
Thanks for catching it!
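
A minimal sketch of how the test might look with the function-label check
suggested below. The RUN line is an assumption made for illustration (the
patch's actual llc flags are not shown in this thread); the IR uses the
typed-pointer syntax from the original example, and the CHECK lines simply
mirror the "Now we generate" output quoted further down.

```llvm
; The llc invocation below is assumed, not taken from the patch.
; RUN: llc < %s -march=arm -mattr=+neon | FileCheck %s

define void @vsitofp_double(<2 x i32>* %loadaddr,
                            <2 x double>* %storeaddr) {
; Match the function label first so the checks that follow cannot
; accidentally match output belonging to another function in the file.
; CHECK: vsitofp_double:
; CHECK: vldr
; CHECK-NEXT: vcvt.f64.s32
; CHECK-NEXT: vcvt.f64.s32
; CHECK-NEXT: vst
  %v0 = load <2 x i32>* %loadaddr
  %r = sitofp <2 x i32> %v0 to <2 x double>
  store <2 x double> %r, <2 x double>* %storeaddr
  ret void
}
```

Using CHECK-NEXT for the conversions means the test would fail if a vmov
between scalar and vector registers reappeared between the load and the
store.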
On Feb 15, 2013, at 11:26 AM, David Peixotto <dpeixott at codeaurora.org> wrote:
> LGTM. I would just modify the test to add a CHECK for the function symbol
> (e.g. CHECK: vsitofp_double:) to ensure that the following checks are
> falling in the correct part of the output.
>
> -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
> by The Linux Foundation
>
>
>> -----Original Message-----
>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-
>> bounces at cs.uiuc.edu] On Behalf Of Arnold
>> Sent: Friday, February 15, 2013 5:07 AM
>> To: Anton Korobeynikov
>> Cc: Commit Messages and Patches for LLVM; Jakob Olesen
>> Subject: Re: [PATCH] ARM NEON Lowering: Merge extractelt, bitcast, sitofp
>> sequence
>>
>> No, those are not generic copies, i.e. COPY, INSERT_SUBREG, etc. (they are
>> opaque MachineInstrs by the time they get to coalescing/allocation); that
>> is why the coalescer/allocator cannot reason about them.
>>
>> What I am doing in this patch is to use instructions that the
>> coalescer/allocator can understand.
>>
>> Sent from my iPhone
>>
>> On Feb 15, 2013, at 2:42 AM, Anton Korobeynikov
>> <anton at korobeynikov.info> wrote:
>>
>>> Arnold,
>>>
>>> The patterns look OK to me, but the problem itself looks like a
>>> regalloc deficiency. I'd expect these no-op copies to be coalesced
>>> out.
>>>
>>> Maybe Jakob knows more :)
>>>
>>> On Fri, Feb 15, 2013 at 3:47 AM, Arnold Schwaighofer
>>> <aschwaighofer at apple.com> wrote:
>>>> A vectorized sitofp on doubles will get scalarized to a sequence of an
>>>> extract_element of <2 x i32>, a bitcast to f32 and a sitofp.
>>>> Due to the extract_element and the bitcast, we will unnecessarily
>>>> generate moves between scalar and vector registers.
>>>>
>>>> The patch fixes this by using COPY_TO_REGCLASS and EXTRACT_SUBREG
>>>> instead.
>>>>
>>>> Example:
>>>>
>>>> define void @vsitofp_double(<2 x i32>* %loadaddr,
>>>>                             <2 x double>* %storeaddr) {
>>>>   %v0 = load <2 x i32>* %loadaddr
>>>>   %r = sitofp <2 x i32> %v0 to <2 x double>
>>>>   store <2 x double> %r, <2 x double>* %storeaddr
>>>>   ret void
>>>> }
>>>>
>>>> We used to generate:
>>>> vldr d16, [r0]
>>>> vmov.32 r2, d16[1]
>>>> vmov.32 r0, d16[0]
>>>> vmov s0, r2
>>>> vmov s2, r0
>>>> vcvt.f64.s32 d17, s0
>>>> vcvt.f64.s32 d16, s2
>>>> vst1.32 {d16, d17}, [r1]
>>>> Now we generate:
>>>> vldr d0, [r0]
>>>> vcvt.f64.s32 d17, s1
>>>> vcvt.f64.s32 d16, s0
>>>> vst1.32 {d16, d17}, [r1]
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>>
>>>
>>> --
>>> With best regards, Anton Korobeynikov
>>> Faculty of Mathematics and Mechanics, Saint Petersburg State
>>> University
>