[PATCH] ARM NEON Lowering: Merge extractelt, bitcast, sitofp sequence

Fri Feb 15 05:06:39 PST 2013

No, those are not generic copies (i.e. COPY, INSERT_SUBREG, etc.). By the time they get to coalescing/allocation they are opaque MachineInstrs, which is why the coalescer/allocator cannot reason about them.

What I am doing in this patch is to use instructions that the coalescer/allocator can understand.
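
To illustrate why no moves are needed (a rough sketch only, not code from the patch): on ARM the registers s0 and s1 alias the low and high halves of d0, so reading a lane of a <2 x i32> held in d0 only requires naming the right S subregister. The C++ model below is hypothetical; DReg, make_d, lane and sitofp_v2 are names invented for this illustration and do not exist in LLVM.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical model of one 64-bit D register (e.g. d0).
struct DReg {
  std::uint64_t bits;
};

// Pack two i32 lanes into a D register; lane 0 is the low half (s0),
// lane 1 is the high half (s1).
DReg make_d(std::int32_t lo, std::int32_t hi) {
  std::uint64_t l = static_cast<std::uint32_t>(lo);
  std::uint64_t h = static_cast<std::uint32_t>(hi);
  return DReg{l | (h << 32)};
}

// Model of a subregister read: lane i of the D register is the
// corresponding S register. No data leaves the register file.
std::int32_t lane(const DReg &d, unsigned i) {
  std::uint32_t raw = static_cast<std::uint32_t>(d.bits >> (32 * i));
  std::int32_t value;
  std::memcpy(&value, &raw, sizeof(value)); // reinterpret bits as signed
  return value;
}

// Scalarized sitofp <2 x i32> to <2 x double>: one conversion per lane,
// each reading its S subregister directly (like vcvt.f64.s32 dN, sM).
std::array<double, 2> sitofp_v2(const DReg &d) {
  return {static_cast<double>(lane(d, 0)), static_cast<double>(lane(d, 1))};
}
```

In this model, sitofp_v2(make_d(-3, 7)) converts each lane in place, which corresponds to the two vcvt.f64.s32 instructions reading s0 and s1 with no vmov to or from core registers.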

On Feb 15, 2013, at 2:42 AM, Anton Korobeynikov <anton at korobeynikov.info> wrote:

> Arnold,
> 
> The patterns look OK to me, but the problem itself looks like some
> regalloc deficiency. I'd expect these no-op copies to be coalesced
> out.
> 
> Maybe Jakob knows more :)
> 
> On Fri, Feb 15, 2013 at 3:47 AM, Arnold Schwaighofer
> <aschwaighofer at apple.com> wrote:
>> A vectorized sitofp to doubles will get scalarized to a sequence of an
>> extract_element of <2 x i32>, a bitcast to f32 and a sitofp.
>> Due to the extract_element and the bitcast we will unnecessarily generate
>> moves between scalar and vector registers.
>> 
>> The patch fixes this by using COPY_TO_REGCLASS and EXTRACT_SUBREG instead.
>> 
>> Example:
>> 
>> define void @vsitofp_double(<2 x i32>* %loadaddr,
>>                            <2 x double>* %storeaddr) {
>>  %v0 = load <2 x i32>* %loadaddr
>>  %r = sitofp <2 x i32> %v0 to <2 x double>
>>  store <2 x double> %r, <2 x double>* %storeaddr
>>  ret void
>> }
>> 
>> We used to generate:
>>        vldr    d16, [r0]
>>        vmov.32 r2, d16[1]
>>        vmov.32 r0, d16[0]
>>        vmov    s0, r2
>>        vmov    s2, r0
>>        vcvt.f64.s32    d17, s0
>>        vcvt.f64.s32    d16, s2
>>        vst1.32 {d16, d17}, [r1]
>> Now we generate:
>>        vldr    d0, [r0]
>>        vcvt.f64.s32    d17, s1
>>        vcvt.f64.s32    d16, s0
>>        vst1.32 {d16, d17}, [r1]
>> 
> 
> 
> 
> --
> With best regards, Anton Korobeynikov
> Faculty of Mathematics and Mechanics, Saint Petersburg State University