<div class="gmail_quote">On 17 August 2010 18:49, Bob Wilson <span dir="ltr">&lt;<a href="mailto:bob.wilson@apple.com" target="_blank">bob.wilson@apple.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
That's pretty clever, but I've got a simpler patch that produces better code. Committed as svn 111341.<br></blockquote><div><br></div><div>Hah. Yeah, that is a bit simpler, isn't it? Thank you. If you don't mind, I'm going to let you do the same thing (hopefully exactly the same) for sign extend and any extend.</div>
<div><br></div><div>To hijack my own thread, I'm also trying to add a combine for (add (mul (zext a), (zext b)), c) --> (vmlal a, b, c). There's an intrinsic for vmlal and I tried producing that, but ended up triggering "Could not select: intrinsic %llvm.arm.neon.vmlalu" even though it handles that intrinsic coming from IR just fine.<br>
<br>Would you mind taking a look at the attached patch and telling me if there's a much simpler way to do this too? :) I don't want to manually list out which types vmlal is valid for, but I didn't find an API which would lower a single intrinsic for me or else return SDValue(), for example.<br>
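<br>For reference, this is the per-lane arithmetic the combine has to preserve, written as a scalar C++ sketch rather than as backend code. It assumes four unsigned 16-bit lanes widened to 32 bits (the vmlal.u16 case); the type and function names are hypothetical and are not LLVM APIs.<br>

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical lane types: four 16-bit inputs, four 32-bit accumulators.
using V4i16 = std::array<std::uint16_t, 4>;
using V4i32 = std::array<std::uint32_t, 4>;

// Unfused form: (add (mul (zext a), (zext b)), c), lane by lane.
V4i32 zext_mul_add(const V4i16 &a, const V4i16 &b, const V4i32 &c) {
  V4i32 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    std::uint32_t wa = a[i];  // zext i16 -> i32
    std::uint32_t wb = b[i];  // zext i16 -> i32
    r[i] = c[i] + wa * wb;    // 32-bit add, wraps modulo 2^32
  }
  return r;
}

// Fused form: a widening multiply-accumulate into the 32-bit lanes.
V4i32 widening_mla(const V4i16 &a, const V4i16 &b, V4i32 acc) {
  for (std::size_t i = 0; i < 4; ++i)
    acc[i] += static_cast<std::uint32_t>(a[i]) *
              static_cast<std::uint32_t>(b[i]);
  return acc;
}

// Self-check helper: both forms must agree on every lane.
void check_equivalence(const V4i16 &a, const V4i16 &b, const V4i32 &c) {
  assert(zext_mul_add(a, b, c) == widening_mla(a, b, c));
}
```

The operands are widened before the multiply on purpose: multiplying two uint16_t values directly would promote them to int and could overflow. The sketch covers only the unsigned case and says nothing about which vector types the instruction accepts.<br>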
<br>Thanks!<br><br>Nick<br></div><div><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
_test1: @ @test1<br>
Leh_func_begin0:<br>
@ BB#0:<br>
vmov d0, r0, r1<br>
vmov.u16 r0, d0[0]<br>
mov r1, #255<br>
orr r1, r1, #255, 24 @ 65280<br>
vmov.u16 r2, d0[1]<br>
vmov.u16 r3, d0[2]<br>
vmov.u16 r12, d0[3]<br>
and r0, r0, r1<br>
and r2, r2, r1<br>
and r3, r3, r1<br>
and r1, r12, r1<br>
vmov s3, r1<br>
vmov s2, r3<br>
vmov s1, r2<br>
vmov s0, r0<br>
vmov r0, r1, d0<br>
vmov r2, r3, d1<br>
mov pc, lr<br>
Leh_func_end0:<br>
<br>
It's still pretty awful. The ANDs to mask off the high bits are unnecessary, and all those extra VMOVs should be avoided.<br>
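<br>
A small scalar C++ sketch of why the ANDs are redundant, assuming vmov.u16 zero-extends the 16-bit lane into the 32-bit core register and that r1 holds 0xFFFF (255 | 65280) as in the listing above. The type and function names are hypothetical.<br>

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of one 64-bit D register viewed as four 16-bit lanes.
using D16x4 = std::array<std::uint16_t, 4>;

// Models "vmov.u16 rN, dM[lane]": the lane is zero-extended to 32 bits.
std::uint32_t extract_lane_u16(const D16x4 &d, std::size_t lane) {
  return static_cast<std::uint32_t>(d[lane]);
}

// Models the following "and rN, rN, r1" with r1 = 0xFFFF.
std::uint32_t mask_low16(std::uint32_t x) { return x & 0xFFFFu; }

// Self-check helper: masking an already zero-extended lane changes nothing.
void check_mask_is_noop(const D16x4 &d) {
  for (std::size_t lane = 0; lane < 4; ++lane)
    assert(mask_low16(extract_lane_u16(d, lane)) == extract_lane_u16(d, lane));
}
```

Since the upper 16 bits are already zero after the extract, the mask leaves every lane unchanged.<br>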
<br>
We probably need to expand SIGN_EXTEND and ANY_EXTEND as well. I can look at that later, since I'm off to dinner now.<br>
<div><div></div><div><br>
On Aug 17, 2010, at 3:58 PM, Nick Lewycky wrote:<br>
<br>
> This patch implements custom lowering for ZEXT to N x i32 vector types. Currently llvm just crashes.<br>
><br>
> I'm not very qualified either in the backend or with ARM assembly. Please review carefully!<br>
><br>
> Since you're probably wondering, the code it produces is lengthy:<br>
><br>
> test1: @ @test1<br>
> @ BB#0:<br>
> str r11, [sp, #-4]!<br>
> mov r11, sp<br>
> sub sp, sp, #28<br>
> bic sp, sp, #15<br>
> vmov.i32 d0, #0x0<br>
> vmov.u16 r2, d0[0]<br>
> vmov d0, r0, r1<br>
> strh r2, [sp, #14]<br>
> strh r2, [sp, #10]<br>
> strh r2, [sp, #6]<br>
> strh r2, [sp, #2]<br>
> vmov.u16 r0, d0[3]<br>
> vmov.u16 r2, d0[2]<br>
> vmov.u16 r1, d0[1]<br>
> strh r0, [sp, #12]<br>
> strh r2, [sp, #8]<br>
> vmov.u16 r2, d0[0]<br>
> strh r1, [sp, #4]<br>
> mov r1, sp<br>
> strh r2, [sp]<br>
> vldmia r1, {d0, d1}<br>
> vmov r0, r1, d0<br>
> vmov r2, r3, d1<br>
> mov sp, r11<br>
> ldr r11, [sp], #4<br>
> mov pc, lr<br>
><br>
> Nick<br>
><br>
</div></div>&gt; &lt;neon-zext.patch&gt;<br>
<br>
</blockquote></div><br>