<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Dec 4, 2014 at 7:28 PM, Simon Pilgrim <span dir="ltr">&lt;<a href="mailto:llvm-dev@redking.me.uk" target="_blank">llvm-dev@redking.me.uk</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">chandlerc wrote:<br>

> I think an even better pattern is: movq, pshufd 0,2,2,2?<br>

><br>

> Also, do we correctly match to movd when the source is a foldable load? I can't remember if there is a test case for that, but it's really important to not do a shuffle when just loading a single i32 from memory into an xmm register.<br>

</span>Yup - that'd be a nicer pattern (single register!) - easy enough to change.<br></blockquote><div><br></div><div>Looking at this today, I feel like I must be missing something... or I must have really been missing something earlier.</div><div><br></div><div>Why don't we lower this as pand with a constant mask? The load isn't going to cost more in any real-world cases, right?</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

There is an existing movd folded load pattern using VMOVDI2PDIrm - I haven't seen any tests for it but it does seem to work alright.</blockquote></div><br>It'd be really nice to add tests for that.</div></div>
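<div><br></div><div>To make the comparison concrete, here is a rough scalar model of the two candidate lowerings. It assumes the shuffle under discussion keeps lane 0 of a 4 x i32 vector and zeroes lanes 1-3; that reading, the V4 type, and every function name below are illustrative and not LLVM code. Under that assumption, movq followed by pshufd 0,2,2,2 and a single pand with a constant mask give the same result.</div>

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical model: one xmm register viewed as four 32-bit lanes.
using V4 = std::array<std::uint32_t, 4>;

// Models movq xmm, xmm: keep the low 64 bits (lanes 0-1), zero lanes 2-3.
V4 model_movq(const V4 &v) { return {v[0], v[1], 0, 0}; }

// Models pshufd: result lane i takes source lane idx[i].
V4 model_pshufd(const V4 &v, const std::array<int, 4> &idx) {
  return {v[idx[0]], v[idx[1]], v[idx[2]], v[idx[3]]};
}

// Models pand: lane-wise bitwise AND.
V4 model_pand(const V4 &a, const V4 &b) {
  return {a[0] & b[0], a[1] & b[1], a[2] & b[2], a[3] & b[3]};
}

// Candidate 1: movq, then pshufd 0,2,2,2 (lanes 1-3 read the zeroed lane 2).
V4 zero_upper_via_shuffle(const V4 &v) {
  return model_pshufd(model_movq(v), {0, 2, 2, 2});
}

// Candidate 2: pand with a constant mask that keeps only lane 0.
V4 zero_upper_via_mask(const V4 &v) {
  const V4 mask = {0xFFFFFFFFu, 0, 0, 0};
  return model_pand(v, mask);
}

// Checks that both candidates agree on one input vector.
void check_equivalence(const V4 &v) {
  assert(zero_upper_via_shuffle(v) == zero_upper_via_mask(v));
}
```

<div>The model only captures lane semantics. It says nothing about the cost of loading the mask constant versus the extra shuffle, which is the open question above.</div>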