<div dir="ltr">Could you post the patches with Phabricator? That makes it so much easier to review.</div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jan 9, 2015 at 10:27 AM, Ahmed Bougacha <span dir="ltr">&lt;<a href="mailto:ahmed.bougacha@gmail.com" target="_blank">ahmed.bougacha@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi all,<br>

<br>

As the last change in my extload series, here are 3 (WIP) patches to<br>

actually form extloads on vector types.<br>

They used to be disabled, because "None of the supported targets knows<br>

how to perform load and  sign extend on vectors in one instruction."<br>

<br>

The first patch enables the combine on legal vectors, but hides it<br>

behind a profitability callback.<br>

For instance, on ARM, several instructions have folded extload forms,<br>

so it's not always beneficial to create an extload node (and trying to<br>

match extloads is a whole 'nother can of worms).<br>

<br>

The second patch adds a combine that folds extloads of illegal<br>

(splittable) vector types, replacing each one directly with multiple<br>

smaller extloads.  I'm not a big fan of this kind of pseudo-legalization<br>

in combines, but I tried the alternative: form illegal extloads, and<br>

try to split them up later.  The problem is that you sometimes generate<br>

extloads that can't be split up, yet have a valid ext+load expansion.<br>

At vector-op legalization time, it's too late to generate that kind of<br>

expansion, so it's better to just avoid creating egregiously illegal<br>

nodes.<br>

<br>

<br>

Finally, the last patch enables all of this, unconditionally, on X86.<br>

<br>

Note that the splitting combine is happy with "custom" extloads.  As<br>

is, this bypasses the actual custom lowering, and just unrolls the<br>

extload.  But from what I've seen, this is still much better than the<br>

current custom lowering, which does some kind of unrolling at the end<br>

anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added<br>

FIXME).<br>

<br>

Also note that there's a regression in the widen_load-2.ll test, where<br>

we can no longer fold the load. I'll look into that later.<br>

<br>

<br>

Anyway: as can be seen from the nice testcase cleanups, there's<br>

something to be done here.  The combines feel a bit dirty, but I don't<br>

see a better alternative.  I also didn't see any changes on the<br>

testsuite (SSE2 X86-64; I'll try SSE4.1 and AVX2 as well).<br>

<br>

Feedback heartily welcome!<br>

<br>

Thanks,<br>

<br>

- Ahmed<br>


<br></blockquote></div><br></div>