<div dir="ltr">Hi Tim,<div><br></div><div>Thanks for your response. Attached is the .bc file produced by my pass. I can generate assembly with -mcpu=skx but not with -mcpu=core-avx2. Could you please take a look? BTW, I am using LLVM 3.7. </div><div><br></div><div>Best,</div><div>Zhi</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jan 20, 2016 at 1:21 PM, Tim Northover <span dir="ltr">&lt;<a href="mailto:t.p.northover@gmail.com" target="_blank">t.p.northover@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">> Only typo that caught my eye is ‘llvm.masked.gather.v8f64’ which should have v2 instead of v8 to match the &lt;2 x double&gt;<br>

<br>

</span>There's an extra comma after an "i1" too. But they both just result in<br>

LLVM rejecting the code immediately.<br>

<span class=""><br>

> But it still fails if I use -mcpu=core-avx2.<br>

<br>

</span>My simple tests get correctly expanded to scalar loads. I've still not<br>

seen a selection failure.<br>
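<br>
To make "expanded to scalar loads" concrete, here is a rough C++ sketch of what that expansion amounts to for a two-lane gather of doubles. It is only an illustration of the intrinsic's semantics, not the code LLVM generates; the function name masked_gather_v2f64 and the use of std::array are assumptions made for the example.<br>

```cpp
#include <array>
#include <cstddef>

// Scalar expansion of a 2-lane masked gather of doubles.
// Illustrates the semantics of @llvm.masked.gather on <2 x double>:
// enabled lanes load through their own pointer, disabled lanes keep
// the corresponding passthru value.
// NOTE: masked_gather_v2f64 is a hypothetical name used for illustration.
std::array<double, 2> masked_gather_v2f64(
    const std::array<const double*, 2>& ptrs,
    const std::array<bool, 2>& mask,
    const std::array<double, 2>& passthru) {
    std::array<double, 2> result = passthru;
    for (std::size_t lane = 0; lane < result.size(); ++lane) {
        if (mask[lane]) {
            result[lane] = *ptrs[lane];  // one scalar load per enabled lane
        }
    }
    return result;
}
```

A disabled lane never dereferences its pointer, which is why the expansion needs a branch (or select) per lane rather than one wide load.<br>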

<span class=""><br>

> It seems that avx2 supports gather/scatter, but I am not sure why it doesn't work.<br>

<br>

</span>AVX2 supports some gather instructions, but they're more limited than<br>

the AVX-512 variants that @llvm.masked.gather was added for. It looks<br>

like you can get the AVX2 ones using x86-specific intrinsics (look for<br>

@llvm.x86.avx2.gather.d.pd etc. in test/CodeGen/X86).<br>

<br>

It might make sense to use the AVX2 ones for @llvm.masked.gather as<br>

well, but there would be more register shuffling so it might not.<br>

Either way, no-one seems to have done so yet.<br>

<br>

Cheers.<br>

<span class="HOEnZb"><font color="#888888"><br>

Tim.<br>

</font></span></blockquote></div><br></div>