<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Fri, Jul 22, 2016 at 7:06 AM Simon Pilgrim via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Author: rksimon<br>

Date: Fri Jul 22 08:58:44 2016<br>

New Revision: 276416<br>

<br>

URL: <a href="http://llvm.org/viewvc/llvm-project?rev=276416&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?rev=276416&view=rev</a><br>

Log:<br>

[X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128 (reapplied)<br>

<br>

As reported on PR26235, we don't currently make use of the VBROADCASTF128/VBROADCASTI128 instructions (or the AVX512 equivalents) to load+splat a 128-bit vector to both lanes of a 256-bit vector.<br>

<br>

This patch enables lowering from subvector insertion/concatenation patterns and auto-upgrades the llvm.x86.avx.vbroadcastf128.pd.256 / llvm.x86.avx.vbroadcastf128.ps.256 intrinsics to match.<br>

<br>

We could possibly investigate using VBROADCASTF128/VBROADCASTI128 to load repeated constants as well (similar to how we already do for scalar broadcasts).<br>

<br>

Reapplied with fix for PR28657 - removed intrinsic definitions (clang companion patch to be be submitted shortly).<br></blockquote><div><br></div><div>I fixed a bug in the Clang emission in r278202 where it explicitly aligned the operand at 16 bytes, and it turns out the auto upgrade code here:</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">       Value *Load = Builder.CreateLoad(VT, Op);<br></blockquote><div><br></div><div>seems like it might have the same bug. I don't recall what the "deduced" alignment is for a <double, double> but I suspect that it is 16-byte alignment. If so, this would miscompile things during auto-upgrading.</div><div><br></div><div>I've made this an explicitly unaligned load much like we do in Clang for unaligned SSE and AVX loads in r278203. Does that make sense to you? Would you prefer a different approach?</div><div><br></div><div>One issue I have with that patch is that I can't test it becuase we only have assembly tests for autoupgrade.</div><div><br></div><div>I think we really need to have the autoupgrade test file check both the *IR* after auto-upgrade runs and the assembly so that we can write precise regression tests when we fix an IR-level bug even if the asm doesn't change. Is that something you could look into as you're working on this area?</div><div><br></div><div><br></div><div>Anyways, thanks for all the awesome improvements to the intrinsic lowering!</div></div></div>