<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Fri, Jul 22, 2016 at 7:06 AM Simon Pilgrim via cfe-commits <<a href="mailto:cfe-commits@lists.llvm.org">cfe-commits@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Author: rksimon<br>
Date: Fri Jul 22 08:58:56 2016<br>
New Revision: 276417<br>
<br>
URL: <a href="http://llvm.org/viewvc/llvm-project?rev=276417&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?rev=276417&view=rev</a><br>
Log:<br>
[X86][AVX] Added support for lowering to VBROADCASTF128/VBROADCASTI128 with generic IR<br>
<br>
As discussed on D22460, I've updated the vbroadcastf128 pd256/ps256 builtins to map directly to generic IR - load+splat a 128-bit vector to both lanes of a 256-bit vector.<br>
<br>
Fix for PR28657.<br>
<br>
Modified:<br>
cfe/trunk/lib/CodeGen/CGBuiltin.cpp<br>
cfe/trunk/test/CodeGen/avx-builtins.c<br>
<br>
Modified: cfe/trunk/lib/CodeGen/CGBuiltin.cpp<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGBuiltin.cpp?rev=276417&r1=276416&r2=276417&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGBuiltin.cpp?rev=276417&r1=276416&r2=276417&view=diff</a><br>
==============================================================================<br>
--- cfe/trunk/lib/CodeGen/CGBuiltin.cpp (original)<br>
+++ cfe/trunk/lib/CodeGen/CGBuiltin.cpp Fri Jul 22 08:58:56 2016<br>
@@ -6619,6 +6619,26 @@ static Value *EmitX86MaskedLoad(CodeGenF<br>
return CGF.Builder.CreateMaskedLoad(Ops[0], Align, MaskVec, Ops[1]);<br>
}<br>
<br>
+static Value *EmitX86SubVectorBroadcast(CodeGenFunction &CGF,<br>
+ SmallVectorImpl<Value *> &Ops,<br>
+ llvm::Type *DstTy,<br>
+ unsigned SrcSizeInBits,<br>
+ unsigned Align) {<br>
+ // Load the subvector.<br>
+ Ops[0] = CGF.Builder.CreateAlignedLoad(Ops[0], Align);<br>
+<br>
+ // Create broadcast mask.<br>
+ unsigned NumDstElts = DstTy->getVectorNumElements();<br>
+ unsigned NumSrcElts = SrcSizeInBits / DstTy->getScalarSizeInBits();<br>
+<br>
+ SmallVector<uint32_t, 8> Mask;<br>
+ for (unsigned i = 0; i != NumDstElts; i += NumSrcElts)<br>
+ for (unsigned j = 0; j != NumSrcElts; ++j)<br>
+ Mask.push_back(j);<br>
+<br>
+ return CGF.Builder.CreateShuffleVector(Ops[0], Ops[0], Mask, "subvecbcst");<br>
+}<br>
+<br>
static Value *EmitX86Select(CodeGenFunction &CGF,<br>
Value *Mask, Value *Op0, Value *Op1) {<br>
<br>
@@ -6995,6 +7015,13 @@ Value *CodeGenFunction::EmitX86BuiltinEx<br>
getContext().getTypeAlignInChars(E->getArg(1)->getType()).getQuantity();<br>
return EmitX86MaskedLoad(*this, Ops, Align);<br>
}<br>
+<br>
+ case X86::BI__builtin_ia32_vbroadcastf128_pd256:<br>
+ case X86::BI__builtin_ia32_vbroadcastf128_ps256: {<br>
+ llvm::Type *DstTy = ConvertType(E->getType());<br>
+ return EmitX86SubVectorBroadcast(*this, Ops, DstTy, 128, 16);<br></blockquote><div><br></div><div>Somewhat to my surprise, after a bunch of debugging, we found a bug in this line: the hard-coded alignment of 16 in the last argument.</div><div><br></div><div>See my fix in r278202. I wanted to mention it here in case others bisect back to this and wonder, and because, frankly, I would never have thought of this. The broadcast instructions, even when taking a 128-bit input, don't have an alignment requirement, so the load emitted here can't assume 16-byte alignment. Color me surprised.</div><div><br></div><div>Anyway, just FYI, and in case you want to double-check my fix.</div></div></div>