<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><br class=""></div><br class=""><div><blockquote type="cite" class=""><div class="">On Mar 9, 2017, at 3:48 PM, Michael Kuperstein <<a href="mailto:mkuper@google.com" class="">mkuper@google.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">I'm sort of agnostic on having multiple tree nodes for the same set of scalar values vs. having a DAG - using a tree won't blow the graph up too much, since we're only talking about the leaves, and I don't expect high duplication of those either...<div class=""><div class="">If we have several tree nodes, we can keep the order on the node; if we have a single node, we can keep the order on the edges.<br class=""><br class=""></div><div class="">I do think it would be good to have an explicit graph / tree, instead of making the scalar-based lookup even more complicated.<br class=""><br class="">In any case, I'd strongly prefer to have this discussion while we don't have a known mis-compile affecting SPEC in tree.<br class="">Would you mind reverting for now? 
Note that reverting just became slightly more difficult because of conflicts with r297303...<br class=""></div></div></div></div></blockquote><div><div class="">I think we may still or again have a known miscompile of SPEC 177.mesa, don't we Bruno?</div><div class=""><br class=""></div><div class="">- Matthias</div></div><br class=""><blockquote type="cite" class=""><div class=""><div dir="ltr" class=""><div class=""><div class=""><div class=""><div class=""><br class=""></div><div class="">Thanks,</div><div class="">  Michael</div></div></div></div></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Thu, Mar 9, 2017 at 3:02 AM, Shahid, Asghar-ahmad <span dir="ltr" class=""><<a href="mailto:Asghar-ahmad.Shahid@amd.com" target="_blank" class="">Asghar-ahmad.Shahid@amd.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">





<div lang="EN-US" link="blue" vlink="purple" class="">
<div class="m_385631997782872349WordSection1"><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class="">Other possible ideas for resolving this are as follows:<u class=""></u><u class=""></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class=""><u class=""></u> <u class=""></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class="">Allow multiple TreeEntries for different UseMasks, and either<u class=""></u><u class=""></u></span></p><p class="m_385631997782872349MsoListParagraph"><u class=""></u><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class=""><span class="">1.<span style="font:7.0pt "Times New Roman"" class="">      
</span></span></span><u class=""></u><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class="">Maintain a Hash table for multiple UseMasks and query the tree index using a hash function which is based on the UseMasks, OR<u class=""></u><u class=""></u></span></p><p class="MsoNormal" style="margin-left:.25in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class=""><u class=""></u> <u class=""></u></span></p><p class="m_385631997782872349MsoListParagraph"><u class=""></u><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class=""><span class="">2.<span style="font:7.0pt "Times New Roman"" class="">      
</span></span></span><u class=""></u><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class="">Traverse the vectorizable tree and pick the tree entry whose scalars all have matching indexes, i.e. isEqual(INDEX(E->Scalars[i]))<u class=""></u><u class=""></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class=""><u class=""></u> <u class=""></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class="">Both of the above may be expensive, but the cost could perhaps be amortized by filtering on the OpCode and exiting early?<u class=""></u><u class=""></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class=""><u class=""></u> <u class=""></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class="">Regards,<u class=""></u><u class=""></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class="">Shahid<u class=""></u><u class=""></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class=""><u class=""></u> <u class=""></u></span></p><p class="MsoNormal"><b class=""><span style="font-size:11.0pt;font-family:"Calibri",sans-serif" class="">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif" class=""> Michael Kuperstein [mailto:<a href="mailto:mkuper@google.com" target="_blank" class="">mkuper@google.com</a>]
<br class="">
<b class="">Sent:</b> Wednesday, March 8, 2017 12:30 AM<br class="">
<b class="">To:</b> Shahid, Asghar-ahmad <<a href="mailto:Asghar-ahmad.Shahid@amd.com" target="_blank" class="">Asghar-ahmad.Shahid@amd.com</a>><br class="">
<b class="">Cc:</b> Daniel Jasper <<a href="mailto:djasper@google.com" target="_blank" class="">djasper@google.com</a>>; llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org" target="_blank" class="">llvm-commits@lists.llvm.org</a>>; Hans Wennborg <<a href="mailto:hwennborg@google.com" target="_blank" class="">hwennborg@google.com</a>>; Matthias Braun <<a href="mailto:matze@braunis.de" target="_blank" class="">matze@braunis.de</a>>; Steven Wu <<a href="mailto:stevenwu@apple.com" target="_blank" class="">stevenwu@apple.com</a>></span></p><div class=""><div class="h5"><br class="">
<b class="">Subject:</b> Re: [llvm] r296863 - [SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available<u class=""></u><u class=""></u></div></div><div class=""><br class="webkit-block-placeholder"></div><div class=""><div class="h5"><p class="MsoNormal"><u class=""></u> <u class=""></u></p>
<div class=""><p class="MsoNormal">I think we're going to have to revert the entire patch set (r293386, r294027, r296411, anything else?) and go back to the drawing board on this...<u class=""></u><u class=""></u></p>
<div class=""><p class="MsoNormal"><u class=""></u> <u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">Any other ideas?<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"><u class=""></u> <u class=""></u></p>
<div class=""><p class="MsoNormal">On Mon, Mar 6, 2017 at 10:44 PM, Shahid, Asghar-ahmad <<a href="mailto:Asghar-ahmad.Shahid@amd.com" target="_blank" class="">Asghar-ahmad.Shahid@amd.com</a>> wrote:<u class=""></u><u class=""></u></p>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in" class="">
<div class="">
<div class=""><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class="">Hi Michael,</span><u class=""></u><u class=""></u></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class=""> </span><u class=""></u><u class=""></u></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class="">Apart from having a single tree entry per load bundle, another problem is the mapping of scalars to tree entries.</span><u class=""></u><u class=""></u></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class=""> </span><u class=""></u><u class=""></u></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class="">Currently, even if we allow multiple load-bundle tree entries for in-order and out-of-order uses, queries for the tree entry index will not work properly, because the lookup is keyed on the scalars, and each scalar is assigned the same index irrespective of the use order.</span><u class=""></u><u class=""></u></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class=""> </span><u class=""></u><u class=""></u></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class="">-Shahid</span><u class=""></u><u class=""></u></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d" class=""> </span><u class=""></u><u class=""></u></p><p class="MsoNormal"><b class=""><span style="font-size:11.0pt;font-family:"Calibri",sans-serif" class="">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif" class=""> Michael Kuperstein [mailto:<a href="mailto:mkuper@google.com" target="_blank" class="">mkuper@google.com</a>]
<br class="">
<b class="">Sent:</b> Tuesday, March 7, 2017 5:15 AM<br class="">
<b class="">To:</b> Shahid, Asghar-ahmad <<a href="mailto:Asghar-ahmad.Shahid@amd.com" target="_blank" class="">Asghar-ahmad.Shahid@amd.com</a>>; Daniel Jasper <<a href="mailto:djasper@google.com" target="_blank" class="">djasper@google.com</a>><br class="">
<b class="">Cc:</b> llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org" target="_blank" class="">llvm-commits@lists.llvm.org</a>>; Hans Wennborg <<a href="mailto:hwennborg@google.com" target="_blank" class="">hwennborg@google.com</a>>; Matthias Braun <<a href="mailto:matze@braunis.de" target="_blank" class="">matze@braunis.de</a>>;
 Steven Wu <<a href="mailto:stevenwu@apple.com" target="_blank" class="">stevenwu@apple.com</a>><br class="">
<b class="">Subject:</b> Re: [llvm] r296863 - [SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available</span><u class=""></u><u class=""></u></p>
<div class="">
<div class=""><p class="MsoNormal"> <u class=""></u><u class=""></u></p>
<div class=""><p class="MsoNormal">Hi Shahid,<u class=""></u><u class=""></u></p>
<div class=""><p class="MsoNormal"> <u class=""></u><u class=""></u></p>
<div class=""><p class="MsoNormal">Unfortunately, this still fails in some cases - and it's partially my fault, since it's caused by a change I suggested on the review.<u class=""></u><u class=""></u></p>
</div>
</div>
<div class=""><p class="MsoNormal"> <u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">The problem is that we end up having a single load bundle that's used in two different orders. Since we only have a single tree entry for that bundle, we generate a single mask,
 and use the same shuffle for both uses. Note that I'm not sure computing the mask on-demand (like we did before) would always help. I think it would break if one of the uses was in-order, another was out-of-order, and we reached buildTree_vec with the in-order
 use first. We'd then set NeedToShuffle to false, and wouldn't generate a shuffle for the out-of-order use.<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> <u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">I'm not sure if the right solution is to generate the mask on demand like we did before (but then we need to make sure the issue above doesn't happen), or have distinct tree entries.<u class=""></u><u class=""></u></p>
</div>
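<div class=""><p class="MsoNormal">The two options above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not SLPVectorizer code: a load is represented by its constant offset from the base pointer (offsets are assumed distinct), and computeMask, EntryPerScalarSet and EntryPerUseOrder are hypothetical names. computeMask derives the shuffle mask for one use order; the two lookup structures contrast one entry per set of scalars with one entry per use order.</p>
</div>

```cpp
#include <algorithm>
#include <map>
#include <numeric>
#include <set>
#include <vector>

// Hypothetical model: each load is identified by its constant offset from
// the base pointer, and a bundle lists those offsets in use order.
using Bundle = std::vector<int>;
using Mask = std::vector<unsigned>;

// Mask[K] is the lane of the sorted (memory-order) vector load that holds
// the K-th use, so shuffling the sorted load by Mask restores the use order.
Mask computeMask(const Bundle &Uses) {
  // UseOrder[Lane] is the index of the use that lands in sorted lane Lane.
  std::vector<unsigned> UseOrder(Uses.size());
  std::iota(UseOrder.begin(), UseOrder.end(), 0u);
  std::sort(UseOrder.begin(), UseOrder.end(),
            [&Uses](unsigned Left, unsigned Right) {
              return Uses[Left] < Uses[Right];
            });
  // Invert the permutation to get the mask.
  Mask M(Uses.size());
  for (unsigned Lane = 0; Lane < UseOrder.size(); ++Lane)
    M[UseOrder[Lane]] = Lane;
  return M;
}

// One entry per *set* of scalars: whichever use order arrives first fixes
// the mask, and every later use of the same loads gets that mask back.
struct EntryPerScalarSet {
  std::map<std::set<int>, Mask> Entries;
  const Mask &lookup(const Bundle &Uses) {
    std::set<int> Key(Uses.begin(), Uses.end());
    auto It = Entries.find(Key);
    if (It == Entries.end())
      It = Entries.emplace(Key, computeMask(Uses)).first;
    return It->second;
  }
};

// One entry per *ordered* bundle: each use order keeps its own mask.
struct EntryPerUseOrder {
  std::map<Bundle, Mask> Entries;
  const Mask &lookup(const Bundle &Uses) {
    auto It = Entries.find(Uses);
    if (It == Entries.end())
      It = Entries.emplace(Uses, computeMask(Uses)).first;
    return It->second;
  }
};
```

<div class=""><p class="MsoNormal">If the use order {1, 0, 3, 2} is looked up first and {0, 1, 2, 3} second, EntryPerScalarSet returns the mask 1, 0, 3, 2 for both, whereas EntryPerUseOrder returns 1, 0, 3, 2 and 0, 1, 2, 3 respectively. Keying on the ordered bundle is just one way to model distinct tree entries; keeping the order on the edges would be another.</p>
</div>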
<div class=""><p class="MsoNormal"> <u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">Reduced example:<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> <u class=""></u><u class=""></u></p>
</div>
<div class="">
<div class=""><p class="MsoNormal">target datalayout = "e-m:e-i64:64-f80:128-n8:16:<wbr class="">32:64-S128"<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">target triple = "x86_64-pc-linux-gnu"<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> <u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">%complex = type { { double, double } }<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> <u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">define void @eggs(%complex* %p) local_unnamed_addr #0 {<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">bb:<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  br i1 undef, label %bb2, label %bb1<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> <u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">bb1:                                              ; preds = %bb<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  ret void<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> <u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">bb2:                                              ; preds = %bb2, %bb<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %phi0 = phi double [ %tmp21, %bb2 ], [ 0.000000e+00, %bb ]<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %phi1 = phi double [ %tmp20, %bb2 ], [ 0.000000e+00, %bb ]<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %phi2 = phi double [ %tmp29, %bb2 ], [ 0.000000e+00, %bb ]<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %phi3 = phi double [ %tmp28, %bb2 ], [ 0.000000e+00, %bb ]<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> <u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %gep0 = getelementptr inbounds %complex, %complex* %p, i64 0, i32 0, i32 0<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %load0 = load double, double* %gep0, align 8<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %gep1 = getelementptr inbounds %complex, %complex* %p, i64 0, i32 0, i32 1<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %load1 = load double, double* %gep1, align 8<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %gep2 = getelementptr inbounds %complex, %complex* %p, i64 1, i32 0, i32 0<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %load2 = load double, double* %gep2, align 8<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %gep3 = getelementptr inbounds %complex, %complex* %p, i64 1, i32 0, i32 1<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %load3 = load double, double* %gep3, align 8<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> <u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %tmp14 = fmul double 1.0, %load0<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %tmp15 = fmul double 2.0, %load1<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %tmp16 = fsub double %tmp14, %tmp15<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %tmp17 = fmul double 1.0, %load1<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %tmp18 = fmul double 2.0, %load0<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %tmp19 = fsub double %tmp17, %tmp18<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %tmp20 = fadd double %phi1, %tmp16<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %tmp21 = fadd double %phi0, %tmp19<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %tmp22 = fmul double 1.0, %load2<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %tmp23 = fmul double 2.0, %load3<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %tmp24 = fsub double %tmp22, %tmp23<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %tmp25 = fmul double 1.0, %load3<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %tmp26 = fmul double 2.0, %load2<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %tmp27 = fsub double %tmp25, %tmp26<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %tmp28 = fadd double %phi3, %tmp24<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %tmp29 = fadd double %phi2, %tmp27<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  br label %bb2<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">}<u class=""></u><u class=""></u></p>
</div>
</div>
<div class=""><p class="MsoNormal"> <u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">Good output:<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> <u class=""></u><u class=""></u></p>
</div>
<div class="">
<div class=""><p class="MsoNormal">bb2:                                              ; preds = %bb2, %bb<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %0 = phi <4 x double> [ %10, %bb2 ], [ zeroinitializer, %bb ]<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %gep0 = getelementptr inbounds %complex, %complex* %p, i64 0, i32 0, i32 0<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %gep1 = getelementptr inbounds %complex, %complex* %p, i64 0, i32 0, i32 1<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %gep2 = getelementptr inbounds %complex, %complex* %p, i64 1, i32 0, i32 0<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %gep3 = getelementptr inbounds %complex, %complex* %p, i64 1, i32 0, i32 1<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %1 = bitcast double* %gep0 to <4 x double>*<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %2 = load <4 x double>, <4 x double>* %1, align 8<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %3 = shufflevector <4 x double> %2, <4 x double> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3><u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %4 = bitcast double* %gep0 to <4 x double>*<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %5 = load <4 x double>, <4 x double>* %4, align 8<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %6 = shufflevector <4 x double> %5, <4 x double> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2><u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %7 = fmul <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>, %6<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %8 = fmul <4 x double> <double 2.000000e+00, double 2.000000e+00, double 2.000000e+00, double 2.000000e+00>, %3<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %9 = fsub <4 x double> %7, %8<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %10 = fadd <4 x double> %0, %9<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  br label %bb2<u class=""></u><u class=""></u></p>
</div>
</div>
<div class=""><p class="MsoNormal"> <u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">Bad output:<u class=""></u><u class=""></u></p>
</div>
<div class="">
<div class=""><p class="MsoNormal">bb2:                                              ; preds = %bb2, %bb<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %0 = phi <4 x double> [ %10, %bb2 ], [ zeroinitializer, %bb ]<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %gep0 = getelementptr inbounds %complex, %complex* %p, i64 0, i32 0, i32 0<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %gep1 = getelementptr inbounds %complex, %complex* %p, i64 0, i32 0, i32 1<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %gep2 = getelementptr inbounds %complex, %complex* %p, i64 1, i32 0, i32 0<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %gep3 = getelementptr inbounds %complex, %complex* %p, i64 1, i32 0, i32 1<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %1 = bitcast double* %gep0 to <4 x double>*<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %2 = load <4 x double>, <4 x double>* %1, align 8<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %3 = shufflevector <4 x double> %2, <4 x double> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2><u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %4 = bitcast double* %gep0 to <4 x double>*<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %5 = load <4 x double>, <4 x double>* %4, align 8<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %6 = shufflevector <4 x double> %5, <4 x double> undef, <4 x i32> <i32 1, i32 0, i32 3, i32 2><u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %7 = fmul <4 x double> <double 1.000000e+00, double 1.000000e+00, double 1.000000e+00, double 1.000000e+00>, %6<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %8 = fmul <4 x double> <double 2.000000e+00, double 2.000000e+00, double 2.000000e+00, double 2.000000e+00>, %3<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %9 = fsub <4 x double> %7, %8<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  %10 = fadd <4 x double> %0, %9<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">  br label %bb2<u class=""></u><u class=""></u></p>
</div>
</div>
<div class=""><p class="MsoNormal"> <u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">Note the difference in %3.<u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal">(In this reproducer, the "good" %3 shuffle is a nop, but the original case I saw had a different order.)<u class=""></u><u class=""></u></p>
</div>
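<div class=""><p class="MsoNormal">To see why the %3 mask matters, the two vector bodies can be evaluated by hand or with a small model. The following is a rough sketch, not generated code: shuffle models a single-source shufflevector, vectorBody models %7, %8 and %9 with the phi accumulation left out, and scalarReference recomputes %tmp19, %tmp16, %tmp27 and %tmp24 from the reduced example, assuming the vector lanes follow %phi0 to %phi3. All function and type names are hypothetical.</p>
</div>

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<double, 4>;
using Mask4 = std::array<unsigned, 4>;

// Single-source shufflevector: result lane I takes input lane M[I].
Vec4 shuffle(const Vec4 &V, const Mask4 &M) {
  Vec4 R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = V[M[I]];
  return R;
}

// Models %9 = (1.0 * shuffle(Loads, MaskFor1x)) - (2.0 * shuffle(Loads, MaskFor2x)),
// i.e. the vector body without the phi accumulation.
Vec4 vectorBody(const Vec4 &Loads, const Mask4 &MaskFor1x,
                const Mask4 &MaskFor2x) {
  const Vec4 A = shuffle(Loads, MaskFor1x); // plays the role of %6
  const Vec4 B = shuffle(Loads, MaskFor2x); // plays the role of %3
  Vec4 R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = 1.0 * A[I] - 2.0 * B[I];
  return R;
}

// Scalar values from the reduced example, in assumed lane order
// %tmp19, %tmp16, %tmp27, %tmp24.
Vec4 scalarReference(const Vec4 &L) {
  return Vec4{{1.0 * L[1] - 2.0 * L[0], 1.0 * L[0] - 2.0 * L[1],
               1.0 * L[3] - 2.0 * L[2], 1.0 * L[2] - 2.0 * L[3]}};
}
```

<div class=""><p class="MsoNormal">With loads {1, 2, 3, 4}, the masks from the good output (1, 0, 3, 2 for the 1.0 operand and 0, 1, 2, 3 for the 2.0 operand) reproduce the scalar result {0, -3, -2, -5}. Using 1, 0, 3, 2 for both operands, as in the bad output, gives {-2, -1, -4, -3} instead.</p>
</div>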
<div class=""><p class="MsoNormal"> <u class=""></u><u class=""></u></p>
</div>
<div class=""><p class="MsoNormal"> <u class=""></u><u class=""></u></p>
</div>
</div>
<div class=""><p class="MsoNormal"> <u class=""></u><u class=""></u></p>
<div class=""><p class="MsoNormal">On Fri, Mar 3, 2017 at 2:02 AM, Mohammad Shahid via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org" target="_blank" class="">llvm-commits@lists.llvm.org</a>> wrote:<u class=""></u><u class=""></u></p>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0in;margin-bottom:5.0pt" class=""><p class="MsoNormal">Author: ashahid<br class="">
Date: Fri Mar  3 04:02:47 2017<br class="">
New Revision: 296863<br class="">
<br class="">
URL: <a href="http://llvm.org/viewvc/llvm-project?rev=296863&view=rev" target="_blank" class="">
http://llvm.org/viewvc/llvm-<wbr class="">project?rev=296863&view=rev</a><br class="">
Log:<br class="">
[SLP] Fixes the bug due to absence of in order uses of scalars which needs to be available<br class="">
for VectorizeTree() API. This API uses it for proper mask computation to be used in shufflevector IR.<br class="">
The fix is to compute the mask for out of order memory accesses while building the vectorizable tree<br class="">
instead of actual vectorization of vectorizable tree. It also needs to recompute the proper Lane for<br class="">
external use of vectorizable scalars based on shuffle mask.<br class="">
<br class="">
Reviewers: mkuper<br class="">
<br class="">
Differential Revision: <a href="https://reviews.llvm.org/D30159" target="_blank" class="">
https://reviews.llvm.org/<wbr class="">D30159</a><br class="">
<br class="">
Change-Id: Ide8773ce0ad3562f3cf4d1a0ad0f4<wbr class="">87e2f60ce5d<br class="">
<br class="">
Added:<br class="">
    llvm/trunk/test/Transforms/<wbr class="">SLPVectorizer/X86/jumbled-<wbr class="">load-bug.ll<br class="">
Modified:<br class="">
    llvm/trunk/include/llvm/<wbr class="">Analysis/LoopAccessAnalysis.h<br class="">
    llvm/trunk/lib/Analysis/<wbr class="">LoopAccessAnalysis.cpp<br class="">
    llvm/trunk/lib/Transforms/<wbr class="">Vectorize/SLPVectorizer.cpp<br class="">
    llvm/trunk/test/Transforms/<wbr class="">SLPVectorizer/X86/jumbled-<wbr class="">same.ll<br class="">
<br class="">
Modified: llvm/trunk/include/llvm/<wbr class="">Analysis/LoopAccessAnalysis.h<br class="">
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Analysis/LoopAccessAnalysis.h?rev=296863&r1=296862&r2=296863&view=diff" target="_blank" class="">
http://llvm.org/viewvc/llvm-<wbr class="">project/llvm/trunk/include/<wbr class="">llvm/Analysis/<wbr class="">LoopAccessAnalysis.h?rev=<wbr class="">296863&r1=296862&r2=296863&<wbr class="">view=diff</a><br class="">
==============================<wbr class="">==============================<wbr class="">==================<br class="">
--- llvm/trunk/include/llvm/<wbr class="">Analysis/LoopAccessAnalysis.h (original)<br class="">
+++ llvm/trunk/include/llvm/<wbr class="">Analysis/LoopAccessAnalysis.h Fri Mar  3 04:02:47 2017<br class="">
@@ -660,12 +660,15 @@ int64_t getPtrStride(<wbr class="">PredicatedScalarEvo<br class="">
 /// \brief Try to sort an array of loads / stores.<br class="">
 ///<br class="">
 /// An array of loads / stores can only be sorted if all pointer operands<br class="">
-/// refer to the same object, and the differences between these pointers<br class="">
+/// refer to the same object, and the differences between these pointers<br class="">
 /// are known to be constant. If that is the case, this returns true, and the<br class="">
 /// sorted array is returned in \p Sorted. Otherwise, this returns false, and<br class="">
 /// \p Sorted is invalid.<br class="">
+//  If \p Mask is not null, it also returns the \p Mask which is the shuffle<br class="">
+//  mask for actual memory access order.<br class="">
 bool sortMemAccesses(ArrayRef<Value *> VL, const DataLayout &DL,<br class="">
-                     ScalarEvolution &SE, SmallVectorImpl<Value *> &Sorted);<br class="">
+                     ScalarEvolution &SE, SmallVectorImpl<Value *> &Sorted,<br class="">
+                     SmallVectorImpl<unsigned> *Mask = nullptr);<br class="">
<br class="">
 /// \brief Returns true if the memory operations \p A and \p B are consecutive.<br class="">
 /// This is a simple API that does not depend on the analysis pass.<br class="">
<br class="">
Modified: llvm/trunk/lib/Analysis/<wbr class="">LoopAccessAnalysis.cpp<br class="">
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/LoopAccessAnalysis.cpp?rev=296863&r1=296862&r2=296863&view=diff" target="_blank" class="">
http://llvm.org/viewvc/llvm-<wbr class="">project/llvm/trunk/lib/<wbr class="">Analysis/LoopAccessAnalysis.<wbr class="">cpp?rev=296863&r1=296862&r2=<wbr class="">296863&view=diff</a><br class="">
==============================<wbr class="">==============================<wbr class="">==================<br class="">
--- llvm/trunk/lib/Analysis/<wbr class="">LoopAccessAnalysis.cpp (original)<br class="">
+++ llvm/trunk/lib/Analysis/<wbr class="">LoopAccessAnalysis.cpp Fri Mar  3 04:02:47 2017<br class="">
@@ -1040,7 +1040,8 @@ static unsigned getAddressSpaceOperand(V<br class="">
<br class="">
 bool llvm::sortMemAccesses(<wbr class="">ArrayRef<Value *> VL, const DataLayout &DL,<br class="">
                            ScalarEvolution &SE,<br class="">
-                           SmallVectorImpl<Value *> &Sorted) {<br class="">
+                           SmallVectorImpl<Value *> &Sorted,<br class="">
+                           SmallVectorImpl<unsigned> *Mask) {<br class="">
   SmallVector<std::pair<int64_<wbr class="">t, Value *>, 4> OffValPairs;<br class="">
   OffValPairs.reserve(VL.size()<wbr class="">);<br class="">
   Sorted.reserve(VL.size());<br class="">
@@ -1050,7 +1051,6 @@ bool llvm::sortMemAccesses(<wbr class="">ArrayRef<Valu<br class="">
   Value *Ptr0 = getPointerOperand(VL[0]);<br class="">
   const SCEV *Scev0 = SE.getSCEV(Ptr0);<br class="">
   Value *Obj0 = GetUnderlyingObject(Ptr0, DL);<br class="">
-<br class="">
   for (auto *Val : VL) {<br class="">
     // The only kind of access we care about here is load.<br class="">
     if (!isa<LoadInst>(Val))<br class="">
@@ -1077,14 +1077,30 @@ bool llvm::sortMemAccesses(<wbr class="">ArrayRef<Valu<br class="">
     OffValPairs.emplace_back(<wbr class="">Diff->getAPInt().getSExtValue(<wbr class="">), Val);<br class="">
   }<br class="">
<br class="">
-  std::sort(OffValPairs.begin(), OffValPairs.end(),<br class="">
-            [](const std::pair<int64_t, Value *> &Left,<br class="">
-               const std::pair<int64_t, Value *> &Right) {<br class="">
-              return Left.first < Right.first;<br class="">
+  SmallVector<unsigned, 4> UseOrder(VL.size());<br class="">
+  for (unsigned i = 0; i < VL.size(); i++) {<br class="">
+    UseOrder[i] = i;<br class="">
+  }<br class="">
+<br class="">
+  // Sort the memory accesses and keep the order of their uses in UseOrder.<br class="">
+  std::sort(UseOrder.begin(), UseOrder.end(),<br class="">
+            [&OffValPairs](unsigned Left, unsigned Right) {<br class="">
+              return OffValPairs[Left].first < OffValPairs[Right].first;<br class="">
             });<br class="">
<br class="">
-  for (auto &it : OffValPairs)<br class="">
-    Sorted.push_back(it.second);<br class="">
+  for (unsigned i = 0; i < VL.size(); i++)<br class="">
+    Sorted.emplace_back(<wbr class="">OffValPairs[UseOrder[i]].<wbr class="">second);<br class="">
+<br class="">
+  // Sort UseOrder to compute the Mask.<br class="">
+  if (Mask) {<br class="">
+    Mask->reserve(VL.size());<br class="">
+    for (unsigned i = 0; i < VL.size(); i++)<br class="">
+      Mask->emplace_back(i);<br class="">
+    std::sort(Mask->begin(), Mask->end(),<br class="">
+              [&UseOrder](unsigned Left, unsigned Right) {<br class="">
+                return UseOrder[Left] < UseOrder[Right];<br class="">
+              });<br class="">
+  }<br class="">
<br class="">
   return true;<br class="">
 }<br class="">
<br class="">
Modified: llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp<br class="">
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp?rev=296863&r1=296862&r2=296863&view=diff" target="_blank" class="">
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp?rev=296863&r1=296862&r2=296863&view=diff</a><br class="">
==============================================================================<br class="">
--- llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp (original)<br class="">
+++ llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp Fri Mar  3 04:02:47 2017<br class="">
@@ -423,10 +423,8 @@ private:<br class="">
   /// be vectorized to use the original vector (or aggregate "bitcast" to a vector).<br class="">
   bool canReuseExtract(ArrayRef<Value *> VL, unsigned Opcode) const;<br class="">
<br class="">
-  /// Vectorize a single entry in the tree. VL icontains all isomorphic scalars<br class="">
-  /// in order of its usage in a user program, for example ADD1, ADD2 and so on<br class="">
-  /// or LOAD1 , LOAD2 etc.<br class="">
-  Value *vectorizeTree(ArrayRef<Value *> VL, TreeEntry *E);<br class="">
+  /// Vectorize a single entry in the tree.<br class="">
+  Value *vectorizeTree(TreeEntry *E);<br class="">
<br class="">
   /// Vectorize a single entry in the tree, starting in \p VL.<br class="">
   Value *vectorizeTree(ArrayRef<Value *> VL);<br class="">
@@ -466,8 +464,8 @@ private:<br class="">
                                       SmallVectorImpl<Value *> &Left,<br class="">
                                       SmallVectorImpl<Value *> &Right);<br class="">
   struct TreeEntry {<br class="">
-    TreeEntry() : Scalars(), VectorizedValue(nullptr),<br class="">
-    NeedToGather(0), NeedToShuffle(0) {}<br class="">
+    TreeEntry()<br class="">
+        : Scalars(), VectorizedValue(nullptr), NeedToGather(0), ShuffleMask() {}<br class="">
<br class="">
     /// \returns true if the scalars in VL are equal to this entry.<br class="">
     bool isSame(ArrayRef<Value *> VL) const {<br class="">
@@ -495,19 +493,23 @@ private:<br class="">
     /// Do we need to gather this sequence ?<br class="">
     bool NeedToGather;<br class="">
<br class="">
-    /// Do we need to shuffle the load ?<br class="">
-    bool NeedToShuffle;<br class="">
+    /// Records optional suffle mask for jumbled memory accesses in this.<br class="">
+    SmallVector<unsigned, 8> ShuffleMask;<br class="">
+<br class="">
   };<br class="">
<br class="">
   /// Create a new VectorizableTree entry.<br class="">
   TreeEntry *newTreeEntry(ArrayRef<Value *> VL, bool Vectorized,<br class="">
-                          bool NeedToShuffle) {<br class="">
+                          ArrayRef<unsigned> ShuffleMask = None) {<br class="">
     VectorizableTree.emplace_back();<br class="">
     int idx = VectorizableTree.size() - 1;<br class="">
     TreeEntry *Last = &VectorizableTree[idx];<br class="">
     Last->Scalars.insert(Last->Scalars.begin(), VL.begin(), VL.end());<br class="">
     Last->NeedToGather = !Vectorized;<br class="">
-    Last->NeedToShuffle = NeedToShuffle;<br class="">
+    if (!ShuffleMask.empty())<br class="">
+      Last->ShuffleMask.insert(Last->ShuffleMask.begin(), ShuffleMask.begin(),<br class="">
+                               ShuffleMask.end());<br class="">
+<br class="">
     if (Vectorized) {<br class="">
       for (int i = 0, e = VL.size(); i != e; ++i) {<br class="">
         assert(!ScalarToTreeEntry.count(VL[i]) && "Scalar already in tree!");<br class="">
@@ -1030,21 +1032,21 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
<br class="">
   if (Depth == RecursionMaxDepth) {<br class="">
     DEBUG(dbgs() << "SLP: Gathering due to max recursion depth.\n");<br class="">
-    newTreeEntry(VL, false, false);<br class="">
+    newTreeEntry(VL, false);<br class="">
     return;<br class="">
   }<br class="">
<br class="">
   // Don't handle vectors.<br class="">
   if (VL[0]->getType()->isVectorTy(<wbr class="">)) {<br class="">
     DEBUG(dbgs() << "SLP: Gathering due to vector type.\n");<br class="">
-    newTreeEntry(VL, false, false);<br class="">
+    newTreeEntry(VL, false);<br class="">
     return;<br class="">
   }<br class="">
<br class="">
   if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))<br class="">
     if (SI->getValueOperand()-><wbr class="">getType()->isVectorTy()) {<br class="">
       DEBUG(dbgs() << "SLP: Gathering due to store vector type.\n");<br class="">
-      newTreeEntry(VL, false, false);<br class="">
+      newTreeEntry(VL, false);<br class="">
       return;<br class="">
     }<br class="">
   unsigned Opcode = getSameOpcode(VL);<br class="">
@@ -1061,7 +1063,7 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
   // If all of the operands are identical or constant we have a simple solution.<br class="">
   if (allConstant(VL) || isSplat(VL) || !allSameBlock(VL) || !Opcode) {<br class="">
     DEBUG(dbgs() << "SLP: Gathering due to C,S,B,O. \n");<br class="">
-    newTreeEntry(VL, false, false);<br class="">
+    newTreeEntry(VL, false);<br class="">
     return;<br class="">
   }<br class="">
<br class="">
@@ -1073,7 +1075,7 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
     if (EphValues.count(VL[i])) {<br class="">
       DEBUG(dbgs() << "SLP: The instruction (" << *VL[i] <<<br class="">
             ") is ephemeral.\n");<br class="">
-      newTreeEntry(VL, false, false);<br class="">
+      newTreeEntry(VL, false);<br class="">
       return;<br class="">
     }<br class="">
   }<br class="">
@@ -1086,7 +1088,7 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
       DEBUG(dbgs() << "SLP: \tChecking bundle: " << *VL[i] << ".\n");<br class="">
       if (E->Scalars[i] != VL[i]) {<br class="">
         DEBUG(dbgs() << "SLP: Gathering due to partial overlap.\n");<br class="">
-        newTreeEntry(VL, false, false);<br class="">
+        newTreeEntry(VL, false);<br class="">
         return;<br class="">
       }<br class="">
     }<br class="">
@@ -1099,7 +1101,7 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
     if (ScalarToTreeEntry.count(VL[i]<wbr class="">)) {<br class="">
       DEBUG(dbgs() << "SLP: The instruction (" << *VL[i] <<<br class="">
             ") is already in tree.\n");<br class="">
-      newTreeEntry(VL, false, false);<br class="">
+      newTreeEntry(VL, false);<br class="">
       return;<br class="">
     }<br class="">
   }<br class="">
@@ -1109,7 +1111,7 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
   for (unsigned i = 0, e = VL.size(); i != e; ++i) {<br class="">
     if (MustGather.count(VL[i])) {<br class="">
       DEBUG(dbgs() << "SLP: Gathering due to gathered scalar.\n");<br class="">
-      newTreeEntry(VL, false, false);<br class="">
+      newTreeEntry(VL, false);<br class="">
       return;<br class="">
     }<br class="">
   }<br class="">
@@ -1123,7 +1125,7 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
     // Don't go into unreachable blocks. They may contain instructions with<br class="">
     // dependency cycles which confuse the final scheduling.<br class="">
     DEBUG(dbgs() << "SLP: bundle in unreachable block.\n");<br class="">
-    newTreeEntry(VL, false, false);<br class="">
+    newTreeEntry(VL, false);<br class="">
     return;<br class="">
   }<br class="">
<br class="">
@@ -1132,7 +1134,7 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
     for (unsigned j = i+1; j < e; ++j)<br class="">
       if (VL[i] == VL[j]) {<br class="">
         DEBUG(dbgs() << "SLP: Scalar used twice in bundle.\n");<br class="">
-        newTreeEntry(VL, false, false);<br class="">
+        newTreeEntry(VL, false);<br class="">
         return;<br class="">
       }<br class="">
<br class="">
@@ -1147,7 +1149,7 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
     assert((!BS.getScheduleData(<wbr class="">VL[0]) ||<br class="">
             !BS.getScheduleData(VL[0])-><wbr class="">isPartOfBundle()) &&<br class="">
            "tryScheduleBundle should cancelScheduling on failure");<br class="">
-    newTreeEntry(VL, false, false);<br class="">
+    newTreeEntry(VL, false);<br class="">
     return;<br class="">
   }<br class="">
   DEBUG(dbgs() << "SLP: We are able to schedule this bundle.\n");<br class="">
@@ -1164,12 +1166,12 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
           if (Term) {<br class="">
             DEBUG(dbgs() << "SLP: Need to swizzle PHINodes (TerminatorInst use).\n");<br class="">
             BS.cancelScheduling(VL);<br class="">
-            newTreeEntry(VL, false, false);<br class="">
+            newTreeEntry(VL, false);<br class="">
             return;<br class="">
           }<br class="">
         }<br class="">
<br class="">
-      newTreeEntry(VL, true, false);<br class="">
+      newTreeEntry(VL, true);<br class="">
       DEBUG(dbgs() << "SLP: added a vector of PHINodes.\n");<br class="">
<br class="">
       for (unsigned i = 0, e = PH->getNumIncomingValues(); i < e; ++i) {<br class="">
@@ -1191,7 +1193,7 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
       } else {<br class="">
         BS.cancelScheduling(VL);<br class="">
       }<br class="">
-      newTreeEntry(VL, Reuse, false);<br class="">
+      newTreeEntry(VL, Reuse);<br class="">
       return;<br class="">
     }<br class="">
     case Instruction::Load: {<br class="">
@@ -1207,7 +1209,7 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
       if (DL->getTypeSizeInBits(<wbr class="">ScalarTy) !=<br class="">
           DL->getTypeAllocSizeInBits(<wbr class="">ScalarTy)) {<br class="">
         BS.cancelScheduling(VL);<br class="">
-        newTreeEntry(VL, false, false);<br class="">
+        newTreeEntry(VL, false);<br class="">
         DEBUG(dbgs() << "SLP: Gathering loads of non-packed type.\n");<br class="">
         return;<br class="">
       }<br class="">
@@ -1218,7 +1220,7 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
         LoadInst *L = cast<LoadInst>(VL[i]);<br class="">
         if (!L->isSimple()) {<br class="">
           BS.cancelScheduling(VL);<br class="">
-          newTreeEntry(VL, false, false);<br class="">
+          newTreeEntry(VL, false);<br class="">
           DEBUG(dbgs() << "SLP: Gathering non-simple loads.\n");<br class="">
           return;<br class="">
         }<br class="">
@@ -1238,7 +1240,7 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
<br class="">
       if (Consecutive) {<br class="">
         ++NumLoadsWantToKeepOrder;<br class="">
-        newTreeEntry(VL, true, false);<br class="">
+        newTreeEntry(VL, true);<br class="">
         DEBUG(dbgs() << "SLP: added a vector of loads.\n");<br class="">
         return;<br class="">
       }<br class="">
@@ -1255,7 +1257,8 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
       if (VL.size() > 2 && !ReverseConsecutive) {<br class="">
         bool ShuffledLoads = true;<br class="">
         SmallVector<Value *, 8> Sorted;<br class="">
-        if (sortMemAccesses(VL, *DL, *SE, Sorted)) {<br class="">
+        SmallVector<unsigned, 4> Mask;<br class="">
+        if (sortMemAccesses(VL, *DL, *SE, Sorted, &Mask)) {<br class="">
           auto NewVL = makeArrayRef(Sorted.begin(), Sorted.end());<br class="">
           for (unsigned i = 0, e = NewVL.size() - 1; i < e; ++i) {<br class="">
             if (!isConsecutiveAccess(NewVL[i]<wbr class="">, NewVL[i + 1], *DL, *SE)) {<br class="">
@@ -1264,14 +1267,14 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
             }<br class="">
           }<br class="">
           if (ShuffledLoads) {<br class="">
-            newTreeEntry(NewVL, true, true);<br class="">
+            newTreeEntry(NewVL, true, makeArrayRef(Mask.begin(), Mask.end()));<br class="">
             return;<br class="">
           }<br class="">
         }<br class="">
       }<br class="">
<br class="">
       BS.cancelScheduling(VL);<br class="">
-      newTreeEntry(VL, false, false);<br class="">
+      newTreeEntry(VL, false);<br class="">
<br class="">
       if (ReverseConsecutive) {<br class="">
         ++NumLoadsWantToChangeOrder;<br class="">
@@ -1298,12 +1301,12 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
         Type *Ty = cast<Instruction>(Val)-><wbr class="">getOperand(0)->getType();<br class="">
         if (Ty != SrcTy || !isValidElementType(Ty)) {<br class="">
           BS.cancelScheduling(VL);<br class="">
-          newTreeEntry(VL, false, false);<br class="">
+          newTreeEntry(VL, false);<br class="">
           DEBUG(dbgs() << "SLP: Gathering casts with different src types.\n");<br class="">
           return;<br class="">
         }<br class="">
       }<br class="">
-      newTreeEntry(VL, true, false);<br class="">
+      newTreeEntry(VL, true);<br class="">
       DEBUG(dbgs() << "SLP: added a vector of casts.\n");<br class="">
<br class="">
       for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {<br class="">
@@ -1326,13 +1329,13 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
         if (Cmp->getPredicate() != P0 ||<br class="">
             Cmp->getOperand(0)->getType() != ComparedTy) {<br class="">
           BS.cancelScheduling(VL);<br class="">
-          newTreeEntry(VL, false, false);<br class="">
+          newTreeEntry(VL, false);<br class="">
           DEBUG(dbgs() << "SLP: Gathering cmp with different predicate.\n");<br class="">
           return;<br class="">
         }<br class="">
       }<br class="">
<br class="">
-      newTreeEntry(VL, true, false);<br class="">
+      newTreeEntry(VL, true);<br class="">
       DEBUG(dbgs() << "SLP: added a vector of compares.\n");<br class="">
<br class="">
       for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {<br class="">
@@ -1364,7 +1367,7 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
     case Instruction::And:<br class="">
     case Instruction::Or:<br class="">
     case Instruction::Xor: {<br class="">
-      newTreeEntry(VL, true, false);<br class="">
+      newTreeEntry(VL, true);<br class="">
       DEBUG(dbgs() << "SLP: added a vector of bin op.\n");<br class="">
<br class="">
       // Sort operands of the instructions so that each side is more likely to<br class="">
@@ -1393,7 +1396,7 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
         if (cast<Instruction>(Val)-><wbr class="">getNumOperands() != 2) {<br class="">
           DEBUG(dbgs() << "SLP: not-vectorizable GEP (nested indexes).\n");<br class="">
           BS.cancelScheduling(VL);<br class="">
-          newTreeEntry(VL, false, false);<br class="">
+          newTreeEntry(VL, false);<br class="">
           return;<br class="">
         }<br class="">
       }<br class="">
@@ -1406,7 +1409,7 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
         if (Ty0 != CurTy) {<br class="">
           DEBUG(dbgs() << "SLP: not-vectorizable GEP (different types).\n");<br class="">
           BS.cancelScheduling(VL);<br class="">
-          newTreeEntry(VL, false, false);<br class="">
+          newTreeEntry(VL, false);<br class="">
           return;<br class="">
         }<br class="">
       }<br class="">
@@ -1418,12 +1421,12 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
           DEBUG(<br class="">
               dbgs() << "SLP: not-vectorizable GEP (non-constant indexes).\n");<br class="">
           BS.cancelScheduling(VL);<br class="">
-          newTreeEntry(VL, false, false);<br class="">
+          newTreeEntry(VL, false);<br class="">
           return;<br class="">
         }<br class="">
       }<br class="">
<br class="">
-      newTreeEntry(VL, true, false);<br class="">
+      newTreeEntry(VL, true);<br class="">
       DEBUG(dbgs() << "SLP: added a vector of GEPs.\n");<br class="">
       for (unsigned i = 0, e = 2; i < e; ++i) {<br class="">
         ValueList Operands;<br class="">
@@ -1440,12 +1443,12 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
       for (unsigned i = 0, e = VL.size() - 1; i < e; ++i)<br class="">
         if (!isConsecutiveAccess(VL[i], VL[i + 1], *DL, *SE)) {<br class="">
           BS.cancelScheduling(VL);<br class="">
-          newTreeEntry(VL, false, false);<br class="">
+          newTreeEntry(VL, false);<br class="">
           DEBUG(dbgs() << "SLP: Non-consecutive store.\n");<br class="">
           return;<br class="">
         }<br class="">
<br class="">
-      newTreeEntry(VL, true, false);<br class="">
+      newTreeEntry(VL, true);<br class="">
       DEBUG(dbgs() << "SLP: added a vector of stores.\n");<br class="">
<br class="">
       ValueList Operands;<br class="">
@@ -1463,7 +1466,7 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
       Intrinsic::ID ID = getVectorIntrinsicIDForCall(<wbr class="">CI, TLI);<br class="">
       if (!isTriviallyVectorizable(ID)) {<br class="">
         BS.cancelScheduling(VL);<br class="">
-        newTreeEntry(VL, false, false);<br class="">
+        newTreeEntry(VL, false);<br class="">
         DEBUG(dbgs() << "SLP: Non-vectorizable call.\n");<br class="">
         return;<br class="">
       }<br class="">
@@ -1477,7 +1480,7 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
             getVectorIntrinsicIDForCall(<wbr class="">CI2, TLI) != ID ||<br class="">
             !CI->hasIdenticalOperandBundleSchema(*CI2)) {<br class="">
           BS.cancelScheduling(VL);<br class="">
-          newTreeEntry(VL, false, false);<br class="">
+          newTreeEntry(VL, false);<br class="">
           DEBUG(dbgs() << "SLP: mismatched calls:" << *CI << "!=" << *VL[i]<br class="">
                        << "\n");<br class="">
           return;<br class="">
@@ -1488,7 +1491,7 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
           Value *A1J = CI2->getArgOperand(1);<br class="">
           if (A1I != A1J) {<br class="">
             BS.cancelScheduling(VL);<br class="">
-            newTreeEntry(VL, false, false);<br class="">
+            newTreeEntry(VL, false);<br class="">
             DEBUG(dbgs() << "SLP: mismatched arguments in call:" << *CI<br class="">
                          << " argument "<< A1I<<"!=" << A1J<br class="">
                          << "\n");<br class="">
@@ -1501,14 +1504,14 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
                         CI->op_begin() + CI->getBundleOperandsEndIndex(<wbr class="">),<br class="">
                         CI2->op_begin() + CI2-><wbr class="">getBundleOperandsStartIndex())<wbr class="">) {<br class="">
           BS.cancelScheduling(VL);<br class="">
-          newTreeEntry(VL, false, false);<br class="">
+          newTreeEntry(VL, false);<br class="">
           DEBUG(dbgs() << "SLP: mismatched bundle operands in calls:" << *CI << "!="<br class="">
                        << *VL[i] << '\n');<br class="">
           return;<br class="">
         }<br class="">
       }<br class="">
<br class="">
-      newTreeEntry(VL, true, false);<br class="">
+      newTreeEntry(VL, true);<br class="">
       for (unsigned i = 0, e = CI->getNumArgOperands(); i != e; ++i) {<br class="">
         ValueList Operands;<br class="">
         // Prepare the operand vector.<br class="">
@@ -1525,11 +1528,11 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
       // then do not vectorize this instruction.<br class="">
       if (!isAltShuffle) {<br class="">
         BS.cancelScheduling(VL);<br class="">
-        newTreeEntry(VL, false, false);<br class="">
+        newTreeEntry(VL, false);<br class="">
         DEBUG(dbgs() << "SLP: ShuffleVector are not vectorized.\n");<br class="">
         return;<br class="">
       }<br class="">
-      newTreeEntry(VL, true, false);<br class="">
+      newTreeEntry(VL, true);<br class="">
       DEBUG(dbgs() << "SLP: added a ShuffleVector op.\n");<br class="">
<br class="">
       // Reorder operands if reordering would enable vectorization.<br class="">
@@ -1553,7 +1556,7 @@ void BoUpSLP::buildTree_rec(<wbr class="">ArrayRef<Val<br class="">
     }<br class="">
     default:<br class="">
       BS.cancelScheduling(VL);<br class="">
-      newTreeEntry(VL, false, false);<br class="">
+      newTreeEntry(VL, false);<br class="">
       DEBUG(dbgs() << "SLP: Gathering unknown instruction.\n");<br class="">
       return;<br class="">
   }<br class="">
@@ -1792,7 +1795,7 @@ int BoUpSLP::getEntryCost(<wbr class="">TreeEntry *E)<br class="">
             TTI->getMemoryOpCost(<wbr class="">Instruction::Load, ScalarTy, alignment, 0);<br class="">
       int VecLdCost = TTI->getMemoryOpCost(<wbr class="">Instruction::Load,<br class="">
                                            VecTy, alignment, 0);<br class="">
-      if (E->NeedToShuffle) {<br class="">
+      if (!E->ShuffleMask.empty()) {<br class="">
         VecLdCost += TTI->getShuffleCost(<br class="">
             TargetTransformInfo::SK_<wbr class="">PermuteSingleSrc, VecTy, 0);<br class="">
       }<br class="">
@@ -2358,8 +2361,9 @@ Value *BoUpSLP::vectorizeTree(<wbr class="">ArrayRef<V<br class="">
   if (ScalarToTreeEntry.count(VL[0]<wbr class="">)) {<br class="">
     int Idx = ScalarToTreeEntry[VL[0]];<br class="">
     TreeEntry *E = &VectorizableTree[Idx];<br class="">
-    if (E->isSame(VL) || (E->NeedToShuffle && E->isFoundJumbled(VL, *DL, *SE)))<br class="">
-      return vectorizeTree(VL, E);<br class="">
+    if (E->isSame(VL) ||<br class="">
+        (!E->ShuffleMask.empty() && E->isFoundJumbled(VL, *DL, *SE)))<br class="">
+      return vectorizeTree(E);<br class="">
   }<br class="">
<br class="">
   Type *ScalarTy = VL[0]->getType();<br class="">
@@ -2370,10 +2374,10 @@ Value *BoUpSLP::vectorizeTree(<wbr class="">ArrayRef<V<br class="">
   return Gather(VL, VecTy);<br class="">
 }<br class="">
<br class="">
-Value *BoUpSLP::vectorizeTree(ArrayRef<Value *> VL, TreeEntry *E) {<br class="">
+Value *BoUpSLP::vectorizeTree(TreeEntry *E) {<br class="">
   IRBuilder<>::InsertPointGuard Guard(Builder);<br class="">
<br class="">
-  if (E->VectorizedValue && !E->NeedToShuffle) {<br class="">
+  if (E->VectorizedValue && E->ShuffleMask.empty()) {<br class="">
     DEBUG(dbgs() << "SLP: Diamond merged for " << *E->Scalars[0] << ".\n");<br class="">
     return E->VectorizedValue;<br class="">
   }<br class="">
@@ -2611,27 +2615,18 @@ Value *BoUpSLP::vectorizeTree(<wbr class="">ArrayRef<V<br class="">
<br class="">
       // As program order of scalar loads are jumbled, the vectorized 'load'<br class="">
       // must be followed by a 'shuffle' with the required jumbled mask.<br class="">
-      if (!VL.empty() && (E->NeedToShuffle)) {<br class="">
-        assert(VL.size() == E->Scalars.size() &&<br class="">
-               "Equal number of scalars expected");<br class="">
+      if (!E->ShuffleMask.empty()) {<br class="">
         SmallVector<Constant *, 8> Mask;<br class="">
-        for (Value *Val : VL) {<br class="">
-          if (ScalarToTreeEntry.count(Val)) {<br class="">
-            int Idx = ScalarToTreeEntry[Val];<br class="">
-            TreeEntry *E = &VectorizableTree[Idx];<br class="">
-            for (unsigned Lane = 0, LE = VL.size(); Lane != LE; ++Lane) {<br class="">
-              if (E->Scalars[Lane] == Val) {<br class="">
-                Mask.push_back(Builder.<wbr class="">getInt32(Lane));<br class="">
-                break;<br class="">
-              }<br class="">
-            }<br class="">
-          }<br class="">
+        for (unsigned Lane = 0, LE = E->ShuffleMask.size(); Lane != LE;<br class="">
+             ++Lane) {<br class="">
+          Mask.push_back(Builder.getInt32(E->ShuffleMask[Lane]));<br class="">
         }<br class="">
-<br class="">
         // Generate shuffle for jumbled memory access<br class="">
         Value *Undef = UndefValue::get(VecTy);<br class="">
         Value *Shuf = Builder.CreateShuffleVector((<wbr class="">Value *)LI, Undef,<br class="">
                                                   ConstantVector::get(Mask));<br class="">
+        E->VectorizedValue = Shuf;<br class="">
+        ++NumVectorInstructions;<br class="">
         return Shuf;<br class="">
       }<br class="">
<br class="">
@@ -2816,7 +2811,7 @@ BoUpSLP::vectorizeTree(<wbr class="">ExtraValueToDebug<br class="">
   }<br class="">
<br class="">
   Builder.SetInsertPoint(&F-><wbr class="">getEntryBlock().front());<br class="">
-  auto *VectorRoot = vectorizeTree(ArrayRef<Value *>(), &VectorizableTree[0]);<br class="">
+  auto *VectorRoot = vectorizeTree(&VectorizableTree[0]);<br class="">
<br class="">
   // If the vectorized tree can be rewritten in a smaller type, we truncate the<br class="">
   // vectorized root. InstCombine will then rewrite the entire expression. We<br class="">
@@ -2861,8 +2856,20 @@ BoUpSLP::vectorizeTree(<wbr class="">ExtraValueToDebug<br class="">
<br class="">
     Value *Vec = E->VectorizedValue;<br class="">
     assert(Vec && "Can't find vectorizable value");<br class="">
-<br class="">
-    Value *Lane = Builder.getInt32(ExternalUse.Lane);<br class="">
+    unsigned i = 0;<br class="">
+    Value *Lane;<br class="">
+    // In case vectorizable scalars use are not in-order, scalars would have<br class="">
+    // been shuffled.Recompute the proper Lane of ExternalUse.<br class="">
+    if (!E->ShuffleMask.empty()) {<br class="">
+      SmallVector<unsigned, 4> Val(E->ShuffleMask.size());<br class="">
+      for (; i < E->ShuffleMask.size(); i++) {<br class="">
+        if (E->ShuffleMask[i] == (unsigned)ExternalUse.Lane)<br class="">
+          break;<br class="">
+      }<br class="">
+      Lane = Builder.getInt32(i);<br class="">
+    } else {<br class="">
+      Lane = Builder.getInt32(ExternalUse.Lane);<br class="">
+    }<br class="">
     // If User == nullptr, the Scalar is used as extra arg. Generate<br class="">
     // ExtractElement instruction and update the record for this scalar in<br class="">
     // ExternallyUsedValues.<br class="">
<br class="">
Added: llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-bug.ll<br class="">
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-bug.ll?rev=296863&view=auto" target="_blank" class="">
http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-bug.ll?rev=296863&view=auto</a><br class="">
==============================================================================<br class="">
--- llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-bug.ll (added)<br class="">
+++ llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-load-bug.ll Fri Mar  3 04:02:47 2017<br class="">
@@ -0,0 +1,43 @@<br class="">
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py<br class="">
+; RUN: opt < %s -S -slp-vectorizer | FileCheck %s<br class="">
+<br class="">
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"<br class="">
+target triple = "x86_64-unknown-linux-gnu"<br class="">
+<br class="">
+define <4 x i32> @zot() #0 {<br class="">
+; CHECK-LABEL: @zot(<br class="">
+; CHECK-NEXT:  bb:<br class="">
+; CHECK-NEXT:    [[P0:%.*]] = getelementptr inbounds [4 x i8], [4 x i8]* undef, i64 undef, i64 0<br class="">
+; CHECK-NEXT:    [[P1:%.*]] = getelementptr inbounds [4 x i8], [4 x i8]* undef, i64 undef, i64 1<br class="">
+; CHECK-NEXT:    [[P2:%.*]] = getelementptr inbounds [4 x i8], [4 x i8]* undef, i64 undef, i64 2<br class="">
+; CHECK-NEXT:    [[P3:%.*]] = getelementptr inbounds [4 x i8], [4 x i8]* undef, i64 undef, i64 3<br class="">
+; CHECK-NEXT:    [[TMP0:%.*]] = bitcast i8* [[P0]] to <4 x i8>*<br class="">
+; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x i8>, <4 x i8>* [[TMP0]], align 1<br class="">
+; CHECK-NEXT:    [[TMP2:%.*]] = shufflevector <4 x i8> [[TMP1]], <4 x i8> undef, <4 x i32> <i32 1, i32 0, i32 2, i32 3><br class="">
+; CHECK-NEXT:    [[TMP3:%.*]] = extractelement <4 x i8> [[TMP2]], i32 0<br class="">
+; CHECK-NEXT:    [[I0:%.*]] = insertelement <4 x i8> undef, i8 [[TMP3]], i32 0<br class="">
+; CHECK-NEXT:    [[TMP4:%.*]] = extractelement <4 x i8> [[TMP2]], i32 1<br class="">
+; CHECK-NEXT:    [[I1:%.*]] = insertelement <4 x i8> [[I0]], i8 [[TMP4]], i32 1<br class="">
+; CHECK-NEXT:    [[TMP5:%.*]] = extractelement <4 x i8> [[TMP2]], i32 2<br class="">
+; CHECK-NEXT:    [[I2:%.*]] = insertelement <4 x i8> [[I1]], i8 [[TMP5]], i32 2<br class="">
+; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <4 x i8> [[TMP2]], i32 3<br class="">
+; CHECK-NEXT:    [[I3:%.*]] = insertelement <4 x i8> [[I2]], i8 [[TMP6]], i32 3<br class="">
+; CHECK-NEXT:    [[RET:%.*]] = zext <4 x i8> [[I3]] to <4 x i32><br class="">
+; CHECK-NEXT:    ret <4 x i32> [[RET]]<br class="">
+;<br class="">
+bb:<br class="">
+  %p0 = getelementptr inbounds [4 x i8], [4 x i8]* undef, i64 undef, i64 0<br class="">
+  %p1 = getelementptr inbounds [4 x i8], [4 x i8]* undef, i64 undef, i64 1<br class="">
+  %p2 = getelementptr inbounds [4 x i8], [4 x i8]* undef, i64 undef, i64 2<br class="">
+  %p3 = getelementptr inbounds [4 x i8], [4 x i8]* undef, i64 undef, i64 3<br class="">
+  %v3 = load i8, i8* %p3, align 1<br class="">
+  %v2 = load i8, i8* %p2, align 1<br class="">
+  %v0 = load i8, i8* %p0, align 1<br class="">
+  %v1 = load i8, i8* %p1, align 1<br class="">
+  %i0 = insertelement <4 x i8> undef, i8 %v1, i32 0<br class="">
+  %i1 = insertelement <4 x i8> %i0, i8 %v0, i32 1<br class="">
+  %i2 = insertelement <4 x i8> %i1, i8 %v2, i32 2<br class="">
+  %i3 = insertelement <4 x i8> %i2, i8 %v3, i32 3<br class="">
+  %ret = zext <4 x i8> %i3 to <4 x i32><br class="">
+  ret <4 x i32> %ret<br class="">
+}<br class="">
<br class="">
Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-same.ll<br class="">
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-same.ll?rev=296863&r1=296862&r2=296863&view=diff" target="_blank" class="">
http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-same.ll?rev=296863&r1=296862&r2=296863&view=diff</a><br class="">
==============================================================================<br class="">
--- llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-same.ll (original)<br class="">
+++ llvm/trunk/test/Transforms/SLPVectorizer/X86/jumbled-same.ll Fri Mar  3 04:02:47 2017<br class="">
@@ -13,7 +13,7 @@ define i32 @fn1() {<br class="">
 ; CHECK-NEXT:    [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([4 x i32]* @b to <4 x i32>*), align 4<br class="">
 ; CHECK-NEXT:    [[TMP1:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> undef, <4 x i32> <i32 1, i32 2, i32 3, i32 0><br class="">
 ; CHECK-NEXT:    [[TMP2:%.*]] = icmp sgt <4 x i32> [[TMP1]], zeroinitializer<br class="">
-; CHECK-NEXT:    [[TMP3:%.*]] = extractelement <4 x i32> [[TMP0]], i32 1<br class="">
+; CHECK-NEXT:    [[TMP3:%.*]] = extractelement <4 x i32> [[TMP1]], i32 0<br class="">
 ; CHECK-NEXT:    [[TMP4:%.*]] = insertelement <4 x i32> undef, i32 [[TMP3]], i32 0<br class="">
 ; CHECK-NEXT:    [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 ptrtoint (i32 ()* @fn1 to i32), i32 1<br class="">
 ; CHECK-NEXT:    [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 ptrtoint (i32 ()* @fn1 to i32), i32 2<br class="">
<br class="">
<br class="">
</p>
</blockquote>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div></div></div>
</div>

</blockquote></div><br class=""></div>
</div></blockquote></div><br class=""></body></html>