<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div><br class=""><blockquote type="cite" class=""><div class=""><div dir="auto" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><div dir="auto" class=""><br class=""></div><div dir="auto" class="">Nirav</div></div><div class="gmail_extra" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><br class=""><div class="gmail_quote">On Mar 16, 2017 16:19, "Aditya Nandakumar" <<a href="mailto:aditya_nandakumar@apple.com" class="">aditya_nandakumar@apple.com</a>> wrote:<br type="attribution" class=""><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><div style="word-wrap: break-word;" class="">Hi Nirav<div class=""><br class=""></div><div class="">This patch is affecting our internal backends (large instruction count regressions). I haven’t completely gone through your patch, but from what I see, the problem seems to be that we don’t handle</div><div class="">descending into TokenFactors (in getStoreMergeCandidates).</div><div class="">I also see a relevant FIXME which matches what I observe as missing. 
I have the relevant DAG dumps from before and after this change below.</div><div class="">Before:</div><div class=""><div class=""><br class=""></div><div class="">             <span class="Apple-converted-space"> </span>t17: i64 = add t6, Constant:i64<4></div><div class="">           <span class="Apple-converted-space"> </span>t18: ch = store<ST1[%dst.gep2.i105.4](<wbr class="">align=2)> t15, Constant:i8<0>, t17, undef:i64</div><div class="">           <span class="Apple-converted-space"> </span>t20: i64 = add t6, Constant:i64<5></div><div class="">         <span class="Apple-converted-space"> </span>t21: ch = store<ST1[%dst.gep2.i105.5]> t18, Constant:i8<0>, t20, undef:i64</div></div><div class=""><br class=""></div><div class="">After:</div><div class="">              t17: i64 = add t6, Constant:i64<4><br class="">            t18: ch = store<ST1[%dst.gep2.i105.4](<wbr class="">align=2)> t15, Constant:i8<0>, t17, undef:i64<br class="">              t20: i64 = add t6, Constant:i64<5><br class="">            t50: ch = store<ST1[%dst.gep2.i105.5]> t0, Constant:i8<0>, t20, undef:i64<br class="">          t51: ch = TokenFactor t18, t50</div><div class=""><br class=""></div><div class="">Clearly we need to handle TokenFactors in getStoreMergeCandidates.</div><div class=""><br class=""></div><div class="">Would it be possible to revert this patch and commit it again once you handle TokenFactors? Do you have an ETA for the TokenFactor handling?</div><div class=""><br class=""></div><div class="">Thanks</div><div class="">Aditya</div><div class=""><div class=""><blockquote type="cite" class=""><div class="">On Mar 13, 2017, at 6:50 PM, Nirav Davé via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org" target="_blank" class="">llvm-commits@lists.llvm.org</a>> wrote:</div><br class="m_-8394487157821945359Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="">Yes. 
It'll be in presently.<br class=""></div><div class=""><br class=""></div><div class="">Thanks, </div><div class=""><br class=""></div><div class="">-Nirav</div><div class=""><br class=""></div><div class="gmail_extra"><div class="gmail_quote">On Mon, Mar 13, 2017 at 9:23 PM, Craig Topper<span class="Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:craig.topper@gmail.com" target="_blank" class="">craig.topper@gmail.com</a>></span><span class="Apple-converted-space"> </span>wrote:</div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><div dir="ltr" class="">Will you also be restoring my fix for i256-add.ll?</div><div class="gmail_extra"><br clear="all" class=""><div class=""><div class="m_-8394487157821945359m_4108594283957026219gmail_signature" data-smartmail="gmail_signature">~Craig</div></div><br class=""><div class="gmail_quote">On Mon, Mar 13, 2017 at 5:34 PM, Nirav Dave via llvm-commits<span class="Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank" class="">llvm-commits@lists.llvm.org</a>></span><span class="Apple-converted-space"> </span>wrote:<br class=""><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;">Author: niravd<br class="">Date: Mon Mar 13 19:34:14 2017<br class="">New Revision: 297695<br class=""><br class="">URL:<span class="Apple-converted-space"> </span><a href="http://llvm.org/viewvc/llvm-project?rev=297695&view=rev" rel="noreferrer" target="_blank" class="">http://llvm.org/viewvc/llvm-pr<wbr class="">oject?rev=297695&view=rev</a><br class="">Log:<br class="">In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.<br class=""><br class="">   <span 
class="Apple-converted-space"> </span>Recommitting with compile-time improvements<br class=""><br class="">   <span class="Apple-converted-space"> </span>Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.<br class=""><br class="">   <span class="Apple-converted-space"> </span>* Simplify Consecutive Merge Store Candidate Search<br class=""><br class="">   <span class="Apple-converted-space"> </span>Now that address aliasing is much less conservative, push through<br class="">   <span class="Apple-converted-space"> </span>simplified store merging search and chain alias analysis which only<br class="">   <span class="Apple-converted-space"> </span>checks for parallel stores through the chain subgraph. This is cleaner<br class="">   <span class="Apple-converted-space"> </span>as it separates non-interfering loads/stores from the<br class="">   <span class="Apple-converted-space"> </span>store-merging logic.<br class=""><br class="">   <span class="Apple-converted-space"> </span>When merging stores, search up the chain through a single load, and<br class="">   <span class="Apple-converted-space"> </span>find all possible stores by looking down through a load and a<br class="">   <span class="Apple-converted-space"> </span>TokenFactor to all stores visited.<br class=""><br class="">   <span class="Apple-converted-space"> </span>This improves the quality of the output SelectionDAG and the output<br class="">   <span class="Apple-converted-space"> </span>Codegen (save perhaps for some ARM cases where we correctly construct<br class="">   <span class="Apple-converted-space"> </span>wider loads, but then promote them to float operations which<br class="">   <span class="Apple-converted-space"> </span>require more expensive constant generation).<br class=""><br class="">   <span class="Apple-converted-space"> </span>Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)<br class=""><br class="">   
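A minimal sketch of the candidate search described above, under stated assumptions: it uses a hypothetical toy chain graph, and the ToyNode type and the chainTo and findMergeCandidates helpers are illustrative names, not LLVM's SDNode API or the actual DAGCombiner code. Starting from a store's chain root, it steps up through at most one load, then collects same-base stores among the root's users and one level further down through a load or a TokenFactor, ordered by offset.<br class="">

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model of a chain graph. Hypothetical names only; this is NOT the
// SelectionDAG/SDNode API.
enum class Kind { Entry, Load, Store, TokenFactor };

struct ToyNode {
  Kind K;
  int Base;                     // base-pointer id (stores only)
  int64_t Offset;               // byte offset from Base (stores only)
  ToyNode *Chain = nullptr;     // incoming chain (loads and stores only)
  std::vector<ToyNode *> Users; // nodes that consume this node's chain

  explicit ToyNode(Kind K, int Base = -1, int64_t Offset = 0)
      : K(K), Base(Base), Offset(Offset) {}
};

// Makes User consume Def's chain. A TokenFactor may have many chain
// operands, so only loads and stores record a single Chain pointer.
void chainTo(ToyNode *User, ToyNode *Def) {
  if (User->K != Kind::TokenFactor)
    User->Chain = Def;
  Def->Users.push_back(User);
}

// Collects stores that share St's base pointer and hang off St's chain
// root, either directly or one level down through a load or TokenFactor.
std::vector<ToyNode *> findMergeCandidates(ToyNode *St) {
  assert(St->K == Kind::Store && St->Chain && "expected a chained store");
  ToyNode *Root = St->Chain;
  // Search up through a single load so that a store chained after a load
  // is compared against stores chained to the load's own chain.
  if (Root->K == Kind::Load && Root->Chain)
    Root = Root->Chain;

  std::vector<ToyNode *> Candidates;
  auto Consider = [&](ToyNode *N) {
    bool SameBase = N->K == Kind::Store && N->Base == St->Base;
    if (SameBase &&
        std::find(Candidates.begin(), Candidates.end(), N) == Candidates.end())
      Candidates.push_back(N);
  };

  for (ToyNode *U : Root->Users) {
    Consider(U);
    // Look one level further down through a load or a TokenFactor.
    if (U->K == Kind::Load || U->K == Kind::TokenFactor)
      for (ToyNode *Next : U->Users)
        Consider(Next);
  }

  // Order by offset so a later step can look for consecutive runs.
  std::stable_sort(Candidates.begin(), Candidates.end(),
                   [](const ToyNode *A, const ToyNode *B) {
                     return A->Offset < B->Offset;
                   });
  return Candidates;
}
```

For instance, with stores at offsets 0 and 1 chained to the entry node, a store at offset 2 chained through a TokenFactor, and a store at offset 3 chained through a load, searching from the store at offset 0 (or from the one behind the load) returns all four in offset order, while a store to a different base is skipped. Searching from the store behind the TokenFactor returns only itself, because its root is the TokenFactor; in this sketch, the walk's shape and depth bound which stores can be merged.<br class=""><br class="">   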
<span class="Apple-converted-space"> </span>Additional Minor Changes:<br class=""><br class="">     <span class="Apple-converted-space"> </span>1. Finishes removing unused AliasLoad code<br class=""><br class="">     <span class="Apple-converted-space"> </span>2. Unifies the chain aggregation in the merged stores across code<br class="">         paths<br class=""><br class="">     <span class="Apple-converted-space"> </span>3. Re-add the Store node to the worklist after calling<br class="">         SimplifyDemandedBits.<br class=""><br class="">     <span class="Apple-converted-space"> </span>4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is<br class="">         arbitrary, but seems sufficient to not cause regressions in<br class="">         tests.<br class=""><br class="">     <span class="Apple-converted-space"> </span>5. Remove Chain dependencies of Memory operations on CopyFromReg<br class="">         nodes as these are captured by data dependence.<br class=""><br class="">     <span class="Apple-converted-space"> </span>6. Forward load-store values through TokenFactors containing<br class="">         <span class="Apple-converted-space"> </span>{CopyToReg,CopyFromReg} Values.<br class=""><br class="">     <span class="Apple-converted-space"> </span>7. Peephole to convert buildvector of extract_vector_elt to<br class="">         extract_subvector if possible (see<br class="">         CodeGen/AArch64/store-merge.l<wbr class="">l)<br class=""><br class="">     <span class="Apple-converted-space"> </span>8. Store merging for the ARM target is restricted to 32-bit as<br class="">         in some contexts invalid 64-bit operations are being<br class="">         generated. 
This can be removed once appropriate checks are<br class="">         added.<br class=""><br class="">   <span class="Apple-converted-space"> </span>This finishes the change Matt Arsenault started in r246307 and<br class="">   <span class="Apple-converted-space"> </span>jyknight's original patch.<br class=""><br class="">   <span class="Apple-converted-space"> </span>Many tests required some changes as memory operations are now<br class="">   <span class="Apple-converted-space"> </span>reorderable, improving load-store forwarding. One test in<br class="">   <span class="Apple-converted-space"> </span>particular is worth noting:<br class=""><br class="">     <span class="Apple-converted-space"> </span>CodeGen/PowerPC/ppc64-align-lo<wbr class="">ng-double.ll - Improved load-store<br class="">     <span class="Apple-converted-space"> </span>forwarding converts a load-store pair into a parallel store and<br class="">     <span class="Apple-converted-space"> </span>a memory-realized bitcast of the same value. However, because we<br class="">     <span class="Apple-converted-space"> </span>lose the sharing of the explicit and implicit store values we<br class="">     <span class="Apple-converted-space"> </span>must create another local store. 
A similar transformation<br class="">     <span class="Apple-converted-space"> </span>happens before SelectionDAG as well.<br class=""><br class="">   <span class="Apple-converted-space"> </span>Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle<br class=""><br class="">Added:<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/pr<wbr class="">32108.ll<br class="">Removed:<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/co<wbr class="">mbiner-aa-0.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/co<wbr class="">mbiner-aa-1.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/pr<wbr class="">18023.ll<br class="">Modified:<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/include/llvm/Target<wbr class="">/TargetLowering.h<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/lib/CodeGen/Selecti<wbr class="">onDAG/DAGCombiner.cpp<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/lib/CodeGen/TargetL<wbr class="">oweringBase.cpp<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/lib/Target/AArch64/<wbr class="">AArch64ISelLowering.cpp<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/lib/Target/ARM/ARMI<wbr class="">SelLowering.h<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AArch6<wbr class="">4/argument-blocks.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AArch6<wbr class="">4/arm64-abi.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AArch6<wbr class="">4/arm64-memset-inline.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AArch6<wbr class="">4/arm64-variadic-aapcs.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AArch6<wbr 
class="">4/merge-store.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AArch6<wbr class="">4/vector_merge_dep_check.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AMDGPU<wbr class="">/debugger-insert-nops.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AMDGPU<wbr class="">/insert_vector_elt.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AMDGPU<wbr class="">/merge-stores.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AMDGPU<wbr class="">/private-element-size.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AMDGPU<wbr class="">/si-triv-disjoint-mem-access.l<wbr class="">l<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/ARM/20<wbr class="">12-10-04-AAPCS-byval-align8.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/ARM/al<wbr class="">loc-no-stack-realign.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/ARM/gp<wbr class="">r-paired-spill.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/ARM/if<wbr class="">cvt10.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/ARM/il<wbr class="">legal-bitfield-loadstore.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/ARM/st<wbr class="">atic-addr-hoisting.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/BPF/un<wbr class="">def.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/MSP430<wbr class="">/Inst16mm.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/c<wbr class="">conv/arguments-float.ll<br class="">   <span class="Apple-converted-space"> 
</span>llvm/trunk/test/CodeGen/Mips/c<wbr class="">conv/arguments-varargs.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/f<wbr class="">astcc.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/l<wbr class="">oad-store-left-right.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/m<wbr class="">icromips-li.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/m<wbr class="">ips64-f128-call.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/m<wbr class="">ips64-f128.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/m<wbr class="">no-ldc1-sdc1.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/m<wbr class="">sa/f16-llvm-ir.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/m<wbr class="">sa/i5_ld_st.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/o<wbr class="">32_cc_byval.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/o<wbr class="">32_cc_vararg.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/PowerP<wbr class="">C/anon_aggr.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/PowerP<wbr class="">C/complex-return.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/PowerP<wbr class="">C/jaggedstructs.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/PowerP<wbr class="">C/ppc64-align-long-double.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/PowerP<wbr class="">C/structsinmem.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/PowerP<wbr 
class="">C/structsinregs.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/System<wbr class="">Z/unaligned-01.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Thumb/<wbr class="">2010-07-15-debugOrdering.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Thumb/<wbr class="">stack-access.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/20<wbr class="">10-09-17-SideEffectsInChain.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/20<wbr class="">12-11-28-merge-store-alias.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/Me<wbr class="">rgeConsecutiveStores.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/av<wbr class="">x-vbroadcast.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/av<wbr class="">x512-mask-op.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/ch<wbr class="">ain_order.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/cl<wbr class="">ear_upper_vector_element_bits.<wbr class="">ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/co<wbr class="">py-eflags.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/da<wbr class="">g-merge-fast-accesses.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/do<wbr class="">nt-trunc-store-double-to-float<wbr class="">.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/ex<wbr class="">tractelement-legalization-stor<wbr class="">e-ordering.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/i2<wbr class="">56-add.ll<br class="">   <span 
class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/i3<wbr class="">86-shrink-wrapping.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/li<wbr class="">ve-range-nosubreg.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/lo<wbr class="">nglong-deadload.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/me<wbr class="">rge-consecutive-loads-128.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/me<wbr class="">rge-consecutive-loads-256.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/me<wbr class="">rge-store-partially-alias-load<wbr class="">s.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/sp<wbr class="">lit-store.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/st<wbr class="">ores-merging.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/ve<wbr class="">ctor-compare-results.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/ve<wbr class="">ctor-shuffle-variable-128.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/ve<wbr class="">ctor-shuffle-variable-256.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/ve<wbr class="">ctorcall.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/wi<wbr class="">n32-eh.ll<br class="">   <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/XCore/<wbr class="">varargs.ll<br class=""><br class="">Modified: llvm/trunk/include/llvm/Target<wbr class="">/TargetLowering.h<br class="">URL:<span class="Apple-converted-space"> </span><a 
href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetLowering.h?rev=297695&r1=297694&r2=297695&view=diff" rel="noreferrer" target="_blank" class="">http://llvm.org/viewvc/llvm-pr<wbr class="">oject/llvm/trunk/include/llvm/<wbr class="">Target/TargetLowering.h?rev=29<wbr class="">7695&r1=297694&r2=297695&view=<wbr class="">diff</a><br class="">==============================<wbr class="">==============================<wbr class="">==================<br class="">--- llvm/trunk/include/llvm/Target<wbr class="">/TargetLowering.h (original)<br class="">+++ llvm/trunk/include/llvm/Target<wbr class="">/TargetLowering.h Mon Mar 13 19:34:14 2017<br class="">@@ -363,6 +363,9 @@ public:<br class="">     return false;<br class="">   }<br class=""><br class="">+  /// Returns if it's reasonable to merge stores to MemVT size.<br class="">+  virtual bool canMergeStoresTo(EVT MemVT) const { return true; }<br class="">+<br class="">   /// \brief Return true if it is cheap to speculate a call to intrinsic cttz.<br class="">   virtual bool isCheapToSpeculateCttz() const {<br class="">     return false;<br class=""><br class="">Modified: llvm/trunk/lib/CodeGen/Selecti<wbr class="">onDAG/DAGCombiner.cpp<br class="">URL:<span class="Apple-converted-space"> </span><a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=297695&r1=297694&r2=297695&view=diff" rel="noreferrer" target="_blank" class="">http://llvm.org/viewvc/llvm-pr<wbr class="">oject/llvm/trunk/lib/CodeGen/S<wbr class="">electionDAG/DAGCombiner.cpp?re<wbr class="">v=297695&r1=297694&r2=297695&<wbr class="">view=diff</a><br class="">==============================<wbr class="">==============================<wbr class="">==================<br class="">--- llvm/trunk/lib/CodeGen/Selecti<wbr class="">onDAG/DAGCombiner.cpp (original)<br class="">+++ llvm/trunk/lib/CodeGen/Selecti<wbr class="">onDAG/DAGCombiner.cpp Mon Mar 13 19:34:14 2017<br class="">@@ -53,10 
+53,6 @@ STATISTIC(SlicedLoads, "Number of load s<br class=""><br class=""> namespace {<br class="">   static cl::opt<bool><br class="">-    CombinerAA("combiner-alias-ana<wbr class="">lysis", cl::Hidden,<br class="">-               cl::desc("Enable DAG combiner alias-analysis heuristics"));<br class="">-<br class="">-  static cl::opt<bool><br class="">     CombinerGlobalAA("combiner-gl<wbr class="">obal-alias-analysis", cl::Hidden,<br class="">               <span class="Apple-converted-space"> </span>cl::desc("Enable DAG combiner's use of IR alias analysis"));<br class=""><br class="">@@ -133,6 +129,9 @@ namespace {<br class="">     /// Add to the worklist making sure its instance is at the back (next to be<br class="">     /// processed.)<br class="">     void AddToWorklist(SDNode *N) {<br class="">+      assert(N->getOpcode() != ISD::DELETED_NODE &&<br class="">+             "Deleted Node added to Worklist");<br class="">+<br class="">       // Skip handle nodes as they can't usefully be combined and confuse the<br class="">       // zero-use deletion strategy.<br class="">       if (N->getOpcode() == ISD::HANDLENODE)<br class="">@@ -177,6 +176,7 @@ namespace {<br class="">     void CommitTargetLoweringOpt(const TargetLowering::TargetLowering<wbr class="">Opt &TLO);<br class=""><br class="">   private:<br class="">+    unsigned MaximumLegalStoreInBits;<br class=""><br class="">     /// Check the specified integer node value to see if it can be simplified or<br class="">     /// if things it uses can be simplified by bit propagation.<br class="">@@ -422,15 +422,12 @@ namespace {<br class="">     /// Holds a pointer to an LSBaseSDNode as well as information on where it<br class="">     /// is located in a sequence of memory operations connected by a chain.<br class="">     struct MemOpLink {<br class="">-      MemOpLink (LSBaseSDNode *N, int64_t Offset, unsigned Seq):<br class="">-      MemNode(N), OffsetFromBase(Offset), SequenceNum(Seq) { }<br class="">+      
MemOpLink(LSBaseSDNode *N, int64_t Offset)<br class="">+          : MemNode(N), OffsetFromBase(Offset) {}<br class="">       // Ptr to the mem node.<br class="">       LSBaseSDNode *MemNode;<br class="">       // Offset from the base ptr.<br class="">       int64_t OffsetFromBase;<br class="">-      // What is the sequence number of this mem node.<br class="">-      // Lowest mem operand in the DAG starts at zero.<br class="">-      unsigned SequenceNum;<br class="">     };<br class=""><br class="">     /// This is a helper function for visitMUL to check the profitability<br class="">@@ -441,12 +438,6 @@ namespace {<br class="">                                     <span class="Apple-converted-space"> </span>SDValue &AddNode,<br class="">                                     <span class="Apple-converted-space"> </span>SDValue &ConstNode);<br class=""><br class="">-    /// This is a helper function for MergeStoresOfConstantsOrVecElt<wbr class="">s. Returns a<br class="">-    /// constant build_vector of the stored constant values in Stores.<br class="">-    SDValue getMergedConstantVectorStore(S<wbr class="">electionDAG &DAG, const SDLoc &SL,<br class="">-                                         ArrayRef<MemOpLink> Stores,<br class="">-                                         SmallVectorImpl<SDValue> &Chains,<br class="">-                                         EVT Ty) const;<br class=""><br class="">     /// This is a helper function for visitAND and visitZERO_EXTEND.  Returns<br class="">     /// true if the (and (load x) c) pattern matches an extload.  ExtVT returns<br class="">@@ -460,18 +451,15 @@ namespace {<br class="">     /// This is a helper function for MergeConsecutiveStores. 
When the source<br class="">     /// elements of the consecutive stores are all constants or all extracted<br class="">     /// vector elements, try to merge them into one larger store.<br class="">-    /// \return number of stores that were merged into a merged store (always<br class="">-    /// a prefix of \p StoreNode).<br class="">-    bool MergeStoresOfConstantsOrVecElt<wbr class="">s(<br class="">-        SmallVectorImpl<MemOpLink> &StoreNodes, EVT MemVT, unsigned NumStores,<br class="">-        bool IsConstantSrc, bool UseVector);<br class="">+    /// \return True if a merged store was created.<br class="">+    bool MergeStoresOfConstantsOrVecElt<wbr class="">s(SmallVectorImpl<MemOpLink> &StoreNodes,<br class="">+                                         EVT MemVT, unsigned NumStores,<br class="">+                                         bool IsConstantSrc, bool UseVector);<br class=""><br class="">     /// This is a helper function for MergeConsecutiveStores.<br class="">     /// Stores that may be merged are placed in StoreNodes.<br class="">-    /// Loads that may alias with those stores are placed in AliasLoadNodes.<br class="">-    void getStoreMergeAndAliasCandidate<wbr class="">s(<br class="">-        StoreSDNode* St, SmallVectorImpl<MemOpLink> &StoreNodes,<br class="">-        SmallVectorImpl<LSBaseSDNode*> &AliasLoadNodes);<br class="">+    void getStoreMergeCandidates(StoreS<wbr class="">DNode *St,<br class="">+                                 SmallVectorImpl<MemOpLink> &StoreNodes);<br class=""><br class="">     /// Helper function for MergeConsecutiveStores. 
Checks if<br class="">     /// Candidate stores have indirect dependency through their<br class="">@@ -483,8 +471,7 @@ namespace {<br class="">     /// This optimization uses wide integers or vectors when possible.<br class="">     /// \return number of stores that were merged into a merged store (the<br class="">     /// affected nodes are stored as a prefix in \p StoreNodes).<br class="">-    bool MergeConsecutiveStores(StoreSD<wbr class="">Node *N,<br class="">-                                SmallVectorImpl<MemOpLink> &StoreNodes);<br class="">+    bool MergeConsecutiveStores(StoreSD<wbr class="">Node *N);<br class=""><br class="">     /// \brief Try to transform a truncation where C is a constant:<br class="">     ///     (trunc (and X, C)) -> (and (trunc X), (trunc C))<br class="">@@ -499,6 +486,13 @@ namespace {<br class="">         : DAG(D), TLI(D.getTargetLoweringInfo())<wbr class="">, Level(BeforeLegalizeTypes),<br class="">           OptLevel(OL), LegalOperations(false), LegalTypes(false), AA(A) {<br class="">       ForCodeSize = DAG.getMachineFunction().getFu<wbr class="">nction()->optForSize();<br class="">+<br class="">+      MaximumLegalStoreInBits = 0;<br class="">+      for (MVT VT : MVT::all_valuetypes())<br class="">+        if (EVT(VT).isSimple() && VT != MVT::Other &&<br class="">+            TLI.isTypeLegal(EVT(VT)) &&<br class="">+            VT.getSizeInBits() >= MaximumLegalStoreInBits)<br class="">+          MaximumLegalStoreInBits = VT.getSizeInBits();<br class="">     }<br class=""><br class="">     /// Runs the dag combiner on all nodes in the work list<br class="">@@ -1589,7 +1583,7 @@ SDValue DAGCombiner::visitTokenFactor(<wbr class="">SD<br class="">   }<br class=""><br class="">   SmallVector<SDNode *, 8> TFs;     // List of token factors to visit.<br class="">-  SmallVector<SDValue, 8> Ops;    // Ops for replacing token factor.<br class="">+  SmallVector<SDValue, 8> Ops;      // Ops for replacing token factor.<br class="">   
SmallPtrSet<SDNode*, 16> SeenOps;<br class="">   bool Changed = false;             // If we should replace this token factor.<br class=""><br class="">@@ -1633,6 +1627,86 @@ SDValue DAGCombiner::visitTokenFactor(<wbr class="">SD<br class="">     }<br class="">   }<br class=""><br class="">+  // Remove Nodes that are chained to another node in the list. Do so<br class="">+  // by walking up chains breath-first stopping when we've seen<br class="">+  // another operand. In general we must climb to the EntryNode, but we can exit<br class="">+  // early if we find all remaining work is associated with just one operand as<br class="">+  // no further pruning is possible.<br class="">+<br class="">+  // List of nodes to search through and original Ops from which they originate.<br class="">+  SmallVector<std::pair<SDNode *, unsigned>, 8> Worklist;<br class="">+  SmallVector<unsigned, 8> OpWorkCount; // Count of work for each Op.<br class="">+  SmallPtrSet<SDNode *, 16> SeenChains;<br class="">+  bool DidPruneOps = false;<br class="">+<br class="">+  unsigned NumLeftToConsider = 0;<br class="">+  for (const SDValue &Op : Ops) {<br class="">+    Worklist.push_back(std::make_p<wbr class="">air(Op.getNode(), NumLeftToConsider++));<br class="">+    OpWorkCount.push_back(1);<br class="">+  }<br class="">+<br class="">+  auto AddToWorklist = [&](unsigned CurIdx, SDNode *Op, unsigned OpNumber) {<br class="">+    // If this is an Op, we can remove the op from the list. 
Remark any<br class="">+    // search associated with it as from the current OpNumber.<br class="">+    if (SeenOps.count(Op) != 0) {<br class="">+      Changed = true;<br class="">+      DidPruneOps = true;<br class="">+      unsigned OrigOpNumber = 0;<br class="">+      while (Ops[OrigOpNumber].getNode() != Op && OrigOpNumber < Ops.size())<br class="">+        OrigOpNumber++;<br class="">+      assert((OrigOpNumber != Ops.size()) &&<br class="">+             "expected to find TokenFactor Operand");<br class="">+      // Re-mark worklist from OrigOpNumber to OpNumber<br class="">+      for (unsigned i = CurIdx + 1; i < Worklist.size(); ++i) {<br class="">+        if (Worklist[i].second == OrigOpNumber) {<br class="">+          Worklist[i].second = OpNumber;<br class="">+        }<br class="">+      }<br class="">+      OpWorkCount[OpNumber] += OpWorkCount[OrigOpNumber];<br class="">+      OpWorkCount[OrigOpNumber] = 0;<br class="">+      NumLeftToConsider--;<br class="">+    }<br class="">+    // Add if it's a new chain<br class="">+    if (SeenChains.insert(Op).second) {<br class="">+      OpWorkCount[OpNumber]++;<br class="">+      Worklist.push_back(std::make_p<wbr class="">air(Op, OpNumber));<br class="">+    }<br class="">+  };<br class="">+<br class="">+  for (unsigned i = 0; i < Worklist.size() && i < 1024; ++i) {<br class="">+    // We need at least be consider at least 2 Ops to prune.<br class="">+    if (NumLeftToConsider <= 1)<br class="">+      break;<br class="">+    auto CurNode = Worklist[i].first;<br class="">+    auto CurOpNumber = Worklist[i].second;<br class="">+    assert((OpWorkCount[CurOpNumbe<wbr class="">r] > 0) &&<br class="">+           "Node should not appear in worklist");<br class="">+    switch (CurNode->getOpcode()) {<br class="">+    case ISD::EntryToken:<br class="">+      // Hitting EntryToken is the only way for the search to terminate without<br class="">+      // hitting<br class="">+      // another operand's search. 
Prevent us from marking this operand<br class="">+      // as considered.<br class="">+      NumLeftToConsider++;<br class="">+      break;<br class="">+    case ISD::TokenFactor:<br class="">+      for (const SDValue &Op : CurNode->op_values())<br class="">+        AddToWorklist(i, Op.getNode(), CurOpNumber);<br class="">+      break;<br class="">+    case ISD::CopyFromReg:<br class="">+    case ISD::CopyToReg:<br class="">+      AddToWorklist(i, CurNode->getOperand(0).getNode<wbr class="">(), CurOpNumber);<br class="">+      break;<br class="">+    default:<br class="">+      if (auto *MemNode = dyn_cast<MemSDNode>(CurNode))<br class="">+        AddToWorklist(i, MemNode->getChain().getNode(), CurOpNumber);<br class="">+      break;<br class="">+    }<br class="">+    OpWorkCount[CurOpNumber]--;<br class="">+    if (OpWorkCount[CurOpNumber] == 0)<br class="">+      NumLeftToConsider--;<br class="">+  }<br class="">+<br class="">   SDValue Result;<br class=""><br class="">   // If we've changed things around then replace token factor.<br class="">@@ -1641,15 +1715,22 @@ SDValue DAGCombiner::visitTokenFactor(<wbr class="">SD<br class="">       // The entry token is the only possible outcome.<br class="">       Result = DAG.getEntryNode();<br class="">     } else {<br class="">-      // New and improved token factor.<br class="">-      Result = DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, Ops);<br class="">+      if (DidPruneOps) {<br class="">+        SmallVector<SDValue, 8> PrunedOps;<br class="">+        // Keep only operands not reached while walking another operand's chain.<br class="">+        for (const SDValue &Op : Ops) {<br class="">+          if (SeenChains.count(Op.getNode()<wbr class="">) == 0)<br class="">+            PrunedOps.push_back(Op);<br class="">+        }<br class="">+        Result = DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, PrunedOps);<br class="">+      } else {<br class="">+        Result = DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, Ops);<br class="">+      }<br class="">     }<br
class=""><br class="">-    // Add users to worklist if AA is enabled, since it may introduce<br class="">-    // a lot of new chained token factors while removing memory deps.<br class="">-    bool UseAA = CombinerAA.getNumOccurrences() > 0 ? CombinerAA<br class="">-      : DAG.getSubtarget().useAA();<br class="">-    return CombineTo(N, Result, UseAA /*add to worklist*/);<br class="">+    // Add users to worklist, since we may introduce a lot of new<br class="">+    // chained token factors while removing memory deps.<br class="">+    return CombineTo(N, Result, true /*add to worklist*/);<br class="">   }<br class=""><br class="">   return Result;<br class="">@@ -6792,6 +6873,9 @@ SDValue DAGCombiner::CombineExtLoad(SD<wbr class="">No<br class="">   SDValue NewChain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, Chains);<br class="">   SDValue NewValue = DAG.getNode(ISD::CONCAT_VECTOR<wbr class="">S, DL, DstVT, Loads);<br class=""><br class="">+  // Simplify TF.<br class="">+  AddToWorklist(NewChain.getNode<wbr class="">());<br class="">+<br class="">   CombineTo(N, NewValue);<br class=""><br class="">   // Replace uses of the original load (before extension)<br class="">@@ -10947,7 +11031,7 @@ SDValue DAGCombiner::visitLOAD(SDNode *N<br class="">               dbgs() << "\n");<br class="">         WorklistRemover DeadNodes(*this);<br class="">         DAG.ReplaceAllUsesOfValueWith<wbr class="">(SDValue(N, 1), Chain);<br class="">-<br class="">+        AddUsersToWorklist(Chain.getNo<wbr class="">de());<br class="">         if (N->use_empty())<br class="">           deleteAndRecombine(N);<br class=""><br class="">@@ -11000,7 +11084,7 @@ SDValue DAGCombiner::visitLOAD(SDNode *N<br class="">       StoreSDNode *PrevST = cast<StoreSDNode>(Chain);<br class="">       if (PrevST->getBasePtr() == Ptr &&<br class="">           PrevST->getValue().getValueTy<wbr class="">pe() == N->getValueType(0))<br class="">-      return CombineTo(N, Chain.getOperand(1), Chain);<br 
class="">+        return CombineTo(N, PrevST->getOperand(1), Chain);<br class="">     }<br class="">   }<br class=""><br class="">@@ -11018,14 +11102,7 @@ SDValue DAGCombiner::visitLOAD(SDNode *N<br class="">     }<br class="">   }<br class=""><br class="">-  bool UseAA = CombinerAA.getNumOccurrences() > 0 ? CombinerAA<br class="">-                                                  : DAG.getSubtarget().useAA();<br class="">-#ifndef NDEBUG<br class="">-  if (CombinerAAOnlyFunc.getNumOccu<wbr class="">rrences() &&<br class="">-      CombinerAAOnlyFunc != DAG.getMachineFunction().getNa<wbr class="">me())<br class="">-    UseAA = false;<br class="">-#endif<br class="">-  if (UseAA && LD->isUnindexed()) {<br class="">+  if (LD->isUnindexed()) {<br class="">     // Walk up chain skipping non-aliasing memory nodes.<br class="">     SDValue BetterChain = FindBetterChain(N, Chain);<br class=""><br class="">@@ -11607,6 +11684,7 @@ bool DAGCombiner::SliceUpLoad(SDNod<wbr class="">e *N)<br class="">   SDValue Chain = DAG.getNode(ISD::TokenFactor, SDLoc(LD), MVT::Other,<br class="">                               ArgChains);<br class="">   DAG.ReplaceAllUsesOfValueWith<wbr class="">(SDValue(N, 1), Chain);<br class="">+  AddToWorklist(Chain.getNode())<wbr class="">;<br class="">   return true;<br class=""> }<br class=""><br class="">@@ -12000,20 +12078,6 @@ bool DAGCombiner::isMulAddWithConst<wbr class="">Profi<br class="">   return false;<br class=""> }<br class=""><br class="">-SDValue DAGCombiner::getMergedConstant<wbr class="">VectorStore(<br class="">-    SelectionDAG &DAG, const SDLoc &SL, ArrayRef<MemOpLink> Stores,<br class="">-    SmallVectorImpl<SDValue> &Chains, EVT Ty) const {<br class="">-  SmallVector<SDValue, 8> BuildVector;<br class="">-<br class="">-  for (unsigned I = 0, E = Ty.getVectorNumElements(); I != E; ++I) {<br class="">-    StoreSDNode *St = cast<StoreSDNode>(Stores[I].Me<wbr class="">mNode);<br class="">-    Chains.push_back(St->getChain(<wbr 
class="">));<br class="">-    BuildVector.push_back(St->getV<wbr class="">alue());<br class="">-  }<br class="">-<br class="">-  return DAG.getBuildVector(Ty, SL, BuildVector);<br class="">-}<br class="">-<br class=""> bool DAGCombiner::MergeStoresOfCons<wbr class="">tantsOrVecElts(<br class="">                   SmallVectorImpl<MemOpLink> &StoreNodes, EVT MemVT,<br class="">                   unsigned NumStores, bool IsConstantSrc, bool UseVector) {<br class="">@@ -12022,22 +12086,8 @@ bool DAGCombiner::MergeStoresOfCons<wbr class="">tants<br class="">     return false;<br class=""><br class="">   int64_t ElementSizeBytes = MemVT.getSizeInBits() / 8;<br class="">-  LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode;<br class="">-  unsigned LatestNodeUsed = 0;<br class="">-<br class="">-  for (unsigned i=0; i < NumStores; ++i) {<br class="">-    // Find a chain for the new wide-store operand. Notice that some<br class="">-    // of the store nodes that we found may not be selected for inclusion<br class="">-    // in the wide store. 
The chain we use needs to be the chain of the<br class="">-    // latest store node which is *used* and replaced by the wide store.<br class="">-    if (StoreNodes[i].SequenceNum < StoreNodes[LatestNodeUsed].Seq<wbr class="">uenceNum)<br class="">-      LatestNodeUsed = i;<br class="">-  }<br class="">-<br class="">-  SmallVector<SDValue, 8> Chains;<br class=""><br class="">   // The latest Node in the DAG.<br class="">-  LSBaseSDNode *LatestOp = StoreNodes[LatestNodeUsed].Mem<wbr class="">Node;<br class="">   SDLoc DL(StoreNodes[0].MemNode);<br class=""><br class="">   SDValue StoredVal;<br class="">@@ -12053,7 +12103,18 @@ bool DAGCombiner::MergeStoresOfCons<wbr class="">tants<br class="">     assert(TLI.isTypeLegal(Ty) && "Illegal vector store");<br class=""><br class="">     if (IsConstantSrc) {<br class="">-      StoredVal = getMergedConstantVectorStore(D<wbr class="">AG, DL, StoreNodes, Chains, Ty);<br class="">+      SmallVector<SDValue, 8> BuildVector;<br class="">+      for (unsigned I = 0, E = Ty.getVectorNumElements(); I != E; ++I) {<br class="">+        StoreSDNode *St = cast<StoreSDNode>(StoreNodes[I<wbr class="">].MemNode);<br class="">+        SDValue Val = St->getValue();<br class="">+        if (MemVT.getScalarType().isInteg<wbr class="">er())<br class="">+          if (auto *CFP = dyn_cast<ConstantFPSDNode>(St-<wbr class="">>getValue()))<br class="">+            Val = DAG.getConstant(<br class="">+                (uint32_t)CFP->getValueAPF().b<wbr class="">itcastToAPInt().getZExtValue()<wbr class="">,<br class="">+                SDLoc(CFP), MemVT);<br class="">+        BuildVector.push_back(Val);<br class="">+      }<br class="">+      StoredVal = DAG.getBuildVector(Ty, DL, BuildVector);<br class="">     } else {<br class="">       SmallVector<SDValue, 8> Ops;<br class="">       for (unsigned i = 0; i < NumStores; ++i) {<br class="">@@ -12063,7 +12124,6 @@ bool DAGCombiner::MergeStoresOfCons<wbr class="">tants<br class="">         if 
(Val.getValueType() != MemVT)<br class="">           return false;<br class="">         Ops.push_back(Val);<br class="">-        Chains.push_back(St->getChain(<wbr class="">));<br class="">       }<br class=""><br class="">       // Build the extracted vector elements back into a vector.<br class="">@@ -12083,7 +12143,6 @@ bool DAGCombiner::MergeStoresOfCons<wbr class="">tants<br class="">     for (unsigned i = 0; i < NumStores; ++i) {<br class="">       unsigned Idx = IsLE ? (NumStores - 1 - i) : i;<br class="">       StoreSDNode *St  = cast<StoreSDNode>(StoreNodes[I<wbr class="">dx].MemNode);<br class="">-      Chains.push_back(St->getChain(<wbr class="">));<br class=""><br class="">       SDValue Val = St->getValue();<br class="">       StoreInt <<= ElementSizeBytes * 8;<br class="">@@ -12101,54 +12160,36 @@ bool DAGCombiner::MergeStoresOfCons<wbr class="">tants<br class="">     StoredVal = DAG.getConstant(StoreInt, DL, StoreTy);<br class="">   }<br class=""><br class="">-  assert(!Chains.empty());<br class="">+  SmallVector<SDValue, 8> Chains;<br class="">+<br class="">+  // Gather all Chains we're inheriting. 
As generally all chains are<br class="">+  // equal, do minor check to remove obvious redundancies.<br class="">+  Chains.push_back(StoreNodes[0]<wbr class="">.MemNode->getChain());<br class="">+  for (unsigned i = 1; i < NumStores; ++i)<br class="">+    if (StoreNodes[0].MemNode->getCha<wbr class="">in() != StoreNodes[i].MemNode->getChai<wbr class="">n())<br class="">+      Chains.push_back(StoreNodes[i]<wbr class="">.MemNode->getChain());<br class=""><br class="">+  LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode;<br class="">   SDValue NewChain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, Chains);<br class="">   SDValue NewStore = DAG.getStore(NewChain, DL, StoredVal,<br class="">                                   FirstInChain->getBasePtr(),<br class="">                                   FirstInChain->getPointerInfo(<wbr class="">),<br class="">                                   FirstInChain->getAlignment())<wbr class="">;<br class=""><br class="">-  bool UseAA = CombinerAA.getNumOccurrences() > 0 ? CombinerAA<br class="">-                                                  : DAG.getSubtarget().useAA();<br class="">-  if (UseAA) {<br class="">-    // Replace all merged stores with the new store.<br class="">-    for (unsigned i = 0; i < NumStores; ++i)<br class="">-      CombineTo(StoreNodes[i].MemNod<wbr class="">e, NewStore);<br class="">-  } else {<br class="">-    // Replace the last store with the new store.<br class="">-    CombineTo(LatestOp, NewStore);<br class="">-    // Erase all other stores.<br class="">-    for (unsigned i = 0; i < NumStores; ++i) {<br class="">-      if (StoreNodes[i].MemNode == LatestOp)<br class="">-        continue;<br class="">-      StoreSDNode *St = cast<StoreSDNode>(StoreNodes[i<wbr class="">].MemNode);<br class="">-      // ReplaceAllUsesWith will replace all uses that existed when it was<br class="">-      // called, but graph optimizations may cause new ones to appear. 
For<br class="">-      // example, the case in pr14333 looks like<br class="">-      //<br class="">-      //  St's chain -> St -> another store -> X<br class="">-      //<br class="">-      // And the only difference from St to the other store is the chain.<br class="">-      // When we change it's chain to be St's chain they become identical,<br class="">-      // get CSEed and the net result is that X is now a use of St.<br class="">-      // Since we know that St is redundant, just iterate.<br class="">-      while (!St->use_empty())<br class="">-        DAG.ReplaceAllUsesWith(SDValue<wbr class="">(St, 0), St->getChain());<br class="">-      deleteAndRecombine(St);<br class="">-    }<br class="">-  }<br class="">+  // Replace all merged stores with the new store.<br class="">+  for (unsigned i = 0; i < NumStores; ++i)<br class="">+    CombineTo(StoreNodes[i].MemNod<wbr class="">e, NewStore);<br class=""><br class="">-  StoreNodes.erase(StoreNodes.begin() + NumStores, StoreNodes.end());<br class="">+  AddToWorklist(NewChain.getNode<wbr class="">());<br class="">   return true;<br class=""> }<br class=""><br class="">-void DAGCombiner::getStoreMergeAndA<wbr class="">liasCandidates(<br class="">-    StoreSDNode* St, SmallVectorImpl<MemOpLink> &StoreNodes,<br class="">-    SmallVectorImpl<LSBaseSDNode*> &AliasLoadNodes) {<br class="">+void DAGCombiner::getStoreMergeCand<wbr class="">idates(<br class="">+    StoreSDNode *St, SmallVectorImpl<MemOpLink> &StoreNodes) {<br class="">   // This holds the base pointer, index, and the offset in bytes from the base<br class="">   // pointer.<br class="">   BaseIndexOffset BasePtr = BaseIndexOffset::match(St->get<wbr class="">BasePtr(), DAG);<br class="">+  EVT MemVT = St->getMemoryVT();<br class=""><br class="">   // We must have a base and an offset.<br class="">   if (!BasePtr.Base.getNode())<br class="">@@ -12158,104 +12199,70 @@ void 
DAGCombiner::getStoreMergeAndA<wbr class="">liasC<br class="">   if (BasePtr.Base.isUndef())<br class="">     return;<br class=""><br class="">-  // Walk up the chain and look for nodes with offsets from the same<br class="">-  // base pointer. Stop when reaching an instruction with a different kind<br class="">-  // or instruction which has a different base pointer.<br class="">-  EVT MemVT = St->getMemoryVT();<br class="">-  unsigned Seq = 0;<br class="">-  StoreSDNode *Index = St;<br class="">-<br class="">-<br class="">-  bool UseAA = CombinerAA.getNumOccurrences() > 0 ? CombinerAA<br class="">-                                                  : DAG.getSubtarget().useAA();<br class="">-<br class="">-  if (UseAA) {<br class="">-    // Look at other users of the same chain. Stores on the same chain do not<br class="">-    // alias. If combiner-aa is enabled, non-aliasing stores are canonicalized<br class="">-    // to be on the same chain, so don't bother looking at adjacent chains.<br class="">-<br class="">-    SDValue Chain = St->getChain();<br class="">-    for (auto I = Chain->use_begin(), E = Chain->use_end(); I != E; ++I) {<br class="">-      if (StoreSDNode *OtherST = dyn_cast<StoreSDNode>(*I)) {<br class="">-        if (I.getOperandNo() != 0)<br class="">-          continue;<br class="">-<br class="">-        if (OtherST->isVolatile() || OtherST->isIndexed())<br class="">-          continue;<br class="">-<br class="">-        if (OtherST->getMemoryVT() != MemVT)<br class="">-          continue;<br class="">-<br class="">-        BaseIndexOffset Ptr = BaseIndexOffset::match(OtherST<wbr class="">->getBasePtr(), DAG);<br class="">-<br class="">-        if (Ptr.equalBaseIndex(BasePtr))<br class="">-          StoreNodes.push_back(MemOpLink<wbr class="">(OtherST, Ptr.Offset, Seq++));<br class="">-      }<br class="">-    }<br class="">-<br class="">-    return;<br class="">-  }<br class="">-<br class="">-  while (Index) {<br class="">-    // If the chain has 
more than one use, then we can't reorder the mem ops.<br class="">-    if (Index != St && !SDValue(Index, 0)->hasOneUse())<br class="">-      break;<br class="">-<br class="">-    // Find the base pointer and offset for this memory node.<br class="">-    BaseIndexOffset Ptr = BaseIndexOffset::match(Index-><wbr class="">getBasePtr(), DAG);<br class="">-<br class="">-    // Check that the base pointer is the same as the original one.<br class="">-    if (!Ptr.equalBaseIndex(BasePtr))<br class="">-      break;<br class="">+  // We are looking for a root node which is an ancestor to all mergable<br class="">+  // stores. We search up through a load, to our root and then down<br class="">+  // through all children. For instance we will find Store{1,2,3} if<br class="">+  // St is Store1, Store2, or Store3 where the root is not a load,<br class="">+  // which is always true for nonvolatile ops. TODO: Expand<br class="">+  // the search to find all valid candidates through multiple layers of loads.<br class="">+  //<br class="">+  // Root<br class="">+  // |-------|-------|<br class="">+  // Load    Load    Store3<br class="">+  // |       |<br class="">+  // Store1   Store2<br class="">+  //<br class="">+  // FIXME: We should be able to climb and<br class="">+  // descend TokenFactors to find candidates as well.<br class=""><br class="">-    // The memory operands must not be volatile.<br class="">-    if (Index->isVolatile() || Index->isIndexed())<br class="">-      break;<br class="">+  SDNode *RootNode = (St->getChain()).getNode();<br class=""><br class="">-    // No truncation.<br class="">-    if (Index->isTruncatingStore())<br class="">-      break;<br class="">+  // Set of Parents of Candidates<br class="">+  std::set<SDNode *> CandidateParents;<br class=""><br class="">-    // The stored memory type must be the same.<br class="">-    if (Index->getMemoryVT() != MemVT)<br class="">-      break;<br class="">-<br class="">-    // We do not allow under-aligned stores in 
order to prevent<br class="">-    // overriding stores. NOTE: this is a bad hack. Alignment SHOULD<br class="">-    // be irrelevant here; what MATTERS is that we not move memory<br class="">-    // operations that potentially overlap past each-other.<br class="">-    if (Index->getAlignment() < MemVT.getStoreSize())<br class="">-      break;<br class="">+  if (LoadSDNode *Ldn = dyn_cast<LoadSDNode>(RootNode)<wbr class="">) {<br class="">+    RootNode = Ldn->getChain().getNode();<br class="">+    for (auto I = RootNode->use_begin(), E = RootNode->use_end(); I != E; ++I)<br class="">+      if (I.getOperandNo() == 0 && isa<LoadSDNode>(*I)) // walk down chain<br class="">+        CandidateParents.insert(*I);<br class="">+  } else<br class="">+    CandidateParents.insert(RootNo<wbr class="">de);<br class=""><br class="">-    // We found a potential memory operand to merge.<br class="">-    StoreNodes.push_back(MemOpLink<wbr class="">(Index, Ptr.Offset, Seq++));<br class="">+  bool IsLoadSrc = isa<LoadSDNode>(St->getValue()<wbr class="">);<br class="">+  bool IsConstantSrc = isa<ConstantSDNode>(St->getVal<wbr class="">ue()) ||<br class="">+                       isa<ConstantFPSDNode>(St->get<wbr class="">Value());<br class="">+  bool IsExtractVecSrc =<br class="">+      (St->getValue().getOpcode() == ISD::EXTRACT_VECTOR_ELT ||<br class="">+       St->getValue().getOpcode() == ISD::EXTRACT_SUBVECTOR);<br class="">+  auto CorrectValueKind = [&](StoreSDNode *Other) -> bool {<br class="">+    if (IsLoadSrc)<br class="">+      return isa<LoadSDNode>(Other->getValu<wbr class="">e());<br class="">+    if (IsConstantSrc)<br class="">+      return (isa<ConstantSDNode>(Other->ge<wbr class="">tValue()) ||<br class="">+              isa<ConstantFPSDNode>(Other->g<wbr class="">etValue()));<br class="">+    if (IsExtractVecSrc)<br class="">+      return (Other->getValue().getOpcode() == ISD::EXTRACT_VECTOR_ELT ||<br class="">+              Other->getValue().getOpcode() == 
ISD::EXTRACT_SUBVECTOR);<br class="">+    return false;<br class="">+  };<br class=""><br class="">-    // Find the next memory operand in the chain. If the next operand in the<br class="">-    // chain is a store then move up and continue the scan with the next<br class="">-    // memory operand. If the next operand is a load save it and use alias<br class="">-    // information to check if it interferes with anything.<br class="">-    SDNode *NextInChain = Index->getChain().getNode();<br class="">-    while (1) {<br class="">-      if (StoreSDNode *STn = dyn_cast<StoreSDNode>(NextInCh<wbr class="">ain)) {<br class="">-        // We found a store node. Use it for the next iteration.<br class="">-        Index = STn;<br class="">-        break;<br class="">-      } else if (LoadSDNode *Ldn = dyn_cast<LoadSDNode>(NextInCha<wbr class="">in)) {<br class="">-        if (Ldn->isVolatile()) {<br class="">-          Index = nullptr;<br class="">-          break;<br class="">+  // check all parents of mergable children<br class="">+  for (auto P = CandidateParents.begin(); P != CandidateParents.end(); ++P)<br class="">+    for (auto I = (*P)->use_begin(), E = (*P)->use_end(); I != E; ++I)<br class="">+      if (I.getOperandNo() == 0)<br class="">+        if (StoreSDNode *OtherST = dyn_cast<StoreSDNode>(*I)) {<br class="">+          if (OtherST->isVolatile() || OtherST->isIndexed())<br class="">+            continue;<br class="">+          // We can merge constant floats to equivalent integers<br class="">+          if (OtherST->getMemoryVT() != MemVT)<br class="">+            if (!(MemVT.isInteger() && MemVT.bitsEq(OtherST->getMemor<wbr class="">yVT()) &&<br class="">+                  isa<ConstantFPSDNode>(OtherST-<wbr class="">>getValue())))<br class="">+              continue;<br class="">+          BaseIndexOffset Ptr =<br class="">+              BaseIndexOffset::match(OtherST<wbr class="">->getBasePtr(), DAG);<br class="">+          if (Ptr.equalBaseIndex(BasePtr) && 
CorrectValueKind(OtherST))<br class="">+            StoreNodes.push_back(MemOpLink<wbr class="">(OtherST, Ptr.Offset));<br class="">         }<br class="">-<br class="">-        // Save the load node for later. Continue the scan.<br class="">-        AliasLoadNodes.push_back(Ldn);<br class="">-        NextInChain = Ldn->getChain().getNode();<br class="">-        continue;<br class="">-      } else {<br class="">-        Index = nullptr;<br class="">-        break;<br class="">-      }<br class="">-    }<br class="">-  }<br class=""> }<br class=""><br class=""> // We need to check that merging these stores does not cause a loop<br class="">@@ -12282,13 +12289,16 @@ bool DAGCombiner::checkMergeStoreCa<wbr class="">ndida<br class="">   return true;<br class=""> }<br class=""><br class="">-bool DAGCombiner::MergeConsecutiveS<wbr class="">tores(<br class="">-    StoreSDNode* St, SmallVectorImpl<MemOpLink> &StoreNodes) {<br class="">+bool DAGCombiner::MergeConsecutiveS<wbr class="">tores(StoreSDNode *St) {<br class="">   if (OptLevel == CodeGenOpt::None)<br class="">     return false;<br class=""><br class="">   EVT MemVT = St->getMemoryVT();<br class="">   int64_t ElementSizeBytes = MemVT.getSizeInBits() / 8;<br class="">+<br class="">+  if (MemVT.getSizeInBits() * 2 > MaximumLegalStoreInBits)<br class="">+    return false;<br class="">+<br class="">   bool NoVectors = DAG.getMachineFunction().getFu<wbr class="">nction()->hasFnAttribute(<br class="">       Attribute::NoImplicitFloat);<br class=""><br class="">@@ -12317,145 +12327,136 @@ bool DAGCombiner::MergeConsecutiveS<wbr class="">tores<br class="">   if (MemVT.isVector() && IsLoadSrc)<br class="">     return false;<br class=""><br class="">-  // Only look at ends of store sequences.<br class="">-  SDValue Chain = SDValue(St, 0);<br class="">-  if (Chain->hasOneUse() && Chain->use_begin()->getOpcode(<wbr class="">) == ISD::STORE)<br class="">-    return false;<br class="">-<br class="">-  // Save the LoadSDNodes 
that we find in the chain.<br class="">-  // We need to make sure that these nodes do not interfere with<br class="">-  // any of the store nodes.<br class="">-  SmallVector<LSBaseSDNode*, 8> AliasLoadNodes;<br class="">-<br class="">-  getStoreMergeAndAliasCandidate<wbr class="">s(St, StoreNodes, AliasLoadNodes);<br class="">+  SmallVector<MemOpLink, 8> StoreNodes;<br class="">+  // Find potential store merge candidates by searching through chain sub-DAG<br class="">+  getStoreMergeCandidates(St, StoreNodes);<br class=""><br class="">   // Check if there is anything to merge.<br class="">   if (StoreNodes.size() < 2)<br class="">     return false;<br class=""><br class="">-  // only do dependence check in AA case<br class="">-  bool UseAA = CombinerAA.getNumOccurrences() > 0 ? CombinerAA<br class="">-                                                  : DAG.getSubtarget().useAA();<br class="">-  if (UseAA && !checkMergeStoreCandidatesForD<wbr class="">ependencies(StoreNodes))<br class="">+  // Check that we can merge these candidates without causing a cycle<br class="">+  if (!checkMergeStoreCandidatesFor<wbr class="">Dependencies(StoreNodes))<br class="">     return false;<br class=""><br class="">   // Sort the memory operands according to their distance from the<br class="">-  // base pointer.  As a secondary criteria: make sure stores coming<br class="">-  // later in the code come first in the list. 
This is important for<br class="">-  // the non-UseAA case, because we're merging stores into the FINAL<br class="">-  // store along a chain which potentially contains aliasing stores.<br class="">-  // Thus, if there are multiple stores to the same address, the last<br class="">-  // one can be considered for merging but not the others.<br class="">+  // base pointer.<br class="">   std::sort(StoreNodes.begin(), StoreNodes.end(),<br class="">             [](MemOpLink LHS, MemOpLink RHS) {<br class="">-    return LHS.OffsetFromBase < RHS.OffsetFromBase ||<br class="">-           (LHS.OffsetFromBase == RHS.OffsetFromBase &&<br class="">-            LHS.SequenceNum < RHS.SequenceNum);<br class="">-  });<br class="">+              return LHS.OffsetFromBase < RHS.OffsetFromBase;<br class="">+            });<br class=""><br class="">   // Scan the memory operations on the chain and find the first non-consecutive<br class="">   // store memory address.<br class="">-  unsigned LastConsecutiveStore = 0;<br class="">+  unsigned NumConsecutiveStores = 0;<br class="">   int64_t StartAddress = StoreNodes[0].OffsetFromBase;<br class="">-  for (unsigned i = 0, e = StoreNodes.size(); i < e; ++i) {<br class="">-<br class="">-    // Check that the addresses are consecutive starting from the second<br class="">-    // element in the list of stores.<br class="">-    if (i > 0) {<br class="">-      int64_t CurrAddress = StoreNodes[i].OffsetFromBase;<br class="">-      if (CurrAddress - StartAddress != (ElementSizeBytes * i))<br class="">-        break;<br class="">-    }<br class=""><br class="">-    // Check if this store interferes with any of the loads that we found.<br class="">-    // If we find a load that alias with this store. 
Stop the sequence.<br class="">-    if (any_of(AliasLoadNodes, [&](LSBaseSDNode *Ldn) {<br class="">-          return isAlias(Ldn, StoreNodes[i].MemNode);<br class="">-        }))<br class="">+  // Check that the addresses are consecutive starting from the second<br class="">+  // element in the list of stores.<br class="">+  for (unsigned i = 1, e = StoreNodes.size(); i < e; ++i) {<br class="">+    int64_t CurrAddress = StoreNodes[i].OffsetFromBase;<br class="">+    if (CurrAddress - StartAddress != (ElementSizeBytes * i))<br class="">       break;<br class="">-<br class="">-    // Mark this node as useful.<br class="">-    LastConsecutiveStore = i;<br class="">+    NumConsecutiveStores = i + 1;<br class="">   }<br class=""><br class="">+  if (NumConsecutiveStores < 2)<br class="">+    return false;<br class="">+<br class="">   // The node with the lowest store address.<br class="">-  LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode;<br class="">-  unsigned FirstStoreAS = FirstInChain->getAddressSpace(<wbr class="">);<br class="">-  unsigned FirstStoreAlign = FirstInChain->getAlignment();<br class="">   LLVMContext &Context = *DAG.getContext();<br class="">   const DataLayout &DL = DAG.getDataLayout();<br class=""><br class="">   // Store the constants into memory as one consecutive store.<br class="">   if (IsConstantSrc) {<br class="">-    unsigned LastLegalType = 0;<br class="">-    unsigned LastLegalVectorType = 0;<br class="">-    bool NonZero = false;<br class="">-    for (unsigned i=0; i<LastConsecutiveStore+1; ++i) {<br class="">-      StoreSDNode *St  = cast<StoreSDNode>(StoreNodes[i<wbr class="">].MemNode);<br class="">-      SDValue StoredVal = St->getValue();<br class="">-<br class="">-      if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Store<wbr class="">dVal)) {<br class="">-        NonZero |= !C->isNullValue();<br class="">-      } else if (ConstantFPSDNode *C = dyn_cast<ConstantFPSDNode>(Sto<wbr class="">redVal)) {<br class="">-        NonZero 
|= !C->getConstantFPValue()->isNu<wbr class="">llValue();<br class="">-      } else {<br class="">-        // Non-constant.<br class="">-        break;<br class="">-      }<br class="">+    bool RV = false;<br class="">+    while (NumConsecutiveStores > 1) {<br class="">+      LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode;<br class="">+      unsigned FirstStoreAS = FirstInChain->getAddressSpace(<wbr class="">);<br class="">+      unsigned FirstStoreAlign = FirstInChain->getAlignment();<br class="">+      unsigned LastLegalType = 0;<br class="">+      unsigned LastLegalVectorType = 0;<br class="">+      bool NonZero = false;<br class="">+      for (unsigned i = 0; i < NumConsecutiveStores; ++i) {<br class="">+        StoreSDNode *ST = cast<StoreSDNode>(StoreNodes[i<wbr class="">].MemNode);<br class="">+        SDValue StoredVal = ST->getValue();<br class="">+<br class="">+        if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Store<wbr class="">dVal)) {<br class="">+          NonZero |= !C->isNullValue();<br class="">+        } else if (ConstantFPSDNode *C =<br class="">+                       dyn_cast<ConstantFPSDNode>(St<wbr class="">oredVal)) {<br class="">+          NonZero |= !C->getConstantFPValue()->isNu<wbr class="">llValue();<br class="">+        } else {<br class="">+          // Non-constant.<br class="">+          break;<br class="">+        }<br class=""><br class="">-      // Find a legal type for the constant store.<br class="">-      unsigned SizeInBits = (i+1) * ElementSizeBytes * 8;<br class="">-      EVT StoreTy = EVT::getIntegerVT(Context, SizeInBits);<br class="">-      bool IsFast;<br class="">-      if (TLI.isTypeLegal(StoreTy) &&<br class="">-          TLI.allowsMemoryAccess(Context<wbr class="">, DL, StoreTy, FirstStoreAS,<br class="">-                                 FirstStoreAlign, &IsFast) && IsFast) {<br class="">-        LastLegalType = i+1;<br class="">-      // Or check whether a truncstore is legal.<br class="">-      } else 
if (TLI.getTypeAction(Context, StoreTy) ==<br class="">-                 TargetLowering::TypePromoteIn<wbr class="">teger) {<br class="">-        EVT LegalizedStoredValueTy =<br class="">-          TLI.getTypeToTransformTo(Conte<wbr class="">xt, StoredVal.getValueType());<br class="">-        if (TLI.isTruncStoreLegal(Legaliz<wbr class="">edStoredValueTy, StoreTy) &&<br class="">-            TLI.allowsMemoryAccess(Context<wbr class="">, DL, LegalizedStoredValueTy,<br class="">-                                   FirstStoreAS, FirstStoreAlign, &IsFast) &&<br class="">+        // Find a legal type for the constant store.<br class="">+        unsigned SizeInBits = (i + 1) * ElementSizeBytes * 8;<br class="">+        EVT StoreTy = EVT::getIntegerVT(Context, SizeInBits);<br class="">+        bool IsFast = false;<br class="">+        if (TLI.isTypeLegal(StoreTy) &&<br class="">+            TLI.allowsMemoryAccess(Context<wbr class="">, DL, StoreTy, FirstStoreAS,<br class="">+                                   FirstStoreAlign, &IsFast) &&<br class="">             IsFast) {<br class="">           LastLegalType = i + 1;<br class="">+          // Or check whether a truncstore is legal.<br class="">+        } else if (TLI.getTypeAction(Context, StoreTy) ==<br class="">+                   TargetLowering::TypePromoteIn<wbr class="">teger) {<br class="">+          EVT LegalizedStoredValueTy =<br class="">+              TLI.getTypeToTransformTo(Conte<wbr class="">xt, StoredVal.getValueType());<br class="">+          if (TLI.isTruncStoreLegal(Legaliz<wbr class="">edStoredValueTy, StoreTy) &&<br class="">+              TLI.allowsMemoryAccess(Context<wbr class="">, DL, LegalizedStoredValueTy,<br class="">+                                     FirstStoreAS, FirstStoreAlign, &IsFast) &&<br class="">+              IsFast) {<br class="">+            LastLegalType = i + 1;<br class="">+          }<br class="">         }<br class="">-      }<br class=""><br class="">-      // We only use 
vectors if the constant is known to be zero or the target<br class="">-      // allows it and the function is not marked with the noimplicitfloat<br class="">-      // attribute.<br class="">-      if ((!NonZero || TLI.storeOfVectorConstantIsChe<wbr class="">ap(MemVT, i+1,<br class="">-                                                        FirstStoreAS)) &&<br class="">-          !NoVectors) {<br class="">-        // Find a legal type for the vector store.<br class="">-        EVT Ty = EVT::getVectorVT(Context, MemVT, i+1);<br class="">-        if (TLI.isTypeLegal(Ty) &&<br class="">-            TLI.allowsMemoryAccess(Context<wbr class="">, DL, Ty, FirstStoreAS,<br class="">-                                   FirstStoreAlign, &IsFast) && IsFast)<br class="">-          LastLegalVectorType = i + 1;<br class="">+        // We only use vectors if the constant is known to be zero or the target<br class="">+        // allows it and the function is not marked with the noimplicitfloat<br class="">+        // attribute.<br class="">+        if ((!NonZero ||<br class="">+             TLI.storeOfVectorConstantIsCh<wbr class="">eap(MemVT, i + 1, FirstStoreAS)) &&<br class="">+            !NoVectors) {<br class="">+          // Find a legal type for the vector store.<br class="">+          EVT Ty = EVT::getVectorVT(Context, MemVT, i + 1);<br class="">+          if (TLI.isTypeLegal(Ty) && TLI.canMergeStoresTo(Ty) &&<br class="">+              TLI.allowsMemoryAccess(Context<wbr class="">, DL, Ty, FirstStoreAS,<br class="">+                                     FirstStoreAlign, &IsFast) &&<br class="">+              IsFast)<br class="">+            LastLegalVectorType = i + 1;<br class="">+        }<br class="">       }<br class="">-    }<br class=""><br class="">-    // Check if we found a legal integer type to store.<br class="">-    if (LastLegalType == 0 && LastLegalVectorType == 0)<br class="">-      return false;<br class="">+      // Check if we found a legal integer type 
that creates a meaningful merge.<br class="">+      if (LastLegalType < 2 && LastLegalVectorType < 2)<br class="">+        break;<br class=""><br class="">-    bool UseVector = (LastLegalVectorType > LastLegalType) && !NoVectors;<br class="">-    unsigned NumElem = UseVector ? LastLegalVectorType : LastLegalType;<br class="">+      bool UseVector = (LastLegalVectorType > LastLegalType) && !NoVectors;<br class="">+      unsigned NumElem = (UseVector) ? LastLegalVectorType : LastLegalType;<br class=""><br class="">-    return MergeStoresOfConstantsOrVecElt<wbr class="">s(StoreNodes, MemVT, NumElem,<br class="">-                                           true, UseVector);<br class="">+      bool Merged = MergeStoresOfConstantsOrVecElt<wbr class="">s(StoreNodes, MemVT, NumElem,<br class="">+                                                    true, UseVector);<br class="">+      if (!Merged)<br class="">+        break;<br class="">+      // Remove merged stores for next iteration.<br class="">+      StoreNodes.erase(StoreNodes.begin(), StoreNodes.begin() + NumElem);<br class="">+      RV = true;<br class="">+      NumConsecutiveStores -= NumElem;<br class="">+    }<br class="">+    return RV;<br class="">   }<br class=""><br class="">   // When extracting multiple vector elements, try to store them<br class="">   // in one vector store rather than a sequence of scalar stores.<br class="">   if (IsExtractVecSrc) {<br class="">+    LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode;<br class="">+    unsigned FirstStoreAS = FirstInChain->getAddressSpace(<wbr class="">);<br class="">+    unsigned FirstStoreAlign = FirstInChain->getAlignment();<br class="">     unsigned NumStoresToMerge = 0;<br class="">     bool IsVec = MemVT.isVector();<br class="">-    for (unsigned i = 0; i < LastConsecutiveStore + 1; ++i) {<br class="">+    for (unsigned i = 0; i < NumConsecutiveStores; ++i) {<br class="">       
StoreSDNode *St  = cast<StoreSDNode>(StoreNodes[i<wbr class="">].MemNode);<br class="">       unsigned StoreValOpcode = St->getValue().getOpcode();<br class="">       // This restriction could be loosened.<br class="">@@ -12495,7 +12496,7 @@ bool DAGCombiner::MergeConsecutiveS<wbr class="">tores<br class="">   // Find acceptable loads. Loads need to have the same chain (token factor),<br class="">   // must not be zext, volatile, indexed, and they must be consecutive.<br class="">   BaseIndexOffset LdBasePtr;<br class="">-  for (unsigned i=0; i<LastConsecutiveStore+1; ++i) {<br class="">+  for (unsigned i = 0; i < NumConsecutiveStores; ++i) {<br class="">     StoreSDNode *St  = cast<StoreSDNode>(StoreNodes[i<wbr class="">].MemNode);<br class="">     LoadSDNode *Ld = dyn_cast<LoadSDNode>(St->getVa<wbr class="">lue());<br class="">     if (!Ld) break;<br class="">@@ -12528,7 +12529,7 @@ bool DAGCombiner::MergeConsecutiveS<wbr class="">tores<br class="">     }<br class=""><br class="">     // We found a potential memory operand to merge.<br class="">-    LoadNodes.push_back(MemOpLink(<wbr class="">Ld, LdPtr.Offset, 0));<br class="">+    LoadNodes.push_back(MemOpLink(<wbr class="">Ld, LdPtr.Offset));<br class="">   }<br class=""><br class="">   if (LoadNodes.size() < 2)<br class="">@@ -12540,7 +12541,9 @@ bool DAGCombiner::MergeConsecutiveS<wbr class="">tores<br class="">   if (LoadNodes.size() == 2 && TLI.hasPairedLoad(MemVT, RequiredAlignment) &&<br class="">       St->getAlignment() >= RequiredAlignment)<br class="">     return false;<br class="">-<br class="">+  LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode;<br class="">+  unsigned FirstStoreAS = FirstInChain->getAddressSpace(<wbr class="">);<br class="">+  unsigned FirstStoreAlign = FirstInChain->getAlignment();<br class="">   LoadSDNode *FirstLoad = cast<LoadSDNode>(LoadNodes[0].<wbr class="">MemNode);<br class="">   unsigned FirstLoadAS = FirstLoad->getAddressSpace();<br class="">   unsigned FirstLoadAlign 
= FirstLoad->getAlignment();<br class="">@@ -12609,30 +12612,19 @@ bool DAGCombiner::MergeConsecutiveS<wbr class="">tores<br class=""><br class="">   // We add +1 here because the LastXXX variables refer to location while<br class="">   // the NumElem refers to array/index size.<br class="">-  unsigned NumElem = std::min(LastConsecutiveStore, LastConsecutiveLoad) + 1;<br class="">+  unsigned NumElem = std::min(NumConsecutiveStores, LastConsecutiveLoad + 1);<br class="">   NumElem = std::min(LastLegalType, NumElem);<br class=""><br class="">   if (NumElem < 2)<br class="">     return false;<br class=""><br class="">-  // Collect the chains from all merged stores.<br class="">+  // Collect the chains from all merged stores. Because the common case<br class="">+  // all chains are the same, check if we match the first Chain.<br class="">   SmallVector<SDValue, 8> MergeStoreChains;<br class="">   MergeStoreChains.push_back(St<wbr class="">oreNodes[0].MemNode->getChain(<wbr class="">));<br class="">-<br class="">-  // The latest Node in the DAG.<br class="">-  unsigned LatestNodeUsed = 0;<br class="">-  for (unsigned i=1; i<NumElem; ++i) {<br class="">-    // Find a chain for the new wide-store operand. Notice that some<br class="">-    // of the store nodes that we found may not be selected for inclusion<br class="">-    // in the wide store. 
The chain we use needs to be the chain of the<br class="">-    // latest store node which is *used* and replaced by the wide store.<br class="">-    if (StoreNodes[i].SequenceNum < StoreNodes[LatestNodeUsed].Seq<wbr class="">uenceNum)<br class="">-      LatestNodeUsed = i;<br class="">-<br class="">-    MergeStoreChains.push_back(Sto<wbr class="">reNodes[i].MemNode->getChain()<wbr class="">);<br class="">-  }<br class="">-<br class="">-  LSBaseSDNode *LatestOp = StoreNodes[LatestNodeUsed].Mem<wbr class="">Node;<br class="">+  for (unsigned i = 1; i < NumElem; ++i)<br class="">+    if (StoreNodes[0].MemNode->getCha<wbr class="">in() != StoreNodes[i].MemNode->getChai<wbr class="">n())<br class="">+      MergeStoreChains.push_back(Sto<wbr class="">reNodes[i].MemNode->getChain()<wbr class="">);<br class=""><br class="">   // Find if it is better to use vectors or integers to load and store<br class="">   // to memory.<br class="">@@ -12656,6 +12648,8 @@ bool DAGCombiner::MergeConsecutiveS<wbr class="">tores<br class="">   SDValue NewStoreChain =<br class="">     DAG.getNode(ISD::TokenFactor, StoreDL, MVT::Other, MergeStoreChains);<br class=""><br class="">+  AddToWorklist(NewStoreChain.ge<wbr class="">tNode());<br class="">+<br class="">   SDValue NewStore =<br class="">       DAG.getStore(NewStoreChain, StoreDL, NewLoad, FirstInChain->getBasePtr(),<br class="">                   <span class="Apple-converted-space"> </span>FirstInChain->getPointerInfo()<wbr class="">, FirstStoreAlign);<br class="">@@ -12667,25 +12661,9 @@ bool DAGCombiner::MergeConsecutiveS<wbr class="">tores<br class="">                                   SDValue(NewLoad.getNode(), 1));<br class="">   }<br class=""><br class="">-  if (UseAA) {<br class="">-    // Replace the all stores with the new store.<br class="">-    for (unsigned i = 0; i < NumElem; ++i)<br class="">-      CombineTo(StoreNodes[i].MemNod<wbr class="">e, NewStore);<br class="">-  } else {<br class="">-    // Replace the last store 
with the new store.<br class="">-    CombineTo(LatestOp, NewStore);<br class="">-    // Erase all other stores.<br class="">-    for (unsigned i = 0; i < NumElem; ++i) {<br class="">-      // Remove all Store nodes.<br class="">-      if (StoreNodes[i].MemNode == LatestOp)<br class="">-        continue;<br class="">-      StoreSDNode *St = cast<StoreSDNode>(StoreNodes[i<wbr class="">].MemNode);<br class="">-      DAG.ReplaceAllUsesOfValueWith(<wbr class="">SDValue(St, 0), St->getChain());<br class="">-      deleteAndRecombine(St);<br class="">-    }<br class="">-  }<br class="">-<br class="">-  StoreNodes.erase(<a href="http://storenodes.be/" target="_blank" class="">StoreNodes.be</a><wbr class="">gin() + NumElem, StoreNodes.end());<br class="">+  // Replace the all stores with the new store.<br class="">+  for (unsigned i = 0; i < NumElem; ++i)<br class="">+    CombineTo(StoreNodes[i].MemNod<wbr class="">e, NewStore);<br class="">   return true;<br class=""> }<br class=""><br class="">@@ -12842,19 +12820,7 @@ SDValue DAGCombiner::visitSTORE(SDNode *<br class="">   if (SDValue NewST = TransformFPLoadStorePair(N))<br class="">     return NewST;<br class=""><br class="">-  bool UseAA = CombinerAA.getNumOccurrences() > 0 ? CombinerAA<br class="">-                                                  : DAG.getSubtarget().useAA();<br class="">-#ifndef NDEBUG<br class="">-  if (CombinerAAOnlyFunc.getNumOccu<wbr class="">rrences() &&<br class="">-      CombinerAAOnlyFunc != DAG.getMachineFunction().getNa<wbr class="">me())<br class="">-    UseAA = false;<br class="">-#endif<br class="">-  if (UseAA && ST->isUnindexed()) {<br class="">-    // FIXME: We should do this even without AA enabled. AA will just allow<br class="">-    // FindBetterChain to work in more situations. 
The problem with this is that<br class="">-    // any combine that expects memory operations to be on consecutive chains<br class="">-    // first needs to be updated to look for users of the same chain.<br class="">-<br class="">+  if (ST->isUnindexed()) {<br class="">     // Walk up chain skipping non-aliasing memory nodes, on this store and any<br class="">     // adjacent stores.<br class="">     if (findBetterNeighborChains(ST)) {<br class="">@@ -12888,8 +12854,15 @@ SDValue DAGCombiner::visitSTORE(SDNode *<br class="">     if (SimplifyDemandedBits(<br class="">             Value,<br class="">             APInt::getLowBitsSet(Value.ge<wbr class="">tScalarValueSizeInBits(),<br class="">-                                 ST->getMemoryVT().getScalarSi<wbr class="">zeInBits())))<br class="">+                                 ST->getMemoryVT().getScalarSi<wbr class="">zeInBits()))) {<br class="">+      // Re-visit the store if anything changed and the store hasn't been merged<br class="">+      // with another node (N is deleted) SimplifyDemandedBits will add Value's<br class="">+      // node back to the worklist if necessary, but we also need to re-visit<br class="">+      // the Store node itself.<br class="">+      if (N->getOpcode() != ISD::DELETED_NODE)<br class="">+        AddToWorklist(N);<br class="">       return SDValue(N, 0);<br class="">+    }<br class="">   }<br class=""><br class="">   // If this is a load followed by a store to the same location, then the store<br class="">@@ -12933,15 +12906,12 @@ SDValue DAGCombiner::visitSTORE(SDNode *<br class="">       // There can be multiple store sequences on the same chain.<br class="">       // Keep trying to merge store sequences until we are unable to do so<br class="">       // or until we merge the last store on the chain.<br class="">-      SmallVector<MemOpLink, 8> StoreNodes;<br class="">-      bool Changed = MergeConsecutiveStores(ST, StoreNodes);<br class="">+      bool Changed = 
MergeConsecutiveStores(ST);<br class="">       if (!Changed) break;<br class="">-<br class="">-      if (any_of(StoreNodes,<br class="">-                 [ST](const MemOpLink &Link) { return Link.MemNode == ST; })) {<br class="">-        // ST has been merged and no longer exists.<br class="">+      // Return N as merge only uses CombineTo and no worklist clean<br class="">+      // up is necessary.<br class="">+      if (N->getOpcode() == ISD::DELETED_NODE || !isa<StoreSDNode>(N))<br class="">         return SDValue(N, 0);<br class="">-      }<br class="">     }<br class="">   }<br class=""><br class="">@@ -12950,7 +12920,7 @@ SDValue DAGCombiner::visitSTORE(SDNode *<br class="">   // Make sure to do this only after attempting to merge stores in order to<br class="">   //  avoid changing the types of some subset of stores due to visit order,<br class="">   //  preventing their merging.<br class="">-  if (isa<ConstantFPSDNode>(Value)) {<br class="">+  if (isa<ConstantFPSDNode>(ST->get<wbr class="">Value())) {<br class="">     if (SDValue NewSt = replaceStoreOfFPConstant(ST))<br class="">       return NewSt;<br class="">   }<br class="">@@ -13887,6 +13857,35 @@ SDValue DAGCombiner::visitBUILD_VECTOR<wbr class="">(S<br class="">   if (ISD::allOperandsUndef(N))<br class="">     return DAG.getUNDEF(VT);<br class=""><br class="">+  // Check if we can express BUILD VECTOR via subvector extract.<br class="">+  if (!LegalTypes && (N->getNumOperands() > 1)) {<br class="">+    SDValue Op0 = N->getOperand(0);<br class="">+    auto checkElem = [&](SDValue Op) -> uint64_t {<br class="">+      if ((Op.getOpcode() == ISD::EXTRACT_VECTOR_ELT) &&<br class="">+          (Op0.getOperand(0) == Op.getOperand(0)))<br class="">+        if (auto CNode = dyn_cast<ConstantSDNode>(Op.ge<wbr class="">tOperand(1)))<br class="">+          return CNode->getZExtValue();<br class="">+      return -1;<br class="">+    };<br class="">+<br class="">+    int Offset = checkElem(Op0);<br class="">+    
for (unsigned i = 0; i < N->getNumOperands(); ++i) {<br class="">+      if (Offset + i != checkElem(N->getOperand(i))) {<br class="">+        Offset = -1;<br class="">+        break;<br class="">+      }<br class="">+    }<br class="">+<br class="">+    if ((Offset == 0) &&<br class="">+        (Op0.getOperand(0).getValueTyp<wbr class="">e() == N->getValueType(0)))<br class="">+      return Op0.getOperand(0);<br class="">+    if ((Offset != -1) &&<br class="">+        ((Offset % N->getValueType(0).getVectorNu<wbr class="">mElements()) ==<br class="">+         0)) // IDX must be multiple of output size.<br class="">+      return DAG.getNode(ISD::EXTRACT_SUBVE<wbr class="">CTOR, SDLoc(N), N->getValueType(0),<br class="">+                         Op0.getOperand(0), Op0.getOperand(1));<br class="">+  }<br class="">+<br class="">   if (SDValue V = reduceBuildVecExtToExtBuildVec<wbr class="">(N))<br class="">     return V;<br class=""><br class="">@@ -15983,7 +15982,7 @@ static bool FindBaseOffset(SDValue Ptr,<br class="">   if (Base.getOpcode() == ISD::ADD) {<br class="">     if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Base.<wbr class="">getOperand(1))) {<br class="">       Base = Base.getOperand(0);<br class="">-      Offset += C->getZExtValue();<br class="">+      Offset += C->getSExtValue();<br class="">     }<br class="">   }<br class=""><br class="">@@ -16180,6 +16179,12 @@ void DAGCombiner::GatherAllAliases(<wbr class="">SDNod<br class="">       ++Depth;<br class="">       break;<br class=""><br class="">+    case ISD::CopyFromReg:<br class="">+      // Forward past CopyFromReg.<br class="">+      Chains.push_back(Chain.getOper<wbr class="">and(0));<br class="">+      ++Depth;<br class="">+      break;<br class="">+<br class="">     default:<br class="">       // For all other instructions we will just have to take what we can get.<br class="">       Aliases.push_back(Chain);<br class="">@@ -16208,6 +16213,18 @@ SDValue DAGCombiner::FindBetterChain(S<wbr 
class="">DN<br class="">   return DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, Aliases);<br class=""> }<br class=""><br class="">+// This function tries to collect a bunch of potentially interesting<br class="">+// nodes to improve the chains of, all at once. This might seem<br class="">+// redundant, as this function gets called when visiting every store<br class="">+// node, so why not let the work be done on each store as it's visited?<br class="">+//<br class="">+// I believe this is mainly important because MergeConsecutiveStores<br class="">+// is unable to deal with merging stores of different sizes, so unless<br class="">+// we improve the chains of all the potential candidates up-front<br class="">+// before running MergeConsecutiveStores, it might only see some of<br class="">+// the nodes that will eventually be candidates, and then not be able<br class="">+// to go from a partially-merged state to the desired final<br class="">+// fully-merged state.<br class=""> bool DAGCombiner::findBetterNeighbo<wbr class="">rChains(StoreSDNode *St) {<br class="">   // This holds the base pointer, index, and the offset in bytes from the base<br class="">   // pointer.<br class="">@@ -16243,10 +16260,8 @@ bool DAGCombiner::findBetterNeighbo<wbr class="">rChai<br class="">     if (!Ptr.equalBaseIndex(BasePtr))<br class="">       break;<br class=""><br class="">-    // Find the next memory operand in the chain. If the next operand in the<br class="">-    // chain is a store then move up and continue the scan with the next<br class="">-    // memory operand. If the next operand is a load save it and use alias<br class="">-    // information to check if it interferes with anything.<br class="">+    // Walk up the chain to find the next store node, ignoring any<br class="">+    // intermediate loads. 
Any other kind of node will halt the loop.<br class="">     SDNode *NextInChain = Index->getChain().getNode();<br class="">     while (true) {<br class="">       if (StoreSDNode *STn = dyn_cast<StoreSDNode>(NextInCh<wbr class="">ain)) {<br class="">@@ -16265,9 +16280,14 @@ bool DAGCombiner::findBetterNeighbo<wbr class="">rChai<br class="">         Index = nullptr;<br class="">         break;<br class="">       }<br class="">-    }<br class="">+    } // end while<br class="">   }<br class=""><br class="">+  // At this point, ChainedStores lists all of the Store nodes<br class="">+  // reachable by iterating up through chain nodes matching the above<br class="">+  // conditions.  For each such store identified, try to find an<br class="">+  // earlier chain to attach the store to which won't violate the<br class="">+  // required ordering.<br class="">   bool MadeChangeToSt = false;<br class="">   SmallVector<std::pair<StoreSD<wbr class="">Node *, SDValue>, 8> BetterChains;<br class=""><br class=""><br class="">Modified: llvm/trunk/lib/CodeGen/TargetL<wbr class="">oweringBase.cpp<br class="">URL:<span class="Apple-converted-space"> </span><a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp?rev=297695&r1=297694&r2=297695&view=diff" rel="noreferrer" target="_blank" class="">http://llvm.org/viewvc/llvm-pr<wbr class="">oject/llvm/trunk/lib/CodeGen/T<wbr class="">argetLoweringBase.cpp?rev=2976<wbr class="">95&r1=297694&r2=297695&view=<wbr class="">diff</a><br class="">==============================<wbr class="">==============================<wbr class="">==================<br class="">--- llvm/trunk/lib/CodeGen/TargetL<wbr class="">oweringBase.cpp (original)<br class="">+++ llvm/trunk/lib/CodeGen/TargetL<wbr class="">oweringBase.cpp Mon Mar<span class="Apple-converted-space"> </span></blockquote></div></div></blockquote></div></div></div></div></blockquote></div></div></div>...</blockquote></div></div></div></blockquote></div><br 
class=""></body></html>