<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div><br class=""><blockquote type="cite" class=""><div class=""><div dir="auto" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><div dir="auto" class=""><br class=""></div><div dir="auto" class="">Nirav</div></div><div class="gmail_extra" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><br class=""><div class="gmail_quote">On Mar 16, 2017 16:19, "Aditya Nandakumar" <<a href="mailto:aditya_nandakumar@apple.com" class="">aditya_nandakumar@apple.com</a>> wrote:<br type="attribution" class=""><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><div style="word-wrap: break-word;" class="">Hi Nirav<div class=""><br class=""></div><div class="">This patch is affecting our internal backends (large instruction count regressions). I haven’t completely gone through your patch but from what I see, the problem seems to be that we don’t handle</div><div class="">descending into TokenFactors (in getStoreMergeCandidates).</div><div class="">I also see a relevant FIXME which matches what I observe as missing. 
I have the relevant DAG dump from before and after this change below.</div><div class="">Before:</div><div class=""><div class=""><br class=""></div><div class=""> <span class="Apple-converted-space"> </span>t17: i64 = add t6, Constant:i64<4></div><div class=""> <span class="Apple-converted-space"> </span>t18: ch = store<ST1[%dst.gep2.i105.4](<wbr class="">align=2)> t15, Constant:i8<0>, t17, undef:i64</div><div class=""> <span class="Apple-converted-space"> </span>t20: i64 = add t6, Constant:i64<5></div><div class=""> <span class="Apple-converted-space"> </span>t21: ch = store<ST1[%dst.gep2.i105.5]> t18, Constant:i8<0>, t20, undef:i64</div></div><div class=""><br class=""></div><div class="">After:</div><div class=""> t17: i64 = add t6, Constant:i64<4><br class=""> t18: ch = store<ST1[%dst.gep2.i105.4](<wbr class="">align=2)> t15, Constant:i8<0>, t17, undef:i64<br class=""> t20: i64 = add t6, Constant:i64<5><br class=""> t50: ch = store<ST1[%dst.gep2.i105.5]> t0, Constant:i8<0>, t20, undef:i64<br class=""> t51: ch = TokenFactor t18, t50</div><div class=""><br class=""></div><div class="">Clearly we need to handle TokenFactors for getStoreMergeCandidates.</div><div class=""><br class=""></div><div class="">Would it be possible to revert this patch and commit it again once you handle TokenFactors? Do you have an ETA for the TokenFactors handling ?</div><div class=""><br class=""></div><div class="">Thanks</div><div class="">Aditya</div><div class=""><div class=""><blockquote type="cite" class=""><div class="">On Mar 13, 2017, at 6:50 PM, Nirav Davé via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org" target="_blank" class="">llvm-commits@lists.llvm.org</a>> wrote:</div><br class="m_-8394487157821945359Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="">Yes. 
It'll be in presently.<br class=""></div><div class=""><br class=""></div><div class="">Thanks, </div><div class=""><br class=""></div><div class="">-Nirav</div><div class=""><br class=""></div><div class="gmail_extra"><div class="gmail_quote">On Mon, Mar 13, 2017 at 9:23 PM, Craig Topper<span class="Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:craig.topper@gmail.com" target="_blank" class="">craig.topper@gmail.com</a>></span><span class="Apple-converted-space"> </span>wrote:</div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><div dir="ltr" class="">Will you also be restoring my fix for i256-add.ll?</div><div class="gmail_extra"><br clear="all" class=""><div class=""><div class="m_-8394487157821945359m_4108594283957026219gmail_signature" data-smartmail="gmail_signature">~Craig</div></div><br class=""><div class="gmail_quote">On Mon, Mar 13, 2017 at 5:34 PM, Nirav Dave via llvm-commits<span class="Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank" class="">llvm-commits@lists.llvm.org</a>></span><span class="Apple-converted-space"> </span>wrote:<br class=""><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;">Author: niravd<br class="">Date: Mon Mar 13 19:34:14 2017<br class="">New Revision: 297695<br class=""><br class="">URL:<span class="Apple-converted-space"> </span><a href="http://llvm.org/viewvc/llvm-project?rev=297695&view=rev" rel="noreferrer" target="_blank" class="">http://llvm.org/viewvc/llvm-pr<wbr class="">oject?rev=297695&view=rev</a><br class="">Log:<br class="">In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.<br class=""><br class=""> <span 
class="Apple-converted-space"> </span>Recommiting with compiler time improvements<br class=""><br class=""> <span class="Apple-converted-space"> </span>Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.<br class=""><br class=""> <span class="Apple-converted-space"> </span>* Simplify Consecutive Merge Store Candidate Search<br class=""><br class=""> <span class="Apple-converted-space"> </span>Now that address aliasing is much less conservative, push through<br class=""> <span class="Apple-converted-space"> </span>simplified store merging search and chain alias analysis which only<br class=""> <span class="Apple-converted-space"> </span>checks for parallel stores through the chain subgraph. This is cleaner<br class=""> <span class="Apple-converted-space"> </span>as the separation of non-interfering loads/stores from the<br class=""> <span class="Apple-converted-space"> </span>store-merging logic.<br class=""><br class=""> <span class="Apple-converted-space"> </span>When merging stores search up the chain through a single load, and<br class=""> <span class="Apple-converted-space"> </span>finds all possible stores by looking down from through a load and a<br class=""> <span class="Apple-converted-space"> </span>TokenFactor to all stores visited.<br class=""><br class=""> <span class="Apple-converted-space"> </span>This improves the quality of the output SelectionDAG and the output<br class=""> <span class="Apple-converted-space"> </span>Codegen (save perhaps for some ARM cases where we correctly constructs<br class=""> <span class="Apple-converted-space"> </span>wider loads, but then promotes them to float operations which appear<br class=""> <span class="Apple-converted-space"> </span>but requires more expensive constant generation).<br class=""><br class=""> <span class="Apple-converted-space"> </span>Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)<br class=""><br class=""> <span 
class="Apple-converted-space"> </span>Additional Minor Changes:<br class=""><br class=""> <span class="Apple-converted-space"> </span>1. Finishes removing unused AliasLoad code<br class=""><br class=""> <span class="Apple-converted-space"> </span>2. Unifies the chain aggregation in the merged stores across code<br class=""> paths<br class=""><br class=""> <span class="Apple-converted-space"> </span>3. Re-add the Store node to the worklist after calling<br class=""> SimplifyDemandedBits.<br class=""><br class=""> <span class="Apple-converted-space"> </span>4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is<br class=""> arbitrary, but seems sufficient to not cause regressions in<br class=""> tests.<br class=""><br class=""> <span class="Apple-converted-space"> </span>5. Remove Chain dependencies of Memory operations on CopyfromReg<br class=""> nodes as these are captured by data dependence<br class=""><br class=""> <span class="Apple-converted-space"> </span>6. Forward loads-store values through tokenfactors containing<br class=""> <span class="Apple-converted-space"> </span>{CopyToReg,CopyFromReg} Values.<br class=""><br class=""> <span class="Apple-converted-space"> </span>7. Peephole to convert buildvector of extract_vector_elt to<br class=""> extract_subvector if possible (see<br class=""> CodeGen/AArch64/store-merge.l<wbr class="">l)<br class=""><br class=""> <span class="Apple-converted-space"> </span>8. Store merging for the ARM target is restricted to 32-bit as<br class=""> some in some contexts invalid 64-bit operations are being<br class=""> generated. 
This can be removed once appropriate checks are<br class=""> added.<br class=""><br class=""> <span class="Apple-converted-space"> </span>This finishes the change Matt Arsenault started in r246307 and<br class=""> <span class="Apple-converted-space"> </span>jyknight's original patch.<br class=""><br class=""> <span class="Apple-converted-space"> </span>Many tests required some changes as memory operations are now<br class=""> <span class="Apple-converted-space"> </span>reorderable, improving load-store forwarding. One test in<br class=""> <span class="Apple-converted-space"> </span>particular is worth noting:<br class=""><br class=""> <span class="Apple-converted-space"> </span>CodeGen/PowerPC/ppc64-align-lo<wbr class="">ng-double.ll - Improved load-store<br class=""> <span class="Apple-converted-space"> </span>forwarding converts a load-store pair into a parallel store and<br class=""> <span class="Apple-converted-space"> </span>a memory-realized bitcast of the same value. However, because we<br class=""> <span class="Apple-converted-space"> </span>lose the sharing of the explicit and implicit store values we<br class=""> <span class="Apple-converted-space"> </span>must create another local store. 
A similar transformation<br class=""> <span class="Apple-converted-space"> </span>happens before SelectionDAG as well.<br class=""><br class=""> <span class="Apple-converted-space"> </span>Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle<br class=""><br class="">Added:<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/pr<wbr class="">32108.ll<br class="">Removed:<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/co<wbr class="">mbiner-aa-0.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/co<wbr class="">mbiner-aa-1.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/pr<wbr class="">18023.ll<br class="">Modified:<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/include/llvm/Target<wbr class="">/TargetLowering.h<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/lib/CodeGen/Selecti<wbr class="">onDAG/DAGCombiner.cpp<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/lib/CodeGen/TargetL<wbr class="">oweringBase.cpp<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/lib/Target/AArch64/<wbr class="">AArch64ISelLowering.cpp<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/lib/Target/ARM/ARMI<wbr class="">SelLowering.h<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AArch6<wbr class="">4/argument-blocks.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AArch6<wbr class="">4/arm64-abi.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AArch6<wbr class="">4/arm64-memset-inline.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AArch6<wbr class="">4/arm64-variadic-aapcs.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AArch6<wbr class="">4/merge-store.ll<br class=""> 
<span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AArch6<wbr class="">4/vector_merge_dep_check.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AMDGPU<wbr class="">/debugger-insert-nops.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AMDGPU<wbr class="">/insert_vector_elt.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AMDGPU<wbr class="">/merge-stores.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AMDGPU<wbr class="">/private-element-size.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/AMDGPU<wbr class="">/si-triv-disjoint-mem-access.l<wbr class="">l<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/ARM/20<wbr class="">12-10-04-AAPCS-byval-align8.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/ARM/al<wbr class="">loc-no-stack-realign.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/ARM/gp<wbr class="">r-paired-spill.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/ARM/if<wbr class="">cvt10.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/ARM/il<wbr class="">legal-bitfield-loadstore.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/ARM/st<wbr class="">atic-addr-hoisting.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/BPF/un<wbr class="">def.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/MSP430<wbr class="">/Inst16mm.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/c<wbr class="">conv/arguments-float.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/c<wbr class="">conv/arguments-varargs.ll<br class=""> <span 
class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/f<wbr class="">astcc.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/l<wbr class="">oad-store-left-right.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/m<wbr class="">icromips-li.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/m<wbr class="">ips64-f128-call.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/m<wbr class="">ips64-f128.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/m<wbr class="">no-ldc1-sdc1.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/m<wbr class="">sa/f16-llvm-ir.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/m<wbr class="">sa/i5_ld_st.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/o<wbr class="">32_cc_byval.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Mips/o<wbr class="">32_cc_vararg.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/PowerP<wbr class="">C/anon_aggr.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/PowerP<wbr class="">C/complex-return.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/PowerP<wbr class="">C/jaggedstructs.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/PowerP<wbr class="">C/ppc64-align-long-double.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/PowerP<wbr class="">C/structsinmem.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/PowerP<wbr class="">C/structsinregs.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/System<wbr class="">Z/unaligned-01.ll<br 
class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Thumb/<wbr class="">2010-07-15-debugOrdering.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/Thumb/<wbr class="">stack-access.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/20<wbr class="">10-09-17-SideEffectsInChain.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/20<wbr class="">12-11-28-merge-store-alias.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/Me<wbr class="">rgeConsecutiveStores.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/av<wbr class="">x-vbroadcast.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/av<wbr class="">x512-mask-op.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/ch<wbr class="">ain_order.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/cl<wbr class="">ear_upper_vector_element_bits.<wbr class="">ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/co<wbr class="">py-eflags.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/da<wbr class="">g-merge-fast-accesses.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/do<wbr class="">nt-trunc-store-double-to-float<wbr class="">.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/ex<wbr class="">tractelement-legalization-stor<wbr class="">e-ordering.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/i2<wbr class="">56-add.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/i3<wbr class="">86-shrink-wrapping.ll<br class=""> <span class="Apple-converted-space"> 
</span>llvm/trunk/test/CodeGen/X86/li<wbr class="">ve-range-nosubreg.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/lo<wbr class="">nglong-deadload.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/me<wbr class="">rge-consecutive-loads-128.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/me<wbr class="">rge-consecutive-loads-256.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/me<wbr class="">rge-store-partially-alias-load<wbr class="">s.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/sp<wbr class="">lit-store.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/st<wbr class="">ores-merging.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/ve<wbr class="">ctor-compare-results.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/ve<wbr class="">ctor-shuffle-variable-128.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/ve<wbr class="">ctor-shuffle-variable-256.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/ve<wbr class="">ctorcall.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/X86/wi<wbr class="">n32-eh.ll<br class=""> <span class="Apple-converted-space"> </span>llvm/trunk/test/CodeGen/XCore/<wbr class="">varargs.ll<br class=""><br class="">Modified: llvm/trunk/include/llvm/Target<wbr class="">/TargetLowering.h<br class="">URL:<span class="Apple-converted-space"> </span><a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Target/TargetLowering.h?rev=297695&r1=297694&r2=297695&view=diff" rel="noreferrer" target="_blank" class="">http://llvm.org/viewvc/llvm-pr<wbr class="">oject/llvm/trunk/include/llvm/<wbr 
class="">Target/TargetLowering.h?rev=29<wbr class="">7695&r1=297694&r2=297695&view=<wbr class="">diff</a><br class="">==============================<wbr class="">==============================<wbr class="">==================<br class="">--- llvm/trunk/include/llvm/Target<wbr class="">/TargetLowering.h (original)<br class="">+++ llvm/trunk/include/llvm/Target<wbr class="">/TargetLowering.h Mon Mar 13 19:34:14 2017<br class="">@@ -363,6 +363,9 @@ public:<br class=""> return false;<br class=""> }<br class=""><br class="">+ /// Returns if it's reasonable to merge stores to MemVT size.<br class="">+ virtual bool canMergeStoresTo(EVT MemVT) const { return true; }<br class="">+<br class=""> /// \brief Return true if it is cheap to speculate a call to intrinsic cttz.<br class=""> virtual bool isCheapToSpeculateCttz() const {<br class=""> return false;<br class=""><br class="">Modified: llvm/trunk/lib/CodeGen/Selecti<wbr class="">onDAG/DAGCombiner.cpp<br class="">URL:<span class="Apple-converted-space"> </span><a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=297695&r1=297694&r2=297695&view=diff" rel="noreferrer" target="_blank" class="">http://llvm.org/viewvc/llvm-pr<wbr class="">oject/llvm/trunk/lib/CodeGen/S<wbr class="">electionDAG/DAGCombiner.cpp?re<wbr class="">v=297695&r1=297694&r2=297695&<wbr class="">view=diff</a><br class="">==============================<wbr class="">==============================<wbr class="">==================<br class="">--- llvm/trunk/lib/CodeGen/Selecti<wbr class="">onDAG/DAGCombiner.cpp (original)<br class="">+++ llvm/trunk/lib/CodeGen/Selecti<wbr class="">onDAG/DAGCombiner.cpp Mon Mar 13 19:34:14 2017<br class="">@@ -53,10 +53,6 @@ STATISTIC(SlicedLoads, "Number of load s<br class=""><br class=""> namespace {<br class=""> static cl::opt<bool><br class="">- CombinerAA("combiner-alias-ana<wbr class="">lysis", cl::Hidden,<br class="">- cl::desc("Enable DAG combiner alias-analysis 
heuristics"));<br class="">-<br class="">- static cl::opt<bool><br class=""> CombinerGlobalAA("combiner-gl<wbr class="">obal-alias-analysis", cl::Hidden,<br class=""> <span class="Apple-converted-space"> </span>cl::desc("Enable DAG combiner's use of IR alias analysis"));<br class=""><br class="">@@ -133,6 +129,9 @@ namespace {<br class=""> /// Add to the worklist making sure its instance is at the back (next to be<br class=""> /// processed.)<br class=""> void AddToWorklist(SDNode *N) {<br class="">+ assert(N->getOpcode() != ISD::DELETED_NODE &&<br class="">+ "Deleted Node added to Worklist");<br class="">+<br class=""> // Skip handle nodes as they can't usefully be combined and confuse the<br class=""> // zero-use deletion strategy.<br class=""> if (N->getOpcode() == ISD::HANDLENODE)<br class="">@@ -177,6 +176,7 @@ namespace {<br class=""> void CommitTargetLoweringOpt(const TargetLowering::TargetLowering<wbr class="">Opt &TLO);<br class=""><br class=""> private:<br class="">+ unsigned MaximumLegalStoreInBits;<br class=""><br class=""> /// Check the specified integer node value to see if it can be simplified or<br class=""> /// if things it uses can be simplified by bit propagation.<br class="">@@ -422,15 +422,12 @@ namespace {<br class=""> /// Holds a pointer to an LSBaseSDNode as well as information on where it<br class=""> /// is located in a sequence of memory operations connected by a chain.<br class=""> struct MemOpLink {<br class="">- MemOpLink (LSBaseSDNode *N, int64_t Offset, unsigned Seq):<br class="">- MemNode(N), OffsetFromBase(Offset), SequenceNum(Seq) { }<br class="">+ MemOpLink(LSBaseSDNode *N, int64_t Offset)<br class="">+ : MemNode(N), OffsetFromBase(Offset) {}<br class=""> // Ptr to the mem node.<br class=""> LSBaseSDNode *MemNode;<br class=""> // Offset from the base ptr.<br class=""> int64_t OffsetFromBase;<br class="">- // What is the sequence number of this mem node.<br class="">- // Lowest mem operand in the DAG starts at zero.<br class="">- 
unsigned SequenceNum;<br class=""> };<br class=""><br class=""> /// This is a helper function for visitMUL to check the profitability<br class="">@@ -441,12 +438,6 @@ namespace {<br class=""> <span class="Apple-converted-space"> </span>SDValue &AddNode,<br class=""> <span class="Apple-converted-space"> </span>SDValue &ConstNode);<br class=""><br class="">- /// This is a helper function for MergeStoresOfConstantsOrVecElt<wbr class="">s. Returns a<br class="">- /// constant build_vector of the stored constant values in Stores.<br class="">- SDValue getMergedConstantVectorStore(S<wbr class="">electionDAG &DAG, const SDLoc &SL,<br class="">- ArrayRef<MemOpLink> Stores,<br class="">- SmallVectorImpl<SDValue> &Chains,<br class="">- EVT Ty) const;<br class=""><br class=""> /// This is a helper function for visitAND and visitZERO_EXTEND. Returns<br class=""> /// true if the (and (load x) c) pattern matches an extload. ExtVT returns<br class="">@@ -460,18 +451,15 @@ namespace {<br class=""> /// This is a helper function for MergeConsecutiveStores. 
When the source<br class=""> /// elements of the consecutive stores are all constants or all extracted<br class=""> /// vector elements, try to merge them into one larger store.<br class="">- /// \return number of stores that were merged into a merged store (always<br class="">- /// a prefix of \p StoreNode).<br class="">- bool MergeStoresOfConstantsOrVecElt<wbr class="">s(<br class="">- SmallVectorImpl<MemOpLink> &StoreNodes, EVT MemVT, unsigned NumStores,<br class="">- bool IsConstantSrc, bool UseVector);<br class="">+ /// \return True if a merged store was created.<br class="">+ bool MergeStoresOfConstantsOrVecElt<wbr class="">s(SmallVectorImpl<MemOpLink> &StoreNodes,<br class="">+ EVT MemVT, unsigned NumStores,<br class="">+ bool IsConstantSrc, bool UseVector);<br class=""><br class=""> /// This is a helper function for MergeConsecutiveStores.<br class=""> /// Stores that may be merged are placed in StoreNodes.<br class="">- /// Loads that may alias with those stores are placed in AliasLoadNodes.<br class="">- void getStoreMergeAndAliasCandidate<wbr class="">s(<br class="">- StoreSDNode* St, SmallVectorImpl<MemOpLink> &StoreNodes,<br class="">- SmallVectorImpl<LSBaseSDNode*> &AliasLoadNodes);<br class="">+ void getStoreMergeCandidates(StoreS<wbr class="">DNode *St,<br class="">+ SmallVectorImpl<MemOpLink> &StoreNodes);<br class=""><br class=""> /// Helper function for MergeConsecutiveStores. 
Checks if<br class=""> /// Candidate stores have indirect dependency through their<br class="">@@ -483,8 +471,7 @@ namespace {<br class=""> /// This optimization uses wide integers or vectors when possible.<br class=""> /// \return number of stores that were merged into a merged store (the<br class=""> /// affected nodes are stored as a prefix in \p StoreNodes).<br class="">- bool MergeConsecutiveStores(StoreSD<wbr class="">Node *N,<br class="">- SmallVectorImpl<MemOpLink> &StoreNodes);<br class="">+ bool MergeConsecutiveStores(StoreSD<wbr class="">Node *N);<br class=""><br class=""> /// \brief Try to transform a truncation where C is a constant:<br class=""> /// (trunc (and X, C)) -> (and (trunc X), (trunc C))<br class="">@@ -499,6 +486,13 @@ namespace {<br class=""> : DAG(D), TLI(D.getTargetLoweringInfo())<wbr class="">, Level(BeforeLegalizeTypes),<br class=""> OptLevel(OL), LegalOperations(false), LegalTypes(false), AA(A) {<br class=""> ForCodeSize = DAG.getMachineFunction().getFu<wbr class="">nction()->optForSize();<br class="">+<br class="">+ MaximumLegalStoreInBits = 0;<br class="">+ for (MVT VT : MVT::all_valuetypes())<br class="">+ if (EVT(VT).isSimple() && VT != MVT::Other &&<br class="">+ TLI.isTypeLegal(EVT(VT)) &&<br class="">+ VT.getSizeInBits() >= MaximumLegalStoreInBits)<br class="">+ MaximumLegalStoreInBits = VT.getSizeInBits();<br class=""> }<br class=""><br class=""> /// Runs the dag combiner on all nodes in the work list<br class="">@@ -1589,7 +1583,7 @@ SDValue DAGCombiner::visitTokenFactor(<wbr class="">SD<br class=""> }<br class=""><br class=""> SmallVector<SDNode *, 8> TFs; // List of token factors to visit.<br class="">- SmallVector<SDValue, 8> Ops; // Ops for replacing token factor.<br class="">+ SmallVector<SDValue, 8> Ops; // Ops for replacing token factor.<br class=""> SmallPtrSet<SDNode*, 16> SeenOps;<br class=""> bool Changed = false; // If we should replace this token factor.<br class=""><br class="">@@ -1633,6 +1627,86 @@ SDValue 
DAGCombiner::visitTokenFactor(<wbr class="">SD<br class=""> }<br class=""> }<br class=""><br class="">+ // Remove Nodes that are chained to another node in the list. Do so<br class="">+ // by walking up chains breath-first stopping when we've seen<br class="">+ // another operand. In general we must climb to the EntryNode, but we can exit<br class="">+ // early if we find all remaining work is associated with just one operand as<br class="">+ // no further pruning is possible.<br class="">+<br class="">+ // List of nodes to search through and original Ops from which they originate.<br class="">+ SmallVector<std::pair<SDNode *, unsigned>, 8> Worklist;<br class="">+ SmallVector<unsigned, 8> OpWorkCount; // Count of work for each Op.<br class="">+ SmallPtrSet<SDNode *, 16> SeenChains;<br class="">+ bool DidPruneOps = false;<br class="">+<br class="">+ unsigned NumLeftToConsider = 0;<br class="">+ for (const SDValue &Op : Ops) {<br class="">+ Worklist.push_back(std::make_p<wbr class="">air(Op.getNode(), NumLeftToConsider++));<br class="">+ OpWorkCount.push_back(1);<br class="">+ }<br class="">+<br class="">+ auto AddToWorklist = [&](unsigned CurIdx, SDNode *Op, unsigned OpNumber) {<br class="">+ // If this is an Op, we can remove the op from the list. 
Remark any<br class="">+ // search associated with it as from the current OpNumber.<br class="">+ if (SeenOps.count(Op) != 0) {<br class="">+ Changed = true;<br class="">+ DidPruneOps = true;<br class="">+ unsigned OrigOpNumber = 0;<br class="">+ while (Ops[OrigOpNumber].getNode() != Op && OrigOpNumber < Ops.size())<br class="">+ OrigOpNumber++;<br class="">+ assert((OrigOpNumber != Ops.size()) &&<br class="">+ "expected to find TokenFactor Operand");<br class="">+ // Re-mark worklist from OrigOpNumber to OpNumber<br class="">+ for (unsigned i = CurIdx + 1; i < Worklist.size(); ++i) {<br class="">+ if (Worklist[i].second == OrigOpNumber) {<br class="">+ Worklist[i].second = OpNumber;<br class="">+ }<br class="">+ }<br class="">+ OpWorkCount[OpNumber] += OpWorkCount[OrigOpNumber];<br class="">+ OpWorkCount[OrigOpNumber] = 0;<br class="">+ NumLeftToConsider--;<br class="">+ }<br class="">+ // Add if it's a new chain<br class="">+ if (SeenChains.insert(Op).second) {<br class="">+ OpWorkCount[OpNumber]++;<br class="">+ Worklist.push_back(std::make_p<wbr class="">air(Op, OpNumber));<br class="">+ }<br class="">+ };<br class="">+<br class="">+ for (unsigned i = 0; i < Worklist.size() && i < 1024; ++i) {<br class="">+ // We need at least be consider at least 2 Ops to prune.<br class="">+ if (NumLeftToConsider <= 1)<br class="">+ break;<br class="">+ auto CurNode = Worklist[i].first;<br class="">+ auto CurOpNumber = Worklist[i].second;<br class="">+ assert((OpWorkCount[CurOpNumbe<wbr class="">r] > 0) &&<br class="">+ "Node should not appear in worklist");<br class="">+ switch (CurNode->getOpcode()) {<br class="">+ case ISD::EntryToken:<br class="">+ // Hitting EntryToken is the only way for the search to terminate without<br class="">+ // hitting<br class="">+ // another operand's search. 
Prevent us from marking this operand<br class="">+ // considered.<br class="">+ NumLeftToConsider++;<br class="">+ break;<br class="">+ case ISD::TokenFactor:<br class="">+ for (const SDValue &Op : CurNode->op_values())<br class="">+ AddToWorklist(i, Op.getNode(), CurOpNumber);<br class="">+ break;<br class="">+ case ISD::CopyFromReg:<br class="">+ case ISD::CopyToReg:<br class="">+ AddToWorklist(i, CurNode->getOperand(0).getNode<wbr class="">(), CurOpNumber);<br class="">+ break;<br class="">+ default:<br class="">+ if (auto *MemNode = dyn_cast<MemSDNode>(CurNode))<br class="">+ AddToWorklist(i, MemNode->getChain().getNode(), CurOpNumber);<br class="">+ break;<br class="">+ }<br class="">+ OpWorkCount[CurOpNumber]--;<br class="">+ if (OpWorkCount[CurOpNumber] == 0)<br class="">+ NumLeftToConsider--;<br class="">+ }<br class="">+<br class=""> SDValue Result;<br class=""><br class=""> // If we've changed things around then replace token factor.<br class="">@@ -1641,15 +1715,22 @@ SDValue DAGCombiner::visitTokenFactor(<wbr class="">SD<br class=""> // The entry token is the only possible outcome.<br class=""> Result = DAG.getEntryNode();<br class=""> } else {<br class="">- // New and improved token factor.<br class="">- Result = DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, Ops);<br class="">+ if (DidPruneOps) {<br class="">+ SmallVector<SDValue, 8> PrunedOps;<br class="">+ //<br class="">+ for (const SDValue &Op : Ops) {<br class="">+ if (SeenChains.count(Op.getNode()<wbr class="">) == 0)<br class="">+ PrunedOps.push_back(Op);<br class="">+ }<br class="">+ Result = DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, PrunedOps);<br class="">+ } else {<br class="">+ Result = DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, Ops);<br class="">+ }<br class=""> }<br class=""><br class="">- // Add users to worklist if AA is enabled, since it may introduce<br class="">- // a lot of new chained token factors while removing memory deps.<br class="">- bool UseAA = 
CombinerAA.getNumOccurrences() > 0 ? CombinerAA<br class="">- : DAG.getSubtarget().useAA();<br class="">- return CombineTo(N, Result, UseAA /*add to worklist*/);<br class="">+ // Add users to worklist, since we may introduce a lot of new<br class="">+ // chained token factors while removing memory deps.<br class="">+ return CombineTo(N, Result, true /*add to worklist*/);<br class=""> }<br class=""><br class=""> return Result;<br class="">@@ -6792,6 +6873,9 @@ SDValue DAGCombiner::CombineExtLoad(SD<wbr class="">No<br class=""> SDValue NewChain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, Chains);<br class=""> SDValue NewValue = DAG.getNode(ISD::CONCAT_VECTOR<wbr class="">S, DL, DstVT, Loads);<br class=""><br class="">+ // Simplify TF.<br class="">+ AddToWorklist(NewChain.getNode<wbr class="">());<br class="">+<br class=""> CombineTo(N, NewValue);<br class=""><br class=""> // Replace uses of the original load (before extension)<br class="">@@ -10947,7 +11031,7 @@ SDValue DAGCombiner::visitLOAD(SDNode *N<br class=""> dbgs() << "\n");<br class=""> WorklistRemover DeadNodes(*this);<br class=""> DAG.ReplaceAllUsesOfValueWith<wbr class="">(SDValue(N, 1), Chain);<br class="">-<br class="">+ AddUsersToWorklist(Chain.getNo<wbr class="">de());<br class=""> if (N->use_empty())<br class=""> deleteAndRecombine(N);<br class=""><br class="">@@ -11000,7 +11084,7 @@ SDValue DAGCombiner::visitLOAD(SDNode *N<br class=""> StoreSDNode *PrevST = cast<StoreSDNode>(Chain);<br class=""> if (PrevST->getBasePtr() == Ptr &&<br class=""> PrevST->getValue().getValueTy<wbr class="">pe() == N->getValueType(0))<br class="">- return CombineTo(N, Chain.getOperand(1), Chain);<br class="">+ return CombineTo(N, PrevST->getOperand(1), Chain);<br class=""> }<br class=""> }<br class=""><br class="">@@ -11018,14 +11102,7 @@ SDValue DAGCombiner::visitLOAD(SDNode *N<br class=""> }<br class=""> }<br class=""><br class="">- bool UseAA = CombinerAA.getNumOccurrences() > 0 ? 
CombinerAA<br class="">- : DAG.getSubtarget().useAA();<br class="">-#ifndef NDEBUG<br class="">- if (CombinerAAOnlyFunc.getNumOccu<wbr class="">rrences() &&<br class="">- CombinerAAOnlyFunc != DAG.getMachineFunction().getNa<wbr class="">me())<br class="">- UseAA = false;<br class="">-#endif<br class="">- if (UseAA && LD->isUnindexed()) {<br class="">+ if (LD->isUnindexed()) {<br class=""> // Walk up chain skipping non-aliasing memory nodes.<br class=""> SDValue BetterChain = FindBetterChain(N, Chain);<br class=""><br class="">@@ -11607,6 +11684,7 @@ bool DAGCombiner::SliceUpLoad(SDNod<wbr class="">e *N)<br class=""> SDValue Chain = DAG.getNode(ISD::TokenFactor, SDLoc(LD), MVT::Other,<br class=""> ArgChains);<br class=""> DAG.ReplaceAllUsesOfValueWith<wbr class="">(SDValue(N, 1), Chain);<br class="">+ AddToWorklist(Chain.getNode())<wbr class="">;<br class=""> return true;<br class=""> }<br class=""><br class="">@@ -12000,20 +12078,6 @@ bool DAGCombiner::isMulAddWithConst<wbr class="">Profi<br class=""> return false;<br class=""> }<br class=""><br class="">-SDValue DAGCombiner::getMergedConstant<wbr class="">VectorStore(<br class="">- SelectionDAG &DAG, const SDLoc &SL, ArrayRef<MemOpLink> Stores,<br class="">- SmallVectorImpl<SDValue> &Chains, EVT Ty) const {<br class="">- SmallVector<SDValue, 8> BuildVector;<br class="">-<br class="">- for (unsigned I = 0, E = Ty.getVectorNumElements(); I != E; ++I) {<br class="">- StoreSDNode *St = cast<StoreSDNode>(Stores[I].Me<wbr class="">mNode);<br class="">- Chains.push_back(St->getChain(<wbr class="">));<br class="">- BuildVector.push_back(St->getV<wbr class="">alue());<br class="">- }<br class="">-<br class="">- return DAG.getBuildVector(Ty, SL, BuildVector);<br class="">-}<br class="">-<br class=""> bool DAGCombiner::MergeStoresOfCons<wbr class="">tantsOrVecElts(<br class=""> SmallVectorImpl<MemOpLink> &StoreNodes, EVT MemVT,<br class=""> unsigned NumStores, bool IsConstantSrc, bool UseVector) {<br class="">@@ -12022,22 
+12086,8 @@ bool DAGCombiner::MergeStoresOfCons<wbr class="">tants<br class=""> return false;<br class=""><br class=""> int64_t ElementSizeBytes = MemVT.getSizeInBits() / 8;<br class="">- LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode;<br class="">- unsigned LatestNodeUsed = 0;<br class="">-<br class="">- for (unsigned i=0; i < NumStores; ++i) {<br class="">- // Find a chain for the new wide-store operand. Notice that some<br class="">- // of the store nodes that we found may not be selected for inclusion<br class="">- // in the wide store. The chain we use needs to be the chain of the<br class="">- // latest store node which is *used* and replaced by the wide store.<br class="">- if (StoreNodes[i].SequenceNum < StoreNodes[LatestNodeUsed].Seq<wbr class="">uenceNum)<br class="">- LatestNodeUsed = i;<br class="">- }<br class="">-<br class="">- SmallVector<SDValue, 8> Chains;<br class=""><br class=""> // The latest Node in the DAG.<br class="">- LSBaseSDNode *LatestOp = StoreNodes[LatestNodeUsed].Mem<wbr class="">Node;<br class=""> SDLoc DL(StoreNodes[0].MemNode);<br class=""><br class=""> SDValue StoredVal;<br class="">@@ -12053,7 +12103,18 @@ bool DAGCombiner::MergeStoresOfCons<wbr class="">tants<br class=""> assert(TLI.isTypeLegal(Ty) && "Illegal vector store");<br class=""><br class=""> if (IsConstantSrc) {<br class="">- StoredVal = getMergedConstantVectorStore(D<wbr class="">AG, DL, StoreNodes, Chains, Ty);<br class="">+ SmallVector<SDValue, 8> BuildVector;<br class="">+ for (unsigned I = 0, E = Ty.getVectorNumElements(); I != E; ++I) {<br class="">+ StoreSDNode *St = cast<StoreSDNode>(StoreNodes[I<wbr class="">].MemNode);<br class="">+ SDValue Val = St->getValue();<br class="">+ if (MemVT.getScalarType().isInteg<wbr class="">er())<br class="">+ if (auto *CFP = dyn_cast<ConstantFPSDNode>(St-<wbr class="">>getValue()))<br class="">+ Val = DAG.getConstant(<br class="">+ (uint32_t)CFP->getValueAPF().b<wbr class="">itcastToAPInt().getZExtValue()<wbr class="">,<br 
class="">+ SDLoc(CFP), MemVT);<br class="">+ BuildVector.push_back(Val);<br class="">+ }<br class="">+ StoredVal = DAG.getBuildVector(Ty, DL, BuildVector);<br class=""> } else {<br class=""> SmallVector<SDValue, 8> Ops;<br class=""> for (unsigned i = 0; i < NumStores; ++i) {<br class="">@@ -12063,7 +12124,6 @@ bool DAGCombiner::MergeStoresOfCons<wbr class="">tants<br class=""> if (Val.getValueType() != MemVT)<br class=""> return false;<br class=""> Ops.push_back(Val);<br class="">- Chains.push_back(St->getChain(<wbr class="">));<br class=""> }<br class=""><br class=""> // Build the extracted vector elements back into a vector.<br class="">@@ -12083,7 +12143,6 @@ bool DAGCombiner::MergeStoresOfCons<wbr class="">tants<br class=""> for (unsigned i = 0; i < NumStores; ++i) {<br class=""> unsigned Idx = IsLE ? (NumStores - 1 - i) : i;<br class=""> StoreSDNode *St = cast<StoreSDNode>(StoreNodes[I<wbr class="">dx].MemNode);<br class="">- Chains.push_back(St->getChain(<wbr class="">));<br class=""><br class=""> SDValue Val = St->getValue();<br class=""> StoreInt <<= ElementSizeBytes * 8;<br class="">@@ -12101,54 +12160,36 @@ bool DAGCombiner::MergeStoresOfCons<wbr class="">tants<br class=""> StoredVal = DAG.getConstant(StoreInt, DL, StoreTy);<br class=""> }<br class=""><br class="">- assert(!Chains.empty());<br class="">+ SmallVector<SDValue, 8> Chains;<br class="">+<br class="">+ // Gather all Chains we're inheriting. 
As generally all chains are<br class="">+ // equal, do minor check to remove obvious redundancies.<br class="">+ Chains.push_back(StoreNodes[0]<wbr class="">.MemNode->getChain());<br class="">+ for (unsigned i = 1; i < NumStores; ++i)<br class="">+ if (StoreNodes[0].MemNode->getCha<wbr class="">in() != StoreNodes[i].MemNode->getChai<wbr class="">n())<br class="">+ Chains.push_back(StoreNodes[i]<wbr class="">.MemNode->getChain());<br class=""><br class="">+ LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode;<br class=""> SDValue NewChain = DAG.getNode(ISD::TokenFactor, DL, MVT::Other, Chains);<br class=""> SDValue NewStore = DAG.getStore(NewChain, DL, StoredVal,<br class=""> FirstInChain->getBasePtr(),<br class=""> FirstInChain->getPointerInfo(<wbr class="">),<br class=""> FirstInChain->getAlignment())<wbr class="">;<br class=""><br class="">- bool UseAA = CombinerAA.getNumOccurrences() > 0 ? CombinerAA<br class="">- : DAG.getSubtarget().useAA();<br class="">- if (UseAA) {<br class="">- // Replace all merged stores with the new store.<br class="">- for (unsigned i = 0; i < NumStores; ++i)<br class="">- CombineTo(StoreNodes[i].MemNod<wbr class="">e, NewStore);<br class="">- } else {<br class="">- // Replace the last store with the new store.<br class="">- CombineTo(LatestOp, NewStore);<br class="">- // Erase all other stores.<br class="">- for (unsigned i = 0; i < NumStores; ++i) {<br class="">- if (StoreNodes[i].MemNode == LatestOp)<br class="">- continue;<br class="">- StoreSDNode *St = cast<StoreSDNode>(StoreNodes[i<wbr class="">].MemNode);<br class="">- // ReplaceAllUsesWith will replace all uses that existed when it was<br class="">- // called, but graph optimizations may cause new ones to appear. 
For<br class="">- // example, the case in pr14333 looks like<br class="">- //<br class="">- // St's chain -> St -> another store -> X<br class="">- //<br class="">- // And the only difference from St to the other store is the chain.<br class="">- // When we change it's chain to be St's chain they become identical,<br class="">- // get CSEed and the net result is that X is now a use of St.<br class="">- // Since we know that St is redundant, just iterate.<br class="">- while (!St->use_empty())<br class="">- DAG.ReplaceAllUsesWith(SDValue<wbr class="">(St, 0), St->getChain());<br class="">- deleteAndRecombine(St);<br class="">- }<br class="">- }<br class="">+ // Replace all merged stores with the new store.<br class="">+ for (unsigned i = 0; i < NumStores; ++i)<br class="">+ CombineTo(StoreNodes[i].MemNod<wbr class="">e, NewStore);<br class=""><br class="">- StoreNodes.erase(<a href="http://storenodes.be/" target="_blank" class="">StoreNodes.be</a><wbr class="">gin() + NumStores, StoreNodes.end());<br class="">+ AddToWorklist(NewChain.getNode<wbr class="">());<br class=""> return true;<br class=""> }<br class=""><br class="">-void DAGCombiner::getStoreMergeAndA<wbr class="">liasCandidates(<br class="">- StoreSDNode* St, SmallVectorImpl<MemOpLink> &StoreNodes,<br class="">- SmallVectorImpl<LSBaseSDNode*> &AliasLoadNodes) {<br class="">+void DAGCombiner::getStoreMergeCand<wbr class="">idates(<br class="">+ StoreSDNode *St, SmallVectorImpl<MemOpLink> &StoreNodes) {<br class=""> // This holds the base pointer, index, and the offset in bytes from the base<br class=""> // pointer.<br class=""> BaseIndexOffset BasePtr = BaseIndexOffset::match(St->get<wbr class="">BasePtr(), DAG);<br class="">+ EVT MemVT = St->getMemoryVT();<br class=""><br class=""> // We must have a base and an offset.<br class=""> if (!BasePtr.Base.getNode())<br class="">@@ -12158,104 +12199,70 @@ void DAGCombiner::getStoreMergeAndA<wbr class="">liasC<br class=""> if (BasePtr.Base.isUndef())<br class=""> 
return;<br class=""><br class="">- // Walk up the chain and look for nodes with offsets from the same<br class="">- // base pointer. Stop when reaching an instruction with a different kind<br class="">- // or instruction which has a different base pointer.<br class="">- EVT MemVT = St->getMemoryVT();<br class="">- unsigned Seq = 0;<br class="">- StoreSDNode *Index = St;<br class="">-<br class="">-<br class="">- bool UseAA = CombinerAA.getNumOccurrences() > 0 ? CombinerAA<br class="">- : DAG.getSubtarget().useAA();<br class="">-<br class="">- if (UseAA) {<br class="">- // Look at other users of the same chain. Stores on the same chain do not<br class="">- // alias. If combiner-aa is enabled, non-aliasing stores are canonicalized<br class="">- // to be on the same chain, so don't bother looking at adjacent chains.<br class="">-<br class="">- SDValue Chain = St->getChain();<br class="">- for (auto I = Chain->use_begin(), E = Chain->use_end(); I != E; ++I) {<br class="">- if (StoreSDNode *OtherST = dyn_cast<StoreSDNode>(*I)) {<br class="">- if (I.getOperandNo() != 0)<br class="">- continue;<br class="">-<br class="">- if (OtherST->isVolatile() || OtherST->isIndexed())<br class="">- continue;<br class="">-<br class="">- if (OtherST->getMemoryVT() != MemVT)<br class="">- continue;<br class="">-<br class="">- BaseIndexOffset Ptr = BaseIndexOffset::match(OtherST<wbr class="">->getBasePtr(), DAG);<br class="">-<br class="">- if (Ptr.equalBaseIndex(BasePtr))<br class="">- StoreNodes.push_back(MemOpLink<wbr class="">(OtherST, Ptr.Offset, Seq++));<br class="">- }<br class="">- }<br class="">-<br class="">- return;<br class="">- }<br class="">-<br class="">- while (Index) {<br class="">- // If the chain has more than one use, then we can't reorder the mem ops.<br class="">- if (Index != St && !SDValue(Index, 0)->hasOneUse())<br class="">- break;<br class="">-<br class="">- // Find the base pointer and offset for this memory node.<br class="">- BaseIndexOffset Ptr = 
BaseIndexOffset::match(Index-><wbr class="">getBasePtr(), DAG);<br class="">-<br class="">- // Check that the base pointer is the same as the original one.<br class="">- if (!Ptr.equalBaseIndex(BasePtr))<br class="">- break;<br class="">+ // We looking for a root node which is an ancestor to all mergable<br class="">+ // stores. We search up through a load, to our root and then down<br class="">+ // through all children. For instance we will find Store{1,2,3} if<br class="">+ // St is Store1, Store2. or Store3 where the root is not a load<br class="">+ // which always true for nonvolatile ops. TODO: Expand<br class="">+ // the search to find all valid candidates through multiple layers of loads.<br class="">+ //<br class="">+ // Root<br class="">+ // |-------|-------|<br class="">+ // Load Load Store3<br class="">+ // | |<br class="">+ // Store1 Store2<br class="">+ //<br class="">+ // FIXME: We should be able to climb and<br class="">+ // descend TokenFactors to find candidates as well.<br class=""><br class="">- // The memory operands must not be volatile.<br class="">- if (Index->isVolatile() || Index->isIndexed())<br class="">- break;<br class="">+ SDNode *RootNode = (St->getChain()).getNode();<br class=""><br class="">- // No truncation.<br class="">- if (Index->isTruncatingStore())<br class="">- break;<br class="">+ // Set of Parents of Candidates<br class="">+ std::set<SDNode *> CandidateParents;<br class=""><br class="">- // The stored memory type must be the same.<br class="">- if (Index->getMemoryVT() != MemVT)<br class="">- break;<br class="">-<br class="">- // We do not allow under-aligned stores in order to prevent<br class="">- // overriding stores. NOTE: this is a bad hack. 
Alignment SHOULD<br class="">- // be irrelevant here; what MATTERS is that we not move memory<br class="">- // operations that potentially overlap past each-other.<br class="">- if (Index->getAlignment() < MemVT.getStoreSize())<br class="">- break;<br class="">+ if (LoadSDNode *Ldn = dyn_cast<LoadSDNode>(RootNode)<wbr class="">) {<br class="">+ RootNode = Ldn->getChain().getNode();<br class="">+ for (auto I = RootNode->use_begin(), E = RootNode->use_end(); I != E; ++I)<br class="">+ if (I.getOperandNo() == 0 && isa<LoadSDNode>(*I)) // walk down chain<br class="">+ CandidateParents.insert(*I);<br class="">+ } else<br class="">+ CandidateParents.insert(RootNo<wbr class="">de);<br class=""><br class="">- // We found a potential memory operand to merge.<br class="">- StoreNodes.push_back(MemOpLink<wbr class="">(Index, Ptr.Offset, Seq++));<br class="">+ bool IsLoadSrc = isa<LoadSDNode>(St->getValue()<wbr class="">);<br class="">+ bool IsConstantSrc = isa<ConstantSDNode>(St->getVal<wbr class="">ue()) ||<br class="">+ isa<ConstantFPSDNode>(St->get<wbr class="">Value());<br class="">+ bool IsExtractVecSrc =<br class="">+ (St->getValue().getOpcode() == ISD::EXTRACT_VECTOR_ELT ||<br class="">+ St->getValue().getOpcode() == ISD::EXTRACT_SUBVECTOR);<br class="">+ auto CorrectValueKind = [&](StoreSDNode *Other) -> bool {<br class="">+ if (IsLoadSrc)<br class="">+ return isa<LoadSDNode>(Other->getValu<wbr class="">e());<br class="">+ if (IsConstantSrc)<br class="">+ return (isa<ConstantSDNode>(Other->ge<wbr class="">tValue()) ||<br class="">+ isa<ConstantFPSDNode>(Other->g<wbr class="">etValue()));<br class="">+ if (IsExtractVecSrc)<br class="">+ return (Other->getValue().getOpcode() == ISD::EXTRACT_VECTOR_ELT ||<br class="">+ Other->getValue().getOpcode() == ISD::EXTRACT_SUBVECTOR);<br class="">+ return false;<br class="">+ };<br class=""><br class="">- // Find the next memory operand in the chain. 
If the next operand in the<br class="">- // chain is a store then move up and continue the scan with the next<br class="">- // memory operand. If the next operand is a load save it and use alias<br class="">- // information to check if it interferes with anything.<br class="">- SDNode *NextInChain = Index->getChain().getNode();<br class="">- while (1) {<br class="">- if (StoreSDNode *STn = dyn_cast<StoreSDNode>(NextInCh<wbr class="">ain)) {<br class="">- // We found a store node. Use it for the next iteration.<br class="">- Index = STn;<br class="">- break;<br class="">- } else if (LoadSDNode *Ldn = dyn_cast<LoadSDNode>(NextInCha<wbr class="">in)) {<br class="">- if (Ldn->isVolatile()) {<br class="">- Index = nullptr;<br class="">- break;<br class="">+ // check all parents of mergable children<br class="">+ for (auto P = CandidateParents.begin(); P != CandidateParents.end(); ++P)<br class="">+ for (auto I = (*P)->use_begin(), E = (*P)->use_end(); I != E; ++I)<br class="">+ if (I.getOperandNo() == 0)<br class="">+ if (StoreSDNode *OtherST = dyn_cast<StoreSDNode>(*I)) {<br class="">+ if (OtherST->isVolatile() || OtherST->isIndexed())<br class="">+ continue;<br class="">+ // We can merge constant floats to equivalent integers<br class="">+ if (OtherST->getMemoryVT() != MemVT)<br class="">+ if (!(MemVT.isInteger() && MemVT.bitsEq(OtherST->getMemor<wbr class="">yVT()) &&<br class="">+ isa<ConstantFPSDNode>(OtherST-<wbr class="">>getValue())))<br class="">+ continue;<br class="">+ BaseIndexOffset Ptr =<br class="">+ BaseIndexOffset::match(OtherST<wbr class="">->getBasePtr(), DAG);<br class="">+ if (Ptr.equalBaseIndex(BasePtr) && CorrectValueKind(OtherST))<br class="">+ StoreNodes.push_back(MemOpLink<wbr class="">(OtherST, Ptr.Offset));<br class=""> }<br class="">-<br class="">- // Save the load node for later. 
Continue the scan.<br class="">- AliasLoadNodes.push_back(Ldn);<br class="">- NextInChain = Ldn->getChain().getNode();<br class="">- continue;<br class="">- } else {<br class="">- Index = nullptr;<br class="">- break;<br class="">- }<br class="">- }<br class="">- }<br class=""> }<br class=""><br class=""> // We need to check that merging these stores does not cause a loop<br class="">@@ -12282,13 +12289,16 @@ bool DAGCombiner::checkMergeStoreCa<wbr class="">ndida<br class=""> return true;<br class=""> }<br class=""><br class="">-bool DAGCombiner::MergeConsecutiveS<wbr class="">tores(<br class="">- StoreSDNode* St, SmallVectorImpl<MemOpLink> &StoreNodes) {<br class="">+bool DAGCombiner::MergeConsecutiveS<wbr class="">tores(StoreSDNode *St) {<br class=""> if (OptLevel == CodeGenOpt::None)<br class=""> return false;<br class=""><br class=""> EVT MemVT = St->getMemoryVT();<br class=""> int64_t ElementSizeBytes = MemVT.getSizeInBits() / 8;<br class="">+<br class="">+ if (MemVT.getSizeInBits() * 2 > MaximumLegalStoreInBits)<br class="">+ return false;<br class="">+<br class=""> bool NoVectors = DAG.getMachineFunction().getFu<wbr class="">nction()->hasFnAttribute(<br class=""> Attribute::NoImplicitFloat);<br class=""><br class="">@@ -12317,145 +12327,136 @@ bool DAGCombiner::MergeConsecutiveS<wbr class="">tores<br class=""> if (MemVT.isVector() && IsLoadSrc)<br class=""> return false;<br class=""><br class="">- // Only look at ends of store sequences.<br class="">- SDValue Chain = SDValue(St, 0);<br class="">- if (Chain->hasOneUse() && Chain->use_begin()->getOpcode(<wbr class="">) == ISD::STORE)<br class="">- return false;<br class="">-<br class="">- // Save the LoadSDNodes that we find in the chain.<br class="">- // We need to make sure that these nodes do not interfere with<br class="">- // any of the store nodes.<br class="">- SmallVector<LSBaseSDNode*, 8> AliasLoadNodes;<br class="">-<br class="">- getStoreMergeAndAliasCandidate<wbr class="">s(St, StoreNodes, 
AliasLoadNodes);<br class="">+ SmallVector<MemOpLink, 8> StoreNodes;<br class="">+ // Find potential store merge candidates by searching through chain sub-DAG<br class="">+ getStoreMergeCandidates(St, StoreNodes);<br class=""><br class=""> // Check if there is anything to merge.<br class=""> if (StoreNodes.size() < 2)<br class=""> return false;<br class=""><br class="">- // only do dependence check in AA case<br class="">- bool UseAA = CombinerAA.getNumOccurrences() > 0 ? CombinerAA<br class="">- : DAG.getSubtarget().useAA();<br class="">- if (UseAA && !checkMergeStoreCandidatesForD<wbr class="">ependencies(StoreNodes))<br class="">+ // Check that we can merge these candidates without causing a cycle<br class="">+ if (!checkMergeStoreCandidatesFor<wbr class="">Dependencies(StoreNodes))<br class=""> return false;<br class=""><br class=""> // Sort the memory operands according to their distance from the<br class="">- // base pointer. As a secondary criteria: make sure stores coming<br class="">- // later in the code come first in the list. 
This is important for<br class="">- // the non-UseAA case, because we're merging stores into the FINAL<br class="">- // store along a chain which potentially contains aliasing stores.<br class="">- // Thus, if there are multiple stores to the same address, the last<br class="">- // one can be considered for merging but not the others.<br class="">+ // base pointer.<br class=""> std::sort(StoreNodes.begin(), StoreNodes.end(),<br class=""> [](MemOpLink LHS, MemOpLink RHS) {<br class="">- return LHS.OffsetFromBase < RHS.OffsetFromBase ||<br class="">- (LHS.OffsetFromBase == RHS.OffsetFromBase &&<br class="">- LHS.SequenceNum < RHS.SequenceNum);<br class="">- });<br class="">+ return LHS.OffsetFromBase < RHS.OffsetFromBase;<br class="">+ });<br class=""><br class=""> // Scan the memory operations on the chain and find the first non-consecutive<br class=""> // store memory address.<br class="">- unsigned LastConsecutiveStore = 0;<br class="">+ unsigned NumConsecutiveStores = 0;<br class=""> int64_t StartAddress = StoreNodes[0].OffsetFromBase;<br class="">- for (unsigned i = 0, e = StoreNodes.size(); i < e; ++i) {<br class="">-<br class="">- // Check that the addresses are consecutive starting from the second<br class="">- // element in the list of stores.<br class="">- if (i > 0) {<br class="">- int64_t CurrAddress = StoreNodes[i].OffsetFromBase;<br class="">- if (CurrAddress - StartAddress != (ElementSizeBytes * i))<br class="">- break;<br class="">- }<br class=""><br class="">- // Check if this store interferes with any of the loads that we found.<br class="">- // If we find a load that alias with this store. 
Stop the sequence.<br class="">- if (any_of(AliasLoadNodes, [&](LSBaseSDNode *Ldn) {<br class="">- return isAlias(Ldn, StoreNodes[i].MemNode);<br class="">- }))<br class="">+ // Check that the addresses are consecutive starting from the second<br class="">+ // element in the list of stores.<br class="">+ for (unsigned i = 1, e = StoreNodes.size(); i < e; ++i) {<br class="">+ int64_t CurrAddress = StoreNodes[i].OffsetFromBase;<br class="">+ if (CurrAddress - StartAddress != (ElementSizeBytes * i))<br class=""> break;<br class="">-<br class="">- // Mark this node as useful.<br class="">- LastConsecutiveStore = i;<br class="">+ NumConsecutiveStores = i + 1;<br class=""> }<br class=""><br class="">+ if (NumConsecutiveStores < 2)<br class="">+ return false;<br class="">+<br class=""> // The node with the lowest store address.<br class="">- LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode;<br class="">- unsigned FirstStoreAS = FirstInChain->getAddressSpace(<wbr class="">);<br class="">- unsigned FirstStoreAlign = FirstInChain->getAlignment();<br class=""> LLVMContext &Context = *DAG.getContext();<br class=""> const DataLayout &DL = DAG.getDataLayout();<br class=""><br class=""> // Store the constants into memory as one consecutive store.<br class=""> if (IsConstantSrc) {<br class="">- unsigned LastLegalType = 0;<br class="">- unsigned LastLegalVectorType = 0;<br class="">- bool NonZero = false;<br class="">- for (unsigned i=0; i<LastConsecutiveStore+1; ++i) {<br class="">- StoreSDNode *St = cast<StoreSDNode>(StoreNodes[i<wbr class="">].MemNode);<br class="">- SDValue StoredVal = St->getValue();<br class="">-<br class="">- if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Store<wbr class="">dVal)) {<br class="">- NonZero |= !C->isNullValue();<br class="">- } else if (ConstantFPSDNode *C = dyn_cast<ConstantFPSDNode>(Sto<wbr class="">redVal)) {<br class="">- NonZero |= !C->getConstantFPValue()->isNu<wbr class="">llValue();<br class="">- } else {<br class="">- // 
Non-constant.<br class="">- break;<br class="">- }<br class="">+ bool RV = false;<br class="">+ while (NumConsecutiveStores > 1) {<br class="">+ LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode;<br class="">+ unsigned FirstStoreAS = FirstInChain->getAddressSpace(<wbr class="">);<br class="">+ unsigned FirstStoreAlign = FirstInChain->getAlignment();<br class="">+ unsigned LastLegalType = 0;<br class="">+ unsigned LastLegalVectorType = 0;<br class="">+ bool NonZero = false;<br class="">+ for (unsigned i = 0; i < NumConsecutiveStores; ++i) {<br class="">+ StoreSDNode *ST = cast<StoreSDNode>(StoreNodes[i<wbr class="">].MemNode);<br class="">+ SDValue StoredVal = ST->getValue();<br class="">+<br class="">+ if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Store<wbr class="">dVal)) {<br class="">+ NonZero |= !C->isNullValue();<br class="">+ } else if (ConstantFPSDNode *C =<br class="">+ dyn_cast<ConstantFPSDNode>(St<wbr class="">oredVal)) {<br class="">+ NonZero |= !C->getConstantFPValue()->isNu<wbr class="">llValue();<br class="">+ } else {<br class="">+ // Non-constant.<br class="">+ break;<br class="">+ }<br class=""><br class="">- // Find a legal type for the constant store.<br class="">- unsigned SizeInBits = (i+1) * ElementSizeBytes * 8;<br class="">- EVT StoreTy = EVT::getIntegerVT(Context, SizeInBits);<br class="">- bool IsFast;<br class="">- if (TLI.isTypeLegal(StoreTy) &&<br class="">- TLI.allowsMemoryAccess(Context<wbr class="">, DL, StoreTy, FirstStoreAS,<br class="">- FirstStoreAlign, &IsFast) && IsFast) {<br class="">- LastLegalType = i+1;<br class="">- // Or check whether a truncstore is legal.<br class="">- } else if (TLI.getTypeAction(Context, StoreTy) ==<br class="">- TargetLowering::TypePromoteIn<wbr class="">teger) {<br class="">- EVT LegalizedStoredValueTy =<br class="">- TLI.getTypeToTransformTo(Conte<wbr class="">xt, StoredVal.getValueType());<br class="">- if (TLI.isTruncStoreLegal(Legaliz<wbr class="">edStoredValueTy, StoreTy) &&<br class="">- 
TLI.allowsMemoryAccess(Context<wbr class="">, DL, LegalizedStoredValueTy,<br class="">- FirstStoreAS, FirstStoreAlign, &IsFast) &&<br class="">+ // Find a legal type for the constant store.<br class="">+ unsigned SizeInBits = (i + 1) * ElementSizeBytes * 8;<br class="">+ EVT StoreTy = EVT::getIntegerVT(Context, SizeInBits);<br class="">+ bool IsFast = false;<br class="">+ if (TLI.isTypeLegal(StoreTy) &&<br class="">+ TLI.allowsMemoryAccess(Context<wbr class="">, DL, StoreTy, FirstStoreAS,<br class="">+ FirstStoreAlign, &IsFast) &&<br class=""> IsFast) {<br class=""> LastLegalType = i + 1;<br class="">+ // Or check whether a truncstore is legal.<br class="">+ } else if (TLI.getTypeAction(Context, StoreTy) ==<br class="">+ TargetLowering::TypePromoteIn<wbr class="">teger) {<br class="">+ EVT LegalizedStoredValueTy =<br class="">+ TLI.getTypeToTransformTo(Conte<wbr class="">xt, StoredVal.getValueType());<br class="">+ if (TLI.isTruncStoreLegal(Legaliz<wbr class="">edStoredValueTy, StoreTy) &&<br class="">+ TLI.allowsMemoryAccess(Context<wbr class="">, DL, LegalizedStoredValueTy,<br class="">+ FirstStoreAS, FirstStoreAlign, &IsFast) &&<br class="">+ IsFast) {<br class="">+ LastLegalType = i + 1;<br class="">+ }<br class=""> }<br class="">- }<br class=""><br class="">- // We only use vectors if the constant is known to be zero or the target<br class="">- // allows it and the function is not marked with the noimplicitfloat<br class="">- // attribute.<br class="">- if ((!NonZero || TLI.storeOfVectorConstantIsChe<wbr class="">ap(MemVT, i+1,<br class="">- FirstStoreAS)) &&<br class="">- !NoVectors) {<br class="">- // Find a legal type for the vector store.<br class="">- EVT Ty = EVT::getVectorVT(Context, MemVT, i+1);<br class="">- if (TLI.isTypeLegal(Ty) &&<br class="">- TLI.allowsMemoryAccess(Context<wbr class="">, DL, Ty, FirstStoreAS,<br class="">- FirstStoreAlign, &IsFast) && IsFast)<br class="">- LastLegalVectorType = i + 1;<br class="">+ // We only use vectors if the 
constant is known to be zero or the target<br class="">+ // allows it and the function is not marked with the noimplicitfloat<br class="">+ // attribute.<br class="">+ if ((!NonZero ||<br class="">+ TLI.storeOfVectorConstantIsCh<wbr class="">eap(MemVT, i + 1, FirstStoreAS)) &&<br class="">+ !NoVectors) {<br class="">+ // Find a legal type for the vector store.<br class="">+ EVT Ty = EVT::getVectorVT(Context, MemVT, i + 1);<br class="">+ if (TLI.isTypeLegal(Ty) && TLI.canMergeStoresTo(Ty) &&<br class="">+ TLI.allowsMemoryAccess(Context<wbr class="">, DL, Ty, FirstStoreAS,<br class="">+ FirstStoreAlign, &IsFast) &&<br class="">+ IsFast)<br class="">+ LastLegalVectorType = i + 1;<br class="">+ }<br class=""> }<br class="">- }<br class=""><br class="">- // Check if we found a legal integer type to store.<br class="">- if (LastLegalType == 0 && LastLegalVectorType == 0)<br class="">- return false;<br class="">+ // Check if we found a legal integer type that creates a meaningful merge.<br class="">+ if (LastLegalType < 2 && LastLegalVectorType < 2)<br class="">+ break;<br class=""><br class="">- bool UseVector = (LastLegalVectorType > LastLegalType) && !NoVectors;<br class="">- unsigned NumElem = UseVector ? LastLegalVectorType : LastLegalType;<br class="">+ bool UseVector = (LastLegalVectorType > LastLegalType) && !NoVectors;<br class="">+ unsigned NumElem = (UseVector) ? 
LastLegalVectorType : LastLegalType;<br class=""><br class="">- return MergeStoresOfConstantsOrVecElt<wbr class="">s(StoreNodes, MemVT, NumElem,<br class="">- true, UseVector);<br class="">+ bool Merged = MergeStoresOfConstantsOrVecElt<wbr class="">s(StoreNodes, MemVT, NumElem,<br class="">+ true, UseVector);<br class="">+ if (!Merged)<br class="">+ break;<br class="">+ // Remove merged stores for next iteration.<br class="">+ StoreNodes.erase(<a href="http://storenodes.be/" target="_blank" class="">StoreNodes.be</a><wbr class="">gin(), StoreNodes.begin() + NumElem);<br class="">+ RV = true;<br class="">+ NumConsecutiveStores -= NumElem;<br class="">+ }<br class="">+ return RV;<br class=""> }<br class=""><br class=""> // When extracting multiple vector elements, try to store them<br class=""> // in one vector store rather than a sequence of scalar stores.<br class=""> if (IsExtractVecSrc) {<br class="">+ LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode;<br class="">+ unsigned FirstStoreAS = FirstInChain->getAddressSpace(<wbr class="">);<br class="">+ unsigned FirstStoreAlign = FirstInChain->getAlignment();<br class=""> unsigned NumStoresToMerge = 0;<br class=""> bool IsVec = MemVT.isVector();<br class="">- for (unsigned i = 0; i < LastConsecutiveStore + 1; ++i) {<br class="">+ for (unsigned i = 0; i < NumConsecutiveStores; ++i) {<br class=""> StoreSDNode *St = cast<StoreSDNode>(StoreNodes[i<wbr class="">].MemNode);<br class=""> unsigned StoreValOpcode = St->getValue().getOpcode();<br class=""> // This restriction could be loosened.<br class="">@@ -12495,7 +12496,7 @@ bool DAGCombiner::MergeConsecutiveS<wbr class="">tores<br class=""> // Find acceptable loads. 
Loads need to have the same chain (token factor),<br class=""> // must not be zext, volatile, indexed, and they must be consecutive.<br class=""> BaseIndexOffset LdBasePtr;<br class="">- for (unsigned i=0; i<LastConsecutiveStore+1; ++i) {<br class="">+ for (unsigned i = 0; i < NumConsecutiveStores; ++i) {<br class=""> StoreSDNode *St = cast<StoreSDNode>(StoreNodes[i<wbr class="">].MemNode);<br class=""> LoadSDNode *Ld = dyn_cast<LoadSDNode>(St->getVa<wbr class="">lue());<br class=""> if (!Ld) break;<br class="">@@ -12528,7 +12529,7 @@ bool DAGCombiner::MergeConsecutiveS<wbr class="">tores<br class=""> }<br class=""><br class=""> // We found a potential memory operand to merge.<br class="">- LoadNodes.push_back(MemOpLink(<wbr class="">Ld, LdPtr.Offset, 0));<br class="">+ LoadNodes.push_back(MemOpLink(<wbr class="">Ld, LdPtr.Offset));<br class=""> }<br class=""><br class=""> if (LoadNodes.size() < 2)<br class="">@@ -12540,7 +12541,9 @@ bool DAGCombiner::MergeConsecutiveS<wbr class="">tores<br class=""> if (LoadNodes.size() == 2 && TLI.hasPairedLoad(MemVT, RequiredAlignment) &&<br class=""> St->getAlignment() >= RequiredAlignment)<br class=""> return false;<br class="">-<br class="">+ LSBaseSDNode *FirstInChain = StoreNodes[0].MemNode;<br class="">+ unsigned FirstStoreAS = FirstInChain->getAddressSpace(<wbr class="">);<br class="">+ unsigned FirstStoreAlign = FirstInChain->getAlignment();<br class=""> LoadSDNode *FirstLoad = cast<LoadSDNode>(LoadNodes[0].<wbr class="">MemNode);<br class=""> unsigned FirstLoadAS = FirstLoad->getAddressSpace();<br class=""> unsigned FirstLoadAlign = FirstLoad->getAlignment();<br class="">@@ -12609,30 +12612,19 @@ bool DAGCombiner::MergeConsecutiveS<wbr class="">tores<br class=""><br class=""> // We add +1 here because the LastXXX variables refer to location while<br class=""> // the NumElem refers to array/index size.<br class="">- unsigned NumElem = std::min(LastConsecutiveStore, LastConsecutiveLoad) + 1;<br class="">+ unsigned NumElem 
= std::min(NumConsecutiveStores, LastConsecutiveLoad + 1);<br class=""> NumElem = std::min(LastLegalType, NumElem);<br class=""><br class=""> if (NumElem < 2)<br class=""> return false;<br class=""><br class="">- // Collect the chains from all merged stores.<br class="">+ // Collect the chains from all merged stores. Because the common case<br class="">+ // all chains are the same, check if we match the first Chain.<br class=""> SmallVector<SDValue, 8> MergeStoreChains;<br class=""> MergeStoreChains.push_back(St<wbr class="">oreNodes[0].MemNode->getChain(<wbr class="">));<br class="">-<br class="">- // The latest Node in the DAG.<br class="">- unsigned LatestNodeUsed = 0;<br class="">- for (unsigned i=1; i<NumElem; ++i) {<br class="">- // Find a chain for the new wide-store operand. Notice that some<br class="">- // of the store nodes that we found may not be selected for inclusion<br class="">- // in the wide store. The chain we use needs to be the chain of the<br class="">- // latest store node which is *used* and replaced by the wide store.<br class="">- if (StoreNodes[i].SequenceNum < StoreNodes[LatestNodeUsed].Seq<wbr class="">uenceNum)<br class="">- LatestNodeUsed = i;<br class="">-<br class="">- MergeStoreChains.push_back(Sto<wbr class="">reNodes[i].MemNode->getChain()<wbr class="">);<br class="">- }<br class="">-<br class="">- LSBaseSDNode *LatestOp = StoreNodes[LatestNodeUsed].Mem<wbr class="">Node;<br class="">+ for (unsigned i = 1; i < NumElem; ++i)<br class="">+ if (StoreNodes[0].MemNode->getCha<wbr class="">in() != StoreNodes[i].MemNode->getChai<wbr class="">n())<br class="">+ MergeStoreChains.push_back(Sto<wbr class="">reNodes[i].MemNode->getChain()<wbr class="">);<br class=""><br class=""> // Find if it is better to use vectors or integers to load and store<br class=""> // to memory.<br class="">@@ -12656,6 +12648,8 @@ bool DAGCombiner::MergeConsecutiveS<wbr class="">tores<br class=""> SDValue NewStoreChain =<br class=""> 
DAG.getNode(ISD::TokenFactor, StoreDL, MVT::Other, MergeStoreChains);<br class=""><br class="">+ AddToWorklist(NewStoreChain.ge<wbr class="">tNode());<br class="">+<br class=""> SDValue NewStore =<br class=""> DAG.getStore(NewStoreChain, StoreDL, NewLoad, FirstInChain->getBasePtr(),<br class=""> <span class="Apple-converted-space"> </span>FirstInChain->getPointerInfo()<wbr class="">, FirstStoreAlign);<br class="">@@ -12667,25 +12661,9 @@ bool DAGCombiner::MergeConsecutiveS<wbr class="">tores<br class=""> SDValue(NewLoad.getNode(), 1));<br class=""> }<br class=""><br class="">- if (UseAA) {<br class="">- // Replace the all stores with the new store.<br class="">- for (unsigned i = 0; i < NumElem; ++i)<br class="">- CombineTo(StoreNodes[i].MemNod<wbr class="">e, NewStore);<br class="">- } else {<br class="">- // Replace the last store with the new store.<br class="">- CombineTo(LatestOp, NewStore);<br class="">- // Erase all other stores.<br class="">- for (unsigned i = 0; i < NumElem; ++i) {<br class="">- // Remove all Store nodes.<br class="">- if (StoreNodes[i].MemNode == LatestOp)<br class="">- continue;<br class="">- StoreSDNode *St = cast<StoreSDNode>(StoreNodes[i<wbr class="">].MemNode);<br class="">- DAG.ReplaceAllUsesOfValueWith(<wbr class="">SDValue(St, 0), St->getChain());<br class="">- deleteAndRecombine(St);<br class="">- }<br class="">- }<br class="">-<br class="">- StoreNodes.erase(StoreNodes.begin() + NumElem, StoreNodes.end());<br class="">+ // Replace the all stores with the new store.<br class="">+ for (unsigned i = 0; i < NumElem; ++i)<br class="">+ CombineTo(StoreNodes[i].MemNod<wbr class="">e, NewStore);<br class=""> return true;<br class=""> }<br class=""><br class="">@@ -12842,19 +12820,7 @@ SDValue DAGCombiner::visitSTORE(SDNode *<br class=""> if (SDValue NewST = TransformFPLoadStorePair(N))<br class=""> return NewST;<br class=""><br class="">- bool UseAA = 
CombinerAA.getNumOccurrences() > 0 ? CombinerAA<br class="">- : DAG.getSubtarget().useAA();<br class="">-#ifndef NDEBUG<br class="">- if (CombinerAAOnlyFunc.getNumOccu<wbr class="">rrences() &&<br class="">- CombinerAAOnlyFunc != DAG.getMachineFunction().getNa<wbr class="">me())<br class="">- UseAA = false;<br class="">-#endif<br class="">- if (UseAA && ST->isUnindexed()) {<br class="">- // FIXME: We should do this even without AA enabled. AA will just allow<br class="">- // FindBetterChain to work in more situations. The problem with this is that<br class="">- // any combine that expects memory operations to be on consecutive chains<br class="">- // first needs to be updated to look for users of the same chain.<br class="">-<br class="">+ if (ST->isUnindexed()) {<br class=""> // Walk up chain skipping non-aliasing memory nodes, on this store and any<br class=""> // adjacent stores.<br class=""> if (findBetterNeighborChains(ST)) {<br class="">@@ -12888,8 +12854,15 @@ SDValue DAGCombiner::visitSTORE(SDNode *<br class=""> if (SimplifyDemandedBits(<br class=""> Value,<br class=""> APInt::getLowBitsSet(Value.ge<wbr class="">tScalarValueSizeInBits(),<br class="">- ST->getMemoryVT().getScalarSi<wbr class="">zeInBits())))<br class="">+ ST->getMemoryVT().getScalarSi<wbr class="">zeInBits()))) {<br class="">+ // Re-visit the store if anything changed and the store hasn't been merged<br class="">+ // with another node (N is deleted) SimplifyDemandedBits will add Value's<br class="">+ // node back to the worklist if necessary, but we also need to re-visit<br class="">+ // the Store node itself.<br class="">+ if (N->getOpcode() != ISD::DELETED_NODE)<br class="">+ AddToWorklist(N);<br class=""> return SDValue(N, 0);<br class="">+ }<br class=""> }<br class=""><br class=""> // If this is a load followed by a store to the same location, then the store<br class="">@@ -12933,15 +12906,12 @@ SDValue DAGCombiner::visitSTORE(SDNode *<br class=""> // There can be multiple store 
sequences on the same chain.<br class=""> // Keep trying to merge store sequences until we are unable to do so<br class=""> // or until we merge the last store on the chain.<br class="">- SmallVector<MemOpLink, 8> StoreNodes;<br class="">- bool Changed = MergeConsecutiveStores(ST, StoreNodes);<br class="">+ bool Changed = MergeConsecutiveStores(ST);<br class=""> if (!Changed) break;<br class="">-<br class="">- if (any_of(StoreNodes,<br class="">- [ST](const MemOpLink &Link) { return Link.MemNode == ST; })) {<br class="">- // ST has been merged and no longer exists.<br class="">+ // Return N as merge only uses CombineTo and no worklist clean<br class="">+ // up is necessary.<br class="">+ if (N->getOpcode() == ISD::DELETED_NODE || !isa<StoreSDNode>(N))<br class=""> return SDValue(N, 0);<br class="">- }<br class=""> }<br class=""> }<br class=""><br class="">@@ -12950,7 +12920,7 @@ SDValue DAGCombiner::visitSTORE(SDNode *<br class=""> // Make sure to do this only after attempting to merge stores in order to<br class=""> // avoid changing the types of some subset of stores due to visit order,<br class=""> // preventing their merging.<br class="">- if (isa<ConstantFPSDNode>(Value)) {<br class="">+ if (isa<ConstantFPSDNode>(ST->get<wbr class="">Value())) {<br class=""> if (SDValue NewSt = replaceStoreOfFPConstant(ST))<br class=""> return NewSt;<br class=""> }<br class="">@@ -13887,6 +13857,35 @@ SDValue DAGCombiner::visitBUILD_VECTOR<wbr class="">(S<br class=""> if (ISD::allOperandsUndef(N))<br class=""> return DAG.getUNDEF(VT);<br class=""><br class="">+ // Check if we can express BUILD VECTOR via subvector extract.<br class="">+ if (!LegalTypes && (N->getNumOperands() > 1)) {<br class="">+ SDValue Op0 = N->getOperand(0);<br class="">+ auto checkElem = [&](SDValue Op) -> uint64_t {<br class="">+ if ((Op.getOpcode() == ISD::EXTRACT_VECTOR_ELT) &&<br class="">+ (Op0.getOperand(0) == Op.getOperand(0)))<br class="">+ if (auto CNode = dyn_cast<ConstantSDNode>(Op.ge<wbr 
class="">tOperand(1)))<br class="">+ return CNode->getZExtValue();<br class="">+ return -1;<br class="">+ };<br class="">+<br class="">+ int Offset = checkElem(Op0);<br class="">+ for (unsigned i = 0; i < N->getNumOperands(); ++i) {<br class="">+ if (Offset + i != checkElem(N->getOperand(i))) {<br class="">+ Offset = -1;<br class="">+ break;<br class="">+ }<br class="">+ }<br class="">+<br class="">+ if ((Offset == 0) &&<br class="">+ (Op0.getOperand(0).getValueTyp<wbr class="">e() == N->getValueType(0)))<br class="">+ return Op0.getOperand(0);<br class="">+ if ((Offset != -1) &&<br class="">+ ((Offset % N->getValueType(0).getVectorNu<wbr class="">mElements()) ==<br class="">+ 0)) // IDX must be multiple of output size.<br class="">+ return DAG.getNode(ISD::EXTRACT_SUBVE<wbr class="">CTOR, SDLoc(N), N->getValueType(0),<br class="">+ Op0.getOperand(0), Op0.getOperand(1));<br class="">+ }<br class="">+<br class=""> if (SDValue V = reduceBuildVecExtToExtBuildVec<wbr class="">(N))<br class=""> return V;<br class=""><br class="">@@ -15983,7 +15982,7 @@ static bool FindBaseOffset(SDValue Ptr,<br class=""> if (Base.getOpcode() == ISD::ADD) {<br class=""> if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Base.<wbr class="">getOperand(1))) {<br class=""> Base = Base.getOperand(0);<br class="">- Offset += C->getZExtValue();<br class="">+ Offset += C->getSExtValue();<br class=""> }<br class=""> }<br class=""><br class="">@@ -16180,6 +16179,12 @@ void DAGCombiner::GatherAllAliases(<wbr class="">SDNod<br class=""> ++Depth;<br class=""> break;<br class=""><br class="">+ case ISD::CopyFromReg:<br class="">+ // Forward past CopyFromReg.<br class="">+ Chains.push_back(Chain.getOper<wbr class="">and(0));<br class="">+ ++Depth;<br class="">+ break;<br class="">+<br class=""> default:<br class=""> // For all other instructions we will just have to take what we can get.<br class=""> Aliases.push_back(Chain);<br class="">@@ -16208,6 +16213,18 @@ SDValue DAGCombiner::FindBetterChain(S<wbr 
class="">DN<br class=""> return DAG.getNode(ISD::TokenFactor, SDLoc(N), MVT::Other, Aliases);<br class=""> }<br class=""><br class="">+// This function tries to collect a bunch of potentially interesting<br class="">+// nodes to improve the chains of, all at once. This might seem<br class="">+// redundant, as this function gets called when visiting every store<br class="">+// node, so why not let the work be done on each store as it's visited?<br class="">+//<br class="">+// I believe this is mainly important because MergeConsecutiveStores<br class="">+// is unable to deal with merging stores of different sizes, so unless<br class="">+// we improve the chains of all the potential candidates up-front<br class="">+// before running MergeConsecutiveStores, it might only see some of<br class="">+// the nodes that will eventually be candidates, and then not be able<br class="">+// to go from a partially-merged state to the desired final<br class="">+// fully-merged state.<br class=""> bool DAGCombiner::findBetterNeighbo<wbr class="">rChains(StoreSDNode *St) {<br class=""> // This holds the base pointer, index, and the offset in bytes from the base<br class=""> // pointer.<br class="">@@ -16243,10 +16260,8 @@ bool DAGCombiner::findBetterNeighbo<wbr class="">rChai<br class=""> if (!Ptr.equalBaseIndex(BasePtr))<br class=""> break;<br class=""><br class="">- // Find the next memory operand in the chain. If the next operand in the<br class="">- // chain is a store then move up and continue the scan with the next<br class="">- // memory operand. If the next operand is a load save it and use alias<br class="">- // information to check if it interferes with anything.<br class="">+ // Walk up the chain to find the next store node, ignoring any<br class="">+ // intermediate loads. 
Any other kind of node will halt the loop.<br class=""> SDNode *NextInChain = Index->getChain().getNode();<br class=""> while (true) {<br class=""> if (StoreSDNode *STn = dyn_cast<StoreSDNode>(NextInCh<wbr class="">ain)) {<br class="">@@ -16265,9 +16280,14 @@ bool DAGCombiner::findBetterNeighbo<wbr class="">rChai<br class=""> Index = nullptr;<br class=""> break;<br class=""> }<br class="">- }<br class="">+ } // end while<br class=""> }<br class=""><br class="">+ // At this point, ChainedStores lists all of the Store nodes<br class="">+ // reachable by iterating up through chain nodes matching the above<br class="">+ // conditions. For each such store identified, try to find an<br class="">+ // earlier chain to attach the store to which won't violate the<br class="">+ // required ordering.<br class=""> bool MadeChangeToSt = false;<br class=""> SmallVector<std::pair<StoreSD<wbr class="">Node *, SDValue>, 8> BetterChains;<br class=""><br class=""><br class="">Modified: llvm/trunk/lib/CodeGen/TargetL<wbr class="">oweringBase.cpp<br class="">URL:<span class="Apple-converted-space"> </span><a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp?rev=297695&r1=297694&r2=297695&view=diff" rel="noreferrer" target="_blank" class="">http://llvm.org/viewvc/llvm-pr<wbr class="">oject/llvm/trunk/lib/CodeGen/T<wbr class="">argetLoweringBase.cpp?rev=2976<wbr class="">95&r1=297694&r2=297695&view=<wbr class="">diff</a><br class="">==============================<wbr class="">==============================<wbr class="">==================<br class="">--- llvm/trunk/lib/CodeGen/TargetL<wbr class="">oweringBase.cpp (original)<br class="">+++ llvm/trunk/lib/CodeGen/TargetL<wbr class="">oweringBase.cpp Mon Mar<span class="Apple-converted-space"> </span></blockquote></div></div></blockquote></div></div></div></div></blockquote></div></div></div>...</blockquote></div></div></div></blockquote></div><br class=""></body></html>