[llvm] r297695 - In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.

Thu Apr 27 19:04:29 PDT 2017

Hi Vedant:

I've been looking at this for a bit. No fix yet, but here's what I've
determined so far:

The r297695 patch is correct in that the transformation to the SelectionDAG
is correct and the problem is a latent issue downstream. We're now
correctly determining that the Loads at stores in the example have no
dependence order. The core of the problem seems to be from duplicating the
shared stub load (which is why it's only in 32-bit X86 and not 64)  the two
loads of b (which cannot be shared). It seems like we're having an issue
with duplication of the SU in ScheduleDAGRRList::CopyAndMoveSuccessors
where are confused determining the unfolding the load causing us to
miscount the successors.

-Nirav

On Thu, Apr 27, 2017 at 7:29 PM, Vedant Kumar <vsk at apple.com> wrote:

> Hi Nirav,
>
> I saw a crasher on bugzilla and narrowed down the issue to this commit.
> Please see:
>
>   https://bugs.llvm.org/show_bug.cgi?id=32610
>
> The reduced test case compiles with r297692 (previous commit), but the
> crash occurs when r297695 (this commit) is included. The original C file,
> and a reduced IR file, is attached to the bug report.
>
> Could you take a look?
>
> thanks,
> vedant
>
>
> > On Mar 13, 2017, at 5:34 PM, Nirav Dave via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
> >
> > Author: niravd
> > Date: Mon Mar 13 19:34:14 2017
> > New Revision: 297695
> >
> > URL: http://llvm.org/viewvc/llvm-project?rev=297695&view=rev
> > Log:
> > In visitSTORE, always use FindBetterChain, rather than only when UseAA
> is enabled.
> >
> >    Recommiting with compiler time improvements
> >
> >    Recommitting after fixup of 32-bit aliasing sign offset bug in
> DAGCombiner.
> >
> >    * Simplify Consecutive Merge Store Candidate Search
> >
> >    Now that address aliasing is much less conservative, push through
> >    simplified store merging search and chain alias analysis which only
> >    checks for parallel stores through the chain subgraph. This is cleaner
> >    as the separation of non-interfering loads/stores from the
> >    store-merging logic.
> >
> >    When merging stores search up the chain through a single load, and
> >    finds all possible stores by looking down from through a load and a
> >    TokenFactor to all stores visited.
> >
> >    This improves the quality of the output SelectionDAG and the output
> >    Codegen (save perhaps for some ARM cases where we correctly constructs
> >    wider loads, but then promotes them to float operations which appear
> >    but requires more expensive constant generation).
> >
> >    Some minor peephole optimizations to deal with improved SubDAG shapes
> (listed below)
> >
> >    Additional Minor Changes:
> >
> >      1. Finishes removing unused AliasLoad code
> >
> >      2. Unifies the chain aggregation in the merged stores across code
> >         paths
> >
> >      3. Re-add the Store node to the worklist after calling
> >         SimplifyDemandedBits.
> >
> >      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
> >         arbitrary, but seems sufficient to not cause regressions in
> >         tests.
> >
> >      5. Remove Chain dependencies of Memory operations on CopyfromReg
> >         nodes as these are captured by data dependence
> >
> >      6. Forward loads-store values through tokenfactors containing
> >          {CopyToReg,CopyFromReg} Values.
> >
> >      7. Peephole to convert buildvector of extract_vector_elt to
> >         extract_subvector if possible (see
> >         CodeGen/AArch64/store-merge.ll)
> >
> >      8. Store merging for the ARM target is restricted to 32-bit as
> >         some in some contexts invalid 64-bit operations are being
> >         generated. This can be removed once appropriate checks are
> >         added.
> >
> >    This finishes the change Matt Arsenault started in r246307 and
> >    jyknight's original patch.
> >
> >    Many tests required some changes as memory operations are now
> >    reorderable, improving load-store forwarding. One test in
> >    particular is worth noting:
> >
> >      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
> >      forwarding converts a load-store pair into a parallel store and
> >      a memory-realized bitcast of the same value. However, because we
> >      lose the sharing of the explicit and implicit store values we
> >      must create another local store. A similar transformation
> >      happens before SelectionDAG as well.
> >
> >    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
> >
> > Added:
> >    llvm/trunk/test/CodeGen/X86/pr32108.ll
> > Removed:
> >    llvm/trunk/test/CodeGen/X86/combiner-aa-0.ll
> >    llvm/trunk/test/CodeGen/X86/combiner-aa-1.ll
> >    llvm/trunk/test/CodeGen/X86/pr18023.ll
> > Modified:
> >    llvm/trunk/include/llvm/Target/TargetLowering.h
> >    llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> >    llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp
> >    llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp
> >    llvm/trunk/lib/Target/ARM/ARMISelLowering.h
> >    llvm/trunk/test/CodeGen/AArch64/argument-blocks.ll
> >    llvm/trunk/test/CodeGen/AArch64/arm64-abi.ll
> >    llvm/trunk/test/CodeGen/AArch64/arm64-memset-inline.ll
> >    llvm/trunk/test/CodeGen/AArch64/arm64-variadic-aapcs.ll
> >    llvm/trunk/test/CodeGen/AArch64/merge-store.ll
> >    llvm/trunk/test/CodeGen/AArch64/vector_merge_dep_check.ll
> >    llvm/trunk/test/CodeGen/AMDGPU/debugger-insert-nops.ll
> >    llvm/trunk/test/CodeGen/AMDGPU/insert_vector_elt.ll
> >    llvm/trunk/test/CodeGen/AMDGPU/merge-stores.ll
> >    llvm/trunk/test/CodeGen/AMDGPU/private-element-size.ll
> >    llvm/trunk/test/CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll
> >    llvm/trunk/test/CodeGen/ARM/2012-10-04-AAPCS-byval-align8.ll
> >    llvm/trunk/test/CodeGen/ARM/alloc-no-stack-realign.ll
> >    llvm/trunk/test/CodeGen/ARM/gpr-paired-spill.ll
> >    llvm/trunk/test/CodeGen/ARM/ifcvt10.ll
> >    llvm/trunk/test/CodeGen/ARM/illegal-bitfield-loadstore.ll
> >    llvm/trunk/test/CodeGen/ARM/static-addr-hoisting.ll
> >    llvm/trunk/test/CodeGen/BPF/undef.ll
> >    llvm/trunk/test/CodeGen/MSP430/Inst16mm.ll
> >    llvm/trunk/test/CodeGen/Mips/cconv/arguments-float.ll
> >    llvm/trunk/test/CodeGen/Mips/cconv/arguments-varargs.ll
> >    llvm/trunk/test/CodeGen/Mips/fastcc.ll
> >    llvm/trunk/test/CodeGen/Mips/load-store-left-right.ll
> >    llvm/trunk/test/CodeGen/Mips/micromips-li.ll
> >    llvm/trunk/test/CodeGen/Mips/mips64-f128-call.ll
> >    llvm/trunk/test/CodeGen/Mips/mips64-f128.ll
> >    llvm/trunk/test/CodeGen/Mips/mno-ldc1-sdc1.ll
> >    llvm/trunk/test/CodeGen/Mips/msa/f16-llvm-ir.ll
> >    llvm/trunk/test/CodeGen/Mips/msa/i5_ld_st.ll
> >    llvm/trunk/test/CodeGen/Mips/o32_cc_byval.ll
> >    llvm/trunk/test/CodeGen/Mips/o32_cc_vararg.ll
> >    llvm/trunk/test/CodeGen/PowerPC/anon_aggr.ll
> >    llvm/trunk/test/CodeGen/PowerPC/complex-return.ll
> >    llvm/trunk/test/CodeGen/PowerPC/jaggedstructs.ll
> >    llvm/trunk/test/CodeGen/PowerPC/ppc64-align-long-double.ll
> >    llvm/trunk/test/CodeGen/PowerPC/structsinmem.ll
> >    llvm/trunk/test/CodeGen/PowerPC/structsinregs.ll
> >    llvm/trunk/test/CodeGen/SystemZ/unaligned-01.ll
> >    llvm/trunk/test/CodeGen/Thumb/2010-07-15-debugOrdering.ll
> >    llvm/trunk/test/CodeGen/Thumb/stack-access.ll
> >    llvm/trunk/test/CodeGen/X86/2010-09-17-SideEffectsInChain.ll
> >    llvm/trunk/test/CodeGen/X86/2012-11-28-merge-store-alias.ll
> >    llvm/trunk/test/CodeGen/X86/MergeConsecutiveStores.ll
> >    llvm/trunk/test/CodeGen/X86/avx-vbroadcast.ll
> >    llvm/trunk/test/CodeGen/X86/avx512-mask-op.ll
> >    llvm/trunk/test/CodeGen/X86/chain_order.ll
> >    llvm/trunk/test/CodeGen/X86/clear_upper_vector_element_bits.ll
> >    llvm/trunk/test/CodeGen/X86/copy-eflags.ll
> >    llvm/trunk/test/CodeGen/X86/dag-merge-fast-accesses.ll
> >    llvm/trunk/test/CodeGen/X86/dont-trunc-store-double-to-float.ll
> >    llvm/trunk/test/CodeGen/X86/extractelement-legalization-
> store-ordering.ll
> >    llvm/trunk/test/CodeGen/X86/i256-add.ll
> >    llvm/trunk/test/CodeGen/X86/i386-shrink-wrapping.ll
> >    llvm/trunk/test/CodeGen/X86/live-range-nosubreg.ll
> >    llvm/trunk/test/CodeGen/X86/longlong-deadload.ll
> >    llvm/trunk/test/CodeGen/X86/merge-consecutive-loads-128.ll
> >    llvm/trunk/test/CodeGen/X86/merge-consecutive-loads-256.ll
> >    llvm/trunk/test/CodeGen/X86/merge-store-partially-alias-loads.ll
> >    llvm/trunk/test/CodeGen/X86/split-store.ll
> >    llvm/trunk/test/CodeGen/X86/stores-merging.ll
> >    llvm/trunk/test/CodeGen/X86/vector-compare-results.ll
> >    llvm/trunk/test/CodeGen/X86/vector-shuffle-variable-128.ll
> >    llvm/trunk/test/CodeGen/X86/vector-shuffle-variable-256.ll
> >    llvm/trunk/test/CodeGen/X86/vectorcall.ll
> >    llvm/trunk/test/CodeGen/X86/win32-eh.ll
> >    llvm/trunk/test/CodeGen/XCore/varargs.ll
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170427/10339342/attachment.html>