[llvm] r297695 - In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.

Sanjoy Das via llvm-commits llvm-commits at lists.llvm.org
Thu May 11 15:30:00 PDT 2017


Hi,

This change causes llc to crash on the following input:

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
declare void @f()

define i32 addrspace(1)* @test(i32 addrspace(1)* %a, i32 addrspace(1)*
%b, i1 %which) gc "statepoint-example" {
entry:
  %tok = tail call token (i64, i32, void ()*, i32, i32, ...)
@llvm.experimental.gc.statepoint.p0f_isVoidf(i64 0, i32 0, void ()*
@f, i32 0, i32 0, i32 0, i32 0, i32 addrspace(1)* %a, i32
addrspace(1)* %b)
  %a.r = tail call coldcc i8 addrspace(1)*
@llvm.experimental.gc.relocate.p1i8(token %tok, i32 7, i32 7) ; (%a,
%a)
  %b.r = tail call coldcc i8 addrspace(1)*
@llvm.experimental.gc.relocate.p1i8(token %tok, i32 8, i32 8) ; (%b,
%b)
  %cond.v = select i1 %which, i8 addrspace(1)* %a.r, i8 addrspace(1)* %b.r
  %cond = bitcast i8 addrspace(1)* %cond.v to i32 addrspace(1)*
  ret i32 addrspace(1)* %cond
}

declare token @llvm.experimental.gc.statepoint.p0f_isVoidf(i64, i32,
void ()*, i32, i32, ...)
declare i8 addrspace(1)* @llvm.experimental.gc.relocate.p1i8(token, i32, i32)

I've filed PR33010 for this, can you please take a look?

-- Sanjoy

On Thu, Apr 27, 2017 at 7:04 PM, Nirav Davé via llvm-commits
<llvm-commits at lists.llvm.org> wrote:
> Hi Vedant:
>
> I've been looking at this for a bit. No fix yet, but here's what I've
> determined so far:
>
> The r297695 patch is correct in that the transformation to the SelectionDAG
> is correct and the problem is a latent issue downstream. We're now correctly
> determining that the Loads at stores in the example have no dependence
> order. The core of the problem seems to be from duplicating the shared stub
> load (which is why it's only in 32-bit X86 and not 64)  the two loads of b
> (which cannot be shared). It seems like we're having an issue with
> duplication of the SU in ScheduleDAGRRList::CopyAndMoveSuccessors where are
> confused determining the unfolding the load causing us to miscount the
> successors.
>
> -Nirav
>
>
>
> On Thu, Apr 27, 2017 at 7:29 PM, Vedant Kumar <vsk at apple.com> wrote:
>>
>> Hi Nirav,
>>
>> I saw a crasher on bugzilla and narrowed down the issue to this commit.
>> Please see:
>>
>>   https://bugs.llvm.org/show_bug.cgi?id=32610
>>
>> The reduced test case compiles with r297692 (previous commit), but the
>> crash occurs when r297695 (this commit) is included. The original C file,
>> and a reduced IR file, is attached to the bug report.
>>
>> Could you take a look?
>>
>> thanks,
>> vedant
>>
>>
>> > On Mar 13, 2017, at 5:34 PM, Nirav Dave via llvm-commits
>> > <llvm-commits at lists.llvm.org> wrote:
>> >
>> > Author: niravd
>> > Date: Mon Mar 13 19:34:14 2017
>> > New Revision: 297695
>> >
>> > URL: http://llvm.org/viewvc/llvm-project?rev=297695&view=rev
>> > Log:
>> > In visitSTORE, always use FindBetterChain, rather than only when UseAA
>> > is enabled.
>> >
>> >    Recommiting with compiler time improvements
>> >
>> >    Recommitting after fixup of 32-bit aliasing sign offset bug in
>> > DAGCombiner.
>> >
>> >    * Simplify Consecutive Merge Store Candidate Search
>> >
>> >    Now that address aliasing is much less conservative, push through
>> >    simplified store merging search and chain alias analysis which only
>> >    checks for parallel stores through the chain subgraph. This is
>> > cleaner
>> >    as the separation of non-interfering loads/stores from the
>> >    store-merging logic.
>> >
>> >    When merging stores search up the chain through a single load, and
>> >    finds all possible stores by looking down from through a load and a
>> >    TokenFactor to all stores visited.
>> >
>> >    This improves the quality of the output SelectionDAG and the output
>> >    Codegen (save perhaps for some ARM cases where we correctly
>> > constructs
>> >    wider loads, but then promotes them to float operations which appear
>> >    but requires more expensive constant generation).
>> >
>> >    Some minor peephole optimizations to deal with improved SubDAG shapes
>> > (listed below)
>> >
>> >    Additional Minor Changes:
>> >
>> >      1. Finishes removing unused AliasLoad code
>> >
>> >      2. Unifies the chain aggregation in the merged stores across code
>> >         paths
>> >
>> >      3. Re-add the Store node to the worklist after calling
>> >         SimplifyDemandedBits.
>> >
>> >      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
>> >         arbitrary, but seems sufficient to not cause regressions in
>> >         tests.
>> >
>> >      5. Remove Chain dependencies of Memory operations on CopyfromReg
>> >         nodes as these are captured by data dependence
>> >
>> >      6. Forward loads-store values through tokenfactors containing
>> >          {CopyToReg,CopyFromReg} Values.
>> >
>> >      7. Peephole to convert buildvector of extract_vector_elt to
>> >         extract_subvector if possible (see
>> >         CodeGen/AArch64/store-merge.ll)
>> >
>> >      8. Store merging for the ARM target is restricted to 32-bit as
>> >         some in some contexts invalid 64-bit operations are being
>> >         generated. This can be removed once appropriate checks are
>> >         added.
>> >
>> >    This finishes the change Matt Arsenault started in r246307 and
>> >    jyknight's original patch.
>> >
>> >    Many tests required some changes as memory operations are now
>> >    reorderable, improving load-store forwarding. One test in
>> >    particular is worth noting:
>> >
>> >      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
>> >      forwarding converts a load-store pair into a parallel store and
>> >      a memory-realized bitcast of the same value. However, because we
>> >      lose the sharing of the explicit and implicit store values we
>> >      must create another local store. A similar transformation
>> >      happens before SelectionDAG as well.
>> >
>> >    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle
>> >
>> > Added:
>> >    llvm/trunk/test/CodeGen/X86/pr32108.ll
>> > Removed:
>> >    llvm/trunk/test/CodeGen/X86/combiner-aa-0.ll
>> >    llvm/trunk/test/CodeGen/X86/combiner-aa-1.ll
>> >    llvm/trunk/test/CodeGen/X86/pr18023.ll
>> > Modified:
>> >    llvm/trunk/include/llvm/Target/TargetLowering.h
>> >    llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
>> >    llvm/trunk/lib/CodeGen/TargetLoweringBase.cpp
>> >    llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp
>> >    llvm/trunk/lib/Target/ARM/ARMISelLowering.h
>> >    llvm/trunk/test/CodeGen/AArch64/argument-blocks.ll
>> >    llvm/trunk/test/CodeGen/AArch64/arm64-abi.ll
>> >    llvm/trunk/test/CodeGen/AArch64/arm64-memset-inline.ll
>> >    llvm/trunk/test/CodeGen/AArch64/arm64-variadic-aapcs.ll
>> >    llvm/trunk/test/CodeGen/AArch64/merge-store.ll
>> >    llvm/trunk/test/CodeGen/AArch64/vector_merge_dep_check.ll
>> >    llvm/trunk/test/CodeGen/AMDGPU/debugger-insert-nops.ll
>> >    llvm/trunk/test/CodeGen/AMDGPU/insert_vector_elt.ll
>> >    llvm/trunk/test/CodeGen/AMDGPU/merge-stores.ll
>> >    llvm/trunk/test/CodeGen/AMDGPU/private-element-size.ll
>> >    llvm/trunk/test/CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll
>> >    llvm/trunk/test/CodeGen/ARM/2012-10-04-AAPCS-byval-align8.ll
>> >    llvm/trunk/test/CodeGen/ARM/alloc-no-stack-realign.ll
>> >    llvm/trunk/test/CodeGen/ARM/gpr-paired-spill.ll
>> >    llvm/trunk/test/CodeGen/ARM/ifcvt10.ll
>> >    llvm/trunk/test/CodeGen/ARM/illegal-bitfield-loadstore.ll
>> >    llvm/trunk/test/CodeGen/ARM/static-addr-hoisting.ll
>> >    llvm/trunk/test/CodeGen/BPF/undef.ll
>> >    llvm/trunk/test/CodeGen/MSP430/Inst16mm.ll
>> >    llvm/trunk/test/CodeGen/Mips/cconv/arguments-float.ll
>> >    llvm/trunk/test/CodeGen/Mips/cconv/arguments-varargs.ll
>> >    llvm/trunk/test/CodeGen/Mips/fastcc.ll
>> >    llvm/trunk/test/CodeGen/Mips/load-store-left-right.ll
>> >    llvm/trunk/test/CodeGen/Mips/micromips-li.ll
>> >    llvm/trunk/test/CodeGen/Mips/mips64-f128-call.ll
>> >    llvm/trunk/test/CodeGen/Mips/mips64-f128.ll
>> >    llvm/trunk/test/CodeGen/Mips/mno-ldc1-sdc1.ll
>> >    llvm/trunk/test/CodeGen/Mips/msa/f16-llvm-ir.ll
>> >    llvm/trunk/test/CodeGen/Mips/msa/i5_ld_st.ll
>> >    llvm/trunk/test/CodeGen/Mips/o32_cc_byval.ll
>> >    llvm/trunk/test/CodeGen/Mips/o32_cc_vararg.ll
>> >    llvm/trunk/test/CodeGen/PowerPC/anon_aggr.ll
>> >    llvm/trunk/test/CodeGen/PowerPC/complex-return.ll
>> >    llvm/trunk/test/CodeGen/PowerPC/jaggedstructs.ll
>> >    llvm/trunk/test/CodeGen/PowerPC/ppc64-align-long-double.ll
>> >    llvm/trunk/test/CodeGen/PowerPC/structsinmem.ll
>> >    llvm/trunk/test/CodeGen/PowerPC/structsinregs.ll
>> >    llvm/trunk/test/CodeGen/SystemZ/unaligned-01.ll
>> >    llvm/trunk/test/CodeGen/Thumb/2010-07-15-debugOrdering.ll
>> >    llvm/trunk/test/CodeGen/Thumb/stack-access.ll
>> >    llvm/trunk/test/CodeGen/X86/2010-09-17-SideEffectsInChain.ll
>> >    llvm/trunk/test/CodeGen/X86/2012-11-28-merge-store-alias.ll
>> >    llvm/trunk/test/CodeGen/X86/MergeConsecutiveStores.ll
>> >    llvm/trunk/test/CodeGen/X86/avx-vbroadcast.ll
>> >    llvm/trunk/test/CodeGen/X86/avx512-mask-op.ll
>> >    llvm/trunk/test/CodeGen/X86/chain_order.ll
>> >    llvm/trunk/test/CodeGen/X86/clear_upper_vector_element_bits.ll
>> >    llvm/trunk/test/CodeGen/X86/copy-eflags.ll
>> >    llvm/trunk/test/CodeGen/X86/dag-merge-fast-accesses.ll
>> >    llvm/trunk/test/CodeGen/X86/dont-trunc-store-double-to-float.ll
>> >
>> > llvm/trunk/test/CodeGen/X86/extractelement-legalization-store-ordering.ll
>> >    llvm/trunk/test/CodeGen/X86/i256-add.ll
>> >    llvm/trunk/test/CodeGen/X86/i386-shrink-wrapping.ll
>> >    llvm/trunk/test/CodeGen/X86/live-range-nosubreg.ll
>> >    llvm/trunk/test/CodeGen/X86/longlong-deadload.ll
>> >    llvm/trunk/test/CodeGen/X86/merge-consecutive-loads-128.ll
>> >    llvm/trunk/test/CodeGen/X86/merge-consecutive-loads-256.ll
>> >    llvm/trunk/test/CodeGen/X86/merge-store-partially-alias-loads.ll
>> >    llvm/trunk/test/CodeGen/X86/split-store.ll
>> >    llvm/trunk/test/CodeGen/X86/stores-merging.ll
>> >    llvm/trunk/test/CodeGen/X86/vector-compare-results.ll
>> >    llvm/trunk/test/CodeGen/X86/vector-shuffle-variable-128.ll
>> >    llvm/trunk/test/CodeGen/X86/vector-shuffle-variable-256.ll
>> >    llvm/trunk/test/CodeGen/X86/vectorcall.ll
>> >    llvm/trunk/test/CodeGen/X86/win32-eh.ll
>> >    llvm/trunk/test/CodeGen/XCore/varargs.ll
>>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>


More information about the llvm-commits mailing list