<div dir="ltr"><div>+ctopper, +mkuper, who may have some interest in the CMOV aspects of this.<br></div><div><br></div><div>Looks like we're now correcting the chain of the two loads here, so we can correctly pull and merge the loads past the select in DAGCombine. After that we fail because X86RegisterInfo::eliminateFrameIndex cannot deal with a CMOV where one of the arguments is a FrameIndex that translates to the stack pointer with a non-zero offset. Adding Craig and Michael, who may be more interested. </div><div><br></div><div>Attached is a temporary patch that prevents the immediate issue and hopefully unsticks you. It causes some degradation in various tests and some improvements in others.</div><div><br></div><div>-Nirav</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, May 11, 2017 at 6:41 PM, Sanjoy Das <span dir="ltr"><<a href="mailto:sanjoy@playingwithpointers.com" target="_blank">sanjoy@playingwithpointers.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Even if you can't immediately fix the issue cleanly, a conservative<br>
workaround that we can temporarily apply downstream* will be immensely<br>
helpful.<br>
<br>
* We have a JIT compiler based on LLVM.<br>
<span class="HOEnZb"><font color="#888888"><br>
-- Sanjoy<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
On Thu, May 11, 2017 at 3:30 PM, Sanjoy Das<br>
<<a href="mailto:sanjoy@playingwithpointers.com">sanjoy@playingwithpointers.<wbr>com</a>> wrote:<br>
> Hi,<br>
><br>
> This change causes llc to crash on the following input:<br>
><br>
> target datalayout = "e-m:e-i64:64-f80:128-n8:16:<wbr>32:64-S128"<br>
> declare void @f()<br>
><br>
> define i32 addrspace(1)* @test(i32 addrspace(1)* %a, i32 addrspace(1)*<br>
> %b, i1 %which) gc "statepoint-example" {<br>
> entry:<br>
>   %tok = tail call token (i64, i32, void ()*, i32, i32, ...)<br>
> @llvm.experimental.gc.<wbr>statepoint.p0f_isVoidf(i64 0, i32 0, void ()*<br>
> @f, i32 0, i32 0, i32 0, i32 0, i32 addrspace(1)* %a, i32<br>
> addrspace(1)* %b)<br>
>   %a.r = tail call coldcc i8 addrspace(1)*<br>
> @llvm.experimental.gc.<wbr>relocate.p1i8(token %tok, i32 7, i32 7) ; (%a,<br>
> %a)<br>
>   %b.r = tail call coldcc i8 addrspace(1)*<br>
> @llvm.experimental.gc.<wbr>relocate.p1i8(token %tok, i32 8, i32 8) ; (%b,<br>
> %b)<br>
>   %cond.v = select i1 %which, i8 addrspace(1)* %a.r, i8 addrspace(1)* %b.r<br>
>   %cond = bitcast i8 addrspace(1)* %cond.v to i32 addrspace(1)*<br>
>   ret i32 addrspace(1)* %cond<br>
> }<br>
><br>
> declare token @llvm.experimental.gc.<wbr>statepoint.p0f_isVoidf(i64, i32,<br>
> void ()*, i32, i32, ...)<br>
> declare i8 addrspace(1)* @llvm.experimental.gc.<wbr>relocate.p1i8(token, i32, i32)<br>
><br>
> I've filed PR33010 for this, can you please take a look?<br>
><br>
> -- Sanjoy<br>
><br>
> On Thu, Apr 27, 2017 at 7:04 PM, Nirav Davé via llvm-commits<br>
> <<a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a>> wrote:<br>
>> Hi Vedant:<br>
>><br>
>> I've been looking at this for a bit. No fix yet, but here's what I've<br>
>> determined so far:<br>
>><br>
>> The r297695 patch is correct in that the transformation to the SelectionDAG<br>
>> is correct and the problem is a latent issue downstream. We're now correctly<br>
>> determining that the loads and stores in the example have no dependence<br>
>> order. The core of the problem seems to be from duplicating the shared stub<br>
>> load (which is why it's only in 32-bit X86 and not 64)  the two loads of b<br>
>> (which cannot be shared). It seems like we're having an issue with<br>
>> duplication of the SU in ScheduleDAGRRList::<wbr>CopyAndMoveSuccessors, where we are<br>
>> confused when determining the unfolding of the load, causing us to miscount the<br>
>> successors.<br>
>><br>
>> -Nirav<br>
>><br>
>><br>
>><br>
>> On Thu, Apr 27, 2017 at 7:29 PM, Vedant Kumar <<a href="mailto:vsk@apple.com">vsk@apple.com</a>> wrote:<br>
>>><br>
>>> Hi Nirav,<br>
>>><br>
>>> I saw a crasher on bugzilla and narrowed down the issue to this commit.<br>
>>> Please see:<br>
>>><br>
>>>   <a href="https://bugs.llvm.org/show_bug.cgi?id=32610" rel="noreferrer" target="_blank">https://bugs.llvm.org/show_<wbr>bug.cgi?id=32610</a><br>
>>><br>
>>> The reduced test case compiles with r297692 (previous commit), but the<br>
>>> crash occurs when r297695 (this commit) is included. The original C file,<br>
>>> and a reduced IR file, are attached to the bug report.<br>
>>><br>
>>> Could you take a look?<br>
>>><br>
>>> thanks,<br>
>>> vedant<br>
>>><br>
>>><br>
>>> > On Mar 13, 2017, at 5:34 PM, Nirav Dave via llvm-commits<br>
>>> > <<a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a>> wrote:<br>
>>> ><br>
>>> > Author: niravd<br>
>>> > Date: Mon Mar 13 19:34:14 2017<br>
>>> > New Revision: 297695<br>
>>> ><br>
>>> > URL: <a href="http://llvm.org/viewvc/llvm-project?rev=297695&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project?rev=297695&view=rev</a><br>
>>> > Log:<br>
>>> > In visitSTORE, always use FindBetterChain, rather than only when UseAA<br>
>>> > is enabled.<br>
>>> ><br>
>>> >    Recommitting with compile time improvements<br>
>>> ><br>
>>> >    Recommitting after fixup of 32-bit aliasing sign offset bug in<br>
>>> > DAGCombiner.<br>
>>> ><br>
>>> >    * Simplify Consecutive Merge Store Candidate Search<br>
>>> ><br>
>>> >    Now that address aliasing is much less conservative, push through<br>
>>> >    simplified store merging search and chain alias analysis which only<br>
>>> >    checks for parallel stores through the chain subgraph. This is<br>
>>> >    cleaner as it separates non-interfering loads/stores from the<br>
>>> >    store-merging logic.<br>
>>> ><br>
>>> >    When merging stores, search up the chain through a single load, and<br>
>>> >    find all possible stores by looking down through a load and a<br>
>>> >    TokenFactor to all stores visited.<br>
>>> ><br>
>>> >    This improves the quality of the output SelectionDAG and the output<br>
>>> >    Codegen (save perhaps for some ARM cases where we correctly<br>
>>> >    construct wider loads, but then promote them to float operations which appear<br>
>>> >    but require more expensive constant generation).<br>
>>> ><br>
>>> >    Some minor peephole optimizations to deal with improved SubDAG shapes<br>
>>> > (listed below)<br>
>>> ><br>
>>> >    Additional Minor Changes:<br>
>>> ><br>
>>> >      1. Finishes removing unused AliasLoad code<br>
>>> ><br>
>>> >      2. Unifies the chain aggregation in the merged stores across code<br>
>>> >         paths<br>
>>> ><br>
>>> >      3. Re-add the Store node to the worklist after calling<br>
>>> >         SimplifyDemandedBits.<br>
>>> ><br>
>>> >      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is<br>
>>> >         arbitrary, but seems sufficient to not cause regressions in<br>
>>> >         tests.<br>
>>> ><br>
>>> >      5. Remove Chain dependencies of Memory operations on CopyFromReg<br>
>>> >         nodes as these are captured by data dependence<br>
>>> ><br>
>>> >      6. Forward load-store values through TokenFactors containing<br>
>>> >          {CopyToReg,CopyFromReg} Values.<br>
>>> ><br>
>>> >      7. Peephole to convert buildvector of extract_vector_elt to<br>
>>> >         extract_subvector if possible (see<br>
>>> >         CodeGen/AArch64/store-merge.<wbr>ll)<br>
>>> ><br>
>>> >      8. Store merging for the ARM target is restricted to 32-bit as<br>
>>> >         in some contexts invalid 64-bit operations are being<br>
>>> >         generated. This can be removed once appropriate checks are<br>
>>> >         added.<br>
>>> ><br>
>>> >    This finishes the change Matt Arsenault started in r246307 and<br>
>>> >    jyknight's original patch.<br>
>>> ><br>
>>> >    Many tests required some changes as memory operations are now<br>
>>> >    reorderable, improving load-store forwarding. One test in<br>
>>> >    particular is worth noting:<br>
>>> ><br>
>>> >      CodeGen/PowerPC/ppc64-align-<wbr>long-double.ll - Improved load-store<br>
>>> >      forwarding converts a load-store pair into a parallel store and<br>
>>> >      a memory-realized bitcast of the same value. However, because we<br>
>>> >      lose the sharing of the explicit and implicit store values we<br>
>>> >      must create another local store. A similar transformation<br>
>>> >      happens before SelectionDAG as well.<br>
>>> ><br>
>>> >    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle<br>
>>> ><br>
>>> > Added:<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>pr32108.ll<br>
>>> > Removed:<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>combiner-aa-0.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>combiner-aa-1.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>pr18023.ll<br>
>>> > Modified:<br>
>>> >    llvm/trunk/include/llvm/<wbr>Target/TargetLowering.h<br>
>>> >    llvm/trunk/lib/CodeGen/<wbr>SelectionDAG/DAGCombiner.cpp<br>
>>> >    llvm/trunk/lib/CodeGen/<wbr>TargetLoweringBase.cpp<br>
>>> >    llvm/trunk/lib/Target/AArch64/<wbr>AArch64ISelLowering.cpp<br>
>>> >    llvm/trunk/lib/Target/ARM/<wbr>ARMISelLowering.h<br>
>>> >    llvm/trunk/test/CodeGen/<wbr>AArch64/argument-blocks.ll<br>
>>> >    llvm/trunk/test/CodeGen/<wbr>AArch64/arm64-abi.ll<br>
>>> >    llvm/trunk/test/CodeGen/<wbr>AArch64/arm64-memset-inline.ll<br>
>>> >    llvm/trunk/test/CodeGen/<wbr>AArch64/arm64-variadic-aapcs.<wbr>ll<br>
>>> >    llvm/trunk/test/CodeGen/<wbr>AArch64/merge-store.ll<br>
>>> >    llvm/trunk/test/CodeGen/<wbr>AArch64/vector_merge_dep_<wbr>check.ll<br>
>>> >    llvm/trunk/test/CodeGen/<wbr>AMDGPU/debugger-insert-nops.ll<br>
>>> >    llvm/trunk/test/CodeGen/<wbr>AMDGPU/insert_vector_elt.ll<br>
>>> >    llvm/trunk/test/CodeGen/<wbr>AMDGPU/merge-stores.ll<br>
>>> >    llvm/trunk/test/CodeGen/<wbr>AMDGPU/private-element-size.ll<br>
>>> >    llvm/trunk/test/CodeGen/<wbr>AMDGPU/si-triv-disjoint-mem-<wbr>access.ll<br>
>>> >    llvm/trunk/test/CodeGen/ARM/<wbr>2012-10-04-AAPCS-byval-align8.<wbr>ll<br>
>>> >    llvm/trunk/test/CodeGen/ARM/<wbr>alloc-no-stack-realign.ll<br>
>>> >    llvm/trunk/test/CodeGen/ARM/<wbr>gpr-paired-spill.ll<br>
>>> >    llvm/trunk/test/CodeGen/ARM/<wbr>ifcvt10.ll<br>
>>> >    llvm/trunk/test/CodeGen/ARM/<wbr>illegal-bitfield-loadstore.ll<br>
>>> >    llvm/trunk/test/CodeGen/ARM/<wbr>static-addr-hoisting.ll<br>
>>> >    llvm/trunk/test/CodeGen/BPF/<wbr>undef.ll<br>
>>> >    llvm/trunk/test/CodeGen/<wbr>MSP430/Inst16mm.ll<br>
>>> >    llvm/trunk/test/CodeGen/Mips/<wbr>cconv/arguments-float.ll<br>
>>> >    llvm/trunk/test/CodeGen/Mips/<wbr>cconv/arguments-varargs.ll<br>
>>> >    llvm/trunk/test/CodeGen/Mips/<wbr>fastcc.ll<br>
>>> >    llvm/trunk/test/CodeGen/Mips/<wbr>load-store-left-right.ll<br>
>>> >    llvm/trunk/test/CodeGen/Mips/<wbr>micromips-li.ll<br>
>>> >    llvm/trunk/test/CodeGen/Mips/<wbr>mips64-f128-call.ll<br>
>>> >    llvm/trunk/test/CodeGen/Mips/<wbr>mips64-f128.ll<br>
>>> >    llvm/trunk/test/CodeGen/Mips/<wbr>mno-ldc1-sdc1.ll<br>
>>> >    llvm/trunk/test/CodeGen/Mips/<wbr>msa/f16-llvm-ir.ll<br>
>>> >    llvm/trunk/test/CodeGen/Mips/<wbr>msa/i5_ld_st.ll<br>
>>> >    llvm/trunk/test/CodeGen/Mips/<wbr>o32_cc_byval.ll<br>
>>> >    llvm/trunk/test/CodeGen/Mips/<wbr>o32_cc_vararg.ll<br>
>>> >    llvm/trunk/test/CodeGen/<wbr>PowerPC/anon_aggr.ll<br>
>>> >    llvm/trunk/test/CodeGen/<wbr>PowerPC/complex-return.ll<br>
>>> >    llvm/trunk/test/CodeGen/<wbr>PowerPC/jaggedstructs.ll<br>
>>> >    llvm/trunk/test/CodeGen/<wbr>PowerPC/ppc64-align-long-<wbr>double.ll<br>
>>> >    llvm/trunk/test/CodeGen/<wbr>PowerPC/structsinmem.ll<br>
>>> >    llvm/trunk/test/CodeGen/<wbr>PowerPC/structsinregs.ll<br>
>>> >    llvm/trunk/test/CodeGen/<wbr>SystemZ/unaligned-01.ll<br>
>>> >    llvm/trunk/test/CodeGen/Thumb/<wbr>2010-07-15-debugOrdering.ll<br>
>>> >    llvm/trunk/test/CodeGen/Thumb/<wbr>stack-access.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>2010-09-17-SideEffectsInChain.<wbr>ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>2012-11-28-merge-store-alias.<wbr>ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>MergeConsecutiveStores.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>avx-vbroadcast.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>avx512-mask-op.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>chain_order.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>clear_upper_vector_element_<wbr>bits.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>copy-eflags.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>dag-merge-fast-accesses.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>dont-trunc-store-double-to-<wbr>float.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>extractelement-legalization-<wbr>store-ordering.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>i256-add.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>i386-shrink-wrapping.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>live-range-nosubreg.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>longlong-deadload.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>merge-consecutive-loads-128.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>merge-consecutive-loads-256.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>merge-store-partially-alias-<wbr>loads.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>split-store.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>stores-merging.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>vector-compare-results.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>vector-shuffle-variable-128.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>vector-shuffle-variable-256.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>vectorcall.ll<br>
>>> >    llvm/trunk/test/CodeGen/X86/<wbr>win32-eh.ll<br>
>>> >    llvm/trunk/test/CodeGen/XCore/<wbr>varargs.ll<br>
>>><br>
>><br>
>><br>
>> ______________________________<wbr>_________________<br>
>> llvm-commits mailing list<br>
>> <a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a><br>
>> <a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-commits</a><br>
>><br>
</div></div></blockquote></div><br></div>