[llvm] r344142 - [DAGCombine] Improve Load-Store Forwarding

Tue Oct 23 02:41:16 PDT 2018

Given it's an out of tree test, it's probably best if you write the patch
and I'll quickly LGTM it.

Nirav

On Tue, Oct 23, 2018, 04:26 Mikael Holmén <mikael.holmen at ericsson.com>
wrote:

> Hi,
>
> On 10/17/18 5:00 PM, Nirav Davé wrote:
> >     Regardless, restricting this optimization to only byte-multiples
> >     seems like the way to go then.
>
> Alright, should I create a patch with that or will you?
>
> I'll be out of office for a week now but I can create a patch after that
> and add you as reviewer if no one beats me to it.
>
> Regards,
> Mikael
>
> >
> >     On Wed, Oct 17, 2018 at 2:34 AM, Mikael Holmén
> >     <mikael.holmen at ericsson.com> wrote:
> >
> >         Hi,
> >
> >         On 10/16/2018 10:28 PM, Nirav Davé wrote:
> >         > I thought the current state of non-8-bit bytes was not yet viable as
> >         > things were baked in pretty hard.
> >
> >         Yes there are many "* 8", "/ 8", ">> 3" etc in-tree that are connected
> >         to the byte size, but we've changed most of them in our clone and
> >         generalized things so we use the bytesize that we've added to the
> >         DataLayout instead. It works :)
> >
> >         Every now and then there pops up a new hardcoded 8, 3 or similar in-tree
> >         that we must deal with but most of the time we notice that pretty quickly.
> >
> >         > Insisting that the LDType and STType
> >         > are some multiple of Byte should be sufficient, but that's probably
> >         > something that exists only for you.
> >         >
> >         > Can you try replacing with the following additional type legality check?
> >         >
> >         > -Nirav
> >         > +  if (DAG.getDataLayout().isBigEndian()) {
> >         > +    // Avoid dealing with big endian type legalization.
> >         > +    if (!TLI.isTypeLegal(STType) || !TLI.isTypeLegal(LDType))
> >         > +      return SDValue();
> >         > +    Offset =
> >         > +        (STMemType.getSizeInBits() - LDMemType.getSizeInBits()) / 8 -
> >         > +        Offset;
> >         > +  }
> >         > +
> >
> >         Unfortunately it doesn't help in this case.
> >
> >         We have
> >
> >             STType: i40
> >             LDType: i32
> >
> >         and both i32 and i40 are legal on my target. (Yes, we've added i40 to
> >         MVT which has also been interesting, there are a couple of places where
> >         it's assumed that the MVTs are all power-of-2, which i40 clearly isn't).
> >
> >         Checking that the sizes of LDType and STType are multiples of bytes, like
> >
> >             // Normalize for Endianness.
> >             if (DAG.getDataLayout().isBigEndian()) {
> >               // Avoid dealing with big endian type legalization.
> >               if (!TLI.isTypeLegal(STType) || !TLI.isTypeLegal(LDType) ||
> >                   (LDType.getSizeInBits() % bitsPerByte() != 0) ||
> >                   (STType.getSizeInBits() % bitsPerByte() != 0))
> >                 return SDValue();
> >               Offset =
> >                 (STMemType.getSizeInBits() - LDMemType.getSizeInBits()) /
> >                 bitsPerByte() - Offset;
> >             }
> >
> >         seems to work since it then bails out on the i40.
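The byte-multiple bail-out above can be sketched in isolation. This is only an illustration under stated assumptions: `isByteMultiple` and `mayForward` are hypothetical names, and the byte width is passed in as a plain parameter standing in for the out-of-tree `bitsPerByte()` helper. None of this is in-tree LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// True when a type of the given bit width occupies a whole number of bytes.
// BitsPerByte is a parameter here; in the thread it comes from the
// out-of-tree bitsPerByte() helper (16 on the target discussed).
constexpr bool isByteMultiple(uint64_t SizeInBits, uint64_t BitsPerByte) {
  return SizeInBits % BitsPerByte == 0;
}

// Hypothetical predicate mirroring the proposed condition: forwarding is
// only attempted when both the stored and the loaded type are byte multiples.
constexpr bool mayForward(uint64_t STBits, uint64_t LDBits,
                          uint64_t BitsPerByte) {
  return isByteMultiple(STBits, BitsPerByte) &&
         isByteMultiple(LDBits, BitsPerByte);
}

// Small runtime check of the cases mentioned in the thread.
inline void checkByteMultiples() {
  assert(!mayForward(40, 32, 16)); // i40 is 2.5 bytes with 16-bit bytes
  assert(mayForward(48, 32, 16));  // both are whole 16-bit bytes
  assert(mayForward(40, 32, 8));   // with 8-bit bytes i40 is 5 whole bytes
}
```

With 8-bit bytes the i40/i32 pair passes this check, which is consistent with the observation that the extra condition mainly matters for targets with wider bytes.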
> >
> >
> >         I still have a nagging feeling that this could be solved by involving
> >         getStoreSizeInBits() instead of, or in addition to, getSizeInBits() in
> >         some way though. If we calculate Offset as
> >
> >               Offset =
> >                 (STMemType.getStoreSizeInBits() - LDMemType.getStoreSizeInBits()) /
> >                 bitsPerByte() - Offset;
> >
> >         then we would get (48 - 32)/16 - 0 so Offset would be 1 and then we'd
> >         abort the transformation with
> >
> >             // TODO: Deal with nonzero offset.
> >             if (LD->getBasePtr().isUndef() || Offset != 0)
> >               return SDValue();
> >
> >         so that would work too in this particular case but I don't know if
> >         that's correct or not.
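To make the two offset calculations easier to compare, here is a small sketch of the arithmetic. It assumes that the store size is the bit size rounded up to a whole byte, and it uses hypothetical free functions (`storeSizeInBits`, `bigEndianOffset`) rather than the real EVT methods.

```cpp
#include <cassert>
#include <cstdint>

// Bit width rounded up to a whole number of bytes. This models the store
// size on a target with BitsPerByte-bit bytes (an assumption about the
// out-of-tree port, not in-tree behaviour).
constexpr int64_t storeSizeInBits(int64_t SizeInBits, int64_t BitsPerByte) {
  return (SizeInBits + BitsPerByte - 1) / BitsPerByte * BitsPerByte;
}

// The big-endian normalization from the thread, with the sizes passed in
// explicitly so that either plain sizes or store sizes can be used.
constexpr int64_t bigEndianOffset(int64_t STBits, int64_t LDBits,
                                  int64_t BitsPerByte, int64_t Offset) {
  return (STBits - LDBits) / BitsPerByte - Offset;
}

// Plain sizes: (40 - 32) / 16 - 0 truncates to 0, so forwarding proceeds.
static_assert(bigEndianOffset(40, 32, 16, 0) == 0, "plain sizes give 0");

// Store sizes: i40 occupies 48 bits, so (48 - 32) / 16 - 0 is 1 and the
// "Offset != 0" bail-out would trigger.
static_assert(storeSizeInBits(40, 16) == 48, "i40 stores as 48 bits");
static_assert(bigEndianOffset(storeSizeInBits(40, 16),
                              storeSizeInBits(32, 16), 16, 0) == 1,
              "store sizes give 1");

// Runtime version of the same checks.
inline void checkOffsets() {
  assert(bigEndianOffset(40, 32, 16, 0) == 0);
  assert(bigEndianOffset(48, 32, 16, 0) == 1);
}
```

The difference comes entirely from the integer division: 8 / 16 truncates to 0, which hides the fact that the 32 loaded bits do not start at the low end of the stored value.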
> >
> >         I also don't know if this is really just a problem for bigendian targets
> >         or not, or if it's indeed just a problem for my target with all its
> >         oddities.
> >
> >         Thanks,
> >         Mikael
> >
> >         >
> >         >
> >         >
> >         >
> >         >
> >         >
> >         > 16-bit bytes? My understanding is that we've baked in 8-bit bytes pretty
> >         > strongly into the backend. Are you
> >         >
> >         >
> >         > On Mon, Oct 15, 2018 at 2:43 AM, Mikael Holmén
> >          > <mikael.holmen at ericsson.com> wrote:
> >          >
> >          >     Hi,
> >          >
> >          >     On 10/12/2018 04:03 PM, Nirav Davé wrote:
> >          >     > Hmm. I wonder if this is an issue with bigendian vector representations.
> >          >     >
> >          >
> >          >     No vectors involved in this case though.
> >          >
> >          >     I think it's a size vs store size issue, possibly that might affect
> >          >     bigendian vectors too? I don't know.
> >          >
> >          >     > I suspect there are not too many cases where we actually get to the
> >          >     > getTruncatedStoreValue in your case. Can you tell me what STType,
> >          >     > LDType, STMemType, LDMemType, and final offset are when we get there for
> >          >     > your test case?
> >          >
> >          >     When we reach getTruncatedStoreValue we have the following DAG:
> >          >
> >          >     SelectionDAG has 22 nodes:
> >          >         t0: ch = EntryToken
> >          >         t2: i24,ch = CopyFromReg t0, Register:i24 %0
> >          >           t23: ch = TokenFactor t2:1, t22
> >          >           t32: i32 = srl t38, Constant:i16<16>
> >          >         t17: ch,glue = CopyToReg t23, Register:i32 $a1_32, t32
> >          >               t27: i16 = add FrameIndex:i16<0>, Constant:i16<2>
> >          >             t34: i32,ch = load<(dereferenceable load 1 from %ir._p48 + 2), zext from i16> t22, t27, undef:i16
> >          >             t30: i32 = shl t38, Constant:i16<16>
> >          >           t37: i32 = add t34, t30
> >          >         t19: ch,glue = CopyToReg t17, Register:i32 $a0_32, t37, t17:1
> >          >             t24: i40 = any_extend t2
> >          >           t6: i40 = shl nuw t24, Constant:i16<16>
> >          >         t22: ch = store<(store 3 into %ir._p40, align 1)> t0, t6, FrameIndex:i16<0>, undef:i16
> >          >         t38: i32,ch = load<(dereferenceable load 2 from %ir._p48, align 1)> t22, FrameIndex:i16<0>, undef:i16
> >          >         t20: ch = PHXISD::RETURN t19, Register:i32 $a1_32, Register:i32 $a0_32, t19:1
> >          >
> >          >     and the following types
> >          >
> >          >     STType: i40
> >          >     LDType: i32
> >          >     STMemType: i40
> >          >     LDMemType: i32
> >          >
> >          >     The Offset is 0.
> >          >
> >          >     Complicating factor: the byte size on my target is 16, not 8, so we've
> >          >     replaced the hardcoded 8 in "* 8" and "/ 8" with "bitsPerByte()" which
> >          >     in our case is 16 so the code looks like:
> >          >
> >          >         bool STCoversLD =
> >          >             BasePtrST.equalBaseIndex(BasePtrLD, DAG, Offset) && (Offset >= 0) &&
> >          >             (Offset * bitsPerByte() <= LDMemType.getSizeInBits()) &&
> >          >             (Offset * bitsPerByte() + LDMemType.getSizeInBits() <=
> >          >              STMemType.getSizeInBits());
> >          >
> >          >         if (!STCoversLD)
> >          >           return SDValue();
> >          >
> >          >         // Normalize for Endianness.
> >          >         if (DAG.getDataLayout().isBigEndian())
> >          >           Offset =
> >          >               (STMemType.getSizeInBits() - LDMemType.getSizeInBits()) /
> >          >               bitsPerByte() - Offset;
> >          >
> >          >     The Offset was thus calculated as
> >          >
> >          >        (40 - 32)/16 - 0
> >          >
> >          >     which I suspect is incorrect.
> >          >
> >          >     STMemType is i40, so the size in bits is 40. However, the store size
> >          >     for i40 is 48!
> >          >
> >          >     So I think that we need to involve the store size in some way here.
> >          >
> >          >     Regards,
> >          >     Mikael
> >          >
> >          >     >
> >          >     > -Nirav
> >          >     >
> >          >     >
> >          >     >
> >          >     >
> >          >     >
> >          >     >
> >          >     >
> >          >     > On Fri, Oct 12, 2018 at 9:33 AM, Mikael Holmén
> >          >      > <mikael.holmen at ericsson.com> wrote:
> >          >      >
> >          >      >     Hi again,
> >          >      >
> >          >      >     There seems to be another problem too.
> >          >      >
> >          >      >     I don't have an in-tree reproducer for it but I think the problem
> >          >      >     has to do with cases where the store size of the VT is larger than the
> >          >      >     "normal" size.
> >          >      >
> >          >      >
> >          >      >     Then I think the new code can miscalculate what part of the stored
> >          >      >     value it should use instead of the load. Not sure if this is only a
> >          >      >     problem for bigendian targets or not.
> >          >      >
> >          >      >     The end result is code that fails at runtime so it's quite nasty.
> >          >      >
> >          >      >
> >          >      >
> >          >      >
> >          >      >
> >          >      >     Regards,
> >          >      >     Mikael
> >          >      >
> >          >      >     On 10/12/2018 08:08 AM, Mikael Holmén wrote:
> >          >      >      > Yep, thanks!
> >          >      >      >
> >          >      >      > Regards,
> >          >      >      > Mikael
> >          >      >      >
> >          >      >      > On 10/11/2018 08:44 PM, Nirav Davé wrote:
> >          >      >      >> Should be fixed in rL344272.
> >          >      >      >>
> >          >      >      >> Thanks for the catch.
> >          >      >      >>
> >          >      >      >> -Nirav
> >          >      >      >>
> >          >      >      >> On Thu, Oct 11, 2018 at 7:41 AM, Mikael Holmén
> >          >      >      >> <mikael.holmen at ericsson.com> wrote:
> >          >      >      >>
> >          >      >      >>     Hi,
> >          >      >      >>
> >          >      >      >>     Reproducer for powerpc:
> >          >      >      >>
> >          >      >      >>        llc bug.ll -o - -O1
> >          >      >      >>
> >          >      >      >>     Without the patch we get
> >          >      >      >>
> >          >      >      >>               addis 4, 2, .LC0@toc@ha
> >          >      >      >>               sth 3, -2(1)
> >          >      >      >>               ld 4, .LC0@toc@l(4)
> >          >      >      >>               lbz 3, -2(1)
> >          >      >      >>               stb 3, 0(4)
> >          >      >      >>               blr
> >          >      >      >>
> >          >      >      >>     and with
> >          >      >      >>
> >          >      >      >>               addis 4, 2, .LC0@toc@ha
> >          >      >      >>               sth 3, -2(1)
> >          >      >      >>               ld 4, .LC0@toc@l(4)
> >          >      >      >>               stb 3, 0(4)
> >          >      >      >>               blr
> >          >      >      >>
> >          >      >      >>     Admittedly I'm no ppc expert but I think the final stb will write
> >          >      >      >>     bits [0-7] of 3 to 0(4).
> >          >      >      >>
> >          >      >      >>     Before your patch those 8 bits were set up from the
> >          >      >      >>
> >          >      >      >>               lbz 3, -2(1)
> >          >      >      >>
> >          >      >      >>     but with the patch, I think the bits we're after are placed at bit
> >          >      >      >>     [8-15] in 3 so we'll get the wrong byte.
> >          >      >      >>
> >          >      >      >>     /Mikael
> >          >      >      >>
> >          >      >      >>     On 10/11/2018 12:54 PM, Mikael Holmén wrote:
> >          >      >      >>      > Hi Nirav,
> >          >      >      >>      >
> >          >      >      >>      > I suspect that this patch doesn't handle big endian targets
> >          >      >      >>      > correctly.
> >          >      >      >>      >
> >          >      >      >>      > I'll try to get back with a reproducer for some in-tree target,
> >          >      >      >>      > but for my out-of-tree bigendian target it looks like it changes
> >          >      >      >>      > something like
> >          >      >      >>      >
> >          >      >      >>      >   store i16 %v, i16* %p16
> >          >      >      >>      >   %p8 = bitcast i16* %p16 to i8*
> >          >      >      >>      >   %ld = load i8, i8* %p8
> >          >      >      >>      >
> >          >      >      >>      > to
> >          >      >      >>      >
> >          >      >      >>      >   store i16 %v, i16* %p16
> >          >      >      >>      >   %ld = trunc i16 %v to i8
> >          >      >      >>      >
> >          >      >      >>      > but I think it should rather be
> >          >      >      >>      >
> >          >      >      >>      >   store i16 %v, i16* %p16
> >          >      >      >>      >   %tmp = lshr i16 %v, 8
> >          >      >      >>      >   %ld = trunc i16 %tmp to i8
> >          >      >      >>      >
> >          >      >      >>      > I.e. if the target is bigendian, the load will read the high 8
> >          >      >      >>      > bits from %v rather than the low. And the truncate that this patch
> >          >      >      >>      > generates gives us the low bits.
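The big-endian point can be illustrated with a small byte-level model. This is a sketch only, assuming 8-bit bytes and a 16-bit store; the function names are made up for the illustration and do not correspond to anything in DAGCombiner.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Store a 16-bit value in big-endian byte order: the most significant byte
// lands at the lowest address.
inline std::array<uint8_t, 2> storeBigEndian16(uint16_t V) {
  return {static_cast<uint8_t>(V >> 8), static_cast<uint8_t>(V & 0xFF)};
}

// What an i8 load from the same address as the i16 store observes.
inline uint8_t loadFirstByte(const std::array<uint8_t, 2> &Mem) {
  return Mem[0];
}

// Forwarding as a plain truncate: keeps the low 8 bits of the stored value.
inline uint8_t forwardByTruncate(uint16_t V) {
  return static_cast<uint8_t>(V);
}

// Forwarding as shift-then-truncate: keeps the high 8 bits.
inline uint8_t forwardByShiftAndTruncate(uint16_t V) {
  return static_cast<uint8_t>(V >> 8);
}

// On a big-endian layout only the shifted form matches the real load.
inline void checkBigEndianForwarding() {
  const uint16_t V = 0xABCD;
  const uint8_t Loaded = loadFirstByte(storeBigEndian16(V));
  assert(Loaded == 0xAB);
  assert(forwardByShiftAndTruncate(V) == Loaded);
  assert(forwardByTruncate(V) != Loaded);
}
```

On a little-endian layout the byte order in memory is reversed, so the plain truncate would match the load there, which is presumably why the problem only shows up on big-endian targets.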
> >          >      >      >>      >
> >          >      >      >>      > Regards,
> >          >      >      >>      > Mikael
> >          >      >      >>      >
> >          >      >      >>      > On 10/10/2018 04:15 PM, Nirav Dave via llvm-commits wrote:
> >          >      >      >>      >> Author: niravd
> >          >      >      >>      >> Date: Wed Oct 10 07:15:52 2018
> >          >      >      >>      >> New Revision: 344142
> >          >      >      >>      >>
> >          >      >      >>      >> URL:
> >          >      >      >>      >> http://llvm.org/viewvc/llvm-project?rev=344142&view=rev
> >          >      >      >>      >> Log:
> >          >      >      >>      >> [DAGCombine] Improve Load-Store Forwarding
> >          >      >      >>      >>
> >          >      >      >>      >> Summary:
> >          >      >      >>      >> Extend analysis forwarding loads from preceding stores to work with
> >          >      >      >>      >> extended loads and truncated stores to the same address so long as the
> >          >      >      >>      >> load is fully subsumed by the store.
> >          >      >      >>      >>
> >          >      >      >>      >> Hexagon's swp-epilog-phis.ll and swp-memrefs-epilog1.ll tests are
> >          >      >      >>      >> deleted as they no longer seem to be relevant.
> >          >      >      >>      >>
> >          >      >      >>      >> Reviewers: RKSimon, rnk, kparzysz, javed.absar
> >          >      >      >>      >>
> >          >      >      >>      >> Subscribers: sdardis, nemanjai, hiraditya, atanasyan, llvm-commits
> >          >      >      >>      >>
> >          >      >      >>      >> Differential Revision: https://reviews.llvm.org/D49200
> >          >      >      >>      >>
> >          >      >      >>      >> Removed:
> >          >      >      >>      >>     llvm/trunk/test/CodeGen/Hexagon/swp-epilog-phis.ll
> >          >      >      >>      >>     llvm/trunk/test/CodeGen/Hexagon/swp-memrefs-epilog1.ll
> >          >      >      >>      >> Modified:
> >          >      >      >>      >>     llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> >          >      >      >>      >>     llvm/trunk/test/CodeGen/AArch64/arm64-ld-from-st.ll
> >          >      >      >>      >>     llvm/trunk/test/CodeGen/AArch64/regress-tblgen-chains.ll
> >          >      >      >>      >>     llvm/trunk/test/CodeGen/Hexagon/clr_set_toggle.ll
> >          >      >      >>      >>     llvm/trunk/test/CodeGen/Mips/cconv/vector.ll
> >          >      >      >>      >>     llvm/trunk/test/CodeGen/Mips/indirect-jump-hazard/jumptables.ll
> >          >      >      >>      >>     llvm/trunk/test/CodeGen/Mips/o32_cc_byval.ll
> >          >      >      >>      >>     llvm/trunk/test/CodeGen/Mips/o32_cc_vararg.ll
> >          >      >      >>      >>     llvm/trunk/test/CodeGen/PowerPC/addi-offset-fold.ll
> >          >      >      >>      >>     llvm/trunk/test/CodeGen/SystemZ/store_nonbytesized_vecs.ll
> >          >      >      >>      >>     llvm/trunk/test/CodeGen/X86/i386-shrink-wrapping.ll
> >          >      >      >>      >>     llvm/trunk/test/CodeGen/X86/pr32108.ll
> >          >      >      >>      >>     llvm/trunk/test/CodeGen/X86/pr38533.ll
> >          >      >      >>      >>     llvm/trunk/test/CodeGen/X86/win64_vararg.ll
> >          >      >      >>      >>
> >          >      >      >>      >> Modified: llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
> >          >      >      >>      >> URL:
> >          >      >      >>      >> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=344142&r1=344141&r2=344142&view=diff
> >          >      >      >>      >> ==============================================================================
> >          >      >      >>      >> --- llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (original)
> >          >      >      >>      >> +++ llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Wed Oct 10 07:15:52 2018
> >          >      >      >>      >> @@ -250,6 +250,11 @@ namespace {
> >          >      >      >>      >>       SDValue SplitIndexingFromLoad(LoadSDNode *LD);
> >          >      >      >>      >>       bool SliceUpLoad(SDNode *N);
> >          >      >      >>      >> +    // Scalars have size 0 to distinguish from singleton vectors.
> >          >      >      >>      >> +    SDValue ForwardStoreValueToDirectLoad(LoadSDNode *LD);
> >          >      >      >>      >> +    bool getTruncatedStoreValue(StoreSDNode *ST, SDValue &Val);
> >          >      >      >>      >> +    bool extendLoadedValueToExtension(LoadSDNode *LD, SDValue &Val);
> >          >      >      >>      >> +
> >          >      >      >>      >>       /// Replace an ISD::EXTRACT_VECTOR_ELT of a load with a narrowed
> >          >      >      >>      >>       ///   load.
> >          >      >      >>      >>       ///
> >          >      >      >>      >> @@ -12762,6 +12767,133 @@ SDValue DAGCombiner::SplitIndexingFromLo
> >          >      >      >>      >>     return DAG.getNode(Opc, SDLoc(LD), BP.getSimpleValueType(), BP, Inc);
> >          >      >      >>      >>   }
> >          >      >      >>      >> +static inline int numVectorEltsOrZero(EVT T) {
> >          >      >      >>      >> +  return T.isVector() ? T.getVectorNumElements() : 0;
> >          >      >      >>      >> +}
> >          >      >      >>      >> +
> >          >      >      >>      >> +bool DAGCombiner::getTruncatedStoreValue(StoreSDNode *ST, SDValue &Val) {
> >          >      >      >>      >> +  Val = ST->getValue();
> >          >      >      >>      >> +  EVT STType = Val.getValueType();
> >          >      >      >>      >> +  EVT STMemType = ST->getMemoryVT();
> >          >      >      >>      >> +  if (STType == STMemType)
> >          >      >      >>      >> +    return true;
> >          >      >      >>      >> +  if (isTypeLegal(STMemType))
> >          >      >      >>      >> +    return false; // fail.
> >          >      >      >>      >> +  if (STType.isFloatingPoint() && STMemType.isFloatingPoint() &&
> >          >      >      >>      >> +      TLI.isOperationLegal(ISD::FTRUNC, STMemType)) {
> >          >      >      >>      >> +    Val = DAG.getNode(ISD::FTRUNC, SDLoc(ST), STMemType, Val);
> >          >      >      >>      >> +    return true;
> >          >      >      >>      >> +  }
> >          >      >      >>      >> +  if (numVectorEltsOrZero(STType) == numVectorEltsOrZero(STMemType) &&
> >          >      >      >>      >> +      STType.isInteger() && STMemType.isInteger()) {
> >          >      >      >>      >> +    Val = DAG.getNode(ISD::TRUNCATE, SDLoc(ST), STMemType, Val);
> >          >      >      >>      >> +    return true;
> >          >      >      >>      >> +  }
> >          >      >      >>      >> +  if (STType.getSizeInBits() == STMemType.getSizeInBits()) {
> >          >      >      >>      >> +    Val = DAG.getBitcast(STMemType, Val);
> >          >      >      >>      >> +    return true;
> >          >      >      >>      >> +  }
> >          >      >      >>      >> +  return false; // fail.
> >          >      >      >>      >> +}
> >          >      >      >>      >> +
> >          >      >      >>      >> +bool DAGCombiner::extendLoadedValueToExtension(LoadSDNode *LD, SDValue &Val) {
> >          >      >      >>      >> +  EVT LDMemType = LD->getMemoryVT();
> >          >      >      >>      >> +  EVT LDType = LD->getValueType(0);
> >          >      >      >>      >> +  assert(Val.getValueType() == LDMemType &&
> >          >      >      >>      >> +         "Attempting to extend value of non-matching type");
> >          >      >      >>      >> +  if (LDType == LDMemType)
> >          >      >      >>      >> +    return true;
> >          >      >      >>      >> +  if (LDMemType.isInteger() && LDType.isInteger()) {
> >          >      >      >>      >> +    switch (LD->getExtensionType()) {
> >          >      >      >>      >> +    case ISD::NON_EXTLOAD:
> >          >      >      >>      >> +      Val = DAG.getBitcast(LDType, Val);
> >          >      >      >>      >> +      return true;
> >          >      >      >>      >> +    case ISD::EXTLOAD:
> >          >      >      >>      >> +      Val = DAG.getNode(ISD::ANY_EXTEND, SDLoc(LD), LDType, Val);
> >          >      >      >>      >> +      return true;
> >          >      >      >>      >> +    case ISD::SEXTLOAD:
> >          >      >      >>      >> +      Val = DAG.getNode(ISD::SIGN_EXTEND, SDLoc(LD), LDType, Val);
> >          >      >      >>      >> +      return true;
> >          >      >      >>      >> +    case ISD::ZEXTLOAD:
> >          >      >      >>      >> +      Val = DAG.getNode(ISD::ZERO_EXTEND, SDLoc(LD), LDType, Val);
> >          >      >      >>      >> +      return true;
> >          >      >      >>      >> +    }
> >          >      >      >>      >> +  }
> >          >      >      >>      >> +  return false;
> >          >      >      >>      >> +}
> >          >      >      >>      >> +
> >          >      >      >>      >> +SDValue DAGCombiner::ForwardStoreValueToDirectLoad(LoadSDNode *LD) {
> >          >      >      >>      >> +  if (OptLevel == CodeGenOpt::None ||
> >          >     LD->isVolatile())
> >          >      >      >>      >> +    return SDValue();
> >          >      >      >>      >> +  SDValue Chain = LD->getOperand(0);
> >          >      >      >>      >> +  StoreSDNode *ST =
> >          >     dyn_cast<StoreSDNode>(Chain.getNode());
> >          >      >      >>      >> +  if (!ST || ST->isVolatile())
> >          >      >      >>      >> +    return SDValue();
> >          >      >      >>      >> +
> >          >      >      >>      >> +  EVT LDType = LD->getValueType(0);
> >          >      >      >>      >> +  EVT LDMemType = LD->getMemoryVT();
> >          >      >      >>      >> +  EVT STMemType = ST->getMemoryVT();
> >          >      >      >>      >> +  EVT STType =
> >         ST->getValue().getValueType();
> >          >      >      >>      >> +
> >          >      >      >>      >> +  BaseIndexOffset BasePtrLD =
> >          >      >     BaseIndexOffset::match(LD, DAG);
> >          >      >      >>      >> +  BaseIndexOffset BasePtrST =
> >          >      >     BaseIndexOffset::match(ST, DAG);
> >          >      >      >>      >> +  int64_t Offset;
> >          >      >      >>      >> +
> >          >      >      >>      >> +  bool STCoversLD =
> >          >      >      >>      >> +
> >         BasePtrST.equalBaseIndex(BasePtrLD, DAG,
> >          >     Offset) &&
> >          >      >      >>     (Offset >=
> >          >      >      >>      >> 0) &&
> >          >      >      >>      >> +      (Offset * 8 <=
> >         LDMemType.getSizeInBits()) &&
> >          >      >      >>      >> +      (Offset * 8 +
> >         LDMemType.getSizeInBits() <=
> >          >      >      >>      >> STMemType.getSizeInBits());
> >          >      >      >>      >> +
> >          >      >      >>      >> +  if (!STCoversLD)
> >          >      >      >>      >> +    return SDValue();
> >          >      >      >>      >> +
> >          >      >      >>      >> +  // Memory as copy space
> >         (potentially masked).
> >          >      >      >>      >> +  if (Offset == 0 && LDType == STType
> &&
> >          >     STMemType ==
> >          >      >      >> LDMemType) {
> >          >      >      >>      >> +    // Simple case: Direct
> >         non-truncating forwarding
> >          >      >      >>      >> +    if (LDType.getSizeInBits() ==
> >          >      >     LDMemType.getSizeInBits())
> >          >      >      >>      >> +      return CombineTo(LD,
> >         ST->getValue(), Chain);
> >          >      >      >>      >> +    // Can we model the truncate and
> >         extension
> >          >     with an
> >          >      >     and mask?
> >          >      >      >>      >> +    if (STType.isInteger() &&
> >          >     LDMemType.isInteger() &&
> >          >      >      >>      >> !STType.isVector() &&
> >          >      >      >>      >> +        !LDMemType.isVector() &&
> >          >     LD->getExtensionType() !=
> >          >      >      >>      >> ISD::SEXTLOAD) {
> >          >      >      >>      >> +      // Mask to size of LDMemType
> >          >      >      >>      >> +      auto Mask =
> >          >      >      >>      >> +
> >          >      >      >>
> >         DAG.getConstant(APInt::getLowBitsSet(STType.getSizeInBits(),
> >          >      >      >>      >> +
> >          >      >      >>      >> STMemType.getSizeInBits()),
> >          >      >      >>      >> +                          SDLoc(ST),
> >         STType);
> >          >      >      >>      >> +      auto Val = DAG.getNode(ISD::AND,
> >          >     SDLoc(LD), LDType,
> >          >      >      >>      >> ST->getValue(), Mask);
> >          >      >      >>      >> +      return CombineTo(LD, Val,
> Chain);
> >          >      >      >>      >> +    }
> >          >      >      >>      >> +  }
> >          >      >      >>      >> +
> >          >      >      >>      >> +  // TODO: Deal with nonzero offset.
> >          >      >      >>      >> +  if (LD->getBasePtr().isUndef() ||
> >         Offset != 0)
> >          >      >      >>      >> +    return SDValue();
> >          >      >      >>      >> +  // Model necessary truncations / extensions.
> >          >      >      >>      >> +  SDValue Val;
> >          >      >      >>      >> +  // Truncate Value To Stored Memory
> >         Size.
> >          >      >      >>      >> +  do {
> >          >      >      >>      >> +    if (!getTruncatedStoreValue(ST,
> Val))
> >          >      >      >>      >> +      continue;
> >          >      >      >>      >> +    if (!isTypeLegal(LDMemType))
> >          >      >      >>      >> +      continue;
> >          >      >      >>      >> +    if (STMemType != LDMemType) {
> >          >      >      >>      >> +      if
> >         (numVectorEltsOrZero(STMemType) ==
> >          >      >      >>      >> numVectorEltsOrZero(LDMemType) &&
> >          >      >      >>      >> +          STMemType.isInteger() &&
> >          >     LDMemType.isInteger())
> >          >      >      >>      >> +        Val =
> >         DAG.getNode(ISD::TRUNCATE, SDLoc(LD),
> >          >      >     LDMemType,
> >          >      >      >>     Val);
> >          >      >      >>      >> +      else
> >          >      >      >>      >> +        continue;
> >          >      >      >>      >> +    }
> >          >      >      >>      >> +    if
> >         (!extendLoadedValueToExtension(LD, Val))
> >          >      >      >>      >> +      continue;
> >          >      >      >>      >> +    return CombineTo(LD, Val, Chain);
> >          >      >      >>      >> +  } while (false);
> >          >      >      >>      >> +
> >          >      >      >>      >> +  // On failure, cleanup dead nodes
> >         we may have
> >          >     created.
> >          >      >      >>      >> +  if (Val->use_empty())
> >          >      >      >>      >> +    deleteAndRecombine(Val.getNode());
> >          >      >      >>      >> +  return SDValue();
> >          >      >      >>      >> +}
> >          >      >      >>      >> +
> >          >      >      >>      >>   SDValue
> >         DAGCombiner::visitLOAD(SDNode *N) {
> >          >      >      >>      >>     LoadSDNode *LD  =
> cast<LoadSDNode>(N);
> >          >      >      >>      >>     SDValue Chain = LD->getChain();
> >          >      >      >>      >> @@ -12828,17 +12960,8 @@ SDValue
> >          >      >     DAGCombiner::visitLOAD(SDNode *N
> >          >      >      >>      >>     // If this load is directly
> >         stored, replace
> >          >     the load
> >          >      >     value
> >          >      >      >> with
> >          >      >      >>      >> the stored
> >          >      >      >>      >>     // value.
> >          >      >      >>      >> -  // TODO: Handle store large -> read
> >         small portion.
> >          >      >      >>      >> -  // TODO: Handle TRUNCSTORE/LOADEXT
> >          >      >      >>      >> -  if (OptLevel != CodeGenOpt::None &&
> >          >      >      >>      >> -      ISD::isNormalLoad(N) &&
> >         !LD->isVolatile()) {
> >          >      >      >>