[PATCH] fix PR15296

Tue Feb 19 13:08:49 PST 2013

I have attached a patch that fixes the 256-bit problem in lowering without
this much change. It leaves everything else in DAG combine. Not sure if I
have a 32-bit mode problem or not though.

On Tue, Feb 19, 2013 at 11:21 AM, Michael Liao <michael.liao at intel.com>wrote:

> As explained in the log message, the root cause of inefficient code
> generation for 256-bit shift on AVX is that the necessary preparation of
> 256-bit integer shift is split into regular DAG lowering and DAG
> combining in existing code. The lowering will expand 256-bit into
> 128-bit ones and hence DAG combining has no chance to recognize that.
>
> Another point is that, if a transform could be in lowering, we should do
> that in lowering to minimize DAG combination overhead.
>
> Yours
> - Michael
>
> On Tue, 2013-02-19 at 11:16 -0800, Craig Topper wrote:
> > We shouldn't need to move this much code to fix this bug. Are you just
> > trying to avoid having duplicate/similar code in the DAG combine and
> > in the lowering path?
> >
> > On Tue, Feb 19, 2013 at 10:33 AM, Michael Liao
> > <michael.liao at intel.com> wrote:
> >         Hi Nadav
> >
> >         The issue for me to split this patch is that it will break
> >         lots of
> >         regression tests. The new lowering is moved from the original
> >         shift
> >         combining plus a special handling in 32-bit mode.
> >
> >         please check the latest patch following your other suggestion.
> >         The patch
> >         is revised to handle the canonicalized form of broadcast.
> >         Could you
> >         point me to more documents on the canonical forms of some
> >         useful
> >         patterns, such as broadcast.
> >
> >         BTW, the current code style suggests 'funcName'. In that file,
> >         there're
> >         many existing functions still following the previous one but
> >         there're
> >         some following the new one. Do you need change to new one? I
> >         could take
> >         that if needed.
> >
> >         Yours
> >         - Michael
> >
> >         On Tue, 2013-02-19 at 09:42 -0800, Nadav Rotem wrote:
> >         > Hi Michael,
> >         >
> >         > Thanks for working on this.  In this patch you moved a lot
> >         of code around. Can you split it into two patches: refactoring
> >         and the new lowering ?
> >         >
> >         > +static SDValue lowerScalarImmediateShift(SDValue Op,
> >         SelectionDAG &DAG,
> >         >
> >         > Please match the coding style of the existing code. In this
> >         file most static functions start with an upper case (for
> >         example PromoteSplat, LowerBuildVector, etc).
> >         >
> >         > +    // Check remaiing shift amounts.
> >         > +    for (unsigned i = Ratio; i != Amt.getNumOperands(); i
> >         += Ratio) {
> >         >
> >         > Typo.
> >         >
> >         > +  %smear.0 = insertelement <8 x i32> undef, i32 %shiftval,
> >         i32 0
> >         > +  %smear.1 = insertelement <8 x i32> %smear.0, i32 %
> >         shiftval, i32 1
> >         > +  %smear.2 = insertelement <8 x i32> %smear.1, i32 %
> >         shiftval, i32 2
> >         > +  %smear.3 = insertelement <8 x i32> %smear.2, i32 %
> >         shiftval, i32 3
> >         > +  %smear.4 = insertelement <8 x i32> %smear.3, i32 %
> >         shiftval, i32 4
> >         > +  %smear.5 = insertelement <8 x i32> %smear.4, i32 %
> >         shiftval, i32 5
> >         > +  %smear.6 = insertelement <8 x i32> %smear.5, i32 %
> >         shiftval, i32 6
> >         > +  %smear.7 = insertelement <8 x i32> %smear.6, i32 %
> >         shiftval, i32 7
> >         > +  %bitop = lshr <8 x i32> %input, %smear.7
> >         > +  ret <8 x i32> %bitop
> >         >
> >         > The canonicalized form of broadcast is insert+shuffle.
> >         >
> >         > Thanks,
> >         > Nadav
> >         >
> >         > On Feb 19, 2013, at 12:32 AM, Michael Liao
> >         <michael.liao at intel.com> wrote:
> >         >
> >         > > Hi All,
> >         > >
> >         > > This patch fixes PR15296 by lowering SRA/SRL/SHL into
> >         PSRA/PSRL/PSHL
> >         > > during DAG lowering instead of DAG combining, because
> >         256-bit integer
> >         > > vector will be expanded in lowering where DAG combining
> >         will lose the
> >         > > chance to transform it.
> >         > >
> >         > > Thanks for your review
> >         > > - Michael
> >         > >
> >         > >
> >
> <0001-Fix-PR15296.patch>_______________________________________________
> >         > > llvm-commits mailing list
> >         > > llvm-commits at cs.uiuc.edu
> >         > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >         >
> >
> >
> >
> >         _______________________________________________
> >         llvm-commits mailing list
> >         llvm-commits at cs.uiuc.edu
> >         http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >
> >
> >
> >
> >
> > --
> > ~Craig
>
>
>

-- 
~Craig
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130219/18d0dac8/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vshift.patch
Type: application/octet-stream
Size: 3015 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130219/18d0dac8/attachment.obj>