[PATCH] fix PR15296
Michael Liao
michael.liao at intel.com
Tue Feb 19 11:21:21 PST 2013
As explained in the log message, the root cause of inefficient code
generation for 256-bit shift on AVX is that the necessary preparation of
256-bit integer shift is split into regular DAG lowering and DAG
combining in existing code. The lowering will expand 256-bit into
128-bit ones and hence DAG combining has no chance to recognize that.
Another point is that, if a transform could be in lowering, we should do
that in lowering to minimize DAG combination overhead.
Yours
- Michael
On Tue, 2013-02-19 at 11:16 -0800, Craig Topper wrote:
> We shouldn't need to move this much code to fix this bug. Are you just
> trying to avoid having duplicate/similar code in the DAG combine and
> in the lowering path?
>
> On Tue, Feb 19, 2013 at 10:33 AM, Michael Liao
> <michael.liao at intel.com> wrote:
> Hi Nadav
>
> The issue for me to split this patch is that it will break
> lots of
> regression tests. The new lowering is moved from the original
> shift
> combining plus a special handling in 32-bit mode.
>
> please check the latest patch following your other suggestion.
> The patch
> is revised to handle the canonicalized form of broadcast.
> Could you
> point me to more documents on the canonical forms of some
> useful
> patterns, such as broadcast.
>
> BTW, the current code style suggests 'funcName'. In that file,
> there're
> many existing functions still following the previous one but
> there're
> some following the new one. Do you need change to new one? I
> could take
> that if needed.
>
> Yours
> - Michael
>
> On Tue, 2013-02-19 at 09:42 -0800, Nadav Rotem wrote:
> > Hi Michael,
> >
> > Thanks for working on this. In this patch you moved a lot
> of code around. Can you split it into two patches: refactoring
> and the new lowering ?
> >
> > +static SDValue lowerScalarImmediateShift(SDValue Op,
> SelectionDAG &DAG,
> >
> > Please match the coding style of the existing code. In this
> file most static functions start with an upper case (for
> example PromoteSplat, LowerBuildVector, etc).
> >
> > + // Check remaiing shift amounts.
> > + for (unsigned i = Ratio; i != Amt.getNumOperands(); i
> += Ratio) {
> >
> > Typo.
> >
> > + %smear.0 = insertelement <8 x i32> undef, i32 %shiftval,
> i32 0
> > + %smear.1 = insertelement <8 x i32> %smear.0, i32 %
> shiftval, i32 1
> > + %smear.2 = insertelement <8 x i32> %smear.1, i32 %
> shiftval, i32 2
> > + %smear.3 = insertelement <8 x i32> %smear.2, i32 %
> shiftval, i32 3
> > + %smear.4 = insertelement <8 x i32> %smear.3, i32 %
> shiftval, i32 4
> > + %smear.5 = insertelement <8 x i32> %smear.4, i32 %
> shiftval, i32 5
> > + %smear.6 = insertelement <8 x i32> %smear.5, i32 %
> shiftval, i32 6
> > + %smear.7 = insertelement <8 x i32> %smear.6, i32 %
> shiftval, i32 7
> > + %bitop = lshr <8 x i32> %input, %smear.7
> > + ret <8 x i32> %bitop
> >
> > The canonicalized form of broadcast is insert+shuffle.
> >
> > Thanks,
> > Nadav
> >
> > On Feb 19, 2013, at 12:32 AM, Michael Liao
> <michael.liao at intel.com> wrote:
> >
> > > Hi All,
> > >
> > > This patch fixes PR15296 by lowering SRA/SRL/SHL into
> PSRA/PSRL/PSHL
> > > during DAG lowering instead of DAG combining, because
> 256-bit integer
> > > vector will be expanded in lowering where DAG combining
> will lose the
> > > chance to transform it.
> > >
> > > Thanks for your review
> > > - Michael
> > >
> > >
> <0001-Fix-PR15296.patch>_______________________________________________
> > > llvm-commits mailing list
> > > llvm-commits at cs.uiuc.edu
> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
>
>
>
> --
> ~Craig
More information about the llvm-commits
mailing list