[PATCH] X86: fold SSE2/AVX2 logical shift by immediate amount into zero vector when possible

Thu Jul 11 05:35:17 PDT 2013

Thanks for the feedback!

> From: Duncan Sands <duncan.sands at gmail.com>
> On 10/07/13 22:15, Nadav Rotem wrote:
> > The patch LGTM.  I have a few comments:
> >
> > This is a NOP:
> 
> The IR level optimizers know about this one.

I considered this case, but I decided that it was probably best to not 
introduce unnecessary complexity into a target specific optimization when, 
as Duncan says, this case is already covered by the IR optimizer.

> >
> > +define <8 x i16> @test_srlw_1(<8 x i16> %InVec) {
> > +entry:
> > +  %shl = lshr <8 x i16> %InVec, <i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0, i16 0>
> > +  ret <8 x i16> %shl
> > +}
> > +
> > +; CHECK: test_srlw_1:
> > +; CHECK: psrlw   $0, %xmm0
> > +; CHECK-NEXT: ret
> > +
> >
> > I think that this is also a missed optimization.  32 > 31.
> 
> I.e. according to LLVM semantics the result is undefined (undef).
> Interestingly the IR level optimizers don't turn this into undef.
> 
> Ciao, Duncan.

The behavior of shifts by an amount greater than or equal to the element
size is defined for x86 SSE2/AVX2.
Quoting the Intel reference manual
( http://download.intel.com/products/processor/manual/325462.pdf ),
Section 4.2, instructions PSRAW/PSRAD:

"As the bits in the data elements are shifted right, the empty high-order
bits are filled with the initial value of the sign bit of the data element.
If the value specified by the count operand is greater than 15 (for words)
or 31 (for doublewords), each destination data element is filled with the
initial value of the sign bit of the element."

In the case of our function @test_srad_3 (see below), the count operand is
greater than 31 and therefore each destination element will be filled with
the sign bit of the corresponding element of InVec. From a target
perspective, the result is not undefined; each element will be either all
0s or all 1s.

> > +define <4 x i32> @test_srad_3(<4 x i32> %InVec) {
> > +entry:
> > +  %shl = ashr <4 x i32> %InVec, <i32 32, i32 32, i32 32, i32 32>
> > +  ret <4 x i32> %shl
> > +}
> > +
> > +; CHECK: test_srad_3:
> > +; CHECK: psrad   $32, %xmm0
> > +; CHECK-NEXT: ret
> >
> >
> > Nadav

In my opinion this is not a missed optimization, since we don't know the
sign bits of %InVec and therefore it is not safe to fold the arithmetic
shift into a vector of all 0s.
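
To make the point concrete, here is a rough scalar model of PSRAD in
Python. It is only a sketch of the behavior quoted from the manual above,
not code from the patch; the function name psrad_imm is made up for this
illustration, and lanes are assumed to be Python ints already in the
signed 32-bit range.

```python
def psrad_imm(lanes, count):
    """Hypothetical scalar model of PSRAD with an immediate count.

    Each lane is assumed to be a Python int in the signed 32-bit range.
    """
    # A count above 31 fills each lane with its own sign bit, which is
    # the same result as an arithmetic shift by 31.
    effective = min(count, 31)
    # Python's >> on ints is an arithmetic (sign-propagating) shift.
    return [lane >> effective for lane in lanes]


# Mirrors @test_srad_3: shift every lane by 32.
print(psrad_imm([5, -5, 0, -(2**31)], 32))  # [0, -1, 0, -1]
```

Lanes that start out negative become -1 (all 1s) while the others become
0, so the result depends on the input and cannot be replaced by a constant
zero vector.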

Please let me know what you think,
thanks again for the feedback.

Andrea Di Biagio
SN Systems - Sony Computer Entertainment Group

> > On Jul 10, 2013, at 1:00 PM, Eric Christopher <echristo at gmail.com
> > <mailto:echristo at gmail.com>> wrote:
> >
> >> Nadav might be someone good to review this.
> >>
> >> -eric
> >>
> >> On Wed, Jul 10, 2013 at 6:46 AM,  <Andrea_DiBiagio at sn.scee.net
> >> <mailto:Andrea_DiBiagio at sn.scee.net>> wrote:
> >>> Ping.
> >>>
> >>> (See attached file: patch.diff)
> >>> Andrea DiBiagio/SN R&D/BS/UK/SCEE wrote on 01/07/2013 12:01:44:
> >>>
> >>>> Friendly ping.
> >>>>
> >>>>> From: Andrea DiBiagio/SN R&D/BS/UK/SCEE
> >>>>> Hi all,
> >>>>>
> >>>>> I'd like to contribute a patch that teaches the x86 backend how to
> >>>>> combine SSE2/AVX2 packed logical shifts by immediate amount into
> >>>>> vectors of all 0s.
> >>>>>
> >>>>> An SSE2/AVX2 logical shift by immediate amount where the amount is
> >>>>> greater than or equal to the vector element size always returns a
> >>>>> vector of all 0s.
> >>>>>
> >>>>> Example:
> >>>>> pslld $35, %xmm0   # SSE2 packed doubleword logical shift left.
> >>>>>                    # %xmm0 is a vector of packed int (MVT::v4i32).
> >>>>>
> >>>>> The shift from this example will return a vector of all zeros in
> >>>>> %xmm0 and therefore it could be easily rewritten, for example, as:
> >>>>> xorps %xmm0, %xmm0
> >>>>>
> >>>>> This patch adds a new target combine rule in X86ISelLowering.cpp to
> >>>>> make sure that we simplify vector shifts into zero vectors when
> >>>>> possible.
> >>>>>
> >>>>> I added two new tests to verify that vector shifts are correctly
> >>>>> folded into vectors of all 0s when the immediate amount equals or
> >>>>> exceeds the vector element size.
> >>>>>
> >>>>> Thanks,
> >>>>> Andrea Di Biagio
> >>>>> SN Systems - Sony Computer Entertainment
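
For reference, the condition described in the original proposal above can
be sketched with a small scalar model in Python. This is only an
illustration under stated assumptions, not the combine rule from the
patch: the names pslld_imm and folds_to_zero are hypothetical, and lanes
are modeled as unsigned Python ints.

```python
def pslld_imm(lanes, count, bits=32):
    """Hypothetical scalar model of a packed logical left shift by immediate.

    Lanes are treated as unsigned integers of width `bits`.
    """
    mask = (1 << bits) - 1
    if count >= bits:
        # Every bit is shifted out, so each lane becomes zero.
        return [0] * len(lanes)
    return [(lane << count) & mask for lane in lanes]


def folds_to_zero(count, bits):
    """True when a logical shift by `count` always yields an all-zero vector."""
    return count >= bits


# Mirrors the `pslld $35, %xmm0` example: 35 >= 32, so the result is zero
# regardless of the input lanes.
print(pslld_imm([1, 0xFFFFFFFF, 7, 0x80000000], 35))  # [0, 0, 0, 0]
print(folds_to_zero(35, 32))                          # True
```

The same check does not carry over to arithmetic shifts, where an
out-of-range count fills each lane with its sign bit instead of zero.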
