[llvm] r212249 - [x86] Based on a long conversation between myself, Jim Grosbach, Hal

Thu Jul 3 04:24:18 PDT 2014

Hi Chandler (and all),
I agree that in general 'widening' packed integer vector types is
often a better (or not worse) default strategy than promoting vectors.

On X86, in my experience, 'widening' is perfectly safe (i.e. doesn't
have a negative impact) in most cases where the type is used by vector
arithmetic binops and vector logical binops. (You saw, for example, how
this can be beneficial on some test cases from the x86 test
'lower-bitcast.ll'.)

However, in my experience, things become complicated when dealing
with, for example, vector shifts on x86.
On non-AVX2 machines, the lack of support for packed shifts with a
variable count is one of the main reasons why the backend ends up
scalarizing many packed shifts.

With your change, if we widen the vector type, then we introduce more
elements in the vector; that means, if the type was used by an
ISD::SHL/SRL/SRA, we are potentially introducing a longer chain of
scalar shifts.

Example 1.
-----
define <2 x i32> @test1( <2 x i32> %A, <2 x i32> %B) {
  %1 = lshr <2 x i32> %A, %B
  ret <2 x i32> %1
}

[Without AVX2]
Before this change, type <2 x i32> was promoted to <2 x i64>. The
logical shift was then scalarized into a sequence of 2 'shrq'.
If we widen the vector from <2 x i32> to <4 x i32>, the shift is now
scalarized into a longer (and therefore worse) sequence of 4 'shrl'.
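
As a rough illustration of the lane counting above, here is a minimal
C++ sketch. It is not backend code: VecType, promoteTo, widenTo and
scalarShiftCount are hypothetical names, and the model simply assumes a
128-bit register and one scalar shift per lane once a variable-count
shift gets scalarized.

```cpp
// Hypothetical model of a vector type: lane count and lane width in bits.
struct VecType {
    unsigned numElts;
    unsigned eltBits;
};

// Promotion (simplified): keep the lane count, grow each lane until the
// vector fills the register.
constexpr VecType promoteTo(VecType vt, unsigned regBits) {
    return VecType{vt.numElts, regBits / vt.numElts};
}

// Widening (simplified): keep the lane width, add lanes until the vector
// fills the register.
constexpr VecType widenTo(VecType vt, unsigned regBits) {
    return VecType{regBits / vt.eltBits, vt.eltBits};
}

// Assumption: a scalarized variable-count shift costs one scalar shift
// per lane of the legalized type.
constexpr unsigned scalarShiftCount(VecType vt) {
    return vt.numElts;
}
```

For <2 x i32> and a 128-bit register, this model gives 2 scalar shifts
after promotion to <2 x i64> and 4 after widening to <4 x i32>, which
matches the 'shrq' and 'shrl' counts above.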

Example 2.
-----
define <2 x i32> @test2( <2 x i32> %A, <2 x i32> %B) {
  %1 = ashr <2 x i32> %A, %B
  ret <2 x i32> %1
}

[Without AVX2]
Similar to Example 1.
We end up with a longer scalar sequence of 'sarl' instructions.

On the other hand, widening has a positive impact on vector shifts in
the following two cases:

Example 3.
-----
define <2 x i32> @test3( <2 x i32> %A, <2 x i32> %B) {
  %1 = shl <2 x i32> %A, %B
  ret <2 x i32> %1
}

[Without AVX2]
If we don't promote type <2 x i32> to <2 x i64>, during the lowering
stage we realize that this shift can be safely converted into a vector
multiply. Therefore, we avoid scalarizing the shift into two 'shlq'
instructions.
Before, we produced: 2x vpextrq + 2x shlq + 1x vpunpcklqdq.
With your new hidden flag enabled, we produce a much shorter sequence:
vpslld + vpaddd + vpmulld.
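
The reason the shift can become a multiply is the per-lane identity
x << n == x * 2^n (modulo 2^32, for n < 32). A minimal C++ sketch of
that identity follows; powerOfTwo and shlViaMul are hypothetical helper
names, and the sketch only models the arithmetic, not how the backend
actually materializes the vector of powers of two before the multiply.

```cpp
#include <cstdint>

// 2^n for a shift count n in [0, 31].
constexpr std::uint32_t powerOfTwo(std::uint32_t n) {
    return std::uint32_t{1} << n;
}

// Left shift expressed as a multiply. Unsigned arithmetic wraps modulo
// 2^32, which is exactly what a 32-bit lane-wise multiply keeps.
constexpr std::uint32_t shlViaMul(std::uint32_t x, std::uint32_t n) {
    return x * powerOfTwo(n);
}
```

Applied lane by lane, a variable-count 'shl' on <4 x i32> then needs
only one packed multiply once the powers of two are available.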

Example 4.
-----
define <2 x i32> @test4(<2 x i32> %A, <2 x i32> %B) {
  %1 = ashr <2 x i32> %A, <i32 3, i32 3>
  ret <2 x i32> %1
}

This is a very interesting case.
Basically, SSE/AVX doesn't support packed arithmetic shifts by
immediate count where the element is a quadword (i64).
If we promote type <2 x i32> to <2 x i64>, we are forced to
scalarize the packed shift into a sequence of two 'sarq' instructions
(plus 2 vector extracts followed by a vector insert). This is
suboptimal and we could clearly have done a better job, since SSE
does support packed doubleword arithmetic shifts.

With your change, the codegen is much better!
We widen <2 x i32> to <4 x i32>. This allows us to lower the shift into
a single 'vpsrad' instruction.
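
To see why widening is harmless here, the following C++ sketch models a
lane-wise arithmetic shift by an immediate on four i32 lanes. All names
(V4i32, ashr32, widenV2, ashrImm) are hypothetical, and the two extra
lanes are filled with 0 purely as a placeholder, since their contents
do not matter to the result.

```cpp
#include <cstdint>

// Hypothetical 4-lane i32 vector.
struct V4i32 {
    std::int32_t lane[4];
};

// Portable arithmetic shift right: negative values are handled through
// bitwise complement, so the result does not depend on how the compiler
// shifts negative numbers.
constexpr std::int32_t ashr32(std::int32_t x, unsigned n) {
    return x < 0 ? ~(~x >> n) : (x >> n);
}

// Widen <2 x i32> to <4 x i32>; the upper lanes are don't-care values.
constexpr V4i32 widenV2(std::int32_t a, std::int32_t b) {
    return V4i32{{a, b, 0, 0}};
}

// One lane-wise shift over all four lanes, standing in for a single
// packed doubleword arithmetic shift by immediate.
constexpr V4i32 ashrImm(V4i32 v, unsigned n) {
    return V4i32{{ashr32(v.lane[0], n), ashr32(v.lane[1], n),
                  ashr32(v.lane[2], n), ashr32(v.lane[3], n)}};
}
```

Each lane is shifted independently, so the two low lanes of the widened
result are exactly the <2 x i32> result we wanted and the upper lanes
can simply be ignored.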

In conclusion (sorry for the long post...):
I hope this helps you understand the impact of widening vs
promotion in the case of vector shifts.
In general, I am of the opinion that having 'widening' as the default
is a good idea. If we fix how we do the widening/promotion for shifts,
then this has the potential to improve performance quite a lot
(just my opinion :-) ).

Cheers,
Andrea Di Biagio

On Thu, Jul 3, 2014 at 3:11 AM, Chandler Carruth <chandlerc at gmail.com> wrote:
> Author: chandlerc
> Date: Wed Jul  2 21:11:29 2014
> New Revision: 212249
>
> URL: http://llvm.org/viewvc/llvm-project?rev=212249&view=rev
> Log:
> [x86] Based on a long conversation between myself, Jim Grosbach, Hal
> Finkel, Eric Christopher, and a bunch of other people I'm probably
> forgetting (sorry), add an option to the x86 backend to widen vectors
> during type legalization rather than promote them.
>
> This still would promote vNi1 vectors to get the masks right, but would
> widen other vectors. A lot of experiments are piling up right now
> showing that widening should probably be the default legalization
> strategy outside of vNi1 cases, but it is very hard to test the
> ramifications of that and fix bugs in widening-based legalization
> without an option that enables it. I'll be checking in tests shortly
> that use this option to exercise cases where widening doesn't work well
> and hopefully we'll be able to switch fully to this soon.
>
> Modified:
>     llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>     llvm/trunk/lib/Target/X86/X86ISelLowering.h
>
> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=212249&r1=212248&r2=212249&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Wed Jul  2 21:11:29 2014
> @@ -58,6 +58,12 @@ using namespace llvm;
>
>  STATISTIC(NumTailCalls, "Number of tail calls");
>
> +static cl::opt<bool> ExperimentalVectorWideningLegalization(
> +    "x86-experimental-vector-widening-legalization", cl::init(false),
> +    cl::desc("Enable an experimental vector type legalization through widening "
> +             "rather than promotion."),
> +    cl::Hidden);
> +
>  static cl::opt<bool> ExperimentalVectorShuffleLowering(
>      "x86-experimental-vector-shuffle-lowering", cl::init(false),
>      cl::desc("Enable an experimental vector shuffle lowering code path."),
> @@ -1588,6 +1594,16 @@ void X86TargetLowering::resetOperationAc
>    setPrefFunctionAlignment(4); // 2^4 bytes.
>  }
>
> +TargetLoweringBase::LegalizeTypeAction
> +X86TargetLowering::getPreferredVectorAction(EVT VT) const {
> +  if (ExperimentalVectorWideningLegalization &&
> +      VT.getVectorNumElements() != 1 &&
> +      VT.getVectorElementType().getSimpleVT() != MVT::i1)
> +    return TypeWidenVector;
> +
> +  return TargetLoweringBase::getPreferredVectorAction(VT);
> +}
> +
>  EVT X86TargetLowering::getSetCCResultType(LLVMContext &, EVT VT) const {
>    if (!VT.isVector())
>      return Subtarget->hasAVX512() ? MVT::i1: MVT::i8;
>
> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.h?rev=212249&r1=212248&r2=212249&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86ISelLowering.h (original)
> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.h Wed Jul  2 21:11:29 2014
> @@ -796,6 +796,9 @@ namespace llvm {
>      /// \brief Reset the operation actions based on target options.
>      void resetOperationActions() override;
>
> +    /// \brief Customize the preferred legalization strategy for certain types.
> +    LegalizeTypeAction getPreferredVectorAction(EVT VT) const override;
> +
>    protected:
>      std::pair<const TargetRegisterClass*, uint8_t>
>      findRepresentativeClass(MVT VT) const override;
>