[PATCH] D35700: DAGCombiner: Extend reduceBuildVecToTrunc to handle non-zero offset
Zvi Rackover via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jul 24 14:58:18 PDT 2017
zvi added inline comments.
================
Comment at: include/llvm/Target/TargetLowering.h:2779
+ // -->
+ // v4i32 truncate (bitcast (shuffle<1,u,3,u,4,u,5,u,6,u,7,u> V, u) to v4i64)
+ virtual bool isDesirableToCombineStridedeBuildVectorToShuffleTruncate(
----------------
guyblank wrote:
> shuffle mask should be <1,u,3,u,5,u,7,u>
right
================
Comment at: include/llvm/Target/TargetLowering.h:2782
+ int Stride, int Offset, EVT SrcVT, EVT BuildVecVT) const {
+ return false;
+ }
----------------
RKSimon wrote:
> It would possibly make sense to change this to
> ```
> isDesirableToCombineBuildVectorToShuffleTruncate(ArrayRef<int> ShuffleMask, EVT SrcVT, EVT TruncVT)
> ```
> checking with a general shuffle mask - I'd expect any non-lane crossing shuffle mask to be valid on AVX512.
Thanks. Will look into this suggestion.
================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:14433
// The first BUILD_VECTOR operand must be an an extract from index zero
// (assuming no undef and little-endian).
----------------
guyblank wrote:
> update comment
ok
================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:14442
+ // Check for profitability before proceeding with more expensive checks.
+ if (!TLI.isDesirableToCombineStridedeBuildVectorToShuffleTruncate(
+ Stride, Offset, ExtractedFromVec.getValueType(), VT))
----------------
guyblank wrote:
> I think you should add
> Offset != 0
> to the condition
Makes sense. To remain consistent with current flow.
================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:14473
+ }
+ if (!DAG.getTargetLoweringInfo().isShuffleMaskLegal(Mask, ExtractedVT))
+ return SDValue();
----------------
guyblank wrote:
> you can use TLI
ok
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:35835
+ Offset == 0 ||
+ // For 32-bit elements VPERMD is better than shuffle+truncate
+ (SrcVT.getScalarSizeInBits() != 32 && Subtarget.hasAVX2());
----------------
guyblank wrote:
> what about VPERMW for 16-bit elements?
We need to improve lowerBuildVector to make use of VPERMW. Until then we should go with this combine. Will add a TODO.
================
Comment at: test/CodeGen/X86/shuffle-strided-with-offset-256.ll:39
+; AVX512F-NEXT: vpmovsxwd %ymm0, %zmm0
+; AVX512F-NEXT: vpmovdb %zmm0, %xmm0
; AVX512F-NEXT: vmovdqa %xmm0, (%rsi)
----------------
guyblank wrote:
> should this be
>
> ```
> vpmovdb %zmm0, (%rsi)
> ```
>
> same for AVX512VL
Look like a follow-up work needed here.
https://reviews.llvm.org/D35700
More information about the llvm-commits
mailing list