[PATCH] D35700: DAGCombiner: Extend reduceBuildVecToTrunc to handle non-zero offset
Guy Blank via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Jul 23 08:46:40 PDT 2017
guyblank added inline comments.
================
Comment at: include/llvm/Target/TargetLowering.h:2779
+ // -->
+ // v4i32 truncate (bitcast (shuffle<1,u,3,u,4,u,5,u,6,u,7,u> V, u) to v4i64)
+ virtual bool isDesirableToCombineStridedeBuildVectorToShuffleTruncate(
----------------
The shuffle mask here should be <1,u,3,u,5,u,7,u>.
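
To illustrate where that mask comes from, here is a minimal standalone sketch. It is not code from the patch. The helper name buildStridedMask and the use of -1 for an undef lane are assumptions made for this example, and it assumes the little-endian case, where the truncate keeps the lowest narrow element of each widened lane. Every Stride-th lane of the shuffle result takes the source element at Offset + i * Stride, and all other lanes stay undef.

```cpp
#include <vector>

// Hypothetical helper (not part of the patch): builds the shuffle mask for a
// strided extract that is later bitcast to wider elements and truncated.
// -1 marks an undef lane.
std::vector<int> buildStridedMask(unsigned NumSrcElts, unsigned Stride,
                                  unsigned Offset) {
  std::vector<int> Mask(NumSrcElts, -1);
  for (unsigned i = 0; i * Stride < NumSrcElts; ++i) {
    unsigned Src = Offset + i * Stride;
    if (Src >= NumSrcElts)
      break; // Source index out of range; leave remaining lanes undef.
    // Lane i * Stride is the low part of wide element i, which is the part
    // that survives the truncate on a little-endian target.
    Mask[i * Stride] = static_cast<int>(Src);
  }
  return Mask;
}
```

With 8 source elements, Stride = 2 and Offset = 1, the lanes 0, 2, 4 and 6 receive source elements 1, 3, 5 and 7, which matches the mask suggested above.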
================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:14433
// The first BUILD_VECTOR operand must be an an extract from index zero
// (assuming no undef and little-endian).
----------------
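Please update this comment: now that a non-zero offset is handled, the first operand no longer has to be an extract from index zero.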
================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:14442
+ // Check for profitability before proceeding with more expensive checks.
+ if (!TLI.isDesirableToCombineStridedeBuildVectorToShuffleTruncate(
+ Stride, Offset, ExtractedFromVec.getValueType(), VT))
----------------
I think you should add "Offset != 0" to the condition.
================
Comment at: lib/CodeGen/SelectionDAG/DAGCombiner.cpp:14473
+ }
+ if (!DAG.getTargetLoweringInfo().isShuffleMaskLegal(Mask, ExtractedVT))
+ return SDValue();
----------------
You can use TLI here instead of calling DAG.getTargetLoweringInfo().
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:35835
+ Offset == 0 ||
+ // For 32-bit elements VPERMD is better than shuffle+truncate
+ (SrcVT.getScalarSizeInBits() != 32 && Subtarget.hasAVX2());
----------------
What about VPERMW for 16-bit elements?
https://reviews.llvm.org/D35700