[PATCH] D34601: [X86][LLVM] Expanding support for lowerInterleavedStore() in X86InterleavedAccess.
Farhana Aleen via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 5 17:07:20 PDT 2017
Farhana added inline comments.
================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:170
+/// Generate unpacklo/unpackhi shuffle mask.
+static void createUnpackShuffleMask(int NumElts, SmallVectorImpl<uint32_t> &Mask,
+ bool Lo, bool Unary) {
----------------
Why not use the createUnpackShuffleMask(..) defined in X86ISelLowering.cpp? I would recommend declaring a template version of it in X86ISelLowering.h, which would allow both int and uint32_t masks, and getting rid of this copy in X86InterleavedAccess.cpp.
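Roughly what I have in mind (a sketch only; it assumes the body of the existing static helper in X86ISelLowering.cpp can be lifted into the header as-is):

  template <typename T>
  void createUnpackShuffleMask(MVT VT, SmallVectorImpl<T> &Mask,
                               bool Lo, bool Unary) {
    unsigned NumElts = VT.getVectorNumElements();
    // Unpack operates within 128-bit lanes.
    unsigned NumEltsInLane = 128 / VT.getScalarSizeInBits();
    for (unsigned i = 0; i < NumElts; ++i) {
      unsigned LaneStart = (i / NumEltsInLane) * NumEltsInLane;
      unsigned Pos = (i % NumEltsInLane) / 2 + LaneStart;
      Pos += (Unary ? 0 : NumElts * (i % 2));
      Pos += (Lo ? 0 : NumEltsInLane / 2);
      Mask.push_back(Pos);
    }
  }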
================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:186
+// shufle(VEC1,VEC2,{NumElement/2, NumElement/2+1, NumElement/2+2...,
+// NumElement-1, NumElement-NumElement/2,
+// NumElement-NumElement/2+1..., 2*NumElement-1})
----------------
The comment does not match the coding logic; something is off here. Maybe it should be NumElement+NumElement/2?
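(For example, with NumElement = 8 the high-half mask would be {4, 5, 6, 7, 12, 13, 14, 15}: the second run starts at NumElement + NumElement/2 = 12 in the concatenated index space, whereas NumElement - NumElement/2 = 4 would just re-select VEC1's half.)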
================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:205
+
+void X86InterleavedAccessGroup::transposeChar_32x4(
+ ArrayRef<Instruction *> Matrix,
----------------
I would think you could write a more general transpose function: one written in terms of 8 elements would scale to 16 and 32. Why is it written only for a vector length of 32?
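For instance, something along these lines (purely a sketch with made-up names -- transposeChar_Nx4 and createChunkedUnpackMask are hypothetical; it also emits plain shufflevector masks and ignores the per-128-bit-lane behavior of the real unpck instructions, so the lane-aware masks in the patch would still be needed for optimal codegen):

  // Build a mask that interleaves Gran-element chunks of two vectors.
  static void createChunkedUnpackMask(unsigned NumElts, unsigned Gran,
                                      bool Lo,
                                      SmallVectorImpl<uint32_t> &Mask) {
    unsigned Base = Lo ? 0 : NumElts / 2;
    for (unsigned C = 0; C < NumElts / (2 * Gran); ++C)
      for (unsigned Op = 0; Op < 2; ++Op)       // alternate VEC1/VEC2 chunks
        for (unsigned E = 0; E < Gran; ++E)
          Mask.push_back(Base + C * Gran + E + Op * NumElts);
  }

  // Transpose 4 rows of <N x i8> (N = 8, 16 or 32) into interleaved order.
  static void transposeChar_Nx4(ArrayRef<Value *> Rows,
                                SmallVectorImpl<Value *> &Out,
                                IRBuilder<> &Builder) {
    unsigned N = cast<VectorType>(Rows[0]->getType())->getNumElements();
    SmallVector<uint32_t, 32> BLo, BHi, PLo, PHi;
    createChunkedUnpackMask(N, 1, true, BLo);   // element unpack, lo half
    createChunkedUnpackMask(N, 1, false, BHi);  // element unpack, hi half
    createChunkedUnpackMask(N, 2, true, PLo);   // pair unpack, lo half
    createChunkedUnpackMask(N, 2, false, PHi);  // pair unpack, hi half
    // Round 1: interleave rows 0/1 and 2/3 element-wise.
    Value *V01L = Builder.CreateShuffleVector(Rows[0], Rows[1], BLo);
    Value *V01H = Builder.CreateShuffleVector(Rows[0], Rows[1], BHi);
    Value *V23L = Builder.CreateShuffleVector(Rows[2], Rows[3], BLo);
    Value *V23H = Builder.CreateShuffleVector(Rows[2], Rows[3], BHi);
    // Round 2: interleave the pairs, yielding fully transposed vectors.
    Out.push_back(Builder.CreateShuffleVector(V01L, V23L, PLo));
    Out.push_back(Builder.CreateShuffleVector(V01L, V23L, PHi));
    Out.push_back(Builder.CreateShuffleVector(V01H, V23H, PLo));
    Out.push_back(Builder.CreateShuffleVector(V01H, V23H, PHi));
  }

The masks scale with N, so the same code covers the 8-, 16- and 32-element cases.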
================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:384
+ // VecInst contains the Ptr argument.
+ Value *VecInst = Inst->getOperand(1);
+ Type *IntOf16 = Type::getInt16Ty(Shuffles[0]->getContext());
----------------
Value *VecInst = SI->getPointerOperand();
================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:384
+ // VecInst contains the Ptr argument.
+ Value *VecInst = Inst->getOperand(1);
+ Type *IntOf16 = Type::getInt16Ty(Shuffles[0]->getContext());
----------------
Farhana wrote:
> Value *VecInst = SI->getPointerOperand()
What is the reason for generating four stores instead of 1 wide store? I would think codegen is smart enough to generate the expected 4 optimized stores from it. Can you please keep it consistent with the other case, i.e., concatenate the transposed vectors and generate 1 wide store:
  StoreInst *SI = cast<StoreInst>(Inst);
  Value *BasePtr = SI->getPointerOperand();

  case 32: {
    transposeChar_32x4(DecomposedVectors, TransposedVectors);
    // VecInst contains the Ptr argument.
    Value *VecInst = SI->getPointerOperand();
    Type *IntOf16 = Type::getInt16Ty(Shuffles[0]->getContext());
    // From <128 x i8>* to <64 x i16>*.
    Type *VecTran = VectorType::get(IntOf16, 64)->getPointerTo();
    BasePtr = Builder.CreateBitCast(VecInst, VecTran);
  }

And move the following two statements out of the switch, replacing the per-iteration store:

  // 3. Concatenate the contiguous-vectors back into a wide vector.
  Value *WideVec = concatenateVectors(Builder, TransposedVectors);
  // 4. Generate a store instruction for wide-vec.
  Builder.CreateAlignedStore(WideVec, BasePtr, SI->getAlignment());
================
Comment at: lib/Target/X86/X86InterleavedAccess.cpp:389
+ Value *VecBasePtr = Builder.CreateBitCast(VecInst, VecTran);
+ for (unsigned i = 0; i < 4; i++) {
+ Value *NewBasePtr = Builder.CreateGEP(VecBasePtr, Builder.getInt32(i));
----------------
Use ++i; the LLVM coding standards prefer pre-increment.
https://reviews.llvm.org/D34601