[llvm] r248943 - [SLP] Don't vectorize loads of non-packed types (like i1, i2).
Mikhail Zolotukhin via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 30 16:51:39 PDT 2015
> On Sep 30, 2015, at 4:46 PM, Philip Reames <listmail at philipreames.com> wrote:
>
> Going purely by the description, this change sounds a bit odd. If we know the i1 load will be codegened as an i8 load followed by a truncate, wouldn't it be better to vectorize this as a <4 x i8> load followed by a vector truncate? I get that this patch is fixing a correctness issue that needs to be fixed, but what's the long-term answer here?
Yes, this patch is for correctness. I haven't thought about how we should vectorize such cases in the future, but what you suggest makes sense to me.
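
For the record, a rough sketch of what that could look like in IR (hypothetical and untested; the function name, bitcast, and value names are just illustrative, this is not what the vectorizer produces today):

  define <4 x i2> @widened_load(i2* %addr) {
    %p = bitcast i2* %addr to <4 x i8>*
    ; Reads 4 consecutive bytes, matching what the four scalar i2 loads touch.
    %wide = load <4 x i8>, <4 x i8>* %p, align 1
    ; Keep only the low 2 bits of each lane.
    %v = trunc <4 x i8> %wide to <4 x i2>
    ret <4 x i2> %v
  }
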
Michael
>
> Just curious,
> Philip
>
> On 09/30/2015 02:05 PM, Michael Zolotukhin via llvm-commits wrote:
>> Author: mzolotukhin
>> Date: Wed Sep 30 16:05:43 2015
>> New Revision: 248943
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=248943&view=rev
>> Log:
>> [SLP] Don't vectorize loads of non-packed types (like i1, i2).
>>
>> Summary:
>> Given an array of i2 elements, 4 consecutive scalar loads will be lowered to
>> i8-sized loads and thus will access 4 consecutive bytes in memory. If we
>> vectorize these loads into a single <4 x i2> load, it'll access only 1 byte in
>> memory. Hence, we should prohibit vectorization in such cases.
>>
>> PS: Initial patch was proposed by Arnold.
>>
>> Reviewers: aschwaighofer, nadav, hfinkel
>>
>> Subscribers: llvm-commits
>>
>> Differential Revision: http://reviews.llvm.org/D13277
>>
>> Modified:
>> llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
>> llvm/trunk/test/Transforms/SLPVectorizer/X86/bad_types.ll
>>
>> Modified: llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp?rev=248943&r1=248942&r2=248943&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp (original)
>> +++ llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp Wed Sep 30 16:05:43 2015
>> @@ -1158,6 +1158,23 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
>> return;
>> }
>> case Instruction::Load: {
>> + // Check that a vectorized load would load the same memory as a scalar
>> + // load. For example, we don't want to vectorize loads that are smaller
>> + // than 8 bits. Even though we have a packed struct {<i2, i2, i2, i2>},
>> + // LLVM treats loading/storing it as an i8 struct. If we vectorize
>> + // loads/stores from such a struct, we read/write packed bits,
>> + // disagreeing with the unvectorized version.
>> + const DataLayout &DL = F->getParent()->getDataLayout();
>> + Type *ScalarTy = VL[0]->getType();
>> +
>> + if (DL.getTypeSizeInBits(ScalarTy) !=
>> + DL.getTypeAllocSizeInBits(ScalarTy)) {
>> + BS.cancelScheduling(VL);
>> + newTreeEntry(VL, false);
>> + DEBUG(dbgs() << "SLP: Gathering loads of non-packed type.\n");
>> + return;
>> + }
>> // Check if the loads are consecutive or if we need to swizzle them.
>> for (unsigned i = 0, e = VL.size() - 1; i < e; ++i) {
>> LoadInst *L = cast<LoadInst>(VL[i]);
>> @@ -1167,7 +1184,7 @@ void BoUpSLP::buildTree_rec(ArrayRef<Val
>> DEBUG(dbgs() << "SLP: Gathering non-simple loads.\n");
>> return;
>> }
>> - const DataLayout &DL = F->getParent()->getDataLayout();
>> +
>> if (!isConsecutiveAccess(VL[i], VL[i + 1], DL)) {
>> if (VL.size() == 2 && isConsecutiveAccess(VL[1], VL[0], DL)) {
>> ++NumLoadsWantToChangeOrder;
>>
>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/bad_types.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/bad_types.ll?rev=248943&r1=248942&r2=248943&view=diff
>> ==============================================================================
>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/bad_types.ll (original)
>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/bad_types.ll Wed Sep 30 16:05:43 2015
>> @@ -47,4 +47,30 @@ exit:
>> ret void
>> }
>> +define i8 @test3(i8 *%addr) {
>> +; Check that we do not vectorize types that are padded to a bigger type.
>> +;
>> +; CHECK-LABEL: @test3
>> +; CHECK-NOT: <4 x i2>
>> +; CHECK: ret i8
>> +entry:
>> + %a = bitcast i8* %addr to i2*
>> + %a0 = getelementptr inbounds i2, i2* %a, i64 0
>> + %a1 = getelementptr inbounds i2, i2* %a, i64 1
>> + %a2 = getelementptr inbounds i2, i2* %a, i64 2
>> + %a3 = getelementptr inbounds i2, i2* %a, i64 3
>> + %l0 = load i2, i2* %a0, align 1
>> + %l1 = load i2, i2* %a1, align 1
>> + %l2 = load i2, i2* %a2, align 1
>> + %l3 = load i2, i2* %a3, align 1
>> + br label %bb1
>> +bb1: ; preds = %entry
>> + %p0 = phi i2 [ %l0, %entry ]
>> + %p1 = phi i2 [ %l1, %entry ]
>> + %p2 = phi i2 [ %l2, %entry ]
>> + %p3 = phi i2 [ %l3, %entry ]
>> + %r = zext i2 %p2 to i8
>> + ret i8 %r
>> +}
>> +
>> declare void @f(i64, i64)
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>