[PATCH] D77635: [LV] Vectorize with FoldTail when Primary Induction is absent
Serguei Katkov via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 13 21:19:26 PDT 2020
skatkov added a comment.
Hello @Ayal, unfortunately this patch causes a functional regression.
For the test below, the vectorizer decides to vectorize the inner loop by 32 even though the loop has only a few iterations, and this causes a miscompile.
Please fix it quickly or revert the patch.
The reproducer:
; ModuleID = './repro.ll'
source_filename = "./repro.ll"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128-ni:1-p2:32:8:8:32-ni:2"
target triple = "x86_64-unknown-linux-gnu"
@global = external global i8*
define void @hoge(i8* nonnull align 8 dereferenceable_or_null(8) %arg, i8* align 8 dereferenceable_or_null(16) %arg1) {
bb:
  %tmp = load atomic i8*, i8** @global unordered, align 8
  %tmp2 = getelementptr inbounds i8, i8* %tmp, i64 852
  br label %bb3
bb3: ; preds = %bb12, %bb
  %tmp4 = phi i32 [ 1, %bb ], [ %tmp15, %bb12 ]
  %tmp5 = phi i32 [ 0, %bb ], [ %tmp8, %bb12 ]
  br label %bb7
bb6: ; preds = %bb12
  ret void
bb7: ; preds = %bb7, %bb3
  %tmp8 = phi i32 [ %tmp5, %bb3 ], [ %tmp10, %bb7 ]
  %tmp9 = phi i32 [ 1, %bb3 ], [ %tmp10, %bb7 ]
  %tmp10 = add nuw nsw i32 %tmp9, 1
  %tmp11 = icmp ugt i32 %tmp9, 5
  br i1 %tmp11, label %bb12, label %bb7
bb12: ; preds = %bb7
  %tmp13 = mul i32 %tmp8, %tmp4
  %tmp14 = trunc i32 %tmp13 to i8
  fence release
  store atomic i8 %tmp14, i8* %tmp2 unordered, align 1
  fence seq_cst
  %tmp15 = add nuw nsw i32 %tmp4, 1
  %tmp16 = icmp ult i32 %tmp4, 240
  br i1 %tmp16, label %bb3, label %bb6
}
Run as:
> opt -passes=loop-vectorize -S -o res.ll ./repro.ll
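For reference, the scalar behaviour of the reproducer can be modelled in a few lines of Python. This is only a sketch based on reading the IR above, not output from any tool: the function names `inner_loop` and `stored_bytes` are made up for illustration, and the fences and atomic orderings are ignored. It shows that the inner loop (bb7) executes its header six times per outer iteration, and which byte values a correct compilation should store.

```python
def inner_loop(tmp5):
    """Scalar model of bb7. Returns (%tmp8 seen by bb12, header executions)."""
    tmp8, tmp9 = tmp5, 1          # phi values on entry from bb3
    trips = 0
    while True:
        trips += 1
        tmp10 = tmp9 + 1          # %tmp10 = add nuw nsw i32 %tmp9, 1
        if tmp9 > 5:              # %tmp11 = icmp ugt i32 %tmp9, 5
            return tmp8, trips    # exit to bb12
        tmp8, tmp9 = tmp10, tmp10 # both phis take %tmp10 on the backedge


def stored_bytes():
    """Scalar model of the outer loop; returns every i8 value stored in bb12."""
    tmp4, tmp5 = 1, 0             # phi values on entry from bb
    out = []
    while True:
        tmp8, _ = inner_loop(tmp5)
        out.append((tmp8 * tmp4) & 0xFF)  # mul, then trunc i32 to i8
        tmp5 = tmp8               # %tmp5 takes %tmp8 on the backedge
        if not (tmp4 < 240):      # %tmp16 = icmp ult i32 %tmp4, 240
            return out
        tmp4 += 1                 # %tmp15 = add nuw nsw i32 %tmp4, 1


if __name__ == "__main__":
    print(inner_loop(0))          # (6, 6)
    values = stored_bytes()
    print(len(values), values[0], values[-1])  # 240 6 160
```

Under this model, the value of %tmp8 that reaches bb12 is always 6 regardless of %tmp5, so the expected stores are simply 6 * %tmp4 truncated to 8 bits.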
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D77635/new/
https://reviews.llvm.org/D77635