[llvm-bugs] [Bug 29065] New: [SCEV] scev expansion generates redundent code in memcheck during vectorization
via llvm-bugs
llvm-bugs at lists.llvm.org
Fri Aug 19 14:15:26 PDT 2016
https://llvm.org/bugs/show_bug.cgi?id=29065
Bug ID: 29065
Summary: [SCEV] scev expansion generates redundent code in
memcheck during vectorization
Product: libraries
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: normal
Priority: P
Component: Scalar Optimizations
Assignee: unassignedbugs at nondot.org
Reporter: wmi at google.com
CC: llvm-bugs at lists.llvm.org
Classification: Unclassified
Testcase 1.c:
-----------------------------------
#define COMBP(d, s, m) ( ((d) & ~(m)) | ((s) & (m)) )
void foo(unsigned *dd, int dw, int dh, unsigned *ds, int dp, int sp, int sx,
int sy, int dx, int dy, unsigned lwmask) {
int i, j;
unsigned *ls, *ld;
int nw = dw >> 5;
int lwbits = dw & 31;
unsigned *psd = ds + sp * sy + (sx >> 5);
unsigned *pdd = dd + dp * dy + (dx >> 5);
for (i = 0; i < dh; i++) {
ls = psd + i * sp;
ld = pdd + i * dp;
for (j = 0; j < nw; j++) {
*ld = (*ls & *ld);
ld++;
ls++;
}
}
}
------------------------------------
For the memcheck of innerloop vectorization, we expect the code like this:
ls = psd + i * sp;
ld = pdd + i * dp;
if (ls < ld + nw || ld < ls + nw)
goto conflict.
However, some computation defining psd and pdd are regenerated inside preheader
of innerloop, and these redundent computations cannot be cleaned up in later
pass.
for.body: ; preds = %for.inc21,
%for.body.lr.ph
%indvars.iv = phi i64 [ 0, %for.body.lr.ph ], [ %indvars.iv.next, %for.inc21
]
%10 = mul i64 %0, %indvars.iv
%11 = add i64 %5, %10
%scevgep = getelementptr i32, i32* %dd, i64 %11 ==> The add involving %dd
has been computed outside of loop when defining pdd. why not directly use pdd
here.
%scevgep51 = bitcast i32* %scevgep to i8*
%12 = add i64 %7, %10
%scevgep52 = getelementptr i32, i32* %dd, i64 %12 ==> The add involving %dd
has been computed outside of loop when defininng pdd. why not directly use pdd
here.
%scevgep5253 = bitcast i32* %scevgep52 to i8*
%13 = mul i64 %1, %indvars.iv
%14 = add i64 %8, %13
%scevgep54 = getelementptr i32, i32* %ds, i64 %14 ==> The add involving %ds
has been computed outside of loop when defining psd. why not directly use psd
here.
%scevgep5455 = bitcast i32* %scevgep54 to i8*
%15 = add i64 %9, %13
%scevgep56 = getelementptr i32, i32* %ds, i64 %15 ==> The add involving %ds
has been computed outside of loop when defining psd. why not directly use psd
here.
%scevgep5657 = bitcast i32* %scevgep56 to i8*
br i1 %cmp1742, label %for.body18.preheader, label %for.inc21
...
vector.memcheck: ; preds = %min.iters.checked
%bound0 = icmp ule i8* %scevgep51, %scevgep5657
%bound1 = icmp ule i8* %scevgep5455, %scevgep5253
%found.conflict = and i1 %bound0, %bound1
%memcheck.conflict = and i1 %found.conflict, true
Final assembly for the memcheck:
# BB#5: # %vector.memcheck
# in Loop: Header=BB0_2 Depth=1
movq %r9, %r10
movq %r13, %r9
movq %rbp, %r13
movq -48(%rsp), %rdx # 8-byte Reload
leaq (%rdx,%rbx), %rbp
leaq (%r13,%rbp,4), %r15
movq -56(%rsp), %rdx # 8-byte Reload
leaq (%rdx,%rax), %rdx
movq -104(%rsp), %rbp # 8-byte Reload
leaq (%rbp,%rdx,4), %rdx
cmpq %rdx, %r15
movq -104(%rsp), %rbp # 8-byte Reload
ja .LBB0_8
# BB#6: # %vector.memcheck
# in Loop: Header=BB0_2 Depth=1
addq -72(%rsp), %rbx # 8-byte Folded Reload
leaq (%r13,%rbx,4), %rdx
addq -64(%rsp), %rax # 8-byte Folded Reload
leaq (%rbp,%rax,4), %rax
cmpq %rdx, %rax
ja .LBB0_8
When the iteration number of innerloop is not large enough, the preparation
code for vectorization/unroll in the preheader of innerloop matters. Such
preparation code to compute loop iteration number or generate runtime check is
often generated by SCEV expansion. Here the redundency exists because SCEV
expansion doesn't reuse existing value psd and pdd.
* D12090 and D21313 relieved the reuse problem of SCEV expansion somewhat, but
why they are defeated here?
New SCEV expr is created after decomposition and combination of original
SCEVs, without those SCEVs appearing inside of the new SCEV expr. An example,
an old SCEVAddExpr S which is {a + b} has an existing value v = a + b
associated with it. When we generate SCEV for expr S + c, a new SCEVAddExpr
with three operands will be created {a + b + c} and no S can be found in the
new SCEVAddExpr. When we expand the new SCEV, it will only generate a + b + c
instead of v + c.
* Why such redundencies cannot be cleaned up by later passes?
SCEV transformation can make the expanded expr a lot different from the expr
to be reused, for example, SCEV expansion wants to turn things like
ptrtoint+arithmetic+inttoptr into GEP so expr may be aggressively reassociated.
After that, enabling extra-vectorizer-passes and even NaryReassociate cannot
clean up the redundencies.
To solve the problem, adding more facility to enhance the reuse during SCEV
expansion or enhancing cleanup passes, I am wondering which way is better to
persue.
--
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-bugs/attachments/20160819/0f2dbdf0/attachment-0001.html>
More information about the llvm-bugs
mailing list