[llvm-bugs] [Bug 26063] New: Significant performance regression with r256890
via llvm-bugs
llvm-bugs at lists.llvm.org
Thu Jan 7 06:20:54 PST 2016
https://llvm.org/bugs/show_bug.cgi?id=26063
Bug ID: 26063
Summary: Significant performance regression with r256890
Product: libraries
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: normal
Priority: P
Component: Common Code Generator Code
Assignee: unassignedbugs at nondot.org
Reporter: james.molloy at arm.com
CC: dan433584 at gmail.com, llvm-bugs at lists.llvm.org
Classification: Unclassified
Created attachment 15577
--> https://llvm.org/bugs/attachment.cgi?id=15577&action=edit
Reproducer to show the actual output difference
We've noticed a 17% regression in an important third-party benchmark, and have
bisected it to:
Author: Dan Gohman <dan433584 at gmail.com>
Date: Wed Jan 6 00:43:06 2016 +0000
[SelectionDAGBuilder] Set NoUnsignedWrap for inbounds gep and load/store offsets.
In an inbounds getelementptr, when an index produces a constant non-negative
offset to add to the base, the add can be assumed to not have unsigned overflow.
This relies on the assumption that addresses can't occupy more than half the
address space, which isn't possible in C because it wouldn't be possible to
represent the difference between the start of the object and one-past-the-end
in a ptrdiff_t.
Setting the NoUnsignedWrap flag is theoretically useful in general, and is
specifically useful to the WebAssembly backend, since it permits stronger
constant offset folding.
Differential Revision: http://reviews.llvm.org/D15544
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@256890 91177308-0d34-0410-b5e6-96231b3b80d8
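To make the commit's WebAssembly rationale concrete, here is a minimal Python model, assuming 32-bit pointers. The names `AddNode`, `gep_add`, `wrapping_value`, `effective_address` and `fold_constant_offset` are hypothetical and are not LLVM APIs. The point it illustrates: a plain 32-bit add wraps modulo 2^32, while a WebAssembly load adds its immediate offset to the base without wrapping, so folding an add into the offset field is only safe when the add is known not to wrap (nuw).

```python
from typing import NamedTuple, Optional, Tuple

ADDR_BITS = 32                       # assumption: 32-bit address space
ADDR_MASK = (1 << ADDR_BITS) - 1


class AddNode(NamedTuple):
    """Hypothetical stand-in for an address add: base + constant, plus a nuw flag."""
    base: int
    offset: int
    nuw: bool


def gep_add(base: int, offset: int, inbounds: bool) -> AddNode:
    """Lower a constant-offset GEP to an add. Following the rule in r256890,
    nuw is set only for an inbounds GEP with a non-negative constant offset."""
    return AddNode(base, offset, nuw=inbounds and offset >= 0)


def wrapping_value(node: AddNode) -> int:
    """What a plain 32-bit add computes: the sum modulo 2**32."""
    return (node.base + node.offset) & ADDR_MASK


def effective_address(base: int, offset_imm: int) -> int:
    """WebAssembly-style load address: base + immediate offset, no wrapping."""
    return base + offset_imm


def fold_constant_offset(node: AddNode) -> Optional[Tuple[int, int]]:
    """Return (base, offset_imm) for a folded load, or None when the add might
    wrap and folding could therefore change the accessed address."""
    if node.nuw:
        return node.base, node.offset
    return None


safe = gep_add(0x1000, 16, inbounds=True)         # inbounds, non-negative: nuw
plain = gep_add(0xFFFF_FFF8, 16, inbounds=False)  # plain GEP near the top of memory
print(fold_constant_offset(safe))   # (4096, 16)
print(fold_constant_offset(plain))  # None
print(hex(wrapping_value(plain)), hex(effective_address(plain.base, plain.offset)))
# 0x8 0x100000008
```

In the last case the unfolded add yields the valid address 0x8, whereas the folded form would compute 0x100000008, which is out of range for a 32-bit memory; this is why the flag is needed before folding.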
A reproducer is attached as "reproduce.ll". When it is compiled with llc -O3, the
loop body (the only part shown here) goes from this before r256890:
.LBB0_1:                                @ %while.body
                                        @ =>This Inner Loop Header: Depth=1
        bl      f
        ldrb    r1, [r4, #1]!
        cmp     r0, #1
        cmpne   r1, #0
        bne     .LBB0_1
To this after r256890:
.LBB0_1:                                @ %while.body
                                        @ =>This Inner Loop Header: Depth=1
        bl      f
        mov     r1, r0
        add     r0, r4, #1
        cmp     r1, #1
        beq     .LBB0_3
@ BB#2:                                 @ %while.body
                                        @   in Loop: Header=BB0_1 Depth=1
        ldrb    r1, [r4, #1]
        mov     r4, r0
        cmp     r1, #0
        bne     .LBB0_1
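The two loop bodies above can be compared with a line-by-line Python model. This is a sketch under stated assumptions: `mem` (the bytes r4 points into), the index `p` (standing in for r4) and the stub `f` are hypothetical, and the report does not say what f does or which registers are live after the loop, so the model only counts iterations.

```python
from typing import Callable, Sequence


def loop_before(mem: Sequence[int], p: int, f: Callable[[], int]) -> int:
    """Loop body before r256890; returns the number of iterations run."""
    iters = 0
    while True:
        iters += 1
        r0 = f()                        # bl f
        p += 1                          # ldrb r1, [r4, #1]!  pre-indexed with write-back:
        r1 = mem[p]                     #   r4 += 1, then load the byte at r4
        if not (r0 != 1 and r1 != 0):   # cmp r0, #1 / cmpne r1, #0 / bne .LBB0_1
            return iters


def loop_after(mem: Sequence[int], p: int, f: Callable[[], int]) -> int:
    """Loop body after r256890; returns the number of iterations run."""
    iters = 0
    while True:
        iters += 1
        r1 = f()                        # bl f / mov r1, r0
        r0 = p + 1                      # add r0, r4, #1   separate copy of the +1 address
        if r1 == 1:                     # cmp r1, #1 / beq .LBB0_3
            return iters
        r1 = mem[p + 1]                 # ldrb r1, [r4, #1]  offset mode, no write-back
        p = r0                          # mov r4, r0
        if r1 == 0:                     # cmp r1, #0 / bne .LBB0_1
            return iters


buf = [0x41, 0x42, 0x43, 0x00]          # a 0 byte ends the loop
print(loop_before(buf, 0, lambda: 0), loop_after(buf, 0, lambda: 0))  # 3 3
```

Both versions exit on the same conditions. The difference is that the old body gets the pointer increment for free from the write-back form `[r4, #1]!` (5 instructions including the call), while the new body needs a separate `add` and `mov` plus a split branch (9 instructions).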
What appears to be happening is that two GEPs with the same base and offset are
created, one inbounds and the other not:
    %incdec.ptr = getelementptr inbounds i8, i8* %a.addr.06, i32 1
    %scevgep = getelementptr i8, i8* %a.addr.06, i32 1
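Previously, the SDAG nodes created for these would have been identical, so they
would have been commoned, which in this case gives ARM's load/store addressing
modes further scope for optimization (here, the pre-indexed write-back load
"ldrb r1, [r4, #1]!"). But now the add lowered from the inbounds GEP carries the
NoUnsignedWrap flag and the other does not, so two different SDAG nodes are
created and cannot be commoned. The code is pessimized as a result.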
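The commoning described above is hash-consing: each node request is looked up in a table, and a hit returns the existing node. The Python sketch below is a toy model, not LLVM's SelectionDAG API; `Node`, `NodeTable`, `get_node` and the string operands are hypothetical. With the wrap flag as part of the lookup key, the adds for %incdec.ptr and %scevgep become two nodes. Leaving flags out of the key and intersecting them on a hit is shown as one possible remedy, an assumption for illustration only: it restores the sharing while keeping nuw only if every requester guarantees it.

```python
from dataclasses import dataclass


@dataclass(eq=False)                 # nodes are told apart by identity, not by fields
class Node:
    opcode: str
    operands: tuple
    nuw: bool = False


class NodeTable:
    """Toy CSE table; flags_in_key chooses whether nuw is part of node identity."""

    def __init__(self, flags_in_key: bool) -> None:
        self.flags_in_key = flags_in_key
        self.nodes: dict = {}

    def get_node(self, opcode: str, operands: tuple, nuw: bool) -> Node:
        key = (opcode, operands, nuw) if self.flags_in_key else (opcode, operands)
        node = self.nodes.get(key)
        if node is None:
            node = Node(opcode, operands, nuw)
            self.nodes[key] = node
        elif not self.flags_in_key:
            # Reuse the node, but keep nuw only if both requests guarantee it.
            node.nuw = node.nuw and nuw
        return node


base = "%a.addr.06"
keyed = NodeTable(flags_in_key=True)
a = keyed.get_node("add", (base, 1), nuw=True)    # from the inbounds GEP
b = keyed.get_node("add", (base, 1), nuw=False)   # from %scevgep
print(a is b)            # False: two distinct nodes

merged = NodeTable(flags_in_key=False)
c = merged.get_node("add", (base, 1), nuw=True)
d = merged.get_node("add", (base, 1), nuw=False)
print(c is d, c.nuw)     # True False
```

Intersecting rather than keeping the first node's flags matters: the shared node now also serves the non-inbounds GEP, so it must not claim a guarantee that only one user provides.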
The file "minimal-reproducer.ll" contains this code pattern in isolation, with a
bit of unoptimizable control flow to force the required basic block structure.
Unfortunately, running llc on the minimal reproducer produces the expected code.
This is because ISel happens to select the same instruction (an ADDri) for both
GEPs, and the two are immediately CSE'd. But the debug output shows two distinct
SDAG nodes that look identical (because their node flags, such as nuw, aren't
printed!) kept all the way through legalization and DAG combine.
I'm not sure what the actual fix is here; there are many places where this could
technically be fixed. But as it stands, this patch causes pretty nasty
regressions.