[PATCH] [LSR] Generate and use zero extends
Sanjoy Das
sanjoy at playingwithpointers.com
Mon Apr 27 14:05:15 PDT 2015
> Does your test fail on trunk? This is what the test outputs for me:
>
> ok_161: ; preds = %ok_158
>
> %lsr.iv.next = add nuw nsw i64 %lsr.iv, 1
> %4 = add i64 %0, %lsr.iv.next
> %tmp1 = trunc i64 %4 to i32
> %tmp188 = icmp slt i32 %tmp1, %tmp160
> br i1 %tmp188, label %ok_146, label %block_81
That's exactly the behavior I'm trying to avoid: I don't want two
additions on every iteration of the loop.
To elaborate a bit, here is the -debug-only=loop-reduce output for the
test case before the change:
'''
After filtering out undesirable candidates:
LSR is examining the following uses:
LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32
reg({%tmp156,+,1}<nuw><nsw><%ok_146>)
reg((zext i32 %tmp156 to i64)) + 1*reg({2,+,1}<nw><%ok_146>) + imm(-2)
reg((zext i32 %tmp156 to i64)) + 1*reg({0,+,1}<nuw><%ok_146>)
reg((zext i32 %tmp156 to i64)) + 1*reg({-1,+,1}<nw><%ok_146>) + imm(1)
reg((zext i32 %tmp156 to i64)) + 1*reg({3,+,1}<nw><%ok_146>) + imm(-3)
LSR Use: Kind=Basic, Offsets={0}, all-fixups-outside-loop, widest
fixup type: i64
reg({(-1 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
reg({(zext i32 %tmp156 to i64),+,1}<nuw><nsw><%ok_146>) + imm(-1)
reg((zext i32 %tmp156 to i64)) + 1*reg({0,+,1}<nuw><%ok_146>) + imm(-1)
reg((zext i32 %tmp156 to i64)) + 1*reg({-1,+,1}<nw><%ok_146>)
reg((-1 + (zext i32 %tmp156 to i64))) + 1*reg({0,+,1}<nuw><%ok_146>)
reg({(3 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>) + imm(-4)
reg({(2 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>) + imm(-3)
LSR Use: Kind=Address of double, Offsets={16}, widest fixup type: double*
-16 + reg({(16 + (8 * (zext i32 %tmp156 to i64)) + %d),+,8}<nw><%ok_146>)
-24 + reg(((8 * (zext i32 %tmp156 to i64)) + %d)) + 4*reg({6,+,2}<%ok_146>)
-16 + reg(((8 * (zext i32 %tmp156 to i64)) + %d)) + 2*reg({8,+,4}<%ok_146>)
-24 + reg(((8 * (zext i32 %tmp156 to i64)) + %d)) + 2*reg({12,+,4}<%ok_146>)
-16 + reg(%d) + 8*reg({(2 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
-24 + reg(%d) + 8*reg({(3 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
-8 + reg((16 + %d)<nsw>) + 8*reg({(-1 + (zext i32 %tmp156 to
i64)),+,1}<nw><%ok_146>)
-16 + reg(((8 * (zext i32 %tmp156 to i64)) + %d)) + 4*reg({4,+,2}<%ok_146>)
-16 + reg(((8 * (zext i32 %tmp156 to i64)) + %d)) +
8*reg({2,+,1}<nw><%ok_146>)
-16 + reg((16 + (8 * (zext i32 %tmp156 to i64)) + %d)) +
2*reg({0,+,4}<%ok_146>)
-16 + reg((16 + (8 * (zext i32 %tmp156 to i64)) + %d)) +
4*reg({0,+,2}<%ok_146>)
-16 + reg((16 + (8 * (zext i32 %tmp156 to i64)) + %d)) +
8*reg({0,+,1}<nuw><%ok_146>)
-16 + reg((16 + %d)<nsw>) + 2*reg({(4 * (zext i32 %tmp156 to
i64)),+,4}<nuw><nsw><%ok_146>)
-16 + reg((16 + %d)<nsw>) + 4*reg({(2 * (zext i32 %tmp156 to
i64)),+,2}<%ok_146>)
-16 + reg((16 + %d)<nsw>) + 8*reg({(zext i32 %tmp156 to
i64),+,1}<nuw><nsw><%ok_146>)
LSR Use: Kind=Address of i32, Offsets={12}, widest fixup type: i32*
-12 + reg({(12 + (4 * (zext i32 %tmp156 to i64)) + %b),+,4}<nw><%ok_146>)
-20 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) +
2*reg({4,+,2}<%ok_146>)
-12 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) +
1*reg({0,+,4}<%ok_146>)
-12 + reg((12 + %b)<nsw>) + 1*reg({(4 * (zext i32 %tmp156 to
i64)),+,4}<nuw><nsw><%ok_146>)
-12 + reg(((4 * (zext i32 %tmp156 to i64)) + %b)) + 2*reg({6,+,2}<%ok_146>)
-20 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) +
1*reg({8,+,4}<%ok_146>)
-12 + reg(((4 * (zext i32 %tmp156 to i64)) + %b)) + 1*reg({12,+,4}<%ok_146>)
-8 + reg(%b) + 4*reg({(2 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
-12 + reg(%b) + 4*reg({(3 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
-8 + reg((12 + %b)<nsw>) + 4*reg({(-1 + (zext i32 %tmp156 to
i64)),+,1}<nw><%ok_146>)
-12 + reg(((4 * (zext i32 %tmp156 to i64)) + %b)) +
4*reg({3,+,1}<nw><%ok_146>)
-12 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) +
2*reg({0,+,2}<%ok_146>)
-12 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) +
4*reg({0,+,1}<nuw><%ok_146>)
-12 + reg((12 + %b)<nsw>) + 2*reg({(2 * (zext i32 %tmp156 to
i64)),+,2}<%ok_146>)
-12 + reg((12 + %b)<nsw>) + 4*reg({(zext i32 %tmp156 to
i64),+,1}<nuw><nsw><%ok_146>)
New best at 4 regs, with addrec cost 2, plus 2 scale cost, plus 9 imm
cost, plus 3 setup cost.
Regs: {%tmp156,+,1}<nuw><nsw><%ok_146> {(-1 + (zext i32 %tmp156 to
i64)),+,1}<nw><%ok_146> (16 + %d)<nsw> (12 + %b)<nsw>
New best at 4 regs, with addrec cost 1, plus 3 base adds, plus 2 scale
cost, plus 3 setup cost.
Regs: {0,+,1}<nuw><%ok_146> (zext i32 %tmp156 to i64) (16 + (8 *
(zext i32 %tmp156 to i64)) + %d) (12 + (4 * (zext i32 %tmp156 to i64))
+ %b)
The chosen solution requires 4 regs, with addrec cost 1, plus 3 base
adds, plus 2 scale cost, plus 3 setup cost:
LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32
reg((zext i32 %tmp156 to i64)) + 1*reg({0,+,1}<nuw><%ok_146>)
LSR Use: Kind=Basic, Offsets={0}, all-fixups-outside-loop, widest
fixup type: i64
reg((zext i32 %tmp156 to i64)) + 1*reg({0,+,1}<nuw><%ok_146>) + imm(-1)
LSR Use: Kind=Address of double, Offsets={16}, widest fixup type: double*
-16 + reg((16 + (8 * (zext i32 %tmp156 to i64)) + %d)) +
8*reg({0,+,1}<nuw><%ok_146>)
LSR Use: Kind=Address of i32, Offsets={12}, widest fixup type: i32*
-12 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) +
4*reg({0,+,1}<nuw><%ok_146>)
'''
And here is the debug output after the change. LSR now "sees" a few
more candidate formulae (marked with '<- new solution +++++'):
'''
After filtering out undesirable candidates:
LSR is examining the following uses:
LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32
reg({%tmp156,+,1}<nuw><nsw><%ok_146>)
reg(%tmp156) + 1*reg({0,+,1}<nuw><%ok_146>)
reg((zext i32 %tmp156 to i64)) + 1*reg({0,+,1}<nuw><%ok_146>)
reg((zext i32 %tmp156 to i64)) + 1*reg({-1,+,1}<nw><%ok_146>) + imm(1)
reg((zext i32 %tmp156 to i64)) + 1*reg({3,+,1}<nw><%ok_146>) + imm(-3)
reg((zext i32 %tmp156 to i64)) + 1*reg({2,+,1}<nw><%ok_146>) + imm(-2)
LSR Use: Kind=Basic, Offsets={0}, all-fixups-outside-loop, widest
fixup type: i64
reg({(-1 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
reg({(zext i32 %tmp156 to i64),+,1}<nuw><nsw><%ok_146>) + imm(-1)
reg((zext i32 %tmp156 to i64)) + 1*reg({0,+,1}<nuw><%ok_146>) + imm(-1)
reg((zext i32 %tmp156 to i64)) + 1*reg({-1,+,1}<nw><%ok_146>)
reg((-1 + (zext i32 %tmp156 to i64))) + 1*reg({0,+,1}<nuw><%ok_146>)
reg({%tmp156,+,1}<nuw><nsw><%ok_146>) + imm(-1) <- new solution +++++
reg({(3 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>) + imm(-4)
reg({(2 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>) + imm(-3)
LSR Use: Kind=Address of double, Offsets={16}, widest fixup type: double*
-16 + reg({(16 + (8 * (zext i32 %tmp156 to i64)) + %d),+,8}<nw><%ok_146>)
-24 + reg(((8 * (zext i32 %tmp156 to i64)) + %d)) + 4*reg({6,+,2}<%ok_146>)
-16 + reg(((8 * (zext i32 %tmp156 to i64)) + %d)) + 2*reg({8,+,4}<%ok_146>)
-24 + reg(((8 * (zext i32 %tmp156 to i64)) + %d)) + 2*reg({12,+,4}<%ok_146>)
-16 + reg(%d) + 8*reg({(2 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
-24 + reg(%d) + 8*reg({(3 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
-8 + reg((16 + %d)<nsw>) + 8*reg({(-1 + (zext i32 %tmp156 to
i64)),+,1}<nw><%ok_146>)
-16 + reg((16 + %d)<nsw>) +
8*reg({%tmp156,+,1}<nuw><nsw><%ok_146>) <- new solution +++++
-16 + reg(((8 * (zext i32 %tmp156 to i64)) + %d)) + 4*reg({4,+,2}<%ok_146>)
-16 + reg(((8 * (zext i32 %tmp156 to i64)) + %d)) +
8*reg({2,+,1}<nw><%ok_146>)
-16 + reg((16 + (8 * (zext i32 %tmp156 to i64)) + %d)) +
2*reg({0,+,4}<%ok_146>)
-16 + reg((16 + (8 * (zext i32 %tmp156 to i64)) + %d)) +
4*reg({0,+,2}<%ok_146>)
-16 + reg((16 + (8 * (zext i32 %tmp156 to i64)) + %d)) +
8*reg({0,+,1}<nuw><%ok_146>)
-16 + reg((16 + %d)<nsw>) + 2*reg({(4 * (zext i32 %tmp156 to
i64)),+,4}<nuw><nsw><%ok_146>)
-16 + reg((16 + %d)<nsw>) + 4*reg({(2 * (zext i32 %tmp156 to
i64)),+,2}<%ok_146>)
-16 + reg((16 + %d)<nsw>) + 8*reg({(zext i32 %tmp156 to
i64),+,1}<nuw><nsw><%ok_146>)
LSR Use: Kind=Address of i32, Offsets={12}, widest fixup type: i32*
-12 + reg({(12 + (4 * (zext i32 %tmp156 to i64)) + %b),+,4}<nw><%ok_146>)
-20 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) +
2*reg({4,+,2}<%ok_146>)
-12 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) +
1*reg({0,+,4}<%ok_146>)
-12 + reg((12 + %b)<nsw>) + 1*reg({(4 * (zext i32 %tmp156 to
i64)),+,4}<nuw><nsw><%ok_146>)
-12 + reg(((4 * (zext i32 %tmp156 to i64)) + %b)) + 2*reg({6,+,2}<%ok_146>)
-20 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) +
1*reg({8,+,4}<%ok_146>)
-12 + reg(((4 * (zext i32 %tmp156 to i64)) + %b)) + 1*reg({12,+,4}<%ok_146>)
-8 + reg(%b) + 4*reg({(2 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
-12 + reg(%b) + 4*reg({(3 + (zext i32 %tmp156 to i64)),+,1}<nw><%ok_146>)
-8 + reg((12 + %b)<nsw>) + 4*reg({(-1 + (zext i32 %tmp156 to
i64)),+,1}<nw><%ok_146>)
-12 + reg((12 + %b)<nsw>) +
4*reg({%tmp156,+,1}<nuw><nsw><%ok_146>) <- new solution +++++
-12 + reg(((4 * (zext i32 %tmp156 to i64)) + %b)) +
4*reg({3,+,1}<nw><%ok_146>)
-12 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) +
2*reg({0,+,2}<%ok_146>)
-12 + reg((12 + (4 * (zext i32 %tmp156 to i64)) + %b)) +
4*reg({0,+,1}<nuw><%ok_146>)
-12 + reg((12 + %b)<nsw>) + 2*reg({(2 * (zext i32 %tmp156 to
i64)),+,2}<%ok_146>)
-12 + reg((12 + %b)<nsw>) + 4*reg({(zext i32 %tmp156 to
i64),+,1}<nuw><nsw><%ok_146>)
New best at 3 regs, with addrec cost 1, plus 1 base add, plus 2 scale
cost, plus 2 setup cost.
Regs: {%tmp156,+,1}<nuw><nsw><%ok_146> (16 + %d)<nsw> (12 + %b)<nsw>
The chosen solution requires 3 regs, with addrec cost 1, plus 1 base
add, plus 2 scale cost, plus 2 setup cost:
LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32
reg({%tmp156,+,1}<nuw><nsw><%ok_146>)
LSR Use: Kind=Basic, Offsets={0}, all-fixups-outside-loop, widest
fixup type: i64
reg({%tmp156,+,1}<nuw><nsw><%ok_146>) + imm(-1)
LSR Use: Kind=Address of double, Offsets={16}, widest fixup type: double*
-16 + reg((16 + %d)<nsw>) + 8*reg({%tmp156,+,1}<nuw><nsw><%ok_146>)
LSR Use: Kind=Address of i32, Offsets={12}, widest fixup type: i32*
-12 + reg((12 + %b)<nsw>) + 4*reg({%tmp156,+,1}<nuw><nsw><%ok_146>)
'''
The new formulae let LSR choose a better solution.
Looking at the output also tells me that I should definitely add
something to the textual representation of a formula that has an
implicit zero extend.
> Would you be able to test both post-inc and pre-inc variants? We've a
> lot of bugs because of TransformForPostIncUse.
I think that is a good idea.  I'm not familiar with LSR, so I suspect
it will take me some time to get there, but that by itself is actually
a good reason to write these tests. :)
It also looks like (I'm not 100% sure) there's an inconsistency
between how LSR treats overflow in post-inc expressions and how SCEV
treats it.  For instance, around line 1500 of ScalarEvolution.cpp we
say that {S,+,X} is <nuw> if "S + MaxBECount * X" does not
unsigned-overflow.  But that says nothing about overflow in the
post-inc SCEV expression (e.g. if BECount == 0 then every add rec in
the loop is <nuw>).
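To make that parenthetical concrete (contrived numbers, purely for
illustration): with MaxBECount = 0 the rule is vacuously satisfied, so
even an IV that starts at the largest i32 value gets the flag, even
though its post-increment value wraps:
'''
i32 arithmetic, MaxBECount = 0, S = 4294967295, X = 1:
  S + MaxBECount * X = 4294967295    ; no unsigned overflow, so {S,+,X} is <nuw>
  post-inc value S + X = 4294967296  ; wraps to 0 in i32
'''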
The problem is that TransformForPostIncUse(zext({S,+,X})) =
zext({S+X,+,X}) (ScalarEvolutionNormalization.cpp, around line 100).
Now if {S,+,1} is <nuw> then {zext S,+,1} = zext({S,+,1}), but
{(zext S) + 1,+,1} is not necessarily equal to zext({S+1,+,1}), by
SCEV's definition of <nuw>.  Yet TransformForPostIncUse({zext S,+,1}) =
TransformForPostIncUse(zext({S,+,1})) = zext({S+1,+,1}), when it should
really be {(zext S) + 1,+,1}.  I don't have a concrete example where
this is a problem, but it definitely looks fishy to me.
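To spell out the mismatch with strawman values (contrived numbers,
only to illustrate the shape of the problem, not taken from the test
case): take S to be an i32 at the wrap-around boundary and X = 1:
'''
S = 4294967295 (i32), and suppose {S,+,1} is <nuw> (e.g. because BECount == 0)

zext({S,+,1} to i64)  =  {zext S,+,1}        ; zext folds into an <nuw> add rec
correct post-inc form =  {(zext S) + 1,+,1}  ; starts at 4294967296

but TransformForPostIncUse rewrites the add rec underneath the zext:
  zext({S + 1,+,1} to i64)                   ; S + 1 wraps, so this starts at 0
'''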
-- Sanjoy
On Mon, Apr 27, 2015 at 11:14 AM, Andrew Trick <atrick at apple.com> wrote:
> This looks reasonable. Thanks!
>
> Does your test fail on trunk? This is what the test outputs for me:
>
> ok_161: ; preds = %ok_158
>
> %lsr.iv.next = add nuw nsw i64 %lsr.iv, 1
> %4 = add i64 %0, %lsr.iv.next
> %tmp1 = trunc i64 %4 to i32
> %tmp188 = icmp slt i32 %tmp1, %tmp160
> br i1 %tmp188, label %ok_146, label %block_81
>
> Would you be able to test both post-inc and pre-inc variants? We've a lot of bugs because of TransformForPostIncUse.
>
>
> ================
> Comment at: lib/Transforms/Scalar/LoopStrengthReduce.cpp:3685
> @@ -3669,1 +3684,3 @@
>
> +/// GenerateZExts - Generate reuse formulae from different IV types.
> +void LSRInstance::GenerateZExts(LSRUse &LU, unsigned LUIdx, Formula Base) {
> ----------------
> Please comment this at least as well as your commit comment if not better.
>
> http://reviews.llvm.org/D9181
>