[llvm-dev] Optimizations hindered by GVN widening
Anna Thomas via llvm-dev
llvm-dev at lists.llvm.org
Thu Jun 30 08:05:40 PDT 2016
Currently, the GVN optimization widens loads if the alignment permits it. There are some limitations that show up, as seen in example below:
Initial IR:
declare void @consume(i8) readonly
define i8 @bar(i8* align 2 %a) {
%1 = load i8, i8* %a
%idx = getelementptr i8, i8* %a, i64 1
%2 = load i8, i8* %idx, align 1
call void @consume(i8 %1).
ret i8 %2
}
define i1 @foo(i8* %a) {
entry:
%0 = call i8 @bar(i8* %a)
%1 = icmp eq i8 %0, 0
br i1 %1, label %cont.1, label %exit
cont.1:
store i8 0, i8* %a
%2 = call i8 @bar(i8* %a)
%3 = icmp eq i8 %2, 0
ret i1 %3
exit:
ret i1 true
}
Since %a is 2 byte aligned, GVN widens the loads in bar, then bar() gets inlined into foo. The resulting final IR at O3:
define i8 @bar(i8* align 2 %a) {
%1 = bitcast i8* %a to i16*
%2 = load i16, i16* %1, align 2
%3 = trunc i16 %2 to i8
%idx = getelementptr i8, i8* %a, i64 1
%4 = lshr i16 %2, 8
%5 = trunc i16 %4 to i8
call void @consume(i8 %3)
ret i8 %5
}
define i1 @foo(i8* %a) {
entry:
%0 = bitcast i8* %a to i16*
%1 = load i16, i16* %0, align 2 <— widened load from bar()
%2 = trunc i16 %1 to i8
call void @consume(i8 %2)
%3 = icmp ult i16 %1, 256
br i1 %3, label %cont.1, label %exit
cont.1: ; preds = %entry
store i8 0, i8* %a, align 2
%4 = load i16, i16* %0, align 2 <— widened load from bar()
%5 = trunc i16 %4 to i8
call void @consume(i8 %5)
%6 = icmp ult i16 %4, 256
ret i1 %6
exit: ; preds = %entry
ret i1 true
}
In the absence of GVN widening (we can see this when %a is 1 byte aligned in bar), bar is inlined into foo as-is.
Final IR at O3:
define i8 @bar(i8* align 1 %a) {
%1 = load i8, i8* %a, align 1 <— align is 1, so GVN does not widen load
%idx = getelementptr i8, i8* %a, i64 1
%2 = load i8, i8* %idx, align 1
call void @consume(i8 %1)
ret i8 %2
}
define i1 @foo(i8* %a) {
entry:
%0 = load i8, i8* %a, align 1
%idx.i = getelementptr i8, i8* %a, i64 1
%1 = load i8, i8* %idx.i, align 1 <— both loads exist non-widened from bar(). Used in GVN for removing the loads in cont.1 BB.
call void @consume(i8 %0)
%2 = icmp eq i8 %1, 0
br i1 %2, label %cont.1, label %exit
cont.1: ; preds = %entry
store i8 0, i8* %a, align 1
; both the loads are removed. First load has the value fed from the store above. Second load is same as the one in entry BB (%1).
call void @consume(i8 0)
ret i1 true
exit: ; preds = %entry
ret i1 true
}
Here the 2 loads in the cont.1 basic block are removed by GVN, since the loads are not widened to a single load. Also, GVN and later optimizations does value forwarding of %a (0) and also the compare folds to true.
This does not happen when GVN widens the load. Note that at the time GVN did the widening, it was a good transform since we have a single load instead of 2. However, inlining and later optimizations proved that leaving the load non-widened was more beneficial.
Any thoughts on how we could mitigate these problems introduced by GVN widening?
Thanks,
Anna
More information about the llvm-dev
mailing list