[llvm-dev] Optimizations hindered by GVN widening

Thu Jun 30 08:05:40 PDT 2016

Currently, the GVN optimization widens loads if the alignment permits it. There are some limitations that show up, as seen in example below:

Initial IR:

declare void @consume(i8) readonly

define i8 @bar(i8* align 2 %a) {
  %1 = load i8, i8* %a
  %idx = getelementptr i8, i8* %a, i64 1
  %2 = load i8, i8* %idx, align 1
  call void @consume(i8 %1).
  ret i8 %2
}

define i1 @foo(i8* %a) {
entry:
  %0 = call i8 @bar(i8* %a)
  %1 = icmp eq i8 %0, 0
  br i1 %1, label %cont.1, label %exit

cont.1:
  store i8 0, i8* %a
  %2 = call i8 @bar(i8* %a)
  %3 = icmp eq i8 %2, 0
  ret i1 %3

exit:
  ret i1 true
}

Since %a is 2 byte aligned, GVN widens the loads in bar, then bar() gets inlined into foo. The resulting final IR at O3:

define i8 @bar(i8* align 2 %a) {
  %1 = bitcast i8* %a to i16*
  %2 = load i16, i16* %1, align 2
  %3 = trunc i16 %2 to i8
  %idx = getelementptr i8, i8* %a, i64 1
  %4 = lshr i16 %2, 8
  %5 = trunc i16 %4 to i8
  call void @consume(i8 %3)
  ret i8 %5 
}
define i1 @foo(i8* %a) {
entry:
  %0 = bitcast i8* %a to i16*
  %1 = load i16, i16* %0, align 2 <— widened load from bar()
  %2 = trunc i16 %1 to i8
  call void @consume(i8 %2) 
  %3 = icmp ult i16 %1, 256 
  br i1 %3, label %cont.1, label %exit

cont.1:                                           ; preds = %entry
  store i8 0, i8* %a, align 2
  %4 = load i16, i16* %0, align 2  <— widened load from bar()
  %5 = trunc i16 %4 to i8
  call void @consume(i8 %5) 
  %6 = icmp ult i16 %4, 256 
  ret i1 %6

exit:                                             ; preds = %entry
  ret i1 true
}

In the absence of GVN widening (we can see this when %a is 1 byte aligned in bar), bar is inlined into foo as-is. 
Final IR at O3:
define i8 @bar(i8* align 1 %a) {
  %1 = load i8, i8* %a, align 1  <— align is 1, so GVN does not widen load 
  %idx = getelementptr i8, i8* %a, i64 1
  %2 = load i8, i8* %idx, align 1
  call void @consume(i8 %1) 
  ret i8 %2
} 

define i1 @foo(i8* %a) {
entry: 
  %0 = load i8, i8* %a, align 1
  %idx.i = getelementptr i8, i8* %a, i64 1
  %1 = load i8, i8* %idx.i, align 1 <— both loads exist non-widened from bar(). Used in GVN for removing the loads in cont.1 BB. 
  call void @consume(i8 %0)
  %2 = icmp eq i8 %1, 0
  br i1 %2, label %cont.1, label %exit 

cont.1:                                           ; preds = %entry
  store i8 0, i8* %a, align 1
; both the loads are removed. First load has the value fed from the store above. Second load is same as the one in entry BB (%1).  
  call void @consume(i8 0)    
  ret i1 true  

exit:                                             ; preds = %entry
  ret i1 true
} 

Here the 2 loads in the cont.1 basic block are removed by GVN, since the loads are not widened to a single load. Also, GVN and later optimizations does value forwarding of %a (0) and also the compare folds to true. 

This does not happen when GVN widens the load. Note that at the time GVN did the widening, it was a good transform since we have a single load instead of 2. However, inlining and later optimizations proved that leaving the load non-widened was more beneficial. 

Any thoughts on how we could mitigate these problems introduced by GVN widening? 

Thanks,
Anna