[PATCH] D21496: LoadCombine Load Aliasing Fix

Sat Jun 18 19:12:42 PDT 2016

rriddle added inline comments.

================
Comment at: /Users/rriddle/Desktop/llvm/llvm/test/Transforms/LoadCombine/load-combine-alias.ll:12
@@ +11,3 @@
+  %i1 = getelementptr inbounds %struct.Str, %struct.Str* %1, i64 0, i32 0
+  store i32 %0, i32* %i1, align 4
+  %add.ptr = getelementptr inbounds i32, i32* %i, i64 1
----------------
eli.friedman wrote:
> rriddle wrote:
> > eli.friedman wrote:
> > > What transform is this testcase checking for?  I don't think you can prove any useful aliasing property between %i, %i1, and %str?
> > In the current implementation %1 is being aliased in that store operation which would cause every load chain to try and combine which would prevent the load at %0 from being able to combine with %2, %4, etc.
> > 
> I'm not sure I follow... is the transform you're trying to perform something like this?
> 
> ```
> void f(int* a, int* b) {
>   a[0] = b[0];
>   a[1] = b[1];
>   a[2] = b[2];
>   a[3] = b[3];
>   // Combine to "memcpy(a, b, 16)"???
> }
> ```
No, the point of transform is to combine each of the individual loads into a single large one. In this specific test, there are multiple GEP + load pairs coming from %i. Each of those will combine down to a single i128 load. It may be easier to see with this output 

```
define i32 @Load_MultiChain(i32* %i) {
  %1 = getelementptr inbounds i32, i32* %i, i64 1
  %2 = bitcast i32* %i to i8*
  %3 = getelementptr i8, i8* %2, i64 0
  %4 = bitcast i8* %3 to i128*
  %.combined = load i128, i128* %4, align 4
  %combine.extract.shift = lshr i128 %.combined, 32
  %combine.extract.trunc1 = trunc i128 %combine.extract.shift to i32
  %5 = load i32, i32* %1, align 4
  %combine.extract.trunc = trunc i128 %.combined to i32
  %6 = load i32, i32* %i, align 4
  %7 = getelementptr inbounds i32, i32* %i, i64 2
  %combine.extract.shift2 = lshr i128 %.combined, 64
  %combine.extract.trunc3 = trunc i128 %combine.extract.shift2 to i32
  %8 = load i32, i32* %7, align 4
  %9 = getelementptr inbounds i32, i32* %i, i64 3
  %combine.extract.shift4 = lshr i128 %.combined, 96
  %combine.extract.trunc5 = trunc i128 %combine.extract.shift4 to i32
  %10 = load i32, i32* %9, align 4
  %11 = getelementptr inbounds i32, i32* %i, i64 4
  %12 = load i32, i32* %11, align 4
  %13 = getelementptr inbounds i32, i32* %i, i64 5
  %14 = load i32, i32* %13, align 4
  %15 = add nsw i32 %combine.extract.trunc, %14
  ret i32 %15
}
```
In the above every load that originated from i has been combined into a single i128 load. This test is not about transforming into the most optimal result it is testing to make sure that a referenced load chain does not affect all of the other load chains in the basic block. The problem with the current implementation is that it is too conservative when it comes to a write operation that aliases a load. When it reaches such an instruction it tries to combine all of the load chains at that point. This reduces the amount of loads that will be combined if it combines every chain when it reaches a store that references a load.

http://reviews.llvm.org/D21496