[PATCH] [AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend.

Tue Oct 28 18:59:39 PDT 2014

Hi Jingyue,
> My general concern is that the part extracting a constant struct field index seems wrong. Unlike arrays, where all elements share the same type, structure fields can have different types. Therefore, changing a field index may change the intermediate type in a pointer arithmetic, causing incorrect IR.
> 
> For example, consider
> ```
> struct S {
>   int a[4]; // sizeof(int) = 4
>   double b[8]; // sizeof(double) = 8
> };
> 
> %s = alloca struct %S
> %p = getelementptr %s, 0, 1, i ; &s.b[i] = s + 4*sizeof(int) + i*sizeof(double)
> ```
> If you extract the 1, the gep becomes:
> ```
> %p = getelementptr %s, 0, 0, i; &s.a[i] = s + i*sizeof(int)
> ```
> and the constant offset is 4 * sizeof(int) = 16. 
> 
> As you probably see already,
> ```
> s + 4*sizeof(int) + i*sizeof(double) != s + 4*sizeof(int) + i*sizeof(int)
> ```
> 
> Does this make sense to you? What is a good way to fix this? 
Yes, your concern is reasonable. So to extract a struct field index, we should not keep the original GEP form with 3 indices.
For your example, "%p = getelementptr %s, 0, 1, i", we can transform it into either of two simpler forms:
1st Form: 2 simpler GEPs
    %i_offset = mul %i, sizeof(double)
    %s_i8 = bitcast %s to i8*
    %s_i = getelementptr i8* %s_i8, %i_offset
    %s_i_c = getelementptr i8* %s_i, 4*sizeof(int)
    %result = bitcast %s_i_c to double*
The bitcasts can be ignored; the original GEP with 3 indices is transformed into 1 MUL/SHL and 2 simpler GEPs with 1 index each. If we find a constant within the index "%i", it can also be folded into the last GEP.
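
To illustrate the 1st form at the source level, here is a minimal C sketch, not code from the patch. The function names addr_original and addr_two_geps are hypothetical, and it uses offsetof(struct S, b) for the constant, which equals 4*sizeof(int) = 16 only under the usual assumption of 4-byte int and 8-byte-aligned double.

```c
#include <assert.h>
#include <stddef.h>

struct S {
  int a[4];
  double b[8];
};

/* Original address computation: getelementptr %s, 0, 1, i */
double *addr_original(struct S *s, size_t i) {
  assert(i < 8);
  return &s->b[i];
}

/* 1st form: one multiply plus two single-index byte GEPs.
 * The variable offset is applied first, the constant offset last. */
double *addr_two_geps(struct S *s, size_t i) {
  assert(i < 8);
  size_t i_offset = i * sizeof(double);      /* %i_offset */
  char *s_i8 = (char *)s;                    /* %s_i8: bitcast to i8* */
  char *s_i = s_i8 + i_offset;               /* %s_i: variable part */
  char *s_i_c = s_i + offsetof(struct S, b); /* %s_i_c: constant part */
  return (double *)s_i_c;                    /* %result */
}
```

Both functions should return the same pointer for every valid i, since the byte offset is the same sum taken in a different order.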

2nd Form: ptrtoint+arithmetic+inttoptr 
    %ptrint = ptrtoint %s to i64
    %i_offset = mul %i, sizeof(double)
    %s_i = add %ptrint, %i_offset
    %s_i_c = add %s_i, 4*sizeof(int)
    %result = inttoptr %s_i_c to double*
This is similar: the ptrtoint/inttoptr can be ignored, and the original GEP with 3 indices is transformed into 1 MUL/SHL and 2 ADDs.
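
The same computation in the 2nd form can be sketched in C with integer arithmetic on the address. This is only an illustration under the same layout assumptions (4-byte int, 8-byte-aligned double); addr_int_arith is a hypothetical name, and the constant add is chained onto the variable add so that both offsets reach the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct S {
  int a[4];
  double b[8];
};

/* 2nd form: ptrtoint, one multiply, two adds, inttoptr. */
double *addr_int_arith(struct S *s, size_t i) {
  assert(i < 8);
  uintptr_t ptrint = (uintptr_t)s;                  /* %ptrint */
  uintptr_t i_offset = i * sizeof(double);          /* %i_offset */
  uintptr_t s_i = ptrint + i_offset;                /* %s_i */
  uintptr_t s_i_c = s_i + offsetof(struct S, b);    /* %s_i_c */
  return (double *)s_i_c;                           /* %result */
}
```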

The main benefits are:
(1) We can always extract struct field indices.
(2) We can do CSE easily on simpler GEPs and MUL/SHL/ADDs.
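
As a rough picture of benefit (2), consider two accesses, s.b[i] and s.b[i+1]. Once the variable part is separated from the constants, both share one common subexpression and differ only in the trailing constant. The C sketch below is a hand-written illustration of that shape, not output of the pass; sum_adjacent is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

struct S {
  int a[4];
  double b[8];
};

/* Computes s->b[i] + s->b[i + 1] with the variable offset shared. */
double sum_adjacent(const struct S *s, size_t i) {
  assert(i + 1 < 8);
  /* Common subexpression: base plus variable offset, computed once. */
  const char *common = (const char *)s + i * sizeof(double);
  /* Each access only adds its own constant offset. */
  const double *p0 = (const double *)(common + offsetof(struct S, b));
  const double *p1 =
      (const double *)(common + offsetof(struct S, b) + sizeof(double));
  return *p0 + *p1;
}
```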

This transformation is similar to the address sinking logic in the CodeGenPrepare (CGP) pass, which also transforms a GEP into one of these forms depending on useAA(). For correctness and performance, several benchmarks have been tested on the AArch64 backend. I cannot test NVPTX, but I think test cases with complex address calculations can also benefit there.

Thanks,
-Hao

http://reviews.llvm.org/D5864