[LLVMdev] Multi-dimensional array accesses in LLVM-IR | Thoughts

Wed Sep 12 01:38:19 PDT 2012

On 09/11/2012 03:38 PM, Hal Finkel wrote:
> This is a general problem that also affects, for example, Preston's
> dependence-analysis work. How have you thought about dealing with this?

Hi Hal,

you are right. Preston's dependence-analysis and Polly are sharing the same
problem. As far as I know, non of us implemented a solution yet.
Preston, what is your status on multi-dim arrays?

As stated earlier the problem is that access functions to multi-dimensional arrays
are linearized at LLVM-IR level. Unfortunately it is a lot harder to reason about
linearized accesses. Hence, we want to recover the multi-dimensional access function.

In LLVM-IR words: When we currently analyze a multi-dimensional
variable length memory access, we calculate a single SCEV expression
which describes the memory access. However, we would prefer to have for
each array dimension a separate SCEV* expression. Those individual expressions
will be a lot easier to analyze.

I think there are two major directions we can go:

1) Represent multi-dimensional access functions in LLVM-IR

Instead of linearizing multi-dimensional array accesses to single 
dimensional array accesses, we extend LLVM-IR such that it can express 
multi-dimensional accesses and we ensure that front-ends directly
generate LLVM-IR multi-dimensional accesses.

2) Recover multi-dimensionality from linearized accesses

We take a linearized single-dimensional SCEV* expression, apply some 
magic and recover both the number and size of the original array
dimensions as well as a set of SCEV* expressions describing the individual
access functions of the original dimensions.

Both approaches have advantages/disadvantages:

Approach 1)
   Pro:
	- Provides exact information about multi-dimensional arrays,
          if the multi-dimensionality is explicit in the source program

   Contra:
	- We need to explicitly express multi-dimensionality in LLVM-IR
	- Front-ends need to explicitly generate the multi-dimensional IR
	- Existing passes need to maintain and reason about the new IR
	- Only C99 supports variable length arrays. Most existing C/C++
	  programs have a manual implementation of variable length
	  arrays, which cannot be automatically translated by the front-end.

Approach 2)
   Pro:
	- Does not distinguish between manually simulated multi-dimensional
	  arrays and multi-dimensional arrays that have been explicitly
	  expressed in the source code.
	- Changes are limited to an LLVM-IR analysis

   Contra:
	- The general problem is difficult
	- We may need to add some run-time checks as statically proving
          the delinearization may not be entirely possible

I personally would first have a look at approach '2'. Access functions of
multi-dimensional follow often a very regular structure, which appears not
only for accesses generated by compiler front-ends, but also for
manual implementations of multi-dimensional arrays. One common pattern
is that there is a single parameter that represents the size of a
dimension [1]. Meaning SCEV* expressions describing access to an array "A[][m][p]" will contain the sizes '%p' and '%m * %p' more or less explicit.

Here some examples I committed in r163619:

multidim_only_ivs_2d.ll:
; void foo(long n, long m, double A[n][m]) {
;   for (long i = 0; i < n; i++)                                                
;     for (long j = 0; j < m; j++)                                              
;       A[i][j] = 1.0;                                                          
; }         
; Access function: {{0,+,%m}<%for.i>,+,1}<nw><%for.j>

multidim_only_ivs_3d.ll:
; void foo(long n, long m, long o, double A[n][m][o]) {                         
;   for (long i = 0; i < n; i++)                                                
;     for (long j = 0; j < m; j++)                                              
;       for (long k = 0; k < o; k++)                                            
;         A[i][j][k] = 1.0;                                                     
; }                                                                                                                                                         
; Access function: {{{0,+,(%m * %o)}<%for.i>,+,%o}<%for.j>,+,1}<nw><%for.k>

multidim_ivs_and_integer_offsets_3d.ll:
; void foo(long n, long m, long o, double A[n][m][o]) {
;   for (long i = 0; i < n; i++)                                                
;     for (long j = 0; j < m; j++)                                              
;       for (long k = 0; k < o; k++)                                            
;         A[i+3][j-4][k+7] = 1.0;                                               
; }                                                                             
;                                                                               
; Access function:
;   {{{(56 + (8 * (-4 + (3 * %m)) * %o) + %A),+,(8 * %m * %o)}<%for.i>,+,       
;      (8 * %o)}<%for.j>,+,8}<%for.k>  

multidim_only_ivs_3d_cast.ll:
; void foo(int n, int m, int o, double A[n][m][o]) {                            
;   for (int i = 0; i < n; i++)                                                 
;     for (int j = 0; j < m; j++)                                               
;       for (int k = 0; k < o; k++)                                             
;         A[i][j][k] = 1.0;                                                     
; }                                                                             
;                                                                               
; Access function:                                                              
;   {{{%A,+,(8 * (zext i32 %m to i64) * (zext i32 %o to i64))}<%for.i>,+,       
;    (8 * (zext i32 %o to i64))}<%for.j>,+,8}<%for.k>

multidim_ivs_and_parameteric_offsets_3d.ll:
; void foo(long n, long m, long o, double A[n][m][o], long p, long q, long r) { 
;                                                                               
;   for (long i = 0; i < n; i++)                                                
;     for (long j = 0; j < m; j++)                                              
;       for (long k = 0; k < o; k++)                                            
;         A[i+p][j+q][k+r] = 1.0;                                               
; }                                                                             
;                                                                               
; Access function:                                                              
;    {{{((8 * ((((%m * %p) + %q) * %o) + %r)) + %A),+,(8 * %m * %o)}<%for.i>,+, 
;        (8 * %o)}<%for.j>,+,8}<%for.k> 

Those examples are obviously not exhaustive, but they already cover many common
cases. Analyzing them seems to be possible. As you may realize the size of the
array dimensions are all explicitly available in the 'step' of the SCEVAddRec
repressions. Even in the last example which contains parameters both for the
sizes and the offsets, the parameters for the sizes are the only ones that
appear on the right hand sides ('8 * %m * %o' and '8 * %o'). 

I would guess that we could write a SCEVIterator, that analyzes the SCEVs and guesses
the dimension sizes. After having written that, it should be possible to write
another SCEVIterator that given the dimension sizes can separate the different
dimensions of a SCEV.

I doubt we will always be able to prove that our guess is correct, but we could
add a run-time check to test the conditions that we can not prove statically. This is
also the point, where I think the front-end (or the user with attributes) could help.
Accesses could be annotated with meta-data providing the size of the array dimensions,
such that our analysis can start from this meta-data. This could allow our analysis
to take some short cuts and to avoid some run-time checks.

Does this sound like a reasonable approach? Any ideas or suggestions on that topic?

Cheers
Tobi

[1] FORTRAN seems to express array size as "max(0, %dim_size)"