[LLVMdev] Alignment of pointee

Tue Mar 25 06:53:35 PDT 2014

Hi all,

Is there a way to express in the IR that a pointer's value is a multiple 
of, say, 32 bytes, i.e. that the data it points to is 32-byte aligned? 
I do not mean the natural alignment implied by the pointee type. I have 
a double* pointer and would like to explicitly state a larger alignment 
than its natural one.

I would like to attach this information to a pointer that is passed as a 
function argument, so that aligned (vector) loads are generated later.

Here is pseudo code for what I am trying to accomplish:

define void @foo( double* noalias %arg0 )
{
   // switching to C style
   for( int outer=0 ; outer < end ; ++outer ) {
     for( int inner=0 ; inner < 4 ; ++inner ) {
       arg0[ outer*4 + inner ] += arg0[ outer*4 + inner ];
     }
   }
}

The loop vectorizer does its job on the 'inner' loop and generates 
vector loads/adds/stores for this code. However, the vector loads/stores 
are not as well aligned as they could be, which results in a lot of 
extra code being produced in codegen (lots of permutations).

After vectorization the code looks similar to

define void @foo( double* noalias %arg0 )
{
    // switching to C style
   for( int outer=0 ; outer < end ; ++outer ) {

vector.body:                                      ; preds = %vector.body, %L5
   %index = phi i64 [ 0, %L5 ], [ %index.next, %vector.body ]
   %42 = add i64 %7, %index
   %43 = getelementptr double* %arg1, i64 %42
   %44 = bitcast double* %43 to <4 x double>*
   %wide.load = load <4 x double>* %44, align 8

   %132 = fadd <4 x double> %wide.load, %wide.load54

   %364 = getelementptr double* %arg0, i64 %93
   %365 = bitcast double* %364 to <4 x double>*
   store <4 x double> %329, <4 x double>* %365, align 8
   }
}

One can see that if the pointee of %arg0 were initially aligned to 32 
bytes, then, since the vectorizer operates on a loop with a fixed trip 
count of 4 and the size of double is 8 bytes, the vector loads and 
stores could all be aligned to 32 bytes (which on my target 
architecture would result in vector loads without additional 
permutations).
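
To make the arithmetic behind this concrete, here is a minimal C sketch. 
It is only an illustration of the address calculation, not LLVM code. 
The helper names (row_address, count_aligned_rows) and the constants are 
hypothetical and chosen for this example. It assumes, as above, 8-byte 
doubles, a trip count of 4, and a desired alignment of 32 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumptions taken from the post: a double is 8 bytes, the inner loop
   has a fixed trip count of 4, and the desired alignment is 32 bytes. */
enum { DOUBLE_BYTES = 8, INNER_TRIP = 4, VECTOR_ALIGN = 32 };

/* Byte address of arg0[outer*4], given the byte address of arg0[0]. */
static uintptr_t row_address(uintptr_t base, size_t outer)
{
    return base + (uintptr_t)outer * INNER_TRIP * DOUBLE_BYTES;
}

/* Number of outer iterations in [0, end) whose 4-wide access starts on
   a VECTOR_ALIGN-byte boundary. */
static size_t count_aligned_rows(uintptr_t base, size_t end)
{
    size_t aligned = 0;
    for (size_t outer = 0; outer < end; ++outer) {
        if (row_address(base, outer) % VECTOR_ALIGN == 0)
            ++aligned;
    }
    return aligned;
}
```

Each outer iteration advances the address by 4 * 8 = 32 bytes, so the 
alignment of the base carries over to every vector access. With a base 
address that is a multiple of 32 (e.g. 64), all rows start on a 32-byte 
boundary. With a base that is only 8-byte aligned (e.g. 8), none of them 
do.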

Is it somehow possible to achieve this? I am generating the IR with the 
builder, i.e. I am not coming from C or clang.

Thank you,
Frank