[LLVMdev] scalarrepl fails to promote array of vector

Sat Mar 10 13:22:40 PST 2012

On Mar 10, 2012, at 9:34 AM, Fan Dawei wrote:

> Hi all,
> 
> I want to use the scalarrepl pass to eliminate the alloca %mat_alloc, which has type [4 x <4 x float>], in the following program.
> 
> $ cat test.ll
> 
> ; ModuleID = 'test.ll'
> 
> define void @main(<4 x float>* %inArg, <4 x float>* %outArg, [4 x <4 x float>]* %constants) nounwind {
> entry:
>   %inArg1 = load <4 x float>* %inArg
>   %mat_alloc = alloca [4 x <4 x float>]
>   %matVal = load [4 x <4 x float>]* %constants
>   store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc
>   %0 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 0
>   %1 = load <4 x float>* %0
>   %2 = fmul <4 x float> %1, %inArg1
>   %3 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 1
>   %4 = load <4 x float>* %3
>   %5 = fmul <4 x float> %4, %inArg1
>   %6 = fadd <4 x float> %2, %5
>   %7 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 2
>   %8 = load <4 x float>* %7
>   %9 = fmul <4 x float> %8, %inArg1
>   %10 = fadd <4 x float> %6, %9
>   %11 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 3
>   %12 = load <4 x float>* %11
>   %13 = fadd <4 x float> %10, %12
>   %14 = getelementptr <4 x float>* %outArg, i32 1
>   store <4 x float> %13, <4 x float>* %14
>   ret void
> } 
> 
> $ opt -S -stats -scalarrepl test.ll 
> 
> No transformation is performed. I've examined the source code of scalarrepl, and it seems this pass does not handle array allocations. Is there another transformation pass I can use to eliminate this allocation?

Hi David,

ScalarRepl gets shy about loads and stores of the entire aggregate:

>   %matVal = load [4 x <4 x float>]* %constants
>   store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc

It is possible to generalize scalarrepl to handle these similarly to the way it handles memcpy, but no one has done that yet.  Also, loading and storing whole aggregates as first-class values like this is not generally recommended, because you'll get inefficient code from many parts of the optimizer and code generator.
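
One way to sidestep the whole-aggregate load/store is to copy the array one row at a time, so that every access to %mat_alloc is a <4 x float> load or store through a constant-index GEP. The following is only an untested sketch in the same IR syntax as test.ll. The value names (%src0, %dst0, %row0, %m0, and so on) are made up for illustration, and whether scalarrepl actually promotes the alloca still depends on its internal heuristics:

```llvm
; Sketch only: test.ll rewritten so the array is copied row by row.
; All value names introduced here are illustrative, not from the original.
define void @main(<4 x float>* %inArg, <4 x float>* %outArg, [4 x <4 x float>]* %constants) nounwind {
entry:
  %inArg1 = load <4 x float>* %inArg
  %mat_alloc = alloca [4 x <4 x float>]

  ; Copy each row separately instead of one load/store of the whole array.
  %src0 = getelementptr inbounds [4 x <4 x float>]* %constants, i32 0, i32 0
  %dst0 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 0
  %row0 = load <4 x float>* %src0
  store <4 x float> %row0, <4 x float>* %dst0
  %src1 = getelementptr inbounds [4 x <4 x float>]* %constants, i32 0, i32 1
  %dst1 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 1
  %row1 = load <4 x float>* %src1
  store <4 x float> %row1, <4 x float>* %dst1
  %src2 = getelementptr inbounds [4 x <4 x float>]* %constants, i32 0, i32 2
  %dst2 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 2
  %row2 = load <4 x float>* %src2
  store <4 x float> %row2, <4 x float>* %dst2
  %src3 = getelementptr inbounds [4 x <4 x float>]* %constants, i32 0, i32 3
  %dst3 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 3
  %row3 = load <4 x float>* %src3
  store <4 x float> %row3, <4 x float>* %dst3

  ; Same computation as test.ll, reading the rows back from %mat_alloc.
  %m0 = load <4 x float>* %dst0
  %p0 = fmul <4 x float> %m0, %inArg1
  %m1 = load <4 x float>* %dst1
  %p1 = fmul <4 x float> %m1, %inArg1
  %s1 = fadd <4 x float> %p0, %p1
  %m2 = load <4 x float>* %dst2
  %p2 = fmul <4 x float> %m2, %inArg1
  %s2 = fadd <4 x float> %s1, %p2
  %m3 = load <4 x float>* %dst3
  %s3 = fadd <4 x float> %s2, %m3
  %out = getelementptr <4 x float>* %outArg, i32 1
  store <4 x float> %s3, <4 x float>* %out
  ret void
}
```

Running the same `opt -S -stats -scalarrepl` command on this version is the quickest way to check whether the alloca goes away. If the copy is not needed at all, loading the rows directly from %constants avoids the alloca without relying on any pass.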

-Chris