[LLVMdev] scalarrepl fails to promote array of vector

Mon Mar 12 01:20:14 PDT 2012

Hi Fan,

> You said that scalarRepl gets shy about loads and stores of the entire
> aggregate. Then I use a test case:
>
> ; ModuleID = 'test1.ll'
> define i32 @fun(i32* nocapture %X, i32 %i) nounwind uwtable readonly {
>    %stackArray = alloca <4 x i32>
>    %XC = bitcast i32* %X to <4 x i32>*
>    %arrayVal = load <4 x i32>* %XC
>    store <4 x i32> %arrayVal, <4 x i32>* %stackArray
>    %arrayVal1 = load <4 x i32>* %stackArray
>    %1 = extractelement <4 x i32> %arrayVal1, i32 1
>    ret i32 %1
> }
>
> $ opt -S -stats -scalarrepl test1.ll
> ; ModuleID = 'test1.ll'
>
> define i32 @fun(i32* nocapture %X, i32 %i) nounwind uwtable readonly {
>    %XC = bitcast i32* %X to <4 x i32>*
>    %arrayVal = load <4 x i32>* %XC
>    %1 = extractelement <4 x i32> %arrayVal, i32 1
>    ret i32 %1
> }
> ===-------------------------------------------------------------------------===
>                            ... Statistics Collected ...
> ===-------------------------------------------------------------------------===
>
> 1 mem2reg    - Number of alloca's promoted with a single store
> 1 scalarrepl - Number of allocas promoted
>
> You can see that the stackArray is eliminated,

I think you may be confusing arrays and vectors: there is no stack array in
your example, only the vector <4 x i32>.  As a general rule hardly any
optimization is done for loads and stores of arrays because front-ends don't
produce them much.  Much more effort is made for vectors because they can be
important for getting good performance.

Ciao, Duncan.
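
To make the array/vector distinction concrete, here is a minimal sketch that is not from the thread. It uses the same typed-pointer IR syntax as the test cases, and the function names @vec_case and @arr_case are hypothetical. The first function round-trips a vector through a stack slot, as test1.ll does; the second does the same with an array, which is an aggregate type. Running both through opt -S -stats -scalarrepl is one way to compare how each is treated.

```llvm
; Vector: <4 x i32> is a first-class SIMD value (the situation in test1.ll).
define i32 @vec_case(<4 x i32>* %P) nounwind {
  %tmp = alloca <4 x i32>
  %v = load <4 x i32>* %P
  store <4 x i32> %v, <4 x i32>* %tmp
  %v1 = load <4 x i32>* %tmp
  ; Elements of a vector are read with extractelement.
  %e = extractelement <4 x i32> %v1, i32 1
  ret i32 %e
}

; Array: [4 x i32] is an aggregate, so this is a whole-aggregate load/store.
define i32 @arr_case([4 x i32]* %P) nounwind {
  %tmp = alloca [4 x i32]
  %a = load [4 x i32]* %P
  store [4 x i32] %a, [4 x i32]* %tmp
  %a1 = load [4 x i32]* %tmp
  ; Elements of an aggregate are read with extractvalue.
  %e = extractvalue [4 x i32] %a1, 1
  ret i32 %e
}
```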

> although there is loads and
> stores of the entire aggregate.
>
> However, the optimised code is still not optimal. I want the code just load one
> element from X instead of the whole array.
>
> Thanks,
> David
>
> On Sun, Mar 11, 2012 at 5:22 AM, Chris Lattner <clattner at apple.com> wrote:
>
>
>     On Mar 10, 2012, at 9:34 AM, Fan Dawei wrote:
>
>      > Hi all,
>      >
>      > I want to use scalarrepl pass to eliminate the allocation of mat_alloc
>      > which is of type [4 x <4 x float>] in the following program.
>      >
>      > $cat test.ll
>      >
>      > ; ModuleID = 'test.ll'
>      >
>      > define void @main(<4 x float>* %inArg, <4 x float>* %outArg, [4 x <4 x float>]* %constants) nounwind {
>      > entry:
>      >   %inArg1 = load <4 x float>* %inArg
>      >   %mat_alloc = alloca [4 x <4 x float>]
>      >   %matVal = load [4 x <4 x float>]* %constants
>      >   store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc
>      >   %0 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 0
>      >   %1 = load <4 x float>* %0
>      >   %2 = fmul <4 x float> %1, %inArg1
>      >   %3 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 1
>      >   %4 = load <4 x float>* %3
>      >   %5 = fmul <4 x float> %4, %inArg1
>      >   %6 = fadd <4 x float> %2, %5
>      >   %7 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 2
>      >   %8 = load <4 x float>* %7
>      >   %9 = fmul <4 x float> %8, %inArg1
>      >   %10 = fadd <4 x float> %6, %9
>      >   %11 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 3
>      >   %12 = load <4 x float>* %11
>      >   %13 = fadd <4 x float> %10, %12
>      >   %14 = getelementptr <4 x float>* %outArg, i32 1
>      >   store <4 x float> %13, <4 x float>* %14
>      >   ret void
>      > }
>      >
>      > $ opt -S -stats -scalarrepl test.ll
>      >
>      > No transformation is performed. I've examined the source code of
>      > scalarrepl. It seems this pass does not handle array allocations. Is there
>      > another transformation pass I can use to eliminate this allocation?
>
>     Hi David,
>
>     ScalarRepl gets shy about loads and stores of the entire aggregate:
>
>      >   %matVal = load [4 x <4 x float>]* %constants
>      >   store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc
>
>     It is possible to generalize scalarrepl to handle these similar to the way
>     it handles memcpy, but no one has done that yet.  Also, it's not generally
>     recommended to do stuff like this, because you'll get inefficient code from
>     many parts of the optimizer and code generator.
>
>     -Chris
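
Following Chris's point about avoiding whole-aggregate loads and stores, one option is for the front-end to read each row straight from %constants, so that no [4 x <4 x float>] value and no alloca is needed. The sketch below is a hand-written rewrite of test.ll in the same typed-pointer syntax, not output from scalarrepl. The value names (%c0.ptr, %m0, %s1, and so on) are made up for illustration. It assumes the copy in mat_alloc is only ever read, as in the original function.

```llvm
define void @main(<4 x float>* %inArg, <4 x float>* %outArg, [4 x <4 x float>]* %constants) nounwind {
entry:
  %inArg1 = load <4 x float>* %inArg
  ; Row 0: load one <4 x float> element rather than the whole array.
  %c0.ptr = getelementptr inbounds [4 x <4 x float>]* %constants, i32 0, i32 0
  %c0 = load <4 x float>* %c0.ptr
  %m0 = fmul <4 x float> %c0, %inArg1
  ; Row 1
  %c1.ptr = getelementptr inbounds [4 x <4 x float>]* %constants, i32 0, i32 1
  %c1 = load <4 x float>* %c1.ptr
  %m1 = fmul <4 x float> %c1, %inArg1
  %s1 = fadd <4 x float> %m0, %m1
  ; Row 2
  %c2.ptr = getelementptr inbounds [4 x <4 x float>]* %constants, i32 0, i32 2
  %c2 = load <4 x float>* %c2.ptr
  %m2 = fmul <4 x float> %c2, %inArg1
  %s2 = fadd <4 x float> %s1, %m2
  ; Row 3 is added without a multiply, as in the original test.ll.
  %c3.ptr = getelementptr inbounds [4 x <4 x float>]* %constants, i32 0, i32 3
  %c3 = load <4 x float>* %c3.ptr
  %s3 = fadd <4 x float> %s2, %c3
  %out = getelementptr <4 x float>* %outArg, i32 1
  store <4 x float> %s3, <4 x float>* %out
  ret void
}
```

All four row loads still happen before the store to %outArg, matching the order of memory accesses in the original function.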