[LLVMdev] scalarrepl fails to promote array of vector

Sun Mar 11 20:35:19 PDT 2012

Hi Chris,

Thanks for your reply.

You said that ScalarRepl gets shy about loads and stores of the entire
aggregate, so I tried the following test case:

; ModuleID = 'test1.ll'
define i32 @fun(i32* nocapture %X, i32 %i) nounwind uwtable readonly {
  %stackArray = alloca <4 x i32>
  %XC = bitcast i32* %X to <4 x i32>*
  %arrayVal = load <4 x i32>* %XC
  store <4 x i32> %arrayVal, <4 x i32>* %stackArray
  %arrayVal1 = load <4 x i32>* %stackArray
  %1 = extractelement <4 x i32> %arrayVal1, i32 1
  ret i32 %1
}

$ opt -S -stats -scalarrepl test1.ll
; ModuleID = 'test1.ll'

define i32 @fun(i32* nocapture %X, i32 %i) nounwind uwtable readonly {
  %XC = bitcast i32* %X to <4 x i32>*
  %arrayVal = load <4 x i32>* %XC
  %1 = extractelement <4 x i32> %arrayVal, i32 1
  ret i32 %1
}
===-------------------------------------------------------------------------===
                          ... Statistics Collected ...
===-------------------------------------------------------------------------===

1 mem2reg    - Number of alloca's promoted with a single store
1 scalarrepl - Number of allocas promoted

You can see that stackArray is eliminated, even though there are loads and
stores of the entire aggregate.

However, the optimised code is still not optimal: I want it to load just
one element from X instead of the whole vector.
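For illustration, here is a rough C analogue of test1.ll. This is an assumption on my part: the IR above was written by hand, not compiled from this C, and the function names are hypothetical. fun_via_stack mirrors the copy of all four i32s through the stack temporary (%stackArray) followed by reading element 1; fun_single_load is the single scalar load I'm after. Both return the same value, which is why narrowing the 16-byte vector load to a 4-byte load of X[1] should be legal.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical C analogue of test1.ll: copy the four i32s at X into a
   stack temporary (like %stackArray), then read element 1 back. */
int32_t fun_via_stack(const int32_t *X) {
    int32_t stackArray[4];
    memcpy(stackArray, X, sizeof stackArray); /* whole-vector load + store */
    return stackArray[1];                     /* extractelement ..., i32 1 */
}

/* The code I would like the optimizer to produce: one scalar load of
   element 1, with no full-vector load at all. */
int32_t fun_single_load(const int32_t *X) {
    return X[1];
}
```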

Thanks,
David

On Sun, Mar 11, 2012 at 5:22 AM, Chris Lattner <clattner at apple.com> wrote:

>
> On Mar 10, 2012, at 9:34 AM, Fan Dawei wrote:
>
> > Hi all,
> >
> > I want to use the scalarrepl pass to eliminate the allocation of mat_alloc,
> > which is of type [4 x <4 x float>], in the following program.
> >
> > $cat test.ll
> >
> > ; ModuleID = 'test.ll'
> >
> > define void @main(<4 x float>* %inArg, <4 x float>* %outArg, [4 x <4 x float>]* %constants) nounwind {
> > entry:
> >   %inArg1 = load <4 x float>* %inArg
> >   %mat_alloc = alloca [4 x <4 x float>]
> >   %matVal = load [4 x <4 x float>]* %constants
> >   store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc
> >   %0 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 0
> >   %1 = load <4 x float>* %0
> >   %2 = fmul <4 x float> %1, %inArg1
> >   %3 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 1
> >   %4 = load <4 x float>* %3
> >   %5 = fmul <4 x float> %4, %inArg1
> >   %6 = fadd <4 x float> %2, %5
> >   %7 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 2
> >   %8 = load <4 x float>* %7
> >   %9 = fmul <4 x float> %8, %inArg1
> >   %10 = fadd <4 x float> %6, %9
> >   %11 = getelementptr inbounds [4 x <4 x float>]* %mat_alloc, i32 0, i32 3
> >   %12 = load <4 x float>* %11
> >   %13 = fadd <4 x float> %10, %12
> >   %14 = getelementptr <4 x float>* %outArg, i32 1
> >   store <4 x float> %13, <4 x float>* %14
> >   ret void
> > }
> >
> > $ opt -S -stats -scalarrepl test.ll
> >
> > No transformation is performed. I've examined the source code of
> > scalarrepl, and it seems this pass does not handle array allocations. Is
> > there another transformation pass I can use to eliminate this allocation?
>
> Hi David,
>
> ScalarRepl gets shy about loads and stores of the entire aggregate:
>
> >   %matVal = load [4 x <4 x float>]* %constants
> >   store [4 x <4 x float>] %matVal, [4 x <4 x float>]* %mat_alloc
>
> It is possible to generalize scalarrepl to handle these similar to the way
> it handles memcpy, but no one has done that yet.  Also, it's not generally
> recommended to do stuff like this, because you'll get inefficient code from
> many parts of the optimizer and code generator.
>
> -Chris
>
>