[LLVMdev] failed folding with constant array with opt -O3
Peng Cheng
gm4cheng at gmail.com
Wed Sep 10 12:00:47 PDT 2014
Thanks for your help! Providing a data layout works for the example I have.
While investigating, though, I got the feeling that something else is still
going on.
I have another simple IR example, which does basically the same thing as the
original one, except that it initializes the array element by element instead
of from a constant array.
---
define void @f(i32* nocapture %l0) {
entry:
  %fc_ = alloca [1 x i32]
  %0 = getelementptr inbounds [1 x i32]* %fc_, i32 0, i32 0
  store i32 1, i32* %0, align 4
  %1 = getelementptr [1 x i32]* %fc_, i64 0, i64 0
  %2 = load i32* %1
  %tobool = icmp eq i32 %2, 0
  br i1 %tobool, label %4, label %3

; <label>:3 ; preds = %entry
  store i32 1, i32* %l0
  br label %4

; <label>:4 ; preds = %entry, %3
  %storemerge = phi i32 [ 1, %3 ], [ 0, %entry ]
  store i32 %storemerge, i32* %l0
  ret void
}
---
For this IR, without a target data layout,
opt -O2 produces the expected optimization:
---
; ModuleID = 't1.txt'

; Function Attrs: nounwind
define void @f(i32* nocapture %l0) #0 {
  store i32 1, i32* %l0
  ret void
}

attributes #0 = { nounwind }
---
By checking the IR after each transformation, I see that GVN removes the
load, and the transformations that run before it change nothing.
So I ran opt -gvn alone on the IR, but the load was not eliminated.
Looking into the GVN code, it seems the memory dependence computation gives
different results for the load under -O2 and under -gvn alone: with -O2 the
load is not clobbered, but with -gvn alone it is reported as clobbered.
Is that expected?
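For comparison, here is a minimal sketch of the same function with a data
layout line at the top, so that the -O2 and -gvn runs can each be tried with
and without it. Reusing the layout string from Roel's reply is an assumption
on my part; it should be replaced with the layout of the actual target.
---
; Assumption: layout string copied from Roel's reply, not from a real target.
target datalayout = "e-p:32:32:32-i32:32:32"

define void @f(i32* nocapture %l0) {
entry:
  %fc_ = alloca [1 x i32]
  ; store 1 into the only element of the local array
  %0 = getelementptr inbounds [1 x i32]* %fc_, i32 0, i32 0
  store i32 1, i32* %0, align 4
  ; reload the same element; the question is whether this load is
  ; forwarded from the store above or reported as clobbered
  %1 = getelementptr [1 x i32]* %fc_, i64 0, i64 0
  %2 = load i32* %1
  %tobool = icmp eq i32 %2, 0
  br i1 %tobool, label %4, label %3

; <label>:3
  store i32 1, i32* %l0
  br label %4

; <label>:4
  %storemerge = phi i32 [ 1, %3 ], [ 0, %entry ]
  store i32 %storemerge, i32* %l0
  ret void
}
---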
On Wed, Sep 10, 2014 at 12:50 PM, Philip Reames <listmail at philipreames.com>
wrote:
> I came in to an email this morning that said basically the same thing for
> the reduced example we were looking at. However, the original IR it came
> from (before hand reduction) had the data layout set correctly, so there's
> probably still *something* going on. It's just not what I thought at
> first. :)
>
> Philip
>
>
>
> On 09/10/2014 02:26 AM, Roel Jordans wrote:
>
>> Looking at the -debug output of opt shows that SROA was skipped due to
>> missing target data.
>>
>> Adding something like:
>>
>> target datalayout = "e-p:32:32:32-i32:32:32"
>>
>> to the top seems sufficient to fix the issue at -O3.
>>
>> By defining the size and storage requirements for i32, SROA is capable of
>> rewriting the array load into a constant scalar load, which can then be
>> further optimized.
>>
>> Cheers,
>> Roel
>>
>> On 09/09/14 18:30, Peng Cheng wrote:
>>
>>> I have the following simplified LLVM IR, which basically returns a value
>>> based on the first value of a constant array.
>>>
>>> ----
>>> ; ModuleID = 'simple_ir3.txt'
>>>
>>> ; constant array with value 1 at the first element
>>> @f.b = constant [1 x i32] [i32 1], align 4
>>>
>>> define void @f(i32* nocapture %l0) {
>>> entry:
>>> %fc_ = alloca [1 x i32]
>>> %f.b.v = load [1 x i32]* @f.b
>>> store [1 x i32] %f.b.v, [1 x i32]* %fc_
>>> ; load the first element of the constant array, which is actually 1
>>> %0 = getelementptr [1 x i32]* %fc_, i64 0, i64 0
>>> %1 = load i32* %0
>>> ; check the first element to see if it is 1, which is actually always true
>>> ; since the first element of the constant array is 1
>>> %tobool = icmp ne i32 %1, 0
>>> br i1 %tobool, label %2, label %4
>>>
>>> ; <label>:2 ; true branch
>>> store i32 1, i32* %l0;
>>> %3 = load i32* %l0;
>>> br label %4
>>>
>>> ; <label>:4
>>> %storemerge = phi i32 [ %3, %2 ], [ 0, %entry ]
>>> store i32 %storemerge, i32* %l0
>>> ret void
>>> }
>>> ---
>>>
>>> I ran opt -O3 simple_ir.txt -S, and got:
>>>
>>> ---
>>> ; ModuleID = 'simple_ir3.txt'
>>>
>>> @f.b = constant [1 x i32] [i32 1], align 4
>>>
>>> ; Function Attrs: nounwind
>>> define void @f(i32* nocapture %l0) #0 {
>>> entry:
>>> %fc_ = alloca [1 x i32]
>>> store [1 x i32] [i32 1], [1 x i32]* %fc_
>>> %0 = getelementptr [1 x i32]* %fc_, i64 0, i64 0
>>> %1 = load i32* %0
>>> %tobool = icmp eq i32 %1, 0
>>> br i1 %tobool, label %3, label %2
>>>
>>> ; <label>:2 ; preds = %entry
>>> store i32 1, i32* %l0
>>> br label %3
>>>
>>> ; <label>:3 ; preds = %entry, %2
>>> %storemerge = phi i32 [ 1, %2 ], [ 0, %entry ]
>>> store i32 %storemerge, i32* %l0
>>> ret void
>>> }
>>>
>>> attributes #0 = { nounwind }
>>> ---
>>>
>>> I would expect constant folding, or some other transformation, to be
>>> able to fold the constant and produce the following IR:
>>>
>>> ---
>>> define void @f(i32* nocapture %l0) #0 {
>>> store i32 1, i32* %l0
>>> ret void
>>> }
>>> ---
>>>
>>> How can I get the expected optimized IR? Should I change the original
>>> IR, or use a different set of transformations?
>>>
>>> Any suggestions or comments?
>>>
>>>
>>> Thanks,
>>> -Peng
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev