[llvm-dev] LLVM Loop vectorizer - 2 vector.body blocks appear

Michael Zolotukhin via llvm-dev llvm-dev at lists.llvm.org
Sun Aug 7 16:32:02 PDT 2016


> On Aug 7, 2016, at 3:33 PM, Alex Susu <alex.e.susu at gmail.com> wrote:
> 
>  Hello.
>    Michael, thank you for your answer - indeed, your command generates only 1 vector.body.
> 
>    I give the following commands to compile:
>        $(LLVM_PATH)/clang -fvectorize -mllvm -force-vector-width=8 src.c -S -emit-llvm
>        $(LLVM_PATH)/opt -debug -O3 -loop-vectorize -force-vector-width=8 src.ll -S >3better_after_opt.ll
>        $(LLVM_PATH)/llc -print-after-all -debug -march=connex -O0 -asm-show-inst -asm-verbose src_after_opt.ll
Hi Alex,

I assume you run these three commands to model clang’s -O3 behavior using opt? If so, it’s better to do it the following way:

1. Generate IR with clang before optimizations kick in:
clang -O3 -mllvm -disable-llvm-optzns -S -emit-llvm src.c -o src_noopt.ll

2. Run opt on it:
opt -O3 src_noopt.ll -S -o src_after_opt.ll 
You can also pass your custom flags here, like “-force-vector-width=8”. There is no need to pass -loop-vectorize, as it’s already present in the O3 pipeline. I guess passing it along with -O3 might be the reason you see two vector bodies (e.g. the remainder loop might have been vectorized by the second invocation of the vectorizer).

3. Run llc if you need an asm file:
llc src_after_opt.ll -o src.s -march=connex -asm-show-inst -asm-verbose

Michael

> 
>   I'd like to mention that I am using the version of LoopVectorize.cpp from the beginning of July 2016.
> 
>  Best regards,
>    Alex
> 
> On 8/6/2016 2:15 AM, Michael Zolotukhin wrote:
>> Hi Alex,
>> 
>> How do you compile this program? I compile it as follows, and don’t see extra vector-bodies:
>> 
>>> bin/clang -O3 vec.c -S -o - |grep "##"
>> 
>>    _foo:                                   ## @foo
>>    ## BB#0:                                ## %entry
>>    ## BB#1:                                ## %for.body.preheader
>>    ## BB#8:                                ## %min.iters.checked
>>    ## BB#9:                                ## %vector.memcheck
>>    ## BB#10:                               ## %vector.memcheck
>>    ## BB#11:                               ## %vector.body.preheader
>>    ## BB#12:                               ## %vector.body.prol
>>    LBB0_13:                                ## %vector.body.prol.loopexit
>>    ## BB#14:                               ## %vector.body.preheader.new
>>    LBB0_15:                                ## %vector.body
>>                                             ## =>This Inner Loop Header: Depth=1
>>    LBB0_16:                                ## %middle.block
>>    LBB0_2:                                 ## %for.body.preheader27
>>    ## BB#3:                                ## %for.body.prol.preheader
>>    LBB0_4:                                 ## %for.body.prol
>>                                             ## =>This Inner Loop Header: Depth=1
>>    LBB0_5:                                 ## %for.body.prol.loopexit
>>    ## BB#6:                                ## %for.body.preheader27.new
>>    LBB0_7:                                 ## %for.body
>>                                             ## =>This Inner Loop Header: Depth=1
>>    LBB0_17:                                ## %for.cond.cleanup
>> 
>> 
>> 
>> Best regards,
>> Michael
>> 
>> 
>>> On Jul 31, 2016, at 5:29 PM, Alex Susu <alex.e.susu at gmail.com> wrote:
>>> 
>>> Hello.
>>>   Mikhail, with the more recent version of the LoopVectorize.cpp code (retrieved at the
>>> beginning of July 2016) I ran the following piece of C code:
>>>   void foo(long *A, long *B, long *C, long N) {
>>>     for (long i = 0; i < N; ++i) {
>>>       C[i] = A[i] + B[i];
>>>     }
>>>   }
>>> 
>>>   The vectorized LLVM IR I obtain contains 2 vector.body blocks, one named
>>> "vector.body" and the other, for example, "vector.body34". The code seems correct: the
>>> first "vector.body" block performs the vector add on a number of elements that is a
>>> multiple of VF * UF. There are 2 epilogues, which makes things a bit strange; I am
>>> still trying to understand the code.
>>> 
>>> 
>>>   Could you explain where in LoopVectorize.cpp the 2 vector.body blocks are
>>> created? I know that InnerLoopVectorizer::vectorize() calls
>>> InnerLoopVectorizer::createEmptyLoop(), which creates the blocks required for
>>> vectorization, but I have difficulty following the class instantiations.
>>>   I ask because I would prefer to have only one "vector.body" block for the
>>> above C program, as was the case with the Nov 2015 version of LoopVectorize.cpp.
>>> 
>>> Thank you very much,
>>>   Alex


