[llvm-dev] [LLVMdev] LLVM loop vectorizer

Fri Jun 3 18:28:14 PDT 2016

Hi Alex,

I think the changes you want are actually not vectorizer-related. The vectorizer just uses data provided by other passes.

What you probably want is to look into the routine Loop::getStartLoc() (see lib/Analysis/LoopInfo.cpp). If you find a way to improve it, patches are welcome :)

Thanks,
Michael

> On Jun 3, 2016, at 6:13 PM, Alex Susu <alex.e.susu at gmail.com> wrote:
> 
>  Hello.
>    Mikhail, I come back to this older thread.
>    I need to make a few changes to LoopVectorize.cpp.
> 
>    One of them is related to figuring out the exact C source line and column number of the loops being vectorized. I've noticed that a recent version of LoopVectorize.cpp prints imprecise debug info for vectorized loops, such as the location of a character of an assignment statement inside the respective loop.
>    It would help me a lot in my project to find the exact C source line and column number of the first and last character of the loop being vectorized (an imprecise location would make my life more complicated).
>    Is this feasible? Or are there limitations at the level of clang on retrieving the exact C source line and column number of the beginning and end of a loop (it can include indentation characters before and after the loop)?
>    (I've seen other examples with imprecise location such as the "Reading diagnostics" chapter in the book https://books.google.ro/books?isbn=1782166939 .)
> 
>    Note: to be able to retrieve the debug info from the C source file we need to run clang with the -Rpass* options, as discussed before. Otherwise, if we run clang first and then run opt (which runs LoopVectorize) on the resulting .ll file, we lose the C source file debug info (DebugLoc class, etc.) and obtain the debug info from the .ll file instead. An example:
>        clang -O3 3better.c -arch=mips -ffast-math -Rpass=debug -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize -S -emit-llvm -fvectorize -mllvm -debug -mllvm -force-vector-width=16 -save-temps
> 
>  Thank you,
>    Alex
> 
> 
> 
> On 2/18/2016 2:17 AM, Mikhail Zolotukhin wrote:
>> Hi Alex,
>> 
>> I'm not aware of efforts on loop coalescing in LLVM, but Polly can probably do
>> something like this. Also, one related thought: it might be worth making it a separate
>> pass, not a part of the loop vectorizer. LLVM already has several 'utility' passes (e.g.
>> loop rotation), which primarily aim at enabling other passes.
>> 
>> Thanks, Michael
>> 
>>> On Feb 15, 2016, at 6:44 AM, RCU <alex.e.susu at gmail.com> wrote:
>>> 
>>> Hello, Michael. I come back to this older email. Sorry if you receive it again.
>>> 
>>> I am trying to implement coalescing/collapsing of nested loops. This would
>>> clearly benefit the loop vectorizer as well. My current plan is to start by
>>> modifying the LLVM loop vectorizer to add loop coalescing at the level of the LLVM language.
>>> 
>>> Are you aware of a similar effort on loop coalescing in LLVM (maybe even a different
>>> LLVM pass, not related to the LLVM loop vectorizer)?
>>> 
>>> Thank you, Alex
>>> 
>>> On 7/9/2015 10:38 AM, RCU wrote:
>>>> 
>>>> 
>>>> With best regards, Alex Susu
>>>> 
>>>> On 7/8/2015 9:17 PM, Michael Zolotukhin wrote:
>>>>> Hi Alex,
>>>>> 
>>>>> Example from the link you provided looks like this:
>>>>> 
>>>>> for (i = 0; i < M; i++) {
>>>>>     z[i] = 0;
>>>>>     for (ckey = row_ptr[i]; ckey < row_ptr[i+1]; ckey++) {
>>>>>         z[i] += data[ckey] * x[colind[ckey]];
>>>>>     }
>>>>> }
>>>>> 
>>>>> Is it the loop you are trying to vectorize? I don’t see any ‘if’ inside the
>>>>> innermost loop.
>>>> I tried to simplify this code in the hope the loop vectorizer can take care of it
>>>> better: I linearized...
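For anyone who wants to experiment with the nested loop quoted above, it can be wrapped into a small self-contained function. This is only a sketch: the function name spmv_csr, the double element type and the const qualifiers are assumptions, since the thread does not show the actual declarations.

```c
/* CSR sparse matrix-vector product z = A * x, following the quoted loop.
   Assumed layout: row_ptr has M + 1 entries; data and colind each have
   row_ptr[M] entries; colind[k] is the column of the value data[k]. */
static void spmv_csr(int M, const int *row_ptr, const int *colind,
                     const double *data, const double *x, double *z) {
    for (int i = 0; i < M; i++) {
        z[i] = 0;
        /* Trip count row_ptr[i+1] - row_ptr[i] is only known at run time. */
        for (int ckey = row_ptr[i]; ckey < row_ptr[i + 1]; ckey++) {
            /* x[colind[ckey]] is an indexed (gather) load. */
            z[i] += data[ckey] * x[colind[ckey]];
        }
    }
}
```

With a made-up 3x3 matrix stored as row_ptr = {0, 2, 3, 5}, colind = {0, 2, 1, 0, 2}, data = {1, 2, 3, 4, 5} and x = {1, 2, 3}, the call spmv_csr(3, row_ptr, colind, data, x, z) leaves z = {7, 6, 19}.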
>>>> 
>>>>> But anyway, here the vectorizer might have the following troubles:
>>>>> 1) The iteration count of the innermost loop is unknown.
>>>>> 2) Gather accesses ( a[b[i]] ). With the AVX512 instruction set it’s possible
>>>>> to generate efficient code for such a case, but a) I think it’s not supported
>>>>> yet, b) if this ISA isn’t available, then the vectorized code would need to
>>>>> ‘manually’ gather scalar values into a vector, which might be slow (and thus
>>>>> the vectorizer might decide to leave the code scalar).
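To make the 'manual' gather concrete, here is a rough scalar model in plain C of what vectorized code has to do for one CSR row when no hardware gather is available. It is a sketch under assumptions, not what LLVM actually emits: the width VF = 4 and the names row_dot_gathered, gathered and lanes are made up for illustration.

```c
#define VF 4 /* illustrative vector width (an assumption) */

/* Dot product of data[begin..end) with the indexed values x[colind[k]],
   processed VF elements at a time. */
static double row_dot_gathered(const double *data, const int *colind,
                               const double *x, int begin, int end) {
    double lanes[VF] = {0}; /* one partial sum per vector lane */
    int k = begin;
    for (; k + VF <= end; k += VF) {
        double gathered[VF];
        /* The 'manual' gather: VF separate scalar loads copied into a
           contiguous buffer that stands in for a vector register. */
        for (int l = 0; l < VF; l++)
            gathered[l] = x[colind[k + l]];
        /* Only this part maps onto a single vector multiply-add. */
        for (int l = 0; l < VF; l++)
            lanes[l] += data[k + l] * gathered[l];
    }
    double sum = 0;
    for (int l = 0; l < VF; l++)
        sum += lanes[l];
    /* Scalar remainder, needed because the trip count is not a known
       multiple of VF. */
    for (; k < end; k++)
        sum += data[k] * x[colind[k]];
    return sum;
}
```

Each group of VF multiply-adds is preceded by VF scalar loads, which is where the potential slowdown comes from. Note also that the per-lane partial sums change the order of the floating-point additions, so a compiler needs permission to reassociate (for example -ffast-math, which appears in the clang command line earlier in this thread) before it can transform the reduction this way.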
>>>>> 
>>>>> And here is a list of papers the vectorizer is based on:
>>>>> 
>>>>> // The reduction-variable vectorization is based on the paper:
>>>>> //  D. Nuzman and R. Henderson. Multi-platform Auto-vectorization.
>>>>> //
>>>>> // Variable uniformity checks are inspired by:
>>>>> //  Karrenberg, R. and Hack, S. Whole Function Vectorization.
>>>>> //
>>>>> // The interleaved access vectorization is based on the paper:
>>>>> //  Dorit Nuzman, Ira Rosen and Ayal Zaks.  Auto-Vectorization of Interleaved
>>>>> //  Data for SIMD
>>>>> //
>>>>> // Other ideas/concepts are from:
>>>>> //  A. Zaks and D. Nuzman. Autovectorization in GCC-two years later.
>>>>> //
>>>>> //  S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua. An Evaluation of
>>>>> //  Vectorizing Compilers.
>>>>> 
>>>>> And probably, some of the parts are written from scratch with no reference to a paper.
>>>>> 
>>>>> The presentations you found are a good starting point, but while they’re still
>>>>> good for getting the basics of the vectorizer, they are a bit outdated now in the
>>>>> sense that a lot of new features have been added since then (and bugs fixed :) ).
>>>>> Also, I’d recommend trying a newer LLVM version - I don’t think it’ll handle the
>>>>> example above, but it would be much more convenient to investigate why the loop
>>>>> isn’t vectorized and fix the vectorizer if we figure out how.
>>>>> 
>>>>> Best regards, Michael
>>>>> 
>>>> 
>>>> Thanks for the papers - these appear to be written in the header of the file
>>>> implementing the loop vect. transformation (found at
>>>> "where-you-want-llvm-to-live"/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp ).
>>>> 
>>>>>> On Jul 8, 2015, at 10:01 AM, RCU <alex.e.susu at gmail.com> wrote:
>>>>>> 
>>>>>> Hello. I am trying to vectorize a CSR SpMV (sparse matrix-vector
>>>>>> multiplication) procedure, but the LLVM loop vectorizer is not able to handle
>>>>>> such code. I am using clang and llvm version 3.4 (on Ubuntu 12.10). I use the
>>>>>> -fvectorize option with clang and -loop-vectorize with opt-3.4. The CSR SpMV
>>>>>> function is inspired by
>>>>>> http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp
>>>>>> (I can provide the exact code samples used).
>>>>>> 
>>>>>> Basically, the problem is that the loop vectorizer does NOT work with an if inside
>>>>>> the loop (be it 2 nested loops or a modification of SpMV I did with just 1 loop - I
>>>>>> can provide the exact code) that changes the value of the accumulator z. I can sort
>>>>>> of understand why LLVM isn't able to vectorize the code. However, at
>>>>>> http://llvm.org/docs/Vectorizers.html#if-conversion it is written: <<The Loop
>>>>>> Vectorizer is able to "flatten" the IF statement in the code and generate a
>>>>>> single stream of instructions. The Loop Vectorizer supports any control flow in
>>>>>> the innermost loop. The innermost loop may contain complex nesting of IFs,
>>>>>> ELSEs and even GOTOs.>> Could you please tell me what exactly these lines are
>>>>>> trying to say?
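As a rough illustration of what "flattening" an IF means, the two C functions below compute the same result. The first is modelled on the example in that section of the documentation; the second is a hand-written if-converted form in which the branch is replaced by a select, so the loop body becomes a single straight-line stream of instructions. The function names are made up, and the second function only approximates at source level what the vectorizer does on IR.

```c
/* Branchy form: the accumulator is updated only when the condition holds. */
static unsigned sum_branchy(const int *A, const int *B, int n) {
    unsigned sum = 0;
    for (int i = 0; i < n; ++i)
        if (A[i] > B[i])
            sum += A[i] + 5;
    return sum;
}

/* If-converted form: the update value is computed unconditionally and a
   select (here the ?: operator) chooses between it and 0. The unsigned
   cast keeps the unconditional addition well defined even for elements
   near INT_MAX. */
static unsigned sum_flattened(const int *A, const int *B, int n) {
    unsigned sum = 0;
    for (int i = 0; i < n; ++i) {
        unsigned taken = (unsigned)A[i] + 5u;
        sum += (A[i] > B[i]) ? taken : 0u;
    }
    return sum;
}
```

The flattened loop has no control flow inside its body, which is what allows several iterations to be executed together, with the select applied per lane. In the SpMV case the IF is not the only obstacle: the unknown trip count and the indexed loads remain.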
>>>>>> 
>>>>>> Could you please tell me what algorithm the LLVM loop vectorizer is using
>>>>>> (maybe the algorithm is described in a paper)? So far I have found only 2
>>>>>> presentations on this topic:
>>>>>> http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf and
>>>>>> https://archive.fosdem.org/2014/schedule/event/llvmautovec/attachments/audio/321/export/events/attachments/llvmautovec/audio/321/AutoVectorizationLLVM.pdf
>>>>>> 
>>>>>> Thank you very much, Alex
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>