[llvm-dev] [LLVMdev] LLVM loop vectorizer
Mikhail Zolotukhin via llvm-dev
llvm-dev at lists.llvm.org
Fri Jun 3 18:28:14 PDT 2016
Hi Alex,
I think the changes you want are not actually vectorizer-related. The vectorizer just uses data provided by other passes.
You probably want to look into the routine Loop::getStartLoc() (see lib/Analysis/LoopInfo.cpp). If you find a way to improve it, patches are welcome :)
Thanks,
Michael
> On Jun 3, 2016, at 6:13 PM, Alex Susu <alex.e.susu at gmail.com> wrote:
>
> Hello.
> Mikhail, I am coming back to this older thread.
> I need to make a few changes to LoopVectorize.cpp.
>
> One of them is related to figuring out the exact C source line and column number of the loops being vectorized. I've noticed that a recent version of LoopVectorize.cpp prints imprecise debug info for vectorized loops: for example, it reports the location of a character in an assignment statement inside the respective loop.
> It would help me a lot in my project to find the exact C source line and column number of the first and last character of the loop being vectorized (an imprecise location would make my life more complicated).
> Is this feasible? Or are there limitations at the level of clang in retrieving the exact C source line and column number of the beginning and end of a loop (it can include indent chars before and after the loop)?
> (I've seen other examples with imprecise location such as the "Reading diagnostics" chapter in the book https://books.google.ro/books?isbn=1782166939 .)
>
> Note: to be able to retrieve the debug info from the C source file we need to run clang with the -Rpass* options, as discussed before. Otherwise, if we run clang first and then run opt (with LoopVectorize) on the resulting .ll file, we lose the C source file debug info (DebugLoc class, etc.) and obtain the debug info from the .ll file instead. An example:
> clang -O3 3better.c -arch=mips -ffast-math -Rpass=debug -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize -S -emit-llvm -fvectorize -mllvm -debug -mllvm -force-vector-width=16 -save-temps
>
> Thank you,
> Alex
>
>
>
> On 2/18/2016 2:17 AM, Mikhail Zolotukhin wrote:
>> Hi Alex,
>>
>> I'm not aware of efforts on loop coalescing in LLVM, but Polly can probably do
>> something like this. Also, one related thought: it might be worth making it a separate
>> pass, not a part of the loop vectorizer. LLVM already has several 'utility' passes (e.g.
>> loop rotation), which primarily aim at enabling other passes.
>>
>> Thanks, Michael
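
To make the coalescing idea discussed here concrete, below is a minimal C sketch of collapsing a two-deep loop nest into a single loop over the flattened iteration space. It is only an illustration at the source level, not an LLVM pass and not anything from the LLVM tree; the function names fill_nested and fill_coalesced are hypothetical, and the sketch assumes a row-major array and that rows * cols does not overflow size_t.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: a two-deep nest over a rows x cols row-major array. */
void fill_nested(float *a, size_t rows, size_t cols) {
    assert(a != NULL || rows * cols == 0);
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            a[i * cols + j] = (float)(i * 10 + j);
}

/* Coalesced form: one loop of rows * cols iterations.
 * The original indices are recovered with a division and a remainder.
 * If cols is 0 then n is 0 and the body (and its division) never runs. */
void fill_coalesced(float *a, size_t rows, size_t cols) {
    assert(a != NULL || rows * cols == 0);
    size_t n = rows * cols; /* assumed not to overflow */
    for (size_t t = 0; t < n; t++) {
        size_t i = t / cols;
        size_t j = t % cols;
        a[t] = (float)(i * 10 + j); /* t == i * cols + j */
    }
}
```

The single loop has a known trip count of rows * cols, which is the property that makes the coalesced form attractive as input to a vectorizer; the price is the extra division and remainder whenever the body needs i and j separately.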
>>
>>> On Feb 15, 2016, at 6:44 AM, RCU <alex.e.susu at gmail.com> wrote:
>>>
>>> Hello, Michael. I am coming back to this older email. Sorry if you receive it again.
>>>
>>> I am trying to implement coalescing/collapsing of nested loops. This would
>>> clearly be beneficial for the loop vectorizer as well. My current plan is to start by
>>> modifying the LLVM loop vectorizer to add loop coalescing on the LLVM language (IR).
>>>
>>> Are you aware of a similar effort on loop coalescing in LLVM (maybe even a different
>>> LLVM pass, not related to the LLVM loop vectorizer)?
>>>
>>> Thank you, Alex
>>>
>>> On 7/9/2015 10:38 AM, RCU wrote:
>>>>
>>>>
>>>> With best regards, Alex Susu
>>>>
>>>> On 7/8/2015 9:17 PM, Michael Zolotukhin wrote:
>>>>> Hi Alex,
>>>>>
>>>>> Example from the link you provided looks like this:
>>>>>
>>>>>   for (i = 0; i < M; i++) {
>>>>>     z[i] = 0;
>>>>>     for (ckey = row_ptr[i]; ckey < row_ptr[i+1]; ckey++) {
>>>>>       z[i] += data[ckey] * x[colind[ckey]];
>>>>>     }
>>>>>   }
>>>>>
>>>>> Is it the loop you are trying to vectorize? I don’t see any ‘if’ inside the
>>>>> innermost loop.
>>>> I tried to simplify this code in the hope the loop vectorizer can take care of it
>>>> better: I linearized...
>>>>
>>>>> But anyway, here the vectorizer might have the following troubles: 1) the iteration
>>>>> count of the innermost loop is unknown. 2) Gather accesses ( a[b[i]] ). With the
>>>>> AVX512 instruction set it’s possible to generate efficient code for such a case, but
>>>>> a) I think it’s not supported yet, b) if this ISA isn’t available, then the vectorized
>>>>> code would need to ‘manually’ gather scalar values into a vector, which might be slow
>>>>> (and thus, the vectorizer might decide to leave the code scalar).
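
As a rough illustration of the 'manual' gather mentioned above, here is a plain C sketch of the CSR SpMV loop next to a version that processes the inner loop in chunks of four and copies the indirectly addressed x values into a small temporary before multiplying. This is not what the LLVM vectorizer emits; the names spmv_csr, spmv_csr_gather and VF, and the chunk width of 4, are assumptions made for the example. Note that the chunked version sums in a different order, so floating-point results can differ slightly from the scalar loop in general.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline scalar CSR SpMV: z = A * x. */
void spmv_csr(size_t M, const size_t *row_ptr, const size_t *colind,
              const double *data, const double *x, double *z) {
    for (size_t i = 0; i < M; i++) {
        assert(row_ptr[i] <= row_ptr[i + 1]); /* CSR invariant */
        double sum = 0.0;
        for (size_t ckey = row_ptr[i]; ckey < row_ptr[i + 1]; ckey++)
            sum += data[ckey] * x[colind[ckey]];
        z[i] = sum;
    }
}

enum { VF = 4 }; /* hypothetical vector width for the sketch */

/* Same computation, written the way a 4-wide vector body would behave
 * without a hardware gather: load x[colind[...]] lane by lane into a
 * temporary, then multiply-accumulate the lanes. */
void spmv_csr_gather(size_t M, const size_t *row_ptr, const size_t *colind,
                     const double *data, const double *x, double *z) {
    for (size_t i = 0; i < M; i++) {
        assert(row_ptr[i] <= row_ptr[i + 1]); /* CSR invariant */
        double acc[VF] = {0.0, 0.0, 0.0, 0.0};
        size_t ckey = row_ptr[i];
        size_t end = row_ptr[i + 1];
        for (; ckey + VF <= end; ckey += VF) {
            double gathered[VF];
            for (int lane = 0; lane < VF; lane++)
                gathered[lane] = x[colind[ckey + lane]]; /* manual gather */
            for (int lane = 0; lane < VF; lane++)
                acc[lane] += data[ckey + lane] * gathered[lane];
        }
        /* Horizontal reduction of the lanes, then a scalar remainder loop
         * because the trip count is not known to be a multiple of VF. */
        double sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
        for (; ckey < end; ckey++)
            sum += data[ckey] * x[colind[ckey]];
        z[i] = sum;
    }
}
```

The lane-by-lane copy is the part that can make such code slower than the scalar loop, and the remainder loop is the consequence of the unknown inner trip count.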
>>>>>
>>>>> And here is a list of papers the vectorizer is based on:
>>>>>
>>>>> // The reduction-variable vectorization is based on the paper:
>>>>> //  D. Nuzman and R. Henderson. Multi-platform Auto-vectorization.
>>>>> //
>>>>> // Variable uniformity checks are inspired by:
>>>>> //  Karrenberg, R. and Hack, S. Whole Function Vectorization.
>>>>> //
>>>>> // The interleaved access vectorization is based on the paper:
>>>>> //  Dorit Nuzman, Ira Rosen and Ayal Zaks. Auto-Vectorization of Interleaved
>>>>> //  Data for SIMD
>>>>> //
>>>>> // Other ideas/concepts are from:
>>>>> //  A. Zaks and D. Nuzman. Autovectorization in GCC-two years later.
>>>>> //
>>>>> //  S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua. An Evaluation of
>>>>> //  Vectorizing Compilers.
>>>>>
>>>>> And probably, some of the parts are written from scratch with no reference to a
>>>>> paper.
>>>>>
>>>>> The presentations you found are a good starting point, but while they’re still
>>>>> good for getting the basics of the vectorizer, they are a bit outdated now in the
>>>>> sense that a lot of new features have been added since then (and bugs fixed :) ).
>>>>> Also, I’d recommend trying a newer LLVM version - I don’t think it’ll handle the
>>>>> example above, but it would be much more convenient to investigate why the loop
>>>>> isn’t vectorized and fix the vectorizer if we figure out how.
>>>>>
>>>>> Best regards, Michael
>>>>>
>>>>
>>>> Thanks for the papers - these appear to be written in the header of the file
>>>> implementing the loop vect. transformation (found at
>>>> "where-you-want-llvm-to-live"/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp ).
>>>>
>>>>>> On Jul 8, 2015, at 10:01 AM, RCU <alex.e.susu at gmail.com> wrote:
>>>>>>
>>>>>> Hello. I am trying to vectorize a CSR SpMV (sparse matrix-vector
>>>>>> multiplication) procedure, but the LLVM loop vectorizer is not able to handle
>>>>>> such code. I am using clang and llvm version 3.4 (on Ubuntu 12.10). I use the
>>>>>> -fvectorize option with clang and -loop-vectorize with opt-3.4. The CSR SpMV
>>>>>> function is inspired by
>>>>>> http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp
>>>>>> (I can provide the exact code samples used).
>>>>>>
>>>>>> Basically, the problem is that the loop vectorizer does NOT work with an if inside
>>>>>> the loop (be it 2 nested loops or a modification of SpMV I did with just 1 loop - I
>>>>>> can provide the exact code) that changes the value of the accumulator z. I can sort
>>>>>> of understand why LLVM isn't able to vectorize the code. However, at
>>>>>> http://llvm.org/docs/Vectorizers.html#if-conversion it is written: <<The Loop
>>>>>> Vectorizer is able to "flatten" the IF statement in the code and generate a
>>>>>> single stream of instructions. The Loop Vectorizer supports any control flow in
>>>>>> the innermost loop. The innermost loop may contain complex nesting of IFs,
>>>>>> ELSEs and even GOTOs.>> Could you please tell me what exactly these lines are
>>>>>> trying to say?
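
For readers unfamiliar with the term, the "flattening" described in the quoted documentation can be pictured at the C level roughly as follows. This is a hand-written sketch of the idea of if-conversion, not the vectorizer's actual transformation or output; the function names sum_positive_branchy and sum_positive_flat are hypothetical. The conditional update of the accumulator is rewritten as an unconditional update with a selected operand, so the loop body becomes a single straight-line stream of instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Original form: the accumulator is updated only under a condition. */
long sum_positive_branchy(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > 0)
            sum += a[i];
    }
    return sum;
}

/* If-converted form: the branch is replaced by a select, and the
 * accumulator is updated on every iteration. */
long sum_positive_flat(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        long addend = (a[i] > 0) ? a[i] : 0; /* select instead of branch */
        sum += addend;
    }
    return sum;
}
```

Both functions compute the same result; the second form has no control flow inside the loop body, which is the shape a vector loop needs, since all lanes execute the same instructions.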
>>>>>>
>>>>>> Could you please tell me what algorithm the LLVM loop vectorizer is using
>>>>>> (maybe the algorithm is described in a paper)? So far I have found only 2
>>>>>> presentations on this topic:
>>>>>> http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf and
>>>>>> https://archive.fosdem.org/2014/schedule/event/llvmautovec/attachments/audio/321/export/events/attachments/llvmautovec/audio/321/AutoVectorizationLLVM.pdf
>>>>>>
>>>>>> Thank you very much, Alex
>>