[llvm-dev] [LLVMdev] LLVM loop vectorizer - changing vectorized code

Mehdi Amini via llvm-dev llvm-dev at lists.llvm.org
Mon Jun 13 12:34:43 PDT 2016

> On Jun 13, 2016, at 12:22 PM, Alex Susu via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>  Hello, Mikhail.
>    I'm planning to do a source-to-source transformation for loop vectorization.
>    Basically, I want to generate C (C++) code from C (C++) source code:
>      - the code that is not vectorized remains unchanged; this would be simple to achieve if we can obtain the precise source location of each statement;
>      - for the code that gets vectorized, I want to translate the sequential parts into C code and emit SIMD intrinsics for my SIMD processor where the vectorizer would normally generate vector instructions.
>     I started looking at InnerLoopVectorizer::vectorize() and InnerLoopVectorizer::createEmptyLoop(). Generating C/C++ code (with the help of LLVM intrinsics) instead of LLVM IR is not trivial, but it should be reasonably achievable.
>    Would you recommend such an approach? I guess doing this as a Clang phase (working on the source code) would not be a bad idea either, since I would have better control over the source code, but then I would need to reimplement the loop vectorizer algorithm, which is currently implemented on LLVM IR.

Some related work: http://llvm.org/devmtg/2013-04/krzikalla-slides.pdf
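A rough, illustrative sketch of the kind of output such a source-to-source tool might emit: a vector main loop followed by a scalar remainder loop, roughly mirroring the vector-body-plus-scalar-epilogue skeleton the loop vectorizer builds. The thread does not name the target's SIMD intrinsics, so GCC/Clang vector extensions (`__attribute__((vector_size))`) stand in for them; the function names and the width of 4 lanes are assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 4 x float vector type: a GCC/Clang vector extension used as a
   stand-in for the target SIMD processor's intrinsics. */
typedef float v4sf __attribute__((vector_size(16)));

/* Original scalar loop: y[i] = a * x[i] + y[i]. */
void saxpy_scalar(size_t n, float a, const float *x, float *y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* What a source-to-source vectorizer might emit for the loop above. */
void saxpy_vectorized(size_t n, float a, const float *x, float *y) {
    v4sf va = {a, a, a, a};              /* broadcast the loop-invariant scalar */
    size_t i = 0;
    /* Vector body: 4 elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        v4sf vx, vy;
        memcpy(&vx, x + i, sizeof vx);   /* alignment-agnostic vector load */
        memcpy(&vy, y + i, sizeof vy);
        vy = va * vx + vy;
        memcpy(y + i, &vy, sizeof vy);   /* vector store */
    }
    /* Scalar remainder: the original loop body handles the last n % 4 elements. */
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Keeping the untouched scalar loop as the remainder means the non-vectorized part of the output stays identical to the input source, which matches the first goal listed above.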


>  Thank you,
>    Alex
> On 6/4/2016 4:28 AM, Mikhail Zolotukhin wrote:
>> Hi Alex,
>> I think the changes you want are actually not vectorizer-related. The vectorizer just uses
>> data provided by other passes.
>> What you probably want is to look into the routine Loop::getStartLoc() (see
>> lib/Analysis/LoopInfo.cpp). If you find a way to improve it, patches are welcome :)
>> Thanks, Michael
>>> On Jun 3, 2016, at 6:13 PM, Alex Susu <alex.e.susu at gmail.com> wrote:
>>> Hello, Mikhail. I am coming back to this older thread. I need to make a few changes to
>>> LoopVectorize.cpp.
>>> One of them is related to figuring out the exact C source line and column number of
>>> the loops being vectorized. I've noticed that a recent version of LoopVectorize.cpp
>>> prints imprecise debug info for vectorized loops: for example, it reports the location
>>> of a character of an assignment statement inside the respective loop. It would help
>>> me a lot in my project to find the exact C source line and column number of the first
>>> and last characters of the loop being vectorized (an imprecise location would make my
>>> life more complicated). Is this feasible? Or are there limitations in Clang on
>>> retrieving the exact C source line and column number of the beginning and end of a
>>> loop (possibly including the indentation characters before and after the loop)?
>>> (I've seen other examples of imprecise locations, such as in the "Reading diagnostics"
>>> chapter of the book https://books.google.ro/books?isbn=1782166939 .)
>>> Note: to be able to retrieve the debug info from the C source file, we need to run
>>> clang with the -Rpass* options, as discussed before. Otherwise, if we run clang first
>>> and then run opt (which runs LoopVectorize) on the resulting .ll file, we lose the C
>>> source file debug info (DebugLoc class, etc.) and obtain the debug info from the .ll
>>> file. An example:
>>>   clang -O3 3better.c -arch=mips -ffast-math -Rpass=debug -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize -S -emit-llvm -fvectorize -mllvm -debug -mllvm -force-vector-width=16 -save-temps
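A minimal test input for this kind of experiment might look like the file below; the file name and the function are hypothetical. Clang generally needs line/column debug info to attach source locations to optimization remarks, so the sample invocation in the comment assumes -gline-tables-only and -gcolumn-info. The reported location may be the loop header or a statement inside the loop, which is the imprecision discussed in this thread.

```c
/* remark_test.c (hypothetical file name).
   Possible invocation, assumed for illustration:
     clang -O3 -gline-tables-only -gcolumn-info \
           -Rpass=loop-vectorize -Rpass-missed=loop-vectorize \
           -Rpass-analysis=loop-vectorize -c remark_test.c
   The line:column printed with each remark can then be compared with
   the actual extent of the loop below. */
void scale(float *restrict dst, const float *restrict src, float s, int n) {
    /* A simple, easily vectorizable loop: no aliasing (restrict), unit stride. */
    for (int i = 0; i < n; i++)
        dst[i] = s * src[i];
}
```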
>>> Thank you, Alex
>>> On 2/18/2016 2:17 AM, Mikhail Zolotukhin wrote:
>>>> Hi Alex,
>>>> I'm not aware of any efforts on loop coalescing in LLVM, but Polly can probably do
>>>> something like this. Also, one related thought: it might be worth making it a
>>>> separate pass rather than part of the loop vectorizer. LLVM already has several
>>>> 'utility' passes (e.g. loop rotation) whose primary aim is to enable other passes.
>>>> Thanks, Michael
>>>>> On Feb 15, 2016, at 6:44 AM, RCU <alex.e.susu at gmail.com> wrote:
>>>>> Hello, Michael. I am coming back to this older email; sorry if you receive it twice.
>>>>> I am trying to implement coalescing/collapsing of nested loops, which would also
>>>>> clearly benefit the loop vectorizer. I am planning to start by modifying the LLVM
>>>>> loop vectorizer to add loop coalescing on LLVM IR.
>>>>> Are you aware of a similar effort on loop coalescing in LLVM (maybe even a
>>>>> separate LLVM pass, unrelated to the LLVM loop vectorizer)?
>>>>> Thank you, Alex
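To illustrate what coalescing (collapsing) a loop nest means at the source level, here is a hand-written C sketch: a two-level nest over a rows x cols matrix becomes a single loop of rows * cols iterations, and the original indices are recovered with a division and a modulo. This is not an existing LLVM pass; the function names are hypothetical, and the sketch assumes loop-invariant trip counts and that rows * cols does not overflow size_t.

```c
#include <stddef.h>

/* Original nest: c[i][j] = a[i][j] + bias[i] on a row-major rows x cols matrix. */
void add_bias_nested(size_t rows, size_t cols, const float *a,
                     const float *bias, float *c) {
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            c[i * cols + j] = a[i * cols + j] + bias[i];
}

/* Coalesced form: one loop with rows * cols iterations, which gives a
   vectorizer a longer single loop to work on when cols is small. */
void add_bias_coalesced(size_t rows, size_t cols, const float *a,
                        const float *bias, float *c) {
    size_t total = rows * cols;          /* assumes no overflow */
    /* If cols == 0, total == 0 and the division below never executes. */
    for (size_t k = 0; k < total; k++) {
        size_t i = k / cols;             /* recovered outer index */
        size_t j = k % cols;             /* recovered inner index */
        c[i * cols + j] = a[i * cols + j] + bias[i];
    }
}
```

Since i * cols + j == k, a real transformation could simplify the address computation; the div/mod pair is only needed for uses of i or j on their own (here, bias[i]).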
>>>>> On 7/9/2015 10:38 AM, RCU wrote:
>>>>>> With best regards, Alex Susu
>>>>>> On 7/8/2015 9:17 PM, Michael Zolotukhin wrote:
>>>>>>> Hi Alex,
>>>>>>> Example from the link you provided looks like this:
>>>>>>>   for (i = 0; i < M; i++) {
>>>>>>>     z[i] = 0;
>>>>>>>     for (ckey = row_ptr[i]; ckey < row_ptr[i+1]; ckey++) {
>>>>>>>       z[i] += data[ckey] * x[colind[ckey]];
>>>>>>>     }
>>>>>>>   }
>>>>>>> Is it the loop you are trying to vectorize? I don’t see any ‘if’ inside the
>>>>>>> innermost loop.
>>>>>> I tried to simplify this code in the hope that the loop vectorizer could handle
>>>>>> it better: I linearized...
>>>>>>> But anyway, here the vectorizer might have the following troubles: 1) the
>>>>>>> iteration count of the innermost loop is unknown; 2) gather accesses (a[b[i]]).
>>>>>>> With the AVX-512 instruction set it’s possible to generate efficient code for
>>>>>>> such a case, but a) I think it’s not supported yet, and b) if this ISA isn’t
>>>>>>> available, the vectorized code would need to ‘manually’ gather scalar values
>>>>>>> into a vector, which might be slow (and thus the vectorizer might decide to
>>>>>>> leave the code scalar).
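To make points 1) and 2) concrete, here is a hand-written sketch of the CSR inner loop with a 'manual' 4-lane gather: each vector of x values is assembled from four scalar loads, which is the cost a vectorizer has to weigh when no hardware gather is available. The unknown inner trip count is handled with a scalar remainder loop, and the accumulator becomes 4 partial sums combined at the end. GCC/Clang vector extensions are used and the function names are hypothetical. Combining partial sums reassociates the floating-point additions, which is why such FP reductions are normally only vectorized when reassociation is allowed (e.g. with -ffast-math).

```c
#include <stddef.h>

typedef double v4df __attribute__((vector_size(32)));  /* 4 x double */

/* Reference CSR SpMV, as in the example above: z = A * x. */
void spmv_csr(size_t m, const size_t *row_ptr, const size_t *colind,
              const double *data, const double *x, double *z) {
    for (size_t i = 0; i < m; i++) {
        z[i] = 0;
        for (size_t ckey = row_ptr[i]; ckey < row_ptr[i + 1]; ckey++)
            z[i] += data[ckey] * x[colind[ckey]];
    }
}

/* Same computation with a 4-lane "manual" gather in the inner loop. */
void spmv_csr_gather4(size_t m, const size_t *row_ptr, const size_t *colind,
                      const double *data, const double *x, double *z) {
    for (size_t i = 0; i < m; i++) {
        size_t ckey = row_ptr[i];
        size_t end = row_ptr[i + 1];      /* trip count known only at run time */
        v4df acc = {0.0, 0.0, 0.0, 0.0};  /* 4 partial sums */
        for (; ckey + 4 <= end; ckey += 4) {
            /* Contiguous data[] elements: a plain vector load would do. */
            v4df vd = {data[ckey], data[ckey + 1], data[ckey + 2], data[ckey + 3]};
            /* Indirect x[colind[...]] elements: four scalar loads packed by hand. */
            v4df vx = {x[colind[ckey]], x[colind[ckey + 1]],
                       x[colind[ckey + 2]], x[colind[ckey + 3]]};
            acc += vd * vx;
        }
        /* Horizontal reduction of the partial sums. */
        double sum = acc[0] + acc[1] + acc[2] + acc[3];
        /* Scalar remainder for the last (end - ckey) < 4 entries. */
        for (; ckey < end; ckey++)
            sum += data[ckey] * x[colind[ckey]];
        z[i] = sum;
    }
}
```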
>>>>>>> And here is a list of papers the vectorizer is based on:
>>>>>>>   // The reduction-variable vectorization is based on the paper:
>>>>>>>   //  D. Nuzman and R. Henderson. Multi-platform Auto-vectorization.
>>>>>>>   //
>>>>>>>   // Variable uniformity checks are inspired by:
>>>>>>>   //  Karrenberg, R. and Hack, S. Whole Function Vectorization.
>>>>>>>   //
>>>>>>>   // The interleaved access vectorization is based on the paper:
>>>>>>>   //  Dorit Nuzman, Ira Rosen and Ayal Zaks.  Auto-Vectorization of
>>>>>>>   //  Interleaved Data for SIMD
>>>>>>>   //
>>>>>>>   // Other ideas/concepts are from:
>>>>>>>   //  A. Zaks and D. Nuzman. Autovectorization in GCC-two years later.
>>>>>>>   //
>>>>>>>   //  S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua. An Evaluation of
>>>>>>>   //  Vectorizing Compilers.
>>>>>>> And probably some of the parts were written from scratch with no reference to a paper.
>>>>>>> The presentations you found are a good starting point, but while they’re
>>>>>>> still good for getting the basics of the vectorizer, they are a bit outdated
>>>>>>> now in the sense that a lot of new features have been added since then (and
>>>>>>> bugs fixed :) ). Also, I’d recommend trying a newer LLVM version: I don’t
>>>>>>> think it’ll handle the example above, but it would be much more convenient to
>>>>>>> investigate why the loop isn’t vectorized and to fix the vectorizer if we
>>>>>>> figure out how.
>>>>>>> Best regards, Michael
>>>>>> Thanks for the papers - these appear to be listed in the header of the file
>>>>>> implementing the loop vectorization transformation (found at
>>>>>> "where-you-want-llvm-to-live"/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
>>>>>> ).
>>>>>>>> On Jul 8, 2015, at 10:01 AM, RCU <alex.e.susu at gmail.com> wrote:
>>>>>>>> Hello. I am trying to vectorize a CSR SpMV (sparse matrix-vector
>>>>>>>> multiplication) procedure, but the LLVM loop vectorizer is not able to
>>>>>>>> handle such code. I am using clang and LLVM version 3.4 (on Ubuntu 12.10).
>>>>>>>> I use the -fvectorize option with clang and -loop-vectorize with opt-3.4.
>>>>>>>> The CSR SpMV function is inspired by
>>>>>>>> http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp
>>>>>>>> (I can provide the exact code samples used).
>>>>>>>> Basically, the problem is that the loop vectorizer does NOT handle an if
>>>>>>>> inside the loop (be it 2 nested loops or a modification of SpMV I did with
>>>>>>>> just 1 loop - I can provide the exact code) that changes the value of the
>>>>>>>> accumulator z. I can sort of understand why LLVM isn't able to vectorize the
>>>>>>>> code. However, at http://llvm.org/docs/Vectorizers.html#if-conversion it is
>>>>>>>> written: <<The Loop Vectorizer is able to "flatten" the IF statement in the
>>>>>>>> code and generate a single stream of instructions. The Loop Vectorizer
>>>>>>>> supports any control flow in the innermost loop. The innermost loop may
>>>>>>>> contain complex nesting of IFs, ELSEs and even GOTOs.>> Could you please tell
>>>>>>>> me what exactly these lines are trying to say?
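A small illustration of what 'flattening' an IF means: in the second function below the branch is replaced by a select, so every iteration executes the same straight-line instructions and the body can be widened lane by lane. This is a hand-written source-level sketch with hypothetical function names, not the vectorizer's actual output (the vectorizer performs if-conversion on LLVM IR). Note that the quoted documentation is about control flow inside the innermost loop body.

```c
#include <stddef.h>

/* Loop with control flow: add up only the positive elements. */
int sum_positive_branchy(const int *a, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] > 0)
            sum += a[i];
    return sum;
}

/* If-converted form: the condition becomes a per-iteration mask and the
   conditional update becomes a select, giving a single instruction stream. */
int sum_positive_flat(const int *a, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        int take = a[i] > 0;          /* former branch condition */
        sum += take ? a[i] : 0;       /* select instead of a conditional jump */
    }
    return sum;
}
```

Reading a[i] unconditionally is safe here because the original loop also reads a[i] on every iteration; in general, if-conversion has to make sure that operations moved out of a conditional block (loads, stores, divisions) are safe to execute speculatively or are masked.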
>>>>>>>> Could you please tell me what algorithm the LLVM loop vectorizer uses
>>>>>>>> (maybe the algorithm is described in a paper)? So far I have found only 2
>>>>>>>> presentations on this topic:
>>>>>>>> http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf and
>>>>>>>> https://archive.fosdem.org/2014/schedule/event/llvmautovec/attachments/audio/321/export/events/attachments/llvmautovec/audio/321/AutoVectorizationLLVM.pdf .
>>>>>>>> Thank you very much, Alex
>>>>>>>> _______________________________________________
>>>>>>>> LLVM Developers mailing list
>>>>>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
