[llvm-dev] [LLVMdev] LLVM loop vectorizer - changing vectorized code

Mon Jun 13 12:22:03 PDT 2016

   Hello, Mikhail.
     I'm planning to do a source-to-source transformation for loop vectorization.
     Basically, I want to generate C (C++) code from C (C++) source code:
       - the code that is not vectorized remains the same - this would be simple to
         achieve if we can obtain the precise source location of each statement;
       - for the code that gets vectorized, I want to translate the sequential parts
         into C code and generate SIMD intrinsics for my SIMD processor where vector
         instructions would normally be generated.
     I started looking at InnerLoopVectorizer::vectorize() and
InnerLoopVectorizer::createEmptyLoop(). Generating C/C++ code (with the help of LLVM
intrinsics) instead of LLVM code is not trivial, but it should be reasonably simple to
achieve.

     Would you advise such an approach?  I guess doing this as a Clang phase (working on
the source code) is not really a bad idea either, since I would have better control over
the source code, but I would need to reimplement the loop vectorizer algorithm that is
currently implemented on LLVM code.

   Thank you,
     Alex
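
[Illustration] A minimal sketch of the shape such generated code might take, assuming a
vector width of 4 and a hypothetical intrinsic for the SIMD target. The names VF,
simd_add_vf, add_scalar and add_vectorized are invented for this example; simd_add_vf is
only a plain-C stand-in for whatever intrinsic the real target would provide. The
non-vectorized part stays as ordinary C, and the vectorized loop becomes a vector body
followed by a scalar remainder loop.

```c
#include <stddef.h>

#define VF 4 /* assumed vector width of the hypothetical SIMD target */

/* Original scalar loop, as it might appear in the input source. */
void add_scalar(const int *a, const int *b, int *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* Hypothetical stand-in for a target intrinsic: adds VF lanes at once.
 * A real back end would map this to one vector instruction. */
static void simd_add_vf(const int *a, const int *b, int *c) {
    for (int lane = 0; lane < VF; lane++)
        c[lane] = a[lane] + b[lane];
}

/* Possible shape of the generated code: vector body plus scalar remainder. */
void add_vectorized(const int *a, const int *b, int *c, size_t n) {
    size_t i = 0;
    for (; i + VF <= n; i += VF)   /* vector body, VF elements per step */
        simd_add_vf(a + i, b + i, c + i);
    for (; i < n; i++)             /* scalar remainder for the last n % VF elements */
        c[i] = a[i] + b[i];
}
```

Both functions compute the same result for any n; the remainder loop covers trip counts
that are not a multiple of VF.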

On 6/4/2016 4:28 AM, Mikhail Zolotukhin wrote:
> Hi Alex,
>
> I think the changes you want are actually not vectorizer related. The vectorizer just
> uses data provided by other passes.
>
> What you probably want is to look into the routine Loop::getStartLoc() (see
> lib/Analysis/LoopInfo.cpp). If you find a way to improve it, patches are welcome:)
>
> Thanks, Michael
>
>> On Jun 3, 2016, at 6:13 PM, Alex Susu <alex.e.susu at gmail.com> wrote:
>>
>> Hello, Mikhail. I am coming back to this older thread. I need to make a few changes to
>> LoopVectorize.cpp.
>>
>> One of them is related to figuring out the exact C source line and column number of
>> the loops being vectorized. I've noticed that a recent version of LoopVectorize.cpp
>> prints imprecise debug info for vectorized loops such as, for example, the location
>> of a character of an assignment statement inside the respective loop. It would help
>> me a lot in my project to find the exact C source line and column number of the first
>> and last character of the loop being vectorized. (imprecise location would make my
>> life more complicated). Is this feasible? Or are there limitations at the level of
>> clang of retrieving the exact C source line and column number location of the
>> beginning and end of a loop (it can include indent chars before and after the loop)?
>> (I've seen other examples with imprecise location such as the "Reading diagnostics"
>> chapter in the book https://books.google.ro/books?isbn=1782166939 .)
>>
>> Note: to be able to retrieve the debug info from the C source file we need to run
>> clang with -Rpass* options, as discussed before. Otherwise, if we run clang first,
>> then opt on the resulting .ll file which runs LoopVectorize, we lose the C source
>> file debug info (DebugLoc class, etc) and obtain the debug info from the .ll file. An
>> example: clang -O3 3better.c -arch=mips -ffast-math -Rpass=debug
>> -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize -S -emit-llvm -fvectorize -mllvm
>> -debug -mllvm -force-vector-width=16 -save-temps
>>
>> Thank you, Alex
>>
>>
>>
>> On 2/18/2016 2:17 AM, Mikhail Zolotukhin wrote:
>>> Hi Alex,
>>>
>>> I'm not aware of efforts on loop coalescing in LLVM, but probably polly can do
>>> something like this. Also, one related thought: it might be worth making it a
>>> separate pass, not a part of the loop vectorizer. LLVM already has several 'utility'
>>> passes (e.g. loop rotation), which primarily aim at enabling other passes.
>>>
>>> Thanks, Michael
>>>
>>>> On Feb 15, 2016, at 6:44 AM, RCU <alex.e.susu at gmail.com> wrote:
>>>>
>>>> Hello, Michael. I am coming back to this older email. Sorry if you receive it again.
>>>>
>>>> I am trying to implement coalescing/collapsing of nested loops. This would clearly
>>>> be beneficial for the loop vectorizer, too. My current plan is to start by
>>>> modifying the LLVM loop vectorizer to add loop coalescing on LLVM code.
>>>>
>>>> Are you aware of a similar effort on loop coalescing in LLVM (maybe even a
>>>> different LLVM pass, not related to the LLVM loop vectorizer)?
>>>>
>>>> Thank you, Alex
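
[Illustration] A minimal sketch of what loop coalescing (collapsing) means at the source
level, assuming a row-major m x n array. The function names are invented for this
example, and the code shows the transformation by hand; it is not the output of any
existing LLVM pass. The two nested loops are replaced by a single loop over the m*n
iteration space, and the original indices are recovered by division and remainder.

```c
#include <stddef.h>

/* Nested form: scale every element of a row-major m x n array by k. */
void scale_nested(int *a, size_t m, size_t n, int k) {
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++)
            a[i * n + j] *= k;
}

/* Coalesced form: one loop with m*n iterations instead of two nested loops. */
void scale_coalesced(int *a, size_t m, size_t n, int k) {
    for (size_t t = 0; t < m * n; t++) {
        size_t i = t / n;   /* recovered outer index */
        size_t j = t % n;   /* recovered inner index */
        a[i * n + j] *= k;
    }
}
```

The single loop has a trip count of m*n, which gives a vectorizer more iterations to work
with when n alone is small.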
>>>>
>>>> On 7/9/2015 10:38 AM, RCU wrote:
>>>>>
>>>>>
>>>>> With best regards, Alex Susu
>>>>>
>>>>> On 7/8/2015 9:17 PM, Michael Zolotukhin wrote:
>>>>>> Hi Alex,
>>>>>>
>>>>>> Example from the link you provided looks like this:
>>>>>>
>>>>>> for (i = 0; i < M; i++) {
>>>>>>     z[i] = 0;
>>>>>>     for (ckey = row_ptr[i]; ckey < row_ptr[i+1]; ckey++) {
>>>>>>         z[i] += data[ckey] * x[colind[ckey]];
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> Is it the loop you are trying to vectorize? I don’t see any ‘if’ inside the
>>>>>> innermost loop.
>>>>> I tried to simplify this code in the hope the loop vectorizer can take care of
>>>>> it better: I linearized...
>>>>>
>>>>>> But anyway, here the vectorizer might have the following troubles: 1) the
>>>>>> iteration count of the innermost loop is unknown. 2) Gather accesses
>>>>>> ( a[b[i]] ). With the AVX512 set of instructions it's possible to generate
>>>>>> efficient code for such a case, but a) I think it's not supported yet, b) if
>>>>>> this ISA isn't available, then the vectorized code would need to 'manually'
>>>>>> gather scalar values into a vector, which might be slow (and thus, the
>>>>>> vectorizer might decide to leave the code scalar).
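
[Illustration] A minimal sketch of the 'manual' gather described above, assuming a vector
width of 4. The names VF, spmv_scalar and spmv_manual_gather are invented for this
example, and the lane loops are plain-C stand-ins for vector instructions. The indexed
loads x[colind[ckey]] are done one scalar at a time into a temporary; only the multiply
and accumulate step is in vector form. Note that the per-lane partial sums reassociate
the floating-point additions, so results can differ in rounding from the scalar loop in
general.

```c
#define VF 4 /* assumed vector width */

/* Reference scalar CSR SpMV, same loop structure as the example in this thread. */
void spmv_scalar(int m, const int *row_ptr, const int *colind,
                 const double *data, const double *x, double *z) {
    for (int i = 0; i < m; i++) {
        z[i] = 0;
        for (int ckey = row_ptr[i]; ckey < row_ptr[i + 1]; ckey++)
            z[i] += data[ckey] * x[colind[ckey]];
    }
}

/* Hand-written shape of a vectorized inner loop without hardware gather. */
void spmv_manual_gather(int m, const int *row_ptr, const int *colind,
                        const double *data, const double *x, double *z) {
    for (int i = 0; i < m; i++) {
        double acc[VF] = {0.0, 0.0, 0.0, 0.0};  /* per-lane partial sums */
        int ckey = row_ptr[i];
        int end = row_ptr[i + 1];
        for (; ckey + VF <= end; ckey += VF) {
            double gathered[VF];
            for (int lane = 0; lane < VF; lane++)   /* scalar gather, one load per lane */
                gathered[lane] = x[colind[ckey + lane]];
            for (int lane = 0; lane < VF; lane++)   /* stands in for a vector multiply-add */
                acc[lane] += data[ckey + lane] * gathered[lane];
        }
        double sum = 0.0;
        for (int lane = 0; lane < VF; lane++)       /* horizontal reduction of the lanes */
            sum += acc[lane];
        for (; ckey < end; ckey++)                  /* scalar remainder of the row */
            sum += data[ckey] * x[colind[ckey]];
        z[i] = sum;
    }
}
```

The gather loop costs VF scalar loads per vector step, which is the overhead that may
make the scalar version the better choice on targets without a gather instruction.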
>>>>>>
>>>>>> And here is a list of papers vectorizer is based on:
>>>>>>
>>>>>> // The reduction-variable vectorization is based on the paper:
>>>>>> //  D. Nuzman and R. Henderson. Multi-platform Auto-vectorization.
>>>>>> //
>>>>>> // Variable uniformity checks are inspired by:
>>>>>> //  Karrenberg, R. and Hack, S. Whole Function Vectorization.
>>>>>> //
>>>>>> // The interleaved access vectorization is based on the paper:
>>>>>> //  Dorit Nuzman, Ira Rosen and Ayal Zaks.  Auto-Vectorization of Interleaved
>>>>>> //  Data for SIMD
>>>>>> //
>>>>>> // Other ideas/concepts are from:
>>>>>> //  A. Zaks and D. Nuzman. Autovectorization in GCC-two years later.
>>>>>> //
>>>>>> //  S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua. An Evaluation of
>>>>>> //  Vectorizing Compilers.
>>>>>>
>>>>>> And probably, some of the parts are written from scratch with no reference to
>>>>>> a paper.
>>>>>>
>>>>>> The presentations you found are a good starting point, but while they're
>>>>>> still good for getting the basics of the vectorizer, they are a bit outdated
>>>>>> now, in the sense that a lot of new features have been added since then (and
>>>>>> bugs fixed:) ). Also, I'd recommend trying a newer LLVM version - I don't
>>>>>> think it'll handle the example above, but it would be much more convenient to
>>>>>> investigate why the loop isn't vectorized and fix the vectorizer if we figure
>>>>>> out how.
>>>>>>
>>>>>> Best regards, Michael
>>>>>>
>>>>>
>>>>> Thanks for the papers - these appear to be written in the header of the file
>>>>> implementing the loop vect. transformation (found at
>>>>> "where-you-want-llvm-to-live"/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp ).
>>>>>
>>>>>>> On Jul 8, 2015, at 10:01 AM, RCU <alex.e.susu at gmail.com> wrote:
>>>>>>>
>>>>>>> Hello. I am trying to vectorize a CSR SpMV (sparse matrix-vector
>>>>>>> multiplication) procedure, but the LLVM loop vectorizer is not able to
>>>>>>> handle such code. I am using clang and llvm version 3.4 (on Ubuntu 12.10).
>>>>>>> I use the -fvectorize option with clang and -loop-vectorize with opt-3.4 .
>>>>>>> The CSR SpMV function is inspired by
>>>>>>> http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp
>>>>>>> (I can provide the exact code samples used).
>>>>>>>
>>>>>>> Basically the problem is that the loop vectorizer does NOT work with an if
>>>>>>> inside the loop (be it 2 nested loops or a modification of SpMV I did with
>>>>>>> just 1 loop - I can provide the exact code) changing the value of the
>>>>>>> accumulator z. I can sort of understand why LLVM isn't able to vectorize
>>>>>>> the code. However, at http://llvm.org/docs/Vectorizers.html#if-conversion
>>>>>>> it is written: <<The Loop Vectorizer is able to "flatten" the IF statement
>>>>>>> in the code and generate a single stream of instructions. The Loop
>>>>>>> Vectorizer supports any control flow in the innermost loop. The innermost
>>>>>>> loop may contain complex nesting of IFs, ELSEs and even GOTOs.>> Could you
>>>>>>> please tell me what exactly these lines are trying to say?
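
[Illustration] A minimal sketch of what "flattening" an IF (if-conversion) means,
written by hand at the source level. The function names are invented for this example,
and the second function only mimics the idea; it is not the code the Loop Vectorizer
actually emits. The branch is replaced by a select, so the loop body becomes one
straight-line stream of instructions that can be executed per vector lane.

```c
#include <stddef.h>

/* Branchy form: the accumulator is updated only when the condition holds. */
int sum_positive_branchy(const int *a, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > 0)
            sum += a[i];
    }
    return sum;
}

/* If-converted form: the condition becomes a select, so there is no branch
 * left inside the loop body. */
int sum_positive_flattened(const int *a, size_t n) {
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        int cond = a[i] > 0;           /* per-element predicate (a mask in vector code) */
        int addend = cond ? a[i] : 0;  /* select between the value and a neutral 0 */
        sum += addend;                 /* unconditional update of the accumulator */
    }
    return sum;
}
```

Both functions return the same sum; the second form trades the branch for an addition of
zero on the elements that fail the condition.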
>>>>>>>
>>>>>>> Could you please tell me what algorithm the LLVM loop vectorizer is using
>>>>>>> (maybe the algorithm is described in a paper) - I have currently found only
>>>>>>> 2 presentations on this topic:
>>>>>>> http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf and
>>>>>>> https://archive.fosdem.org/2014/schedule/event/llvmautovec/attachments/audio/321/export/events/attachments/llvmautovec/audio/321/AutoVectorizationLLVM.pdf .
>>>>>>>
>>>>>>> Thank you very much, Alex
>>>>>>> _______________________________________________
>>>>>>> LLVM Developers mailing list
>>>>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>
>
>