[llvm-dev] [LLVMdev] LLVM loop vectorizer - start and end locations

Thu Aug 18 17:16:31 PDT 2016

   Hello.
     Hal, thank you very much - if I have to make my application 100% reliable I might 
enhance Clang as you suggested.

     As I mentioned before, I am interested in getting both the exact start and end 
locations of a loop in order to replace it in the source file with a different loop 
(one using vector intrinsics) - so basically I want to perform a rather 
non-standard source-to-source transformation.
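
     A minimal sketch of the text-scanning fallback mentioned in the earlier message 
quoted below, assuming the start offset of the loop is already known and the loop body is 
a braced block. The helper name find_loop_end is hypothetical (it is not part of Clang or 
LLVM). It skips braces inside comments, string literals and character literals, but it 
does not handle brace-less loop bodies or braces introduced by preprocessor macros:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a Clang/LLVM API: scan `src` from offset `start`
 * (assumed to be the first character of a loop statement) and return the
 * offset of the '}' that closes the first braced block, or -1 if no balanced
 * block is found. */
long find_loop_end(const char *src, size_t start)
{
    assert(src != NULL);
    enum { CODE, LINE_COMMENT, BLOCK_COMMENT, STRING, CHAR_LIT } state = CODE;
    int depth = 0;

    for (size_t i = start; src[i] != '\0'; i++) {
        char c = src[i];
        char next = src[i + 1];
        switch (state) {
        case CODE:
            if (c == '/' && next == '/') { state = LINE_COMMENT; i++; }
            else if (c == '/' && next == '*') { state = BLOCK_COMMENT; i++; }
            else if (c == '"') state = STRING;
            else if (c == '\'') state = CHAR_LIT;
            else if (c == '{') depth++;
            else if (c == '}') {
                if (depth == 0) return -1;        /* unbalanced input */
                if (--depth == 0) return (long)i; /* closing brace of the loop */
            }
            break;
        case LINE_COMMENT:
            if (c == '\n') state = CODE;
            break;
        case BLOCK_COMMENT:
            if (c == '*' && next == '/') { state = CODE; i++; }
            break;
        case STRING:
            if (c == '\\' && next != '\0') i++;   /* skip escaped character */
            else if (c == '"') state = CODE;
            break;
        case CHAR_LIT:
            if (c == '\\' && next != '\0') i++;   /* skip escaped character */
            else if (c == '\'') state = CODE;
            break;
        }
    }
    return -1;
}
```

     The returned offset still has to be converted to a line and column pair, and a 
scanner like this only approximates what the Clang AST already knows, which is why 
recording the end location in the loop metadata remains the more reliable route.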

   Best regards,
     Alex


On 8/13/2016 1:52 AM, Hal Finkel wrote:
> Hi Alex,
>
> If you want to get both the starting and ending locations, I think your best bet is to
> enhance Clang to insert into the loop metadata, not just the location of the start of
> the loop, but also the location of the end of the loop. Then you can grab that in the
> backend.
>
> What's your use case for this exactly?
>
> -Hal
>
> ----- Original Message -----
>> From: "Alex Susu" <alex.e.susu at gmail.com>
>> To: "llvm-dev" <llvm-dev at lists.llvm.org>
>> Cc: "Adam Nemet" <anemet at apple.com>, "Hal Finkel" <hfinkel at anl.gov>
>> Sent: Friday, August 12, 2016 4:43:27 AM
>> Subject: Re: [llvm-dev] [LLVMdev] LLVM loop vectorizer - start and end locations
>>
>> Hello. Hal, Adam, thank you very much for the fix you mentioned. I ran an opt built
>> with this fix and I got the precise start location of the loop. I am interested in
>> getting both the exact start and end locations of a loop in order to replace the loop
>> with different content in the source file (basically perform a rather non-standard
>> source-to-source transformation).
>>
>> I've tried to compute the end location for the loop by "parsing" the file (looking
>> at each character) at least from the start location, but this can be quite complex
>> for nested blocks in the loop, etc.
>>
>> Also, I've tried to get more information from the LLVM IR instructions:
>>
>> - Loop::getUniqueExitBlock()->front().getDebugLoc() returns the location of the
>>   first statement after the loop. But in the case of a nested loop the first (and
>>   last) statement after the loop is the "increment" statement of the enclosing
>>   outer loop. So, even if getUniqueExitBlock() looks promising for simple loops,
>>   this is still not great.
>> - I also iterated through all the instructions of all basic blocks of the loop
>>   (using Loop::block_begin(), block_end(), etc). From these I can choose the min
>>   and max locations. This is not great either, because the loop can contain
>>   comments before its final "}" (if there is one), which would result in an
>>   imprecise end location - most importantly, the "}" of the loop basically does
>>   not have a corresponding LLVM IR instruction. Of course, "parsing" to the right
>>   of the max location found above for an uncommented "}" is not very difficult.
>>
>> I could also try to get more information from the Clang AST while being in the
>> opt tool, but I don't know how to read it - maybe I could use LibTooling. Do you have
>> an idea here? Can I get the corresponding AST node from an LLVM IR instruction? (Here
>> I got an interesting pointer:
>> http://clang-developers.42468.n3.nabble.com/Matching-Clang-s-AST-nodes-to-the-LLVM-IR-instructions-they-produced-td3665037.html,
>> but maybe it is outdated)
>>
>> Thank you, Alex
>>
>>
>> On 6/8/2016 1:29 AM, Adam Nemet wrote:
>>> Hi Alex,
>>>
>>> This has been very recently fixed by Hal.  See http://reviews.llvm.org/rL270771
>>>
>>> Adam
>>>
>>>> On Jun 4, 2016, at 3:13 AM, Alex Susu via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>
>>>> Hello. Mikhail, I am coming back to this older thread. I need to make a few
>>>> changes to LoopVectorize.cpp.
>>>>
>>>> One of them is related to figuring out the exact C source line and column number
>>>> of the loops being vectorized. I've noticed that a recent version of
>>>> LoopVectorize.cpp prints imprecise debug info for vectorized loops: for
>>>> example, the location of a character of an assignment statement inside the
>>>> respective loop. It would help me a lot in my project to find the exact C source
>>>> line and column number of the first and last character of the loop being
>>>> vectorized (an imprecise location would make my life more complicated). Is this
>>>> feasible? Or are there limitations in Clang on retrieving the exact C source
>>>> line and column number of the beginning and end of a loop (it can include
>>>> indent chars before and after the loop)? (I've seen other examples of imprecise
>>>> locations, such as in the "Reading diagnostics" chapter in the book
>>>> https://books.google.ro/books?isbn=1782166939.)
>>>>
>>>> Note: to be able to retrieve the debug info from the C source file we need to
>>>> run clang with the -Rpass* options, as discussed before. Otherwise, if we run
>>>> clang first and then run opt (which runs LoopVectorize) on the resulting .ll
>>>> file, we lose the C source file debug info (DebugLoc class, etc) and obtain the
>>>> debug info from the .ll file. An example:
>>>>
>>>>   clang -O3 3better.c -arch=mips -ffast-math -Rpass=debug
>>>>     -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize -S -emit-llvm -fvectorize
>>>>     -mllvm -debug -mllvm -force-vector-width=16 -save-temps
>>>>
>>>> Thank you, Alex
>>>>
>>>>
>>>>
>>>> On 2/18/2016 2:17 AM, Mikhail Zolotukhin wrote:
>>>>> Hi Alex,
>>>>>
>>>>> I'm not aware of efforts on loop coalescing in LLVM, but probably Polly can do
>>>>> something like this. Also, one related thought: it might be worth making it a
>>>>> separate pass, not a part of the loop vectorizer. LLVM already has several
>>>>> 'utility' passes (e.g. loop rotation), which primarily aim at enabling other
>>>>> passes.
>>>>>
>>>>> Thanks, Michael
>>>>>
>>>>>> On Feb 15, 2016, at 6:44 AM, RCU <alex.e.susu at gmail.com> wrote:
>>>>>>
>>>>>> Hello, Michael. I am coming back to this older email. Sorry if you receive it
>>>>>> again.
>>>>>>
>>>>>> I am trying to implement coalescing/collapsing of nested loops. This would
>>>>>> clearly be beneficial for the loop vectorizer, too. I plan to start by
>>>>>> modifying the LLVM loop vectorizer to add loop coalescing at the level of the
>>>>>> LLVM language.
>>>>>>
>>>>>> Are you aware of a similar effort on loop coalescing in LLVM (maybe even a
>>>>>> different LLVM pass, not related to the LLVM loop vectorizer)?
>>>>>>
>>>>>> Thank you, Alex
>>>>>>
>>>>>> On 7/9/2015 10:38 AM, RCU wrote:
>>>>>>>
>>>>>>>
>>>>>>> With best regards, Alex Susu
>>>>>>>
>>>>>>> On 7/8/2015 9:17 PM, Michael Zolotukhin wrote:
>>>>>>>> Hi Alex,
>>>>>>>>
>>>>>>>> Example from the link you provided looks like this:
>>>>>>>>
>>>>>>>> for (i = 0; i < M; i++) {
>>>>>>>>     z[i] = 0;
>>>>>>>>     for (ckey = row_ptr[i]; ckey < row_ptr[i+1]; ckey++) {
>>>>>>>>         z[i] += data[ckey] * x[colind[ckey]];
>>>>>>>>     }
>>>>>>>> }
>>>>>>>>
>>>>>>>> Is it the loop you are trying to vectorize? I don’t see any ‘if’ inside
>>>>>>>> the innermost loop.
>>>>>>> I tried to simplify this code in the hope the loop vectorizer can take care
>>>>>>> of it better: I linearized...
>>>>>>>
>>>>>>>> But anyway, here the vectorizer might have the following troubles:
>>>>>>>>
>>>>>>>> 1) The iteration count of the innermost loop is unknown.
>>>>>>>> 2) Gather accesses ( a[b[i]] ). With the AVX512 instruction set it’s
>>>>>>>>    possible to generate efficient code for such a case, but
>>>>>>>>    a) I think it’s not supported yet,
>>>>>>>>    b) if this ISA isn’t available, then the vectorized code would need to
>>>>>>>>       ‘manually’ gather scalar values into a vector, which might be slow
>>>>>>>>       (and thus, the vectorizer might decide to leave the code scalar).
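
For reference, a minimal self-contained C sketch of the CSR loop under discussion, with
the two obstacles listed above marked in comments. The function name spmv_csr, the
parameter types and the assert on row_ptr are assumptions made for illustration; only the
loop structure comes from the example quoted in this thread:

```c
#include <assert.h>
#include <stddef.h>

/* Scalar CSR sparse matrix-vector product z = A * x, where A has M rows and
 * is stored as (row_ptr, colind, data). Illustrative sketch only. */
void spmv_csr(size_t M, const size_t *row_ptr, const size_t *colind,
              const double *data, const double *x, double *z)
{
    for (size_t i = 0; i < M; i++) {
        /* CSR invariant: row start offsets never decrease. */
        assert(row_ptr[i] <= row_ptr[i + 1]);
        z[i] = 0;
        /* Obstacle 1: the trip count row_ptr[i+1] - row_ptr[i] differs per
         * row and is only known at run time. */
        for (size_t ckey = row_ptr[i]; ckey < row_ptr[i + 1]; ckey++) {
            /* Obstacle 2: x[colind[ckey]] is an indirect (gather) load. */
            z[i] += data[ckey] * x[colind[ckey]];
        }
    }
}
```

For the 2x3 matrix with rows (1, 0, 2) and (0, 3, 0), the arrays are row_ptr = {0, 2, 3},
colind = {0, 2, 1} and data = {1, 2, 3}.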
>>>>>>>>
>>>>>>>> And here is a list of papers the vectorizer is based on:
>>>>>>>>
>>>>>>>> // The reduction-variable vectorization is based on the paper:
>>>>>>>> //  D. Nuzman and R. Henderson. Multi-platform Auto-vectorization.
>>>>>>>> //
>>>>>>>> // Variable uniformity checks are inspired by:
>>>>>>>> //  Karrenberg, R. and Hack, S. Whole Function Vectorization.
>>>>>>>> //
>>>>>>>> // The interleaved access vectorization is based on the paper:
>>>>>>>> //  Dorit Nuzman, Ira Rosen and Ayal Zaks. Auto-Vectorization of Interleaved
>>>>>>>> //  Data for SIMD
>>>>>>>> //
>>>>>>>> // Other ideas/concepts are from:
>>>>>>>> //  A. Zaks and D. Nuzman. Autovectorization in GCC-two years later.
>>>>>>>> //
>>>>>>>> //  S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua. An Evaluation of
>>>>>>>> //  Vectorizing Compilers.
>>>>>>>>
>>>>>>>> And probably, some of the parts are written from scratch with no reference
>>>>>>>> to a paper.
>>>>>>>>
>>>>>>>> The presentations you found are a good starting point, but while they’re
>>>>>>>> still good for getting the basics of the vectorizer, they are a bit
>>>>>>>> outdated now in the sense that a lot of new features have been added since
>>>>>>>> then (and bugs fixed :) ). Also, I’d recommend trying a newer LLVM version -
>>>>>>>> I don’t think it’ll handle the example above, but it would be much more
>>>>>>>> convenient to investigate why the loop isn’t vectorized and fix the
>>>>>>>> vectorizer if we figure out how.
>>>>>>>>
>>>>>>>> Best regards, Michael
>>>>>>>>
>>>>>>>
>>>>>>> Thanks for the papers - these appear to be written in the header of the
>>>>>>> file implementing the loop vect. transformation (found at
>>>>>>> "where-you-want-llvm-to-live"/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp).
>>>>>>>
>>>>>>>>> On Jul 8, 2015, at 10:01 AM, RCU <alex.e.susu at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hello. I am trying to vectorize a CSR SpMV (sparse matrix vector
>>>>>>>>> multiplication) procedure but the LLVM loop vectorizer is not able to
>>>>>>>>> handle such code. I am using clang and llvm version 3.4 (on Ubuntu
>>>>>>>>> 12.10). I use the -fvectorize option with clang and -loop-vectorize
>>>>>>>>> with opt-3.4. The CSR SpMV function is inspired by
>>>>>>>>> http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp
>>>>>>>>> (I can provide the exact code samples used).
>>>>>>>>>
>>>>>>>>> Basically the problem is that the loop vectorizer does NOT work with an
>>>>>>>>> if inside the loop (be it 2 nested loops or a modification of SpMV I did
>>>>>>>>> with just 1 loop - I can provide the exact code) changing the value of
>>>>>>>>> the accumulator z. I can sort of understand why LLVM isn't able to
>>>>>>>>> vectorize the code. However, at
>>>>>>>>> http://llvm.org/docs/Vectorizers.html#if-conversion it is written:
>>>>>>>>> <<The Loop Vectorizer is able to "flatten" the IF statement in the
>>>>>>>>> code and generate a single stream of instructions. The Loop Vectorizer
>>>>>>>>> supports any control flow in the innermost loop. The innermost loop may
>>>>>>>>> contain complex nesting of IFs, ELSEs and even GOTOs.>> Could you
>>>>>>>>> please tell me what exactly these lines are trying to say?
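
As a hand-written illustration of what "flattening" an IF means (a sketch of the general
idea of if-conversion, not the vectorizer's actual output), the two functions below
compute the same sum. The function names are made up for this example. In the second
version the branch is replaced by a select, so every iteration executes the same
straight-line sequence of operations, which is the form that maps onto vector lanes:

```c
#include <assert.h>
#include <stddef.h>

/* Branchy form: the addition is guarded by control flow. */
int sum_positive_branchy(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] > 0)
            sum += a[i];
    }
    return sum;
}

/* If-converted form: the condition becomes a value that selects between
 * a[i] and 0, so the loop body contains no branch on the data. */
int sum_positive_flat(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        int take = a[i] > 0;        /* predicate computed as data */
        sum += take ? a[i] : 0;     /* select instead of branch */
    }
    return sum;
}
```

Note that this only covers control flow inside the body of a single innermost loop. An
inner loop nested in an outer loop, as in the two-loop SpMV code, is a different case.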
>>>>>>>>>
>>>>>>>>> Could you please tell me what algorithm the LLVM loop vectorizer is
>>>>>>>>> using (maybe the algorithm is described in a paper) - I currently found
>>>>>>>>> only 2 presentations on this topic:
>>>>>>>>> http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf and
>>>>>>>>> https://archive.fosdem.org/2014/schedule/event/llvmautovec/attachments/audio/321/export/events/attachments/llvmautovec/audio/321/AutoVectorizationLLVM.pdf.
>>>>>>>>>
>>>>>>>>> Thank you very much, Alex
>>>>>>>>> _______________________________________________
>>>>>>>>> LLVM Developers mailing list
>>>>>>>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
>