<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi Alex,<div class=""><br class=""></div><div class="">I'm not aware of any efforts on loop coalescing in LLVM, but Polly can probably do something like this. Also, one related thought: it might be worth making it a separate pass, not a part of the loop vectorizer. LLVM already has several 'utility' passes (e.g. loop rotation), which primarily aim at enabling other passes.</div><div class=""><br class=""></div><div class="">Thanks,</div><div class="">Michael</div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Feb 15, 2016, at 6:44 AM, RCU <<a href="mailto:alex.e.susu@gmail.com" class="">alex.e.susu@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">  Hello, Michael.</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">  I am coming back to this older email. 
Sorry if you receive it again.</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class=""> I am trying to implement coalescing/collapsing of nested loops. 
This would be clearly beneficial for the loop vectorizer, also.</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class=""> I'm normally planning to start modifying the LLVM loop vectorizer to add loop coalescing of the LLVM language.</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class=""> Are you aware of a similar effort on loop coalescing in LLVM (maybe even a different LLVM pass, not related to the LLVM loop vectorizer)?</span><br style="font-family: Helvetica; 
font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class=""> Thank you,</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class=""> Alex</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br style="font-family: Helvetica; font-size: 12px; font-style: 
normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><span style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; float: none; display: inline !important;" class="">On 7/9/2015 10:38 AM, RCU wrote:</span><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><br class=""><br class="">With best regards,<br class=""> Alex Susu<br class=""><br class="">On 7/8/2015 9:17 PM, Michael Zolotukhin wrote:<br class=""><blockquote type="cite" class="">Hi Alex,<br class=""><br class="">Example from the link you provided looks like this:<br class=""><br class="">|for (i=0; i<M; i++ ){<br class=""> z[i]=0;<br class=""> for (ckey=row_ptr[i]; ckey<row_ptr[i+1]; ckey++) {<br class=""> z[i] += data[ckey]*x[colind[ckey]];<br class=""> }<br class=""> }|<br class=""><br class="">Is it the loop you are trying to vectorize? 
I don’t see any ‘if’ inside the innermost loop.<br class=""></blockquote>I tried to simplify this code in the hope the loop vectorizer can take care of it better:<br class="">I linearized...<br class=""><br class=""><blockquote type="cite" class="">But anyway, here the vectorizer might have the following troubles:<br class="">1) The iteration count of the innermost loop is unknown.<br class="">2) Gather accesses ( a[b[i]] ). With the AVX512 instruction set it’s possible to generate<br class="">efficient code for such a case, but a) I think it’s not supported yet, b) if this ISA isn’t<br class="">available, then the vectorized code would need to ‘manually’ gather scalar values into a vector,<br class="">which might be slow (and thus, the vectorizer might decide to leave the code scalar).<br class=""><br class="">And here is a list of papers the vectorizer is based on:<br class="">// The reduction-variable vectorization is based on the paper:<br class="">//  D. Nuzman and R. Henderson. Multi-platform Auto-vectorization.<br class="">//<br class="">// Variable uniformity checks are inspired by:<br class="">//  Karrenberg, R. and Hack, S. Whole Function Vectorization.<br class="">//<br class="">// The interleaved access vectorization is based on the paper:<br class="">//  Dorit Nuzman, Ira Rosen and Ayal Zaks.  Auto-Vectorization of Interleaved<br class="">//  Data for SIMD<br class="">//<br class="">// Other ideas/concepts are from:<br class="">//  A. Zaks and D. Nuzman. Autovectorization in GCC-two years later.<br class="">//<br class="">//  S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua.  
An Evaluation of<br class="">//  Vectorizing Compilers.<br class="">And probably, some of the parts are written from scratch with no reference to a paper.<br class=""><br class="">The presentations you found are a good starting point, but while they’re still good for<br class="">getting the basics of the vectorizer, they are a bit outdated now in the sense that a lot of new<br class="">features have been added since then (and bugs fixed:) ). Also, I’d recommend trying a newer<br class="">LLVM version - I don’t think it’ll handle the example above, but it would be much more<br class="">convenient to investigate why the loop isn’t vectorized and fix the vectorizer if we figure<br class="">out how.<br class=""><br class="">Best regards,<br class="">Michael<br class=""><br class=""></blockquote><br class="">   Thanks for the papers - these appear to be written in the header of the file<br class="">implementing the loop vectorization transformation (found at<br class="">"where-you-want-llvm-to-live"/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp ).<br class=""><br class=""><blockquote type="cite" class=""><blockquote type="cite" class="">On Jul 8, 2015, at 10:01 AM, RCU <<a href="mailto:alex.e.susu@gmail.com" class="">alex.e.susu@gmail.com</a><span class="Apple-converted-space"> </span><<a href="mailto:alex.e.susu@gmail.com" class="">mailto:alex.e.susu@gmail.com</a>>><br class="">wrote:<br class=""><br class="">Hello.<br class="">    I am trying to vectorize a CSR SpMV (sparse matrix-vector multiplication) procedure<br class="">but the LLVM loop vectorizer is not able to handle such code.<br class="">    I am using clang and LLVM version 3.4 (on Ubuntu 12.10). 
I use the -fvectorize option<br class="">with clang and -loop-vectorize with opt-3.4.<br class="">    The CSR SpMV function is inspired by<br class=""><a href="http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp" class="">http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp</a><br class=""><br class="">(I can provide the exact code samples used).<br class=""><br class="">    Basically the problem is that the loop vectorizer does NOT work with an if inside the loop (be it<br class="">2 nested loops or a modification of SpMV I did with just 1 loop - I can provide the<br class="">exact code) changing the value of the accumulator z. I can sort of understand why LLVM<br class="">isn't able to vectorize the code.<br class="">    However, at<span class="Apple-converted-space"> </span><a href="http://llvm.org/docs/Vectorizers.html#if-conversion" class="">http://llvm.org/docs/Vectorizers.html#if-conversion</a><span class="Apple-converted-space"> </span>it is written:<br class="">      <<The Loop Vectorizer is able to "flatten" the IF statement in the code and<br class="">generate a single stream of instructions.<br class="">           The Loop Vectorizer supports any control flow in the innermost loop.<br class="">           The innermost loop may contain complex nesting of IFs, ELSEs and even<br class="">GOTOs.>><br class="">    Could you please tell me what exactly these lines are trying to say?<br class=""><br class="">    Could you please tell me what algorithm the LLVM loop vectorizer is using (maybe the<br class="">algorithm is described in a paper) - I have so far found only 2 presentations on this<br class="">topic:<span class="Apple-converted-space"> </span><a href="http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf" class="">http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf</a><span class="Apple-converted-space"> </span>and<br class=""><a 
href="https://archive.fosdem.org/2014/schedule/event/llvmautovec/attachments/audio/321/export/events/attachments/llvmautovec/audio/321/AutoVectorizationLLVM.pdf" class="">https://archive.fosdem.org/2014/schedule/event/llvmautovec/attachments/audio/321/export/events/attachments/llvmautovec/audio/321/AutoVectorizationLLVM.pdf</a><br class=""><br class="">.<br class=""><br class="">Thank you very much,<br class=""> Alex<br class="">_______________________________________________<br class="">LLVM Developers mailing list<br class=""><a href="mailto:LLVMdev@cs.uiuc.edu" class="">LLVMdev@cs.uiuc.edu</a><span class="Apple-converted-space"> </span><<a href="mailto:LLVMdev@cs.uiuc.edu" class="">mailto:LLVMdev@cs.uiuc.edu</a>><span class="Apple-converted-space"> </span><a href="http://llvm.cs.uiuc.edu/" class="">http://llvm.cs.uiuc.edu</a><br class=""><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" class="">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a></blockquote></blockquote></blockquote></div></blockquote></div><br class=""></div></body></html>