<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi Alex,<div class=""><br class=""></div><div class="">The example from the link you provided looks like this:</div><div class=""><pre class="lang-c prettyprinted prettyprint" style="margin-top: 0px; padding: 5px; border: 0px; font-size: 13px; overflow: auto; width: auto; max-height: 600px; background-color: rgb(238, 238, 238); font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, sans-serif; color: rgb(57, 51, 24); word-wrap: normal;"><code style="margin: 0px; padding: 0px; border: 0px; font-family: Consolas, Menlo, Monaco, 'Lucida Console', 'Liberation Mono', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Courier New', monospace, sans-serif; white-space: inherit;" class=""><span class="kwd" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 139);">for</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);"> </span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">(</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">i</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">=</span><span class="lit" style="margin: 0px; padding: 0px; border: 0px; color: rgb(128, 0, 0);">0</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">;</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);"> i</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);"><</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">M</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; 
color: rgb(0, 0, 0);">;</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);"> i</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">++</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);"> </span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">){</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">
z</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">[</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">i</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">]=</span><span class="lit" style="margin: 0px; padding: 0px; border: 0px; color: rgb(128, 0, 0);">0</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">;</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">
</span><span class="kwd" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 139);">for</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);"> </span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">(</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">ckey</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">=</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">row_ptr</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">[</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">i</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">];</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);"> ckey</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);"><</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">row_ptr</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">[</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">i</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">+</span><span class="lit" style="margin: 0px; padding: 0px; border: 0px; color: rgb(128, 0, 0);">1</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">];</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);"> ckey</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">++)</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);"> </span><span class="pun" style="margin: 0px; padding: 0px; border: 
0px; color: rgb(0, 0, 0);">{</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">
</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);"> z</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">[</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">i</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">]</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);"> </span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">+=</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);"> data</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">[</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">ckey</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">]*</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">x</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">[</span><span class="pln" style="white-space: inherit; margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">colind</span><span class="pun" style="white-space: inherit; margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">[</span><span class="pln" style="white-space: inherit; margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">ckey]</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">];</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">
</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">}</span><span class="pln" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">
</span><span class="pun" style="margin: 0px; padding: 0px; border: 0px; color: rgb(0, 0, 0);">}</span></code></pre><div class="">Is this the loop you are trying to vectorize? I don’t see any ‘if’ inside the innermost loop.</div><div class=""><br class=""></div><div class="">In any case, the vectorizer might have the following troubles here:</div><div class="">1) The iteration count of the innermost loop is unknown at compile time (it depends on row_ptr[i] and row_ptr[i+1]).</div><div class="">2) Gather accesses (a[b[i]]; here x[colind[ckey]]). With the AVX512 instruction set it’s possible to generate efficient code for such a case, but a) I think it’s not supported by the vectorizer yet, and b) if this ISA isn’t available, the vectorized code would need to ‘manually’ gather scalar values into a vector, which might be slow (and thus the vectorizer might decide to leave the code scalar).</div><div class=""><br class=""></div><div class="">And here is the list of papers the vectorizer is based on:</div><div class=""><div class=""><font face="Menlo" class="">// The reduction-variable vectorization is based on the paper:</font></div><div class=""><font face="Menlo" class="">// D. Nuzman and R. Henderson. Multi-platform Auto-vectorization.</font></div><div class=""><font face="Menlo" class="">//</font></div><div class=""><font face="Menlo" class="">// Variable uniformity checks are inspired by:</font></div><div class=""><font face="Menlo" class="">// Karrenberg, R. and Hack, S. Whole Function Vectorization.</font></div><div class=""><font face="Menlo" class="">//</font></div><div class=""><font face="Menlo" class="">// The interleaved access vectorization is based on the paper:</font></div><div class=""><font face="Menlo" class="">// Dorit Nuzman, Ira Rosen and Ayal Zaks. Auto-Vectorization of Interleaved</font></div><div class=""><font face="Menlo" class="">// Data for SIMD</font></div><div class=""><font face="Menlo" class="">//</font></div><div class=""><font face="Menlo" class="">// Other ideas/concepts are from:</font></div><div class=""><font face="Menlo" class="">// A. Zaks and D. Nuzman. 
Autovectorization in GCC-two years later.</font></div><div class=""><font face="Menlo" class="">//</font></div><div class=""><font face="Menlo" class="">// S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua. An Evaluation of</font></div><div class=""><font face="Menlo" class="">// Vectorizing Compilers.</font></div></div><div class="">And probably some parts were written from scratch with no reference to a paper.</div><div class=""><br class=""></div><div class="">The presentations you found are a good starting point. They’re still good for getting the basics of the vectorizer, but they are a bit outdated now in the sense that a lot of new features have been added since then (and bugs fixed :) ). Also, I’d recommend trying a newer LLVM version than 3.4. I don’t think it’ll handle the example above, but it would be much more convenient to investigate why the loop isn’t vectorized and to fix the vectorizer if we figure out how.</div><div class=""><br class=""></div><div class="">Best regards,</div><div class="">Michael</div><div class=""><br class=""></div><div><blockquote type="cite" class=""><div class="">On Jul 8, 2015, at 10:01 AM, RCU &lt;<a href="mailto:alex.e.susu@gmail.com" class="">alex.e.susu@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><div class=""> Hello.<br class=""> I am trying to vectorize a CSR SpMV (sparse matrix-vector multiplication) procedure, but the LLVM loop vectorizer is not able to handle such code.<br class=""> I am using clang and LLVM version 3.4 (on Ubuntu 12.10). 
I use the -fvectorize option with clang and -loop-vectorize with opt-3.4.<br class=""> The CSR SpMV function is inspired by <a href="http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp" class="">http://stackoverflow.com/questions/13636464/slow-sparse-matrix-vector-product-csr-using-open-mp</a> (I can provide the exact code samples used).<br class=""><br class=""> Basically, the problem is that the loop vectorizer does NOT work with an if inside the loop (be it 2 nested loops or a modification of SpMV I did with just 1 loop - I can provide the exact code) that changes the value of the accumulator z. I can sort of understand why LLVM isn't able to vectorize the code.<br class=""> However, at <a href="http://llvm.org/docs/Vectorizers.html#if-conversion" class="">http://llvm.org/docs/Vectorizers.html#if-conversion</a> it is written:<br class=""> &lt;&lt;The Loop Vectorizer is able to "flatten" the IF statement in the code and generate a single stream of instructions.<br class=""> The Loop Vectorizer supports any control flow in the innermost loop.<br class=""> The innermost loop may contain complex nesting of IFs, ELSEs and even GOTOs.&gt;&gt;<br class=""> Could you please tell me what exactly these lines are trying to say?<br class=""><br class=""> Could you please tell me what algorithm the LLVM loop vectorizer is using (maybe the algorithm is described in a paper) - so far I have found only 2 presentations on this topic: <a 
href="http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf" class="">http://llvm.org/devmtg/2013-11/slides/Rotem-Vectorization.pdf</a> and <a href="https://archive.fosdem.org/2014/schedule/event/llvmautovec/attachments/audio/321/export/events/attachments/llvmautovec/audio/321/AutoVectorizationLLVM.pdf" class="">https://archive.fosdem.org/2014/schedule/event/llvmautovec/attachments/audio/321/export/events/attachments/llvmautovec/audio/321/AutoVectorizationLLVM.pdf</a>.<br class=""><br class=""> Thank you very much,<br class=""> Alex<br class="">_______________________________________________<br class="">LLVM Developers mailing list<br class=""><a href="mailto:LLVMdev@cs.uiuc.edu" class="">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu" class="">http://llvm.cs.uiuc.edu</a><br class=""><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" class="">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br class=""></div></blockquote></div><br class=""></div></body></html>