<div dir="ltr"><div class="gmail_quote">I've replied to some of the higher level concerns already, but I wanted to point out one specific thing:</div><div class="gmail_quote"><br></div><div class="gmail_quote">On Fri, Apr 10, 2015 at 3:30 AM Cong Hou <<a href="mailto:congh@google.com">congh@google.com</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">LLVM uses the widest type to calculate the maximum vectorization factor, which greatly limits the bandwidth of either calculations or loads/stores from SIMD instructions. One example is converting 8-bit integers to 32-bit integers from arrays in a loop: currently the VF of this simple loop is decided by 32-bit integer and for SSE2 it will be 4. Then we will have 1 load and 1 store in every 4 iterations. If we calculate VF based on 8-bit integer, which will be 16, we will have 1 load and 4 stores in every 16 iterations, saving many loads.<br></blockquote><div><br></div><div>While I'm generally in favor of this kind of change, I think the test case you're looking at is actually a separate issue that I've written up several times w.r.t. our vectorizer.</div><div><br></div><div>Because C does integer promotion from 8-bit integer types to 32-bit integer types, we very commonly see things that are vectorized with 32-bit integer math when they don't need to.</div><div><br></div><div>The IR can't narrow these operations from 32-bit integer operations to 8-bit integer operations without losing information because in 8-bits the operations might overflow. But when vectorizing, we don't care about this. We should aggressively narrow operations above a trunc which we could hoist the trunc above by stripping overflow flags while building the vectorizable operation tree so that we can fit more operations into a single vector. Does that make sense?</div><div><br></div><div>-Chandler</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
> This patch mainly changes the function getWidestType() to getNarrowestType(), and uses it to calculate VF.
>
> http://reviews.llvm.org/D8943
>
> Files:
>   lib/Target/X86/X86TargetTransformInfo.cpp
>   lib/Transforms/Vectorize/LoopVectorize.cpp
>   test/Transforms/LoopVectorize/X86/fp64_to_uint32-cost-model.ll
>   test/Transforms/LoopVectorize/X86/vector_ptr_load_store.ll