<html><head><meta http-equiv="Content-Type" content="text/html charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On Jan 7, 2013, at 9:29 PM, Chris Lattner &lt;<a href="mailto:clattner@apple.com">clattner@apple.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><span style="font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; display: inline !important; float: none; ">if we don't need a scalar cleanup loop (e.g. because the vectorization factor of a loop is known to subdivide the constant tripcount), isn't it always beneficial to do the vectorization, even if the new tripcount is low?</span></blockquote></div><br><div>Yes, I agree. This is something that I haven't gotten to yet.</div><div><br></div><div>I am now looking at a few examples where we vectorize and unroll too much, and I haven't found the right solution yet. Until now I have worked to increase the iteration 'width' by vectorizing and unrolling. In some cases we handle 32 floats in one iteration (v8f32, unrolled 4 times). I assumed that the trip count 'n' was high and that the cost of the scalar post-loop was negligible compared to the vectorized loop. But if we widen the loop to 32 elements, then the scalar loop can cost up to 31 scalar operations. In some cases we only discover the length of the array at runtime, and it can be small. I think that reckless widening of loops is not the right way to go, but I am still not sure what to do.</div><div><br></div><div>Thanks,</div><div>Nadav</div></body></html>