<html><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><br><div><div>On Oct 21, 2013, at 12:09 PM, Dmitry Babokin &lt;<a href="mailto:babokin@gmail.com">babokin@gmail.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">By the way, I'm curious, is there any reason why you focus on SSE4, not AVX? It seems that the vectorizer should care the most about the latest silicon.</div><br class="Apple-interchange-newline"></blockquote></div><br><div>I am interested in looking at the SSE4 code because lowering of AVX code is more complicated, especially for masks. The problem is that &lt;8 x i1&gt; can be legalized to either &lt;8 x i32&gt; for YMM or &lt;8 x i16&gt; for XMM. ISPC worked around this limitation by explicitly extending the mask. The SEXT canonicalization undid the code pattern that ISPC generated. </div><div><br></div><div>Thanks,</div><div>Nadav </div></body></html>