<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Fri, May 2, 2014 at 12:21 AM, Andrew Trick <span dir="ltr">&lt;<a href="mailto:atrick@apple.com" target="_blank">atrick@apple.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Yes, I agree with Raul that SLP could be improved to handle bitwise operations, it just isn’t going to help Michael’s case. The algorithm needs a starting point to begin searching for vectorizable ops. It starts with either stores or phis under the assumption that it wants to vectorize a whole chain.</blockquote>

<div><br></div><div>OK, that makes sense.</div><div><br></div><div>However, now that I think about it, bswap *is* easily modeled in the "bitvector" space -- it's just a shuffle of bytes. So it might even be possible to recognize byteswap trees by starting at the stores, going back through a shuffle, and ending at the loads.</div>
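<div><br></div><div>To make the "it's just a shuffle" point concrete, here is a rough C++ sketch (not LLVM code; the names ByteVec, toBytes, fromBytes, shuffle and bswapViaShuffle are all made up for illustration). It treats a 32-bit value as a 4-lane vector of bytes and shows that reversing the lanes gives the same result as the usual shift-and-mask byte swap:</div>

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: a 32-bit value viewed as four byte lanes,
// where lane 0 holds the least significant byte.
using ByteVec = std::array<std::uint8_t, 4>;

// Split a 32-bit integer into its byte lanes.
ByteVec toBytes(std::uint32_t v) {
  ByteVec b = {static_cast<std::uint8_t>(v),
               static_cast<std::uint8_t>(v >> 8),
               static_cast<std::uint8_t>(v >> 16),
               static_cast<std::uint8_t>(v >> 24)};
  return b;
}

// Reassemble byte lanes into a 32-bit integer.
std::uint32_t fromBytes(const ByteVec &b) {
  return static_cast<std::uint32_t>(b[0]) |
         (static_cast<std::uint32_t>(b[1]) << 8) |
         (static_cast<std::uint32_t>(b[2]) << 16) |
         (static_cast<std::uint32_t>(b[3]) << 24);
}

// Generic lane shuffle: result[i] = src[mask[i]].
ByteVec shuffle(const ByteVec &src, const std::array<int, 4> &mask) {
  ByteVec out = {0, 0, 0, 0};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = src[static_cast<std::size_t>(mask[i])];
  return out;
}

// A byte swap expressed as a shuffle with the lane-reversing mask.
std::uint32_t bswapViaShuffle(std::uint32_t v) {
  const std::array<int, 4> reverseMask = {3, 2, 1, 0};
  return fromBytes(shuffle(toBytes(v), reverseMask));
}

// The conventional shift-and-mask formulation, for comparison.
std::uint32_t bswapReference(std::uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
         ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

<div>For example, bswapViaShuffle(0x11223344) and bswapReference(0x11223344) both give 0x44332211. This only shows the equivalence on plain integers; whether the SLP pass can cheaply recognize that pattern between the loads and the stores is the open question.</div>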

<div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> You’re probably thinking that a more general algorithm could recognize load combining as a side-effect. I’m guessing that's not worth the complexity in terms of the algorithm’s structure and profitability heuristics, but I’m not the expert so will say no more.</blockquote>

</div><br>Yeah, I'm not an expert either, so I'll stop speculating too. It at least seems worth investigating how much complexity it would take to model bit vectors in the SLP pass. Arnold and Michael can probably look into that and then make the call on whether to extend SLP that way or just add an explicit pass downstream of it.</div>

</div>