<div dir="ltr">Hi all,<div><br></div><div>Attached is a fairly simple LLVM IR file (shuffle.ll) containing many vector shuffles. </div><div><br></div><div>When I run llc with -O3 -mcpu=core-avx2, the first shuffle sequence, which operates on 128-wide types, is reduced to a single shuffle, whereas the second shuffle sequence, which operates on 256-wide types, is not reduced to a single shuffle instruction in the resulting X86 code (Shuffle.s attached).</div><div><br></div><div>The second sequence is identical to the first, except that it has been widened to the larger vector length.</div><div><br></div><div>Can this difference be explained, and where in the machine lowering passes does this simplification happen?</div><div><br></div><div>Thanks<br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div>Kind regards,<br>Charith Mendis<br><br>Graduate Student,<div>CSAIL,<br><div>Massachusetts Institute of Technology</div></div></div></div></div>

</div></div>