<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">Hi Chandler,<div><br></div><div>Here are a few more test cases.</div><div>I’ve ordered them from hottest to coldest.</div><div><br></div><div>To reproduce:</div><div>llc &lt;test case&gt; -x86-experimental-vector-shuffle-lowering=&lt;true | false&gt; [specific feature, e.g. -mattr=+avx2]</div><div><br></div><div>1. avx2_vperm.ll (avx2)</div><div><div style="margin: 0px;"><div>When avx2 is set, we emit an extract, two shuffles, and an insert instead of a single vperm.</div><div><br></div><div>2. avx_blend.ll (avx)</div><div>Instead of a single wide blend, we emit two extracts, one narrower blend, and an insert.</div><div><br></div><div>3. avx2_extract2perm.ll (avx2)</div><div>We emit two instructions, an extract followed by an unpck, instead of a single perm.</div></div><div style="margin: 0px;"><br></div><div style="margin: 0px;">4. pxor.ll (no extra feature)</div><div style="margin: 0px;">Instead of a single pxor to zero a register, we emit an xorpd followed by a shuffle.</div><div style="margin: 0px;"><br></div><div style="margin: 0px;">5. sse4.1_pmovzxwd.ll (sse4.1)</div><div style="margin: 0px;">Instead of a single pmovzxwd, we emit a movq followed by an unpck.</div><div style="margin: 0px;"><br></div><div style="margin: 0px;">If you prefer, I can file PRs.</div></div><div style="margin: 0px;"><br></div><div style="margin: 0px;">Cheers,</div><div style="margin: 0px;">-Quentin</div><div> </div></body></html>