<div dir="ltr">Hi,<div><br></div><div>As a follow-up to <a href="https://reviews.llvm.org/D23646">Patch D23646</a>, I'm trying to figure out whether there should be an alignment check and, if so, what the correct approach is.</div><div><br></div><div>Some background:</div><div>For stores, the pass turns:</div><div><div><font size="1">%i.vec = shuffle <8 x i32> %v0, <8 x i32> %v1,</font></div><div><font size="1">                 <0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11></font></div><div><font size="1">store <12 x i32> %i.vec, <12 x i32>* %ptr</font></div><div>into:<br></div><div><font size="1">%sub.v0 = shuffle <8 x i32> %v0, <8 x i32> %v1, <0, 1, 2, 3></font></div><div><font size="1">%sub.v1 = shuffle <8 x i32> %v0, <8 x i32> %v1, <4, 5, 6, 7></font></div><div><font size="1">%sub.v2 = shuffle <8 x i32> %v0, <8 x i32> %v1, <8, 9, 10, 11></font></div><div><font size="1">call void llvm.aarch64.neon.st3(%sub.v0, %sub.v1, %sub.v2, %ptr)</font></div></div><div><br></div><div>The purpose of the above patch is to enable more general patterns, such as turning:</div><div><div><font size="1">%i.vec = shuffle <32 x i32> %v0, <32 x i32> %v1,</font></div><div><font size="1">                <4, 32, 16, 5, 33, 17, 6, 34, 18, 7, 35, 19></font></div><div><font size="1">store <12 x i32> %i.vec, <12 x i32>* %ptr</font></div><div>into:<br></div><div><font size="1">%sub.v0 = shuffle <32 x i32> %v0, <32 x i32> %v1, <4, 5, 6, 7></font></div><div><font size="1">%sub.v1 = shuffle <32 x i32> %v0, <32 x i32> %v1, <32, 33, 34, 35></font></div><div><font size="1">%sub.v2 = shuffle <32 x i32> %v0, <32 x i32> %v1, <16, 17, 18, 19></font></div><div><font size="1">call void llvm.aarch64.neon.st3(%sub.v0, %sub.v1, %sub.v2, %ptr)</font></div></div><div><br></div><div>The question I'm trying to get answered is whether there should have been an alignment check in the original pass and, similarly, whether there should be an expanded one for the more general pattern.</div><div>In the example above, I was looking to check if the data at 
positions 4, 16, and 32 is aligned, but I cannot get a clear picture of the impact on performance (hence the side question below).</div><div>Also, some preliminary alignment checks I added break some ARM tests (but not their AArch64 counterparts). The cause is allowsMisalignedMemoryAccesses reporting "not fast", which comes from its hasV7Ops check.</div><div>I'd appreciate some guidance on how best to address and analyze this.<br></div><div><br></div><div>Side question for Tim and other ARM folks: could I get a recommendation on reading material for performance tuning on the different ARM architectures?</div><div><br></div><div>Thank you,</div><div>Alina</div><div><br></div>


</div>