<div dir="ltr"><a href="https://llvm.org/docs/CommandGuide/llvm-mca.html#using-markers-to-analyze-specific-code-blocks">The docs for llvm-mca</a> suggest using inline assembly to mark the region that llvm-mca should examine, i.e.<div><br></div><div>__asm volatile("# LLVM-MCA-BEGIN");<div>// ...</div><div>__asm volatile("# LLVM-MCA-END");<br></div><div><br></div></div><div>However, these markers seem to interfere with auto-vectorization. Clang reports:</div><div><br></div><div>&lt;source&gt;:8:3: remark: loop not vectorized: call instruction cannot be vectorized [-Rpass-analysis=loop-vectorize]<br>                __asm volatile("# LLVM-MCA-BEGIN sum_marked");<br>                ^<br><br>&lt;source&gt;:6:2: remark: loop not vectorized: read with atomic ordering or volatile read [-Rpass-analysis=loop-vectorize]<br>        for (size_t index = 0; index &lt; count; index++)<br>        ^<br></div><div><br></div><div><a href="https://godbolt.org/z/NSQchu">Compiler Explorer link.</a></div><div><br></div><div>Any ideas for a workaround, other than compiling unmarked source and then manually inserting markers into the emitted assembly?</div></div>
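<div><br></div><div>For reference, here is a minimal sketch of the kind of loop in question. Only the name sum_marked, the loop header, and the marker sitting inside the loop body are taken from the remarks above; the element type, the parameter names, and the summation itself are assumptions for illustration, and the actual code is behind the Compiler Explorer link.</div>

```c
#include <stddef.h>

/* Hypothetical reproduction: sums an array with the llvm-mca markers
 * placed inside the loop body, as the column positions in the remarks
 * suggest. The int element type and the body are assumed. */
int sum_marked(const int *values, size_t count)
{
	int total = 0;
	for (size_t index = 0; index < count; index++)
	{
		/* The marker is only an assembly comment, but the compiler
		 * still sees an inline-asm statement inside the loop. */
		__asm volatile("# LLVM-MCA-BEGIN sum_marked");
		total += values[index];
		__asm volatile("# LLVM-MCA-END");
	}
	return total;
}
```

<div>Compiling something like this with optimization enabled and -Rpass-analysis=loop-vectorize should produce remarks similar to the ones quoted above, though the exact line and column numbers depend on the layout of the source.</div>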