<div dir="ltr">The current X86 NOP padding uses one long NOP followed by the remainder in one-byte NOPs. If the processor actually executes those NOPs, as it sometimes does with aligned bundling, this can have a performance impact. From my micro-benchmarks run on my one machine, a 15-byte NOP followed by twelve one-byte NOPs is about 20% worse than a 15 followed by a 12. This patch changes NOP emission to emit as many 15-byte (the maximum) as possible followed by at most one shorter NOP.<div>
<br></div><div style>David</div><div style><br></div></div>