[PATCH] D71238: Align non-fused branches within 32-Byte boundary (basic case)

Tue Dec 10 09:50:50 PST 2019

reames added a comment.

In D71238#1776751 <https://reviews.llvm.org/D71238#1776751>, @chandlerc wrote:

> I just want to mention here, that my comment on the original patch still stands and really needs to be addressed:
>
> We need to have detailed performance and binary size numbers for this kind of change. We should be looking at the test suite and SPEC for performance. I'd like to see before/after performance numbers on a system that is *not* using the microcode feature that addresses jCC (either a newer CPU, an older CPu, or an unpatched CPU). I'd also like to see before/after performance numbers on a system that *is* using the microcode feature to address the jCC issue. I'd suggest Clang itself, Firefox, and Chromium as binaries to show the binary size impact of this patch. Happy for others to suggest binaries to use to evaluate the binary size impact.

I strongly agree that Intel should share these numbers in detail.  So far, we've seen some summary shared on llvm-dev, but nothing in detail.

I can't currently share the details of our internal runs, but let me summarize takeaways:

- All results are very preliminary - this is all from a single machine with a subset of our workloads.
- Applying microcode without mitigation hurts.  Geomean impacts aren't too terrible (5-10%), but some individual workloads suffer badly.
- Apply mitigation (either nop padding or prefix padding) addresses most of the regressions seen.  Not all; further work is needed to understand the outliers, but it's a definite step forward.
- The difference between nop and prefix padding is in the noise for the current measurements.  (Noise is fairly high though, so take that with a grain of salt.)
- Applying mitigations on an unpatched machine does have a small negative impact.  For us, it appears to be around a -1% geomean impact. (Again, *very* noisy results so take with a grain of salt.)
- Reminder: This is all in the context of a JIT targetting the specific machine architecture run on.   That may be important w.r.t. nop padding cost for instance.

> I continue to think that any patch we plan to land should have these numbers, and each incremental update to that patch should have the updated numbers. Otherwise, there is no way to do a good job reviewing the *impact* of these changes even after folks are satisfied with the *code* in these changes.

I understand your concern, but I want to caution drawing too firm a line here.  We have a situation where review on a very important patch is stalled, and we need to unblock it.  This subseting was my attempt to do so, but I'm not in a position to be able to share data at the scale your arguing for.  If we can't work incrementally here, I suspect this effort is going to stall.

For context, we will be shipping a mitigation for this.  I would strongly prefer that be based on upstream work, but if I have to, we'll diverge and drop from the upstream discussion.

Also, to be careful, I think there's a huge difference between landing infrastructure (this patch) and enabling by default.  This patch does not do the later, and I'm not sure I support *ever* doing so.  That's for later discussion.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71238/new/

https://reviews.llvm.org/D71238