[llvm-dev] Status of Intel JCC Mitigations and Next Steps

Wed Mar 25 15:53:38 PDT 2020

On 3/25/20 2:22 PM, Eli Friedman wrote:
>
> I agree we shouldn’t try to guess what the user is trying to do.  
> There shouldn’t be an unbounded set of heuristic rules; “documented” 
> implies some sort of promise of stability in addition to the actual 
> text in the manual.  And we shouldn’t try to guess whether the user’s 
> code cares about the length of a specific instruction.
>
> I think you’re creating a false dichotomy here, though.  There’s some 
> space between “the assembler can rewrite any instruction” and “the 
> assembler should guess what the user is doing”.  The patches you’re 
> referring to are dealing with a very specific issue: there are 
> multiple ways users specify prefixes for certain instructions.  
> Sometimes the prefixes are integrated into the instruction.  Sometimes 
> they’re written separately, like “rep; stosb”. And sometimes users end 
> up using raw “.byte” directives.  So we add a rule: if we see 
> something that looks like a prefix, followed by an instruction, we 
> assume the combination might actually be a prefixed instruction, and 
> don’t rewrite it.  This is a heuristic in some sense: we can’t prove 
> whether the prefixed instruction will ever execute at runtime.  But 
> adding prefixes to the instruction is likely problematic.
>
I see this as very much a slipper slope problem.  As you mention, I am 
trying to draw a hard line, but that's mostly because I can't figure out 
what another reasonable line to draw might be.  Why is one idiomatic 
construct supported, but another not?  Why is "label: nop; mov $(rax), 
%rax" (with the expectation a fault occurs at label + 1 byte) any less 
reasonable?  If we say that's reasonable, and should just work, well, we 
literally can't implement prefix padding at all.  Unless maybe there's 
something I'm missing here?

> We could take a different tack on this: instead of giving up pure 
> “strictness”, we could instead give the user a diagnostic on patterns 
> we recognize.  But forcing users to rewrite “rep;stosb”->“rep stosb” 
> doesn’t seem productive.
>
> -Eli
>
> *From:* Philip Reames <listmail at philipreames.com>
> *Sent:* Wednesday, March 25, 2020 1:34 PM
> *To:* Eric Christopher <echristo at gmail.com>; Eli Friedman 
> <efriedma at quicinc.com>
> *Cc:* Luo, Yuanke <yuanke.luo at intel.com>; llvm-dev 
> <llvm-dev at lists.llvm.org>; Zhang, Annita <annita.zhang at intel.com>; 
> Craig Topper <craig.topper at intel.com>
> *Subject:* [EXT] Re: [llvm-dev] Status of Intel JCC Mitigations and 
> Next Steps
>
> The slightly unexpected bit for me in these responses is the 
> willingness to accept layout changes if documented.  Let me lay out 
> some options, which of these seem reasonable?
>
> 1) Default to automatic padding, provide option to disable, document 
> where padding is inserted.
>
> 2) Default to not automatically padding, provide option to enable, 
> document where padding would be inserted if enabled.
>
> And orthogonality to the above, two interpretations of each:
>
> a) auto padding is allowed to break common idioms (because you enabled 
> it explicitly or because you can disable).  If it does, you are 
> expected to either simply not use it, or possibly use fine grained 
> directives in source.
>
> b) auto padding conservative even when explicitly enabled. One 
> implication of this scheme is that different versions of the assembler 
> will almost by definition have to tweak insertion as we find new 
> problematic idioms.  This creates both documentation difficulties, and 
> user confusion.
>
> So, four choices total.  Which seem reasonable?
>
> Personally, I'd have no problem w/ 2a.  Any of the other variants 
> concern me.
>
> Philip
>
> On 3/25/20 1:08 PM, Eric Christopher wrote:
>
>     FWIW I'm with Eli here if you need any more data points.
>
>     -eric
>
>     On Tue, Mar 24, 2020 at 8:21 PM Eli Friedman via llvm-dev
>     <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>
>         Changing the length of a sequence of assembly instructions
>         will break someone’s code at some point.  The length of a
>         sequence of instructions is known, in general, and people will
>         write code to take advantage of that. For example, I’ve seen
>         assembly code using something like Duff’s device, except that
>         instead of using a jump table, it just computed the
>         destination as “base+n*caselength”. Given that, I don’t think
>         it’s reasonable to hide this mechanism from user control.
>
>         We definitely should not have any undocumented or
>         unpredictable behavior in the assembler.  The actual
>         instruction bytes matter.  That said, I’m not sure there’s a
>         strong line between “automagic” and “explicit”, as long as the
>         rules are documented.
>
>         -Eli
>
>         *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org
>         <mailto:llvm-dev-bounces at lists.llvm.org>> *On Behalf Of
>         *Philip Reames via llvm-dev
>         *Sent:* Tuesday, March 24, 2020 3:55 PM
>         *To:* llvm-dev <llvm-dev at lists.llvm.org
>         <mailto:llvm-dev at lists.llvm.org>>
>         *Cc:* Luo, Yuanke <yuanke.luo at intel.com
>         <mailto:yuanke.luo at intel.com>>; Zhang, Annita
>         <annita.zhang at intel.com <mailto:annita.zhang at intel.com>>;
>         Craig Topper <craig.topper at intel.com
>         <mailto:craig.topper at intel.com>>
>         *Subject:* [EXT] [llvm-dev] Status of Intel JCC Mitigations
>         and Next Steps
>
>         TLDR - We have a choice to make about assembler support, and a
>         disagreement about how to move forward.  Community input needed.
>
>         Background
>
>         Intel has a hardware bug in Skylake and later whose mitigation
>         requires padding of branches to avoid performance
>         degradation.  Background here:
>         https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf
>
>         We now have in tree support for alignment of such branches via
>         nop padding, and limited support for padding existing
>         instructions with either prefixes or larger immediate values.
>         This has survived several days of dedicated testing and
>         appears to be reasonably robust. The padding support applies
>         both to branch alignment for the mitigation, but also normal
>         align directives.
>
>         The original patches proposed a somewhat different approach
>         than we've ended up taking - primarily because of memory
>         overhead concerns. However, there was also a deeper
>         disagreement on the original review threads (D70157 and
>         others) which was never settled, and we seem to be at a point
>         where this needs attention.  In short, the question is how
>         assembler support should be handled.
>
>         The Choice
>
>         The problematic use case comes when assembling user provided
>         .s files.  (Instead of the more restricted output of the
>         compiler.)  Our basic choice is do we want to force a new
>         directive syntax (and thus a source code change to use the new
>         feature), or attempt to automatically infer where it's safe to
>         insert padding?
>
>         The options as I see them:
>
>           * Assembler directives w/explicit opt in - In this model,
>             assembler input is assumed to only enable padding in
>             regions where it is safe to do so.
>           * Automagic assembler - In this model, the assembler is
>             responsible for inferring where it is legal to pad without
>             breaking user expectations.
>
>         (I'll stop and disclaim that I'm strongly in favor of the
>         former.  I've tried to describe the pros/cons of each, but my
>         perspective is definitely biased.)
>
>         The difference between the two is a huge amount of complexity,
>         and a very fundamental correctness risk.  The basic problem is
>         that assemblers have to handle unconstrained inputs, and IMO,
>         the semantics of assembler as used in practice is so under
>         specified that it's really hard to infer semantics in any
>         useful way.  As a couple of examples, is the fault behavior of
>         an instruction well defined?  Is the label near an instruction
>         used by the signal handler?  Is that data byte just before an
>         instruction actually decoded as part of the instruction?
>
>         The benefit of the later option is that existing assembly
>         files can be used without modification.  This is a huge
>         advantage in terms of ease of mitigation for existing code
>         bases. It's also the approach the original patch sets for GCC
>         took.
>
>         In the original review thread(s), I had taken the position
>         that we should reject the automagic assembler based on the
>         correctness concerns mentioned.  I had thought the consensus
>         in the review was clearly in that direction as well, but this
>         has recently come up again.  Given that, I wanted to open it
>         to a wider audience.
>
>         Why am I pushing for a decision now?
>
>         There are two major reasons.  First, there have recently been
>         a couple of patches posted and landed (D76176, and D76052)
>         building towards the automagic assembler variant.  And second,
>         I've started getting review comments
>         (https://reviews.llvm.org/D76398#1930383) which block forward
>         progress on generalized padding support assuming the automagic
>         interpretation.  Implementing the automatic assembler variant
>         for prefix and immediate padding adds substantial complexity
>         and I would very much like not to bother with if I don't have to.
>
>         Current implementation details
>
>         We have support in the integrated assembler only for
>         autopadding suppression.  This allows a LLVM based compiler to
>         effectively apply padding selectively.  In particular, we've
>         instrumented lowering from MI to MC (X86MCInstLowering.cpp) to
>         selectively disable padding around constructs which are
>         thought to be problematic.  We do not have an agreed upon
>         syntax for this in assembler; the code that got checked in is
>         modeled closely around the last seriously discussed variant
>         (see below).  This support is able to use all of the padding
>         variants: nop, prefix, and immediate.
>
>         We also have limited support in the assembler for not
>         inserting nops between fragments where doing so would break
>         known idioms.  The list of such idioms is, IMO, ad hoc.  This
>         assembler support does not include prefix or immediate padding.
>
>         Philip
>
>         p.s. For those interested, here's roughly what the last round
>         of assembler syntax I remember being discussed looked like.
>
>         .autopadding
>         .noautopadding
>
>         These two directives would respectively enable and disable
>         automatic padding of instructions within the region defined. 
>         It's presumed to be legal to insert nops between instructions,
>         modify encodings, or otherwise adjust offsets of instruction
>         boundaries within the region to achieve target specific
>         desired alignments. Similarly, it's presumed not to be legal
>         to change relative offsets outside an explicitly enabled
>         region.  (Except for existing cases - e.g. relaxation of
>         branches, etc...)
>
>         The assembler would provide a command line flag which
>         conceptually wrapped the whole file in a pair of
>         enable/disable directives.
>
>         We'd previously discussed a variant with push/pop semantics
>         and more fine grained control over alignment requests, but I
>         believe we decided that was overkill in the end.  (I walked
>         away with that impression based on the integrated assembler
>         work at least.)
>
>         _______________________________________________
>         LLVM Developers mailing list
>         llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>         https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200325/51deb070/attachment.html>