[PATCH] D75936: Add a Pass to X86 that builds a Condensed CFG for Load Value Injection (LVI) Gadgets [4/6]

Matthew Riley via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 13 16:53:16 PDT 2020


mattdr accepted this revision.
mattdr added a comment.
This revision is now accepted and ready to land.

Accepting modulo some comments to be addressed. Most of my review effort was spent on the data structure and algorithm employed, as well as on code style and readability.

I am least confident about my understanding of `instrAccessesStackSlot` and the other functions that make up `instrIsFixedAccess`, since each function seems to be pattern-matching on something very specific without a reference explaining why. I also don't know whether this diff provides an exhaustive list of fixed loads, or indeed whether it was intended to.



================
Comment at: llvm/lib/Target/X86/ImmutableGraph.h:103
+
+  class NodeSet {
+    friend class iterator;
----------------
sconstab wrote:
> This had not occurred to me until now, but a lot of code is shared between `NodeSet` and `EdgeSet`. Maybe a template could reduce the redundancy?
I agree that ideally we'd find a way to collapse these -- but for this diff, let's content ourselves with a FIXME comment to that effect.


================
Comment at: llvm/lib/Target/X86/ImmutableGraph.h:102
+
+  size_type getNodeIndex(const Node &N) const {
+    return std::distance(nodes_begin(), &N);
----------------
Worth adding a comment here (and on `getEdgeIndex`) noting that this will crash if the `Node` (`Edge`) provided is not a reference into this specific instance of `ImmutableGraph`.


================
Comment at: llvm/lib/Target/X86/ImmutableGraph.h:340
+      auto NumEdges = static_cast<size_type>(AdjList[VI].second.size());
+      if (NumEdges > 0) {
+        for (size_type VEI = 0; VEI < NumEdges; ++VEI, ++EI) {
----------------
This `if` is unnecessary: when `NumEdges` is zero, the inner `for` loop body never executes anyway.


================
Comment at: llvm/lib/Target/X86/ImmutableGraph.h:348
+    }
+    assert(VI == VertexSize && EI == EdgeSize && "Gadget graph malformed");
+    VertexArray[VI].Edges = &EdgeArray[EdgeSize]; // terminator node
----------------
Technically this is a "generic" graph, so we should leave "Gadget" out of the assertion message here.


================
Comment at: llvm/lib/Target/X86/ImmutableGraph.h:366
+    std::vector<size_type> TrimmedNodes(TrimNodes.size());
+    for (size_type I = 0; I < TrimNodes.size(); ++I) {
+      TrimmedNodes[I] = TrimmedNodesSoFar;
----------------
Two comments here would make the code significantly easier to understand:
1. Note that we're iterating up to `.size()` here rather than `.count()`, so we're actually visiting *all* node indices, not just the ones to be trimmed.
2. The `TrimmedNodes` vector maps each index in the original node array to the number of `Node`s before that index that have been trimmed; later code uses it to map each surviving element to its new position in a dense array with the trimmed items removed.


================
Comment at: llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp:56
+    cl::desc("Don't treat conditional branches as disclosure gadgets. This "
+             "may improve performance, at the cost of security."),
+    cl::init(false), cl::Hidden);
----------------
This is another case where references to good documentation will go a long way. Without details about what the tradeoff is and how to reason about it, it doesn't seem like anyone should use this flag.


================
Comment at: llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp:236
+  LLVM_DEBUG(dbgs() << "Hardening data-dependent loads...\n");
+  hardenLoads(MF, false);
+  LLVM_DEBUG(dbgs() << "Hardening data-dependent loads... Done\n");
----------------
Each call to `hardenLoads` leads to a call to `getGadgetGraph`. A *lot* of the work that `getGadgetGraph` does seems to be common between mitigating fixed and non-fixed loads -- for example, computing register dataflow and liveness over the entire function.

And calling `hardenLoads` twice looks to be the common case, since `NoFixedLoads` is `false` by default.

Could we make this pass about half as expensive by default by combining these two calls to `hardenLoads` into one? It would do the expensive work once, then either harden _all_ loads or _only_ non-fixed loads.


================
Comment at: llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp:322
+  DenseSet<std::pair<GraphIter, GraphIter>> GadgetEdgeSet;
+  auto AnalyzeUse = [&](NodeAddr<UseNode *> Use, MachineInstr *MI) {
+    assert(!(Use.Addr->getFlags() & NodeAttrs::PhiRef));
----------------
Please add a comment explaining the semantics of the boolean return here. I //think// it's: `true` if we need to consider defs of this instruction tainted by this use (and therefore add them for analysis); `false` if this instruction consumes its use


================
Comment at: llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp:329
+    // to all Defs, unless the instruction is a call
+    if (UseMI.isCall())
+      return false;
----------------
Why is it okay to assume that a call doesn't propagate its uses to defs? Is it because we can assume the CFI transform is already inserting an LFENCE? Whatever the reason, let's state it explicitly here


================
Comment at: llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp:341
+                        GadgetBegin.first, GadgetEnd.first);
+      if (UseMI.mayLoad()) // FIXME: This should be more precise
+        return false;      // stop traversing further uses of `Reg`
----------------
Some more detail would be useful here: precise about what? What are the likely errors?



================
Comment at: llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp:409
+        NodeList Defs = SA.Addr->members_if(DataFlowGraph::IsDef, DFG);
+        llvm::for_each(Defs, AnalyzeDef);
+      }
----------------
We analyze every def from every instruction in the function, but then //also// in `AnalyzeDefUseChain` analyze every def of every instruction with an interesting use. Are we doing a lot of extra work?


================
Comment at: llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp:493
+    const MachineInstr &MI, unsigned Reg) const {
+  if (!MI.isConditionalBranch())
+    return false;
----------------
Worth a comment here that we don't need to worry about indirect branches (`jmp` through a register) because elsewhere we prevent them from being generated.


================
Comment at: llvm/lib/Target/X86/X86LoadValueInjectionLoadHardening.cpp:506
+    const PseudoSourceValue *PSV = MMO->getPseudoValue();
+    if (PSV && PSV->kind() == K && MMO->isLoad())
+      return true;
----------------
It seems very weird to make this a template argument rather than just, like, a regular argument.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75936/new/

https://reviews.llvm.org/D75936




