[PATCH] D119296: KCFI sanitizer

Fri Jun 24 10:45:29 PDT 2022

samitolvanen planned changes to this revision.
samitolvanen added inline comments.

================
Comment at: clang/lib/CodeGen/CodeGenModule.cpp:6694

-  if (isExternallyVisible(T->getLinkage())) {
+  if (isExternallyVisible(T->getLinkage()) || !OnlyExternal) {
     std::string OutName;
----------------
pcc wrote:
> It would be better to have a separate function for computing the KCFI type ids than to try and reuse this one, since you don't need the MDStrings (i.e. unnecessary string uniquing) in your new caller. It also isn't appropriate to pass MetadataIdMap in your new caller because you'll end up with inconsistent values added to MetadataIdMap if we end up calling the other caller (e.g. if both CFI and KCFI are enabled), but that's moot if you avoid it.
The code in SanitizerArgs.cpp doesn't allow both CFI and KCFI to be enabled at the same time, but you're right, it's probably better to just split this into a separate function for KCFI. I'll do that in the next version.

================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:2453
+  // additional machine instructions being emitted between the check and
+  // the call. This means we don't have to worry about expanding BLR_BTI
+  // and TCRETURNri* pseudos.
----------------
pcc wrote:
> For PAuth ABI the motivation for avoiding the gap between check and call was around avoiding spilling the verified pointer, is this not possible with KCFI?
I don't believe that's a concern here. We might emit instructions for setting up call arguments between the check and the call, but nothing should spill the pointer or change the target register anymore. The main reason to avoid bundling the check and the call is to avoid further complications with expanding the remaining pseudo instructions.

================
Comment at: llvm/lib/Target/X86/X86AsmPrinter.cpp:119
+  MCSymbol *FnSym = OutContext.getOrCreateSymbol("__cfi_" + MF.getName());
+  // Use the same linkage as the parent function.
+  emitLinkage(&MF.getFunction(), FnSym);
----------------
pcc wrote:
> If this is just about satisfying objtool do these symbols need to be exported or can they just be STB_LOCAL?
They don't have to be exported, but we ran into some objtool confusion when the parent function was weak, so it's easier to just use the same linkage for the preamble symbol.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.h:83
+    KCFI_NT_CALL,
+    KCFI_TC_RETURN,
+
----------------
pcc wrote:
> samitolvanen wrote:
> > joaomoreira wrote:
> > > samitolvanen wrote:
> > > > joaomoreira wrote:
> > > > > I did not revise the entire patch yet. With this said, IMHO, this looks like an overcomplication of a simple problem. Is there a reason why you really need specific KCFI_ nodes instead of only embedding the hash information into an attribute at the Machine Instruction? Then, if hash == 0, it just means it is a call that doesn't need instrumentation.
> > > > > 
> > > > > This latter approach will require less code and should be easier to maintain compatible with other CFI approaches. If the reason is because you don't want to have a useless attribute for non-call instructions, then you could possibly have a map where you bind the call instruction with a respective hash.
> > > > > 
> > > > > Unless there is a strong reason for these, I would much better prefer the slim approach suggested. Either way, if there is a reason for this, I would also suggest that you at least don't name these as "KCFI_something", as in the future others might want to reuse the same structure for other CFI approaches.
> > > > > Is there a reason why you really need specific KCFI_ nodes instead of only embedding the hash information into an attribute at the Machine Instruction?
> > > > 
> > > > This implementation is similar to `CALL_RVMARKER`, `CALL_BTI` and basically all other pseudo call instructions in LLVM. Is adding an attribute to `MachineInstr` the preferred approach instead?
> > > > 
> > > > > I would also suggest that you at least don't name these as "KCFI_something", as in the future others might want to reuse the same structure for other CFI approaches.
> > > > 
> > > > Always happy to hear suggestions for alternative naming. Did you have something in mind?
> > > > This implementation is similar to `CALL_RVMARKER`, `CALL_BTI` and basically all other pseudo call instructions in LLVM. Is adding an attribute to `MachineInstr` the preferred approach instead?
> > > 
> > > My understanding is that if, every time a new mitigation or optimization comes in, you create a new opcode for it, it will eventually bloat to non-feasibility.
> > > 
> > > Imagine you have some mitigation like [[ https://www.cs.columbia.edu/~vpk/research/kguard/ | kguard  ]] being implemented. Now you can have calls which are KCFI checked but not KGUARD checked; then KCFI not-checked but KGUARD checked; then KCFI and KGUARD checked.; then none-checked. And then you need all these variations for tail calls (which imho is a first, minor, instance of the problem)...
> > > 
> > > So, in general, my understanding is that this approach works, yeah, but that in the long term it could become a hassle... so ideally we should use attributes to define these sub-specific instructions instead of opcodes.
> > > 
> > > > 
> > > > > I would also suggest that you at least don't name these as "KCFI_something", as in the future others might want to reuse the same structure for other CFI approaches.
> > > > 
> > > > Always happy to hear suggestions for alternative naming. Did you have something in mind?
> > > 
> > > I think switching from KCFI into CFI would already be good enough, as in the end these are all implementing the [[ https://dl.acm.org/doi/10.1145/1102120.1102165 | control-flow integrity ]] concept.
> > > My understanding is that if, every time a new mitigation or optimization comes in, you create a new opcode for it, it will eventually bloat to non-feasibility.
> > 
> > I do agree, if there are enough call variants that can be combined, this will quickly get messy. So far this probably just hasn't been an issue. I'm not particularly happy about the multiple pseudo instructions here either.
> > 
> > @pcc, any thoughts about the best way to make this cleaner? Should we switch to a `MachineInstr` attribute, or perhaps pass another operand with the specific call type, or something else?
> > 
> > > so ideally we should use attributes to define these sub-specific instructions instead of opcodes.
> > 
> > One concern I have with using an attribute is that we have to ensure that no pass accidentally drops it while transforming the call instruction or replacing it with something else. That's not an issue if we have a separate pseudo instruction with the type as an operand. Did you run into any issues with this?
> > 
> > Also, how do you serialize the type attribute in MIR? Something similar to `debug-instr-number`?
> > 
> > > I think switching from KCFI into CFI would already be good enough
> > 
> > Sure, sounds good to me. I'll change the naming once we have a consensus on the correct approach here.
> Would it make sense to add a field to `MachineInstr::ExtraInfo` to hold the additional information here? That way you can avoid increasing the size of `MachineInstr`.
I'll look into it. Per offline discussion, we might need a similar construction for `SDNode` to avoid increasing memory usage too much should we go with type hash attributes instead.

================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp:3093
+            FunctionType->getZExtValue() != ExpectedType->getZExtValue())
+          dbgs() << Call.getModule()->getName() << ":"
+                 << Call.getDebugLoc().getLine()
----------------
pcc wrote:
> With CFI, if this situation occurs we unconditionally crash, the rationale being that if we end up in this situation it is a situation unexpected by the developer and may even be intended to be unreachable at runtime, so it's best to crash if it does occur. The same approach is used for things like UBSan integer overflow checks if the optimizer figures out that the overflow will always happen. If I'm understanding the code correctly KCFI will just let the call occur in this case and I'm not sure if that's a good idea.
The difference here is that unlike with CFI, we never emit KCFI checks for direct calls in the back-end. While this code drops unnecessary kcfi operand bundles from the IR, it doesn't actually change the machine code we generate.

Also, I would argue that KCFI should focus //only// on runtime protection of actual indirect calls, and otherwise shouldn't trap on potential undefined behavior that already exists. Causing a runtime failure for something we can detect at compile-time and don't aim to protect against doesn't sound ideal, especially in the kernel where we prefer to avoid crashing unnecessarily. I did consider changing this to a warning though, so the developers are notified of potential issues, but opted for debug logging for now. Thoughts?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119296/new/

https://reviews.llvm.org/D119296