[LLVMdev] Adding diversity for security (and testing)

Fri Sep 20 00:52:03 PDT 2013

Stephen Crane wrote:
> Thanks for all the feedback! It seems there is some interest, so I thought I'd try to summarize discussions so far, and provide patches for closer inspection. I'm not sure if patches should end up here or on a different list in this instance, so if I should instead send this to a different list, I'm happy to do so.
>
> - Is diversity needed, or are existing protections sufficient? As several people have mentioned, industry adopters need very low overhead in many situations, and diversity might be a good fit here. Ultimately diversity is just one piece of the whole security puzzle, and could be combined with other techniques for increased assurance. There did not seem to be a strong consensus in this discussion about whether diversity is critical, although it appears that existing static analysis tools are insufficient to cover all cases.
>
> - Distribution. Distributing large numbers of randomized variants to end-users could be difficult. Prelink diversification was mentioned, which may be a good direction to go with this for the future. However, a basic implementation in LLVM would provide a good starting point for future improvements. This implementation would also provide a useful security measure now for users who are already compiling their own software.
>
> - Security. As was pointed out, diversity is not a perfect solution, and could be bypassed through the use of memory disclosures and scripting. However, this is certainly much harder for the attacker to pull off. In addition, existing security techniques should be used alongside diversity for defense-in-depth.
>
> - Random number generation. Discussions established that the use of some sort of an internal RNG is required for reproducibility. The use of a cryptographically secure RNG is considered best-practice for any randomness used for security purposes. Since the RNG we are proposing is still very fast, there doesn't appear to be a significant disadvantage to using it. We do need a fallback RNG when OpenSSL is unavailable, and have therefore included a simple LCG as well.
>
> I have attached our current patches against r190882, as well as a bit of documentation. Not sure what the next steps here are, but feel free to test out the code and let us know how it goes.

Awesome! I'll jump right in to reviewing. Note that my reviews often 
focus on low-level stuff (formatting, typos). I understand that these 
patches may not be ready for that, and rather expect a higher-level 
review, but I'll do what I do anyways.

You have separate LLVM_WITH_OPENSSL and LLVM_ENABLE_RNG settings. Why? 
Either LLVM_ENABLE_RNG should always be on and then people can choose to 
link in openssl or not. Or maybe you're worried that this is insecure 
and you should remove LLVM_ENABLE_RNG and make it driven solely on 
LLVM_ENABLE_RNG? If you think we can put a good enough RNG into LLVM, we 
should go with turning on LLVM_ENABLE_RNG permanently, but if not then 
we should make the whole thing conditional on whether LLVM_WITH_OPENSSL 
is set.

+  //// isInsertedNOP - Return true if the instruction is an
+  //// artificially inserted NOP

Three slashes for a doxygen comment, not four.

+  /**
+   * Shuffles an iplist of type T
+   */

In LLVM, the style is:

   // Shuffles an iplist of type T

+    void shuffle(iplist<T>& list){

Missing space before '{'.

+    if(list.empty()) return;

Missing space before '('.

+    for(typename iplist<T>::iterator i = list.begin(); i != list.end(); ){

See 
http://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop 
. The llvm way is:

     for (typename iplist<T>::iterator i = list.begin(), e = list.end();
          i != e; ++i) {

Also spaces before '(' and '{'.

+    for(typename SmallVector<T*, 10>::size_type i = 0; i < sv.size(); i++){

Similarly.

Are you familiar with std::random_device and the random number generator 
C++ 11 standard libary components? Yep, I'm going to ask you to expose a 
compatible API, or reasons why that doesn't work (or is silly). 
std::random_device is defined to be non-deterministic, and you needn't 
provide methods that you don't call (entropy()) but it would be 
convenient for everyone reading the API if it were a subset of an 
existing standard, or with clearly documented deviations from the standard.

--- /dev/null
+++ b/lib/Support/RandomNumberGenerator.cpp
@@ -0,0 +1,271 @@
+
+//
+//                     The LLVM Compiler Infrastructure

Whoops! Missed the ruler comment at the top.

+namespace {
+  static cl::opt<unsigned long long>
+  RandomSeed("rng-seed", cl::value_desc("seed"),
+             cl::desc("Seed for the random number generator"));

namespace and static are redundant. Just use static. 
http://llvm.org/docs/CodingStandards.html#static

+  PKCS5_PBKDF2_HMAC_SHA1(Password.data(), Password.size(), (unsigned 
char*)&Salt, sizeof(Salt), PBKDF_ITERATIONS, KeyLen, RandomBytes);

80-column violation. 
http://llvm.org/docs/CodingStandards.html#source-code-width

+void RandomNumberGenerator::ReadStateFile(StringRef StateFilename) {

There's nothing wrong with this function per se. It isn't buggy. I'm 
just wondering how this API fits with the general "llvm as a library" 
approach. LLVM doesn't generally bake in the assumption that it's 
running on a system with a filesystem. For example, the bitcode reader 
has a convenience method which reads a .bc file, sure, but that just 
does the action of loading it into a MemoryBuffer, and forwards it along 
to the API using MemoryBuffer that does the work. I think you should 
follow that pattern here too.

+void RandomNumberGenerator::WriteStateFile(StringRef StateFilename) {

llvm::raw_ostream, for similar reasons?

+//===- NOPInsertion.cpp - Insert NOPs between instructions    ---*- C++ 
-*-===//

The emacs mode marker isn't necessary on .cpp files, it's only .h files 
which are ambiguous (C vs. C++). Also, some of those spaces should be 
hyphens.

+bool NOPInsertionPass::runOnMachineFunction(MachineFunction &Fn) {
+  const TargetInstrInfo *TII = Fn.getTarget().getInstrInfo();
+  PreNOPFunctionCount++;
+  static int nopsInserted = 0;

Hmmm. We don't currently run machine function passes in parallel, but in 
theory LLVM is designed to allow it (function passes aren't allowed to 
look outside their functions, for example). Fortunately, it looks like 
the only thing you do with this variable is increment it, but never read 
it for anything? Nuke it?

Overall this looks like a great start. However, I would like some other 
people to review things:
   - I don't actually approve lib/CodeGen and lib/Target changes. 
Somebody else is going to have to think about whether "InsertedNOP" 
belongs as an MIFlag.
   - I'm concerned about seeding the RNG. I'm especially concerned about 
seeding it in the LTO case, I hadn't thought of that until I saw it in 
the patch. I'm appreciate if somebody with a security background could 
ponder that one.
   - There's also a clang patch which should be reviewed by cfe-commits.

Nick