Runtime stack overflow prevention for Twines

Marco Alesiani marco.diiga at gmail.com
Fri May 16 06:34:14 PDT 2014


Compile-time prevention of runtime stack overflow when concatenating Twines
(infinite recursion). Attached patch, test case and a plain English
explanation of the problem.

-- 

*__________________________*

 *Marco Alesiani*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140516/b3ec6afe/attachment.html>
-------------- next part --------------
From 02bb519b4b09614e0d3b0e48208bbbc5772be43a Mon Sep 17 00:00:00 2001
From: Alesiani Marco <marco.diiga at gmail.com>
Date: Thu, 15 May 2014 21:12:14 +0200
Subject: [PATCH] Runtime stack overflow prevention for Twines

Compile-time prevention of runtime stack overflow when concatenating
Twines (infinite recursion)
---
 include/llvm/ADT/Twine.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/llvm/ADT/Twine.h b/include/llvm/ADT/Twine.h
index a54fd74..356ce93 100644
--- a/include/llvm/ADT/Twine.h
+++ b/include/llvm/ADT/Twine.h
@@ -182,6 +182,10 @@ namespace llvm {
       assert(isValid() && "Invalid twine!");
     }
 
+    /// Since the intended use of twines is as temporary objects, assignments 
+    /// when concatenating might cause undefined behavior or stack corruptions
+    Twine& operator=(const Twine &Other) LLVM_DELETED_FUNCTION;
+
     /// isNull - Check for the null twine.
     bool isNull() const {
       return getLHSKind() == NullKind;
-- 
1.9.3

-------------- next part --------------
== Explanation and test case for Twine stack overflow problem ==

As the documentation stands, Twines are inherently dangerous and they should only be used as temporary arguments 
to a function. No temporaries should be stored as in the following
________________________________________________________________
void foo(const Twine &T);
...
StringRef X = ...
unsigned i = ...
const Twine &Tmp = X + "." + Twine(i);
foo(Tmp);
________________________________________________________________

anyway there's a different case much more subtle that doesn't seem to create problems at first and (tested on a Win8 
machine - VS2012) sometimes only shows up in different project configurations (e.g. Release but not in Debug):
________________________________________________________________
 const char *ptr1 = "_text1";
 const char *ptr2 = "_text2";
 const char *ptr3 = "_text3";
 llvm::Twine data = llvm::Twine("data") + ptr1 + ptr2 + ptr3;
 function_call(data); // function_call(Twine& tw);
________________________________________________________________
 
This is a totally different case where data consistency is put at risk and only at runtime one might encounter (or 
not) a quite substantial variety of errors, mostly stack overflows.

The reason is explained by the following C++ test-case which causes a runtime stack-overflow on a Win8 VS2012 system

________________________________________________________________
#include <iostream>
using namespace std;

class Twine;

enum Kind {twineType, stringType, emptyKind};

union Child {
  const char *cStringPtr;
  const Twine* twine;
};

// Simplified model of the Twine class - relevant excerpts
class Twine {
public:

  Child LHS;
  Kind LHSKind;
  Child RHS;
  Kind RHSKind;

  Twine(const char* pt) {
    LHSKind = Kind::stringType;
    LHS.cStringPtr = pt;
    RHSKind = Kind::emptyKind;
  }

  explicit Twine(Child _LHS, Kind _LHSKind,
    Child _RHS, Kind _RHSKind)
    : LHS(_LHS), RHS(_RHS), LHSKind(_LHSKind), RHSKind(_RHSKind) {
  }

  Twine Twine::concat(const Twine &Suffix) const {
    Child NewLHS, NewRHS;
    NewLHS.twine = this;
    NewRHS.twine = &Suffix;
    Kind NewLHSKind = Kind::twineType, NewRHSKind = Kind::twineType;

    if(RHSKind == Kind::emptyKind) { // If LHS is unused, just copy the POD stuff and forget about the object (save indirection)
      NewLHS = LHS;
      NewLHSKind = LHSKind;
    }
    if(Suffix.RHSKind == Kind::emptyKind) { // If RHS is unused, just copy the POD stuff and forget about the object (save indirection)
      NewRHS = Suffix.LHS;
      NewRHSKind = Suffix.LHSKind;
    }
    return Twine(NewLHS, NewLHSKind, NewRHS, NewRHSKind);
  }

};

Twine operator+(const Twine &LHS, const Twine &RHS) {
  return LHS.concat(RHS);
}

int main(int argc, char* argv[])
{

  Twine dt("A");
  dt = dt + "B"; // First time this works since the temporary Child data is entirely copied (POD = deep copy)

  dt = dt + "C"; // Second time this doesn't work! dt's Child LHS (which contained the pointer to the data) is 
                 // substituted with a pointer to itself. No more data.

  // dt.str(); // Infinite recursion when exploring LHS and stack overflow

  return 0;
}
________________________________________________________________

Whether or not this is considered a bug, it can introduce subtle memory problems in an application and aside 
from the "Twine is inherently dangerous" line, there's no mention of it. The proposed fix attempts to avoid such 
a (likely) unintended use.

As for the "Rule of Three", I believe the issue only stands for already-initialized reuse of objects, thus: only 
for assignment operators.


More information about the llvm-commits mailing list