Introduce StringBuilder utility around raw_string_ostream

Alp Toker alp at nuanti.com
Thu Jun 12 08:16:15 PDT 2014


On 12/06/2014 16:54, Chandler Carruth wrote:
> How about an even simpler approach.
>
> Put a std::string inside raw_string_ostream, and a default constructor 
> which binds the reference to its internal std::string. I'm moderately 
> confident that our optimizers will remove the "dead" std::string that 
> is left in the class by users that want to leverage external storage.
>
> This has the advantage of not introducing any new types or patterns, 
> merely removing the requirement to pass in a std::string to provide 
> storage by using internal storage.
>
> That's getting far enough down in the simplicity (for folks learning 
> about the APIs) to make me quite happy.

Okay, but that's still causing std::string heap allocations.

With my approach I can do:

   typedef SmallStringBuilder<128> StringBuilder;

That turns a whole lot of allocations and strlens across the board into 
no-ops with a simple centralised change (patch attached, based on Yaron 
Keren's suggestion). This really *does* make an impact and there are 
secondary benefits to consider like reduced fragmentation (though I 
don't have figures, it's clearly going to help). Stack allocation always 
wins for small strings outside of recursion scenarios.

And the patch also enforces clean storage reset() which is a really 
tricky operation to do manually with an ostream (requires sequential 
flush, storage reset, and sync -- do we really expect people to 
hand-write that without bugs?)

And it'll be visible to further PGO optimisations (anything that tunes a 
SmallString will apply here too).

On the other hand the approach you suggested just adds some syntactic 
sugar, at the cost of tying the code closer to std::string heap storage 
-- exactly what I'm incrementally working to avoid.

Alp.



>
>
> On Thu, Jun 12, 2014 at 2:39 PM, Alp Toker <alp at nuanti.com 
> <mailto:alp at nuanti.com>> wrote:
>
>
>     On 12/06/2014 14:17, Yaron Keren wrote:
>
>         This looks like a very useful class. It should work for the
>         similar SmallString pattern, right?
>
>           SmallString<256> OutName;
>           llvm::raw_svector_ostream Out(OutName);
>         cast<ItaniumMangleContext>(CGM.getCXXABI().getMangleContext()).mangleCXXVTT(RD,
>         Out);
>           Out.flush();
>           StringRef Name = OutName.str();
>
>         ->
>
>           StringBuilder Out;
>         cast<ItaniumMangleContext>(CGM.getCXXABI().getMangleContext()).mangleCXXVTT(RD,
>         Out);
>           StringRef Name = Out.str();
>
>
>     Yeah, that'd work. But for cases like the above we should first
>     provide a SmallString<> backed specialization that permits
>     StringBuilder<256> Out;
>
>     The initial patch I posted is aimed more at the std::string-backed
>     cases. A stack-allocating specialization should follow nicely from
>     that.
>
>     Alp.
>
>
>
>         Yaron
>
>
>
>         2014-06-12 8:01 GMT+03:00 Alp Toker <alp at nuanti.com
>         <mailto:alp at nuanti.com> <mailto:alp at nuanti.com
>         <mailto:alp at nuanti.com>>>:
>
>
>
>             On 12/06/2014 07:41, David Blaikie wrote:
>
>                 (would find this a little easier to review if it were
>         in Phab)
>
>                 One off-the-cuff thought: If it's not already
>         documented, you
>                 might
>                 want to put a comment in raw_string_ostream's ctor that
>                 declares that
>                 it /really shouldn't/ touch the string (since you're
>         passing in an
>                 object in the derived class that won't itself be
>         constructed until
>                 after the raw_string_ostream's ctor has finished). Or,
>         better
>                 yet, if
>                 there's a way to build a raw_string_ostream without a
>         string, then
>                 attach it later (in the constructor of StringBuilder) that
>                 might be
>                 good.
>
>
>             Sure, I'll add a comment. (But I don't think it's a big
>         concern to
>             justify any major shuffle -- the ctor is right there
>         inline to see
>             a few lines above, and it seems unlikely a stream adapter
>         would
>             ever touch its storage unless written to.)
>
>
>                 (insert various bikeshedding about the name - though
>         I'm not
>                 really
>                 sure raw_ostream ever needed to adopt the C++ standard
>         library
>                 naming
>                 convention in the first place, but I haven't looked
>         closely)
>
>
>             So, I thought about that..
>
>             Having tried raw_string_builder_ostream,
>         string_builder_ostream,
>             raw_string_builder, I ended up with StringBuilder as the
>         only one
>             that looked beautiful in use.
>
>             Hard to say why, but it's probably because it has a dual
>         role as
>             storage and streaming interface unlike the ostream adapter
>             classes, making the distinction significant.
>
>             Alp.
>
>
>
>                 On Wed, Jun 11, 2014 at 9:25 PM, Alp Toker
>         <alp at nuanti.com <mailto:alp at nuanti.com>
>                 <mailto:alp at nuanti.com <mailto:alp at nuanti.com>>> wrote:
>
>                     Replaces the pattern:
>
>                        std::string buf;
>                        raw_string_ostream os(buf);
>                        ...
>                        os.flush();
>                        use(buf);
>
>                     with:
>                        StringBuilder os;
>                        ...
>                        use (os.str());
>
>                     Benefits:
>
>                       * Provides an inherently safe and opaque
>         interface for
>                     building std::string
>                     instances
>                       * Opens up the possibility of changing the
>         underlying
>                     storage in future.
>
>                     Patch also converts various uses, some of which were
>                     accessing the storage
>                     without flushing.
>
>                     Alp.
>
>                     --
>         http://www.nuanti.com
>                     the browser experts
>
>
>                     _______________________________________________
>                     llvm-commits mailing list
>         llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>         <mailto:llvm-commits at cs.uiuc.edu
>         <mailto:llvm-commits at cs.uiuc.edu>>
>
>         http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
>             -- http://www.nuanti.com
>             the browser experts
>
>             _______________________________________________
>             llvm-commits mailing list
>         llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>         <mailto:llvm-commits at cs.uiuc.edu
>         <mailto:llvm-commits at cs.uiuc.edu>>
>         http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
>
>     -- 
>     http://www.nuanti.com
>     the browser experts
>
>     _______________________________________________
>     llvm-commits mailing list
>     llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>     http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>

-- 
http://www.nuanti.com
the browser experts

-------------- next part --------------
diff --git a/include/llvm/Support/raw_ostream.h b/include/llvm/Support/raw_ostream.h
index 4d342da..ae2c113 100644
--- a/include/llvm/Support/raw_ostream.h
+++ b/include/llvm/Support/raw_ostream.h
@@ -15,6 +15,7 @@
 #define LLVM_SUPPORT_RAW_OSTREAM_H
 
 #include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/SmallVector.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/Support/DataTypes.h"
 
@@ -463,6 +464,12 @@ class raw_svector_ostream : public raw_ostream {
   /// current_pos - Return the current position within the stream, not
   /// counting the bytes currently in the buffer.
   uint64_t current_pos() const override;
+
+protected:
+  explicit raw_svector_ostream(SmallVectorImpl<char> &O, std::nullptr_t)
+      : OS(O){};
+  void init();
+
 public:
   /// Construct a new raw_svector_ostream.
   ///
@@ -497,22 +504,26 @@ public:
 
 /// StringBuilder - A raw_ostream that builds an std::string.  This is
 /// a raw_string_ostream with storage.
-class StringBuilder : public raw_string_ostream {
-  std::string Buffer;
+template <unsigned InternalLen>
+class SmallStringBuilder : public raw_svector_ostream {
+  SmallVector<char, InternalLen> Buffer;
 
 protected:
   // There's no need to explicitly flush a StringBuilder.
-  using raw_string_ostream::flush;
+  using raw_svector_ostream::flush;
 
 public:
-  StringBuilder() : raw_string_ostream(Buffer) {}
+  SmallStringBuilder() : raw_svector_ostream(Buffer, 0) { init(); }
 
   void clear() {
     flush();
     Buffer.clear();
+    resync();
   }
 };
 
+typedef SmallStringBuilder<128> StringBuilder;
+
 } // end llvm namespace
 
 #endif
diff --git a/lib/Analysis/Analysis.cpp b/lib/Analysis/Analysis.cpp
index 24f57cd..3c07e8c 100644
--- a/lib/Analysis/Analysis.cpp
+++ b/lib/Analysis/Analysis.cpp
@@ -87,7 +87,7 @@ LLVMBool LLVMVerifyModule(LLVMModuleRef M, LLVMVerifierFailureAction Action,
     report_fatal_error("Broken module found, compilation aborted!");
 
   if (OutMessages)
-    *OutMessages = strdup(MsgsOS.str().c_str());
+    *OutMessages = strndup(MsgsOS.str().data(), MsgsOS.str().size());
 
   return Result;
 }
diff --git a/lib/IR/Core.cpp b/lib/IR/Core.cpp
index 2431d70..ee4ce48 100644
--- a/lib/IR/Core.cpp
+++ b/lib/IR/Core.cpp
@@ -62,6 +62,11 @@ void LLVMShutdown() {
 
 /*===-- Error handling ----------------------------------------------------===*/
 
+static char *LLVMCreateMessage(StringRef Message) {
+  assert(Message.find('\0') == Message.npos);
+  return strndup(Message.data(), Message.size());
+}
+
 char *LLVMCreateMessage(const char *Message) {
   return strdup(Message);
 }
@@ -113,7 +118,7 @@ char *LLVMGetDiagInfoDescription(LLVMDiagnosticInfoRef DI) {
   StringBuilder Msg;
   DiagnosticPrinterRawOStream DP(Msg);
   unwrap(DI)->print(DP);
-  return LLVMCreateMessage(Msg.str().c_str());
+  return LLVMCreateMessage(Msg.str());
 }
 
 LLVMDiagnosticSeverity LLVMGetDiagInfoSeverity(LLVMDiagnosticInfoRef DI){
@@ -199,7 +204,7 @@ LLVMBool LLVMPrintModuleToFile(LLVMModuleRef M, const char *Filename,
 char *LLVMPrintModuleToString(LLVMModuleRef M) {
   StringBuilder os;
   unwrap(M)->print(os, nullptr);
-  return strdup(os.str().c_str());
+  return LLVMCreateMessage(os.str());
 }
 
 /*--.. Operations on inline assembler ......................................--*/
@@ -273,7 +278,7 @@ char *LLVMPrintTypeToString(LLVMTypeRef Ty) {
   StringBuilder os;
   assert(unwrap(Ty) != nullptr && "Expecting non-null Type");
   unwrap(Ty)->print(os);
-  return strdup(os.str().c_str());
+  return LLVMCreateMessage(os.str());
 }
 
 /*--.. Operations on integer types .........................................--*/
@@ -520,7 +525,7 @@ char* LLVMPrintValueToString(LLVMValueRef Val) {
   StringBuilder os;
   assert(unwrap(Val) != nullptr && "Expecting non-null Value");
   unwrap(Val)->print(os);
-  return strdup(os.str().c_str());
+  return LLVMCreateMessage(os.str());
 }
 
 void LLVMReplaceAllUsesWith(LLVMValueRef OldVal, LLVMValueRef NewVal) {
diff --git a/lib/IRReader/IRReader.cpp b/lib/IRReader/IRReader.cpp
index c1f6c71..cb64592 100644
--- a/lib/IRReader/IRReader.cpp
+++ b/lib/IRReader/IRReader.cpp
@@ -110,7 +110,7 @@ LLVMBool LLVMParseIRInContext(LLVMContextRef ContextRef,
     if (OutMessage) {
       StringBuilder os;
       Diag.print(nullptr, os, false);
-      *OutMessage = strdup(os.str().c_str());
+      *OutMessage = strndup(os.str().data(), os.str().size());
     }
     return 1;
   }
diff --git a/lib/LTO/LTOCodeGenerator.cpp b/lib/LTO/LTOCodeGenerator.cpp
index c7c516f..debfbd1 100644
--- a/lib/LTO/LTOCodeGenerator.cpp
+++ b/lib/LTO/LTOCodeGenerator.cpp
@@ -568,10 +568,13 @@ void LTOCodeGenerator::DiagnosticHandler2(const DiagnosticInfo &DI) {
   DiagnosticPrinterRawOStream DP(Msg);
   DI.print(DP);
 
+  // Null-terminate the C string.
+  Msg << '\0';
+
   // If this method has been called it means someone has set up an external
   // diagnostic handler. Assert on that.
   assert(DiagHandler && "Invalid diagnostic handler");
-  (*DiagHandler)(Severity, Msg.str().c_str(), DiagContext);
+  (*DiagHandler)(Severity, Msg.str().drop_back().data(), DiagContext);
 }
 
 void
diff --git a/lib/Support/raw_ostream.cpp b/lib/Support/raw_ostream.cpp
index f55838e..0cc9427 100644
--- a/lib/Support/raw_ostream.cpp
+++ b/lib/Support/raw_ostream.cpp
@@ -704,6 +704,10 @@ void raw_string_ostream::write_impl(const char *Ptr, size_t Size) {
 // and we only need to set the vector size when the data is flushed.
 
 raw_svector_ostream::raw_svector_ostream(SmallVectorImpl<char> &O) : OS(O) {
+  init();
+}
+
+void raw_svector_ostream::init() {
   // Set up the initial external buffer. We make sure that the buffer has at
   // least 128 bytes free; raw_ostream itself only requires 64, but we want to
   // make sure that we don't grow the buffer unnecessarily on destruction (when
diff --git a/lib/Target/TargetMachineC.cpp b/lib/Target/TargetMachineC.cpp
index cf04faa..d1c683d 100644
--- a/lib/Target/TargetMachineC.cpp
+++ b/lib/Target/TargetMachineC.cpp
@@ -241,9 +241,9 @@ LLVMBool LLVMTargetMachineEmitToMemoryBuffer(LLVMTargetMachineRef T,
   formatted_raw_ostream Out(Code);
   bool Result = LLVMTargetMachineEmit(T, M, Out, codegen, ErrorMessage);
 
-  std::string &Data = Code.str();
-  *OutMemBuf = LLVMCreateMemoryBufferWithMemoryRangeCopy(Data.c_str(),
-                                                     Data.length(), "");
+  StringRef Buffer = Code.str();
+  *OutMemBuf = LLVMCreateMemoryBufferWithMemoryRangeCopy(Buffer.data(),
+                                                         Buffer.size(), "");
   return Result;
 }
 


More information about the llvm-commits mailing list