[llvm-commits] [llvm] r146735 - in /llvm/trunk: CMakeLists.txt include/llvm/Support/JSONParser.h lib/Support/CMakeLists.txt lib/Support/JSONParser.cpp test/CMakeLists.txt test/Other/json-bench-test.ll unittests/CMakeLists.txt unittests/Support/JSONParserTest.cpp utils/Makefile utils/json-bench/ utils/json-bench/CMakeLists.txt utils/json-bench/JSONBench.cpp utils/json-bench/Makefile

Chris Lattner clattner at apple.com
Mon Dec 19 17:06:13 PST 2011


On Dec 16, 2011, at 5:09 AM, Manuel Klimek wrote:
> Author: klimek
> Date: Fri Dec 16 07:09:10 2011
> New Revision: 146735
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=146735&view=rev
> Log:
> Adds a JSON parser and a benchmark (json-bench) to catch performance regressions.

Thanks for adding this.  I have some serious concerns about the structure of the implementation of this though, as long as several picky microscopic things:

> +++ llvm/trunk/include/llvm/Support/JSONParser.h Fri Dec 16 07:09:10 2011
> @@ -0,0 +1,444 @@
> +//===--- JsonParser.h - Simple JSON parser ----------------------*- C++ -*-===//

Please match the comment line to the filename, including capitalization.

> +
> +#ifndef LLVM_CLANG_TOOLING_JSON_PARSER_H
> +#define LLVM_CLANG_TOOLING_JSON_PARSER_H

Please fix.

> +
> +#include "llvm/ADT/StringRef.h"
> +#include "llvm/Support/Allocator.h"
> +#include "llvm/Support/ErrorHandling.h"
> +
> +#include <string>

This doesn't need <string> here, it gets it transitively through StringRef.h.

> 
> +/// \brief A parser for JSON text.
> +///
> +/// Use an object of JSONParser to iterate over the values of a JSON text.
> +/// All objects are parsed during the iteration, so you can only iterate once
> +/// over the JSON text, but the cost of partial iteration is minimized.
> +/// Create a new JSONParser if you want to iterate multiple times.
> +class JSONParser {
> +public:
> +  /// \brief Create a JSONParser for the given input.
> +  ///
> +  /// Parsing is started via parseRoot(). Access to the object returned from
> +  /// parseRoot() will parse the input lazily.
> +  JSONParser(StringRef Input);
...
> +  bool failed() const;
...
> +  std::string getErrorMessage() const;

Please switch this to use llvm::SourceMgr to produce caret diagnostics on errors.  It is really unfortunate to have another poor parser with a weird error API interface.

> 
> +private:
> +  /// \brief These methods manage the implementation details of parsing new JSON
> +  /// atoms.
> +  /// @{
> +  JSONString *parseString();
> +  JSONValue *parseValue();
> +  JSONKeyValuePair *parseKeyValuePair();
> +  /// @}

The structure of the parser is exposing all the parsing implementation details to the clients of the header.  Why not have the builder be a simple C function and make the JSONParser class be completely private to the .cpp file?  This would be a lot cleaner, and it would be even better to detangle "skip()" out of the various classes.

> 
> +/// \brief Implementation of JSON containers (arrays and objects).
> +///
> +/// JSONContainers drive the lazy parsing of JSON arrays and objects via
> +/// forward iterators. Call 'skip' to validate parsing of all elements of the
> +/// container and to position the parse stream behind the container.
> +template <typename AtomT, char StartChar, char EndChar,
> +          JSONAtom::Kind ContainerKind>
> +class JSONContainer : public JSONValue {

It is somewhat cute what you're doing here, but it doesn't seem worthwhile to make this a template.  This can just be a class with the template arguments becoming constructor arguments.  Yes, you lose the compile-time specialization for the various cases, but it probably won't matter and this lets you move all the implementation guts out of line to the .cpp file.

> 
> +#endif // LLVM_CLANG_TOOLING_JSON_PARSER_H

Please update.

> +++ llvm/trunk/lib/Support/JSONParser.cpp Fri Dec 16 07:09:10 2011
> @@ -0,0 +1,221 @@
> +//===--- JsonParser.cpp - Simple JSON parser ------------------------------===//

Capitalization again.

> +#include "llvm/Support/JSONParser.h"
> +
> +#include "llvm/ADT/Twine.h"
> +#include "llvm/Support/Casting.h"
> +
> +namespace llvm {

Please use "using namespace llvm;" here, per the coding standards.

> 
> +// Forbidding inlining improves performance by roughly 20%.
> +// FIXME: Remove once llvm optimizes this to the faster version without hints.

This is a pretty terrible hack.  Please file a PR about this.

> +// Checks if there is a whitespace character at the current position.
> +bool JSONParser::isWhitespace() {
> +  return Position != Input.end() && (*Position == ' ' || *Position == '\t' ||
> +                                     *Position == '\n' || *Position == '\r');
> +}

You apparently care a lot about performance here.  If you want to make this fast, you should follow the lead of the clang lexer and rely on the source buffer being nul terminated (which SourceMgr provides).  This makes checks like this a lot faster by eliminating a ton of redundant "end" checks.

> 
> +bool JSONAtom::skip() const {
> +  switch (MyKind) {
> +  case JK_Array:        return cast<JSONArray>(this)->skip();
> +  case JK_Object:       return cast<JSONObject>(this)->skip();
> +  case JK_String:       return cast<JSONString>(this)->skip();
> +  case JK_KeyValuePair: return cast<JSONKeyValuePair>(this)->skip();

This should be in the parsing logic, not each of the value classes.

-Chris




More information about the llvm-commits mailing list