[clang] 8b7881a - [clang-format] Add basic support for formatting JSON

via cfe-commits cfe-commits at lists.llvm.org
Sat Jun 26 07:20:54 PDT 2021


Author: mydeveloperday
Date: 2021-06-26T15:20:17+01:00
New Revision: 8b7881a084d0bc26499cf93e4ac45b59257a03b3

URL: https://github.com/llvm/llvm-project/commit/8b7881a084d0bc26499cf93e4ac45b59257a03b3
DIFF: https://github.com/llvm/llvm-project/commit/8b7881a084d0bc26499cf93e4ac45b59257a03b3.diff

LOG: [clang-format] Add basic support for formatting JSON

I find as I develop I'm moving between many different languages C++,C#,JavaScript all the time. As I move between the file types I like to keep `clang-format` as my formatting tool of choice. (hence why I initially added C# support  in {D58404}) I know those other languages have their own tools but I have to learn them all, and I have to work out how to configure them, and they may or may not have integration into my IDE or my source code integration.

I am increasingly finding that I'm editing additional JSON files as part of my daily work and my editor and git commit hooks are just not setup to go and run [[ https://stedolan.github.io/jq/ | jq ]], So I tend to go to  [[ https://jsonformatter.curiousconcept.com/ | JSON Formatter ]] and copy and paste back and forth. To get nicely formatted JSON. This is a painful process and I'd like a new one that causes me much less friction.

This has come up from time to time:

{D10543}
https://stackoverflow.com/questions/35856565/clang-format-a-json-file
https://bugs.llvm.org/show_bug.cgi?id=18699

I would like to stop having to do that and have formatting JSON as a first class clang-format support `Language` (even if it has minimal style settings at present).

This revision adds support for formatting JSON using the inbuilt JSON serialization library of LLVM, With limited control at present only over the indentation level

This adds an additional Language into the .clang-format file to separate the settings from your other supported languages.

Reviewed By: HazardyKnusperkeks

Differential Revision: https://reviews.llvm.org/D93528

Added: 
    clang/unittests/Format/FormatTestJson.cpp

Modified: 
    clang/docs/ClangFormat.rst
    clang/docs/ClangFormatStyleOptions.rst
    clang/docs/ReleaseNotes.rst
    clang/include/clang/Format/Format.h
    clang/lib/Format/ContinuationIndenter.cpp
    clang/lib/Format/Format.cpp
    clang/lib/Format/TokenAnnotator.cpp
    clang/tools/clang-format/ClangFormat.cpp
    clang/tools/clang-format/clang-format-diff.py
    clang/tools/clang-format/git-clang-format
    clang/unittests/Format/CMakeLists.txt

Removed: 
    


################################################################################
diff  --git a/clang/docs/ClangFormat.rst b/clang/docs/ClangFormat.rst
index d5333c0032b4..4a1422e85b06 100644
--- a/clang/docs/ClangFormat.rst
+++ b/clang/docs/ClangFormat.rst
@@ -11,12 +11,12 @@ Standalone Tool
 ===============
 
 :program:`clang-format` is located in `clang/tools/clang-format` and can be used
-to format C/C++/Java/JavaScript/Objective-C/Protobuf/C# code.
+to format C/C++/Java/JavaScript/JSON/Objective-C/Protobuf/C# code.
 
 .. code-block:: console
 
   $ clang-format -help
-  OVERVIEW: A tool to format C/C++/Java/JavaScript/Objective-C/Protobuf/C# code.
+  OVERVIEW: A tool to format C/C++/Java/JavaScript/JSON/Objective-C/Protobuf/C# code.
 
   If no arguments are specified, it formats the code from standard input
   and writes the result to the standard output.

diff  --git a/clang/docs/ClangFormatStyleOptions.rst b/clang/docs/ClangFormatStyleOptions.rst
index fd19cdfc3e60..96d89db7a5cc 100644
--- a/clang/docs/ClangFormatStyleOptions.rst
+++ b/clang/docs/ClangFormatStyleOptions.rst
@@ -2919,6 +2919,9 @@ the configuration (without a prefix: ``Auto``).
   * ``LK_JavaScript`` (in configuration: ``JavaScript``)
     Should be used for JavaScript.
 
+  * ``LK_Json`` (in configuration: ``Json``)
+    Should be used for JSON.
+
   * ``LK_ObjC`` (in configuration: ``ObjC``)
     Should be used for Objective-C, Objective-C++.
 

diff  --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index 7b23c1ef77af..f26c484c5bac 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -278,6 +278,8 @@ clang-format
 - Option ``AlignArrayOfStructure`` has been added to allow for ordering array-like
   initializers.
 
+- Support for formatting JSON file (\*.json) has been added to clang-format.
+
 libclang
 --------
 

diff  --git a/clang/include/clang/Format/Format.h b/clang/include/clang/Format/Format.h
index 515ac42d42f5..c424e79a971c 100644
--- a/clang/include/clang/Format/Format.h
+++ b/clang/include/clang/Format/Format.h
@@ -2493,6 +2493,8 @@ struct FormatStyle {
     LK_Java,
     /// Should be used for JavaScript.
     LK_JavaScript,
+    /// Should be used for JSON.
+    LK_Json,
     /// Should be used for Objective-C, Objective-C++.
     LK_ObjC,
     /// Should be used for Protocol Buffers
@@ -2506,6 +2508,7 @@ struct FormatStyle {
   };
   bool isCpp() const { return Language == LK_Cpp || Language == LK_ObjC; }
   bool isCSharp() const { return Language == LK_CSharp; }
+  bool isJson() const { return Language == LK_Json; }
 
   /// Language, this format style is targeted at.
   LanguageKind Language;
@@ -3796,6 +3799,8 @@ inline StringRef getLanguageName(FormatStyle::LanguageKind Language) {
     return "Java";
   case FormatStyle::LK_JavaScript:
     return "JavaScript";
+  case FormatStyle::LK_Json:
+    return "Json";
   case FormatStyle::LK_Proto:
     return "Proto";
   case FormatStyle::LK_TableGen:

diff  --git a/clang/lib/Format/ContinuationIndenter.cpp b/clang/lib/Format/ContinuationIndenter.cpp
index cbf016f4b166..8fbc15f27922 100644
--- a/clang/lib/Format/ContinuationIndenter.cpp
+++ b/clang/lib/Format/ContinuationIndenter.cpp
@@ -1906,12 +1906,12 @@ ContinuationIndenter::createBreakableToken(const FormatToken &Current,
                                            LineState &State, bool AllowBreak) {
   unsigned StartColumn = State.Column - Current.ColumnWidth;
   if (Current.isStringLiteral()) {
-    // FIXME: String literal breaking is currently disabled for C#, Java and
-    // JavaScript, as it requires strings to be merged using "+" which we
+    // FIXME: String literal breaking is currently disabled for C#, Java, Json
+    // and JavaScript, as it requires strings to be merged using "+" which we
     // don't support.
     if (Style.Language == FormatStyle::LK_Java ||
         Style.Language == FormatStyle::LK_JavaScript || Style.isCSharp() ||
-        !Style.BreakStringLiterals || !AllowBreak)
+        Style.isJson() || !Style.BreakStringLiterals || !AllowBreak)
       return nullptr;
 
     // Don't break string literals inside preprocessor directives (except for

diff  --git a/clang/lib/Format/Format.cpp b/clang/lib/Format/Format.cpp
index e95ec55d0805..2b860d2a25f7 100644
--- a/clang/lib/Format/Format.cpp
+++ b/clang/lib/Format/Format.cpp
@@ -63,6 +63,7 @@ template <> struct ScalarEnumerationTraits<FormatStyle::LanguageKind> {
     IO.enumCase(Value, "TableGen", FormatStyle::LK_TableGen);
     IO.enumCase(Value, "TextProto", FormatStyle::LK_TextProto);
     IO.enumCase(Value, "CSharp", FormatStyle::LK_CSharp);
+    IO.enumCase(Value, "Json", FormatStyle::LK_Json);
   }
 };
 
@@ -1133,6 +1134,9 @@ FormatStyle getLLVMStyle(FormatStyle::LanguageKind Language) {
   if (Language == FormatStyle::LK_TableGen) {
     LLVMStyle.SpacesInContainerLiterals = false;
   }
+  if (LLVMStyle.isJson()) {
+    LLVMStyle.ColumnLimit = 0;
+  }
 
   return LLVMStyle;
 }
@@ -2825,6 +2829,25 @@ reformat(const FormatStyle &Style, StringRef Code,
   if (Expanded.Language == FormatStyle::LK_JavaScript && isMpegTS(Code))
     return {tooling::Replacements(), 0};
 
+  // JSON only needs the formatting passing.
+  if (Style.isJson()) {
+    std::vector<tooling::Range> Ranges(1, tooling::Range(0, Code.size()));
+    auto Env =
+        std::make_unique<Environment>(Code, FileName, Ranges, FirstStartColumn,
+                                      NextStartColumn, LastStartColumn);
+    // Perform the actual formatting pass.
+    tooling::Replacements Replaces =
+        Formatter(*Env, Style, Status).process().first;
+    // add a replacement to remove the "x = " from the result.
+    if (!Replaces.add(tooling::Replacement(FileName, 0, 4, ""))) {
+      // apply the reformatting changes and the removal of "x = ".
+      if (applyAllReplacements(Code, Replaces)) {
+        return {Replaces, 0};
+      }
+    }
+    return {tooling::Replacements(), 0};
+  }
+
   typedef std::function<std::pair<tooling::Replacements, unsigned>(
       const Environment &)>
       AnalyzerPass;
@@ -2991,6 +3014,8 @@ static FormatStyle::LanguageKind getLanguageByFileName(StringRef FileName) {
     return FormatStyle::LK_TableGen;
   if (FileName.endswith_insensitive(".cs"))
     return FormatStyle::LK_CSharp;
+  if (FileName.endswith_insensitive(".json"))
+    return FormatStyle::LK_Json;
   return FormatStyle::LK_Cpp;
 }
 
@@ -3163,4 +3188,4 @@ llvm::Expected<FormatStyle> getStyle(StringRef StyleName, StringRef FileName,
 }
 
 } // namespace format
-} // namespace clang
\ No newline at end of file
+} // namespace clang

diff  --git a/clang/lib/Format/TokenAnnotator.cpp b/clang/lib/Format/TokenAnnotator.cpp
index 5f973950b8d7..aa69ff88bd74 100644
--- a/clang/lib/Format/TokenAnnotator.cpp
+++ b/clang/lib/Format/TokenAnnotator.cpp
@@ -2900,6 +2900,8 @@ bool TokenAnnotator::spaceRequiredBetween(const AnnotatedLine &Line,
                                           const FormatToken &Right) {
   if (Left.is(tok::kw_return) && Right.isNot(tok::semi))
     return true;
+  if (Style.isJson() && Left.is(tok::string_literal) && Right.is(tok::colon))
+    return false;
   if (Left.is(Keywords.kw_assert) && Style.Language == FormatStyle::LK_Java)
     return true;
   if (Style.ObjCSpaceAfterProperty && Line.Type == LT_ObjCProperty &&
@@ -3227,6 +3229,9 @@ bool TokenAnnotator::spaceRequiredBefore(const AnnotatedLine &Line,
     // and "%d %d"
     if (Left.is(tok::numeric_constant) && Right.is(tok::percent))
       return HasExistingWhitespace();
+  } else if (Style.isJson()) {
+    if (Right.is(tok::colon))
+      return false;
   } else if (Style.isCSharp()) {
     // Require spaces around '{' and  before '}' unless they appear in
     // interpolated strings. Interpolated strings are merged into a single token
@@ -3670,6 +3675,26 @@ bool TokenAnnotator::mustBreakBefore(const AnnotatedLine &Line,
       return true;
   }
 
+  // Basic JSON newline processing.
+  if (Style.isJson()) {
+    // Always break after a JSON record opener.
+    // {
+    // }
+    if (Left.is(TT_DictLiteral) && Left.is(tok::l_brace))
+      return true;
+    // Always break after a JSON array opener.
+    // [
+    // ]
+    if (Left.is(TT_ArrayInitializerLSquare) && Left.is(tok::l_square) &&
+        !Right.is(tok::r_square))
+      return true;
+    // Always break afer successive entries.
+    // 1,
+    // 2
+    if (Left.is(tok::comma))
+      return true;
+  }
+
   // If the last token before a '}', ']', or ')' is a comma or a trailing
   // comment, the intention is to insert a line break after it in order to make
   // shuffling around entries easier. Import statements, especially in

diff  --git a/clang/tools/clang-format/ClangFormat.cpp b/clang/tools/clang-format/ClangFormat.cpp
index c51d9f4ea1f0..144e87f78c64 100644
--- a/clang/tools/clang-format/ClangFormat.cpp
+++ b/clang/tools/clang-format/ClangFormat.cpp
@@ -411,6 +411,17 @@ static bool format(StringRef FileName) {
   unsigned CursorPosition = Cursor;
   Replacements Replaces = sortIncludes(*FormatStyle, Code->getBuffer(), Ranges,
                                        AssumedFileName, &CursorPosition);
+
+  // To format JSON insert a variable to trick the code into thinking its
+  // JavaScript.
+  if (FormatStyle->isJson()) {
+    auto Err = Replaces.add(tooling::Replacement(
+        tooling::Replacement(AssumedFileName, 0, 0, "x = ")));
+    if (Err) {
+      llvm::errs() << "Bad Json variable insertion\n";
+    }
+  }
+
   auto ChangedCode = tooling::applyAllReplacements(Code->getBuffer(), Replaces);
   if (!ChangedCode) {
     llvm::errs() << llvm::toString(ChangedCode.takeError()) << "\n";
@@ -506,7 +517,8 @@ int main(int argc, const char **argv) {
   cl::SetVersionPrinter(PrintVersion);
   cl::ParseCommandLineOptions(
       argc, argv,
-      "A tool to format C/C++/Java/JavaScript/Objective-C/Protobuf/C# code.\n\n"
+      "A tool to format C/C++/Java/JavaScript/JSON/Objective-C/Protobuf/C# "
+      "code.\n\n"
       "If no arguments are specified, it formats the code from standard input\n"
       "and writes the result to the standard output.\n"
       "If <file>s are given, it reformats the files. If -i is specified\n"

diff  --git a/clang/tools/clang-format/clang-format-
diff .py b/clang/tools/clang-format/clang-format-
diff .py
index 4edc0b2e19c2..ea483f59e96b 100755
--- a/clang/tools/clang-format/clang-format-
diff .py
+++ b/clang/tools/clang-format/clang-format-
diff .py
@@ -48,7 +48,7 @@ def main():
                       '(case sensitive, overrides -iregex)')
   parser.add_argument('-iregex', metavar='PATTERN', default=
                       r'.*\.(cpp|cc|c\+\+|cxx|c|cl|h|hh|hpp|hxx|m|mm|inc|js|ts'
-                      r'|proto|protodevel|java|cs)',
+                      r'|proto|protodevel|java|cs|json)',
                       help='custom pattern selecting file paths to reformat '
                       '(case insensitive, overridden by -regex)')
   parser.add_argument('-sort-includes', action='store_true', default=False,

diff  --git a/clang/tools/clang-format/git-clang-format b/clang/tools/clang-format/git-clang-format
index 3646b4ff41d7..0233ceb3a868 100755
--- a/clang/tools/clang-format/git-clang-format
+++ b/clang/tools/clang-format/git-clang-format
@@ -85,6 +85,7 @@ def main():
       'js',  # JavaScript
       'ts',  # TypeScript
       'cs',  # C Sharp
+      'json',  # Json
       ])
 
   p = argparse.ArgumentParser(

diff  --git a/clang/unittests/Format/CMakeLists.txt b/clang/unittests/Format/CMakeLists.txt
index 281ebc2d411c..5175c49d855f 100644
--- a/clang/unittests/Format/CMakeLists.txt
+++ b/clang/unittests/Format/CMakeLists.txt
@@ -9,6 +9,7 @@ add_clang_unittest(FormatTests
   FormatTestCSharp.cpp
   FormatTestJS.cpp
   FormatTestJava.cpp
+  FormatTestJson.cpp
   FormatTestObjC.cpp
   FormatTestProto.cpp
   FormatTestRawStrings.cpp

diff  --git a/clang/unittests/Format/FormatTestJson.cpp b/clang/unittests/Format/FormatTestJson.cpp
new file mode 100644
index 000000000000..a1680d80b92b
--- /dev/null
+++ b/clang/unittests/Format/FormatTestJson.cpp
@@ -0,0 +1,197 @@
+//===- unittest/Format/FormatTestJson.cpp - Formatting tests for Json     -===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "FormatTestUtils.h"
+#include "clang/Format/Format.h"
+#include "llvm/Support/Debug.h"
+#include "gtest/gtest.h"
+
+#define DEBUG_TYPE "format-test-json"
+
+namespace clang {
+namespace format {
+
+class FormatTestJson : public ::testing::Test {
+protected:
+  static std::string format(llvm::StringRef Code, unsigned Offset,
+                            unsigned Length, const FormatStyle &Style) {
+    LLVM_DEBUG(llvm::errs() << "---\n");
+    LLVM_DEBUG(llvm::errs() << Code << "\n\n");
+
+    tooling::Replacements Replaces;
+
+    // Mock up what ClangFormat.cpp will do for JSON by adding a variable
+    // to trick JSON into being JavaScript
+    if (Style.isJson()) {
+      auto Err = Replaces.add(
+          tooling::Replacement(tooling::Replacement("", 0, 0, "x = ")));
+      if (Err) {
+        llvm::errs() << "Bad Json variable insertion\n";
+      }
+    }
+    auto ChangedCode = applyAllReplacements(Code, Replaces);
+    if (!ChangedCode) {
+      llvm::errs() << "Bad Json varibale replacement\n";
+    }
+    StringRef NewCode = *ChangedCode;
+
+    std::vector<tooling::Range> Ranges(1, tooling::Range(0, NewCode.size()));
+    Replaces = reformat(Style, NewCode, Ranges);
+    auto Result = applyAllReplacements(NewCode, Replaces);
+    EXPECT_TRUE(static_cast<bool>(Result));
+    LLVM_DEBUG(llvm::errs() << "\n" << *Result << "\n\n");
+    return *Result;
+  }
+
+  static std::string
+  format(llvm::StringRef Code,
+         const FormatStyle &Style = getLLVMStyle(FormatStyle::LK_Json)) {
+    return format(Code, 0, Code.size(), Style);
+  }
+
+  static FormatStyle getStyleWithColumns(unsigned ColumnLimit) {
+    FormatStyle Style = getLLVMStyle(FormatStyle::LK_Json);
+    Style.ColumnLimit = ColumnLimit;
+    return Style;
+  }
+
+  static void
+  verifyFormat(llvm::StringRef Code,
+               const FormatStyle &Style = getLLVMStyle(FormatStyle::LK_Json)) {
+    EXPECT_EQ(Code.str(), format(Code, Style)) << "Expected code is not stable";
+    EXPECT_EQ(Code.str(), format(test::messUp(Code), Style));
+  }
+};
+
+TEST_F(FormatTestJson, JsonRecord) {
+  verifyFormat("{}");
+  verifyFormat("{\n"
+               "  \"name\": 1\n"
+               "}");
+  verifyFormat("{\n"
+               "  \"name\": \"Foo\"\n"
+               "}");
+  verifyFormat("{\n"
+               "  \"name\": {\n"
+               "    \"value\": 1\n"
+               "  }\n"
+               "}");
+  verifyFormat("{\n"
+               "  \"name\": {\n"
+               "    \"value\": 1\n"
+               "  },\n"
+               "  \"name\": {\n"
+               "    \"value\": 2\n"
+               "  }\n"
+               "}");
+  verifyFormat("{\n"
+               "  \"name\": {\n"
+               "    \"value\": [\n"
+               "      1,\n"
+               "      2,\n"
+               "    ]\n"
+               "  }\n"
+               "}");
+  verifyFormat("{\n"
+               "  \"name\": {\n"
+               "    \"value\": [\n"
+               "      \"name\": {\n"
+               "        \"value\": 1\n"
+               "      },\n"
+               "      \"name\": {\n"
+               "        \"value\": 2\n"
+               "      }\n"
+               "    ]\n"
+               "  }\n"
+               "}");
+  verifyFormat(R"({
+  "firstName": "John",
+  "lastName": "Smith",
+  "isAlive": true,
+  "age": 27,
+  "address": {
+    "streetAddress": "21 2nd Street",
+    "city": "New York",
+    "state": "NY",
+    "postalCode": "10021-3100"
+  },
+  "phoneNumbers": [
+    {
+      "type": "home",
+      "number": "212 555-1234"
+    },
+    {
+      "type": "office",
+      "number": "646 555-4567"
+    }
+  ],
+  "children": [],
+  "spouse": null
+})");
+}
+
+TEST_F(FormatTestJson, JsonArray) {
+  verifyFormat("[]");
+  verifyFormat("[\n"
+               "  1\n"
+               "]");
+  verifyFormat("[\n"
+               "  1,\n"
+               "  2\n"
+               "]");
+  verifyFormat("[\n"
+               "  {},\n"
+               "  {}\n"
+               "]");
+  verifyFormat("[\n"
+               "  {\n"
+               "    \"name\": 1\n"
+               "  },\n"
+               "  {}\n"
+               "]");
+}
+
+TEST_F(FormatTestJson, JsonNoStringSplit) {
+  FormatStyle Style = getLLVMStyle(FormatStyle::LK_Json);
+  Style.IndentWidth = 4;
+  verifyFormat(
+      "[\n"
+      "    {\n"
+      "        "
+      "\"naaaaaaaa\": \"foooooooooooooooooooooo oooooooooooooooooooooo\"\n"
+      "    },\n"
+      "    {}\n"
+      "]",
+      Style);
+  verifyFormat("[\n"
+               "    {\n"
+               "        "
+               "\"naaaaaaaa\": "
+               "\"foooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo"
+               "oooooooooooooooooooooooooo\"\n"
+               "    },\n"
+               "    {}\n"
+               "]",
+               Style);
+
+  Style.ColumnLimit = 80;
+  verifyFormat("[\n"
+               "    {\n"
+               "        "
+               "\"naaaaaaaa\":\n"
+               "            "
+               "\"foooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo"
+               "oooooooooooooooooooooooooo\"\n"
+               "    },\n"
+               "    {}\n"
+               "]",
+               Style);
+}
+
+} // namespace format
+} // end namespace clang


        


More information about the cfe-commits mailing list