[PATCH] D75687: [clangd] Only minimally escape text when rendering to markdown.

Kadir Cetinkaya via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Mar 6 02:13:53 PST 2020


kadircet added a comment.

Just writing down my investigation results for context:
Looks like we'll never escape `$%'(,/:;?@[^{|}` anymore.

Markdown already doesn't provide backslash escaping for `$%',/:;?@^|` which leaves us with parentheses `([{}`:

- `[` looks fine as it is only used for hyperlinks and we escape the `]` if need be
- '(' again seems to be only special for providing a text for hyperlinks, so that should also be OK.
- `{}` apparently doesn't do anything in normal markdown but some extensions make use of it for adding `id`s to headers, i.e `# header1 {#mycrazyheader}`, so again should be OK.



================
Comment at: clang-tools-extra/clangd/FormattedString.cpp:69
+    return false;
+  if (Contents.front() == '!' || Contents.front() == '?' ||
+      Contents.front() == '/')
----------------
what is the `?` for ?


================
Comment at: clang-tools-extra/clangd/FormattedString.cpp:73
+  // Check the start of the tag name.
+  if (Contents.empty() || !llvm::isAlpha(Contents.front()))
+    return false;
----------------
we early exited already in case of empty `contents`


================
Comment at: clang-tools-extra/clangd/FormattedString.cpp:130
+  case '#': { // ATX heading.
+    if (!(StartsLine && Before.empty()))
+      return false;
----------------
nit: if(!StartsLine || !Before.empty()))


================
Comment at: clang-tools-extra/clangd/FormattedString.cpp:132
+      return false;
+    llvm::StringRef Rest = After.ltrim(C);
+    return Rest.empty() || Rest.startswith(" ");
----------------
nit: After.ltrim('#')


================
Comment at: clang-tools-extra/clangd/FormattedString.cpp:133
+    llvm::StringRef Rest = After.ltrim(C);
+    return Rest.empty() || Rest.startswith(" ");
+  }
----------------
markdown is weird :(


================
Comment at: clang-tools-extra/clangd/FormattedString.cpp:139
+    //   ]: is a link reference
+    // The following are only links if the a link reference exists:
+    //   ] by itself is a shortcut link
----------------
s/the a/the/


================
Comment at: clang-tools-extra/clangd/FormattedString.cpp:150
+    // Not a delimiter if surrounded by space.
+    return !SpaceSurrounds();
+  case '-': // Setex heading, horizontal ruler, or bullet.
----------------
`_` seems to behave different than `*` :(

it seems to rather depend on the spaces around the text being emphasized, i.e

```
foo _ bar _ foo -> no emphasis
foo _ bar_ foo -> no emphasis
foo _bar_ foo -> emphasis on bar
foo_bar_ foo -> no emphasis
```

so this should rather be `Before.endswith(" ") && isAlpha(After)` for the beginning of emphasis and the opposite for the ending.
Not sure if there's an easy way to decide on it in isolation.


================
Comment at: clang-tools-extra/clangd/FormattedString.cpp:162
+      return true;
+    return !SpaceSurrounds();
+  case '<': // HTML tag (or autolink, which we choose not to escape)
----------------
nit: `return IsBullet() || RulerLength() || !SpaceSurrounds()`;


================
Comment at: clang-tools-extra/clangd/FormattedString.cpp:164
+  case '<': // HTML tag (or autolink, which we choose not to escape)
+    return looksLikeTag(After);
+  case '>': // Quote marker. Needs escaping at start of line.
----------------
is this really worth all the trouble ?


================
Comment at: clang-tools-extra/clangd/FormattedString.cpp:166
+  case '>': // Quote marker. Needs escaping at start of line.
+    return StartsLine && Before.empty();
+  case '&': { // HTML entity reference
----------------
I believe it is also allowed to have whitespaces(less than 4?) before the `>` to be interpreted as a quote marker.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75687/new/

https://reviews.llvm.org/D75687





More information about the cfe-commits mailing list