[PATCH] D109701: [clang] Emit SARIF Diagnostics: Create `clang::SarifDocumentWriter` interface

Vaibhav Yenamandra via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Sep 28 16:37:09 PDT 2021


vaibhav.y added inline comments.


================
Comment at: clang/include/clang/Basic/Sarif.h:95
+//
+/// Since every in clang artifact MUST have a location (there being no nested
+/// artifacts), the creation method \ref SarifArtifact::create requires a
----------------
aaron.ballman wrote:
> How do we expect to handle artifact locations that don't correspond directly to a file? For example, the user can specify macros on the command line and those macros could have a diagnostic result associated with them. Can we handle that sort of scenario?
I hadn't considered `-D` macros. Definitely a valid case to handle. I will respond with more information once I read through the related portions of the SARIF spec. A (sort of hacky?) solution come to mind after a quick glance:

Setting the artifact URI to: `data:text/plain:<CLI_ARG>` with an offset, and saying that it's role is `"referencedOnCommandLine"`. It seems strange since we need to copy over the command line, instead of referencing it directly. What do you think: https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html#_Toc34317617

Dropping a TODO comment so I can update later.

`TODO(envp)`


================
Comment at: clang/include/clang/Basic/Sarif.h:109
+  SarifArtifactLocation Location;
+  SmallVector<StringRef> Roles;
+
----------------
aaron.ballman wrote:
> What size would you like this `SmallVector` to have?
Glancing through valid values for roles: my expectation is that each artifact will have a _small_ number of roles associated, so `4` seems like a good threshold here. I don't really have a strong opinion on the size, so I'm open to changing this.


================
Comment at: clang/include/clang/Basic/Sarif.h:250
+
+  SarifResult() = default;
+
----------------
aaron.ballman wrote:
> A default constructed `SarifResult` will have an uninitialized `RuleIdx` -- are you okay with that?
Hrm, I had overlooked that case (since `appendResult` took and index and a rule). I'll make it take a rule index upon construction. This will also make it so that `appendResult` takes just the `SarifResult`, and not an (Idx, Result), which is definitely an artifact of before I added RuleIdx to `SarifResult`.



================
Comment at: clang/include/clang/Basic/Sarif.h:297-298
+  /// \internal
+  /// Return a pointer to the current tool. If no run exists, this will
+  /// crash.
+  json::Object *getCurrentTool();
----------------
aaron.ballman wrote:
> 
Thanks for rewording. I'll also make this return a reference, since the pointer returned cannot be null.


================
Comment at: clang/include/clang/Basic/Sarif.h:388-389
+private:
+  /// Langauge options to use for the current SARIF document
+  const LangOptions *LangOpts;
+
----------------
aaron.ballman wrote:
> I think this should be a value type rather than a possibly null pointer type -- this way, the document can always rely on there being valid language options to check, and if the user provides no custom language options, the default `LangOptions` suffice. Alternatively, it seems reasonable to expect the user to have to pass in a valid language options object in order to create a SARIF document. WDYT?
I agree with that, will change it to store an owned value.

I think leaving the two constructors as is is fine as long as the `default` variant will also leave `LangOpts` in a valid state.


================
Comment at: clang/lib/Basic/Sarif.cpp:207-209
+    if (statusIter.second) {
+      I = statusIter.first;
+    }
----------------
aaron.ballman wrote:
> Our usual coding style elides these too. Btw, you can find the coding style document at: https://llvm.org/docs/CodingStandards.html
Thanks, sorry there's so many of these! I definitely need to not auto-pilot with style.


================
Comment at: clang/lib/Basic/Sarif.cpp:219-222
+json::Object *SarifDocumentWriter::getCurrentTool() {
+  assert(hasRun() && "Need to call createRun() before using getcurrentTool!");
+  return Runs.back().getAsObject()->get("tool")->getAsObject();
+}
----------------
aaron.ballman wrote:
> Should this return a reference rather than a pointer?
Makes sense to convert to a ref, the pointer returned can never be null anyway


================
Comment at: clang/lib/Basic/Sarif.cpp:230-232
+  if (!hasRun()) {
+    return;
+  }
----------------
aaron.ballman wrote:
> Is there a reason why we don't want to assert instead?
Creating a document requires ending any ongoing runs, and it is valid to create a document without any runs, so `createDocument()` calls `endRun()`.

I guess having a flag to mark the status of the document, and only calling `endRun()` if a an active run exists would likely be better. (`hasRun()` seems to have a rather broad responsibility, tracking both the availability & state of the current run)

I'm thinking of adding a `Closed` flag to the writer (default `true`), which is unset whenever `createRun()` is called, and `endRun()` will set the same flag. That way we only `endRun()` if there is something to end, like so: 

1. Constructor makes writer with `Closed = true`
2. `createRun()` requires `Closed == true`, and sets it to `false`
3. `endRun()` requires `Closed == false`, and sets it to `true`
4. `createDocument()` requires `Closed == true`, and will call `endRun()` to ensure that

What do you think?


================
Comment at: clang/lib/Basic/Sarif.cpp:299
+  // Clear resources associated with a previous run
+  endRun();
+
----------------
aaron.ballman wrote:
> Is there a reason we don't want to assert that the caller has already ended a run before they created a new one?
Calling `createDocument()` also calls `endRun()` (so as to provide a "complete" view of the document under construction).

So having `endRun()` amount to a no-op when there is no run because it is valid for `createDocument()` return a document with no runs associated with it. What do you think?


================
Comment at: clang/lib/Basic/Sarif.cpp:316
+
+json::Object *SarifDocumentWriter::currentRun() {
+  assert(hasRun() && "SARIF Document has no runs, create a run first!");
----------------
aaron.ballman wrote:
> Should this return a reference as well?
That is reasonable. Will have `currentTool()` and `currentRun()` return references.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109701/new/

https://reviews.llvm.org/D109701



More information about the cfe-commits mailing list