[PATCH] D109701: [clang] Emit SARIF Diagnostics: Create `clang::SarifDocumentWriter` interface
Vaibhav Yenamandra via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Tue Sep 28 16:37:09 PDT 2021
vaibhav.y added inline comments.
================
Comment at: clang/include/clang/Basic/Sarif.h:95
+//
+/// Since every in clang artifact MUST have a location (there being no nested
+/// artifacts), the creation method \ref SarifArtifact::create requires a
----------------
aaron.ballman wrote:
> How do we expect to handle artifact locations that don't correspond directly to a file? For example, the user can specify macros on the command line and those macros could have a diagnostic result associated with them. Can we handle that sort of scenario?
I hadn't considered `-D` macros. Definitely a valid case to handle. I will respond with more information once I read through the related portions of the SARIF spec. A (sort of hacky?) solution comes to mind after a quick glance:
Setting the artifact URI to: `data:text/plain:<CLI_ARG>` with an offset, and saying that its role is `"referencedOnCommandLine"`. It seems strange since we need to copy over the command line, instead of referencing it directly. What do you think: https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html#_Toc34317617
Dropping a TODO comment so I can update later.
`TODO(envp)`
================
Comment at: clang/include/clang/Basic/Sarif.h:109
+ SarifArtifactLocation Location;
+ SmallVector<StringRef> Roles;
+
----------------
aaron.ballman wrote:
> What size would you like this `SmallVector` to have?
Glancing through valid values for roles: my expectation is that each artifact will have a _small_ number of roles associated, so `4` seems like a good threshold here. I don't really have a strong opinion on the size, so I'm open to changing this.
================
Comment at: clang/include/clang/Basic/Sarif.h:250
+
+ SarifResult() = default;
+
----------------
aaron.ballman wrote:
> A default constructed `SarifResult` will have an uninitialized `RuleIdx` -- are you okay with that?
Hrm, I had overlooked that case (since `appendResult` took an index and a rule). I'll make it take a rule index upon construction. This will also make it so that `appendResult` takes just the `SarifResult`, and not an (Idx, Result) pair, which is definitely an artifact from before I added `RuleIdx` to `SarifResult`.
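To make the intended shape concrete, here is a minimal sketch using only the standard library. `ResultSketch` and `ResultWriterSketch` are hypothetical stand-ins, not the actual `SarifResult` or `SarifDocumentWriter` from the patch; the only point is that the rule index is required at construction, so `appendResult` needs just the result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for SarifResult.
class ResultSketch {
  std::uint32_t RuleIdx; // always initialized by the only constructor
  std::string Message;

public:
  ResultSketch() = delete; // no way to build a result without a rule index
  explicit ResultSketch(std::uint32_t Idx) : RuleIdx(Idx) {}

  ResultSketch &setMessage(std::string Msg) {
    Message = std::move(Msg);
    return *this;
  }
  std::uint32_t getRuleIdx() const { return RuleIdx; }
  const std::string &getMessage() const { return Message; }
};

// Hypothetical, simplified stand-in for the writer: it takes only the
// result, because the result already carries its rule index.
class ResultWriterSketch {
  std::vector<ResultSketch> Results;

public:
  void appendResult(const ResultSketch &R) { Results.push_back(R); }
  std::size_t numResults() const { return Results.size(); }
  const ResultSketch &lastResult() const {
    assert(!Results.empty() && "no results appended yet");
    return Results.back();
  }
};
```

With this shape a call site would read something like `W.appendResult(ResultSketch(3).setMessage("..."))`, and there is no separate index argument to get out of sync with the result.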
================
Comment at: clang/include/clang/Basic/Sarif.h:297-298
+ /// \internal
+ /// Return a pointer to the current tool. If no run exists, this will
+ /// crash.
+ json::Object *getCurrentTool();
----------------
aaron.ballman wrote:
>
Thanks for rewording. I'll also make this return a reference, since the pointer returned cannot be null.
================
Comment at: clang/include/clang/Basic/Sarif.h:388-389
+private:
+ /// Langauge options to use for the current SARIF document
+ const LangOptions *LangOpts;
+
----------------
aaron.ballman wrote:
> I think this should be a value type rather than a possibly null pointer type -- this way, the document can always rely on there being valid language options to check, and if the user provides no custom language options, the default `LangOptions` suffice. Alternatively, it seems reasonable to expect the user to have to pass in a valid language options object in order to create a SARIF document. WDYT?
I agree with that, will change it to store an owned value.
I think leaving the two constructors as is is fine as long as the `default` variant will also leave `LangOpts` in a valid state.
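Roughly what I have in mind, as a standard-library-only sketch. `LangOptionsSketch` and `OwningWriterSketch` are hypothetical stand-ins (the real `LangOptions` has many more fields); the sketch only shows that both constructors leave the owned copy in a valid state.

```cpp
#include <cassert>

// Hypothetical stand-in for clang::LangOptions with a single field.
struct LangOptionsSketch {
  bool CPlusPlus = false;
};

// Hypothetical stand-in for the writer: it owns a copy of the options
// instead of holding a possibly null pointer.
class OwningWriterSketch {
  LangOptionsSketch LangOpts; // owned value, valid in every constructor

public:
  OwningWriterSketch() = default; // falls back to default options
  explicit OwningWriterSketch(const LangOptionsSketch &LO) : LangOpts(LO) {}

  const LangOptionsSketch &getLangOpts() const { return LangOpts; }
};

// Small usage check: the writer's copy is independent of the caller's object.
inline void owningWriterSketchDemo() {
  LangOptionsSketch Custom;
  Custom.CPlusPlus = true;
  OwningWriterSketch W(Custom);
  Custom.CPlusPlus = false; // later changes by the caller are not observed
  assert(W.getLangOpts().CPlusPlus);
}
```

The trade-off is one copy of the options per writer, which seems acceptable in exchange for never having to null-check.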
================
Comment at: clang/lib/Basic/Sarif.cpp:207-209
+ if (statusIter.second) {
+ I = statusIter.first;
+ }
----------------
aaron.ballman wrote:
> Our usual coding style elides these too. Btw, you can find the coding style document at: https://llvm.org/docs/CodingStandards.html
Thanks, sorry there are so many of these! I definitely need to not auto-pilot with style.
================
Comment at: clang/lib/Basic/Sarif.cpp:219-222
+json::Object *SarifDocumentWriter::getCurrentTool() {
+ assert(hasRun() && "Need to call createRun() before using getcurrentTool!");
+ return Runs.back().getAsObject()->get("tool")->getAsObject();
+}
----------------
aaron.ballman wrote:
> Should this return a reference rather than a pointer?
Makes sense to convert to a ref; the pointer returned can never be null anyway.
================
Comment at: clang/lib/Basic/Sarif.cpp:230-232
+ if (!hasRun()) {
+ return;
+ }
----------------
aaron.ballman wrote:
> Is there a reason why we don't want to assert instead?
Creating a document requires ending any ongoing runs, and it is valid to create a document without any runs, so `createDocument()` calls `endRun()`.
I guess having a flag to mark the status of the document, and only calling `endRun()` if an active run exists would likely be better. (`hasRun()` seems to have a rather broad responsibility, tracking both the availability & state of the current run)
I'm thinking of adding a `Closed` flag to the writer (default `true`), which is unset whenever `createRun()` is called, and `endRun()` will set the same flag. That way we only `endRun()` if there is something to end, like so:
1. Constructor makes writer with `Closed = true`
2. `createRun()` requires `Closed == true`, and sets it to `false`
3. `endRun()` requires `Closed == false`, and sets it to `true`
4. `createDocument()` requires `Closed == true`, and will call `endRun()` to ensure that
What do you think?
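As a rough sketch of steps 1 to 4 above (standard library only): `RunStateSketch` is a hypothetical stand-in for the writer, runs are modeled as plain strings, and I am assuming that `createDocument()` only calls `endRun()` when a run is still open, since that is the one reading under which steps 3 and 4 are both satisfied.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the writer; only the run-state logic is modeled.
class RunStateSketch {
  bool Closed = true;            // 1. a fresh writer has no open run
  std::vector<std::string> Runs; // runs modeled as names only

public:
  // 2. Requires Closed == true, then opens a run.
  void createRun(const std::string &Name) {
    assert(Closed && "end the current run before creating a new one");
    Runs.push_back(Name);
    Closed = false;
  }

  // 3. Requires Closed == false, then closes the run.
  void endRun() {
    assert(!Closed && "no open run to end");
    Closed = true;
  }

  // 4. Ends the open run, if any (assumed reading), so the document is
  // always produced from a closed writer. Returns the number of runs.
  std::size_t createDocument() {
    if (!Closed)
      endRun();
    assert(Closed && "document must be closed before it is emitted");
    return Runs.size();
  }

  bool isClosed() const { return Closed; }
};
```

With this, `endRun()` no longer needs the silent early return: a document with no runs never reaches it, and a stray `endRun()` from a caller trips the assertion.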
================
Comment at: clang/lib/Basic/Sarif.cpp:299
+ // Clear resources associated with a previous run
+ endRun();
+
----------------
aaron.ballman wrote:
> Is there a reason we don't want to assert that the caller has already ended a run before they created a new one?
Calling `createDocument()` also calls `endRun()` (so as to provide a "complete" view of the document under construction).
So `endRun()` amounts to a no-op when there is no run, because it is valid for `createDocument()` to return a document with no runs associated with it. What do you think?
================
Comment at: clang/lib/Basic/Sarif.cpp:316
+
+json::Object *SarifDocumentWriter::currentRun() {
+ assert(hasRun() && "SARIF Document has no runs, create a run first!");
----------------
aaron.ballman wrote:
> Should this return a reference as well?
That is reasonable. Will have `currentTool()` and `currentRun()` return references.
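For illustration, a small standard-library sketch of the reference-returning accessors. `RunSketch` and `RefWriterSketch` are hypothetical stand-ins that use a `std::map` where the patch uses `json::Object`; the precondition stays an assertion, and callers no longer need a null check.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a SARIF run: just a "tool" property bag.
struct RunSketch {
  std::map<std::string, std::string> Tool;
};

// Hypothetical stand-in for the writer.
class RefWriterSketch {
  std::vector<RunSketch> Runs;

public:
  bool hasRun() const { return !Runs.empty(); }

  void createRun(const std::string &ToolName) {
    Runs.emplace_back();
    Runs.back().Tool["name"] = ToolName;
  }

  // Returns a reference: the precondition is asserted, so there is no
  // null state for the caller to handle.
  RunSketch &currentRun() {
    assert(hasRun() && "SARIF Document has no runs, create a run first!");
    return Runs.back();
  }

  std::map<std::string, std::string> &currentTool() {
    return currentRun().Tool;
  }
};
```

Callers can then write `W.currentTool()["version"] = "..."` directly instead of dereferencing a pointer that can never be null.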
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D109701/new/
https://reviews.llvm.org/D109701