[clang] Clarify use of contractions in diagnostic messages (PR #116803)

Tue Nov 19 09:33:24 PST 2024

AaronBallman wrote:

> > could there be tools that try to parse the messages
> 
> Hmm, I think we have other formats that are better suited for that (don’t we have a flag that makes us print JSON diagnostics?), so I’d _hope_ that no-one tries to just parse the diagnostics from the terminal, and even then, you could definitely hard-code common contractions imo, but that is an interesting question nonetheless.

Yeah, I think we'd want to push folks towards `-fdiagnostics-format`, which supports interchange formats like SARIF.
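
Structured output is easier for a tool to consume than terminal text. The minimal Python sketch below reads a SARIF 2.1.0 log and pulls out each result's level, location, and message. The `SAMPLE_SARIF` string and the `extract_diagnostics` helper are hypothetical. The sample is hand-written to follow the SARIF 2.1.0 object layout (`runs`, `results`, `message.text`, `locations`) and was not captured from Clang, whose real output may carry more properties.

```python
import json

# Hand-written sample in the SARIF 2.1.0 layout (hypothetical, NOT captured
# Clang output; real `-fdiagnostics-format=sarif` output may differ in detail).
SAMPLE_SARIF = """
{
  "version": "2.1.0",
  "runs": [
    {
      "tool": {"driver": {"name": "clang"}},
      "results": [
        {
          "level": "error",
          "message": {"text": "cannot assign to variable 'x' with const-qualified type 'const int'"},
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {"uri": "file:///tmp/example.cpp"},
                "region": {"startLine": 4, "startColumn": 5}
              }
            }
          ]
        }
      ]
    }
  ]
}
"""


def extract_diagnostics(sarif_text):
    """Return (level, uri, line, message) tuples from a SARIF log string."""
    log = json.loads(sarif_text)
    diagnostics = []
    for run in log.get("runs", []):
        for result in run.get("results", []):
            # A result without "level" is treated as "warning" here, which is
            # the usual SARIF default when no rule configuration overrides it.
            level = result.get("level", "warning")
            text = result.get("message", {}).get("text", "")
            uri, line = None, None
            locations = result.get("locations", [])
            if locations:
                phys = locations[0].get("physicalLocation", {})
                uri = phys.get("artifactLocation", {}).get("uri")
                line = phys.get("region", {}).get("startLine")
            diagnostics.append((level, uri, line, text))
    return diagnostics


for level, uri, line, text in extract_diagnostics(SAMPLE_SARIF):
    print(f"{uri}:{line}: {level}: {text}")
```

The consumer gets severity and location as separate fields instead of splitting a rendered `file:line:col: error: ...` line, so it does not depend on how the text is laid out on the terminal.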

> I guess that makes sense yeah (I personally don’t care that much about consistency wrt diagnostic wording, but I can also see why that’s something we’d want).

While it is annoying to have to remember a list of rules about diagnostic messages, I think it's important that we aim for consistency: we want one "voice" for things like diagnostics, documentation, and other communications with the user. (The docs don't have to be consistent with the diagnostics, but they should be consistent with the rest of Clang's documentation, etc.) A single voice provides a better user experience than multiple "voices" scattered throughout the product.

Here's where we currently stand on contractions vs. long form (looking at the Sema, Parse, and Common diagnostics):

| Word | Contraction | Long form |
|---|---:|---:|
| `can't` | 0 | 795 |
| `isn't` | 9 | 352 |
| `doesn't` | 3 | 190 |
| `aren't` | 3 | 41 |
| `shouldn't` | 0 | 26 |
| `don't` | 2 | 15 |
| `won't` | 0 | 13 |
| `wasn't` | 0 | 10 |
| `couldn't` | 0 | 9 |
| `hasn't` | 0 | 2 |
| `didn't` | 0 | 0 |

So I think we have a general preference for long form over contractions. From spot-checking the contractions, it seems that every use could easily be written just as clearly in long form, and converting them would be little churn (about 15-20 messages in total).
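
A tally like the one above could be reproduced along these lines. This is a minimal Python sketch that counts whole-word contractions and their long forms across a list of message strings. The `PAIRS` mapping (including pairing `can't` with `cannot`), the `tally` helper, and the sample messages are assumptions for illustration. This is not necessarily how the numbers above were gathered. Running it on real data would also require extracting message text from files such as `clang/include/clang/Basic/DiagnosticSemaKinds.td`, `DiagnosticParseKinds.td`, and `DiagnosticCommonKinds.td`, which is left out here.

```python
import re

# Hypothetical pairing of each contraction with the long form it is counted
# against; in particular, "can't" is assumed to pair with "cannot".
PAIRS = {
    "can't": "cannot",
    "isn't": "is not",
    "doesn't": "does not",
    "aren't": "are not",
    "shouldn't": "should not",
    "don't": "do not",
    "won't": "will not",
    "wasn't": "was not",
    "couldn't": "could not",
    "hasn't": "has not",
    "didn't": "did not",
}


def _word_pattern(phrase):
    # \b on both sides so that "is not" does not match inside "this not".
    return re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)


def tally(messages):
    """Map each contraction to (contraction count, long-form count)."""
    counts = {}
    for short, long_form in PAIRS.items():
        short_re = _word_pattern(short)
        long_re = _word_pattern(long_form)
        n_short = sum(len(short_re.findall(m)) for m in messages)
        n_long = sum(len(long_re.findall(m)) for m in messages)
        counts[short] = (n_short, n_long)
    return counts


# Illustrative message strings only; not an extract of Clang's .td files.
SAMPLE_MESSAGES = [
    "cannot take the address of an rvalue of type %0",
    "%0 is not a class, namespace, or enumeration",
    "variable %0 isn't used in this scope",
]

for word, (n_short, n_long) in tally(SAMPLE_MESSAGES).items():
    print(f"{word}: {n_short} contractions vs {n_long} long")
```

Whole-word matching keeps the counts from picking up substrings, though a real audit would still want to spot-check the hits by hand.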

> So in sum, enforcing one over the other is not what I’d want to do (and I just don’t think it’s all that necessary), but if we decide to go that route, then I’m fine w/ that too ;Þ

I don't see much benefit in keeping the lopsided mix we currently have. That said, the proposal is to "prefer" the long form, so it guides rather than strictly prescribes. Can you live with that?

https://github.com/llvm/llvm-project/pull/116803