[clang] [CIR] Build out AST consumer patterns to reach the entry point into CIRGen (PR #91007)

Tue May 14 05:35:03 PDT 2024

AaronBallman wrote:

> Thanks for everyone's input so far. Let me try to summarize discussions in this PR so we can set on an approach and give advice to our CIR community (and encode on our webpage) on how to move forward in upcoming patches. Useful resources:
> 
>     * [LLVM Coding Standard](https://llvm.org/docs/CodingStandards.html)
> 
>     * [MLIR Style Guide](https://mlir.llvm.org/getting_started/DeveloperGuide/#style-guide)
> 
> 
> CC'ing more people who might care: @seven-mile, @Lancern.
> ## Use of `auto`
> 
> As @joker-eph mentioned, MLIR isn't meant to differ from LLVM/Clang, though they encode the differences as part of their guidelines. The `auto` [guidance](https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable) is still in favor of us keeping it for all `isa`, `dyn_cast` and `cast`, which is used in CIR, so it probably doesn't affect most of what we currently care about. Here's my suggestion for the entire `lib/CIR`:
> 
>     1. Use it for function templates such as `isa`, `dyn_cast`, `cast`, `create` and `rewriteOp.*` variants.

Agreed -- anywhere the type is spelled out somewhere in the initializer is fine (so also `static_cast`, `getAs`, `new`, etc).

>     2. Use it for things considered obvious/common in MLIR space, examples:
> 
> 
>     * `auto loc = op->getLoc()`.

What is obvious in the MLIR space is not necessarily what's obvious in Clang; I have no idea whether that returns a `SourceLocation`, a `PresumedLoc`, a `FullSourceLoc`, etc, so I don't think that is a use of `auto` I would want to have to reason about as a reviewer. That said, if the only use of `loc` is: `Diag(loc, diag::err_something);` then the type really doesn't matter and use of `auto` is probably fine. It's a judgment call.

>     * Getting operands and results from operations (they obey Value Semantics), e.g.: `DepthwiseConv2DNhwcHwcmOp op; ...; auto stride = op.getStrides();`

Again, I have no idea what the type of that is and I have no way to verify that it actually *uses* value semantics because the pointer and qualifiers can be inferred and references can be dropped. That makes it harder to review the code; I would spell out the type in this case as well.

>     * Other examples we are happy to provide upon reviewer feedback if it makes sense.

The big things for reviewers are: don't use `auto` if the type isn't incredibly obvious (spelled in the initialization or extremely obvious based on immediately surrounding code) and don't make us infer important parts the declarator (if it's a `const Foo *` and `auto` makes sense for some reason, spell it `const auto *` rather than `auto`). And if a reviewer asks for something to be converted from `auto` to an explicit type name, assume that means the code isn't as readable as it could be and switch to a concrete type (we should not be arguing "I think this is clear therefore you should also think it's clear" during review, that just makes the review process painful for everyone involved).

> Using the logic above, all `auto`s in this current PR have to be changed (since none apply).
> ## Var naming: CamelCase vs camelBack
> 
> From this discussion, seems like @AaronBallman and @erichkeane are fine with us using camelBack and all the other differences from [MLIR Style Guide](https://mlir.llvm.org/getting_started/DeveloperGuide/#style-guide) in CIRGen and the rest of CIR. Is that right? This is based on the comment:
> 
> > The mixed naming conventions in the header should be fixed (preference is to follow LLVM style if we're changing code around, but if the local preference is for MLIR, that's fine so long as it's consistent).
>
> However, @AaronBallman also wrote:
> 
> > Also, the fact that the naming keeps being inconsistent is a pretty strong reason to change to use the LLVM naming convention, at least for external interfaces.
> 
> Should we ignore this in light of your first comment? If not, can you clarify what do you mean by external interfaces? Just want to make sure we get it right looking fwd.

My thinking is: 1) (most important) the convention should be consistent with other nearby code, whatever that convention happens to be. 2) if there's not already a convention to follow and if it's code that lives in `clang/include/clang`, it should generally follow the LLVM style (it's an "external" interface) because those are the APIs that folks outside of CIR will be calling into. If it's code that lives in `clang/lib/`, following the MLIR style is less of a concern.

So in the case of this review, there's mixed styles being used in the PR and so something has to change. If we're changing something anyway, changing to the LLVM style is generally preferred over changing to the MLIR style.

One caveat to this is: avoid busywork but use your best judgement on how we *eventually* get to a consistent use of the LLVM style. If changing to LLVM style means touching 1000s of lines of code while changing to MLIR style means touching 10s of lines of code, switch to MLIR style. But if changing to LLVM style means touching 100 lines of code and changing to MLIR style means touching 150 lines of code, it's probably best to just bite the bullet and switch to LLVM style even though MLIR would be less effort.

Does this all seem reasonable?

https://github.com/llvm/llvm-project/pull/91007