[clang] [CIR] Build out AST consumer patterns to reach the entry point into CIRGen (PR #91007)

Mon May 20 13:07:47 PDT 2024

bcardosolopes wrote:

> What is obvious in the MLIR space is not necessarily what's obvious in Clang;

Sure.

>  I have no idea whether that returns a `SourceLocation`, a `PresumedLoc`, a `FullSourceLoc`, etc, so I don't think that is a use of `auto` I would want to have to reason about as a reviewer. That said, if the only use of `loc` is: `Diag(loc, diag::err_something);` then the type really doesn't matter and use of `auto` is probably fine. It's a judgment call.

For location, `getLoc` always return `mlir::Location`, it's not related to Clang's source location. CIRGen use of `loc` is mostly to pass it down to MLIR APIs, because MLIR source locations are required whenever one is building or rewriting an operation. We usually convert Clang's location into MLIR and pass those around as expected. Has that changed your opinion for this specific case?

> Again, I have no idea what the type of that is and I have no way to verify that it actually _uses_ value semantics because the pointer and qualifiers can be inferred and references can be dropped. That makes it harder to review the code; I would spell out the type in this case as well.

Makes sense. For some background: MLIR "instructions", usually called "operations", usually have pretty default return types when querying operands, `mlir::Value` for SSA values or when an `mlir::Attribute` is returned, the getter/setter is suffixed with `Attr`. Would it be reasonable to suggest a guideline where in CIRGen we get more strict on not using `auto` but for other passes and transformations it should be fine to use auto on those? My rationale here is that post CIRGen it's very likely the reviwer audience is going to be more among MLIR experts than Clang experts and the former are more familiar with the idiom.

> The big things for reviewers are: don't use `auto` if the type isn't incredibly obvious (spelled in the initialization or extremely obvious based on immediately surrounding code) and don't make us infer important parts the declarator (if it's a `const Foo *` and `auto` makes sense for some reason, spell it `const auto *` rather than `auto`).

Agreed. Btw, I'm not advocating for plain `auto` here/everywhere. This is more about `auto` usage big picture.

> And if a reviewer asks for something to be converted from `auto` to an explicit type name, assume that means the code isn't as readable as it could be and switch to a concrete type (we should not be arguing "I think this is clear therefore you should also think it's clear" during review, that just makes the review process painful for everyone involved).

Agreed. If it makes the reviewer job easier we should just abide.

> My thinking is: 1) (most important) the convention should be consistent with other nearby code, whatever that convention happens to be.

Ok, for camelBack, it seems like `clang/lib/CodeGen` is already adopting it in multiple places (unsure if it's intentional or not?), and looks incomplete because probably no one took the effort to make it uniform? The signal this gives me is that we should just do it (use camelBack) for CIRGen.

2) if there's not already a convention to follow and if it's code that lives in `clang/include/clang`, it should generally follow the LLVM style (it's an "external" interface) because those are the APIs that folks outside of CIR will be calling into. If it's code that lives in `clang/lib/`, following the MLIR style is less of a concern.

Great & agreed, thanks for the clarification.

> ... changing to the LLVM style is generally preferred over changing to the MLIR style. One caveat to this is: avoid busywork but use your best judgement on how we eventually get to a consistent use of the LLVM style. ... it's probably best to just bite the bullet and switch to LLVM style even though MLIR would be less effort. Does this all seem reasonable?

Yep, thank you @AaronBallman for putting the time here to help us set the right approach looking fwd. To be more specific, I'd suggest:

- `clang/include` -> LLVM
- `lib/Driver/*` -> LLVM
- `lib/FrontendTool` -> LLVM
- `lib/CIR/FrontendAction` -> LLVM
- `lib/CIR/CodeGen` -> MLIR
- `lib/CIR/*` -> MLIR

What do you think?

https://github.com/llvm/llvm-project/pull/91007